On the surface, perception
doesn’t actually seem all that complicated. For humans, it comes so naturally
that it’s one of the defining characteristics of life. After all, what is a
human without the ability to see, hear, touch, smell, or taste? At a base level,
perception is simply the ability to tell the difference between at least two
things – what is sight without at least the contrast of light and dark? Intelligence, at a similarly basic level, then comes from the ability to make decisions based on perception. Electronic computers have always been able to change their execution based on the internal state of the machine (i.e. make decisions); in fact, that kind of conditional behavior is one of the key requirements for even a mathematical formalization of a computer to be able to compute everything that is computable (fun fact: the things that are computable are formally known as the computable functions). With this in mind, one of the biggest
challenges in computing hasn’t been how to create a machine that makes
decisions (my alarm clock already does that based on the time) but how to digitize
perception so that computers can then make meaningful
decisions based on their perception of the environment. It’s progress in this exact
area that has essentially been the foundation of machine learning or, as it’s
commonly called these days, “AI”.
In the pre-electronic-computer era, the core mathematics involved in machine learning was mostly developed in, and scattered across, the fields of statistics, linear algebra, and calculus/optimization; but when computers came onto the scene, entirely new avenues of possibility opened to researchers – avenues that were so computationally intensive that, up until that point, pursuing them would have simply been too tedious, time-consuming, or boring for ordinary humans. With easy access to computation, though, it became possible to
combine these subjects such that between the early 1950s and today, we’ve gone
from using computers for basic regression “fit a trend line to my data” tasks all
the way to being able to transfer
artistic style between images or, as the subject of my show and tell
discussed, synthesize new images just from a description of what should be in them. Many of the key advancements in
machine learning haven’t been to show that this is possible mathematically –
the field of mathematics is generally broad enough that you can ask “is there a
function that takes a set of sentences and maps each one to an image?” and it’ll
go “sure! If you can represent them mathematically somehow, there’s a function
(or at least a relation) that maps between them!” – they’ve been to develop methods that make it feasible to find or approximate these functions based on examples of how they behave.
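To make the “fit a trend line to my data” idea concrete, here is a minimal sketch of the simplest version of finding a function from examples: ordinary least-squares fitting of a straight line. The numbers below are made up purely for illustration, and the NumPy call is just one convenient way to do the fit.

import numpy as np

# Hypothetical observations: y is roughly 2x + 1 plus a little noise
# (the data is invented purely for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 11.0])

# Least-squares fit of a degree-1 polynomial, i.e. a straight trend line.
slope, intercept = np.polyfit(x, y, deg=1)

print(f"learned function: y = {slope:.2f} * x + {intercept:.2f}")
print("prediction at x = 6:", slope * 6 + intercept)

Everything from that trend line up through style transfer and text-to-image synthesis is, in spirit, the same move: pick a family of candidate functions and search it for the member that best matches the examples.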
In the context of new media, the capabilities offered by
these techniques are somewhat unprecedented. As more media becomes digital, it becomes possible not only to duplicate it effortlessly but also to transform it into entirely new forms. As a result, the line between what is
original and what is copied grows ever thinner, and really, it’s going to be either
interesting or scary, depending on who you ask, to watch media develop in the
coming years. To a certain extent, we’re already getting to that point, and in
my project, I’m going to explore what I see as the frontiers of
machine-generated or machine-assisted generation of media, with the hope that
looking at the state of things now will give us a viewport into how things will
be in the future.