If I had to sum up the approach used by “trivial” content generation systems
in one phrase, it would be “monkeys at typewriters”, a reference to the
infinite monkey theorem. This theorem essentially states that “a monkey hitting keys
at random on a typewriter for an infinite amount of time will almost surely
type any given text”. With regard to trivial content generation systems,
replace “monkey” with “machine/computer”, and there you go, you have a content
generation system. The key piece of the infinite monkey theorem that we care
about is that the theorem itself doesn’t place any limit on the length of the
text it can generate, so there are infinitely many possible texts and the
likelihood of any particular one coming out of a given attempt is vanishingly
small. However, with
digital computers, this assumption is out the window: ultimately, digital
computers are finite machines with finite memory and finite computational
capabilities. Thus, the number of things they can generate (and store; it’s an
important requirement) is finite – it’s quite a big number, but it is finite. For
example, the computer I’m typing this on has 16GB of RAM, which, assuming 1
byte (8 bits) per character and nothing else in the memory, means that I can
hold a total of 16*2^30 characters in my memory. With 2^8 = 256 possible characters, that means that there are
a total of 256^(16*2^30) possible texts I could type (256 choices per
character and 16*2^30 characters to choose), and while this is a big
number, it ultimately is a finite
number. This is another idea that is at the heart of machine
generated content: there is only a finite amount of content that can be
generated and stored by any given machine. In comparison to real media like
paintings, film-based photographs, etc. – which, in theory, have an infinite
number of possible instances so long as you avoid quantum mechanics or
discussions about the limits of human perception, neither of which I am
qualified to talk about – digital representations are only finite
approximations of works drawn from the infinite ether of possible
artworks. There are an infinite number of paintings that can be made on a 7x7
canvas, but there are a finite number ((256^3)^(1000*1000), assuming RGB with
8 bits per channel) of 1000x1000 pixel images that can be made representing the
paintings on that canvas. In the
context of remix culture, assuming a 44.1kHz audio sample rate and a 3 minute
song with 256 different amplitude values that can be recorded at each instant,
there are 256^(3*60*44.1*1000) songs that can be created. In our
discussions about “loss” when capturing physical media digitally,
this is part of the loss that occurs. These digital representations are
specially crafted to contain a large amount of the perceptible information that
humans would have received from the physical copy, but they are ultimately
attempting to represent a potentially infinite amount of information in a large,
but finite, amount of space.
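To get a feel for how big these finite numbers are, here is a rough sketch that counts their decimal digits instead of trying to print them. It uses the same assumptions as above (one byte per character, three 8-bit channels per pixel, 256 amplitude levels per sample) plus one assumption of mine, that the audio is mono; the helper name `digits_in_power` is made up for this example.

```python
import math


def digits_in_power(base: int, exponent: int) -> int:
    """Approximate number of decimal digits in base**exponent.

    Uses logarithms so the (enormous) number itself is never built.
    Floating point rounding means this is an estimate for huge exponents.
    """
    return math.floor(exponent * math.log10(base)) + 1


# Texts: 16 GB of RAM, one byte per character, 256 possible characters.
text_chars = 16 * 2**30
# Images: 1000x1000 pixels, three 8-bit channels (R, G, B) per pixel.
image_bytes = 3 * 1000 * 1000
# Songs: 3 minutes at 44,100 samples per second, 256 levels per sample.
# Assumption: a single (mono) channel.
song_samples = 3 * 60 * 44_100

for label, n in [("texts", text_chars), ("images", image_bytes), ("songs", song_samples)]:
    digits = digits_in_power(256, n)
    print(f"256^{n} possible {label}: a number with roughly {digits:,} digits")
```

Each of these is a number you could, in principle, write down, which is exactly the point: it is huge, but it ends.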
With this framework in mind, we can finally discuss
content generation systems built around this approach. At the bottom of the
totem pole with regard to these methods are the systems that take this idea to
heart and embrace it wholly. For example, ShitpostBot 5000 is a
Facebook page with a simple concept: users submit “templates” and “source
images”, and every 30 minutes, the bot that gives the page its name picks a “template”
along with all the source images needed to fill it out, and posts the result on
Facebook (and Twitter). A good majority of the posts are just meaningless
junk, but occasionally, thanks to either dumb luck or the magic of
confirmation bias, some of them actually
work.
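The core loop of a bot like this can be sketched in a few lines. To be clear, this is not ShitpostBot 5000's actual code: the `Template` class, the `pick_post` function, and the file names are all hypothetical, and I'm assuming that a template simply declares how many source images it needs and that the same image isn't used twice in one post.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str   # hypothetical identifier for a user-submitted template
    slots: int  # how many source images the template needs


def pick_post(templates, source_images, rng=random):
    """Pick one template at random, then enough random images to fill it."""
    template = rng.choice(templates)
    # Assumption: images are drawn without repetition within a single post.
    images = rng.sample(source_images, k=template.slots)
    return template, images


# Made-up submissions, just to show the shape of the data.
templates = [Template("two_panel", 2), Template("single_caption", 1)]
source_images = ["cat.png", "stock_photo.png", "screenshot.png"]

template, images = pick_post(templates, source_images)
print(f"Posting {template.name} filled with {images}")
```

There's no judgement anywhere in there about whether the result is any good, which is why most of the output is junk.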
However, the approach used in systems like these isn’t exactly exciting. Luckily,
there are many other techniques used in content generation that yield much
better results. Before moving on, it’s worth taking note that a lot of the
content the bot puts out is quite distasteful, to the point that the first
version of the page was reported enough times that it was removed from
Facebook. Regardless, on a similar level to this, we have technologies like
random username generators and random password generators. Really, any
computer data could, in theory, be enumerated so long as you put an upper
limit on its size. It’s just a boring way to do things, and in many cases, it
takes an infeasible amount of human time.
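As a minimal sketch of both ideas, here is a brute-force enumerator with an upper limit on length next to a random password generator. The function names and the default alphabet are my own choices for illustration, not taken from any particular tool.

```python
import itertools
import secrets
import string


def enumerate_strings(alphabet, max_length):
    """Yield every string over `alphabet` of length 0..max_length, shortest first."""
    for length in range(max_length + 1):
        for chars in itertools.product(alphabet, repeat=length):
            yield "".join(chars)


def count_strings(alphabet_size, max_length):
    """How many strings enumerate_strings would yield for these parameters."""
    return sum(alphabet_size**length for length in range(max_length + 1))


def random_password(length, alphabet=string.ascii_letters + string.digits):
    """Pick each character independently and uniformly at random."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


# Tiny alphabet, tiny limit: the enumeration is short enough to print in full.
print(list(enumerate_strings("ab", 2)))

# Realistic alphabet, modest limit: counting is easy, enumerating is not.
print(f"{count_strings(62, 8):,} alphanumeric strings of length 8 or less")

print(random_password(12))
```

Even with the limit set to only 8 alphanumeric characters, the count comes out to more than 2 × 10^14 strings, which is why enumerating everything is rarely practical and picking at random is what these generators do instead.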