Transcription
Mario Klingemann: I mean really, you always put so much love into every detail of this conference. It has such a great vibe, and I’m really proud that you think I’m worthy to speak here, so thanks a lot to you because what I saw so far was awesome. Yes.
Audience: [Applause]
Mario Klingemann: Okay, time is running, so I should not spend too much time on the introduction, but my talk is called Machimaginarium. I’m @quasimondo on Twitter, so you already realize I love making up weird names and, yes, there will be a few more made-up words during this talk. The subtitle is A Journey into Artificial Creativity. Even if you are not in my filter bubble, you might have noticed over the past year or two that headlines like “AI solves this” or “machine learning is better than humans at that” have become more and more frequent.
You might be wondering or worrying about what they will become better at next. Will I still have a job in five years? I will not be able to answer this. The idea is to show you a few things I am doing with machine learning and deep learning.
I don’t even like the term AI so much. I don’t think we have reached AI yet, but it sounds good. Yes, I will just show you a few things and try to answer the question if machines can be creative.
But let me rephrase that question first a little bit and ask: can humans be creative? Usually I answer that question myself, but in Aspen’s talk this morning I saw this Bobby McFerrin video and realized it’s a perfect fit, so I hope you’re not too tired, because we’re actually going to try this now. I guess you heard it.
Like it goes like, I go like, bom-bom-bom, and you have to sing with me. Okay? So please, so bom-bom-bom-bom. Bom-bom-bom. Bom-bom. Bom-bom.
Audience: [Laughter]
Mario Klingemann: Okay, it actually worked, yes, because that’s the point, right? Yes, we are very creative, but our creativity is somehow limited, because what we are really good at is making connections and extrapolating. But once we end up in a space where the rules are not really clear, we don’t really know what comes next.
Actually, over 100 years ago Jerome K. Jerome, one of my favorite authors, had already realized that “human thought is incapable of originality. No man ever yet invented a new thing. Only some variation or extension of an old thing.” I think we heard that before in Jeremy’s talk. Yes.
What we think is creativity is really just putting together, like making connections between things that we have already seen or other people have invented before us. We can transfer certain concepts from something we already know onto something else, but we are totally incapable of inventing something from scratch.
The question is kind of -- oh, yeah, well, a good example is the selfie stick. We had seen all of it before. All you need is a camera, a stick, a hand, and social media, and you have something that appears new, but everything is made out of parts that have already been around.
Well, actually that’s a good thing and, in a way, that’s how I deal with creativity. I think there is no useless knowledge. In a way, you have to absorb as much knowledge as you can, because you never know what kind of connection you can make from it.
Machines can, of course, help us with this approach. They are really good at trying all kinds of connections, so an early experiment of mine called Ernst, based on Max Ernst, was able to generate new collages. I just give it a set of elements and then, of course, the machine can go infinitely through variations of the theme, stick them together, and I just have to sit back and select the ones I find interesting or esthetic or novel. But of course, since the machine in this case is just a tool, I still have to make the decisions.
The thing is, with human creativity, we’re always just building on top of what other people have built before. Well, now come the machines, and the question is: how can they help us, or can they surpass us in this process? The other question is: who is the author when the machine does something?
Is it cheating if the machine simply looks at what we have done and then makes its own new creations from that, or does it have to start from scratch? I made this little illustration, based on those English children’s books, in which I tried to paraphrase the problem.
Peter sees the computer. “But the machine only creates what humans have taught it to,” says Peter.
“So do you,” says mommy.
What I mean is: why should there be different rules for machines than for humans? We can only build on what other humans have done before, so I guess it’s a good start if the machines are allowed to do that too, and they don’t have to start from scratch. That is my approach. I’m trying to train my machines to learn what humans have already done, what humans find interesting, entertaining, unusual, and then extrapolate, find the empty spaces, and so on.
Yes, the tool that helps us deal with these complex things like human creativity is called deep learning. Well, pretty much, deep learning was invented to turn cats into data.
Audience: [Laughter]
Mario Klingemann: And, well, the way it does it is actually -- well, that’s not quite true. It turns images of cats into data. If you look at a pixel image in a computer, it is already data, because every single pixel carries five pieces of information: the X and Y coordinates and the RGB values. The only problem is that it’s way too much information; it’s way too redundant. It has so much information that we don’t need.
Pretty much what deep learning does is remove the air between the pixels and condense the information down. One of the tools used for that is a so-called convolutional neural network. You might have seen these kinds of diagrams. In the beginning you throw in an image of a cat, and that cat goes through a series of so-called convolutional layers, which, step by step, reduce the information within the image so, in the end, it can tell you: yes, I see a cat in this image.
Inside, people actually think it’s a black box; we don’t know what’s going on there. But it’s actually bitmaps all the way down if you are dealing with images. You can see that in the early layers, where the cat goes in, it still looks like those classic Photoshop filters, and that’s what it is. It’s really blur and sharpen kernels, only the machine learns different ones.
The deeper you go down into the network, the more abstract the information gets. At the end it recognizes maybe something like, oh, this is fur, this is an edge. Then deeper down it combines certain things it recognized, and maybe it says, this is an eye and that is an eye, and this is a nose, so in the next layer it says, oh, it’s a face. That’s all the magic.
Also, there are certain phases inside this network. The first layers really deal with superficial stuff. We usually call that the style: gradients, textures, everything that is just on the surface. Deeper down the concepts become more abstract -- eyes, noses, wheels -- and almost at the end is where it gets really interesting, because it creates a so-called feature vector, which in practice is really just a series of numbers.
The amount of numbers tells us how many dimensions it has. Sometimes it turns a cat into 128 numbers, sometimes into 1,024 numbers. But these feature vectors are really special, because what they actually are is a kind of address, a unique address of any object it sees, in this multidimensional space.
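Here is a minimal sketch of how such a feature vector might be extracted with an off-the-shelf pretrained network. The choice of ResNet-18 from torchvision and its 512-dimensional output are assumptions for illustration, not the network used in the talk.

```python
# Minimal sketch: turning an image into a feature vector with a pretrained CNN.
# ResNet-18 and its 512-dimensional output are illustrative assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Identity()   # drop the classifier head, keep the 512-d features
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def feature_vector(path):
    """Return the image's 'address' in feature space as a plain array of numbers."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        vec = model(img).squeeze(0)
    return vec.numpy()

vec = feature_vector("cat.jpg")   # hypothetical input image
print(vec.shape)                  # (512,) -- one point in a 512-dimensional space
```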
Of course, we cannot imagine multidimensional spaces, but we have tools to visualize them. One is called t-SNE, and you will hear more about that later on. What it does is project this multidimensional space down into 2D so we can see it. What happens is that things that look similar end up in similar areas of that space. That means most of the cats will end up at a certain spot, and the dogs might be somewhere else.
But since it’s multidimensional, it’s not that easy because there might be other cats living in other areas. Maybe some are drawn. Maybe some are photos. But the nice thing is really you can start calculating in this space, so you actually turn something abstract like a cat into a mathematical concept. That allows you to do all kinds of extrapolations from there.
The way I imagine these multidimensional spaces is really like a laundry machine, because what happens if you put your laundry into the machine and pull it out again? Everything gets twisted and--I don’t know--the sock winds around the jeans, and a scarf ends up in the pocket of something else. Everything is intertwined and twisted. But, in the end, the manifold of the sock, everything that is a sock, is still contained within itself.
And so while we, as humans, can pick up the sock and know, yes, this is the sock manifold, the machine pretty much does the same. You have all these things where cats fly around in this weird, twisted space, and all the machine does is try to untwist this space so it can make easy predictions for distinguishing cats from dogs. But in the end it’s just a mathematical transformation of that space.
I already mentioned it. Because we have a hard time dealing with more than three dimensions, there are really nice tools that help us get a picture of what’s going on in our data. One is called t-SNE, which stands for t-distributed Stochastic Neighbor Embedding, and all the cool kids use it now, I’m pretty sure. I don’t know. Who has used t-SNE? Who is doing data visualization?
Yes, okay. Very good. Okay, so you know it. So it’s a really nice tool.
[Laughter]
Mario Klingemann: Whenever you deal with some data, you should just throw your data at it. What it does is: you give it a lot of image data, for example, which has been run through a classifier, and it neatly sorts it into semantic clusters for you. You don’t even have to think about it.
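A minimal sketch of that pipeline with scikit-learn, assuming `features` is an array of classifier feature vectors like the ones described above (the file name and plotting are illustrative only):

```python
# Sketch: projecting high-dimensional feature vectors down to 2D with t-SNE,
# so that visually similar images land near each other.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

features = np.load("features.npy")      # hypothetical (n_images x n_dims) array
xy = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(features)

# xy[i] is now the 2D position of image i; similar images cluster together.
plt.scatter(xy[:, 0], xy[:, 1], s=2)
plt.show()
```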
This is stuff from the British Library collection where they had one million unlabeled images, and I tried to sort them out in a way by starting small, trying to separate things that I recognized from others. And, of course, using a tool like this I don’t have to do it manually. I just have to find a few examples, and then I can start sorting through this stuff.
One thing you see, well, you saw, was that these clusters are all kind of ugly and clumped together. If you really want a clear picture, what you want is to have it laid out beautifully, maybe in a grid. Here’s an example with record covers, as they came out of t-SNE. This is how a tool I wrote called RasterFairy sorts the whole thing into something that can then be nicely laid out and, of course, gives you an even better and clearer picture of what’s in your data, what is similar, and what belongs together. RasterFairy is on GitHub, and it has quite nice functionality. I also just like the way it animates, because it feels very organic. The tricky part is that everything that was clumped together in the original version should still stay together, which is actually a tricky mathematical problem, and I had a little race with Kyle McDonald, who was faster in making this. The nice thing about RasterFairy is that you can transform any kind of cloud into any other cloud -- for I don’t know what purposes, but I like that it’s so versatile.
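A rough sketch of snapping the t-SNE cloud onto a grid with RasterFairy. The entry point `transformPointCloud2D` is how I recall the library’s API; treat it as an assumption and verify against https://github.com/Quasimondo/RasterFairy before relying on it.

```python
# Sketch: snapping a t-SNE point cloud onto a regular grid with RasterFairy.
# transformPointCloud2D is assumed from the library's README; verify before use.
import rasterfairy

grid_xy, (grid_w, grid_h) = rasterfairy.transformPointCloud2D(xy)
# grid_xy[i] is the grid cell assigned to image i; points that were neighbours
# in the cloud stay neighbours in the grid as far as the assignment allows.
```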
I was already talking about it. The first kind of usefulness of these networks is the concept of similarity. If you have something, you can find more things of that same kind. That actually becomes quite important later on. Yes, it helps you get a clear picture of how your data is structured and what is actually found in there.
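A minimal sketch of that similarity search, assuming the `features` array and `feature_vector` helper from the earlier sketches; the cosine-similarity choice is mine, not necessarily what was used:

```python
# Sketch: given one example's feature vector, find the most similar items.
import numpy as np

def most_similar(query_vec, features, k=10):
    """Return indices of the k feature vectors closest to query_vec (cosine similarity)."""
    q = query_vec / np.linalg.norm(query_vec)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = f @ q
    return np.argsort(-sims)[:k]

# e.g. hands = most_similar(feature_vector("hand_scene.jpg"), features)
```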
One thing I’m doing is simply just finding things, starting these collections. This is again from the British Library. I’m not really particularly interested in fossils, but once you start digging through a million images and you say, oh, yeah, this is the third fossil that came by, so how about I start a collection? Then you pick them all out. Then you, well, start appreciating the little beauty in there.
Or we saw kind of these rock tools before. Yes, what’s more boring than a rock? But then when you start collecting them, you just add a little bit of happiness whenever you come across it. I mean so much work went into creating these, well, these etchings. And, well, usually they just get discarded. Nobody is interested. Pulling this out, maybe somebody could re-appreciate the beauty that is actually in these.
Or these are 36 anonymous profiles. Just again something you usually just walk by. But once you’ve collected them en masse, well, I find those quite esthetic.
Or 16 very sad girls, so you find these stories. You ask questions: why are there so many sad girls in these pre-movie publications? These were from--what was it called--thrilling stories for the masses. It was kind of mass-media fiction and, yes, they were all kind of sad.
Well, there’s also the series of desperate men, who are usually going, like, ooh, I’m ruined, or something like that. It’s nice. You go through all this old material that was produced before, and you get some inspiration and some ideas of what to do with it.
You can also -- well, and that’s the interesting part. Of course the goal is that eventually the machine understands how humans see the world and gets a similar concept of it. This was an experiment I did in a workshop in Los Angeles last year with some students. I asked them to bring 20 everyday objects from home. Then we had them all photographed, ran them through one of those convolutional neural networks, and had the machine sort them out thematically -- we just asked it to sort them visually.
This is the layout it created. There are some nice details, because it actually put all the balls, everything round and ball-like, together. Everything like computer consoles, cassettes, everything plastic -- just by looking at the visual shape, it ordered them all together. I really like the one with the crucifix and the brush, both made of wood, I guess, or both thin things. But this is my favorite: you see that assortment of gloves up there, and then you see the Super Mario doll, which of course has the white gloves, but there are also the other white gloves. So there is some understanding, or at least something where we can find familiarity. The machine has ideas about how the world fits together that are similar to ours.
Yes, then we of course did something like a real installation, which is sometimes difficult because some of the objects were bigger than the others. The students made some interactive things on top of it where we could light them, highlight them, do searches, all these kind of things.
The same similarity can also be applied to movies, and, as you saw, I like collecting similar things. I applied the same idea to scenes from old movies. The way it works: I give it a simple example, or two or three examples. In this case, hands -- I showed it scenes where there is a single hand doing something -- and then have the machine find me everything else where the scene is similar, where I just get to see hands. This is a collection of hands.
[Music]
Male: Becoming -- completed the first step in the process--
Mario Klingemann: Well, I liked it because, in a way, you know, Christian Marclay has a similar piece called The Clock, which is a 24-hour installation of movie scenes that always show a clock somewhere, so the idea is not new, again, but I really like that it’s possible now. I don’t have to manually look through entire movies; I just give examples and the machine retrieves them for me. And there might be actually useful applications that are not just abstract and artistic.
Yeah, so doing all this stuff with the British Library and showing that I am able to do something with lots of data and neural networks got me this email -- well, the email people always hope for -- from Google Arts & Culture in Paris. They asked me if I would like to do a residency there. Of course I didn’t hesitate at all. The Google Cultural Institute has more than seven million cultural artifacts in its database -- paintings, sculptures, everything -- and a lot of metadata. And so my task was to find interesting new ways to make discoveries in this data or, well, approach art from this angle: can we show people something? Can we interest people in seeing more art, in finding interesting things in art that they don’t know about?
I proposed this project called X Degrees of Separation, and I’m pretty sure you’re familiar with the concept of six degrees of separation where, in a way, every human is connected to every other human on the planet by a maximum of six steps, so I know somebody who knows somebody. In the end, you know--I don’t know--somebody who fries noodles in Shanghai.
And so I asked the question: can you do the same with artworks? Can you find connections between an artwork A and an artwork B, or some artifact, and then find intermediate artworks that give you a transition in some way? Of course you could think I just take the metadata: how are two artworks related? Maybe they were painted by the same artist, or they were created in the same time, or they are from the same movement.
But I wanted to, again, have the machine make the decision, so I was using the same algorithm that drives Google Image Search -- you know, the one where you drag an image onto the search box and it gives you everything that looks similar. The lucky position I was in, I get that feature vector I was talking about. Every image gets turned into exactly these 128 numbers, and then I can start building almost like a roadmap: everything has its place, and I really treat it as if I want to travel from Düsseldorf to Berlin and take the Autobahn -- in this case, the links between the artworks.
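One way to read that roadmap idea is as a shortest-path search over a nearest-neighbour graph of feature vectors. The following sketch is my reconstruction of the concept, not the actual X Degrees of Separation implementation; the libraries, parameters, and node indices are illustrative assumptions.

```python
# Sketch: a path between two artworks via a k-nearest-neighbour graph of feature vectors.
import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors

def build_graph(features, k=10):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    dists, idxs = nn.kneighbors(features)
    g = nx.Graph()
    for i, (drow, irow) in enumerate(zip(dists, idxs)):
        for d, j in zip(drow[1:], irow[1:]):     # skip self-match (distance 0)
            g.add_edge(i, int(j), weight=float(d))
    return g

g = build_graph(features)
# indices 0 and 500 stand in for the start and end artworks
path = nx.shortest_path(g, source=0, target=500, weight="weight")
# `path` is the chain of intermediate artworks connecting the two.
```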
Here are a few examples. How do you get from this portrait on the left to this sculpture on the right? Well, it only looks at the visual similarities, and you can pretty clearly see that there are visual similarities. The interesting part is that, because it’s a multidimensional similarity, it takes these interesting pathways. Sometimes, yes, it’s a face, or sometimes it’s just the way a certain shape is formed.
But the reason I like these is that, along the way -- you might know the beginning and the end because, when you go to a museum, you go to your favorite artists -- but in between there are so many other interesting things to discover, and there is simply no chance you will ever know about them. Maybe if you go along this path there’s a chance for serendipity where, in the middle, you might actually discover something that you really like. Yes, it works with any kind of visual.
Here again, sculpture to painting. Well, I find these quite compelling and surprising. And one more.
Yes, there’s an online version too, but I will not show it to you now. But, yes, if you want to go there, you can play around with it yourself and hopefully make some interesting discoveries.
So far I was just talking about similarity, but if you re-imagine this kind of landscape or map created by every object that is known, it looks almost like continents. Because we can do mathematics in there, you could ask: okay, but what is actually happening in the empty spaces? Or, if I know the similarity between A and B and I have a value, I can take certain measurements. One idea is that when I look at an image, I can measure its distance to every other image I know. If it looks much more dissimilar than anything I have seen before, well, it’s new. It’s novel.
I built this bot. It lives on Twitter. It’s called The Noveltist, and it currently follows, I think, over 1,000 accounts that tweet a lot of images. It looks at them, remembers every single one of them, and only retweets the ones that look sufficiently different from what it has seen before. It does not make any sense, but anyway, I create this collection of images that somehow stand out, or I hope stand out. Yes, this is an example of what it has found novel.
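The core rule, as I understand it from the description, could be sketched like this; the feature vectors come from a classifier as before, and the threshold value is an arbitrary illustration:

```python
# Sketch of a Noveltist-style rule: remember every feature vector seen so far
# and only keep an image if it is far enough from all of them.
import numpy as np

seen = []          # feature vectors of every image looked at so far
THRESHOLD = 0.35   # hypothetical novelty cutoff

def is_novel(vec):
    """True if `vec` is sufficiently different from everything seen before."""
    v = vec / np.linalg.norm(vec)
    if not seen:
        seen.append(v)
        return True
    m = np.stack(seen)
    nearest = float((m @ v).max())     # similarity to the closest thing ever seen
    seen.append(v)
    return nearest < 1.0 - THRESHOLD   # novel if nothing seen before is too similar
```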
There’s a little problem with it, and that’s the general problem with neural networks: the model I used has been pre-trained. It’s this general model that looks for mobile phones and everything, but it also knows about 300 sorts of animals and puppies. There is the danger that the model is very specialized in puppies. The difference between a dog and a cat, or between two dog breeds, is much bigger than between a mobile phone and a flower, because the resolution in puppy space is so much higher, so it finds puppies extremely novel and constantly tweets them.
Audience: [Laughter]
Mario Klingemann: But it’s kind of like a way to, well, hopefully discover something that is -- because, I mean, I don’t know about you, but when I go through my Twitter feed, I get to see all the same stuff. I mean it’s probably necessary, but people constantly post the same stuff, and I find that horribly boring, so this will definitely never show me the same things and hopefully something new.
I’ve been using it for a generative piece. When you make generative art, in the end you run into this 10,000 bowls of oatmeal problem: you have an algorithm that generates art, but if you look at a thousand outputs, they pretty much all look the same. I was trying to figure out what the potential range of this algorithm of mine is, so I have it constantly create, and The Noveltist only picks the ones that look sufficiently different from all the rest. Then hopefully it gives me some parameters that tell me: okay, this is an interesting space I could explore manually.
Okay. Another part, which is important, is training, obviously, because if you don’t want to take models that have been pre-trained for you, you have to train your own. You might have heard that this usually means dealing with a lot of images that you have to show the thing -- in the thousands, ten thousands, hundred thousands. And when you train it, well, you have to show it the right stuff.
I built myself some tools to help me with that, and one -- ah, yes. One example I’m going to show is from the British Library collection, where there are all these decorative initials. Typically when you do character recognition, you deal with handwritten or well-defined characters, but I was asking: can I train the machine to recognize all these crazy letter forms, all these difficult ones?
For that I built myself a little tool, which uses the classic method of swiping. The machine tells me, I think it’s an A, and I swipe left for no and right for yes if it’s correct. With that method you want to be super fast, and it allows me to go manually through 1,000 or 2,000 images in an hour and very quickly make the model better and better, because that is unavoidable if you want to train it on your individual data -- which I think is very important if you want to stay in control of what the machine does.
Yes, in the end that worked totally fine. These are some examples of what kind of A’s it recognized. Even though they are totally different, because the machine has only been trained on letter forms, it has become really good at recognizing them.
It has become so good that it said, okay, I think this is a B. I thought, no, it’s a ruin, but it insisted it was a B, so I looked at the original scans where this image was found, and actually the machine was right.
Audience: [Laughter]
Mario Klingemann: It is a B. That’s a problem, right? If all you have is a hammer, everything looks like a nail, so this machine is seeing letters everywhere, because it doesn’t know anything else but letters.
It creates all these nice things -- I give it any kind of set of images, and it finds for me, with a certain probability, things that look like letters to it. I thought that’s actually the interesting part, so I started this collection of things it thinks are letters but are not. This is, as you can guess, the A collection -- everything the machine thinks looks like an A -- then an M, a T. And, oh, I skipped one, yes. I think I had some more T’s in there, but yeah.
Again, it’s nice because it’s like how human inspiration works, right? The best stuff comes when you actually have a random inspiration, or you come across something somewhere else and suddenly realize how interestingly it could fit into your current problem.
But so far we had similarity and novelty detection, and of course what I wanted to show you is how to generate, how to create with these machines. Obviously that’s possible too. Because I like making up names, I have a title for this whole process -- because I’m not really a creator. I call it neurography, because it’s like photography in these virtual, latent spaces, these multidimensional spaces.
I, as an artist or, well, however we might call it, go into these spaces and look for interesting imagery that the neural network has generated. I have to still select, but the machine generates. Of course I have control. Unlike a photographer, I can shape that space by training, by selecting what material I want. But eventually I can just look around and pick what I like.
The first step is, for example, what I call neural portraits. The idea is to say: okay, if I train a model on a lot of faces, how does the machine actually perceive a human face? What I’m using for this is this paper by Nguyen et al.--I don’t know--you can read it; sorry, I can’t. Because in the end the heroes in this whole story are the scientists. I’m just picking the low-hanging fruit.
There are all these fantastic papers out there where I see something like this and say, oh, could this be used for my purposes? What this does is, after I’ve trained the network, it feeds pretty much random noise into the network and asks, okay, what do you recognize? The network says, okay, nothing, but maybe a one percent chance of a dog. Then it changes the incoming noise more and more so that the bar for the dog -- or, in this case, the face -- becomes higher and higher. It shapes the noise I send in more and more in order to maximize the output for a certain category, in this case faces.
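A minimal sketch of that activation-maximization idea: start from noise and nudge the input by gradient ascent so one output class scores higher and higher. The model, class index, and hyperparameters are placeholders, not the setup from the talk or the cited paper.

```python
# Sketch: gradient ascent on the input so the network "sees" a chosen class in noise.
import torch

def maximize_class(model, class_idx, steps=200, lr=0.05, size=224):
    model.eval()
    x = torch.randn(1, 3, size, size, requires_grad=True)   # the incoming noise
    for _ in range(steps):
        score = model(x)[0, class_idx]     # how strongly the net sees the target class
        score.backward()
        with torch.no_grad():
            x += lr * x.grad               # reshape the noise toward the class
            x.grad.zero_()
    return x.detach()                      # the image the network "imagines"
```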
I trained it on a lot of faces and, as you can see, yes, they are white because of how I harvested them, so there is that personal bias or, in this case, laziness: I just went through Tumblr and scraped--I don’t know--100,000 faces. This is what the neural network then says it recognizes as a face. It doesn’t need much. It’s very creepy, but yes, it’s just a few eyes, a big mouth, and some hair. But it’s definitely a new way of looking at humans.
Yes, it’s very easy to be creepy. This is the big version, when you scale it up, because there’s this other issue, which we’ll cover later, that these things are right now all in rather low resolution, so it needs some tricks to get a bigger resolution. This one actually looks quite happy. I think it looks more like a happy dog.
Audience: [Laughter]
Mario Klingemann: Or this one -- yes, we recognize some face in there, but definitely not a true face. But that is where I think it gets interesting, because if I wanted to create a face that just looks like a face, I could use a 3D tool or something like that. I’m more interested in finding these in-between spaces where it looks kind of weird or new to me.
What you can also do is take an untrained network, one that comes out of the box. These untrained networks are not empty, but they are not trained either -- they’re filled with random numbers, so you have this cascade, and all the weights that control the convolutions are just random. The question is: what happens if I feed something in there and try to emphasize a random category? Will it just give me noise? No, it actually creates something like this, and it definitely feels like an abstract painting to me, and there are sometimes things we think we recognize, or at least they are interesting compositions.
Again, it’s a nice, inspirational tool maybe for a color scheme or even to print it out. Well, there’s definitely something not human going on. But yeah, I don’t know. I can’t really describe it, but I find it kind of, yeah, it’s in this weird space in between that I’m actually looking for. And a few more bigger ones.
You can also animate, because this space is continuous. In this case I’m flying through this multidimensional space and saying, okay, highlight a certain category for me. This is then what it looks like inside, if I try to stay abstract and not get too concrete. As you can see, the resolution is really low, but, well--
What you can also do is -- when I saw this, I thought, okay, it looks like it could definitely be used for a music video or so. But I wanted the machine to be able to create the video entirely. I don’t want to give it anything in advance. I want to see what happens if I just try to connect the music to the visuals by, again, treating the parts of the music the same way I treat the images.
I order the frequencies, song snippets, and sound snippets by similarity, so if something sounds similar, it should end up in a similar place in this abstract, multidimensional space. It starts somewhere, and then it says, okay, this sounds different from what I know, so move to a random position. But then later on in the song it says, oh, this sounds like something I’ve heard before, so let’s go back there.
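A sketch of that audio-to-position rule as I read it: describe each frame of the track by its MFCCs, reuse a position whenever a frame sounds like something heard before, otherwise jump to a new random spot. librosa is used for the audio features; the file name, similarity cutoff, and 128-dimensional positions are assumptions.

```python
# Sketch: mapping sound frames to positions in an abstract space by similarity.
import numpy as np
import librosa

y, sr = librosa.load("track.mp3")                       # hypothetical audio file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T    # one row per time frame

positions, memory = [], []   # memory holds (frame_descriptor, position) pairs
for frame in mfcc:
    f = frame / (np.linalg.norm(frame) + 1e-9)
    best, best_pos = 0.0, None
    for desc, pos in memory:                # find the most similar frame heard so far
        s = float(f @ desc)
        if s > best:
            best, best_pos = s, pos
    if best > 0.9:                          # sounds like something heard before: go back
        pos = best_pos
    else:                                   # sounds new: move to a random spot
        pos = np.random.randn(128)
    memory.append((f, pos))
    positions.append(pos)                   # drives the camera through the latent space
```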
I won’t play the entire video, and excuse me for the techno. I have to talk with you, because I think you have to make me some better music.
[Music]
Mario Klingemann: I cut this short because I see my time is running out. Just keep in mind you can see the rest on YouTube if you like this kind of music. It goes on, but it has some interesting elements where you really see that, yes, it is kind of synched to the music.
But abstract is not everybody’s cup of tea, so I thought, how do I get human elements into my art, and how do I harvest something that is in the world? To get human elements in there -- we are interested in humans -- I need something like human poses. Where do I find human poses? In photos. There are, again, neural networks to which you can give a photo, and what they give you back is a stick figure of how they believe a human is standing there.
After harvesting about 100,000 different poses, I can interpolate between them. This animation that you see is actually not from a video. It’s really just randomly jumping between poses it found in totally different photos and adding a little bit of physics or momentum. Nobody ever moved like that, but there was this little phase where it looks like it’s doing this -- yes, like how I usually dance. And I thought, oh, okay, how about I take these poses and connect them to the same engine I used before with the music?
Audience: [Laughter]
Mario Klingemann: So let’s make the puppets dance. Again, so it’s not video. It’s just random poses it makes up.
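A small sketch of that made-up dance, under my own assumptions: each harvested pose is a set of 2D joint positions, and a spring-like update with momentum eases the figure from its current pose toward randomly chosen target poses. The file name, array shape, and constants are illustrative.

```python
# Sketch: jumping between harvested poses with a bit of momentum ("physics").
import numpy as np

poses = np.load("poses.npy")        # hypothetical (n_poses x n_joints x 2) array
current = poses[0].astype(float)
velocity = np.zeros_like(current)
target = poses[np.random.randint(len(poses))]

frames = []
for t in range(600):
    if t % 60 == 0:                 # every so often, pick a new pose to move toward
        target = poses[np.random.randint(len(poses))]
    velocity = 0.85 * velocity + 0.05 * (target - current)   # momentum plus pull
    current = current + velocity
    frames.append(current.copy())   # one frame of the made-up dance
```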
[♪ It might seem crazy what I’m about to say. Sunshine she’s here, you can take a break. I’m a hot air balloon, I could go to space. ♪]
Mario Klingemann: I think this is good.
[♪ With the air, like I don’t care baby, by the way. Huh. Because I’m happy, clap along if you feel like a room without a roof. Because I’m happy, clap along if you feel like happiness is the truth. Because I’m happy, clap along if you-- ♪]
Mario Klingemann: Okay. Again, I will cut it short. You can find it on YouTube, but this is nice, right? I can have something dancing without even having a real person, and, well, let’s see where we go with that, because this is now actually getting very interesting. You have to remember these three letters, GAN, because this is what everybody is using now. It stands for Generative Adversarial Network, and it’s magic. The principle is that you have not one, but two neural networks.
One is trying to be a forger. It tries to generate images. Let’s say it tries to be really good at faking Picasso paintings, so all it does is generate a painting and say, this is a Picasso.
Then you have a second neural network, which is the critic, or kind of the detective. It tries to catch the first neural network cheating, so it knows about Picasso paintings, and it looks at the forgery and says, no, no, no, that’s definitely not one.
Well, what happens is that whenever it tells the forger it made a mistake, the forger learns from its mistakes. So at some point the forger suddenly gets better and tricks the critic into believing it has created a real Picasso. But of course then, because we have the control, the critic learns how to get better at distinguishing these little differences.
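A minimal sketch of that forger/critic loop in PyTorch, on toy 2-D points instead of Picasso paintings; the tiny architectures, ring-shaped "real" data, and hyperparameters are illustrative assumptions, not anything from the talk.

```python
# Sketch: a minimal GAN. The critic (discriminator) learns to tell real samples
# from forgeries; the forger (generator) learns from the critic's verdicts.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # the forger
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # the critic
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=64):               # stand-in for "real Picassos": points on a ring
    angles = torch.rand(n, 1) * 6.2832
    return torch.cat([angles.cos(), angles.sin()], dim=1)

for step in range(5000):
    # 1. train the critic: real -> 1, forged -> 0
    real, fake = real_batch(), G(torch.randn(64, 8)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2. train the forger: try to make the critic say 1 on forgeries
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```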
In this process, they continually get better and better at whatever you try to train them on. One famous example -- famous at least in my circles -- is this model called pix2pix. It’s pretty brilliant because it’s super intuitive. What you do is give it one image which tells it, this is what I will give you, and then you give it another image that says, this is what I want you to create for me.
It’s really two bitmaps, and you can be super creative about what kind of combinations you give it. You just have to give it a lot of examples. You can see--I don’t know--down here it’s trained on drawings of handbags, and then it generates handbags. But scientists sometimes just don’t have the right feel, right, because who wants drawings of handbags?
How it got really popular -- and I think you have seen that one -- is edges to cats, because, yes, that immediately took it through the roof. When that came out, I was like, grrr, why didn’t I think of it, because of course I had been playing around with it before. But I thought, okay, I’ll try something else.
You know the classic example: you have something blurry in a movie, and then they say, oh, can you enhance that? Then you suddenly see the face of the culprit, the murderer, whatever. I thought, can the machine do that? If I give it a very blurry version of an image and the original, will it eventually be able to reconstruct the original when I just give it a blurry image?
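The training data for such an experiment could be prepared roughly like this: every original image gets a heavily blurred counterpart, and the (blurred, original) pair is what a pix2pix-style model trains on. The sizes, blur radius, and file layout are my assumptions.

```python
# Sketch: building (blurry input, sharp target) training pairs for the experiment above.
import os
from PIL import Image, ImageFilter

def make_pair(path, out_dir="pairs", blur_radius=8):
    os.makedirs(out_dir, exist_ok=True)
    img = Image.open(path).convert("RGB").resize((256, 256))
    blurred = img.filter(ImageFilter.GaussianBlur(blur_radius))   # the degraded input
    name = os.path.splitext(os.path.basename(path))[0]
    blurred.save(os.path.join(out_dir, f"{name}_input.png"))      # what the net is given
    img.save(os.path.join(out_dir, f"{name}_target.png"))         # what it must invent back
```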
This is what I’m getting. On the left is my input. On the right is what the machine imagines. This is actually interesting because, as you can see, a lot of information has been lost in the left image, and the machine has to make up all the details. At that point it actually has to get creative in a way. While this is not like a photo, I really like this look and the abstraction of it.
If you have coarser details, you can actually get back almost to the original image. But, yes, we’re again a little bit in creepy land, so it’s not really an enhancement, but I like the artifacts and things that are going on there.
Instead of calling it “enhancement,” I now call this technique “transhancement” because, in a way, it tries to enhance but also transforms. Well, if you give it regular photos, you get these kinds of things and, yes, it’s creepy, or sometimes quite interesting. If you have, again, not-so-fine details, you almost get back to the original, but yes, oooh.
But what I really like about it are the artifacts, the little details, because it generates new types of artifacts I have not seen yet -- they are somewhere between painterly and digital -- and I find those really attractive. Again, a closeup, and so I feel I’m in a good space here.
Two minutes, so I have to go faster and faster. A few more. Obviously, because it can make up infinite detail, the deeper you zoom into something, it can always generate new detail. The original eye was, I think, 16x16 pixels, so it can make up all these details.
But, as you saw, I was using photos other photographers had shot, so I thought that’s not a good idea because sometimes you might recognize them, so how about I try to generate my own faces too? In this case I trained it on so-called face markers, a technique used in face recognition where you get 68 markers, just points indicating where your mouth is, your nose, and so on.
Say, this is what I give you, and please give me -- and then on the right I gave it a lot of portraits from the British Library collection, so the left sketch creates the entire right portrait. And none of these faces really exists in reality, so there it’s getting more interesting.
Here’s another video, a bit slower. You see, well, you have to like artifacts, but we’re getting better at the artifacts. There’s young Elvis coming now, I think.
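A small sketch of extracting those 68 face markers with dlib. The predictor file shape_predictor_68_face_landmarks.dat is dlib’s standard pretrained model and has to be downloaded separately; whether dlib was the tool actually used in the talk is my assumption.

```python
# Sketch: extracting the 68 face landmarks ("face markers") with dlib.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def face_markers(image):
    """Return a list of 68 (x, y) landmark points for the first face found, or None."""
    faces = detector(image)
    if not faces:
        return None
    shape = predictor(image, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]

markers = face_markers(dlib.load_rgb_image("portrait.jpg"))   # hypothetical input image
```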
Audience: [Laughter]
Mario Klingemann: No, there’s some hipster again, but okay.
Audience: [Laughter]
Mario Klingemann: But of course now you are free, because I can give it a few more eyes or additional details. That’s where it gets interesting. You’ve trained it on a certain rule set, but then nobody forbids you from breaking out of that rule set. It knows eyes, so it can do more.
This was all trained on -- no, what do I skip? Okay. Well, let’s -- oops. Nope.
Let’s have one more look, because that was trained on the British Library and all different faces, but I also trained it on music videos of just Françoise Hardy, so this looks like a video, but it’s all based, again, just on the face markers, and it reinvents her. Every motion, every movement of the hand, every zoom is just generated by the machine from looking at five different music videos of her. So it’s possible to not be totally creepy with it.
And this is another one where -- so you can see the resolution gets better. But, well, okay, it looks like some face. It could be used as a base. I don’t think I want to go into creating realistic faces; I’m just showing off. But here is, again, another generated face, then transformed through style transfer -- and all these original faces do not exist. So once you have this material, you can expand in various directions.
I skipped the face transfer now. Well, I’ll play it quickly. But of course this technique is dangerous because, yes, you can just create a sock puppet and make people say things they usually don’t say. Yes, quickly, but--
Kellyanne Conway: I mean if we’re going to keep referring to our press secretary in those--
Mario Klingemann: --but I skip this because I want to show the last thing, and I go two minutes over.
We now have the faces that I can generate, and I showed you before how I harvested the poses from the world. Now of course I think, okay, how do I get the poses into my images? I realized I can train that same model not just on a single face: I can give it a stick figure, and then it will output the rest of the painting for me.
And this is, for example, the result. On the top left you see the stick figure I feed in, and on the right you see the entire thing it has output for me. Or another one: stick figure on the left, something on the right. Of course the results depend totally on what kind of training material I use, but this image is definitely not in my training set.
I call this series Solitary Confinement, because there’s this weird thing where, based on my training set, I get this whole series of children in prison, or I don’t know -- it looks like somebody in pajamas in some prison cell. I mostly like the background texture and stuff. What I do is just move through the space and pick, and that’s the interesting part. I’m not even sure if I’m the creator or more the curator. But, yeah, the machine offers me a lot of options. Again, something like--I don’t know--a Roman motorcycle courier. Something more abstract and body-like. Or it’s getting creepy again.
But yes, all of these are generated by the machine, and I just make the pick. One more: some paperback cover, pulp fiction. I like this. It looks like some race car driver, or an astronaut, and I especially like that artifact on the left. Or you can go even less concrete, overdrive the settings, go into weird poses, add some style transfer. And again, the space is so huge that it’s hard to decide where to go, but there are all these new fields to discover.
Okay, so now you’ve seen all this and, well, you can decide for yourself whether you already have to worry about being replaced by the machines any time soon. I think it’s an extremely powerful tool that is in our hands right now. I believe that, especially when it comes to pop art or mass-market produced art -- like the stuff you can buy at Bauhaus or all these other places -- the machines will definitely be able to create something that looks like art. But eventually they still do it for humans, so we still have a role as consumers at least. But if you enter the field now, you might still have the possibility to shape it and steer where it’s going, because right now we still have a lot of steering possibilities, and so I think you don’t have to worry yet.
Thank you very much.
Audience: [Applause]