#btconf Berlin, Germany 05 - 08 Nov 2018

Léonie Watson

Léonie Watson is the Director of Developer Communications at The Paciello Group, a member of the W3C Advisory Board, co-chair of the W3C Web Platform WG, and a technology writer and speaker. She began using the internet in 1993, turned it into a career in 1997, and (despite losing her eyesight along the way) she's been enjoying herself thoroughly ever since.


I, Human

Asimov’s Three Laws of Robotics gave us a set of principles for governing the behaviour of Artificial Intelligence (AI). Asimov also said:

“The Three Laws are the only way in which rational human beings can deal with robots - or anything else. But when I say that, I always remember (sadly) that human beings are not always rational!”

In “I, Human”, Léonie uses the Three Laws of Robotics to explore what it means to be a human with a disability, in a world of AI and smart technologies.

Transcription

[Applause]

Leonie Watson: Thanks.

[Applause]

Leonie: Good morning. Hello.

Humans are extraordinary. We are all extraordinary. We are smart and intelligent. We’re sometimes stupid and sometimes horrible. We’re capable of great creativity and of great destruction.

We’re resilient as human beings, and we’re also vulnerable. But, perhaps more than anything, we are curious and we are adaptable. And, that is probably why we find ourselves trying to understand artificial intelligence when we still really don’t understand ourselves because, I think, we recognize that, through artificial intelligence, we can explore the possibility of amplifying our humanity.

It turns out, we’ve been dreaming and writing about artificial intelligence for a very long time. There’s an epic poem from the third century BCE called “The Argonautica,” the tale of Jason and the Argonauts. They encounter a bronze statue that is protecting the island of Crete. It throws boulders at them when they sail too close. Here, perhaps, we see the first example of humanity thinking of an artificial intelligence as a weapon or a form of protection.

In “Metamorphoses,” a book by Ovid, there are a number of stories, one of which tells of a sculptor who creates a sculpture of a beautiful woman and falls in love with his statue. The goddess Aphrodite eventually brought the statue to life: when he kissed the statue, she began breathing. Here we see, perhaps, the first example of an artificial intelligence as a form of companionship, even of love.

“Frankenstein” was a novel written by Mary Shelley in 1818. It’s recognized as one of the first works of science fiction. In it, we see the possibility of electricity, perhaps a forerunner of the digitization that would become important to artificial intelligence, of us creating intelligence and life in another form.

In 1872, Samuel Butler wrote a book called “Erewhon.” It actually drew on a short paper he’d written some years before. It was entirely fictional, but it put forward the idea that perhaps machines could learn and evolve, following Darwin’s principles, in the way that humanity had done. He was laughed at at the time but, if we look back now, in the days of machine learning and of intelligences that can learn and adapt for themselves, it suddenly doesn’t seem like quite such a laughable idea after all.

In 1950, Isaac Asimov wrote “I, Robot,” another collection of short stories, this one including the Three Laws of Robotics. He actually put them forward as coming from a fictional “Handbook of Robotics” dated 2058, and he declared these Three Laws of Robotics as the way that humans can interact with robots and artificially intelligent lifeforms.

Asimov had something interesting to say about those three laws. He thought that they were the only way that rational human beings could interact with robots or, indeed, interact with anything else. But then, he noted, sadly, that he always had to remember that humans were not really always that rational. It turns out that not only are we not rational, but we’re actually really complicated.

It was about 300,000 years ago that Homo sapiens, humanity as we know it now, really emerged onto the historical map. We were not very much like we would recognize ourselves today, but our origins stretch back many hundreds of thousands of years. It’s taken us a long time to get from those first origins to the point where we are walking around with technology and so many of the ideas that we see around us now.

About 50,000 years ago, we entered a phase known as behavioral modernity. This is when, as humans, we started developing many of the characteristics we do recognize in ourselves: creativity, of which cave paintings give us lots of evidence; collaborating, working in teams for hunting, for gathering, for raising baby humans; abstract thinking; making markers, storing knowledge for the first time outside of our brains, because we were able to leave prints on cave walls or markers in the rock and sand. It’s from this point, really, that humanity, as the thing we are now looking at through the lens of artificial intelligence, starts to become known to us.

What is human intelligence? Well, it’s lots of things. We really don’t understand it particularly well, but we know something of the inputs and outputs that go into forming human intelligence. It’s the things that we see, the things that we hear, the way that we speak and communicate and, of course, the thinking that underpins all of those things that we do as humans pretty much all the time, every day.

Seeing uses about 30% of our brain power when we are looking at the world. Each optic nerve has about a million fibers in it, carrying light and information to the correct centers of our brain. Yet, seeing is not necessarily that straightforward.

There’s an optical illusion on the screen where you can see two characters in some squares. Ask yourself, what color are those characters? There’s a good chance that your brain will think they are colors other than the ones that actually appear on screen. It’s easy to fool the brain with an optical illusion into thinking it is seeing something it isn’t. Despite all that massive processing power and thinking, we can still fool our sight to a certain extent.

Sight is also one of the areas where, as humans, we’re sometimes vulnerable. People like me can be blind. They can have low vision, where they don’t see very well at all. They can have conditions like color blindness, where certain color combinations prove problematic and difficult to see. We can have situational and temporary visual disabilities: bright sunshine, for example, that makes it difficult for us to see what’s on the screen in front of us.

Hearing only uses about 3% of our brain at any given time. Our auditory nerves have just a few thousand fibers in them, but we’re still processing a lot of audio information. When we listen, there is incredibly complex thinking going on behind it.

If we listen to this famous piece of music, you’ll perhaps feel a little bit whimsical, a little bit happy, uplifted. Let’s see what you think.

[The Beatles singing “Hey Jude”]
♪ Hey Jude, don’t make it bad. Take a sad song and make it better. ♪
Leonie: But, if we do something very, very simple, like shift that into a minor key, then the emotion that our brain puts in behind the listening experience becomes very different.

♪ Hey Jude, don’t make it bad. Take a sad song and make it better. ♪

Leonie: While you’ve been listening to that, assuming that song is familiar to you, you’ll have been thinking about the last time you heard it, the first time you heard it, the emotions that went with those times when you heard that piece of music, how it makes you feel now, and a whole bunch of other things that your brain is tick-tick-tick-tick-tick feeding into that listening experience.

Of course, hearing is another area of human vulnerability. Some people can’t hear at all. Others don’t hear so well. Again, environmental and situational conditions can make it difficult to hear. You can all hear me now, I hope, because of technology. But, if we were all in a loud and noisy bar, there’s a good chance we’d all be struggling to hear each other. So, a lot of vulnerability for humans comes with hearing.

Speaking uses about 50% of our brain at any given time. We think of it as being quite easy. Most people do speak. But, when you think of something very, very simple, like saying, “Hello,” this is how complicated and different it can be.

[Several speakers saying “hello” in different languages]

Leonie: That’s perhaps 20 of the many hundreds of different human languages that we speak around this planet, and so many different ways of just forming a simple greeting.

Of course, not everyone is able to speak. Some people have the words formed correctly in their brains but, for some reason, the messages don’t reach the physical act of speaking. Their lips don’t form the right words. Some people lack the physical capability of speaking and so, for various reasons, not all humans find it easy or even possible to communicate using speech.

We come back to artificial intelligence and the seminal paper by Alan Turing, in which he really first put forward the idea that we recognize now as artificial intelligence: the possibility that we could create a machine that had intelligence. There was a conference in 1956 at Dartmouth College in the United States where they put forward the hypothesis that every human action and form of intelligence could be so precisely described that it could be replicated in machine form. This really is the beginning of what we now know as the AI industry or professional area of specialty.

What of our Three Laws of Robotics? Well, the first law says that a robot may not injure a human being or, through inaction, allow a human being to come to harm. Allowing a human to come to harm is an interesting concept. One thing we are exploring through artificial intelligence is how we can use AI to prevent humans from coming to harm by filling in some of the gaps in our vulnerabilities.

One of those is seeing. One of the earliest forms of AI was put into image recognition, the act of processing, analyzing, and understanding the content of images in order to make some form of sense of it all.

In 1976, it was estimated that simulating what the retina does would take an enormous number of instructions per second. The fastest computer at the time, a Cray-1, was horribly incapable of that kind of processing power. So, in the mid-1970s, we had some good ideas about how to recreate the seeing experience, but we didn’t have the computing power to be able to drive it forward. It wasn’t until 2011, when computers were averaging between 10,000 and a million MIPS, that we really had a proper understanding of quite how much computing power was required to simulate even the most simple forms of vision.

In the 1980s, Moravec pronounced his paradox. He realized that when it comes to artificial intelligence, doing the complicated stuff like algebra is actually the easy bit. It’s the low-level replication of intelligence that’s the damned difficult bit to solve. In other words, we can get computers to solve complex equations, real-time simulations, and all sorts of things that our human brains really struggle to do, but what we really find the most difficult of all is creating an intelligence that can do the stuff that we as humans find incredibly simple.

Let’s take an example: a picture of a chair. This one happens to look pretty much like you might expect a chair to look. It’s got legs, something you sit on, and a back. And so, we could sit down and write a set of rules that would help a machine identify a chair: it’s got legs, probably four, something to sit on, and a back.

That’s all very well until we take another chair, a camping chair this time, that looks nothing like the first chair. As humans, we can pretty much recognize it as a chair straight away, but the heuristics, the rules that we determined for the first example, have gone straight out of the window. So, we add more rules to tell the difference between the previous chair and this chair.

Then we take another chair. This one looks nothing like the first one or the second one. All our rules go out of the window again, and we have to start over. In short, putting heuristics together to perform even the most simple human recognition task is extremely difficult.
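To make that concrete, here is a minimal sketch, in Python, of the kind of hand-written rules being described and why they break down. The feature names and the two example chairs are invented for illustration; in a real vision system, even extracting a feature like “number of legs” from pixels is itself the hard part that Moravec’s paradox points at.

```python
# Hand-written chair heuristics, written with the first, "typical" chair in mind.
# The features ("legs", "has_seat", "has_back") are hypothetical stand-ins for
# what a vision system would somehow have to extract from an image first.

def looks_like_a_chair(obj: dict) -> bool:
    """Rules based on the first example: four legs, a seat, and a back."""
    return (
        obj.get("legs", 0) == 4          # probably four legs
        and obj.get("has_seat", False)   # something to sit on
        and obj.get("has_back", False)   # and a back
    )

typical_chair = {"legs": 4, "has_seat": True, "has_back": True}
camping_chair = {"legs": 0, "has_seat": True, "has_back": True, "folds": True}

print(looks_like_a_chair(typical_chair))  # True: the rules fit the example they were written for
print(looks_like_a_chair(camping_chair))  # False: a human says "chair", the heuristics say no
```

Every new kind of chair forces another patch to the rules, which is exactly why learning from examples, rather than hand-coding heuristics, ended up being the workable approach.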

It wasn’t until 2006 that Eric Schmidt of Google first announced Google’s cloud infrastructure and brought the word “cloud” into common use. It wasn’t until we had access to infrastructure, architecture, server power, and processing power in the cloud that was elastic, expandable, and could be developed and adapted to our needs that we really started to be able to put the computational power we needed behind the artificial intelligence that we knew was possible.

By the time 2016 came around, even a mainstream platform like Facebook was able to start deploying artificial intelligence in its stream. They introduced image recognition. Up until this point, if you used Facebook as a blind person and one of your friends posted a picture, unless they happened to describe the picture in the caption, which almost never happened, you had no idea what was in the picture at all. With this image recognition, what you get now are simple but actually pretty effective and very useful descriptions.

This picture was posted the day I most recently changed my hair color. The description is, “It’s a picture of a person indoors and smiling,” which is very true. It’s not quite smart enough to understand that the reason the picture was posted was because my hair color was different. And, at that point in time, Facebook’s recognition API wasn’t quite smart enough to know who I was. That has changed now with Facebook. Slowly but surely, the image recognition capability is getting more and more useful, more and more accurate.

In 2017, Microsoft came up with an app called SeeingAI. This really lifts the notion of artificial intelligence as a means of helping people who can’t see to see more of the world to a whole other level.

Narrator: SeeingAI is a Microsoft research project for people with visual impairments. The app narrates the world around you by turning the visual world into an audible experience. Point your phone’s camera, select a channel, and hear a description. The app recognizes saved friends.

SeeingAI: Jenny. Near top right. Three feet away.

Narrator: It describes the people around you, including their emotions.

SeeingAI: 28-year-old female wearing glasses looking happy.

Narrator: It reads text out loud as it comes into view, like on an envelope.

SeeingAI: Kim Lawrence. PO Box.

Narrator: Or a room entrance.

SeeingAI: Conference 2005.

Narrator: Or scan and read documents like books and letters. The app will guide you and recognize the text with its formatting.

SeeingAI: Top and left edge is not visible. Hold steady.

[camera shutter snap]

SeeingAI: Lease agreement. This agreement--

Narrator: When paying with cash, the app identifies currency bills.

SeeingAI: Twenty U.S. dollars.

Narrator: When looking for something in your pantry or at the store, use the barcode scanner with audio cues to help you find what you want.

[fast beeps]

SeeingAI: Campbell’s tomato soup.

Narrator: When available, hear additional product details.

SeeingAI: Heat in microwave full on high--

Narrator: And even hear descriptions of images in other apps like Twitter by importing them into SeeingAI.

SeeingAI: A closeup of Bill Gates.

Narrator: Finally, explore our experimental features like scene descriptions--

[camera shutter snap]

Narrator: --to get a glimpse of the future.

SeeingAI: I think it’s a young girl throwing a frisbee in the park.

Narrator: Experience the world around you with the SeeingAI app from Microsoft.

[music]

Leonie: The difference this technology makes just cannot be overstated. When I first started using it, I remember sitting in a hotel eating breakfast one morning and just thinking to myself, “I wonder what, you know, what would I see if I could look up?” Normally, I’d ask the person I was with and would have a description. But, instead, I got out my phone, just pointed it straight ahead of me, and took a picture. It told me I was sitting next to a window, a window that I would have had no idea was there, let alone what was outside and through the window, if it hadn’t been for this artificially intelligent app that uses lots of big data and lots of cloud architecture to bring all of this to my phone, to something I can carry around with me that helps me overcome so many of the challenges that I experience on a day-to-day basis.

In 2017, Apple introduced face recognition to its iOS platform and devices. We saw another step change in AI as a means of overcoming something that humans find difficult.

[NVDES singing “Turning Heads”]
♪ Black and white city, city wall, city wall. Oh. ♪
♪ Oh, she could be dancing down the hall, dancing down the hall. ♪
♪ Oh, we’re turning heads. We’re turning heads. We’re turning heads. ♪
♪ Oh, yeah. Oh, yeah. Oh, yeah. ♪
♪ And nothing can stop us now. ♪

Leonie: And, in doing so, they introduced something that’s not only very useful to people in general, but also helps lots of people overcome particular difficulties: people who find passwords difficult to remember and recall, people who lack physical mobility or dexterity and find entering a password cumbersome or difficult. With this and, of course, with other forms of biometric identification, things have suddenly become a lot easier. In this case, it’s because of image recognition; in other words, our artificially intelligent way of simulating sight.

Law number two: A robot must obey the orders given to it by human beings, except where such orders would conflict with the first law. Well, to obey a human being, the thing has got to understand the human being and, to some extent, it has to be able to communicate back with us. And so, we see another form of artificial intelligence, that of speech recognition: being able to listen to, understand, and identify spoken words, something else that we have been putting AI into for a good number of years.

Back as far as 1993, Apple introduced voice recognition on its Macintosh platform.

[music]

Macintosh operator: Macintosh, open letter.

[ding]

Macintosh operator: Macintosh, print letter.

[printer humming]

Macintosh operator: Macintosh, fax letter.

Narrator: While everyone else is still trying to build a computer you can understand--

Male: Macintosh, shut down.

Narrator: --we’ve built a Macintosh that can understand you.

Macintosh: Good-bye.

[music]

Leonie: And so, in the mainstream, we saw, for one of the first times, the ability to talk to computers. In 1997, Dragon became one of the first pieces of software that we actually recognize as speech recognition software, something that is specifically designed to listen and interpret human speech, in this case, to convert it into text or, in more recent years, into forms of interaction with a computer. You can control your computer using Dragon software.

Narrator: This is Dragon Naturally Speaking, comma, speech recognition software that turns your voice into text three times faster than typing with up to 99% accuracy, period.

Leonie: That software was originally used by lawyers and doctors who needed convenient ways of dictating text because their hands were otherwise occupied, but it is now used by millions of people around the world to help them interact with their technology more easily, because the AI helps the software understand what they’re saying and translate it into another form. We also have the speaking part of the deal: the ability for technology, for AI, to take text or some other form of data and translate it into synthetic speech so that we, as humans, for the most part, can understand what the AI wants to communicate to us.

A long time ago, in the late 1700s, Wolfgang von Kempelen created a set of mechanical lungs and a voice box that could effectively simulate human speech. It was very mechanical, very simple, but it’s one of the first examples we have of humanity trying to recreate human speech.

In 1939, at the World’s Fair, there was a demonstration of something called the Voder machine. This one was operated entirely by hand, a human operator working a console of keys and a pedal to drive electrical circuits, and probably sticky tape and all sorts, but it achieved a really quite remarkable degree of speech simulation.

Voder operator: The machine uses only two sounds produced electrically. One of these represents the breath. [blowing air] The other, the vibration of the vocal cord. [humming] There are no phonograph records or anything of that sort. Only electrical circuits such as are used in telephone practice.

Let’s see how you put expression into a sentence. Say, “She saw me,” with no expression.

Voder machine: She saw me.

Voder operator: Now say it in answer to these questions. Who saw you?

Voder machine: She saw me.

Voder operator: Whom did she see?

Voder machine: She saw me.

Voder operator: Did she see you or hear you?

Voder machine: She saw me.

Leonie: By the 1960s, in 1961 to be precise, we had begun using digital synthetic speech, and we’d also begun trying to make it sing.

[IBM 7094 singing “Daisy Bell”]
♪ Daisy, Daisy, give me your answer, do. ♪
♪ I’m half crazy, all for the love of you. ♪

Leonie: Well, why wouldn’t we? All that computational power in a building the size of an airport, and of course we would make it sing. And, of course, where would “2001: A Space Odyssey” have been if they hadn’t?

By 2014, things had got considerably more sophisticated. We saw the first of the voice assistant devices for the home in the form of the Amazon Echo when our ability to really communicate with a smart technology became something quite revolutionary.

[music]

Narrator: Introducing Amazon Echo.

[music]

Alexa: Hi, there.

Narrator: And some of Echo’s first customers.

[music]

Narrator: Echo is a device designed around your voice. Simply say, “Alexa,” and ask a question or give a command.

Female: Alexa, how many tablespoons are in three-fourths cup?

Alexa: Three-fourths cup is 12 tablespoons.

Narrator: Echo is connected to Alexa, a cloud-based voice service, so it can help out with all sorts of useful information right when you need it.

Male: To me it’s, you know, bringing technology from Iron Man at Tony Stark’s house into your own.

Narrator: Echo can hear you from anywhere in the room, so it’s always ready to help.

Male: I can have the water running. I could be cooking. The TV can be on in the back room, and she still can hear me.

Narrator: It can create shopping lists--

Female: Alexa, add waffles to my shopping list.

Narrator: --provide news--

Newscaster: From NPR News in Washington.

Narrator: --control lights--

Male: Alexa, turn on the lights.

Female: We use it to set timers. We use that feature all the time, and that’s one that’s specifically helpful, I think, uniquely to blind people.

Narrator: --calendars--

Alexa: Today, at 1:00 p.m., there’s lunch with Madeline.

Narrator: --and much more.

Leonie: And so, we are using AI now, again, to help us overcome some of the things we find difficult as humans, whether it’s someone who is blind using the timers or accessing things like news in a more convenient format, whether it’s someone who has got memory-related difficulties, adding something to a shopping list or setting themselves a reminder to do something when they might otherwise forget.

The Echo now, of course, lets us drop in and communicate with other Echo owners. And so, we’re seeing AI used as a mechanism for that idea of companionship, of preventing the isolation that we see so much and so often in society.

In 2013, something else interesting happened because, of course, a human’s ability to communicate is not always reliant on speech. Plenty of people use all sorts of methods of communicating, and one of those--particularly well known--is sign language.

Narrator: This project is a sign language translator. It translates from one sign language to another. It helps the hearing and the deaf communicate.

[Translator speaking in a foreign language]

Narrator: You can communicate between American sign language and Chinese sign language, or potentially any sign language to any other natural language.

Male: The Kinect will capture the sign language and then recognize the meaning of the sign language, including the posture and the trajectory, and then there is an automatic translation into spoken language.

[Translator speaking in a foreign language]

Kinect: I really like this--

Leonie: And so, we have the ability to use AI not only to translate from speech to text, but also from motion to text. And, if we can translate that text into speech, we can use these technologies to create a translation directly from, say, sign language to spoken interaction, so all the time we’re breaking down barriers between people who speak different languages or who communicate through different mechanisms.
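As a rough sketch of the pipeline being described here (recognize the signs, translate the text, then speak it), the Python below strings the stages together. Every function is a hypothetical stand-in with an invented name; in reality each stage is a large AI system in its own right, and this only illustrates how composing them gives a direct path from signed input to spoken output.

```python
# Hypothetical sign-language-to-speech pipeline: each stage is a stand-in for a real AI system.

def recognize_signs(video_frames: list) -> str:
    """Stand-in for a sign-language recognizer (e.g. built on a depth camera like Kinect)."""
    return "I really like this"  # pretend the captured gestures were recognized as this text

def translate_text(text: str, target_language: str) -> str:
    """Stand-in for a machine-translation step between languages."""
    return text  # pretend the text was translated into the target language

def synthesize_speech(text: str) -> bytes:
    """Stand-in for a text-to-speech engine that returns audio data."""
    return text.encode("utf-8")  # pretend these bytes are audio

def sign_to_speech(video_frames: list, target_language: str) -> bytes:
    # Chaining the three stages turns signed input directly into spoken output.
    return synthesize_speech(translate_text(recognize_signs(video_frames), target_language))

audio = sign_to_speech(video_frames=[], target_language="en")
```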

And so, to our final law: that a robot must protect its own existence, provided that doing so doesn’t conflict with either of the first two laws. Well, this is one of the really interesting factors when it comes to artificial intelligence. It’s all very well to say that a robot or an AI has to protect itself, but the really important thing we need to remember is that we are the ones who build these technologies. And, if we make mistakes in that, then all sorts of harm and problems can occur.

In 2013, IBM decided that it wanted to train its AI, Watson, to speak in a more informal, recognizable way for humans. So, they fed it the Urban Dictionary, thinking it would learn some slang and, you know, maybe some vernacular and some informal ways of communicating. What they discovered was that it learnt to swear like a sailor.

[Laughter]

Leonie: And, they eventually had to delete [laughs] the Urban Dictionary from its memory because it became appallingly impolite [laughs], or humorously impolite, depending on your stance on cursing and swearing. But [laughs], something that was done with the best of intentions, to make an AI more approachable, more normal to humans, had, in fact, a very unintended consequence.

In 2016, Microsoft released an AI chatbot, Tay, that was designed to learn from the people it interacted with, notably on Twitter, and, again, to become a very approachable thing for human beings. In some respects, I suppose, it actually did fulfill that particular goal. But, human beings, especially those on Twitter, being what they are, it very, very quickly, within 24 hours, became quite a nasty piece of work that was coming out with all kinds of violent, antisemitic, and terrible rhetoric it had learnt from Twitter. So, again, Microsoft had to pull it from the shelves.

In 2017, it was discovered that Google’s translation AI had a pretty terrible, if unintended, gender bias when it came to translating, I believe, from the Turkish language. It just so happened that, when it translated things and was left to make some decisions for itself, it tended to produce results that would always put the male half of our species in the intelligent, charismatic, leadership type of descriptors, and women in the slow, lazy, and not much good for anything at all category of descriptors. It’s being corrected but, again, it’s just one of those laws of unintended consequences. Nothing that we, as the humans who are the architects of these technologies, ever intended, but we found that, when the AI was left to its own devices, learning from the data sources it was given, these unintended things happened.

In 2018--this is my favorite--Amazon’s Rekognition API incorrectly matched 28 members of the U.S. Congress with criminal mugshots. I don’t know what the outcome of the midterm elections was yesterday, but this may actually have turned out to have been more of a prophecy than we ever knew.

[Laughter]

Leonie: But, still, we can’t have our AI running around incorrectly identifying people as criminals. The potential for damage and harm to good, fine, upstanding members of our communities is simply too terrible to contemplate. But, again, these are technologies that we are building. We’re feeding them the data, and we’ve got to get that bit right before we can even remotely expect anything that we build to uphold the third law of robotics.

So, what of all of this? I think Microsoft actually, in their video, put it far better than I can: artificial intelligence amplifies our humanity.

[music]

Narrator: Behind every great achievement, every triumph of human progress, you’ll find people driven to move the world forward. People finding inspiration where it’s least expected. People who lead with their imagination. But, in an increasingly complex world, we face new challenges. And, sometimes it feels like we’ve reached our limits.

[music]

Narrator: Now, with the hope of intelligent technology, we can achieve more. We can access information that empowers us in new ways. We can see things that we didn’t see before. And, we can stay on top of what matters most.

[music]

Narrator: When we have the right tools and AI extends our capabilities, we can tap into even greater potential. Whether it’s a life-changing innovation, being the hero for the day, making a difference in someone’s future, or breaking down barriers to bring people closer together, intelligent technology helps you to see more, do more, and be more. And, when your ingenuity is amplified, you are unstoppable.

[music]

Leonie: So, as we continue to explore the ideas and the possibilities of artificial intelligence, I want you to remember this. Humans are extraordinary. We are all extraordinary. But, most of all, you are extraordinary. Go out there and have fun. Thank you.

[Applause]

Marc Thiele: Leonie Watson.
