Josh Clark: Well, hey, everyone. You all look very nice today. I guess my job is to keep you alive and awake. I think, if you’re setting a low bar, that’s good. I do feel full of energy, one, inspired by Marc, the driving force behind this thing. Thank you, Marc.
Josh: Also, by DJ Toby. Huh? It’s like, man. That’s, like, a shot.
[Cheers and applause]
Josh: Thank you, Toby. But I’m also excited because I’m here to talk about how you approach the texture of a new technology. How do you approach a new design material? In other words, what do you do when you want to engage with and launch an exciting new technology? Which is why I want to start by telling you about the Juicero.
[Laughter and cheers]
Josh: An Internet-connected fruit and vegetable juicer that costs only $400. The big innovation here is that it did not actually juice anything. It just squeezed these prefabricated juice packs. Only it turns out, you didn’t actually need the machine to do that.
Josh: You could just juice it yourself and, after two minutes, you had exactly the same result as if you used the machine. It might seem like this is not a useful approach to technology but, actually, it turns out that you could use the machine to scan the barcode to find out when the juice is going to expire, which is useful. Of course, it’s also handy that they print the expiration date on the packet.
I guess I want to say, this is not the first example of complete bullshit in design in the kitchen. I want to talk a little bit about Philippe Starck’s designer juicer. Of course, if you try to use this, it will just make a mess in your kitchen, but it’s okay. It’s all right because Philippe tells us that his juicer was not meant to squeeze lemons at all. Of course, it was meant for conversations, so it’s okay.
Josh: Phil, you minx. You’ve punked all of us with your crazy ideas. I don’t really like this. I think this is self-serving when I believe that our job is to serve others. This is the ego of the designer, and I think we run into this a lot as the ego of the technologist, the enthusiasm of the making of the thing, of falling in love with the design material so much that you sort of forget the service that the thing might do.
Again, the kitchen is full of examples like this. Here’s a Wi-Fi enabled kettle, which I know you’ve all been waiting for. A Brit named Mark Rittman owns one of these, and he set about trying to make a cup of tea, but the kettle encountered a little bit of network trouble. It couldn’t actually sort of get connected, and it insisted that Mark recalibrate it. He did, but then it sort of suddenly sabotaged his Wi-Fi router.
Josh: And then disappeared into some corner of the network, and he couldn’t find it. Mark was undeterred. He set about port scanning his network to try to find where the kettle was. That was after three hours. Ten hours later, he finally got the kettle back online and working and had a delicious cup of tea.
Josh: But the smart lights needed a firmware upgrade. The Internet was not sympathetic, exactly. Everyone was sort of like, “This doesn’t seem like the future I was promised.” Sort of more to the point.
Josh: I apologize for the language. I’m just reporting the facts.
Josh: This is a review from Fast Company of another connected kitchen device. It doesn’t even matter which one because I think that this description seems really familiar of a lot of technologies that we encounter. “Automated yet distracting. Boastful yet mediocre. Confident yet wrong.” This is not what we set out to create, but this is the feeling that is imposed on the people who use our products so often.
What I want to talk about today is, how do you use new technologies, which are often fragile, often brittle, in responsible ways. How do we use them, first, in the way that they want to be used, working with the grain of the technology but, also, in a way that adds meaning, that’s actually useful in our lives and, in a sense, makes us more of who we are? So often, it feels like we’re working for the machines. How can we make it so that they amplify our creativity, focus our attention and our judgment, and make us more of who we are?
That’s what I want to talk a little bit about is working with a new design material, and I want to talk particularly about machine learning and artificial intelligence. I run a design studio called Big Medium in New York City, and our focus is really on designing for what’s next, helping companies engage with new technologies in ways that are, again, sort of meaningful, responsible, useful, and sublime. I’d like to think that we do things that are sublime.
A lot of that has focused in the last decade on pushing the frontiers of mobile. I think mobile has defined the last decade of digital design and really shaped its direction. That has been changing for us, though, in the last couple of years. More and more, we’re working with client projects that engage with machine-generated content, machine-generated results, machine-generated interaction where the machines are in charge.
I would say that if mobile defined the last decade of digital interaction, then I think machine learning is already defining the next. I think, as designers, as front-end developers, we’re often not quite sure what our role is in this. I’m not a data scientist, but I think that design actually has a huge role to play in this. I think a lot of it is understanding machine learning as a design material. It is a design material in the same way that HTML and CSS are a design material, in the same way that prose is a design material, in the same way that dense data can be a design material. How does it want to be used?
Like any design material, it’s important to understand its texture; not only how it can be used, but how it wants to be used. These are sort of the questions that I think are useful when you’re starting to work with a new technology, with a new material. What is it for? What tools can we use for it? What is its grain? Again, this idea of not only what does it offer, how does it want to be used, the texture of it, but what does it require of us?
Let’s start with this first question of what is it for or, put another way, why team up with machine learning? What do we get from this? I think, broadly, the whole premise of it and, indeed, what we’re beginning to see how it’s used is, what if we could detect patterns in anything and then act on those in some productive and meaningful way?
If you think about it, there are sort of five ways that you can use machine learning. I’ll step through these very quickly. Recommendation, it’s sort of like, here’s something. Among the whole sea of options you have, this is the first option. It’s sort of a priority sorting based on data and information that we have.
Very close to that is prediction, which is essentially sort of saying, you’re doing this. Historical statistics say this is the next most likely thing that you will do. This is the next thing in your flow.
Classification is sort of organizing things by human classes or categories and then training the machines to identify that. It’s like, yes, this leaf is a maple leaf; that is an oak leaf. Human classifications that the machines are trained to fit.
Clustering is the interesting part, which is unsupervised categories. “Hey, neural network. Here’s a whole pile of data. You figure out how these things are organized.” You can sometimes get these really interesting insights.
Then the trickiest bit, which is generation, letting the machines make something: a painting, a screenplay. They’re terrible at this right now. Really awkward, sort of clumsy things, but it’s really interesting to see how they’re evolving. How might the machines be creative?
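To make that last one concrete, here is a minimal sketch of machine generation in the crude, clumsy spirit just described: a word-level Markov chain that “writes” by replaying patterns from its training text. The corpus is invented for illustration; real generative systems are vastly more sophisticated.

```python
import random

def train_markov(text, order=1):
    """Map each word (or word tuple) to the words that follow it in the text."""
    words = text.split()
    model = {}
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        model.setdefault(key, []).append(words[i + order])
    return model

def generate(model, length=8, seed=0):
    """Walk the model to 'write' something new -- crude and clumsy, which is the point."""
    rng = random.Random(seed)
    key = rng.choice(list(model.keys()))
    out = list(key)
    for _ in range(length):
        followers = model.get(tuple(out[-len(key):]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the machines make a painting the machines make a screenplay the machines make a mess"
print(generate(train_markov(corpus)))
```

The output is grammatical-ish nonsense, which is roughly where machine screenplays sit today.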
Just some examples of this, if we look at recommendation, if we look at Slack for an example of that, they launched a feature this year for big teams that helps you find who talked about certain topics. This is machine learning that helps you find experts in your organization. You need an expert in the hiring process; Isabella is the person to go to.
These are recommendations. This is a ranked list of documents that matches a specific context or concept. It’s closely related both to classification, which I’ll discuss in a bit but, also, to prediction, as I mentioned. And so, I’ll look at that.
Prediction, I think, just think of our predictive keyboards. This is just another example of everyday machine learning, just taking the statistically most likely next word and showing it above the keyboard; just a simple intervention to speed the error-prone task of touchscreen typing. It’s a basic add-on to historical data: here are the things most likely to happen next. That’s prediction.
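The mechanics behind a predictive keyboard can be sketched in a few lines: count which word historically follows each word, then suggest the most frequent follower. The tiny corpus here is invented; a real keyboard would use a far larger n-gram or neural model.

```python
from collections import Counter, defaultdict

# Invented toy corpus standing in for a user's typing history.
history = "a cup of tea a cup of coffee a cup of tea".split()

# Count which word follows each word.
followers = defaultdict(Counter)
for prev, nxt in zip(history, history[1:]):
    followers[prev][nxt] += 1

def suggest(word):
    """Return the statistically most likely next word, or None if unseen."""
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(suggest("of"))  # "tea" follows "of" twice, "coffee" only once
```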
We’ve got classification. Google Forms, which is likely familiar to many of you. When you add a question, you choose the format of the answer that you want. The default is multiple choice, radio buttons, but there are a lot of options here. They’ve made it as simple as they could, but it still takes some time and thought.
Look what happens when you start typing the question text. “How satisfied are you?” maps to a linear scale. Of course, the correct answer here is one, the lowest of the thing, right?
They’ve added a little machine learning to the mix to look at your question and classify it to a specific answer type. “Which of the following apply?” maps to checkboxes. It’s just a convenient bit of intelligence to make the process easier.
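The Forms behavior can be imitated, very crudely, with hand-written cue phrases. Google presumably uses a trained text classifier rather than rules; the labels and keyword lists below are my own invention, just to show the mapping from question text to answer type.

```python
# Hypothetical answer types and cue phrases -- not Google's actual rules.
ANSWER_TYPES = {
    "linear_scale": ["how satisfied", "rate", "how likely"],
    "checkboxes": ["which of the following", "select all", "check all"],
    "date": ["what date", "when is"],
}

def classify_question(text, default="multiple_choice"):
    """Return the answer type whose cue phrase appears in the question text."""
    lowered = text.lower()
    for answer_type, cues in ANSWER_TYPES.items():
        if any(cue in lowered for cue in cues):
            return answer_type
    return default

print(classify_question("How satisfied are you?"))         # linear_scale
print(classify_question("Which of the following apply?"))  # checkboxes
print(classify_question("Favorite color?"))                # multiple_choice
```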
Classification is human-generated categories where the machines map content or data into those categories. You’re probably thinking it’s sort of like, “These seem awfully mundane examples. Where’s the fancy robot intelligence that’s going to sort of do everything for us?”
Yeah, that’s what I like to say. You can quote me on it: let’s sprinkle a little machine learning on that. Right? We have the technology that can actually be that casually used.
Now, there are bigger opportunities, too, for the machines to sometimes take over the entire task and not just act as these almost invisible assistants, like writing simple news articles. The Washington Post, the newspaper owned by Amazon’s Jeff Bezos, is actually having the bots write news articles. These are the simple news articles that have actually fallen off the grid for a lot of papers in the U.S. that no longer cover local sports, local elections.
Basically, it takes this data and fits it into prefab templates. It’s not really writing the entire article, but it’s writing portions of it. This is a kind of classification where if, for example, in an election the polling shows one result but the election shows another, it might classify that to “A stunning reversal of fortune” as a phrase that gets put into the template. It’s classifying the data, finding these phrases, and putting them into the template.
The system is called Heliograf. They use it for stories with, as I said, simple narratives: political contests, sporting events. It’s something that is actually starting to run on its own now. Feed it some data; you get articles out of it.
Clustering is the bit I think that can feel like either magic or nonsense depending on the success of the algorithm. This is, as I mentioned before, completely unsupervised. The machines just go out and find patterns. It sort of says these things are different from normal, are different from the average in these common ways. That’s a cluster.
Here’s another group that’s different from average in other ways. That’s another cluster. It’s classification that’s done by machine logic, not human logic, which means that it can see things that we otherwise wouldn’t, both because of the scale it can work at but also because machines find patterns that we’re kind of not tuned into.
It’s a way to identify the ways in which things depart from the normal or the average. It’s often used to detect fraud, crime, disease, or spending outside of your normal habits; groups of data points that sit outside the norm in some interesting way. That also means you can use it to identify clusters of products, customers, or people by behaviors or associations that might not immediately be apparent or obvious.
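Here is a minimal sketch of that “outside the norm” idea, assuming a single invented spending signal and a simple z-score test. Real fraud detection clusters across many signals at once; this just shows what “different from normal” means mechanically.

```python
import statistics

# Invented one-dimensional data: a customer's recent transaction amounts.
spending = [42.0, 38.5, 51.0, 45.2, 40.0, 47.3, 39.9, 980.0]

mean = statistics.mean(spending)
stdev = statistics.stdev(spending)

# Flag anything more than two standard deviations from the customer's norm.
outliers = [amount for amount in spending if abs(amount - mean) / stdev > 2]
print(outliers)  # the 980.0 charge sits far outside normal habits
```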
I’ll show you an example that combines unsupervised clustering with content generation, and you begin to get glimmers of what looks like creativity. Douglas Summers-Stay is a researcher who is interested in artificial creativity, and he created a wordplay generator. You give the system a word or an idea, like rocking chair, and it comes back with a two-word rhyming description like knitter sitter. You give it Star Wars, and it comes back with droid overjoyed.
You have all of these examples of these things that it can do. It’s a system based on what are called semantic vectors that organizes words that have strong associations. Here it finds those words and then runs them through a separate algorithm to find the pairs that rhyme. That’s a fun example.
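A toy imitation of that two-stage pipeline: gather associated words, then keep the pairs that rhyme. The hand-made association table stands in for real semantic vectors, and the letter-based rhyme test stands in for phoneme matching; all the word lists are invented.

```python
# Hand-made association table standing in for semantic vectors; all invented.
ASSOCIATIONS = {
    "rocking chair": ["knitter", "sitter", "grandma", "porch", "rocker"],
    "star wars": ["droid", "overjoyed", "jedi", "saber", "void"],
}

def rhymes(a, b, suffix_len=3):
    """Crude rhyme test on trailing letters; real systems compare phonemes."""
    return a != b and a[-suffix_len:] == b[-suffix_len:]

def wordplay(prompt):
    """Pair up associated words that rhyme with each other."""
    words = ASSOCIATIONS.get(prompt, [])
    return [(a, b) for i, a in enumerate(words) for b in words[i + 1:] if rhymes(a, b)]

print(wordplay("rocking chair"))  # finds ('knitter', 'sitter')
print(wordplay("star wars"))      # the letter test finds ('droid', 'void')
                                  # but misses droid/overjoyed -- phonemes matter
```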
We’ll look at even more powerful clustering examples later but, just broadly, that’s sort of the universe of tools that we’re able to use and that, again, are becoming sort of very available. It’s very sort of democratized with these open source systems that are accessible to all of us.
Again, the possibilities here then is, what would happen if we can detect patterns in anything and act on them? These can be small interventions, as I said. Sprinkle a little machine learning on it.
When you consider the pattern matching applications we’ve looked at, it means that there are sort of four things that you’re able to do with these things now. One is, be smarter answering the questions that we already asked. That Slack example is a good example of that. Often, in sort of big organizations, we’re trying to find somebody who knows something about that. We’re doing these text searches. Slack has used semantic clustering to classify different people based on their areas of expertise, so it’s easier to search for them. It’s just better at something that we’re already doing.
It gets more interesting, though, when you realize that you’re actually able to ask new kinds of questions. If you work in a customer care center, you can actually search for angry emails. I want angry. I want urgent. It’s something that, because of semantic analysis, we’re now able to ask a different question than we were able to ask before.
I think something that we’re more familiar with and seeing a lot of is the idea that we can now have all these new sources of data. All the messy ways that we communicate with each other as humans through speech, through doodles and sketches, through photos, through video, used to be completely opaque to the machines, and now it’s available as a source of data. What are the opportunities there? Finally, as I mentioned with clustering, how can we surface invisible patterns that we hadn’t seen to identify new patterns of disease, new customer segments, or new connections between artworks, which we’ll visit in a moment?
We can map these things to those five categories that we mentioned before, and we start to see these opportunities. But what do we do with it, and what does this mean as designers? These are the technical tools that are at our disposal. If you think about it, the role of design, and particularly of UX, is always to figure out: what’s the problem, and where can we have the maximum impact? Where can we have an intervention that would be most useful with technology?
Particularly in this, this is important because it takes effort and data and process change to add machine learning into any workflow. Choosing that right point of intervention is important. It’s practical. It’s about budget and resources.
But more from a human need, it’s really about identifying where is the human need and how can machines help to solve that human need? From a UX perspective, these are the questions that teams are always asking. Good design teams always ask these kinds of questions. What’s the problem? What data can determine that answer? Who holds that data, and who are the people to serve? That’s just good user experience.
The new part, though, is that we’re sort of having to apply this at unprecedented scale. This is UX research at entirely massive scale as we start to think about how the data comes into it.
Here’s one example. We did some work at Big Medium with a healthcare company. We zeroed in, particularly on opportunities to help radiologists do their jobs better. They’re, of course, the folks who examine x-rays and medical scans for problems.
You might sort of think their work looks like this. Oh, what does this blemish indicate on this scan? Really, their work is mostly just looking at negative scan after negative scan after negative scan after negative scan, full of this clerical work before they even get to apply their expertise to look at even an interesting scan.
Part of this, when you look at the overall flow of what their workday is like, this is the opportunity. How can the machines find and set up the interesting scans so that radiologists can do their work better? The machines took over the time-consuming, repetitive, detail-oriented, error-prone, joyless part of the job, and it turned out to be very successful.
We identified the problem and sort of a possible solution, which is, how do we help people do what they do best by letting the machines do what they do best? These are almost never the same thing. Amplify our humanity by supporting our activities in ways that are uniquely possible through machine learning.
A lot of that is just thinking about and looking at, very carefully, the sort of flow of activity that we’re doing for any particular task and sort of really identifying, man, what are those time-consuming, repetitive, detail-oriented, error-prone, joyless tasks? Nobody wants to do this stuff, right? It’s not what we wake up for. I can’t wait to do more joyless tasks.
Josh: But the machines love it. That’s the stuff that they do. Let’s make the machines do these things so we can focus on what we do best. I guess the way to put that is, what are the opportunities where we can focus attention and judgment so that we can do the thing that we came to do? Even those modest examples of predictive keyboards, Slack searches, and Google Forms suggestions are about doing that. Even clearing out little distractions lets us focus on what’s happening. Let’s start with that human need and understand how the machines can help.
The next bit, and this is sort of where I want to spend the bulk of the rest of the time is looking at what’s the grain of these things? What are the unique qualities of the design material, in this case, machine learning, that gives us cues about how we should work with it?
Another thing is, what is the sweet spot of it? How do we take that grain and really hit the target for using the technology in ways it wants to be used? I worked on the phrasing of that “hit that target” so I could do that animation for you. It’s kind of forced, but there you go.
And what I want to talk about is these aspects that define the special qualities of machine learning, which I think hold lessons for how we should design for it. The first one is this: these machines study the world. They sift through enormous volumes of evidence and data, and they come to, often, shall we say, surprising conclusions.
It takes about 14,000 brain hours for a human being to learn to run, and it turns out that an AI system can figure out how to run in less than half as many CPU hours, but the results are like this.
Josh: Gets some audio.
It’s just using the arms for forward momentum with little consideration of the physics, friction, or the realities of rotator cuffs.
Josh: It makes more sense with the audio, doesn’t it?
Josh: But then again, we also come up with our own forms of, strange forms of, locomotion, right? Just consider the prancercise craze of 2013. Yeah.
When people are so difficult to understand and so often unpredictable and eccentric, how can we expect the machines always to make sense of what’s happening here, right? At times, they’re actually pretty close. Right?
Josh: It’s all to say, the machines are weird, but in large part that’s also because we’re weird. We may never totally understand what we look like to the machines from the other side. We communicate and behave in ways that are not always predictable and are themselves somewhat fanciful. What do the machines make of us as creators and artists?
This Renoir painting is in the Barnes Foundation in Philadelphia. They’ve done a lot of work there on machine learning with their collection. When curators ran it through a computer vision algorithm for identification, the robots confidently identified it as a boy holding a teddy bear, of course. Right?
But they didn’t strictly see that as a failure. That might just sort of seem like that is just wrong. They didn’t see it as a failure.
When they asked a neural network to find cherubs in their collection, it found things that were definitely not cherubs but, yet, were also cherub-like. The curators began to find some interesting connections, some productive friction that got them thinking in new ways. Shelley Bernstein led this effort with machine learning there, and she described these connections as magic, that these unexpected results had generated new insights.
One thing to say, especially with the sort of clustering idea, these invisible patterns that I’m talking about, the results are often weird, but this is the first lesson. Consider that a productive friction. The patterns and connections they make can surface themes that might have otherwise not occurred to us.
With the unexpected, let’s talk a little bit about unexpected visitors. B.J. May has a Nest doorbell that he sort of programmed a little bit to lock the front door when it doesn’t recognize a face. It locked one day, so he went to look into what happened. What he found was that it had seen really sort of an unusual face here.
Josh: It was like, “I don’t know. I don’t think we should let this guy in.”
Josh: It seems a little angry. The machines perceive and interpret the world differently than we do. That weirdness, unpredictability, and mistakes are essential to this design material.
One of the real lessons is that, as I’ve worked more and more with machine-generated results and machine-generated content and, particularly, machine-generated interaction where the machines themselves are the ones in charge of talking to the customer more than an experience that I’ve purposefully designed, the more that I realize I’m not in control of this as a designer. That’s new.
I’m used to designing fixed paths through content that I control; the happy path. Right? Now we have to anticipate a fuzzy range of results in confidence, which is why it’s really important to understand the strengths and weaknesses of your model.
This is always our job, but it is especially our job now. The job is to set appropriate expectations and channel behavior that’s consistent with the capabilities of the system. How do we do that?
Right now, Siri and Alexa set expectations by saying, “Ask me anything,” and consistently disappoint because that channels behavior in a way that is not consistent with the capabilities of the system, not yet. How do we design for that?
I think one way is about the manners. Of course, friends, the answer is B.A.S.A.A.P., which we all know stands for Be As Smart As A Puppy. This is a phrase from Matt Jones of the late great digital agency BERG London, a pioneer in connected devices and digital weirdness. He came up with this way of thinking about creating honest, gentle interfaces for our weird systems. Our goal should be making smart things that don’t try to be too smart and that make endearing failures in their attempts to improve, like puppies or robot soccer players.
The right manner helps us to anticipate that kind of failure and forgive it, right? So often the promise is so confident that it’s so much more disappointing when it fails. How do we anticipate, forgive, and recover from error? It may sometimes be annoying. Puppies can be annoying. But it won’t surprise or, at worst, endanger us because too often when machines make a mistake, it simply feels like this.
Josh: Back to the drawing board.
Josh: And that’s about the design, right? How do we cushion mistakes? How do we set expectations so that they don’t disappoint or anger? I think that one of the lessons here really is, again, in the same way that we need to set appropriate expectations and channel behavior in ways that are appropriate to the system, the design challenge is, how do we match the language and manner of that system to that ability?
This next thing brings me to the narrow domains that any given system can handle. These domains are really narrow. Machine learning knows only a very narrow, very specific part of the world and can only work with that. Once it goes outside of it, it gets very confused.
CycleGAN is a tool, a neural network that’s trained on pictures of horses or fruit and taught to transform them. A picture of apples becomes a picture of oranges or, boom, your horse is a zebra. But when you introduce new elements, it quickly gets confused and thinks that it’s all part of the same problem, so it’s like everything.
Josh: Everything. If it only knows horses, everything is a horse, right? It’s a narrow problem. I think that particularly in this era of machine learning, of artificial intelligence, we have to realize that this is not real intelligence. There is not expertise here. This is just deep pattern matching on a highly specific domain. It’s not intelligence or expertise.
Benedict Evans puts it this way: really, it just gives you tons of interns, and maybe ten-year-olds. Five years ago, he says, if you gave a computer a pile of photos, the best it would be able to do is tell you the dimensions of them. Maybe a ten-year-old could sort them into men and women. If you were lucky, a 15-year-old could say these are cool. Perhaps an intern could say these are interesting. Maybe that’s the best you can do. Maybe today machine learning can match that ten-year-old and maybe the 15-year-old, and we’re still hoping for the intern.
What would you do if you had a million 15-year-olds looking at your data? What are the possibilities there? It’s sort of a scary thought.
All right, anyway, so onward.
I think, because they’re narrow, what it means is you need to focus your questions and inquiry on very narrow problems. That might sound disappointing. That might sound small in scale, but narrow problems don’t have to be small problems.
Deep Patient is a system that is built on thousands and thousands and thousands of personal health records to try to find patterns. This is one of those clustering examples. Unsupervised, here you go, neural network. Figure out what to do with this.
What it found was -- and so, again, what that does is it’s sort of like, here’s normal, and here are a whole bunch of clusters of people who aren’t normal in the same similar way. In this case, one of those clusters predicted schizophrenia about two years before a human doctor would diagnose it.
On the one hand, you’re sort of saying, “Wow, this is amazing. This gives us new insight into this clearly difficult to diagnose condition,” but it doesn’t give us any insight. It just goes chuck-chuck-chuck-chuck-ding, schizophrenia, which is helpful to focus judgment and attention on that problem, but it also doesn’t necessarily help us understand it because even the people who built the system don’t understand it.
If we don’t understand where information came from, can we really call it knowledge? That’s a big philosophical question that I’ll sort of put off to the side. But I think that it also suggests that, even as we aren’t exactly clear on even how these systems work, the effect of being able to focus human attention and judgment on an area to say this is interesting seems really useful and it can solve big problems.
But again, this introduces sort of a third aspect of the grain of these technologies is that the logic of them is often opaque, especially with deep learning, which is essentially models creating models creating models. We can’t follow them. Just to use a really sort of silly example about me, anybody use Spotify, the discover playlist? I feel that we’re close enough that I can share this with you, friends. Whenever I listen to it, it sounds like this.
Josh: Yeah, that’s me.
Josh: [Laughs] My playlist is deeply uncool. The problem is, it’s accurate. Right? I am deeply uncool. I wish I was cooler. I wish I was as cool as Toby.
Josh: And so, I’m embarrassed. I’m embarrassed, and I actually try to listen to different kinds of music so that my playlist will sound better. Anybody do that? I don’t like the mirror that the algorithm is holding up to me, so I want to impress Spotify by listening to, you know--
Josh: --what the kids listen to these days, or usually what the kids listen to ten years ago. Whatever.
But also, I’m afraid to skip. Will I never hear that song again that I deep down secretly love? My mental model of how it works changes how I behave, but I’m also not sure if my mental model is correct. I’m sort of paralyzed, and I guess here I am listening to Margaritaville again.
I think one of the things that’s really true, because these systems are opaque, at times to their makers, but certainly to the people using them, is that our job, part of our design obligation is to create some data literacy to help people understand how these systems work broadly but, particularly, how your system works. How does the system behave, what signals does it observe, and what is it optimized for?
Back to Spotify. The interface should probably give explicit feedback to indicate how fast-forwarding a song affects your profile, if at all. I am just riffing here, but just sort of this idea of sort of saying, “Hey, should we forget about this? Should we just never speak of this again?”
Josh: It gives me the option to do it, but also suggests that it hasn’t happened yet. That’s sort of a thing of like, “Yeah, you’re just skipping it. No problem.” I’m not sure this is the right language, but you get the idea. How do we sort of give people the option so that we’re surfacing some ideas of what’s happening and what’s not?
Throughout, I think that really the important part is signaling our intention. When are you capturing information? Here in Europe, of course, the GDPR has certain rules about signaling when we capture information, so we just slapped cookie notices all over everything, which I would say is not what the spirit of that regulation intended. It’s trying to make privacy a first-order design principle.
I think the opportunity is, how might we make transparency a first-order design principle, not through regulation, but by doing the right and responsible thing to signal when we’re capturing information, make it easy to find out how it’s being used? This is a problem, though, because transparency is not a core value for our industry, especially for the big players that profit from this.
If you’re an Android user, you have Chrome running in the background, and you just let it sit, you just put it down over here, walk away, and don’t use your phone, it sends Google your location information 340 times an hour, just about every ten seconds, when you’re not even using your phone. It ramps up when you use it.
Facebook gathers your information from your friends’ contacts so you could not even use the service, never even joined it, but Facebook has an ad profile about you even if you’re not part of the network. You’re giving up your friends just by using the service. Not intentionally, usually. They just sort of grab it opportunistically.
Worse, when you give your phone number for two-factor authentication for security, Facebook makes that phone number available for ad targeting. If an advertiser happens to have your phone number over here, they can target you because you signed up for two-factor authentication. That is fucked up. Quietly harvesting data any way that they can.
Now, you might say, well, if we told them what we were doing, people might not sign up for our service, to which I would say, “If you think people won’t like it, change your fucking service. You probably shouldn’t be doing it.” Right?
Pardon the language. I’m getting a little upset, a little riled.
The point here is that we want to make transparency a default design principle, which is important not just in these privacy matters, but again in revealing the model of how these things work. Promoting data literacy, not only in our own services but in how the whole system works, is, I think, one of the most urgent things design is facing right now.
The last thing, though, or next to the last thing I want to talk about with the grain here is that machine learning sees the world in shades of gray. We often want it to rush to an answer. We go to Google and put in a thing: give me that Google featured snippet at the top that shows me the two sentences that answer my question on the Internet. We want the answer as quickly as possible.
It turns out the machines don’t see things as facts. These are probabilistic systems. Nothing is fixed. It’s all probabilities. These algorithms tend to be explicit about their confidence in their predictions.
One thing I’ll just mention here: there are a bunch of different services you can use to implement machine learning with very low effort, which the big companies like Amazon, Google, IBM and, here, Microsoft are providing. You can investigate yourself how the grain of these services works.
Microsoft’s cognitive services here is one example. Again, Amazon, Google, and IBM all have similar offerings. Basically, you can get access to vision, speech, language, and all kinds of different services, all the superpowers that you would expect with this.
I want to show you an example just to give you a sense of the probabilistic stuff. Because I am just that self-absorbed, I’m going to feed an image of myself to the image recognition service at Microsoft. You can see it does pretty well here. It’s a man looking at the camera, with 89% confidence.
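To make that concrete, here is a sketch of the kind of response these vision services return. The JSON shape and most of the scores below are invented for illustration (only the 89% caption and the 28% “older man” figures come from the talk), but the point stands: every label arrives with a probability attached.

```python
# Hypothetical response, in the general shape vision APIs return.
# Only the 0.89 and 0.28 figures come from the talk; the rest is invented.
response = {
    "captions": [
        {"text": "a man looking at the camera", "confidence": 0.89},
    ],
    "tags": [
        {"name": "person", "confidence": 0.99},
        {"name": "man", "confidence": 0.89},
        {"name": "older man", "confidence": 0.28},
    ],
}

# Nothing in the payload is a fact; everything is a probability.
for tag in response["tags"]:
    print(f"{tag['name']}: {tag['confidence']:.0%}")
```

Notice that the service never commits to an answer; it hands the uncertainty to us, and it becomes a design decision whether to pass it along.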
Josh: I’m not totally sure about [throat clearing]--
Josh: Pardon me! I mean, really!
Josh: I’m working on it. I’m working on it.
That one sort of sits with you after a while. It’s really--
But you also see, when you go back to this and look at these, there are all these different confidence levels. Even at “older,” it’s at least admitting it’s guessing, right? It says flatly, this is an older man, but it’s only 28% confident of that. It’s guessing. It’s like, hmm, an older man? You know. There’s a lack of confidence there that we need to get at, and I’m just going to look at that. Oh, yeah.
Josh: It didn’t even rank, though, you guys. I’m sure that was a low-confidence answer. How do we expose those things? We’re familiar with some things like this, like Netflix saying 72%. Again, I’m just a little embarrassed that it’s that high for me, but I’ve already told you guys about Margaritaville. I don’t know. It’s not that cool.
Netflix surfaces that confidence number directly. What I’ve seen in usability testing is that confidence numbers like this are often not well absorbed or understood. This is not the most effective way to do it.
I think language could do some of this work: “This is a man, maybe an older man, looking at the camera.” That could perhaps be a better way to do it. What is the language and body language that we need to adjust to get these changes in probability across, instead of just flatly stating the answer and calling me an elephant?
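One way to sketch that idea: a tiny function that turns a raw confidence score into hedged language. The thresholds and phrasings here are my own invention, not from any shipping product; the point is only that the interface can speak in shades of gray.

```python
def hedge(label: str, confidence: float) -> str:
    """Translate a confidence score into hedged human language.

    The thresholds are illustrative choices, not industry standards.
    """
    if confidence >= 0.9:
        return f"This is {label}."
    if confidence >= 0.6:
        return f"This looks like {label}."
    if confidence >= 0.3:
        return f"This might be {label}."
    return f"I'm not sure. Possibly {label}?"

print(hedge("a man looking at the camera", 0.89))  # This looks like ...
print(hedge("an older man", 0.28))                 # I'm not sure. ...
```

The body language of the sentence carries the probability, so the reader absorbs the uncertainty without ever seeing a percentage.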
In an application like Netflix, the stakes are pretty low, but there are other places where the stakes are much, much higher, like law enforcement. We’re starting to see the police use face recognition algorithms or pattern matching for predictive policing.
Josh: Shades of Minority Report here to pluck suspects out of the wild. That’s pretty serious, you know. The question is sort of, are the machines up to it? Well, the ACLU ran photos of Congress against a data set of 25,000 mugshots and got 28 matches, so Amazon has apparently confirmed with science that the American Congress is a bunch of crooks.
Josh: There you go. There was some applause. All right. It’s Election Day in the U.S.
Let’s look under the hood here, though. The ACLU left Amazon’s face recognition at its default setting, which declares a match when there’s 80% or better confidence. Amazon countered and said, “Whoa! We tell police departments to use at least 95% confidence. It’s a probabilistic tool. This is not a fair example.”
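The whole dispute reduces to one number. Here is a minimal sketch of what that threshold does; the candidate names and scores are invented, and only the 80% default and the 95% recommendation come from the talk.

```python
def matches(candidates, threshold):
    """Return only the candidates at or above the confidence threshold."""
    return [name for name, score in candidates if score >= threshold]

# Invented scores for illustration; only the 0.80 default and the 0.95
# recommendation are from the talk.
candidates = [("member A", 0.81), ("member B", 0.96), ("member C", 0.88)]

print(matches(candidates, 0.80))  # default threshold: all three "match"
print(matches(candidates, 0.95))  # recommended threshold: just one
```

Same data, same algorithm; a single configuration value decides who gets flagged as a suspect.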
But, in a way, this makes the ACLU’s point. The machines require careful tending. Those of us who are close to the machines need to understand particularly how confident we can be in them and how accurate they are, and how confident we can be in their confidence. When they’re wrong, do they know that they’re guessing? How can we surface that?
We need to make that transparent to the users and the civilians who get their results from us because this stuff has real impact. There’s really hurtful impact. This is something that can put people away in prison, of course.
Even on a more personal level, until a few months ago when you asked Google Home, “Okay, Google. Are women evil?” you see what’s coming, don’t you? “Yes,” with a 30-second explanation of why, with complete confidence that this is the answer.
I am uncomfortable saying those words out loud, so I just want to be really clear, friends. Women are not evil. Right? This is not good. But there was some sort of confidence level that sort of said this is the best answer to that question, perhaps from the phrasing of the question.
The design of the interface just sort of said this is correct, and that is not just a data science problem. It’s that too. But it’s also a design problem that the presentation suggests that there’s just one answer. The interface suggests confidence that shouldn’t exist.
“I don’t know” is better than a wrong answer or, as I said before, “I think I know,” can actually be really useful too. Maybe an older man. It could be an elephant.
Josh: How do we sort of start to introduce those rather than say, “This is an older elephant”?
Josh: Another way to put it, though, is how do we build systems that are smart enough to know when they’re not smart enough? It turns out the algorithms already are. We are falling down on the job as designers of revealing that lack of confidence.
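A sketch of that principle: let the system abstain when even its best guess is weak. The 0.5 floor and the example labels are illustrative assumptions, not a standard.

```python
def best_answer(predictions, floor=0.5):
    """Report the top label, but admit "I don't know" when even the
    best guess is weak. The 0.5 floor is an illustrative choice."""
    label, score = max(predictions.items(), key=lambda kv: kv[1])
    return label if score >= floor else "I don't know"

print(best_answer({"older man": 0.28, "elephant": 0.11}))  # I don't know
print(best_answer({"man": 0.89, "elephant": 0.02}))        # man
```

The confidence signal is already sitting there in the model’s output; the design choice is whether to surface it or to flatten it into false certainty.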
The important thing here is they don’t see facts, and we shouldn’t report them as facts. We should present information as signals and as suggestions, not as absolutes. That brings me to the last point I want to make about the grain of the technology: the machines know what we feed them. Right? Their entire job is to understand what normal is and predict the next normal thing, or call out things that differ from normal, in the case of fraud, crime, or disease; general healthcare, right?
What if normal, what if that standard, what if normal is garbage? What if our machines learn from us our own bad or dubious choices? What if they absorb and reinforce existing inequalities or leave out entire categories of people?
We saw this just last month at Amazon, where their recruiting tool was apparently strongly biased against women. The technology industry, of course, has a whole history of favoring men in hiring, and that seeped into their algorithm. They found that it was biased against women as a result.
Let’s not codify the past. This is a phrase from the data scientist Cathy O’Neil in her excellent book “Weapons of Math Destruction.” On the surface, you’d think that removing humans from a situation might eliminate racism, stereotypes, or any very human bias but, in fact, the risk is that we bake that bias into the operating system of our culture itself. The machines, which we trust more and more, are just reporting based on that bias. In the case of predictive policing or sentencing algorithms, where the data in the United States is so completely biased against young black men, or hiring and promotion algorithms that, as we saw with Amazon, are biased against women and people of color, the data codifies that ugly past.
Now, there is a possibility here that’s interesting and new with the machines: they surface this bias naively, innocently, and in ways that are very direct and not obfuscated or hidden. A lot of times it’s actually an opportunity to revisit things, like we saw at Amazon. Whoa! The algorithm is not doing what we wanted. Let’s fix this problem that we have. These can be signals and opportunities for change.
But I think that a lot of it is also then the systems need to help people understand where they are and sometimes know when to be critical of the result. When is this a good answer or suggestion or possibly a very flawed answer?
Facebook, of course, is really dealing with this in the way they handle false information and propaganda on their platform. Early last year, they rolled out this design for flagging probably false information. Somebody tries to share something, and it says, “This has been disputed. This is probably wrong.” You try to share it, and it’s like, “You’re sharing something wrong. This is not good.” Really trying to stop them.
It had good success; 80% reduction of future impressions. But that other 20% dug in. That reinforced the way that they felt about this stuff. When someone challenges a deeply held view and says, “That’s wrong,” we tend to dig in even further. While it helped the mainstream, it hardened the fringes.
They went back to the drawing board and came back with this design instead where you can actually sort of see here are some related articles that happen to be from fact checker sites. When you share, you’re prompted to know that there’s additional reporting you might want to check out. Here’s more reading. It’s a kinder way than saying, “Don’t share that, stupid. That’s wrong.” This is like, “Hey, if you’re interested in that, you might be interested in these.” How would you talk to a loved one about this? How do we bring some care and compassion into this instead of further creating a divisive situation?
In other words, we don’t need the machines to replace human judgment. We actually want them to augment it. As I said earlier, we can focus human attention and human judgment in one area. This is a place to say, “Hey, you can usually trust things. We’re in a tricky area here. Use your brain now.”
I think this is maybe one of the most important, if subtle, things: we already saw how a slight change in language and interface design can really have an effect on the results Facebook is seeing. This is what I’m talking about, the texture or the grain of machine learning, these characteristics that suggest different ways we need to approach it. I’ve tried to suggest some techniques, or at least some principles, that we can use to counter some of the problems and take advantage of some of these opportunities.
I want to close by thinking about something that I think is implicit here: the way we design these things will change our behavior or lead us to conclusions we might not otherwise reach. Our values and behaviors will shift by creating these things and by using them. If we’re starting here, again, with this opportunity, what do we want from it? I don’t mean just in the instant. What is the larger result we want from using these machines to do some of these clerical tasks for us?
Crucially, what does it require of us? How is it going to change us? Beyond the interaction, what is the larger outcome we want to see, and what trades are we willing to make?
Just one example is medical care. What if you’re sick, and now the machines can handle the diagnosis? You’re like, “Well, I don’t know. This seems a little bit freaky. I sort of like having a human doctor.” I’m not saying we do away with the human doctor. I’m just saying the machines can do the analysis and the diagnosis so that the doctor can then come in. Their job is not to do the analysis. The machines have already done it. It’s to deliver the message in a way that is caring and warm. The doctor will let the patient tell their story.
In the U.S., we get about four or five minutes with our doctor and then they’re gone. I would like some more time to talk with my doctor about the concerns that I have, and to have more patient contact. If the machines can handle that part, perhaps the human can be a little bit more human. Perhaps that caregiver doesn’t need as much education, which can help to reduce health costs, another problem we have in the United States. Perhaps, if that training costs less, then we could actually have more jobs.
This sort of thing, this fear that we have about the machines replacing important parts of what we do is a real fear, but it’s all about how we deploy it. I think that these outcomes of more compassion, better healthcare for less money, more jobs is a good outcome, and we can do that. But it’s up to us to be really intentional about it because if we don’t decide for ourselves, the technology will decide, and it will just sort of roll over us. Friends, I think we can all agree; the future should not be self-driving. Values are cooked into software. Software is political. It is ideological. It has values embedded in it whether we’re aware of it or not, so let’s be intentional about where we’re going to drive these things.
This is not really what we need more of -- right? -- where we’re working for the machines so that they can do their jobs instead of the reverse. Instead of humans working for the machines, let’s let the machines amplify our potential by doing our joyless work for us and delivering new insight to us. Instead of hijacking our agency, they can augment our judgment so that we can be more creative, more insightful, and more compassionate.
This is our choice, the people in this room. We’re the ones who are going to design this stuff. We’re the ones who are going to make these decisions. I think that that is really exciting. I guess I would like to say this is a time to be generous, to share our ideas, to be generous in the way that we talk to our users about these services, be generous in the way that we promote data literacy with each other as an industry and, certainly, with the people who use these systems, and to share ideas.
In the spirit of sharing ideas, there’s a slight little sort of step over to the right here, but I want to share something. I’m excited. I’m launching something later today as a gift to you all, something that was a collaboration between Brad Frost, Dan Mall, and me and InVision. I’m just going to give you a little glimpse. I can’t show you too much. It’s launching in a few hours.
Dan Mall: Our job would be to help designers understand how to work with design systems.
Brad Frost: As the digital landscape gets bigger and more complex--
Josh: We’ve learned a lot of new techniques that we’re excited to share with the world.
That’s kind of exciting. I’m excited.
All right, that’s coming in a few hours. But really, the big thing is that this is about creating a design foundation. We need stability and sturdy foundations for the work that we do, because this is a time for wild ideas, and wild ideas need to be built on top of sturdy foundations. We can do small interventions, but we can also do audacious projects that really make a big difference in our lives. The tools are here now for you to make something amazing, so please do that. Make something amazing.
If you’re interested in more about this, I want to point you to mindfultechnology.com, a movement about creating products that focus attention instead of distracting it. Juvetagenda.org is the output of a retreat I went to last year that Andy Budd organized, where a whole bunch of researchers, journalists, designers, and science fiction writers figured out what questions we need to ask about designing for artificial intelligence. Finally, bigmedium.com, where you can see more of my writing, some of my talks, and the services we offer around this stuff.
I think, in the meantime, it feels like there’s just one thing left to be said.
Joanna Rohrback: Let’s stop talking and do some walking.
Josh: Thank you very much.
Josh: Thanks a lot.
Josh: Thank you.