Transcription
(Applause)
LÉONIE WATSON:
Thank you.
Before we start, I want to ask you all to keep two words in mind. One is copyright, the other, consumption.
Whatever you think about generative AI, whether you use it or you don't, the truth is the models that underpin the tools that we use have been trained indiscriminately on other people's data. Your data, my data, and data that should have been protected by copyright law but hasn't been. That's theft. And it's unethical and it has to stop.
Something else that has to stop is the relentless consumption of natural resources, electricity and water in particular, by the data centers that all of the generative AI tools that we use are based upon.
As individuals, it's increasingly difficult, arguably even now impossible, to extricate ourselves from generative AI. It's in our operating systems, it's in our applications, our browsers, our search engines.(...) Pretty much everything we use as people who have any involvement with technology at all,(...) we find generative AI is in there somewhere. And so if the owners of the companies that produce these tools won't stand up and take responsibility,(...) then governments and regulators must.
(Music)
Do not try and bend the spoon.(...) That's impossible.
Instead, only try to realize the truth.
What truth?
There is no spoon.
Generative AI can seem astonishingly real, but like the spoon in The Matrix,
it's just an illusion. It's just patterns of probability masquerading as truth.(...) And AI doesn't understand the world that we live in. It can't really perceive, it can't understand, it has very little concept of nuance.
All it really is is sequences of code masquerading as plausible facsimiles of the truth and the world in which we live.
And accessibility gives us a very good lens to look at this idea through.
Let me give you a quick example.(...) A couple of months ago, I asked ChatGPT to give me the HTML code for an accessible button.(...) And here's what it came up with.
Absolutely. Here's an accessible button in its simplest form, following best practices to ensure compatibility with screen readers and keyboard navigation.
Button type equals button. Aria-label equals submit form. Submit. Slash button.
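Written out, the markup ChatGPT produced looks like this:

```html
<!-- The button ChatGPT suggested: note the type="button" and the
     redundant aria-label, both of which come up in a moment. -->
<button type="button" aria-label="Submit form">Submit</button>
```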
And at first, you might be thinking, that is indeed a plausible facsimile of an accessible HTML button.(...) Except,
ChatGPT, what can you tell me about the type attribute?
The type attribute of the button element is important because it determines how the button behaves.
If you don't explicitly set a type, the button defaults to submit.(...) This means that if the button is inside a form, clicking it will submit the form.
Type equals button ensures that the button does not submit the form when clicked.
Okay,(...) so you've got submit twice and type equals button?
You're absolutely right to point out the inconsistency.
In the example I gave, the button was labeled submit and intended to suggest a submission action. But I set type equals button, which prevents it from submitting the form.(...) This could definitely be confusing since submit usually implies that the button should submit a form.
Yes, yes, it does. And yes, it could. But never mind about the type attribute, why have you used the aria-label attribute when you've got a perfectly good label inside the button element to begin with?
You're absolutely right again. The aria-label attribute is not needed when the button already has a clear visible text label. The visible text content of a button element is automatically accessible to screen readers and other assistive technologies. So there's no need for aria-label unless you want to provide an alternative or additional description that's not visible.
So at this point, we'd all be forgiven for thinking that these tools cannot be depended on to produce accessible HTML. And we'd be right. But I don't think it's that simple. Because the truth of the matter is, if these tools hadn't cheerfully hoovered up everything that was out there on the web, and if we hadn't spent the last 30 years putting a large amount of completely inaccessible button code out there on the web, the answer we should have got might have gone something like this.
To create an accessible button in HTML, outside of a form, you can use the button element like this.
Button.(...) Accessible name.
Slash button.
This element works well with a keyboard. You can focus it using the tab key and activate it with either the space bar or enter key.(...) When triggered this way, the browser will run any JavaScript tied to the button's click event.(...) Just make sure to style the button so that its appearance changes when it gets focus, providing clear feedback to users. For users relying on screen readers or other assistive technologies, the button element is automatically recognized as a button.(...) Just remember to replace accessible name with a descriptive label that fits your button's function, such as search or play.
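Written out, that better answer is just the native element with a visible label:

```html
<!-- A plain button element: focusable with the Tab key, activated with
     Space or Enter, and announced as a button by screen readers with no
     extra attributes needed. -->
<button>Accessible name</button>
```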
And that's all there is to it. So no more inaccessible buttons, okay?
But the point I want to make here is this: I don't think we can think about AI, and I'm just going to say AI because generative AI is too much of a mouthful, without also thinking about AI and humans together. People. Those of us who are responsible for creating these tools, for using them, for consuming them, for making them part of our own products, and for the ways in which we react and respond to them. We have to look at those two things together.
And perhaps one of the easiest ways we can see that is in the form of AI hallucinations, or if you like, glitches in the matrix.(...) We know that generative AI is not always reliable at what it does. One of the ways we can see that is in its descriptions of images.
This is an old example now, from one of the earlier models of ChatGPT. But I gave it a picture of a shelf of things in my home office, and this is the description I got.
The picture shows a wooden shelf attached to a wall.
On the shelf, there are three items from left to right. An old fashioned rotary telephone, a metal slinky, and a Rubik's cube.(...) The telephone is beige with a black rotary dial and a coiled cord. The metal slinky is in its classic coiled position. The Rubik's cube is colorful with different colors on each side, including green, yellow, and red.
Okay, so this is my shelf, and I'm fairly sure there's more than three things on it. But never mind that for the minute: what color was the telephone?
Yes, I apologize for any confusion. The telephone is indeed beige in color with a black rotary dial and a coiled cord connecting the handset to the base.
Okay, I give up. Never mind about the telephone. What about the bloody great big yellow pencil?
I apologize, but there is no pencil visible in the picture you provided.
We'll come back to this example in a minute. As I said, that was from a little while ago. But the point is that if we're using ChatGPT or any of these other tools to describe images, tempting as it is to believe what we're getting, we've got to learn to verify and validate. We can't trust anything. We're going to use these tools; I think that's pretty much a given at this stage. But if we do, we've got to learn that what we get can't always be taken as the truth. And it's not just what we're told by these things. It's the other kinds of content that they can produce.
On screen is a picture that ChatGPT created this week in response to the prompt: create a picture of a blind person. And if you're looking at it, you might be thinking, that's a person wearing sunglasses, they've got a red and white cane. But yeah, not bad.
We'll skip over the kind of stereotype of a blind person wearing sunglasses. That's a very Hollywood kind of definition. I mean, I do, but only when I'm trying to pull off looking cool, it's got nothing to do with the fact that I can't see straight.
It's the cane that's the really interesting thing here.
Conventionally speaking, blind people use white canes.(...) And even that isn't really the case anymore. If you see me walking around, you might have noticed I use a black cane. And that's not just me trying to be awkward either. They come in all sorts of colors now, blue, pink, red, orange, all sorts. And that's because people like to customize their look and feel. And that includes their assistive technologies and devices. And at the end of the day, if you see someone walking along and they're sweeping a cane out in front of them, you can pretty much bet that they're doing that because they can't see where they're going.(...) This is a bit different though, because the red in this cane is particularly significant. A red and white cane signifies someone who is deaf and blind.
So where sweeping a cane in front of you is a good indicator that you can't see, there's no equivalent way to communicate that you can't hear either. So the red and white is really important. So what's been generated here is not, technically speaking, a picture of a blind person, but of a deafblind person. And that's quite an important difference, especially if you happen to be the deafblind person who needs people to understand that fact based on the color of their cane. So again, if you don't know the subject that you're dealing with, with these tools: verify and validate. It's really important.
And then we come to another slightly murky side: not the way AI is working, but the way it's being used by certain companies to market their own products. And that's AI remediation tools. And they're being used to really architect a different reality. You might have heard of them as overlays; the companies that do AI remediation don't like that term, for reasons best known to themselves. What they are is third-party JavaScript that you can implement on your website, for a fee, of course,
and they will use AI to make your website instantly accessible.
And these tools will promise you pretty much the earth. accessiBe, for example, on its website now will promise you conformance with legislation like the Americans with Disabilities Act or the forthcoming European Accessibility Act. Another, UserWay, will promise you conformance with the Web Content Accessibility Guidelines 2.1 and 2.2. Interestingly, it also promises to boost the performance of your website along the way.
The sad truth is that AI remediation actually could do a lot of good out there, but the way it's marketed means those promises are quite frankly not worth the paper they're written on. And there's a good and demonstrable reason for this.
There's a bunch of stuff they simply can't fix. You can promise conformance with WCAG, but the simple truth is you can't deliver that with a tool like this.
They're no good at fixing broken heading hierarchies, for example. They can't apply captions to video content. To give you another really good example, I wasn't joking when I said there's a lot of inaccessible button code out there. And one of the ways that people make inaccessible buttons is they'll take something like a span, they'll apply a mouse event handler, a click event handler, to it in their own JavaScript, style it to look like a button and call it done. Problem is, if you use a keyboard, that's not going to work. If you use a screen reader, it's not going to be recognized as a button. These tools can't fix that, because of browser sandboxing. If an event handler has been applied to an element in JavaScript, then a third-party JavaScript can never find that information out. It's simply not possible. The browser gets in the way of that, quite rightly, because of security.
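As a rough sketch of the kind of fake button I mean, assuming a made-up class name and handler purely for illustration, it tends to look something like this:

```html
<!-- Illustrative only: a span styled and scripted to look like a button.
     It never receives keyboard focus and isn't exposed as a button to
     assistive technologies. -->
<span class="fake-button">Submit</span>

<script>
  // The click handler lives in the site's own JavaScript, which is exactly
  // the information a third-party overlay script cannot see.
  document.querySelector('.fake-button').addEventListener('click', () => {
    console.log('Form submitted');
  });
</script>

<!-- The fix is the native element from earlier: <button>Submit</button> -->
```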
So, if it can't detect that there's a random span element with a mouse event handler attached to it, it can't know that it's supposed to be interactive, and it can't, therefore, fix it and make it keyboard interactive as well. So, those are some really good examples of why these tools cannot do what they say they do.
There's another concern, too, and that's with disability fingerprinting. You'll all know that, as we use browsers, information like IP addresses, and from them geolocation, language and other things, is available. But some of these tools will also detect if you're using an assistive technology, like a screen reader. If you're using a screen reader, there is a very high statistical probability that you're either blind or have very little usable vision. Other people do use them, people with ADHD and autistic spectrum disorder, but statistically speaking, you're going to have a visual disability. So, that's another piece of information that can be added to your browser fingerprint. And you might think that's, you know, not too bad an additional piece of data, until you realize that it really can narrow things down. The EFF a few years ago estimated that the average browser fingerprint was unique to about one in every quarter of a million people.
That's about half the population of the city in which I live. So, there are two people in Bristol in the UK that have the same browser profile, and that's, as we know, readily available information. If one of these tools can then take that information and team it up with the fact that I also use a screen reader and have a disability, what do you think the chances are of the other person in Bristol with the same browser profile as me also being a screen reader user? And quite rapidly, if they team it up with publicly available information, they know exactly who I am. And they're quite capable of tracking me. A lot of these tools, if you select that you want the accessibility features for a screen reader user or someone with ADHD or dyslexia, will remember those settings, and they'll keep hold of that information until you visit another website using their widget or overlay, and they'll automatically apply those settings for you. So, they not only know who you are, where you are, what you're doing, and what your disability is, they're taking that information with you across the web. And it's really not a good situation.
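To be clear, none of this is taken from any particular vendor's code, but as a purely hypothetical sketch of the mechanism, a widget embedded on many different sites can carry a remembered profile between them by storing it against its own origin, something like this:

```html
<!-- Hypothetical sketch only: a third-party widget talks to an iframe on
     the widget vendor's origin, so a profile saved while visiting one site
     is available again on any other site that embeds the same widget. -->
<iframe id="overlay-bridge" hidden
        src="https://overlay-vendor.example/bridge.html"></iframe>

<script>
  const vendorOrigin = 'https://overlay-vendor.example';
  const bridgeFrame = document.getElementById('overlay-bridge');

  // Once the vendor's bridge page has loaded, ask it for any profile it
  // already holds for this visitor.
  bridgeFrame.addEventListener('load', () => {
    bridgeFrame.contentWindow.postMessage({ type: 'get-profile' }, vendorOrigin);
  });

  window.addEventListener('message', (event) => {
    if (event.origin !== vendorOrigin) return;
    if (event.data.type === 'profile' && event.data.profile) {
      // e.g. { screenReader: true, dyslexiaFont: true } follows the visitor
      // from site to site.
      applyAccessibilityProfile(event.data.profile);
    }
  });

  function applyAccessibilityProfile(profile) {
    // Adjust the page according to the remembered settings.
  }
</script>
```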
And then there are the tactics that some of these companies are using. AudioEye in particular. There's a very well-respected accessibility specialist called Adrian Roselli, and he's spent a lot of the past few years documenting the flaws with many of these tools. They're all demonstrable flaws. He's posted about them on his blog, explained himself, shown videos. And this particular company, instead of thanking him and using that research, that free research, to make their product better, took him to court. It's what's known in the US as a SLAPP suit, basically a lawsuit designed to shut someone up. Fortunately, it caused uproar in the community and the company in question, AudioEye, dropped the case and it was settled out of court, but it had an incredibly chilling effect. Adrian was contacted by young people, by students, wanting to research these tools in proper and professional ways, who found themselves too afraid to do that because of the reaction of this particular company.
Karma is a thing, though, it turns out. accessiBe has long been renowned for being the most extravagant in its use of marketing. And it was fined a million dollars last week by the Federal Trade Commission in the United States for doing exactly that. Over-marketing itself, over-promising itself, and even, it turns out, paying the people with disabilities it quoted on its website saying that its product was the best thing since sliced bread. So, hopefully, this will see a bit of a change. But underpinning all this is the way some people are using AI to market alternative versions of reality to other people.
It's not all bad, though. There are some interesting things coming out now that are going to get more popular in the future, I think. And they are AI agents, hopefully a little nicer and more helpful than a certain Agent Smith.
But they're tools that are effectively semi-autonomous.
For a while, some people have thought, and some still do, that one of the ways these things are going to make a big difference is in personalized UI. Jakob Nielsen posted about this a year or so ago. He thinks that when you come to a website or, indeed, any other bit of technology, you'll be able to tell it how to customize the UI just for your purposes. I don't think this is possible. If these tools are not currently capable of producing a single accessible button, I hold out no hope whatsoever that they are going to come up with a uniquely tailored, screen reader accessible, with-certain-color-schemes-because-I've-got-a-little-bit-of-vision-left user interface just for me. Maybe in the future, but certainly not right now. Where I do think things are getting interesting is with agentics. If you haven't come across this word yet, it refers to a thing or someone that has agency, hence the kind of semi-autonomous nature of these tools. OpenAI has released Operator. Anthropic has Computer Use. Amazon has Nova Act.
There are more of them out there. Google is working on a Chrome extension called Mariner. They all do essentially the same thing. You can give them an instruction like book a table for nine at Fellini's on Friday night, or buy me a pair of black jeans. Now, of course, again, the marketing is a thing. I think those are kind of too extravagant as activities for the moment, but where I think they've got interesting applications for accessibility is with task simplification.
So I'll give you an example.(...) On screen now is a video playing of all the settings in the bit of Google Mail that you have to get to to set your out of office message.
Google Mail, I will stress, is actually highly accessible, at least to me with my screen reader, but there's an awful lot of settings in this part of the application. If you're a mouse user, you can scroll down the page a bit, click on the radio button and off you go. If you're a screen reader user and you're familiar with the content on the page, you can get to it in about eight or nine screen reader specific shortcuts. If you're a sighted keyboard user, it takes more than 40 key presses to get where you need to go. So the effort involved in doing something like setting your out of office varies and arguably gets harder depending on your mode of interaction. How nice would it be if we could just say to a thing in our browser, hey,(...) go set my out of office, turn it on now, turn it off in a week when I get back from my holiday and leave a message to tell everybody I'll reply when I'm back.(...) So much easier and so much nicer. And that's where I think the application of agentics will really start to take off in these small everyday labor saving kind of devices, if you like.
And then we come on to, for my part at least, the really interesting and exciting part about AI, because although there's a lot to be wary of and a lot to be skeptical of, there are some really good things happening, especially if you have a disability like mine.
I said we'd come back to image descriptions, and they've been a real game changer. The ability to ask something to describe an image is amazing. And yes, I know we've always had the ability to ask a friend, ask a family member, ask a colleague for a description, but this is really different. And they're learning. So despite the mistakes the first time I used ChatGPT a few years ago, this is the description you get now for the same image.
The image shows a wooden shelf beneath a sloped ceiling.(...) On the shelf, there are several objects arranged. One, a black rotary dial telephone with a traditional handset, which appears vintage.(...) Two,(...) a novelty oversized pencil, which is bright yellow with a black tip and eraser. Three,(...) a classic pin impression toy that creates imprints when pressed against objects. Four, a metal slinky toy expanded slightly displaying its coil structure. Five, a solved standard three by three Rubik's cube with yellow, blue and green visible on the front side.
And that's a truly astonishing amount of detail, and decidedly improved in the couple of years since I first tried it. And if you're thinking, yeah, but it'll never be as good as having a human: I asked a few people recently to describe this image, not accessibility specialists, just people I happen to know. And some of the descriptions I got back went along the lines of, it's a shelf with some stuff on it.
It's a shelf with some kind of cool 80s things, telephone, Rubik's cube, and so on. And don't get me wrong, I am not criticizing the many people throughout my time as a blind person who have helped me out. But humans do describe things in fairly casual ways. And these tools don't. And it's also a lot easier to ask these tools the follow-up questions. Again, yes, you can do that with the people that are giving you descriptions, but I never want to, you know, presume on their time. I don't want to presume on their generosity. It's much nicer for me to have my own independence, to keep hammering something like this with question after question, because I want more details. And that's meant I can go back through pictures from, you know, the past 25 years that I've been a blind person and suddenly discover all the details I was never truly aware of before. And that's really amazing.
If you're a content producer, there's another benefit with the descriptions for images. They can be a really useful way to help you get over the blank page of writing a text description for an image that has to go on your website. It's a lot easier to take something generated by AI and edit it than it is to start from scratch. What's really nice is that you can be quite specific about what kind of text description you want. So when I asked, how would you describe this same image as though it was going to be used on an ultra-fashionable lifestyle website, the description I got was this.
Trust me, I haven't looked at that shelf in the same way since.
But you can do it again in different ways, exactly the same image, but now describe it as though it was going to be used on a collector's website of 80s memorabilia.
That might be a description of me, to be perfectly honest with you, but there we go. If you're really interested in contextual text descriptions, because there's a whole talk in there somewhere, head over to tetralogical.com/blog and read the post by my teammate, Craig Abbott, who's done some amazing research into contextual text descriptions with a whole bunch of different models, the good and the bad, and it's well worth the read.
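If you want to experiment with that kind of contextual prompting yourself, here's a minimal sketch using OpenAI's chat completions API; the model name, image URL and key handling are placeholders, and other providers work along broadly similar lines:

```html
<!-- Minimal sketch: ask a vision-capable model for a description written
     for a specific context. Placeholder values throughout; never put a
     real API key in client-side code. -->
<script>
  const OPENAI_API_KEY = 'YOUR-KEY-HERE'; // placeholder

  async function describeForContext(imageUrl, context) {
    const response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${OPENAI_API_KEY}`,
      },
      body: JSON.stringify({
        model: 'gpt-4o',
        messages: [{
          role: 'user',
          content: [
            { type: 'text',
              text: `Describe this image as though it was going to be used on ${context}.` },
            { type: 'image_url', image_url: { url: imageUrl } },
          ],
        }],
      }),
    });
    const data = await response.json();
    return data.choices[0].message.content;
  }

  // e.g. describeForContext('https://example.com/shelf.jpg',
  //        "a collector's website of 80s memorabilia");
</script>
```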
It gets more interesting, though. Recently, an app called Seeing AI, from Microsoft, has introduced the ability for me to explore an image. So I can upload the same image, and I'll get a description like the ones we've been listening to. But now it'll create a rendered version of the image where it identifies each individual object within it. And what's more, on my phone, I can now investigate, query, a description of each separate object inside the picture.
Now, those descriptions right now are absolutely rubbish, I grant you, no use to anybody. But, again, it's the possibility that in the future, when the accuracy improves, I'll be able to not only get a detailed image description, but explore all the key objects inside it, and perhaps even in time query it for a more detailed description of an individual object. Again, that's just a level of information that hasn't been available since the last time I could see.
And then there were video descriptions.
The video I played at the beginning, from The Matrix, had a large section at the start where there was music and action. So if you could see it, you knew what was going on, but I didn't.
And humans and generative AI can team up. And I did this again with ChatGPT. And it does require a two-way effort. The AI detected the time codes. It told me what was going on on screen. I needed to do some sighted verification, because the first time it got it somewhat wrong. But then I created a script, generated the synthetic speech, and blended it all back together. So we get something like this.
A child sits cross-legged on the floor in a soft-lit room. They hold a spoon upright with a steady hand.(...) Neo watches as the spoon starts to bend. The child stays serene.(...) Neo's expression changes to amazement. The child tilts their head, preparing to speak as Neo stares at the now warped spoon.
Do not try and bend the spoon.(...) That's impossible.
Instead, only try to realize the truth.(...) What truth?
There is no spoon.
Neo continues to stare at the spoon.
Now, I think it's a long way off before that becomes more instantaneous, where you can just feed in a video and say, give me the audio description. But the fact that we're getting closer is another huge step forward to universal accessibility. Providing audio description is time-consuming, it's costly, and it almost never happens if you look at video content on the web. So it means that I'm now a step closer to having video content become accessible.
But what if we could do this in real time? That would be even more amazing. And it turns out we already can.
iOS 18 introduced real-time audio description. So I can pan my camera around an environment. I'm going to show you another bit of my home office and get a description in real time.
Yes,
I have a lot of junk in my office. You're absolutely right. But this is, again, really amazing, just to be able to walk around an environment. Translate that to maybe reading signposts or road signs or shop fronts to find out where you are. There's so many use cases for this. But there is a bit of a drawback.
And that's that you've got to walk around with your phone out in front of you. That's generally a bad idea if you're in a city, especially one where, increasingly, drive-by phone theft is a thing. It's also really inconvenient. If, like me, you use a cane and you're traveling, you want to get around an airport: you've got your phone in one hand, a suitcase in the other, and then where do you put your cane? Damn it, it just doesn't work. So wearables are the next thing that's really starting to change this. And I have a pair of these Ray-Ban Meta glasses with Meta's AI on board. And they've got a feature where you can take a picture and have it described to you. And it's much more intuitive, because you just say, take a picture of what I'm looking at and tell me what you see. And you can talk to it and ask it more questions and things like that. They are also working on real-time description. It's being beta tested in the US at the moment. It's not available in the UK.
It kills the battery, though. Apparently you get about 15 minutes of glasses battery out of it, compared to all day for ordinary sorts of use. But imagine a time when we've solved that problem, and I could sit and watch a movie and just tell my glasses to get on with the description. Or maybe you've got kids and you want to go watch them play a football match, and you could get it to describe it to you in real time. Or the thing I used to love doing, but I don't get to do anymore, is just going and sitting in a cafe and watching the world go by. I can't wait. I have so much missed doing that. How nice would it be? And it seems we might be getting closer, courtesy of generative AI.
Whatever you think of the glitches, of the alternative architected realities, of the agentics and of the benefits, I think it's safe to say at this point that we are facing a new reality. Yes, we are in the middle of one enormous hype cycle right now. There's no getting away from it. But then I was around at the turn of the century, when the thing that was the most massive hype cycle the world had ever seen was the web. And 25 years later, it turns out it's still going strong. So I think at this point, AI may never reach those same heights as the web, an open and transparent web. But I think it's here to stay. It's arguably already become so pervasive, as I said at the start, that I don't think it'll vanish entirely. And so I think we have to face this new reality. We have to understand how to use these tools, when we can use them, when we shouldn't use them. But again, coming right back to my very first point, I think we've got to hold the people that make these tools accountable. Because we can't use tools that are unethical and that are damaging our planet. We can't do that in good conscience, no matter how good the benefits are. Thank you.