#btconf Berlin, Germany 13 - 16 Nov 2019

Aaron Gustafson

As would be expected from a former manager of the Web Standards Project, Aaron Gustafson is passionate about web standards and accessibility. He’s been working on the Web for over two decades now and is a web standards advocate at Microsoft. In addition to working closely with the Edge team, Aaron works with partners on Progressive Web Apps, with a focus on cross-platform compatibility. He penned the seminal book on progressive enhancement, Adaptive Web Design, and has been known to have some opinions, many of which you can read at aaron-gustafson.com.


Conversational Semantics

This would branch off from my Designing the Conversation talk that I’ve given off and on for the past few years. The focus is understanding semantic HTML and the framing is the importance of this stuff in the world of headless UIs (like Alexa, Cortana, etc.). I plan to cover semantic meaning and how that translates into something that is meaningful for voice assistants and bots, taking advantage of native browser features to help users fill in forms more quickly via voice, and could even talk a bit about using JavaScript for speech recognition and synthesis if that’s of interest.



Aaron Gustafson: All right, well, thank you for bearing with me as I keep you from lunch. As Marc mentioned, my name is Aaron Gustafson. My pronouns are he, him, his. I work as a Web standards and accessibility advocate and sometimes PWA advocate at Microsoft.

Today, I’m here to talk to you, actually, about the lower level stuff that Jeremy was just talking about, about HTML, about semantics, and fun things like that. You’ll have to bear with me a little bit through this talk because there are quite a few slides where there is nothing on the screen. There is nothing wrong. Don’t worry about it. There’ll be a lot of listening to me talk, so just setting you up for that ahead of time.

Right now, kind of as we look at what’s going on in tech, a lot of folks are starting to pay attention to what’s going on with chatbots, digital assistants, and all of these new technologies that are starting to enable us to have more human interactions with content and services. As we look at these different technologies, most of them are built using specifically coded data sets and specific models, APIs, and stuff like that.

I look at that and I’m kind of sad, mainly because there are 4.5 billion Web pages, the last time I checked online, with all of this content trapped in them that’s unable to go anywhere. This could be articles, stories, blog posts, educational materials, books - as Jeremy mentioned, marketing messages. All of this on the Web but, for the most part, an untapped treasure-trove of content that could be used in a useful, nonvisual context as well.

Now, there have been numerous projects--search engine spiders are probably the most common one that most of you are familiar with--that have looked to turn this kind of unstructured mess of data into something that’s usable. We can do a lot more to improve the utility of the content that we create in order to make it more usable, both by real people as well as by computers that power voice-based user interfaces or, as I sometimes refer to them, headless UIs.

That’s what I’m here to talk to you about because I want to release all of that content from the screen and I want to empower it to go anywhere and everywhere. I want it to find its way into virtual assistants, into voice response technologies, even voiceless chatbots, and I want it to do that without requiring us to code, recode, and use all of these proprietary APIs and stuff like that over and over again, introducing all of this redundancy.

I love the idea of the “publish once, distribute everywhere” model of the Web and I want to bring that dream to reality when it comes to these virtual assistants. I want to enable users to interact with forms, widgets, and stuff like that only using their voice. I think it’s possible and I think we’re almost there.

Mainly, what I’m here to talk to you about today is how to use HTML and how to use ARIA to make our content more structured, more sensible and, more importantly, more meaningful so that it can do all of this awesome stuff. It begins with the humble element.

Let’s consider the example I have up on the screen here. I have an emphasis element. When it comes to visual rendering, it gets rendered as italics but it also is intended to actually add emphasis to the content that’s within that element. HTML is chockfull of these semantic elements that convey meaning and nuance and relationships. Using these and being aware of these semantic options that are available to us--as Jeremy mentioned earlier, there are over 100 elements that are available to us as of HTML 5--they allow us to author more expressive documents. Ignoring that kind of plethora of options that we have in HTML can actually undermine the usability of the content that we author.

When we create webpages, we need to be mindful of the conversation that we’re creating with our customers in the process. We need to choose elements with thought and with care.

One of the best indicators for how HTML will make it into the world of virtual assistants is to actually look at another assistive technology because assistance, assistive, it’s all related. That’s screen readers. Not only do screen readers take the content that is on the page and verbalize that, doing what their name implies, but they also enable users to rapidly navigate pages based on different means. They provide a means of translating visual design constructs into audible cues, things like proximity, proportion, and such. At least, they do so when documents are authored thoughtfully.

Without further ado, let’s jump in and look at some solid examples of how we can create more meaningful documents in this sort of headless UI context. We’re going to start with what are called phrasing elements.

Those of you who have been on the Web for a long time like Jeremy and I have will probably remember these as inline elements. We used to call them that because they were rendered inline, but the reality is that they are for phrases, words, and such. Phrasing elements are actually much more accurate.

This emphasis element is an example that we saw earlier that’s a phrasing element. Here the word “really” is marked as emphasized. When it’s spoken, it could be said like, “I’m really happy to see you.” I’ve emphasized that a little bit.

Sometimes emphasis isn’t enough when we want to indicate the content is very important to the context or to our customers - very important for people to pay attention to. We can use the strong element, which is for something that’s of strong importance like “all fields are required.” That’s pretty strong, right? That’s a really important thing for somebody to pay attention to when they’re filling in a form.
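
The slides are not reproduced in this transcript. As a minimal sketch, markup matching the two examples just described might look like this (the sentences come from the talk; the surrounding paragraph tags are an assumption):

```html
<!-- Emphasis: shifts the stress of the sentence when it is spoken -->
<p>I’m <em>really</em> happy to see you.</p>

<!-- Strong importance: something the reader should not miss -->
<p><strong>All fields are required.</strong></p>
```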

Now, visibly, em and strong are displayed as italics and bold, respectively. In the early days of the Web, we had the I element and the B element that were rendered exactly as these two. Those of you who maybe used some of the early WYSIWYG software for building out webpages may remember that Dreamweaver used B and I, and FrontPage...

Anyone remember FrontPage? Yeah. Yeah, a couple of people. That actually used em and strong, which was actually more semantically appropriate. Who knew?

We thought that they were interchangeable because they looked the same. With B and I being shorter, they kind of proliferated on the Web. Semantically, however, these elements are quite different to their doppelgangers.

The I element is similar to emphasis, but it’s more generic. It’s to indicate an alternate voice or mood. It could be used to indicate sarcasm or idiomatic remarks or shifts in language. Personally, I’d rather us have an explicit sarcasm tag. If you’re up for that, let’s submit that to the Web we want.

Here we have, “It’s a terrible movie and it made $200 million. Go figure.” Or, “She’s admired for her energy and joie de vivre.” In this case, it’s indicating a shift in language and I’m indicating that the language within the I is French. This could actually indicate to a voice synthesizer, “Hey, you need to shift your pronunciation to a more French appropriate pronunciation when you move from English to French.”
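
A sketch of how those two sentences could be marked up. Which words sit inside the I element in the first sentence is a guess; the lang attribute in the second is what lets a speech synthesizer know the phrase is French:

```html
<!-- Alternate voice or mood: the aside is read as a shift in tone -->
<p>It’s a terrible movie and it made $200 million. <i>Go figure.</i></p>

<!-- Shift in language: lang="fr" marks the phrase as French -->
<p>She’s admired for her energy and <i lang="fr">joie de vivre</i>.</p>
```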

Now, the B element is used for content that should be--in the language of the spec--“stylistically offset” from the content around it, but it’s of no greater importance. Keywords would be an option. Sometimes people use it for people’s names or product names within the context of the content that they write. Don’t use it for books or film titles or anything like that. There’s the cite element that’s for that. Functionally, the B is a lot like a span, except it defaults to being bold and it’s a slightly shorter tag, but it’s somewhat of a generic phrasing element.
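
To make the distinction concrete, here is a small sketch. The product name is invented for illustration; the book title is the one mentioned in the speaker’s bio:

```html
<!-- b: offset from the surrounding text, but of no extra importance -->
<p>The <b>Example Widget 3000</b> goes on sale next month.</p>

<!-- cite: the title of a work such as a book or film -->
<p>I just finished reading <cite>Adaptive Web Design</cite>.</p>
```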

There are a bunch of other constructs within HTML. I’m not going to get into all of them. ABBR is great for abbreviations and acronyms. You can include the expansion using ARIA label. Title, which used to be the way that you were supposed to put the expansions in there, is not reliable, so don’t use that, but you can use ARIA label if you want to have the expansion or, better yet, write it into your content and then have ABBR off to the side, maybe in a definition element, which is another great semantic element that you can use to show the defining instance of a term.

Yeah, most screen readers do not pay attention to title, so ARIA label is how you would get it written in there and that’s part of the ARIA spec, which stands for Accessible Rich Internet Applications. I’ll talk about that a little bit more in a few minutes.
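
One way to follow the “write it into your content” advice is sketched below. The wording of the definition sentence is an assumption; the point is that the expansion lives in the text itself, so nothing depends on a title attribute:

```html
<!-- Defining instance: the full term is wrapped in dfn, the short form in abbr -->
<p>
  <dfn>Accessible Rich Internet Applications</dfn> (<abbr>ARIA</abbr>) is a
  specification for describing roles and states to assistive technology.
</p>

<!-- Later uses can rely on the abbreviation alone -->
<p>Landmarks first appeared in <abbr>ARIA</abbr>.</p>
```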

I’m not going to go into all of the other phrasing elements, but I will mention span, which I mentioned before. It’s just not very interesting. It’s just generic content. There is nothing particularly interesting in the sentence; it’s just a span.

Okay, so there’s the special kind of phrasing element, which kind of is a hybrid now. The link can actually wrap around what we used to call block-level elements too. It’s a little bit complicated now, but I still think of links--because I’m old--as being phrasing elements. I want to call them out specifically because there’s a lot that we can do to make links really interesting and kind of fine-tune how our users interact with them.

The primary way we use links is to connect related content. It’s important to choose meaningful words and phrases because generic link text like “click here to learn more” is not terribly useful, especially when there are 17 of them on the page. Right? There are a lot of ways that enable people who are using headless UIs, whether they are some sort of voice response thing or whether they’re screen readers, to navigate from link to link. If all they keep hearing is “click here to learn more” and there’s no context around what that content is that they’re learning more about, those links are useless.

Here we have a ramp embedded in the staircase of Robson Square in Vancouver, British Columbia. Here the link is to Robson Square, to an article on Wikipedia about that.
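
As a sketch, that sentence could be marked up as follows. The exact Wikipedia URL is an assumption; what matters is that the link text names the destination:

```html
<!-- The link text says where the link goes, even when heard out of context -->
<p>
  A ramp embedded in the staircase of
  <a href="https://en.wikipedia.org/wiki/Robson_Square">Robson Square</a>
  in Vancouver, British Columbia.
</p>
```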

Now, we can also use links to reference content within the current document or at some specifically identified position within another document using anchors. Here we have an anchor to a figure. This is actually, I think, from my book Adaptive Web Design, the Web version of that where you can click and link directly to the figure. Any references that we make to referenceable content, charts, tables, that sort of thing can all be marked up in a figure element, and I love to give figure elements IDs because they’re supposed to be referenced. Then I can reference them from wherever. Even if I’m referencing it from another chapter in the book, which is a separate HTML page, it’ll take you to this chapter and jump you right to the appropriate figure.
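
A rough sketch of that pattern follows. The file names, ID, image, and caption are all hypothetical and are not taken from the book:

```html
<!-- In chapter-2.html (hypothetical): the figure carries an ID -->
<figure id="fig-2-1">
  <img src="fig-2-1.png" alt="A page outline showing nested headings">
  <figcaption>Figure 2.1: A sample document outline.</figcaption>
</figure>

<!-- From the same page: jump to the figure -->
<p>See <a href="#fig-2-1">Figure 2.1</a>.</p>

<!-- From another chapter: load the page, then jump to the figure -->
<p>See <a href="chapter-2.html#fig-2-1">Figure 2.1 in Chapter 2</a>.</p>
```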

Now, interestingly, there is a proposal which I’m super excited about that is going to allow us to link to arbitrary text, even if it doesn’t have an ID. This is something that’s being experimented with currently in the Chromium project, but I’m really, really hopeful that it becomes a standard so that we can actually start to link directly to quotes within the context of a paragraph, which would be super helpful for making references to published papers, blog posts, calling out people on the BS, all that sort of fun stuff.
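
For illustration only: the Chromium work uses a text directive in the URL fragment, written as `#:~:text=`. The page address below is a placeholder, and browsers that do not understand the directive simply load the page without scrolling to the quote:

```html
<!-- Links to the first occurrence of "distribute everywhere" on the target page.
     Spaces in the quoted text are percent-encoded as %20. -->
<a href="https://example.com/article.html#:~:text=distribute%20everywhere">
  Read the passage about “distribute everywhere”
</a>
```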

Now, you can go a step further and identify links as pointing to the permanent home of a given article, a post, or what have you. Here I’m using rel=bookmark to point to an article. It’s the full version of an article from a teaser. In this case, you can finish reading “The Web should just work for anyone in less than ten minutes,” and the link is rel=bookmark.
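
A sketch of such a teaser, assuming a hypothetical URL and teaser text:

```html
<article>
  <h2>The Web should just work for anyone</h2>
  <p>The opening paragraph of the article serves as the teaser.</p>
  <!-- rel="bookmark" identifies the permanent home of the full article -->
  <p>
    <a rel="bookmark" href="/notebook/the-web-should-just-work-for-anyone/">
      Finish reading “The Web should just work for anyone”
    </a>
    (less than ten minutes)
  </p>
</article>
```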

There are actually a bunch of these different rel values that are available to you. A lot are defined within the HTML spec. Then there are a bunch that were defined within the realm of what are called microformats, which is kind of a vernacular that we all created. When we were building webpages, we realized that we were creating and solving a lot of the same problems and these weren’t necessarily problems that the standards bodies were going to go fix. They were things that we could kind of do on our own by using specific classes or rel attribute values. They allow us to actually create some pretty neat things.

In this case, I’m linking to another site that I own, so you can read my bio. The rel is equal to me. This is telling you that the relationship of that document to the current document is that that’s me. Right?
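
In markup, that might look like the sketch below. The domain is the one given in the speaker’s bio; the path is an assumption:

```html
<!-- rel="me": the linked page represents the same person as this page -->
<p>You can <a rel="me" href="https://www.aaron-gustafson.com/about/">read my bio</a>.</p>
```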

You can even inform a user about the language on the page that you’re going to be linking to, so you can provide a little context about that. In this case, you can read this page in Spanish. I have the hreflang=es and you use the standard language codes for that.
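
A minimal sketch, with a hypothetical path for the Spanish version:

```html
<!-- hreflang describes the language of the page being linked to -->
<p>You can <a href="/es/" hreflang="es">read this page in Spanish</a>.</p>
```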

You can indicate the type of content being linked to using the MIME type, which is also pretty interesting. Then, in a relatively new addition to the spec, you can actually add the download attribute, which indicates to a browser that, “Hey, instead of navigating to this content, go ahead and download it,” rather than presenting it. That’s kind of a nice affordance and very simple to implement, simply by adding the download attribute to that link.

I will note that, for security reasons, cross-origin URIs will always be navigated to. But if it’s something that exists on your own domain, you can go ahead and create download links this way, which is pretty cool. No JavaScript involved.
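
Both attributes can be sketched on a same-origin link like this (the file path is hypothetical):

```html
<!-- type hints at the MIME type of the linked resource -->
<a href="/files/report.pdf" type="application/pdf">Annual report (PDF)</a>

<!-- download asks the browser to save the file instead of navigating to it;
     this only takes effect for same-origin URLs -->
<a href="/files/report.pdf" type="application/pdf" download>
  Download the annual report (PDF)
</a>
```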

Then anchor elements do support non-Web pseudo protocols. Two of the common examples are mailto and tel. There’s also sms, webcal, and things like that. Some operating systems and browsers will actually support them and enable apps to register these custom protocols. On your desktop, clicking a tel link, even though you’re not on a phone, might trigger Skype, for instance, to make that call.

One word of caution, though. Even though we can create these kinds of custom protocols, unrecognized ones may prompt the user to actually choose an application or the browser may just have no idea what to do with it and just kind of freak out, so use with caution.
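
The two common cases, sketched with a placeholder address and a fictional phone number:

```html
<!-- mailto: opens the user's email application with the address filled in -->
<a href="mailto:hello@example.com">Email hello@example.com</a>

<!-- tel: hands the number to whatever application handles calls -->
<a href="tel:+15555550123">Call +1 555 555 0123</a>
```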

All right, now, all of this phrasing content is great, but we spent a little time in the weeds and I want to pull back a little bit to talk about documents themselves. As you’re probably aware if you’ve interacted with Siri, the Google Assistant, Cortana, or what have you, headless UIs really do kind of place a greater cognitive load on you to remember what’s going on. It’s hard to keep track of where you are in an interface and it can be challenging to move around when you can’t gather information about the interface based on visual cues because you can’t see anything.

The more complex an interface is, the more challenging this becomes. The same is true in visual interfaces as well, which is why the mobile-first movement really encourages us to focus each page on a single task. That reduces the noise and raises the signal but most Web pages are the antithesis of clear and straightforward because, as our screen sizes enlarged, we just kept filling that space with more and more crap: sharing links, related content, cross-promotions. All these things kind of took away from the actual content that we were trying to put forth.

Thank God Forbes redesigned because this was just atrocious.

To combat this, screen readers provided a bunch of mechanisms to enable users to gather information about the current interface, to move efficiently through it, and just to kind of find the bit that’s relevant to them in the current moment. One of these mechanisms is browsing by heading. The various heading levels in HTML from H1 all the way down to H6 create a document outline, naturally, and assistive technologies enable users to navigate through that document outline in order to traverse the document and find the content that they’re looking for.

Since only the actual content of these heading elements is what’s read out, we really need to avoid having cutesy phrases and stuff like that and actually have some meaningful headings that tell us something about the content that’s going to follow each of them. Stick to summarization of what’s happening in that section.

Another mechanism to enable skimming interfaces involves moving the focus caret from one interactive element to another. Traditionally, that navigation is done using the tab key on the keyboard, kind of in the screen reader scenario, but it’s also possible using voice assistive technologies to say things like “next” and “previous.” If you’re in a scenario where maybe you’re sighted and you actually can see the screen, you can actually say the name of the link and the assistive tech will take you to that link and launch it, which is kind of interesting too. There are a lot of these hybrid scenarios too.

Now, in most documents, this means moving from link to link as we move the focus, or from form field to form field. This makes the content that we choose for our links even more important. As I mentioned before, we don’t want to have “click here,” “click here,” “click here” because that’s not usable.

Here, somebody would move from John Harsanyi to the “Veil of Ignorance.” Those are meaningful, telling you where that link is going to, what that content is about.

I mentioned forms. I’m going to come back to forms in a minute because they’re a whole other thing. They do accept focus and allow us to move from field to field.

You could also control the ability for a user to focus on a given element that may not be focusable in and of itself. By adding a tab index of zero to an element, all of a sudden that element appears in the tab order for the document.

You can also specify higher numbers than that, but then you’re starting to manipulate the tab order. Just by having tabindex=“0” it allows that element to stay where it is in terms of the tab order of the page but it can now take focus.

There’s also, interestingly, a tab index of negative one, which a lot of people don’t know about, but that doesn’t put it into the normal tab order of the page but enables you to move focus to an element using JavaScript, which is pretty cool, so it kind of gives you a little bit more power to control where users are moving if you’re using JavaScript.
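
The two tabindex values can be sketched as follows. The elements, IDs, and label are hypothetical, chosen only to show the behavior described above:

```html
<!-- tabindex="0": a normally unfocusable element joins the tab order
     at its existing position in the document -->
<div tabindex="0" role="region" aria-label="Sample output">
  <pre>A long block of output the user may need to scroll through.</pre>
</div>

<!-- tabindex="-1": skipped when tabbing, but script can move focus to it -->
<h2 id="results-heading" tabindex="-1">Search results</h2>

<script>
  // Move focus to the heading, for example once new results are on the page
  document.getElementById("results-heading").focus();
</script>
```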

Now, the final way that users traverse documents is via what are called landmarks. This was a concept that surfaced originally in the ARIA spec. The idea was that you could use the role attribute to indicate the function or the role that certain parts of the interface were playing on a given page. In this example, the author is giving a role of navigation to this particular div of navigational links. Rather than just having the ID of nav and assuming that to be enough, this actually provides the additional context of, this is a navigational element within the thing. Of course, a nav element would have been better, but we’ll get to that in a minute.
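
The slide is not part of this transcript, so the following is only a guess at its shape. The id value and the links are assumptions:

```html
<!-- The id means nothing to assistive technology; the role does -->
<div id="nav" role="navigation">
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/articles/">Articles</a></li>
    <li><a href="/about/">About</a></li>
  </ul>
</div>
```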

There are a bunch of different role values that are available that are considered landmarks, things like banner, navigation, search, main, complementary, and contentinfo. I’m going to walk through an example of a site and how it used it so that you can kind of understand that.

If you haven’t seen 24 WAYS, it’s kind of an advent calendar for Web nerds. They use landmarks all over the place. It’s a fairly interactive site but it’s also highly accessible. They have their primary header on the site, which has a role of banner. Interestingly, if you have a header element, the first header element that’s inside of a non-sectioning element, so like the body, automatically is granted the role of banner, so you don’t need to explicitly say it but you can.

There is the main element which, actually, the main element came after the role of main, so it also automatically inherits the role of main. Again, a little bit of redundancy there. There’s content about navigation, so the nav element in this case with the role of navigation just kind of covering all the bases making sure, even if we’ve got older assistive technology or an older browser that doesn’t understand HTML 5 but does understand ARIA, we can still make sure that information is exposed. Then the copyright information and such is exposed as content info, which is sort of the additional meta-information, copyright designations, and such.

Here’s how users can experience that. Hopefully, the audio is up.

[A voice assistant is speaking.]

That’s pretty cool, right? You can do some pretty neat things with that. You can start to see how this stuff could start to translate into voice-based interactions.

Now, imagine being able to say to your voice assistant, “Read me the top three headlines from NewYorkTimes.com,” and that voice assistant could look at the New York Times website, look for the main element, then pick out the heading levels, and then read you those headlines back. Then it would be able to know what each of those things are. Then it could take you to those articles if you read the title back to it, “Read me more about this.” All of that becomes possible when we’re using semantic HTML like this.

Landmarks also give users the opportunity to jump directly to a location within an interface. In a voice context, somebody could say, “Hey, read me the navigation for this page,” or maybe they can say, “Search for wooden baby toys,” if they’re on a shopping site or something like that. The voice assistant could immediately take them to the search landmark and fill in “wooden baby toys” into that field and hit “send.” Right? Then start reading back the results. All of that starts to become possible.

Now, it’s worth noting that many of these landmarks do have equivalent HTML tags and that’s because HTML 5 and ARIA were being kind of developed at the same time. They were trying to pave some of the same cow paths. As Jeremy talked about, you’ve got things that are kind of in fashion and then they’re moving lower and lower and being remembered by those lower layers. This was an example of those cow paths being remembered and coming into the collective memory as part of HTML through ARIA and through HTML 5.

Banner, as I mentioned, is the first header element not inside of some sort of sectioning element. Content info is the first footer in that same situation. Navigation is the nav element. Search does not have an equivalent. Main is main and complementary is the aside element.
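
Putting that mapping together, a page skeleton might look like the sketch below. The explicit roles repeat what the elements already imply, which is the belt-and-braces approach described for 24 WAYS; the content and the form action are placeholders:

```html
<body>
  <header role="banner">
    <p>Site name</p>
    <nav role="navigation">
      <a href="/">Home</a>
      <a href="/archive/">Archive</a>
    </nav>
    <!-- No element equivalent is named in the talk, so the role does the work -->
    <form role="search" action="/search">
      <label for="q">Search the site</label>
      <input type="search" id="q" name="q">
      <button type="submit">Search</button>
    </form>
  </header>

  <main role="main">
    <h1>Page title</h1>
    <p>Primary content.</p>
  </main>

  <aside role="complementary">
    <p>Related links.</p>
  </aside>

  <footer role="contentinfo">
    <p>Copyright information.</p>
  </footer>
</body>
```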

Now, one last element I want to talk about before I move on is the div. We often see the div element employed when the designer thinks they need to group some elements together or they really don’t like the default styles given to a given element like a button. I have a whole article on that.

I’m not going to get into that but there are so many better choices than div. Let’s just say that. There are all sorts of ways to organize our content. There are paragraphs. There are lists. There are description lists. There are figures, figcaptions. There are articles, sections, headers, footers, main - all of these semantic elements; a ton of meaningful stuff out there that we can use to do more for our users, for our customers, and to make them more powerful for our voice assistants and such as well.

We don’t get any of those awesome benefits like being able to ask for the headings of the New York Times or headlines for the New York Times if we don’t actually use semantics in our documents. If everything on the New York Times homepage was divs, it’s gibberish. It’s nothing. The div gets you none of those benefits.

HTML has a ton to offer in terms of enabling our interfaces to operate more effectively in the world of headless UIs. Beyond just content, though, it has the capacity to streamline interactivity as well, and so I want to talk a bit about forms.

Anyone who has followed my work over the years probably knows that I have spent an inordinate amount of time thinking about forms. I’ve had a number of talks about them over the years and written quite a bit. They’re a bit of a necessary evil. They often have a really poor user experience. They’re often poorly planned. But, thankfully, HTML is there to actually have our backs on forms and can make forms so much better.

I want to start with something simpler before we get into the actual form fields themselves. I want to talk about labels.

In their Alexa Skills Kit, Amazon actually recommends that we make it clear that our users need to respond. Maybe instead of using labels like “first name,” what if we could be more conversational?

Voice Assistant: What’s your first name?

Aaron Gustafson: Consistency is important too, so if you do go the route of starting to address your users as “you,” make sure you do that consistently and you don’t later have “my profile” because then it starts to get weird, right? Confusing people; is it you or is it me? I don’t know.

It’s also important to write error messages that are more conversational and clear about what we’re asking for and why.

Voice Assistant: Without your first name, I won’t know how to address you. Could you please provide it?

Aaron Gustafson: That’s a little bit nicer, right?

Finally, if we’re thoughtful about the sort of button text that we’re writing, we can really drive home what a user is doing. How many of you want to submit to Facebook? Okay. Yeah.

Voice Assistant: “Reserve your spot” button.

Aaron Gustafson: Right? If we have more active button text, we can actually provide a little bit better conversational interface for our users.

Now, I mentioned the importance of these labels a bunch, but let’s look at how to properly label our fields and I’ll start with the simplest example possible: What’s your first name?

Now, some of you in the audience--you don’t have to raise your hands or admit it or anything like that--might look at this and think, “Oh, that’s fine. There’s text there and there’s a field. That’s what I need,” right? But there’s nothing that’s actually associating that text with the field programmatically. Visually, if you were to put these things next to each other, it’s all going to render in one line because all this stuff is inline, but there’s nothing to tell a voice assistant that this is related. Okay?

To create that association, we need to use the explicit label element. Now, this is not the end-all, be-all. This is not how you should do it. I’m going to kind of evolve this. We have two ways of associating this label with the field.

The first way is by explicitly associating them. You use a for attribute and that for attribute is an ID reference to the field. Here you can see they both have matching text of first_name for the for attribute and the ID value of the field itself. We can also use implicit association wherein the label actually wraps the field and the text. You could even mix them up and put them together. There’s nothing wrong with that.
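
The three variations discussed so far can be sketched side by side. They are alternatives, not one form; the first_name value comes from the talk and the name attributes are assumptions:

```html
<!-- 1. No association: the text and the field merely sit next to each other -->
What’s your first name? <input name="first_name">

<!-- 2. Explicit association: for matches the field's id -->
<label for="first_name">What’s your first name?</label>
<input id="first_name" name="first_name">

<!-- 3. Implicit association: the label wraps both the text and the field -->
<label>
  What’s your first name?
  <input name="first_name">
</label>
```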

In most cases, this is what you want to reach for. Actually, just to support really old browsers and to be able to style radio controls and checkboxes a little bit differently from your everyday input, I would use the implicit association, wrapping the label around the field and the text, for radio and checkbox controls, and the explicit approach for everything else. That way I could address radio and checkbox separately because, at that time, we were still dealing with IE6, which didn’t have attribute selectors in CSS. I could actually say, “Okay, any input that is a child or a descendant, rather, of the label, I know that’s a radio or a checkbox because of how I write my HTML.” It made it a little bit simpler to style them differently. I could just say, “Input is 80% wide,” and it wouldn’t make my checkboxes 80% wide because the label input would be auto-width or something like that.
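
That styling trick can be sketched like this. The field names are hypothetical, and the 80% figure is the one quoted in the talk:

```html
<style>
  /* Every input gets the wide treatment by default */
  input { width: 80%; }

  /* Inputs nested in a label are, by this authoring convention,
     radios or checkboxes, so they keep their natural width */
  label input { width: auto; }
</style>

<!-- Text-like field: explicit association, input is NOT inside the label -->
<label for="city">What city do you live in?</label>
<input id="city" name="city">

<!-- Checkbox: implicit association, input IS inside the label -->
<label>
  <input type="checkbox" name="updates" value="yes">
  Send me updates
</label>
```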

Once you have a good label, it’s important to pay attention to the field type that you’re using. We have a bunch of options in HTML. It’s important for us to choose the right one for each job. If all you’re looking for is a simple response from a user, the text input is your go-to. You don’t even need to say “type=text” because that’s the default type. Anybody who has ever fat-fingered writing radio like R-D-I-O and all of a sudden it becomes a text input has realized that text is the default. I’ve done that a number of times. You don’t even actually need to explicitly say “text.” That will be what automatically goes in there.
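
A short sketch of the default in action (labels and names are made up for the example):

```html
<!-- These two fields behave identically: text is the default type -->
<label for="nickname">What should we call you?</label>
<input id="nickname" name="nickname">

<label for="hometown">Where did you grow up?</label>
<input type="text" id="hometown" name="hometown">

<!-- A misspelled type is not recognized, so this renders as a text input -->
<input type="rdio" name="choice" value="a">
```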

If you’re looking for an email--Jeremy mentioned this--you can use the email type. This was introduced with HTML 5. The appearance is that it looks like a text field. In terms of the browser, in most cases, you’re going to get native validation of that email address. If you don’t, that’s okay. You can actually write some very, very simple JavaScript that can pay attention to the markup that you’ve got and provide that as a Polyfill on top of your form in order to kind of bootstrap the stuff.

Otherwise, you get it for free. You get the validation for free of an email. Certainly in visual browsers, with virtual keyboards and stuff like that, the email field will actually provide specific typing controls, so the @ symbol and stuff like that, that allow sighted users to more quickly enter that information.

The cool thing is, as I mentioned, if you mess up the input type, the type that you type into the type--that’s a lot of types, right?--if you mess up how you’re doing your form control, it falls back to being a text input. That’s the neat thing about using email, even in a browser that doesn’t understand it. Even if you were to go back to the original Web browser, which actually didn’t support forms, did it? Because forms were HTML 2. If you went back to an early Web browser, let’s say you were to view it in Lynx, for example, a fully text-based browser that did support forms, that field would still work because it falls back to being an input of type=text. That’s okay.
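
As a sketch, with a conversational label in the style suggested earlier in the talk (the id and name are assumptions):

```html
<!-- Browsers that know type="email" validate it and may adapt the keyboard;
     browsers that do not know it treat the field as type="text" -->
<label for="email">What’s your email address?</label>
<input type="email" id="email" name="email">
```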

Browsers ignore what they don’t understand. That’s the power of building in layers, as Jeremy was talking about. This enables the concept of progressive enhancement, to be able to improve the experience as users have greater and greater capability within their browser, within their device, et cetera. That’s a topic that’s very, very near and dear to my heart, as well as Jeremy’s.

HTML 5 also introduced the URL field to be able to take URL data. In a very similar way, it will do some validation based on that, handle all that for you automatically, and it just looks like a text field. Text fields were what we had for, like, 15 years before we got the email and URL fields, so that’s kind of cool.
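
The URL field follows the same pattern; the label, id, and name here are illustrative:

```html
<!-- Validated as a URL where supported, a plain text field elsewhere -->
<label for="website">What’s the address of your website?</label>
<input type="url" id="website" name="website">
```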

HTML 5 gave us a dedicated number field. Most modern browsers will validate these fields. Some provide virtual keyboards that are specific to entering numbers as well for quick number entry. You also, in some instances, will get kind of an up and down arrow to the right of the field to allow you to increment and decrement the field. You can even indicate start and end sizes or numbers that are valid within the context of that input and how much you want it to step each time using those up and down controls, which is pretty cool - all for free just by using markup, so no extra JavaScript required.
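
A sketch of a number field with limits and a step. The question and the specific values are invented for the example:

```html
<!-- min and max bound the valid values; step sets how far each
     press of the up or down control moves the number -->
<label for="guests">How many guests are you bringing?</label>
<input type="number" id="guests" name="guests" min="0" max="8" step="1">
```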

Here you can see that actually in play in the range input, which is a native slider. Here, I’ve indicated the minimum is 0, the maximum is 11, and it steps at 1.
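The slide itself is not reproduced in this transcript, so the following is a reconstruction under assumptions: the range values (0, 11, 1) come from the talk, while the number field's limits and all labels and names are made up for illustration.

```html
<!-- Number field: min, max and step values here are illustrative assumptions. -->
<label for="quantity">Quantity</label>
<input type="number" id="quantity" name="quantity" min="1" max="10" step="1" value="1">

<!-- Range field: the 0 to 11 in steps of 1 is as described in the talk. -->
<label for="volume">Volume</label>
<input type="range" id="volume" name="volume" min="0" max="11" step="1">
```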

A couple of people got the joke. Old people like me, yes.

Now, there are a ton of other field types. I’m not going to get into all of them because there are a lot more that I want to kind of talk about when it comes to forms and I’ve only got about 12 minutes left. UX design, as a concern, is often focused on reducing the friction of accomplishing a task. That’s our core job as UX designers, and so we should look for every opportunity to do that within our forms because it’s only going to make things better for people who are using conversational assistants and headless UIs. We want to avoid introducing unnecessary complexity.

Now, in the U.S., this sort of thing proliferates on the Web where somebody is entering their phone and, instead of having a single field, because we tend to break our phone numbers into area code and then these two other bits, people feel like they have to enforce that formatting and so they break it into three fields so that they can make sure that they only get the numbers: nobody enters dots instead of hyphens to be clever, or spaces because they’re European.

The thing is, though, that’s a backend problem, right? That is not a problem that we should be forcing our users to solve. When we go down this path, the designers place undue burden on their users rather than on their developers and the user experience suffers for it.

Now, sure, you could write JavaScript to auto-advance somebody as they move through these fields as they type, so once they’ve typed three numbers into the first one, it jumps to the second one, et cetera. Anyone who has ever had that sort of an interface to work with, if you ever go back to edit it, if you highlight some text and then you go to type again, it’ll immediately jump to the next field and it becomes such a pain in the ass. Yeah, it’s not a good experience at all.

When it comes to using this via voice, now, all of a sudden, a user actually has to supply three separate values. Do the developers know how to label that? Yeah, they might know area code, but do any of them know what their central office code is or their line number? That’s meaningless, even to developers. Even if they knew how to label that properly, the users aren’t going to know what that is.

For all of these reasons, it just makes sense to use what HTML gives us and there is a specific tel field, right? It doesn’t validate phone numbers because there are too many international formats, lengths, and stuff like that, but that’s okay. We should be doing that work on the server-side anyway.
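A single phone field, sketched here with a hypothetical label and names, replaces the three-field pattern:

```html
<!-- One field, one label. type="tel" does no format validation, -->
<!-- so normalizing the number is left to the server. -->
<label for="phone">Phone number</label>
<input type="tel" id="phone" name="phone">
```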

It’s inconsequential for us to write code to sanitize and homogenize content, things like phone numbers, passport numbers. Honestly, we should be writing that code anyway because we can’t trust what comes from the client, and I’ll talk more about that in a moment. If we want consistency and we want structure in our data, we should be enforcing that ourselves. We should be working harder so that we aren’t making our users work harder.

Now, forms can be really frustrating to fill out, even for visual users, for sighted users. They especially are so if they’re very long, so being tasked with reducing frustration and reducing friction in the process, we should remove all of the fields that are unnecessary, first of all. I’m just going to come out and say that. If it’s not required, it shouldn’t be there. We can also help our users avoid errors ahead of time by being a little bit smarter about how we request information.

Most browsers allow users to store some of their information within it and voice assistants could be very much the same in the not too distant future. Some of them also watch for common data that can be filled and will prompt the user to go ahead and enter that. You’ve probably seen this with credit card numbers, with your address, all of that sort of stuff. We can actually control that to some degree within our markup.

A lot of browsers pay attention to the ID or the name that is being used for fields and that’s how they kind of build up some potential options that a user might want to use. That’s great, but we can actually use the autocomplete attribute to manage that in a little bit saner way.

We used to only have the on and off options, and you definitely want to set autocomplete to off for any sort of sensitive information like passport numbers or, if you happen to be American like me, social security numbers, credit card numbers. That sort of stuff, you may not want auto-complete to turn on.

But now we have all of these predefined tokens that are available to us for auto-complete. Don’t worry about reading this. I’m going to give you an example in a moment. In a headless UI context, this could actually allow the virtual assistant to ask the user what they would like to fill in.

Here’s an example. Here we have a label and the label is, “Is there a mobile number we can reach you on regarding delivery?” In this case, it’s using the auto-complete tokens of shipping, mobile, and tel. I’m being very explicit, both in the label and in the auto-complete, about the sort of information that I’m looking for. Users could have a specific phone number associated with shipping or a specific mobile number with shipping. There are lots of options.
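The example described above might be marked up roughly as follows. The label text and the three tokens are from the talk; the `id` and `name` values are assumptions.

```html
<!-- Token order matters: shipping (address group), mobile (contact type), tel (field). -->
<label for="delivery-phone">Is there a mobile number we can reach you on regarding delivery?</label>
<input type="tel" id="delivery-phone" name="delivery_phone" autocomplete="shipping mobile tel">
```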

There’s a great article from Jason Grigsby that’s up on the Cloud Four blog that gets into a lot of the details of this and all of the different affordances that you can create using auto-complete. I highly recommend taking a look at that.

All right. You can also let users know when fields are required. Jeremy mentioned this as well. We have traditionally done things like putting asterisks and letting people know that things are required by making it strong and putting a class of required and so on and so forth. That’s not particularly awesome and doesn’t always translate well.

Instead of that, we could say something like, okay, we’ll say all fields are required in the text ahead of time just in case somebody is reading through, having the text read to them. But then when we get actually down to the field, we can use the required attribute and aria-required=“true” to indicate that this field is required.

Now, the reason that aria-required is here as well is that, just as with the roles not necessarily being mapped directly to the HTML elements, not all browsers map required to aria-required automatically. This just kind of covers all of your bases and it’s really not that much extra to add in there. Now, all of a sudden, users can know that a field is required if they’re in a headless UI scenario.
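Put together, the approach just described could look like this sketch (the field itself is a hypothetical example):

```html
<!-- Up-front notice for anyone having the page read to them. -->
<p>All fields are required.</p>

<!-- required drives native validation; aria-required covers browsers -->
<!-- that do not expose required to assistive technology on their own. -->
<label for="email">Email address</label>
<input type="email" id="email" name="email" required aria-required="true">
```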

You can also provide hints as to the sort of content that you’re expecting. This can help users avoid errors before they happen. In this case, it’s a form that’s looking for a Delta flight number. There’s a pattern attribute, which is being used. It’s using a regular expression to say, “This is the sort of format I’m looking for.” Then there’s a placeholder in there saying, “exempli gratia” and then “DL5407” so that placeholder is showing the user an example of the sort of content that I’m expecting you to put in here. A virtual assistant could read out that information as well if it so chose.
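The exact regular expression from the slide is not in this transcript, so the pattern below is an assumption (the letters DL followed by one to four digits); only the DL5407 example comes from the talk.

```html
<!-- pattern is an illustrative guess at the format, not the speaker's actual regex. -->
<!-- The browser matches the pattern against the whole value. -->
<label for="flight">Delta flight number</label>
<input type="text" id="flight" name="flight"
       pattern="DL[0-9]{1,4}" placeholder="e.g. DL5407">
```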

Now, roundtrips to the server are expensive, especially on flaky networks, so we can help users to mitigate this by doing some validation in the browser. As I mentioned, HTML 5 enables that pretty easily by paying attention to things like the required attribute. Unfortunately, we don’t have required for, like, choose one or more in checkbox land or those sorts of things, which still kind of sucks, but you’ll have to introduce JavaScript to do that sort of thing. Maybe we’ll get it in the future. But then, we can rely on the type validation within the browser if it supports that as well.
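One possible way to fill the "choose one or more" gap is the standard `setCustomValidity()` method from the constraint validation API. This is a sketch under assumptions, not the speaker's code: the form, the field names, and the message are all hypothetical.

```html
<form id="toppings-form">
  <fieldset>
    <legend>Choose one or more toppings</legend>
    <label><input type="checkbox" name="topping" value="cheese"> Cheese</label>
    <label><input type="checkbox" name="topping" value="mushrooms"> Mushrooms</label>
    <label><input type="checkbox" name="topping" value="olives"> Olives</label>
  </fieldset>
  <button type="submit">Order</button>
</form>

<script>
  // Hypothetical form and field names, for illustration only.
  const form = document.getElementById('toppings-form');
  const boxes = Array.from(form.querySelectorAll('input[name="topping"]'));

  function checkGroup() {
    const anyChecked = boxes.some((box) => box.checked);
    // A non-empty message marks the first checkbox invalid and blocks submission;
    // an empty string clears the error.
    boxes[0].setCustomValidity(anyChecked ? '' : 'Please choose at least one topping.');
  }

  boxes.forEach((box) => box.addEventListener('change', checkGroup));
  checkGroup(); // set the initial state before the user interacts
</script>
```

Without JavaScript the form still submits, which is why the same rule needs to be checked on the server as well.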

Patterns also get validated against. In a sighted context, in a kind of traditional browser context, the message can get popped up if it doesn’t match the format. That same sort of notification could be used in a virtual context as well to let non-sighted users or people who are using headless UIs understand the same information. It’s pretty inconsequential to layer on, as I said, JavaScript that pays attention to this same markup, the same declarative markup and provide that level of validation.

When you do find an error, you can add that information into the page. Here we have a validation error that is a strong element; in this case, it’s “Your email address is required.” Then I’m associating that with the field. It has an ID, email error, and aria-describedby has been added to the field. That’s an ID reference from the field to the validation error message. That sounds something like this.

[A voice assistant is speaking.]
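The markup behind that audio example is not in the transcript. A minimal reconstruction, assuming the ID is written `email-error` and the field is the email input from earlier, might be:

```html
<!-- aria-describedby on the field points at the ID of the error message, -->
<!-- so the message is announced along with the field. -->
<label for="email">Email address</label>
<input type="email" id="email" name="email"
       required aria-required="true" aria-describedby="email-error">
<strong id="email-error">Your email address is required.</strong>
```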

Aaron Gustafson: Okay. Now, in the future, hopefully not too distant future, we’ll actually have aria-errormessage, which works in exactly the same way as aria-describedby but allows for a little more disambiguation, but there’s not support for that yet.

All right. The last thing I want to talk about is server-side validation because client-side validation is great but, as I mentioned, you can never trust the client. We always need to protect things on the server-side as well. I’ll give a quick example of that from the very early days of the Web.

Back in 2001, when the original Xbox was the game system to drool over, we put a lot of faith in the browser as people who built websites. We didn’t think about how our sites could effectively be hacked and our markup could be used against us. I remember one specific instance that involved this very gaming console on an e-commerce site where they had a hidden input with the cost of the gaming console.

Now, this was the days before we had dev tools when we could bust open the dev tools and fix a form that’s broken. How many of you have fixed forms that are broken in order to be able to submit them? Yeah. We’re all hacking on people’s sites, fixing their problems.

This was the days before we had that. I don’t even think we had Venkman, for JavaScript debugging, at the time. What we did have was “save.” One enterprising individual decided to save this webpage and so it saved all of the markup. Then they simply went to this hidden input, changed the cost to 1, and the cool thing was, the form still submitted to the same endpoint that the webserver had.

They submitted the form and got their Xbox for $1. It was pretty awesome - pretty awesome. The developer had no checks in place to actually see that the price that they got on the backend when they were going to process the order was what the price of the item should be because they didn’t realize that you can’t trust the client. The client can always be adjusted and things can always be changed in order to make things work. As much as we put our faith in JavaScript and we put our faith in being able to have things run exactly as we want them to in the browser, it’s inconsequential to change things on the Web today.
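To make the weakness concrete, the form in that story would have looked something like the sketch below. Every name, the endpoint, and the price value are invented for illustration; the original site's markup is not known.

```html
<!-- Illustrative only: endpoint, field names and price are assumptions. -->
<!-- Anything in this markup, including hidden inputs, can be edited by the user -->
<!-- before submission, so the server must look up the real price itself. -->
<form action="/checkout" method="post">
  <input type="hidden" name="product" value="xbox">
  <input type="hidden" name="price" value="299.00">
  <button type="submit">Buy now</button>
</form>
```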

We need to make sure that we use the server as the last line of defense for this content. Even if you’re running everything in the client and then just pushing information back using an API, on the API end you better be validating that and you better be sanitizing it and such. Otherwise, you’re going to run into problems.

Since the server-side is your last line of defense, you also need to be prepared to return validation error messages from the server-side as well. If you are doing a full-page refresh or something like that or taking someone to an error page, you need to do things like summarize what those server-side errors are. You can put a message like this at the top of the form. Then you can start with a helpful introduction: “There were errors with your form submission.” Then you could have a list of what those errors are.

Lists are great because it actually reads out to people how many items there are in that list so they could hear, in this case, “list of three items.” Then as they move through them, message is a required field, name is a required field, email is a required field, and each of those can be linked directly to the inputs that they’re associated with using the same ID reference that the labels use, which is pretty cool.

You could even throw a tabindex of 0 on here if you wanted to make sure that somebody was able to focus into it, which would be kind of cool. It might end up sounding something like this:

[A voice assistant is speaking.]
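An error summary along the lines described before the audio clip could be sketched like this. The wrapper element, its `id`, and the field IDs are assumptions; the introduction and the three messages are as quoted in the talk.

```html
<!-- Placed at the top of the form after a failed submission. -->
<!-- tabindex="0" lets the summary itself receive focus. -->
<div id="form-errors" tabindex="0">
  <p>There were errors with your form submission.</p>
  <ul>
    <!-- Each link targets the same ID that the field's label points to. -->
    <li><a href="#message">Message is a required field</a></li>
    <li><a href="#name">Name is a required field</a></li>
    <li><a href="#email">Email is a required field</a></li>
  </ul>
</div>
```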

Aaron Gustafson: That’s pretty cool. Then you use the same pattern that we saw before about having the aria-describedby or aria-errormessage indicating what that error is in the context of the field.

HTML is a truly robust and expressive language that’s often overlooked and undervalued. It has incredible potential to nurture conversations with our users without requiring a lot of effort on our part. Simply taking the time to code our webpages well will enable our sites to do more, thereby enabling our users to do more.

Hopefully, this rather brief overview has opened your eyes to the wonderful world of HTML and how semantic markup can make your content more structured, more sensible and, more importantly, meaningful. If you were already a true believer, hopefully, there are little nuggets that you were able to take away from this that you can put to work in your own stuff as well.

Thank you all very much.