#btconf Berlin, Germany 11 - 12 Sep 2023

Charlie Gerard

Charlie is a senior frontend developer, a creative technologist, and the author of a book about machine learning in JavaScript. She loves exploring the field of human-computer interaction and experiments building side projects at the intersection of art, science and technology.

Her latest research focuses on cybersecurity, tinkering with the radio frequency spectrum and prototyping on-skin interfaces. She shares all her projects on her website charliegerard.dev.

Prefer to watch this video on YouTube directly? This way, please.

Exploring Alternative Interactions in JavaScript

As the way we build web applications becomes homogeneous, web development seems to be losing some of its creativity. Websites often look and function the same way; however, the potential of what can be built is ever-growing. From new browser capabilities, to JavaScript libraries unlocking the use of tools not initially designed for the web, to entirely new technologies, there's never been a better time to experiment. In this talk, I want to share what I've learned over the past 9 years of building creative side projects exploring alternative interactions in JavaScript.

Transcription

Thank you.

Thanks everyone for coming back after the break for the last two talks, and thanks, Mark, for inviting me.

I'm actually really honoured to be here.

I've been able to watch some of the talks, and it feels like I'm sharing the stage with super creative, interesting people, they're pros at what they do, and I'm like, no, it's me!

You know?

But a lot of what I do is prototypes, so hopefully, you know, you'll still get excited about what that is.

But what I want to talk about today is exploring alternative interactions in JavaScript.

But before I show you a lot of examples, here's a little bit about me.

I'm a senior front-end developer.

On the side, I'm also an author.

I wrote a book about machine learning in JavaScript, and I'll show you a little bit of that in this talk.

And overall, I like to call myself a creative technologist, because I like to push the boundaries of what can be done on the web and with JavaScript or other technologies, but as a front-end developer, working with JavaScript is a little bit easier for me.

And when I'm not coding, usually I like to travel solo.

This is a picture of me in Iceland, best trip of my life, and next I'm trying to plan either Greenland or Antarctica.

So if somebody has been there, I'm up for travel tips.

But this is not at all what we're going to talk about today.

I want to cover the topic of human-computer interaction.

So, this morning, Tami mentioned human-computer interaction a little bit.

So just in case, if some of you have never really looked into this, this is a field of research that is focused on the interaction between humans and computers.

So I guess, for once, research actually says what it is in pretty simple words.

And what I'm going to talk about today is I'm going to focus on three types of inputs that you can play with in the browser and in JavaScript.

But if you're not an engineer, don't worry.

I'm not going to show that much code.

It's mostly about a conceptual vision of what you can do now in the browser.

And if you are an engineer and you want to see code, everything I'm going to show you is open source on my GitHub, and I'll have a link to that at the end.

So it's probably good that I didn't put that much code into it, because it's, like, the end of the second day, so we're just going to be able to chill and hopefully get excited about different ways to interact with UIs.

So first, let's talk about webcam data.

So if I just tell you webcam data, and you might have played with it before, it usually starts with a laptop or a computer, anything that either has a webcam or it's connected to a camera.

And in JavaScript, you would use navigator.mediaDevices, you deal with the stream, and what you end up with is usually a browser with the camera feed, and you could build stuff like, you know, video communication apps or live streams for conferences and things like that.
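
Roughly, that first step looks something like this (a minimal sketch, assuming a video element with the id "webcam" on the page):

```js
// Minimal sketch: ask for the webcam stream and show it in a <video> element.
const video = document.getElementById('webcam');

async function startWebcam() {
  // Prompts the user for camera access and returns a MediaStream
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: false });
  video.srcObject = stream;
  await video.play();
}

startWebcam().catch((err) => console.error('Could not access the webcam:', err));
```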

But you can build a lot more interesting things with code that's a little bit similar if you add in the middle machine learning.

And here that's the logo for TensorFlow.js, because that's what I've been working with, but there's a lot of other libraries that don't necessarily rely on TensorFlow.js.

And what it means is that you still get the stream from the camera, but you add a little something to it, so then you end up with an alternative interaction.

So instead of just displaying the stream in the browser, you use some kind of, like, machine learning and predictions of what is in the camera feed to then interact with a web page.

So I'm going to go through examples of things that I personally built.

There's other people doing things, but in general, you're kind of, like, the expert of what you build, so I feel better talking about stuff that I did, so I know why, how, and all of that.

So the very first thing that I want to show is using facial expressions.

So I don't remember... I mean, I don't know if some of you remember the game Rainbrow.

That was an iOS game in 2017.

And it went viral, because basically the only interaction with that interface was with facial expressions.

And you only had two facial expressions, I believe.

It was, like, if you look angry, then the little emoji goes down, and you're supposed to, like, get stars.

And if you look surprised, then you go up.

And the thing is, I don't have an iPhone.

I'm an Android person.

So I thought, well, how do I play, then?

And what is the best way to just recreate it in JavaScript in the browser so that all the non-iPhone users can also play with it as well?

So I'm going to try to do a demo, hoping that the lighting of the stage is good enough so that it will pick up on my face.

But if I... okay, the camera is on.

I can see it.

And if I tap anywhere, and I do, like... oh, I'm so shocked.

Wait.

And I'm frowning.

And it goes down.

It doesn't come with the sound.

You don't have to do the sound.

But, you know, it's like... oh, la, la.

Okay.

So I'm going to stop showing it here.

But the point is basically that it's looking at my facial expression in the feed.

And then you only get that as an output of working with JavaScript.

It was built with a library called face-api.js.

And you only get the label of the facial expression as a result.

And you can play with that as an input to your interface.
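
As a hedged sketch (not the exact code from the project), getting that label with face-api.js could look something like this, assuming the models are served from a /models folder:

```js
// Load a face detector and the expression model, then read the most likely
// expression ("angry", "surprised", ...) from the current video frame.
await faceapi.nets.tinyFaceDetector.loadFromUri('/models');
await faceapi.nets.faceExpressionNet.loadFromUri('/models');

async function getExpression(video) {
  const result = await faceapi
    .detectSingleFace(video, new faceapi.TinyFaceDetectorOptions())
    .withFaceExpressions();
  if (!result) return null;

  // result.expressions maps each expression to a probability; keep the top one
  const [label] = Object.entries(result.expressions).sort((a, b) => b[1] - a[1])[0];
  return label; // e.g. "surprised" => emoji goes up, "angry" => emoji goes down
}
```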

So I really liked the fact that... obviously, maybe the real creator of Rainbrow wasn't happy that I recreated the same thing on the web.

But I like the fact that if you use the skills that you already have as a frontend developer, you can create a platform that is accessible to a lot more people than only iPhone users.

So that's working with only facial expressions.

But then you can use face landmarks.

So again, that is using TensorFlow.js and a model called FaceMesh.

You have access to a lot of key points in your face.

And instead of only getting facial expression as an output, you can get the X and Y coordinates of a lot of different points of your face.

And with that, you can build whatever kind of interface that you want.
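
A rough outline with the TensorFlow.js face-landmarks-detection package (the exact API has changed between versions, so treat this as a sketch):

```js
import * as faceLandmarksDetection from '@tensorflow-models/face-landmarks-detection';

// Create a FaceMesh detector once...
const detector = await faceLandmarksDetection.createDetector(
  faceLandmarksDetection.SupportedModels.MediaPipeFaceMesh,
  { runtime: 'tfjs' }
);

// ...then call this in a loop to get a few hundred x/y (and z) keypoints per frame
async function getFaceKeypoints(video) {
  const faces = await detector.estimateFaces(video);
  return faces.length ? faces[0].keypoints : [];
}
```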

So this project, it was silly.

It started as a friend of mine tweeted something at me at the time when Twitter was better.

And he said... I think he was streaming with a camera.

And he was saying, oh, I would love to be able to zoom in and out of my camera just using my eyebrows.

And you can have some funny effects when you're streaming.

So again, I'm a JavaScript person.

So I thought, okay, I know how to use face mesh.

And I don't have a camera plugged into my laptop.

But you can have the feed from the webcam.

And using Canvas, you can zoom in on the Canvas.

So I'm going to try to do it.

And it was actually pretty interesting.

It was the first time that the goal of a project was to calculate the difference in X and Y coordinates between my two eyebrows.
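
The zooming part can be sketched like this; the keypoint indices below are placeholders, not the real FaceMesh indices for the eyebrow and eye:

```js
const LEFT_EYEBROW = 105; // placeholder index
const LEFT_EYE_TOP = 159; // placeholder index

let zoom = 1;

function drawZoomedFrame(keypoints, ctx, video) {
  // If the eyebrow moves further away from the eye than usual, zoom in a little
  const distance = keypoints[LEFT_EYE_TOP].y - keypoints[LEFT_EYEBROW].y;
  if (distance > 30) zoom = Math.min(zoom + 0.02, 3);

  const { width, height } = ctx.canvas;
  ctx.save();
  ctx.translate(width / 2, height / 2); // zoom around the centre of the canvas
  ctx.scale(zoom, zoom);
  ctx.drawImage(video, -width / 2, -height / 2, width, height);
  ctx.restore();
}
```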

So... okay.

I don't know if that's going to work.

Because I'm not seeing myself very well.

But I think if I raise my left eyebrow... no.

Okay.

Wait.

Okay.

So... wait.

I'm going to try to... because I need to zoom out after.

So if I do that... okay.

Oh, wait.

Not that.

And then... no.

Get away.

So there was another thing.

But anyway.

So I realized I can't zoom out.

I can't raise my right eyebrow... stop.

Okay.

I can't... I can't raise my right eyebrow on its own.

So I have to do this.

Okay.

So... you know, you learn very important things when you build stuff like that.

Right.

So you see what I mean when I say people before me were pros and now I just build stuff like this.

But it's interesting.

Obviously if you wanted to create an interface that was using face key points, you probably wouldn't want to do this.

But it would be interesting to take that project and hook it up with a camera, maybe using the Web USB API, and then when you're on a screen, maybe do a weird face and it zooms in and you can have fun with it.

I'm not a streamer, but I assume that maybe would be fun for streamers.

But to move away from that and get a bit more broad, you can do face detection in general, not necessarily using a model that can recognize faces, but using something that's called a teachable machine.

So I'm going to try to show you this one where I used it for my face, but it's doing image recognition in general.

So the UI is very small.

Yeah.

I built that a long time ago.

So what I have here is I have four labels.

Right, left, down, and neutral.

And what it's doing is taking screenshots very fast of what is going on overall in the webcam feed.

So it has no idea that there's a face on it.

That doesn't matter.

But I think if I tilt my... okay.

I'm going to record samples.

And then I go left.

Then I go down.

And then I go neutral.

And if I start the prediction, I have a keyboard.

And as I'm moving my face, it's like selecting letters.

Oh, no.

The other way.

The other way.

All right.

So I'm going to close the tab.

So you can... and the thing is, I trained... I mean, it's using transfer learning.

So there is a model that is pre-trained with other samples before.

And you're able to train it with your own in a few seconds.

Like my demo, it didn't really work well.

But it took a few seconds to retrain the model.

You can have it a lot more accurate if you record more samples.
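
The usual way to do this kind of transfer learning in TensorFlow.js is MobileNet plus a KNN classifier; a sketch under that assumption (tensors aren't disposed here, for brevity):

```js
import * as tf from '@tensorflow/tfjs';
import * as mobilenet from '@tensorflow-models/mobilenet';
import * as knnClassifier from '@tensorflow-models/knn-classifier';

const classifier = knnClassifier.create();
const net = await mobilenet.load();

// Call this a few times per label ("left", "right", "down", "neutral")
// while holding the corresponding pose in front of the webcam.
function addSample(video, label) {
  const activation = net.infer(tf.browser.fromPixels(video), true);
  classifier.addExample(activation, label);
}

// Then call this in a loop to get a live prediction.
async function predict(video) {
  const activation = net.infer(tf.browser.fromPixels(video), true);
  const { label, confidences } = await classifier.predictClass(activation);
  return { label, confidence: confidences[label] };
}
```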

And also, obviously, the lighting on this stage is coming from above, not from, like, in front of me.

So if you ever work with camera or webcam data, this is one of the downfalls or limitations that you get: it has to be pretty bright for it to be able to recognize things properly.

But, again, it was, like, one of my early explorations of what would a UI look like if I could, in this case, write with the head movements.

But you could also scroll down and up or press enter or do whatever you want.

The thing is, as soon as you realize that it's possible, then you can have the creative work of thinking about the actual interaction.

So that was for face.

But you can, you know, take a step back, look at something a bit bigger, and do pose detection.

So this is kind of... again, it's using TensorFlow.js.

I didn't intend to have this talk about just TensorFlow.js.

But as I was putting my projects together, I realized there was a common theme there.

And it's using another model that gives you key points about the entire body.

So not only your face, but you can have where your shoulders are, elbows, wrists, knees, and all that stuff, relative to the size of your screen.

So you get X and Y coordinates and you're able to control interfaces with this.
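
A sketch with the TensorFlow.js pose-detection package and the MoveNet model (again, an outline rather than the exact project code):

```js
import * as poseDetection from '@tensorflow-models/pose-detection';

const detector = await poseDetection.createDetector(
  poseDetection.SupportedModels.MoveNet
);

// Each keypoint comes back with x/y pixel coordinates, a confidence score
// and a name like "left_wrist" or "right_shoulder".
async function getWrists(video) {
  const poses = await detector.estimatePoses(video);
  if (!poses.length) return null;
  const keypoints = poses[0].keypoints;
  return {
    left: keypoints.find((k) => k.name === 'left_wrist'),
    right: keypoints.find((k) => k.name === 'right_wrist'),
  };
}
```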

The idea for the project here, which I called BeatPose, was... there was an open source repository on GitHub that was a clone of Beat Saber, in JavaScript as well, using the A-Frame library.

But at the time, to connect to it, you had to use, I believe, the joystick from an Oculus Go, Oculus Quest, I forget the name.

Like the VR headset.

And, again, it's like, well, I didn't have one at the time and I don't want to spend $500 on a VR headset if I can try to rebuild my own version.

So I thought it would be super cool to be able to smash some beats with just my hand movements.

So I'm just going to play the video, hoping that it will... the sound is not really good because it was recorded in my apartment and the acoustics weren't good.

So I'm going to just pause it here because basically you kind of like saw the point.

And if you are a player of Beat Saber, you might be thinking, well, it's not as fast and I want to make bigger movements.

But again, this recording was made using the first version of their pose detection model.

And now it is a lot more performant.

I updated the project with one of the latest versions and I was able to play in advanced mode, so a lot faster.

And sometimes, you know, if you're working with this, the interesting part as well is that the world is created in 3D coordinates, but the detection is in 2D coordinates.

So to be able to have the collision detection, you have to do some pretty tricky logic in how to be able to map 2D coordinates to a 3D world, which is a little bit complicated and that could be another talk, but not today.
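
One common way to do that mapping in Three.js, which may or may not be what the project does, is to turn the pixel coordinates into normalized device coordinates and unproject them onto a plane at a fixed depth:

```js
import * as THREE from 'three';

function keypointToWorld(keypoint, videoWidth, videoHeight, camera, depth = -5) {
  // Pixel coords -> normalized device coords (-1..1), mirrored horizontally
  // so the on-screen movement matches the mirrored webcam feed.
  const ndc = new THREE.Vector3(
    -((keypoint.x / videoWidth) * 2 - 1),
    -(keypoint.y / videoHeight) * 2 + 1,
    0.5
  );

  ndc.unproject(camera); // a point in world space along the camera ray
  const direction = ndc.sub(camera.position).normalize();
  const distance = (depth - camera.position.z) / direction.z;
  return camera.position.clone().add(direction.multiplyScalar(distance));
}
```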

And so, using pose detection as well... oh, no, okay.

This is something that I built at the beginning of the pandemic.

You know, at the time, people didn't really know if you could even go outside apart from buying groceries.

And I wanted to go for a run.

And obviously I was like, well, I don't know if I can.

And I had just moved to Amsterdam at the time, so I didn't really know where to go.

So I used the same concept.

I think I can play.

There's no sound.

There's going to be sounds of birds.

And as it's detecting that I'm running, it's playing a video on YouTube that somebody uploaded.

And if I stop running, it stops.

So if I want to see the end of the trail, I have to keep running, basically.

And the thing is, these videos sometimes are, like, two hours long, so I would never do that.

But sometimes for a little bit, you can run.

And I projected it in the living room, because I guess in front of the laptop, it's not the same experience at all.

And then at the end, I just tried to create a very simple 3D world so that if I didn't like the videos that were on YouTube, I was like, oh, maybe I could run in my sci-fi world.

And obviously you could create much better things if you were spending more time on it.

But again, it was me trying to validate, is that even possible?

And it was interesting engineering work around, well, what does it mean to be running?

What would be the key points in my body to be able to do this?
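
A hedged guess at that logic: if a keypoint like a wrist keeps bouncing up and down over the last second, call it running; the threshold is made up, and playVideo/pauseVideo assumes the YouTube IFrame Player API.

```js
const history = [];

function isRunning(keypoints, now = Date.now()) {
  const wrist = keypoints.find((k) => k.name === 'left_wrist');
  if (!wrist) return false;

  // Keep roughly one second of vertical positions
  history.push({ y: wrist.y, t: now });
  while (history.length && now - history[0].t > 1000) history.shift();

  const ys = history.map((p) => p.y);
  const verticalSpread = Math.max(...ys) - Math.min(...ys);
  return verticalSpread > 40; // pixels of up/down movement in the last second
}

// e.g. isRunning(pose.keypoints) ? player.playVideo() : player.pauseVideo();
```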

And I called it quarantine.

Funny.

So another one using pose detection is a clone of Fruit Ninja that I built as well.

And all of it is also in JavaScript.

So the fruits, I forget, maybe they were made in Blender, but then they're loaded in the browser using Three.js.

And the motion detection is also done in TensorFlow.js, and I had that same issue of mapping 2D coordinates to the fruits to be able to slice them and stuff.

So I'm going to try to do the demo, but again, with the lighting, I'm not sure.

So I preloaded this.

We should be okay.

It's loading.

Do-do-do.

Okay.

So if I, let's see if I can.

Yeah, smashing fruits.

Okay.

So see, sometimes the...

Okay.

And now game over.

But...

And, again, my interest in this is that if you're looking at your phone or your iPad, it's such a small screen.

And if you're looking at the interaction of really smashing things, you want to be able to take a lot more space.

And obviously I'm still restricted to the size of my laptop because I'm using the webcam.

And again, this is because it's JavaScript.

You could rebuild all of this in other programming languages and actually be able to create real interactions in a bigger space.

But I like to be able to take that kind of same interaction and create maybe a more human thing.

It's like if you want to slice stuff, then my natural interaction would be to slice like this.

So I don't want to be like do-do-do on my phone.

It's boring.

Ooh, okay.

So a more recent one that I built... you might have seen Squid Game on Netflix.

But if not, basically in the series there is a game where everybody is in a room.

There's prisoners or something like that.

And there's a robot that counts to three, or that sings.

And you're supposed to run towards the end of the room.

And if the robot turns and you move, you die.

And you have to...

Obviously everybody wants to live.

So you try to reach the end.

So I'm gonna play...

So again, I have to give credit for the UI first.

I stumbled upon it because a frontend developer called Louis Hoogrupt had built it.

I might pronounce his last name really terribly.

But he created the UI and I stumbled upon it.

But it wasn't really with that kind of interaction.

I'm not sure there was interaction or it was just the animation.

But then I was like, oh, but I can add on to it and collaborate with other people.

So if I play it...

It says you won, for people who can't see it.

So again, here, what does it...

Obviously if I'm not moving, then again, if you're thinking...

If you're an engineer putting on an engineering hat.

Or even if you're not an engineer, what does it mean to move?

What kind of key points around my body do I want to focus on?

And what's the threshold between moving slightly or a lot?

Because even in the series, they're not completely still.

You can move a little bit.

But if you're making a bigger movement that it can detect, then you die.
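
A minimal sketch of that check, with an arbitrary threshold: sum how far every keypoint moved since the previous frame and compare it to a limit.

```js
let previousKeypoints = null;

function movedTooMuch(keypoints, threshold = 120) {
  if (!previousKeypoints) {
    previousKeypoints = keypoints;
    return false;
  }

  // Total distance travelled by all keypoints between two frames
  let totalDistance = 0;
  keypoints.forEach((keypoint, i) => {
    const prev = previousKeypoints[i];
    totalDistance += Math.hypot(keypoint.x - prev.x, keypoint.y - prev.y);
  });

  previousKeypoints = keypoints;
  return totalDistance > threshold; // true => you moved too much => "you die"
}
```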

And I thought it was interesting...

It was almost like the perfect use case for applications like this.

Where you have an interface on the laptop, but you can kind of bring it to life.

And here I'm only on my own.

I only did one person detection.

But I believe the model can detect multiple people.

And it would be fun to make it an even bigger installation or something.

Something more useful if you're a designer is to use body movement to design websites in Figma.

So I'm just gonna play it.

So again, I'm gonna talk over it as well.

So obviously if you're a designer, you'll be like...

Well, I do a lot more than that.

So I know.

I know.

So in terms of interactions, I'm just pinching my fingers and I'm bringing layers to the front.

And I'm just placing them on the UI.

And then I just have an extra gesture to be able to zoom in and out.

So I know that when you're using Figma, there's a lot more that you do.

But once you start realizing...

Okay, so you can send...

You can track hand movements, then send them to Figma, and then decide what you want to do with a Figma plugin.
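
The plumbing between the two sides looks roughly like this (a sketch, not her actual plugin): the plugin's UI code posts gesture messages, and the plugin's main code decides what to do with the selected layer.

```js
// In the plugin's UI code (ui.html), where the hand tracking would run:
parent.postMessage(
  { pluginMessage: { type: 'pinch', x: 0.4, y: 0.7 } }, // normalized hand position
  '*'
);

// In the plugin's main code (code.js), react to those messages:
figma.ui.onmessage = (msg) => {
  if (msg.type === 'pinch') {
    const layer = figma.currentPage.selection[0];
    if (layer) {
      // Map the normalized position onto the current viewport
      layer.x = figma.viewport.bounds.x + msg.x * figma.viewport.bounds.width;
      layer.y = figma.viewport.bounds.y + msg.y * figma.viewport.bounds.height;
    }
  }
};
```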

So it was a very hacky Figma plugin.

If you want to really publish a Figma plugin, you can't do that.

Because they don't allow access to camera for security reasons.

But as a prototype, it was really interesting.

Because everybody watches these sci-fi movies where people draw interfaces in space with super cool movements.

But we never really do that in real life.

So now that...

If you could, what kind of gestures would you want?

What would you want to be able to do in Figma?

Because I'm sure there's things that wouldn't actually be nice to do with hand movements.

For example, typing with head movements is still not really great.

So you might still want to be able to type the name of your layers with your keyboard.

But maybe you would want to interact with voice commands and then do gestures for another thing.

So that's kind of like interesting experimentation.

So then if you get out of the browser, you can also use pose recognition for more of like room-sized computing.

So it's something I had looked into a while ago.

And that's a terrible prototype.

There's much better ones out there.

But it's being able to adapt what you're building to your house.

So I have two lights behind my TV.

One purple and one blue.

And as I'm moving my different arms, the right one turns on the right light, and the left one turns on the left light.

In this prototype, it's still using my laptop on my coffee table.

So it's still in JavaScript and then pinging the API of my Philips Hue lights.
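
Pinging the Hue bridge on the local network is basically a fetch call; BRIDGE_IP and USERNAME are placeholders you get when pairing with your own bridge.

```js
const BRIDGE_IP = '192.168.1.2'; // placeholder
const USERNAME = 'your-hue-api-username'; // placeholder

async function setLight(lightId, on) {
  await fetch(`http://${BRIDGE_IP}/api/${USERNAME}/lights/${lightId}/state`, {
    method: 'PUT',
    body: JSON.stringify({ on }),
  });
}

// e.g. if the right wrist is above the right shoulder, turn on light 1:
// if (rightWrist.y < rightShoulder.y) setLight(1, true);
```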

The real point of a much better prototype would be to be able to really point at devices that you want to turn on.

So here that was just an exploration.

But let's say if I wanted to then turn on the TV, I might have a specific gesture just for that TV.

Or if I have my coffee machine on the left, then you would be able to create a room that understands you rather than you having to understand your devices.

So I think the main reason why it's not really possible right now is because when you talk about cameras, people are a bit freaked out.

I wouldn't want to put an Amazon camera in my house, even though there are already security cameras.

But I don't have one.

I don't like this.

But I like experimenting with this and making it open source because I want people to be able to build their own stuff.

And then you can deal with your own security if you want.

So moving on from pose detection, I want to go a little bit more focused on gaze detection.

And this one I might not try live because I don't think it's going to work.

But it's basically using the direction of your eyes to write.

And here on a digital keyboard.

But am I trying?

Try it?

Not try it?

How much do I have?

Ah.

Okay.

All right.

You like when it fails and then I'm like so lonely up here.

Okay.

So I have...

Okay.

So if I look left.

Yes.

Okay.

Right?

No.

Right.

No.

Okay.

I'm going to try to write the letter I. Yay!

Woohoo!

So the point here as well, I don't know if you have noticed, but this is when I realized that, when you're doing keyboards on a web page, it was actually not a good idea in my first example to just select one letter at a time with my head.

It would take a while.

But then if you split the keyboard into two columns and you let the user decide where is the letter, you end up doing binary search with your eyes and you allow people to write faster.
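
The selection logic itself is tiny; a sketch, where typeLetter is a hypothetical function for whatever your UI does with the chosen letter:

```js
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz'.split('');
let remaining = [...ALPHABET];

// Call this with "left" or "right" every time a gaze direction is detected
function onGaze(direction) {
  const middle = Math.ceil(remaining.length / 2);
  remaining = direction === 'left' ? remaining.slice(0, middle) : remaining.slice(middle);

  if (remaining.length === 1) {
    typeLetter(remaining[0]); // hypothetical: add the letter to the text field
    remaining = [...ALPHABET];
  }
}
```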

So this was actually a reproduction of a project that Google had made as a research project, on the phone as well, for people who have limited mobility.

And again, I wanted to see, well, okay, if you can do it on the phone, you should be able to build the same kind of interface on the web as well.

And it works.

And then, you know, I started thinking as well, what if it works with a keyboard like that?

With gaze detection, you could also try to do hands-free coding.

There's other ways to do hands-free coding.

I've seen people experiment with voice detection.

But in this concept as well, so I'm just going to explain the layout a little bit.

So on the right, I just moved the code away from the browser and I put it in an electron app so it stays in JavaScript land.

And then I realized that when you're coding, there's also a very limited amount of things that you can do.

If you want to declare a variable in JavaScript, you have var, const, or let.

If you want to create a data type, there's also a very limited set of data types.

So if you look at all the things that you can do, you can kind of create a map that you can select with your eyes as well.

And then it would send that to VS code with a plugin.

And you can use snippets.

So instead of having to write every letter of a component, there are also snippets that you can use to create, you know, one of the two types of components.

And then you have hot reload in the browser.

So I'm moving my eyes.

It's selecting whatever I want to write.

I'm using snippets so if I write RCE, then it writes the component for me.

And then I select letters with my eyes.

And it's just going to... I think I wanted to write hi with an I, but I wrote it with a Y.

That's fine.

It works as well.

And, yeah.

So it's also a very early exploration of what it could look like if I was spending more time on it.

What would it look like to actually be able to write code with your eyes?

Because if you combine the fact that you can do some kind of binary search between what you want to select in an interface, and then you're using snippets, basically, technically, I wrote a React component in less than 30 seconds with my eyes.

So it's really thinking about how we're writing code with that as well.

So just to recap, because I'm going to move on to another type of input data.

You can do facial expressions.

You can get the landmarks on your face.

You can get pose detection.

And you can get gaze detection as well.

So there might be more.

But that's what I've personally experimented with.

So now let's move on to hardware data.

So what do I mean by hardware data?

Here it could be a joystick.

Anything that gives you data from a piece of hardware that's not usually the keyboard and the mouse.

And in general, the way that you connect that with the browser, you would use a web API like Web USB or Web Serial, or a custom Node.js module that you would have on the server, streaming whatever events you need to the browser.
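
For example, with the Web Serial API (available in Chromium-based browsers), reading whatever a device streams looks roughly like this:

```js
async function readFromDevice() {
  const port = await navigator.serial.requestPort(); // needs a user gesture
  await port.open({ baudRate: 9600 });

  const decoder = new TextDecoder();
  const reader = port.readable.getReader();

  // Read chunks of bytes as the device sends them
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    console.log('Data from the device:', decoder.decode(value));
  }
}
```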

And if you just use this, it's already kind of an alternative type of interaction, because it's not the keyboard, the mouse, or a swipe.

So it's already another way of interacting with an interface.

But again, if you add machine learning in the middle, you're able to eventually maybe create your own models with your own custom interaction.

You don't have to buy a device and use it the way other people told you to use it.

You can kind of create your own things.

So one of the very early examples of what I did with this is using an EMG sensor.

So it's a muscle sensor.

It was an armband that was called the Myo.

I say was because I think you can't buy it anymore.

But if you're interested in doing something like this, you can buy different electronic components of muscle sensors, and you might have to do a bit more work, because it would really be custom.

But with this armband, when you were setting it up for the first time, you had a set amount of gestures that you could record.

So basically if I was moving my hand like this or like this, it was recording the live electrical signals and then kind of like matching it with that label.

And when you were using it with live data, it was able to recognize and do pattern matching with what it had recorded before, and you were able to control things.

And considering it was connected over Bluetooth, I used the Web Bluetooth API and I could get live events in the browser as well.
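
The Web Bluetooth flow, as a sketch; the service and characteristic UUIDs here are placeholders, not necessarily the Myo's real ones.

```js
const SERVICE_UUID = 'd5060001-a904-deb9-4748-2c7f4a124842'; // placeholder
const CHARACTERISTIC_UUID = 'd5060103-a904-deb9-4748-2c7f4a124842'; // placeholder

async function connectArmband() {
  const device = await navigator.bluetooth.requestDevice({
    filters: [{ services: [SERVICE_UUID] }],
  });
  const server = await device.gatt.connect();
  const service = await server.getPrimaryService(SERVICE_UUID);
  const characteristic = await service.getCharacteristic(CHARACTERISTIC_UUID);

  // Every notification becomes a live event in the browser
  characteristic.addEventListener('characteristicvaluechanged', (event) => {
    const data = event.target.value; // a DataView with the raw sensor bytes
    console.log('Live event:', data);
  });
  await characteristic.startNotifications();
}
```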

I don't think I really did that much in the browser.

It was more like, oh, controlling drones in JavaScript with my arms.

And yeah, that's pretty much where I stopped.

But I'm sure you can do much more interesting stuff.

Ooh, brain sensors.

So this is like one of my early experiments as well, where the very first sensor that I bought was called the Emotiv Epoc, which was in the pictures of Tommy's talk this morning.

That's why I was like, ooh, somebody else who plays with brain sensors.

And you can actually do stuff in JavaScript as well.

So one of my first experiments was to see what it would look like to interact with a UI.

Yes, with a 3D UI, as if, you know, kind of like early VR brain controlled experiments.

So this was made in Three.js as well.

So the landscape is very... I tried to go for, like, Tron-like vibes.

I'm not a designer.

You can see that.

And here what I'm doing is I have mental commands that I trained.

So the same way that the Myo was recording data of certain things that I trained and then matching that with live data, this does the same as well.

I trained mental commands that were, I think, going left and going right.

And then using live data, it's able to match what it had recorded before.

And when I'm going left and right in the UI, it's all done with my brain.

So that was one of the very early prototypes as well.

But one of the things with that device is that to get access to raw data, you had to pay.

And you had to pay quite a lot, I think.

Or at least more than what I was comfortable paying, because I wasn't making that much money at the time.

And also, if I only prototype with it once or twice a year, it's not really worth it to pay a monthly fee.

So I moved on to another device that is called the Neurosity Notion.

And now it's the Crown.

The new model is called the Crown.

And the concept is a bit the same, where I trained two mental commands and I tried to play a brain-controlled clone of Street Fighter.

You can see it's a clone.

There's really just two characters.

And only one of them is like...

That's basically me.

And I trained two different gestures where I think when I'm tapping my right foot, it's doing a Hadouken.

And then the other one.

But...

Yeah.

So very much a prototype.

I didn't push it that much further.

But working with this, I realized that...

Well, okay.

First of all, there's a bit of latency, depending on how well you have trained your thoughts.

I mean, that's weird to say.

And also, it's a bit hard.

It's actually hard to focus on a specific thought and not just, like, have your mind start going somewhere else.

But I also realized...

Okay.

So in general, it works better if you train two thoughts.

But then...

Okay.

So if I have only one or two thoughts, what kind of game could I work with around that as a constraint?

So I decided to do the Dino Game.

There's only one input.

So with the Dino Game, your only input is the space bar.
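
With the Neurosity JavaScript SDK, one trained command mapping to one action could be sketched like this; the package import, credentials and the "rightFoot" label are assumptions, not her exact setup.

```js
import { Neurosity } from '@neurosity/sdk';

const neurosity = new Neurosity({ deviceId: 'your-device-id' }); // placeholder
await neurosity.login({ email: 'you@example.com', password: '...' }); // placeholders

// One mental command => one input: the space bar of the Dino game clone
neurosity.kinesis('rightFoot').subscribe(() => {
  window.dispatchEvent(new KeyboardEvent('keydown', { key: ' ', code: 'Space' }));
});
```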

So here you can see my very sorry face.

And again, I'm trying to think about tapping my right foot.

And I'm trying to, like...

And at first, I'm not too bad.

And then it starts to go down.

But, you know, it's when you start to be distracted by the fact that you lost.

And then I just lose and lose and lose.

But it's interesting as an interaction.

Because again, I can simplify my code by only having one gesture.

And then I can actually find a type of interface that works with that gesture.

So obviously, if you wanted to pair that with a car or whatever, you wouldn't want to do that.

Even though some people have done it with a Tesla.

But I think it would have just been, like, accelerate or something.

Which is still very dangerous.

But you start learning about that kind of technology before it's available everywhere else.

So moving on from wearables.

If you are into more clothing or fashion tech, you can also work with conductive thread that you can pair with a JavaScript system as well.

And this particular project was actually... I don't know if anybody here is familiar with Project Jacquard.

That was a few years ago.

A collaboration between Google and Levi's.

And they had made this jacket.

I think it was for cyclists.

Where if your phone is in your pocket, to avoid having you looking at the screen, you could pick up a phone call or play or pause a song by just touching the thread.

And again, I was like, well, I don't want to buy a jacket just because I want to know how this works.

So I just bought some fabric and some conductive thread.

And I sewed them in a grid.

So instead of just having one end point of "am I touching it or not", if you sew them in a grid, you're able to use the intersections as potential X and Y coordinates for a UI.

And here I tried to recreate one of the UIs that they had in their experiments.

Where, again, it is Three.js, so JavaScript in the browser.

And now you can start to see, well, actually, it could eventually be used in another 3D environment.

Where it could be a game.

But even in a 2D UI, you could have certain gestures that could be interesting to build as well.

What do I have next?

Oh, okay.

So motion sensors.

So usually IMU sensors, you can also say accelerometer and gyroscope.

And in this experiment... at the time, I wanted to learn to play the drums.

I still want to, but I didn't.

And I had this friend who was a drummer, and he told me about these devices that you can buy for when you want to drum and you're traveling.

So you have these devices, two that you attach to your drumsticks and two that you attach to your feet, and they're motion sensors, and you can play air drums.

And when I bought them, I realized, oh, they tell me I have to download a piece of software, and I don't want to download things, I want to do it on the web.

So I built a JavaScript framework to connect the devices to the web via Bluetooth, and then, because you get the data in JavaScript in the browser, the world is your oyster.

You can do what you want.

You can attach whatever sound you want.

You can add animations.

So here was me trying to be cool in my living room.

Oh, no, not that one.

Okay.

Wait.

I think it's on YouTube.

That's why.

Yeah.

If you're a drummer, you're going to cringe, because I'm not very good.

I fucked up here.

That's why it's like...

That's basically the only pattern I know.

So I did it forever.

But it was really cool.

It was super fun.

And here I used the drums in the way they're supposed to be used to play music, but you could try to do something else with them.

You could interact with other devices.

You could turn on your lights by doing a certain gesture, or you could use machine learning later on to be able to...

I think you can get raw data from them as well, whereas here I only used the MIDI signals that you're getting.
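
Getting those MIDI signals in the browser goes through the Web MIDI API; a sketch, where playSound is a hypothetical function mapping note numbers to drum samples.

```js
const midiAccess = await navigator.requestMIDIAccess();

for (const input of midiAccess.inputs.values()) {
  input.onmidimessage = (message) => {
    const [status, note, velocity] = message.data;
    // 0x90 is a "note on" message; velocity 0 means the note was released
    const isNoteOn = (status & 0xf0) === 0x90 && velocity > 0;
    if (isNoteOn) {
      playSound(note); // hypothetical: play the snare/kick/hi-hat sample
    }
  };
}
```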

But staying in motion sensor land, this one is probably one of my kind of favorites that I built.

And again, so using accelerometer and gyroscope, I think I came across an installation that was doing something similar, but I wanted to do it in the browser.

And what it does is I kind of wanted to create a hoverboard experience, you know, Back to the Future.

But again, in my living room by myself.

And I have an accelerometer and gyroscope on a skateboard that is on a carpet, and I'm just moving the natural gestures that you would do on a skateboard, so tilting back and forth.

And using this natural gesture, you can...

Oh, no!

I forget which one is on YouTube.

Anyway.

I thought I was ready for this.

So yeah.

I'm connecting, standing on my skateboard.

And having fun, basically.

I'm not going to play the whole thing.

You get the point.

But again, it's like using natural gestures, because on a skateboard, you're usually tilting back and forth.

So I also did tilting to the front to go faster.

Or whatever.

But again, it was a prototype.

And when I realized, oh, I can do it, then I can decide to go further with it or not.

But again, the UI is made in Three.js.

It probably would look better if Guillaume was making a 3D UI.

But now you know that you can interact with interfaces that way as well.

And again, the projection is just because it felt better when it was bigger.

In front of my laptop, it would have been a bit weird.

But that would work as well.

Even if you don't have a projector, it still works.

Yes.

Okay.

So this one, I wanted to try to demo it live.

But I'm not sure it's going to work.

So I have a video recording, if not.

But so repurposing the...

I'm going to try to speak and figure out how I'm going to do this at the same time.

So here I have the same UI prototype of Street Fighter that I had used with my brain sensor before.

And then I thought, okay.

So with the brain sensor, I could see that there was a little bit of trouble, because there's not that many gestures, and it's not always quite responsive.

But wouldn't it be cool to be able to play Street Fighter in real life without having to type on my keyboard?

So I tried to do that.

And usually I did not...

The first version wasn't using my phone.

It was using another device that had an accelerometer and gyroscope.

But knowing that you can get motion data with a web API as well, you could actually use your phone as a controller and hold it in your hand and try to punch things.
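
On the phone side, the sketch is basically devicemotion plus a WebSocket (the URL is a placeholder, and on iOS you first have to call DeviceMotionEvent.requestPermission()):

```js
const socket = new WebSocket('wss://example.com/controller'); // placeholder URL

window.addEventListener('devicemotion', (event) => {
  const { x, y, z } = event.acceleration || {};
  if (socket.readyState === WebSocket.OPEN) {
    // Stream the raw acceleration; the game decides what counts as a punch
    socket.send(JSON.stringify({ x, y, z, t: Date.now() }));
  }
});
```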

So I tried it this morning.

It worked.

Sometimes it doesn't.

It depends on Code Sandbox.

Sometimes Code Sandbox is happy with me, and sometimes it is not.

So once this is loaded...

All right.

Okay.

And now I need my phone, because it's using WebSocket.

So do not let me down.

Okay.

So I'm going to start.

I'm the red character.

Oh, you don't see a bit here.

Okay.

So if I do...

Yeah!

Okay.

So if I do...

Yeah!

Oh, my god!

It's so good!

Okay.

No, not that one.

Okay.

So, you know, it can't always work.

But that's fine.

So it's like using...

And the way that I did this is...

I'm going to go back to my slides.

Stay safe.

And so this was, like, probably the most complicated project I ever built, because it was...

Even though you get raw data from the accelerometer in JavaScript in the browser, for it to be able to understand my gestures, I had to use machine learning, record all of these samples, create my own model, and then do all of it in JavaScript.
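
The kind of model this involves can be sketched in TensorFlow.js like this; the window size, layer sizes and gesture labels are made up for illustration.

```js
import * as tf from '@tensorflow/tfjs';

const GESTURES = ['punch', 'hadouken', 'idle'];
const WINDOW = 20; // 20 accelerometer readings of x/y/z per sample => 60 values

const model = tf.sequential();
model.add(tf.layers.dense({ inputShape: [WINDOW * 3], units: 32, activation: 'relu' }));
model.add(tf.layers.dense({ units: GESTURES.length, activation: 'softmax' }));
model.compile({ optimizer: 'adam', loss: 'categoricalCrossentropy', metrics: ['accuracy'] });

// samples: arrays of WINDOW * 3 numbers; labelIndices: index into GESTURES
async function train(samples, labelIndices) {
  const xs = tf.tensor2d(samples);
  const ys = tf.oneHot(tf.tensor1d(labelIndices, 'int32'), GESTURES.length);
  await model.fit(xs, ys, { epochs: 30, shuffle: true });
}
```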

And the cool thing, though, is that we're probably all kind of punching the same way.

So even though I made this model for myself, I'm sure that if people were trying it, at least the punch, I think, would kind of be similar.

So hopefully...

You know, obviously it's a prototype.

There's only the red character moving.

But if I was pushing it a bit further, you could have a real game with two people where you would just punch in the air.

And I feel like it's such a better interaction than pressing buttons on a keyboard.

But...

So just to recap that part as well.

Muscle sensors.

So I used my armband.

But you can find others around or make your own if you're adventurous and you like tinkering with stuff.

Otherwise, brain sensors and accelerometer data in JavaScript.

But there is a lot more out there.

There's infrared sensors and a lot of other things that you can work with.

And I'm sure some of you might have already worked with that.

Okay.

So I'm getting to...

No, I have time.

So the third input that I want to talk about is audio data.

And this one is, like, I think I stumbled upon it more recently.

And it kind of, like, you know, made me think about things differently.

So if you're thinking about audio data, usually you have a microphone.

And if you're working in the browser, you might use navigator.mediaDevices.

But you turn on the microphone, not the camera.

And what you end up doing in general is you build visualizations.

So either a spectrogram or a music visualization, some cool stuff in the browser.

But again, if you've been following this talk, what do you add in the middle?

So if you also add machine learning to this, you can learn patterns from that sound data and use that as some kind of interaction as well.

So one example of this is something where I kind of recreated a research paper that was around acoustic activity recognition.

And it was really this kind of, like, whoa moment, where the paper focuses on the fact that a lot of different things that we do every day produce sounds that we recognize.

If I ask you to think about the sound of your toaster when it's done.

You know that.

Or even in your house, the sound of when you open the fridge.

You can tell that somebody is opening the fridge from another room because you've understood that that sound is linked to that particular activity.

So here in JavaScript, again, I think if that loads.

I don't need the camera anymore.

So if that loads, I'm going to see.

So as I'm speaking, okay, there should be the spectrogram of me speaking.

And when the model is loaded, that might take a little bit of time.

So I'll just keep talking.

It's going to show... it's going to write the label of me speaking there.

And if I stop and I do fake coughing.

So it recognizes the pattern.

You can see here the pattern of what coughing looks like.

And then it's able to... and, phone ringing?

Not really.

But it's fine.

So sometimes, you know, there's a little bug.

But again, it's a prototype that I make on the weekend.

But you can see that what it does is instead of using raw data, it's actually taking captures of a spectrogram and using image recognition models so it's more performant.

Raw audio data would really be too much for a model.
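
The spectrogram part itself comes from the Web Audio API; a sketch of that first step, where each call to getByteFrequencyData gives you one column of the image:

```js
async function startSpectrogram(onColumn) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext();
  const source = audioContext.createMediaStreamSource(stream);

  const analyser = audioContext.createAnalyser();
  analyser.fftSize = 1024;
  source.connect(analyser);

  const frequencies = new Uint8Array(analyser.frequencyBinCount);

  function tick() {
    analyser.getByteFrequencyData(frequencies); // one column of the spectrogram
    onColumn(frequencies); // e.g. draw it on a canvas, then classify the image
    requestAnimationFrame(tick);
  }
  tick();
}
```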

But yeah.

So then working with this, I didn't really push it further either.

But I had people who were telling me, oh, it would be great to get my kids to brush their teeth.

You know, you could have a system that's listening and tells you from another room if your kids are really, you know, brushing their teeth or something.

I did try it.

Brushing teeth does work.

It's a sound I did train.

I just don't want to do it here.

But yeah, it does work.

And you could have systems, like, for example, if you're following a recipe on an iPad and you're cooking and you could train to recognize the sound of chopping something on a chopping board.

That's a sound you could recognize if you've ever cooked before.

And you could pause the video automatically if you're chopping so you don't have to go back and forth between your UIs and what you're actually doing.

So that would be one example.

But there's a lot more that you can do.

And using this same code, I actually rebuilt something.

I don't know if you have an Apple Watch.

But at, I think, their 2020 annual conference, Apple released this thing with the Apple Watch where it would recognize the sound of running water and it would start a counter for 20 seconds to make sure that people were washing their hands during the pandemic.

And the thing is, when I was watching the conference, I realized, oh, I think I know how to do this.

So while watching the conference, in two hours, I put this thing together, and it works.

I didn't have to use that much water to train it.

Because people were a bit... yeah.

People were saying that I had, like, really wasted water, but no, I used less water than a toilet flush.

So it was done really quickly.

And then the UI is a counter.

And yeah.

So I shared that.

And it's basically the same thing.

But instead of having to buy an Apple Watch, it would work on your phone or iPad or laptop or desktop.

Anything that can run JavaScript.

And you don't have to spend a lot of money on an Apple Watch.

And I was a bit surprised that the only thing people seemed to say was that my laptop was too close to the water.

Who cares?

I don't care.

I don't need it.

But yeah.

Apparently the rest doesn't matter.

And more recently, a few days ago... I mean, a few weeks ago, I read research around using the sound of touch on your face to create on-face interactions.

So it's the type of... again, I've been thinking about sound, but I didn't think about more subtle sounds.

So if you're touching your face in different ways, it's close to your ear.

And you might realize that your ear picks up on different sounds.

So if I'm tapping my cheekbone or rubbing my cheek, it creates a different sound.

And using the same kind of system with machine learning and sound data, you would actually be able to recreate interactions as well.

So in this very rough prototype, I used tapping on my cheekbone to scroll a web page.

But you could do a lot more.
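
The interaction layer on top is small; a sketch, assuming you already have a classifier emitting labels like "tap" or "rub" from the earbud's microphone.

```js
function onFaceSound(label) {
  if (label === 'tap') {
    // Tapping the cheekbone scrolls the page down a bit
    window.scrollBy({ top: 300, behavior: 'smooth' });
  }
}
```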

And it can work with any wireless earbuds.

Because even though if you have Apple earbuds, you have some interaction built in.

Like if you're tapping on the device, it stops or pauses a song.

But not everybody has Apple AirPods.

So with something like that, it could run with anything.

And you can create your own kind of custom interaction as well.

So that was a pretty rough prototype.

But anyway.

So I'm getting to the end of this talk.

I just wanted to show quickly... I don't know if... I'm kind of standing in front of the links.

But, you know, here are some of the links to the things that I talked about.

But if you want specific links that are not there, but that I mentioned, feel free to talk to me afterwards.

But I don't have a very inspirational way to end this talk.

I just wanted to... I hope that with this talk, maybe you've learned something.

Or maybe you're thinking, oh, actually, yeah, you're right.

I want to interact with interfaces in another way than what I've been used to in the past.

That would be my goal.

I understand that we're using the keyboard and the mouse because it's fast.

But if you remember the first time you learned it, you didn't know where the keys were.

And you were probably a lot slower than you are now.

And I feel like we're dismissing all other types of interaction just because we want to be fast and productive.

But as technology improves... even brain sensors, I know that at the moment it probably sounds like something that's never going to make it.

But the hardware on devices is getting better.

And machine learning models are getting better.

I don't have the link to the research.

But there's something I read recently.

They published research where they were able to reconstruct a song that a participant heard, only using their brain waves.

So that's creepy a little bit.

Because it's not going to be used for good.

It's like if people can match what you're thinking with actual words, it would be basically reading your thoughts, which is not great.

But if you're building your own little thing just as an experiment, it could be pretty interesting.

But anyway, thank you so much for your time.

I have nothing else to say.

Thank you.
