Google releases ASL fingerspelling recognition model

Alex (Daily Moth): You recently released a video collaboration…

[Video clip, Credit: YouTube/TensorFlow

Video description: Sean Forbes is seated on a couch. He says, “Yo, I want to tell you about machine learning and how it can help the deaf community with a new data set of fingerspelled words.” There are computer animations that show Sean’s hands being tracked and analyzed. ]

Alex: …showing that you are working on something that enables a computer to read hands. You showed a cool video demonstrating it.

[Video clip, Credit: YouTube/TensorFlow

Video description: There are three separate areas in the video clip. The first shows a close-up of Forbes’ hands fingerspelling, “DEAF.” The second area shows Forbes drumming. The third area shows a close-up of Forbes’ hands fingerspelling, “FINGERSPELLING.” The hands are analyzed in a 3-D computer model.]

Alex: What is Google doing with artificial intelligence and sign language recognition?

Sam Sepah, Google Accessibility Research Lead: We at Google want to acknowledge that for many years, people in the Deaf community have pushed Google on the fact that speech-to-text technology, which hearing people use to turn their voices into text on a display, is not accessible to the deaf community. They asked, “Where is our language, our original accents? Do any devices understand our language? There is no equal experience when we use Google products.” So we recognize that. We are building a sign language team; we have one right now. They have done research and experiments, and we have a proof of concept for fingerspelling recognition that turns signs into text. It works!

[Video clip, Credit: YouTube/TensorFlow

Video description: Forbes is standing in a room at Google HQ. He says, “We can create new tools for everyone. The largest collection of fingerspelled words ever…” The hands are tracked with computer animations and analyzed in a 3-D computer model.]

Alex: So you’re saying that Google has developed a technology that can read fingerspelling? And translate it to text? And is it accurate?

Sepah: We have a machine learning model that can detect fingerspelling, such as A-B-C-D-E-F, and match it up with letters. It works perfectly. As for accuracy, the English alphabet part is easy because it’s a small set. When it comes to sign language more broadly, we see that isolated signs can be detected, such as “mom” or “dad.” The model can translate certain signs. But ASL is complicated, as you know. We have many classifiers, such as (a car colliding with a tree). How do you translate that? We are working on that. The most important question is whether we can meet the basic needs of our users who sign. Can the model recognize it? It works. The model has not been integrated into various products yet because we have not launched it yet. That is our goal. The most important thing here is a proof of concept that the fingerspelling model is working. We are making it open-source to the worldwide community, which means anyone, such as academic researchers, small companies, and nonprofit organizations, can use the model and apply it in the world to change how we communicate with our technology.
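
[Editor’s note: To illustrate the idea of matching fingerspelled handshapes to letters, below is a minimal, hypothetical sketch of a per-frame letter classifier over hand-landmark features, written with TensorFlow/Keras. It is not Google’s model; the 21-landmark input, layer sizes, and placeholder data are assumptions for demonstration only.]

```python
# Illustrative sketch only, NOT Google's fingerspelling model: a minimal
# per-frame letter classifier over hand-landmark features. The 21-landmark
# input, layer sizes, and placeholder data are assumptions.
import numpy as np
import tensorflow as tf

NUM_LANDMARKS = 21   # keypoints for a single hand, each with (x, y, z)
NUM_LETTERS = 26     # A-Z

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_LANDMARKS * 3,)),  # flattened landmarks
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_LETTERS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# With labeled frames, training would look like:
# model.fit(landmark_frames, letter_labels, epochs=10)

# Predict the letter for one (placeholder) frame of landmarks:
frame = np.random.rand(1, NUM_LANDMARKS * 3).astype("float32")
letter = chr(ord("A") + int(np.argmax(model.predict(frame, verbose=0))))
print(letter)
```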

[Video clip, Credit: YouTube/TensorFlow

Video description: Forbes is standing in a room at Google HQ. He says, “Imagine this: I fingerspell to the camera, and it understands me!” (Gestures fingerspelling). “Hey, where’s the coffee shop near me? Sign to the map so I can navigate freely.” The hands are tracked with computer animations and analyzed in a 3-D computer model.]

Alex: So you’re creating this technology concept—

Sepah: —A model, right.

Alex: —and making it open to the world.

Sepah: It’s public. For academic researchers, companies, anyone.

Alex: So any company, such as Waze Maps? Or Yelp?

Sepah: Any company can use it because it is an open-source license. Even a small company, a start-up for deaf children, maybe something such as an app for K-12 education, they can use it. It’s up to everyone. The open-source license is important because we are raising the standard in the industry.

Forbes: That’s what makes me excited about this because I feel like this is needed for the Deaf community and nobody should have to pay for it. It should be readily available. I mean, can you imagine it if you had to pay for GPS?

Alex: What kind of devices do you use? Does it work with an iPhone camera? If you sign to it, can it recognize you? Or do you have to use a separate device? How does it work?

Sepah: Yeah. Believe it or not, our technology is tremendous. It works with mobile phones and detects very effectively. And I can take it with me anywhere around the world.

--------

[Sponsored video from Convo: www.convorelay.com]

--------

[Advertisement from Disaster Distress Helpline: After a disaster, you may be at risk for emotional distress. Warning signs can include feeling isolated, anxious, having trouble sleeping and more. If you or someone you know is Deaf or hard of hearing, the Disaster Distress Helpline offers a direct videophone option.

This free service for ASL users is answered 24/7 by trained crisis workers fluent in ASL and can be accessed using any videophone-enabled device and dialing 1-800-985-5990 or at https://bit.ly/3CEwnNT]

--------

Alex: I’m wondering if there are any privacy concerns, because the device would be filming you, while audio-based models only pick up sounds. The camera can “see” all of you. How do you address privacy concerns?

Sepah: It is a good question. A couple of things. When people turn the camera on, they know what they are consenting to. Unfortunately, in the deaf community, there is often a tradeoff of privacy for accessibility. The second thing is that the cameras don’t detect faces in this current iteration of the sign language recognition model. It only detects hands. So privacy concerns about personal information, such as faces, are not an issue. Even if you place your hands close to your face, the model captures only the hands, not the face. In the future, we want to expand the technology to include facial expressions, because they are important, for sure. But the current model is focused on the hands, fingerspelling, and some signs. Two hands maximum. So that means privacy concerns should be minimized.
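
[Editor’s note: Sepah describes a model that tracks only the hands, never the face, and at most two hands. For readers curious what hands-only tracking looks like in practice, below is a minimal sketch using Google’s open-source MediaPipe Hands library, which returns only hand keypoints. This is an illustration, not the recognition model discussed in the interview; the webcam loop and printed output are placeholder usage.]

```python
# Illustrative sketch of hands-only tracking with Google's open-source
# MediaPipe Hands library: it returns hand keypoints only (no face data)
# and is capped here at two hands. Not the model discussed in the interview.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)  # default webcam
with mp_hands.Hands(static_image_mode=False,
                    max_num_hands=2,              # "two hands maximum"
                    min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                # Each detected hand has 21 normalized (x, y, z) keypoints;
                # nothing about the face or body is returned.
                wrist = hand.landmark[0]
                print(f"wrist at ({wrist.x:.2f}, {wrist.y:.2f})")
cap.release()
```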

Alex: Sean, you just went to Google HQ to film the music video and promotion. What was it like? To meet the team, the people behind the AI product? I’m curious, can you share what you saw?

Forbes: It was incredible. The environment — as someone who loves technology — I was, of course, very observant. I was like, “What’s going on over there? How about over there?” To see the collaborative community… that’s what I think people don’t realize, that Google is by itself a huge collaborative community. There are different groups and sometimes they are working on a project and they are figuring out how their project can work with this other project. It was really fascinating to see that. What I really felt from the whole experience is that everyone is excited about sign language with AI.

[Video clip, Credit: YouTube/TensorFlow

Video description: Forbes and Sepah are standing at the end of a long table with Google employees sitting on either side. They are all doing the “deaf applause.” The video closes with a drone video of the exterior of the Google HQ building.]

Alex: Thank you so much!

Forbes: Thank you!

Sepah: Sure!
