DeepMind’s lip-reading artificial intelligence
A team of artificial intelligence researchers at Alphabet-owned DeepMind is training computers to read people’s lips, with accuracy that rivals some of the world’s best human lip-readers.
In a new research paper published this month, researchers from London-based DeepMind and the University of Oxford described feeding an algorithm nearly 5,000 hours of BBC programming and training it to decipher 17,428 words and a little over 118,000 sentences.
In final testing, the DeepMind algorithm was able to correctly identify words from lip reading alone 23.5 percent of the time. That’s almost as good as professional human lip readers, who got it right 26.2 percent of the time, researchers found.
DeepMind noted a number of tricky sentences that its algorithm was able to correctly decipher, like “West Wales and the South West, as well as Western Scotland” and “According to provisional figures from the electoral commission.”
The researchers also noted phrases that the algorithm got wrong: “Children in Edinburgh” was interpreted as “Children and handed broke.”
When researchers coupled their lip-reading algorithm with the accompanying audio track from the BBC program, the results were immediately more accurate. Eventually, predictive grammar algorithms that know, for example, that the phrase “children and handed broke” doesn’t make sense contextually will make the system even more accurate.
Such software has immediate practical applications. In the future, people might speak to an interactive video display on a busy street, and the software would be able to work out what they said from both the audio track and visual cues. Automated transcription, like the kind found on YouTube, will get better and allow people to search for content inside individual videos more effectively.