Microsoft 'breakthrough' tech translates your voice instantly

Microsoft's Research division has demonstrated a speech translation tool that preserves the speaker's voice. Check out the video.

Luke Westaway, Senior editor
Luke Westaway is a senior editor at CNET and writer and presenter of Adventures in Tech, a thrilling gadget show produced in our London office. Luke's focus is on keeping you in the loop with a mix of video, features, expert opinion and analysis.

Microsoft Research has demonstrated a "breakthrough" speech translation technology, which sees speech translated into another language almost instantly, while preserving the original speaker's voice.

Professional translators will gulp nervously watching the video below, which shows the brand-new process in action. Skip to the 7-minute mark to see the near-instant translation taking place, but the rest of the clip is worth a gander too, because it explains more about how the technology came to be.

"We hope in a few years that we'll be able to break down the language barrier between people," Microsoft Research's Rick Rashid says, shortly before his sentiments are converted into Chinese speech.

I don't speak any Chinese, so I can't verify the accuracy of the translations, but based on the audible applause in the room, the technology seems to be going down a storm.

On its official blog, Microsoft explains that the tech is based on 'Deep Neural Networks', a technique that uses patterns from human brain behaviour to train up artificial speech recognisers. The breakthrough occurred just over two years ago, Rashid writes, and was the work of researchers from Microsoft and the University of Toronto.

This newfangled approach cuts the word error rate by 30 per cent relative to earlier methods, which means roughly one word in seven or eight is wrong, rather than one word in five. "As we add more data to the training, we believe that we will get even better results," Rashid says.
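For the curious, here's a quick sketch of the arithmetic behind those figures. It assumes the standard definition of word error rate (substitutions, deletions and insertions divided by the number of words actually spoken); Microsoft's post doesn't spell out its exact scoring method, and the example sentences are made up. A 30 per cent relative cut from one word in five (20 per cent) lands at about 14 per cent, or roughly one word in seven.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit distance) table."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return d[len(ref)][len(hyp)] / len(ref)


def error_rate_after_reduction(old_rate: float, relative_reduction: float) -> float:
    """Apply a relative reduction, e.g. 0.30 for '30 per cent fewer errors'."""
    return old_rate * (1 - relative_reduction)


# One wrong word out of six (made-up example sentences)
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))

old = 1 / 5                                  # one word in five wrong
new = error_rate_after_reduction(old, 0.30)  # 30 per cent fewer errors
print(f"{new:.2f}, i.e. about one word in {1 / new:.1f}")  # 0.14, i.e. about one word in 7.1
```

The error rate is measured against what was actually said, so a 30 per cent cut is relative to the old rate, not 30 percentage points off it.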

When it comes to Chinese translation, the process apparently first translates individual words into their Chinese equivalents, then puts the words in the correct order. An hour or so of Rashid's speech was captured before the presentation in the video and used to tune the computer's digital voice to sound more like his own.
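To make the translate-then-reorder idea concrete, here's a deliberately tiny toy in Python. It's not Microsoft's system: the five-word dictionary, the `TIME_WORDS` set and the single "move time words after the subject" rule are all hypothetical stand-ins, whereas real systems learn these mappings statistically from large amounts of text.

```python
# Toy two-step translation: word-for-word lookup, then reordering.
# The lexicon and the single reordering rule are hypothetical illustrations,
# not Microsoft's actual translation system.

TOY_LEXICON = {
    "i": "我",
    "go": "去",
    "to": None,        # no Chinese equivalent needed here, so it is dropped
    "school": "学校",
    "tomorrow": "明天",
}

TIME_WORDS = {"明天"}  # hypothetical set of time expressions


def translate_words(sentence: str) -> list[str]:
    """Step 1: map each English word to a Chinese equivalent.
    Words with no entry (or an explicit None) are dropped."""
    out = []
    for word in sentence.lower().split():
        target = TOY_LEXICON.get(word)
        if target is not None:
            out.append(target)
    return out


def reorder(words: list[str]) -> list[str]:
    """Step 2: move time expressions to directly after the subject,
    where Chinese normally places them (the subject is assumed to be the first word)."""
    if not words:
        return words
    subject, rest = words[0], words[1:]
    times = [w for w in rest if w in TIME_WORDS]
    others = [w for w in rest if w not in TIME_WORDS]
    return [subject] + times + others


def toy_translate(sentence: str) -> str:
    # Chinese is written without spaces between words
    return "".join(reorder(translate_words(sentence)))


print(toy_translate("I go to school tomorrow"))  # 我明天去学校
```

Step one alone would produce 我去学校明天 ("I go school tomorrow"); the reordering step fixes the word order to the natural 我明天去学校.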

Fingers crossed that in a few years the tech will be commercially available, and we'll be able to have a tiny collar-worn speaker blurt out our drinks order in perfect French next time we hop the Eurostar to Paris.

Are you impressed by Microsoft's "breakthrough"? How long do you think it'll be before machine translations are indistinguishable from those of meaty humans? Tell me in the comments or on our Facebook wall.