
Microsoft brings transcriptions to its Word documents app, but there are catches

The tech giant adds a new feature to the web version of Microsoft Word allowing people to record and upload audio for transcription.

Ian Sherr Contributor and Former Editor at Large / News
Ian Sherr (he/him/his) grew up in the San Francisco Bay Area, so he's always had a connection to the tech world. As an editor at large at CNET, he wrote about Apple, Microsoft, VR, video games and internet troubles. Aside from writing, he tinkers with tech at home, is a longtime fencer -- the kind with swords -- and began woodworking during the pandemic.
Angela Lang/CNET

Microsoft's Word writing tool will soon be able to record and transcribe audio, a feature long requested by everyone from students to reporters to Microsoft executives. However, it is strikingly limited compared with competitors' offerings.

The new transcription technology, which will be available free to Microsoft 365 subscribers who use Word in a web browser, lets people record audio or upload prerecorded audio files and have them transcribed, often within moments. In demonstrations for reporters on Monday, Microsoft showed that it worked well when the computer's built-in microphone recorded audio playing from the computer's own speakers (so, with no headphones plugged in).

But that's where its parity with competitors ends and the list of things it can't do starts to pile up.

The transcription feature works only in the web version of Word, not in its desktop apps for Windows or Mac and not in its mobile apps. Microsoft said it hopes to bring the technology to phones and tablets by the end of the year but wouldn't commit to offering it in the desktop apps.

Competitors, such as the transcription tools Google builds for phones running its Android software, can work with more languages or work offline. And apps like Otter.ai offer easier search, markup and sharing.

A look at Microsoft's transcription tools on the web.

Microsoft

Microsoft said its advantage over competitors is the simplicity of recording, storing and accessing transcripts within its own suite of apps.

"We're really uniquely positioned to help provide a one-stop shop, where your audio, recording transcript, notes, and ultimately your story can all live together inside a single familiar secure tool," said Dan Parish, Microsoft's group program manager who worked on this new feature. He said the technology grew out of Microsoft's effort to help people "spend less time and energy creating their best work, and really focus on what matters most."

Microsoft's move to offer transcription technology marks a change that even the company acknowledged was a long time coming. People are increasingly relying on voice-enabled technology for many aspects of their lives, whether it's to turn up the music while they're cooking, send a text message while driving, or find a movie on their smart TV. Even the US government relies on automated voice transcription to help keep records of some of the president's phone calls.

As people increasingly adjust to working away from their offices, Microsoft said its transcription software can help, both by keeping notes and by acting as a third hand if you're suddenly interrupted by a child or pet during a meeting.


Amazon, Apple, Google and Microsoft have been increasingly investing in voice control technology.

James Martin/CNET

Microsoft acknowledged the technology has limitations that the company hopes to address.

For example, Microsoft said it will let people record unlimited audio directly in a web browser, but will limit them to 300 minutes (five hours) per month of recordings made elsewhere and uploaded later, such as in a classroom with poor internet. Microsoft also said each uploaded audio file must be 200MB or smaller, or about 75 minutes of low-quality, mono MP3 recording. As with other services, people can upload MP3, WAV, MP4 and M4A files, though some services, such as Otter.ai, also support video formats such as AVI, MOV and MPG.

Microsoft also said that a recording made in Word will be transcribed within moments of pressing stop, in part because Microsoft is transcribing behind the scenes while the recording is under way. An uploaded audio file, however, could take as long to transcribe as the recording itself lasts.

Still, Microsoft said it sees itself as "definitely right at the top of the industry" in terms of accuracy. That's partly thanks to its ties to Azure Cognitive Services, Microsoft's cloud AI technology, which the company has been refining for years.

"In general, obviously, we feel quite confident in the quality that we are producing here," Parish said.