I talk like a robot now, and it's all Siri's fault

Commentary: My gadgets' voice assistants are cool and convenient, but speaking to a soulless machine has made me sound like one, too.

"Yes comma I can't wait to see you too period I'm so excited exclamation mark."

This is not a sequence of words a normal human being would utter. It's what you say when you're talking to a gadget and care about punctuation. And lately, I've been talking to my gadgets a lot.

You probably have, too.

If you've ever invoked "Siri," "OK, Google," "Cortana" or "Alexa," then you already know that a major selling point of these so-called personal assistants is being able to get chummy with them. They know jokes, for Pete's sake!

But you might also notice the way your voice gets a little more wooden and your delivery more calculated when you talk to your tech. That's a troubling thought when you consider a future in which everything from cars to smart-home appliances will integrate voice control seamlessly into your life.




For the most part, the promise of breezy conversation with your technology -- or even just of speaking to a device normally -- remains only partially fulfilled. The reality -- my reality -- is that I have to adjust my speech patterns so that gadgets get me. If I want my device to understand me every time, I have to sound like a calculating automaton. I have to sound like it does.

The problem with 'natural' language

I have a fair amount of experience talking to inanimate objects. I dictate texts, emails and notes into my various test phones and smartwatches when I walk, all to give my aching thumbs a break from typing. At home, I constantly ask my Amazon Echo to do things like play music and calculate cooking measurements. But despite all assurances that they grok casual, or "natural" language, the personal assistants in my life only sometimes understand me.

In theory, casual speech should be enough to get you what you want. Instead of barking, "Cortana/Siri/Alexa! Weather! San Francisco!" you can fancifully ask whether you need sunscreen or an umbrella, and the device will respond with the day's temperature highs and lows.

The problem with this type of speech, called natural language, is that it doesn't consistently work. You can spend all afternoon posing cutesy questions to the Amazon Echo's "Alexa" personality, Apple's Siri and all the rest about the need for umbrellas, or who their daddy is, or how many angels can dance on the head of a pin, and they'll answer because they've been programmed to do so. But ask something they're not ready for, like "When's the next leap year?" or even "Do I need rain boots today?" and you may not get a straightforward response.

I've discovered again and again that virtual assistants yield hit-or-miss results. They frequently misunderstand names, common words, local restaurants and cultural terms. Sometimes I repeat my query just to see how many times it takes to register. Other times, I give up and type in the search term myself.

Like talking to a 3-year-old

Have you ever heard someone talk to a new gadget for the first time? You tend to raise your voice and slow your speech. You think through an entire command before speaking it, because machines are literal and because a gadget's voice capture period quickly times out -- these devices do not suffer long pauses. If you don't say what you mean fast, you'll have to waste time speaking again.

C'mon, Alexa. Talk to me.

Sarah Tew/CNET

It's crossed my mind that talking to a smart device is sometimes like talking to a young child.

You have to speak clearly and deliberately, and part of you is always a little surprised and impressed when it does what you want. (I hang out with a lot of toddlers.)

I've had the best luck with Google and some of the worst with Siri, though I do still use Apple's software for dictation. My colleague Dan Ackerman swears by Amazon's Alexa, but it lacks the ability to dictate messages or fetch search results. "I'm best with factual questions," Alexa says back to me. At least "she" knows her limitations.

It's all Siri's fault... right?

I facetiously blame Apple, whose Siri voice assistant emerged as the iPhone 4S' killer feature. Back in October 2011, Apple crowed about Siri's ability to interpret natural language, but more than four years later, progress has stalled. Apple and competitors have added features here and there, but have yet to solve the problems of reliable understanding and execution that I encounter day to day.

In 2011, Siri really did stand out for its flexibility. Compared with the existing voice-dictation software of the day, which was used on BlackBerry devices and flip phones alike, the iPhone's Siri was much less rigid.

Siri wants to speak your language.

Marta Franco/CNET

Although voice-recognition company Nuance supplied the technology behind Siri as well as competing software, Siri alone understood "Give Heather a call" and "Dial Heather for me" as easily as it did the common, but comparatively dry command to "Call Heather."

Now in 2016, I find that casual forms of relatively simple requests still work best. It's the longer, more complicated -- more conversational -- demands on both Apple's Siri and its voice dictation software that require the kind of stilted speech you get when you slow down and pause between words in an effort to say... what... you... want... just... once. Or else correct garbled text by hand.

It'll all get better, eventually

Voice-recognition software is hard to make, and I'm confident that virtual assistants and voice dictation will improve in the future as people increasingly rely on back-and-forth "conversation" with their cars, smart-home appliances and watches.

In small ways, it already has. Microsoft and Google have both made it possible to ask follow-up questions of Cortana and Google Now. And overall, the "OK, Google" voice recognition engine does a great job of drawing on Google's search and maps databases to nail spellings and to match your sounds to actual meaning. Google alone reliably pulls up the most relevant search results for a query, a pleasant surprise every time.

Nailed it. Google Now's response to a follow-up question was right on.

James Martin/CNET

But that brings me back to the crux of the problem. Instead of being surprised when a voice assistant does understand us, shouldn't we be surprised when it doesn't? If voice input is truly how we'll control devices in the future, then we need better than what we have now. In fact, I think we deserve it.

Just think: Rather than amusing you with terrible jokes to seem likable and adept, what if your gadget could sustain a dialogue in which you research flight prices on multiple routes before finally booking through the device?

Imagine verbally checking Yelp ratings and OpenTable availability at several restaurants before making a reservation. And when you do, how awesome would it be if your talking tech actually offered to put that event in your calendar and email your friends, too?

Here's one final, humble request while we're at it. Let's see these voice assistants take the pain out of dictating punctuation by automatically adding the commas, periods and capitalization that are implicit in everyday speech. They may sound like robots when they read text aloud, but why should we sound like robots when we dictate it?

I mean really comma is that so much to ask question mark.