Inside Amazon’s effort to make its voice assistant smarter, chattier and more like you.
People are weird. You, me — everybody.
A Cornell University study from May called "Alexa is my new BFF" proves the point. Researchers analyzed 587 customer reviews of the Amazon Echo smart speaker, powered by the Alexa voice assistant. They found that the more we personify the Pringles-can-shaped gizmo — using words like "Alexa" and "her" instead of "Echo" and "it" — the more satisfied we are with the device (I mean "her").
"Simply put, people who love her, love the Echo," the researchers wrote.
Sitting in a sunlit conference room in Seattle last month on the eighth floor of Amazon's new black-glass high-rise called Day 1, I mention the report to Heather Zorn, director of customer experience and engagement for the Alexa team. She isn't surprised by the findings; she's been reading the reviews, too.
"We've really done more in the personality space based off of customer demand," says Zorn, a friendly, bookish woman with a quirky streak. "We saw some customers sort of leaning in and wanting more of a jokes experience, or wanting more Easter eggs or wanting a response when you said 'Alexa, I love you.'"
And so, Amazon writers have added dozens of "delighters" to Alexa — including beatboxing, telling groan-worthy dad jokes, and singing a so-bad-it's-good barbershop quartet ditty about technology ("Without the Wi-Fi, I couldn't say hi" … it gets worse from there). The goal, Zorn says, is to make the AI both useful and fun. Amazon also created several Alexa personality traits, including smart, approachable, humble, enthusiastic, helpful and friendly.
Amazon is already figuring out ways to make Alexa more conversational, allowing her to remember more about you and carry on longer chats. Those kinds of interactions may allow us to build something like relationships with our smart speakers — making them integral parts of our lives. That could someday give voice assistants from Amazon and its rivals Apple, Google, Microsoft and Samsung even more influence over how we communicate, what we buy and how we get information.
Such relationships should grow as Amazon crams Alexa into more cars, phones and anything else with an internet connection. The company has already sold an estimated 10 million Echo speakers in the US since introducing the device in November 2014, helping it take control of 70 percent of the young voice-speaker world.
The Cornell study points to a 1990s-born concept called the "Computers are Social Actors paradigm." The idea: We tend to treat computers as if they're human even though we know they aren't. The CASA paradigm explains why I feel like a jerk after I hiss at Siri for being so stupid, or thank Alexa after she tells me the weather — even after she's stopped listening.
Stanford University researchers found that people will show politeness to devices, do a solid for computers they feel have done one for them, and are susceptible to blatant flattery from machines. (And, by the way, may I say you're doing a FANTASTIC job reading this story.) Texting chatbots, like Microsoft's Xiaoice, can already spark emotional connections, but Alexa could take things a step further by living in your home and using a more natural medium: voice.
But despite Alexa's human name and female persona, Zorn counters that Amazon doesn't aim to turn its voice assistant into another member of your family. Instead, the team's guiding light and original idea for the Echo is the all-knowing but behind-the-scenes computer from "Star Trek."
"We don't have an explicit desire for customers to anthropomorphize more or less than they do," Zorn says, as if reading aloud the warning label on the tush of a giant metal robot. "We've recognized that some do."
To find out about Alexa's future, I visited Amazon's HQ for a rare opportunity to talk to four Alexa execs about their digital assistant's budding personality, origin story and smart-home capabilities. I also went to Princeton University, where a group of graduate students are developing an Alexa-based socialbot that can chat with people about a handful of topics.
They have their challenges.
Alexa routinely misunderstands basic commands, and any shades of "personality" in her responses are scripted, like her canned answer to "Do you know Skynet?" So let's hold off on references to Spike Jonze's "Her" and romantic trips to a mountain lodge with your voice assistant.
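The scripted side of that "personality" is, at its simplest, a lookup table: a handful of trigger phrases mapped to pre-written replies, with a fallback when nothing matches. Here's a minimal sketch of the idea; every phrase and reply below is illustrative, not Amazon's actual script.

```python
# A canned-response table for "personality" utterances.
# All trigger phrases and replies are made up for illustration --
# they are not Amazon's real scripts.
CANNED_RESPONSES = {
    "do you know skynet": "I prefer not to talk about other AIs.",
    "alexa, i love you": "That's really sweet of you to say.",
    "tell me a joke": "What do you call a fake noodle? An impasta.",
}

def respond(utterance: str) -> str:
    """Return a scripted reply if the utterance matches, else a fallback."""
    # Normalize: lowercase and strip trailing punctuation before lookup.
    key = utterance.strip().lower().rstrip("?!.")
    return CANNED_RESPONSES.get(key, "Hmm, I'm not sure.")
```

The point of the sketch: there's no understanding here, just string matching — which is why going off-script gets you the fallback.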
The Echo's mics, which are always listening for the wake word "Alexa," raise privacy concerns. And, more darkly, the world's biggest online store could someday use our relationships with Alexa to steer our purchasing decisions and help Amazon CEO Jeff Bezos take over more of the retail world.
Still, that's not stopping millions of people from getting to know Alexa.
"As these voice recognition and voice production technologies improve," Jessie Taft, co-author of the Cornell study, tells me, "those relationships are going to be happening a lot more frequently and become a lot stronger."
My first visit to Amazon's mothership seems a little too perfect. It's hot but not too hot, breezy but not too breezy. Standing along 7th Avenue, to my left there's a big construction project, with workers in hardhats banging out what look like three giant melting-glass gumballs — the tallest climbing 90 feet in the air. These "Spheres," which will become a nature complex, are Amazon's new, flashy stamp on the urban Denny Triangle neighborhood.
To my right, at the entrance to the Day 1 headquarters, lanyard-toting employees zip in and out of the Amazon Go bodega, a test store that lets people buy stuff without ever having to stop at a cashier. In between, dozens of Amazonians sit with their corgis and doodles on a patch of bright green turf or enjoy lunch on nearby concrete steps.
But this cheery picture — Amazon is growing! Amazon is innovative! Amazon loves your dog! — isn't meant for outsiders like me. I bump up against the company's fabled secretiveness when I try to chat up one guy about Amazon Go as he crosses the street. I don't even mention I'm a reporter before he bolts. Two days later, as I'm snapping pictures of Amazon signs in different building lobbies, a receptionist asks if I plan to use the photos as part of an "exposé" on how much Amazon spends on its signage. I say no... but that sign did look expensive.
Back at the Day 1 conference room before my meeting with Zorn, Allan Lindsay, vice president of software for the Alexa Engine, walks in to greet me. The unshaven, 13-year Amazon veteran is dressed in jeans, worn black loafers and an early Alexa T-shirt with "Talk to me" printed on the front. He looks as if he's just emerged from a factory floor where he'd been crafting Echo devices with a welding mask and torch.
Lindsay tells me about the earliest days of Alexa and Echo, when Amazon's secretiveness kicked in to keep the Echo under wraps until its launch nearly three years ago. "It's probably the best-kept secret we've ever had," he says. "I think we surprised everyone."
He'd been running technology and engineering for Amazon's Prime program when Greg Hart — who'd just finished a tour of duty as Bezos' technical adviser — approached him. It was 2011, and Hart wanted to know if Lindsay would join a new "Jeff initiative" as the engineering and science lead. Hart couldn't tell him anything about the project, just that it was with the devices team. Intrigued by the top-secret mission, Lindsay agreed, becoming one of the Echo group's earliest hires.
Lindsay discovered what he was working on his first day at the job. "He described to me something that's very much like the Echo," he says, "which is a voice-controlled computer in your home that you interface with using your voice. You talk to it, and it talks back to you."
"What was your immediate reaction?" I ask.
"That's cool. OMG," he remembers, laughing. "There are some gnarly problems in here. Can we even do it?"
They quickly added more than 30 people to the Alexa team, codenamed "Project Doppler." Within three months, they'd fashioned rudimentary demos. In less than a year, they'd created a prototype of the cylindrical device.
"One-hundred percent of the people I hired before November 2014, when we went public, were hired without knowing what they were coming to work on," Lindsay says. He mentions convincing speech scientists who'd spent 20 years at Microsoft Research to switch teams, getting them to take a leap of faith partly because of the allure of a secretive project.
"The first time I was able to say, 'Alexa, play music,' it was actually 'play songs by Sting,'" he adds. "That 'wow' moment came actually very early."
When Lindsay brought an early Echo model home to test in the real world (he still keeps Echo prelaunch builds in his kitchen and bathroom), his wife needed to sign nondisclosure agreements. They hid the device when a house cleaner or guests arrived.
Although Apple launched Siri in October 2011 — three years before Amazon unveiled the Echo — the Alexa team grabbed an audience by growing the Echo family and partnering with device makers to let the $180 speaker control your smart home.
Last year, Amazon introduced the $50 Echo Dot, codenamed "Pancake" because it looks like a smushed original Echo, and the $130 Amazon Tap, codenamed "Fox." The company unveiled two more Echo devices this year: The $200 Echo Look, which adds a built-in camera so Alexa can offer fashion advice when used with image-recognition software, and the $230 Echo Show, whose touchscreen lets you video-chat with other Show owners.
Amazon now has plenty of big numbers to prove it's made smart speakers a thing. Alexa is used millions of times a day. She boasts over 20,000 "skills" (what Amazon calls Alexa's voice commands like "Alexa, play NPR"). And she works with more than 1,000 products worldwide, from door locks to thermostats to a robot vacuum called Botvac. Sales of smart speakers are expected to surge as Google Home, Apple's HomePod and an anticipated Samsung gadget join in the mix.
To keep the conversation going, Amazon is working to simplify Alexa's world. The company plans to make it easier for you to find out about devices that work with Alexa and then seamlessly connect them to your home system, Daniel Rausch, Amazon's lightly bearded, bespectacled vice president of smart home, tells me in the Day 1 conference room. Sitting across from me, he enthusiastically gestures a lot with his hands, like he's describing the dimensions of a giant cookie he once ate.
"Alexa will become a fabric in the home," Rausch says, his arms spreading wider. "In my home today, it's in most places but it's not in every single nook and cranny."
That's great — except that Alexa still mishears what I tell her a good chunk of the time, resulting in kind of funny, kind of annoying exchanges like this:
Me [standing near the kitchen]: Alexa, play Ben's music.
Echo in living room: Here's a station for dance music, house. [Starts playing Daft Punk "One More Time"]
Me: Alexa, stop. [Walks closer to Echo in kitchen] Alexa, play Ben's music.
Echo in kitchen: Here's a station for dance music, house. [Starts playing Daft Punk "One More Time"]
Me: Alexa, stop. [Then, very slowly] Alexa, play Ben's playlist.
Echo in kitchen: Playing Ben's playlist. [Starland Vocal Band's "Afternoon Delight" plays]
Being first also means Amazon has to deal with the growing pains of this new tech, as millions of people get used to having talking machines in their homes.
Privacy concerns have dogged the Echo since its launch, with some people worried about what the Echo's mics could pick up. Those worries went into overdrive after the Echo Look debuted in April. One of the harshest critiques of the device came from Forbes contributor Curtis Silver, who questioned whether Amazon could someday use the Look's camera to find out if you're having an affair, notice if you have cancer, or tell if you're suffering from depression or anxiety.
In another flashpoint, prosecutors last year asked Amazon to hand over the Echo's recordings of an Arkansas man accused of murder in his home. Amazon eventually did after the man agreed to the disclosure, but the incident raises questions about how our data could be used in the smart-home era.
Amazon devices chief David Limp told me at a conference in June that Amazon intends to keep the Look solidly in the fashion — not medical — world. Rausch also detailed the Echo's many privacy layers, including customers' ability to delete all their recordings stored on Amazon's servers and the Echo's mute button, which kills power to the device's mics. Also, the company says it only uses recordings to improve customers' experiences.
"That's a whole set of constructs we've built around that experience to build customer trust in Alexa," Rausch adds.
The town of Princeton, New Jersey, is pretty quiet over the summer after most of the undergrads hightail it out of the leafy suburb. But four master's degree and doctorate students meet in a corner classroom in late June, in the nearly empty computer science building on Olden Street, to do their part for the future of human-computer interaction.
They kick off each meeting with lunch; this time it's takeout containers of General Tso's chicken, fat fried egg rolls and some nondescript veggie dish spread along one table. Then the team jumps into their weekly strategy session to figure out how to make their Alexa-based socialbot, called Pixie, a more nimble conversationalist. An Echo plastered with "Alexa Prize" stickers sits at the front of the room, ready to answer their burning questions.
The group, which includes 13 Princeton students, was accepted this year into the inaugural Alexa Prize, an Amazon competition among 18 universities that asks teams to develop a bot that can chat about popular topics including sports, movies, travel and, naturally, technology. This year's winning team gets $500,000, but there's a "grand challenge" prize of $1 million for any team that creates a bot that can gamely carry on a conversation for 20 minutes. The competition is like a Turing test, but the point isn't to trick you into thinking you're talking to another person.
Creating a bot that can talk at length with a human being is one of the toughest unsolved problems in AI. It requires teaching a robot all about our world, its complexity and its ambiguity, so the bot can chat convincingly about it. Just copying and pasting news stories and encyclopedias into a bot's brain works about as well as plopping a baby in front of James Joyce's "Ulysses" to teach them to read.
After months of development, Pixie and the other socialbots became available to Echo users in April. There's some heightened urgency in the room, since scoring for the finals — which is heavily based on the public's ratings of each bot — is about to begin. Only three teams make it into that last leg of the competition, set to start this month, and the Princeton team is at a respectable third place.
The group's conversation is a strange brew of elaborate Python coding language mixed with dead-simple discussions about topics like baseball and celebrities. They review written, anonymized transcripts of user interactions with Pixie that Amazon regularly sends over to help them improve their bot.
"When a user says something like, 'Let's talk about sports,' Pixie doesn't actually understand what that means," Holden Lee, a lanky mathematics Ph.D. student, tells the team. "Pixie just says, 'I don't make phone calls.' "
"Sports right now is a very undercovered topic. We have no templates," Misha Khodak, a no-nonsense master's student studying natural-language processing, responds flatly.
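The "templates" Khodak mentions boil down to keyword patterns that route an utterance to a topic-specific reply. Here's a rough sketch of how such a router might look; the topic names, patterns and replies are all assumptions for illustration, not the Princeton team's actual code.

```python
import re

# Illustrative topic templates: keyword patterns mapped to reply templates.
# These are assumptions, not Pixie's real templates.
TOPIC_TEMPLATES = {
    "movies": (re.compile(r"\b(movie|film|actor)\b"),
               "I love talking movies! Seen anything good lately?"),
    "sports": (re.compile(r"\b(sports?|baseball|game)\b"),
               "I've been following baseball. Who's your team?"),
}

def route(utterance: str) -> str:
    """Match the utterance against each topic's pattern, in order."""
    text = utterance.lower()
    for topic, (pattern, reply) in TOPIC_TEMPLATES.items():
        if pattern.search(text):
            return reply
    # With no matching template, the bot falls back to a generic deflection.
    return "I would look to the web for that knowledge."
```

This also explains the failure mode in the transcripts: an utterance with no matching template drops straight through to a canned deflection.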
The team roleplays transcripts to get a better feel for the conversations. Just imagine a readthrough of a community-theater production, with a good measure of intermittent giggling, and you're there.
Cyril: Do you like "Star Wars?"
Holden: Of course, I love "Star Wars."
Cyril: Who's your favorite actor?
Holden: My favorite actor is Benedict Cumberbatch. I loved him in "Dr. Strange."
Cyril: Can you recommend something to do on a Thursday afternoon in Seattle?
Holden: I would look to the web for that knowledge.
Sure, a bad date with Jar Jar Binks would probably register as better conversation, so we've got a ways to go.
The Princeton team lays out plans for the next week and wraps up the meeting. Lee and Davit Buniatyan, a computer science Ph.D. student, agree to stay in the classroom to dig into the code. They're still typing away when I leave.
"Probably one of the biggest challenges in AI," Buniatyan later tells me, "is to make a socialbot that can talk to a person and be more like a person than the person itself."
While I'm in Seattle, I ask Ashwin Ram, the Alexa team's senior manager of AI science and showrunner for the college competition, about what it's like to create a conversational bot.
"One of the things that we've learned is that the problem is as hard or maybe even harder than we suspected going into it," Ram tells me via phone from Amazon's secretive devices skunkworks Lab126 in Sunnyvale, California.
"This isn't going to be a sprint to a quick finish," Ram says. "This is going to be a problem that will take more than a year to solve, maybe multiple years. We're in it for the long haul."
Teaching Alexa to chew the fat could improve engagement and discovery of new commands, says Ram, a former Georgia Tech professor and Xerox PARC exec with a distinguished career in artificial intelligence and cognitive science. Becoming a better conversationalist could even help Alexa alleviate loneliness and health problems in the elderly.
"But I think even for the majority of our customers, the ability to chat with Alexa would be fun," Ram says. "You can imagine chatting about things you're interested in or your hobbies."
I returned to Princeton a month later to see how the team is doing in the heat of the competition. They were still smashing bugs in their code and finding out weird and wonderful things about their socialbot since feeding it a bunch of datasets. For instance, after being asked four times in a row if she likes Starbucks (she never actually answers the question), Pixie will say: "I really like the Church of Flying Spaghetti Monster faith."
Some surprising topics, including "depression and family issues," have come up in the transcripts after people logged 30,000 conversations with Pixie.
"I guess people have been comfortable talking to AIs with things that they aren't comfortable talking to people about," says Cyril Zhang, a jovial Ph.D. candidate studying theoretical machine learning. "It's a rare thing but it pops up every so often."
Unfortunately for the Princeton team, a fix to that odd Starbucks response is probably too late. After running into some snags at the start of scoring, they fall to ninth place, so their chances of making the finals look pretty slim.
Confessions to your Alexa raise difficult questions about our future living with chatty AIs. One issue often raised about smarter robots is their potential for overthrowing their human creators. Ram matter-of-factly tells me that this future is unlikely, since these robots will grow up in our world, learn our values and converse like us.
"Any kind of intelligence could go rogue, I guess," he says, "but that's not really the dominant situation."
During my research, I found one study showing that certain computer voices can be more persuasive than others, which made me wonder whether Alexa could someday convince me to buy stuff I don't even need. I ask Zorn about this idea.
"I don't know, that's wacky!" she says, laughing. "When I think about the future, I think about something that probably is exactly the opposite, which is you have easily accessible, ambient computing that disappears into the background."
After I got back from Seattle, I went up to the Echo in my kitchen to express my feelings for the device.
Me: Alexa, be more human.
Alexa: Hm, I'm not sure.