
In the Age of AI, Who Owns Your Voice? The Cautionary Tales Are Adding Up

Commentary: The OpenAI–Scarlett Johansson case is a warning to all of us that the voices we hear may well have been detached from their original owners.

Katie Collins Senior European Correspondent

Scarlett Johansson's predicament is a warning to all of us.

Vittorio Zunino Celotto/Getty Images

In the Disneyfied fairy tale The Little Mermaid, the titular character Ariel is swindled out of her voice by the sea witch Ursula. It's only once her voice has been separated from her body that Ariel realizes how vital it is to her personhood. 

This week, the Hans Christian Andersen classic has looked like a parable for our times as movie star Scarlett Johansson grappled publicly with the question of who owns her voice and what she can do about it if someone mimics it for commercial gain.


When Johansson declined a request to lend her voice to OpenAI's latest generative AI tool, GPT-4o, company CEO Sam Altman served up another actor who sounded convincingly like her. Altman denied trying to make GPT-4o sound like Johansson. But his denial rang hollow given that on the same day OpenAI unveiled GPT-4o, he posted a one-word tweet — "her" — the title of the 2013 sci-fi film in which Johansson voiced the female AI assistant Samantha.

Whatever the case may be, the Johansson situation is a warning to all of us. However closely attached we are to the sound of our own voices, AI has created a world in which this sense of ownership is under threat. Whether we give our voices freely or they're stolen from us, AI can be used to make it sound like we've uttered things we've never said.

Audiobox and me

I just got a little taste of that myself. At VivaTech in Paris this week, I tried out Meta's generative AI voice tool Audiobox, which the company first announced last summer. The tool works by listening to an audio recording of your voice and synthesizing it so it can be used to read text out loud, just as if you were speaking the words yourself. You can try it here.

I recorded several seconds of my voice using an iPad, then typed in a phrase I wanted the tool to read out loud to me. Within a minute, it read the sentence back to me in my voice. Hearing my voice emerge from the iPad, reading a sentence I'd never actually spoken out loud, was an uncanny experience.


Meta's AI tool is just one of many.

Meta/Screenshot by CNET

Under Meta's privacy policy, your AI Audiobox voice is yours alone to use, but a whole raft of companies capable of creating AI versions of voices, including ElevenLabs and Speechify, is popping up. I gave my voice freely for the sake of this experiment, but it could just as easily have been recorded from the radio, TV, a YouTube video or a podcast and made into an AI clone without my knowledge or permission. People whose voices are out there in the public domain — celebrities, for example — are the easiest targets, but from voice notes to phone recorders, no one is immune.

Remember the election robocall that mimicked President Joe Biden's voice?

In the age of AI, we have little choice but to accept that our voices are never truly safe from spoofing. Just as we've grown wary of AI-generated images, we have to be open to the reality that not everything we hear can be believed.

When nothing can be taken at face value, trust is more important than ever. The people creating our tech need to show us they can be trusted to do the right thing by the people they're designing for — us. Companies such as Meta and Google are familiar by now with people publicly scrutinizing their trust and safety policies and their ability to enforce those policies. Even if we're still uneasy, most of us understand these efforts and have a sense of where we stand.

OpenAI needs to build trust, not break it

Newer companies such as OpenAI without a legacy of trust and safety still have to earn their stripes. This week, Altman's company fell at one of the first hurdles.

OpenAI may not have trained GPT-4o on Johansson's voice, but by finding a way to mimic it regardless of the actor's wishes, the company clearly signaled that it considers people's vocal likenesses to be fair game. That Johansson refused to lend her voice to OpenAI, yet the company made its tool sound like her anyway, suggests it wasn't interested in honoring her refusal to consent.

Altman rejects this version of events, saying in a statement that OpenAI cast the actress voicing the chatbot Sky before ever reaching out to Johansson. "Out of respect for Ms. Johansson, we have paused using Sky's voice in our products," he added. "We are sorry to Ms. Johansson that we didn't communicate better." OpenAI didn't immediately respond to a request for further comment.

Consent has also been at the core of copyright lawsuits filed against OpenAI and Microsoft over the texts used as training material for their large language models.


AI has made questions of copyright and ownership of IP "a little bit fuzzy," Dario Amodei, CEO of AI company Anthropic, said at VivaTech. Amodei worked for OpenAI before leaving to set up Anthropic, maker of the Claude chatbot, and he's been critical of his former employer. Anthropic has so far stuck to text rather than introduce "other modalities," precisely because of the complexity of these questions, said Amodei.

As AI becomes increasingly smart and capable, Amodei said, we're going to need to grapple as a society with the realization that AI is going to infringe, sometimes uncomfortably, on what humans are able to do. 

I would argue that we're already there.

Like other technologies before it, AI has arrived in our lives before the guardrails are in place. Governments are scrambling now to rectify this and provide tech companies with a rulebook to adhere to. Earlier this year, for instance, the US Federal Communications Commission outlawed the use of AI-cloned voices in robocalls.

As this debate plays out, we might not be able to lock our voices away completely, but we can protect ourselves in the brave new world of AI by remembering that the voices we hear coming out of our TVs, radios, phones, PCs and other devices may well have been detached from their original owners.

Editors' note: CNET used an AI engine to help create several dozen stories, which are labeled accordingly. The note you're reading is attached to articles that deal substantively with the topic of AI but are created entirely by our expert editors and writers. For more, see our AI policy.