Meta's Voicebox Generative AI Makes Anyone Speak a Foreign Language

All the AI needs is a 2-second audio clip to generate speech.

Oscar Gonzalez Staff reporter
Oscar Gonzalez is a Texas native who covers video games, conspiracy theories, misinformation and cryptocurrency.

Meta is working on a new AI. (Image: Sarah Tew/CNET)

Generative artificial intelligence tools such as ChatGPT and Google's Bard use natural language processing and machine learning to generate text in response to a query. Meta's new generative AI, Voicebox, does things a little differently: it produces audio clips.

Voicebox, announced Friday by Facebook's parent company Meta, can synthesize speech from a 2-second audio sample. From that clip, it can match the speaker's audio style, generate speech from text, or re-create a portion of speech that was interrupted by external noise. Voicebox can also take that sample and use the same voice to read text aloud in other languages, such as French, German, Spanish, Polish or Portuguese.

Meta says Voicebox could give natural-sounding voices to virtual assistants or to nonplayer characters in the metaverse, the digital worlds where people will gather to work, play and hang out. It could also let visually impaired people hear messages from their friends read aloud in those friends' voices.


Voicebox is still a work in progress and is not yet available to the public. Meta says it recognizes that the AI could be misused to cause harm and is working on an effective way to distinguish authentic speech from audio generated by Voicebox.

Editors' note: CNET is using an AI engine to help create some stories. For more, see this post.