Deepfakes are a major security concern because they can make it appear that people did or said things they never did. But startup Descript is trying to use the artificial intelligence technology for something simpler: podcast editing. On Wednesday, the company launched a beta of a new feature in its Descript podcasting software. The feature, called Overdub, is designed to let users create a realistic text-to-speech model of their own voice by uploading a few minutes of audio.
The technology, which comes from partner company Lyrebird, is meant to save podcasters from having to rerecord or splice audio when a mistake or change is made. Instead, you simply type the words you want to add into the recording, and Overdub makes it sound as if your voice is saying them.
It's expressly not meant for creating deepfakes, according to a blog post from Descript CEO Andrew Mason. To train the voice model, users need to record themselves speaking randomly generated sentences, so others can't use preexisting recordings to create a model using someone else's voice, the post said.
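Descript hasn't published how that check works. As a rough illustration only, the idea of prompting a speaker with unpredictable sentences can be sketched in Python. Everything here is an assumption made for the example: the word list, the function names, and the premise that a text transcript of the recording is available for comparison.

```python
import secrets

# Hypothetical word pool; a real system would draw on a far larger vocabulary.
WORDS = ["amber", "river", "quietly", "seven", "lantern", "drifts",
         "marble", "window", "beneath", "copper", "garden", "slowly"]


def make_prompt(n_words=8):
    """Build an unpredictable sentence for the speaker to read aloud."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))


def normalize(text):
    """Lowercase, drop punctuation, and split into words."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return kept.split()


def matches_prompt(prompt, transcript):
    """Accept a recording only if its transcript has exactly the prompted words."""
    return normalize(prompt) == normalize(transcript)
```

Because the prompt is generated fresh each time, audio recorded before the prompt existed, such as an old interview, would almost certainly fail the comparison.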
However, Descript said in an ethics FAQ that although its product is unique, the foundational research is already widely available. Future products from other companies may not carry the same usage constraints.
"That's why it's important for us to showcase the technology to the world in a controlled environment," Jose Sotelo of Lyrebird said in an email to CNET. "So that the world can be better prepared against potential malicious attacks."
Descript stores the recordings used to create a voice double, but users can delete them at any time, Sotelo said.