Samsung deepfake AI could fabricate a video of you from a single profile pic

Even the Mona Lisa can be faked.

Joan E. Solsman Former Senior Reporter

Artificial intelligence developed by a Samsung lab in Russia can fabricate video from a single image, including a painting. 

Egor Zakharov

Imagine someone creating a deepfake video of you simply by stealing your Facebook profile pic. The bad guys don't have their hands on that tech yet, but Samsung has figured out how to make it happen.  

Software for creating deepfakes -- fabricated clips that make people appear to do or say things they never did -- usually requires big data sets of images in order to create a realistic forgery. Now Samsung has developed a new artificial intelligence system that can generate a fake clip from as little as one photo.

The technology, of course, can be used for fun, like bringing a classic portrait to life. The Mona Lisa, which exists solely as a single still image, is animated in three different clips to demonstrate the new technology. A Samsung artificial intelligence lab in Russia developed the technology, which was detailed in a paper earlier this week. 

Here's the downside: These kinds of techniques and their rapid development also create risks of misinformation, election tampering and fraud, according to Hany Farid, a Dartmouth researcher who specializes in media forensics to root out deepfakes. 

When even a crudely doctored video of US Speaker of the House Nancy Pelosi can go viral on social media, deepfakes raise worries that their sophistication will make mass deception easier, since they are harder to debunk.


"Following the trend of the past year, this and related techniques require less and less data and are generating more and more sophisticated and compelling content," Farid said. Even though Samsung's process can create visual glitches, "these results are another step in the evolution of techniques ... leading to the creation of multimedia content that will eventually be indistinguishable from the real thing."

Like Photoshop for video on steroids, deepfake software produces forgeries by using machine learning to convincingly fabricate a moving, speaking human. Though computer manipulation of video has existed for decades, deepfake systems have made doctored clips not only easier to create but also harder to detect. Think of them as photo-realistic digital puppets.

Lots of deepfakes, like the one animating the Mona Lisa, are harmless fun. The technology has made possible an entire genre of memes, including one in which Nicolas Cage's face is placed into movies and TV shows he wasn't in. But deepfake technology can also be insidious, such as when it's used to graft an unsuspecting person's face into explicit adult movies, a technique sometimes used in revenge porn. 

Deepfake videos usually require a big data set of images to fabricate a fake video of someone, but an artificial intelligence system developed by Samsung created a fake clip from a single picture. 

Egor Zakharov

In its paper, Samsung's AI lab dubbed its creations "realistic neural talking heads." The term "talking heads" refers to the genre of video the system can create; it's similar to those video boxes of pundits you see on TV news. The word "neural" is a nod to neural networks, a type of machine learning loosely modeled on the human brain.

The researchers saw their breakthrough being used in a host of applications, including video games, film and TV. "Such ability has practical applications for telepresence, including videoconferencing and multi-player games, as well as special effects industry," they wrote.

The paper was accompanied by a video showing off the team's creations, which also happened to be scored with a disconcertingly chill-vibes soundtrack. 

Usually, a synthesized talking head requires you to train an artificial intelligence system on a large data set of images of a single person. Because so many photos of an individual were needed, deepfake targets have usually been public figures, such as celebrities and politicians. 

The Samsung system uses a trick that seems inspired by Alexander Graham Bell's famous quote about preparation being the key to success. The system starts with a lengthy "meta-learning stage" in which it watches lots of videos to learn how human faces move. It then applies what it's learned to a single still or a small handful of pics to produce a reasonably realistic video clip. 
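
To make that two-stage idea concrete, here is a minimal toy sketch in Python. It is not Samsung's system and involves no faces or neural networks. It assumes each "person" is just a straight line whose slope and offset sit close to shared values, and every name in it (sample_task, meta_learn, adapt and so on) is made up for illustration. The slow stage learns a shared starting point from many examples. The fast stage then adapts to a brand-new line using a single point, and the result is compared with adapting from a blank slate.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def sample_task(rng):
    """A hypothetical 'person': a line close to the shared shape y = 2x - 1."""
    return 2.0 + rng.gauss(0, 0.05), -1.0 + rng.gauss(0, 0.05)


def sample_points(task, rng, n):
    """Draw n (x, y) observations of one task, with x in [-1, 1]."""
    a, b = task
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, a * x + b) for x in xs]


def meta_learn(rng, num_tasks=200, points_per_task=50):
    """Slow stage: fit many data-rich tasks and average them into a starting point."""
    fits = [
        fit_line(sample_points(sample_task(rng), rng, points_per_task))
        for _ in range(num_tasks)
    ]
    a0 = sum(a for a, _ in fits) / num_tasks
    b0 = sum(b for _, b in fits) / num_tasks
    return a0, b0


def adapt(start, points, steps=20, lr=0.1):
    """Fast stage: a few gradient steps on squared error, using only the given points."""
    a, b = start
    n = len(points)
    for _ in range(steps):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in points) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def mse(params, points):
    """Mean squared error of the line `params` on the given points."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)


rng = random.Random(0)
prior = meta_learn(rng)                      # learned once, from lots of data
new_task = sample_task(rng)                  # a "person" never seen before
one_shot = sample_points(new_task, rng, 1)   # the single "profile pic"
held_out = sample_points(new_task, rng, 100)

error_with_prior = mse(adapt(prior, one_shot), held_out)
error_from_scratch = mse(adapt((0.0, 0.0), one_shot), held_out)
print(f"one-shot error, starting from the learned prior: {error_with_prior:.4f}")
print(f"one-shot error, starting from scratch:           {error_from_scratch:.4f}")
```

In this toy, the run that starts from the learned prior should land much closer to the new line than the run that starts from nothing, even though both see the same single data point. That is the intuition behind doing the expensive learning up front.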

Unlike a true deepfake video, the results from a single image or a small number of images end up fudging fine details. For example, a fake of Marilyn Monroe in the Samsung lab's demo video missed the icon's famous mole. It also means the synthesized videos tend to retain some semblance of whoever played the role of the digital puppet, according to Siwei Lyu, a computer science professor at the University at Albany in New York who specializes in media forensics and machine learning. That's why each of the moving Mona Lisa faces looks like a slightly different person.

Generally, a deepfake system aims at eliminating those visual hiccups. That requires meaningful amounts of training data for both the input video and the target person. 

The few-shot or one-shot aspect of this approach is useful, Lyu said, because it means a large network can be trained on a large number of videos, which is the part that takes a long time. This kind of system can then quickly adapt to a new target person using only a few images without extensive retraining, he said. "This saves time in concept and makes the model generalizable."

The rapid advancement of artificial intelligence means that any time a researcher shares a breakthrough in deepfake creation, bad actors can begin scraping together their own jury-rigged tools to mimic it. Samsung's techniques are likely to find their way into more people's hands before long.

The glitches in the fake videos made with Samsung's new approach may be clear and obvious. But they'll be cold comfort to anybody who ends up in a deepfake generated from that one smiling photo posted to Facebook. 

Originally published May 23.
Update, May 24: Adds information about a doctored Nancy Pelosi video.