Facebook hopes to boost AI fairness, decrease bias by sharing real-life data

The social media company paid more than 3,000 people in the US to participate in videos used to improve AI.

Queenie Wong Former Senior Writer


April 8, 2021 6:00 a.m. PT



Facebook said Thursday it's releasing new data that could help researchers improve artificial intelligence systems so they're less biased and more fair.

AI is already being used in various tech products from self-driving cars to facial recognition. While technology can make our lives easier, civil rights groups have raised concerns that biased AI systems could harm minorities. Studies have shown, for example, that facial recognition systems have a harder time identifying women and darker-skinned people.

Part of the problem could lie in the data that tech workers use to train computer systems.

"These biases can make their way into data used to train AI systems, which could amplify unfair stereotypes and lead to potentially harmful consequences for individuals and groups -- an urgent, ongoing challenge across industries," Facebook's AI researchers said in a blog post on Thursday.

To help tackle fairness and bias in AI, Facebook said it paid more than 3,000 people in the US of different ages, genders and skin tones to talk about various topics and sometimes show different facial expressions. The resulting data set, known as Casual Conversations, includes 45,186 videos of people having unscripted conversations.

Participants also provided their own age and gender, which is likely more accurate than relying on a third party or a model to estimate this information. Rather than using images from a public database, Facebook asks people whether they want to provide their data to improve AI and gives them the option to remove their information, said Cristian Canton Ferrer, a Facebook AI research scientist. "It's a great example of a responsible kind of data set," he said.

Trained annotators also labeled people's skin tones and different lighting conditions, which could affect how the color of a person's skin appears in a video. Canton Ferrer, who was using Facebook's video chat device Portal, said the data set could help researchers evaluate whether an AI-powered camera has a harder time tracking someone with dark skin in a dimly lit room. People also have different accents, which smart speakers sometimes have a harder time recognizing.

"This is a first step," Canton Ferrer said. "Fairness is a very complex and multi-disciplinary kind of question, and you cannot answer just with one data set."