Facebook said Thursday it's releasing new data that could help researchers improve artificial intelligence systems so they're less biased and more fair.
AI is already being used in various tech products from self-driving cars to facial recognition. While technology can make our lives easier, civil rights groups have raised concerns that biased AI systems could harm minorities. Studies have shown, for example, that facial recognition systems have a harder time identifying women and darker-skinned people.
Part of the problem could lie in the data that tech workers use to train computer systems.
"These biases can make their way into data used to train AI systems, which could amplify unfair stereotypes and lead to potentially harmful consequences for individuals and groups -- an urgent, ongoing challenge across industries," Facebook's AI researchers said in a blog post on Thursday.
To help tackle fairness and bias in AI, Facebook said it paid 3,011 people in the US of different ages, genders and skin types to talk about various topics and sometimes show different facial expressions. A total of 45,186 videos of people having unscripted conversations were included in the data set, known as Casual Conversations.
Participants also provided their own age and gender, which is likely more accurate than relying on a third party or model to estimate this information. Rather than using images from a public database, people are being asked if they want to provide their data to improve AI and also have an option to remove their information, said Cristian Canton Ferrer, a Facebook AI research scientist. "It's a great example of a responsible kind of data set," he said.
Trained annotators also labeled people's skin tones and different lighting conditions, which could impact how the color of a person's skin appears in a video. Canton Ferrer, who was using Facebook's video chat device Portal, said the data set could help researchers evaluate if an AI-powered camera has a harder time tracking someone with dark skin in a dimly lit room. People also have different accents, which smart speakers sometimes have a harder time recognizing.
"This is a first step," Canton Ferrer said. "Fairness is a very complex and multi-disciplinary kind of question, and you cannot answer just with one data set."