IBM hopes 1 million diverse faces can reduce bias in AI

A more diverse data set can help advance fairness in facial recognition tech, says IBM.

Marrian Zhou Staff Reporter

Jan. 29, 2019 2:01 p.m. PT

IBM made a million-face data set to help reduce bias in facial recognition technology. (Image: IBM Research)

IBM Research on Tuesday released a new data set of 1 million images of diverse human faces, with the aim of advancing fairness and accuracy in facial recognition technology.

"For the facial recognition systems to perform as desired -- and the outcomes to become increasingly accurate -- training data must be diverse and offer a breadth of coverage," wrote John Smith, an IBM fellow, in a blog post. "The images must reflect the distribution of features in faces we see in the world."

The release follows reports that AI-powered facial recognition systems show bias. Last week, an MIT study found that Amazon's Rekognition technology had a harder time recognizing the gender of darker-skinned women, and made more gender-identification mistakes overall, than competing technologies from Microsoft and IBM.
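Audits like this one generally work by comparing error rates across demographic subgroups rather than reporting a single overall accuracy figure. The short Python sketch below illustrates that idea; it is not the MIT researchers' code, and the function name `error_rates_by_group` and the sample records are hypothetical, invented purely for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return the fraction of misclassified examples in each subgroup.

    Each record is a (group, true_label, predicted_label) tuple.
    """
    totals = defaultdict(int)  # number of examples seen per subgroup
    errors = defaultdict(int)  # number of wrong predictions per subgroup
    for group, truth, pred in records:
        totals[group] += 1
        if truth != pred:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical toy predictions, made up for illustration only (not real results).
sample = [
    ("lighter-skinned men", "male", "male"),
    ("lighter-skinned men", "male", "male"),
    ("darker-skinned women", "female", "male"),
    ("darker-skinned women", "female", "female"),
]

rates = error_rates_by_group(sample)
print(rates)  # {'lighter-skinned men': 0.0, 'darker-skinned women': 0.5}
```

A gap between subgroups, like the one in this toy output, is the kind of disparity that a more diverse training set is meant to shrink.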

While researchers already work with attributes like age, gender and skin tone, these features alone can't adequately characterize everyone, according to IBM. Other features also need to be considered, such as facial symmetry, facial contrast, head pose, and the length or width of the eyes, nose, forehead and mouth.

IBM's data set, called Diversity in Faces, uses 10 coding schemes covering features such as head length, nose length, forehead height, facial ratios, age, gender, pose and resolution.

The million-face data set is available today to researchers around the world on request.