IBM stirs controversy by using Flickr photos for AI facial recognition

The million images were shared under the liberal licenses that Flickr photographers chose, but they probably didn't picture this.

An annotated photo from IBM's Diversity in Faces data set.

IBM

Some photographers who contributed photos to the Flickr photo-sharing site were surprised IBM used those same photos in a million-image collection to train AI face-recognition systems -- but perhaps they shouldn't have been.

The Flickr photos had been shared under a Creative Commons license, a framework under which people can loosen restrictions on photos, text, video or other material that otherwise would be protected by copyright. CC licenses can bar commercial use or require others using the photos to attribute them to their source, but the general idea is to make the work available for others to use.

"None of the people I photographed had any idea their images were being used in this way...It seems a little sketchy that IBM can use these pictures without saying anything to anybody," Greg Peverill-Conti, an executive at public relations firm SharpOrange whose photos were used, told NBC News Tuesday.

IBM used only photos licensed under Creative Commons, and IBM's legal team approved the program, a company representative said. The data is offered only to academic researchers through a project called Diversity in Faces. The faces are annotated with human observations about factors like sex and age and with geometric measurements, and they are intended to help researchers counter bias that can undermine AI fairness.

"We take the privacy of individuals very seriously and have taken great care to comply with privacy principles, including limiting the Diversity in Faces dataset to publicly available image annotations and limiting the access of the dataset to verified researchers. Individuals can opt-out of this dataset," spokesman Saswato Das said in a statement. "IBM has been committed to building responsible, fair and trusted technologies for more than a century and believes it is critical to strive for fairness and accuracy in facial recognition."

One lesson here: If you don't want your imagery used to train artificial intelligence systems -- or to appear in books, Wikipedia articles, art projects and corporate PowerPoint presentations -- choose your Creative Commons licenses carefully or don't use them at all. Even then you might be surprised, since IBM's use -- reduced-size images that have been significantly annotated -- is arguably transformative and therefore permissible even with copyrighted images under copyright law's fair-use provisions. So perhaps the only way to truly avoid having your photos used in AI is to avoid sharing them at all.

Creative Commons details

You might also like the Creative Commons ethos. Sharing data that researchers may freely use -- to rid AI systems of racial bias, for example, or to improve voice recognition, as with Mozilla's Common Voice project -- is arguably a laudable goal. 

The Creative Commons organization, a nonprofit that oversees the licenses, didn't comment on IBM's specific usage. But Chief Executive Ryan Merkley said the matter of faces used to train AI systems is broader than just a licensing issue.

"Our tools were built to solve for copyright, and they do that well," Merkley said. "But copyright isn't a good tool to address privacy, or research ethics, or surveillance AI."

On Wednesday, the organization published a blog post about the IBM-Flickr case and an FAQ about the AI situation more broadly.

One sticking point is whether IBM's use is noncommercial. It offered the images only to academic researchers, but IBM also benefits commercially from a higher profile in the world of AI. IBM didn't comment on the broader commercial issue of its program.

Merkley didn't pass judgment on IBM's use. But he did say the permission depends on how CC-licensed images are used, not who is using them. "Being a company doesn't necessarily mean you can't use non-commercial content," he said.

More than 700 of Peverill-Conti's photos are in the collection, and some photographers had trouble getting IBM to remove their photos from the data set, NBC News said. Peverill-Conti didn't respond to CNET's request for comment.

Flickr defends IBM's usage

Flickr's leader, SmugMug Chief Executive Don MacAskill, tweeted on Tuesday that IBM retrieved the photos before SmugMug acquired the photo-sharing site. However, he defended IBM's type of usage as adhering to the principles of Creative Commons.

"We love & support photographers and their right to choose their own licenses for their work. By default, they reserve all of their rights, and have the option to loosen them if they'd like," MacAskill tweeted.

"People didn't have to opt-in to the dataset because they had already opted into the Creative Commons license. They took action. This is the way licensing works. It's also the magic that enables artists & scientists all over the world to create & invent using CC-licensed works," he added.

Flickr has more than 400 million photos shared under Creative Commons licenses. Although Flickr eliminated a Yahoo-era plan that offered photographers a free terabyte of photo storage, it exempts Creative Commons shots from the storage limit that replaced it.

Originally published March 12 at 7:10 p.m. PT.
Update, 8:21 p.m. PT: Adds further comment from IBM.
Update, March 13: Adds further background and comment from Merkley and links to Creative Commons information on the subject.