Researchers are crawling the internet for photos of people wearing face masks to improve facial recognition algorithms.
Your face mask selfies aren't just getting seen by your friends and family -- they're also getting collected by researchers looking to use them to improve facial recognition algorithms. CNET found thousands of face-masked selfies up for grabs in public data sets, with pictures taken directly from Instagram.
The COVID-19 pandemic is causing a surge in people wearing face masks, and facial recognition companies are scrambling to keep up. Face masks cover up a significant portion of what facial recognition needs to identify and detect people -- essentially threatening the future of a multimillion-dollar industry unless the technology can learn to recognize people beyond the coverings.
To do that, they need more masked photos to train their algorithms.
In April, researchers published the COVID19 Mask Image Dataset to GitHub, containing more than 1,200 images collected from Instagram. A month earlier, researchers from China compiled a database with more than 5,000 masked photos they gathered online.
The creators of the April database used their AI startup, Workaround, to help comb through the images and label each one as masked or unmasked, said Wafaa Arbash, the company's CEO.
"We were inspired by all the companies that were launching free tools and everything they can do to help," Arbash said. "We have these public images from Instagram, so these are not private images. We were just searching and getting the right data."
Facial recognition companies have long used people's pictures without consent to train their algorithms. Civil liberties advocates contend that facial recognition technology threatens privacy and free speech, and warn that there are almost no laws preventing abuse of these surveillance tools.
Clearview AI, a controversial facial recognition company, claimed it has a First Amendment right to scrape more than 3 billion images from social networks to use for its database.
Governors in more than half of the US states are mandating face masks in public because the coverings help prevent the spread of COVID-19. The masks have also slowed down the spread of facial recognition, since the garments block key parts of your face that the technology usually analyzes.
Some facial recognition providers have turned to asking their own staffers to send in face-masked selfies, as well as digitally editing masks onto photos they already have. Digitally adding masks to photos is also how the US National Institute of Standards and Technology plans to test facial recognition algorithms.
But there are only so many employees a company can ask to take selfies, and edited face mask photos may not be as effective as organic images for training algorithms. Facial recognition companies also need a diverse set of pictures so the algorithms can better recognize women, people of color, people of different ages and a variety of mask types.
Arbash said the photos in her company's public database came from searching Instagram for hashtags related to masks. The team gathered about 3,000 pictures from the platform and narrowed them down to a set of 1,200. The posted sample images included a photo of a child; Arbash said its inclusion in the database was possibly an error.
Arbash said the company didn't ask the people included in the database for permission to use their face mask selfies to help develop facial recognition, and that anyone who wanted to be excluded could make their account private. The people included aren't aware they're in this data set, she said.
"We're not making any money off of this, it's not commercial," Arbash said. "The goal and the intention was to help any data science or machine learning engineers who are working to fix this issue and help with public safety."
The links to the Instagram images have since expired, but the data set's page put out a public call asking whether anyone knew how to retrieve the photos. Arbash said that if there's enough interest, the company would consider looking into ways to gather more face mask images.
"We do not allow third parties to collect or use photos posted by our users in this way, without their consent. We are continuing to investigate this," Facebook, which owns Instagram, said in a statement.
The Real World Masked Face Dataset claims to be the largest masked face data set, with more than 5,000 masked faces of 525 people gathered from the internet. The compilation comes from researchers at Wuhan University in China, where the coronavirus outbreak began.
A research paper accompanying the data set, released on March 23, says the images are of public figures gathered "from massive internet resources." The researchers didn't respond to a request for comment.
The practice of grabbing people's photos from social media to train facial recognition algorithms isn't new, but the focus on face masks because of COVID-19 is. Developers feel pressure to build face mask detection technology quickly as a public safety measure, but ethical issues arise when the images are collected without consent.
"People might not like the idea that their picture could be used to develop a database that could go to law enforcement or government surveillance in a foreign autocratic country like China," said Jake Laperruque, a senior counsel at the Constitution Project. "You're putting photos out there, maybe not with an expectation of privacy, but you have an expectation of how it can and can't be used."