Facebook's AI research could spur smarter AR glasses and robots

Rummaging through drawers to find your keys could become a thing of the past.

Queenie Wong Former Senior Writer

Facebook has been working on augmented reality glasses.

Getty Images

Facebook envisions a future in which you'll learn to play the drums or whip up a new recipe while wearing augmented reality glasses or other devices powered by artificial intelligence. To make that future a reality, the social network needs its AI systems to see through your eyes. 

"This is the world where we'd have wearable devices that could benefit you and me in our daily life through providing information at the right moment or helping us fetch memories," said Kristen Grauman, a lead research scientist at Facebook. The technology could eventually be used to analyze our activities, she said, to help us find misplaced items, like our keys.

That future is still a ways off, as evidenced by Facebook's Ray-Ban branded smart glasses, which debuted in September without AR effects. Part of the challenge is training AI systems to better understand photos and videos people capture from their perspective so that the AI can help people remember important information. 


Facebook says analyzing video shot from the first-person perspective is challenging for computers.


Facebook said it teamed up with 13 universities and labs that recruited 750 people to capture more than 2,200 hours of first-person video over two years. The participants, who lived in the UK, Italy, India, Japan, Saudi Arabia, Singapore, the US, Rwanda and Colombia, shot videos of themselves engaging in everyday activities such as playing sports, shopping, gazing at their pets or gardening. They used a variety of wearable devices, including GoPro cameras, Vuzix Blade smart glasses and ZShades video recording sunglasses.

Starting next month, Facebook researchers will be able to request access to this trove of data, which the social network said is the world's largest collection of first-person unscripted videos. The new project, called Ego4D, provides a glimpse into how a tech company could improve technologies like AR, virtual reality and robotics so they play a bigger role in our daily lives.

The company's work comes during a tumultuous period for Facebook. The social network has faced scrutiny from lawmakers, advocacy groups and the public after The Wall Street Journal published a series of stories about how the company's internal research showed it knew about the platform's harms even as it downplayed them publicly. Frances Haugen, a former Facebook product manager turned whistleblower, testified before Congress last week about the contents of thousands of pages of confidential documents she took before leaving the company in May. She's scheduled to testify in the UK and meet with Facebook's semi-independent oversight board in the near future.

Even before Haugen's revelations, Facebook's smart glasses sparked concerns from critics who worry the device could be used to secretly record people. During its research into first-person video, the social network said it addressed privacy concerns. Camera wearers could view and delete their videos, and the company blurred the faces of bystanders and license plates that were captured. 

Fueling more AI research


Doing laundry and cooking looks different in video from various countries. 


As part of the new project, Facebook said, it created five benchmark challenges for researchers. The benchmarks cover episodic memory, so a system knows what happened when; forecasting, so computers can anticipate what a person is likely to do next; hand and object manipulation, to understand what a person is doing in a video; understanding who said what, and when, in a video; and identifying who the partners in a social interaction are.

"This sets up a bar just to get it started," Grauman said. "This usually is quite powerful because now you'll have a systematic way to evaluate data."

Helping AI understand first-person video can be challenging because computers typically learn from images shot from the third-person perspective of a spectator. Challenges such as motion blur and shifting camera angles come into play when you record yourself kicking a soccer ball or riding a roller coaster.

Facebook said it's looking at expanding the project to other countries. The company said diversifying the video footage is important because if AR glasses are helping a person cook curry or do laundry, the AI assistant needs to understand that those activities can look different in various regions of the world. 

Facebook said the video dataset includes a diverse range of activities shot in 73 locations across nine countries. The participants included people of different ages, genders and professions.

The COVID-19 pandemic also created limitations for the research. For example, more of the dataset's footage shows stay-at-home activities such as cooking or crafting than public events. 

Some of the universities that partnered with Facebook include the University of Bristol in the UK, Georgia Tech in the US, the University of Tokyo in Japan and Universidad de los Andes in Colombia.