This post was published before the Facebook company became Meta.
Enhancing the experiences people have while using Facebook products calls for a deep understanding of the visual world around us, and computer vision (CV) researchers at the Facebook company are inventing new ways for computers to learn from cues. “The world and how we connect with one another is changing,” Manohar P., Director of Artificial Intelligence (AI) at Facebook, observes. “The teams working on computer vision at Facebook are helping us adapt to those changes and power a wide range of forward-thinking innovations.”
Manohar recently joined fellow leaders from Facebook AI Research (FAIR), Facebook AI Applied Research (FAIAR), and Facebook Reality Labs (FRL) in a virtual discussion to share learnings and insights from their research. Here’s what they said about their work, their vision for the future, and what it means to be a leader in AI.
Driving computer vision forward with AI applied research
The CV teams at Facebook AI are tackling three key areas in computer vision: recognition, video understanding, and 3D vision. “Whether it’s pushing self-supervised learning—which is getting a lot of attention because we’re seeing exciting results every few months either within Facebook or from the research community—or scaling research to larger models and data sizes, we keep working to improve accuracy,” research manager Wan-Yen L. says.
Openness is a value that distinguishes the organization from other research labs. “We’re lucky to have some of the best researchers in the world, and we publish our findings and open-source the code so everyone has the opportunity to reproduce our results easily,” Wan-Yen notes. “With openness, we can leverage help from the entire community to advance AI.”
The team’s openness also empowers them to collaborate with different teams across their main focus areas. Wan-Yen shares, “For example, our team is working closely with the Video Understanding team to develop a joint codebase so we can use it to accelerate both research and productization. We have people from the research team who are interested in making product impact. We also have people from the Video Understanding team who have research expertise. There isn’t a sharp boundary between the teams to separate our work. It’s very integrated, and that’s the best part.”
Creating magical AR and VR experiences with the XR People team
Have you used an Oculus headset or made a video call with Portal? If so, you’ve experienced the technology Facebook’s XR People team works on every day. Part of Facebook Reality Labs, the team’s mission is to help people feel more present with each other, even when they’re apart. Currently focused on developing face and body reconstruction technology, the XR People team’s work will enable new representations of people in AR and VR platforms, whether they’re realistic or stylized versions of you. “Our work ranges from eye tracking and face tracking to modeling, reconstruction, body tracking, and more,” Elif A., an engineering manager, shares. “Our work is highly interdisciplinary, as it involves computer vision and machine learning capabilities that capture, analyze, and reconstruct human appearance, movement, expression, and interactions. Then we decode what’s happening and present it in a visually pleasant way.”
To make magical experiences like these, the team looks for people who have a variety of backgrounds, skills, and talents. “On the Development team, we sit between research and production,” Elif explains. “We specifically look into what the research teams do, what they’re working on and how they’re innovating, and we explore the areas they’re looking at. We also talk with different product teams to understand the problems they’re solving and their vision for the future.”
“We need people from all aspects of this field, including researchers, software engineers, and experienced developers,” Elif says. “Collaborating with innovative people, technical people, and scientists helps us make the right decisions.”
Using data to decode video understanding
Matt F., a research science manager, explains that each time someone uploads a video to Facebook, technology plays an important role in helping people find it, understand what it’s about, and, ultimately, feel inspired to watch it. If the video violates our policies, technology also plays an important role in removing it or adding it to the queue for content moderators to review. “AI is influential at every stage of the process, whether it’s having a basic understanding of the video content, what language it’s in, or who’s in it,” he says.
Video understanding poses many complex challenges, ranging from replicating what humans understand naturally to overcoming technological hurdles. Matt explains that modalities like speech, audio, and text are a big focus right now. “Most current models only understand one to three seconds of video at a time. We want to understand what’s going on in a few minutes.”
To accomplish this, the team leverages data to run tests and engineer solutions. They also conduct research, complete production work, and examine things like user errors, hashtag use, and noise to understand each video upload.
“We look at the gaps,” Matt explains. “This helps us understand content and get it to the right place. With this technology, you want a really large vocabulary with all kinds of special interests like places, locations, and topics. In the history of AI, the everyday things you take for granted are the hardest: for machines, winning a chess match turned out to be easy compared to recognizing chess pieces. I would be excited to deliver something here to understand everyday events, and I hope we can soon.”
Whether it’s empowering new experiences like Portal’s Smart Camera, protecting people by using AI to proactively detect and remove policy-violating photos, or better connecting people to the content they care about, the teams working on CV at Facebook are pushing for deeper understanding and solving problems at an unprecedented scale. Together, they’re building life-changing experiences for billions of people around the world.