Apr 01 2025
From research to product: Multimodal AI in Ray-Ban Meta glasses

Multimodal conversational AI is at the heart of Ray-Ban Meta glasses, enabling a seamless experience that combines audio, video, text and image inputs. When you ask your Ray-Ban Meta glasses, “Hey Meta, tell me what I’m seeing,” multimodal AI is what enables the glasses to respond in real time.


Shane M., a research scientist at Meta, studied multimodal AI for years before his work eventually made its way into Ray-Ban Meta glasses. His research on AnyMal, a unified multimodal large language model (LLM), was crucial to scaling the technology.


“A few years into our work, there was a monumental improvement in language model capabilities,” Shane recalls. “That was a really exciting time because suddenly, we could combine the ongoing research on language models with our work on multimodal AI to build this brand new system that could process images and text at the same time.”


Once the model was finalized, Shane and his team found an immediate product fit in Ray-Ban Meta glasses.


“Glasses were a good form factor to house this model, as they naturally enable the AI system to see exactly what the user sees. Wearers simply ask the model questions to learn more about the world around them, which unlocks different everyday AI use cases, like asking the glasses what you should make for dinner based on what you can see in your fridge, or asking about the history of the building in front of you.”


New technology rooted in open source


Shane’s original multimodal model, AnyMal, was built entirely using open source components—an approach that has remained important to him throughout his work at Meta today.


“What I appreciate about open source is its reproducibility. With the right components, you can replicate the same experiments exactly. This level of transparency is key for the advancement of AI as a whole.”


“Our Open Source Initiative is one of the biggest benefits of working at Meta. No matter which domain you're working on—whether computer vision, natural language processing or LLMs—we make sure our research findings are made available to the public. I’m very passionate about this as a researcher.”


Shane also leveraged the resources at Meta to complete his work and share his insights with the open source community. He remembers conducting years of trials before his team finally perfected the multimodal AI model that billions of Meta users benefit from today.


“It takes a tremendous amount of GPU resources to conduct qualitative evaluations at that level,” Shane explains. “Not every researcher has access to that kind of support. We aimed to be as transparent as possible in our paper, sharing both negative and positive results from our ablation studies so other researchers could build on our findings without needing to repeat the same experiments.”


The wide range of engineering and technical talent at Meta also contributed to Shane and his team’s success, giving them access to experts across multiple AI fields to help guide their research.


“We encountered numerous challenges when developing multimodal AI, and many of its technical components were built from scratch. At that time, training on top of an LLM was almost prohibitively expensive—not to mention difficult. Even evaluating the model’s accuracy was a challenge because we couldn’t just measure it against a standard multiple-choice test. Thankfully, there was another team of Meta engineers who were pioneering LLM evaluation benchmarks and assessment protocols. Not only was their work technically fascinating, but we were able to adapt it for our own research.”


“I work in the field of multimodal AI, which bridges computer vision and natural language understanding—two fields often moving in completely separate directions. My role is to combine the best ideas from each world into one joint model. The ability to freely collaborate with experts at Meta from both fields has been key to overcoming challenges and advancing my research.”

"The ability to freely collaborate with experts at Meta from both fields has been key to overcoming challenges and advancing my research.”

How the tech of tomorrow is impacting people today


Today, Shane is more motivated than ever to deepen his understanding of multimodal AI. One use case that’s driving him forward is Be My Eyes, an app that pairs blind or low-vision users with a network of sighted volunteers. Once connected, the app enables volunteers to see through the Ray-Ban Meta glasses to assist vision-impaired wearers with everyday tasks.


“Hearing how Meta AI has helped vision-impaired users inspires me to keep improving our model’s accuracy,” says Shane. “Meta has a bottom-up culture that gives us the freedom to choose our own research topics. This autonomy and trust have been an incredible motivator for me personally. At the same time, I also feel a strong responsibility to ensure the technology we release positively impacts users and meets the highest standards. As a researcher, Meta provides me with incredible resources, but it’s up to me to show how my ideas can create meaningful impact for billions of people around the world. I don’t take having this support for granted.”


Looking ahead, Shane is excited to expand on Meta’s current computer vision capabilities and explore how other forms of AI—like agentic models—can open the door to more AI-driven innovations.


“Computer vision understanding has come so far, but there are still areas where we can improve. As an industry, I don’t think we’re far off from unlocking three-dimensional vision understanding, and I’m excited to learn how that technology will advance our work even further.”
