Guest(s): Mahima Gupta, Product Manager; Rakesh Ranjan, Director of AI Research
How do you make 3D creation as easy as generating an image—especially when high-quality 3D training data is scarce and the workflow is notoriously complex? In this episode, Pascal Hartig talks with Mahima Gupta (Product Manager) and Rakesh Ranjan (Director of AI Research) from Meta XR Tech about AssetGen, a foundation model for generating 3D assets from simple prompts.
They dive into why 3D is fundamentally harder than 2D (geometry, textures, rigging, and perspective), how AssetGen is built as a multi-stage pipeline (shape + texture), and what it takes to ship these capabilities into real creator tools.
They also explore the bigger ambition: moving from single assets to entire 3D worlds—using vision-language models to infer what a scene needs and how to lay it out in a coherent, navigable environment.
Pascal: Hello and welcome to episode 78 of the Meta Tech Podcast, an interview podcast by Meta where we talk to engineers who work on our different technologies. My name is Pascal, and Meta Connect might be over, but for us here on the podcast, this is just the beginning.
I have an uncharacteristically timely interview for you. If you have tuned into Meta Connect in the past couple of days, you will have seen multiple references, both by Zuck and our CTO Boz, to some stunning new features in Horizon Worlds to generate 3D assets and even entire worlds with nothing more than a text prompt.
These features are powered by AssetGen, a foundation model for 3D model creation, and I had the immeasurable pleasure of talking with Mahima and Rakesh about both the technical aspects of training a model where there simply isn't a lot of data around, and the unexpected challenges of integrating this into consumer products.
Even if you didn't tune into Meta Connect directly, you will have likely seen headlines about the new Meta Ray-Ban Display, which features an EMG wristband that translates the signals from your muscles, like subtle finger movements, into commands for your glasses.
I had a chat with the team behind that incredible tech in just the last episode, and I'm sure they are all thrilled that the live demo gods were on their side. If you haven't listened to the interview and want to learn how this all works, check out episode 77. And seriously, just subscribe to the show already.
But now let's talk AssetGen with Rakesh and Mahima.
I don't know about you, but my drawing and painting skills are pretty atrocious, and yet I could at least create a little stick figure in Photoshop and figure out how to animate it in After Effects.
But if you now ask me to create the most basic version of that in 3D, I just give you a blank stare, despite having Blender installed on my MacBook. Mahima and Rakesh and their team are on a mission to change that with AssetGen. Mahima and Rakesh, welcome to the Meta Tech Podcast.
Rakesh: Thank you.
Mahima: Thank you for having us. This is super exciting.
Pascal: I think so too, because this is something very novel. But before we dive into the actual topic, let's give you two a chance to introduce yourself to our audience. Mahima, can I start with you? How long have you been at Meta and what did you do before?
Mahima: It’s a great question. I was actually looking at the date, and I'm coming up on my three years at Meta, which is just hard to believe. It feels like a lifetime. I don't remember being anywhere else before that, so now you're making me think. But before Meta I was at Amazon, where I ran a bunch of the connected-device features for Alexa devices, and we had our footprint across every device.
I've been at GoPro in the past, so I've historically been more on the hardware and cross-platform side, but I'm very, very excited to be in AI given everything that's been going on recently.
Pascal: Very cool. Rakesh, what about you?
Rakesh: Yeah, I have been at Meta for about five and a half years now, which, similar to Mahima, feels like a lifetime, a lifetime of a lot of interesting things that we have done over these years. But before coming to Meta, I was at NVIDIA Research, where I worked on applying deep learning to real-time graphics, which was really new at the time. And now that's how graphics is done everywhere. So yeah, very exciting.
Pascal: And what is your team's mission? If you can put this into a few sentences.
Mahima: Yeah, I can take a stab at defining our mission statement, but taking a step back, I think one thing that Meta has done such a good job of is giving creative tools to anyone and everyone, right? Historically you needed to be an expert in video creation, and then we introduced Instagram and Reels for anyone to be able to create.
Same thing for Facebook: now anyone can communicate with anyone around the world and share these connections and these common ideas. Similar to that, our ultimate mission is to democratize creation. Historically, 3D generation, creating in 3D, and even thinking in 3D have just been incredibly tedious.
And we're confident that with our state-of-the-art AI models, we can transform creation for both social and entertainment uses.
Pascal: Rakesh, do you want to add to that?
Rakesh: Yeah, so we are part of Reality Labs, and the devices that we make work in the real world. And the real world is 3D. Understanding the real world requires you to understand 3D really, really well. The team that I represent, and I'm really just the face of a team of extremely talented individuals, has the highest talent density in 3D, so to speak, both on the side of 3D understanding and on 3D generation. In fact, I look at these two things as two sides of the same coin. As Mahima said, our goal is to make it easy to create 3D content, and to devise methods and devices that can understand your 3D environments and then augment them with the creativity that 3D generative AI brings.
So that's where we excel and what we are focused on.
Pascal: Yeah. Fascinating. So let's dive into this a little. I mentioned in my intro that I'm personally terrified of creating anything in 3D, and I'm probably not alone. Can you expand on that a little bit? What actually makes it challenging to create 3D assets?
Rakesh: Yeah. Unlike 2D, 3D has one more dimension, the depth, the Z dimension, and it is incredible how much complexity this one extra dimension brings into the process of creating 3D content. If you think about creating a 2D image in a paintbrush app, really all you care about is the two dimensions of your brush strokes and the colors, and pretty much with that you can get something that looks reasonably good. Now imagine doing that in 3D. You not only have to think about the X, Y, and Z dimensions, where every single point in space goes. You have to think about how to place the camera. You have to think about the perspective. You have to think about the topology of the mesh that gets created, or whatever representation you have. You have to think about rigging. You have to think about animation. That is all extremely complex, and that's why it requires such specialized skills. And if it's difficult for humans, it's non-trivial for AI models, too. That's why 3D has been behind 2D, but yeah, we are getting there.
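To make that gap concrete, here is a minimal sketch, in Python, of what even a bare-bones 3D asset has to carry compared to a 2D image. The field names are illustrative assumptions, not any particular engine's format.

```python
# A 2D image is essentially one grid of pixels; a 3D asset is a bundle of
# interdependent pieces. All field names here are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Image2D:
    pixels: list  # an H x W grid of RGB values, and that's about it

@dataclass
class Asset3D:
    vertices: list                                    # (x, y, z) points in space
    faces: list                                       # vertex-index triples (mesh topology)
    uvs: list                                         # 2D texture coordinates per vertex
    texture: list                                     # the appearance, mapped onto the surface
    skeleton: list = field(default_factory=list)      # rig joints, needed for animation
    skin_weights: list = field(default_factory=list)  # how vertices follow the joints
```

And that is before you even pick a camera and a perspective just to look at the thing.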
Pascal: And I think that even translates to the real world, because most of us will have something around us to sketch out a quick 2D drawing, right? But if I wanted to create a sculpture, I'm not sure I have the tools lying around for that right now, let alone the skills.
Rakesh: That's a great point.
Pascal: Mahima, do you have anything to add to this?
Mahima: Yeah, I think you nailed it. I have a little girl at the moment, and we're trying to teach her to draw, which is alone a big, big challenge, right? But drawing on a 2D piece of paper is so much easier than thinking in 3D. To Rakesh’s point, the world is 3D. So number one, you have to think in 3D, and then on top of it, you need the right tools, whether it's your fingers or a chisel or a hammer, and these tools cannot just be picked up by anybody. It requires skill, and historically it's taken years and years to build those skills. I mean, people have degrees to be able to create assets and objects in 3D. We can now enable people to do that with a prompt. That is the real magic behind it, and that is what democratizing creation looks like.
Pascal: Got it. So now we need to talk about why we actually care, because we do have very successful apps that mostly work in the 2D space right now. You mentioned Instagram; we have Reels; we have Facebook, where loads of people post pictures. Very few of them post 3D models. So this is probably the point where we need to talk about the metaverse.
So can you talk about how these two things converge?
Mahima: I can take a stab at this. Something people don't realize, and such a great question, Pascal, is that the gaming industry today is one of the largest entertainment industries in the world. I'm not an avid gamer, but this is a data point that completely blew me away.
And the application that we have today at Meta is Horizon and Horizon Worlds, and our VR devices bring us closer and closer to the 3D realism and 3D experiences that these platforms offer. But again, the challenge continues to be creating for these 3D surfaces, whether it is Horizon on mobile or Facebook games, which, by the way, I highly recommend; Kaiju is super fun to play.
Or it is putting on your headset and just looking around the worlds out there. Creating these games is immensely technical and requires expertise. So for creating for the metaverse, there are so many challenges: you have to have the skill set, you have to have the teams, you have to understand how to get customers and what their interests are, and you have to be very fast at publishing and storytelling.
So yeah, these are some of the things that come to my mind. I'm sure Rakesh has another depth of dimension here from a technical perspective.
Rakesh: Yeah, if you are creating a platform in the metaverse, there is this chicken-and-egg problem that every platform has to deal with. You have a limited pool of creators today, specifically today, because the tools are so difficult to use to create 3D content. Now, if you're a creator with these skills, you will look for platforms where there are users, because you want users to use your product.
And as a user, you would want to go to a platform where there is enough content to engage with. This is the chicken-and-egg problem that we want to solve: by lowering the floor, lowering the skills that are needed to create content. Even folks with creativity but without specialized skills should be able to create content, and that increases the pool of creators itself. In the extreme, you can imagine that every user is a creator as well. And as we move along that trajectory, we are basically solving that very core problem that a platform has to deal with.
Pascal: How do you then envision people using these tools as part of Horizon Worlds in the future?
Mahima: This is such a great question, because we've had to go back to the drawing board to think about what the creation experience journey looks like in the future. Today it's a very preset journey: downloading the tools on your PC, opening up the editors, and thinking about creating assets on this plane. That's just how creators have historically worked. Part of the job of our team is to think about the future: what is creation going to be like five years from now? Ultimately, my vision, and Rakesh and I tend to agree on this, is that creation should be as simple as thinking about something.
If you think, “hey, I wanna be in a world that has vampires and zombies,” boom, we can create that world for you, and we can create all the relevant building blocks. These would be assets and environments and skyboxes and soundscapes, things that can show up by just thinking about them.
Similarly, if you want a more educational world and you wanna think about what's happening inside a human brain, you can imagine that and prompt it into existence. So the future of creation, in my mind and my hope, is as simple as just uttering the right words, and boom, you get that world around you.
Pascal: Actually, what is the creation flow like today in Horizon if I want to import my own model? I've only checked it out in my headset so far and never engaged with the creator tools that presumably exist for it. What's the flow like?
Rakesh: So today, if you want to create a Horizon world, we provide these IDEs, these environments; it's today called the Horizon Unity editor. It's basically a workflow you can download, and you can create content: 3D content, entire worlds. You can create characters, you can animate them. So it is much closer to the traditional workflows, but it's evolving. We have already announced and released features into this editor that are gen AI based. A creator can now write text prompts of what they want, and the model will pop out those 3D assets. After that you can edit them, change them, and integrate them into your world. So what we are providing is already a sort of assistant to even the existing workflows in the Horizon editor. Going forward, of course, the North Star, as Mahima is describing, is that anyone who has an idea, as long as they can describe it with prompts, whether that's text, images, or videos, can get that 3D world.
Pascal: Will the end result be that you spend more of your time actually in VR, in your headset, while you create these worlds? Or do you think there will still be a role for standalone tools that you might open on your laptop instead?
Mahima: I was gonna say, I think it's a really good question, in the sense that we ultimately want to make anyone a creator, creating during play and during consumption. That being said, I feel like there's always a need for professional creators, and professionals always have a different perspective, right?
Even today on Instagram, when you see high-quality content or influencer content, it's often shot not on mobile but on different devices, and it's edited on desktop or laptop. I don't think that's going away, and Rakesh and I like to say that this is the tide that's going to raise all the boats.
So not only is anyone and everyone going to be able to create; these tools will also enable existing creators to create faster, higher-quality, and just more content. And for those creators, the desktop tools are there, and Meta is doing a really good job at continuing to improve and release those as well.
Pascal: That's actually a great comparison that makes a lot of sense to me, because I've been dabbling in Reels. I've definitely shot a few directly in the camera in Instagram, and then I've tried out the Edits app, for instance, but occasionally it is incredibly useful to use the Adobe suite and throw in a bunch of Premiere effects or even After Effects. I'm sure there will be a similar breakdown for this, right? You may shoot something directly in the moment in the headset, but then for something a little more complex, where you might want to do a bit of scripting, maybe you go back to your desktop app where you have the full feature set available.
Mahima: Exactly. I think that comparison is spot on. If you want to create, you can do it directly on mobile or in VR, but we also have all of these amazing tools available for desktop creators.
Pascal: Cool. So now let's talk a bit about the model behind all of this, that is AssetGen, and how you managed to train it. We know there are plenty of models out there that can generate increasingly high-quality images, and even though that is probably not exactly a solved problem yet, we can all see that we're almost at diminishing returns because they've gotten really good. For 3D, I imagine this is a fairly different story, because, as we've discussed before, I don't see too many assets, 3D assets, on my feed that I could just throw into a training process. So talk a bit about the training process behind AssetGen and why it's harder than one might think.
Rakesh: Yeah. As you alluded to, image generation is now pretty sophisticated; these models are really good. And the image generation models are trained on a very, very large number of images. We are talking about billions of images, potentially tens of billions. Compare that with the total number of 3D assets that have probably ever been created by mankind: it's a very tiny fraction of the number of images out there. So we have a fundamental limitation on the amount of data these models can be trained on, and AI models are really good at learning; you just throw data at them and they learn what you want them to. In the case of 3D, the way we train this model is with a pipeline that initially learns on somewhat noisy data: we pre-train on a larger scale of data and then fine-tune on curated, super-high-quality 3D datasets. But beyond that, AssetGen is not even a single model. It's a complex pipeline with many stages, some machine-learning based, some not, but primarily there are two big tasks that AssetGen has to solve. One of them is generating the geometry, and the other is generating the texture. Geometry is what gives the asset its 3D shape, and texture is the appearance, what you actually see, the colors. So we have a pipeline where, say, a user enters a text prompt or an image, which is ingested by the first part of the model. That part is conditioned on the prompt and generates the geometry of the 3D asset, and it's a 3D-diffusion-based pipeline.
Once we do that, we have another diffusion model that takes the original text prompt, as well as the geometry generated in the first stage, as conditioning, and generates multiple views of what you would see of that object from different angles. That is then re-projected, and a lot of post-processing steps give you the final textured asset. So it's fairly complex, but the final output is delightful.
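To make the shape of that pipeline concrete, here is a minimal Python sketch of a two-stage text-to-3D flow as just described: geometry first, then views conditioned on that geometry, then a re-projection bake. Every function here is a hypothetical stub standing in for a real model; this is not the actual AssetGen API.

```python
# Sketch of a two-stage text-to-3D pipeline. All names are illustrative stubs.
from dataclasses import dataclass, field

@dataclass
class Mesh:
    vertices: list = field(default_factory=list)  # (x, y, z) points
    faces: list = field(default_factory=list)     # vertex-index triples
    texture: str = ""                             # baked texture map

def sample_geometry(prompt: str) -> Mesh:
    # Stage 1 (stub): a 3D diffusion model conditioned on the text prompt
    # would denoise its way to a shape; here we fake a single triangle.
    return Mesh(vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0)], faces=[(0, 1, 2)])

def sample_views(prompt: str, mesh: Mesh, num_views: int = 6) -> list:
    # Stage 2 (stub): a second diffusion model, conditioned on both the prompt
    # and the stage-1 geometry, renders the object from several camera angles.
    return [f"view {i} of '{prompt}'" for i in range(num_views)]

def reproject(mesh: Mesh, views: list) -> Mesh:
    # Post-processing (stub): re-project the generated views onto the mesh
    # surface to bake one consistent texture map.
    mesh.texture = f"texture baked from {len(views)} views"
    return mesh

def generate_asset(prompt: str) -> Mesh:
    mesh = sample_geometry(prompt)      # geometry first...
    views = sample_views(prompt, mesh)  # ...then appearance views...
    return reproject(mesh, views)       # ...then bake the texture.

print(generate_asset("a weathered bronze kettle"))
```

The point of the structure is the hand-off: the texture stage never invents a shape, it only decorates the one the geometry stage committed to.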
Pascal: So Rakesh, you mentioned that right now there is this orchestra of different models that work together to generate the mesh and then project the textures on top of it. Is there any work on consolidating this, or would there be a benefit to moving this into a single diffusion model?
Rakesh: Yeah, that's a really good question. This is a very active problem that we are trying to solve. When you have these two models, mesh generation and texture generation, as separate steps, it introduces some room for error. One of the things we are trying to do is have a single unified 3D diffusion model where you diffuse not only on the shape but also on the colors and appearance, and that is exactly something we are actively working on. When solved, it'll make it much easier to get rid of some of the artifacts that otherwise come from steps like re-projection. So that is indeed a very promising direction.
Pascal: That's very exciting. Okay, so you've outlined your vision before and now you've described how you generate these individual assets for it. How do you go from one or two or more of these assets to entire worlds that you want to create with a prompt?
Mahima: This is such an exciting next step for our team, because this is the real unlock, right? If you really think about game generation, so far what we've been doing is building the building blocks that ultimately add up to a world. Now, this is a completely new way of thinking.
Now we want to think of a world top-down. Once you utter a prompt for a world, we infer what assets that world needs to contain, what different elements it needs. That is when AssetGen steps in and starts generating an inventory of the critical items for that world. Then the big area of innovation that the team is currently trying to unlock is layout generation. The layout generation pieces work in partnership with the asset pieces and lay those assets out in a very clever way, so that all of a sudden the prompt converts into a full world. Everything is stylistically consistent, geometrically consistent, and navigable; it has the properties you would expect from a manually artist-created world. Think of this as building blocks that can magically snap together to form an entire universe.
Rakesh: And to add to what Mahima said, going back to the point of datasets: as I was saying, for 3D objects we have far less training data available than for, say, images. When we go to scenes, you can imagine the amount of 3D scene data is even smaller, an even tinier fraction than what we had for objects. So how do you solve that problem? In order to go from single objects to a world, which consists of many objects and environments, you have to have a very good understanding of worlds, because a world consists of objects composed together. This is where vision-language models, or VLMs as they're called in the community, come into the picture. VLMs have a really good understanding of what scenes look like; they ingest images in their training process, and many, many of those images are of real-world scenes, so they have a very good understanding of how objects compose together and how they relate to each other. That's the power of VLMs that we bring into going from objects to scenes or worlds. And it's a fairly complex pipeline, as you can imagine. When a user gives a prompt like "I want a French medieval village," that's a pretty simple prompt. But how do you go from that one sentence to something that is a very visually appealing, game-playable 3D world? This is where our vision-language models come in: they interpret the prompt and convert it into something that is tied to models like AssetGen and the other pieces that generate the 3D layouts and the objects in them and decide how they're placed together. So that's roughly the direction that generating worlds needs, and that's what we are looking into.
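As a rough illustration of that orchestration, here is a toy Python sketch under heavy assumptions: `call_vlm` and `generate_asset` are hypothetical stubs, and, as Rakesh explains next, the real pipeline couples the models in representation space rather than passing text around like this.

```python
# Toy prompt-to-world orchestration. Every function is an illustrative stub.
import json

def call_vlm(instruction: str) -> str:
    # Stub for a vision-language model call; returns a canned scene plan.
    return json.dumps({
        "assets": ["stone well", "timber house", "market stall"],
        "layout": [
            {"asset": "stone well", "position": [0, 0, 0]},
            {"asset": "timber house", "position": [6, 0, 2]},
            {"asset": "market stall", "position": [-4, 0, 3]},
        ],
    })

def generate_asset(name: str, style: str) -> str:
    # Stub standing in for an AssetGen call (see the earlier sketch).
    return f"<3D asset: {style} {name}>"

def generate_world(prompt: str) -> dict:
    # 1. Ask the VLM to infer the inventory of assets the world needs and a
    #    coherent, navigable layout for them.
    plan = json.loads(call_vlm(f"Plan a game-playable 3D scene for: {prompt}"))
    # 2. Generate each asset in a consistent style, then place it per the layout.
    assets = {name: generate_asset(name, style=prompt) for name in plan["assets"]}
    return {"layout": plan["layout"], "assets": assets}

print(generate_world("a French medieval village"))
```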
Pascal: So what is the sequence of events, then? I'm probably not getting entirely right how the internals work, of course. But would the VLM basically spit out an image first, something 2D-ish based on the images it ingested, of what the scene might look like, and then instruct AssetGen to effectively create all the individual pieces of that scene? Is that roughly how it works?
Rakesh: That's a good way of conceptualizing the idea, because VLMs can take a prompt and generate an image, or even a textual description of which object goes where. That is one way of doing it, and it's a pretty good starting point in some sense. But you can imagine that when you go through a text modality, you can lose a lot of information, and that's where you need these models to talk to each other more natively, in the latent space, in the representation space. So it's no longer necessarily AssetGen getting a text prompt of "generate me an object," but rather the VLM talking to AssetGen directly and saying, hey, this is what the layout should look like. The two models become integrated much more deeply in the representation space, rather than talking to each other via text.
Mahima: One thing I wanna add, Pascal: what you just said actually very accurately represents the customer flow. I think what Rakesh is referring to is how the models work, but users will interact with the models in almost exactly the way you described. You type a prompt, you get a 2D image, and you can iterate in 2D space. When you're happy with the kind of world you want to see, that kicks off all of the models to pull these world elements together and generate a world for you.
Pascal: Right. That makes a lot of sense. And I would imagine that computationally, it's gonna be much cheaper to generate this first version as a 2D image before you kick off the full 3D generation process.
Mahima: That's exactly right. And this is exactly how AssetGen also works today, because it's much cheaper to iterate in 2D than in 3D, and that's what we've been encouraging our creators to embrace.
Pascal: Right. What kind of times should I expect? This is probably gonna be outdated by the time we release this, because everything gets faster, both through computational capacity increases and algorithmic improvements. But how long do people need to wait right now to get this kind of first preview, and then for the larger scene?
Rakesh: Right now, for the very first step that Mahima mentioned, going from a text prompt to an image, that is usually just a few seconds, three to four seconds, and you get a preview of what you're going to get. Once you select that, the whole AssetGen pipeline kicks in, and that's a little over two minutes right now. But you can imagine this is an area that is moving really fast. We have prototypes that are much faster, and we expect this will keep getting faster, while keeping the quality pretty high, very soon. I expect the time is not far away when we can generate an entire 3D object in a matter of a few seconds.
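As a toy illustration of that preview-first flow, here is a small asyncio sketch; the function names are stand-ins for the steps Rakesh describes, and the sleeps are shrunk placeholders for the real timings noted in the comments.

```python
# Cheap 2D preview first; expensive 3D generation only after approval.
import asyncio

async def preview_2d(prompt: str) -> str:
    await asyncio.sleep(0.003)  # stands in for the ~3-4 s text-to-image step
    return f"<2D preview of '{prompt}'>"

async def generate_3d(prompt: str) -> str:
    await asyncio.sleep(0.12)   # stands in for the ~2 min full AssetGen run
    return f"<textured 3D asset for '{prompt}'>"

async def create(prompt: str, approve) -> str | None:
    # Iterate in cheap 2D space until the user likes what they see...
    preview = await preview_2d(prompt)
    if not approve(preview):
        return None             # cheap to reject and try another prompt
    # ...and only then pay for the expensive 3D generation.
    return await generate_3d(prompt)

print(asyncio.run(create("a vampire castle", approve=lambda p: True)))
```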
Pascal: That sounds like a fairly good workflow. And again, it's potentially one of the reasons why there is, right now at least, a bit of a need for a desktop version, because waiting in VR is just really boring. You don't have much to do besides looking at the spinner, whether it's a loading spinner for an entire world or a single asset. Although I can also imagine that as you're waiting for the bigger generation to complete, there might be other things you can tweak in the meantime.
Mahima: I'm totally with you. We've been experimenting with creating world elements in VR, and it's very likely we'll be able to create small worlds in a couple of minutes as well. So there's a lot of innovation happening in this area, and one thing the last couple of years of AI have taught me is to take no timing and no capability for granted. Everything is about to change; everything is going to see innovation.
Pascal: For sure. And talking about continuous improvements, how do you ensure that the outputs of your models remain high quality? What does your feedback loop look like at the moment?
Mahima: Yeah. If you've been keeping up, at least on the product management side, the big thing in product management and AI right now is writing evals. Model outputs are fairly ambiguous, so you need the right evaluation criteria to make sure the outputs of those models are giving you exactly what you're looking for.
So evals are where our team spends a bunch of time. It starts from PRDs, of course, but it's the evals where you really, really want to nail down the specific wins and the specific outputs you expect from a specific model. Once the model is developed and we internally feel it's meeting our quality bar and giving us what we want, we have a multi-step process to make sure that, before it goes to creators, we're very confident the outputs will be awesome.
First and foremost, our team does evaluations. Following that, our internal technical artists run probably a few weeks of evaluations, and this is a quick and tight feedback loop: they'll eval every day, they'll send us their feedback, we'll incorporate it and improve the model, and we're just really agile about those improvements. After that, we do something called a scaled evaluation. We'll produce several assets and send them out to "golden eyes", a term for folks who really understand what to look for in these technical outputs, what counts as a win or a loss: make sure the geometry works, make sure the texture consistency is there. We have something like a hundred criteria. And after that, my personal and most important benchmark is the customer. Once we've passed the scaled evaluations, we open up these models and release them via an A/B test to a select set of users and customers, and then we're very active about collecting the right data, the right metrics, the right UXR studies. Again, we're very tight about incorporating that feedback and just continuing that loop.
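As a toy illustration, a scaled evaluation in this spirit might look like the sketch below. The criteria, thresholds, and asset fields are invented examples, not Meta's actual evals.

```python
# Toy scaled-evaluation harness: a bank of pass/fail criteria over a batch of
# generated assets, with aggregate pass rates gating the release.
CRITERIA = {
    "geometry_valid": lambda a: a["face_count"] > 0,
    "texture_consistent": lambda a: a["texture_score"] >= 0.8,
    "prompt_adherence": lambda a: a["clip_score"] >= 0.25,
}
RELEASE_BAR = 0.95  # required pass rate per criterion (invented threshold)

def evaluate(assets: list[dict]) -> dict[str, float]:
    """Return the fraction of assets that pass each criterion."""
    return {
        name: sum(check(a) for a in assets) / len(assets)
        for name, check in CRITERIA.items()
    }

def meets_quality_bar(assets: list[dict]) -> bool:
    return all(rate >= RELEASE_BAR for rate in evaluate(assets).values())

batch = [
    {"face_count": 12480, "texture_score": 0.91, "clip_score": 0.31},
    {"face_count": 9732,  "texture_score": 0.87, "clip_score": 0.28},
]
print(evaluate(batch), meets_quality_bar(batch))
```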
Pascal: Yeah, and as we've seen, users are incredibly creative, so especially for something this inherently non-deterministic, it will be incredibly exciting to see what they come up with. There will probably be ways to break it and create absolute gibberish, but I guess that's part of the fun. In most kinds of software, bugs are just annoying: things crash or don't work. Whereas in games, bugs and glitches can often be a source of enjoyment in their own right. So I'm very curious to see what the first results will be.
Mahima: Yes. And one thing we've been noticing, which has been a really surprising data point, is that creators who are creating worlds using gen AI tools tend to have higher engagement at the end of the day. So we're enabling creators not just to have faster feedback loops and faster time to market, but also to make more engaging games, and therefore to have a revenue flywheel that works both for them and for the company. I think this is a very positive signal that ultimately everyone wants to see.
Pascal: Yeah. Talking about the kinds of people who can benefit from AssetGen and this kind of model in general: do you think this will primarily be indie creators who, like me, don't have the skills to create assets otherwise, or do you think there's a broader market for this?
Rakesh: Definitely it opens up the space for people without super-specialized skills to use these tools and create content. But that's not the only thing, because even professional and semi-professional creators can use these tools to move much faster.
I mean, even for very professional creators, creating just a character mesh, a textured mesh of a character, can potentially take hours. These tools can give them the initial starting point; they can get 70-80% of the way there and just have to do the last mile of work, so they can move much faster.
So it's a tool that lifts up everyone.
Pascal: That sounds very similar to coding, where it is often very handy to use one of these tools to create an initial prototype, a little outline, a bit of boilerplate, and then refine it. You make sure it actually works the way you want and spend the additional time on that. But just having that kind of kickstart, getting the annoying first steps out of the way and having something to show, is incredibly helpful for these kinds of workflows.
Mahima: Yeah, it's gonna be so exciting when users can use this all day.
Pascal: Yeah, so where are we actually today? Is there anything of what you've described available today or any cool demos that you can highlight?
Mahima: Oh my gosh, yes. We're very proud of this. We shipped our models very recently: AssetGen and TextureGen are the two models that are in market today. Creators can use them in the Horizon tools. If you download the Horizon tools on your PC, you can create for Horizon, and our gen AI tools, amongst the other gen AI tools that Horizon offers, are easily accessible and can just be picked up and played with. This is, of course, the beginning of the journey of 3D asset creation, and 3D scene creation in general is very much on our roadmap: it's coming, and we're working very actively towards it. So look out for that.
Pascal: Super exciting, and you just got ahead of my next question, but I'm gonna ask it anyway. What is next for you and your team? What are you most excited about? Maybe there's more than what you've just discussed.
Rakesh: So, as I said earlier in this podcast, 3D understanding and 3D generation are two things we view as two sides of the same coin. So far, our team has shipped a lot of AI models that work on 3D understanding; this is the perception stack that ships with every Quest device. And now we have 3D generation models that can create 3D content. Our goal is to bring these two things very close to each other, because if you understand the 3D world very well, you will generate 3D worlds very well, and if you can generate 3D worlds very well, you can understand them very well. You can already see the same kind of interplay today with native image models: the models that understand images really well can generate images really well. That is essentially the research direction my team is taking: bringing these two together so that perception and generation share a common representation space.
And this is a really exciting field. It's a cutting edge that brings together all of the AI research in LLMs, in 3D computer vision, in perception, in generation. It's super awesome; it's very engaging.
Pascal: Yeah. Even for a layperson like me, it is quite surprising how much you can leverage developments that happen in other spaces, like image diffusion models and LLMs, and how it all fuses together. Mahima, do you have something to add?
Mahima: I was gonna say, to round this out, I will end where we started, which is that the world is 3D. We exist in a 3D world, not a 2D world. Everything around us: we think in 3D, we operate in 3D, we interact with this world in 3D. And the work that Rakesh and team have been doing is just the very tip of the iceberg of the innovation that's possible and the opportunities we can create for Meta and for our users worldwide.
How to think in 3D, how to operate in 3D, how to create in 3D: the applications are just massive, right? Horizon and Horizon Worlds are the first thing we wanna solve, because we know it is a company priority. But think about autonomous driving, autonomous vehicles, and agents training in a world where things can be easily understood through simulation. There are just so many opportunities this technology can unlock for us, and we're very, very excited to be a part of it.
Pascal: It is very exciting indeed, but we are running out of time. So at this point, all I have to say is thank you both for creating brand new tools that let us express ourselves in 3D, and for joining me here on the Meta Tech Podcast.
Mahima: Thank you for having us. This was so fun.