Guest(s): Sabrina Wasserman, Software Engineer
Mobile GraphQL is a framework Meta uses to fetch and manage data in mobile apps with GraphQL—powering products across the family of apps. But building a mobile GraphQL platform isn’t just about sending queries. It’s about making product development easier while improving the user experience.
In this episode, Pascal Hartig talks with Sabrina, a Software Engineer on Meta’s Mobile GraphQL Platform Team, about what it takes to build client-side GraphQL at scale. They unpack how consistency works (publish/subscribe updates without re-fetching), why immutable, type-safe generated models reduce crashes, and how APIs like pagination helpers and optimistic mutations remove repetitive boilerplate from product code.
Sabrina also shares lessons from incremental adoption (especially migrating from REST), how the team decides which “automagic” features are worth the complexity, and what’s next as mobile GraphQL moves toward more co-located, component-driven data patterns.
Pascal: Hello and welcome to episode 73 of the Meta Tech Podcast, an interview podcast by Meta where we talk to engineers who work on our different technologies. My name is Pascal and, sorry, just give me a second here. These numbers, they kind of look scary. Let me just drop them in the right bucket here. Okay, and done. Okay, back to the pod.
According to some unverified stats I found on the internet, GraphQL is used by over 30,000 different companies. Being originally invented to power the Facebook app, it won't come as a surprise that Meta has one of the largest, if not the largest, GraphQL deployments in the world.
And while we've touched on it here many times on the podcast, we've never dedicated a full episode to it, which was noticed by some of our listeners. We will remedy that oversight today with this episode's guest, Sabrina, who works on a team that empowers product developers by identifying ways on how GraphQL can let them move faster.
And yes, that sounds terribly abstract, but think about the last time you had to build a like button in an app and then ensure that the correct state is reflected across all surfaces. Now, if you have a declarative way to query your data and a unified store to read it from, you can get all of this for free. So, even if you don't work on mobile apps, I think there are some intriguing patterns that this discussion reveals about how shared infrastructure can be leveraged to make engineers move faster and create better experiences for end users. But enough from me. Here's my interview with Sabrina.
Pascal: GraphQL isn't exactly new anymore. It started being developed back in 2012, which for those of you who stopped paying attention to the calendar in 2020 was 13 years ago. And it's now being used not just by Meta, but by many different companies across the industry. Given the maturity of GraphQL, what's left to do?
It turns out quite a lot. To discuss just a small slice of the past and current challenges, I have Sabrina with me here, a software engineer on the mobile GraphQL platform team. Sabrina, welcome to the Meta Tech Podcast.
Sabrina: Thank you so much for having me.
Pascal: Absolutely. So tell us a bit about yourself before we go into GraphQL.
How long have you been at Meta and what did you do before?
Sabrina: Yeah, absolutely. Um, so I'm Sabrina. I've been at Meta for about four years now. And the entire time, I've actually done GraphQL. I've always been on mobile GraphQL platforms or some, like, renamed equivalent, working on client side consistency and GraphQL APIs. Um, and this is technically the first position I've had out of college, so I've been, like, very much in the GraphQL space for most of my professional career.
Pascal: That's actually amazing. So basically your entire career is just GraphQL. I think I could not have a better person on the pod to actually discuss this. So tell us a bit about your team. What is your mission? What do you do as an entity?
Sabrina: Yeah, absolutely. So the platform team specifically is focused on client side GraphQL and focused on what can we do as a framework to make adopting GraphQL easier across the family of apps. We try to focus on not just building for, say, Facebook or Instagram, but what are generic solutions and generic patterns we see with GraphQL that can basically make it easier to adopt and where are the pain points that we can step in and build APIs to make things simpler.
Pascal: Fabulous. So, okay, let's just pick some people up who may have not heard about GraphQL, even though this is probably a fairly small slice of the audience, I would assume. Can you give us just the briefest of descriptions about what GraphQL is?
Sabrina: Yeah, absolutely. So GraphQL is a query language, right? Um, the idea is you can query against some graph of data, but unlike SQL, it's a lot more friendly to product development. There are basically a couple main operations you can take with GraphQL. You can query for data. You can mutate data and then re-query for it, all in one network round trip.
And the idea of GraphQL is to be very developer friendly, right? It's implementation specific, so it basically just defines the query language, and each implementation can differ, both on client and server. But the idea is you basically have some schema of data that you can query against using the syntax.
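For readers who haven't seen GraphQL before, the two operations Sabrina mentions look roughly like this. The schema and field names here are invented for illustration, shown as TypeScript string constants:

```typescript
// Hypothetical GraphQL operations illustrating the two main verbs described
// above. The schema (viewer, contact, like_post) is made up for the example.

// Query: ask for exactly the fields this view needs, nothing more.
const contactQuery = `
  query ContactQuery {
    viewer {
      contact(id: 4) {
        first_name
        last_name
      }
    }
  }
`.trim();

// Mutation: change data, then re-query the affected fields in the same
// network round trip by selecting them in the mutation's response.
const likeMutation = `
  mutation LikePost {
    like_post(id: 7) {
      post {
        like_count
        viewer_has_liked
      }
    }
  }
`.trim();
```

The client only receives the fields it selected, which is where the "no overfetching" benefit discussed later in the episode comes from.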
Pascal: Fantastic. And I guess we want to focus primarily on the client side here. And one of those clients is Relay. Can you tell us a bit about what Relay is and what it is that you primarily work on? Which, if I understood correctly, is not Relay.
Sabrina: Yes, absolutely. So Relay is an open source GraphQL querying client and also consistency engine. This exists in JavaScript. It integrates very tightly with React and I'm sure many folks who use open source GraphQL actually use Relay quite a lot. It gives you a lot of benefits, particularly data co-location.
So if you use it with React, the data you're querying for and subscribed to lives directly with the UI components you're using. So you're only re-rendering exactly when data changes. It provides this pub/sub model that basically means if data changes in Relay, anything subscribed to it somewhere else gets the update. And a lot of really nice, uh, APIs for pagination. Relay is like a gold star for GraphQL APIs that we tend to follow, at least on the mobile side.
Pascal: Right. Can you talk briefly about the consistency engine that you mentioned there? What exactly does this mean for the client?
Sabrina: Yeah, absolutely. So I work on mobile GraphQL, but a lot of what we do is the same idea of consistency as Relay. Basically, the idea is, if you have a bunch of GraphQL data in an app, you might be querying for the same data in multiple different places. Say you have a contact book and you have your list of contacts. You click into a contact, and the individual fields for, say, like, the name and the last name are the same as the data that you were looking at in the larger list.
If you update that data somewhere in the app, be it through some local mutation or through some query to the network, you want to be able to let everywhere else locally know that this changed, but you obviously don't want to do it in a way that is networking intensive. So consistency is basically just a big publisher/subscriber model.
It says, Hey, this data changed. You are subscribed to changes in this data. And so we alert you and vend an update to this model so that you know that things have changed and your UIs can update cleanly under the hood without any kind of networking traffic.
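The mechanics Sabrina describes can be sketched as a toy publisher/subscriber store. This is an illustrative sketch with invented types and names, not Pando's or Relay's actual API; a real engine also handles normalization, threading, and cache eviction:

```typescript
// A toy consistency store: one canonical record per ID, and subscribers
// that are notified whenever that record changes locally.

interface ContactModel {
  readonly id: string;
  readonly firstName: string;
  readonly lastName: string;
}

type Listener = (model: ContactModel) => void;

class ConsistencyStore {
  private records = new Map<string, ContactModel>();
  private subscribers = new Map<string, Listener[]>();

  // A UI surface subscribes to changes for one record.
  subscribe(id: string, onUpdate: Listener): void {
    const list = this.subscribers.get(id) ?? [];
    list.push(onUpdate);
    this.subscribers.set(id, list);
  }

  // Publish a new immutable model; every subscriber is vended the update
  // locally, with no network traffic involved.
  publish(model: ContactModel): void {
    this.records.set(model.id, model);
    for (const listener of this.subscribers.get(model.id) ?? []) {
      listener(model);
    }
  }

  read(id: string): ContactModel | undefined {
    return this.records.get(id);
  }
}
```

In the contact-book example, the list row and the detail view would both subscribe to the same contact ID; a rename published from either surface updates both without a refetch.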
Pascal: Got it. That makes a lot of sense. I used to work on a different social media app, and I do remember the problems we've had in ensuring that if you like a post in a feed view and then go into the detail view, you still see that it was liked. This is actually a surprisingly tricky challenge if you do not have this kind of consistent data model behind the scenes that ensures you're actually pulling from the same source and without, as you say, fetching the entire object again from the network.
Sabrina: Yeah, exactly. It's one of those problems that tends to grow a lot. Like, you try to handle it in case A, and then you have to handle case B, but whoops, you added case C, and then it blows up. So the idea of doing this at kind of a GraphQL consistency engine layer is we can handle it for product developers, and then they can focus on making their UIs look good and making the products work seamlessly so they don't have to worry about data consistency.
Pascal: And I think this is gonna be a theme in our discussion here, because what you're just describing is you're taking a problem away from product developers. If this didn't exist, then you as a product engineer would have to spend all this time ensuring that this like button state is correctly replicated across all the different surfaces that are out there.
Sabrina: Yeah, exactly.
Pascal: But if you're having the consistency engine behind it, then this is one worry taken away from you, just moved up the stack to you, the GraphQL layer, and you can carry on making sure that you are delivering the best experience for our users.
Sabrina: Yeah, exactly. I think a big theme of what we develop is basically how can we make everyone else's lives easier so that they can build as fast as possible.
Pascal: Right. So I mentioned that Relay is something that we've built for React, or actually that the open source community has built for React. So what is it that we use on our mobile clients? What's the equivalent there?
Sabrina: Yeah, so we have a framework that is very similar to Relay. Um, we call it Pando. Um, but at a high level, it provides a lot of the same functionality as Relay. It handles things like data consistency, individual subscriptions to data, pagination APIs, optimistic mutation APIs.
Lots of like client side APIs that Relay also provides. The main difference is because it's for mobile, we actually implement it in C++ in a shared layer that we then have native wrappers around. Because it's not JavaScript, we have to contend with things like threading models and things like that.
But we definitely use Relay as a north star of what a lot of our APIs end up looking like on mobile. A lot of it ends up being very similar.
Pascal: Right. So you as a product developer, if we're going back into that space, do you ever have to actually write C++? Or what's the kind of end user experience in that case?
Sabrina: So, say you want to use our framework, and you're basically going from scratch. You get two big things. So, the first is you can write a GraphQL query, but that's just GraphQL, right? Like, how do you actually get that in the context of a native platform language? So, for Android, Kotlin or Java; on iOS, Objective-C or Swift.
Well, as of now, you can't, right? So that's kind of step zero of where our infrastructure comes in. There are two components. One is a build time component. We basically have a script that you can run, and it will generate a type-safe model accessor based off of that GraphQL query. So this would either be a Java or Kotlin implementation or an Objective-C implementation that corresponds to the exact fields that you query for within your GraphQL query. So that's, like, step zero. Step one is then we provide this, like, runtime query execution environment. So now you can take that generated model and say, okay, based off of this, execute a query. We send it to network. We do things like caching, and then we also involve the consistency engine and say, okay, if anyone's looking for this, publish; if you're interested, subscribe.
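As a rough illustration of what such a generated model might conceptually look like, here is a hand-written sketch. This is not actual output of Meta's codegen; the names are invented:

```typescript
// What a generated, type-safe model for a query like
//   query { contact(id: 4) { first_name last_name } }
// might conceptually look like.

// Immutable: every field is readonly, set once when the payload is parsed.
interface ContactQueryResponse {
  readonly contact: {
    readonly firstName: string;
    readonly lastName: string;
  } | null;
}

// Product code gets compile-checked accessors instead of digging through
// an untyped dictionary, so a typo'd field name fails at build time rather
// than crashing at runtime in production.
function displayName(response: ContactQueryResponse): string {
  if (response.contact === null) return "Unknown";
  return `${response.contact.firstName} ${response.contact.lastName}`;
}
```

Only the fields selected in the query appear on the type, which is also how the "no overfetching" guarantee is surfaced to product developers.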
Pascal: Got it. I'm not really sure if we have a nice adjective for something that feels very Meta, as we had with Facebooky, but the first step you talked about, the code generation part, is something that we do just across the board, whether you want to build a UI or anything else. Even though with the newer iterations of Litho, for instance, we now have some kind of Kotlin API, where it's not code generation anymore, but just compiler magic basically happening for you. But generating safe data models is something we do basically everywhere, and even if it stopped there, if that was the only thing you did, I think the benefits would already be blatantly obvious to anyone who's interacted with an untyped REST API before and then noticed that the app crashes because, oh, I had a typo in my accessor to this dictionary I got back.
Sabrina: Yeah, exactly. And that's one of our big value adds, we hope, over REST and over some of the existing usages of REST right now at Facebook. We get these, like, type-safe accessors, the models are immutable themselves, and so we make guarantees that this isn't going to change out from under you unless we vend you a new model from a consistency update.
And beyond just type-safe accessors, the other thing we also generate is type-safe setters. Which doesn't mean you can mutate those models; they're still immutable. But say you want to do some kind of local change against your schema. Maybe you have a post and you click hide, and you want to have that hidden status show. That doesn't really correspond to server data, it really is client state, but you want it reflected everywhere. We can generate these type-safe setters for you that you can publish into the consistency engine, and still do safely without breaking, you know, all the network-backed schematized data for all your subscribers.
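A small sketch of that setter idea, with invented names rather than the real generated API: the "setter" never mutates the model in place, it vends a new immutable copy, which the caller would then publish into the consistency engine:

```typescript
// Hypothetical model for a post with a client-only field.
interface PostModel {
  readonly id: string;
  readonly text: string;
  readonly isHidden: boolean; // client-side state, not backed by server data
}

// A generated "setter" in this style returns a new immutable model; every
// field stays readonly, and the original model is untouched. The new model
// is what gets published so all subscribers see the hidden state.
function setIsHidden(post: PostModel, hidden: boolean): PostModel {
  return { ...post, isHidden: hidden };
}
```

Because the original model is never modified, any UI still holding it keeps rendering consistent data until the consistency engine vends it the update.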
Pascal: So at this point, you're basically going way beyond a simple querying client, because there is no network stuff involved. You're just providing client developers, product developers, an opportunity to store state that is consistently queryable across the application.
Sabrina: Yeah, exactly.
Pascal: That's really exciting. And I'm sure this has grown out of a need more than out of a kind of design committee, because this is not really something I would think of when designing a client library for a REST API, for instance.
Sabrina: Yeah, a huge part of our job is just listening to product developers tell us these problems, pattern matching them, and then developing something in native GraphQL for them. This is where a lot of, say, like, the pagination work we've done in the past comes in.
Paginating with GraphQL inherently is hard. It's kind of gross. You have to query for the same field over and over. You might have to write two queries so that your first query encompasses, like, everything your view needs and your second one is for stuff that is only specific to pagination. You have to concatenate all of these lists. And we take feedback like that and say, okay, how can we build a first class GraphQL experience for you so that you don't have to contend with all of this?
We do it internally.
Pascal: Let's talk a bit more about this. I find pagination fascinating. I feel like there is a lot of complexity in there that you don't see when you just think about it briefly. I wrote some photo gallery back in high school with shots that I'd taken, and it was perfectly fine there to just basically have a page number, multiply this by the number of elements you want to see on a given page, and then you provide an offset and a limit, and you're basically done.
That's pagination. So, obviously, it's not quite as simple, but let me just frame it like this for you: why is that not enough?
Sabrina: Yeah, it's a great question. So if you were to schematize, like, the simplest version of pagination, it would basically be exactly that. It's like, okay, it is a list of some data object I have, and it is some, like, index with which I want to start and maybe the size I want to go from. And then every time I paginate, I just update that index and update the size based off of the next set of the list. This basically doesn't stand up to mutations on the server. One of the operations you can run on GraphQL is obviously a mutation, and so you can imagine if this list is changing as you use it, say it's, like, some list of friends for a contact book. As you insert or remove or reorder things, the indexes just blow up. You may end up querying for, like, duplicate pieces. You may end up skipping entries because on the server they've changed. And so it ends up not being very robust to changes. So then you move on to something more complicated, like cursor-based pagination, which is basically a linked list. It still can have issues like data duplication, but it tends to be a lot more robust to changes like that than, say, indexes.
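The failure mode with indexes is easy to reproduce. Here is a toy sketch, with invented data, of offset-based pagination duplicating and skipping items once the server list changes between fetches:

```typescript
// Toy server list paged by offset/limit, to show why index-based pagination
// breaks when the list mutates between page fetches.
function page<T>(list: T[], offset: number, limit: number): T[] {
  return list.slice(offset, offset + limit);
}

function offsetPaginationDemo(): string[] {
  let serverList = ["a", "b", "c", "d"];

  // Client fetches page one: ["a", "b"].
  const pageOne = page(serverList, 0, 2);

  // Before page two, a new item is inserted at the front on the server.
  serverList = ["new", ...serverList];

  // Page two at offset 2 now returns ["b", "c"]: "b" is duplicated and
  // "new" is skipped entirely, exactly the blow-up described above.
  const pageTwo = page(serverList, 2, 2);

  return [...pageOne, ...pageTwo];
}
```

Running the demo yields `["a", "b", "b", "c"]`: one duplicate, one item never seen.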
And cursor generation itself is, like, a tough problem, right? How do you generate these stable cursors that make sense for a piece of data that is, like, ever growing, like a friends list or your notifications, right? So the server provides APIs to make this generation simpler.
And then on the client, what we try to do is say, okay, fetching these next queries for you could also be complicated because if you think about when you're writing your initial UI view, right, you're not writing the query necessarily for only the list of things that you're looking at. You might be looking at, say, like, what the UI state should look like. Is there a button that should appear here? And what should the text say on that button? What color should this render in? What information should be displayed, say, in some header that's completely unrelated to the list, right?
And then every time you paginate, it's basically a re-execution of that query with, like, the indexes, or in our case the cursors, changed, right? Which means you may end up writing a second query now as a product developer. Okay, so you have query one that you execute when you enter the surface. Query two, where you copy over the cursor you ended with on query one and you send it through. And now you have to do this n times, every time you scroll.
You have to concatenate the lists. If you want to cache the list, now you have to try to do some custom mechanism to, like, combine everything and write it to disk and read it back in a way that the cursors are still stable. And this just ends up being the same boilerplate code over and over and over again, and it is surprisingly easy to get wrong.
Which is why what we try to do is do all that generation for you under the hood. Relay actually has this open source: they have something called the Connection Specification. It basically dictates how you define your data representation of a connection type, or basically something you want to paginate on. You guarantee you have the start and end cursors under these exact fields so that we can query for them, and your data is always under fields called edges and node. We can then actually generate queries against that known schema, or rather that known specification for the schema.
So now as a product developer, you write that initial query. You annotate that single field that follows the specification. And you're done. We generate a type for you in Java or Objective-C that basically has a pretty little function, loadNextPage, and it takes in the size, and we get to do all the cursor management, all the caching, all the query regeneration, all for you.
As a product developer, you don't have to handle that at all.
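To make the shape of this concrete, here is a rough sketch of what such a pagination helper could look like, loosely following the edges/node/pageInfo shape of Relay's Connection Specification. The class and function names are invented for illustration, not the real generated API:

```typescript
// Connection-spec-shaped types: data lives under edges/node, and pageInfo
// carries the cursor bookkeeping.
interface Edge { node: string; cursor: string; }
interface PageInfo { endCursor: string | null; hasNextPage: boolean; }
interface Connection { edges: Edge[]; pageInfo: PageInfo; }

// A helper in the spirit of the generated loadNextPage: it remembers the
// last cursor, re-runs the paging fetch, and concatenates results so product
// code never touches cursors. `fetchPage` stands in for query re-execution
// against the network or cache.
class PaginationHelper {
  items: string[] = [];
  hasNextPage = true;
  private endCursor: string | null = null;

  constructor(
    private fetchPage: (after: string | null, first: number) => Connection,
  ) {}

  loadNextPage(size: number): void {
    if (!this.hasNextPage) return;
    const conn = this.fetchPage(this.endCursor, size);
    this.items = [...this.items, ...conn.edges.map((e) => e.node)];
    this.endCursor = conn.pageInfo.endCursor;
    this.hasNextPage = conn.pageInfo.hasNextPage;
  }
}
```

Product code just calls `loadNextPage(size)` on scroll and renders `items`; the cursor copying and list concatenation boilerplate described above lives entirely inside the helper.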
Pascal: And again, it's one of those things where otherwise, if I were in some planning session and somebody told me to build a paginated list, I would have to think about all these complexities. How often does the data change? Is it, as I've just talked about, like a static photo list? Maybe I'm okay with indices. If I'm the only one actually managing this, maybe this is fine. But if it gets more complex, if it's like a feed of events that are being published and are coming in in real time, then it's definitely not going to be good enough, and then you're going up and up and up in your estimates of how long it's going to take you. And in this case, it's just done. It's done for you.
Sabrina: Exactly. The hope is that their development experience becomes a lot simpler and faster.
Pascal: So you talked a bit about the GraphQL specification there. What's kind of your relationship to, I think it's the GraphQL Foundation now, which oversees that? So how do you interact with this? Are there often features that you need actual spec changes for, or are you mostly painting within the existing boundaries of the spec?
Sabrina: So, for the most part at Meta, we tend to work within the boundaries of the spec. The cool thing about GraphQL, though, is because it's implementation specific and you can define your schema, if you want to make, say, like a custom directive, you don't need to work with open source to do that. You can do that internally, and then if you find it valuable, we can bring it to open source.
I personally have only been involved in, like, very little of the GraphQL open sourcing, but there are a lot of conversations in open source that have started from problems that Meta and other companies have seen before. A big example of this is incremental delivery. Um, so @stream and @defer are two features in GraphQL that dictate how the data payload should be sent from the server: as opposed to one giant payload, you can break it up.
Streaming is just, uh, for some list, you get item n, then n plus one, then n plus one more. Deferring is saying, hey, you don't have to compute this data first on the server. Give me the stuff I need and then give me this later. And these are things that come from, you know, struggles we have internally at Meta and at other companies. We discuss them and they come into these open source proposals, which is really cool.
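As a rough illustration, a query using those proposed directives might look like this. The schema fields are invented, and the exact directive arguments have varied across spec proposal revisions:

```typescript
// Hypothetical query using the incremental delivery directives discussed
// in open source. `news_feed` and its fields are made-up schema. @stream
// delivers list items incrementally; @defer lets the server send the
// deferred fragment in a later payload.
const incrementalQuery = `
  query FeedQuery {
    news_feed {
      stories @stream(initialCount: 1) {
        title
      }
      ... @defer {
        comments_summary {
          top_comment
        }
      }
    }
  }
`.trim();
```

The server can then respond with an initial payload containing the first story, followed by incremental payloads for the remaining stories and the deferred summary.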
One of the members of my team is also on the technical steering committee of the foundation. So he's also a very big encourager of: if you see things, get involved. The GraphQL community is really fantastic, and people are really excited to hear our ideas, and we're also very excited to hear other people's ideas.
So it's very cool that we have that opportunity.
Pascal: I didn't even know there were entire GraphQL conferences just dedicated to this one topic, where you see all the different companies come in and talk about the experiences and learnings that they've had from it. So that just shows how far this has grown beyond what it was in the very beginning: the initial publishing of a spec with, I believe, a reference implementation.
Sabrina: Yeah, and it's very cool to see, uh, the problems people run into, because there's a surprising amount of overlap. Um, like, I went to GraphQL Conference last year and I got a lot of insight into what other companies are doing. Some really neat ideas with what they're doing, but also it's funny to see the same problem types come up and sort of nod knowingly and empathize with that engineer.
Oh, I've gone through this exact issue. It's really cool that, you know, I'm not totally off on what I've gone through, because you've gone through it too.
Pascal: Yeah, fantastic. And of course, if somebody has worked on the outside with GraphQL, it's going to be much easier for them to jump into the code base if they join Meta at some point and be productive here.
When we talk about different features that you can implement within the bounds of the GraphQL spec, it seems like it is quite dynamic in that sense. How do you think about the complexity that new features introduce into just working with it? Because I think there's always the trade-off between providing just the absolute bare minimum that people need to be productive versus providing literally anything that people could use within their products, but then it becomes a mess to work with.
Sabrina: Yeah, it's a really great question, and we contend with that a lot. We try not to make our APIs too automagical, because if they are, we're taking on all of that tech debt. Even the pagination APIs are probably the top level of what we would do for an API, because that involves, like, query generation, internal state management, thread safety, tracking loading states, etc.
And that is a lot to do on behalf of clients.
I think the two pieces we listen for a lot are, one, is this a common problem or is this a one off problem? Because sometimes a product might have a really specific UI and say, Oh, it'd be so nice if your GraphQL framework could do this. And if it sounds like a very specific problem, we have to make the right call of saying, We can help you architect this view, but we're not going to bring what you're suggesting into our framework itself because it's just a huge technical burden and there aren't a ton of adopters.
So that's piece one. Piece two is also, does it make sense with the spec? Is it going to break the spec in any way? We've had discussions about this with, say, Delta updates from the server. Or is it going to, uh, do something that breaks our framework specifications? So we have framework level guarantees.
For example, we'll never, like, leave you hanging: if you've executed a query, we will always either trigger an update callback or an error callback. And we've had proposed APIs that sometimes may be asynchronous and make you wait for potentially no updates, and these are the conversations we have about what invariants we are breaking, and if we do need to break them, is it something that more than just one product is going to need?
Pascal: Yeah. And it's a very common discussion to have within Meta infrastructure teams as well. I used to work on Litho, and there's also always the consideration of, if this is, as you say, too automagical and things do not work as intended, what are people going to do? Most likely they're going to come to you for support.
And I'm not sure what your ratio is, but it is not uncommon for one engineer to support hundreds, if not thousands, of developers within the company if you work on an infrastructure team. So that's also a consideration to make: how likely is it that people will actually fall into the pit of success with the API that you have provided?
Sabrina: Yeah, absolutely. Um, our team is a lot smaller than people might realize, so it does fall into the camp of there might be like two or three people that support one thing. And so we try to be really mindful about what we build and make sure that it's the right things to take on.
Pascal: Yeah, so far we've talked mostly about other companies, funnily enough, but can you give us a bit of an overview of the usage and the state of GraphQL across the family of apps, or even broader within the kind of product suites within Meta?
Sabrina: Yeah, so it's interesting. It varies a lot across the family of apps. Um, when Facebook was, or pardon me, when GraphQL was made, it was really made with Facebook in mind, right? Facebook, the mobile app. And so if you look at GraphQL usage across the family of apps, Facebook is by far the most sophisticated.
They have a lot of our most sophisticated APIs, and they also have a lot of our V0s and V1s of things. It gives us a lot of insight into what GraphQL usage looks like at scale.
If you ever want to test any performance optimizations, which we do quite a lot at the infrastructure level, Facebook is the go-to spot. Because you just have so many users, so much traffic, and there is so much GraphQL specifically ingrained in the app, it is a very good space to look at problems and say, okay, what are people facing, and can we help people in the rest of the family of apps, or even in Facebook, with that?
Instagram is much different. For example, Instagram has a lot more REST traffic that is now in the process of migrating to GraphQL, which brings a lot of really interesting concerns. How do you schematize their data against a GraphQL schema rather than REST? And then also, if all of your UI components are built off of REST, how can you migrate to GraphQL and get the DevX benefits of GraphQL, with, say, like, our type-safe generated models, um, not overfetching, being able to subscribe to specific fragments of data, um, without basically forcing the entire app to refactor?
So a big work stream we work on right now, at least in the Instagram space is, how can we do this incrementally? How can we incrementally introduce GraphQL? Not just for new use cases, because those are relatively simple, right? You just schematize the data and you write it in GraphQL. But also for existing use cases.
What surfaces can we make easier so that you can adopt this incrementally? And then as you go, start reaping those DevEx wins.
Pascal: Yeah, and the incrementalism you mentioned there, it is so clear when you see Instagram, which is not exactly a new acquisition anymore, but it was one originally, how long the REST model has persisted. And there are definitely companies out there who approach their acquisitions with: okay, you're going to stop all new development, you're going to bring this over to our stack, and then you may continue building stuff. But that was simply not an option for an app that had so much momentum, and where we wanted to ensure that people can enjoy new products without a, you know, two-year pause until everything was moved over to Hack and GraphQL on the backend.
Sabrina: Yeah, absolutely. And we also want to be really mindful that we're not just bringing GraphQL for the sake of GraphQL. It's that we're bringing GraphQL to improve some part of performance or user experience without interrupting that. So it's a very, uh, tricky line to walk. Um, but I think it hopefully will pay off.
Pascal: And also worth mentioning that these decisions are never top down. I've never seen a VP say, okay, you're going to stop now and you're going to introduce GraphQL into your app. It's engineers who say, hey, I really want to have easier pagination or optimistic mutations and don't want to spend all my time rebuilding this from scratch.
So why don't we introduce GraphQL, or a different UI framework that already exists in a different part of the company?
Sabrina: I feel like we are very lucky at Meta that I cannot remember a single instance in my four years here of someone saying go work on this. A lot of it is me going to management and saying, Hey, there's kind of this really neat data problem. Can I solve it? People are like, sure. And then you go build it.
And it's really nice to be able to have that relationship with management and the engineers, where you just have the freedom to go do that, and it spawns projects like this. Like, a lot of pieces of our infrastructure on mobile spawned from, like, hey, could we try this? And all of a sudden, many years later, there are, like, tens of engineers working on these frameworks.
And it's, it's really cool how these decisions can be engineer led.
Pascal: Let me ask you a question about this, because one of the benefits of a manager handing down some neatly packaged project for you is that there's probably some metric attached to it. It's like, drive this number up or down or something like this or hold it steady. But if you're going to somebody and say like, I think I am seeing a certain pattern and I think we should make this better for the entire organization.
How do you measure success in that case?
Sabrina: Yeah, so as you said, if you have an API that's meant to have a performance improvement, there's a very clear metric you can assign to it. Whether it's the end-to-end latency of the request, whether it's, uh, you know, how quickly the data gets delivered, whether it's, you know, how many resources it uses on the actual client, like, uh, CPU. And so it's very easy to measure.
For things that are developer experience improvements, things like pagination APIs, a lot of it comes from talking with the developers. We can do sentiment surveys and understand, hey, how long did it take you to write this feature with this API? And how long would it have taken you to write this feature without this API?
We're actually starting to build tools on the client side in GraphQL to measure this automatically, which is really cool. But because we often get a lot of product teams with the same type of problem, management tends to be very open when we say, you know, we can give you developer sentiment here. And it tends to pay dividends in the future: when new products come and immediately adopt it, that can count as a quote-unquote metric for, hey, this was useful, because this new product came in, they didn't need our help, they adopted it immediately, and they built very quickly. And it adds more momentum to this API.
So adoption is a biggie as well. If you build something and only one person adopts it, you know, that might indicate that you were building the wrong thing, because you built a one-off in infrastructure that should hopefully be for many people. But if you build an API and people are able to adopt it, especially if they can adopt it without specific GraphQL infrastructure support, that's a really strong indicator that it was a positive API, and it gets rewarded very well, as if it were some metric improvement.
Pascal: Yeah, I'm very glad that we pay so much attention to these qualitative pieces of feedback as well. I actually used to run a DevX-focused survey for mobile frameworks specifically for this. And it's really interesting, because you can just see how well liked your framework is in comparison to others. React, for instance, always did incredibly well, and it was really interesting to compare the framework you worked on to that. GraphQL is another one, and the adoption metrics, or even just the free-form feedback you usually get in there, are incredibly helpful for driving the decision of where to invest future resources.
Sabrina: Yeah, exactly.
Pascal: So we talked about pagination as one of these examples where your work pulls complexity out of the scope of product engineers. Do you have another example of something you or your team built or contributed to that did something similar?
Sabrina: Yeah, absolutely. One of the APIs that Relay supports, as do our mobile frameworks, is optimistic mutations. Uh, let me be very clear, I am not the person who invented this. I have helped build implementations of it in newer versions, but it certainly is not from my brain. The idea is basically, say you want to execute some GraphQL mutation, the quintessential example being you want to like something. In a normal world, without an optimistic mutation in GraphQL, the way you do this is: the button is pressed, you execute the mutation, you wait for it to finish on the server, it comes back, the like count updates, you show the like count update. Great.
The problem is network round trips are expensive, right? Especially if you're somewhere where the internet might be particularly slow, you will see a visible lag in that field being updated. And that's a user experience problem, right? We want product users to be able to see this update immediately. And that's where optimistic mutations come into play. It's basically an API where, if you know what the success state of the data is, in this case that a story is liked, then you can actually show that immediately.
And what does this actually look like from the GraphQL API side? It's a couple of things. The first is we have to have some type of type-safe setters for this data. So we generate special classes called model builders that look at the structure of your query and what you're querying for, and per object let you actually set those fields. So you could, say, set like status to true. And what we do is we actually publish that immediately to our consistency engine, so all subscribers get that update.
So they immediately get that like state, right? That way, in most cases, when this like comes back successful, we basically just update our state with any metadata, but none of the UIs really have to change. It's basically an immediate change to the UI, even though the change on the server wasn't immediate.
You can imagine this does have some complexity, because we need to support rolling back changes if, say, you have no network: you've tried liking something, it fails, so we have to change that back, both for you and the subscribers. But because the success case is the more common case, we're really focused on how we can make this UX better, how we can show this right away. And for product developers, it's just: get a model builder, set like status to true, execute the mutation with the optimistic model, plug and chug, and they're basically done.
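To make the flow concrete, here is a minimal sketch, not Meta's actual implementation, of the pattern described above: publish the expected success state to a publish/subscribe consistency store immediately, then reconcile or roll back once the server responds. All class and function names here are hypothetical, invented for illustration.

```python
from typing import Any, Callable, Dict, List


class ConsistencyStore:
    """Tiny publish/subscribe store keyed by object id (illustrative only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}
        self._subscribers: List[Callable[[str, Dict[str, Any]], None]] = []

    def subscribe(self, callback: Callable[[str, Dict[str, Any]], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, object_id: str, fields: Dict[str, Any]) -> None:
        # Merge the new field values and notify every subscriber, so all
        # surfaces rendering this object pick up the fresh state without
        # re-fetching from the server.
        self._data.setdefault(object_id, {}).update(fields)
        for callback in self._subscribers:
            callback(object_id, dict(self._data[object_id]))

    def get(self, object_id: str) -> Dict[str, Any]:
        return dict(self._data.get(object_id, {}))


def execute_optimistic_mutation(
    store: ConsistencyStore,
    object_id: str,
    optimistic_fields: Dict[str, Any],
    send_to_server: Callable[[], Dict[str, Any]],
) -> None:
    """Show the expected success state immediately; roll back on failure."""
    # Remember the prior values so a failed request can be undone.
    previous = {k: store.get(object_id).get(k) for k in optimistic_fields}
    store.publish(object_id, optimistic_fields)  # UI updates right away
    try:
        server_fields = send_to_server()          # the slow network round trip
        store.publish(object_id, server_fields)   # reconcile with server metadata
    except IOError:
        store.publish(object_id, previous)        # no network: roll back for everyone
```

In the success case the second publish is a no-op from the UI's perspective, which is the point: the subscribers already rendered the liked state before the round trip finished.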
Pascal: Yeah, and it all comes together because of what we described before, the consistency engine, right? You just pull your data from that model that you already have, so there's no need to update anything in the rendering logic, because that's already there. It's really about owning this entire data flow. And when I think about all the different things that I, as a product engineer, would otherwise have to build: there's probably some sort of loading spinner you want, or at least disabling the button so you don't click it again while it's working. You potentially want some retries if you want to be even more sophisticated, some sort of worker that waits until the network is back before re-executing. And of course, the rollback thing that you've described. And it's all gone.
It's all the stuff that you now do for the product developer.
Sabrina: Exactly. And the nice part is this is both a developer experience and a user experience win. So it's twofold, right? You get both the developer can work faster, and the product end user gets the best experience right away, which is really cool to have as an infra team because even though we're very much in the infrastructure, we can still support, you know, making the product experience best for all, which is really rewarding.
Pascal: Yeah, and for a like, it might be a small annoyance if you don't see your click or tap immediately reflected in the UI. But if you're, for instance, submitting a comment, that's usually a completely blocking operation, and you can't actually move on to the next page until it's submitted. Over the past few years, I've spent quite a lot of time on trains through Europe, where the internet connection is mediocre at best, and I notice immediately which apps support a model of optimistic mutations and which ones don't.
Sabrina: Yeah. Exactly.
Pascal: So for a feature like this, or maybe a different one, because you said you weren't directly involved in its development or conception: how do you collaborate with your client teams?
Sabrina: Yeah. So a big part of it, as you said before, is that a lot of this is engineer driven. We'll hear from an engineer, I'm trying to do this thing. Is there a way to do it in GraphQL? And if the answer is no, but we've heard it from multiple people, that's when we get to the inception stage of, okay, I think we need something native in GraphQL for this, and potentially an idea for what it could look like so that it's much simpler for them. An example of this is client resolvers, one of the things we've worked on that Relay supports.
Um, a client resolver, for context, is an API for when you have some field you query for via GraphQL, but every time you use it, you have to apply some business logic on the client. And you don't want to duplicate that business logic, you know, everywhere across the app. So instead, you create a special client-side-only field that's annotated. And when you access the field, instead of just getting the server-backed field, it's some lambda that you can fill out: that's the server-backed field plus that little bit of business logic.
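A rough sketch of that idea, with purely hypothetical names rather than Meta's or Relay's real API: a client-only derived field is registered once, and every read site gets the shared business logic for free instead of copy-pasting it.

```python
from typing import Any, Callable, Dict

# Registry of client-only fields: field name -> resolver over server data.
# In the real systems this registration is driven by a schema annotation;
# here a decorator stands in for that step.
_client_resolvers: Dict[str, Callable[[Dict[str, Any]], Any]] = {}


def client_resolver(field_name: str):
    """Decorator registering a derived, client-side-only field."""
    def register(fn: Callable[[Dict[str, Any]], Any]):
        _client_resolvers[field_name] = fn
        return fn
    return register


@client_resolver("display_name")
def resolve_display_name(server_fields: Dict[str, Any]) -> str:
    # The business logic lives in exactly one place: prefer the nickname,
    # fall back to the server-backed full name.
    return server_fields.get("nickname") or server_fields["full_name"]


def read_field(server_fields: Dict[str, Any], field_name: str) -> Any:
    """Read a field; client-only fields run their resolver transparently."""
    if field_name in _client_resolvers:
        return _client_resolvers[field_name](server_fields)
    return server_fields[field_name]
```

Call sites then read `display_name` exactly as they would a server-backed field, which is what makes a REST-era custom getter migratable.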
And so we come up with ideas like this when we have products coming to us and saying, you know, I'm trying to migrate from REST. I have a custom getter in REST that does lots of business logic. Migrating to GraphQL is now hard, because I don't have an equivalent of this, and I'm worried that if I migrate to GraphQL, I'm not going to be able to have this field updated in the right state.
So this is where we all come together and might write what we call an RFC, or request for comments, which is basically: okay, if I'm thinking about this from a GraphQL-developer-first perspective, what does this look like? And for mobile specifically, a lot of this also asks, what does Relay do?
Because Relay is quite sophisticated, and so we try not to reinvent the wheel. It's probably wrong, in fact, to create different primitive APIs in Relay versus our mobile version of this, because we want to have things aligned. And in this case, Relay actually supports resolvers, so we look at what a mobile version of resolvers is. Which, in our case: you have a custom client-side schema, you annotate that field with a client resolver directive, and then it generates this stub of a lambda that you can fill out. It's a little different in Relay, but this is what it looks like in the mobile world.
We'll even talk to Relay and say, what are your thoughts on this? You know, what do you do here, and what challenges did you run into? Because we'll probably run into the same ones. From there, someone might build a prototype of it, and this doesn't have to be, you know, a perfect prototype. It could just be one that only works for one product surface. You test it, you know, via dogfooding, and then later, when the feature is ready to roll out to production, you make sure everything works 100%. Obviously with gating involved, so that if things go wrong, we're able to adjust and roll back and not break the user experience.
And then, once we have a test in flight, we all agree on the higher-level design, and we've proved out the concept, we start: okay, let's solidify this. Let's add validation for the usage of this. Let's make this a really sophisticated API. Um, and then we'll start advertising it so that product folks can adopt it. And usually when you post about it, you'll have an engineer or two come and say, oh, I was having this exact problem, which is really nice, because then you can point them to the documentation or your first example.
Pascal: Yeah, the best kind of validation.
Sabrina: Yeah.
Pascal: Is there a way you prevent people from adopting experimental features too early?
Sabrina: Um, I wish we had a better one. Um, I would be lying if I said we didn't have cases where people find an API and say, I want to use this, and we're not ready. The nice part is, a lot of the times we'll actually name these experiments like, Experimental. Do Not Use. And we actually are mostly okay if people come to us first and say we're really interested in this, because it actually doubles down on, you know, this is valuable. Someone wants this right now, even while we're testing it.
Our general philosophy is, as long as things are gated such that if things go wrong we can shut off your feature, that's okay. And sometimes we even add infra-backed gating, so that if someone actually does use something experimental, it's very easy for us to say, no, even though you have this enabled, we're going to turn it off for you.
A lot of the time, we hopefully work fast enough that people don't get it in the experimental stage. But there certainly have been a couple of APIs that people see, get really excited about, and use. So, um, a lot of it's really just about communication.
Pascal: Encoding the adoption status of a feature in the name is also something that has been discussed out in the open, because React itself often makes use of this. There are also these infamous internals, which are just, like, the set-state-unsafe, do-not-use-or-you-will-get-fired things. They're surprisingly effective, and they do reveal something about the internal culture, which is often very copy-paste driven, and I don't necessarily mean this in a derogatory way. You get inspired by what other people do. If you're building a feature that resembles something that's already out there, you will open the source code for it. And you won't check, oh, let me check the documentation for this. What adoption status is this? Is it already open for enrollment or not?
No, you will just use it. And if it works, you'll ship it.
Sabrina: Yeah. The times it can get frustrating is when you have something that maybe shouldn't have been exposed, like an API that products maybe shouldn't have control over. You add it for one case, you go to delete it, and you realize, uh oh, someone else has this.
Pascal: I’ve been there. Yeah.
Sabrina: And you know, it's easy to get frustrated as an infra engineer, but honestly, I really empathize with the product developers, right? They're trying to do what they can to get their feature out fastest. If they really need this, then maybe it indicates that we need to rethink what we're doing, why they need it, and what we can do to give them something better, or make this work as is. But it is funny sometimes, especially if you get a bug report and it's like, well, experimental's in the name. So, uh, I think it was experimental.
Pascal: You've been warned. Yeah.
Sabrina: Yeah, but, you know, I do really empathize with the product engineers.
Pascal: Can you tell me about an occasion where something was particularly challenging?
Sabrina: Yeah, just in, like, API adoption, or platform development in general?
Pascal: In general, anything that comes to your mind.
Sabrina: Hmm. Yeah, I think I have a good one. One of the biggest projects that I have worked on at Meta, and I want to again be very clear that I was not the only person on this project. I had many colleagues who were fantastic to work with and who worked on it for much longer than me. But over the past couple of years, we've worked on a new version of our mobile Relay, per se. A lot of this is banked on performance improvements, you know, more shared-memory models, et cetera. And so we had a lot of confidence in it, and we'd been adopting it across the family of apps. And as I said, Facebook is kind of the big motherlode of GraphQL usage. It's where most GraphQL is used within the family of apps. So we got to this question of, can Facebook use our new infrastructure? How do we make the primitives that exist in Facebook and its generated GraphQL work with those of our new systems? And how can we do it in a way that doesn't interrupt product developers?
This ended up being a multi-year effort with, again, a number of engineers besides myself. It was very challenging, because you basically end up learning a lot of the ins and outs of how all the products in the Facebook app are architected. It's almost like a crash course in how GraphQL was used, from when it was first adopted up to its most recent adoptions. And you have to support everything that Facebook supports, right? Even if it's an API that we're convinced, you know, doesn't have as much merit, or that we think was wrong and want to change. You have to figure out ways to make it work with your new system. But how do you do that in ways where you don't compromise the new system, or turn the new system into the old system?
And this is one of the most challenging problems we worked on, I think, or that I personally worked on at the company. Being able to migrate, and figuring out what parts break and in what subtle ways. What invariants did we have in the old system that we didn't actually realize were invariants, say the ordering of fields coming back from the server, or orderings of operations that might not be guaranteed in the new system but are in the old? And making decisions on where it makes sense to actually fix the product versus where it makes sense to change the infrastructure. And how do we do that without compromising the benefits of the new infrastructure compared to the old?
Pascal: That is a scary set of challenges to deal with.
Sabrina: Yeah. And you end up working a lot with product engineers. And also, as is life, sometimes those product engineers don't work on those products anymore. People switch teams. So sometimes, while trying to figure out who the best owner is, you can become either an owner or at least an expert in certain services. And there were a couple of spots where we actually jumped in and shipped fixes in product code, and were even able to ship performance improvements, because we realized as we were doing this, let's just fix this while we go. Which, as you might imagine, makes the project a lot longer, because now you're not just migrating infra, you're fixing a product. But in aggregate, you're making the whole app healthier.
Pascal: For sure, but I think that's one of the exciting parts about working on these infrastructure teams: you're empowered to do these things. It happens to me all the time that I'm working on a piece of infrastructure and see somebody using it the wrong way. And I know there are companies with completely different ownership models, where I would have to go to that team and beg them to please fix this so I can roll out my new API. But over here, it's perfectly normal to just go in and fix it yourself.
Sabrina: Exactly.
Pascal: Okay, maybe just one last question before we wrap things up. Is there anything coming up that you're excited about?
Sabrina: I think I'm in general just excited about our infrastructure becoming a lot more mature. When I joined this team four years ago, we were rebuilding our old platform from scratch.
The really fun part about that is you get to jump in and build a lot of these things from scratch, or architect these really cool pieces. And we have a lot of new potential avenues to explore in the mobile GraphQL space: potentially really cool performance opportunities, or ways to use our APIs differently. A big one is that Relay, for example, has a lot of data co-location, right? It integrates so cleanly with React, and it's a really pleasant developer experience to have your subscriptions live at the component level, in band with your data definition. You only re-render on updates specifically to that data.
And I think, longer term, in mobile we're trying to get toward that state of having that co-location. The power in that, and being able to build APIs that fit that need, is really exciting: both from an infrastructure perspective, to build it out, and to give our product engineers something more familiar and hopefully more performant.
Pascal: I have to agree that all of this sounds incredibly exciting, and I almost feel like I'm missing out, because I rarely work on products and haven't touched GraphQL in such a long time. But all of these cool features you and your team have added got me genuinely excited about all of this.
So thanks so much for working on all these amazing features that make the lives of our product developers easier and joining me here on the Meta Tech podcast.
Sabrina: Yeah, thank you so much for having me. It was really fun to talk about GraphQL.
Pascal: And that was my interview with Sabrina. Although I think Sabrina made it abundantly clear during the interview, she wanted me to stress that she is just one person on a larger team that drives all these amazing features. If you want to surround yourself with smart and humble engineers like Sabrina, she's going to hate me for saying this, then why don't you check out metacareers.com. And if you identify any other blind spots in the coverage on this podcast, like Eric from Sweden, who called out the lack of GraphQL-focused discussions, then send me a message on Instagram or on Threads, where we are @MetaTechPod. And that's it for another episode of the Meta Tech Podcast. Until next time, toodle-loo.