Apr 09 2024
Building custom silicon for the future of AI
By Meta Careers

“Even as a kid, I wanted to understand the whole picture,” says Nicolaas V., systems engineer, sweeping his hands to show how he thinks. “Teachers would teach one thing at a time and say, ‘Don’t worry about the next part yet,’ even though it’s all related. I could never grasp that. That's why I love building at Meta — from the moment we start, we're thinking about how everything connects.”

Before joining Meta, Nicolaas built network processors for another company. “Normal chips, like CPUs, carry information with big threads — imagine strong people hammering railway stakes into the ground. The team I was on built a chip made of lots of little threads, like tiny ants carrying bits of information around the world.” Nicolaas brought this type of creative thinking to Meta when he joined in 2020 and began helping build infrastructure for the future of AI.

Where ideas and opportunities create impact

Moving from a small company, Nicolaas immediately noticed the scale and resources at Meta, which made it a place where he could turn ideas into reality. “There’s a culture of autonomy and trust that’s unique to Meta as a global company,” he explains. “There are more opportunities to try something new. For example, when my team discovered I knew about remote direct memory access (RDMA) and network interface cards (NICs), I was asked to join a project that hadn’t been done internally before: building a GPU cluster in infrastructure. I had the time of my life.”

While large-scale GPU clusters are now foundational for Meta, people weren’t always sure how the technology would work. “After four months of building, our small team was able to do in two hours of training what had previously taken five days,” Nicolaas remembers. “This was an aha moment for the company, and it led us to the path we’re on today within our AI practice.”

By 2021, the GPU cluster project had grown exponentially. Nicolaas joined another project where the team needed NIC expertise. Once that project became a plan of record (PoR), a senior team lead approached Nicolaas with another groundbreaking opportunity. “He said, ‘We want to build an AI chip,’” Nicolaas smiles. “The ask brought my knowledge of GPU clusters, RDMA and NICs together, and it’s how I started on the Meta Training and Inference Accelerator (MTIA) program, our in-house, custom-built silicon.”

Making a case to grow MTIA from the ground up

“The MTIA chip was a really interesting challenge because we were building from nothing. We had no background, no IP and no capability yet,” Nicolaas shares. “The existing silicon was not optimized to operate in large clusters. So we had to develop the silicon, software and hardware to connect the devices together so they could run as one giant computer. To extend the metaphor: Rather than one person hammering a railway, we had to train thousands of people to hammer at the exact same moment.”

While Nicolaas dove headfirst into the technical project, a question remained: Did it make sense for Meta to build its own chips? This opened up a discussion around business cases, which inspired Nicolaas and his team to develop a high-level business plan. “We outlined how MTIA helps us make sure we run more efficiently. We just had to build it and make it scale.”

Nicolaas standing next to two members of his team, looking at a whiteboard with a diagram on it.

This was another experience unique to Meta. “While most companies would hire someone from the outside with expertise, leadership trusted us to figure it out,” Nicolaas shares. “They encouraged us to define the project and build up very quickly.” Nicolaas rapidly expanded his skill set — negotiating with commercial vendors and leveraging his network to find the right partners for Meta. Through this approach, he saw firsthand how Meta was leading the industry forward in infrastructure built specifically for AI.

“MTIA has been the greatest project I’ve worked on, living at the intersection of hardware and software.”

An intersectional approach to AI systems

For the past year, the next-gen MTIA chip has been Nicolaas’ primary focus on the AI systems team. He spends his time ensuring the hardware, software and silicon are integrated. “Building from the ground up has allowed me to ‘stack’ all the layers — which I love as someone who prefers to see the full picture. Our team moves away from traditional approaches, which have many abstraction layers, so we can move data up and down these systems as efficiently as possible.”

Once the team built the chip with enough connectivity and bandwidth, they moved to hardware: designing a system to connect everything together. Next, they built the software to enable communication between chips. “MTIA is relatively small compared to GPUs — that’s what makes it performant and efficient — so how do we get the system to scale larger and run bigger workloads? We did something novel: adding space in the system to start connecting future clusters of MTIA when it becomes necessary.”

As AI grows, Nicolaas believes flexibility will make the biggest difference in innovation. “Our AI models are changing faster than we can adjust the silicon, so we need to be able to flex the system to meet different balances of compute, memory and network. That's our job: to make sure we can flex ourselves as far as possible, up and down and sideways to meet the needs of our models. That’s how we’re enabling the future of AI. That’s how we’re setting up Meta and the industry to grow.”
