AI/HPC Systems Performance Engineer

Meta

$117,000 - $173,000
Aug 21, 2025
Menlo Park, CA, USA

Meta's AI Training and Inference Infrastructure is growing exponentially to support ever-increasing use cases of AI. This growth creates a dramatic scaling challenge that engineers deal with daily. It also drives the need to build and evolve the network infrastructure that connects vast numbers of training accelerators, such as GPUs.

Requirements

  • Experience using communication libraries such as MPI, NCCL, and UCX
  • Experience developing, evaluating, and debugging host networking protocols such as RDMA
  • Experience triaging performance issues in complex scale-out distributed applications
  • Understanding of AI training workloads and the demands they place on networks
  • Understanding of RDMA congestion control mechanisms on InfiniBand (IB) and RoCE networks
  • Experience with machine learning frameworks such as PyTorch and TensorFlow
  • Experience developing systems software in languages such as C++

Responsibilities

  • Active member of a multidisciplinary team developing solutions for large-scale training systems
  • Responsible for the overall performance of the communication system, including performance benchmarking, monitoring, and troubleshooting of production issues
  • Identify potential performance issues across the stack: communication libraries, RDMA transport, host networking, scheduling, and network fabric. Develop and deploy innovative solutions to address these issues

Other

  • Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience
  • BS/MS/PhD in a relevant field (EE, CS), with 2+ years of work experience
  • Individual compensation is determined by skills, qualifications, experience, and location
  • In addition to salary, Meta offers a bonus, equity, and benefits
  • Must be able to work from California if hired for this position