Meta is seeking an experienced software engineer to join its Accelerator Solutions & Technologies group to support the development of the collective communications software library for Meta's accelerators and to optimize the performance of distributed AI/ML workloads.
Requirements
- 2+ years of experience developing C++ codebases
- 2+ years of experience developing Python codebases
- Understanding of performance benchmarking, measurement, and optimization for collective communications and distributed at-scale model training
- Experience with SystemC
- Knowledge of AI/HPC hardware requirements and specifications (e.g., configuring hardware components for AI/HPC workloads)
- Understanding of the transport stack (e.g., RoCE) and its constraints, particularly as they pertain to interconnects and collectives
- Familiarity with relevant tools, libraries, and frameworks (e.g., PyTorch, CUDA)
Responsibilities
- Contribute to our developer infrastructure, including simulation and HW emulation platforms, to enable performance measurement and optimization for Meta’s in-house accelerator programs
- Understand and contribute to the collective communications library, intended to be deployed on Meta’s AI/ML superclusters
- Support networking and compute hardware acceleration techniques to improve ML training and inference performance
- Perform architectural analysis to ensure system designs meet performance, scalability, and reliability requirements
- Implement simulation models for Meta’s accelerator ASICs, and develop and analyze scenarios to evaluate data center performance and identify potential improvements
- Collaborate with architects and engineers to integrate simulation results into system design processes
- Use instruction set simulators to define performant firmware for Meta's training/inference accelerators
Other
- Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta.
- Master's or PhD in Computer Science, Computer Engineering, or another relevant technical field
- Collaborating with a large set of cross-functional and international partners.
- Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer.
- Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process.