Our team makes PyTorch run faster and more resource-efficiently without sacrificing its flexibility and ease of use.
Requirements
- Experience in ML compilers, distributed training, ML systems, or similar
- Proficient in Python or CUDA programming
- Experience working on ML compiler stacks, especially the PT2 stack or Triton
- Experience optimizing the performance of machine learning models
Responsibilities
- Develop new techniques in TorchDynamo, TorchInductor, PyTorch core, and PyTorch Distributed.
- Explore the intersection of PyTorch compiler and PyTorch distributed.
- Optimize Generative AI models across the stack (pre-training, fine-tuning, and inference).
- Improve general PyTorch performance.
- Conduct cutting-edge research on ML compiler and distributed training technologies.
- Collaborate with users of PyTorch to enable new use cases for the framework both inside and outside Meta.
Other
- Currently has, or is in the process of obtaining, a PhD in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
- Research or software engineering experience demonstrated via grants, fellowships, patents, internships, work experience, and/or coding competitions
- First-authored publications at peer-reviewed conferences (e.g., NeurIPS, MLSys, ASPLOS, PLDI, CGO, PACT, ICML, or similar)