Meta's MTIA (Meta Training & Inference Accelerator) Software team is developing a comprehensive AI compiler strategy to deliver a flexible, high-performance platform for training and serving new DL/ML model architectures in production across specialized hardware architectures. The compiler stack, DL graph optimizations, and kernel authoring directly impact the performance and deployment velocity of AI training and inference platforms at Meta.
Requirements
- Proven C/C++ programming skills
- Experience in AI framework development or accelerating deep learning models on hardware architectures
- Experience with compiler optimizations such as loop transformations, vectorization, and parallelization, as well as hardware-specific optimizations such as SIMD
- Experience with MLIR, LLVM, IREE, XLA, TVM, or Halide is a plus
- Experience developing training and inference framework components
- Experience with system performance optimization, such as runtime analysis of latency, memory bandwidth, I/O access, and compute utilization, and development of the associated tooling
- Experience with CUDA, OpenMP, or OpenCL programming, or with kernel programming for AI hardware accelerators
Responsibilities
- Develop the software stack with one of the following core focus areas: AI frameworks, the compiler stack, or high-performance kernel development and acceleration on next-generation hardware architectures
- Contribute to the core compilers of the industry-leading PyTorch AI framework to support new state-of-the-art inference and training AI hardware accelerators and optimize their performance
- Analyze deep learning networks; develop and implement compiler optimization algorithms
- Collaborate with AI research scientists to accelerate the next generation of deep learning models, such as recommendation systems, generative AI, computer vision, and NLP
- Tune and optimize the performance of deep learning framework and software components
Other
- Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
- A Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field and 7+ years of experience in AI framework development or accelerating deep learning models on hardware architectures; OR a Master's degree in Computer Science, Computer Engineering, or a relevant technical field and 4+ years of such experience; OR a PhD in Computer Science, Computer Engineering, or a relevant technical field and 3+ years of such experience
- Experience working with frameworks such as PyTorch, Caffe2, TensorFlow, ONNX, or TensorRT
- Knowledge of GPU, CPU, or AI hardware accelerator architectures