AMD is looking to solve the business and technical problem of designing, developing, and optimizing a frontend compiler for neural networks on its XDNA Neural Processing Units (NPUs) to power cutting-edge generative AI models, directly impacting the efficiency, scalability, and reliability of ML applications.
Requirements
- Strong programming skills in C++ and Python.
- Experience with proprietary or open-source compiler stacks such as TVM or MLIR.
- Experience with ML frameworks (e.g., ONNX, PyTorch) is required.
- Experience with ML models such as CNNs, LSTMs, LLMs, and diffusion models is a must.
- Experience with ONNX Runtime or PyTorch runtime integration is a bonus.
Responsibilities
- Design and implement an NPU compiler framework for neural networks.
- Develop hardware-aware graph optimizations for high-level ML frameworks such as ONNX.
- Research new operator-scheduling algorithms for efficient inference of the latest neural network models.
- Interface with the ONNX and PyTorch runtimes and the lower-level hardware implementation.
- Contribute to high-performance inference for GenAI workloads such as Llama 2 7B, Stable Diffusion, and SDXL-Turbo.
- Manage CPU and memory resources effectively during model execution.
- Research heterogeneous mapping of ML operators for maximum efficiency.
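To give candidates a concrete flavor of the graph-optimization work described above, here is a minimal sketch of one classic hardware-aware pass: fusing a Conv node with its following Relu so the NPU can execute both in a single kernel launch. The toy `Node` IR and the `fuse_conv_relu` pass are purely illustrative assumptions for this posting, not AMD's actual compiler API.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """A toy graph-IR node: operator name, input tensor names, output tensor name."""
    op: str
    inputs: list
    output: str

def fuse_conv_relu(nodes):
    """Fuse each Conv whose sole consumer is a Relu into one Conv_Relu node."""
    # Map each tensor name to the nodes that consume it.
    consumers = {}
    for n in nodes:
        for name in n.inputs:
            consumers.setdefault(name, []).append(n)
    fused, absorbed = [], set()
    for n in nodes:
        if id(n) in absorbed:
            continue  # this Relu was already merged into a Conv
        if n.op == "Conv":
            users = consumers.get(n.output, [])
            if len(users) == 1 and users[0].op == "Relu":
                relu = users[0]
                # Emit one fused op producing the Relu's output directly.
                fused.append(Node("Conv_Relu", n.inputs, relu.output))
                absorbed.add(id(relu))
                continue
        fused.append(n)
    return fused

graph = [
    Node("Conv", ["x", "w"], "c0"),
    Node("Relu", ["c0"], "r0"),
    Node("Conv", ["r0", "w2"], "c1"),  # consumed by Add, not Relu: left unfused
    Node("Add", ["c1", "bias"], "y"),
]
optimized = fuse_conv_relu(graph)
print([n.op for n in optimized])  # → ['Conv_Relu', 'Conv', 'Add']
```

Production passes (in MLIR or TVM, for example) work the same way in spirit: pattern-match a subgraph the target hardware can execute as one unit, then rewrite the graph to a fused operator.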
Other
- Master’s or PhD degree in Computer Science, Electrical Engineering, or a related field.
- LI-HYBRID