AMD is looking to improve the performance of key machine learning applications and benchmarks on NPUs.
Requirements
- Thorough understanding of ONNX, the PyTorch runtime stack, and open-source ML frameworks
- Strong experience scheduling operators across NPUs, GPUs, and CPUs
- Experience with graph parsing and operator fusion
- Strong experience with the AVX and AVX-512 instruction sets and with CPU cache behavior
- Strong experience managing system memory
- Detailed understanding of how compilers interface with the runtime stack, and of JIT compilation flows
- Strong programming skills in C++ and Python
Responsibilities
- Define the software stack that interfaces with open-source runtime environments such as ONNX Runtime and PyTorch, and with the NPU compiler
- Define runtime operator scheduling, memory management, and operator dataflow based on tensor residency
- Propose algorithmic optimizations for operators mapped to the CPU using AVX-512
- Interface with ONNX Runtime / PyTorch runtime engines to deploy models on CPUs
- Develop efficient model loading mechanisms to minimize startup latency
- Collaborate with kernel developers to integrate ML operators seamlessly into high-level ML frameworks
- Design and implement C++ runtime wrappers, APIs, and frameworks for ML model execution
Other
- Communicate effectively and work well with teams across AMD
- PhD in Computer Science, Computer Engineering, or Electrical Engineering
- A motivating leader with strong interpersonal skills
- Location: San Jose, CA (hybrid option available)