Ampere aims to advance AI capabilities by developing cutting-edge AI frameworks and optimizing AI architectures, paving the way for high-performance, efficient computing solutions that meet future AI demands.
Requirements
- Previous development experience using or implementing a collective communication library such as NCCL, RCCL, or MPI, or working experience with socket programming
- Strong expertise in programming languages such as Python and C/C++, with a strong background in performance tuning
- Previous software development experience with a focus on AI frameworks such as PyTorch, llama.cpp, and ONNX
- Solid understanding of AI and machine learning concepts, including neural networks and data processing frameworks
- Experience with high-performance computing systems and cloud-based architectures
Responsibilities
- Develop communication kernels and enable cutting-edge distributed inference technologies, such as tensor and pipeline parallelism and disaggregated prefill and decode, for Ampere accelerators
- Go deep into the entire SW/HW stack to accelerate deep learning, including inference serving, framework integration, compiler, runtime library, communication and compute kernel development, and performance tuning
- Enable deep learning models, with attention to both performance and accuracy, on popular frameworks like PyTorch and llama.cpp and on serving platforms like vLLM and SGLang
- Drive HW/SW co-design to optimize existing AI architectures: enhance computational efficiency, increase throughput, reduce latency, and improve scalability
- Build state-of-the-art software and hardware AI co-processors/accelerators
- Collaborate with cross-functional teams to integrate AI solutions into Ampere's cloud-native processor platforms and accelerators
Other
- BS in Computer Science, Mathematics, or a related technical field and 12 years of related experience; or MS degree and 8 years; or PhD and 5 years
- Unlimited Flextime and 10+ paid holidays
- Premium medical insurance, dental insurance, vision insurance, as well as income protection and a 401K retirement plan
- Travel requirements not specified
- Clearance requirements not specified