Pony.ai is seeking to advance the training and inference of AI models in autonomous driving systems to achieve state-of-the-art performance and efficiency.
Requirements
- Strong programming skills in C/C++ or Python.
- Solid understanding of CPU or GPU execution models (e.g. threads, registers, caches, memory) and of cost and performance trade-offs.
- Experience in benchmarking, profiling and validating performance.
- Experience with parallel programming, e.g. CUDA, ROCm, Triton, CUTLASS.
- Experience in computer vision, image processing, machine learning and deep learning.
- Experience in model optimization techniques such as quantization and pruning.
- Experience in optimizing the utilization of compute resources and in identifying and resolving compute and data-flow bottlenecks.
Responsibilities
- Perform in-depth analysis and optimization of model training and deployment to achieve state-of-the-art performance and efficiency in autonomous driving.
- Work across the entire AI framework/compiler stack (e.g. Torch, CUDA and TensorRT), support model development, and prototype key deep learning algorithms.
- Analyze the trade-offs between performance, cost and energy for autonomous driving.
- Collaborate closely with diverse groups at Pony.ai to influence the HW and SW design of the next-generation compute platform.
- Research the latest model architectures, programming models and hardware.
Other
- Currently pursuing a Master's or PhD degree in a related discipline.
- Strong communication skills and ability to work cross-functionally between software and hardware teams.
- This position is fully onsite in Fremont for at least 3 months.