Aurora is working to improve the performance of the Deep Learning networks used in its Autonomous Vehicle (AV) systems, with the goal of making transportation safer and more accessible.
Requirements
- Strong programming skills in CUDA, C++, and Python.
- Extensive experience in high-performance computing and parallel programming, specializing in optimizing workloads to reduce GPU memory usage, minimize latency, and/or maximize throughput.
- Proficiency with performance analysis tools such as NVIDIA Nsight Systems and Nsight Compute, and in applying techniques such as the roofline model for performance optimization.
- Hands-on experience in optimizing DL/ML workloads at the framework level using at least one deep learning framework (e.g., PyTorch, TensorFlow), ensuring efficient and scalable model deployment.
- Strong understanding of the fundamentals of computer vision and transformer-based deep learning architectures, with proficiency in foundational neural network building blocks.
- Experience with TensorRT, OpenAI Triton, Mojo and other inference acceleration tools.
Responsibilities
- Conduct performance analysis and optimization of Deep Learning networks running on the Autonomous Vehicle (AV).
- Optimize software architecture, system performance, and latency for deep learning applications.
- Work on deploying deep learning models on the AV and training them in large-scale data centers.
- Troubleshoot performance issues using profiling and roofline model techniques.
- Collaborate with cross-functional teams to enhance the efficiency of self-driving technology.
Other
- Minimum of 5 years of professional experience in software engineering.
- BS, MS, or PhD in Computer Science or a related field.
- Strong communication skills, enabling effective teamwork across multidisciplinary teams.
- Comfortable working in Linux/Unix environments.
- Demonstrated ability to quickly learn and adapt to emerging technologies and tools in a fast-paced environment.