Tesla’s AI team is pushing the frontier of real-world machine learning, building models that reason, predict, and act with human-level physical intelligence. The company needs to design and optimize these models to run efficiently across Tesla’s diverse compute stack.
Requirements
- Proven experience in scaling and optimizing inference for large ML models, particularly transformers or similar architectures
- Familiarity with quantization-aware training, model compression, and distillation for edge and real-time inference
- Proficiency with Python and modern C++ (C++14/17/20) and with deep learning frameworks such as PyTorch, TensorFlow, or JAX
- Strong understanding of computer systems and architecture, with experience deploying ML models on GPUs, TPUs, or NPUs
- Hands-on expertise with CUDA programming, low-level performance profiling, and compiler-level optimization (TensorRT, TVM, XLA)
- Experience collaborating with compiler/hardware engineers to bridge model and system-level optimization
- Excellent problem-solving skills and the ability to debug and tune high-performance inference workloads
Responsibilities
- Design, train, and deploy large neural networks that run efficiently on heterogeneous hardware (GPU, CPU, Tesla’s in-house AI ASIC)
- Develop and integrate quantization, sparsity, pruning, and distillation techniques to improve inference performance
- Design inference algorithms that reduce latency and hold up under quantization
- Profile and improve latency, throughput, and memory efficiency for large ML models across edge and cloud environments
- Collaborate with compiler and hardware engineers to co-design architectures for efficient real-time inference
- Design and implement custom GPU kernels (CUDA / OpenCL) to accelerate model operations and post-processing pipelines
- Conduct systematic benchmarking, scaling, and validation of inference performance across Tesla platforms
Other
- Bachelor's, Master's, or Ph.D. degree in Computer Science or a related field
- Travel may be required
- Must be eligible to work in the United States
- Excellent communication and collaboration skills
- Ability to work in a fast-paced environment