AI Engineer, ML Inference Optimization, Autonomy & Robotics

Tesla

$124,000 - $420,000

Nov 7, 2025

Palo Alto, CA, United States of America

Tesla’s AI team is pushing the frontier of real-world machine learning, building models that reason, predict, and act with human-level physical intelligence. This role designs and optimizes those models to run efficiently across Tesla’s diverse compute stack.

Requirements

Proven experience in scaling and optimizing inference for large ML models, particularly transformers or similar architectures
Familiarity with quantization-aware training, model compression, and distillation for edge and real-time inference
Proficiency with Python and modern C++ (C++14/17/20) and with deep learning frameworks such as PyTorch, TensorFlow, or JAX
Strong understanding of computer systems and architecture, with experience deploying ML models on GPUs, TPUs, or NPUs
Hands-on expertise with CUDA programming, low-level performance profiling, and compiler-level optimization (TensorRT, TVM, XLA)
Experience collaborating with compiler/hardware engineers to bridge model and system-level optimization
Excellent problem-solving skills and the ability to debug and tune high-performance inference workloads

Responsibilities

Design, train, and deploy large neural networks that run efficiently on heterogeneous hardware (GPU, CPU, Tesla’s in-house AI ASIC)
Develop and integrate quantization, sparsity, pruning, and distillation techniques to improve inference performance
Design inference algorithms that improve performance, particularly with respect to quantization and latency
Profile and improve latency, throughput, and memory efficiency for large ML models across edge and cloud environments
Collaborate with compiler and hardware engineers to co-design architectures for efficient real-time inference
Design and implement custom GPU kernels (CUDA / OpenCL) to accelerate model operations and post-processing pipelines
Conduct systematic benchmarking, scaling, and validation of inference performance across Tesla platforms

Other

Bachelor's, Master's, or Ph.D. degree in Computer Science or a related field
Travel may be required
Must be eligible to work in the United States
Excellent communication and collaboration skills
Ability to work in a fast-paced environment