Tesla's Autopilot AI Infrastructure team needs to explore and identify neural network architectures that train faster without compromising driving capability, in order to accelerate training on large supercomputers.
Requirements
- Strong knowledge of Python and Linux
- Experience in one of the following: PyTorch, JAX, TensorFlow, or equivalent auto-differentiation framework
- Experience training modern neural networks for vision, language, or multimodal tasks
- Familiarity with the GPU programming model and kernel software stack (CUDA, HIP, or Triton)
- Knowledge of machine learning, computer vision, or neural networks
Responsibilities
- Co-design neural network architectures with ML Engineers, with the goal of minimizing time to training convergence and maximizing model FLOP utilization
- Instrument, profile, and analyze end-to-end training (data loader, input pipeline, kernels, comms) to identify architecture-driven training bottlenecks
- Redesign blocks (attention variants, convolutional stems, normalization, routing, feature fusion) for higher arithmetic intensity and better kernel efficiency
- Explore width/depth/receptive-field trade-offs to match quality baselines with reduced step time or improved throughput per accelerator
- Prototype and evaluate operator fusions, layout changes, and mixed precision variants to raise achieved FLOP utilization
- Partner with kernel / systems engineers to surface architecture patterns that enable kernel fusion or memory reuse
- Run controlled ablations to ensure driving performance parity while improving time to training convergence
Other
- Pursuing a degree in Computer Science, Computer Engineering, or a relevant field of study, with a graduation date between 2026 and 2027
- Minimum commitment of 12 weeks, full-time and on-site, for most internships
- Open to students who are actively enrolled in an academic program
- If your work authorization is through CPT, please consult your school on your ability to work 40 hours per week before applying.
- You must be able to work 40 hours per week on-site.