Senior Staff Computer Vision Engineer

XPeng Motors

$244,140 - $413,160
Aug 14, 2025
Santa Clara, CA, US

XPENG is seeking a Senior Staff Computer Vision Engineer to optimize model inference and deploy high-performance, large-scale AI models for autonomous driving and beyond.

Requirements

  • Strong coding skills in C++ and Python with a focus on performance and scalability.
  • Proficient in deploying deep learning models using TensorRT, ONNX Runtime, or TVM.
  • Familiarity with CUDA programming and parallel computing principles.
  • Solid understanding of model inference workflows and system-level performance tuning.
  • Experience in quantization-aware training or post-training quantization.
  • Hands-on experience deploying vision-language or large multimodal models.
  • Familiarity with low-precision inference (INT8/FP16), kernel fusion, and operator-level optimization.

Responsibilities

  • Optimize large-scale multimodal models for low-latency inference and efficient memory usage across diverse hardware platforms.
  • Apply state-of-the-art model compression techniques, including quantization (e.g., INT8/FP16), pruning, and knowledge distillation.
  • Develop and integrate custom inference kernels targeting GPU or custom AI accelerators.
  • Build profiling tools and performance models to analyze bottlenecks and guide optimization strategies.
  • Contribute to real-world deployment efforts in autonomous driving systems, including on-vehicle testing and iteration.
  • Track the latest research in efficient ML inference and integrate relevant techniques into production pipelines.

Other

  • Master’s or Ph.D. in Computer Science, Electrical Engineering, or a related field.
  • Effective communicator and collaborative team player.
  • Track record of open-source contributions or publications at ML/AI conferences (e.g., NeurIPS, ICML, CVPR).
  • Background in system profiling, latency modeling, or compiler-level optimization.