XPENG is looking to develop the core brain for its end-to-end autonomous driving systems by creating a next-generation Vision-Language-Action (VLA) Foundation Model.
Requirements
- Experience in multi-modal modeling (vision, language, or planning), with a deep understanding of representation learning, temporal modeling, and reinforcement learning techniques.
- Strong proficiency in PyTorch and modern transformer-based model design.
- Prior experience building foundation or end-to-end driving models, or LLM/VLM architectures (e.g., ViT, Flamingo, BEVFormer, RT-2, or GRPO-style policies).
- Knowledge of RLHF/DPO/GRPO, trajectory prediction, or policy learning for control tasks.
- Familiarity with distributed training (DDP, FSDP) and large-batch optimization.
Responsibilities
- Research, design, and implement large-scale multi-modal architectures (e.g., vision–language–action transformers) for end-to-end autonomous driving.
- Design and integrate cross-modal alignment techniques (e.g., visual grounding, temporal reasoning, policy distillation, imitation and reinforcement learning) to improve model interpretability and action quality.
- Collaborate closely with researchers and engineers across the modeling and infrastructure teams.
- Contribute to publications at top-tier AI/CV/ML conferences and present research findings.
Other
- Currently enrolled in a Master's or Ph.D. program in Computer Science, Electrical/Computer Engineering, or a related field, with a specialization in CV/NLP/ML.
- Publication record in top-tier AI conferences (CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, etc.).