XPeng Motors is looking for an intern to design and implement vision large language models for autonomous driving or intelligent cabin functions, and to research vision LLM and foundation model algorithms with the goal of publishing at top academic venues.
Requirements
- Proven R&D track record in at least one of the following computer vision topics: vision large language models, multimodal foundation models, action recognition, open-vocabulary detection, zero-shot object detection/segmentation/tracking, face recognition, 3D reconstruction
- Excellent programming skills in C++ or Python
- Familiarity with OpenCV, NumPy, and at least one deep learning framework such as PyTorch or TensorFlow
- Ability to root-cause engineering failures and make algorithms robust to non-ideal conditions
- Experience delivering products in at least one of the following areas: face detection, face recognition, pose estimation, 3D reconstruction, action recognition (driver monitoring)
- Knowledge of linear algebra and classical computer vision/image processing
- Experience with GPU programming
Responsibilities
- Design and implement vision large language models for autonomous driving or intelligent cabin functions
- Research and explore vision LLM and foundation model algorithms, targeting publication at top academic venues
- Test, debug, and optimize vision algorithms so that they are robust and efficient
- Work with cross-functional teams on product definition, human-machine interaction (HMI) and HW/SW integration
Other
- PhD in Computer Science, Electrical Engineering, or related fields
- Able to complete a full-time, onsite internship at the SV office this summer and fall
- Excellent written and oral communication skills
- Publications in top-tier CV/ML venues: CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR