XPENG is looking to design, architect, and implement company-scale distributed systems for next-generation autonomy software evaluation, improve engineering efficiency for machine learning engineers, and provide insights into software quality and release readiness through complex workflows.
Requirements
- Delivered company-scale, industry-leading infrastructure from scratch.
- Expert-level understanding of Kubernetes (K8s), queueing, and in-memory data structures.
- 10+ years developing backend services (we primarily write C++ and Python).
- 3+ years of experience working on complex machine learning infrastructure.
- Experience developing and maintaining production machine learning systems deployed to the cloud and on-premises.
- Experience using Bazel for complex, large-scale machine learning infrastructure.
- Experience with modern Python tooling such as Ruff, Mypy, Typeguard, and pytest.
Responsibilities
- Design, architect, and implement company-scale distributed systems for next-generation autonomy software evaluation.
- Collaborate with multiple teams inside XPENG to deliver best-in-class infrastructure for the next generation of XPENG innovations.
- Design and implement tools and infrastructure to improve the efficiency of machine learning engineers' daily workflows.
- Design and implement complex workflows on cloud and on-premises infrastructure to provide insights into the software quality and release readiness of features.
- Leverage LLMs to make established triage, analysis, and troubleshooting processes more efficient.
Other
- Demonstrate a can-do attitude and thrive in a fast-paced, constantly evolving landscape of requirements.
- Collaborate with stakeholders to deliver highly complex and flexible infrastructure that meets their use cases, SLAs, and QoS requirements.
- Experience working in a fast-paced environment.
- Self-motivated, with the ability to deal with ambiguity and evolving requirements.
- Experience working on an autonomous vehicle stack.