The ML Compute Platform is part of the AI Compute Platform organization within Infrastructure Platforms. Our team owns the cloud-agnostic, reliable, and cost-efficient compute backend that powers GM AI. We enable rapid innovation and feature development by optimizing for high-priority, ML-centric use cases. We are seeking a Staff ML Engineer to help build and scale robust compute platforms for ML workflows.
Requirements
- Expertise in Go, C++, Python, or other relevant programming languages
- Strong background operating Kubernetes at scale
- Relevant experience building large-scale distributed systems
- Experience leading and driving large-scale initiatives
- Experience working with Google Cloud Platform, Microsoft Azure, or Amazon Web Services
- Hands-on experience building ML infrastructure platforms with strong developer/user experience
- Experience with GPU/TPU optimizations
Responsibilities
- Design and implement core platform backend software components
- Collaborate with ML engineers and researchers to understand platform pain points and improve developer experience
- Analyze and improve efficiency, scalability, and stability of various system resources
- Lead large-scale technical initiatives across GM’s ML ecosystem
- Help raise the engineering bar through technical leadership and best practices
- Contribute to and potentially lead open source projects; represent GM in relevant communities
- Ensure efficient model training and seamless deployment into production
Other
- This role is categorized as hybrid. This means the successful candidate is expected to report to the GM Global Technical Center - Cole Engineering Center Podium or the Mountain View Technical Center, CA at least three times per week, or at another frequency dictated by the business.
- This job is eligible for relocation assistance.
- 8+ years of industry experience
- Ability to thrive in a dynamic, multi-tasking environment with ever-evolving priorities