Coupang is building the future of commerce and is looking for a leader to define and execute the long-term vision and roadmap for the company’s AI infrastructure orchestration layer, aligning it with overall business and AI Services goals.
Requirements
- Expert-level knowledge of containerization and orchestration (Docker, Kubernetes)
- Strong background in DevOps and MLOps principles and tooling
- Proficiency in at least one modern programming language (e.g., Python, Go)
- Deep, hands-on experience designing and operating large-scale distributed systems and cloud-native architectures
- Proven experience with AI infrastructure orchestration (e.g., using Kubernetes, Kubeflow, or similar MLOps tools) and with managing accelerated compute resources (GPUs, TPUs, etc.)
- 15+ years of experience in cloud backend engineering, cloud design, deployment, and DevOps
- 15+ years of experience leading system design and architecture leveraging private clouds and AWS and/or Azure/GCP
Responsibilities
- Define and execute the long-term vision and roadmap for the company’s AI infrastructure orchestration layer
- Lead, mentor, and grow a high-performing engineering and operations team focused on AI infrastructure and platform engineering
- Oversee the design, implementation, and maintenance of the core orchestration platforms for large-scale AI model training and deployment
- Ensure reliability, security, and compliance of the AI infrastructure, meeting strict standards for data governance and model integrity
- Establish Service Level Objectives (SLOs) and Key Performance Indicators (KPIs) for the AI platform services and lead efforts for continuous optimization and performance tuning
- Select, evaluate, and integrate the core technologies required for the AI stack
- Champion infrastructure-as-code (IaC) principles to manage and provision AI resources consistently and at scale
Other
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field
- 15+ years of progressive experience in software engineering, infrastructure, or platform operations
- 5+ years of experience leading and managing technical teams, ideally at the Director or Sr. Director level or in an equivalent capacity
- Excellent cross-group collaboration skills and outstanding verbal and written communication
- Benefits: medical, dental, vision, life, and AD&D insurance; Flexible Spending Accounts (FSA) and Health Savings Account (HSA); long-term and short-term disability