OCI is building a robust ecosystem to support the end-to-end lifecycle of AI and machine learning workloads, from GPU infrastructure and training pipelines to model serving and deployment tools. The Senior Software Engineer will work on critical components of OCI’s AI platform, including high-scale GPU cluster management, self-service ML infrastructure, and model serving systems to power Oracle's GenAI and ML initiatives.
Requirements
- 4+ years of experience shipping scalable, cloud native distributed systems
- Experience building control plane/data plane solutions for cloud native companies
- Proficient in Go, Java, Python
- Experience building highly available services, with knowledge of common service-oriented design patterns and service-to-service communication protocols
- Experience with production operations, including best practices for shipping quality code to production and troubleshooting issues when they arise
- Experience with container orchestration platforms such as Kubernetes
- Production experience with cloud and ML technologies
Responsibilities
- Build cloud services on top of the modern Infrastructure as a Service (IaaS) building blocks at OCI
- Design and build distributed, scalable, fault-tolerant software systems
- Participate in the entire software lifecycle – development, testing, CI and production operations
- Leverage internal tooling at OCI to develop, build, deploy and troubleshoot software
- Participate in on-call for the service with the team
Other
- Able to communicate technical ideas effectively, both verbally and in writing (technical proposals, design specs, architecture diagrams and presentations)
- BS in Computer Science, or equivalent experience
- MS in Computer Science
- Experience in diagnosing, troubleshooting and resolving performance issues in complex environments
- Deep understanding of Unix-like operating systems