The Oracle Cloud Infrastructure (OCI) Identity and Access Management (IAM) team builds and operates the foundational services that secure OCI, enabling customers to control who has access to their cloud resources across global regions. The team designs and delivers large-scale distributed systems that ensure reliability, performance, and consistency for millions of users worldwide.
Requirements
- 6+ years of experience in distributed systems or service engineering.
- Proven track record designing, building, and operating high-traffic, highly available web services.
- Strong proficiency in Java, C++, C#, or a similar object-oriented language.
- Experience with RESTful web services and service-oriented architectures.
- Proficiency with at least one scripting language (e.g., Python, Bash) for automation and tooling.
- Hands-on experience with public cloud platforms (Oracle Cloud, AWS, Azure, etc.).
- Experience with Docker or containerized service development.
Responsibilities
- Design and deliver core features from concept to production with strong technical ownership.
- Operate and scale highly available and resilient distributed services handling millions of requests per second.
- Improve system reliability, observability, and performance through automation and proactive monitoring.
- Lead the design and implementation of medium- to large-scale features in distributed systems.
- Own the full lifecycle of features, from requirements gathering through design, development, testing, and production rollout.
- Build and operate services that are highly available, resilient, and performant.
- Proactively monitor systems, detect anomalies, and drive corrective actions before issues impact customers.
Other
- Balance speed and quality through iterative development and continuous improvement.
- Collaborate with cross-functional teams to define priorities, enhance developer experience, and uphold operational excellence.
- Make architectural and technical decisions with guidance from senior engineers or team leads.
- Apply best practices in observability, logging, monitoring, and alerting.
- Participate in on-call rotations and handle incidents with a focus on root-cause analysis and long-term fixes.