OCI is building the services and tools behind the world's largest AI mega-cluster. The work requires designing, developing, and maintaining highly available, scalable services that power next-generation AI infrastructure and expand OCI's cloud footprint.
Requirements
- 5-12+ years of experience designing, developing, and operating large-scale, highly available distributed systems.
- Strong knowledge of Java, C, or C++, and experience with scripting languages such as Python or Perl.
- Strong knowledge of data structures, algorithms, operating systems, and distributed systems fundamentals.
- Strong understanding of databases, NoSQL systems, storage, and distributed persistence technologies.
- Working familiarity with operating system internals, networking protocols (TCP/IP, HTTP), and standard network architectures.
- Strong troubleshooting and performance tuning skills.
- Experience with network design, modeling, and network linking algorithms is a strong plus.
Responsibilities
- Own the software design and development for major components of Oracle’s Cloud Infrastructure.
- Design, build, and maintain highly available, scalable services.
- Dive deep into any part of the stack and low-level systems.
- Design broad distributed system interactions.
- Develop services and tools for logical and physical modeling for new data centers and data center networks.
- Handle planning, design, delivery, and bootstrap of data center infrastructure.
- Troubleshoot systems and tune their performance.
Other
- Be a rock-solid coder and a distributed systems generalist.
- Value simplicity and scale.
- Work comfortably in a collaborative, agile environment.
- Be excited to learn.