Oracle Cloud Infrastructure (OCI) needs to boost the productivity and efficiency of the Global Network Operations Center (GNOC) and Network Reliability Engineering (NRE) teams by providing them with observability, automation, and actionable insights at hyperscale through innovative solutions.
Requirements
- Strong coding skills in Go and Python 3
- Experience with distributed systems, microservices, and cloud-native technologies
- Proficiency in Linux environments and scripting languages
- Proficiency in creating and maintaining databases, and in writing database code using SQL with Go or Python 3 libraries
- Experience using AI coding assistants or AI-powered tools to help accelerate software development, including code generation, code review, or debugging
- Experience with C, C++, Java, or Rust
- Familiarity with workflow automation (e.g., Apache Airflow), CI/CD pipelines, or infrastructure as code
Responsibilities
- Architect, build, and support distributed systems for process control and execution based on Product Requirement Documents (PRDs).
- Develop and sustain DevOps tooling, new product process integrations, and automated testing.
- Develop machine learning (ML) code in Python 3; build backend services in Go (Golang); create command-line interface (CLI) tools in Rust or Python 3; and integrate with other services as needed using Go, Python 3, or C.
- Build and maintain schemas/models to ensure every platform and service write is captured for monitoring, debugging, and compliance.
- Build and maintain dashboards that monitor the quality and effectiveness of service execution for the "process as code" your team delivers.
- Build automated systems that route code failures to the appropriate on-call engineers and service owners.
- Ensure high availability, reliability, and performance of developed solutions in production environments.
Other
- 3 to 5 years of experience in process as code, software engineering, automation development, or similar roles
- Bachelor's degree in Computer Science, Engineering, or a related engineering field
- Understanding of network operations or large-scale IT infrastructure
- Excellent problem-solving, organizational, and communication skills
- Operate in an Extreme Programming (XP) asynchronous environment (chat/tasks) without daily standups, and keep work visible by continuously updating task and ticket states in Jira.