The AI Infra and Platform Engineering Service team is responsible for building AI Platform and AIOps capabilities that improve the operational efficiency and effectiveness of enterprise engineering cloud services, thereby improving customer experience and service resiliency.
Requirements
- Strong background in software development
- Experience in MLOps, DevOps, and ML platforms
- Ability to architect broad system interactions
- Willingness to be hands-on
- Ability to dive deep into any part of the stack
- Good understanding of cloud infrastructure and networking
- Experience developing and operating high-scale services
Responsibilities
- Build cloud services on top of modern Infrastructure as a Service (IaaS) building blocks at OCI
- Design and build distributed, scalable, fault-tolerant software systems
- Participate in the entire software lifecycle – development, testing, CI and production operations
- Lead software projects without significant guidance, and mentor and coach junior engineers
- Design software architecture for mission-critical components and secure buy-in from stakeholders, including senior team members, software architects in the org, and management
- Balance product feature development with production operational concerns such as writing runbooks, ops automation, structured logging, and instrumentation for metrics and events
- Use the wide range of internal tooling at OCI to develop, build, deploy, and troubleshoot software
Other
- Collaborate with cross-functional teams
- Work seamlessly in a collaborative, agile environment
- Provide technical leadership to the broader organization
- Understand operational excellence and know how to instill a proactive culture within your team
- Recommend and justify major changes to new and existing products, and build consensus using data-driven approaches