Oracle is seeking to develop and deploy cutting-edge AI solutions, including machine learning, LLM applications, and agentic AI, for enterprise customers. The company aims to build scalable, production-ready AI systems and requires an AI/ML Infrastructure Engineer to support these efforts.
Requirements
- 4–7 years of software engineering experience focused on backend systems, distributed systems, or AI/ML applications.
- Hands-on experience with Docker and deploying containerized applications in Kubernetes environments.
- Strong Linux administration skills, including shell scripting, package management, troubleshooting, and performance tuning.
- Proven experience designing or managing infrastructure for AI/ML or HPC workloads, including high availability (HA), failover, and cross-region disaster recovery (DR).
- Expertise with at least one public cloud (OCI, AWS, Azure, GCP) and willingness to specialize in OCI.
- Knowledge of DevOps practices and CI/CD pipelines.
- Advanced understanding of applications, server technologies, network routing, and security.
Responsibilities
- Design, deploy, and operate infrastructure components—including cloud compute, distributed systems, and data storage—to support AI/ML model training, evaluation, and deployment.
- Build automation pipelines for provisioning, configuring, and monitoring AI/ML infrastructure using Terraform, Docker, Kubernetes, and related tools.
- Optimize resource utilization and performance through cluster tuning, caching, data preprocessing, and other system-level improvements.
- Troubleshoot and resolve complex issues in distributed computing environments, ensuring high availability, reliability, and scalability.
- Enforce strong security and compliance standards through access control, vulnerability management, and encryption best practices.
- Partner closely with applied scientists, platform engineers, and cloud infrastructure teams to gather requirements and deliver frictionless ML workflows.
- Produce clear and comprehensive documentation for infrastructure, APIs, designs, troubleshooting, and best practices.
Other
- Excellent communication skills, particularly in distributed and asynchronous team environments.
- Demonstrated ability to own problems end-to-end and collaborate effectively with internal teams and external customers.
- Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.
- Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
- Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law.