Oracle is looking to solve the problem of designing, implementing, and managing infrastructure for AI/ML or HPC workloads while ensuring that security and compliance standards are met throughout the AI/ML infrastructure stack.
Requirements
- Experience in scripting and automation using tools like Ansible, Terraform, and/or Kubernetes.
- Experience with containerization technologies (e.g., Docker) and orchestration tools (e.g., Kubernetes) for managing distributed systems.
- Solid understanding of networking concepts, security principles, and best practices.
- Strong Linux skills with hands-on experience in Oracle Linux/RHEL/CentOS, Ubuntu, and Debian distributions, including system administration, package management, shell scripting, and performance optimization.
- Strong proficiency in at least one programming language such as Python, Rust, Go, Java, or Scala.
- Understanding of machine learning frameworks and libraries such as TensorFlow, PyTorch, or scikit-learn, and of their deployment in production environments, is a plus.
- Familiarity with DevOps practices and tools for continuous integration, deployment, and monitoring (e.g., Jenkins, GitLab CI/CD, Prometheus).
Responsibilities
- Design, deploy, and manage infrastructure components such as cloud resources, distributed computing systems, and data storage solutions to support AI/ML workflows.
- Implement automation solutions for provisioning, configuring, and monitoring AI/ML infrastructure to streamline operations and enhance productivity.
- Optimize infrastructure performance by tuning parameters, optimizing resource utilization, and implementing caching and data pre-processing techniques.
- Ensure security and compliance standards are met throughout the AI/ML infrastructure stack, including data encryption, access control, and vulnerability management.
- Troubleshoot infrastructure performance, scalability, and reliability issues and implement solutions to mitigate risks and minimize downtime.
- Stay updated on emerging technologies and best practices in AI/ML infrastructure and evaluate their potential impact on our systems and workflows.
- Document infrastructure designs, configurations, and procedures to facilitate knowledge sharing and ensure maintainability.
Other
- Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.
- Range and benefit information provided in this posting is specific to the stated locations only.
- US: Hiring range in USD: $96,800 to $223,400 per year. May be eligible for bonus and equity.
- Medical, dental, and vision insurance, including expert medical opinion
- 11 paid holidays