HCA Healthcare is looking to build and enhance a scalable, reliable, and cutting-edge Machine Learning Operations (MLOps) platform to streamline workflows, enable seamless collaboration, and drive innovation in AI/ML solutions, while prioritizing transparency, fairness, and ethical practices.
Requirements
- Advanced proficiency in cloud platforms, especially Google Cloud Platform (GCP).
- Experience with on-premises and edge deployments is a plus.
- Solid understanding of AI/ML concepts, technologies, and best practices, with hands-on experience deploying ML solutions at scale.
- Proficiency in Python and other scripting tools for automation and platform optimization.
- Strong analytical and troubleshooting skills, with a track record of solving complex problems under pressure.
- Proven experience managing and leading cloud architecture and engineering teams.
- Strong background in AI/ML or data science technologies and platform development.
Responsibilities
- Lead the enhancement of the AI platform to improve the developer experience for data and ML engineers.
- Optimize workflows by integrating state-of-the-art tools and technologies, ensuring scalability and efficiency.
- Architect and manage the cloud infrastructure supporting the MLOps platform, leveraging infrastructure-as-code (IaC) tools like Terraform.
- Optimize that infrastructure for scalability, security, cost-effectiveness, and high availability.
- Collaborate with the AI/ML reliability engineering team to design and implement components that ensure the platform’s operational reliability, observability, and fault tolerance.
- Build and maintain robust DevOps pipelines tailored for ML workflows, enabling automated model training, testing, deployment, and monitoring.
- Design and manage tools to enhance platform reliability, including dashboards, logging systems, and alerting frameworks, to ensure seamless operations.
Other
- Partner with data science, product management, engineering, and business teams to understand their requirements and ensure the MLOps platform effectively supports their needs.
- Effectively communicate technical concepts and strategies to both technical and non-technical audiences.
- Apply knowledge from related disciplines, such as data science and the health and biological sciences, to design holistic MLOps solutions that meet the unique needs of the organization.
- Demonstrated expertise in leading Responsible AI initiatives, with a focus on ethical AI practices.
- Excellent communication, leadership, and project management skills.
- 7+ years of experience in MLOps, DevOps, or a related role required.