The company seeks to bridge the gap between research and engineering by developing and maintaining infrastructure to deploy, monitor, and manage machine learning models efficiently.
Requirements
- Python, TypeScript, and shell scripting
- Experience with ML pipeline tools (Kubeflow, Airflow, MLflow)
- AWS services such as S3, Lambda, and DynamoDB
- CI/CD systems (GitHub Actions, Jenkins, GitLab)
- Infrastructure-as-Code experience (Terraform, CloudFormation)
- Containerization (Docker, Kubernetes)
- Expert proficiency in Python; working knowledge of ML frameworks and tooling (e.g., PyTorch, TensorFlow, MLflow)
Responsibilities
- Pipeline Development: Implement, optimize, and maintain CI/CD pipelines for ML systems, including integrations with GitHub workflows and Jenkins.
- Collaboration: Partner with data scientists, frontend engineers, and platform teams to deliver seamless integration of ML models into core evaluation platforms.
- Environment Management: Administer ML development/production environments using cloud-native solutions; optimize for scalability, reliability, and cost.
- Tooling and Automation: Evaluate, build, and deploy automation tools to streamline the end-to-end ML lifecycle.
- Quality & Monitoring: Enhance and develop quality evaluation features and ensure robust monitoring via dashboards and automated alerts.
- Documentation & Best Practices: Champion engineering best practices, promote code quality, and document workflows, tools, and processes for effective team adoption.
Other
- Master's degree in computer science or a related STEM field.
- Minimum 5 years in software engineering; at least 2 years dedicated to DevOps/MLOps in cloud and production environments.
- Strong problem-solving skills, excellent written and verbal communication, and the ability to lead and collaborate effectively across teams.