Aura is working to create a safer internet by developing intelligent digital safety products. This role advances that mission by designing, building, and maintaining the infrastructure and pipelines that support the end-to-end machine learning lifecycle.
Requirements
- 5+ years of experience working in machine learning or data engineering environments with deep expertise in MLOps, infrastructure-as-code, and model lifecycle automation.
- Proven experience deploying machine learning models at scale in production environments (batch and real-time), preferably in privacy-sensitive domains.
- Exceptionally strong coding proficiency in Python, with a solid understanding of software engineering principles and design patterns.
- Experience with infrastructure tools (e.g., Terraform) and a deep understanding of their application and integration.
- Hands-on experience with ML platforms (e.g., MLflow, Databricks, SageMaker).
- Hands-on experience with CI/CD tools (e.g., GitHub Actions, Jenkins).
- Hands-on experience with containerization and orchestration tools (e.g., Docker, Podman, Kubernetes).
Responsibilities
- Automate and optimize ML workflows using CI/CD pipelines, containerization, and orchestration tools to ensure reliable, efficient, and repeatable model delivery.
- Collaborate closely with data scientists and product teams to productionize models, integrate them into customer-facing features, and ensure reliable performance in real-world applications.
- Develop and own model monitoring, alerting, and logging systems to track model drift, performance degradation, and anomalies in production environments.
- Define and advocate for best practices around model versioning, lineage, testing, and reproducibility to uphold high standards of reliability and compliance.
- Ensure privacy, security, and compliance in all ML infrastructure by embedding secure engineering principles and collaborating with InfoSec, Legal, and platform teams.
- Contribute to the evolution of Aura's ML platform and tooling, evaluating and integrating new technologies that improve velocity and robustness.
- Support the entire model development lifecycle and help build automated processes that ensure near-zero downtime.
Other
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
- On-call availability 24/7 for one week each month.
- Travel requirements: not specified.
- Clearance requirements: not specified.
- Ability to work in a remote environment.