Contextual AI is looking for a Machine Learning Infrastructure Engineer to build and maintain the infrastructure for its machine learning, data, and service pipelines, with a specific focus on scaling its RAG 2.0 systems.
Requirements
- Strong software engineering skills with experience in Python or similar languages.
- Experience designing, operating, or scaling distributed systems or developer infrastructure.
- Comfort working in Linux environments and with tools like Kubernetes, Terraform, CI/CD pipelines, and modern observability stacks.
Responsibilities
- Design, build, and maintain reliable and performant systems used across engineering.
- Collaborate with other engineers, product managers, and researchers to build infrastructure that meets evolving needs.
- Improve internal tooling, automation, and developer experience.
- Contribute to incident response, postmortems, and the development of best practices around system reliability and scalability.
- Own and build important, highly scalable, available, performant, and reliable distributed systems (and their building blocks) to power the entire stack.
- Work across layers of the stack—debugging system bottlenecks, evolving core infrastructure, and solving novel problems in performance and scalability.
- Build scalable, fault-tolerant systems and lead efforts around service health, incident response, and resilience.
Other
- Excellent communication and collaboration skills, especially in cross-functional settings.
- Ability to navigate complex systems and a willingness to dig deep when debugging tricky issues.