Staff Machine Learning Engineer - Infrastructure

Rad AI

Salary not specified

Dec 9, 2025

Remote, US

Rad AI is looking to hire a Staff Machine Learning Engineer to build and maintain the infrastructure that supports their AI research and products, aiming to accelerate language model R&D and serve those models to radiologists, ultimately improving clinical outcomes for patients.

Requirements

8+ years of industry experience in ML Engineering in cloud-native environments
In-depth knowledge of Python (required), JavaScript/TypeScript (nice to have), or other modern languages in the ML domain
Strong experience with infrastructure and DevOps tools such as Kubernetes, Docker, and Ansible
Strong knowledge of cloud computing platforms such as AWS (preferable), GCP, and Azure
Experience architecting distributed systems, storage systems, and databases
Experience working with machine learning frameworks such as PyTorch and LangGraph
Experience with Airflow (preferable) or other orchestration tools

Responsibilities

Architect the infrastructure that supports our machine learning applications, services, and workflows
Architect and maintain our ML platform that supports continuous integration, continuous delivery, and continuous training for our machine learning models
Develop cloud-native services and serverless architectures to build scalable and resilient systems
Partner with data scientists to design the data pipelines that enable various machine learning models in production
Write code that meets our internal standards for security, style, maintainability, and best practices for a high-scale HIPAA web environment
Design, deploy, and maintain the full ML platform stack including monitoring and observability, data analytics, backend integration with customer-facing products, and the full model R&D lifecycle
Work with Product Management, Research, and Engineering to iterate on new features and address inefficiencies across our AI/ML infrastructure

Other

Excellent communication skills, with a strong sense of ownership and a systematic approach to problem-solving
Proven ability to manage and lead active incidents, address what caused them, and establish systems to avoid them in the future via blameless postmortems
Experience working at a fast-growing startup
Experience in a HIPAA-compliant environment
Location Flexibility (Remote-first company!)