Principal MLOps Engineer

Trase

$200,000 - $250,000

Oct 17, 2025

Remote, US • Seattle, WA, US

Trase Systems aims to reduce the complexity and risk of deploying, managing, and optimizing AI in the enterprise by providing an end-to-end solution. The Principal MLOps Engineer will be instrumental in advancing the company's ML systems, focusing on model training, pipeline development, and fine-tuning LLMs to ensure peak performance and drive innovation.

Requirements

Expertise in designing and operating scalable, production-grade ML systems on AWS, GCP, or Azure.
Mastery of Docker and Kubernetes for managing production ML workloads.
Proven experience managing complex infrastructure as code (IaC) with tools like Terraform.
Deep experience architecting CI/CD/CT pipelines for complex ML workflows (e.g., GitHub Actions, Jenkins).
Strong Python programming skills for infrastructure automation, tooling, and services.
Experience architecting solutions across the full ML lifecycle, from experiment tracking to advanced deployment patterns and monitoring.
Familiarity with modern MLOps tools like MLflow, Kubeflow, SageMaker, or Vertex AI.

Responsibilities

Own the technical vision, strategy, and end-to-end architecture for Trase’s MLOps platform, ensuring scalability, reliability, security, and cost-efficiency.
Architect and build a sophisticated CI/CD/CT ecosystem to automate the entire ML lifecycle, from data validation to production monitoring.
Lead the design of scalable and resilient ML infrastructure using IaC (Terraform) and container orchestration (Kubernetes) on a major cloud platform.
Establish MLOps best practices, including frameworks for version control, experiment tracking, model governance, and responsible AI.
Implement a robust monitoring and alerting framework to track model performance, detect drift, and ensure the reliability of production ML services.
Define patterns for operating large-scale LLMs and multi-modal AI in production with efficiency and compliance.
Solve highly ambiguous, large-scale ML deployment challenges where no precedent exists, defining best practices for the organization.

Other

10+ years in software/infrastructure engineering, with 5+ years in a senior/lead MLOps, ML Infrastructure, or Platform role.
Exceptional communication skills to articulate complex architectural strategy to stakeholders at all levels.
Experience with the operational challenges of LLMs, including fine-tuning pipelines, RAG systems, and vector databases.
Some travel is required.
If you want to be on the cutting edge of technology, building AI solutions for the future, and are up for a challenge, let’s talk!