
Principal MLOps Engineer

Trase

$200,000 - $250,000
Oct 17, 2025
Remote, US • Seattle, WA, US

Trase Systems aims to address the complexity and risk of deploying, managing, and optimizing AI in the enterprise by providing an end-to-end solution. The Principal MLOps Engineer will be instrumental in advancing the company's ML systems, focusing on model training, pipeline development, and fine-tuning LLMs to ensure peak performance and drive innovation.

Requirements

  • Expertise in designing and operating scalable, production-grade ML systems on AWS, GCP, or Azure.
  • Mastery of Docker and Kubernetes for managing production ML workloads.
  • Proven experience managing complex infrastructure as code (IaC) with tools like Terraform.
  • Deep experience architecting CI/CD/CT pipelines for complex ML workflows using tools such as GitHub Actions or Jenkins.
  • Strong Python programming skills for infrastructure automation, tooling, and services.
  • Experience architecting solutions across the full ML lifecycle, from experiment tracking to advanced deployment patterns and monitoring.
  • Familiarity with modern MLOps tools like MLflow, Kubeflow, SageMaker, or Vertex AI.

Responsibilities

  • Own the technical vision, strategy, and end-to-end architecture for Trase’s MLOps platform, ensuring scalability, reliability, security, and cost-efficiency.
  • Architect and build a sophisticated CI/CD/CT ecosystem to automate the entire ML lifecycle, from data validation to production monitoring.
  • Lead the design of scalable and resilient ML infrastructure using IaC (Terraform) and container orchestration (Kubernetes) on a major cloud platform.
  • Establish MLOps best practices, including frameworks for version control, experiment tracking, model governance, and responsible AI.
  • Implement a robust monitoring and alerting framework to track model performance, detect drift, and ensure the reliability of production ML services.
  • Define patterns for operating large-scale LLMs and multi-modal AI in production with efficiency and compliance.
  • Solve highly ambiguous, large-scale ML deployment challenges where no precedent exists, defining best practices for the organization.

Other

  • 10+ years in software/infrastructure engineering, with 5+ years in a senior/lead MLOps, ML Infrastructure, or Platform role.
  • Exceptional communication skills to articulate complex architectural strategy to stakeholders at all levels.
  • Experience with the operational challenges of LLMs, including fine-tuning pipelines, RAG systems, and vector databases.
  • Some travel is required.
  • If you want to be on the cutting edge of technology, building AI solutions for the future, and are up for a challenge, let’s talk!