Principal MLOps Engineer

Red Cell Partners

$200,000 - $250,000
Sep 29, 2025
Remote, US • Seattle, WA, US • McLean, VA, US

Red Cell Partners builds and invests in technology-led companies. Trase Systems, a Red Cell company, aims to empower enterprise leaders to harness the full potential of AI without the complexity and risk by providing an end-to-end solution for deploying, managing, and optimizing AI. The Principal MLOps Engineer will advance Trase's ML systems, focusing on model training, pipeline development, and LLM fine-tuning.

Requirements

  • Expertise in designing and operating scalable, production-grade ML systems on AWS, GCP, or Azure.
  • Mastery of Docker and Kubernetes for managing production ML workloads.
  • Proven experience managing complex infrastructure as code (IaC) with tools like Terraform.
  • Deep experience architecting CI/CD/CT pipelines for complex ML workflows (e.g., GitHub Actions, Jenkins).
  • Strong Python programming skills for infrastructure automation, tooling, and services.
  • Experience architecting solutions across the full ML lifecycle, from experiment tracking to advanced deployment patterns and monitoring.
  • Familiarity with modern MLOps tools like MLflow, Kubeflow, SageMaker, or Vertex AI.

Responsibilities

  • Own the technical vision, strategy, and end-to-end architecture for Trase’s MLOps platform, ensuring scalability, reliability, security, and cost-efficiency.
  • Architect and build a sophisticated CI/CD/CT ecosystem to automate the entire ML lifecycle, from data validation to production monitoring.
  • Lead the design of scalable and resilient ML infrastructure using IaC (Terraform) and container orchestration (Kubernetes) on a major cloud platform.
  • Establish MLOps best practices, including frameworks for version control, experiment tracking, model governance, and responsible AI.
  • Implement a robust monitoring and alerting framework to track model performance, detect drift, and ensure the reliability of production ML services.
  • Define patterns for operating large-scale LLMs and multi-modal AI in production with efficiency and compliance.
  • Solve highly ambiguous, large-scale ML deployment challenges where no precedent exists, defining best practices for the organization.

Other

  • 10+ years in software/infrastructure engineering, with 5+ years in a senior/lead MLOps, ML Infrastructure, or Platform role.
  • Exceptional communication skills to articulate complex architectural strategy to stakeholders at all levels.
  • Serve as the organization's thought leader on MLOps, mentoring engineers and driving cross-functional alignment on platform strategy and best practices.
  • Define the multi-year roadmap for Trase’s MLOps ecosystem in alignment with business and product strategy.
  • Some travel is required.