Senior MLOps Engineer – LLMOps

TRM Labs

$215,000 - $230,000
Nov 24, 2025
Remote, US

TRM Labs is looking to build and scale the technical infrastructure for AI/ML systems, with a specific focus on enabling next-generation AI applications, Large Language Models (LLMs), and agentic systems. The goal is to build robust pipelines, high-performance infrastructure, and operational tooling for fast, safe, and scalable AI deployment.

Requirements

  • Write high-quality, maintainable software — primarily in Python, but we value engineering ability over language familiarity.
  • Have a strong background in scalable infrastructure, including:
      • Containerization and orchestration (e.g. Docker, Kubernetes)
      • Infrastructure-as-code and deployment (e.g. Terraform, CI/CD pipelines)
      • Monitoring and logging frameworks (e.g. Datadog, Prometheus, OpenTelemetry)
  • Understand and implement MLOps best practices, including:
      • Model versioning and rollback strategies
      • Automated evaluation and drift detection
      • Scalable model and agent serving infrastructure (e.g. vLLM, Triton, BentoML)

Responsibilities

  • Build reusable CI/CD workflows for model training, evaluation, and deployment, integrating tools such as Langfuse, GitHub Actions, and experiment tracking.
  • Automate model versioning, approval workflows, and compliance checks across environments.
  • Build out a modular and scalable AI infrastructure stack — including vector databases, feature stores, model registries, and observability tooling.
  • Partner with engineering and data science to embed AI models and agents into real-time applications and workflows.
  • Continuously evaluate and integrate state-of-the-art AI tools (e.g. LangChain, LlamaIndex, vLLM, MLflow, BentoML).
  • Drive AI reliability and governance, enabling experimentation while ensuring compliance, security, and uptime.
  • Deploy infrastructure to support offline and online evaluation of LLMs and agents — including regression testing, cost monitoring, and human-in-the-loop workflows.

Other

  • Demonstrate strong ownership and pragmatism, balancing infrastructure elegance with iterative delivery and measurable impact.
  • Rapid Issue Resolution. TRM Engineers identify and resolve critical onsite issues in minutes to hours, not weeks.
  • Navigating Bureaucracy. We anticipate and address procedural hurdles, build trust with key stakeholders, and find alternative pathways to approvals.
  • Efficient Knowledge Transfer. Engineers document and share updates in real time, ensuring the entire team—onsite and remote—has full visibility into plans, blockers, and resolutions.
  • TRM may not be the right fit for everyone. If you're optimizing for work-life balance, we encourage you to:
      • Ask your interviewers how they personally approach balance within their teams.
      • Reflect on whether this is the right season in your life to join a high-velocity environment.