Job Board

Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1,000+ jobs and find postings that match your resume

Machine Learning Engineer - Model Evaluations, Public Sector

Scale AI

$187,000 - $300,000
Dec 16, 2025
San Francisco, CA, US • Saint Louis, MO, US • New York, NY, US • Washington, DC, US

The Public Sector ML team at Scale deploys advanced AI systems into mission-critical government environments and aims to build evaluation frameworks that ensure these models operate reliably, safely, and effectively under real-world constraints.

Requirements

  • Experience in computer vision, deep learning, reinforcement learning, or NLP in production settings.
  • Strong programming skills in Python; experience with TensorFlow or PyTorch.
  • Background in algorithms, data structures, and object-oriented programming.
  • Experience with LLM pipelines, simulation environments, or automated evaluation systems.
  • Ability to convert research insights into measurable evaluation criteria.
  • Cloud experience (AWS, GCP) and model deployment experience.
  • Experience with LLM evaluation, CV robustness, or RL validation.

Responsibilities

  • Develop and maintain automated evaluation pipelines for ML models across functional, performance, robustness, and safety metrics, including LLM-judge–based evaluations.
  • Design test datasets and benchmarks to measure generalization, bias, explainability, and failure modes.
  • Build evaluation frameworks for LLM agents, including infrastructure for scenario-based and environment-based testing.
  • Conduct comparative analyses of model architectures, training procedures, and evaluation outcomes.
  • Implement tools for continuous monitoring, regression testing, and quality assurance for ML systems.
  • Design and execute stress tests and red-teaming workflows to uncover vulnerabilities and edge cases.
  • Collaborate with operations teams and subject matter experts to produce high-quality evaluation datasets.

Other

  • This role will require an active security clearance or the ability to obtain a security clearance.
  • Graduate degree in CS, ML, or AI.
  • Knowledge of interpretability, adversarial robustness, or AI safety frameworks.
  • Familiarity with ML evaluation frameworks and agentic model design.
  • Experience in regulated, classified, or mission-critical ML domains.