Applied Research - Evals & Data

Prime Intellect

Salary not specified

Oct 27, 2025

San Francisco, CA, US

Prime Intellect is building the open superintelligence stack, aiming to enable anyone to create, train, and deploy frontier agentic models by aggregating and orchestrating global compute into a single control plane and pairing it with the full RL post-training stack.

Requirements

Strong background in machine learning engineering, with experience in post-training, RL, or large-scale model alignment.
Experience with applied data workflows and evaluation frameworks for large models or agents (e.g., SWE-Bench, HELM, EvalFlow, internal eval pipelines).
Deep expertise in distributed training/inference frameworks (e.g., vLLM, SGLang, Ray, Accelerate).
Experience deploying containerized systems at scale (Docker, Kubernetes, Terraform).
Track record of research contributions (publications, open-source contributions, benchmarks) in ML/RL.

Responsibilities

Design and iterate on next-generation AI agents that tackle real workloads: workflow automation, reasoning-intensive tasks, and decision-making at scale.
Develop the distributed systems, evaluation pipelines, and coordination frameworks that enable these agents to operate reliably, efficiently, and at massive scale.
Build data capture, processing, and versioning workflows for feedback, model traces, and reward signals.
Design and implement novel RL and post-training methods (RLHF, RLVR, GRPO, etc.) to align large models with domain-specific tasks.
Build evaluation harnesses and verifiers to measure reasoning, robustness, and agentic behavior in real-world workflows.
Integrate applied data collection and analytics into the post-training process to surface regressions, emergent skills, and alignment opportunities.
Architect and maintain distributed training and inference pipelines, ensuring scalability and cost efficiency.

Other

This is a customer-facing role at the intersection of cutting-edge RL/post-training methods, applied data, and agent systems.
Translate customer needs and insights from applied data into clear technical requirements that guide product and research priorities.
Work side-by-side with customers to deeply understand workflows, data sources, and bottlenecks.
Prototype agents, data pipelines, and eval harnesses tailored to real use cases, then hand off hardened systems to core teams.
Translate customer insights and evaluation results into roadmap and research direction.