Develop reliable AI systems for the world's most important decisions at Scale
Requirements
- Deep technical background in applied AI/ML: 5–10+ years in research, engineering, solutions engineering, or technical product roles working on LLMs or multimodal systems, ideally in high-stakes, customer-facing environments
- Hands-on experience with model improvement workflows: demonstrated experience with post-training techniques, evaluation design, benchmarking, and model quality iteration
- Ability to work on hard, ambiguous technical problems: proven track record of partnering directly with advanced customers or research teams to scope, reason through, and execute on deep technical challenges involving frontier models
- Strong technical fluency: you can read papers, interrogate metrics, write or review complex Python/SQL for analysis, and reason about model-data trade-offs
- Executive presence with world-class researchers and enterprise leaders; excellent writing and storytelling
- Bias to action: you ship, learn, and iterate
- Experience with RLVR, benchmarks, and other evaluation frameworks
Responsibilities
- Translate research into product: work with client-side researchers on post-training, evals, and safety/alignment, and build the primitives, data, and tooling they need
- Partner deeply with core customers and frontier labs: work hands-on with leading AI teams and frontier research labs to tackle hard, open-ended technical problems related to frontier model improvement, performance, and deployment
- Shape and propose model improvement work: translate customer and research objectives into clear, technically rigorous proposals—scoping post-training, evaluation, and safety work into well-defined statements of work and execution plans
- Translate research into production impact: collaborate with customer-side researchers on post-training, evaluations, and alignment, and help design the data, primitives, and tooling required to improve frontier models in practice
- Own the end-to-end lifecycle: lead discovery, write crisp PRDs and technical specs, prioritize trade-offs, run experiments, ship initial solutions, and scale successful pilots into durable, repeatable offerings
- Lead complex, high-stakes engagements: independently run technical working sessions with senior customer stakeholders; define success metrics; surface risks early; and drive programs to measurable outcomes
- Build evaluation rigor at the frontier: design and stand up robust evaluation frameworks (e.g., RLVR, benchmarks), close the loop with data quality and feedback, and share learnings that elevate technical execution across accounts
Other
- Customer-obsessed: start from real research needs; prototype quickly; validate with data
- Cross-functional by default: align research, engineering, ops, and GTM on a single plan; communicate clearly up and down
- Field-forward: expect regular time with customers and research leads; light travel as needed
- Bachelor's, Master's, or Ph.D. degree in a relevant field
- Excellent communication and collaboration skills