Grafton Sciences is building physical general intelligence. We are hiring a Senior ML Infrastructure / MLOps Engineer to build and operate the infrastructure that powers large-scale training, fine-tuning, RLHF/DPO pipelines, dataset governance, experiment tracking, and model deployment.
Requirements
- Strong experience with ML infrastructure, distributed training, experiment management, or production ML systems.
- Familiarity with containerization, orchestration, model runners, dataset governance, and evaluation pipelines.
- Ability to design reliable training and deployment workflows that support high-throughput experimentation.
Responsibilities
- Build and maintain scalable infrastructure for training, fine-tuning, RLHF/DPO workflows, and distributed experiments.
- Develop data pipelines, dataset versioning systems, experiment tracking tools, and reproducibility frameworks.
- Operate containerized inference and training environments, CI/CD for models, and evaluation automation.
- Design distributed training systems, containerized model runners, data versioning workflows, and reproducible evaluation pipelines that enable rapid iteration across LLMs, RL agents, and surrogate models.
Other
- Collaborate with LLM researchers, RL scientists, data engineering, and systems teams to support rapid iteration and robust model deployment.
- Work comfortably across ML, infrastructure, data systems, and engineering teams in a fast-paced research environment.
- Above all, we look for candidates who can demonstrate world-class excellence.