Sepal AI is working to ensure that AI agents achieve human-set goals safely, securely, and repeatably.
Requirements
- 3+ years writing production Python and automating infra (Docker, Kubernetes, Terraform, or similar)
- Experience with MCP-style eval tooling
- Experience with RLHF/LLM agent frameworks
- Experience with benchmark design
Responsibilities
- Build and maintain Python scripts and CI pipelines that run AI agents on complex tasks
- Extend our internal evaluation framework to capture success/failure signals and edge-case logs
- Collaborate with research teams to plug in new LLM agents, datasets, and scoring rubrics
Other
- Ability to translate vague research specs into clean, testable engineering artifacts
- Flexible work arrangements
- Competitive compensation
- Collaborate with a diverse team of world-class experts