10a Labs is seeking someone to ensure the robustness and performance of a mission-critical classification system by leading its technical development and red teaming strategy.
Requirements
- Background in data science, applied ML, or ML engineering, with proven experience in production-grade systems.
- Strong analytical toolkit (Python, SQL, Jupyter, scikit-learn, Pandas, etc.) and familiarity with modern ML tooling (e.g., PyTorch, Hugging Face, LangChain).
- Experience working with LLMs or embedding-based classification systems.
- Experience with safety evaluation, red teaming, or adversarial content testing for LLMs.
- Experience with trust & safety or risk-focused classification systems.
- Experience with annotation ops, feedback loops, or evaluation pipeline design.
- Experience with open-source model evaluation tools (Promptfoo, DeepEval, etc.).
Responsibilities
- Design and oversee the technical implementation of a robust red teaming project.
- Develop evaluation frameworks, performance metrics, and model validation strategies aligned with safety goals.
- Lead adversarial testing efforts (e.g., red teaming, evasion probes, jailbreak simulation).
- Work with researchers and domain experts to define labeling schemas and edge-case tests.
- Partner with ML and infrastructure engineers to ensure production readiness, observability, and performance targets.
- Communicate technical strategy and tradeoffs clearly across internal and client teams.
Other
- Has 3-5 years of experience in applied data science, ML product work, or security-focused AI, including technical leadership or staff-level ownership.
- Has designed and evaluated real-world ML systems with a focus on model behavior, error analysis, and continuous improvement.
- Can design red teaming workflows to surface model blind spots and failure modes.
- Operates effectively across ML, infra, and policy / strategy contexts.
- Thinks like a builder, analyst, and communicator all in one.