The company is focused on ensuring AI systems, specifically large language models (LLMs) and multimodal systems, are safe, aligned, and robust against misuse.
Requirements
- 5+ years of experience in machine learning or AI systems, with 2+ years in a technical leadership capacity
- Experience integrating safety interventions into ML deployment workflows (e.g., inference servers, filtering layers)
- Solid understanding of transformer-based models and hands-on experience with LLM safety, robustness, or interpretability
- Strong background in evaluating model behavior, especially in adversarial or edge-case scenarios
Responsibilities
- Lead the development of model-level safety defenses to mitigate jailbreaks, prompt injection, and other forms of unsafe or non-compliant outputs
- Design and develop evaluation pipelines to detect edge cases, regressions, and emerging vulnerabilities in LLM behavior
- Contribute to the design and execution of adversarial testing and red teaming workflows to identify model safety gaps
- Support fine-tuning workflows, pre/post-processing logic, and filtering techniques to enforce safety across deployed models
- Work with red teamers and researchers to turn emerging threats into testable evaluation cases and measurable risk indicators
- Stay current on LLM safety research, jailbreak tactics, and adversarial prompting trends, and help translate them into practical defenses for real-world products
Other
- Bachelor’s, Master’s, or PhD in Computer Science, Machine Learning, or a related field
- Strong communication skills and the ability to drive alignment across diverse teams