Veeva Systems is looking to ensure the reliability, accuracy, and safety of its Veeva AI Agents through rigorous evaluation and systematic validation methodologies.
Requirements
- Data Integrity & Validation: A strong, specialized understanding of data quality principles, including methods for validating datasets against bias, integrity concerns, and quality standards.
- Prompt Engineering & Model Expertise: Demonstrated skill in advanced prompt engineering techniques to create evaluation scenarios that test the AI's reasoning, action planning, and adherence to system instructions.
- Automated Evaluation Implementation: Proficiency in designing and deploying automated evaluation pipelines to assess complex, agentic AI behaviors.
- Debugging Agentic Systems: Comfort with the specific challenges of debugging agentic systems, including tracing and interpreting an agent's internal reasoning, tool use, and action sequences to pinpoint failure points.
- Programming & Frameworks: Proficiency in Python for developing custom evaluation frameworks, writing scripts, and integrating pipelines with CI/CD systems.
- Familiarity with standard test automation tools (e.g., Pytest, modern web automation tools).
Responsibilities
- Evaluation Strategy & Planning: Define and establish comprehensive evaluation strategies for new AI Agents.
- LLM Output Integrity Assessment: Programmatically and manually evaluate the quality of LLM-generated content against predefined metrics.
- Creating High-Fidelity Datasets: Design, curate, and generate diverse, high-quality test datasets, including challenging prompts and scenarios.
- Automation of Evaluation Pipelines: Develop, implement, and maintain scalable automated evaluations to ensure efficient, continuous validation of agent behavior.
- Root Cause Analysis: Understand model behaviors and assist in tracing and root-cause analysis of identified defects or performance degradations.
- Reporting & Performance Metrics: Clearly document, track, and communicate performance metrics, validation results, and bug status to the broader development and product teams.
Other
- Bachelor's degree in Data Science, Machine Learning, Computer Science, or a related field, with experience in Gen AI / LLMs.
- Strong work ethic.
- High integrity and honesty.
- Applicants must have the unrestricted right to work in the United States or Canada.
- Flexible PTO and company-paid holidays.