AI Data Engineer

Veeva Systems

$85,000 - $225,000

Dec 12, 2025

Boston, MA, US

Veeva Systems is hiring an AI Data Engineer to ensure the reliability, accuracy, and safety of its Veeva AI Agents through rigorous evaluation and systematic validation.

Requirements

Data Integrity & Validation: A strong, specialized understanding of data quality principles, including methods for validating datasets against bias, integrity concerns, and quality standards.
Prompt Engineering & Model Expertise: Demonstrated skill in advanced prompt engineering techniques to create evaluation scenarios that test the AI's reasoning, action planning, and adherence to system instructions.
Automated Evaluation Implementation: Proficiency in designing and deploying automated evaluation pipelines to assess complex, agentic AI behaviors.
Debugging Agentic Systems: Comfort with the specific challenges of debugging agentic systems, including tracing and interpreting an agent's internal reasoning, tool use, and action sequences to pinpoint failure points.
Programming & Frameworks: Proficiency in Python for developing custom evaluation frameworks, writing scripts, and integrating pipelines with CI/CD systems.
Familiarity with standard test automation tools (e.g., Pytest, modern web automation tools).

Responsibilities

Evaluation Strategy & Planning: Define and establish comprehensive evaluation strategies for new AI Agents.
LLM Output Integrity Assessment: Programmatically and manually evaluate the quality of LLM-generated content against predefined metrics.
Creating High-Fidelity Datasets: Design, curate, and generate diverse, high-quality test datasets, including challenging prompts and scenarios.
Automation of Evaluation Pipelines: Develop, implement, and maintain scalable automated evaluations to ensure efficient, continuous validation of agent behavior.
Root Cause Analysis: Understand model behaviors and assist in tracing and root-cause analysis of identified defects or performance degradations.
Reporting & Performance Metrics: Clearly document, track, and communicate performance metrics, validation results, and bug status to the broader development and product teams.

Other

Bachelor's degree in Data Science, Machine Learning, Computer Science, or a related field, with experience in Gen AI / LLMs.
High work ethic.
High integrity and honesty.
Applicants must have the unrestricted right to work in the United States or Canada.
Flexible PTO and company-paid holidays.