AI Evaluation Engineer

Arizona State University

$95,000 - $105,000

Sep 10, 2025

Scottsdale, AZ, US

Arizona State University (ASU) is seeking an engineer to evaluate and optimize AI models, particularly large language models (LLMs), in order to improve the learner experience across its digital higher education portfolio and ensure that the models meet both technical and business objectives.

Requirements

At least 4 years of hands-on software development and/or AI solutioning experience.
At least 4 years of hands-on data modeling and predictive analytics experience.
Experience developing AI/ML solutions on platforms like AWS, GCP, Azure, OpenAI, Databricks, Snowflake, etc.
Experience developing applications that build AI APIs into smart agents.
Experience developing applications using multimodal AI models.
Experience designing, developing, and implementing generative AI models and algorithms, utilizing large language models (LLMs) for applications such as text generation, audio-to-text transcription, and qualitative data insights.
Experience with data manipulation and analysis using tools like Python and SQL.

Responsibilities

Gather and preprocess structured and unstructured datasets, ensuring data quality and suitability for AI model evaluation.
Utilize AI-driven tools to evaluate model outputs for factual accuracy, relevance, and completeness, especially for large language models (LLMs).
Assess model performance using standard metrics (e.g., accuracy, precision, recall, F1 score) and advanced evaluation techniques.
Conduct comparative analyses of multiple LLMs, algorithms, and prompts to select the best-performing model based on specific KPIs and business goals.
Detect and mitigate any biases in model predictions, ensuring fairness and reducing the risk of harmful outputs.
Develop strategies to identify and eliminate hallucinations and other unintended behaviors in the model.
Develop and implement continuous monitoring systems to track model performance in real time, detecting anomalies, degradation, or model drift.

Other

Must be able to reliably commute to Scottsdale, Arizona, three days a week.
Ability to communicate with cross-functional teams about AI topics such as LLMs, vector databases, and retrieval-augmented generation (RAG).
Demonstrated ability to communicate thoughtfully, apply problem-solving skills, and build positive working relationships with cross-functional teams.
Applicant must be eligible to work in the United States. EdPlus at ASU will not be a sponsor for this position.