An AI research organization is seeking a machine learning engineer to design, evaluate, and curate machine learning tasks, datasets, and evaluation workflows that support the training and benchmarking of advanced AI models, specifically large language models (LLMs).
Requirements
- Minimum of 2 years of applied experience in machine learning.
- Strong proficiency in Python and modern ML frameworks (PyTorch or TensorFlow).
- Solid understanding of ML fundamentals, model evaluation, and optimization.
- Experience with ML pipelines, experiment tracking, and cloud environments.
- Experience creating ML benchmarks, evaluations, or challenge problems.
- Background in generative models, LLMs, or multimodal learning.
- Familiarity with MLOps tools (e.g., MLflow, Weights & Biases, Docker).
Responsibilities
- Design and frame machine learning tasks to evaluate and improve LLM capabilities.
- Build, train, and evaluate ML models across NLP, classification, prediction, and generative tasks.
- Conduct experimentation, performance analysis, and iterative improvement.
- Perform feature engineering, data preprocessing, and robustness testing.
- Implement evaluation metrics, benchmarking workflows, and bias analyses.
- Fine-tune and evaluate transformer-based models where applicable.
- Maintain clear documentation of datasets, experiments, and modeling decisions.
Other
- Technical degree in Computer Science, Engineering, Statistics, Mathematics, or a related field.
- Professional working proficiency in written and spoken English.
- Fully remote and asynchronous collaboration.
- Hourly contract engagement, approximately 30–40 hours per week, with flexible scheduling.