Applied Research Intern

Labelbox

$35,000 - $45,000

Aug 13, 2025

San Francisco, CA, US

Labelbox develops critical infrastructure that powers breakthrough AI models at leading research labs and enterprises, and is building a team that can design, build, and productionize evaluation and post-training systems for frontier LLMs and multimodal models.

Requirements

A strong foundation in AI and machine learning, backed by a Ph.D. or Master’s degree in Computer Science, Machine Learning, AI, or a related field (in-progress degrees are acceptable for intern positions).
A deep understanding of frontier autoregressive and diffusion multimodal models, along with the human and synthetic data strategies needed to optimize them.
Passion for, and experience with, LLM evaluation and benchmarking.
Expertise in constructing, measuring, and refining training data quality.
The ability to bridge research and application by interpreting new findings and translating them into functional prototypes.
Proficiency in Python and experience with deep learning frameworks like PyTorch, JAX, or TensorFlow.
A track record of publishing in top-tier AI/ML conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL) and contributing to the broader research community.

Responsibilities

Build and own evaluation and benchmark suites for reasoning, code, agents, long-context, and vision-language models.
Create post-training datasets at scale: design preference/critique pipelines (human + synthetic), and target hard failures surfaced by evals.
Experiment with and prototype RLHF/RLAIF/RLVR/RM/DPO-style training loops to improve real-world task and agent performance.
Land research in product: ship improvements into Labelbox workflows, services, and customer-facing evaluation/quality features; quantify impact with customer and internal metrics.
Engage with customer research teams: run pilots, co-design benchmarks, and share practical findings through internal research reports, blog posts, talks, and published papers.
Design, build, and productionize evaluation and post-training systems for frontier LLMs and multimodal models.
Own continuous, high-quality evals and benchmarks (reasoning, code, agent/tool-use, long-context, vision-language, etc.).

Other

Exceptional communication and collaboration skills.
Ability to work in a hybrid model with 2 days per week in the office, combining collaboration and flexibility.
Ability to work in a fast-paced, high-intensity environment suited to ambitious individuals who thrive on ownership and quick decision-making.
Ability to exercise caution and suspend or discontinue communications when encountering suspicious emails or interactions.