Member of Technical Staff - Reinforcement Learning (Infrastructure), AGI Autonomy

Amazon.com

$255,000 - $345,000

Aug 21, 2025

San Francisco, CA, US

The Amazon AGI SF Lab is focused on developing new foundational capabilities for enabling useful AI agents that can take actions in the digital and physical worlds.

Requirements

Experience programming in Java, C++, Python, or a related language
Experience with deep neural network methods and machine learning
Experience debugging ML systems
PhD in Computer Science, Machine Learning, or a related field, with a focus on ML systems.
Demonstrated experience in developing, implementing, and debugging large-scale ML systems.
Experience with distributed systems, Megatron, vLLM, Ray, and working with GPUs.

Responsibilities

Develop cutting-edge training infrastructure so that large-scale reinforcement learning on LLMs runs efficiently and robustly.
Work across the entire technology stack, including low-level ML systems, job orchestration, and data management.
Analyze, troubleshoot, and profile complex ML systems to identify and address performance bottlenecks.
Work closely with researchers and conduct MLSys research to create new techniques, infrastructure, and tooling around emerging research capabilities.

Other

PhD, or Master's degree and 3+ years of applied research experience
Work safely and cooperatively with other employees, supervisors, and staff;
Adhere to standards of excellence despite stressful conditions;
Communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service;
Follow all federal, state, and local laws and Company policies.