Job Board

Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1000+ jobs and find postings that match your resume.

Sr. Machine Learning Engineer, Amazon General Intelligence (AGI)

Amazon

$151,300 - $261,500
Aug 31, 2025
Bellevue, WA, US

Amazon's Machine Learning training infrastructure (ML Infra) team is looking to design, implement, and optimize large-scale computing infrastructure to power cutting-edge AI and machine learning initiatives.

Requirements

  • 8+ years of professional software development experience in distributed systems with emphasis on ML infrastructure
  • 8+ years of current programming experience building ML infrastructure using languages such as Python, C++ or Rust
  • Hands-on experience with parallel computing platforms such as CUDA and OpenMP
  • Deep understanding of AI frameworks such as PyTorch, TensorFlow, and JAX, and their demands on underlying compute infrastructure, memory bandwidth, network interconnect, and storage as scale goes up
  • Knowledge of emerging AI hardware accelerators and architectures
  • Experience with containerization and orchestration technologies (Docker, Kubernetes)
  • Experience with cloud computing platforms (AWS, Azure, GCP) and their offerings

Responsibilities

  • Lead the definition, design, architecture quality, implementation, and delivery of the most advanced, most difficult, most cross-cutting, and/or most ambiguous challenges spanning across our ML infrastructure.
  • Align the teams in ML Infrastructure and related organizations to a coherent technical vision and deliver systems that fit well together.
  • Exert influence over multiple teams, increasing their productivity and effectiveness.
  • Serve as an authority on technical issues for both the technical and research communities, guiding difficult trade-off decisions and driving awareness of the impact and consequences of technical decisions on AI research and product development.
  • Demonstrate significant innovation, creativity, and judgement when solving challenging AI/ML infrastructure problems.
  • Actively mentor senior and Principal engineers, and scale yourself by developing and institutionalizing best practices in AI/ML infrastructure and distributed computing across the organization.

Other

  • 5+ years of non-internship professional software development experience
  • 5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Experience as a mentor or tech lead, or leading an engineering team
  • Bachelor's degree in computer science or equivalent
  • 5+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations