Tech Lead – AML Inference

ByteDance

Salary not specified

Aug 29, 2025

San Jose, CA, USA

ByteDance's Applied Machine Learning (AML) team is advancing the next-generation AI infrastructure and recommendation platform for ads ranking, search ranking, live streaming, and e-commerce. The Tech Lead, AML Inference, will oversee the development and execution of ByteDance's inference infrastructure, ensuring reliability, scalability, and performance across large-scale distributed systems.

Requirements

5+ years of experience in developing and deploying large-scale, distributed systems, with at least 5 years in a leadership or technical lead role.
Strong programming skills in languages such as C++, Python, or Go.
Deep understanding of inference frameworks and ML system deployment (e.g., TensorFlow, PyTorch, TensorRT, JAX, MXNet).
Proven experience optimizing performance for large-scale machine learning systems, including hardware-software co-design, GPU/RDMA acceleration, or HPC techniques.
Experience leading teams working on high-throughput, low-latency ML serving systems.
Contributions to open-source ML or systems projects.
Familiarity with container orchestration, service mesh, or cloud-native ML infrastructure.

Responsibilities

Lead and mentor a team of inference-focused Machine Learning Engineers, setting technical direction and ensuring best practices.
Drive the design and evolution of distributed inference infrastructure to support feeds, ads, search, and other core ranking models.
Oversee the development of monitoring, observability, and management tools to ensure reliability and scalability of online inference services.
Identify and resolve system inefficiencies, performance bottlenecks, and reliability issues, ensuring optimized end-to-end performance.
Partner with research and product teams to translate requirements into robust and efficient inference solutions.
Stay at the forefront of advancements in inference frameworks, ML hardware acceleration, and distributed systems, incorporating innovations where impactful.

Other

Excellent communication and collaboration skills; ability to work across research, engineering, and product teams.
Experience collaborating with and leading global, cross-functional teams across different time zones.