Roku is looking to improve the performance of its advertising ecosystem by developing a state-of-the-art inference platform that delivers low latency, high throughput, scalability, and availability, with optimizations across hardware, software, and models.
Requirements
- Strong programming skills in high-performance languages
- Deep understanding of inference frameworks and ML system deployment
- Proven experience optimizing performance for large-scale machine learning systems, including deep knowledge of SOTA model optimizations, hardware-software co-design, GPU acceleration, and HPC techniques
- Experience leading teams working on high-throughput, low-latency ML serving systems
- Experience collaborating with and leading global, cross-functional teams
- Contributions to open-source ML or systems projects
Responsibilities
- Lead the design and development of a SOTA inference platform
- Oversee the development of monitoring, observability, and other tooling to ensure system and model performance, reliability, and scalability of online inference services
- Identify and resolve system inefficiencies, performance bottlenecks, and reliability issues, ensuring optimized end-to-end performance
- Stay at the forefront of advancements in inference frameworks, ML hardware acceleration, and distributed systems, and incorporate innovations where they are impactful
Other
- M.S. or above in CS, ECE, or a related field
- 10+ years of experience in developing and deploying large-scale, distributed systems, with at least 5 years in a leadership or technical lead role
- Excellent communication and collaboration skills
- Ability to work in a fast-paced environment with a focus on company success
- Must be able to work in the office Monday through Thursday, with Fridays flexible for remote work