Roblox is looking to build and improve its ML Platform to empower its community of developers and creators by providing tools and infrastructure for machine learning. The goal is to make ML accessible and efficient for internal developers, enabling them to train, evaluate, deploy, and operate models quickly and safely.
Requirements
- 5+ years of professional experience, including a wealth of system design experience to draw on when building a scalable, reliable ML platform for all of Roblox.
- Proficiency in API design and developer experience: gRPC/REST APIs, SDKs, CLIs, and simple UIs that developers love to use.
- Experience with the end‑to‑end ML model lifecycle, including model serving, training, model CI/CD, and GPU resource management, plus a track record of building ML platform features that are delightful to use.
- Hands‑on ML experience is a plus
- Experience with infrastructure‑as‑code
- Experience automating painful manual processes
- Experience building scalable, reliable ML platforms
Responsibilities
- Own the platform as a product and set direction end to end: define requirements, write RFDs, and ship APIs, SDKs, CLIs, and UIs that make ML@Roblox easy to adopt.
- Bootstrap and maintain core ML Platform components: Serving Layer, Model Registry, Pipeline Orchestrator, and Training/Inference control planes.
- Set technical strategy and oversee development of high-scale, reliable infrastructure systems, with clear SLOs for latency, availability, and cost.
- Design great developer experiences with paved‑road templates, golden paths, opinionated defaults, and clear docs to reduce time‑to‑first‑production.
- Instrument the platform to measure adoption, friction, reliability, and cost; use that data to prioritize the roadmap and validate outcomes.
- Partner across organizations (ML Engineering, Data Science, Infra/SRE, Security, Finance) to optimize performance, safety, and spend, especially for GPU‑intensive training and high‑QPS inference.
- Propose and implement new platform tooling to improve time to production for MLEs across the full ML lifecycle.
Other
- Treat the platform as a product
- Blend product thinking, developer experience, backend engineering, and infrastructure at scale
- A Code Machine: you love not only designing and communicating ideas but also actually shipping product.
- Obsess over user feedback and constantly drive towards getting platform features into customers' hands.
- Passionate about supporting ML engineers, understanding and meeting their needs, and translating those needs into clean, durable platform abstractions.