Waymo is establishing a new team called SCORPIO (SimEval Capacity Operations, Resource Planning, Infrastructure Optimization) to ensure the efficient and effective use of Waymo's large-scale simulation compute, storage, and network resources. The team will develop data-driven models, metrics, and processes to forecast demand, plan capacity, and optimize resource allocation, ultimately improving developer experience and maximizing return on infrastructure investments.
Requirements
- Strong expertise in statistical modeling, time series analysis, and forecasting techniques (e.g., ARIMA, Exponential Smoothing, regression models).
- Demonstrated ability to work with large-scale, complex datasets and experience with distributed computing environments.
- Proficiency in Python or R, including common data science libraries (e.g., pandas, NumPy, SciPy, scikit-learn).
- Expertise in SQL and experience with data warehousing solutions (e.g., BigQuery).
- Direct experience in CapEx Engineering, Cloud Services Capacity Planning (e.g., AWS, GCP, Azure), or managing resources for large-scale compute/HPC infrastructure.
- Familiarity with simulation workloads, performance analysis, and distributed systems.
- Experience building and deploying data pipelines and automation tools in a production environment.
Responsibilities
- Define the vision, strategy, and technical roadmap for data-driven capacity planning and resource optimization within Waymo's simulation environment.
- Lead the development and implementation of sophisticated forecasting models to predict demand for heterogeneous IT resources (CPU, GPU, Storage, Bandwidth, RAM) across various time horizons and simulation workflows.
- Design, build, and maintain robust capacity models, key metrics, and insightful dashboards to monitor resource utilization, identify current and future bottlenecks, and inform investment decisions.
- Develop and propose actionable strategies for resource optimization, cost management, and risk mitigation to senior leadership, finance, and engineering teams.
- Collaborate deeply with Simulation, Infrastructure, Finance, Product Management, and Engineering teams to understand demand drivers, usage patterns, system changes, and their impacts on resource needs.
- Spearhead the design and development of automated systems for demand management, quota allocation, and resource reassignment to enhance efficiency and responsiveness.
- Provide data-driven insights to influence the design of simulation products and user guidelines, promoting more efficient resource consumption patterns.
Other
- PhD or Master's degree in Data Science, Statistics, Operations Research, Computer Science, Industrial Engineering, or a related quantitative field.
- 10+ years of experience in data science or quantitative analysis, with a significant focus on capacity planning, resource optimization, demand forecasting, or a closely related area.
- 5+ years of experience in a technical leadership role, with a proven track record of defining strategy, setting technical direction, and leading complex projects.
- Exceptional communication and collaboration skills, with the ability to convey complex quantitative findings and recommendations clearly to diverse audiences, including executive leadership.
- Ability to build and mentor a high-performing team, potentially including data scientists, business analysts, and software engineers.