Waymo is looking to establish a new team, SCORPIO, to ensure the efficient and effective use of its large-scale simulation compute, storage, and network resources. This involves developing data-driven models, metrics, and processes for demand forecasting, capacity planning, and resource optimization to improve developer experience and maximize return on infrastructure investments.
Requirements
- Strong expertise in statistical modeling, time series analysis, and forecasting techniques (e.g., ARIMA, Exponential Smoothing, regression models).
- Demonstrated ability to work with large-scale, complex datasets and experience with distributed computing environments.
- Proficiency in Python or R, including common data science libraries (e.g., pandas, NumPy, SciPy, scikit-learn).
- Expertise in SQL and experience with data warehousing solutions (e.g., BigQuery).
- Direct experience in CapEx Engineering, Cloud Services Capacity Planning (e.g., AWS, GCP, Azure), or managing resources for large-scale compute/HPC infrastructure.
- Familiarity with simulation workloads, performance analysis, and distributed systems.
- Experience building and deploying data pipelines and automation tools in a production environment.
Responsibilities
- Define the vision, strategy, and technical roadmap for data-driven capacity planning and resource optimization within Waymo's simulation environment.
- Lead the development and implementation of sophisticated forecasting models to predict demand for heterogeneous technical infrastructure (TI) resources (CPU, GPU, storage, bandwidth, RAM) across various time horizons and simulation workflows.
- Design, build, and maintain robust capacity models, key metrics, and insightful dashboards to monitor resource utilization, identify current and future bottlenecks, and inform investment decisions.
- Develop and propose actionable strategies for resource optimization, cost management, and risk mitigation to senior leadership, finance, and engineering teams.
- Collaborate deeply with Simulation, Infrastructure, Finance, Product Management, and Engineering teams to understand demand drivers, usage patterns, system changes, and their impacts on resource needs.
- Spearhead the design and development of automated systems for demand management, quota allocation, and resource reassignment to enhance efficiency and responsiveness.
- Provide data-driven insights to influence the design of simulation products and user guidelines, promoting more efficient resource consumption patterns.
Other
- 5+ years of experience in a technical leadership role, with a proven track record of defining strategy, setting technical direction, and leading complex projects.
- Exceptional communication and collaboration skills, with the ability to convey complex quantitative findings and recommendations clearly to diverse audiences, including executive leadership.
- Experience with financial modeling, cost-benefit analysis, and ROI calculations related to technical infrastructure.
- Experience hiring, growing, and nurturing a technical team.
- Ability to build and mentor a high-performing team, which may include data scientists, business analysts, and software engineers.