Google Cloud needs to evaluate the performance of future GPU platforms and optimize the efficiency of its GPU fleet for machine learning workloads.
Requirements
- 2 years of experience with software development in one or more programming languages, or 1 year of experience with an advanced degree.
- 2 years of experience with data structures or algorithms in either an academic or industry setting.
- 3 years of experience with LLMs/ML, algorithms and tools (e.g., TensorFlow/JAX), Artificial Intelligence (AI), deep learning, or natural language processing.
- 2 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture.
- Experience in developing and deploying AI/ML models and algorithms.
- Experience in Python and other languages (e.g., C++, Kotlin, Java).
- Understanding of Machine Learning, data analysis and developer tools.
Responsibilities
- Help Google Cloud thoroughly evaluate the performance of future GPU platforms with an opportunity to influence the GPU roadmap at Google.
- Engage with GPU vendors to perform a detailed benchmark of the latest GPU systems and improve the simulation accuracy for these new systems.
- Perform detailed roofline analysis on the latest production ML workloads/hardware to help identify opportunities/bottlenecks for optimization in the fleet.
- Conduct competitive analysis of various Machine Learning (ML) workloads/platforms to better understand and help Google leadership navigate the complex and ever-changing ML landscape.
- Evaluate current and future ML workloads/hardware using detailed benchmarking and simulation of ML systems.
- Guide decision making for the Cloud hardware teams and cross-functional optimization efforts to improve GPU fleet efficiency.
Other
- Bachelor's degree or equivalent practical experience.
- Master's degree or PhD in Computer Science or related technical fields.