Meta's growing AI/HPC infrastructure requires end-to-end AI product introductions and AI operations initiatives to solve challenging networking problems and drive innovative solutions.
Requirements
- Experience in AI/HPC product development and operations
- Demonstrated experience with the network communications stack for AI solutions
- Fundamental knowledge of the hardware components
- Understanding of the network communication stack and network hardware (NICs, optics, and switches)
- Experience developing and delivering AI cluster solutions for training and inference use cases
- Experience delivering tech programs or products from inception to delivery
- Experience operating autonomously across multiple teams, with demonstrated critical thinking and thought leadership
Responsibilities
- Lead technical program management of next-generation Artificial Intelligence/Machine Learning (AI/ML) platform(s) for Meta's Network Infrastructure in a matrix organization covering a range of areas (Data Center, Network, Hardware Systems, Infrastructure Engineering, Software Engineering, Capacity Management) and across multiple physical locations
- Collaborate with engineering and business owners to define program requirements, set priorities, and establish scope, including the roadmap and long-term strategy of the teams you partner with
- Manage cross-functional dependencies, risks, and changes effectively by optimizing scope, schedule, and resources accordingly
- Develop and own communication plans to effectively and proactively communicate program status, issues, and risks to stakeholders
- Partner with cross-functional teams to drive technical analysis, design, development, testing, implementation, and post-implementation phases
- Define and track key metrics, quality indicators, and performance indicators, and drive cross-functional execution of program deliverables
- Proactively identify and analyze complex, long-term, critical infrastructure problems with engineering leaders and stakeholders
Other
- Proven track record of communication, leadership, and program management
- B.S. in Computer Science or a related technical discipline, or equivalent experience
- 12+ years of software engineering, systems engineering, hardware engineering, or technical product/program management experience
- 8+ years of experience delivering network solutions/programs for data center applications
- Experience communicating and working with technical management teams to develop systems, solutions, and products