Lambda is looking to build the world's best deep learning cloud and needs a senior technical leader to set the vision, strategy, and architecture for its storage infrastructure across cloud and hybrid environments. The goal is to deliver unmatched performance, durability, scalability, and cost efficiency for the most demanding AI workloads in the world.
Requirements
- Experience designing and delivering large-scale storage platforms, including block, file, and object architectures, for performance-critical workloads.
- Experience evaluating and selecting storage technologies through benchmarking of throughput, IOPS, latency, durability, and total cost of ownership.
- Experience architecting and managing storage solutions for petabyte- to exabyte-scale datasets, including intelligent tiering strategies.
- Experience defining lifecycle management, replication, and disaster recovery strategies to ensure data durability and high availability.
- Experience integrating storage services across hybrid and multi-cloud environments to deliver a unified, high-performance platform.
- Proven expertise in cloud-scale storage or infrastructure platforms.
- Experience with software-defined and cloud-native storage architectures.
Responsibilities
- Define and execute the long-term vision and strategic roadmap for Lambda’s storage platform across cloud and hybrid environments, ensuring it delivers uncompromising performance, scalability, durability, and cost efficiency for the world’s largest AI workloads.
- Lead the evaluation, selection, and seamless integration of advanced storage technologies — spanning block, file, and object architectures — using rigorous benchmarking to optimize IOPS, throughput, latency, and total cost of ownership.
- Translate complex infrastructure capabilities into clear product requirements, precise service-level objectives (SLOs), and measurable performance benchmarks that align with demanding AI and HPC use cases.
- Architect and implement intelligent data tiering strategies (hot, warm, cold) to maximize performance where it matters and drive significant cost savings at scale.
- Collaborate with infrastructure and operations leaders to forecast multi-year capacity growth, design for petabyte-to-exabyte scalability, and ensure consistent performance under peak workloads.
- Define and enforce lifecycle management, replication, and disaster recovery policies that guarantee data integrity, compliance, and near-zero downtime.
- Own the observability and optimization roadmap for the storage platform, deploying advanced telemetry, monitoring, and analytics to proactively detect and remediate bottlenecks before they impact customers.
Other
- Bachelor’s degree or foreign equivalent in Computer Science, Electrical Engineering, Computer Engineering, or a closely related technical field.
- Seven (7) years of progressive, post-baccalaureate experience in product management, including at least four (4) years focused specifically on cloud-scale storage or infrastructure platforms.
- Note: This position requires presence in our San Francisco or Seattle office 4 days per week; Lambda’s designated work-from-home day is currently Tuesday.
- Health, dental, and vision coverage for you and your dependents.
- 401(k) plan with 2% company match (USA employees).