Engineering Manager - Observability

Lambda

$297,000 - $495,000
Sep 19, 2025
Seattle, WA, US

Lambda is looking to build the world's best deep learning cloud and needs to ensure its reliability and performance for AI deployments. The Observability team is responsible for building and operating large-scale monitoring systems to keep these offerings reliable and instantly detect issues in high-performance AI clusters.

Requirements

  • Experience with a wide variety of modern open-source observability software.
  • Strong background in software engineering and the SDLC.
  • Extensive experience with site reliability engineering and ability to champion improved SRE practices.
  • Experience building a high-performance team through deliberate hiring, upskilling, performance management, and expectation setting.
  • Experience with Kubernetes and with designing scalable distributed systems.
  • Significant experience in environments that require the monitoring of bare-metal infrastructure is preferred.
  • Experience driving cross-functional engineering management initiatives (coordinating events, strategic planning, coordinating large projects).

Responsibilities

  • Hire, grow, lead, and mentor a team of high-performing observability engineers and SREs.
  • Work with the engineering team to drive strategy for Lambda's internal and customer observability solutions.
  • Improve observability of AI infrastructure and develop new monitoring solutions as new products are introduced.
  • Lead the team in the continued development of our existing metrics solutions based on the Prometheus and OpenTelemetry ecosystems.
  • Lead the team in delivering new logging and tracing solutions based on ClickHouse.
  • Participate in the design of solutions for bringing observability data to our customers.
  • Identify gaps in our observability posture and drive resolution.

Other

  • This position requires presence in our San Francisco or Seattle office location 4 days per week; Lambda’s designated work from home day is currently Tuesday.
  • 10+ years of experience in observability systems or platform engineering with at least 3 years in a management or lead role.
  • Demonstrated experience leading a team of engineers and SREs on complex, cross-functional projects in a fast-paced startup environment.
  • Strong project management skills, including leading planning, project execution, and delivery of team outcomes on schedule.
  • Foster a culture of technical excellence, collaboration, and customer service.