SRE Manager

o9 Solutions

$149,818 - $205,999
Oct 1, 2025
Dallas, TX, USA

o9 aims to transform decision-making through an AI-first approach: integrating siloed planning capabilities and capturing value leakage so that businesses can plan smarter and faster, improve operational efficiency, and reduce waste.

Requirements

  • Strong knowledge of cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes).
  • Expertise in observability tools (Prometheus, Grafana, Datadog, etc.) and incident management platforms.
  • Experience with infrastructure-as-code and configuration management tools (Terraform, Ansible, Helm, etc.).
  • Solid understanding of networking, security, Linux internals, and distributed systems.
  • Relevant cloud certifications (AWS, Azure, or GCP) strongly preferred.
  • Certified Kubernetes Administrator (CKA) certification is a plus.
  • Experience operating complex, cloud-native production systems at scale.

Responsibilities

  • Hire, mentor, and manage a globally distributed team of Site Reliability Engineers.
  • Own system uptime and SLA compliance across o9’s cloud-native production environment.
  • Drive root cause analysis and implement post-incident learning processes to improve system resilience.
  • Oversee the design and implementation of robust monitoring, alerting, and logging solutions.
  • Lead initiatives to improve infrastructure automation, deployment pipelines, and CI/CD practices.
  • Champion Infrastructure as Code (IaC) and GitOps best practices.
  • Manage capacity planning, scalability efforts, and performance tuning across services.

Other

  • Bachelor’s degree in Computer Science, Engineering, or a related field required; Master’s degree preferred.
  • 8+ years of experience in DevOps, SRE, or infrastructure roles, with 2+ years leading or managing technical teams.
  • Proven ability to lead technical teams through high-stakes, high-impact situations.
  • Strong communication skills with the ability to translate complex topics into clear stakeholder updates.
  • Strategic mindset with a bias for action and problem-solving.