Lead Software Engineer - SRE

Kontakt.io

Salary not specified
Aug 25, 2025
Gisborne, New Zealand / Aotearoa • New York, NY, US • Boston, MA, US

Kontakt.io is building a platform for care operations that aims to reduce waste, cut costs, and improve revenue by increasing throughput, asset utilization, and staff productivity. The platform uses AI, RTLS (real-time location system), and EHR (electronic health record) data to automate workflows and orchestrate care delivery. The company is looking for a Lead Software Engineer - SRE to ensure the reliability, scalability, and performance of this platform, which is critical to healthcare operations.

Requirements

  • 5+ years of experience in Site Reliability Engineering, Cloud Infrastructure, or Platform Engineering.
  • 5+ years of software engineering experience building production-grade systems (Java, Python, Go, or similar).
  • Deep expertise in cloud platforms (especially AWS), Kubernetes, and distributed system architecture.
  • Hands-on experience with monitoring, logging, and observability tools (Prometheus, OpenTelemetry, Datadog, etc.).
  • Extensive knowledge of CI/CD automation, GitOps workflows, and infrastructure-as-code (Terraform, Helm, ArgoCD).
  • A track record of leading major incident response and running postmortems with a blameless, learning-focused approach.
  • Strong understanding of networking, access control, and security within regulated environments (HIPAA, SOC 2).

Responsibilities

  • Lead the design and implementation of scalable, fault-tolerant, and self-healing infrastructure and services across AWS and Kubernetes.
  • Collaborate with Product, Engineering, and Infrastructure teams to align SRE initiatives with business priorities and platform needs.
  • Define and drive adoption of SLIs, SLOs, and SLAs to ensure consistent performance and high reliability across the platform.
  • Own and evolve observability strategies using Prometheus, OpenTelemetry, Grafana, and related tooling.
  • Design and maintain infrastructure as code (Terraform) and drive GitOps best practices.
  • Oversee major incident response and on-call practices, including incident reviews and long-term remediation planning.
  • Contribute to the long-term reliability roadmap and architecture of high-throughput, real-time systems in healthcare operations.

Other

  • Mentor and support the growth of SRE and platform engineers, fostering a culture of engineering rigor and operational excellence.
  • A leadership mindset: able to drive cross-functional alignment, lead initiatives, and mentor a high-performance SRE team.
  • Help scale the platform that care operations run on.