Solutions Engineer - Software - Site Reliability Engineer

Liberty Mutual

Salary not specified
Sep 9, 2025
Boston, MA, USA • Indianapolis, IN, USA • Plano, TX, USA • Portsmouth, NH, USA • Columbus, OH, USA

Liberty Mutual is undergoing a transformational shift, operating as a tech startup within a Fortune 100 company and leading a digital disruption that will redefine how people experience insurance, starting with the way we design, build, and operate reliable software at scale.

Requirements

  • Deep knowledge of containerization (Docker, Kubernetes/EKS), service mesh (Istio, Linkerd), and microservice architectures.
  • Practical experience with observability stacks (Datadog, Splunk).
  • Proficiency in at least one programming language (Python, Go, Java, TypeScript, or similar).
  • Familiarity with CI/CD systems (GitHub Actions, Azure DevOps, Jenkins) and release strategies (blue/green, canary, feature flags).
  • Hands-on exposure to chaos-engineering and resilience testing tools (Gremlin, Chaos Mesh) and load/performance tools (k6, JMeter, LoadRunner).
  • Experience with incident management platforms (ServiceNow) and running blameless post-mortems.
  • Relevant certifications (AWS DevOps, Kubernetes, Observability platforms) are a plus.

Responsibilities

  • Lead the end-to-end delivery of reliability solutions that meet customer needs while aligning with technology guardrails and strategic roadmaps.
  • Define and implement SLOs, SLIs, and error-budget policies; integrate them with CI/CD pipelines and automated quality gates.
  • Design and build cloud-native reliability tooling—auto-scaling, self-healing, blue/green and canary release frameworks—leveraging AWS services (EKS, Lambda, Fargate, Auto Scaling, Route 53, CloudWatch).
  • Implement and extend observability platforms (metrics, logs, traces, events) using Datadog, Splunk, and AWS-native services.
  • Drive Gen-AI/ML experimentation for anomaly detection, predictive scaling, and automated incident triage; transition validated prototypes into production platforms.
  • Champion infrastructure-as-code (Terraform, CloudFormation, CDK) and GitOps workflows to ensure repeatable, auditable changes.
  • Embed chaos engineering and resilience testing (Gremlin, Litmus, Chaos Mesh, Fault Injection Simulator) into release pipelines.

Other

  • This position follows a hybrid work model (2 days onsite) and is open to candidates located in Portsmouth, NH; Boston, MA; Plano, TX; Indianapolis, IN; and Columbus, OH.
  • Mentor and coach engineers, fostering a culture of reliability, automation, and customer-centric thinking.
  • Strategic Partner – able to connect the dots between business outcomes, customer experience, and technical architecture.
  • Change Agent – skilled at leading by influence, facilitating consensus across dev, ops, product, SRE, and leadership stakeholders.
  • Strong communication, facilitation, consensus-building, and stakeholder-management skills.