Solera Health, whose platform provides a curated marketplace of digital and community solutions for chronic conditions, is hiring a Software Reliability Engineer to join a SWAT-style team dedicated to the rapid triage and resolution of production issues.
Requirements
- 2+ years of professional experience in software engineering, production support, or incident response.
- Strong proficiency in JavaScript/TypeScript, including debugging live applications and services.
- Experience tracing data through SQL and NoSQL databases.
- Familiarity with Azure or GCP environments.
- Proven ability to analyze and stabilize distributed or microservice-based systems.
- Excellent communication skills, including the ability to translate technical findings into clear, actionable explanations.
- Experience with event-driven architectures, message queues, or streaming systems.
Responsibilities
- Lead incident response efforts to rapidly diagnose and resolve production issues across distributed systems.
- Use observability tools such as Dynatrace and Azure Application Insights to pinpoint root causes and verify fixes.
- Partner with engineers across the stack to trace issues through APIs, microservices, and data layers in low-documentation environments.
- Write and execute targeted automated tests (Jest, Cypress, Playwright) to confirm resolutions and prevent regressions.
- Clearly communicate root causes and resolution plans to both technical and non-technical stakeholders.
- Partner with platform and DevOps teams to improve monitoring, alerting, and deployment practices.
- Contribute to stability and reliability improvements; the role does not include feature development or infrastructure provisioning.
Other
- U.S. work authorization required; visa sponsorship not available.
- Remote-first within the United States.
- Occasional travel for team meetings.