Lead Site Reliability Engineer

JPMorgan Chase

Salary not specified
Sep 12, 2025
Plano, TX, USA • New York, NY, USA

JPMorgan Chase is looking for a Lead Site Reliability Engineer to help define the future of a globally recognized firm and to have a direct and significant impact in a role tailored for top achievers in site reliability.

Requirements

  • Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform
  • Fluency in at least one programming language and related technologies (e.g., Python, Java, Spring Boot, microservices)
  • Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines
  • Proficiency and experience in observability, including white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
  • Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)
  • Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)
  • Experience with troubleshooting common networking technologies and issues

Responsibilities

  • Leads initiatives to improve the reliability and stability of your team’s applications and platforms using data-driven analytics to improve service levels
  • Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers
  • Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise
  • Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses
  • Exhibits deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other SRE best practices, and implements them within an application or platform
  • Documents and shares knowledge within your organization via internal forums and communities of practice
  • Takes the lead on resiliency design reviews, breaks up complex problems into digestible work for other engineers, acts as a technical lead for medium to large-sized products, and provides advice and mentoring to other engineers

Other

  • Assumes a critical role in defining the future of a globally recognized firm and has a direct and significant effect in a realm tailored for top achievers in site reliability
  • Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team
  • Ability to expand and collaborate across different levels and stakeholder groups
  • Ability to identify and solve problems related to complex data structures and algorithms
  • Drive to self-educate and evaluate new technology, with the ability to teach, train, and coach team members on current technology trends