JPMorgan Chase is hiring a Senior Lead Site Reliability Engineer to help define the future of site reliability engineering within its Commercial and Investment Bank Technology division. The role addresses complex technical and business issues, improves system resilience, and drives operational efficiency.
Requirements
- Advanced knowledge in site reliability culture and principles with demonstrated ability to implement site reliability within an application or platform
- Advanced knowledge of and experience in observability, including white-box and black-box monitoring, service level objectives, alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Fluency in at least one programming language (e.g., Python, Java Spring Boot, .NET)
- Advanced knowledge of software applications and technical processes with considerable depth in one or more technical disciplines
- Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)
- Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)
- AWS Cloud experience across multiple areas
Responsibilities
- Creates high quality designs, roadmaps, and program charters that are delivered by you or the engineers under your guidance
- Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt
- Works toward becoming an expert on the applications and platforms in your remit while understanding their interdependencies and limitations
- Evolves and debugs critical components of applications and platforms
- Employs AI-driven solutions to streamline processes and enhance operational efficiency
- Utilizes data-driven analytics and AI technologies to automate detection, diagnosis, and resolution processes, elevate service levels, and drive continuous improvement
- Serves as the primary contact during major incidents, demonstrating the ability to swiftly identify and resolve issues to prevent financial losses
Other
- Assume a critical role in defining the future of a globally recognized firm and make a direct and significant impact in a space designed for top achievers in site reliability
- Hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them
- Lead and conduct resiliency design reviews, break up complex problems into digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers
- Act as a key resource for technologists seeking advice on technical and business-related issues
- Demonstrate site reliability principles and practices every day and champion the adoption of site reliability throughout your team