
Principal System Reliability Engineer

Wells Fargo

$159,000 - $305,000
Aug 29, 2025
Iselin, NJ, US • Charlotte, NC, USA • Chandler, AZ, USA

Wells Fargo is seeking a Principal Engineer to implement resiliency, observability, and operational automation for their applications, adopting a System Reliability Engineering practice for on-prem, hybrid, and native cloud environments. The role aims to improve production system availability and performance, reduce mean time to resolve incidents, and provide expert advice to leadership on technology strategy.

Requirements

  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 7+ years of experience leading observability and monitoring tooling
  • 7+ years in infrastructure (Windows and Linux) support
  • 5+ years proven success in toil reduction initiatives
  • 5+ years in cloud application management
  • Ability to troubleshoot the full application stack, operating system stack and middleware
  • Deep understanding of Java applications, including how to read a thread dump and use Java Flight Recorder

Responsibilities

  • Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
  • Translate advanced technology experience, an in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions.
  • Ensure high availability and performance of production systems through proactive monitoring and incident response.
  • Design and implement scalability, reliability, and observability strategies for cloud and on-premises environments.
  • Define SLIs (Service Level Indicators), SLOs (Service Level Objectives), and Error Budgets to improve system reliability.
  • Own and drive alarming, monitoring, toil reduction, and overall risk reduction in the Financial Hardship Operations, Consumer Lending Operations, and Unsecure Lending Operations space.

Other

  • This position offers a hybrid work schedule (requires the ability to work in office)
  • This position is not eligible for Visa sponsorship
  • Relocation assistance is not available for this position
  • Strong communication skills, with the ability to communicate at all levels of the organization
  • Ability to mentor the platform teams by training, documenting, certifying and building the team’s skill set