


Senior Site Reliability Engineer - Site Reliability Engineering

LPL Financial Holdings

$92,288 - $153,813
Sep 20, 2025
Austin, TX, USA • Fort Mill, SC, USA • San Diego, CA, USA

LPL Financial is seeking a Senior Site Reliability Engineer to drive the traceability and performance of business-critical transactions across multiple systems, ensuring system resilience and enhancing the advisor experience.

Requirements

  • 5+ years in observability, SRE, or related roles with a focus on transaction monitoring and tracing
  • Hands-on experience with tools like Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, Jaeger, or equivalent
  • Expertise in monitoring critical transactions in cloud environments (AWS, Azure, or Google Cloud)
  • Strong understanding of microservices architecture, APIs, and distributed systems
  • Proficiency in scripting or programming languages (e.g., Python, Go, Java) for automation and integration
  • Certifications: Dynatrace Associate or Professional Certification
  • Experience with OpenTelemetry and other observability standards

Responsibilities

  • End-to-End Observability: Design and implement observability frameworks for end-to-end transaction traceability across microservices, APIs, databases, and third-party integrations. Use tools such as Dynatrace, OpenTelemetry, ELK, and Grafana to trace transactions and visualize dependencies. Build actionable dashboards and alerts to provide real-time insights into transaction health and performance.
  • Performance Optimization: Monitor transaction latency, throughput, and error rates to identify bottlenecks and optimize performance. Use distributed tracing and telemetry data to analyze and resolve issues impacting transaction flows. Work with application and database teams to fine-tune configurations for better transaction efficiency.
  • Collaboration & Governance: Partner with application teams, architects, and business stakeholders to define transaction observability and resiliency requirements. Develop and enforce standards for transaction monitoring and tracing across teams and environments. Provide training and guidance to teams on implementing best practices for observability and resiliency.
  • Critical Transaction Resiliency: Identify and prioritize business-critical transaction flows across distributed systems. Develop strategies to ensure high availability and resilience for critical transactions. Implement failover mechanisms, redundancy strategies, and fault-tolerant designs for transaction paths. Collaborate with Site Reliability Engineering (SRE) and DevOps teams to conduct chaos engineering exercises to test resiliency.
  • Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical transaction paths.
  • Documentation & Reporting: Maintain comprehensive documentation of transaction flows, dependencies, and observability configurations. Provide regular reports on transaction health, performance trends, and resiliency improvements to leadership. Develop playbooks for handling transaction-related incidents and outages.
  • Achieve a 30% reduction in MTTD and MTTR within the first year of operation, demonstrating the effectiveness of the SRE capabilities, observability, and self-healing.

Other

  • Strong collaborator who can deliver a world-class client experience
  • Ability to thrive in a fast-paced environment
  • Client-focused and team-oriented
  • Ability to execute in a way that encourages creativity and continuous improvement
  • Bachelor's degree or equivalent experience