Senior Site Reliability Engineer - Observability

Rivian

$146,900 - $194,610
Sep 22, 2025
Palo Alto, CA, USA

Rivian and Volkswagen Group Technologies is looking to solve challenges in automotive's next chapter by developing technology for software-defined vehicles, with a specific focus on operating systems, zonal controllers, and cloud/connectivity solutions. The Senior SRE role is crucial for ensuring the health, performance, and reliability of its production environment through robust observability systems.

Requirements

  • Proficiency in designing and operating observability platforms with tools like Prometheus, Grafana, Loki, Jaeger, or Datadog.
  • Experience with OpenTelemetry and distributed tracing in microservices architectures.
  • Deep knowledge of Kubernetes (e.g., EKS), ArgoCD, and Crossplane.
  • Strong proficiency in Python, Go, or similar languages for building automation and custom telemetry solutions.
  • Familiarity with multi-cloud setups, containerization (Docker), and Linux system fundamentals.

Responsibilities

  • Observability Platform Design: Architect, implement, and maintain observability systems, leveraging tools like Datadog, the LGTM stack (Loki, Grafana, Tempo, Mimir), OpenTelemetry, and Vector to enable real-time performance monitoring, logging, and alerting.
  • Telemetry Optimization: Evolve and scale telemetry pipelines to ensure low latency and high availability for metrics, logs, and traces across multi-cloud environments.
  • Performance Engineering: Proactively identify performance bottlenecks, optimize systems, and provide recommendations for reliability improvements.
  • Scalable Automation: Implement automation solutions to scale systems sustainably while driving improvements in reliability and deployment velocity.
  • Incident Management: Collaborate with the incident response team to establish data-driven debugging and troubleshooting processes using observability data.
  • Tooling Development: Create and maintain self-service observability tools and dashboards to empower teams across the organization.
  • Cross-functional Collaboration: Partner with development, DevOps, and infrastructure teams to define SLOs/SLIs and ensure observability is embedded throughout the software lifecycle.

Other

  • 5+ years in Site Reliability Engineering or a related role with a strong emphasis on observability.
  • Exceptional problem-solving and communication skills, and a data-driven approach to decision-making.
  • Equal Opportunity Employer statement
  • Commitment to ensuring hiring process accessibility for persons with disabilities.
  • Candidate Data Privacy statement regarding collection, use, and disclosure of personal information.