Principal Observability Architect

ServiceNow

Salary not specified
Sep 3, 2025
Orlando, FL, USA

The company is seeking a Principal Observability Architect to lead the strategic architecture, evolution, and operationalization of a modern, multi-tenant Observability Platform-as-a-Service (OPaaS) tailored for a hybrid on-prem and cloud-native SaaS product.

Requirements

  • Deep expertise in OpenTelemetry (including collector deployment, semantic conventions, sampling strategies).
  • Experience integrating observability in Kubernetes, microservices, and serverless ecosystems.
  • Hands-on experience with telemetry data pipelines using Cribl, Prometheus/VictoriaMetrics, and log/trace platforms.
  • Experience embedding telemetry validation in CI/CD workflows.
  • Familiarity with AI/ML for observability (anomaly detection, summarization, impact correlation).
  • Working knowledge of data privacy, retention, and compliance practices in observability.
  • Experience leveraging AI, or thinking critically about how to integrate it, in work processes, decision-making, or problem-solving.

Responsibilities

  • Lead architecture and roadmap for a multi-region, multi-cloud, multi-tenant observability platform scalable across diverse customer environments and service boundaries.
  • Architect near real-time telemetry ingestion pipelines with low-latency guarantees (on the order of seconds) using a mix of streaming and batch processing technologies.
  • Define observability blueprints including telemetry SLAs, data contracts, tenant data isolation, and cost-aware retention strategies for high-cardinality data.
  • Ensure observability systems are cloud-native and container-aware, supporting environments built on Kubernetes, service meshes, and serverless components.
  • Design and implement real-time metrics, logs, traces, and event pipelines with technologies such as VictoriaMetrics, Prometheus, Grafana, and Alertmanager; Cribl Stream and Edge for dynamic routing and filtering; and VictoriaLogs for structured log analysis.
  • Embed real-time anomaly detection and signal correlation, with context-aware alerting to reduce noise and MTTR.
  • Standardize OpenTelemetry instrumentation across all services with prebuilt SDKs, language libraries, and semantic conventions.

Other

  • 10+ years in DevOps, SRE, or Observability roles, including 5+ years in architecture or platform engineering.
  • Proven experience designing and operating near real-time observability systems in global-scale SaaS environments.
  • Lead cross-functional collaboration with SRE, Platform, Security, and Engineering teams to evolve observability maturity.
  • Define and document observability patterns, anti-patterns, and escalation workflows.
  • Drive internal R&D around OpenTelemetry, AI in observability, high-cardinality telemetry, and eBPF-based observability tooling.