Job Board

Site Reliability Engineer

S&P Global

$90,000 - $122,000
Aug 30, 2025
Princeton, NJ, USA • New York, NY, USA

The IT Operations team at S&P Dow Jones Indices (S&P DJI) is responsible for the high availability of the production IT systems that underpin S&P DJI's index platforms and applications. This role focuses on designing, implementing, and managing end-to-end observability using Datadog and related tools to maintain and improve service availability, respond to incidents, and enhance support processes.

Requirements

  • Proven expertise in Datadog APM, DBM, logging, and infrastructure monitoring.
  • Strong programming skills in Java and Python.
  • Hands-on experience with AWS, including operational management of core services.
  • Experience with CI/CD pipelines and container orchestration technologies.
  • Familiarity with ITSM tools (ServiceNow, PagerDuty).
  • Understanding of observability best practices, log correlation, and distributed tracing.
  • Datadog certifications (APM, Logs, Fundamentals).

Responsibilities

  • Design, implement, and manage end-to-end observability using Datadog APM, DBM, log pipelines, synthetic monitoring, and AI-driven alerting.
  • Maintain production monitoring, respond to incidents, and lead root cause analysis using Datadog, Splunk, and ELK.
  • Enhance automation and testing frameworks using Java, Spring Boot, Selenium, Cucumber, Playwright, and Jenkins.
  • Operate AWS services including EC2, ECS, RDS, S3, DynamoDB, and Secrets Manager.
  • Contribute to CI/CD practices and containerization technologies.
  • Integrate monitoring with PagerDuty and ServiceNow for incident workflows.
  • Participate in post-incident reviews, disaster recovery testing, and SRE process improvements.

Other

  • 4 years of experience in SRE, DevOps, or platform engineering roles.
  • Bachelor's degree in Computer Science or a similar field of study.
  • Excellent troubleshooting, documentation, and communication skills.
  • Exposure to other monitoring tools like Splunk, Dynatrace, or ELK.
  • Knowledge of Agile/Scrum and globally distributed team collaboration.