Site Reliability Engineer

MetroStar

Salary not specified
May 8, 2025
Washington, DC, USA

MetroStar is looking for a Site Reliability Engineer to design, implement, and manage highly available, scalable systems, applying industry best practices and reliability engineering principles.

Requirements

  • Strong experience with cloud technologies (e.g., AWS, Azure, GCP) and infrastructure as code (e.g., Terraform, Ansible).
  • Proficiency in managing, leading, and engineering incident and outage response.
  • Strong engineering experience in network protocols (e.g., TCP/IP, DNS, HTTP/HTTPS, load balancing).
  • Proficiency in programming and scripting languages (e.g., Python, Go, Bash) and RPA tools (e.g., Blue Prism, UiPath) to automate tasks and develop tools.
  • Deep understanding of containerization and orchestration technologies (e.g., Kubernetes, Docker).
  • Expertise in implementing and managing monitoring and logging solutions (e.g., Splunk, Prometheus, Grafana, ELK stack).
  • Familiarity with CI/CD pipeline development and management (e.g., GitLab CI, Azure DevOps, AWS Lambda, Jenkins).

Responsibilities

  • Collaborate with cross-functional teams to identify performance bottlenecks, troubleshoot complex issues, and optimize system performance to meet defined service level objectives.
  • Design and implement monitoring, alerting, and incident response strategies to proactively identify and mitigate potential issues, ensuring uninterrupted service availability.
  • Drive automation initiatives to streamline deployment, configuration management, and infrastructure provisioning processes.
  • Develop and maintain comprehensive documentation for system configurations, processes, and procedures.
  • Participate in on-call rotations and respond to incidents, working diligently to resolve issues and prevent recurrence.

Other

  • Active Secret U.S. Government security clearance or higher.
  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Minimum of 3 years of professional experience in a Site Reliability Engineering role or similar capacity.
  • Strong problem-solving skills, with the ability to diagnose complex issues and implement effective solutions.
  • Excellent communication skills, with the ability to collaborate effectively across diverse teams.