Site Reliability Engineer

Trellix

Salary not specified
Sep 12, 2025
San Jose, CA, USA

Skyhigh Security is looking for a Site Reliability Engineer to maintain a highly available production environment and improve the operational aspects of its systems, such as monitoring, alerting, incident response, and vendor interactions.

Requirements

  • System admin experience on Linux environments.
  • Experience with end-to-end monitoring setup for infrastructure and applications.
  • Experience with Prometheus, Grafana, ELK, OpenSearch, CloudWatch, PagerDuty, and other monitoring tools.
  • Solid experience with Cloud Technologies such as AWS and OCI.
  • Good experience with containerized workload tools such as Kubernetes.
  • Network knowledge (TCP/IP, UDP, DNS, load balancing) and prior network administration experience are required.
  • Experience with BGP, iBGP, NAT, TCP/IP, proxies, and cross-connects.

Responsibilities

  • Perform Incident Management and Change Management to maintain the continuous availability of all Cloud Infrastructure services.
  • Ensure all SRE and operating procedures are maintained and executed.
  • Maintain a 24x7 production environment with a high level of service availability, perform quality reviews, and manage operational issues.
  • Perform root cause analysis for major incidents and drive the process by involving required stakeholders.
  • Perform problem management by analyzing metrics, alarms, and dashboards to troubleshoot problem areas, and report issues to assist in performance tuning and fault finding.
  • Implement proactive monitoring, alerting, trend analysis, and self-healing solutions.
  • Explore new technologies, features, and tools to improve the platform, and automate operational tasks using Bash, Python, or another programming language.

Other

  • Bachelor’s degree in computer science, electrical engineering, or a related area, with 7+ years of SRE experience in a large enterprise organization.
  • Ability to work a flexible schedule in a 24x7 environment with rotational shifts.
  • Strong communication and analytical/problem-solving skills.
  • Systematic approach and the ability to drive problems to resolution.
  • Paid Time Off