CrowdStrike is looking for an SRE Manager to lead a team of engineers in a production environment with tens of thousands of bare-metal and virtual compute nodes. The role covers day-to-day operations, infrastructure management, and operational excellence, including live-site response, monitoring, provisioning, and systems administration.
Requirements
- 7+ years of engineering experience, preferably in a production role
- 2+ years of hands-on management experience leading engineering teams
- Experience leading teams working with one or more of the following technologies: Linux, VMware, FreeBSD, storage area networks (SANs)
- Proficiency in hybrid/on-prem cloud environments
- Deep understanding of distributed systems and reliability engineering principles
- Solid design and problem-solving skills, with a demonstrated passion for engineering excellence, quality, security, and performance
Responsibilities
- Manage and mentor a team of site reliability engineers
- Oversee 24/7 monitoring and incident response for production systems
- Drive SLI/SLO definition and monitoring across services
- Lead post-incident reviews and implement preventive measures
- Ensure compliance with security and regulatory requirements
- Develop and execute reliability roadmaps aligned with business objectives
- Lead capacity planning and define infrastructure scaling strategies
Other
- Bachelor's degree in Computer Science or a related field, or equivalent work experience
- Demonstrated success in working across organizational boundaries to drive complex technical initiatives
- Strong cross-group collaboration and interpersonal communication skills, including working with a variety of roles such as engineering, product management, and project management
- Experience leading distributed teams in a remote-first environment
- Strategic thinking and ability to translate business objectives into technical roadmaps