Digital Realty is hiring a Manager - Network Observability Platform and Automation to maintain and improve the reliability, performance, and availability of its systems and infrastructure.
Requirements
- Strong technical background in distributed systems, cloud computing, and related technologies.
- Proven experience in managing and mentoring technical teams.
- Experience with monitoring, automation, and incident management.
- Understanding of SLOs, SLIs, and SLAs.
- Familiarity with DevOps and Agile practices.
- Expertise in Layer 3 routing protocols (BGP, IS-IS, etc.) and Layer 2 switching protocols (802.1Q, STP, etc.).
- Experience with virtual networking technologies such as EVPN, VXLAN, and Open vSwitch.
Responsibilities
- Manage and mentor a team of SREs, fostering their growth and development.
- Oversee the design, implementation, and maintenance of reliable infrastructure and services.
- Collaborate with other teams to define requirements, standards, and best practices.
- Identify and address performance bottlenecks and ensure system stability.
- Implement and improve monitoring and observability frameworks.
- Manage on-call rotations and incident response to minimize downtime and ensure swift resolution.
- Drive automation efforts to reduce manual tasks and improve efficiency.
Other
- 10+ years of operations and engineering experience.
- 5+ years of team building and management.
- Bachelor’s degree in computer science (or equivalent training) preferred.
- Strong analytical and troubleshooting skills.
- Strong communication skills.