Arista Networks is looking to drive systems reliability and scalability to provide the best possible development experience for its 2,000+ person engineering team.
Requirements
- Knowledge of one or more of: Go, Python, JavaScript, or shell scripting
- Knowledge of Linux (or UNIX)
- Experience operating and managing software systems at scale
- Strong understanding of the fundamentals of storage and networking
- Comfortable with Ansible and GitOps
- Applied understanding of software engineering principles
- Strong problem solving and software troubleshooting skills
Responsibilities
- Keep production services healthy (status green) at all times
- Proactively monitor, respond to, and enhance alerts
- Build automated responses to the most common alerts, or work with the rest of the EngProd team to build them
- Create and maintain incident response runbooks in collaboration with the service development teams
- Debug and resolve issues impacting developer user experience and infrastructure stability
- Develop patterns to support system reliability and socialize them within the EngProd team
- Review and contribute to the specifications and implementations written by other team members
Other
- BS in Computer Science or Engineering with at least 5 years’ experience, MS in Computer Science or Engineering with at least 3 years’ experience, or equivalent work experience
- Ability to design a solution and implement features independently
- Ability to work in small teams
- Collaborate with other engineers to design, build, scale, and operate the systems
- Support our tools and infrastructure for Arista’s development team