Arista is looking for an SRE to manage and operate its global CloudVision service fleet, ensuring its reliability, scalability, and security.
Requirements
- Experience developing or managing deployments of distributed database systems or scale-out applications for a SaaS environment.
- CloudVision is deployed on Kubernetes across global regions using Spinnaker for our CI/CD pipeline.
- Our tech stack runs on GKE, using HBase/Hadoop as the main distributed database and storage layer, Elasticsearch to power search, ClickHouse for fast real-time queries of flow data, our own Kafka-based distributed real-time stream processing layer for analytics, and TensorFlow for ML analysis.
- Our monitoring system is built on top of Prometheus, Grafana, Loki, and other OSS tools.
Responsibilities
- Building the CI/CD lifecycle for services, from inception and design to deployment and scaling
- Improving operational processes through automation
- Identifying key service indicators to be used in capacity planning
- Owning disaster recovery and management
- Driving infrastructure and cloud-based application security design
- Leading sustainable incident response and blameless postmortems
- Being an active member of our globally distributed on-call team
Other
- BS/MS degree in Computer Science or a related subject, or equivalent relevant experience.
- 5+ years of software engineering experience.