Roku is seeking an engineer with strong DevOps skills to join its Big Data team. The role involves automating and scaling Big Data and Analytics technology stacks on cloud infrastructure, building CI/CD pipelines, setting up monitoring and alerting for production infrastructure, and keeping technology stacks up to date.
Requirements
- 8+ years of experience in DevOps or Site Reliability Engineering
- Experience with public cloud platforms such as Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure; GCP preferred
- Experience with at least three of the following technologies/tools: Big Data / Hadoop, Kafka, Spark, Airflow, Presto, Druid, OpenSearch, HAProxy, or Hive
- Experience with Kubernetes and Docker
- Experience with Terraform
- Strong background in Linux/Unix
- Experience with system engineering around edge cases, failure modes, and disaster recovery
Responsibilities
- Develop best practices for cloud infrastructure provisioning and disaster recovery, and guide developers in adopting them
- Scale Big Data and distributed systems
- Collaborate on system architecture with developers for optimal scaling, resource utilization, fault tolerance, reliability, and availability
- Conduct low-level systems debugging, performance measurement & optimization on large production clusters and low-latency services
- Create scripts and automation that can react quickly to infrastructure issues and take corrective actions
- Participate in architecture discussions, influence product roadmap, and take ownership and responsibility over new projects
- Collaborate and communicate with a geographically distributed team
Other
- Bachelor’s degree, or equivalent work experience