Roku's Big Data team is hiring a skilled engineer with exceptional DevOps skills to scale and support its large data lake, which stores over 70 petabytes of data and serves over 10 million queries per month.
Requirements
- Experience with cloud infrastructure such as Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure, or other public cloud platforms; GCP is preferred
- Experience with at least three of the following technologies/tools: Hadoop, Kafka, Spark, Airflow, Presto, Druid, OpenSearch, HAProxy, or Hive
- Experience with Kubernetes and Docker
- Experience with Terraform
- Strong background in Linux/Unix
- Experience with system engineering around edge cases, failure modes, and disaster recovery
- Experience with shell scripting or equivalent programming skills in Python (see the example sketch after this list)
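For candidates wondering what "equivalent programming skills in Python" looks like in day-to-day operations work, here is a minimal, hypothetical sketch of routine operational scripting: a disk-usage check with logging. The volume paths and the 85% threshold are illustrative assumptions, not part of the role description or Roku's tooling.

```python
"""Minimal, hypothetical example of operational Python scripting:
warn when disk usage on key data volumes crosses a threshold.
Paths and the 85% threshold are illustrative assumptions only."""
import logging
import shutil

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

# Volumes to watch -- placeholder paths, adjust for the actual hosts.
VOLUMES = ["/", "/data"]
THRESHOLD_PCT = 85  # warn above this usage percentage (assumed value)


def check_volume(path: str) -> None:
    """Log a warning if the volume at `path` is above THRESHOLD_PCT used."""
    usage = shutil.disk_usage(path)
    used_pct = usage.used / usage.total * 100
    if used_pct >= THRESHOLD_PCT:
        logging.warning("%s is %.1f%% full (threshold %d%%)", path, used_pct, THRESHOLD_PCT)
    else:
        logging.info("%s is %.1f%% full", path, used_pct)


if __name__ == "__main__":
    for volume in VOLUMES:
        check_volume(volume)
```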
Responsibilities
- Develop best practices for cloud infrastructure provisioning and disaster recovery, and guide developers in adopting them
- Scale Big Data and distributed systems
- Collaborate on system architecture with developers for optimal scaling, resource utilization, fault tolerance, reliability, and availability
- Conduct low-level systems debugging, performance measurement, and optimization on large production clusters and low-latency services
- Create scripts and automation that react quickly to infrastructure issues and take corrective action (a rough sketch of this pattern follows this list)
- Participate in architecture discussions, influence product roadmap, and take ownership and responsibility over new projects
- Collaborate and communicate with a geographically distributed team
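As a rough illustration of the "scripts and automation that react to infrastructure issues" responsibility above, the sketch below polls a service health endpoint and triggers a Kubernetes rollout restart after repeated failures. The endpoint URL, deployment name, namespace, and thresholds are hypothetical placeholders; this is not a description of Roku's actual tooling.

```python
"""Hypothetical remediation loop: poll a service health endpoint and
restart its Kubernetes deployment after repeated failures.
The URL, deployment name, namespace, and thresholds are assumptions."""
import subprocess
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://presto-coordinator.example.internal:8080/v1/info"  # placeholder URL
DEPLOYMENT = "presto-coordinator"  # placeholder deployment name
NAMESPACE = "bigdata"              # placeholder namespace
MAX_FAILURES = 3                   # restart after this many consecutive failed checks
POLL_SECONDS = 30                  # seconds between health checks


def healthy() -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def restart_deployment() -> None:
    """Trigger a rolling restart, equivalent to:
    kubectl rollout restart deployment/<name> -n <namespace>"""
    subprocess.run(
        ["kubectl", "rollout", "restart", f"deployment/{DEPLOYMENT}", "-n", NAMESPACE],
        check=True,
    )


def main() -> None:
    failures = 0
    while True:
        if healthy():
            failures = 0
        else:
            failures += 1
            if failures >= MAX_FAILURES:
                restart_deployment()
                failures = 0
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```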
Other
- Bachelor’s degree, or equivalent work experience
- 8+ years of experience in DevOps or Site Reliability Engineering
- Ability to work in a fast-paced environment
- Strong communication and collaboration skills
- Ability to take ownership and responsibility over new projects