Roku is looking to automate and scale its Big Data and Analytics technology stacks on cloud infrastructure, build CI/CD pipelines, set up monitoring and alerting for production infrastructure, and keep those stacks up to date.
Requirements
- Experience with cloud infrastructure such as Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure, or other public cloud platforms; GCP is preferred
- Experience with at least three of the following technologies/tools: Big Data/Hadoop, Kafka, Spark, Airflow, Presto, Druid, OpenSearch, HAProxy, or Hive (a brief Airflow sketch follows this list)
- Experience with Kubernetes and Docker
- Experience with Terraform
- Strong background in Linux/Unix
- Experience with system engineering around edge cases, failure modes, and disaster recovery
- Experience with shell scripting, or equivalent Python programming skills
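For illustration only, a minimal sketch of the kind of orchestration this stack implies, assuming Airflow 2.x; the DAG id, schedule, and commands are hypothetical placeholders, not Roku's actual pipelines:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily maintenance DAG; task ids and commands are placeholders.
with DAG(
    dag_id="analytics_stack_maintenance",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    report_hdfs = BashOperator(
        task_id="report_hdfs_capacity",
        bash_command="hdfs dfsadmin -report | head -n 20",
    )
    check_kafka = BashOperator(
        task_id="check_kafka_brokers",
        bash_command="kafka-broker-api-versions.sh --bootstrap-server localhost:9092 > /dev/null",
    )
    report_hdfs >> check_kafka
```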
Responsibilities
- Automating and scaling Big Data and Analytics technology stacks on Cloud infrastructure
- Building CI/CD pipelines
- Setting up monitoring and alerting for production infrastructure (see the sketch after this list)
- Keeping our technology stacks up to date
- Developing best practices for cloud infrastructure provisioning and disaster recovery, and guiding developers on adopting them
- Scaling Big Data and distributed systems
- Collaborating with developers on system architecture for optimal scaling, resource utilization, fault tolerance, reliability, and availability
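As one hedged example of the monitoring responsibility above, a sketch using the official kubernetes Python client to flag unhealthy pods; the namespace and alert text are assumptions, and a real setup would feed an alerting tool rather than print:

```python
from kubernetes import client, config

def unhealthy_pods(namespace: str = "analytics") -> list[str]:
    """Return names of pods in the namespace that are not Running or Succeeded."""
    config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace=namespace)
    return [
        pod.metadata.name
        for pod in pods.items
        if pod.status.phase not in ("Running", "Succeeded")
    ]

if __name__ == "__main__":
    for name in unhealthy_pods():
        print(f"ALERT: pod {name} is not healthy")
```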
Other
- Bachelor’s degree, or equivalent work experience
- 8+ years of experience in DevOps or Site Reliability Engineering
- Experience with monitoring and alerting tools such as Datadog and PagerDuty, and participating in on-call rotations (a minimal PagerDuty example follows this list)
- Experience with Chef, Puppet, or Ansible
- Experience with networking, network security, and data security
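To illustrate the alerting requirement above, a minimal sketch that triggers an incident through PagerDuty's public Events API v2; the routing key, summary, and source are placeholders, not a prescribed integration:

```python
import requests

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

def trigger_incident(routing_key: str, summary: str, source: str, severity: str = "critical") -> str:
    """Send a trigger event to the PagerDuty Events API v2 and return the dedup key."""
    event = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }
    response = requests.post(PAGERDUTY_EVENTS_URL, json=event, timeout=10)
    response.raise_for_status()
    return response.json().get("dedup_key", "")

# Example usage (placeholder routing key and hosts):
# trigger_incident("YOUR_ROUTING_KEY", "Presto coordinator unreachable", "presto-coordinator-1")
```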