The company is hiring a Senior DevOps Engineer to improve the scalability, reliability, and performance of its large-scale cloud platform.
Requirements
- Cloud Platforms – Strong expertise in Google Cloud Platform (GCP, preferred) or AWS
- Infrastructure as Code – Deep knowledge of Terraform, Ansible, or equivalent tools
- Programming & Scripting – Proficiency in Python, Go, or Bash
- Containerization & Kubernetes – Experience managing large-scale Kubernetes environments and deploying workloads on GKE, EKS, or similar managed Kubernetes services
- CI/CD & Automation – Hands-on experience with CI/CD tools such as Jenkins, GitLab CI, or Argo CD
- Linux Administration – Strong knowledge of Linux systems and shell scripting
- Networking & Security – Understanding of cloud networking, IAM, firewalls, and security best practices
Responsibilities
- Cloud Infrastructure – Architect, deploy, and manage cloud-native solutions in Google Cloud Platform (GCP) to ensure scalability, reliability, and performance
- Automation & Infrastructure as Code – Develop infrastructure as code with Terraform, automate configuration management with tools such as Ansible, and drive adoption of GitOps practices
- CI/CD Pipelines – Build, maintain, and optimize CI/CD pipelines using Jenkins, GitLab CI, or similar tools to enable rapid, reliable software delivery
- Containerization & Orchestration – Deploy and manage containerized workloads using Kubernetes (GKE preferred) and Docker, ensuring resilience and security
- Monitoring & Performance Optimization – Implement monitoring, logging, and alerting solutions to proactively identify and resolve performance issues
- Security & Compliance – Ensure security best practices are followed across infrastructure, networking, and application deployment
- Collaboration – Work closely with development teams to integrate DevOps practices, troubleshoot production issues, and improve operational efficiency
Additional Requirements
- 5+ years of experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles
- US Citizen or Permanent Resident
- Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment
- Willingness to participate in an on-call rotation to maintain platform reliability and resolve production incidents