Zscaler aims to solve the business and technical problem of securely connecting users, devices, and applications in any location, using AI and a cloud-first strategy. It also aims to disrupt the cybersecurity market through AI by processing billions of transactions and generating trillions of data points daily.
Requirements
- 10+ years of experience in Site Reliability Engineering, cloud infrastructure, and/or applications architecture, with a strong foundation in Kubernetes and Docker
- Proven programming expertise in Python, SQL, and distributed processing technologies such as Spark, BigQuery, or Apache Beam
- Hands-on experience building and maintaining CI/CD pipelines, using GitOps and infrastructure-as-code tools such as ArgoCD, Terraform, or similar
- Strong knowledge of cloud platforms (AWS preferred, GCP acceptable), including a certification or equivalent skills in cloud-native system management
- Working knowledge of AI/ML pipelines and frameworks (e.g., SkyPilot, mobile ML training) and experience with GPU-optimized cloud infrastructure
- Experience with SQL/NoSQL databases, ML automation platforms, and tools for the full production lifecycle of AI-based products
Responsibilities
- Architect, build, and maintain large-scale distributed systems to support end-to-end AI pipelines, including data collection, feature engineering, model training, evaluation, deployment, and real-time serving
- Act as the owner of Site Reliability Engineering (SRE) for AI-driven applications deployed on AWS, ensuring performance, availability, observability, and scalability
- Collaborate with the engineering team to design and implement CI/CD pipelines, infrastructure provisioning, scripting automation for deployment and customer-facing services, and robust monitoring frameworks for real-time statistics and performance tracking across production systems
- Drive innovation and best practices in integrating Kubernetes, ArgoCD, and similar tools into cloud environments, with a focus on AI/ML pipelines and GPU-based cloud infrastructure (e.g., SkyPilot)
- Serve as the group's FinOps expert and AWS admin, taking ownership of hosting cost optimization and all administrative aspects of the AWS account for ZAIRe
Other
- This position is hybrid, based in our New Jersey office three days a week. Exceptional remote candidates will also be considered.
- Bachelor's degree in Computer Science, Engineering, or a related field
- Advanced degree (Master's or Ph.D.) in Computer Science, Machine Learning, or a related field, with a demonstrated ability to lead projects and innovate quickly in a fast-paced environment
- Named a Best Workplace in Technology by Fortune and others, Zscaler fosters an inclusive and supportive culture that is home to some of the brightest minds in the industry. If you thrive in an environment that is fast-paced and collaborative, and you are passionate about building and innovating for the greater good, come make your next move with Zscaler.
- Zscaler is committed to providing equal employment opportunities to all individuals.