Zscaler is looking to enhance its cloud security platform by developing agentic AI features and resilient services to protect enterprise customers from cyberattacks and data loss.
Requirements
- Proven experience across the full ML model lifecycle: building, deployment, monitoring, and optimization
- Hands-on experience with modern GenAI stacks (e.g., LangChain/LangGraph, CrewAI, vector stores, RAG, prompts/memory, evaluators)
- Prior delivery of agentic/LLM systems in production, including context engineering and memory management strategies
- Experience fine-tuning and serving LLMs/SLMs at scale (latency, cost, safety, evals)
- Solid distributed systems fundamentals (APIs, queues, caching, concurrency) and cloud experience (Docker/Kubernetes, CI/CD; AWS/GCP/Azure)
Responsibilities
- Build agentic AI features: implement tool-use workflows (planning, memory, context), retrieval/RAG, and evaluators; harden for scale and reliability
- Ship resilient services: design and operate microservices, data/feature pipelines, and low-latency inference paths with solid observability
- Operate with excellence: add tests, monitors, tracing, dashboards; participate in on-call with an automation-first mindset
Other
- Hybrid role working from our San Jose office 3 days a week.
- Reports to the VP of Software Engineering.
- Named a Best Workplace in Technology by Fortune and others, Zscaler fosters an inclusive and supportive culture that is home to some of the brightest minds in the industry.
- If you thrive in an environment that is fast-paced and collaborative, and you are passionate about building and innovating for the greater good, come make your next move with Zscaler.
- By applying for this role, you agree to comply with applicable laws, regulations, and Zscaler policies, including those related to security and privacy standards and guidelines.