OpenAI is looking to build a high-availability, multi-tenant cache platform that scales automatically with workload, minimizes tail latency, and supports a diverse range of use cases.
Requirements
- Deep expertise with Redis, Memcached, or similar caching systems, including clustering, durability configurations, client-side connection patterns, and performance tuning
- Strong networking fundamentals
- Production experience with Kubernetes, service meshes (e.g., Envoy), and autoscaling systems
Responsibilities
- Design, build, and operate OpenAI’s multi-tenant caching platform used across inference, identity, quota, and product experiences.
- Define the long-term vision and roadmap for caching as a core infra capability, balancing performance, durability, and cost.
- Collaborate with other infra teams (e.g., networking, observability, databases) and product teams to ensure our caching platform meets their needs.
Other
- 5+ years of experience building and scaling distributed systems, with a strong focus on caching, load balancing, or storage systems
- Ability to thrive in a fast-paced environment, balancing pragmatic engineering with long-term technical excellence
- OpenAI is an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or any other legally protected characteristic.