OpenAI is looking to build a high-availability, multi-tenant cache platform that scales automatically with workload, minimizes tail latency, and supports a diverse range of use cases.
Requirements
- Deep expertise with Redis, Memcached, or similar solutions, including clustering, durability configurations, client-side connection patterns, and performance tuning.
- Production experience with Kubernetes, service meshes (e.g., Envoy), and autoscaling systems.
- Experience building and scaling distributed systems, with a strong focus on caching, load balancing, or storage systems.
- Knowledge of networking fundamentals (e.g., TCP connection management, load balancing, and DNS).
Responsibilities
- Design, build, and operate OpenAI’s multi-tenant caching platform used across inference, identity, quota, and product experiences.
- Define the long-term vision and roadmap for caching as a core infra capability, balancing performance, durability, and cost.
- Collaborate with other infra teams (e.g., networking, observability, databases) and product teams to ensure our caching platform meets their needs.
Other
- 5+ years of experience building and scaling distributed systems.
- Ability to thrive in a fast-paced environment, balancing pragmatic engineering with long-term technical excellence.