OpenAI's Hardware organization builds and optimizes the low-level stack for its supercomputing clusters, orchestrating computation and data movement for advanced AI workloads and co-designing hardware that is tightly integrated with AI models.
Requirements
- Are proficient in systems programming (e.g., Rust, C++) and in scripting languages such as Python.
- Have experience in one or more of the following areas: compiler development, kernel authoring, accelerator programming, runtime systems, distributed systems, or high-performance simulation.
Responsibilities
- Design and build APIs and runtime components to orchestrate computation and data movement across heterogeneous ML workloads.
- Contribute to compiler infrastructure, including the development of optimizations and compiler passes to support evolving hardware.
- Engineer and optimize compute and data kernels, ensuring correctness, high performance, and portability across simulation and production environments.
- Profile and optimize system bottlenecks, especially around I/O, memory hierarchy, and interconnects, at both local and distributed scales.
- Develop simulation infrastructure to validate runtime behaviors, test training stack changes, and support early-stage hardware and system development.
- Rapidly deploy runtime and compiler updates to new supercomputing builds in close collaboration with hardware and research teams.
- Work across a diverse stack, primarily using Rust and Python, with opportunities to influence architecture decisions across the training framework.
Other
- This role is based in San Francisco, CA, with a hybrid work model (3 days/week in-office).
- Relocation assistance is available.
You might thrive in this role if you
- Have a deep curiosity for how large-scale systems work and enjoy making them faster, simpler, and more reliable.
- Are excited to work in a fast-paced, highly collaborative environment with evolving hardware and ML system demands.
- Value engineering excellence, technical leadership, and thoughtful system design.