Accelerate research towards AGI at OpenAI by building the core systems that researchers rely on, from low-level infrastructure components to research-facing custom applications
Requirements
- Strong proficiency in Python/Rust and backend software development, ideally in large codebases
- Experience with distributed systems and scalable data processing infrastructure, including technologies such as Kafka, Spark, Trino/Presto, and Iceberg
- Hands-on experience operating services in Kubernetes, and familiarity with tools like Terraform and Helm
- Comfort working across the stack - from low-level infrastructure components to application logic - and making trade-offs to move quickly
- A focus on building systems that are both technically sound and easy for others to use
- Curiosity and adaptability in fast-changing environments, especially in high-growth orgs
Responsibilities
- Design, build, and operate scalable backend systems that support various ML research workflows, including observability and analytics
- Develop reliable infrastructure that supports both streaming and batch data processing at scale
- Create internal-facing tools and applications as needed
- Debug and improve the performance of services running on Kubernetes, including operational tooling and observability
- Collaborate with engineers and researchers to deliver reliable systems that meet real-world needs in production
- Help improve system reliability by participating in the on-call rotation and responding to critical incidents
Other
- Bachelor's degree or higher in Computer Science or related field
- 3 days in the office per week, with relocation assistance for new employees
- Must be eligible to work in the US
- Background checks for applicants will be administered in accordance with applicable law
- OpenAI is committed to providing reasonable accommodations to applicants with disabilities