Software Engineer, Fleet Management

OpenAI

Salary not specified
Nov 19, 2025
San Francisco, CA, US

OpenAI needs to build systems to manage large-scale computing environments for AI research and product development, ensuring high availability, performance, and efficiency while streamlining the research user experience.

Requirements

  • Strong software engineering skills, with experience in large-scale infrastructure environments.
  • Broad knowledge of cluster-level systems (e.g., Kubernetes, CI/CD pipelines, Terraform, cloud providers).
  • Deep expertise in server-level systems (e.g., systemd, containerization, Chef, Linux kernels, firmware management, host routing).

Responsibilities

  • Design and build systems to manage both cloud and bare-metal fleets at scale.
  • Develop tools that integrate low-level hardware metrics with high-level job scheduling and cluster management algorithms.
  • Leverage LLMs to coordinate vendor operations and optimize infrastructure workflows.
  • Automate infrastructure processes, reducing repetitive toil and improving system reliability.
  • Collaborate with hardware, infrastructure, and research teams to ensure seamless integration across the stack.
  • Continuously improve tools, automation, processes, and documentation to enhance operational efficiency.

Other

  • This role is based in San Francisco, CA.
  • We use a hybrid work model of 3 days in the office per week.
  • We offer relocation assistance to new employees.
  • You are passionate about optimizing the performance and reliability of large compute fleets.
  • You thrive in dynamic environments and are eager to solve complex infrastructure challenges.