Software Engineer, Internal Infrastructure (North America)

Cohere

Salary not specified
Oct 8, 2025
New York, NY, US • San Francisco, CA, US

Cohere is looking to build and operate world-class infrastructure and tools to train, evaluate, and serve its foundational models. The goal is to scale intelligence to serve humanity by supporting AI researchers and accelerating the development of industry-leading AI models.

Requirements

  • Have deep experience running Kubernetes clusters at scale and/or scaling and troubleshooting Cloud Native infrastructure, including Infrastructure as Code
  • Have strong programming skills in Go or Python
  • Prefer contributing to Open Source solutions rather than building solutions from the ground up
  • Have previously worked with ML training infrastructure and GPU workloads, and are familiar with RDMA networking
  • Have the expertise to support and troubleshoot low-level Linux systems
  • Have experience collaborating with research teams or machine learning engineers

Responsibilities

  • Build and operate Kubernetes compute superclusters across multiple clouds
  • Partner with cloud providers to optimize infrastructure costs, performance, and reliability for AI workloads
  • Work closely with research teams to understand their infrastructure needs and identify ways to improve stability, performance, and efficiency of novel model training techniques
  • Design and build resilient, scalable systems for training AI models, with a focus on intuitive user interfaces that let researchers troubleshoot and resolve problems on their own
  • Encourage software best practices across our company and participate in team processes such as knowledge sharing, reviews, and on-call

Other

  • All of our infrastructure roles require participating in a 24x7 on-call rotation; you are compensated for your on-call schedule.
  • Are self-directed and adaptable, and excel at identifying and solving key problems
  • Draw motivation from building systems that help others be more productive
  • See mentorship, knowledge transfer, and review as essential prerequisites for a healthy team
  • Have excellent communication skills and thrive in fast-paced environments