HPC Engineer, AI/ML Infrastructure

Boson AI

Salary not specified
Nov 5, 2025
Santa Clara, CA, United States of America

Boson AI is looking for an engineer to manage and optimize its High Performance Computing (HPC) infrastructure, specifically a large GPU cluster, to support ML/research teams and keep operations running smoothly as the company scales.

Requirements

  • 5+ years of experience in HPC operations.
  • Proficiency in Linux systems administration (Ubuntu/Debian).
  • Experience with Kubernetes and container orchestration.
  • Knowledge of security best practices in multi-tenant environments.
  • Understanding of L2/L3 networking fundamentals.
  • Skilled in Python and Bash scripting.
  • Experience with infrastructure-as-code tools (Ansible/Terraform).

Responsibilities

  • Manage and optimize HPC cluster operations.
  • Deploy and maintain infrastructure-as-code solutions.
  • Support ML/research teams with cluster usage optimization.
  • Operate, troubleshoot and optimize Ceph storage clusters.
  • Develop automation and tooling.

Other

  • If you're a natural problem-solver with a passion for continuous learning, we'd love to hear from you.