Senior Software Engineer - AI Hardware

Bloomberg

$160,000 - $240,000
Oct 27, 2025
New York, NY, United States of America

Bloomberg is seeking an engineer to join its hardware management team to manage and support thousands of servers, including the entire AI stack, ensuring peak performance and reliability of HPC/AI clusters.

Requirements

  • 4+ years of experience working in Kubernetes environments (deployments, storage, services, jobs, ingress, egress, etc.)
  • Hands-on management of GPU-based systems, including kernel and driver management, and developing software tooling to automate provisioning and maintenance of these systems.
  • Experience designing, implementing, and maintaining system software that enables communication between GPUs, CPUs, and storage in scale-out AI and HPC systems
  • Oversee the ongoing monitoring, support, and maintenance of our HPC/AI clusters, ensuring peak performance and reliability
  • Drive system upgrades, customization, and seamless integration with software developers, network operations, and data center teams
  • Manage and maintain a diverse range of computer systems and application software, ensuring they meet the highest standards of functionality and efficiency
  • Develop and maintain expertise in low-latency, high-bandwidth interconnected infrastructure (including InfiniBand, Ethernet, RDMA/RoCE, and others)

Responsibilities

  • Design, build, and maintain highly reliable, scalable, and efficient infrastructure platforms that support our engineering teams and business needs.
  • Participate in system design discussions and contribute to architectural decisions
  • Ensure code quality through standard methodologies, code reviews, and alignment to clean code principles
  • Produce clear, consumable documentation for a wide audience

Other

  • Communicate effectively across diverse teams
  • Be willing to participate in on-call rotations as arranged
  • Be a self-starter, manage priorities, and work independently
  • Stay up to date with the latest infrastructure technologies and industry-standard processes, and evaluate their potential impact on existing and future solutions
  • Hold yourself to high standards