Senior Software Engineer - AI Hardware

Bloomberg

$160,000 - $240,000

Oct 27, 2025

New York, NY, United States of America

Bloomberg is seeking an engineer to join their hardware management team to manage and support thousands of servers, including the entire AI stack, ensuring peak performance and reliability of HPC/AI clusters.

Requirements

4+ years of proficiency in Kubernetes environments (deployments, storage, services, jobs, ingress, egress, etc.)
Hands-on management of GPU-based systems, including kernel and driver management, and developing software tooling to automate provisioning and maintenance of these systems
Experience designing, implementing, and maintaining system software that enables communication between GPUs, CPUs, and storage in scale-out AI and HPC systems
Oversee the ongoing monitoring, support, and maintenance of our HPC/AI clusters, ensuring peak performance and reliability
Drive system upgrades, customization, and seamless integration with software developers, network operations, and data center teams
Manage and maintain a diverse range of computer systems and application software, ensuring they meet the highest standards of functionality and efficiency
Develop and maintain expertise in low-latency/high-bandwidth, interconnected infrastructure (including InfiniBand, Ethernet, RDMA/RoCE, and others)

Responsibilities

Design, build, and maintain highly reliable, scalable, and efficient infrastructure platforms that support our engineering teams and business needs
Participate in system design discussions and contribute to architectural decisions
Ensure code quality through standard methodologies, code reviews, and alignment to clean code principles
Be able to produce clear and consumable documentation for a wide audience
Manage GPU-based systems hands-on, including kernel and driver management, and develop software tooling to automate provisioning and maintenance of these systems
Design, implement, and maintain system software that enables communication between GPUs, CPUs, and storage in scale-out AI and HPC systems
Oversee the ongoing monitoring, support, and maintenance of our HPC/AI clusters, ensuring peak performance and reliability

Other

Communicate effectively across diverse teams
Be willing to participate in on-call rotations as arranged
Be a self-starter, manage priorities, and work independently
Stay up to date with the latest infrastructure technologies and industry-standard processes, and evaluate their potential impact on existing and future solutions
Hold yourself to high standards