
Senior Software Engineer

Microsoft

$119,800 - $258,000
Oct 3, 2025
Remote, US

The Microsoft Azure AI/HPC team is looking for software engineers to help customers deploy, monitor, profile, and debug their applications on hyperscale cloud infrastructure. At supercomputing scale, specialized tools and techniques are needed to maintain reliability, runtime performance, and system health, and to keep running jobs within customer SLAs. The goal is to build and use state-of-the-art cloud applications and services that find operational gaps and instrument features for the smooth operation and management of cloud-native supercomputers, advancing business goals and facilitating the growth of AI and HPC in the cloud.

Requirements

  • 4+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C-Sharp, Java, JavaScript, or Python
  • 3+ years of experience in operating AI/HPC systems, developing and running AI/HPC applications on clusters, or operating Cloud Infrastructure.
  • 2+ years of specialized experience in one of the following: AI/HPC system management, high-speed networks, HPC storage, or managing cloud infrastructure.
  • 8+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C-Sharp, Java, JavaScript, or Python
  • Master's Degree in Computer Science or related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C-Sharp, Java, JavaScript, or Python
  • 1+ years of experience running and troubleshooting machine learning workloads on GPU-based HPC systems.
  • 1+ years of experience with cloud computing, virtualization, and container technologies.
  • Familiarity with the HPC software stack.

Responsibilities

  • Creates, implements, optimizes, debugs, refactors, and reuses code to improve performance, maintainability, effectiveness, and return on investment (ROI).
  • Acts as a Designated Responsible Individual (DRI) and guides other engineers by developing and following the playbook; works on call to monitor the system/product/service for degradation, downtime, or interruptions, alerts stakeholders about its status, and initiates actions to restore the system/product/service for simple and complex problems when appropriate.
  • Proactively seeks new knowledge and adapts to new trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale.
  • Collaborates with appropriate stakeholders to determine user requirements for a scenario.
  • Drives identification of dependencies and the development of design documents for a product, application, service, or platform.
  • Leverages subject-matter expertise of product features and partners with appropriate stakeholders (e.g., project managers) to drive a workgroup's project plans, release plans, and work items.

Other

  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
  • Microsoft will accept applications for the role until October 13, 2025.