Job Board

Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1,000+ jobs and find postings that match your resume.


Principal Software Engineer - Azure Core

Microsoft

$163,000 - $296,400
Oct 31, 2025
Remote, US

Microsoft Azure Core is building the foundation for Microsoft's cloud services, focusing on infrastructure and advanced cloud platform technologies such as cloud-native applications, containerization (Kubernetes), site reliability engineering (SRE), and high-performance computing (HPC). We are developing next-generation artificial intelligence (AI) data centers to power large-scale training and inference. We seek experienced Principal Software Engineers who can design, bootstrap, and operate infrastructure at hyperscale.

Requirements

  • 8+ years of technical engineering experience coding in languages including, but not limited to, Go, Rust, Bash, or Python.
  • 5+ years of experience building and managing data centers.
  • Bootstrapping and managing data center (DC) infrastructure, including device inventory, diagnosis, and repairs.
  • Networking and security expertise in high-performance computing, including Remote Direct Memory Access (RDMA) over InfiniBand or RDMA over Converged Ethernet (RoCE), and eBPF.
  • Driver and firmware lifecycle management, including GPU diagnostics.
  • Storage and acceleration technologies for AI workloads, including distributed storage systems for multi-exabyte AI workloads and high-throughput data pipelines.
  • 1+ year of experience with Artificial Intelligence (AI) and Machine Learning (ML) job scheduling and orchestration at scale, using technologies such as Simple Linux Utility for Resource Management (SLURM), Ray, and Kueue.

Responsibilities

  • Provides technical leadership in identifying dependencies and developing design documents for a product, application, service, or platform.
  • Leads by example and mentors others to produce extensible and maintainable code used across the company.
  • Serves as a Designated Responsible Individual (DRI), mentoring engineers across products and solutions and working on-call to monitor systems, products, and services for degradation, downtime, or interruptions.
  • Proactively seeks new knowledge and adapts to new trends, technical solutions, and patterns that improve the availability, reliability, efficiency, observability, and performance of products; drives consistency in monitoring and operations at scale; and shares knowledge with other engineers.

Other

  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
  • Partners with appropriate stakeholders to determine user requirements for one or more complex scenarios.
  • Leverages deep subject-matter expertise in cross-product features, working with appropriate stakeholders (e.g., project managers) to lead project plans, release plans, and work items across multiple products.
  • We embrace inclusivity and diverse perspectives, using empathy, trust, and accountability to drive our culture and deliver solutions in an iterative manner.