Senior Software Engineer

Microsoft

$119,800 - $258,000
Oct 30, 2025
Remote, US

Microsoft's HPC/AI team is building the next-generation distributed AI supercomputer to enable breakthroughs in artificial intelligence by delivering unmatched computational power, scalability, and reliability. This role focuses on developing next-generation networking capabilities to ensure high performance, low latency, and minimal jitter for distributed AI workloads, enabling state-of-the-art AI systems to achieve their full potential.

Requirements

  • 4+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
  • 2+ years of experience with network virtualization, software-defined networking (SDN), or network performance tuning.
  • Hands-on experience with networking technologies in AI-specific hardware (e.g., InfiniBand, RoCE, NVLink).
  • Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs, and how they interact with networking infrastructure.
  • Experience with telemetry and observability tools for network monitoring at scale.
  • Background in building scalable and fault-tolerant systems in large, distributed environments.
  • 6+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; OR a Master's Degree in Computer Science or a related technical field AND 8+ years of technical engineering experience coding in those languages; OR equivalent experience.

Responsibilities

  • Design, develop, and optimize networking solutions tailored for large-scale AI training infrastructure.
  • Architect and implement high-performance, low-latency, and low-jitter communication frameworks for distributed systems.
  • Benchmark, analyze, and enhance the scalability and reliability of networking systems to handle petabyte-scale data transfer.
  • Debug and resolve complex networking issues in large-scale, high-performance environments.
  • Drive identification of dependencies and the development of design documents for a product, application, service, or platform.
  • Create, implement, optimize, debug, refactor, and reuse code to establish and improve performance, maintainability, effectiveness, and return on investment (ROI).
  • Act as a Designated Responsible Individual (DRI) and guide other engineers by developing and following the playbook, working on call to monitor the system/product/service for degradation, downtime, or interruptions, alerting stakeholders about status, and initiating actions to restore the system/product/service for simple and complex problems when appropriate.

Other

  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
  • Microsoft is an equal opportunity employer.
  • If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.