Microsoft Azure Storage faces the challenge of evolving traditional networking and storage endpoint infrastructure to support the massive scale and low-latency requirements of next-generation AI systems. This role aims to reimagine the network-to-storage data path and endpoint engineering to deliver ultra-low latency, massive bandwidth, and intelligent data movement, all of which are crucial for exascale AI training and inference.
Requirements
- 5+ years of experience in designing, analyzing, and troubleshooting large-scale distributed systems
- 1+ year(s) of experience and a passion for distributed systems and large-scale storage, including multi-threaded or parallel programming and knowledge of modern network and transport layer protocols
- 1+ year(s) of experience demonstrating excellence in software engineering practices: coding with a solid foundation in data structures and algorithms; strong testing, debugging, and analytical skills; a proven ability to plan, schedule, and deliver quality software; and a determined approach to reliability, performance, and architecture
- Experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
Responsibilities
- Architect and build data paths that deliver order-of-magnitude improvements in throughput and latency.
- Own critical components of the data plane, from transport protocol design to host-side integration and kernel bypass mechanisms.
- Collaborate with hardware, operating system (OS), networking, and storage teams to deliver cohesive, end-to-end performance across compute and storage boundaries.
- Drive long-term strategy and technical direction for how Azure delivers data to AI workloads at scale.
- Design and implement high-performance, low-latency data paths connecting compute, storage, and networking.
- Build a storage front-end gateway with next-generation transport protocols optimized for AI-scale data movement.
- Re-architect the Hypertext Transfer Protocol (HTTP) and storage access stack for minimal overhead and maximum throughput.
Other
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
- Personable and positive, with high emotional intelligence, and driven to own cross-team initiatives and improvements to enhance Azure Storage.
- Mentors senior engineers and guides technical decision-making across multiple feature areas and partner teams.
- Holds accountability as a Designated Responsible Individual (DRI), mentoring engineers across products and solutions and working on-call to monitor the system, product, or service for degradation, downtime, or interruptions.