Microsoft aims to build the next generation of computing power for every person and every organization around the world. It will do this by connecting large numbers of Graphics Processing Units (GPUs) and Central Processing Units (CPUs) with high-speed optical links, and by making the network that enables this capability accessible, reliable, self-healing, and ubiquitous.
Requirements
- 3+ years of experience in lab testing and bringing designs into production
- Experience with optical networking technologies, including optical direct-detection and coherent technologies
- Experience with automation tools and scripting languages
- Experience with data analytics and telemetry
- Experience with cloud-based networks, including Microsoft Azure
- Experience with AI and machine learning technologies
- Experience with network design, development, and automation
Responsibilities
- Optical Network Design for AI Systems: Collaborate with key stakeholders to determine customer requirements specific to optical networking for AI systems. Lead the design and development of specialized optical network components, focusing on optical direct-detection and coherent technologies, to meet both customer and technical requirements. Create comprehensive design documents that detail the architecture of optical solutions optimized for AI applications.
- Testing and Validation: Conduct rigorous testing and validation of optical systems and components. Use automation tools to create test cases and identify gaps in test coverage, ensuring that optical transceivers deliver the performance and compatibility that AI applications require.
- Code Implementation and Optimization: Develop, optimize, debug, and refactor code tailored for the management and automation of optical networks, focusing on scalability and efficiency in AI-driven network environments.
- Project and Feature Leadership: Use your in-depth expertise in coherent and direct-detection optical technologies to steer project and release plans in coordination with project managers and stakeholders. Ensure that these plans align with the overarching objectives of optical networking optimized for AI systems.
- Operational Readiness and Incident Management: Act as the Designated Responsible Individual (DRI) during on-call rotations, responding efficiently to incidents with significant customer or business impact. Assess the impact, troubleshoot issues, and deploy targeted fixes that resolve root causes. Coordinate with multiple teams, from product owners to engineers, to drive timely incident resolution. Implement automation to prevent issues from recurring. Escalate complex or ambiguous problems as needed, and contribute to postmortem reviews by sharing insights on incidents and their resolutions.
- Telemetry and Data Analysis for Optical Networks: Monitor telemetry from AI-optimized optical networks. Use data analytics to detect patterns, errors, and unexpected issues, and apply these insights to make informed decisions on product and service improvements.
- Continuous Improvement and Trend Adaptation: Stay abreast of the latest trends and technologies in optical networking and AI. Seek out new knowledge and techniques that can improve the availability, reliability, efficiency, and scalability of AI-optimized optical networks, adopting new solutions and best practices accordingly.
Qualifications
- Ability to meet Microsoft, customer and/or government security screening requirements
- Doctorate Degree in Electrical Engineering, Optical Engineering, Computer Science, Information Technology, or related field
- OR Master's Degree in Electrical Engineering, Optical Engineering, Computer Science, Information Technology, or related field AND 3+ years of technical experience in network design, development, and automation
- OR Bachelor's Degree in Electrical Engineering, Optical Engineering, Computer Science, Information Technology, or related field AND 4+ years of technical experience in network design, development, and automation
- OR equivalent experience.