The business problem is to deliver high-quality, reliable software solutions across the commercial and government sectors by proactively addressing customer concerns, managing escalations, and collaborating with engineering teams to improve service delivery and customer satisfaction.
Requirements
- 2+ years of experience managing customer relationships and working with customers to resolve complex issues; ability to work effectively with customers, support teams, and engineering teams in a fast-paced environment.
- 2+ years of experience developing software and/or services and cloud-based solutions, with strong knowledge of managed services such as Exchange Online, Microsoft Teams, and SharePoint.
- 2+ years of proven expertise with incident management processes, from initial triage through resolution and post-incident review.
- 2+ years of technical experience working with large-scale cloud or distributed systems.
- Proficiency in one or more programming languages (e.g., C, JavaScript).
- Preferred: proficiency in AI technologies, including experience using, designing, and deploying intelligent agents and AI systems to automate and streamline business operations.
Responsibilities
- Lead and coordinate end-to-end incident response efforts, ensuring timely resolution, effective stakeholder communication, and continuous improvement of incident management processes.
- Leverage technical expertise, judgment, and decision-making to coordinate multiple workstreams and resources in crisis situations, drive the mitigation plan, and resolve, reduce, or mitigate the impact of a crisis by engaging the necessary teams and escalating to the appropriate stakeholders.
- Analyze data sets; review existing processes and tools; provide operational insights into customer experience; assess reliability and quality across engineering and product teams.
- Support and improve tools and predictive models that enhance product development and operations, and monitor their impact on operational metrics.
- Communicate customer impact and other relevant information to key stakeholders, leadership, and customers.
- Develop projects and programs to improve crisis response by creating standard practices (e.g., processes, standard operating procedures) for consistent response across engineering teams.
- Participate in on-call rotation.
Other
- Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, service engineering, or systems engineering OR equivalent experience.
- Ability to meet Microsoft, customer, and/or government security screening requirements, including the Microsoft Cloud Background Check.
- Citizenship & Citizenship Verification: This role will require access to information that is controlled for export under export control regulations, and requires verification of U.S. citizenship, lawful permanent residency, or other protected status.
- Hybrid role based in Redmond, WA, requiring a minimum of three in-office days per week.
- Master's Degree in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field AND 3+ years technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls OR Bachelor's Degree in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field AND 5+ years technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls.