Building reliable, diagnosable, and scalable systems for the XPF (XStore on Pilotfish) platform, ensuring operational readiness and long-term sustainability.
Requirements
- Coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
Responsibilities
- Investigate and resolve complex issues across hardware, firmware, and software layers.
- Drive root cause analysis and implement durable fixes to improve system reliability.
- Design and develop tools that reduce manual interventions in diagnostics, repair workflows, and node lifecycle management.
- Build and enhance diagnostics workflows to detect and isolate hardware and software faults.
- Ensure comprehensive diagnostics coverage for new hardware Stock Keeping Units (SKUs) and evolving platform requirements.
- Identify and close operational and security gaps in the platform.
- Act as a Designated Responsible Individual (DRI) in an on-call rotation, monitoring systems, products, features, and services for degradation, downtime, or interruptions, and obtain approval to restore service for simple problems.
Other
- The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
- Work with appropriate stakeholders to determine user requirements for a set of features.
- Collaborate with engineering and support teams to ensure Storage’s requirements are met across all phases of the platform lifecycle.
- Drive alignment on diagnostics, telemetry, and repair strategies.