Oversee and coordinate the ongoing stability and reliability of a mission-critical product, ensuring technical expertise is available to address high-severity incidents, outages, or critical failures.
Requirements
- 3 to 5 or more years of experience in technical operations, product support, product management, or engineering leadership.
- Strong technical problem-solving skills.
- Strong background in incident management and vendor coordination.
- Ability to quickly assess technical issues and allocate resources effectively.
- Background in systems design and experience with systems thinking.
- Experience with mission-critical product environments.
- Familiarity with monitoring, alerting, and incident response tools.
- Technical background in software development, infrastructure, or systems engineering.
Responsibilities
- Serve as the primary point of coordination for product support escalations; organize technical resources during incidents and ensure post-incident reports are completed.
- Confirm the points of contact, approvers, and witness roster; align the access and change windows that affect deployment and testing.
- Maintain readiness plans, escalation procedures, and operational run-books; update them after acceptance tests and changes.
- Interface directly with vendor partners and internal teams during support events.
- Establish a single source of truth for integration, testing, safety, and operations documentation, including versioning, permissions, and retention.
- Approve repository structure and evidence-retention locations; keep the evidence index current and searchable.
- Ensure the documentation set is delivered and kept current, including the QA plan, as-builts, configuration baseline, backup and restore procedure, troubleshooting guide, service bulletins, O&M run-book, SAT plan, SAT results, and commissioning logs.
Other
- Excellent communication skills, with the ability to translate technical details for non-technical stakeholders.
- Experience working in product leadership.