Optimum is seeking an IMS Core Lead Engineer – SRE to ensure the reliability, redundancy, and high performance of mission-critical platforms supporting the mobile and wireless (Wi-Fi) core and other core network services. The role drives operational excellence and enables progress through reliable, high-speed connectivity solutions.
Requirements
- Expert-level knowledge of telecom protocols: EoGRE, PMIP, RTP, SRTP, Diameter, RADIUS, IPsec.
- Extensive hands-on experience with Cisco and Nokia platforms, IMS cores, mobile core networks (EPC, 5GC), and cloud infrastructure.
- Strong proficiency in Unix/Linux system administration, scripting (Python, Bash, Perl), and automation/orchestration tools (Ansible, Kubernetes, Docker).
- Demonstrated experience in designing and leading the operation of comprehensive monitoring systems using Netscout and Grafana dashboards.
- Expert-level IP networking skills: BGP, OSPF, VLANs, NAT, VPNs, ACLs, QoS.
- Proven track record of leading incident management and driving RCA-driven improvements.
- Experience with NFV and cloud-native network functions in telecom environments.
Responsibilities
- Lead the architectural design and strategic implementation of reliable, resilient, and redundant systems that consistently meet or exceed availability targets.
- Direct the full lifecycle management of core platforms, including hardware/software upgrades, migrations, and capacity planning to ensure business continuity.
- Define and implement standards for network access, QoS, and traffic management policies.
- Lead architectural design and strategic conversations with vendors related to 5G SA to build reliable, resilient, and redundant systems that consistently meet or exceed availability targets.
- Establish and enforce SLIs, SLOs, and SLAs for all core platforms, ensuring a data-driven approach to service reliability.
- Champion a culture of automation by leading the development of tools for provisioning, monitoring, scaling, failover, and recovery, significantly reducing manual intervention and downtime.
- Lead the incident response process during critical outages, guiding root cause analysis (RCA) and directing the implementation of long-term remediation efforts to prevent recurrence.
Other
- 7+ years in telecom engineering roles, with 3+ years in a leadership or senior-level role applying SRE or automation practices in production environments.
- Participation in 24/7 on-call rotations.
- Availability for after-hours maintenance and urgent service restoration activities.
- Ability to work in high-pressure, high-reliability production environments.
- The ideal candidate for this role is not just a subject matter expert but a leader who can drive strategic change.