The IC2 Site Reliability Engineer supports daily operations for a secure, large-scale OCI-based cloud environment powering mission-critical federal government workloads.
Requirements
- Hands-on experience with Linux systems administration
- Scripting ability with Python or Bash
- Understanding of basic cloud concepts (networking, compute, identity, observability)
- Exposure to Oracle Cloud Infrastructure (OCI) or other major cloud platforms
- Familiarity with Infrastructure-as-Code tools such as Terraform or Ansible
- Experience supporting production systems or participating in on-call rotations
- Understanding of security best practices within classified environments
Responsibilities
- Perform routine operational tasks such as deployments, patching, fleet maintenance, and basic troubleshooting for cloud-based systems
- Tune team-specific alarms and thresholds, escalate incidents appropriately, and support the management of metrics, KPIs, and system health dashboards
- Participate in incident response by quickly triaging and escalating incidents, executing operational playbooks, and documenting issues for senior review
- Serve as a technical support point of contact: troubleshoot and resolve technical issues, assist customers with environment setup and debugging, and provide timely status updates to customers and internal teams
- Own, maintain, and improve runbooks to ensure consistency and clarity for operational processes
- Implement defined enhancements to existing tools, documentation, and monitoring solutions
- Collaborate closely with other team members and escalate complex issues for further investigation and resolution
Other
- U.S. citizenship; must possess and maintain a TS/SCI security clearance with polygraph
- Ability to work collaboratively with technical teams and communicate effectively
- Strong problem-solving skills and willingness to learn complex systems
- Participate in on-call rotations with support from senior engineers, ensuring continuity of coverage and timely response
- Ensure compliance with all security, operational, and documentation standards