Raft is looking to build and maintain infrastructure as code on large-scale, multi-site deployments, scale platform capabilities, automate workflows for continuous delivery on a hybrid infrastructure, and troubleshoot issues on high-traffic production systems.
Requirements
- 5+ years of experience building and maintaining Kubernetes clusters across hybrid-cloud infrastructure
- 8+ years of experience working in Operations, DevOps, or Site Reliability Engineering
- 5+ years of infrastructure/configuration management experience using tools such as Terraform and Helm
- 5+ years of experience with cloud service monitoring tools such as Prometheus, Grafana, Fluentd, Elastic Stack, and Sumo Logic
- Exceptionally proficient (knowledge and work experience) in Linux system administration
- Ability to assist with GitLab CI pipelines (build/promote artifacts and security scans)
- Experience creating automation using APIs from Azure or Google Cloud
Responsibilities
- build and maintain infrastructure as code on large-scale, multi-site deployments
- evaluate and assess new ways to scale platform capabilities
- automate workflows to help push the limits of the infrastructure and enable continuous delivery of capabilities onto a hybrid infrastructure
- troubleshoot issues on high-traffic production systems until root causes are understood
- participate in design and code review processes
- interact with product owners to coordinate infrastructure changes
- identify bottlenecks and improve performance of the platform
Other
- This is a U.S.-based position.
- All of the programs we support require U.S. citizenship as a condition of employment.
- All work must be conducted within the continental U.S.
- Ability to obtain a Security+ certification within the first 90 days of employment with Raft
- Ability to obtain and maintain a security clearance