Karsun is looking to build out and run production environments, automate operations, and maintain and support infrastructure to meet the reliability expectations of multiple applications.
Requirements
- Deep understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Kubernetes)
- Experience with monitoring, logging, and observability tools such as Datadog, AWS CloudWatch, ELK, Prometheus, and Splunk
- Knowledge of infrastructure as code and GitOps tools (e.g., Terraform, Ansible, Argo CD) and CI/CD pipelines
- Experience deploying enterprise software on AWS services such as EKS, RDS, EC2, Elastic Load Balancers, Lambda, DynamoDB, and API Gateway, including multi-region deployments
- Certifications such as AWS Certified DevOps Engineer or Google Professional Cloud DevOps Engineer are a plus
- Experience with tools such as Jenkins, GitHub/Bitbucket, Nexus/Artifactory
- Experience with tools such as Ansible, Packer, Puppet, or Chef
Responsibilities
- Deploy and manage applications on Kubernetes container platforms such as AWS EKS or OpenShift
- Monitor systems and applications, proactively identifying and resolving any performance bottlenecks or availability issues
- Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance
- Implement and support integrated CI/CD pipelines for on-premises and/or cloud assets using tools such as Jenkins, GitHub/Bitbucket, Nexus/Artifactory
- Conduct post-incident analyses to identify root causes and implement preventive measures to avoid future incidents
- Implement, deploy, and maintain infrastructure as code (IaC) for provisioning infrastructure using AWS CloudFormation or Terraform
- Design, build, and maintain automated monitoring, metrics, and notification services to support fault-tolerant, highly available systems, using tools such as AWS CloudWatch, EFK, and Prometheus
Other
- Bachelor’s degree in Computer Science, Engineering, or a related field and 8-10 years of relevant experience
- Ability to obtain and maintain a Public Trust clearance
- Strong problem-solving and analytical skills, with a keen attention to detail
- 5+ years of experience supporting operations and maintenance for cloud-native applications in production that are fault-tolerant, self-healing, scalable, and highly available
- Travel requirements not mentioned