This role ensures the reliability, scalability, and performance of cloud-based applications for a partner company.
Requirements
- 8+ years of professional software engineering or site reliability engineering experience.
- Expertise in programming languages such as Python, JavaScript, Go, or C.
- Strong experience with cloud infrastructure, particularly AWS, and distributed systems.
- Proficiency with infrastructure-as-code tools like Terraform.
- Knowledge of observability and monitoring tools (Grafana, Prometheus, Splunk).
- Familiarity with RESTful APIs, Git workflows, and software development best practices.
- Experience with cloud-native technologies and highly regulated environments is a plus.
Responsibilities
- Lead the design, development, and deployment of scalable software and infrastructure solutions.
- Champion SRE best practices, including monitoring SLAs, SLOs, and SLIs to ensure system reliability.
- Architect and implement cloud-native solutions, leveraging tools such as AWS, Kubernetes, Terraform, Prometheus, Grafana, and CI/CD pipelines.
- Develop and maintain operational procedures, system documentation, and automation for testing and deployments.
- Mentor and guide team members through code reviews, standards enforcement, and knowledge sharing.
- Collaborate with cross-functional teams to improve service performance, availability, and incident response.
- Ensure compliance with security, audit, and regulatory requirements through proper controls and documentation.
Other
- Bachelor’s degree in Computer Science or Information Systems, or equivalent experience.
- Strong problem-solving, analytical, collaboration, and communication skills.
- Flexible work arrangements, including fully remote options.
- Inclusive, collaborative, and innovative work environment.