Nexthink is building and running a high-performance cloud platform with a specific focus on the US Public Sector market, including a FedRAMP Moderate offering. This role drives the development of modern, cloud-native SRE processes and manages operations for the company's multi-tenant, microservices-based cloud platform so that it meets federal security standards.
Requirements
- Proficiency in cloud platforms (AWS, Azure, GCP) and cloud-native services.
- Strong scripting and programming skills (Python, Bash, Go, or similar).
- Experience with Infrastructure as Code (IaC) tools such as Terraform, Crossplane, CloudFormation, or Ansible.
- Knowledge of containerization and orchestration (Docker, Kubernetes).
- Familiarity with CI/CD pipelines and tools (Jenkins, GitLab CI/CD, GitHub Actions, etc.).
- In-depth knowledge of FedRAMP requirements and best practices.
- Experience with security tools and practices (SIEM, IDS/IPS, firewalls).
Responsibilities
- Drive automation of infrastructure provisioning, configuration, and management using Infrastructure as Code (IaC) tools.
- Develop and maintain comprehensive monitoring, logging, and alerting systems to ensure high availability and performance.
- Lead efforts in performance tuning and optimization for applications and infrastructure.
- Ensure that security controls and best practices are implemented and maintained to achieve and sustain FedRAMP compliance.
- Conduct and oversee regular security assessments, vulnerability scans, and penetration testing.
- Lead incident management efforts, ensuring rapid resolution and thorough root cause analysis.
- Work closely with development, operations, and security teams to integrate reliability and security into the software development lifecycle.
Other
- Lead, mentor, and develop a team of US-based Site Reliability Engineers.
- Foster a culture of continuous improvement, collaboration, and innovation.
- Collaborate with the compliance team to prepare for and respond to FedRAMP audits.
- Communicate effectively with stakeholders, providing regular updates on system performance, reliability, and compliance status.
- Collaborate and communicate effectively with global engineering teams in EU and India time zones.