Informatica is looking for a Principal DevOps Engineer to shape the future of their global infrastructure, architecting a scalable, secure, and highly available foundation that powers their platform across multiple clouds.
Requirements
- Deep, hands-on expertise in at least one major public cloud (AWS, Azure, or GCP) and production experience with at least one other. Experience with Oracle Cloud Infrastructure (OCI).
- Experience supporting microservices-based architectures in a production environment.
- AIOps & Automation: Experience using observability data for AIOps programmes. Familiarity with applying statistical analysis or machine learning models for predictive monitoring, anomaly detection, and automated root cause analysis.
- Infrastructure as Code (IaC): Mastery of tools like Terraform or CloudFormation. Experience with configuration management tools like Ansible, Chef, or Puppet.
- Scripting & Automation: Expert-level proficiency in at least one scripting or query language (Python, Bash, MongoDB queries) with a portfolio of successful automation projects.
- CI/CD: Deep experience building CI/CD pipelines and deployment tools (Jenkins, Git, GitHub).
- Observability: Hands-on experience building monitoring and logging for distributed systems (Prometheus, Grafana, CloudWatch).
Responsibilities
- Architect & Strategize: Lead the design of our next-generation deployment architecture for a microservices-based platform. Drive technological choices for team tooling and infrastructure, ensuring long-term scalability and reliability.
- AIOps: Implement AIOps frameworks to improve operational tasks and enhance system self-healing capabilities.
- Develop CI/CD Pipelines: Design, manage, and improve our CI/CD pipelines using tools like Jenkins, Git, and GitHub to enable rapid, reliable, and automated software delivery.
- Ensure Uptime: Take ultimate ownership of our production environment's stability. Lead end-to-end incident management, from escalation to Root Cause Analysis (RCA). Manage patching, upgrades, and disaster recovery processes.
- Automate & Operate: Engineer and own a world-class observability stack (e.g., Prometheus, Grafana, CloudWatch, ELK). Develop automation scripts and frameworks to streamline operational tasks and enhance system self-healing capabilities.
- Mentor & Lead: Act as a technical leader and mentor for the team. Share your expertise, establish best practices, and improve the technical capabilities of the entire team.
Other
- Bachelor of Science (BSc) degree in Engineering, Computer Science, or a related technical field.
- 8+ years of progressive experience in DevOps, SRE, or Cloud Platform Engineering, with at least 3 years in a senior role managing large-scale production environments.
- Participation in a 24x7 on-call rotation to support critical uptime.
- Comprehensive health, vision, and wellness benefits (paid parental leave, adoption benefits, life insurance, disability insurance, and 401k plan or international pension/retirement plans).
- Flexible time-off policy and hybrid working practices.