Collibra is looking to hire an Infrastructure Engineer to help customers deploy its Unstructured AI application in their own cloud environments (AWS, GCP, Azure), ensuring reliable, scalable, and easily deployable infrastructure.
Requirements
- Strong experience with Terraform or similar infrastructure-as-code tools and modular deployment best practices.
- Good understanding of Kubernetes, Helm, and container orchestration.
- Experience deploying and managing workloads in at least one major cloud platform (AWS, GCP, or Azure).
- Experience working with Python application code, in order to align infrastructure and deployment patterns with the application.
- Familiarity with networking, IAM, and security configurations, including VPC design and private networking.
- Experience with deployment automation and CI/CD pipelines, including container image management (ECR, GCR, ACR).
- Hands-on experience with monitoring, observability, and incident response, including debugging distributed systems with tools such as Datadog, Prometheus, or Grafana.
Responsibilities
- Designing, implementing, and maintaining the infrastructure blueprint that enables customers to deploy our Unstructured AI application in their own cloud environments (AWS, GCP, Azure).
- Codifying infrastructure as code (Terraform), building modular, secure, and reproducible configurations that support flexible deployment scenarios, including bring-your-own VPC, custom networking, and varied customer setups.
- Managing and evolving the deployment stack, including Kubernetes and Helm-based deployments of the Unstructured AI application, and supporting components such as databases, secrets, and user authentication and SSO.
- Setting up and maintaining monitoring, alerting, and observability systems (e.g. Datadog, Prometheus, Grafana) to ensure operational visibility and proactive incident detection.
- Participating in incident response and troubleshooting for customer-deployed environments, using observability data to diagnose and resolve issues quickly.
- Collaborating with the application team to implement Python code and configuration changes that support evolving infrastructure, authentication, and deployment patterns.
- Building tools and services to automate product updates, simplifying upgrades and ongoing maintenance of the Unstructured AI application.
Other
- This is a hybrid role based in our New York office. Our hybrid model means you'll work from the office at least two days each week.
- A bachelor's degree or equivalent related working experience is required.
- This position is not eligible for visa sponsorship.
- Experience collaborating effectively with engineers across domains to deliver robust, production-ready systems.
- Willingness to take ownership in rapidly evolving environments, adapting infrastructure and deployments as the Unstructured AI product grows and changes.