Astronomer is looking to hire a Senior Software Engineer to join their Data Plane Management team. The team operates and develops new features for the cloud compute foundation that hosts customer Airflow deployments, managing a large-scale fleet of 500+ Kubernetes clusters.
Requirements
- Experience building and operating SaaS infrastructure, or experience managing a large-scale internal compute platform.
- Software engineering expertise with Golang, or with a similar language and a desire to learn Golang.
- Production experience with a container-based orchestration system (Kubernetes preferred, but not essential).
- Understanding of how to build with security and isolation in mind, so that Astronomer’s managed platform can securely integrate with any customer environment while keeping customer workloads strictly isolated from one another.
- Experience programmatically administering Kubernetes in multiple clouds.
- Experience designing systems for resiliency, scale and security.
- Experience with our particular tech stack components and technologies (deep breath): Calico/Cilium, PostgreSQL/Aurora/CloudSQL/etc., OpenTelemetry, Chronosphere (Prometheus), Splunk, Istio, Karpenter, Falco.
Responsibilities
- Own key endpoints and features of our flagship product, Astro, to extend our offering to more complex customer networking options.
- Work across domains to develop diverse features for our core infrastructure (e.g., workload identity, multitenancy, cross-region disaster recovery, cloud quota management, private network access).
- Evolve our fleet orchestration system so that we can safely make (and roll back) changes across our infrastructure, and scale from the 500+ clusters we have now to thousands in the future.
- Develop your experience in a multi-cloud environment, working with managed Kubernetes offerings and network and authn/authz primitives from AWS, Azure, and GCP.
- Deepen your operational knowledge of Kubernetes-based workloads while managing the data pipelines of many of the largest companies in the world.
Other
- A passion for reliability and operational excellence. A low tolerance for toil, alert fatigue, and other nonsense.
- Strong written and verbal communication skills, with experience delivering work as part of a globally distributed team.
- Proactive approach to identifying and addressing issues, with a focus on ownership and accountability.
- Experience as part of an on-call rotation; this role involves periodic on-call duty for the services and systems we own.
- A passion for finding and addressing inefficiencies in code, infrastructure costs, tooling and processes.