Astronomer is looking to redefine how companies run Apache Airflow at scale by improving their production infrastructure design, build, testing, and deployment processes. The goal is to make data orchestration faster, more reliable, and easier to manage for global organizations, ultimately enhancing their ability to build reliable data products, unlock AI value, and power data-driven applications.
Requirements
- Strong experience in Non-Abstract Systems Design and implementation.
- Strong proficiency in Python and Golang, and in-depth experience with Kubernetes (CKA or equivalent, or greater).
- Experience with observability principles and technologies, including SLI/SLO definition and tracking.
- Experience with (and ideally strong opinions on) software development best practices, such as code review, testing, CI/CD, version control, automation and debugging.
- Experience working on a SaaS/PaaS product across multiple cloud providers.
- Experience with the components and technologies of our tech stack (deep breath): CircleCI, Chronosphere (Prometheus), Splunk, Bazel, Istio, Playwright, Karpenter, GitHub [Actions] …
- Experience with the innards and quirks of AWS, GCP and (particularly) Azure.
Responsibilities
- Own and shape how we build, test and deploy code in a high-scale PaaS environment.
- Collaborate across the whole company on how we design production systems, set standards and make technology choices for new and existing products, and how these fit together.
- Deliver results: we routinely “change the wheels on the bus while it’s moving” in a predictable, safe and reliable way.
- Work on building out the Platform/Reliability practice for the company.
- Be directly involved in decision-making on what we work on, as well as how we work on it.
- Be directly involved in determining how our platform works.
- Participate in incident management and determine sensible practices as the platform evolves.
Other
- Make high-quality, data-driven and experience-driven decisions on how we build the current and next generation of our production platform, then deliver the results.
- Make promises, and keep them.
- Create and maintain comprehensive internal documentation for systems and processes, ensuring clarity and accessibility.
- Strong communication skills, both written and verbal, with experience delivering work as part of a globally distributed team.
- A passion for reliability and operational excellence. A low tolerance for toil and other nonsense.