ClickHouse is working to build and operate a reliable, scalable, and efficient telemetry platform that powers both internal monitoring and the observability features its customers rely on.
Requirements
- Proficiency in at least one of Go, C++, Rust, or Python
- Experience with Kubernetes, Helm, ArgoCD, and Terraform or similar IaC tools
- Comfort working with at least one major cloud provider (AWS, GCP, or Azure)
- Familiarity with OpenTelemetry, Prometheus, Grafana, or similar tools
- Experience with ClickHouse
- Experience building and running production systems at scale
- Strong production debugging skills and a problem-solving mindset
Responsibilities
- Design, build, and operate distributed systems that power observability across ClickHouse Cloud
- Own reliability, performance, and cost-efficiency of our telemetry pipeline and storage systems
- Take part in the on-call rotation and help drive root-cause resolution and long-term fixes
- Build tooling and automation to eliminate repetitive operational work
- Help shape the roadmap for observability by identifying bottlenecks and scaling challenges
- Collaborate with other engineering teams to improve their observability posture
- Contribute to design discussions and architecture reviews, and mentor teammates
Other
- Strong bias for action and ownership
- Great communication skills; comfortable working in a remote, async-friendly team
- Ability to iterate quickly: build MVPs, collect feedback, and improve continuously
- 5+ years of relevant professional experience
Benefits
- Flexible work environment
- Healthcare
- Equity in the company
- Time off
- A $500 home office setup if you’re a remote employee
- Global Gatherings