CoreWeave is seeking to enhance its observability capabilities to better understand, troubleshoot, and optimize complex systems at the forefront of artificial intelligence.
Requirements
- Experience with logging, tracing, and metrics platforms in production and at scale
- Familiarity with logging and metrics systems such as ClickHouse, Elastic, Loki, VictoriaMetrics, Prometheus, Thanos, and/or Grafana
- Familiarity with PromQL or another query language, and an understanding of the data models used by observability systems
- Comfortable using Go as your primary programming language
- Experience running telemetry services at a cloud provider (preferred)
- Experience operating Kubernetes clusters at scale for both event-driven and stateful orchestration (preferred)
- Familiarity with Infrastructure-as-Code tools and practices (preferred)
Responsibilities
- Modernize logging platforms at cloud scale
- Design and execute migrations that are transparent to platform consumers
- Build governance mechanisms that empower CoreWeavers to effectively manage the telemetry their services produce and adopt best practices
- Develop and enforce best practices for maintaining the health of telemetry ETL pipelines
- Improve the performance, security, reliability, and scalability of observability services while participating in the team’s on-call rotation
Other
- Six or more years of experience in software or infrastructure engineering
- Customer-obsessed, excited to provide infrastructure as a service, and inclined to apply a product lens when evaluating platform-scale problems
- Work with a passionate team of engineers in an iterative, high-trust agile environment
- Medical, dental, and vision insurance - 100% paid for by CoreWeave
- Company-paid Life Insurance
- Voluntary supplemental life insurance
- Short- and long-term disability insurance
- Flexible Spending Account
- Health Savings Account
- Tuition Reimbursement
- Mental Wellness Benefits through Spring Health
- Family-Forming support provided by Carrot
- Paid Parental Leave
- Flexible, full-service childcare support with Kinside
- 401(k) with a generous employer match
- Flexible PTO
- Catered lunch each day in our office and data center locations
- A casual work environment
- A work culture focused on innovative disruption