CoreWeave is looking to solve the problem of managing complex infrastructure across globally distributed data centers that power the largest AI workloads in the world.
Requirements
Proficiency in Go and/or Python software development.
Familiarity with CI/CD tools like Argo, Flux, and GitHub Actions.
Strong understanding of Linux internals.
Experience designing, implementing, and monitoring Kubernetes operators for custom resource definitions.
Experience with infrastructure automation and configuration management tools such as Ansible, Puppet, Chef, and Salt.
Experience with distributed cloud computing principles, including testing strategies, observability, error budgets, and fault-tolerant design.
Experience implementing metrics pipelines, custom alerts, and monitoring strategies.
Responsibilities
Design and implement scalable solutions for deploying and managing CoreWeave's global server hardware fleet across multiple sites.
Build and maintain backend services and APIs (gRPC/REST) in Go or Python to interact with Kubernetes and other infrastructure systems.
Develop provisioning services, automation workflows, and fleet management tools that span from bare metal to container orchestration.
Write and maintain Kubernetes custom controllers and operators to automate infrastructure behavior.
Design and implement observability solutions for large-scale server monitoring to improve system stability and insight.
Adapt and extend open source tooling to enhance visibility into system metrics, performance, and health.
Create test plans, deployment automation, dashboards, alerts, and insights into our fleet operations.
Other
5+ years of experience in software or infrastructure engineering.
Ability to break down complex problems into achievable tasks and collaborate with teammates to execute them.
Willingness and ability to thrive in a fast-paced startup environment.
Willingness to participate in an on-call rotation.
Benefits
Medical, dental, and vision insurance, 100% paid for by CoreWeave
Company-paid life insurance
Voluntary supplemental life insurance
Short- and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition reimbursement
Mental wellness benefits through Spring Health
Family-forming support provided by Carrot
Paid parental leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations