CoreWeave is seeking a Senior Engineer to help build and grow a planet-scale performance data warehouse that enables the company to deliver industry-leading performance benchmarking for AI and cloud infrastructure.
Requirements
- 5+ years of experience in distributed systems, high-performance computing, or cloud services.
- Strong coding skills in Python or Go (C++ a plus) and deep familiarity with networked systems and performance.
- Hands-on experience with Kubernetes at production scale, CI/CD, and observability stacks (Prometheus, Grafana, OpenTelemetry).
- Experience with performance-critical GPU systems (CUDA, NCCL, RDMA, NVLink/PCIe, memory bandwidth) and model-serving stacks (llm-d, vLLM, TensorRT-LLM, Megatron-LM).
- Strong communicator who is comfortable collaborating with cross-functional teams and external partners.
- Experience with time-series databases, LSM-based storage engines, or custom data pipelines.
- Experience running MLPerf submissions or similar large-scale audited benchmarks.
Responsibilities
- Build and improve Kubernetes-native benchmarking services that measure latency, throughput, jitter, and cost-per-request across CoreWeave’s compute stack.
- Implement and maintain benchmarking workflows for end-to-end MLPerf Training and Inference runs, including workload setup, cluster configuration, runbooks, and result validation.
- Lead design reviews and drive architecture within the team; decompose multi-service work into clear milestones.
- Mentor junior engineers; review cross-team designs and elevate coding/testing standards.
- Help ensure reproducible, well-documented benchmarking processes.
Other
- Collaborate with cross-functional teams and external partners.
- Attend onboarding at one of CoreWeave’s hubs within the first month.
- Hybrid work environment with potential for remote work based on location and role requirements.
- Must be eligible to access export-controlled information as defined by U.S. Government regulations.