CoreWeave is seeking a Senior Engineer to build and maintain a planet-scale performance data warehouse for AI infrastructure, enabling industry-leading benchmarking and performance analysis across global data centers.
Requirements
- 3–5 years of experience building distributed systems, high-performance computing components, or cloud services.
- Strong programming skills in Python or Go (C++ a plus) and a solid understanding of networked systems and performance fundamentals.
- Hands-on experience with Kubernetes in production environments plus familiarity with CI/CD and observability tools (e.g., Prometheus, Grafana, OpenTelemetry).
- Exposure to performance-critical GPU systems (CUDA, NCCL, NVLink/PCIe, memory bandwidth) or model-serving stacks (llm-d, vLLM, TensorRT-LLM, Megatron-LM).
- Experience with time-series databases, LSM-based storage engines, or custom data pipelines.
- Familiarity with MLPerf or other large-scale benchmarking frameworks.
- Contributions to OSS projects such as llm-d, vLLM, or PyTorch.
Responsibilities
- Develop and enhance Kubernetes-native benchmarking services that measure latency, throughput, jitter, and cost-per-request across CoreWeave’s compute stack.
- Help implement and maintain benchmarking workflows for end-to-end MLPerf Training and Inference runs, including workload setup, cluster configuration, and result validation.
- Participate in design discussions and contribute to architecture decisions within the team.
- Break down engineering tasks into clear milestones and deliver reliable, high-quality code.
- Collaborate with teammates to maintain reproducible, well-documented benchmarking processes.
- Provide constructive code reviews and share best practices with peers.
- Mentor junior engineers; review cross-team designs and elevate coding/testing standards.
Other
- Effective communicator comfortable working cross-functionally.
- Ability to work in a hybrid environment, with remote work considered for candidates located more than 30 miles from an office.
- Must be a U.S. person or eligible to access export-controlled information without requiring export authorization.
- Willingness to attend onboarding at one of CoreWeave’s hubs within the first month.
- Participation in quarterly team gatherings to support collaboration.