Anthropic aims to build reliable, interpretable, and steerable AI systems. This role focuses on the network infrastructure and optimization required to support them.
Requirements
- Expert-level proficiency with network protocols and networking concepts
- Deep knowledge of kernel networking: TCP/IP stack internals, XDP, eBPF, io_uring, and epoll
- User-space networking: DPDK, RDMA, kernel bypass techniques
- Strong programming skills in a systems programming language, including memory management, lock-free data structures, and NUMA-aware programming
- Knowledge of performance optimization tools and techniques for software, drivers, and operating systems
- Comfort with or desire to learn Rust
- Experience programming on SmartNICs
Responsibilities
- Write and maintain software that interfaces between our accelerators and our high-speed networks
- Build and maintain software that interacts with networks
- Diagnose and resolve networking issues in distributed systems, especially at OSI model layers 2-4
- Build a system for accelerator-initiated tensor movement over the network
- Benchmark software for a new networking environment
- Implement a new collective algorithm to improve latency
- Optimize congestion control algorithms for large-scale synchronous workloads
Other
- We require at least a Bachelor's degree in a related field or equivalent experience.
- Currently, we expect all staff to be in one of our offices at least 25% of the time.
- We do sponsor visas!
- Strong debugging mindset with patience for complex, multi-layered issues
- We greatly value communication skills.