The ByteDance DPU team is building the computing infrastructure foundation for ByteDance and Volcano Engine Public Cloud, focusing on the architecture, development, and research of software and hardware technologies for cloud and AI computing.
Requirements
- Familiar with the Linux kernel and proficient in kernel subsystems such as memory management, KVM, the scheduler, cgroups, networking, storage, and file systems, with relevant practical experience.
- Deep understanding of cloud services and AI applications, including but not limited to database systems, big data, distributed storage, serverless computing, and AI/ML inference and training.
- Familiar with network traffic in data center infrastructure; understands IDC hardware, core components, and server architecture; has system-level design experience covering end-to-end performance, cost, and stability.
- Familiar with x86 and ARM CPU architectures and with low-level CPU performance tuning; understands network and storage protocols.
- Understanding of PCIe, RDMA, and SmartNIC technologies is preferred.
- Familiar with GPU programming in CUDA, the userspace storage technology SPDK, and the network technology DPDK.
- Understanding of the virtualization file system virtfs and CXL memory technologies is preferred.
Responsibilities
- Responsible for the architecture of the next-generation DPU-based kernel, virtualization, and container stack, and for the exploration of frontier technologies.
- Responsible for the research and architecture of DPU software and hardware for both CPU-centric and GPU-centric infrastructure.
- Responsible for the architecture, development, and performance optimization of the monitoring system for CPU, network, storage, and kernel in the data center infrastructure.
- Responsible for cutting-edge exploration, architecture, development, and optimization of GPU virtualization and high performance storage & memory systems.
- Responsible for the acceleration architecture for typical data center workloads such as AI/ML, databases, serverless computing, and big data.
Other
- 3 years of experience in software-hardware co-design is preferred, with specific experience in developing distributed computing systems, high-speed interconnects, and distributed storage.