ByteDance Networking is looking to design, build, and operate a global, intelligent network infrastructure that meets the high-availability, scalability, and high-performance requirements of its hyperscale data-center networking solutions.
Requirements
- Proficiency in computer networking and network programming
- Proficiency in one or more mainstream programming languages, such as C/C++, Python, or Go
- Be familiar with the latest advances in high-speed network systems, such as RDMA, congestion control, and AI network optimization
- Experience in developing high-performance communication frameworks (including NCCL, MPI, and RPC libraries) is a plus
- Experience in developing software systems for AI network diagnosis and performance optimization
- Be familiar with AI training/inference systems and software-hardware co-design
- Publications at top-tier networking and systems conferences such as NSDI, SIGCOMM, OSDI, and SOSP
Responsibilities
- Design, implementation and deployment of high-speed network technologies to support AI/LLM applications
- Design and development of platforms/systems for monitoring, analysis, and diagnosis of large-scale AI/LLM networks
- Research and development of high-performance AI communication frameworks and network protocol stacks, and co-design optimization across host, network, and application, to improve the scalability, reliability, and performance of AI/LLM networks
- Follow the latest technologies from academia and industry, identify the innovative parts of the system, and present them in academic papers
Other
- Currently pursuing a PhD in computer networking or a related technical discipline
- State your availability clearly in your resume (Start date, End date)
- Accept and agree to our global applicant privacy policy
- Applications will be reviewed on a rolling basis, so we encourage you to apply early