xAI is looking to build cutting-edge software, services, and frameworks that empower its Network Development Engineers and drive automation-first solutions for its production and ancillary networks, in support of its mission of accelerating human scientific discovery through AI.
Requirements
- Python
- Go
- TCP/IP
- BGP
- RDMA
- Deep experience collaborating daily with network engineers, drawing on extensive knowledge of network topologies (physical and logical) and network protocols.
- Expert knowledge of, and a proven track record in, designing scalable, reliable software from the ground up that can build and orchestrate tens of thousands of network devices at lightning speed.
Responsibilities
- Build cutting-edge software, services, and frameworks to empower our Network Development Engineers.
- Tackle all facets of network management: metric collection, configuration, zero-touch provisioning, monitoring, and auto-remediation.
- Drive automation-first solutions for xAI’s production and ancillary networks.
- Develop extensible tools, streamline complex processes, and ensure rock-solid reliability.
- Build software and tools with extensive metrics coverage for some of the world’s largest GPU supercomputing network fabrics, used for AI training and for serving customer inference queries.
- Implement IaC best practices, enhance deployment pipelines, and ensure robust, secure service delivery across our production environments.
Other
- All employees are expected to be hands-on and to contribute directly to the company’s mission.
- Leadership is given to those who show initiative and consistently deliver excellence.
- Work ethic and strong prioritization skills are important.
- All engineers and researchers are expected to have strong communication skills.
- Ability to thrive in ambiguity, creating metrics that help prioritize your own focus and that of the team.