xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. The Reasoning Infrastructure team builds an end-to-end reinforcement learning (RL) training framework to enable RL at pretraining scale.
Requirements
- Experience building, debugging, and optimizing large-scale distributed training systems
- Experience building async RL training frameworks
- Experience with inference systems
- Proficiency in Python, JAX, or Rust
- Strong knowledge of reinforcement learning techniques
- Experience building infrastructure for large-scale reinforcement learning and multi-agent reinforcement learning
Responsibilities
- Design and implement state-of-the-art distributed RL systems
- Profile, debug, and optimize system performance
- Co-design software and algorithms with researchers
Other
- All employees are expected to be hands-on and to contribute directly to the company’s mission.
- Leadership is given to those who show initiative and consistently deliver excellence.
- Work ethic and strong prioritization skills are important.
- All engineers and researchers are expected to have strong communication skills.
- Candidates are expected to be located near Palo Alto or open to relocation.