The company is looking to build and optimize the infrastructure that powers its generative AI solutions.
Requirements
- Strong experience in operating-system development, including kernel-space drivers and user-space libraries
- Expertise with communication fabrics (e.g., RDMA, PCIe, InfiniBand, RoCE) and high-performance data transport mechanisms
- Hands-on experience with software bring-up for custom hardware platforms
- A solid understanding of distributed systems, performance tuning, and large-scale system architecture
- Familiarity with high-performance computing (HPC) workloads and AI applications
- Experience with containerization and virtualization technologies
Responsibilities
- Design and enhance the infrastructure for ML training and inference at scale
- Develop and support system software, including drivers and kernel modules, for cutting-edge hardware
- Build user-facing tools that improve system monitoring, job management, and profiling for the SambaNova platform
- Develop solutions for virtualization and multi-tenant environments to improve ease of use and isolation
- Collaborate with cross-functional teams, including ML, Compiler, and DevOps, to ensure system-wide optimization
Other
- Excellent problem-solving skills and the ability to work collaboratively in a fast-paced, dynamic environment
- Strong communication skills and a passion for mentoring and working with colleagues across teams
- A track record of contributing to open-source projects or developing innovative system-level software
- Enthusiasm for innovation and the drive to push the boundaries of what’s possible in the world of AI
- Annual Salary Range: $170,000 to $210,000 per year
- Benefits Summary for US-Based Full-Time Direct Employment Positions