Cerebras Systems builds the world's largest AI chip. We aim to deliver industry-leading training and inference speeds and to let machine learning users run large-scale ML applications without the hassle of managing hundreds of GPUs or TPUs.
Requirements
- 2+ years of experience in software integration, debugging, or quality engineering.
- Strong programming and automation skills in Python, C++, Go, or similar languages.
- Experience testing compute, machine learning, networking, or storage systems in large-scale environments.
- Solid understanding of system architecture (compute, networking, storage) and ML workloads.
- Proven ability to trace complex issues to their root causes and deliver scalable solutions across distributed systems.
- Ability to understand complex systems and design comprehensive, effective test plans.
- Experience with ML workloads such as LLM or multimodal model training and inference.
Responsibilities
- Debug and resolve complex integration issues across the Cerebras AI platform, spanning ML, compiler, runtime, and hardware layers.
- Develop and deploy AI-enhanced debugging and validation tools to accelerate issue identification and resolution.
- Automate test generation, data capture, and diagnostics using scripting and intelligent systems.
- Create and execute robust validation plans for LLM and multimodal workloads in production-scale environments.
- Identify edge cases, stress failure modes, and proactively improve system resilience.
- Design and maintain CI/CD pipelines to ensure continuous integration, fast feedback, and early detection of regressions.
- Contribute to continuous improvement by implementing quality metrics, automation pipelines, and actionable insights.
Other
- This role follows a hybrid schedule, requiring in-office presence 3 days per week.
- Strong collaboration and communication skills across cross-functional teams.
- Experience collaborating with globally distributed teams across time zones.
- People who are serious about software make their own hardware.
- We offer a simple, non-corporate work culture that respects individual beliefs.