Cerebras Systems aims to deliver industry-leading training and inference speeds for machine learning applications by adapting today's most advanced language and vision models to run efficiently on its flagship Cerebras architecture.
Requirements
- Strong programming skills in Python and/or C++
- Experience with Generative AI and Machine Learning systems
- Proficiency with at least one major ML framework (PyTorch, Transformers, vLLM, or SGLang)
- Deep understanding of transformer-based models in language and/or vision domains, with demonstrated experience implementing and optimizing them
- Proven ability to implement custom layers, operators, and backpropagation logic
- Strong foundation in performance optimization on specialized hardware (e.g., GPUs, TPUs, or HPC interconnects)
- Experience with speculative decoding, neural network pruning and compression, sparse attention, quantization, sparsity, post-training techniques, and inference-focused evaluations
Responsibilities
- Design, implement, and optimize state-of-the-art transformer architectures for NLP and computer vision on Cerebras hardware.
- Research and prototype novel inference algorithms and model architectures that exploit the unique capabilities of Cerebras hardware, with emphasis on speculative decoding, pruning/compression, sparse attention, and sparsity.
- Train models to convergence, perform hyperparameter sweeps, and analyze results to inform next steps.
- Bring up new models on the Cerebras system, validate functional correctness, and troubleshoot any integration issues.
- Profile and optimize model code using Cerebras tools to maximize throughput and minimize latency.
- Develop diagnostic tooling or scripts to surface performance bottlenecks and guide optimization strategies for inference workloads.
- Collaborate across teams, including software, hardware, and product, to drive projects from inception through delivery.
Other
- Bachelor’s degree in Computer Science, Software Engineering, Computer Engineering, Electrical Engineering, or a related technical field, and 7+ years of ML software development experience
- 4+ years of experience testing, maintaining, or launching software products, including 2+ years of experience with software design and architecture
- 3+ years of experience in software development focused on machine learning (e.g., deep learning, large language models, or computer vision)
- Collaborative approach with humility, eagerness to help colleagues, and commitment to team success
- Hybrid role in Toronto, ON, Canada or Sunnyvale, CA, USA