Cerebras Systems builds the world's largest AI chip, delivering industry-leading training and inference speeds and letting machine learning users run large-scale ML applications without the hassle of managing hundreds of GPUs or TPUs. The Inference ML team aims to enable seamless integration of machine learning (ML) frameworks with Cerebras' software and hardware ecosystem. The team bridges the gap between popular ML frameworks and Cerebras' optimized stack, and provides tools that make developing and deploying ML models efficient and accessible.
Requirements
- Proficiency in Python for building and maintaining scalable systems.
- Advanced proficiency in C++, with an emphasis on multi-threaded programming, performance optimization, and system-level development.
- Hands-on experience with ML frameworks such as PyTorch, TensorFlow, or JAX, and a strong understanding of their underlying architectures.
- Solid understanding of software architectural patterns for large-scale, high-performance applications.
- In-depth knowledge of machine learning algorithms, theory, and best practices for developing production-ready software.
Responsibilities
- Design and implement scalable and efficient integrations with popular machine learning frameworks, such as PyTorch, while ensuring compatibility with future frameworks.
- Analyze the characteristics of various ML models to make informed design decisions for scalable, intuitive, and user-friendly APIs.
- Optimize software to accelerate ML model training and ensure high throughput and low latency during inference.
- Stay up to date with advancements in machine learning and deep learning, and apply state-of-the-art techniques to enhance our solutions.
- Evaluate trade-offs between different approaches, clearly articulate design choices, and develop detailed proposals for implementing new features.
- Build and maintain robust automated test suites to ensure software quality, performance, and reliability.
- Collaborate with cross-functional teams, including compiler engineers, kernel developers, and system architects, to integrate ML capabilities seamlessly into our products and services.
Other
- Lead and provide technical guidance to a team of machine learning engineers working on complex machine learning integration projects.
- Contribute to an agile team environment by delivering high-quality software and adhering to agile development practices.
- Exceptional communication and presentation skills, with the ability to work both independently and collaboratively across multidisciplinary teams.
- 5+ years of experience in large-scale software engineering, with a focus on deep learning or related domains.
- Proven experience leading and mentoring software or machine learning engineers.