Tenstorrent is looking to optimize low-level workloads, advance kernel development, and enhance the performance of its software for machine learning applications.
Requirements
- Proficiency in C/C++ programming languages
- Familiarity with machine learning frameworks and concepts
- Experience with performance profiling and optimization tools
- Experience with GPU programming (CUDA, OpenCL) is a plus
- Knowledge of operating system internals is a plus
- Proven experience in kernel development, with a strong focus on low-level and tensor optimizations
- Strong problem-solving skills and the ability to analyze and debug complex issues
Responsibilities
- Lead the design, development, and maintenance of acceleration kernel software components for applications
- Develop and optimize kernels and kernel libraries for efficient machine learning and HPC applications
- Implement and optimize tensor compute and tensor data movement kernels
- Analyze and optimize low-level code to improve the performance and efficiency of software
- Collaborate with machine learning engineers and data scientists to integrate optimized kernels and low-level routines into machine learning frameworks and pipelines
- Identify performance bottlenecks, conduct performance profiling, and develop strategies to address and resolve them
- Oversee the creation of comprehensive unit tests, conduct thorough debugging, and ensure the stability and reliability of kernel-level code
Other
- Bachelor’s degree in Computer Science, Software Engineering, or a related field
- Excellent communication and leadership skills
- Self-motivated, detail-oriented, and able to work independently as well as lead a team
- Citizenship, permanent residency, asylee, or refugee information and/or documentation will be required and considered as Tenstorrent moves through the employment process