At d-Matrix, the business problem this role addresses is productizing the software (SW) stack for the AI compute engine and developing software kernels for next-generation AI hardware.
Requirements
- Strong grasp of computer architecture, data structures, system software, and machine learning fundamentals
- Proficient in C/C++ and Python development in a Linux environment, using standard development tools
- Experience implementing algorithms in high-level languages such as C/C++ and Python
- Experience implementing algorithms for specialized hardware such as FPGAs, DSPs, GPUs, and AI accelerators, using toolkits such as CUDA
- Experience implementing operators commonly used in ML workloads: GEMMs, convolutions, BLAS routines, and SIMD operators for operations such as softmax, layer normalization, and pooling
- Experience with development for embedded SIMD vector processors such as Tensilica
- Experience with ML frameworks such as TensorFlow and/or PyTorch
Responsibilities
- Development, enhancement, and maintenance of software kernels for next-generation AI hardware
- Building software kernels for HW architectures
- Mapping algorithms to the architecture
- Mapping computational graphs generated by AI frameworks to the underlying architecture
- Optimizing and making trade-offs across various aspects of hardware-software co-design
- Building and scaling software deliverables in a tight development window
- Working with a team of compiler experts to build out the compiler infrastructure
Other
- MS or PhD in computer engineering, math, physics, or a related field, with 10+ years of industry experience
- Self-motivated team player with a strong sense of ownership and leadership
- Prior startup, small team, or incubation experience
- Work experience at a cloud provider or AI compute/subsystem company
- Hybrid work environment, working onsite at our Santa Clara, CA headquarters 3-5 days per week