Qualcomm is investing in deep learning and developing hardware and software solutions for inference acceleration, with the aim of playing a central role in the evolution of Cloud AI.
Requirements
- Hands-on experience building and optimizing language models, notably with PyTorch and ONNX, preferably in production-grade environments.
- Deep understanding of transformer architectures, attention mechanisms and performance trade-offs.
- Experience with workload mapping strategies such as sharding and other forms of parallelism.
- Strong Python programming skills.
- Proactive in learning the latest inference optimization techniques.
- Understanding of computer architecture, ML accelerators, in-memory processing and distributed systems.
- Background in neural network operators and mathematical operations, including linear algebra and math libraries.
Responsibilities
- Convert, optimize and deploy models for efficient inference using PyTorch and ONNX.
- Work at the forefront of GenAI by understanding advanced algorithms (e.g. attention mechanisms, MoEs) and numerics to identify new optimization opportunities.
- Analyze and optimize the inference performance of LLMs, VLMs, and diffusion models, scaling it to meet throughput and latency constraints.
- Map next-generation AI workloads onto current and future hardware designs.
- Work closely with customers to drive solutions by collaborating with internal compiler, firmware and platform teams.
- Analyze complex performance or stability issues to identify the root cause of the underlying problems.
- Create engineering solutions that deliver continuous insight into the performance of AI workloads and guide improvements over time.
Other
- Strategic thinking, strong execution, and excellent communication skills.
- Strong problem-solving skills and the ability to learn and work effectively in a fast-paced, collaborative environment.
- MS in Computer Science, Machine Learning, Computer Engineering or Electrical Engineering.
- PhD in Computer Science, Computer Engineering or Machine Learning.
- Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 6+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.