The Qualcomm Cloud AI team is developing software solutions for inference acceleration.
Requirements
- Experience with serving frameworks such as vLLM
- Strong development skills in PyTorch
- Strong understanding of LLMs, multimodal models, and reasoning models
- Experience in executing, analyzing, and optimizing neural networks
- Experience in writing high-performance software for multi-core systems
- Experience with C++ and Python
- Strong skills in analyzing the performance of software/hardware solutions on multi-core architectures; understanding of multi-core architecture fundamentals (core, cache, memory, bus, PCIe, etc.)
Responsibilities
- Develop software solutions for inference acceleration
- Contribute to design, compiler technology, performance modeling, and bottleneck analysis
- Work across the whole product life cycle, from early R&D to commercial deployment
- Work with serving frameworks such as vLLM
- Execute, analyze, and optimize neural networks
- Write high-performance software for multi-core systems
- Analyze the performance of software/hardware solutions on multi-core architectures
Other
- Ambitious, bright, and innovative engineer
- Has delivered commercial software
- Fast-paced environment that requires daily cross-functional interaction, so good communication, planning, and execution skills are a must
- Proven ability to plan, manage, and deliver large commercial software projects
- Excellent written and verbal communication skills; team player