AI workloads are growing at an unprecedented pace, and inference has become one of the most critical challenges in modern computing. Large-scale models demand massive compute resources, and the diversity of hardware across cloud and edge adds complexity. Achieving low latency and high throughput while controlling cost requires rethinking the entire inference stack, from algorithms to infrastructure.
Requirements
- Experience with LLM architectures, systems for LLM inference, and/or AI hardware.
- Experience with GPUs and understanding of CUDA/ROCm frameworks.
- Experience with computer systems and/or networks.
- Proficient software development skills, preferably in C++ and Python.
Responsibilities
- Research Interns put inquiry and theory into practice.
- During the 12-week internship, Research Interns are paired with mentors and expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community.
- Research internships are available in all areas of research, and are offered year-round, though they typically begin in the summer.
Other
- Accepted or currently enrolled in a PhD program in Computer Science, Software Engineering, Electrical Engineering, or a related STEM field.
- Research Interns are expected to be physically located in their manager's Microsoft worksite location for the duration of their internship.
- In addition to the qualifications below, you'll need to submit a minimum of two reference letters for this position as well as a cover letter and any relevant work or research samples.
- Experience in conducting research and writing peer-reviewed publications.
- Strong written and verbal communication skills.
- Ability to work in a cross-functional, multi-disciplinary setting across research and product.