Microsoft is looking to solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment, through advances in Artificial Intelligence (AI) systems and architecture.
Requirements
- Research experience in areas such as computer architecture, AI/ML systems, performance modeling, distributed systems, or hardware–software co-design.
- Programming skills in Python and C/C++, with experience building prototypes, simulators, or performance-analysis tools.
- Familiarity with modern AI workloads and/or deep learning frameworks (e.g., PyTorch).
- Experience with PyTorch, CUDA, Triton, or performance-simulation tools.
- Background in large-scale system design, AI inference bottleneck analysis, or modeling cost/performance tradeoffs.
- Understanding of accelerator, memory-system, or interconnect design principles.
Responsibilities
- Investigate and evaluate emerging disaggregated KV cache architectures.
- Implement a hierarchical storage architecture with multiple tiers (see the sketch after this list):
  - GPU memory: the active working set of KV caches currently used by the model.
  - CPU DRAM: a hot cache for recently used KV chunks, using pinned memory for efficient GPU-CPU transfers.
  - Local storage: large-scale local caching (NVMe, local disk).
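To make the tiering concrete, here is a minimal Python sketch, assuming PyTorch, a simple LRU policy for the DRAM tier, and `torch.save` files standing in for an NVMe-backed store. The class and method names (`TieredKVCache`, `evict_to_dram`) are hypothetical illustrations, not an existing component:

```python
# Hypothetical sketch of a three-tier KV cache: GPU -> pinned CPU DRAM -> local disk.
import os
from collections import OrderedDict

import torch


class TieredKVCache:
    def __init__(self, dram_capacity: int, spill_dir: str = "/tmp/kv_spill"):
        self.gpu: dict[str, torch.Tensor] = {}                     # tier 1: active working set
        self.dram: OrderedDict[str, torch.Tensor] = OrderedDict()  # tier 2: hot cache (LRU)
        self.dram_capacity = dram_capacity
        self.spill_dir = spill_dir                                 # tier 3: NVMe / local disk
        os.makedirs(spill_dir, exist_ok=True)

    def put(self, key: str, kv: torch.Tensor) -> None:
        self.gpu[key] = kv

    def evict_to_dram(self, key: str) -> None:
        # Pin the CPU copy so a later promotion back to the GPU
        # can use an asynchronous DMA transfer.
        kv = self.gpu.pop(key)
        self.dram[key] = kv.cpu().pin_memory()
        if len(self.dram) > self.dram_capacity:
            # Spill the least recently used chunk to local storage.
            cold_key, cold_kv = self.dram.popitem(last=False)
            torch.save(cold_kv, os.path.join(self.spill_dir, f"{cold_key}.pt"))

    def get(self, key: str, device: str = "cuda") -> torch.Tensor:
        # Promote through the tiers on a hit: GPU, then DRAM, then disk.
        if key in self.gpu:
            return self.gpu[key]
        if key in self.dram:
            kv = self.dram.pop(key).to(device, non_blocking=True)  # pinned -> fast H2D copy
            self.gpu[key] = kv
            return kv
        kv = torch.load(os.path.join(self.spill_dir, f"{key}.pt")).to(device)
        self.gpu[key] = kv
        return kv
```

Pinning the DRAM copies is the detail that matters here: it lets the host-to-device promotion overlap with GPU compute instead of blocking it.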
- Build a peer-to-peer (P2P) KV cache sharing service that enables direct, high-performance cache transfers between multiple LLM serving instances without requiring centralized cache servers (a minimal sketch follows).
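As a rough illustration of the P2P idea, here is a minimal Python sketch of one serving instance handing a KV chunk directly to another over TCP. The function names and wire format are hypothetical; a production design would use RDMA- or NCCL-class transports and a real peer-discovery mechanism:

```python
# Hypothetical sketch: direct peer-to-peer KV chunk transfer, no central cache server.
import io
import socket
import struct

import torch


def _recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the connection."""
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def serve_one_request(host: str, port: int, store: dict[str, torch.Tensor]) -> None:
    """Answer a single peer request: receive a key, stream back the chunk.
    Each serving instance runs a listener like this, so transfers go
    peer to peer rather than through a centralized cache server."""
    with socket.create_server((host, port)) as srv:
        conn, _ = srv.accept()
        with conn:
            key_len = struct.unpack("!I", _recv_exact(conn, 4))[0]
            key = _recv_exact(conn, key_len).decode()
            buf = io.BytesIO()
            torch.save(store[key], buf)  # serialize the KV chunk
            payload = buf.getvalue()
            conn.sendall(struct.pack("!I", len(payload)) + payload)


def fetch_chunk(peer: tuple[str, int], key: str) -> torch.Tensor:
    """Pull a KV chunk directly from the instance that produced it."""
    with socket.create_connection(peer) as conn:
        encoded = key.encode()
        conn.sendall(struct.pack("!I", len(encoded)) + encoded)
        size = struct.unpack("!I", _recv_exact(conn, 4))[0]
        return torch.load(io.BytesIO(_recv_exact(conn, size)))
```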
Other
- Currently enrolled in a PhD program in Computer Science, Electrical/Computer Engineering, or a related field.
- Research Interns are expected to be physically located in their manager’s Microsoft worksite location for the duration of their internship.
- Submit a minimum of two reference letters for this position, as well as a cover letter and any relevant work or research samples.
- Ability to collaborate effectively with researchers across disciplines and work in cross-group, cross-cultural environments.
- Strong communication and presentation skills for sharing complex technical insights.