Samsung's Architecture Research Lab (ARL) addresses memory bandwidth and communication bottlenecks in AI systems, with a particular focus on Large Language Models (LLMs), in order to develop next-generation AI systems and improve developer productivity.
Requirements
- LLMs
- performance benchmarking
- AI system design
- performance analysis and modeling
- PyTorch
- Python/C++
Responsibilities
- Research parallelization strategies for LLM inference systems, with a focus on data parallelism (DP), tensor parallelism (TP), expert parallelism (EP), and pipeline parallelism (PP), and evaluate their performance impact on AI inference in distributed systems built from CPUs, GPUs, and accelerators
- Analyze system components and guide the selection of compute, memory, and network topologies; identify bottlenecks and propose innovative solutions
- Conduct performance analysis and modeling to drive design space exploration and system-level optimization (see the sketch following this list)
- Complete other responsibilities as assigned.
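As one illustration of the kind of performance modeling this role involves, the sketch below estimates per-token decode latency for an LLM served with tensor parallelism. It is a first-order analytical model only; the function name, parameter names, and all model/hardware numbers are hypothetical assumptions for illustration, not Samsung tooling or measured figures.

```python
# Minimal analytical sketch: per-token decode latency for an LLM under
# tensor parallelism (TP). All model and hardware numbers below are
# illustrative assumptions, not measured values.

def decode_latency_per_token(
    param_bytes: float,      # total weight footprint in bytes
    tp_degree: int,          # tensor-parallel world size
    hbm_bw_gbps: float,      # per-device memory bandwidth, GB/s
    num_layers: int,         # transformer layers (two all-reduces per layer in TP)
    allreduce_bytes: float,  # payload per all-reduce (roughly hidden_size * dtype bytes)
    link_bw_gbps: float,     # per-device interconnect bandwidth, GB/s
) -> float:
    """Return an estimated decode step time in milliseconds.

    Decode is typically memory-bandwidth bound: each generated token
    streams the (sharded) weights from HBM once. TP adds two
    all-reduces per layer (after attention and after the MLP).
    """
    # Weight-streaming time: each device reads only its 1/tp_degree shard.
    t_mem = (param_bytes / tp_degree) / (hbm_bw_gbps * 1e9)
    # Communication time: a ring all-reduce moves roughly 2x the payload per device.
    t_comm = num_layers * 2 * (2 * allreduce_bytes) / (link_bw_gbps * 1e9)
    return (t_mem + t_comm) * 1e3


if __name__ == "__main__":
    # Hypothetical 70B-parameter model in FP16 sharded across 8 devices.
    est_ms = decode_latency_per_token(
        param_bytes=70e9 * 2,
        tp_degree=8,
        hbm_bw_gbps=3350,          # assumed per-device HBM bandwidth
        num_layers=80,
        allreduce_bytes=8192 * 2,  # assumed hidden size 8192, FP16
        link_bw_gbps=450,          # assumed per-device interconnect bandwidth
    )
    print(f"estimated decode latency: {est_ms:.2f} ms/token")
```

Sweeping such a model across TP/PP degrees, memory bandwidths, and interconnect topologies is one simple way to frame the design space exploration described above before validating against real benchmarks.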
Other
- PhD student currently pursuing a degree, with 3+ years in Computer Science preferred
- Must have at least 1 academic quarter/semester remaining
- You’re inclusive, adapting your style to the situation and diverse global norms of our people.
- An avid learner, you approach challenges with curiosity and resilience, seeking data to help build understanding.
- You’re collaborative, building relationships, humbly offering support and openly welcoming approaches.
- Innovative and creative, you proactively explore new ideas and adapt quickly to change.