TSMC aims to optimize AI hardware by researching the benefits of eDRAM for GPUs and identifying opportunities for improvement in GPU architecture and AI workloads.
Requirements
- Experience with GPU architecture and simulation
- GPU latency and energy evaluations
- Embedded DRAM options
- Generative AI workloads
- AI workload profiling
- Memory array design
- Understanding of 3D integration schemes
Responsibilities
- Expand our in-house analytical GPU framework to make it amenable to workload profiling and validate the updates with Gainsight
- Identify opportunities for eDRAM (including refresh-free operation) according to data types (activation, KV cache, weights) and a set of relevant workloads for multi-GPU (single server) inference
- Identify and contrast the opportunities for LLM prefill and decode separately, considering a long context length
- Benchmark various eDRAM options (1T1C BEOL, 2TGC hybrid, 2TGC BEOL) vs. the SRAM baseline with respect to workload-level inference energy and latency at iso-area and iso-capacity
- Modify the GPU architecture to make the most of eDRAM
- Determine which AI workloads would benefit the most from eDRAM
- Explore architectural and algorithmic options to maximize refresh-free operation
Other
- Ph.D. student in Electrical Engineering or Computer Science
- Equal Employment Opportunity for all individuals regardless of race, color, religion, gender, age, national origin, marital status, sexual orientation, gender identity, status as a protected veteran, genetic information, or any other characteristic protected by applicable law
- Reasonable accommodation due to a disability during the application or the recruiting process
- Commitment to treating all employees and applicants for employment with respect and dignity