Special Program - Rising Star Express (RiSE) Forum
Mini-tutorial talks by young researchers
Chair: Chao-Tsung Huang, National Tsing Hua University
Co-chair: Mototsugu Hamada, The University of Tokyo
RiSE Talk 1
Design Techniques for High-Bandwidth Die-to-Die Chiplet Interfaces
Woo-Seok Choi, Seoul National University, Korea
As Moore’s law slows due to the challenges of transistor scaling, innovative solutions such as chiplet technology have gained increasing attention. High-bandwidth chiplet interfaces play a critical role in this new paradigm, enabling efficient and rapid data transfer between chiplets. This capability is crucial for the performance of chiplet-based computing systems, which rely on high-bandwidth, low-latency communication to execute applications swiftly and process data effectively. Optimizing these chiplet interfaces is therefore essential to maximize computing performance and overcome the bottlenecks imposed by traditional monolithic chip designs. This talk discusses recent advances in chiplet interface technologies and explores strategies to address the challenges associated with their implementation.
Woo-Seok Choi received the B.S. and M.S. degrees in electrical engineering and computer science from Seoul National University in 2008 and 2010, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 2017. From 2018 to 2019, he was a postdoctoral fellow at Harvard University. Since 2020, he has been with the Department of Electrical and Computer Engineering at Seoul National University, where he is currently an associate professor. His current research interests include energy-efficient high-speed wireline transceiver design and algorithm/hardware co-design for machine learning applications.
RiSE Talk 2
Towards Multi-Layer Processing-in-Memory Systems for General Applications
Daichi Fujiki, Tokyo Institute of Technology, Japan
Processing-in-Memory (PIM) is a promising solution to the memory wall problem: by integrating computation with memory, it reduces the data movement overhead of data-intensive tasks. While PIM has demonstrated potential at various levels of the memory hierarchy, integrating these diverse PIM technologies into a unified system remains an open challenge, even though such a system could leverage the strengths of each memory type, mirroring the traditional memory hierarchy, for enhanced workload adaptability. Realizing such a system requires addressing key challenges, including effective memory management and job scheduling across this heterogeneous setup. This talk will explore the current limitations of PIM and its integration challenges, and propose potential approaches to enable multi-layer PIM systems.
Daichi Fujiki is an Associate Professor in the AI Computing Research Unit at Tokyo Institute of Technology. His research interests include memory-centric computing systems for general workloads and domain-specific architectures for data-intensive applications. His research unit develops processing-in-memory architectures and custom acceleration frameworks for next-generation AI processing. He received a Ph.D. in 2022 and an M.S.E. in 2017 from the University of Michigan, Ann Arbor, and a B.E. in 2016 from Keio University, Japan.
RiSE Talk 3
NoC-based Intelligent CAS Designs and Applications for Anomaly Detection in Smart Motor Systems
Kun-Chih (Jimmy) Chen, National Yang Ming Chiao Tung University, Taiwan
Deep Neural Networks (DNNs) have shown significant advantages in many domains, such as image processing, speech recognition, and machine translation. Current DNNs comprise many layers and thousands of parameters, leading to high design complexity and power consumption when developing large-scale DNN accelerators. In addition, contemporary DNNs are usually trained on massive amounts of labeled data, so generating an optimal DNN for a new dataset is time-consuming. In the first part of this tutorial, I will introduce a Network-on-Chip-based (NoC-based) DNN design paradigm, in which the NoC interconnection helps reduce off-chip memory accesses while offering better scalability and flexibility. In the second part, we shift gears to the field of Industry 4.0, which has ushered in a new era characterized by a data-centric ecosystem. I will delve into lightweight methodologies engineered to perform anomaly detection and even forecast remaining useful life (RUL) through a CAS-oriented approach. The ultimate goal of this tutorial is to help audience members who are interested in AI accelerator design but have limited background understand the fundamental design concepts of a NoC-based DNN hardware platform and its potential applications.
Kun-Chih (Jimmy) Chen is currently an Associate Professor and Electric Junior Chair Professor at the Institute of Electronics, National Yang Ming Chiao Tung University (NYCU). His research interests include MPSoC design, neural network learning algorithm design, reliable system design, VLSI/CAD design, and smart manufacturing. Dr. Chen served as TPC Chair and General Chair of the International Workshop on Network on Chip Architectures (NoCArc) in 2018 and 2019, and as TPC Chair of IEEE MCSoC in 2023 and 2024. He was also a Guest Editor of the IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS). Dr. Chen received the IEEE Tainan Section Best Young Professional Member Award, the TCUS Young Scholar Innovation Distinction Award, the Exploration Research Award of the Pan Wen Yuan Foundation, the Taiwan IC Design Society Outstanding Young Scholar Award, the CIEE Outstanding Youth Electrical Engineer Award, and an IEEE TVLSI Best Paper Award. He is an IEEE Senior Member and an ACM member.
RiSE Talk 4
High-Performance Computing-In-Memory Chiplet-based AI Processor
Fengbin Tu, The Hong Kong University of Science and Technology, China
With the development of Large Language Models (LLMs), Artificial Intelligence (AI) has reached its “Mosaic moment,” marking the start of a new era. LLMs’ extensive parameters and multi-task learning capabilities enable them to handle complex and general AI tasks such as intelligent assistants, content generation, and sentiment analysis. High-performance AI chips are in high demand to deploy LLM inference and drive further AI development. Chip performance is determined by three elements: computation efficiency, transistor density, and chip area. However, chip fabrication faces the “Technology Wall” due to physical limitations, especially when scaling below 5 nm to achieve higher transistor density. Therefore, we must explore opportunities in the other two elements. Computing-In-Memory (CIM) is a promising architecture that overcomes the “Memory Wall” caused by LLMs’ large-scale parameters, and advanced chiplet integration offers an important solution to break the “Area Wall” by integrating multiple chiplets into a single package. In this talk, we will discuss how architectural innovation with CIM and advanced integration with chiplets can bring a new vision to the design of high-performance AI processors for LLM acceleration. By leveraging these innovative approaches, we can overcome the limitations of traditional chip design and unlock the full potential of LLMs in AI development.
Fengbin Tu is currently an Assistant Professor in the Department of Electronic and Computer Engineering at The Hong Kong University of Science and Technology. He received the Ph.D. degree from the Institute of Microelectronics, Tsinghua University, in 2019, and his dissertation was recognized with the Tsinghua Excellent Dissertation Award in 2019. Dr. Tu was a Postdoctoral Scholar at the University of California, Santa Barbara, from 2019 to 2022, and a Postdoctoral Fellow at the AI Chip Center for Emerging Smart Systems (ACCESS) from 2022 to 2023. His research interests include AI chips, computer architecture, reconfigurable computing, and computing-in-memory. His AI chips ReDCIM and Thinker won the 2023 Top-10 Research Advances in China Semiconductors and the 2017 ISLPED Design Contest Award, respectively. He has published two books, Artificial Intelligence Chip Design (2020) and Architecture Design and Memory Optimization for Neural Network Accelerators (2022). Dr. Tu’s research has been published at top conferences and in top journals on integrated circuits and computer architecture, including ISSCC, JSSC, DAC, ISCA, and MICRO.