Special Program – Convergence Workshop
Expanding knowledge of IC designers beyond circuits
Chair: Makoto Takamiya, The University of Tokyo
Co-chair: Atsutake Kosuge, The University of Tokyo
Convergence Workshop Talk 1
CGRA-based SoC design for AI Acceleration
Fudan University, China
Convolutional Neural Networks (CNNs) and Transformer networks have been widely applied in fields such as natural language processing and computer vision. Coarse-Grained Reconfigurable Architectures (CGRAs) are well suited to accelerating CNN and Transformer applications due to their high flexibility and energy efficiency. This presentation focuses on the design challenges of a CGRA-based heterogeneous accelerator architecture with a RISC-V processor, driven by CNN and Transformer workloads. It introduces CGRV-OPT, an MLIR-based compiler that includes an automated software-hardware partitioning mechanism to seamlessly translate different workloads into low-level intermediate representations (IRs) for different heterogeneous architectures. We propose an agile development architecture template capable of realizing CNN and Transformer accelerators, providing a broad design space for optimized CGRA accelerator implementations.
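The partitioning idea above can be illustrated with a minimal sketch (this is not the CGRV-OPT compiler itself): walk a workload's operator list and assign compute-intensive tensor operators to the CGRA fabric, leaving control-heavy or irregular operators on the RISC-V core. The operator names and the cost threshold below are illustrative assumptions.

```python
# Toy software-hardware partitioner: regular, compute-heavy tensor ops go to
# the CGRA; everything else stays on the RISC-V host core.
CGRA_FRIENDLY = {"conv2d", "matmul", "layernorm", "softmax", "gelu"}

def partition(ops, flop_threshold=1_000):
    """ops: list of (op_name, flops). Returns a per-target schedule."""
    plan = {"cgra": [], "riscv": []}
    for name, flops in ops:
        # Offload only ops that are both CGRA-friendly and large enough
        # to amortize the configuration and data-movement overhead.
        target = "cgra" if name in CGRA_FRIENDLY and flops > flop_threshold else "riscv"
        plan[target].append(name)
    return plan
```

A real compiler would make this decision on an IR with dependence and cost analysis; the point here is only the offload criterion.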
Lingli Wang is a professor at the National Key Laboratory of Integrated Chips and Systems, Fudan University. He received his PhD degree from Napier University in Edinburgh, UK, in 2001. From February 2001 to April 2005, he worked at the Altera European Technology Center (now part of Intel). His research interests include integrated circuit design and EDA algorithms, FPGA architecture and application acceleration, reconfigurable computing, and quantum computing. He has published over 150 academic papers.
Convergence Workshop Talk 2
Nanosecond-level Feedback Control Technology Using FPGA for Parallel SiC-MOSFETs
Tohoku University, Japan
This presentation will give an overview of the development status of power electronics, a field that is adding new capabilities by integrating digital technology with the latest power devices, and will explain specific development examples related to parallel control.
In recent years, wide-bandgap power devices with high switching speed and low loss, such as SiC-MOSFETs and GaN-HEMTs, have been developed and applied to electric vehicles and data centers. For such high-speed switching devices, balancing the current among parallel-connected power devices is important for both electrical performance and reliability. We have therefore developed a high-speed feedback control method that compares the switching timings of parallel-connected SiC-MOSFETs on an FPGA with a resolution of 1 nanosecond and adjusts the timing of the gate signals to suppress current concentration in the parallel-connected SiC-MOSFETs.
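The feedback loop described above can be sketched in a few lines: each cycle, measure the switching-time skew between the parallel devices and delay the gate signal of each early device by one resolution step until all devices switch together. The intrinsic device delays and the convergence behavior below are illustrative assumptions, not measurements from the talk.

```python
# Hypothetical sketch of nanosecond-step gate-timing equalization for
# parallel-connected devices. Delays are in nanoseconds.
def balance_gate_delays(intrinsic_ns, steps=50, resolution_ns=1):
    """intrinsic_ns: fixed turn-on delay of each parallel device.
    Returns the programmable gate-signal delay added to each device."""
    offsets = [0] * len(intrinsic_ns)
    for _ in range(steps):
        times = [d + o for d, o in zip(intrinsic_ns, offsets)]
        latest = max(times)
        for i, t in enumerate(times):
            # A device switching earlier than the slowest one would briefly
            # carry excess current; delay its gate by one resolution step.
            if latest - t >= resolution_ns:
                offsets[i] += resolution_ns
    return offsets
```

On an FPGA the skew measurement would come from time-to-digital converters and the offsets would program delay lines; the loop structure is the same.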
Graduated from Waseda University in 1982 and joined Fuji Electric Co., Ltd.
Received his Ph.D. in Engineering from Yamanashi University in 1998.
2009-2017: Part-time lecturer and guest senior researcher, Waseda University.
2013-2017: Visiting Professor, Graduate School of Yamanashi University.
Served as Director of the Next Generation Module Development Center, Fuji Electric Co., Ltd.
2017-: Professor and R&D Division Director, Center for Innovative Integrated Electronic Systems, Tohoku University.
2019-2023: Deputy Program Director, Energy System of an IoE Society, SIP, Cabinet Office, Japan.
2021-: Research representative, Power Electronics Circuit System Area "Integrated Power Electronics Contributing to a Decarbonized Society", INNOPEL, MEXT, Japan.
Recipient of the 47th Electrical Science and Technology Encouragement Award and the 17th STS Award.
Member of IEEE, IEEJ, JSAP, JIEP, and JSSD.
Convergence Workshop Talk 3
HEMA: A Practical FPGA-based Accelerator for Homomorphic Encrypted Multiplication on Ciphertext
Hong Kong University of Science and Technology, China
Fully Homomorphic Encryption (FHE) has emerged as a promising solution for privacy-preserving computing, paving the way for the widespread adoption of cloud computing with ideal security. Despite advancements in theoretical cryptography, the efficiency of FHE remains a challenge due to its significant computing and memory requirements when operating on encrypted data. Homomorphic multiplication is the most computation-intensive high-level operation in FHE, but its efficient implementation is not yet fully developed, posing a critical challenge for FHE acceleration.
We propose a practical FPGA-based accelerator called HEMA for homomorphic encrypted multiplication that supports unbounded computations on ciphertexts. A novel hybrid parallelism design is introduced to eliminate delays and redundant data movement caused by unaligned parallelism patterns with the help of optimized datapath scheduling. We present a full-RNS Base Conversion implementation in favor of this hybrid pattern using a parallel block matrix multiplication (PBMM) model to exploit data reusability for the first time.
We also implement the Number Theoretic Transform (NTT) module with high speed and area efficiency, which is the most frequent and expensive primitive in FHE. A novel algorithmic implementation of NTT modeled on tensor products is first proposed, which provides high flexibility in parameter sets and high scalability in processing elements (PEs). Different levels of parallelism are then explored to adapt to the trade-off between performance and area efficiency. With the help of stride permutation, a non-conflict data flow control is built to significantly simplify the memory access pattern, contributing to higher performance of NTT.
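For reference, the NTT primitive the paragraph above discusses can be sketched as a textbook radix-2 Cooley-Tukey transform (not HEMA's tensor-product formulation). The toy parameters q = 17, n = 8, and the root of unity omega = 9 are illustrative; FHE schemes use much larger NTT-friendly moduli.

```python
# Minimal recursive NTT over Z_q (q prime, n a power of two dividing q - 1),
# plus its inverse. omega must be a primitive n-th root of unity mod q.
def ntt(a, omega, q):
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % q, q)   # sub-transform on even indices
    odd = ntt(a[1::2], omega * omega % q, q)    # sub-transform on odd indices
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q                      # twiddle-multiplied odd term
        out[k] = (even[k] + t) % q              # butterfly: sum
        out[k + n // 2] = (even[k] - t) % q     # butterfly: difference
        w = w * omega % q
    return out

def intt(a, omega, q):
    n = len(a)
    n_inv = pow(n, q - 2, q)                    # n^{-1} mod q (Fermat)
    res = ntt(a, pow(omega, q - 2, q), q)       # transform with omega^{-1}
    return [x * n_inv % q for x in res]
```

Pointwise products in the NTT domain correspond to cyclic convolutions of polynomial coefficients, which is why the NTT is the dominant primitive in homomorphic multiplication.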
Our proposed accelerator is evaluated on the Xilinx Alveo U280 FPGA platform. Experimental results demonstrate its superior performance compared to CPU, GPU, and FPGA-based implementations, with improvements of 993×, 6.62×, and 3.17×, respectively.
Prof. Zhang received her PhD degree in Electrical Engineering from Princeton University, where she won the Wu Prize for research excellence. She joined the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology in 2013 and established the Reconfigurable System Lab, where she is currently a professor. She was an assistant professor in the School of Computer Engineering at Nanyang Technological University, Singapore, from 2010 to 2013. She has authored and co-authored more than 150 papers in peer-reviewed journals and international conferences, and won best paper awards at ISVLSI 2009, ICCAD 2017, and ICCAD 2022. Her current research interests include FPGA-based design, heterogeneous computing, electronic design automation, and embedded system security. Prof. Zhang currently serves as an Associate Editor on several editorial boards, including ACM TRETS, IEEE TCAD, ACM TECS, and IEEE TCAS-II. She has also served on many organizing committees and technical program committees, including DAC, ICCAD, DATE, FPGA, and FCCM.
Convergence Workshop Talk 4
Flash Memory for Generative AI
Kioxia Corporation, Japan
RAG (Retrieval-Augmented Generation) has emerged as a promising AI framework, especially in enterprise use cases, that enhances the output of generative AI with LLMs by incorporating external information (index and vector data) stored in a vector database. Currently, one of the popular algorithms for retrieving information from a vector database is HNSW (Hierarchical Navigable Small World), in which the whole index and vector data are stored in DRAM. However, as the amount of index and vector data grows enormously, it becomes very difficult and challenging to store it all in DRAM alone. To tackle this issue, KIOXIA is promoting a concept that utilizes SSDs instead of DRAM to store and retrieve the vector database in a more efficient and scalable way. We call this concept the "RAG Optimized SSD Solution (ROSS)" and support two algorithms for it. One is DiskANN (developed by Microsoft), which utilizes DRAM and SSD in a hybrid manner, and the other is AiSAQ: All-in-Storage Approximate Nearest Neighbor Search with Product Quantization (developed by KIOXIA). AiSAQ reduces DRAM usage to almost zero while keeping latency and throughput comparable to DiskANN. This talk will provide an in-depth overview of ROSS.
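Product quantization, the "PQ" in AiSAQ's name, can be illustrated with a minimal sketch: split each vector into subvectors, store only the index of the nearest centroid per subvector, and estimate query distances from per-subspace lookups. The dimensions, codebook sizes, and random codebooks below are toy assumptions; real systems learn codebooks with k-means.

```python
# Toy product-quantization (PQ) encoder and asymmetric distance computation.
import random

M, K, D = 2, 4, 4        # M subspaces, K centroids per subspace, D total dims
SUB = D // M             # dimensions per subvector

random.seed(0)
# Illustrative random codebooks: codebooks[m][k] is centroid k of subspace m.
codebooks = [[[random.random() for _ in range(SUB)] for _ in range(K)]
             for _ in range(M)]

def encode(vec):
    """Compress a vector to M small centroid indices (its PQ code)."""
    code = []
    for m in range(M):
        v = vec[m * SUB:(m + 1) * SUB]
        best = min(range(K),
                   key=lambda k: sum((a - b) ** 2
                                     for a, b in zip(v, codebooks[m][k])))
        code.append(best)
    return code

def adc_distance(query, code):
    """Asymmetric distance: exact query vs. quantized database vector."""
    dist = 0.0
    for m, k in enumerate(code):
        qv = query[m * SUB:(m + 1) * SUB]
        dist += sum((a - b) ** 2 for a, b in zip(qv, codebooks[m][k]))
    return dist
```

Because the database side stores only the short codes, the full vectors never need to reside in DRAM, which is what lets an all-in-storage design push the vector data onto SSD.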
Jun Deguchi received the B.E. and M.E. degrees in machine intelligence and systems engineering and the Ph.D. degree in bioengineering and robotics from Tohoku University, Sendai, Japan, in 2001, 2003, and 2006, respectively. In 2004, he was a Visiting Scholar at the University of California, Santa Cruz, CA, USA. In 2006, he joined Toshiba Corporation, where he was involved in the design of analog/RF circuits for wireless communications, CMOS image sensors, high-speed I/O, and accelerators for deep learning. From 2014 to 2015, he was a Visiting Scientist at the MIT Media Lab, Cambridge, MA, USA, where he was involved in research on brain/neuroscience. In 2017, he moved to Kioxia Corporation (formerly Toshiba Memory Corporation), where he serves as the group manager of a research team working on AI-related technology from algorithms to circuit designs. Dr. Deguchi has served as a member of the international technical program committee (TPC) of the IEEE Asian Solid-State Circuits Conference (A-SSCC) since 2017. He also served as a TPC member of the IEEE International Solid-State Circuits Conference (ISSCC) from 2016 to 2023, Far-East chair of IEEE ISSCC 2023, TPC vice-chair of IEEE A-SSCC 2019, and guest editor of the IEEE Journal of Solid-State Circuits (JSSC) for the special issues on IEEE A-SSCC 2020, IEEE ISSCC 2020, and IEEE ISSCC 2021. He has also been a review committee member of the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) 2020.