- Title: A Novel CNFET SRAM-Based Computing-In-Memory Design and Low Power Techniques for AI Accelerator
- Creator: Kim, Young Bae
- Date: 2023
- Description:
Power consumption and data processing speed of integrated circuits (ICs) are increasing concerns in many emerging Artificial Intelligence (AI) applications, such as autonomous vehicles and the Internet of Things (IoT). In addition, according to the 2020 International Technology Roadmap for Semiconductors (ITRS), the power consumption trend of AI chips far exceeds the projected power requirements, so power optimization techniques are a central concern in today's AI chip designs. Low-power methodologies exist from the system level down to the layout level; this thesis focuses on the transistor level and the register-transfer level (RTL).

In this thesis, we propose a novel ultra-low-power, voltage-based computing-in-memory (CIM) design with a new SRAM bit cell structure for AI accelerators. The basic working principle of CIM is to compute within the existing embedded memory array (e.g., SRAM) instead of an external memory, which eliminates unnecessary external memory accesses. Because our proposed SRAM bit cell uses a single bitline for CIM computation with decoupled read and write operations, it achieves much higher energy efficiency. In addition, the stacked structure of the decoupled read unit minimizes leakage power consumption. Moreover, the proposed bit cell provides better read and write stability owing to the isolated read and write paths and a greater pull-up ratio. Compared to the state-of-the-art SRAM-CIM, our proposed SRAM-CIM requires no extra transistors for CIM vector-matrix multiplication.

We implemented a 16K (128×128) bit cell array for the computation of 128 neurons, using 64 binary inputs (0 or 1) and 64×128 binary weights (-1 or +1) for binary neural networks (BNNs); a numeric sketch of this dot-product computation appears below. Each row of the bit cell array corresponds to a single neuron and consists of 128 cells: 64 cells for the dot product and 64 replica cells for the ADC reference, the latter split into 32 cells for the ADC reference and 32 cells for offset calibration. A row-by-row ADC quantizes each neuron's output, supporting 1 to 7 output bits per neuron. The ADC uses a sweeping method based on the 32 duplicate bit cells, with the sweep cycle set to 2^(N-1) + 1, where N is the number of output bits (tabulated below). Simulations are performed at room temperature (27 °C) in 32 nm CNFET and 20 nm FinFET technologies using Synopsys HSPICE, with all bit cell transistors at minimum size to balance area, power, and speed. The proposed SRAM-CIM reduces the power consumption of vector-matrix multiplication by 99.96% compared to the existing state-of-the-art SRAM-CIM. Moreover, because the read unit is decoupled from the internal nodes of the latch, there is no feedback from the read access circuit, which makes the cell read static noise margin (SNM) free.

Furthermore, for low-power AI accelerator design, we propose a new design method that applies low-power techniques such as bus-specific clock gating (BSCG) and local explicit clock gating (LECG) at the register-transfer level. We evaluate these techniques on the Xilinx ZCU102 FPGA SoC hardware platform and in 45 nm technology for ASIC, respectively, measuring dynamic power with a commercial EDA tool and gating only a subset of flip-flops (FFs) selected by their switching activities (see the toy gating model below).
We achieve up to a 53.21% power reduction in the ASIC implementation and a 32.72% reduction in dynamic power dissipation in the FPGA implementation. These results show that our RTL low-power schemes offer strong potential for dynamic power reduction when applied to both the FPGA and ASIC design flows for the implementation of AI systems.
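To make the vector-matrix multiplication above concrete, the following Python sketch models the arithmetic the bit cell array performs for one BNN layer. Only the dimensions (64 binary inputs, 64×128 binary weights, 128 neurons) come from the abstract; the code itself, including the adc_quantize helper, is a hypothetical software illustration, not the thesis circuit.

```python
# Illustrative model (not the thesis RTL) of the binary vector-matrix
# multiplication the 128x128 SRAM-CIM array performs for a BNN layer.
import numpy as np

rng = np.random.default_rng(0)

inputs = rng.integers(0, 2, size=64)            # binary activations: 0 or 1
weights = rng.choice([-1, +1], size=(64, 128))  # binary weights: -1 or +1

# Each neuron accumulates the dot product of the 64 inputs with its
# 64 weights -- the analog summation performed along a bit cell row.
dot = inputs @ weights                          # shape (128,), range [-64, +64]

def adc_quantize(x, n_bits):
    """Hypothetical N-bit quantization of a neuron's dot product.

    The real design uses a row-by-row sweeping ADC with replica cells;
    here we simply map [-64, +64] onto 2**n_bits uniform levels.
    """
    levels = 2 ** n_bits
    scaled = np.clip((x + 64) / 128, 0, 1)      # normalize to [0, 1]
    return np.minimum((scaled * levels).astype(int), levels - 1)

print(adc_quantize(dot, n_bits=3)[:8])          # first 8 quantized neuron outputs
```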
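The sweep-cycle budget of the row-by-row ADC can be tabulated directly, assuming the abstract's relation of 2^(N-1) + 1 cycles for an N-bit output:

```python
# Sweep cycles for the row-by-row ADC, assuming the abstract's relation
# cycles = 2^(N-1) + 1 for N output bits (N = 1..7).
for n_bits in range(1, 8):
    cycles = 2 ** (n_bits - 1) + 1
    print(f"{n_bits}-bit neuron output -> {cycles} sweep cycles")
```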
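Finally, a toy model of the clock-gating idea behind BSCG/LECG: an ungated flip-flop's clock pin toggles every cycle, while a gated flip-flop's clock is suppressed when its enable is inactive, so FFs with low switching activity save the most clock power. All numbers and names here are illustrative, not measurements from the thesis.

```python
# Toy model of selective clock gating: count how often a clock edge
# actually reaches a flip-flop, with and without gating.
import random

random.seed(0)
CYCLES = 10_000

def clock_toggles(enable_probability, gated):
    """Count clock events over CYCLES cycles (illustrative only)."""
    toggles = 0
    for _ in range(CYCLES):
        enabled = random.random() < enable_probability
        if not gated or enabled:
            toggles += 1          # clock edge reaches the FF this cycle
    return toggles

for activity in (0.05, 0.25, 0.75):
    ungated = clock_toggles(activity, gated=False)
    gated = clock_toggles(activity, gated=True)
    saving = 100 * (1 - gated / ungated)
    print(f"enable activity {activity}: ~{saving:.0f}% fewer clock toggles when gated")
```

The printout illustrates why gating is applied selectively: a rarely-enabled FF saves most of its clock toggles, while a highly active FF gains little, so switching activity is a natural selection criterion.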