CN115954029A

CN115954029A - Multi-bit operation module and in-memory calculation circuit structure using the same

Info

Publication number: CN115954029A
Application number: CN202310026356.1A
Authority: CN
Inventors: 周永亮; 周子璇; 施琦; 杨震; 韦一鸣; 彭春雨; 吴秀龙
Original assignee: Anhui University
Current assignee: Anhui University
Priority date: 2023-01-09
Filing date: 2023-01-09
Publication date: 2023-04-11

Abstract

The present invention relates to the technical field of static random access memory, and more particularly to a multi-bit operation module and an in-memory computing circuit structure using the module. The multi-bit operation module performs multi-bit multiply-accumulate operations by accumulating the discharge of the compute bit-line load capacitance. The bit-divided weight and separated global bit-line design provides good computational parallelism and stability and relatively high inference accuracy. Working with a subsequent quantization unit module that produces a quantized output, the module can support multi-bit MAC operations in deep neural networks.

Description

Multi-bit operation module and in-memory calculation circuit structure using the module

Technical Field

The present invention relates to the technical field of static random access memory, and more specifically to a multi-bit operation module and an in-memory computing circuit structure using the module.

Background

In recent years, convolutional neural networks (CNNs) have achieved unprecedented success in many applications involving artificial intelligence (AI) and the Internet of Things (IoT), such as image recognition, voice keyword detection, and face recognition.

However, computing hardware limits the efficiency of AI workloads. Traditional computing hardware is based on the von Neumann architecture, in which the memory and the computing unit are two separate parts: to perform a computation, the computer fetches data from memory, transfers it to the computing unit, and then writes the result back to memory. Moving data between the processing elements (PEs) and the memory leads to excessive energy consumption and latency, a problem known as the "memory wall". Computing in memory (CIM) breaks with the von Neumann architecture of traditional computers by embedding computing circuits in the memory so that storage and computation are integrated, which greatly reduces data movement and memory-access overhead.

Nonvolatile computing in memory is highly advantageous for battery-powered miniature AI devices that require nonvolatile data storage and low power consumption. Current nonvolatile CIM schemes support binary neural networks (BNNs) or binary weight networks (BWNs), which reduce storage requirements to some extent and improve energy efficiency. However, BNNs and BWNs are suitable only for simple networks and provide limited system-level inference accuracy in complex applications, which limits the further development of AI technology. Therefore, multiply-and-accumulate (MAC) in-memory computing with multi-bit inputs (IN), weights (W), and outputs (OUT) is extremely important for advanced AI edge chips that require high inference accuracy.

Summary of the Invention

In view of the limited inference accuracy of conventional in-memory computing, it is necessary to provide a multi-bit operation module and an in-memory computing circuit structure using the module.

The present invention is implemented by the following technical solutions:

In a first aspect, the present invention provides a multi-bit operation module comprising bit-division calculation module 1 and bit-division calculation module 2.

Bit-division calculation module 1 includes n cascaded computing units 1 and n weight bit lines 1, LW[1] to LW[n].

The k-th cascaded computing unit 1 includes four NMOS transistors N1[k], N2[k], N3[k], and N4[k], where 1≤k≤n. N1[k] and N2[k] have the same specifications.

The gate of N1[k] is connected to weight bit line 1 LW[k], its drain to the compute bit line CBL, and its source to node 1 X1[k]. The gate of N2[k] is connected to weight bit line 1 LW[k], its drain to the compute bit line CBLB, and its source to node 2 X2[k]. The gate of N3[k] is connected to the global bit line GBL, its drain to node 1 X1[k], and its source to ground GND. The gate of N4[k] is connected to the global bit line GBLB, its drain to node 2 X2[k], and its source to ground GND.

Bit-division calculation module 2 includes n cascaded computing units 2 and n weight bit lines 2, RW[1] to RW[n].

The k-th cascaded computing unit 2 includes four NMOS transistors N5[k], N6[k], N7[k], and N8[k]. N5[k] and N6[k] have the same specifications. N7[k], N8[k], N3[k], and N4[k] have the same specifications, and the width-to-length ratio of N5[k] is h times that of N1[k].

The gate of N5[k] is connected to weight bit line 2 RW[k], its drain to the compute bit line CBL, and its source to node 3 X3[k]. The gate of N6[k] is connected to weight bit line 2 RW[k], its drain to the compute bit line CBLB, and its source to node 4 X4[k]. The gate of N7[k] is connected to the global bit line GBL, its drain to node 3 X3[k], and its source to ground GND. The gate of N8[k] is connected to the global bit line GBLB, its drain to node 4 X4[k], and its source to ground GND.

Weight bit lines 2 RW[k] and weight bit lines 1 LW[k] supply the weight values. The global bit lines GBL and GBLB supply the multi-bit input values.

Within the multi-bit operation module, bit-division calculation modules 1 and 2 operate in parallel on the selected columns, receive the weight values and the multi-bit input values, and perform the multi-bit multiply-accumulate calculation. The compute bit lines CBL and CBLB reflect the result of the multi-bit multiply-accumulate calculation as a voltage change.

This multi-bit operation module is implemented according to the methods or processes of the embodiments of the present disclosure.

In a second aspect, the present invention discloses an in-memory computing circuit structure comprising a storage array module, a data selection module, a sense amplifier module, a mode selection module, the multi-bit operation module disclosed in the first aspect, a quantization unit module, and a timing control circuit module.

The storage array module provides a standard read/write mode and a multi-bit multiply-accumulate calculation mode, and includes a storage part and a reference part. The data selection module includes a column selection module and a row decoding module and is used, in the standard read/write mode, to locate and access the corresponding memory cell in the storage part according to an external address signal. The column selection module is further connected to a write driver circuit, which controls writing to the memory cells. The sense amplifier module compares the read current generated by the storage part with the reference current of the reference part to generate a conversion voltage, and amplifies the conversion voltage to obtain an output weight value. The sense amplifier module is further connected to a read driver circuit, which reads the output weight value during a read operation in the standard read/write mode. The mode selection module switches the storage array module between the standard read/write mode and the multi-bit multiply-accumulate calculation mode. In the multi-bit calculation function mode, the multi-bit operation module performs the multi-bit multiply-accumulate calculation according to the weight values and the multi-bit input values. The multi-bit operation module is connected to an input register, which feeds the multi-bit input values into the multi-bit operation module through the global bit lines GBL and GBLB. In the multi-bit multiply-accumulate calculation mode, the quantization unit module quantizes the voltage change accumulated on the compute bit lines CBL and CBLB to obtain a quantized output. The timing control circuit module controls the timing of each part of the in-memory computing circuit structure so that the parts operate in coordination.

This in-memory computing circuit structure is implemented according to the methods or processes of the embodiments of the present disclosure.

Compared with the prior art, the present invention has the following beneficial effects:

1. The multi-bit operation module of the present invention performs multi-bit multiply-accumulate operations by accumulating the discharge of the compute bit-line load capacitance. The bit-divided weight and separated global bit-line design provides good computational parallelism and stability and relatively high inference accuracy, and, working with the subsequent quantization unit module to obtain a quantized output, it can support multi-bit MAC operations in deep neural networks.

2. The storage array module of the present invention uses MRAM built from 1T-1MTJ memory cells, which offers high storage density and computing density and reduces area overhead. The invention can perform multi-bit multiply-accumulate operations while accessing the memory, which can significantly reduce the overall power consumption of the network.

3. The present invention implements multi-bit multiply-accumulate calculation based on MRAM, which features low static power consumption and nonvolatility, and is advantageous in devices that require nonvolatile data storage and low-power battery operation.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of the in-memory computing circuit structure in Embodiment 1 of the present invention;

FIG. 2 is a structural diagram of the storage array module, column selector module, sense amplifier module, mode selector module, and multi-bit operation module in FIG. 1;

FIG. 3 is a structural diagram of the left storage array and the left reference array in FIG. 2;

FIG. 4 is a structural diagram of the right storage array and the right reference array in FIG. 2;

FIG. 5 is a structural diagram of the pair of cascaded computing units formed by the k-th cascaded computing unit 1 and the k-th cascaded computing unit 2 in the multi-bit operation module of FIG. 2;

FIG. 6 is an equivalent circuit diagram of the analog-domain multiply-accumulate calculation of the multi-bit operation module in FIG. 2;

FIG. 7 is a structural diagram of the sense amplifier module in FIG. 2;

FIG. 8 is a transient simulation waveform diagram of a read operation performed by the sense amplifier module in FIG. 7;

FIG. 9 is a structural diagram of the quantization unit module in FIG. 2;

FIG. 10 is a schematic diagram of the quantization process of the quantization unit module in FIG. 9;

FIG. 11 is a schematic diagram of the results of the multi-bit operation module in FIG. 2 performing a multiply-accumulate calculation with 2-bit inputs and 2-bit weights;

FIG. 12 is Monte Carlo simulation result diagram A for the in-memory computing circuit structure of FIG. 1, based on the multiply-accumulate calculation with 2-bit inputs and 2-bit weights;

FIG. 13 is Monte Carlo simulation result diagram B for the in-memory computing circuit structure of FIG. 1, based on the multiply-accumulate calculation with 2-bit inputs and 2-bit weights;

FIG. 14 is Monte Carlo simulation result diagram C for the in-memory computing circuit structure of FIG. 1, based on the multiply-accumulate calculation with 2-bit inputs and 2-bit weights;

FIG. 15 is a schematic diagram of how system power consumption and energy efficiency vary with operating voltage in the in-memory computing circuit structure provided by Embodiment 1 of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

It should be noted that when a component is said to be "mounted on" another component, it may be directly on the other component or an intervening component may be present. When a component is said to be "arranged on" another component, it may be arranged directly on the other component or an intervening component may be present. When a component is said to be "fixed to" another component, it may be fixed directly to the other component or an intervening component may be present.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person skilled in the technical field of the present invention. The terms used in this description are for the purpose of describing specific embodiments only and are not intended to limit the present invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

Embodiment 1

FIG. 1 is a schematic structural diagram of the in-memory computing circuit structure disclosed in Embodiment 1.

The in-memory computing circuit structure includes a storage array module, a data selection module, a sense amplifier module, a mode selection module, a multi-bit operation module, a quantization unit module, and a timing control circuit module. FIG. 3 shows, from top to bottom, the structure of the storage array module, the column selection module, the sense amplifier module, the mode selector module, and the multi-bit operation module.

The storage array module provides a standard read/write mode and a multi-bit multiply-accumulate calculation mode. The storage array module includes a storage part and a reference part, which together form an array of (2N+2N/j) columns and M rows.

Specifically, as shown in FIG. 2, the storage part includes a left storage array and a right storage array.

FIG. 3 is an enlarged view of the left part of FIG. 2. The left storage array includes memory cells in N columns and M rows. Every j columns form one group of left sub-arrays, so the left storage array includes N/j groups of left sub-arrays.

FIG. 4 is an enlarged view of the right part of FIG. 2. The right storage array also includes memory cells in N columns and M rows. Every j columns form one group of right sub-arrays, so the right storage array includes N/j groups of right sub-arrays.

The reference part includes a left reference array and a right reference array. The left reference array includes left reference cells in N/j columns and M rows, corresponding to the left storage array; the k-th column of left reference cells corresponds to the k-th group of left sub-arrays, where 1≤k≤N/j. Similarly, the right reference array includes right reference cells in N/j columns and M rows, corresponding to the right storage array; the k-th column of right reference cells corresponds to the k-th group of right sub-arrays.
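The column bookkeeping above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `array_columns` and the example sizes N = 64 and j = 8 are assumptions made here, not values taken from the embodiment.

```python
def array_columns(n_cols: int, j: int) -> dict:
    """Count the columns of the storage array module for one choice of N and j.

    n_cols is N (columns per storage array) and j is the number of columns
    per sub-array group; both are free parameters in the description.
    """
    if n_cols % j != 0:
        raise ValueError("N must be a multiple of j so that N/j groups exist")
    groups = n_cols // j                # N/j sub-array groups per side
    storage = 2 * n_cols                # left + right storage arrays
    reference = 2 * groups              # one reference column per group, per side
    return {
        "groups_per_side": groups,
        "storage_columns": storage,
        "reference_columns": reference,
        "total_columns": storage + reference,   # 2N + 2N/j
    }


if __name__ == "__main__":
    print(array_columns(64, 8))
```

With the hypothetical sizes N = 64 and j = 8, each side has 8 groups and the module has 128 storage columns plus 16 reference columns.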

The memory cells, left reference cells, and right reference cells are all based on MRAM; they differ only in the bit lines and source lines to which they are connected.

A memory cell includes an NMOS transistor M1 and a magnetic tunnel junction device MTJ1. The gate of M1 is connected to the word line WL and its drain to the source line SL. One end of MTJ1 is electrically connected to the bit line BL and the other end to the source of M1.

The left and right reference cells have the same structure, each including an NMOS transistor M2 and a magnetic tunnel junction device MTJ2. The gate of M2 is connected to the word line WL and its drain to the reference source line. One end of MTJ2 is electrically connected to the reference bit line and the other end to the source of M2.

It should be noted that the magnetic tunnel junction devices MTJ1 and MTJ2 take one of two states, high resistance or low resistance, depending on the direction of the write current.

Memory cells, left reference cells, and right reference cells in the same row share the same word line WL. Memory cells in the same column share the same bit line BL and the same source line SL. Left reference cells in the same column share the same reference bit line and the same reference source line. Right reference cells in the same column share the same reference bit line and the same reference source line.

The k-th reference bit line of the left reference array outputs the reference current I_REF1[k], and the k-th reference bit line of the right reference array outputs the reference current I_REF2[k].
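As a rough behavioral illustration of how a two-state MTJ cell and a reference current can yield one stored bit, the sketch below models the read path as a simple series resistance and compares the resulting current with a reference. Everything numeric here is an assumption for illustration: the read voltage, the resistances, the midpoint reference, and the polarity (more current than the reference reads as 1) are not stated in the text.

```python
def read_current(v_read: float, r_mtj: float, r_access: float) -> float:
    """Idealized read current of a 1T-1MTJ cell: MTJ in series with the access transistor."""
    return v_read / (r_mtj + r_access)


def sense(i_cell: float, i_ref: float) -> int:
    """Compare the cell current with the reference current.

    Assumed polarity: a cell drawing more current than the reference reads as 1.
    """
    return 1 if i_cell > i_ref else 0


if __name__ == "__main__":
    # Hypothetical values, chosen only to make the two states distinguishable.
    v_read, r_access = 0.1, 1000.0
    r_low, r_high = 3000.0, 6000.0
    i_low_state = read_current(v_read, r_low, r_access)
    i_high_state = read_current(v_read, r_high, r_access)
    i_ref = (i_low_state + i_high_state) / 2     # assumed midpoint reference
    print(sense(i_low_state, i_ref), sense(i_high_state, i_ref))
```

The actual sense amplifier converts the current difference to a voltage and amplifies it; this sketch keeps only the comparison.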

The multi-bit operation module is arranged to correspond to the storage array module and, in the multi-bit calculation function mode, performs the multi-bit multiply-accumulate calculation according to the weight values and the multi-bit input values. The multi-bit operation module is connected to an input register, which feeds the multi-bit input values into the multi-bit operation module through the global bit lines GBL and GBLB. The weight values come from the sense amplifier module.

The multi-bit operation module includes bit-division calculation module 1 and bit-division calculation module 2.

Referring to FIG. 5, the k-th cascaded computing unit 1 includes four NMOS transistors N1[k], N2[k], N3[k], and N4[k].

The gate of N1[k] is connected to the k-th weight bit line 1 LW[k], its drain to the compute bit line CBL, and its source to the k-th node 1 X1[k]. The gate of N2[k] is connected to weight bit line 1 LW[k], its drain to the compute bit line CBLB, and its source to the k-th node 2 X2[k]. The gate of N3[k] is connected to the global bit line GBL, its drain to node 1 X1[k], and its source to ground GND. The gate of N4[k] is connected to the global bit line GBLB, its drain to node 2 X2[k], and its source to ground GND.

Referring to FIG. 5, the k-th cascaded computing unit 2 includes four NMOS transistors N5[k], N6[k], N7[k], and N8[k]. The gate of N5[k] is connected to the k-th weight bit line 2 RW[k], its drain to the compute bit line CBL, and its source to the k-th node 3 X3[k]. The gate of N6[k] is connected to weight bit line 2 RW[k], its drain to the compute bit line CBLB, and its source to the k-th node 4 X4[k]. The gate of N7[k] is connected to the global bit line GBL, its drain to node 3 X3[k], and its source to ground GND. The gate of N8[k] is connected to the global bit line GBLB, its drain to node 4 X4[k], and its source to ground GND.

It should be emphasized that n=N/j. N1[k] and N2[k] have the same specifications, where 1≤k≤n. N5[k] and N6[k] have the same specifications. N7[k], N8[k], N3[k], and N4[k] have the same specifications. The width-to-length ratio of N5[k] is h times that of N1[k]; adjusting h weights the conduction current of the cascaded computing units.
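The eight-transistor connection list for one pair of cascaded computing units can be written down as a small table, which makes the symmetry between the CBL and CBLB halves easy to inspect. The sketch below is only a bookkeeping aid: the `NMOS` tuple, the `cascade_pair` function, and the `rel_wl` field are names introduced here, and the relative size of N3, N4, N7, and N8 is left as `None` because the text says only that they match one another.

```python
from typing import NamedTuple, Optional


class NMOS(NamedTuple):
    gate: str
    drain: str
    source: str
    rel_wl: Optional[float]  # W/L relative to N1[k]; None where the text gives no ratio


def cascade_pair(k: int, h: float = 2.0) -> dict:
    """Connections of the k-th cascaded computing unit 1 (N1..N4) and unit 2 (N5..N8)."""
    return {
        f"N1[{k}]": NMOS(f"LW[{k}]", "CBL", f"X1[{k}]", 1.0),
        f"N2[{k}]": NMOS(f"LW[{k}]", "CBLB", f"X2[{k}]", 1.0),   # same spec as N1
        f"N3[{k}]": NMOS("GBL", f"X1[{k}]", "GND", None),
        f"N4[{k}]": NMOS("GBLB", f"X2[{k}]", "GND", None),
        f"N5[{k}]": NMOS(f"RW[{k}]", "CBL", f"X3[{k}]", h),      # h times N1's W/L
        f"N6[{k}]": NMOS(f"RW[{k}]", "CBLB", f"X4[{k}]", h),     # same spec as N5
        f"N7[{k}]": NMOS("GBL", f"X3[{k}]", "GND", None),
        f"N8[{k}]": NMOS("GBLB", f"X4[{k}]", "GND", None),
    }


if __name__ == "__main__":
    for name, t in cascade_pair(1).items():
        print(name, t)
```

Each discharge path is a two-transistor stack: a weight-gated device on top (N1, N2, N5, N6) and an input-gated device to ground below it (N3, N4, N7, N8), so a path conducts only when both the weight bit and the input pulse are high.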

In the multi-bit multiply-accumulate calculation mode, the multi-bit operation module uses the amount of charge discharged from the compute bit-line capacitance (C_CBL/C_CBLB) to implement the convolution operation of a neural network.

In summary, the left and right storage arrays each have N columns and M rows, and each storage array is divided into N/j groups of sub-arrays of j columns each. The left storage array corresponds to cascaded computing units 1 (the low-order cascaded computing units), and the right storage array corresponds to cascaded computing units 2 (the high-order cascaded computing units). In both the standard read/write mode and the multi-bit multiply-accumulate calculation mode, the N/j groups of sub-arrays in the left and right storage arrays all operate in parallel, and the left and right sub-arrays at corresponding positions (the k-th group of left sub-arrays and the k-th group of right sub-arrays) form a pair, for N/j pairs in total. Correspondingly, the cascaded computing unit 1 associated with the left storage array and the cascaded computing unit 2 associated with the right storage array (the k-th low-order and the k-th high-order cascaded computing units) form a pair of cascaded computing units, for N/j pairs in total, as shown in FIG. 5. This arrangement ensures computational parallelism and stability.

After the storage array module completes a standard read operation, the N/j pairs of stored data are read out through the sense amplifier module, and each pair of outputs, DOUTL[k] and DOUTR[k], forms a 2-bit weight W[1:0].
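A minimal sketch of how the two sense amplifier outputs might be packed into W[1:0] follows. The bit order is inferred rather than stated: because the left storage array feeds the low-order cascaded computing unit and the right storage array feeds the high-order one, DOUTL[k] is assumed to be W[0] and DOUTR[k] to be W[1]. The function name `compose_weight` is hypothetical.

```python
def compose_weight(doutl: int, doutr: int) -> int:
    """Pack one pair of sense amplifier outputs into a 2-bit weight W[1:0].

    Assumption: DOUTL is the low-order bit W[0], DOUTR the high-order bit W[1].
    """
    if doutl not in (0, 1) or doutr not in (0, 1):
        raise ValueError("sense amplifier outputs must be single bits")
    return (doutr << 1) | doutl


if __name__ == "__main__":
    for doutr in (0, 1):
        for doutl in (0, 1):
            print(doutr, doutl, format(compose_weight(doutl, doutr), "02b"))
```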

In the multi-bit multiply-accumulate calculation mode, MEN is set high, and the N/j pairs of 2-bit weights are transmitted bit by bit to the corresponding N/j pairs of cascaded computing units. The external 4-bit input IN[3:0] is divided into two groups, IN[3:2] and IN[1:0], which are passed to the global bit lines GBL and GBLB respectively and are represented by the duration for which V_GBL and V_GBLB stay high. The calculation results correspond to the voltage changes on the compute bit lines CBL and CBLB respectively.
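The input encoding described above can be sketched as a split of IN[3:0] followed by a value-to-pulse-width mapping. The text states only that each 2-bit group is represented by how long the global bit line stays high; the linear mapping used here (duration = value × one unit time) and the unit `t_unit_ps` are assumptions for illustration.

```python
def split_input(in4: int) -> tuple:
    """Split IN[3:0] into (IN[3:2], IN[1:0]), the groups sent to GBL and GBLB."""
    if not 0 <= in4 <= 0b1111:
        raise ValueError("IN must be a 4-bit value")
    return (in4 >> 2) & 0b11, in4 & 0b11


def pulse_widths_ps(in4: int, t_unit_ps: int) -> tuple:
    """High-level durations (T_GBL, T_GBLB) in picoseconds.

    Assumes a linear encoding: a group value v holds its line high for v unit times.
    """
    hi, lo = split_input(in4)
    return hi * t_unit_ps, lo * t_unit_ps


if __name__ == "__main__":
    print(split_input(0b1101))           # IN[3:2] = 0b11, IN[1:0] = 0b01
    print(pulse_widths_ps(0b1101, 500))  # hypothetical 500 ps unit time
```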

When a single pair of cascaded computing units is activated, it begins to read the weight information output by the corresponding sense amplifiers (CSA):

If the weight W[1:0] read is "00", the pair of computing units generates no current, i.e., the multiplication result is 0. If the weight W[1:0] read is "01", "10", or "11", the pair of computing units begins to discharge the compute bit-line capacitance (C_CBL/C_CBLB), producing a discharge current of I, 2I, or 3I respectively, and the discharge time that the input value sets on the global bit line produces a corresponding discharge voltage drop on the compute bit line CBL/CBLB.
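The weight-to-current mapping above follows from the transistor sizing: the low-order unit contributes one unit current and the high-order unit contributes h unit currents, with h = 2 in this embodiment. The sketch below reproduces that mapping in units of I; the function name is hypothetical, and treating the two units as ideal, independent current sources is an assumption.

```python
def discharge_current_units(w: int, h: int = 2) -> int:
    """Discharge current of one pair of cascaded computing units, in units of I.

    Bit 0 of w gates the low-order unit (current I); bit 1 gates the
    high-order unit (current h*I). Ideal current sources are assumed.
    """
    if not 0 <= w <= 0b11:
        raise ValueError("W must be a 2-bit value")
    low = w & 1
    high = (w >> 1) & 1
    return low + h * high


if __name__ == "__main__":
    for w in range(4):
        print(format(w, "02b"), discharge_current_units(w))
```

With h = 2 the four weights map to 0, I, 2I, and 3I, so the current is proportional to the binary value of W[1:0].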

The discharges that the N/j multiplication results produce on the corresponding compute bit line CBL/CBLB accumulate; that is, the total voltage change on CBL/CBLB corresponds to the final multiply-accumulate result.
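Putting the pieces together, the accumulation on the two compute bit lines can be modeled behaviorally as charge removed from a capacitor. The sketch below is a simplified model under stated assumptions, not the circuit's actual transfer function: currents are ideal and constant, the pulse width is linear in the input value, and the bit-line voltage never saturates. The names `mac_units` and `delta_v`, and the parameters `i_unit`, `t_unit`, and `c_bl`, are introduced here for illustration.

```python
def mac_units(weights, in4: int, h: int = 2) -> tuple:
    """Ideal accumulated discharge on (CBL, CBLB), in units of I * t_unit.

    weights: the 2-bit weights W[1:0] of the active pairs of cascaded units.
    in4:     the 4-bit input; IN[3:2] sets the GBL pulse, IN[1:0] the GBLB pulse.
    """
    in_hi = (in4 >> 2) & 0b11
    in_lo = in4 & 0b11
    # Every pair shares the same global bit lines, so the per-pair currents add
    # while the discharge time is common to all pairs on a given bit line.
    current_units = sum((w & 1) + h * ((w >> 1) & 1) for w in weights)
    return in_hi * current_units, in_lo * current_units


def delta_v(units: int, i_unit: float, t_unit: float, c_bl: float) -> float:
    """Voltage drop for a given number of discharge units: dV = units * I * t / C."""
    return units * i_unit * t_unit / c_bl


if __name__ == "__main__":
    cbl, cblb = mac_units([3, 1, 0, 2], 0b1101)
    print(cbl, cblb)
    # Hypothetical electrical values, for scale only.
    print(delta_v(cbl, i_unit=1e-6, t_unit=1e-9, c_bl=100e-15))
```

In this model, four pairs holding weights 3, 1, 0, and 2 draw six unit currents in total, so an input of 0b1101 removes 18 units of charge from CBL and 6 units from CBLB.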

FIG. 6 is an equivalent circuit diagram of the multi-bit operation module performing a multiply-accumulate calculation with 2-bit inputs and 2-bit weights. One half of the first cascaded computing unit 1 and one half of the first cascaded computing unit 2 are taken for illustration. In this embodiment, h is set to 2.

N1[1] and N2[1] (not shown) form the low-order cascaded computing unit, which reads the weight value stored in the left storage sub-array. N5[1] and N6[1] (not shown) form the high-order cascaded computing unit, which reads the weight value stored in the right storage sub-array. LW[1] and RW[1] represent the weight values of the left and right storage sub-arrays respectively; that is, the 2-bit weight W[1:0] is transmitted bit by bit to the high-order and low-order cascaded computing units. The 2-bit input group IN[3:2] is passed to the global bit line GBL, and the duration T_GBL for which V_GBL stays high is determined by the input data. Because the width-to-length ratio of N5[1] is twice that of N1[1], when both the high-order and low-order cascaded computing units are conducting, the current I₂ of the high-order cascaded computing unit is twice the conduction current I₁ of the low-order cascaded computing unit.

The multi-bit operation module described above supports multiply-accumulate operations with multi-bit inputs and multi-bit weights. Compared with existing single-bit multiply-accumulate operations, Boolean logic operations, and the like, this multi-bit operation module is applicable to a variety of multi-bit neural networks and can improve the inference accuracy of AI edge devices.

The storage array module requires a data selection module, which locates and accesses the addressed storage cell in the storage part according to an external address signal in the standard read-write mode. Following the layout of the storage array, the data selection module comprises a row decoding module and a column selection module: the former enables the addressed row and the latter enables the addressed column. The column selection module is also connected to a write drive circuit, which controls writing to the storage cells.

(1) The row decoding module is connected to the word lines WL (not shown in Figure 3); the M word lines WL share one row decoding module (that is, one row decoder).

(2) The column selection module comprises n column selectors 1 and n column selectors 2, as shown in Figure 3.

The k-th column selector 1 corresponds to the k-th group of left sub-arrays. The bit lines BL of the k-th group of left sub-arrays are connected to the input of the k-th column selector 1, whose output delivers the read current I_CELL1[k].

The k-th column selector 2 corresponds to the k-th group of right sub-arrays. The bit lines BL of the k-th group of right sub-arrays are connected to the input of the k-th column selector 2, whose output delivers the read current I_CELL2[k].

The n column selectors 1 and the n column selectors 2 share one addressing signal CS, which simplifies unified control. Specifically, when CS instructs the column selection module to enable the k-th column, column selector 1 enables the k-th group of left sub-arrays and column selector 2 enables the k-th group of right sub-arrays.

The sense amplifier module compares the read current generated by the storage part with the reference current of the reference part, converts the difference into a voltage, amplifies that voltage and outputs the weight value.

As shown in Figure 3, the sense amplifier module comprises n sense amplifiers 1 and n sense amplifiers 2.

(A) The k-th sense amplifier 1 is connected to the k-th column selector 1. It comprises the k-th current sampling unit 1 and the k-th voltage amplifier 1, samples and compares I_CELL1[k] and I_REF1[k], and outputs DOUTL[k].

Specifically, referring to Figure 7, current sampling unit 1 comprises six PMOS transistors P1 to P6 and four NMOS transistors NM1 to NM4.

The gate of P1 is connected to the external enable signal SAEN, its source to the supply VDD, and its drain to the first node NET1. The gate and drain of P2 are connected to NET1, and its source to VDD. The gate of P3 is connected to NET1, its source to VDD, and its drain to the first-stage output node SO. The gate of P4 is connected to the second node NET2, its source to VDD, and its drain to the first-stage output node SOB. The gate and drain of P5 are connected to NET2, and its source to VDD. The gate of P6 is connected to SAEN, its source to VDD, and its drain to NET2.

The gate of NM1 is connected to the clamp signal CLP, its source receives the read current I_CELL, and its drain is connected to NET1. The gate of NM2 is connected to SOB, its source to ground GND, and its drain to SO. The gate and drain of NM3 are connected to SOB, and its source to GND. The gate of NM4 is connected to CLP, its source receives the reference current I_REF, and its drain is connected to NET2.

Voltage amplifier 1 comprises two PMOS transistors P7 and P8, three NMOS transistors NM5 to NM7, and one inverter INV.

The gate and drain of P7 are connected to the third node NET3, and its source to VDD. The gate and drain of P8 are connected to the fourth node NET4, and its source to VDD. The gate of NM5 is connected to SO, its source to the fifth node NET5, and its drain to NET3. The gate of NM6 is connected to SOB, its source to NET5, and its drain to NET4. The gate of NM7 is connected to SAEN, its source to GND, and its drain to NET5. The input of the inverter INV is connected to NET4; its output is the weight value DOUT, which is split into two paths, one connected to the read-out drive circuit and the other to the weight bit line WW.

Because (A) describes the k-th sense amplifier 1 and the k-th voltage amplifier 1, P1 to P8 correspond to PL1[k] to PL8[k], NM1 to NM7 to NML1[k] to NML7[k], INV to INVL[k], NET1 to NET5 to NETL1[k] to NETL5[k], SO to SOL[k], SOB to SOBL[k], I_CELL to I_CELL1[k], I_REF to I_REF1[k], DOUT to DOUTL[k] (the k-th weight value 1), and WW to LW[k].

(B) The k-th sense amplifier 2 is connected to the k-th column selector 2. It comprises the k-th current sampling unit 2 and the k-th voltage amplifier 2, samples and compares I_CELL2[k] and I_REF2[k], and outputs DOUTR[k].

Sense amplifier 2 comprises current sampling unit 2 and voltage amplifier 2; it samples and compares the read current I_CELL2 of the even storage sub-array with the reference current I_REF2 of the even reference array, and outputs weight value 2.

As in (A), current sampling unit 2 has the same structure as current sampling unit 1 and likewise comprises six PMOS transistors P1 to P6 and four NMOS transistors NM1 to NM4.

Voltage amplifier 2 also has the same structure as voltage amplifier 1, comprising two PMOS transistors P7 and P8, three NMOS transistors NM5 to NM7, and one inverter INV.

For the specific connections, see the description in (A) and Figure 7.

Because (B) describes the k-th sense amplifier 2 and the k-th voltage amplifier 2, P1 to P8 correspond to PR1[k] to PR8[k], NM1 to NM7 to NMR1[k] to NMR7[k], INV to INVR[k], NET1 to NET5 to NETR1[k] to NETR5[k], SO to SOR[k], SOB to SOBR[k], I_CELL to I_CELL2[k], I_REF to I_REF2[k], DOUT to DOUTR[k] (the k-th weight value 2), and WW to RW[k].

For the output signals DOUTL and DOUTR: if the reference current is smaller than the bit-line current, the output is low (0); if the reference current is larger than the bit-line current, the output is high (1).
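The decision rule above can be written as a small behavioural model. This is only a sketch: the function name and the current values are hypothetical, and treating equal currents as 0 is an assumption, since the description does not cover that case.

```python
def sense_output(i_cell: float, i_ref: float) -> int:
    """Return 1 when the reference current exceeds the bit-line current, else 0.

    Equal currents map to 0 here; that tie-break is an assumption.
    """
    return 1 if i_ref > i_cell else 0


# Illustrative currents in microamps (hypothetical values, not from the source).
I_REF_UA = 20.0
small_cell_current_ua = 12.0   # I_CELL < I_REF
large_cell_current_ua = 28.0   # I_CELL > I_REF

dout_small = sense_output(small_cell_current_ua, I_REF_UA)
dout_large = sense_output(large_cell_current_ua, I_REF_UA)
```

The model ignores the analog stages (current mirrors, differential amplifier, inverter) and keeps only the resulting logic level.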

Figure 8 shows the transient simulation waveforms of a read operation in this embodiment. The description uses the generic, unindexed names from (A) and (B).

The read operation has two phases:

Precharge / voltage-difference stabilization phase: the clamp signal CLP is asserted, which turns on the clamp transistors between the bit lines and the current sources; the word line WL is turned on, and current flows through the storage part and the reference part. When MTJ1 in the storage cell is in the high-resistance state, the reference current I_REF is larger than the bit-line current I_CELL; when MTJ1 is in the low-resistance state, I_REF is smaller than I_CELL. I_CELL is copied to node SO by the current mirror formed by PMOS transistors P2 and P3, and I_REF is copied to node SOB by the current mirror formed by P4 and P5. The current difference between the storage part and the reference part is thereby converted into a voltage difference between nodes SO and SOB.

Sampling phase: once a stable voltage difference has formed between nodes SO and SOB, the voltage amplifier enable signal SAE is asserted, the voltage difference between SO and SOB is amplified, and the stored data are read out at the output.

The sense amplifier module is also connected to a read-out drive circuit, which reads the output weight value during a read operation in the standard read-write mode.

The mode selection module switches the storage array module between the standard read-write mode and the multi-bit multiply-accumulate computing mode.

As shown in Figure 3, the mode selection module selects the mode according to the external enable signal MEN.

When MEN is high, the weight bit lines 1 LW[1] to LW[n] and the weight bit lines 2 RW[1] to RW[n] are disconnected from the multi-bit operation module, and the storage array module operates in the standard read-write mode.

When MEN is low, the storage array module operates in the multi-bit multiply-accumulate computing mode: LW[1] to LW[n] and RW[1] to RW[n] are connected to the multi-bit operation module, so that the k-th sense amplifier 1 is connected to the k-th cascaded computing unit 1 and the k-th sense amplifier 2 is connected to the k-th cascaded computing unit 2.
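The mode selection described above amounts to a single-bit switch. A minimal sketch follows; the enum and function names are hypothetical, and only the MEN polarity is taken from the description.

```python
from enum import Enum


class Mode(Enum):
    STANDARD_READ_WRITE = "standard read-write"
    MULTIBIT_MAC = "multi-bit multiply-accumulate"


def select_mode(men: int) -> Mode:
    """MEN high selects standard read-write; MEN low selects MAC computing."""
    return Mode.STANDARD_READ_WRITE if men else Mode.MULTIBIT_MAC


def weight_lines_connected(men: int) -> bool:
    """LW[1..n] and RW[1..n] reach the operation module only in MAC mode."""
    return select_mode(men) is Mode.MULTIBIT_MAC
```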

In both the standard read-write mode and the multi-bit multiply-accumulate computing mode, the left and right sub-arrays operate in parallel. After the storage array module completes a standard read, the stored data are read out through the sense amplifier module as weight value 1 (DOUTL) and weight value 2 (DOUTR), which together form a 2-bit weight.

In the multi-bit multiply-accumulate computing mode, the 2-bit weight is passed bit by bit to cascaded computing units 1 and 2. The external 4-bit input is divided into two groups, which are applied to the global bit lines GBL and GBLB respectively and encoded as the durations for which V_GBL and V_GBLB stay high. The results appear as the voltage changes on the computing bit lines CBL and CBLB respectively.

If the weight value read is "0", the computing unit draws no current, so the result of the multiplication is 0. If the weight value read is "1", the computing bit line capacitor (C_CBL or C_CBLB) starts to discharge through the computing unit, and the final multiply-accumulate result corresponds to the amount of discharge on the computing bit line CBL or CBLB.
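The input encoding described above can be sketched as follows. The description states that IN[3:2] drives GBL; routing IN[1:0] to GBLB is an assumption made here, and the function names are hypothetical. Pulse widths are expressed in multiples of a unit time t.

```python
from typing import Tuple


def split_input(x: int) -> Tuple[int, int]:
    """Split a 4-bit input into (IN[3:2], IN[1:0]).

    IN[3:2] drives GBL per the description; IN[1:0] on GBLB is assumed.
    """
    if not 0 <= x <= 0b1111:
        raise ValueError("input must fit in 4 bits")
    return (x >> 2) & 0b11, x & 0b11


def pulse_width(group: int, unit_t: float = 1.0) -> float:
    """High-level duration of V_GBL or V_GBLB: 0, t, 2t or 3t."""
    if not 0 <= group <= 0b11:
        raise ValueError("group must fit in 2 bits")
    return group * unit_t


gbl_group, gblb_group = split_input(0b1101)
```

For the input 1101 this yields a 3t pulse on GBL and a 1t pulse on GBLB under the stated assumption.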

The quantization unit module quantizes the voltage change accumulated on the computing bit line CBL or CBLB in the multi-bit multiply-accumulate computing mode and produces the quantized output.

Two quantization unit modules are provided: one is connected to the computing bit line CBL and quantizes its voltage change; the other is connected to the computing bit line CBLB and quantizes its voltage change.

Referring to Figure 10, the two quantization unit modules have the same structure; each comprises a capacitor array, a successive-approximation logic control unit, and a voltage comparator.

The capacitor array comprises five capacitors C0 to C4, where C0 is the first capacitor, C1 the second, C2 the third, C3 the fourth and C4 the fifth. Their ratio is C4:C3:C2:C1:C0 = 8:4:2:1:1.
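With the ratio 8:4:2:1:1 the array totals 16 unit capacitors, so each capacitor contributes a binary-weighted fraction of the total. The sketch below only restates that arithmetic; the dictionary and function names are hypothetical, and reading C0 as the usual terminating unit capacitor of a binary-weighted array is an assumption.

```python
from fractions import Fraction

# Capacitor sizes in unit capacitors, from C4:C3:C2:C1:C0 = 8:4:2:1:1.
CAP_UNITS = {"C4": 8, "C3": 4, "C2": 2, "C1": 1, "C0": 1}
TOTAL_UNITS = sum(CAP_UNITS.values())


def cap_fraction(name: str) -> Fraction:
    """Share of the total array capacitance held by one capacitor."""
    return Fraction(CAP_UNITS[name], TOTAL_UNITS)


fractions_by_cap = {name: cap_fraction(name) for name in CAP_UNITS}
```

C4, C3, C2 and C1 then carry 1/2, 1/4, 1/8 and 1/16 of the total, which matches four binary-weighted decision steps.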

The top plates of C0 to C4 are all connected to the input node INP of the voltage comparator. The bottom plate of each capacitor is connected through its control switch, S[0] to S[4] respectively, to one of the computing bit line CBL (or CBLB), the reference voltage VREF, or the supply VDD.

The successive-approximation logic control unit uses successive-approximation logic to generate the control signals S[4:0], which control the capacitor array, and the comparator enable signal EN, which controls the voltage comparator.

The input node INN of the voltage comparator is connected to the common-mode voltage VCM. When the control signal CE is asserted, nodes INP and INN are shorted together. When the enable signal EN is asserted, the voltage comparator compares the voltages at INP and INN and produces the output Output.

A single quantization unit module quantizes the seven distinct multiply-accumulate results into 4-bit data (0 to 15), so the maximum quantized result is MAC_MAX = 15.

The quantization unit module performs a standard binary-search (successive-approximation) conversion. Referring to Figure 10, take quantization unit module 1, connected to the computing bit line CBL, with a multiply-accumulate result MAC = 9 as an example:

First, the control signal CE is asserted and nodes INN and INP are shorted to the common-mode voltage VCM; VCM corresponds to a quantized digital output of 0.

After the capacitor array has sampled the analog voltage produced on CBL by the multiply-accumulate operation, switches S[4:0] are switched to the supply VDD, and the voltage at node INP becomes:

V_INP(0) = V_CM + V_DD - V_CBL;

When the comparison phase begins, the successive-approximation logic control unit uses switch S[4] to connect the bottom plate of the fifth capacitor C4 to the reference voltage V_REF for the first approximation (1st VREF), giving:

V_INP(1st) = V_CM - V_CBL + (V_REF + V_DD)/2;

The voltage comparator compares V_INP with V_INN and outputs 0, which is fed back to the successive-approximation logic control unit. The control unit then uses switch S[3] to connect the bottom plate of the fourth capacitor C3 to V_REF for the second approximation (2nd VREF), giving:

V_INP(2nd) = V_CM - V_CBL + 3*(V_REF + V_DD)/4;

The voltage comparator compares V_INP with V_INN and outputs 1, which is fed back to the control unit. The control unit then uses switches S[3] and S[2] to switch the bottom plate of C3 back to VDD and to connect the bottom plate of the third capacitor C2 to V_REF for the third approximation (3rd VREF), giving:

V_INP(3rd) = V_CM - V_CBL + 5*(V_REF + V_DD)/8;

The voltage comparator compares V_INP with V_INN and outputs 1, which is fed back to the control unit. The control unit then uses switches S[2] and S[1] to switch the bottom plate of C2 back to VDD and to connect the bottom plate of the second capacitor C1 to V_REF for the fourth approximation (4th VREF), giving:

V_INP(4th) = V_CM - V_CBL + 9*(V_REF + V_DD)/16;

V_INP and V_INN are now equal, and the quantization unit outputs the digital multiply-accumulate result MAC = 9, which is the Output of quantization unit module 1.
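The sequence of trial codes in this walk-through can be reproduced with a plain binary search. The sketch below is a digital abstraction only: it compares integer codes instead of plate voltages, and the function name and keep/drop rule (keep a bit when the trial code does not exceed the target) are assumptions chosen to match the coefficients 1/2, 3/4, 5/8 and 9/16 above.

```python
from fractions import Fraction
from typing import List, Tuple


def sar_trials(target: int, bits: int = 4) -> Tuple[int, List[int]]:
    """Binary-search a target code, returning the final code and every trial code."""
    code = 0
    trials: List[int] = []
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)      # tentatively set the next lower bit
        trials.append(trial)
        if trial <= target:            # keep the bit if the trial does not overshoot
            code = trial
    return code, trials


final_code, trial_codes = sar_trials(9)
# Coefficient of (V_REF + V_DD) at each step, as a fraction of the 16-unit array.
coefficients = [Fraction(t, 16) for t in trial_codes]
```

For a target of 9 the trial codes are 8, 12, 10 and 9, so the second and third trials overshoot and are dropped, which is the pattern the switch sequence S[4], S[3], S[2], S[1] follows in the text.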

Similarly, quantization unit module 2, connected to the computing bit line CBLB, works in the same way. Taking MAC = 9 again as the example, it also outputs the digital multiply-accumulate result MAC = 9, which is the Output of quantization unit module 2.

The Outputs of quantization unit modules 1 and 2 are fed into a digital combinational circuit, which reassigns their weights and produces the full-precision digital result. With both Outputs equal to MAC = 9 as above, the final full-precision output is FMAC = 153 ("10011001").
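The stated result FMAC = 153 ("10011001") is what one obtains by placing one 4-bit Output in the upper nibble and the other in the lower nibble. The sketch below assumes exactly that: the 4-bit shift and the choice of module 1 as the upper nibble are assumptions that reproduce the example, not a description of the actual combinational circuit, and the function name is hypothetical.

```python
def combine_outputs(out_1: int, out_2: int, shift: int = 4) -> int:
    """Concatenate two 4-bit quantized outputs into one 8-bit value.

    out_1 is assumed to occupy the upper nibble, out_2 the lower nibble.
    """
    for value in (out_1, out_2):
        if not 0 <= value <= 0b1111:
            raise ValueError("each output must fit in 4 bits")
    return (out_1 << shift) | out_2


fmac = combine_outputs(9, 9)
fmac_bits = format(fmac, "08b")
```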

In addition, the operation of the components above is controlled by the timing control circuit module, which sequences each part of the in-memory computing circuit structure so that it operates at the proper time. Specifically, the timing control circuit controls the high and low level transitions of each input signal and control signal.

Embodiment 2

This embodiment explains the principle of the analog-domain computation of Embodiment 1 for h = 2 and verifies it by simulation.

Figure 11 illustrates the result of multiplying a 2-bit input by a 2-bit weight in any pair of cascaded computing units of the multi-bit operation module, taking as an example the first pair, formed by the first cascaded computing unit 1 and the first cascaded computing unit 2.

The analog-domain computation of W[1:0]×IN[3:2] proceeds as follows:

When the weight W[1:0] = 00, that is, the weight value LW[1] of the left storage sub-array and the weight value RW[1] of the right storage sub-array are both "0", neither the low-order nor the high-order cascaded computing unit conducts, whatever the input IN[3:2] (00, 01, 10 or 11): no discharge current flows on the computing bit line CBL, and the capacitor C_CBL has no discharge path. Likewise, when the input IN[3:2] = 00, that is, V_GBL stays low throughout the computing cycle, neither unit conducts, whatever the weight W[1:0] (00, 01, 10 or 11), and C_CBL again has no discharge path.

In these seven cases in total, the voltage change on CBL is ΔV_CBL = 0, that is: W[1:0]×IN[3:2] = 00×00 = 00×01 = 00×10 = 00×11 = 01×00 = 10×00 = 11×00 = 0.

When W[1:0] = 01 and IN[3:2] = 01, that is, RW[1] = 0, LW[1] = 1 and V_GBL stays high for T_GBL = t within one computing cycle, the high-order cascaded computing unit does not conduct and the low-order cascaded computing unit conducts a current I. The computing bit line capacitor C_CBL discharges through the low-order cascaded computing unit, and the voltage on CBL changes by ΔV_CBL = I*t/C_CBL, that is, W[1:0]×IN[3:2] = 01×01 = 1.

When W[1:0] = 01 and IN[3:2] = 10, that is, RW[1] = 0, LW[1] = 1 and V_GBL stays high for T_GBL = 2t within one computing cycle, the high-order cascaded computing unit does not conduct and the low-order cascaded computing unit conducts a current I, so the current through the computing bit line CBL is I. The capacitor C_CBL discharges through the low-order cascaded computing unit, and the voltage on CBL changes by ΔV_CBL = I*2t/C_CBL = 2*(I*t/C_CBL), that is, W[1:0]×IN[3:2] = 01×10 = 2.

When W[1:0] = 01 and IN[3:2] = 11, that is, RW[1] = 0, LW[1] = 1 and V_GBL stays high for T_GBL = 3t within one computing cycle, the high-order cascaded computing unit does not conduct and the low-order cascaded computing unit conducts a current I, so the current through the computing bit line CBL is I. The capacitor C_CBL discharges through the low-order cascaded computing unit, and the voltage on CBL changes by ΔV_CBL = I*3t/C_CBL = 3*(I*t/C_CBL), that is, W[1:0]×IN[3:2] = 01×11 = 3.

When W[1:0] = 10 and IN[3:2] = 01, that is, RW[1] = 1, LW[1] = 0 and V_GBL stays high for T_GBL = t within one computing cycle, the low-order cascaded computing unit does not conduct and the high-order cascaded computing unit conducts a current 2I, so the current through the computing bit line CBL is 2I. The capacitor C_CBL discharges through the high-order cascaded computing unit, and the voltage on CBL changes by ΔV_CBL = 2I*t/C_CBL = 2*(I*t/C_CBL), that is, W[1:0]×IN[3:2] = 10×01 = 2.

When W[1:0] = 10 and IN[3:2] = 10, that is, RW[1] = 1, LW[1] = 0 and V_GBL stays high for T_GBL = 2t within one computing cycle, the low-order cascaded computing unit does not conduct and the high-order cascaded computing unit conducts a current 2I. The capacitor C_CBL discharges through the high-order cascaded computing unit, and the voltage on CBL changes by ΔV_CBL = 2I*2t/C_CBL = 4*(I*t/C_CBL), that is, W[1:0]×IN[3:2] = 10×10 = 4.

When W[1:0] = 10 and IN[3:2] = 11, that is, RW[1] = 1, LW[1] = 0 and V_GBL stays high for T_GBL = 3t within one computing cycle, the low-order cascaded computing unit does not conduct and the high-order cascaded computing unit conducts a current 2I, so the current through the computing bit line CBL is 2I. The capacitor C_CBL discharges through the high-order cascaded computing unit, and the voltage on CBL changes by ΔV_CBL = 2I*3t/C_CBL = 6*(I*t/C_CBL), that is, W[1:0]×IN[3:2] = 10×11 = 6.

When W[1:0] = 11 and IN[3:2] = 01, that is, RW[1] = 1, LW[1] = 1 and V_GBL stays high for T_GBL = t within one computing cycle, the low-order cascaded computing unit conducts a current I and the high-order cascaded computing unit conducts a current 2I, so the current through the computing bit line CBL is 3I. The capacitor C_CBL discharges through both cascaded computing units, and the voltage on CBL changes by ΔV_CBL = 3I*t/C_CBL = 3*(I*t/C_CBL), that is, W[1:0]×IN[3:2] = 11×01 = 3.

When W[1:0] = 11 and IN[3:2] = 10, that is, RW[1] = 1, LW[1] = 1 and V_GBL stays high for T_GBL = 2t within one computing cycle, the low-order cascaded computing unit conducts a current I and the high-order cascaded computing unit conducts a current 2I, so the current through the computing bit line CBL is 3I. The capacitor C_CBL discharges through both cascaded computing units, and the voltage on CBL changes by ΔV_CBL = 3I*2t/C_CBL = 6*(I*t/C_CBL), that is, W[1:0]×IN[3:2] = 11×10 = 6.

When W[1:0] = 11 and IN[3:2] = 11, that is, RW[1] = 1, LW[1] = 1 and V_GBL stays high for T_GBL = 3t within one computing cycle, the low-order cascaded computing unit conducts a current I and the high-order cascaded computing unit conducts a current 2I, so the current through the computing bit line CBL is 3I. The capacitor C_CBL discharges through both cascaded computing units, and the voltage on CBL changes by ΔV_CBL = 3I*3t/C_CBL = 9*(I*t/C_CBL), that is, W[1:0]×IN[3:2] = 11×11 = 9.
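The sixteen weight and input combinations walked through above can be enumerated with a short script. This is a sketch under the stated rules only (LW[1] contributes I, RW[1] contributes 2I, the input sets the pulse width in multiples of t); the function and variable names are hypothetical, and the voltage change is expressed in multiples of I*t/C_CBL.

```python
from typing import Dict, Tuple


def discharge_units(weight: int, group: int) -> Tuple[int, int, int]:
    """Return (current in units of I, duration in units of t, voltage change in units of I*t/C_CBL)."""
    lw = weight & 1            # low bit, read from the left sub-array
    rw = (weight >> 1) & 1     # high bit, read from the right sub-array
    current = lw * 1 + rw * 2  # the high-order unit conducts twice the current
    duration = group           # IN[3:2] = 00, 01, 10, 11 -> 0, t, 2t, 3t
    return current, duration, current * duration


table: Dict[Tuple[int, int], Tuple[int, int, int]] = {
    (w, x): discharge_units(w, x) for w in range(4) for x in range(4)
}
zero_cases = [key for key, value in table.items() if value[2] == 0]
distinct_results = sorted({value[2] for value in table.values()})

for (w, x), (current, duration, dv) in table.items():
    print(f"W={w:02b} IN={x:02b}  I={current}  T={duration}  dV={dv}")
```

The enumeration gives seven combinations with no discharge and seven distinct discharge levels, which agrees with the case analysis.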

The complete truth table is given in Table 1:

Table 1. Truth table of the multi-bit multiply-accumulate operation performed by the multi-bit operation module

From the analysis above, in the analog domain the product W[1:0]×IN[3:2] is represented by the discharge current, which is set by the weight, multiplied by the discharge time, which is set by the input.

It is easy to see that when IN[3:2] is "00", "01", "10" and "11", the corresponding discharge times are "0t", "1t", "2t" and "3t" respectively.

To show that when W[1:0] is "00", "01", "10" and "11" the discharge currents of the computing units are "0I", "1I", "2I" and "3I" respectively, a DC analysis is carried out on the equivalent circuit of Figure 6:

When N1[1] and N5[1] both operate in the saturation region and N2[1] and N6[1] both operate in the deep linear region, each branch can be modelled as an adjustable current source I in series with an adjustable linear resistor R_on. Let the width-to-length ratio of N1[1] be (W/L)₁, that of N5[1] be 2(W/L)₁, and that of N2[1] and N6[1] be (W/L)₂. Let the threshold voltage of N1[1] and N5[1] be V_TH1 and that of N2[1] and N6[1] be V_TH2. Let the voltage of LW[1] be V_LW1 and the voltage of RW[1] be V_RW1.

Here (W/L)₁ and (W/L)₂ denote two different width-to-length ratios.

Before the calculation starts, the calculation bit line capacitance C_CBL is precharged to V_PRE. When both the high-order and the low-order cascaded computing units are activated, the conduction current I₁ of the low-order cascaded computing unit is:

where μ_n and C_ox are process-dependent constants.

The conduction current I₂ of the high-order cascaded computing unit is:

The adjustable linear resistance R_on is:

The voltage V_LX of node LX (that is, X1[1]) is:

V_LX = I₁ * R_on (7)

The voltage V_RX of node RX (that is, X3[1]) is:

V_RX = I₂ * R_on (8)
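For orientation, the DC analysis above can be sketched under the textbook long-channel square-law model. This is an assumption for illustration: the exact forms of formulas (4) to (6) may differ, body effect and channel-length modulation are neglected, and V_G is a hypothetical symbol for the gate voltage of the transistor operating in the deep linear region.

```latex
\begin{aligned}
I_1 &\approx \tfrac{1}{2}\,\mu_n C_{ox}\left(\tfrac{W}{L}\right)_1
       \left(V_{LW1} - V_{LX} - V_{TH1}\right)^2 \\
I_2 &\approx \tfrac{1}{2}\,\mu_n C_{ox}\cdot 2\left(\tfrac{W}{L}\right)_1
       \left(V_{RW1} - V_{RX} - V_{TH1}\right)^2 \\
R_{on} &\approx \frac{1}{\mu_n C_{ox}\left(\tfrac{W}{L}\right)_2
       \left(V_{G} - V_{TH2}\right)}
\end{aligned}
```

Under this model, with V_LW1 = V_RW1 and V_LX = I₁ * R_on, V_RX = I₂ * R_on as in formulas (7) and (8), both source voltages tend to zero as R_on becomes small, so the ratio I₂/I₁ tends to the ratio of the width-to-length ratios, which is 2 here.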

When the resistance of the adjustable linear resistor R_on is sufficiently small, the effect of changes in V_LX and V_RX on I₁ and I₂ is negligible. That is, the conduction currents of the low-order and high-order cascaded computing units scale linearly with their width-to-length ratios, namely:

The discharge current of a pair of cascaded computing units equals the sum of the high-order cascaded computing unit current I₂ and the low-order cascaded computing unit current I₁. From formulas (9), (10), and (11):

When W[1:0] is "00", "01", "10", and "11", the total discharge current is, respectively: 0*I₁ + 0*I₂ = 0, 1*I₁ + 0*I₂ = I₁, 0*I₁ + 1*I₂ = I₂ = 2I₁, and 1*I₁ + 1*I₂ = 3I₁.

According to the voltage-current relation (VCR) of a linear capacitor:

This verifies formulas (1), (2), and (3).
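The relation between weight, input, and bit line voltage drop can be sketched in a few lines of Python. This is a minimal illustration, assuming an ideal constant discharge current and a linear capacitor; the function name and the unit values for current, time, and capacitance are hypothetical and chosen only for readability.

```python
def cbl_voltage_drop(weight: int, inp: int, unit_current: float,
                     unit_time: float, c_cbl: float) -> float:
    """Voltage drop on CBL for a 2-bit weight and a 2-bit input.

    The weight code scales the discharge current (0I..3I) and the input
    code scales the discharge time (0t..3t). A constant current I flowing
    for time T removes charge I*T from the capacitor, so dV = I*T / C.
    """
    if not (0 <= weight <= 3 and 0 <= inp <= 3):
        raise ValueError("weight and inp must be 2-bit codes (0..3)")
    current = weight * unit_current
    duration = inp * unit_time
    return current * duration / c_cbl


# Hypothetical unit values, not taken from the source.
I_UNIT = 10e-6   # discharge current per unit of weight, in amperes
T_UNIT = 1e-9    # discharge time per unit of input, in seconds
C_CBL = 1e-12    # calculation bit line load capacitance, in farads

# Drop caused by weight "01" and input "01": one unit step.
unit_dv = cbl_voltage_drop(1, 1, I_UNIT, T_UNIT, C_CBL)

for w, x in [(0b10, 0b01), (0b01, 0b10), (0b11, 0b11)]:
    dv = cbl_voltage_drop(w, x, I_UNIT, T_UNIT, C_CBL)
    print(f"W={w:02b} IN={x:02b}: {dv / unit_dv:.0f} unit steps")
```

Because the drop depends only on the product of current and time, swapping the weight and input codes leaves the result unchanged in this idealized model.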

In addition, Figures 12, 13, and 14 show Monte Carlo simulation results for the seven discharge outcomes of multiplying a 2-bit input by a 2-bit weight. Figure 12 verifies that the two cases W[1:0]×IN[3:2] = 10×01 and W[1:0]×IN[3:2] = 01×10 both discharge CBL by 2ΔV, and the Gaussian distributions from the Monte Carlo simulation show that the mean and the standard deviation of the CBL voltage after the discharge operation are approximately equal for the two cases. Similarly, Figure 13 verifies that the two cases W[1:0]×IN[3:2] = 01×11 and W[1:0]×IN[3:2] = 11×01 both discharge CBL by 3ΔV, and Figure 14 verifies that the two cases W[1:0]×IN[3:2] = 10×11 and W[1:0]×IN[3:2] = 11×10 both discharge CBL by 6ΔV. The multi-bit multiply-accumulate calculation performed by the multi-bit operation module in the analog domain is therefore reliable.
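One reading of the seven discharge outcomes is that a 2-bit by 2-bit product takes only seven distinct values. The short enumeration below illustrates this under that assumption; it works in integer multiples of ΔV and does not model the circuit or the Monte Carlo variation.

```python
from collections import defaultdict

# Group every (weight, input) pair of 2-bit codes by its product, i.e. by
# the number of unit steps (multiples of dV) it removes from CBL.
levels = defaultdict(list)
for w in range(4):
    for x in range(4):
        levels[w * x].append((f"{w:02b}", f"{x:02b}"))

distinct = sorted(levels)
print("distinct discharge levels:", distinct)

# The operand pairs compared in Figures 12, 13, and 14.
for product in (2, 3, 6):
    print(f"{product} dV:", levels[product])
```

Each of the levels 2ΔV, 3ΔV, and 6ΔV is reached by exactly two operand pairs, which are the pairs the figures compare.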

Embodiment 2 also simulates how the power consumption and energy efficiency of the in-memory computing circuit system disclosed in Embodiment 1 vary with the operating voltage. In Figure 15, the abscissa is the operating voltage, the left ordinate is the power consumption, and the right ordinate is the energy efficiency. As the operating voltage decreases, the power consumption decreases and the energy efficiency improves. The minimum operating voltage of the circuit is 0.5 V, at which the power consumption and energy efficiency are 43.21 μW and 84.39 TOPS/W, respectively, which meets the requirements.
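As a rough cross-check of these two figures, the throughput they imply can be computed by multiplying them. This sketch assumes TOPS/W means 10¹² operations per second per watt and that both figures refer to the same 0.5 V operating point; the function name is hypothetical and the result is derived arithmetic, not a value stated in the source.

```python
def implied_throughput_ops(power_w: float, efficiency_tops_per_w: float) -> float:
    """Operations per second implied by a power and an efficiency figure."""
    return efficiency_tops_per_w * 1e12 * power_w


# Figures quoted for the 0.5 V operating point.
ops = implied_throughput_ops(power_w=43.21e-6, efficiency_tops_per_w=84.39)
print(f"implied throughput: {ops / 1e9:.2f} GOPS")
```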

Embodiment 3

Embodiment 3 considers a more general case in order to verify the practicality of the design method: the width-to-length ratio of N5[k] in the high-order cascaded computing unit is set to h times that of N1[k] in the low-order cascaded computing unit, with all other conditions unchanged, so that the discharge current I_h produced when the high-order cascaded computing unit is activated is h times the discharge current I₁ produced when the low-order cascaded computing unit is activated.

With reference to Embodiment 2, formulas (5) and (8) are replaced by formulas (13) and (14):

The conduction current I_h of the high-order cascaded computing unit is:

The voltage V_RX of node RX (that is, X3[k]) is:

V_RX = I_h * R_on (14)

Substituting formula (14) into formula (13), and formula (7) into formula (4), gives:

Let:

Then formulas (15) and (16) can be rewritten as:

The problem then reduces to proving:

That is, it suffices to prove that formula (18) holds.

By L'Hôpital's rule:

Thus formula (18) is proved, and therefore I_h = h * I₁.
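The limiting behaviour can also be checked numerically. The sketch below assumes the textbook square-law model, in which each branch current satisfies I = k·(V_ov − I·R_on)², with k proportional to the width-to-length ratio and V_ov the gate overdrive when the source is grounded. The model, the function names, and the numeric values are illustrative assumptions and are not taken from formulas (13) to (18).

```python
import math


def degenerated_current(k: float, v_ov: float, r_on: float) -> float:
    """Solve I = k * (v_ov - I * r_on)**2 for the physical root.

    The physical root is the one with I * r_on < v_ov. It is written in
    conjugate form so that it stays accurate as r_on approaches 0.
    """
    a = 2.0 * k * v_ov * r_on + 1.0
    return 2.0 * k * v_ov ** 2 / (a + math.sqrt(4.0 * k * v_ov * r_on + 1.0))


def current_ratio(h: float, k: float, v_ov: float, r_on: float) -> float:
    """I_h / I_1 when the high-order transistor is h times wider."""
    return degenerated_current(h * k, v_ov, r_on) / degenerated_current(k, v_ov, r_on)


K = 1e-4     # square-law coefficient of the low-order branch in A/V^2 (hypothetical)
V_OV = 0.5   # gate overdrive in volts (hypothetical)
H = 4.0      # width ratio between high-order and low-order transistors

for r_on in (1e4, 1e3, 1e2, 1e1, 0.0):
    ratio = current_ratio(H, K, V_OV, r_on)
    print(f"R_on = {r_on:8.1f} ohm   I_h / I_1 = {ratio:.4f}")
```

In this model the ratio stays below h for any nonzero R_on and approaches h as R_on shrinks, which is consistent with the requirement that R_on be sufficiently small.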

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described. However, as long as a combination of these technical features involves no contradiction, it should be regarded as within the scope of this specification.

The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention patent shall be determined by the appended claims.

Claims

1. A multi-bit operation module for performing multi-bit multiply-accumulate calculation, characterized in that the multi-bit operation module comprises:

Bit-division calculation module one, which includes n cascaded calculation units one and n weight bit lines one LW[1]~LW[n];

Wherein, the kth cascaded calculation unit one includes:

NMOS transistor N1[k], its gate is connected to weight bit line one LW[k], its drain is connected to the calculation bit line CBL, and its source is connected to node one X1[k];

NMOS transistor N2[k], its gate is connected to weight bit line one LW[k], its drain is connected to the calculation bit line CBLB, and its source is connected to node two X2[k]; N1[k] and N2[k] have the same specifications;

NMOS transistor N3[k], its gate is connected to the global bit line GBL, its drain is connected to node one X1[k], and its source is connected to ground GND; and

NMOS transistor N4[k], its gate is connected to the global bit line GBLB, its drain is connected to node two X2[k], and its source is connected to ground GND; 1≤k≤n;

Bit-division calculation module two, which includes n cascaded calculation units two and n weight bit lines two RW[1]~RW[n];

Wherein, the kth cascaded calculation unit two includes:

NMOS transistor N5[k], its gate is connected to weight bit line two RW[k], its drain is connected to the calculation bit line CBL, and its source is connected to node three X3[k];

NMOS transistor N6[k], its gate is connected to weight bit line two RW[k], its drain is connected to the calculation bit line CBLB, and its source is connected to node four X4[k]; N5[k] and N6[k] have the same specifications;

NMOS transistor N7[k], its gate is connected to the global bit line GBL, its drain is connected to node three X3[k], and its source is connected to ground GND; and

NMOS transistor N8[k], its gate is connected to the global bit line GBLB, its drain is connected to node four X4[k], and its source is connected to ground GND; N7[k], N8[k], N3[k], and N4[k] have the same specifications, and the width-to-length ratio of N5[k] is h times the width-to-length ratio of N1[k];

Weight bit line two RW[k] and weight bit line one LW[k] are used to provide weight values; the global bit lines GBL and GBLB are used to provide multi-bit input values;

The multi-bit operation module gates columns of bit-division calculation module one and bit-division calculation module two to operate in parallel, receives the weight values and the multi-bit input values, and performs the multi-bit multiply-accumulate calculation; the calculation bit lines CBL and CBLB reflect the multi-bit multiply-accumulate result through their voltage variation.

2. An in-memory computing circuit structure, characterized in that it includes:

A storage array module, which is used to provide a standard read-write mode and a multi-bit multiply-accumulate calculation mode; the storage array module includes a storage section and a reference section;

A data selection module, which includes a column selection module and a row decoding module and is used to locate and access the corresponding storage cell in the storage section according to an external address signal in the standard read-write mode; the column selection module is also connected to a write drive circuit, which is used to control writing to the storage cell;

A sense amplifier module, which is used to compare the read current generated by the storage section with the reference current of the reference section to generate a conversion voltage, and to amplify the conversion voltage to obtain an output weight value; the sense amplifier module is also connected to a readout drive circuit, which is used to read out the output weight value during a read operation in the standard read-write mode;

A mode selection module, which is used to switch the storage array module between the standard read-write mode and the multi-bit multiply-accumulate calculation mode;

The multi-bit operation module as claimed in claim 1, which, in the multi-bit multiply-accumulate calculation mode, performs multi-bit multiply-accumulate calculation according to the weight values and the multi-bit input values; the multi-bit operation module is connected to an input register, which is used to feed the multi-bit input values into the multi-bit operation module through the global bit lines GBL and GBLB;

A quantization unit module, which is used to quantize the accumulated voltage variation of the calculation bit lines CBL and CBLB in the multi-bit multiply-accumulate calculation mode to obtain a quantized output; and

A timing control circuit module, which is used to control the timing of each part of the in-memory computing circuit structure so that each part operates accordingly.

3. The in-memory computing circuit structure according to claim 2, wherein the storage section comprises a left storage array and a right storage array;

The left storage array includes N columns and M rows of storage cells; wherein every j columns constitute a group of left sub-arrays, and the left storage array includes N/j groups of left sub-arrays, N=n*j;

The right storage array also includes N columns and M rows of storage cells; wherein every j columns constitute a group of right sub-arrays, and the right storage array includes N/j groups of right sub-arrays;

The reference section includes a left reference array and a right reference array; the left reference array includes N/j columns and M rows of left reference cells corresponding to the left storage array; wherein the kth column of left reference cells is set correspondingly to the kth group of left sub-arrays; 1≤k≤N/j;

The right reference array includes N/j columns and M rows of right reference cells corresponding to the right storage array; wherein the kth column of right reference cells is set correspondingly to the kth group of right sub-arrays.

4. The in-memory computing circuit structure according to claim 3, wherein the storage cell comprises:

NMOS transistor M1, its gate is connected to the word line WL, and its drain is connected to the source line SL; and

A magnetic tunnel junction device MTJ1, one end of which is electrically connected to the bit line BL, and the other end is electrically connected to the source of M1;

The left reference cell and the right reference cell have the same structure, including:

NMOS transistor M2, its gate is connected to the word line WL, and its drain is connected to the reference source line; and

A magnetic tunnel junction device MTJ2, one end of which is electrically connected to the reference bit line, and the other end is electrically connected to the source of M2;

Storage cells, left reference cells, and right reference cells in the same row share the same word line WL; storage cells in the same column share the same bit line BL and the same source line SL; left reference cells in the same column share the same reference bit line and the same reference source line; right reference cells in the same column share the same reference bit line and the same reference source line;

The kth reference bit line of the left reference array is used to output the reference current I_REF1[k], and the kth reference bit line of the right reference array is used to output the reference current I_REF2[k].

5. The in-memory computing circuit structure according to claim 4, wherein the column selection module comprises n column selectors one and n column selectors two; the n column selectors one and the n column selectors two share the same addressing signal CS;

Wherein, the kth column selector one is set correspondingly to the kth group of left sub-arrays, and the kth column selector two is set correspondingly to the kth group of right sub-arrays;

The bit lines BL of the kth group of left sub-arrays are connected to the input terminal of the kth column selector one, and the output terminal of the kth column selector one outputs the read current I_CELL1[k];

The bit lines BL of the kth group of right sub-arrays are connected to the input terminal of the kth column selector two, and the output terminal of the kth column selector two outputs the read current I_CELL2[k];

The row decoding module is connected to the word lines WL, and the M word lines WL share the same row decoding module.

6. The in-memory computing circuit structure according to claim 5, wherein the sense amplifier module comprises n sense amplifiers one and n sense amplifiers two;

The kth sense amplifier one is connected to the kth column selector one; the kth sense amplifier two is connected to the kth column selector two;

The kth sense amplifier one includes the kth current sampling unit one and the kth voltage amplifier one, which are used to sample and compare I_CELL1[k] and I_REF1[k], and to output DOUTL[k];

The kth sense amplifier two includes the kth current sampling unit two and the kth voltage amplifier two, which are used to sample and compare I_CELL2[k] and I_REF2[k], and to output DOUTR[k].

7. The in-memory computing circuit structure according to claim 6, wherein the kth current sampling unit one comprises:

The gate of the PMOS transistor PL1[k] is connected to the external enable signal SAEN, the source is connected to the power supply VDD, and the drain is connected to the first node NETL1[k];

PMOS transistor PL2[k], its gate and drain are connected to the first node NETL1[k], and its source is connected to the power supply VDD;

PMOS transistor PL3[k], its gate is connected to the first node NETL1[k], its source is connected to the power supply VDD, and its drain is connected to the first-stage output node SOL[k];

The gate of the PMOS transistor PL4[k] is connected to the second node NETL2[k], the source is connected to the power supply VDD, and the drain is connected to the first-stage output node SOBL[k];

PMOS transistor PL5[k], its gate and drain are connected to the second node NETL2[k], and its source is connected to the power supply VDD;

PMOS transistor PL6[k], its gate is connected to the external enable signal SAEN, its source is connected to the power supply VDD, and its drain is connected to the second node NETL2[k];

NMOS transistor NML1[k], its gate is connected to the clamping signal CLP, its source is connected to the read current I_CELL1[k], and its drain is connected to the first node NETL1[k];

The NMOS transistor NML2[k] has its gate connected to the first-stage output node SOBL[k], its source connected to the ground GND, and its drain connected to the first-stage output node SOL[k];

NMOS transistor NML3[k], its gate and drain are connected to the first-stage output node SOBL[k], and its source is connected to ground GND; and

NMOS transistor NML4[k], its gate is connected to the clamping signal CLP, its source is connected to the reference current I_REF1[k], and its drain is connected to the second node NETL2[k];

The kth voltage amplifier one includes:

PMOS transistor PL7[k], its gate and drain are connected to the third node NETL3[k], and its source is connected to the power supply VDD;

PMOS transistor PL8[k], its gate and drain are connected to the fourth node NETL4[k], and its source is connected to the power supply VDD;

The NMOS transistor NML5[k] has its gate connected to the first-stage output node SOL[k], its source connected to the fifth node NETL5[k], and its drain connected to the third node NETL3[k];

The NMOS transistor NML6[k] has its gate connected to the first-stage output node SOBL[k], its source connected to the fifth node NETL5[k], and its drain connected to the fourth node NETL4[k];

The NMOS transistor NML7[k] has its gate connected to the external enable signal SAEN, its source connected to the ground GND, and its drain connected to the fifth node NETL5[k]; and

Inverter INVL[k], its input terminal is connected to the fourth node NETL4[k], and its output signal is the weight value DOUTL[k], which is split into two paths: one is connected to the readout drive circuit, and the other is connected to weight bit line one LW[k].

8. The in-memory computing circuit structure according to claim 7, wherein the kth current sampling unit two comprises:

PMOS transistor PR1[k], its gate is connected to the external enable signal SAEN, its source is connected to the power supply VDD, and its drain is connected to the first node NETR1[k];

PMOS transistor PR2[k], its gate and drain are connected to the first node NETR1[k], and its source is connected to the power supply VDD;

PMOS transistor PR3[k], its gate is connected to the first node NETR1[k], its source is connected to the power supply VDD, and its drain is connected to the first-stage output node SOR[k];

PMOS transistor PR4[k], its gate is connected to the second node NETR2[k], its source is connected to the power supply VDD, and its drain is connected to the first-stage output node SOBR[k];

PMOS transistor PR5[k], its gate and drain are connected to the second node NETR2[k], and its source is connected to the power supply VDD;

PMOS transistor PR6[k], its gate is connected to the external enable signal SAEN, its source is connected to the power supply VDD, and its drain is connected to the second node NETR2[k];

NMOS transistor NMR1[k], its gate is connected to the clamping signal CLP, its source is connected to the read current I_CELL2[k], and its drain is connected to the first node NETR1[k];

The NMOS transistor NMR2[k] has its gate connected to the first-stage output node SOBR[k], its source connected to the ground GND, and its drain connected to the first-stage output node SOR[k];

NMOS transistor NMR3[k], its gate and drain are connected to the first-stage output node SOBR[k], and its source is connected to ground GND; and

NMOS transistor NMR4[k], its gate is connected to the clamping signal CLP, its source is connected to the reference current I_REF2[k], and its drain is connected to the second node NETR2[k];

The kth voltage amplifier two includes:

PMOS transistor PR8[k], its gate and drain are connected to the third node NETR3[k], and its source is connected to the power supply VDD;

PMOS transistor PR9[k], its gate and drain are connected to the fourth node NETR4[k], and its source is connected to the power supply VDD;

The NMOS transistor NMR5[k] has its gate connected to the first-stage output node SOR[k], its source connected to the fifth node NETR5[k], and its drain connected to the third node NETR3[k];

The NMOS transistor NMR6[k] has its gate connected to the first-stage output node SOBR[k], its source connected to the fifth node NETR5[k], and its drain connected to the fourth node NETR4[k];

The NMOS transistor NMR7[k] has its gate connected to the external enable signal SAEN, its source connected to the ground GND, and its drain connected to the fifth node NETR5[k];

Inverter INVR[k], its input terminal is connected to the fourth node NETR4[k], and its output signal is the weight value DOUTR[k], which is split into two paths: one is connected to the readout drive circuit, and the other is connected to weight bit line two RW[k].

9. The in-memory computing circuit structure according to claim 7, wherein the mode selection module selects a mode according to an external enable signal MEN;

When the external enable signal MEN is at a high level, the storage array module is in the standard read-write mode;

When the external enable signal MEN is at a low level, the storage array module is in the multi-bit multiply-accumulate calculation mode, the kth sense amplifier one is connected to the kth cascaded calculation unit one, and the kth sense amplifier two is connected to the kth cascaded calculation unit two.

10. The in-memory computing circuit structure according to claim 7, wherein the quantization unit module includes a capacitor array, a successive approximation logic control unit, and a voltage comparator;

The capacitor array includes capacitors C0, C1, C2, C3, and C4; the upper plates of the capacitors C0, C1, C2, C3, and C4 are all connected to the input node INP of the voltage comparator, and the lower plates of the capacitors C0, C1, C2, C3, and C4 are connected to the calculation bit line CBL/CBLB, the reference voltage VREF, and the power supply VDD through the control switches S[0], S[1], S[2], S[3], and S[4], respectively;

The successive approximation logic control unit uses successive approximation logic to generate the control signals S[4:0], which control the capacitor array, and to generate the voltage comparator enable signal EN, which controls the voltage comparator;

The input node INN of the voltage comparator is connected to the common-mode voltage VCM; when the control signal CE is turned on, the input nodes INP and INN are short-circuited; the voltage comparator enable signal EN turns on the voltage comparator, which compares the voltages of the input nodes INP and INN and generates the output Output.