CN117828253B - Multi-bit matrix vector multiplication calculation unit, array and working method thereof - Google Patents
Multi-bit matrix vector multiplication calculation unit, array and working method thereof
- Publication number
- CN117828253B
- Authority
- CN
- China
- Prior art keywords
- calculation
- module
- flash memory
- input
- voltage signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Read Only Memory (AREA)
Abstract
The present invention discloses a multi-bit matrix-vector multiplication calculation unit, an array, and a working method thereof. The unit comprises an analog voltage signal input module, a flash memory storage unit, a calculation capacitor module, a proportional time signal generation module, and a readout circuit module. The array comprises a plurality of the multi-bit matrix-vector multiplication calculation units arranged in parallel. The method comprises: clearing the charge on the calculation capacitor units; obtaining voltage signals of proportional time lengths; charging the cleared calculation capacitor module to obtain the digital weight; multiplying the analog input voltage signal by the digital weight; and outputting the calculation result as a digital signal. The present invention can increase the calculation rate of a flash memory device and reduce its calculation energy consumption. As a multi-bit matrix-vector multiplication calculation unit, array, and working method thereof, the present invention can be widely applied in the field of flash memory chip technology.
Description
Technical Field
The present invention relates to the technical field of flash memory chips, and in particular to a multi-bit matrix-vector multiplication calculation unit, an array, and a working method thereof.
Background Art
With the availability of big data and improvements in processor performance, high-performance neural network algorithms have developed rapidly, especially those based on dot-product calculations. These algorithms implement functions such as image classification and recognition and machine translation, and are widely deployed in applications such as self-driving cars, virtual reality, social media, and medical devices. As the complexity and computational workload of such networks grow, an efficient neural accelerator is becoming an urgent and critical need in many applications. From a computing perspective, neural network processing can be divided into two main stages, training and inference, which have different computing requirements. Training, usually performed on powerful servers, requires large amounts of computing power and memory to store huge training data sets, calculate outputs, and run the training algorithm that computes weight updates. End-user inference, by contrast, needs only enough memory and processing power to feed input data into a pre-trained network and calculate the result within a specific application. In a wide range of applications, such as the rapidly growing "Internet of Things", it is desirable to perform inference on the terminal device rather than on a cloud server. Inference on the terminal device reduces processing latency and communication costs, improves security, and removes the dependence on reliable network access. However, most end-user devices have strict power budgets. Running neural network inference on terminal devices today still suffers from high power consumption and heat dissipation problems, because the terminal processor has a small memory capacity and because computation and storage are performed separately by the central processing unit and the memory. Neural network applications also require a large number of memory access operations, which account for a large share of the power consumption and latency of the calculation and limit the performance improvement of the processor.
Summary of the Invention
In order to solve the above technical problems, the purpose of the present invention is to provide a multi-bit matrix-vector multiplication calculation unit, an array, and a working method thereof, which can increase the calculation rate of flash memory devices and reduce their calculation energy consumption.
The first technical solution adopted by the present invention is: a multi-bit matrix-vector multiplication calculation unit, comprising an analog voltage signal input module, a flash memory storage unit, a calculation capacitor module, a proportional time signal generation module, and a readout circuit module, wherein the output terminal of the analog voltage signal input module is connected to the drain terminal of the flash memory storage unit, the output terminal of the proportional time signal generation module is connected to the gate terminal of the flash memory storage unit, the source terminal of the flash memory storage unit is connected to the input terminal of the calculation capacitor module, and the output terminal of the calculation capacitor module is connected to the input terminal of the readout circuit module, wherein:
The analog voltage signal input module is used to generate an analog input voltage signal and apply it to the drain terminal of the flash memory storage unit;
The flash memory storage unit is used to store the digital weight;
The calculation capacitor module is used to multiply the analog input voltage signal by the digital weight and output the calculation result as a voltage signal;
The proportional time signal generation module is used to generate voltage signals of proportional time lengths and control the conduction time of the flash memory storage unit;
The readout circuit module is used to convert the calculation result from a voltage signal into a digital signal and output it.
Further, the analog voltage signal input module is implemented in either of the following two ways:
The first construction method: construction with a digital-to-analog converter (DAC);
The second construction method: construction with a sensor electrically connected to an amplifier.
Further, the flash memory storage unit includes at least one flash memory device, and each flash memory device is used to store 1 bit of data.
Further, the calculation capacitor module includes a plurality of calculation capacitor units, connection switches, and grounding switches; each calculation capacitor unit is electrically connected to a connection switch and to a grounding switch, wherein:
The calculation capacitor unit is used to multiply the analog input voltage signal by the digital weight;
The connection switch is used to connect the calculation capacitor units to one another;
The grounding switch is used to clear the charge on the calculation capacitor unit.
Further, the proportional time signal generation module includes a clock generator, a first digital delay unit, a second digital delay unit, a first two-input XOR gate, a second two-input XOR gate, and a third two-input XOR gate. The first output terminal of the clock generator is connected to the first input terminal of each of the first, second, and third XOR gates; the second output terminal of the clock generator is connected to the input terminal of the first digital delay unit; the output terminal of the first digital delay unit is connected to the second input terminal of the first XOR gate; the output terminal of the second digital delay unit is connected to the second input terminal of the second XOR gate; and the input terminal of the second digital delay unit is connected to the second input terminal of the third XOR gate, wherein:
The clock generator is used to generate the calculation clock signal;
The first digital delay unit and the second digital delay unit are used to delay the calculation clock signal to obtain a delayed calculation clock signal;
The first, second, and third XOR gates are used to XOR the calculation clock signal with the delayed calculation clock signal to obtain voltage signals of proportional time lengths.
Further, the readout circuit module is an analog-to-digital converter.
The second technical solution adopted by the present invention is: a multi-bit matrix-vector multiplication calculation array, including a plurality of the multi-bit matrix-vector multiplication calculation units arranged in parallel.
The third technical solution adopted by the present invention is: a working method of the multi-bit matrix-vector multiplication calculation array, including an independent calculation capacitor working mode and a shared calculation capacitor working mode.
Further, the independent calculation capacitor working mode specifically includes the following steps:
Closing the connection switches and the grounding switches to clear the calculation capacitor units, obtaining a calculation capacitor module with its charge cleared;
Opening the connection switches and the grounding switches, obtaining the analog input voltage signal from the analog voltage signal input module, and obtaining the voltage signals of proportional time lengths from the proportional time signal generation module;
Applying the analog input voltage signal to the drain terminal of the flash memory storage unit and the voltage signals of proportional time lengths to the gate terminal of the flash memory storage unit, and charging the cleared calculation capacitor module to obtain the digital weight;
Closing the connection switches while keeping the grounding switches open, multiplying the analog input voltage signal by the digital weight through the calculation capacitor module, and outputting the calculation result as a voltage signal;
Converting the calculation result from a voltage signal into a digital signal through the readout circuit module and outputting it.
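The switch states in the steps above can be summarized as a small table. The following Python sketch is only an illustration: the `Phase` type, the phase names, and the field names are hypothetical, and the switch states during readout are an assumption (the text does not state them explicitly, so the sharing state is assumed to be held).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    connect_closed: bool      # connection switches between calculation capacitors
    ground_closed: bool       # grounding switches that clear the capacitors
    gate_pulses_active: bool  # proportional time signals applied to the gates


# Phase order as described for the independent calculation capacitor mode.
INDEPENDENT_MODE = (
    Phase("reset", connect_closed=True, ground_closed=True, gate_pulses_active=False),
    Phase("charge", connect_closed=False, ground_closed=False, gate_pulses_active=True),
    Phase("share", connect_closed=True, ground_closed=False, gate_pulses_active=False),
    # Assumption: readout keeps the sharing state so the ADC sees the shared voltage.
    Phase("readout", connect_closed=True, ground_closed=False, gate_pulses_active=False),
)

for phase in INDEPENDENT_MODE:
    print(phase)
```

Listing the phases this way makes one property easy to check: the grounding switches are closed only during reset, so charge placed on the capacitors during charging is never discharged before readout.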
Further, in the shared calculation capacitor working mode, at least one page module is constructed. Each page module includes at least one flash memory storage unit and at least one calculation capacitor module, with the flash memory storage units and the calculation capacitor modules connected in one-to-one correspondence. While one page module is in the working state, the connection switches of all other page modules are open, and the gate terminals of their flash memory storage units are not driven with the voltage signals of proportional time lengths.
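The mutual-exclusion rule between page modules can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `page_controls` and the dictionary keys are hypothetical, and the sketch models only which pages may close their connection switches and receive gate pulses, not the analog behavior.

```python
def page_controls(active_page: int, n_pages: int) -> list[dict]:
    """Return per-page control flags when exactly one page module is working."""
    if not 0 <= active_page < n_pages:
        raise ValueError("active_page must index an existing page module")
    controls = []
    for page in range(n_pages):
        is_active = page == active_page
        controls.append({
            # Inactive pages keep their connection switches open.
            "connect_switches_may_close": is_active,
            # Inactive pages receive no proportional time signal on their gates.
            "gate_pulses_enabled": is_active,
        })
    return controls


result = page_controls(active_page=1, n_pages=3)
print(result)
```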
The beneficial effects of the calculation unit, array, and working method of the present invention are as follows. The multi-bit matrix-vector multiplication calculation unit is built by combining an analog voltage signal input module, a flash memory storage unit, a calculation capacitor module, a proportional time signal generation module, and a readout circuit module. Calculation is performed in the charge domain by the calculation capacitor module, which carries out the multiplication in stages and thereby further reduces the operating energy consumption of the unit. Charge-based calculation improves the reliability of the result. Storing the digital weights in the flash memory storage unit gives the in-memory calculation unit large-scale parallelism, improving calculation energy efficiency and speed.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a multi-bit matrix-vector multiplication calculation unit of the present invention;
FIG. 2 is a block diagram of the working method of a multi-bit matrix-vector multiplication calculation array of the present invention;
FIG. 3 is a schematic structural diagram of the proportional time signal generation module of a specific embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a complete single multiplication calculation of a specific embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the array in the independent capacitor calculation mode of a specific embodiment of the present invention;
FIG. 6 is a schematic structural diagram of the array in the shared capacitor calculation mode of a specific embodiment of the present invention;
FIG. 7 is a schematic diagram of a conventional NAND flash memory storage architecture provided by a specific embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The step numbers in the following embodiments are provided only for convenience of explanation and do not limit the order of the steps; the execution order of the steps in the embodiments can be adjusted according to the understanding of those skilled in the art.
Referring to FIG. 1, the present invention provides a multi-bit matrix-vector multiplication calculation unit, including an analog voltage signal input module, a flash memory storage unit, a calculation capacitor module, a proportional time signal generation module, and a readout circuit module. The output terminal of the analog voltage signal input module is connected to the drain terminal of the flash memory storage unit, the output terminal of the proportional time signal generation module is connected to the gate terminal of the flash memory storage unit, the source terminal of the flash memory storage unit is connected to the input terminal of the calculation capacitor module, and the output terminal of the calculation capacitor module is connected to the input terminal of the readout circuit module, wherein:
The analog voltage signal input module is used to generate an analog input voltage signal and apply it to the drain terminal of the flash memory storage unit;
Specifically, the analog voltage signal input module can be built in two ways: the first is construction with a digital-to-analog converter (DAC), and the second is construction with a sensor electrically connected to an amplifier.
In this embodiment, the analog voltage signal input module has two schemes. In the sensor-plus-amplifier scheme, the analog signal from the sensor is applied directly to the drain terminal of the FLASH device as the analog voltage. In the DAC scheme, a digital signal of arbitrary bit width that the sensor has already processed is converted into an analog voltage and applied to the drain terminal of the FLASH device as the analog voltage for the multiplication.
The flash memory storage unit is used to store the digital weight;
Specifically, the flash memory storage unit includes at least one flash memory device, and each flash memory device is used to store 1 bit of data.
In this embodiment, the flash memory storage unit is used to complete the multiplication of the analog input by the digital weight; the calculation capacitors are of equal size, and the multiplication result is represented by the voltage after the charge is shared equally.
The flash memory storage unit may contain a different number of flash memory devices depending on the bit width of the digital value, with each flash memory device configured to store 1 bit of data. The flash memory may be of the NOR type or of the NAND type, as shown in FIG. 7; more generally, the storage cell may be any storage device that can store a 0 or a 1 as a conducting or blocking state.
Further, an N-bit flash memory unit can be composed of N flash memory devices. In the flash memory unit, the flash memory device with the highest positional weight, that is, the device at the position of weight 2^N, has its drain terminal connected to the input analog voltage signal, its gate terminal connected to a voltage signal lasting 2^N times the unit duration, and its source terminal connected to the upper plate of the corresponding calculation capacitor. The flash memory device at the position of weight 2^(N-1) has its drain terminal connected to the same analog voltage signal as the other flash memory devices, its gate terminal connected to a voltage signal lasting 2^(N-1) times the unit duration, and its source terminal connected to the upper plate of the corresponding calculation capacitor. The remaining flash memory devices are connected to the input voltage, the proportional-duration signals, and the calculation capacitors in the same way as these two devices.
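The mapping from a multi-bit weight to per-device stored bits and gate pulse durations can be sketched as follows. This is an illustration only: the function name `weight_to_devices` and the parameter `unit_time` are hypothetical, and the sketch assumes the common binary convention in which the N devices carry weights 2^0 through 2^(N-1), listed least significant bit first.

```python
def weight_to_devices(weight: int, n_bits: int, unit_time: float = 1.0):
    """Return one (stored_bit, gate_pulse_duration) pair per flash device, LSB first."""
    if not 0 <= weight < 2 ** n_bits:
        raise ValueError("weight does not fit in n_bits")
    devices = []
    for i in range(n_bits):
        stored_bit = (weight >> i) & 1     # 1 = conducting cell, 0 = blocking cell
        duration = (2 ** i) * unit_time    # gate pulse length scales with bit weight
        devices.append((stored_bit, duration))
    return devices


devices = weight_to_devices(5, 3)
print(devices)  # [(1, 1.0), (0, 2.0), (1, 4.0)]
```

For the weight 5 (binary 101), the devices at positions 0 and 2 conduct, with gate pulses lasting 1 and 4 unit durations respectively, while the device at position 1 blocks regardless of its 2-unit pulse.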
The calculation capacitor module is used to multiply the analog input voltage signal by the digital weight and output the calculation result as a voltage signal;
Specifically, the calculation capacitor module includes a plurality of calculation capacitor units, connection switches, and grounding switches. Each calculation capacitor unit is electrically connected to a connection switch and to a grounding switch. The calculation capacitor unit is used to multiply the analog input voltage signal by the digital weight; the connection switch is used to connect the calculation capacitor units; and the grounding switch is used to clear the charge on the calculation capacitor unit.
In this embodiment, the calculation capacitor module includes calculation capacitors, connection switches, and grounding switches. The calculation capacitors complete the multiplication of the analog input by the digital weight; they are of equal size, and the multiplication result is represented by the voltage after the charge is shared equally. The connection switches work with the grounding switches to clear the charge before the calculation, and connect the equal-sized calculation capacitors during the calculation stage. The grounding switches work with the connection switches during the clearing stage to remove the charge on the capacitors, which improves calculation accuracy and reduces error;
The number of calculation capacitors equals the bit width of multiplier 2 (the stored digital weight). The upper plates of the calculation capacitors are connected to the input terminals of the transmission gates that serve as the connection switches, and the output terminals of the connection switches are connected to the readout circuit module.
Further, as shown in FIG. 4, the calculation capacitor module includes calculation capacitors, connection switches, and grounding switches. The number of calculation capacitors equals the bit width of multiplier 2. The upper plates of the calculation capacitors are connected to the input terminals of the transmission gates and also to the source terminals of the flash memory devices, and the output terminals of the connection switches are connected to the readout circuit module. Multiplier 1, that is, the external input analog voltage signal, is applied to the drain terminals of the flash memory devices. If the value stored in a flash memory device is 1 and the signal applied to its gate is within the high level of the proportional time signal, the drain voltage charges the corresponding calculation capacitor for the duration of the proportional time signal, and the charge on the calculation capacitor increases in proportion to the length of the time signal. If the value stored in the flash memory device is 0, or the proportional time signal at the gate is low, the charging path is blocked, the calculation capacitor is not charged, and its charge remains 0. After the charging stage ends, the proportional time signal is low, and the external input analog voltage signal may be either maintained or disconnected. The grounding switches are then open and the connection switches closed, so the charge on the calculation capacitors redistributes; all calculation capacitors and wires are physically connected. The calculation capacitors are connected directly to the readout circuit, which converts the analog voltage of the calculation result into a digital signal.
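The charge-then-share behavior described above can be illustrated with an idealized numerical model. The sketch below is not the patent's circuit model: it assumes that a conducting cell delivers a current proportional to the drain voltage (a hypothetical conductance `g`) and that charging stays in the linear regime, so the deposited charge is `g * v_in * t_on`. Real RC charging is exponential, so this holds only for pulses much shorter than the RC time constant. All parameter names and default values are illustrative.

```python
def multiply_by_charge(v_in, weight_bits, unit_time=1e-9, g=1e-6, c=1e-12):
    """Idealized single multiplication: returns the shared capacitor voltage.

    weight_bits lists the stored bits, least significant bit first.
    """
    charges = []
    for i, bit in enumerate(weight_bits):
        t_on = (2 ** i) * unit_time    # proportional gate pulse for bit position i
        q = bit * g * v_in * t_on      # a blocking cell (bit 0) deposits no charge
        charges.append(q)
    # Closing the connection switches shares the total charge over equal capacitors.
    return sum(charges) / (len(weight_bits) * c)


v_out = multiply_by_charge(0.5, [1, 0, 1])  # weight 5, input 0.5 V
v_lsb = multiply_by_charge(0.5, [1, 0, 0])  # weight 1, same input
print(v_out, v_lsb)
```

Under these assumptions the shared voltage is proportional to the product of the input voltage and the stored weight, which is why `v_out` is five times `v_lsb` in this example.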
The proportional time signal generation module is used to generate voltage signals of proportional time lengths and control the conduction time of the flash memory storage unit;
Specifically, the proportional time signal generation module includes a clock generator, a first digital delay unit, a second digital delay unit, a first two-input XOR gate, a second two-input XOR gate, and a third two-input XOR gate. The first output terminal of the clock generator is connected to the first input terminal of each of the first, second, and third XOR gates; the second output terminal of the clock generator is connected to the input terminal of the first digital delay unit; the output terminal of the first digital delay unit is connected to the second input terminal of the first XOR gate; the output terminal of the second digital delay unit is connected to the second input terminal of the second XOR gate; and the input terminal of the second digital delay unit is connected to the second input terminal of the third XOR gate. The clock generator is used to generate the calculation clock signal; the first and second digital delay units are used to delay the calculation clock signal to obtain a delayed calculation clock signal; and the first, second, and third XOR gates are used to XOR the calculation clock signal with the delayed calculation clock signal to obtain voltage signals of proportional time lengths.
In this embodiment, the proportional time generation module is used to generate time lengths in ratios such as 1:2:4. The time signals are applied to the gate terminals of the FLASH devices and control the conduction time of the corresponding FLASH devices, thereby realizing the positional weights 2^0, 2^1, 2^2, up to 2^N of the digital signal in the multiplication.
As shown in FIG. 3, the proportional time signal generation module consists of a clock generator, standard digital delay units, and two-input XOR gates. The clock generator produces the clock for the entire calculation module. The standard digital delay units delay the input time signal; to obtain proportional times, proportional numbers of delay units are inserted. Each XOR gate XORs the clock signal with the delayed signal and produces a high-level signal of proportional length, that is, the proportional time signal;
It should be noted that different manufacturers provide digital delay units of different standards. A suitable delay length can be chosen according to the capacitance of the calculation capacitor module, and a suitable digital delay unit selected accordingly; the digital delay unit suited to the specific implementation is the standard digital delay unit.
Further, the signal output terminal of the clock generator is connected to one input terminal of each of the three two-input XOR gates. The clock generator signal is also connected to the input terminal of the first standard digital delay unit, whose output terminal is connected to the other input terminal of the first two-input XOR gate; this gate produces a voltage signal of unit duration that is applied to the gate terminal of a flash memory device. After the clock generator signal passes through two standard digital delay units, the output terminal of the second delay unit is connected to one input terminal of the second two-input XOR gate, whose output is a voltage signal of twice the unit duration. After the clock signal passes through four standard digital delay units, it is connected to one input terminal of the third two-input XOR gate, whose output is a voltage signal of four times the unit duration. Continuing in the same way produces a voltage signal of 2^N times the unit duration. The outputs of the proportional time signal generation module are connected to the gates of the corresponding flash memory devices of the flash memory storage unit.
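The delay-and-XOR principle can be checked with a small discrete-time simulation. The sketch below is an illustration under stated assumptions: each standard delay unit is modeled as one time step, the clock is modeled as a single rising edge inside the simulated window (a periodic clock would also produce a pulse at each falling edge), and the function name `xor_pulse` and its parameters are hypothetical.

```python
def xor_pulse(n_delays: int, total_steps: int = 16, edge_at: int = 2) -> list[int]:
    """XOR a clock edge with a copy delayed by n_delays unit delays."""
    clock = [1 if t >= edge_at else 0 for t in range(total_steps)]
    # Shift the clock right by n_delays steps; the delay line outputs 0 until filled.
    delayed = [0] * n_delays + clock[: total_steps - n_delays]
    return [a ^ b for a, b in zip(clock, delayed)]


# Pulse widths for 1, 2, and 4 delay units, measured in time steps.
widths = [sum(xor_pulse(k)) for k in (1, 2, 4)]
print(widths)  # [1, 2, 4]
```

The XOR output is high only while the clock has switched and its delayed copy has not, so the pulse width equals the inserted delay. Inserting 1, 2, and 4 delay units therefore gives the 1:2:4 ratio described above.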
The readout circuit module converts the calculation result from voltage-signal form into digital-signal form and outputs it.
Specifically, the readout circuit module is an analog-to-digital converter;
In this embodiment, the readout circuit converts the voltage signal carrying the completed multiplication result into a digital signal for subsequent processing.
A multi-bit matrix-vector multiplication calculation array comprises a plurality of multi-bit matrix-vector multiplication calculation units arranged in parallel.
Specifically, the array is made up of multiple multiplication units, and the size of the array, i.e., the number of matrix multiplication units, can be set according to the storage scale.
The multi-bit matrix-vector multiplication calculation array combines storage devices, calculation capacitors, and proportional timing to perform computation inside flash memory, which can to some extent mitigate the speed, power consumption, and low-reliability problems of current-mode in-memory computing.
In summary, embodiments of the present invention provide a flash-memory-based multi-bit matrix-vector multiplication calculation unit and array, in which the multi-bit in-memory calculation unit performs matrix-vector computation between the input data and the data stored in the storage units. In this unit, a digital-to-analog converter converts the multi-bit digital input into an analog voltage signal, or a sensor supplies an analog voltage directly, and this voltage is applied to the drain of the floating-gate transistor of the storage unit. Each flash memory storage unit stores a single bit, and the holding times of the voltages applied to the gates of the storage units form a fixed-ratio sequence generated by a digital module, so that several flash memory storage units together store one multi-bit digital value. The magnitude of the analog voltage at the drain and the holding time of the gate voltage together determine the charge on the equal-sized capacitors that correspond one-to-one with the storage units. The voltage obtained after the charge is shared equally among these capacitors is the product of a single multiplication; when multiple multiplication units are connected, the voltage after sharing across all of them is the matrix-vector result, which the readout circuit converts and reads out. Large-scale parallel matrix-vector multiplication can thus be realized on the basic architecture already present in flash memory, and the calculation array comprises multiple matrix-vector multiplication calculation units.
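The relationship summarized above can be written out under a simple idealized model. The text gives no device equations, so the following is only a sketch under assumptions: each conducting cell is taken to deliver charge proportional to the drain voltage and to the gate holding time (with a hypothetical proportionality constant $k$), every capacitor has the same capacitance $C$, the unit duration is $T$, and $b_i \in \{0,1\}$ is the bit stored in the cell whose gate is held for $2^i T$.

```latex
% Single multiplication: n one-bit cells store the weight W = \sum_i b_i 2^i.
Q_i = b_i \, k \, V_{\mathrm{in}} \, 2^{i} T, \qquad i = 0, \dots, n-1
% Equal sharing over n capacitors of capacitance C:
V_{\mathrm{out}} = \frac{1}{nC} \sum_{i=0}^{n-1} Q_i
                 = \frac{kT}{nC} \, V_{\mathrm{in}} \sum_{i=0}^{n-1} b_i 2^{i}
                 = \frac{kT}{nC} \, V_{\mathrm{in}} W
% M connected multiplication units (inputs V_{\mathrm{in},j}, weights W_j):
V_{\mathrm{shared}} = \frac{kT}{MnC} \sum_{j=1}^{M} V_{\mathrm{in},j} W_j
```

Under these assumptions the shared voltage is proportional to the dot product of the input voltages and the stored weights, scaled by a fixed factor that the readout circuit can absorb.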
Referring to FIG. 2, a working method of the multi-bit matrix-vector multiplication calculation array includes an independent calculation capacitor working mode and a shared calculation capacitor working mode, wherein:
As shown in FIG. 5, the independent calculation capacitor working mode includes the following steps:
S1. Close the connection switch and the grounding switch to clear the calculation capacitor module, obtaining a calculation capacitor module with its charge removed;
S2. Open the connection switch and the grounding switch, obtain the analog input voltage signal from the analog voltage signal input module, and obtain the voltage signals of proportional duration from the proportional time signal generation module;
S3. Apply the analog input voltage signal to the drain of the flash memory storage unit and the voltage signal of proportional duration to its gate, charging the cleared calculation capacitor module and thereby obtaining the digital weight;
S4. Close the connection switch while keeping the grounding switch open, so that the calculation capacitor module multiplies the analog input voltage signal by the digital weight and outputs the calculation result in the form of a voltage signal;
S5. Convert the calculation result in voltage-signal form through the readout circuit module and output the calculation result in digital-signal form.
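Steps S1 to S5 amount to a fixed sequence of switch settings around one charge-and-share cycle, which can be sketched as follows. This is an illustrative model only: the linear charging law (charge proportional to input voltage times pulse duration), the ideal rounding ADC, and every name in the code (`MultiplyUnit`, `gain`, `full_scale`, `n_levels`) are assumptions that do not come from the text.

```python
class MultiplyUnit:
    """Toy model of one multiplication unit: one capacitor per stored weight bit."""

    def __init__(self, weight_bits, cap=1.0):
        self.weight_bits = list(weight_bits)  # least significant bit first
        self.cap = cap                        # equal capacitance of every capacitor
        self.charges = [0.0] * len(self.weight_bits)
        self.connect_closed = False
        self.ground_closed = False

    def s1_clear(self):
        """S1: close both switches so every capacitor is discharged to ground."""
        self.connect_closed = self.ground_closed = True
        self.charges = [0.0] * len(self.charges)

    def s2_s3_charge(self, v_in, unit_time=1.0, gain=1.0):
        """S2/S3: open both switches, then charge each capacitor in isolation.

        Bit i conducts for 2**i unit durations only if it stores a 1
        (assumed linear law: charge = gain * v_in * duration).
        """
        self.connect_closed = self.ground_closed = False
        for i, bit in enumerate(self.weight_bits):
            self.charges[i] += bit * gain * v_in * (2 ** i) * unit_time

    def s4_share(self):
        """S4: close the connection switch only; the charge equalizes."""
        self.connect_closed, self.ground_closed = True, False
        total, n = sum(self.charges), len(self.charges)
        self.charges = [total / n] * n
        return total / (n * self.cap)         # shared output voltage

    @staticmethod
    def s5_read(voltage, full_scale, n_levels):
        """S5: ideal ADC that rounds to the nearest of n_levels codes."""
        code = int(voltage / full_scale * (n_levels - 1) + 0.5)
        return max(0, min(n_levels - 1, code))


if __name__ == "__main__":
    unit = MultiplyUnit([1, 1, 0])            # stored weight 3, LSB first
    unit.s1_clear()
    unit.s2_s3_charge(v_in=1.0)
    v_out = unit.s4_share()
    print(v_out, MultiplyUnit.s5_read(v_out, full_scale=7.0 / 3.0, n_levels=8))
```

The grounding switch is closed only during S1, so the charge placed on the capacitors in S3 is conserved through the sharing step and the shared voltage depends only on the total charge.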
In the independent calculation capacitor mode, the analog input voltage signal is connected to the drain of the flash memory device, the source of the flash memory device is connected to the upper plate of the calculation capacitor, the upper plate is connected to a switch, and the lower plate is grounded; closing the switches shares the charge equally among the capacitors. As in the multiplication unit described above, each flash memory device is connected to one calculation capacitor. Connecting N×N calculation units implements an N×N matrix-vector multiplication, so an N×N matrix-vector multiplication unit contains N×N flash memory devices and their corresponding calculation capacitors. Once multiple multiplication units are connected, the charge is shared equally among their calculation capacitors, which implements the matrix-vector multiplication. For readout, each matrix-vector multiplication module (i.e., N×N×(bit width of the second multiplier) flash memory devices) is connected to one analog-to-digital conversion module. Multiple matrix-vector multiplication modules can operate simultaneously, and they share the analog voltage signal output by the sensor and amplifier or by the digital-to-analog converter.
As shown in FIG. 6, for the shared calculation capacitor working mode, at least one page module is constructed. Each page module includes at least one flash memory storage unit and at least one calculation capacitor module, connected in one-to-one correspondence. While one page module is in the working state, the connection switches of all other page modules are open and the gates of their flash memory storage units are not driven with the voltage signal of proportional duration.
Further, in the charge clearing phase, operation switches from the first group of pages to the second group: the sharing switches of the first group are opened, and the sharing switches and grounding switch of the second group are closed, so that the large amount of charge left over from the previous page's calculation is drained to ground. In the charging phase of the calculation, the switches connecting the three calculation capacitors to one another and to ground are opened, the analog voltage signal is applied to the drain of the flash memory device, and the output voltage of the proportional time generation circuit is applied to the gate of the flash memory device, charging the calculation capacitors. In the sharing phase of the calculation, the switches connecting the three calculation capacitors are closed and the grounding switch is opened, so that within one long clock cycle the charge on the calculation capacitors is shared equally and the result voltage is obtained. In the result readout phase, once the shared voltage from the sharing phase has settled, the analog-to-digital conversion module is enabled to convert the analog voltage signal into a digital output.
Specifically, the analog input voltage signal is connected to the drain of the flash memory device, the source of the flash memory device is connected to the upper plate of the calculation capacitor, the upper plate is connected to a switch, and the lower plate is grounded; closing the switches shares the charge equally. As in the multiplication unit described above, each flash memory device is connected to one calculation capacitor; connecting N×N calculation units implements an N×N matrix-vector multiplication, and an N×N matrix-vector multiplication unit contains N×N flash memory devices and their corresponding calculation capacitors. In the shared calculation capacitor mode, following the page concept of the flash storage architecture, the flash memory devices on a page are connected one-to-one to the calculation capacitors, but multiple pages share a single group of calculation capacitors. Matrix-vector multiplication is then carried out by time-division multiplexing: while one page is working, the connection switches of all other pages are open and the gates of their flash memory devices are not driven, so only the page performing the calculation is connected to the upper plates of the calculation capacitors.
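The time-division multiplexing of pages over one group of capacitors can be sketched as a loop that serves one page per cycle. As before, this is a hedged illustration rather than the patented implementation: the linear charging law, the representation of each page as a single integer weight, and the names `run_shared_capacitor_mode`, `page_weights`, and `gain` are all assumptions introduced here.

```python
def run_shared_capacitor_mode(page_weights, v_in, n_bits,
                              unit_time=1.0, gain=1.0, cap=1.0):
    """Serve each page in turn using one shared group of n_bits capacitors.

    page_weights: one integer weight per page (hypothetical representation).
    Returns the shared output voltage obtained for each page.
    """
    shared_caps = [0.0] * n_bits
    results = []
    for weight in page_weights:
        # Clearing phase: ground the shared capacitors so that charge left
        # over from the previous page cannot corrupt this page's result.
        shared_caps = [0.0] * n_bits

        # Charging phase: only the active page is connected and has its gates
        # pulsed; inactive pages are disconnected and contribute no charge.
        for i in range(n_bits):
            bit = (weight >> i) & 1
            shared_caps[i] += bit * gain * v_in * (2 ** i) * unit_time

        # Sharing phase: the capacitors are connected and the charge equalizes.
        results.append(sum(shared_caps) / (n_bits * cap))
    return results


if __name__ == "__main__":
    print(run_shared_capacitor_mode([5, 3, 0], v_in=0.5, n_bits=3))
```

Sharing one capacitor group trades throughput for area: pages are processed one after another instead of in parallel, and the clearing phase at the start of each cycle is what keeps consecutive results independent.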
In summary: in terms of computing speed, because the in-memory calculation unit computes with charge, its speed can in theory approach that of DRAM. In terms of power consumption, capacitors are energy-storage devices and the sharing phase of the matrix-vector calculation consumes no energy, so the scheme has some advantage over other in-memory computing schemes. In terms of computing density, flash memory itself has high storage density, which allows large-scale parallelism of the in-memory calculation units and improves computing energy efficiency. In terms of computing reliability, because computation is based on charge, the scheme avoids the loss of computing reliability that current-mode flash in-memory computing suffers as the reliability of the storage device itself degrades. In terms of computing architecture, the size of the convolutional layers and the computing scale of the fully connected layers can be configured flexibly for a given algorithm; moreover, the in-memory computing path does not affect the original storage function, and computation does not affect storage reliability. This gives the architecture a wider range of application than current-mode flash in-memory computing, which modifies the original storage structure.
The content of the above method embodiments applies equally to this system embodiment; the functions implemented by this system embodiment are the same as those of the method embodiments, and the beneficial effects achieved are likewise the same.
The preferred embodiments of the present invention have been described above in detail, but the invention is not limited to these embodiments. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of this application.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311675859.8A CN117828253B (en) | 2023-12-07 | 2023-12-07 | Multi-bit matrix vector multiplication calculation unit, array and working method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117828253A | 2024-04-05 |
CN117828253B | 2024-09-03 |
Family
ID=90516407
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311675859.8A Active CN117828253B (en) | 2023-12-07 | 2023-12-07 | Multi-bit matrix vector multiplication calculation unit, array and working method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117828253B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114546335A (en) * | 2022-04-25 | 2022-05-27 | 中科南京智能技术研究院 | Memory computing device for multi-bit input and multi-bit weight multiplication accumulation |
CN115312090A (en) * | 2022-07-01 | 2022-11-08 | 南方科技大学 | In-memory computing circuit and method |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB8431925D0 (en) * | 1984-12-18 | 1985-01-30 | Secr Defence | Digital data processor |
CN106342405B (en) * | 2010-09-14 | 2014-05-14 | 中国航空工业集团公司雷华电子技术研究所 | Digital radar frequency synthesizer controls pulse-generating circuit |
CN111949935A (en) * | 2019-05-16 | 2020-11-17 | 北京知存科技有限公司 | Analog vector-matrix multiplication circuit and chip |
CN111144558B (en) * | 2020-04-03 | 2020-08-18 | 深圳市九天睿芯科技有限公司 | Multi-bit convolution operation module based on time-variable current integration and charge sharing |
CN117037877A (en) * | 2023-07-18 | 2023-11-10 | 中山大学 | Memory computing chip based on NOR Flash and control method thereof |
Also Published As
Publication number | Publication date |
---|---|
CN117828253A (en) | 2024-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10185655B2 (en) | Systems and methods for rapid processing and storage of data | |
EP3754561A1 (en) | Reconfigurable memory compression techniques for deep neural networks | |
US20210271597A1 (en) | Configurable in memory computing engine, platform, bit cells and layouts therefore | |
WO2021197073A1 (en) | Multi-bit convolution operation module based on time-variable current integration and charge sharing | |
Sim et al. | Scalable stochastic-computing accelerator for convolutional neural networks | |
WO2019129070A1 (en) | Integrated circuit chip device | |
Zheng et al. | Mobilatice: a depth-wise dcnn accelerator with hybrid digital/analog nonvolatile processing-in-memory block | |
CN114707647B (en) | Precision lossless calculation integrated device and method suitable for multi-precision neural network | |
Long et al. | A ferroelectric FET based power-efficient architecture for data-intensive computing | |
Zhang et al. | Parallel convolutional neural network (CNN) accelerators based on stochastic computing | |
CN114003198B (en) | Inner product processing unit, arbitrary precision calculation device, method, and readable storage medium | |
CN115390789A (en) | Analog domain full-precision in-memory computing circuit and method based on magnetic tunnel junction computing unit | |
CN116821048A (en) | Integrated memory chip and operation method thereof | |
CN115910152A (en) | Charge domain memory calculation circuit and calculation circuit with positive and negative number operation function | |
US20210150328A1 (en) | Hierarchical Hybrid Network on Chip Architecture for Compute-in-memory Probabilistic Machine Learning Accelerator | |
CN110717580B (en) | Calculation array based on voltage modulation and oriented to binarization neural network | |
CN116306854A (en) | Transformer Neural Network Acceleration Device and Method Based on Photoelectric Storage and Computing Integrated Device | |
Yu et al. | A 4-bit mixed-signal MAC array with swing enhancement and local kernel memory | |
Wu et al. | A 3.89-GOPS/mW scalable recurrent neural network processor with improved efficiency on memory and computation | |
CN117828253B (en) | Multi-bit matrix vector multiplication calculation unit, array and working method thereof | |
Xia et al. | Reconfigurable spatial-parallel stochastic computing for accelerating sparse convolutional neural networks | |
CN113378109B (en) | Mixed base fast Fourier transform calculation circuit based on in-memory calculation | |
CN115525250A (en) | memory computing circuit | |
Wu et al. | An energy-efficient accelerator with relative-indexing memory for sparse compressed convolutional neural network | |
Hsu et al. | Special Session: Architecture-Level DCIM Technologies for Edge AI Computing Applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||