CN118468951A - In-memory computing device and computing method

Info

Publication number: CN118468951A
Application number: CN202410635201.2A
Authority: CN
Inventors: 王林飞; 张�杰; 李倩; 刘海南; 李博
Original assignee: Institute of Microelectronics of CAS
Current assignee: Institute of Microelectronics of CAS
Priority date: 2024-05-21
Filing date: 2024-05-21
Publication date: 2024-08-09

Abstract

The invention discloses an in-memory computing device and a computing method, and relates to the technical field of in-memory computing chip design. It aims to solve the prior-art problem that outputting a signal together with its inverse requires an additional inverter, which increases the delay of the inverted signal relative to the original signal and increases power consumption. An in-memory computing column of the in-memory computing device comprises a computing unit and a plurality of storage units, and the computing unit comprises a first computing subunit and a second computing subunit. The first computing subunit includes a first input terminal and a first output terminal; the first input terminal receives a first input signal, and the first output terminal outputs the inverse of a first output signal. The second computing subunit includes a second input terminal and a second output terminal; the second input terminal receives the inverse of the first input signal, and the second output terminal outputs the first output signal. The output signal and its inverse can thus be output simultaneously without configuring an inverter.

Description

In-memory computing device and computing method

Technical Field

The present invention relates to the technical field of in-memory computing chip design, and in particular to an in-memory computing device and a computing method.

Background Art

In-memory computing is currently a research focus for neural network accelerators and artificial intelligence. To better exploit its advantages, researchers have modified the SRAM storage array to make it better suited to in-memory computing. Current in-memory computing chips mainly implement the multiply-accumulate operations of neural networks, and the SRAM storage array is modified so that it provides both storage and multiplication functions.

In the prior art, a storage array and a computing unit array jointly form the computing structure, with their gates connected respectively to the SRAM complementary bit lines (BLP and BLN) and to the intermediate input lines INP and INN. The data (W) stored in the SRAM selects the complementary bit lines (BLP and BLN) for connection to the computing unit, and the transistors of the pull-up network are saved inside the sub-array. However, the total power consumption of the dynamic logic circuit used is significantly higher than that of a static logic gate, because the circuit is clock-controlled and toggles in every cycle; owing to the periodic precharge and discharge operations, dynamic logic usually exhibits high switching activity. In addition, there is only one output: to output a logical AND result, a NAND gate followed by an inverter is required. That is, whenever the inverse of the output signal is needed, an additional inverter must be added so that both the output signal and its inverse are available, which increases the delay of the inverted signal relative to the original signal and increases power consumption.

Therefore, a more advanced computing unit is needed to solve the prior-art problem that outputting the inverse of a signal requires an additional inverter, which increases the delay of the inverted signal relative to the original signal and increases power consumption.

Summary of the Invention

The object of the present invention is to provide an in-memory computing device and a computing method that solve the prior-art problem that outputting a signal together with its inverse requires an additional inverter, which increases the delay of the inverted signal relative to the original signal and increases power consumption.

To achieve the above object, the present invention provides the following technical solutions:

In a first aspect, the present invention provides an in-memory computing device. The in-memory computing device comprises a plurality of in-memory computing columns; each in-memory computing column comprises a computing unit and a plurality of storage units, and the computing unit is connected to the plurality of storage units;

the computing unit comprises a first computing subunit and a second computing subunit, and the first computing subunit is connected to the second computing subunit;

the first computing subunit includes a first input terminal and a first output terminal; the first input terminal is used to receive a first input signal, and the first output terminal is used to output the inverse of a first output signal;

the second computing subunit includes a second input terminal and a second output terminal; the second input terminal is used to receive the inverse of the first input signal, and the second output terminal is used to output the first output signal.

In a second aspect, the present invention provides an in-memory computing method applied to an in-memory computing device. The in-memory computing device includes a plurality of in-memory computing columns, and each in-memory computing column includes a computing unit and a plurality of storage units. The method includes:

the computing unit receiving an input signal, the inverse of the input signal, and a bit line signal and an inverted bit line signal sent by the plurality of storage units over the bit lines;

based on the input signal, the inverse of the input signal, the bit line signal and the inverted bit line signal, performing a NAND operation and a NOR operation using preset transistors, so that the computing unit simultaneously outputs an output signal and the inverse of the output signal.

Compared with the prior art, the in-memory computing device provided by the present invention arranges, in any in-memory computing column of the device, a plurality of storage units connected to a computing unit. The computing unit includes a first computing subunit and a second computing subunit that are connected to each other. The first computing subunit includes a first input terminal and a first output terminal; the first input terminal is used to receive a first input signal, and the first output terminal is used to output the inverse of a first output signal. The second computing subunit includes a second input terminal and a second output terminal; the second input terminal is used to receive the inverse of the first input signal, and the second output terminal is used to output the first output signal. On this basis, an in-memory computing device with dual inputs and dual outputs is constructed, in which the computing unit is combined with the storage units to form an SRAM array that integrates storage and computing functions. By receiving an input signal and its inverse at the same time, the computing unit can simultaneously output the logical AND result and the inverse of the logical AND result. No additional inverter is required, which reduces the number of transistors in the circuit and avoids the associated increase in circuit delay and power consumption.

Brief Description of the Drawings

The drawings described here are provided for further understanding of the present invention and constitute a part of it. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:

FIG. 1 is a schematic structural diagram of any one in-memory computing column of an in-memory computing device provided by the present invention;

FIG. 2 is a schematic diagram of the main flow of the computing operation of any one in-memory computing column of an in-memory computing device provided by the present invention;

FIG. 3 is a schematic diagram of the signal inputs and outputs during the computing operation of any one in-memory computing column of an in-memory computing device provided by the present invention.

Reference numerals: 100-computing unit, 200-storage unit, 110-first computing subunit, 120-second computing subunit, MP1-first P-type transistor, MP2-second P-type transistor, MP3-third P-type transistor, MP4-fourth P-type transistor, MN1-first N-type transistor, MN2-second N-type transistor, MN3-third N-type transistor, MN4-fourth N-type transistor, MN5-fifth N-type transistor, MN6-sixth N-type transistor, MN7-seventh N-type transistor, MN8-eighth N-type transistor, I-first input terminal, O-first output terminal, I'-second input terminal, O'-second output terminal.

Detailed Description

To describe the technical solutions of the embodiments of the present invention clearly, words such as "first" and "second" are used in the embodiments to distinguish between identical or similar items whose functions and effects are substantially the same. For example, a first threshold and a second threshold are named only to distinguish different thresholds, and the naming does not limit their order. Those skilled in the art will understand that words such as "first" and "second" do not limit quantity or execution order, and do not require the items to be different.

It should be noted that, in the present invention, words such as "exemplary" or "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as "exemplary" or "for example" in the present invention should not be interpreted as more preferred or more advantageous than other embodiments or designs. Rather, such words are intended to present the related concepts in a concrete way.

In the present invention, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" can mean: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b or c can mean: a; b; c; a and b; a and c; b and c; or a, b and c, where a, b and c may each be single or multiple.

One existing in-memory computing method separates the computing unit from the storage array. The SRAM (Static Random Access Memory) computing structure is formed jointly by a storage array and a computing unit array, with their gates connected respectively to the SRAM complementary bit lines (BLP and BLN) and to the intermediate input lines INP and INN. The data (W) stored in the SRAM selects the complementary bit lines (BLP and BLN) for connection to the computing unit. The bitwise operation between IN and W can be reconfigured through the input data of the computing unit, to operations such as AND (INP = IN, INN = 0).

The SRAM storage array uses the typical six-transistor SRAM cell structure. The computing unit consists of two parts: the first part is a PMOS transistor acting as the precharge transistor, and the second part is a pull-down network composed of four transistors. Internally, the computing unit likewise uses six transistors to form a typical computing unit structure. When the clock is 0, the precharge PMOS transistor turns on and pulls the output node voltage to VDD. During this period the pull-down network is off, so the computing unit consumes no static power during precharge. When the clock is 1, the precharge transistor turns off and the pull-down network is enabled. The pull-down network discharges conditionally according to the input values and its own topology: if the inputs make the pull-down network conduct, a low-resistance path forms between the output node and GND, and the output node discharges to GND. Such a computing unit can perform only one computation per computing cycle. Compared with static CMOS logic, it saves the transistors of the pull-up network inside the sub-array.

Although such a computing array can implement and switch between various computing functions well, the total power consumption of the dynamic logic circuit it uses is significantly higher than that of a static logic gate, because the circuit is clock-controlled and toggles in every cycle. Owing to the periodic precharge and discharge operations, dynamic logic usually exhibits high switching activity. Moreover, there is only one output, so an additional inverter must be configured whenever the inverse of the output signal is needed. This increases the transistor count and unavoidably increases the delay and power consumption of the computing circuit.
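The switching-activity argument above can be illustrated with a small behavioral sketch. It is not taken from the patent: the pull-down topology `(BLP and INP) or (BLN and INN)` is an assumption chosen to be consistent with the AND configuration (INP = IN, INN = 0) mentioned earlier, and the function names `dynamic_gate`, `static_nand` and `compare_activity` are hypothetical.

```python
def dynamic_gate(clk, w, inp, inn=0):
    """One clock phase of a precharge/evaluate gate (behavioral model).

    Assumed pull-down network: (BLP and INP) or (BLN and INN),
    where BLP = W and BLN = not W. This topology is an illustration only.
    """
    if clk == 0:
        return 1  # precharge phase: the PMOS pulls the output node to VDD
    blp, bln = w, 1 - w
    conducts = (blp and inp) or (bln and inn)
    return 0 if conducts else 1  # evaluate phase: conditional discharge


def static_nand(w, inp):
    """A static NAND gate only changes its output when its inputs change."""
    return 0 if (w and inp) else 1


def count_transitions(trace):
    """Number of output toggles in a sequence of logic levels."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a != b)


def compare_activity(w, inputs):
    """Return (dynamic toggles, static toggles) for the same input stream."""
    dyn_trace, static_trace = [], []
    for x in inputs:
        dyn_trace.append(dynamic_gate(0, w, x))  # precharge half-cycle
        dyn_trace.append(dynamic_gate(1, w, x))  # evaluate half-cycle
        static_trace.append(static_nand(w, x))
    return count_transitions(dyn_trace), count_transitions(static_trace)


if __name__ == "__main__":
    # Constant inputs: the dynamic output still toggles in every cycle.
    print(compare_activity(1, [1, 1, 1, 1]))
```

With W = 1 and a constant input of 1, the dynamic output is recharged and discharged in every cycle while the static gate output never moves, which is the effect the paragraph describes. Note also that the evaluate-phase result is a NAND, so an AND result would need a further inversion.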

In view of this, the present invention provides an in-memory computing device having a computing unit with a dual-output function. The computing unit is combined with the storage units to form an SRAM array that integrates storage and computing functions. By receiving a signal and its inverse at the same time, the computing unit can simultaneously output the logical AND result and the inverse of the logical AND result. No additional inverter is needed to output the inverted signal, which reduces the number of transistors in the circuit and avoids added delay and power consumption in the computing circuit. This solves the prior-art problem that outputting the inverse of a signal requires an additional inverter, which increases circuit delay and power consumption.

The technical solution of the present invention is described in detail below with reference to the accompanying drawings:

Please refer to FIG. 1, which is a schematic structural diagram of any one in-memory computing column of an in-memory computing device provided by the present invention. It should be noted that the in-memory computing device denotes an in-memory computing array that may be composed of multiple storage units and multiple computing units, and any in-memory computing column of the array may include multiple storage units and at least one computing unit.

In FIG. 1, the in-memory computing device may include:

a computing unit 100 and a plurality of storage units 200, the computing unit 100 being connected to the storage units 200. Specifically, the computing unit 100 is connected to the plurality of storage units 200 through the bit line BL and the complementary bit line BLB. It should be noted that this solution has a plurality of storage units identical to the storage unit 200, all of which are connected in parallel to the bit line BL and the complementary bit line BLB; the dotted lines in FIG. 1 indicate further storage units identical to the storage unit 200. Of course, to increase the computing capability of the SRAM array and improve computing efficiency, there may also be more than one computing unit 100, for example 2 or 4, each connected to the bit line BL and the complementary bit line BLB in the same manner as the computing unit 100; this is not specifically limited in this solution.

The computing unit 100 may include a first computing subunit 110 and a second computing subunit 120; the first computing subunit 110 is connected to the second computing subunit 120.

The first computing subunit 110 includes a first input terminal I and a first output terminal O; the first input terminal I is used to receive a first input signal, and the first output terminal O is used to output the inverse of a first output signal. The second computing subunit 120 includes a second input terminal I' and a second output terminal O'; the second input terminal I' is used to receive the inverse of the first input signal, and the second output terminal O' is used to output the first output signal.

On this basis, the present invention provides an in-memory computing device in which a computing unit with a dual-output function is combined with a storage array to form an SRAM array that integrates storage and computing functions. The device can simultaneously output the logical AND result and the inverse of the logical AND result without an additional inverter for the inverted signal. This solves the prior-art problem that outputting a signal together with its inverse requires an additional inverter, which increases the delay of the inverted signal relative to the original signal and increases power consumption.

Preferably, the first computing subunit 110 includes a first P-type transistor MP1, a first N-type transistor MN1 and a fourth N-type transistor MN4. The source of MP1 is connected to the power supply terminal VDD, the drain of MP1 is connected to the drain of MN1, and the drain of MP1 is connected to the first output terminal O. The gate of MN1 is connected to the bit line BL of the in-memory computing device, and the source of MN1 is connected to the drain of MN4. The source of MN4 is grounded, and the gate of MN4 is connected to the first input terminal I.

Preferably, the second computing subunit 120 includes a second P-type transistor MP2, a second N-type transistor MN2 and a third N-type transistor MN3. The source of MP2 is connected to the power supply terminal VDD, the drain of MP2 is connected to the drain of MN2, the drain of MP2 is connected to the gate of MP1, the gate of MP2 is connected to the drain of MP1, and the drain of MP2 is connected to the second output terminal O'. The gate of MN2 is connected to the second input terminal I', and the source of MN2 is grounded. The drain of MN3 is connected to the drain of MN2, the gate of MN3 is connected to the complementary bit line BLB of the in-memory computing device, and the source of MN3 is grounded.

Preferably, each of the plurality of storage units 200 includes a third P-type transistor MP3, a fifth N-type transistor MN5 and an eighth N-type transistor MN8. The source of MP3 is connected to the power supply terminal VDD, the drain of MP3 is connected to the drain of MN5, and the gate of MP3 is connected to the gate of MN5. The drain of MN5 is connected to the drain of MN8, and the source of MN5 is grounded. The source of MN8 is connected to the bit line BL of the in-memory computing device, and the gate of MN8 is connected to the word line WL of the in-memory computing device.

Preferably, each of the plurality of storage units 200 further includes a fourth P-type transistor MP4, a sixth N-type transistor MN6 and a seventh N-type transistor MN7. The source of MP4 is connected to the power supply terminal VDD, the drain of MP4 is connected to the drain of MN6, the gate of MP4 is connected to the drain of MP3, the drain of MP4 is connected to the gate of MP3, and the gate of MP4 is connected to the gate of MN6. The drain of MN7 is connected to the drain of MN6, the gate of MN7 is connected to the word line WL of the in-memory computing device, and the source of MN7 is connected to the complementary bit line BLB of the in-memory computing device.
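The storage unit described above is the conventional six-transistor cell: two cross-coupled inverters holding complementary nodes, plus two access transistors gated by the word line. A minimal behavioral sketch, assuming ideal logic levels and ignoring all analog effects, might look as follows. The class `SramCell` and its `access` method are hypothetical names introduced only for illustration.

```python
class SramCell:
    """Behavioral model of a 6T SRAM cell with storage nodes Q and QB."""

    def __init__(self, q=0):
        self.q = q  # node held by the cross-coupled inverters

    @property
    def qb(self):
        # The complementary node always holds the inverse of Q.
        return 1 - self.q

    def access(self, wl, bl=None, blb=None):
        """Apply the word line and, optionally, driven bit lines.

        wl == 0                -> hold: access transistors off, returns None
        wl == 1, lines driven  -> write: Q takes the value driven on BL
        wl == 1, lines None    -> read: returns the stored (Q, QB) pair
        """
        if wl == 0:
            return None
        if bl is not None and blb is not None:
            if bl == blb:
                raise ValueError("BL and BLB must be complementary for a write")
            self.q = bl
        return self.q, self.qb


if __name__ == "__main__":
    cell = SramCell()
    cell.access(1, bl=1, blb=0)   # write a 1
    cell.access(0)                # hold
    print(cell.access(1))         # read back (Q, QB)
```

The model treats a write as the bit lines simply overpowering the cell, which is the usual simplification at this level of abstraction.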

On this basis, the structure of the SRAM in-memory computing column is obtained, which includes the SRAM storage units and the dual-output computing unit. In the SRAM storage unit, MP3, MP4, MN5 and MN6 form cross-coupled inverters, and Q and QB are the storage nodes. MN7 and MN8 are the access transistors, and MP1, MP2, MN1, MN2, MN3 and MN4 are the computing transistors. Every input signal of the circuit is provided in complementary form, and the circuit likewise produces complementary outputs. The feedback mechanism formed by MP1 and MP2 together with MN1, MN2, MN3 and MN4 ensures that a load device is turned off when it is not needed. One computing circuit is shared per column. When the output signal and its inverse need to be output at the same time, the input signal and its inverse are applied at the same time to perform the logical AND computation.

Specifically, MP1, MN1 and MN4 perform the NAND operation; that is, the first output terminal O outputs the inverse of the first output signal, $\overline{OUT} = \overline{A \cdot B}$, where A denotes the input signal value at the first input terminal I and B denotes the signal value output by the storage node Q. MP2, MN2 and MN3 perform the NOR operation; that is, the second output terminal O' outputs the first output signal, $OUT = \overline{\overline{A} + \overline{B}}$. It should be noted that A and B have the same meaning elsewhere in this specification as they do here. The word line WL is row-based, while the bit line BL and the complementary bit line BLB are column-based. When the SRAM performs conventional operations such as hold, read and write, it behaves the same as a conventional SRAM.
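That the two branches produce complementary results follows from De Morgan's law. The short derivation below is added for illustration; it assumes, as described for the circuit, that the NOR branch is driven by the inverted inputs $\overline{A}$ (at I') and $\overline{B}$ (on BLB).

```latex
\begin{align*}
O  &= \overline{OUT} = \overline{A \cdot B}
   && \text{(NAND branch: MP1, MN1, MN4)} \\
O' &= OUT = \overline{\overline{A} + \overline{B}}
    = \overline{\overline{A}} \cdot \overline{\overline{B}}
    = A \cdot B
   && \text{(NOR branch: MP2, MN2, MN3)}
\end{align*}
```

The second output is therefore the logical AND of A and B, and the first output is its inverse, with no inverter between the two terminals.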

In a second aspect, the present invention further provides an in-memory computing method applied to an in-memory computing device. The in-memory computing device includes a plurality of in-memory computing columns, and any one in-memory computing column may include a plurality of storage units and at least one computing unit. The internal circuit structure of the computing unit and the storage units is the same as that of the computing unit and the storage units in the in-memory computing column of the device described in the first aspect. Please refer to FIG. 2, which is a schematic diagram of the main flow of the computing operation of any one in-memory computing column of an in-memory computing device provided by the present invention.

In FIG. 2, the method may include:

Step 210: the computing unit receives an input signal, the inverse of the input signal, and a bit line signal and an inverted bit line signal sent by the plurality of storage units over the bit lines.

Step 220: based on the input signal, the inverse of the input signal, the bit line signal and the inverted bit line signal, a NAND operation and a NOR operation are performed using preset transistors, so that the computing unit simultaneously outputs an output signal and the inverse of the output signal.
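Steps 210 and 220 can be sketched at the logic level as follows. This is an illustrative model rather than the patented circuit: it assumes that one storage unit of the column drives BL and BLB with its stored value Q and the complement of Q, and the names `compute_unit`, `ImcColumn` and `row` are hypothetical.

```python
def compute_unit(a, a_bar, bl, blb):
    """Step 220: NAND on (A, BL) and NOR on (A_bar, BLB).

    Returns (out, out_bar), i.e. the values at O' and O respectively.
    """
    out_bar = 1 - (a & bl)        # NAND branch -> first output terminal O
    out = 1 - (a_bar | blb)       # NOR branch  -> second output terminal O'
    return out, out_bar


class ImcColumn:
    """One in-memory computing column: several stored bits, one computing unit."""

    def __init__(self, stored_bits):
        self.cells = list(stored_bits)

    def compute(self, row, inp):
        # Step 210: gather the input, its inverse, and the bit line pair
        # driven by the selected storage unit (assumed: BL = Q, BLB = not Q).
        q = self.cells[row]
        bl, blb = q, 1 - q
        a, a_bar = inp, 1 - inp
        # Step 220: both outputs are produced by the same evaluation.
        return compute_unit(a, a_bar, bl, blb)


if __name__ == "__main__":
    column = ImcColumn([0, 1, 1])
    for row in range(3):
        for inp in (0, 1):
            print(row, inp, column.compute(row, inp))
```

For every combination the returned pair is complementary, and the first element equals the AND of the input and the stored bit.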

In steps 210 to 220, the internal structure of the computing unit makes it possible to output the output signal and its inverse simultaneously. With reference to the circuit structure of FIG. 1, MP1, MN1 and MN4 perform the NAND operation; that is, the first output terminal O outputs the inverse of the first output signal, $\overline{OUT}$. When either input of MN1 and MN4 is low, the level at the first output terminal O is controlled by MP1; when MP1 turns on, the level at O is pulled to VDD. In other words, if either of the two inputs A and B is 0, the first output terminal O must output 1. Only when the inputs of MN1 and MN4 are both high are MN1 and MN4 turned on at the same time, pulling the level at O low. This can be expressed as $\overline{OUT} = \overline{A \cdot B}$. MP2, MN2 and MN3 perform the NOR operation; that is, the second output terminal O' outputs OUT. When either of the inputs A and B of MN1 and MN4 is low, the corresponding inverted input of MN2 or MN3 must be high, so at least one of MN2 and MN3 turns on, the level at O' is pulled to ground, and MP1 is turned on at the same time. When both inputs A and B of MN1 and MN4 are high, both inputs of MN2 and MN3 are low, MN2 and MN3 are both turned off, and the level at O' is controlled by MP2. This can be expressed as $OUT = \overline{\overline{A} + \overline{B}} = A \cdot B$.
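The feedback behavior described above can be checked with a small switch-level sketch. It is a simplification under stated assumptions, not a circuit simulation: logic levels are ideal, a conducting NMOS pull-down path is assumed to override a conducting PMOS on the same node, and a node with no conducting path keeps its previous value. The function name `settle` and its parameters are hypothetical.

```python
def settle(a, b, o=0, o_p=0, max_iters=8):
    """Iterate the cross-coupled computing unit until its outputs stop changing.

    a, b     : logic values at input I and on bit line BL
    o, o_p   : initial levels of output nodes O and O' (any previous state)
    Returns the settled pair (O, O').
    """
    a_bar, b_bar = 1 - a, 1 - b
    pull_o = bool(a and b)            # MN1 and MN4 in series pull O low
    pull_op = bool(a_bar or b_bar)    # MN2 and MN3 in parallel pull O' low

    for _ in range(max_iters):
        # MP1 (gate = O') charges O when O' is low; MP2 (gate = O) charges O'.
        new_o = 0 if pull_o else (1 if o_p == 0 else o)
        new_op = 0 if pull_op else (1 if o == 0 else o_p)
        if (new_o, new_op) == (o, o_p):
            break
        o, o_p = new_o, new_op
    return o, o_p


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, settle(a, b))
```

Because the inputs are complementary, exactly one of the two pull-down networks conducts for any A and B, so the node that is pulled low switches on the opposite PMOS and the other node is driven high. In this model the settled outputs do not depend on the starting state.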

It should be noted that, in the examples disclosed in this specification, IN denotes the signal A at the first input terminal I; $\overline{OUT}$ denotes the inverse signal of the output signal, output at the first output terminal O; the signal B denotes the signal Q input from the bit line BL to the first N-type transistor MN1; $\overline{IN}$ denotes the signal at the second input terminal I'; and OUT denotes the output signal at the second output terminal O'. FIG. 3 is based on FIG. 1 and only additionally marks the input and output directions of the signals; its circuit structure is the same as that described for FIG. 1.

Further, for step 220, refer to FIG. 3, which is a schematic diagram of the signal inputs and outputs during a computing operation of any one in-memory computing column of the in-memory computing device provided by the present invention. In FIG. 3, the preset transistors may include a first P-type transistor MP1, a second P-type transistor MP2, a first N-type transistor MN1, a second N-type transistor MN2, a third N-type transistor MN3, and a fourth N-type transistor MN4.

Specifically, performing the NAND operation and the NOR operation with the preset transistors based on the input signal, the inverse signal of the input signal, the bit line signal and the bit line inverse signal, so that the computing unit simultaneously outputs an output signal and the inverse signal of the output signal, may include the following four modes:

Mode 1: When the input signal is 0 and the bit line signal is 0, the first P-type transistor MP1, the first N-type transistor MN1 and the fourth N-type transistor MN4 perform a logic NAND operation, so that the computing unit outputs the inverse signal 1 of the output signal; and the second P-type transistor MP2, the second N-type transistor MN2 and the third N-type transistor MN3 perform a logic NOR operation, so that the computing unit outputs the output signal 0.

As an example, in FIG. 3, when IN = 0 and BL = Q = 0, then BLB = QB = 1. MP1, MN1 and MN4 perform the logic NAND operation, giving $\overline{OUT} = \overline{IN \cdot Q} = \overline{0 \cdot 0} = 1$, and MP2, MN2 and MN3 perform the NOR operation, giving $OUT = \overline{\overline{IN} + QB} = \overline{1 + 1} = 0$.

Mode 2: When the input signal is 0 and the bit line signal is 1, the first P-type transistor MP1, the first N-type transistor MN1 and the fourth N-type transistor MN4 perform a logic NAND operation, so that the computing unit outputs the inverse signal 1 of the output signal; and the second P-type transistor MP2, the second N-type transistor MN2 and the third N-type transistor MN3 perform a logic NOR operation, so that the computing unit outputs the output signal 0.

As an example, in FIG. 3, when IN = 0 and BL = Q = 1, then BLB = QB = 0. MP1, MN1 and MN4 perform the logic NAND operation, giving $\overline{OUT} = \overline{IN \cdot Q} = \overline{0 \cdot 1} = 1$, and MP2, MN2 and MN3 perform the NOR operation, giving $OUT = \overline{\overline{IN} + QB} = \overline{1 + 0} = 0$.

Mode 3: When the input signal is 1 and the bit line signal is 0, the first P-type transistor MP1, the first N-type transistor MN1 and the fourth N-type transistor MN4 perform a logic NAND operation, so that the computing unit outputs the inverse signal 1 of the output signal; and the second P-type transistor MP2, the second N-type transistor MN2 and the third N-type transistor MN3 perform a logic NOR operation, so that the computing unit outputs the output signal 0.

As an example, in FIG. 3, when IN = 1 and BL = Q = 0, then BLB = QB = 1. MP1, MN1 and MN4 perform the logic NAND operation, giving $\overline{OUT} = \overline{IN \cdot Q} = \overline{1 \cdot 0} = 1$, and MP2, MN2 and MN3 perform the NOR operation, giving $OUT = \overline{\overline{IN} + QB} = \overline{0 + 1} = 0$.

Mode 4: When the input signal is 1 and the bit line signal is 1, the first P-type transistor MP1, the first N-type transistor MN1 and the fourth N-type transistor MN4 perform a logic NAND operation, so that the computing unit outputs the inverse signal 0 of the output signal; and the second P-type transistor MP2, the second N-type transistor MN2 and the third N-type transistor MN3 perform a logic NOR operation, so that the computing unit outputs the output signal 1.

As an example, in FIG. 3, when IN = 1 and BL = Q = 1, then BLB = QB = 0. MP1, MN1 and MN4 perform the logic NAND operation, giving $\overline{OUT} = \overline{IN \cdot Q} = \overline{1 \cdot 1} = 0$, and MP2, MN2 and MN3 perform the NOR operation, giving $OUT = \overline{\overline{IN} + QB} = \overline{0 + 0} = 1$.

On this basis, the four control logic modes by which the computing unit of the SRAM simultaneously outputs the output signal and the inverse signal of the output signal can be summarized in Table 1.

Table 1. Output signal and inverse signal of the output signal output simultaneously by the computing unit for each input condition, under the control logic of the computing unit

IN | BL (Q) | BLB (QB) | Inverse of output signal (first output terminal O) | Output signal OUT (second output terminal O')
0 | 0 | 1 | 1 | 0
0 | 1 | 0 | 1 | 0
1 | 0 | 1 | 1 | 0
1 | 1 | 0 | 0 | 1
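The four modes summarized in Table 1 can be regenerated from the two Boolean expressions of the computing unit. The following is a minimal sketch under assumptions: the function name `table_rows` and the column order are chosen here for illustration and do not come from the specification.

```python
def table_rows():
    """Enumerate (IN, Q, QB, OUT_bar, OUT) for the four input conditions."""
    rows = []
    for in_sig in (0, 1):
        for q in (0, 1):
            qb = 1 - q                # bit line inverse signal BLB = QB
            in_bar = 1 - in_sig       # inverse signal of the input signal
            out_bar = 1 - (in_sig & q)   # NAND of IN and Q (first output terminal)
            out = 1 - (in_bar | qb)      # NOR of the inverted inputs (second output terminal)
            rows.append((in_sig, q, qb, out_bar, out))
    return rows


if __name__ == "__main__":
    print("IN  Q  QB  OUT_bar  OUT")
    for in_sig, q, qb, out_bar, out in table_rows():
        print(f"{in_sig:>2}  {q}  {qb:>2}  {out_bar:>7}  {out:>3}")
```

In every row OUT equals IN AND Q and OUT_bar is its complement, which is consistent with the statement that both polarities are available without an additional inverter.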

Further, when the SRAM performs conventional SRAM operations such as the hold operation, the read operation and the write operation, it behaves in the same way as a conventional SRAM.

1. Hold operation

During the hold operation, the word line WL is kept at a low level and MN7 and MN8 are turned off, so Q and QB are not connected to the bit lines. Signals can be neither read nor written, the storage nodes of the cell are not coupled to external signals, and the cell can hold its data stably.

2. Read operation

Before the read operation, BL and BLB are precharged to a high level. During the read operation, WL is held at a high level, so that transistors MN7 and MN8 are turned on. When reading "1", Q is "1" and QB is "0". A low-resistance path is formed between the bit line BLB and GND through MN7, generating a read current, while MN8 is turned off and there is no read current on the BL side. The sense amplifier then amplifies the bit line voltage to full swing and outputs it, completing the read operation. Reading "0" works in the same way.

3. Write operation

In the write operation mode, the voltages of BL and BLB are set to the corresponding high and low levels according to the write data supplied by the input module. For example, with BL at 0 and BLB at 1, WL is turned on, the internal node Q starts to discharge through BL, and BLB charges QB, forming positive feedback. When Q discharges below the switching threshold of the cross-coupled inverters, the data stored in the cell flips: the QB node becomes 1 and Q becomes 0, completing the write operation. Since the storage array has a completely symmetrical structure, when Q and QB are 0 and 1 respectively, the read and write operations are similar to the process above and are not described again.
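The hold, read and write behaviour described above can be summarized in a small behavioural model. This is a sketch under assumptions and not part of the specification: the class `SramCell`, its method names and the 0/1 encoding of bit line levels are hypothetical, and the analog details (precharge timing, sense amplifier gain, switching threshold) are reduced to simple comparisons.

```python
class SramCell:
    """Behavioural sketch of one storage unit with nodes Q and QB."""

    def __init__(self, q: int = 0) -> None:
        self.q = q
        self.qb = 1 - q

    def access(self, wl: int, bl: int, blb: int) -> None:
        """Apply word line and bit line levels to the cell."""
        if wl == 0:
            # Hold: access transistors are off, so the bit lines
            # cannot disturb the stored data.
            return
        if bl != blb:
            # Write: complementary bit lines overwrite Q and QB.
            self.q, self.qb = bl, blb

    def read(self) -> int:
        """Read with both bit lines precharged high and WL high."""
        bl, blb = 1, 1
        # The bit line on the side storing 0 is discharged through
        # its access transistor; the other bit line stays high.
        if self.q == 0:
            bl = 0
        else:
            blb = 0
        # The sense amplifier resolves the differential to full swing.
        return 1 if bl > blb else 0


if __name__ == "__main__":
    cell = SramCell(q=1)
    cell.access(wl=0, bl=0, blb=1)   # hold: data unchanged
    print(cell.read())
    cell.access(wl=1, bl=0, blb=1)   # write 0
    print(cell.read())
```

The model is intentionally symmetric in Q and QB, mirroring the remark that the storage array has a completely symmetrical structure.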

On this basis, the in-memory computing method proposed by the present invention can be applied to the in-memory computing device provided in the first aspect. By adding a computing unit to the SRAM array and supplying the input signal together with its inverse signal, the logic NAND function and the logic NOR function can be realized. During operation, the output signal and its inverse signal are output simultaneously, so no additional inverter is required and the added circuit delay and power consumption of such an inverter are avoided. This also reduces the number of transistors in the circuit and therefore the cost of the transistors used.

Although the present invention is described herein in conjunction with various embodiments, in the process of implementing the claimed invention, those skilled in the art may understand and implement other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may implement several functions listed in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not mean that these measures cannot be combined to good effect.

Although the present invention has been described in conjunction with specific features and embodiments thereof, it is apparent that various modifications and combinations may be made thereto without departing from the spirit and scope of the present invention. Accordingly, this specification and the accompanying drawings are merely exemplary illustrations of the present invention as defined by the appended claims and are deemed to cover any and all modifications, variations, combinations or equivalents within the scope of the present invention. Obviously, those skilled in the art may make various modifications and variations to the present invention without departing from its spirit and scope. Thus, the present invention is intended to include such modifications and variations if they fall within the scope of the claims of the present invention and their equivalents.

Claims

1. An in-memory computing device, characterized in that the in-memory computing device comprises a plurality of in-memory computing columns, the in-memory computing columns comprise a computing unit and a plurality of storage units, the computing unit is connected to the plurality of storage units;

The computing unit comprises a first computing subunit and a second computing subunit; the first computing subunit is connected to the second computing subunit;

The first computing subunit includes a first input terminal and a first output terminal; the first input terminal is used to receive a first input signal, and the first output terminal is used to output an inverse signal of the first output signal;

The second computing subunit includes a second input terminal and a second output terminal; the second input terminal is used to receive the inverse signal of the first input signal, and the second output terminal is used to output the first output signal.

2. The device according to claim 1, wherein the first computing subunit comprises a first P-type transistor, a first N-type transistor and a fourth N-type transistor;

The source of the first P-type transistor is connected to the power supply terminal, the drain of the first P-type transistor is connected to the drain of the first N-type transistor, and the drain of the first P-type transistor is connected to the first output terminal; the gate of the first N-type transistor is connected to the bit line of the in-memory computing device, the source of the first N-type transistor is connected to the drain of the fourth N-type transistor; the source of the fourth N-type transistor is grounded, and the gate of the fourth N-type transistor is connected to the first input terminal.

3. The device according to claim 2, wherein the second computing subunit comprises a second P-type transistor, a second N-type transistor and a third N-type transistor;

The source of the second P-type transistor is connected to the power supply terminal, the drain of the second P-type transistor is connected to the drain of the second N-type transistor, the drain of the second P-type transistor is connected to the gate of the first P-type transistor, the gate of the second P-type transistor is connected to the drain of the first P-type transistor, and the drain of the second P-type transistor is connected to the second output terminal;

The gate of the second N-type transistor is connected to the second input terminal, and the source of the second N-type transistor is grounded;

The drain of the third N-type transistor is connected to the drain of the second N-type transistor, the gate of the third N-type transistor is connected to the inverted bit line of the in-memory computing device, and the source of the third N-type transistor is grounded.

4. The device according to claim 1, wherein the storage unit comprises a third P-type transistor, a fifth N-type transistor and an eighth N-type transistor;

The source of the third P-type transistor is connected to the power supply terminal, the drain of the third P-type transistor is connected to the drain of the fifth N-type transistor, and the gate of the third P-type transistor is connected to the gate of the fifth N-type transistor; the drain of the fifth N-type transistor is connected to the drain of the eighth N-type transistor, and the source of the fifth N-type transistor is grounded; the source of the eighth N-type transistor is connected to the bit line of the in-memory computing device, and the gate of the eighth N-type transistor is connected to the word line of the in-memory computing device.

5. The device of claim 4, wherein the storage unit comprises a fourth P-type transistor, a sixth N-type transistor and a seventh N-type transistor;

The source of the fourth P-type transistor is connected to the power supply terminal, the drain of the fourth P-type transistor is connected to the drain of the sixth N-type transistor, the gate of the fourth P-type transistor is connected to the drain of the third P-type transistor, the drain of the fourth P-type transistor is connected to the gate of the third P-type transistor, and the gate of the fourth P-type transistor is connected to the gate of the sixth N-type transistor; the drain of the seventh N-type transistor is connected to the drain of the sixth N-type transistor, the gate of the seventh N-type transistor is connected to the word line of the in-memory computing device, and the source of the seventh N-type transistor is connected to the inverted bit line of the in-memory computing device.

6. An in-memory computing method, characterized in that the in-memory computing method is applied to an in-memory computing device; the in-memory computing device includes a plurality of in-memory computing columns, the in-memory computing columns include a computing unit and a plurality of storage units; the method comprises:

The computing unit receives an input signal, an inverse signal of the input signal, and a bit line signal and a bit line inverse signal sent by a plurality of the storage units through a bit line;

Based on the input signal, the inverse signal of the input signal, the bit line signal and the bit line inverse signal, a NAND operation and a NOR operation are performed using preset transistors, so that the computing unit simultaneously outputs an output signal and an inverse signal of the output signal.

7. The method according to claim 6, wherein the preset transistors include a first P-type transistor, a second P-type transistor, a first N-type transistor, a second N-type transistor, a third N-type transistor, and a fourth N-type transistor;

The step of performing a NAND operation and a NOR operation using the preset transistors based on the input signal, the inverse signal of the input signal, the bit line signal, and the bit line inverse signal, so that the computing unit simultaneously outputs an output signal and an inverse signal of the output signal, includes:

When the input signal is 0 and the bit line signal is 0, a logic NAND operation is performed by the first P-type transistor, the first N-type transistor and the fourth N-type transistor, so that the computing unit outputs an inverse signal 1 of the output signal;

And, a logic NOR operation is performed by the second P-type transistor, the second N-type transistor and the third N-type transistor, so that the computing unit outputs the output signal 0.

8. The method according to claim 7, wherein the step of performing a NAND operation and a NOR operation using a preset transistor based on the input signal, the inverted signal of the input signal, the bit line signal, and the bit line inverted signal, and outputting an output signal and an inverted signal of the output signal at the same time, comprises:

When the input signal is 0 and the bit line signal is 1, a logic NAND operation is performed by the first P-type transistor, the first N-type transistor and the fourth N-type transistor, so that the computing unit outputs an inverse signal 1 of the output signal;

9. The method according to claim 7, wherein the step of performing a NAND operation and a NOR operation using a preset transistor based on the input signal, the inverted signal of the input signal, the bit line signal, and the bit line inverted signal, and outputting an output signal and an inverted signal of the output signal at the same time, comprises:

When the input signal is 1 and the bit line signal is 0, a logic NAND operation is performed by the first P-type transistor, the first N-type transistor and the fourth N-type transistor, so that the computing unit outputs an inverse signal 1 of the output signal;

10. The method according to claim 7, wherein the step of performing a NAND operation and a NOR operation using a preset transistor based on the input signal, the inverted signal of the input signal, the bit line signal, and the bit line inverted signal, and outputting an output signal and an inverted signal of the output signal at the same time, comprises:

When the input signal is 1 and the bit line signal is 1, a logic NAND operation is performed by the first P-type transistor, the first N-type transistor and the fourth N-type transistor, so that the computing unit outputs an inverse signal 0 of the output signal;

And, a logic NOR operation is performed by the second P-type transistor, the second N-type transistor and the third N-type transistor, so that the computing unit outputs the output signal 1.