
CN110196734A - Computing device and related products - Google Patents

Computing device and related products

Info

Publication number
CN110196734A
CN110196734A (application CN201810161816.0A)
Authority
CN
China
Prior art keywords
data
unit
input
processing
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810161816.0A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201810161816.0A priority Critical patent/CN110196734A/en
Priority to PCT/CN2019/075975 priority patent/WO2019165939A1/en
Publication of CN110196734A publication Critical patent/CN110196734A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0875: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with dedicated cache, e.g. instruction or stack
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14: Handling requests for interconnection or transfer
    • G06F 13/20: Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/28: Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access (DMA), cycle steal
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003: Arrangements for executing specific machine instructions
    • G06F 9/30007: Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F 9/30036: Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3877: Concurrent instruction execution, e.g. pipeline or look ahead, using a slave processor, e.g. coprocessor
    • G06F 9/3879: Concurrent instruction execution, e.g. pipeline or look ahead, using a slave processor, e.g. coprocessor, for non-native instruction execution, e.g. executing a command; for Java instruction set

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present disclosure provides a computing device. The computing device comprises: an instruction control unit, configured to acquire an operation instruction, decode the operation instruction into a first microinstruction and a second microinstruction, send the first microinstruction to a compression unit, and send the second microinstruction to an operation unit; a storage unit, configured to store input data, processed input data, operation instructions, and operation results, where the input data includes at least one input neuron and/or at least one weight, and the processed input data includes processed input neurons and/or processed weights; the compression unit, configured to process the input data according to the first microinstruction to obtain the processed input data; and the operation unit, configured to process the processed input data according to the second microinstruction to obtain the operation result.

Description

Computing device and related products

Technical Field

The present application relates to the field of information processing technology, and in particular to a computing device and related products.

Background Art

With the continuous development of information technology and people's growing needs, the demand for timeliness of information is ever higher. At present, terminals acquire and process information on the basis of general-purpose processors.

In practice it has been found that this way of processing information, by running software programs on a general-purpose processor, is limited by the operating speed of the general-purpose processor; in particular, when the general-purpose processor is heavily loaded, information processing efficiency is low and latency is high.

Summary of the Application

Embodiments of the present application provide a data processing method and related products, which can increase the processing speed of a computing device and improve its efficiency.

In a first aspect, a method for data processing using a computing device is provided. The computing device includes an operation unit, an instruction control unit, a storage unit, and a compression unit, and the method includes:

the instruction control unit acquiring an operation instruction, decoding the operation instruction into a first microinstruction and a second microinstruction, sending the first microinstruction to the compression unit, and sending the second microinstruction to the operation unit;

the compression unit processing the acquired input data according to the first microinstruction to obtain processed input data, where the input data includes at least one input neuron and/or at least one weight, and the processed input data includes processed input neurons and/or processed weights;

the operation unit processing the processed input data according to the second microinstruction to obtain an operation result.

In a second aspect, a computing device is provided, which includes hardware units configured to perform the method of the first aspect above.

In a third aspect, a computer-readable storage medium is provided, which stores a computer program for electronic data exchange, where the computer program causes a computer to execute the method provided in the first aspect.

In a fourth aspect, a chip is provided, which includes the computing device provided in the second aspect above.

In a fifth aspect, a chip packaging structure is provided, which includes the chip provided in the fourth aspect above.

In a sixth aspect, a board card is provided, which includes the chip packaging structure provided in the fifth aspect above.

In a seventh aspect, an electronic device is provided, which includes the board card provided in the sixth aspect above.

In some embodiments, the electronic device includes a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a surveillance camera, a server, a cloud server, a still camera, a video camera, a projector, a watch, earphones, mobile storage, a wearable device, a vehicle, a household appliance, and/or medical equipment.

In some embodiments, the vehicle includes an airplane, a ship, and/or a road vehicle; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical equipment includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph.

The computing device provided by the present application is equipped with a compression unit. The compression unit compresses the input data according to the operation instruction, and subsequent computation is then performed on the processed input data, which reduces the amount of data to be computed and improves data processing efficiency.

Brief Description of the Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1A to FIG. 1C are schematic structural diagrams of several computing devices provided by embodiments of the present application.

FIG. 1D is a schematic structural diagram of a compression unit provided by an embodiment of the present invention.

FIG. 1E is a schematic diagram of state transitions controlled by a control unit, provided by an embodiment of the present invention.

FIG. 2 is a schematic diagram of a partial structure of a compression unit provided by an embodiment of the present invention.

FIG. 3 is a schematic diagram of a neural network structure provided by an embodiment of the present invention.

FIG. 4 is a schematic diagram of a partial structure of another compression unit provided by an embodiment of the present invention.

FIG. 5 is a schematic diagram of a partial structure of yet another compression unit provided by an embodiment of the present invention.

FIG. 6 is a schematic flowchart of a data processing method using a computing device, provided by an embodiment of the present invention.

FIG. 7a is a schematic structural diagram of a chip device provided by the present disclosure.

FIG. 7b is a schematic structural diagram of a main processing circuit provided by the present disclosure.

FIG. 7c is a schematic diagram of data distribution in the chip device provided by the present disclosure.

FIG. 7d is a schematic diagram of data return in a chip device.

In the drawings, "/" means "or".

Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.

The terms "first", "second", "third", "fourth", and the like in the specification, claims, and drawings of the present application are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "include" and "have", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

Please refer to FIG. 1A, which is a schematic structural diagram of a computing device provided by an embodiment of the present invention. As shown in FIG. 1A, the computing device includes: a compression unit 101, a storage unit 102, an instruction control unit 107, and an operation unit 108. Optionally, as shown in FIG. 1B, the computing device may further include a first input cache unit 105 and a second input cache unit 106. Further optionally, the computing device may also include a direct memory access (DMA) unit 103, an instruction cache unit 104, and an output cache unit 109. The embodiments of the present application are described in detail below with reference to the computing devices shown in FIG. 1A and FIG. 1B.

The storage unit 102 is used to store input data, operation instructions (which may specifically include, without limitation, neural network operation instructions, non-neural-network operation instructions, addition instructions, convolution instructions, and so on), processed input data, positional relationship data of the input data, operation results, and other intermediate data generated in neural network operations; the present application places no limitation here. The input data includes, without limitation, input weights and input neurons, and the present application likewise places no limitation on the quantity of input data; that is, the input data includes at least one input weight and/or at least one input neuron.

The positional relationship data of the input data is used to characterize the position of each item of input data. For example, if the input data is a matrix A, then taking element Aij as an example, the position information of Aij is row i, column j of matrix A.

Optionally, in the present application, the positional relationship data of the input data may also represent the positions of those items of input data whose absolute value is greater than or equal to a preset threshold; such input data may be input neurons or input weights. The preset threshold may be a value custom-set on the user side or the device side, for example 0.2, 0.5, and so on. The positional relationship data of the input data may be represented by direct indexing or by step indexing, which will be described in detail below.

Taking direct indexing as an example, suppose the input data is a matrix and the preset threshold is 0.5; the positional relationship data of the input data is then a matrix of the same shape that marks which elements have an absolute value greater than or equal to 0.5.
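A minimal sketch of the direct-index representation described above. The concrete matrix values are hypothetical (the original example matrices appear only in the drawings); the mask layout is an assumption for illustration.

```python
# Direct indexing: a 0/1 mask of the same shape as the input matrix, marking
# the entries whose absolute value is >= the preset threshold.
# The matrix values below are hypothetical, for demonstration only.
threshold = 0.5  # preset threshold from the text

matrix = [
    [0.1, 0.8, 0.0],
    [1.2, 0.3, 0.6],
]

mask = [[1 if abs(v) >= threshold else 0 for v in row] for row in matrix]
print(mask)  # [[0, 1, 0], [1, 0, 1]]
```

The mask alone locates every significant element, so downstream units need not re-scan the raw values.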

In an optional embodiment, in order to save storage space, the computing device may store the input data according to the positional relationship data of the input data. Specifically, it caches the positional relationship data of the input data together with only those items of input data whose absolute value is greater than or equal to the preset threshold; input data at the other positions is taken by default to be data whose absolute value is smaller than the preset threshold, or to be 0, which the present application does not limit. That is, the present application stores the data in a densely packed manner; for example, input neurons or input weights are cached densely in the first/second input cache unit.
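A sketch of the dense storage scheme just described; the exact cache layout is an assumption, not the patent's specified format. Only values at or above the threshold are packed into the cache, alongside the positional mask, and the remaining positions are implicitly 0.

```python
# Dense storage sketch (assumed layout): keep a positional mask plus only the
# values whose absolute value is >= the preset threshold; everything else is
# implicitly 0 and is reconstructed from the mask.
threshold = 0.5
data = [0.02, 0.9, 0.0, 1.5, 0.3]

mask = [1 if abs(v) >= threshold else 0 for v in data]
packed = [v for v in data if abs(v) >= threshold]  # densely packed values

# Reconstruction walks the mask, consuming a packed value wherever mask == 1.
it = iter(packed)
restored = [next(it) if m else 0 for m in mask]
print(packed)    # [0.9, 1.5]
print(restored)  # [0, 0.9, 0, 1.5, 0]
```

With sparse data, the packed list plus a compact mask can occupy far less space than the original dense array.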

The direct memory access (DMA) unit 103 is used to read and write data between the storage unit 102 and the instruction cache unit 104, the compression unit 101, the first input cache unit 105, and the output cache unit 109.

For example, the DMA unit 103 reads an operation instruction from the storage unit 102 and sends it to the instruction control unit 107, or caches it in the instruction cache unit 104, and so on.

As another example, the DMA unit 103 may also read input weights or processed input weights from the storage unit 102 and send them to the first input cache unit 105 or the second input cache unit 106 for caching. Correspondingly, the DMA unit 103 may also read input neurons or processed input neurons from the storage unit 102 and send them to the first input cache unit 105 or the second input cache unit 106. The data cached in the first input cache unit 105 and the second input cache unit 106 differ: for example, if the first input cache unit 105 stores input neurons or processed input neurons, then the second input cache unit 106 stores input weights or processed weights, and vice versa.

The instruction cache unit 104 is used to cache operation instructions. The first input cache unit is used to cache first cache data, and the second input cache unit is used to cache second cache data. The first cache data and the second cache data differ, as described above. For example, if the first cache data is processed input weights, the second cache data may be unprocessed input neurons, or processed input neurons, and so on.

The instruction control unit 107 may be used to obtain an operation instruction from the instruction cache unit or the storage unit, and may further decode the operation instruction into corresponding microinstructions that the relevant components of the computing device can recognize and execute. For example, in the present application, the instruction control unit may decode an operation instruction into a first microinstruction and a second microinstruction. The first microinstruction is used to instruct the compression unit to perform the corresponding data processing in the processing mode indicated by the first microinstruction. The second microinstruction is used to instruct the operation unit to execute the operation corresponding to the second microinstruction, for example a multiplication operation, a convolution operation, and so on.
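The decoding step can be sketched as follows. The instruction fields, opcode names, and microinstruction layout are all hypothetical, since the patent does not fix a concrete instruction format; the sketch only illustrates splitting one operation instruction into a compression microinstruction and a computation microinstruction.

```python
# Hypothetical decoder sketch: one operation instruction is split into
# a first microinstruction (for the compression unit) and a second
# microinstruction (for the operation unit). Field names are assumptions.
def decode(op_instruction):
    first_uop = {            # tells the compression unit how to process data
        "target": "compression_unit",
        "processing": op_instruction["compress"],
    }
    second_uop = {           # tells the operation unit which computation to run
        "target": "operation_unit",
        "op": op_instruction["op"],
    }
    return first_uop, second_uop

first, second = decode({"op": "convolution", "compress": ["prune", "quantize"]})
print(first["processing"])  # ['prune', 'quantize']
print(second["op"])         # convolution
```

The two microinstructions are then dispatched independently, which is what lets compression and computation be handled by separate units.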

The output cache unit may be used to cache the operation results output by the operation unit. The operation unit is used to perform the corresponding data operations according to the instructions sent by the instruction control unit, so as to obtain operation results. The compression unit is used to compress data, so as to reduce the data dimensionality, reduce the amount of computation in the operation unit, and improve data processing efficiency.

In an optional embodiment, the computing device may further include a preprocessing module 110, as shown in FIG. 1C. The preprocessing module may be used to preprocess data to obtain preprocessed data, which may then be stored in the storage unit. For example, in the present application, the input data cached in the storage unit may be input data that has been processed by this preprocessing module. The preprocessing includes, without limitation, any one or a combination of the following: Gaussian filtering, binarization, normalization, regularization, abnormal-data screening, and so on; the present application places no limitation here.
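One of the preprocessing steps listed above, normalization, can be sketched as follows; the min-max variant, function name, and interface are assumptions chosen for illustration, and the other listed steps (Gaussian filtering, binarization, outlier screening) would slot into the same module.

```python
# Min-max normalization sketch (an assumed concrete form of the
# "normalization" preprocessing named in the text): rescale values to [0, 1].
def normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant input: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```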

In an optional embodiment, both the first input cache unit and the second input cache unit of the present application may each be split into two cache units, one of which stores input weights or input neurons, while the other correspondingly stores the positional relationship data of the input weights or of the input neurons, and so on; this is not shown in the drawings of the present application.

In an optional embodiment, the present application does not limit where the compression unit 101 is placed. As shown in FIG. 1B, the compression unit may be placed after the first input cache unit and the second input cache unit. Optionally, the compression unit may be placed before the first input cache unit and the second input cache unit and after the DMA unit. Optionally, the compression unit may also be placed inside the operation unit, and so on, without limitation.

FIG. 1D is a schematic structural diagram of a compression unit provided by the present invention. As shown in FIG. 1D, the compression unit 101 may include any one or a combination of the following: a pruning unit 201, a quantization unit 202, and an encoding unit 203. Optionally, the compression unit 101 may further include a control unit 204. The pruning unit 201 is specifically used to prune the received data; the specific implementation of the pruning processing is detailed below. The quantization unit 202 is specifically used to quantize the received data; the specific implementation of the quantization processing is detailed below. The encoding unit 203 is specifically used to encode the received data; the specific implementation of the encoding processing is detailed below. The control unit 204 is specifically used to complete the corresponding data processing using at least one of the above three units, as directed by the received instruction; the specific implementation is detailed below.

Specific embodiments involving the compression unit and the operation unit are set forth below. Specifically, after receiving the first microinstruction sent by the instruction control unit 107, the compression unit 101 may obtain the input data and process it in the processing mode indicated by the first microinstruction, so as to obtain the processed input data.

The processing mode includes, without limitation, any one or a combination of the following: pruning processing, quantization processing, encoding processing, or other processing modes for reducing data dimensionality or data volume; the present application places no limitation here.

Specifically, if the processing mode is pruning, the compression unit may use the pruning unit to retain the input data whose absolute value is greater than or equal to a first threshold and delete the input data whose absolute value is smaller than the first threshold, thereby obtaining the processed input data. The input data includes, without limitation, at least one input neuron, at least one input weight, and so on. The first threshold is custom-set on the user side or the device side, for example 0.5.

For example, the input data is a vector P = (0.02, 0.05, 1, 2, 0.07, 2.1, 0.89). If the first microinstruction instructs the compression unit to process the data by pruning, the compression unit prunes the vector P, deleting the elements of P whose absolute value is smaller than 0.5, so that the processed input data (that is, the processed vector P) is (0, 0, 1, 2, 0, 2.1, 0.89).
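The pruning example above, applied to the vector P with the first threshold 0.5, can be sketched as:

```python
# Pruning sketch: zero out every element whose absolute value is below the
# first threshold, keeping the rest unchanged.
def prune(data, threshold):
    return [v if abs(v) >= threshold else 0 for v in data]

P = [0.02, 0.05, 1, 2, 0.07, 2.1, 0.89]
print(prune(P, 0.5))  # [0, 0, 1, 2, 0, 2.1, 0.89]
```

The output matches the processed vector P given in the text.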

If the processing mode is quantization, the compression unit may use the quantization unit to cluster and quantize the input data, thereby obtaining the processed input data. Quantization here means mapping an original data value to a new data value close to it; the new value may be custom-set on the user side, or chosen so that the error between the original and new values is smaller than a preset value, for example quantizing 0.5 to 1, and so on. The present application does not limit the specific quantization and clustering algorithms used; for example, the K-means algorithm may be used to cluster the input weights or input neurons of each layer of a neural network model.

For example, given an input data matrix, applying the quantization processing yields a corresponding processed input data matrix.
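A minimal sketch of quantization by one-dimensional K-means clustering, one of the algorithms the text names as an option. The sample weights, number of clusters, and initial centroids are assumptions for illustration; each value is replaced by the centroid of its cluster.

```python
# 1-D K-means quantization sketch (assumed parameters): cluster the values,
# then replace every value with the centroid of its nearest cluster.
def kmeans_quantize(values, centroids, iters=20):
    for _ in range(iters):
        # assign each value to the nearest centroid
        groups = [[] for _ in centroids]
        for v in values:
            i = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[i].append(v)
        # move each centroid to the mean of its group (keep it if the group is empty)
        centroids = [sum(g) / len(g) if g else c
                     for c, g in zip(centroids, groups)]
    # quantized output: every value replaced by its nearest centroid
    return [min(centroids, key=lambda c: abs(v - c)) for v in values]

weights = [0.1, 0.15, 0.9, 1.0, 0.95]
print(kmeans_quantize(weights, centroids=[0.0, 1.0]))
```

Five distinct weights collapse to two centroid values, so only the centroids and per-value cluster indices need to be stored.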

If the processing mode is encoding, the compression unit may use the encoding unit to encode the input data in a preset encoding format, thereby obtaining the processed input data. The preset encoding format is custom-set on the user side or the device side and may include, without limitation, Huffman coding, non-return-to-zero coding, Manchester coding, and so on.
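A compact sketch of Huffman coding, one of the preset encoding formats named above; the sample symbol stream is an assumption. Frequent symbols receive shorter codes, which shrinks the stored representation.

```python
import heapq
from collections import Counter

# Huffman-coding sketch: build the code table by repeatedly merging the two
# least-frequent subtrees. Heap entries are [frequency, tie_breaker, codes]
# so that ties never fall through to comparing dicts.
def huffman_codes(symbols):
    heap = [[freq, i, {sym: ""}]
            for i, (sym, freq) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # least frequent subtree -> prefix "0"
        hi = heapq.heappop(heap)   # next least frequent   -> prefix "1"
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], next_id, merged])
        next_id += 1
    return heap[0][2]

codes = huffman_codes([0, 0, 0, 0, 1, 1, 2])
print(codes)  # the most frequent symbol (0) gets the shortest code
```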

In the present application, when the processing mode indicated by the first microinstruction includes multiple processing modes, the execution order of these processing modes may be left open, or it may be defined in the first microinstruction. That is, the first microinstruction indicates the processing modes to be used by the compression unit and, possibly, the order in which they are executed.

For example, the processing modes indicated by the first microinstruction include pruning, quantization, and encoding. If the first microinstruction specifies an execution order for these three processing modes, the compression unit processes the input data in the order and with the processing modes indicated by the first microinstruction. If the first microinstruction does not specify an execution order, the compression unit may execute the three processing modes in any order to process the input data and obtain the processed input data.
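Honoring an execution order carried by the first microinstruction can be sketched as follows; the step names, registry structure, and thresholds are hypothetical. When the microinstruction fixes an order, the steps run in exactly that order; otherwise any order could be substituted.

```python
# Ordered-processing sketch (assumed interface): run the processing steps
# named by the microinstruction, in the order it lists them.
def apply_processing(data, steps, registry):
    for name in steps:
        data = registry[name](data)
    return data

registry = {
    "prune":    lambda d: [v if abs(v) >= 0.5 else 0 for v in d],
    "quantize": lambda d: [round(v) for v in d],
}

out = apply_processing([0.3, 0.7, 1.6], ["prune", "quantize"], registry)
print(out)  # [0, 1, 2]
```

Note that the order matters: quantizing before pruning would round 0.3 to 0 anyway here, but with other thresholds the two orders give different results.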

Inside the compression unit, a control unit is designed to control transitions between the various processing modes, so that data processing in multiple modes is supported. Specifically, multiple operation states are defined in the control unit, each corresponding to one processing mode. For example, a first state represents the initial state of the control unit, in which no data processing is performed; a second state indicates pruning of the data; a third state indicates quantization of the data; and a fourth state indicates coding of the data. Optionally, a fifth state may indicate sequential execution of pruning, quantization, and coding; a sixth state may indicate sequential execution of pruning and quantization; a seventh state may indicate sequential execution of pruning and coding; and so on. Both the number of operation states and the processing modes they indicate can be customized on the user side or the device side, and are not limited by this application.

In the following, the case of four operation states is taken as an example to describe how the control unit implements multiple processing modes. Specifically, the operation states are the first state to the fourth state, each represented by a preset value: for example, the first state is represented by 00, the second state by 01, the third state by 10, and the fourth state by 11. FIG. 1E is a schematic diagram of the transitions between the states implemented by the control unit. In practical applications, the control unit may specifically be a controller, an accumulator, or another physical component used to indicate multiple data processing modes, which is not limited by this application.

The operation states are associated with the processing modes. Specifically, in this application, the first state 00 represents the initial state, in which no data processing is performed. The second state 01 is associated with pruning and indicates that the compression unit may process data by pruning. The third state 10 is associated with quantization and indicates that the compression unit may process data by quantization. The fourth state 11 is associated with coding and indicates that the compression unit may encode data in the preset coding format.

It should be understood that before the compression unit performs data processing, the control unit is in the first state (i.e., the initial state). After the compression unit receives the first microinstruction, the state of the control unit is reset according to the processing mode indicated by the first microinstruction, so that the compression unit, while in that state, completes the data processing corresponding to that processing mode. That is, in this application the operation state can be reset/modified by the control unit inside the compression unit, so as to complete the data processing corresponding to the processing mode in the compression unit.

For example, suppose the first microinstruction instructs the compression unit to process data by pruning, quantization, and coding in sequence. After the compression unit receives the first microinstruction, the control unit (e.g., an accumulator) changes the first (initial) state 00 to the second state 01. Once the compression unit completes the pruning (i.e., after the input data has been processed by pruning), the control unit changes the second state 01 to the third state 10. Correspondingly, after the compression unit completes the quantization, the control unit changes the third state 10 to the fourth state 11. After the compression unit completes the coding associated with the fourth state, the control unit sets the fourth state 11 back to the first (initial) state 00, and the flow ends. In this example, if the control unit is an accumulator, the transitions among the four states can be implemented by incrementing the accumulator by 1 at each step; after the fourth state is completed, the accumulator is reset to 00, i.e., restored to the initial state.
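Assuming the accumulator variant just described, the four-state cycle can be sketched as a two-bit counter that advances by 1 after each processing stage and wraps back to the initial state 00. The class and state names are illustrative, not taken from the application:

```python
class StateAccumulator:
    """Two-bit state register: 00 idle, 01 prune, 10 quantize, 11 encode."""
    NAMES = {0b00: "idle", 0b01: "prune", 0b10: "quantize", 0b11: "encode"}

    def __init__(self):
        self.state = 0b00  # first (initial) state

    def step(self):
        # Advance to the next state; after the coding state (11), wrap to 00.
        self.state = (self.state + 1) & 0b11
        return self.NAMES[self.state]

acc = StateAccumulator()
trace = [acc.step() for _ in range(4)]  # one full prune -> quantize -> encode cycle
```

The masking with `& 0b11` plays the role of resetting the accumulator: after the fourth state the register is back at 00 without a separate reset signal.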

It should be noted that the number of operation states involved in the control unit is not limited by this application; it depends on the data processing modes. For example, when the data processing modes indicated by an operation instruction include pruning and quantization, the control unit may involve three operation states, corresponding to the first to third states in the example above. Correspondingly, after the computing device receives the operation instruction, it sequentially completes, under the control of the control unit, the pruning and quantization indicated by the second and third states; after the data processing is completed, the operation state of the control unit can be set back to the first state (the initial state).

It should be noted that the processing of the input data by the compression unit in this application may specifically be the processing of input weights and/or input neurons, so as to reduce the amount of computation on the input weights or input neurons in the operation unit, thereby improving data processing efficiency.

Correspondingly, the operation unit may perform, according to a received second microinstruction, the corresponding operation on the processed input data to obtain an operation result. There are several specific implementations, as follows.

In one implementation, when the processed input data includes processed input neurons, the operation unit may perform, according to the second microinstruction, the corresponding operation on the processed input neurons and the input weights, thereby obtaining the operation result.

In another implementation, when the processed input data includes processed input weights, the operation unit may perform, according to the second microinstruction, the corresponding operation on the processed input weights and the input neurons to obtain the operation result.

In yet another implementation, when the processed input data includes both processed input neurons and processed input weights, the operation unit may perform, according to the second microinstruction, the corresponding operation on the processed input weights and the processed input neurons to obtain the operation result.

Optionally, the compression unit may store the operation result in the output cache unit 109, and the output cache unit 109 stores the operation result in the storage unit 102 through the direct memory access unit 103.

It should be pointed out that the instruction cache unit 104, the first input cache unit 105, the second input cache unit 106, and the output cache unit 109 may all be on-chip caches.

Further, the operation unit 108 includes, but is not limited to, three parts: a multiplier, one or more adders (optionally, multiple adders forming an adder tree), and an activation function unit/activation function operator. The multiplier multiplies first input data (in1) by second input data (in2) to obtain first output data (out1): out1 = in1*in2. The adder tree adds third input data (in3) stage by stage to obtain second output data (out2), where in3 is a vector of length N and N is greater than 1: out2 = in3[1]+in3[2]+...+in3[N]; and/or it adds the result of accumulating the third input data (in3) through the adder tree to fourth input data (in4) to obtain the second output data (out2): out2 = in3[1]+in3[2]+...+in3[N]+in4; or it adds the third input data (in3) to the fourth input data (in4) to obtain the second output data (out2): out2 = in3+in4. The activation function unit applies an activation function (active) to fifth input data (in5) to obtain third output data (out3): out3 = active(in5); the activation function may be sigmoid, tanh, relu, softmax, or the like. Besides the activation operation, the activation function unit can implement other nonlinear functions: it can apply a function (f) to input data (in) to obtain output data (out): out = f(in).
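The three operations just listed can be written out directly. The sketch below mirrors the formulas out1 = in1*in2, out2 = in3[1]+...+in3[N]+in4, and out3 = active(in5); the function names are ours, and sigmoid stands in for the selectable activation:

```python
import math

def multiply(in1, in2):
    # out1 = in1 * in2
    return in1 * in2

def adder_tree(in3, in4=0.0):
    # out2 = in3[1] + ... + in3[N] (+ in4), reduced pairwise like an adder tree.
    vals = list(in3)
    while len(vals) > 1:
        reduced = [a + b for a, b in zip(vals[0::2], vals[1::2])]
        if len(vals) % 2:               # odd element passes through to the next stage
            reduced.append(vals[-1])
        vals = reduced
    return vals[0] + in4

def activate(in5, fn="sigmoid"):
    # out3 = active(in5); sigmoid shown, tanh/relu are analogous.
    if fn == "sigmoid":
        return 1.0 / (1.0 + math.exp(-in5))
    if fn == "relu":
        return max(0.0, in5)
    return math.tanh(in5)

out1 = multiply(3.0, 4.0)                         # 12.0
out2 = adder_tree([1.0, 2.0, 3.0, 4.0], in4=5.0)  # (1+2)+(3+4) = 10, +5 = 15.0
out3 = activate(0.0)                              # sigmoid(0) = 0.5
```

The pairwise reduction is what distinguishes an adder tree from a sequential accumulator: for N inputs it needs only about log2(N) addition stages.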

The operation unit 108 may further include a pooling unit. The pooling unit performs a pooling operation on input data (in) to obtain output data (out): out = pool(in), where pool denotes the pooling operation. Pooling operations include, but are not limited to, average pooling, max pooling, and median pooling; the input data in is the data in a pooling kernel associated with the output out.
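A minimal sketch of the pooling step out = pool(in) for the three pooling kinds named above. The window here is a flat list of the values inside one pooling kernel; real pooling slides a 2-D kernel over a feature map, which this sketch deliberately omits:

```python
import statistics

def pool(window, mode="max"):
    """out = pool(in): reduce one pooling window to a single value."""
    if mode == "max":
        return max(window)                  # max pooling
    if mode == "avg":
        return sum(window) / len(window)    # average pooling
    if mode == "median":
        return statistics.median(window)    # median pooling
    raise ValueError(f"unknown pooling mode: {mode}")

w = [1.0, 3.0, 2.0, 4.0]
```

For this window, max pooling yields 4.0, while average and median pooling both happen to yield 2.5.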

It can be seen that, in the solution of this embodiment of the present invention, the compression unit can process the input neurons and weights, eliminating those input neurons and weights whose absolute value is less than or equal to the threshold. This reduces the number of input neurons and weights and thus the extra overhead; the operation unit performs the artificial neural network operation on the processed input neurons and weights, which improves operation efficiency.
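The elimination step just described amounts to filtering out (neuron, weight) pairs whose magnitudes do not exceed the threshold. A hedged sketch, with an illustrative threshold and illustrative values; the pairing of one neuron with one weight follows the data-set format used later in this description:

```python
def prune_pairs(neurons, weights, threshold=0.01):
    """Keep only (neuron, weight) pairs where both magnitudes exceed the threshold."""
    return [(n, w) for n, w in zip(neurons, weights)
            if abs(n) > threshold and abs(w) > threshold]

# A zero neuron or a zero weight removes the whole pair from the computation.
pairs = prune_pairs([1.0, 0.0, 3.0, 5.0], [0.5, 0.2, 0.0, 0.4])
```

Only the first and last pairs survive, so the operation unit performs two multiply-accumulates instead of four.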

It should be noted that the computing device can perform not only sparse neural network operations but also dense neural network operations. The aforementioned neural network operation module is particularly suitable for sparse neural network operations because sparse neural networks contain a great deal of zero-valued data or data with very small absolute values. Such data can be eliminated by the compression unit, which improves operation efficiency while preserving operation accuracy.

In an optional embodiment, during the pruning described above, the compression unit may also process the input data according to positional relationship data to obtain the processed input data. There are several specific implementations, as follows.

In one specific implementation, the input data includes first input data and second input data. As shown in FIG. 2, the compression unit 101 includes:

a first sparse processing unit 1011, configured to process the second input data to obtain third output data and second output data, and to transmit the third output data to a first data processing unit 1012; and

the first data processing unit 1012, configured to receive the first input data and the third output data, and to output first output data according to the third output data and the first input data.

When the first input data includes at least one input neuron and the second input data includes at least one weight, the first output data is the processed input neurons, the second output data is the processed weights, and the third output data is the positional relationship data of the weights. When the first input data includes at least one weight and the second input data includes at least one input neuron, the first output data is the processed weights, the second output data is the processed input neurons, and the third output data is the positional relationship data of the input neurons.

Specifically, when the second input data is weights of the form wij, where wij denotes the weight between the i-th input neuron and the j-th output neuron, the first sparse processing unit 1011 determines the positional relationship data (i.e., the third output data) from the weights and deletes those weights whose absolute value is less than or equal to the second threshold, obtaining the processed weights (i.e., the second output data). When the second input data is input neurons, the first sparse processing unit 1011 obtains the positional relationship data from the input neurons and deletes those input neurons whose absolute value is less than or equal to the first threshold, obtaining the processed input neurons.

Optionally, the first threshold may be 0.1, 0.08, 0.05, 0.02, 0.01, 0, or another value, and the second threshold may be 0.1, 0.08, 0.06, 0.05, 0.02, 0.01, 0, or another value. It should be pointed out that the first threshold and the second threshold may or may not be equal.

The positional relationship data may be expressed in the form of a step index or a direct index.

Specifically, positional relationship data in direct-index form is a string of 0s and 1s. When the second input data is weights, 0 indicates that the absolute value of the weight is less than or equal to the second threshold, i.e., there is no connection between the input neuron and the output neuron corresponding to that weight; 1 indicates that the absolute value of the weight is greater than the second threshold, i.e., there is a connection between the input neuron and the output neuron corresponding to that weight. Direct-index positional relationship data can be ordered in two ways: the connection states between each output neuron and all input neurons form a string of 0s and 1s representing the connection relationship of the weights; or the connection states between each input neuron and all output neurons form such a string. When the second input data is input neurons, 0 indicates that the absolute value of the input neuron is less than or equal to the first threshold, and 1 indicates that it is greater than the first threshold.
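Under the direct-index representation just described, the relation data is a 0/1 string with one bit per value. A minimal sketch, assuming the per-output-neuron ordering and using illustrative weight values (1, 0, 3, 4, matching a later example in this description) for output neuron o1:

```python
def direct_index(values, threshold=0.01):
    # 1 where |value| > threshold (a connection exists), 0 otherwise.
    return "".join("1" if abs(v) > threshold else "0" for v in values)

# Weights w11, w21, w31, w41 of output neuron o1, with w21 = 0 (illustrative values).
o1_bits = direct_index([1.0, 0.0, 3.0, 4.0])  # "1011"
```

The resulting string "1011" records that the second input neuron has no connection to o1, exactly as in the FIG. 3 example below.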

When the second input data is weights, positional relationship data in step-index form is a string consisting of the distance values between each input neuron connected to an output neuron and the previous input neuron connected to that output neuron. When the second input data is input neurons, the step-index data is a string consisting of the distance values between each input neuron whose absolute value is greater than the first threshold and the previous such input neuron.
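The step-index form records, for each retained entry, its distance from the previously retained entry (the first distance is measured from position 0). A sketch with the same illustrative weight values as above; single-character digits assume every distance is below 10:

```python
def step_index(values, threshold=0.01):
    # Distance of each retained entry from the previously retained one.
    steps, prev = [], 0
    for i, v in enumerate(values):
        if abs(v) > threshold:
            steps.append(str(i - prev))  # assumes distances < 10 for this sketch
            prev = i
    return "".join(steps)

# o1's weights w11=1, w21=0, w31=3, w41=4: retained at positions 0, 2, 3.
o1_steps = step_index([1.0, 0.0, 3.0, 4.0])  # "021"
```

Distances 0, 2, 1 reproduce the "021" string that appears in the FIG. 3 example below.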

As an example, suppose both the first threshold and the second threshold are 0.01; see FIG. 3, a schematic diagram of a neural network provided by an embodiment of the present invention. As shown in diagram a of FIG. 3, the first input data are input neurons, including the input neurons i1, i2, i3, and i4, and the second input data are weights. For the output neuron o1, the weights are w11, w21, w31, and w41; for the output neuron o2, the weights are w12, w22, w32, and w42. The weights w21, w12, and w42 have the value 0, whose absolute values are all less than the second threshold of 0.01, so the first sparse processing unit 1011 determines that the input neuron i2 is not connected to the output neuron o1 and that the input neurons i1 and i4 are not connected to the output neuron o2, while the input neurons i1, i3, and i4 are connected to the output neuron o1 and the input neurons i2 and i3 are connected to the output neuron o2. If the positional relationship data is represented by the connection states between each output neuron and all input neurons, the positional relationship data of the output neuron o1 is "1011" and that of the output neuron o2 is "0110" (i.e., the positional relationship data is "10110110"). If it is represented by the connection relationships between each input neuron and all output neurons, the positional relationship data of the input neuron i1 is "10", that of i2 is "01", that of i3 is "11", and that of i4 is "10" (i.e., the positional relationship data is "10011110").

For the output neuron o1, the compression unit 101 takes i1 with w11, i3 with w31, and i4 with w41 each as a data set and stores those data sets in the storage unit 102. For the output neuron o2, the compression unit 101 takes i2 with w22 and i3 with w32 each as a data set and stores those data sets in the storage unit 102.

For the output neuron o1, the second output data are w11, w31, and w41; for the output neuron o2, the second output data are w22 and w32.

When the second input data are the input neurons i1, i2, i3, and i4, with values 1, 0, 3, and 5 respectively, the positional relationship data (i.e., the third output data) is "1011" and the second output data are 1, 3, and 5.

As shown in diagram b of FIG. 3, the first input data includes the input neurons i1, i2, i3, and i4, and the second input data are weights. For the output neuron o1, the weights are w11, w21, w31, and w41; for the output neuron o2, the weights are w12, w22, w32, and w42. The weights w21, w12, and w42 have the value 0, so the sparse processing unit 1011 determines that the input neurons i1, i3, and i4 are connected to the output neuron o1 and that the input neurons i2 and i3 are connected to the output neuron o2. The positional relationship data between the output neuron o1 and the input neurons is "021". In this positional relationship data, the first digit "0" indicates that the distance between the first input neuron connected to the output neuron o1 and the first input neuron overall is 0, i.e., the first input neuron connected to o1 is the input neuron i1. The second digit "2" indicates that the distance between the second input neuron connected to o1 and the first input neuron connected to o1 (i.e., the input neuron i1) is 2, i.e., the second input neuron connected to o1 is the input neuron i3. The third digit "1" indicates that the distance between the third input neuron connected to o1 and the second input neuron connected to o1 is 1, i.e., the third input neuron connected to o1 is the input neuron i4.

The positional relationship data between the output neuron o2 and the input neurons is "11". The first digit "1" in this positional relationship data indicates that the distance between the first input neuron connected to the output neuron o2 and the first input neuron overall (i.e., the input neuron i1) is 1, i.e., the first input neuron connected to o2 is the input neuron i2. The second digit "1" indicates that the distance between the second input neuron connected to o2 and the first input neuron connected to o2 is 1, i.e., the second input neuron connected to o2 is the input neuron i3.

For the output neuron o1, the compression unit 101 takes i1 with w11, i3 with w31, and i4 with w41 each as a data set and stores those data sets in the storage unit 102. For the output neuron o2, the compression unit 101 takes i2 with w22 and i3 with w32 each as a data set and stores those data sets in the storage unit 102.

For the output neuron o1, the second output data are w11, w31, and w41; for the output neuron o2, the second output data are w22 and w32.

When the second input data are the input neurons i1, i2, i3, and i4, with values 1, 0, 3, and 5 respectively, the positional relationship data (i.e., the third output data) is "021" and the second output data are 1, 3, and 5.

When the first input data are input neurons, the second input data are weights, and the third output data is the positional relationship data between the output neurons and the input neurons. After receiving the input neurons, the first data processing unit 1012 eliminates those input neurons whose absolute value is less than or equal to the first threshold, and then, according to the positional relationship data, selects from the remaining input neurons those related to the weights, outputting them as the first output data.

As an example, suppose the first threshold is 0, and the input neurons i1, i2, i3, and i4 have the values 1, 0, 3, and 5 respectively. For the output neuron o1, the third output data (i.e., the positional relationship data) is "021" and the second output data are w11, w31, and w41. The first data processing unit 1012 eliminates the input neuron whose value is 0 among i1, i2, i3, and i4, obtaining the input neurons i1, i3, and i4. According to the third output data "021", the first data processing unit 1012 determines that the input neurons i1, i3, and i4 are all connected to the output neuron, so it outputs them as the first output data, i.e., it outputs 1, 3, and 5.
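The selection just walked through can be sketched directly: the step-index string "021" addresses positions in the original neuron vector, and walking it picks out the neurons connected to o1. Function names are ours; the sketch assumes single-digit distances as in the example:

```python
def select_by_steps(neurons, step_string):
    """Pick the neurons addressed by a step-index string such as '021'."""
    selected, pos = [], 0
    for d in step_string:
        pos += int(d)              # advance by the recorded distance
        selected.append(neurons[pos])
    return selected

# Input neurons i1..i4 = 1, 0, 3, 5; positional relationship data "021".
first_output = select_by_steps([1, 0, 3, 5], "021")  # [1, 3, 5]
```

Because the step index already skips the pruned position, indexing into the original vector lands only on the retained neurons 1, 3, and 5, matching the first output data in the example.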

When the first input data are weights and the second input data are input neurons, the third output data is the positional relationship data of the input neurons. After receiving the weights w11, w21, w31, and w41, the first data processing unit 1012 eliminates those weights whose absolute value is less than or equal to the second threshold, and then, according to the positional relationship data, selects from the remaining weights those related to the input neurons, outputting them as the first output data.

As an example, suppose the second threshold is 0, and the weights w11, w21, w31, and w41 have the values 1, 0, 3, and 4 respectively. For the output neuron o1, the third output data (i.e., the positional relationship data) is "1011" and the second output data are i1, i3, and i4. The first data processing unit 1012 eliminates the weight whose value is 0 among w11, w21, w31, and w41, obtaining the weights w11, w31, and w41. According to the third output data "1011", the first data processing unit 1012 determines that the input neuron i2 among i1, i2, i3, and i4 is not retained, so it outputs the weights 1, 3, and 4 as the first output data.

In a feasible embodiment, the third input data and the fourth input data are at least one weight and at least one input neuron respectively. The compression unit 101 determines the positions of those input neurons, among the at least one input neuron, whose absolute value is greater than the first threshold, and obtains the positional relationship data of the input neurons; the compression unit 101 also determines the positions of those weights, among the at least one weight, whose absolute value is greater than the second threshold, and obtains the positional relationship data of the weights. The compression unit 101 then derives new positional relationship data from the positional relationship data of the weights and that of the input neurons; the new positional relationship data represents the relationship between the output neurons and those input neurons whose absolute value is greater than the first threshold, together with the values of the corresponding weights. The compression unit 101 obtains the processed input neurons and the processed weights according to the new positional relationship data, the at least one input neuron, and the at least one weight.
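Deriving the new positional relationship data described above amounts to a bitwise AND of the neuron mask and the weight mask, then gathering the pairs that survive both. A hedged sketch with illustrative thresholds and values (the function name and boolean-list mask format are ours):

```python
def combine_and_gather(neurons, weights, t_neuron=0.0, t_weight=0.0):
    """AND the two direct-index masks and keep the co-surviving pairs."""
    n_mask = [abs(n) > t_neuron for n in neurons]
    w_mask = [abs(w) > t_weight for w in weights]
    joint = [a and b for a, b in zip(n_mask, w_mask)]  # the new relation data
    kept = [(n, w) for keep, n, w in zip(joint, neurons, weights) if keep]
    return joint, kept

# Neurons 1,0,3,5 and weights 1,2,0,4: only positions 0 and 3 survive both masks.
joint, kept = combine_and_gather([1, 0, 3, 5], [1, 2, 0, 4])
```

The `kept` list is exactly the one-to-one (processed neuron, processed weight) pairing that the following paragraphs store as data sets.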

Further, the compression unit 101 stores the processed input neurons and the processed weights in the storage unit 102 in a one-to-one correspondence format.

Specifically, the compression unit 101 stores the processed input neurons and the processed weights in a one-to-one correspondence format by treating each processed input neuron and its corresponding processed weight as one data set, and storing that data set in the storage unit 102.

In the case where the compression unit 101 includes the first sparse processing unit 1011 and the first data processing unit 1012, the sparse processing unit 1011 in the compression unit 101 sparsifies the input neurons or the weights, reducing the number of weights or input neurons, which in turn reduces the number of operations performed by the operation unit and improves operation efficiency.

In yet another specific implementation, the input data includes first input data and second input data. As shown in Figure 4, the compression unit 101 includes:

a second sparse processing unit 1013, configured to, after receiving third input data, obtain first positional relationship data from the third input data, and transmit the first positional relationship data to a connection relationship processing unit 1015;

a third sparse processing unit 1014, configured to, after receiving fourth input data, obtain second positional relationship data from the fourth input data, and transmit the second positional relationship data to the connection relationship processing unit 1015;

the connection relationship processing unit 1015, configured to obtain third positional relationship data from the first positional relationship data and the second positional relationship data, and transmit the third positional relationship data to a second data processing unit 1016;

the second data processing unit 1016, configured to, after receiving the third input data, the fourth input data and the third positional relationship data, process the third input data and the fourth input data according to the third positional relationship data to obtain fourth output data and fifth output data;

where, when the third input data includes at least one input neuron and the fourth input data includes at least one weight, the first positional relationship data is the positional relationship data of the input neurons, the second positional relationship data is the positional relationship data of the weights, the fourth output data is the processed input neurons, and the fifth output data is the processed weights; when the third input data includes at least one weight and the fourth input data includes at least one input neuron, the first positional relationship data is the positional relationship data of the weights, the second positional relationship data is the positional relationship data of the input neurons, the fourth output data is the processed weights, and the fifth output data is the processed input neurons.

When the third input data includes at least one input neuron, the first positional relationship data is a character string representing the positions of the input neurons whose absolute values are greater than the first threshold among the at least one input neuron; when the third input data includes at least one weight, the first positional relationship data is a character string indicating whether there is a connection between an input neuron and an output neuron.

When the fourth input data includes at least one input neuron, the second positional relationship data is a character string representing the positions of the input neurons whose absolute values are greater than the first threshold among the at least one input neuron; when the fourth input data includes at least one weight, the second positional relationship data is a character string indicating whether there is a connection between an input neuron and an output neuron.

It should be noted that the first positional relationship data, the second positional relationship data and the third positional relationship data can all be represented in the form of a step index or a direct index; for details, refer to the relevant description above. In other words, the connection relationship processing unit 1015 processes the first positional relationship data and the second positional relationship data to obtain the third positional relationship data, which can be represented in the form of a direct index or a step index.

Specifically, when the first positional relationship data and the second positional relationship data are both represented in the form of a direct index, the connection relationship processing unit 1015 performs an AND operation on the first positional relationship data and the second positional relationship data to obtain the third positional relationship data, which is likewise represented in the form of a direct index. It should be noted that the character strings representing the first and second positional relationship data are stored in memory in order of physical address, either from high to low or from low to high.
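The AND operation on two direct-index strings can be sketched in Python as follows (an illustrative helper; the function name is not from the text, and both strings are assumed to cover the same positions):

```python
def and_direct_index(a: str, b: str) -> str:
    """Bitwise-AND two direct-index bit strings of equal length.

    A '1' in the result marks a position that is '1' in both the
    input-neuron index and the weight index.
    """
    if len(a) != len(b):
        raise ValueError("direct-index strings must cover the same positions")
    return "".join("1" if x == y == "1" else "0" for x, y in zip(a, b))

# e.g. neuron index "1011" AND weight index "1010" yields "1010"
```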

When the first positional relationship data and the second positional relationship data are both represented in the form of a step index, and the character strings representing them are stored in order of physical address from low to high, the connection relationship processing unit 1015 accumulates each element in the character string of the first positional relationship data with the elements stored at physical addresses lower than that of the element, and the resulting new elements form fourth positional relationship data; likewise, the connection relationship processing unit 1015 applies the same processing to the character string of the second positional relationship data to obtain fifth positional relationship data. The connection relationship processing unit 1015 then selects the elements common to the character string of the fourth positional relationship data and that of the fifth positional relationship data, and sorts them in ascending order of value to form a new character string. The connection relationship processing unit 1015 subtracts from each element in the new character string its adjacent element whose value is smaller than that element's value, obtaining a new element. Each element of the new character string is processed in this way to obtain the third positional relationship data.

For example, assume the first positional relationship data and the second positional relationship data are represented in the form of a step index, the character string of the first positional relationship data is "01111", and the character string of the second positional relationship data is "022". The connection relationship processing unit 1015 adds each element in the character string of the first positional relationship data to its adjacent preceding element, obtaining the fourth positional relationship data "01234"; likewise, applying the same processing to the character string of the second positional relationship data yields the fifth positional relationship data "024". The connection relationship processing unit 1015 selects the elements common to the fourth positional relationship data "01234" and the fifth positional relationship data "024", obtaining the new character string "024". The connection relationship processing unit 1015 subtracts from each element in this new character string its adjacent preceding element, i.e. 0, (2-0), (4-2), to obtain the third positional relationship data "022".
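The step-index merge described above can be sketched in Python as follows (a hypothetical helper, with step indices held as lists of integers rather than character strings):

```python
from itertools import accumulate

def intersect_step_index(a: list[int], b: list[int]) -> list[int]:
    """Merge two step indices into the step index of their common positions."""
    # 1) Prefix-sum each step index to recover absolute positions
    #    (e.g. [0, 1, 1, 1, 1] -> [0, 1, 2, 3, 4]).
    pos_a = list(accumulate(a))
    pos_b = list(accumulate(b))
    # 2) Keep the positions present in both, in ascending order.
    common = sorted(set(pos_a) & set(pos_b))
    if not common:
        return []
    # 3) Subtract each element's predecessor to return to step form.
    return [common[0]] + [common[i] - common[i - 1] for i in range(1, len(common))]

# Worked example from the text: "01111" and "022" give "022".
```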

When either of the first positional relationship data and the second positional relationship data is represented in the form of a step index and the other in the form of a direct index, the connection relationship processing unit 1015 converts the positional relationship data represented by a step index into a direct-index representation, or converts the positional relationship data represented by a direct index into a step-index representation. The connection relationship processing unit 1015 then proceeds as described above to obtain the third positional relationship data.
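The conversion between the two representations can be sketched as follows (hypothetical helpers; `length` is the total number of positions the direct index covers):

```python
from itertools import accumulate

def step_to_direct(step: list[int], length: int) -> str:
    """Expand a step index into a direct-index bit string of `length` bits."""
    bits = ["0"] * length
    for p in accumulate(step):  # prefix sums give the absolute positions
        bits[p] = "1"
    return "".join(bits)

def direct_to_step(direct: str) -> list[int]:
    """Collapse a direct-index bit string into a step index."""
    positions = [i for i, bit in enumerate(direct) if bit == "1"]
    if not positions:
        return []
    return [positions[0]] + [positions[i] - positions[i - 1]
                             for i in range(1, len(positions))]
```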

Optionally, when the first positional relationship data and the second positional relationship data are both represented in the form of a direct index, the connection relationship processing unit 1015 converts both into positional relationship data represented in the form of a step index, and then processes the first positional relationship data and the second positional relationship data as described above to obtain the third positional relationship data.

Specifically, the third input data may be input neurons or weights, and the fourth input data may be input neurons or weights, with the third input data and the fourth input data being of different kinds. The second data processing unit 1016 selects, from the third input data (i.e. the input neurons or the weights), the data associated with the third positional relationship data as the fourth output data; the second data processing unit 1016 selects, from the fourth input data, the data associated with the third positional relationship data as the fifth output data.

Further, the second data processing unit 1016 treats each processed input neuron and its corresponding processed weight as one data set, and stores that data set in the storage unit 102.

For example, assume the third input data includes the input neurons i1, i2, i3 and i4, the fourth input data includes the weights w11, w21, w31 and w41, and the third positional relationship data, represented as a direct index, is "1010". The fourth output data output by the second data processing unit 1016 is then the input neurons i1 and i3, and the fifth output data output is the weights w11 and w31. The second data processing unit 1016 treats the input neuron i1 with the weight w11, and the input neuron i3 with the weight w31, each as one data set, and stores these data sets in the storage unit 102.
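The selection and pairing performed by the second data processing unit 1016 can be sketched as follows (an illustrative Python fragment; the names are not from the text):

```python
def select_and_pair(neurons, weights, direct_index):
    """Keep the neuron and weight at every '1' position of the direct
    index, and treat each surviving (neuron, weight) pair as one data set."""
    return [(n, w)
            for n, w, bit in zip(neurons, weights, direct_index)
            if bit == "1"]

# Example from the text: index "1010" keeps (i1, w11) and (i3, w31).
```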

In the case where the compression unit 101 includes the second sparse processing unit 1013, the third sparse processing unit 1014, the connection relationship processing unit 1015 and the second data processing unit 1016, the sparse processing units in the compression unit 101 sparsify both the input neurons and the weights, further reducing the number of input neurons and weights, which in turn reduces the workload of the operation unit and improves operation efficiency.

In yet another specific implementation, the input data includes at least one input weight or at least one input neuron. As shown in Figure 5, the compression unit 601 includes:

an input data cache unit 6011, configured to cache the input data, the input data including at least one input neuron or at least one weight;

a connection relationship cache unit 6012, configured to cache the positional relationship data of the input data, i.e. the positional relationship data of the input neurons or the positional relationship data of the weights;

where the positional relationship data of an input neuron is a character string indicating whether the absolute value of the input neuron is less than or equal to the first threshold, and the positional relationship data of a weight is a character string indicating whether the absolute value of the weight is less than or equal to the first threshold, or a character string indicating whether there is a connection between the input neuron and the output neuron corresponding to the weight; the positional relationship data of the input neurons and that of the weights can be represented in the form of a direct index or a step index; and
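One way such a cached direct-index string could be produced is sketched below (a minimal, hypothetical helper, following the earlier "1011" example in which a '1' marks an element whose absolute value exceeds the threshold):

```python
def direct_index_of(values, threshold):
    """Build a direct-index bit string: '1' where |v| > threshold, else '0'."""
    return "".join("1" if abs(v) > threshold else "0" for v in values)

# Weights 1, 0, 3, 4 with threshold 0 give "1011", matching the earlier example.
```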

a fourth sparse processing unit 6013, configured to process the input data according to the positional relationship data of the input data to obtain processed input data, and store the processed input data in the first input cache unit 605.

In the above three specific implementations, before the compression unit 101 processes the input data, the compression unit 101 is further configured to:

group the at least one input neuron to obtain M groups of input neurons, M being an integer greater than or equal to 1;

determine whether each of the M groups of input neurons satisfies a first preset condition, the first preset condition including that the number of input neurons in a group whose absolute values are less than or equal to a third threshold is less than or equal to a fourth threshold;

when any of the M groups of input neurons does not satisfy the first preset condition, delete that group of input neurons;

group the at least one weight to obtain N groups of weights, N being an integer greater than or equal to 1;

determine whether each of the N groups of weights satisfies a second preset condition, the second preset condition including that the number of weights in a group whose absolute values are less than or equal to a fifth threshold is less than or equal to a sixth threshold;

when any of the N groups of weights does not satisfy the second preset condition, delete that group of weights.

Optionally, the third threshold may be 0.5, 0.2, 0.1, 0.05, 0.025, 0.01, 0 or another value. The fourth threshold is related to the number of input neurons in a group of input neurons. Optionally, the fourth threshold = (the number of input neurons in a group) - 1, or the fourth threshold takes another value. Optionally, the fifth threshold may be 0.5, 0.2, 0.1, 0.05, 0.025, 0.01, 0 or another value. The sixth threshold is related to the number of weights in a group of weights. Optionally, the sixth threshold = (the number of weights in a group) - 1, or the sixth threshold takes another value. It should be noted that the third threshold and the fifth threshold may be the same or different, and the fourth threshold and the sixth threshold may be the same or different.
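The group-wise check above can be sketched as follows (a hypothetical helper; a fixed group size is assumed, which the text does not specify):

```python
def prune_groups(values, group_size, value_threshold, count_threshold):
    """Split `values` into groups of `group_size` and delete every group in
    which the number of elements with |v| <= value_threshold exceeds
    count_threshold, i.e. keep only groups meeting the preset condition."""
    kept = []
    for i in range(0, len(values), group_size):
        group = values[i:i + group_size]
        small = sum(1 for v in group if abs(v) <= value_threshold)
        if small <= count_threshold:
            kept.append(group)
    return kept
```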

It should be noted that, in addition to the direct index and the step index, the positional relationship data involved in this application can also be represented as any of the following: List of Lists (LIL), Coordinate list (COO), Compressed Sparse Row (CSR), Compressed Sparse Column (CSC), ELL Pack (ELL), or Hybrid (HYB).

It should be pointed out that the input neurons and output neurons mentioned in the embodiments of the present invention do not refer to the neurons in the input layer and the output layer of the entire neural network. Rather, for any two adjacent layers of neurons in the neural network, the neurons in the lower layer of the network's feed-forward operation are the input neurons, and the neurons in the upper layer of the feed-forward operation are the output neurons. Taking a convolutional neural network as an example, suppose a convolutional neural network has L layers, K = 1, 2, 3, ..., L-1. For the K-th layer and the (K+1)-th layer, the K-th layer is called the input layer, and the neurons in that layer are the above-mentioned input neurons; the (K+1)-th layer is called the output layer, and the neurons in that layer are the above-mentioned output neurons. That is, except for the topmost layer, each layer can serve as an input layer, and the next layer is the corresponding output layer.

In addition, the present application also provides a data processing method using the above computing device. As shown in Figure 6, the method includes:

Step S601: the instruction control unit acquires an operation instruction, decodes the operation instruction into a first microinstruction and a second microinstruction, sends the first microinstruction to the compression unit, and sends the second microinstruction to the operation unit;

Step S602: the compression unit processes the acquired input data according to the first microinstruction to obtain processed input data, where the input data includes at least one input neuron and/or at least one weight, and the processed input data includes the processed input neurons and/or the processed weights;

Step S603: the operation unit processes the processed input data according to the second microinstruction to obtain an operation result.

For the parts of the present invention not shown or not described, refer to the relevant explanations in all or some of the foregoing embodiments, which are not repeated here.

In addition, the compression unit described in the above embodiments of the present application can also be applied to the following chip device (which may also be called an arithmetic circuit or a computing unit) to implement data compression, reducing the amount of data transferred and the amount of data computed in the circuit, thereby improving data processing efficiency.

Referring to Figure 7a, Figure 7a is a schematic structural diagram of a chip device. As shown in Figure 7a, the arithmetic circuit includes a main processing circuit, basic processing circuits and branch processing circuits. Specifically, the main processing circuit is connected to the branch processing circuits, and each branch processing circuit is connected to at least one basic processing circuit.

The branch processing circuit is configured to send and receive data to and from the main processing circuit or the basic processing circuits.

Referring to Figure 7b, Figure 7b is a schematic structural diagram of the main processing circuit. As shown in Figure 7b, the main processing circuit may include a register and/or an on-chip cache circuit, and may also include a control circuit, a vector operator circuit, an ALU (arithmetic and logic unit) circuit, an accumulator circuit, a DMA (Direct Memory Access) circuit and other circuits. Of course, in practical applications, the main processing circuit may also add other circuits, such as a conversion circuit (e.g. a matrix transposition circuit), a data rearrangement circuit or an activation circuit.

The main processing circuit also includes a data sending circuit, a data receiving circuit or an interface. The data sending circuit may integrate a data distribution circuit and a data broadcast circuit; of course, in practical applications, the data distribution circuit and the data broadcast circuit may also be set up separately. In practical applications, the data sending circuit and the data receiving circuit may also be integrated together to form a data transceiving circuit. Broadcast data is data that needs to be sent to every basic processing circuit. Distribution data is data that needs to be selectively sent to some of the basic processing circuits; the specific selection may be determined by the main processing circuit according to the load and the computation method. In the broadcast sending mode, the broadcast data is sent to every basic processing circuit in broadcast form. (In practical applications, the broadcast data may be sent to every basic processing circuit by a single broadcast, or by multiple broadcasts; the specific embodiments of the present application do not limit the number of broadcasts.) In the distribution sending mode, the distribution data is selectively sent to some of the basic processing circuits.

When distributing data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits (the data may be the same or different; specifically, if the data is sent by distribution, the data received by each receiving basic processing circuit may differ, and of course some basic processing circuits may also receive the same data).

Specifically, when broadcasting data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits, and each basic processing circuit receiving data may receive the same data; that is, the broadcast data may include the data that all basic processing circuits need to receive. The distribution data may include the data that some of the basic processing circuits need to receive. The main processing circuit may send the broadcast data to all branch processing circuits through one or more broadcasts, and the branch processing circuits forward the broadcast data to all basic processing circuits.

Optionally, the vector operator circuit of the main processing circuit can perform vector operations, including but not limited to: addition, subtraction, multiplication and division of two vectors; addition, subtraction, multiplication and division of a vector and a constant; or arbitrary operations on each element of a vector. A continuous operation may specifically be an addition, subtraction, multiplication or division of a vector and a constant, an activation operation, an accumulation operation, and so on.

Each basic processing circuit may include a basic register and/or a basic on-chip cache circuit; each basic processing circuit may also include one of, or any combination of, an inner-product operator circuit, a vector operator circuit, an accumulator circuit, and the like. The inner-product operator circuit, the vector operator circuit and the accumulator circuit may all be integrated circuits, or may each be a separately arranged circuit.

The connection structure between the branch processing circuits and the basic circuits can be arbitrary and is not limited to the H-shaped structure in Figure 7a. Optionally, the structure from the main processing circuit to the basic circuits is a broadcast or distribution structure, and the structure from the basic circuits to the main processing circuit is a gather structure. Broadcast, distribution and gather are defined as follows:

The data transfer modes from the main processing circuit to the basic circuits may include the following:

The main processing circuit is connected to a plurality of branch processing circuits respectively, and each branch processing circuit is in turn connected to a plurality of basic circuits respectively.

The main processing circuit is connected to one branch processing circuit, that branch processing circuit is connected to another branch processing circuit, and so on, so that a plurality of branch processing circuits are connected in series; each branch processing circuit is then connected to a plurality of basic circuits respectively.

The main processing circuit is connected to a plurality of branch processing circuits respectively, and each branch processing circuit is in turn connected in series with a plurality of basic circuits.

The main processing circuit is connected to one branch processing circuit, that branch processing circuit is connected to another branch processing circuit, and so on, so that a plurality of branch processing circuits are connected in series; each branch processing circuit is then connected in series with a plurality of basic circuits.

分发数据时,主处理电路向部分或者全部基础电路传输数据,各个接收数据的基础电路收到的数据可以不同;When distributing data, the main processing circuit transmits data to some or all of the basic circuits, and the data received by each basic circuit receiving data can be different;

广播数据时,主处理电路向部分或者全部基础电路传输数据,各个接收数据的基础电路收到相同的数据。When broadcasting data, the main processing circuit transmits data to some or all of the basic circuits, and each basic circuit receiving data receives the same data.

收集数据时,部分或全部基础电路向主处理电路传输数据。需要说明的,如图7a所示的芯片装置可以是一个单独的物理芯片,当然在实际应用中,该芯片装置也可以集成在其他的芯片内(例如CPU,GPU),本申请具体实施方式并不限制上述芯片装置的物理表现形式。As data is collected, some or all of the underlying circuitry transfers the data to the main processing circuitry. It should be noted that the chip device shown in FIG. 7a can be a separate physical chip. Of course, in practical applications, the chip device can also be integrated in other chips (such as CPU, GPU). The physical representation of the above-mentioned chip device is not limited.
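The three transfer patterns above can be sketched in a few lines of Python. This is an illustrative model only, not part of the patent disclosure; the function names and the round-robin split used for distribution are assumptions made for the example.

```python
def broadcast(main_data, basic_circuits):
    # Broadcast: every receiving basic circuit gets the same data.
    return {circuit: list(main_data) for circuit in basic_circuits}

def distribute(main_data, basic_circuits):
    # Distribution: each receiving basic circuit may get different data;
    # here the data is dealt out round-robin as one illustrative split.
    n = len(basic_circuits)
    return {circuit: main_data[i::n] for i, circuit in enumerate(basic_circuits)}

def gather(per_circuit_results):
    # Gather: some or all basic circuits send data back to the main
    # processing circuit, which concatenates the partial results.
    merged = []
    for partial in per_circuit_results.values():
        merged.extend(partial)
    return merged

split = distribute([10, 20, 30, 40], ["b0", "b1"])  # b0 gets [10, 30], b1 gets [20, 40]
```

Note that `gather` is simply the inverse direction of `distribute`: the per-circuit partial results flow back and are merged at the main processing circuit.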

Referring to FIG. 7c, a schematic diagram of data distribution in a chip device, the arrows indicate the direction of distribution: after receiving external data, the main processing circuit splits it and distributes the split data to a plurality of branch processing circuits, and the branch processing circuits forward the split data to the basic processing circuits.

Referring to FIG. 7d, a schematic diagram of data return in a chip device, the arrows indicate the return direction: the basic processing circuits return data (for example, inner-product results) to the branch processing circuits, and the branch processing circuits in turn return the data to the main processing circuit.

The input data may specifically be a vector, a matrix, or multi-dimensional (three-dimensional, four-dimensional, or higher) data; a specific value within the input data may be called an element of the input data.

An embodiment of the present disclosure also provides a computing method for the computing unit shown in FIG. 7a. The method is applied to neural network computation; specifically, the computing unit may be used to perform operations on the input data and weight data of one or more layers of a multi-layer neural network.

Specifically, the computing unit is used to perform operations on the input data and weight data of one or more layers of a multi-layer neural network under training;

or the computing unit is used to perform operations on the input data and weight data of one or more layers of a multi-layer neural network in forward computation.

The above operations include, but are not limited to, one or any combination of: convolution, matrix-matrix multiplication, matrix-vector multiplication, bias operations, fully connected operations, GEMM operations, GEMV operations, and activation operations.

GEMM refers to the matrix-matrix multiplication operation in the BLAS library. Its usual form is C = alpha*op(S)*op(P) + beta*C, where S and P are the two input matrices, C is the output matrix, alpha and beta are scalars, and op denotes some operation on the matrix S or P (such as transposition); in addition, auxiliary integer parameters describe the width and height of S and P.
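The GEMM form above can be reproduced as a minimal pure-Python sketch (illustrative only, not the patent's or BLAS's implementation; the matrix values are arbitrary, and `op` is taken here as either the identity or transposition):

```python
def transpose(M):
    # One common choice of op in BLAS: matrix transposition.
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    # Plain-Python matrix product (A is m x k, B is k x n).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def gemm(alpha, S, P, beta, C, op=lambda M: M):
    # C = alpha * op(S) * op(P) + beta * C, the BLAS GEMM form.
    SP = matmul(op(S), op(P))
    return [[alpha * SP[i][j] + beta * C[i][j] for j in range(len(SP[0]))]
            for i in range(len(SP))]

S = [[1.0, 2.0], [3.0, 4.0]]
P = [[5.0, 6.0], [7.0, 8.0]]
C = [[0.0, 0.0], [0.0, 0.0]]
result = gemm(2.0, S, P, 1.0, C)  # op is the identity here
```

With `op=transpose`, the same routine computes alpha*S^T*P^T + beta*C, matching the role of op in the expression above.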

GEMV refers to the matrix-vector multiplication operation in the BLAS library. Its usual form is C = alpha*op(S)*P + beta*C, where S is the input matrix, P is the input vector, C is the output vector, alpha and beta are scalars, and op denotes some operation on the matrix S.
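GEMV admits the same kind of sketch (again illustrative only; the values and the default identity `op` are assumptions for the example):

```python
def gemv(alpha, S, P, beta, C, op=lambda M: M):
    # C = alpha * op(S) * P + beta * C, the BLAS GEMV form.
    return [alpha * sum(s * p for s, p in zip(row, P)) + beta * c
            for row, c in zip(op(S), C)]

S = [[1.0, 2.0], [3.0, 4.0]]
P = [1.0, 1.0]
C = [10.0, 20.0]
result = gemv(1.0, S, P, 0.5, C)
```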

In practical applications, the compression unit described above may be designed into, or applied to, any one or more of the following circuits of the embodiments of the present application: the main processing circuit, the branch processing circuits, and the basic processing circuits. For details of the compression unit and the data processing it involves, reference may be made to the foregoing embodiments, which are not repeated here.
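As stated in the claims below, the pruning mode of the compression unit deletes input elements whose absolute value is greater than a first threshold. A one-line illustrative sketch (not the patent's implementation) of that rule:

```python
def prune(values, first_threshold):
    # Pruning as described for the compression unit: elements whose
    # absolute value is greater than the first threshold are deleted.
    return [v for v in values if abs(v) <= first_threshold]
```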

An embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any of the data processing methods described in the above method embodiments.

An embodiment of the present application also provides a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to execute some or all of the steps of any of the data processing methods described in the above method embodiments.

In some embodiments, a chip is also disclosed, which includes the neural network processor for executing the above data processing methods.

In some embodiments, a chip packaging structure is disclosed, which includes the above chip.

In some embodiments, a board card is disclosed, which includes the above chip packaging structure.

In some embodiments, an electronic device is disclosed, which includes the above board card.

The electronic device includes a data processing apparatus, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, driving recorder, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.

The vehicle includes an airplane, a ship, and/or a motor vehicle; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph.

The specific embodiments described above further describe the purpose, technical solutions, and beneficial effects of the embodiments of the present invention in detail. It should be understood that the above descriptions are only specific embodiments of the present disclosure and are not intended to limit this disclosure; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this disclosure shall be included in the scope of protection of this disclosure.

Claims (18)

1. A computing device, characterized in that it comprises an arithmetic unit, an instruction control unit, a storage unit, and a compression unit;
the instruction control unit is configured to obtain an operation instruction, decode the operation instruction into a first microinstruction and a second microinstruction, send the first microinstruction to the compression unit, and send the second microinstruction to the arithmetic unit;
the storage unit is configured to store input data, processed input data, operation instructions, and operation results, the input data comprising at least one input neuron and/or at least one weight, and the processed input data comprising processed input neurons and/or processed weights;
the compression unit is configured to process the input data according to the first microinstruction to obtain the processed input data;
the arithmetic unit is configured to process the processed input data according to the second microinstruction to obtain the operation result.

2. The computing device according to claim 1, characterized in that
the arithmetic unit is specifically configured to obtain first cache data and second cache data, and to process the first cache data and the second cache data according to the second microinstruction to obtain the operation result;
wherein the first cache data and/or the second cache data are related to the processed input data, and the first cache data and the second cache data are different.

3. The computing device according to claim 1, characterized in that
the compression unit is specifically configured to determine, according to the first microinstruction, a processing mode for the input data, the processing mode comprising at least one of: pruning, quantization, and encoding;
the compression unit is further configured to perform the corresponding processing on the input data according to the processing mode to obtain the processed input data.

4. The computing device according to claim 3, characterized in that the compression unit further comprises a control unit, and the operation state corresponding to the processing mode is modified inside the compression unit by the control unit so as to realize data processing in multiple processing modes; the operation state comprises at least one of: a first state, a second state, a third state, and a fourth state, wherein
the first state indicates that the compression unit is in an initial state and performs no data processing;
the second state is associated with the pruning processing and indicates that the compression unit will perform data pruning;
the third state is associated with the quantization processing and indicates that the compression unit will perform data quantization;
the fourth state is associated with the encoding processing and indicates that the compression unit will perform data encoding.

5. The computing device according to claim 3, characterized in that when the processing mode is pruning,
the compression unit is specifically configured to delete, according to the pruning processing, the input data whose absolute value is greater than a first threshold, so as to obtain the processed input data.

6. The computing device according to claim 5, characterized in that
the compression unit is specifically configured to delete, according to positional relationship data and using the pruning processing, the input data whose absolute value is greater than the first threshold, so as to obtain the processed input data;
wherein the positional relationship data comprises any of: positional relationship data of the input neurons, positional relationship data of the input weights, or positional relationship data determined from the positional relationship data of the input neurons and the positional relationship of the input weights.

7. The computing device according to claim 6, characterized in that the positional relationship data may be expressed in the form of a direct index or a stride index.

8. The computing device according to claim 3, characterized in that when the processing mode is quantization,
the compression unit is specifically configured to cluster and quantize the input data according to the quantization processing to obtain the processed input data.

9. The computing device according to claim 3, characterized in that when the processing mode is encoding,
the compression unit is specifically configured to encode the input data in a preset encoding format according to the encoding processing to obtain the processed input data; wherein the preset encoding format is custom-set on the user side or the device side.

10. The computing device according to any one of claims 2-9, characterized in that the computing device further comprises an instruction cache unit, a direct memory access unit, a first input cache unit, and a second input cache unit;
the instruction cache unit is configured to cache the operation instructions;
the direct memory access unit is configured to read and write data between the storage unit and the instruction cache unit, the first input cache unit, the second input cache unit, and the output cache unit;
the instruction cache unit is configured to cache the neural network instructions read by the direct memory access unit;
the first input cache unit is configured to cache the first cache data read by the direct memory access unit;
the second input cache unit is configured to cache the second cache data read by the direct memory access unit, the first cache data and/or the second cache data being related to the processed input data, and the first cache data and the second cache data being different.

11. The computing device according to claim 10, characterized in that the computing device further comprises an output cache unit;
the output cache unit is configured to cache the operation result.

12. A method for data processing using a computing device, characterized in that the computing device comprises an arithmetic unit, an instruction control unit, a storage unit, and a compression unit, and the method comprises:
the instruction control unit obtains an operation instruction, decodes the operation instruction into a first microinstruction and a second microinstruction, sends the first microinstruction to the compression unit, and sends the second microinstruction to the arithmetic unit;
the compression unit processes the obtained input data according to the first microinstruction to obtain processed input data, wherein the input data comprises at least one input neuron and/or at least one weight, and the processed input data comprises processed input neurons and/or processed weights;
the arithmetic unit processes the processed input data according to the second microinstruction to obtain an operation result.

13. The method according to claim 12, characterized in that the arithmetic unit processing the processed input data according to the second microinstruction to obtain the operation result comprises:
the arithmetic unit obtains first cache data and second cache data, and processes the first cache data and the second cache data according to the second microinstruction to obtain the operation result;
wherein the first cache data and/or the second cache data are related to the processed input data, and the first cache data and the second cache data are different.

14. The method according to claim 12 or 13, characterized in that the compression unit processing the input data according to the first microinstruction to obtain the processed input data comprises:
the compression unit determines, according to the first microinstruction, a processing mode for the input data, the processing mode comprising at least one of: pruning, quantization, and encoding;
the compression unit performs the corresponding processing on the input data according to the processing mode to obtain the processed input data.

15. The method according to claim 14, characterized in that the compression unit further comprises a control unit, and the operation state corresponding to the processing mode is modified inside the compression unit by the control unit so as to realize data processing in multiple processing modes; the operation state comprises at least one of: a first state, a second state, a third state, and a fourth state, wherein
the first state indicates that the compression unit is in an initial state and performs no data processing;
the second state is associated with the pruning processing and indicates that the compression unit will perform data pruning;
the third state is associated with the quantization processing and indicates that the compression unit will perform data quantization;
the fourth state is associated with the encoding processing and indicates that the compression unit will perform data encoding.

16. A chip, characterized in that the chip comprises the computing device according to any one of claims 12-15.

17. An electronic device, characterized in that the electronic device comprises the chip according to claim 16.

18. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to execute the method according to any one of claims 12-15.
CN201810161816.0A 2018-02-27 2018-02-27 A kind of computing device and Related product Pending CN110196734A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810161816.0A CN110196734A (en) 2018-02-27 2018-02-27 A kind of computing device and Related product
PCT/CN2019/075975 WO2019165939A1 (en) 2018-02-27 2019-02-23 Computing device, and related product


Publications (1)

Publication Number Publication Date
CN110196734A true CN110196734A (en) 2019-09-03


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112244853A (en) * 2020-10-26 2021-01-22 生物岛实验室 Manufacturing method of edge computing node and edge computing node
CN114064561A (en) * 2021-11-17 2022-02-18 北京灵汐科技有限公司 Data processing method, device, chip and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682650A (en) * 2017-01-26 2017-05-17 北京中科神探科技有限公司 Mobile terminal face recognition method and system based on technology of embedded deep learning
CN106779075A (en) * 2017-02-16 2017-05-31 南京大学 The improved neutral net of pruning method is used in a kind of computer
CN106991477A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 A kind of artificial neural network compression-encoding device and method
CN107301453A (en) * 2016-04-15 2017-10-27 北京中科寒武纪科技有限公司 The artificial neural network forward operation apparatus and method for supporting discrete data to represent
CN107316078A (en) * 2016-04-27 2017-11-03 北京中科寒武纪科技有限公司 Apparatus and method for performing artificial neural network self study computing
CN107491811A (en) * 2017-09-01 2017-12-19 中国科学院计算技术研究所 Method and system and neural network processor for accelerans network processing unit




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190903