
CN111353591B - Computing device and related product - Google Patents

Computing device and related product

Info

Publication number
CN111353591B
CN111353591B
Authority
CN
China
Prior art keywords
unit
neural network
instruction
sub
weight matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811566331.6A
Other languages
Chinese (zh)
Other versions
CN111353591A (en)
Inventor
Name not disclosed at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN201811585964.1A priority Critical patent/CN111353598B/en
Priority to CN201811566331.6A priority patent/CN111353591B/en
Publication of CN111353591A publication Critical patent/CN111353591A/en
Application granted granted Critical
Publication of CN111353591B publication Critical patent/CN111353591B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/06 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Feedback Control In General (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a computing device and a related product. The computing device comprises a compression unit, an operation unit and a controller unit. The controller unit is configured to acquire a compression request for first input data and to instruct the compression unit to compress the first input data according to the compression request, wherein the first input data comprises a first weight matrix. The compression unit is configured to compress the first weight matrix into a second weight matrix. The operation unit is configured to execute the neural network calculation according to second input data and a calculation instruction. With the application, the topology of the neural network model is kept unchanged during neural network compression, so that the topology does not become irregular and the amount of neural network computation is reduced.

Description

Computing Device and Related Products

Technical Field

This application relates to the field of information processing technology, and in particular to a computing device and related products.

Background Art

A neural network is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Such a network consists of a large number of interconnected nodes (also called neurons). By adjusting the interconnections among these nodes and using input neuron data and weights to generate output data, it simulates the information processing of the human brain and produces pattern-recognition results.

At present, neural networks are widely used in many areas of computer vision, such as image recognition, object detection and image segmentation. In practice, however, neural network models often have a huge number of parameters (for example, very large-scale weights), which means the network requires substantial computing and storage resources. This overhead slows down the network and greatly raises the demands on hardware transmission bandwidth and on the arithmetic units. It is therefore important to reduce the number of model parameters while also reducing the computational cost of the network.

In the prior art, the parameters of a neural network model are adjusted by pruning in order to reduce the number of parameters and the amount of computation. Taking weight pruning as an example, as shown in FIG. 1A, the topology of the neural network is regular before its weights are pruned; after pruning, however, the originally regular topology easily becomes irregular. How to prevent the topology of a neural network model from becoming irregular is thus a technical problem that urgently needs to be solved.

Summary of the Invention

Embodiments of this application provide a computing device and related products that keep the topology of the neural network model unchanged during neural network compression, thereby preventing the topology from becoming irregular and reducing the amount of neural network computation.

In a first aspect, a computing device is provided. The computing device is configured to perform machine learning calculations of a machine learning model and comprises a compression unit, an operation unit and a controller unit.

The controller unit is configured to obtain a compression request for first input data and to instruct the compression unit to compress the first input data according to the compression request, wherein the first input data includes a first weight matrix.

The compression unit is configured to compress the first weight matrix into a second weight matrix.

The controller unit is further configured to obtain second input data and a calculation instruction, the second input data including the second weight matrix and input neuron data.

The controller unit is further configured to parse the calculation instruction into a plurality of operation instructions and to send the plurality of operation instructions and the second input data to the operation unit.

The operation unit obtains the operation instructions and performs the neural network calculation according to the operation instructions and the second input data.

With this application, the first weight matrix can be compressed by the compression unit into a second weight matrix, and the neural network calculation can then be performed on the second weight matrix and the input neuron data. This avoids the irregular topology that prior-art neural network pruning algorithms tend to produce, allows the neural network to be deeply compressed, reduces the amount of computation and increases operation speed.
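The computational saving can be illustrated with a minimal NumPy sketch. All sizes here are hypothetical, not taken from the patent: an m × n weight matrix is replaced by two sub-matrices of shapes m × k and k × n, which leaves the layer's input/output topology unchanged while cutting the multiply count from m·n to k·(m+n).

```python
import numpy as np

# Hypothetical layer sizes (not from the patent): an m x n fully
# connected layer compressed into two sub-matrices through rank k.
m, n, k = 1024, 1024, 64

rng = np.random.default_rng(0)
x = rng.standard_normal(n)           # input neuron data
W1 = rng.standard_normal((m, k))     # first sub-matrix of the second weight matrix
W2 = rng.standard_normal((k, n))     # second sub-matrix

# The layer still maps n inputs to m outputs, so the model topology is
# unchanged; only the internal representation of the weights differs.
y = W1 @ (W2 @ x)
assert y.shape == (m,)

# Multiplication count: m*n for the dense layer vs. k*(m+n) when factored.
dense_muls = m * n            # 1,048,576
factored_muls = k * (m + n)   # 131,072, i.e. 8x fewer multiplications
```

Because 64·(1024+1024) is far smaller than 1024·1024, the factored form both stores fewer parameters and needs fewer multiplications, which is the reduction in computation the passage describes.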

In a second aspect, an embodiment of this application provides a machine learning operation device comprising one or more computing devices according to the first aspect. The machine learning operation device is configured to obtain data to be operated on and control information from other processing devices, perform the specified machine learning operations, and pass the execution results to the other processing devices through an I/O interface.

When the machine learning operation device contains a plurality of the computing devices, the computing devices may be linked through a specific structure to transmit data between them.

Specifically, the computing devices are interconnected through a PCIe bus to transmit data and support larger-scale machine learning operations. The computing devices may share a single control system or have their own control systems, may share memory or have their own memories, and may be interconnected in any interconnection topology.

In a third aspect, an embodiment of this application provides a combined processing device comprising the machine learning operation device according to the second aspect, a universal interconnection interface, and other processing devices. The machine learning operation device interacts with the other processing devices to jointly complete operations specified by the user. The combined processing device may further include a storage device connected to the machine learning operation device and to the other processing devices, for storing the data of the machine learning operation device and the other processing devices.

In a fourth aspect, an embodiment of this application provides a neural network chip that includes the computing device of the first aspect, the machine learning operation device of the second aspect, or the combined processing device of the third aspect.

In a fifth aspect, an embodiment of this application provides a neural network chip package structure that includes the neural network chip of the fourth aspect.

In a sixth aspect, an embodiment of this application provides a board card that includes the neural network chip package structure of the fifth aspect.

In a seventh aspect, an embodiment of this application provides an electronic device that includes the neural network chip of the fourth aspect or the board card of the sixth aspect.

In an eighth aspect, an embodiment of this application further provides a calculation method for executing a machine learning model. The method is applied to a computing device configured to perform machine learning calculations, the computing device comprising a compression unit, an operation unit and a controller unit. The method includes:

the controller unit obtaining a compression request for first input data and instructing the compression unit to compress the first input data according to the compression request, wherein the first input data includes a first weight matrix;

the compression unit compressing the first weight matrix into a second weight matrix;

the controller unit obtaining second input data and a calculation instruction, the second input data including the second weight matrix and input neuron data;

the controller unit parsing the calculation instruction into a plurality of operation instructions and sending the plurality of operation instructions and the second input data to the operation unit;

the operation unit obtaining the operation instructions and performing the neural network calculation according to the operation instructions and the second input data.

In some embodiments, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a dashboard camera, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, a headset, mobile storage, a wearable device, a vehicle, a household appliance, and/or medical equipment.

In some embodiments, the vehicle includes an airplane, a ship and/or a road vehicle; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and/or a range hood; the medical equipment includes a magnetic resonance imaging scanner, an ultrasound scanner and/or an electrocardiograph.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of this application more clearly, the drawings needed to describe the embodiments are briefly introduced below. Evidently, the drawings described below show some embodiments of this application, and those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1A is a schematic diagram of a pruning operation on a neural network according to an embodiment of this application;

FIG. 1B is a schematic structural diagram of a computing device according to an embodiment of this application;

FIG. 2 is a schematic structural diagram of a control unit according to an embodiment of this application;

FIG. 3 is a schematic flowchart of a neural network operation method according to an embodiment of this application;

FIG. 4 is a schematic flowchart of a neural network compression method according to an embodiment of this application;

FIG. 5A is a schematic diagram of a neural network architecture according to an embodiment of this application;

FIG. 5B is a schematic diagram of a fully connected layer weight matrix according to an embodiment of this application;

FIG. 5C is a schematic diagram of an operation for compressing a fully connected layer weight matrix according to an embodiment of this application;

FIG. 5D is a schematic structural diagram of a convolution kernel in a convolution layer according to an embodiment of this application;

FIG. 5E is a schematic diagram of a fully connected layer weight matrix according to another embodiment of this application;

FIG. 5F is a schematic diagram of an operation for compressing an LSTM layer according to an embodiment of this application;

FIG. 6 is a schematic structural diagram of another computing device according to an embodiment of this application;

FIG. 7 is a schematic structural diagram of a main processing circuit according to an embodiment of this application;

FIG. 8 is a schematic structural diagram of another computing device according to an embodiment of this application;

FIG. 9 is a schematic structural diagram of a tree module according to an embodiment of this application;

FIG. 10 is a structural diagram of yet another computing device according to an embodiment of this application;

FIG. 11 is a structural diagram of still another computing device according to an embodiment of this application;

FIG. 12 is a structural diagram of another computing device according to an embodiment of this application;

FIG. 13 is a structural diagram of a combined processing device according to an embodiment of this application;

FIG. 14 is a structural diagram of another combined processing device according to an embodiment of this application;

FIG. 15 is a schematic structural diagram of a board card according to an embodiment of this application;

FIG. 16 is a schematic flowchart of a neural network compression method according to an embodiment of this application;

FIG. 17A is a schematic structural diagram of a neural network compression apparatus according to an embodiment of this application;

FIG. 17B is a schematic structural diagram of a compression unit according to an embodiment of this application;

FIG. 18 is a schematic structural diagram of an electronic device according to an embodiment of this application.

Detailed Description

The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.

The terms "first", "second", "third", "fourth" and so on in the specification, claims and drawings of this application are used to distinguish different objects rather than to describe a particular order. In addition, the terms "include" and "have", and any variants thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product or device.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

This application provides a compression unit for compressing a first weight matrix into a second weight matrix, which avoids the irregular topology that the neural network pruning algorithms of the prior art tend to produce. In practice, the compression unit can be used in neural network calculations, specifically in a computing device for performing neural network calculations. The invention is described below with reference to the computing device shown in FIG. 1B.

Referring to FIG. 1B, FIG. 1B is a schematic structural diagram of a computing device according to an embodiment of the invention. The computing device is configured to perform machine learning calculations and includes a controller unit 11, an operation unit 12 and a compression unit 13, where the controller unit 11 is connected to the operation unit 12 and to the compression unit 13.

The controller unit 11 is configured to obtain a compression request for first input data and to instruct the compression unit to compress the first input data according to the compression request, wherein the first input data includes a first weight matrix. In an optional solution, the compression request may be triggered through a data input/output unit, which may specifically be one or more data I/O interfaces or I/O pins.

The compression unit 13 is configured to compress the first weight matrix into a second weight matrix, wherein the second weight matrix includes at least two sub-matrices.

In a specific implementation, the compression unit 13 includes a decomposition unit 131, a solving unit 132 and a training unit 133. The decomposition unit 131 is configured to decompose the first weight matrix into a third weight matrix, wherein the third weight matrix includes at least two sub-matrices. The solving unit 132 is configured to determine the size of each of the at least two sub-matrices according to a first formula Q ≈ Q1 × Q2 × ... × Qn, where Q denotes the first weight matrix, Q1 denotes the first of the at least two sub-matrices, Q2 denotes the second, and Qn denotes the n-th. The training unit 133 is configured to adjust the size of each of the at least two sub-matrices and to train the compressed machine learning model so as to obtain a second weight matrix that meets a preset accuracy.
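As one concrete way to realize the first formula for the two-sub-matrix case (Q ≈ Q1 × Q2), a truncated SVD can be used. The patent does not prescribe a particular decomposition method, so the following is only an illustrative sketch with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical first weight matrix Q, built to have low effective rank.
Q = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 512))

# Truncated SVD gives one possible factorization Q ~= Q1 * Q2.
U, s, Vt = np.linalg.svd(Q, full_matrices=False)
rank = 16                       # the sub-matrix size the solving unit would pick
Q1 = U[:, :rank] * s[:rank]     # 256 x 16 sub-matrix
Q2 = Vt[:rank, :]               # 16 x 512 sub-matrix

# Relative reconstruction error; in the flow described above, the training
# unit would then fine-tune the factors until a preset accuracy is met.
rel_err = np.linalg.norm(Q - Q1 @ Q2) / np.linalg.norm(Q)
```

Here the error is essentially zero because Q was constructed with rank 16; for a real weight matrix the chosen rank trades reconstruction accuracy against the degree of compression, which is why the training step is needed.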

The controller unit 11 is further configured to obtain second input data and a calculation instruction, the second input data including the second weight matrix and input neuron data. In an optional solution, the second input data and the calculation instruction may be obtained through a data input/output unit, which may specifically be one or more data I/O interfaces or I/O pins.

The controller unit 11 is further configured to parse the calculation instruction into a plurality of operation instructions and to send the plurality of operation instructions and the second input data to the operation unit.

The operation unit 12 is configured to obtain the operation instructions and to perform the neural network calculation according to the operation instructions and the second input data.

In one implementation, the computing device is provided with a "compression instruction". In this case, the controller unit 11 is configured to obtain first input data and the compression instruction, wherein the first input data includes a first weight matrix. In an optional solution, the first input data and the compression instruction may be obtained through a data input/output unit, which may specifically be one or more data I/O interfaces or I/O pins.

The controller unit 11 is further configured to parse the compression instruction into a plurality of operation instructions and to send the plurality of operation instructions and the first weight matrix to the compression unit.

The compression unit 13 is configured to compress the first weight matrix into a second weight matrix according to the plurality of operation instructions.

The controller unit 11 is further configured to obtain second input data and a calculation instruction, the second input data including the second weight matrix and input neuron data. In an optional solution, the second input data and the calculation instruction may be obtained through a data input/output unit, which may specifically be one or more data I/O interfaces or I/O pins.

The controller unit 11 is further configured to parse the calculation instruction into a plurality of operation instructions and to send the plurality of operation instructions and the second input data to the operation unit.

The operation unit 12 is configured to obtain the operation instructions and to perform the neural network calculation according to the operation instructions and the second input data.

In a specific implementation, the operation unit 12 includes a main processing circuit 101 and a plurality of slave processing circuits 102. The main processing circuit 101 is configured to perform preliminary processing on the second input data and to transfer data and operation instructions to and from the plurality of slave processing circuits.

The plurality of slave processing circuits 102 are configured to perform intermediate operations in parallel according to the data and operation instructions transferred from the main processing circuit, to obtain a plurality of intermediate results, and to transfer the plurality of intermediate results to the main processing circuit.

The main processing circuit 101 is configured to perform subsequent processing on the plurality of intermediate results to obtain the calculation result of the calculation instruction.

Optionally, the second input data may specifically include the second weight matrix and input neuron data, and the calculation result may specifically be the result of the neural network operation, that is, the output neuron data.
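The main/slave split above can be sketched in software as follows. This is only an illustrative analogy, with a thread pool standing in for the slave processing circuits and the block count of 4 chosen arbitrarily:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(2)
W = rng.standard_normal((8, 6))   # second weight matrix
x = rng.standard_normal(6)        # input neuron data

def slave_circuit(block):
    # Each slave processing circuit computes one intermediate result
    # from the data the main processing circuit handed it.
    return block @ x

# The main processing circuit splits the work into row blocks, the slave
# circuits run in parallel, and the main circuit concatenates the
# intermediate results into the output neuron data.
blocks = np.array_split(W, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    intermediates = list(pool.map(slave_circuit, blocks))
y = np.concatenate(intermediates)
```

The concatenated result equals the single-circuit product W @ x, which is what lets the intermediate operations run in parallel without changing the output.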

In one embodiment, the computing device may further include a storage unit 10 and a direct memory access unit 50. The storage unit 10 may include one or any combination of a register and a cache; specifically, the cache is configured to store the calculation instruction, the register is configured to store the input data and scalars, and the cache is a high-speed scratchpad cache. The direct memory access unit 50 is configured to read data from or store data into the storage unit 10.

In an embodiment of this application, as shown in FIG. 2, the controller unit 11 includes an instruction cache unit 110, an instruction processing unit 111, a dependency processing unit 112 and a storage queue unit 113.

The instruction cache unit 110 is configured to store calculation instructions associated with the artificial neural network operation. While a zeroth calculation instruction is being executed, other instructions that have not been committed for execution are cached in the instruction cache unit 110. After the zeroth calculation instruction has been executed, if a first calculation instruction is the earliest uncommitted instruction in the instruction cache unit 110, the first calculation instruction is committed; once committed, the changes that the instruction makes to the device state cannot be undone.

The instruction processing unit 111 is configured to obtain the calculation instruction from the instruction cache unit and to parse the calculation instruction into a plurality of operation instructions.

所述依赖关系处理单元112,用于在具有多个操作指令时,确定第一操作指令与所述第一操作指令之前的第零操作指令是否存在关联关系,如所述第一操作指令与所述第零操作指令存在关联关系,则将所述第一操作指令存储到存储队列单元113内,在所述第零操作指令执行完毕后,所述第一操作指令与所述第零操作指令的关联关系解除,则从所述存储队列单元113中提取所述第一操作指令传输至所述运算单元;The dependency processing unit 112 is used to determine whether a first operation instruction has an association relationship with a zeroth operation instruction before the first operation instruction when there are multiple operation instructions. If the first operation instruction has an association relationship with the zeroth operation instruction, the first operation instruction is stored in the storage queue unit 113. After the zeroth operation instruction is executed, the association relationship between the first operation instruction and the zeroth operation instruction is released, and the first operation instruction is extracted from the storage queue unit 113 and transmitted to the operation unit.

所述确定该第一操作指令与第一操作指令之前的第零操作指令是否存在关联关系包括:The determining whether the first operation instruction has an association relationship with the zeroth operation instruction before the first operation instruction comprises:

依据所述第一操作指令提取所述第一操作指令中所需数据(例如矩阵)的第一存储地址区间,依据所述第零操作指令提取所述第零操作指令中所需矩阵的第零存储地址区间,如所述第一存储地址区间与所述第零存储地址区间具有重叠的区域,则确定所述第一操作指令与所述第零操作指令具有关联关系,如所述第一存储地址区间与所述第零存储地址区间不具有重叠的区域,则确定所述第一操作指令与所述第零操作指令不具有关联关系。The first storage address interval of the data (e.g., a matrix) required in the first operation instruction is extracted according to the first operation instruction, and the zeroth storage address interval of the matrix required in the zeroth operation instruction is extracted according to the zeroth operation instruction. If the first storage address interval and the zeroth storage address interval have an overlapping area, it is determined that the first operation instruction and the zeroth operation instruction have an associated relationship. If the first storage address interval and the zeroth storage address interval do not have an overlapping area, it is determined that the first operation instruction and the zeroth operation instruction have no associated relationship.
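The address-interval overlap test described above can be sketched in a few lines. This is an illustrative sketch, not part of the patent hardware: the helper name `intervals_overlap` is assumed, and each operand's storage address range is modeled as a half-open `(start, end)` pair.

```python
def intervals_overlap(first, zeroth):
    """Return True if two half-open address intervals [start, end) overlap.

    Per the text above: the first operation instruction has an association
    relationship with the zeroth one exactly when the storage address
    intervals of their required data overlap.
    """
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end

# First instruction reads [100, 200); zeroth writes [150, 250) -> associated,
# so the first instruction would wait in the storage queue unit.
dependent = intervals_overlap((100, 200), (150, 250))
# Disjoint intervals [100, 200) and [200, 300) -> no association.
independent = intervals_overlap((100, 200), (200, 300))
```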

存储队列单元113,用于存储指令队列,该指令队列包括:按该队列的前后顺序待执行的多个操作指令或计算指令。The storage queue unit 113 is used to store an instruction queue, which includes: a plurality of operation instructions or calculation instructions to be executed in the order of the queue.

本申请实施例中,如图2所示,所述指令处理单元111包括取指模块、译码模块以及指令队列,其中,所述取指模块,用于从所述指令缓存单元110中获取神经网络的计算指令;所述译码模块用于对所述取指模块获取的计算指令进行译码,得到神经网络的操作指令;所述指令队列用于对译码后得到的操作指令,按照待执行的先后顺序进行顺序存储。In an embodiment of the present application, as shown in FIG. 2, the instruction processing unit 111 includes an instruction fetch module, a decoding module, and an instruction queue. The instruction fetch module is used to obtain computing instructions of the neural network from the instruction cache unit 110; the decoding module is used to decode the computing instructions obtained by the instruction fetch module to obtain operation instructions of the neural network; the instruction queue is used to store the decoded operation instructions sequentially in the order in which they are to be executed.

举例说明,在一个可选的技术方案中,主运算处理电路也可以包括一个控制器单元,该控制器单元可以包括主指令处理单元,具体用于将指令译码成微指令。当然在另一种可选方案中,从运算处理电路也可以包括另一个控制器单元,该另一个控制器单元包括从指令处理单元,具体用于接收并处理微指令。上述微指令可以为指令的下一级指令,该微指令可以通过对指令的拆分或解码后获得,能被进一步解码为各部件、各单元或各处理电路的控制信号。For example, in an optional technical solution, the main operation processing circuit may also include a controller unit, which may include a main instruction processing unit, specifically for decoding instructions into microinstructions. Of course, in another optional solution, the slave operation processing circuit may also include another controller unit, which includes a slave instruction processing unit, specifically for receiving and processing microinstructions. The above-mentioned microinstructions may be the next level instructions of the instructions, which may be obtained by splitting or decoding the instructions, and can be further decoded into control signals for each component, each unit or each processing circuit.

在一种可选方案中,该计算指令的结构可以如下表所示。In an optional solution, the structure of the calculation instruction may be as shown in the following table.

表1Table 1

操作码 | 寄存器或立即数 | 寄存器/立即数 | ......
Operation code | Register or immediate | Register/immediate | ......

上表中的省略号表示可以包括多个寄存器或立即数。The ellipsis in the above table indicates that multiple registers or immediate values can be included.

在另一种可选方案中,该计算指令可以包括:一个或多个操作域以及一个操作码。该计算指令可以包括神经网络运算指令,也可以包括上述所涉及的压缩指令。以神经网络运算指令为例,如表1所示,其中,寄存器号0、寄存器号1、寄存器号2、寄存器号3、寄存器号4可以为操作域。其中,每个寄存器号0、寄存器号1、寄存器号2、寄存器号3、寄存器号4可以是一个或者多个寄存器的号码。In another optional solution, the computing instruction may include: one or more operation domains and an operation code. The computing instruction may include a neural network operation instruction, and may also include the compression instruction involved above. Taking the neural network operation instruction as an example, as shown in Table 1, register number 0, register number 1, register number 2, register number 3, and register number 4 may be operation domains. Each register number 0, register number 1, register number 2, register number 3, and register number 4 may be the number of one or more registers.

表2Table 2

上述寄存器可以为片外存储器,当然在实际应用中,也可以为片内存储器,用于存储数据,该数据具体可以为n维数据,n为大于等于1的整数,例如,n=1时,为1维数据,即向量,如n=2时,为2维数据,即矩阵,如n=3或3以上时,为多维张量。The above-mentioned registers can be off-chip memories, and of course in practical applications, they can also be on-chip memories for storing data. The data can specifically be n-dimensional data, where n is an integer greater than or equal to 1. For example, when n=1, it is 1-dimensional data, i.e., a vector; when n=2, it is 2-dimensional data, i.e., a matrix; and when n=3 or more, it is a multi-dimensional tensor.

在本发明实施例中,所述计算装置执行所述神经网络运算的过程如图3所示,包括:In the embodiment of the present invention, the process by which the computing device performs the neural network operation is shown in FIG. 3 and includes:

步骤S1、控制器单元接收压缩指令,将压缩指令译码解析为多个操作指令,并将多个操作指令发送给压缩单元。Step S1: The controller unit receives a compression instruction, decodes the compression instruction into a plurality of operation instructions, and sends the plurality of operation instructions to the compression unit.

控制器单元从存储单元读取压缩指令之后,将压缩指令解析为操作指令,并将所述操作指令发送至压缩单元。具体的,控制器单元11中指令处理单元111的取指模块从指令缓存单元110中获取压缩指令,并将该指令传送至译码模块,所述译码模块对所述压缩指令进行译码,得到操作指令,并将所述操作指令根据预设指令规则拆分为操作码和各个不同的操作域,其中,操作码和操作域的组成与作用可参照前文,在此不再赘述。所述译码模块将译码后得到的操作指令传送至指令队列中进行顺序存储,在所述指令队列中,根据所述操作指令的操作码和操作域获取该指令对应的待处理数据的数据地址,并将所述数据地址传送至依赖关系处理单元112中,依赖关系处理单元分析该指令与正在执行的指令是否存在关联关系,若存在,则将该操作指令存储到存储队列单元113中直至所述关联关系解除;若不存在关联关系,则将该操作指令发送至压缩单元中执行对应的操作。After the controller unit reads the compression instruction from the storage unit, it parses the compression instruction into operation instructions and sends them to the compression unit. Specifically, the instruction fetch module of the instruction processing unit 111 in the controller unit 11 obtains the compression instruction from the instruction cache unit 110 and passes it to the decoding module. The decoding module decodes the compression instruction to obtain the operation instruction and splits it into an operation code and various operation domains according to the preset instruction rule; the composition and function of the operation code and operation domains are described above and are not repeated here. The decoding module passes the decoded operation instructions to the instruction queue for sequential storage. In the instruction queue, the data address of the to-be-processed data corresponding to each instruction is obtained according to its operation code and operation domains, and the data address is passed to the dependency processing unit 112. The dependency processing unit analyzes whether the instruction has an association relationship with an instruction that is currently executing: if so, the operation instruction is stored in the storage queue unit 113 until the association relationship is released; if not, the operation instruction is sent to the compression unit to perform the corresponding operation.

步骤S2、压缩单元接收控制器单元发送的操作指令,并对从存储单元中读取的第一权值矩阵进行压缩处理,以得到满足预设精度的第二权值矩阵。Step S2: The compression unit receives the operation instruction sent by the controller unit, and compresses the first weight matrix read from the storage unit to obtain a second weight matrix that meets a preset accuracy.

下面结合图4所示的本发明实施例提供的神经网络压缩方法的流程示意图,具体阐述本发明实施例是如何实现针对第一权值矩阵的压缩,以得到第二权值矩阵的,可以包括但不限于如下步骤:In conjunction with the flowchart of the neural network compression method provided by the embodiment of the present invention shown in FIG. 4 , the following specifically describes how the embodiment of the present invention implements compression of the first weight matrix to obtain the second weight matrix, which may include but is not limited to the following steps:

步骤S21、将所述第一权值矩阵分解成第三权值矩阵;其中,所述第三权值矩阵包括至少两个子矩阵。Step S21: decompose the first weight matrix into a third weight matrix; wherein the third weight matrix includes at least two sub-matrices.

具体实现中,第一权值矩阵中的权值数据可以为任意实数。这里,权值数据是指神经网络层与层之间的连接值,也即神经元之间的信息传递强度。In a specific implementation, the weight data in the first weight matrix can be any real number. Here, the weight data refers to the connection value between the neural network layers, that is, the information transmission strength between neurons.

在其中一个实施方式中,第三权值矩阵中包括两个子矩阵,这两个子矩阵中的每个子矩阵均包括压缩参数K。这里,压缩参数K为未知数,也即,在对第一权值矩阵进行分解时,可以确定第一权值矩阵可以分解为两个子矩阵,但是不确定这两个子矩阵的每个子矩阵的大小规模。In one embodiment, the third weight matrix includes two sub-matrices, and each of the two sub-matrices includes a compression parameter K. Here, the compression parameter K is an unknown number, that is, when the first weight matrix is decomposed, it can be determined that the first weight matrix can be decomposed into two sub-matrices, but the size of each of the two sub-matrices is not determined.

在其中另一个实施方式中,第三权值矩阵中的子矩阵的数量为n个,这里,n为大于2的正整数。这n个子矩阵中包括的压缩参数K的数量为(n-1)个。以将第一权值矩阵分为三个子矩阵为例,待求解的压缩参数K可以包括K1以及K2。In another embodiment, the number of sub-matrices in the third weight matrix is n, where n is a positive integer greater than 2. The number of compression parameters K included in the n sub-matrices is (n-1). Taking the example of dividing the first weight matrix into three sub-matrices, the compression parameters K to be solved may include K1 and K2.

步骤S22、根据第一公式确定所述至少两个子矩阵中的每个子矩阵的大小,所述第一公式为Q≈Q1*Q2*......*Qn;其中,所述Q表示第一权值矩阵;所述Q1表示所述至少两个子矩阵中的第一子矩阵;所述Q2表示所述至少两个子矩阵中的第二子矩阵;所述Qn表示所述至少两个子矩阵中的第n子矩阵。Step S22: determine the size of each of the at least two sub-matrices according to a first formula, the first formula being Q≈Q1*Q2*......*Qn; wherein Q represents the first weight matrix, Q1 represents the first sub-matrix of the at least two sub-matrices, Q2 represents the second sub-matrix, and Qn represents the nth sub-matrix.

具体实现中,第一公式中的运算符号“*”表示矩阵的乘法运算。In a specific implementation, the operation symbol “*” in the first formula represents a matrix multiplication operation.

在其中一个实施方式中,当第三权值矩阵中包括两个子矩阵时,第一公式可以表示为:In one implementation manner, when the third weight matrix includes two sub-matrices, the first formula can be expressed as:

Q≈Q1*Q2 (1.1)

在其中另一个实施方式中,当第三权值矩阵中包括至少两个子矩阵时,第一公式可以表示为:In another embodiment, when the third weight matrix includes at least two sub-matrices, the first formula can be expressed as:

Q≈Q1*Q2*......*Qn (1.2)

上述公式(1.2)中,n为大于2的正整数。In the above formula (1.2), n is a positive integer greater than 2.

具体实现中,根据所述第一公式和第二公式确定所述至少两个子矩阵中的每个子矩阵的大小,所述第二公式为||Q-Q1*Q2*......*Qn||≤T,其中,所述T表示预设的误差阈值。In a specific implementation, the size of each of the at least two sub-matrices is determined according to the first formula and a second formula, the second formula being ||Q-Q1*Q2*......*Qn||≤T, where T represents a preset error threshold.

具体实现中,这里所涉及的预设的误差阈值可以为5%-10%之间。可以理解的是,设置的预设的误差阈值越小,根据第一公式以及第二公式确定的至少两个子矩阵可以更好的表示第一权值矩阵的属性特征。In a specific implementation, the preset error threshold involved here may be between 5% and 10%. It is understandable that the smaller the preset error threshold is, the better the at least two sub-matrices determined according to the first formula and the second formula can represent the attribute characteristics of the first weight matrix.
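As a concrete illustration of the first and second formulas, the sketch below factors a weight matrix Q into two sub-matrices whose product stays within the error threshold T. Truncated SVD is used here as one possible realization; the patent text does not prescribe any particular factorization algorithm, and the function name `factorize` is assumed for illustration.

```python
import numpy as np

def factorize(Q, T):
    """Split Q into Q1 (m x k) and Q2 (k x n) such that the approximation
    error ||Q - Q1 @ Q2|| does not exceed the preset threshold T.

    Truncated SVD is one concrete way to satisfy the second formula;
    the smallest rank k that meets the threshold is chosen.
    """
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    for k in range(1, len(s) + 1):
        Q1 = U[:, :k] * s[:k]   # m x k, columns scaled by singular values
        Q2 = Vt[:k, :]          # k x n
        if np.linalg.norm(Q - Q1 @ Q2) <= T:
            return Q1, Q2
    return U * s, Vt            # exact factorization as a fallback

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 3))        # toy first weight matrix
T = 0.05 * np.linalg.norm(Q)           # e.g. a 5% relative error threshold
Q1, Q2 = factorize(Q, T)
```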

步骤S23、调整所述至少两个子矩阵中的每个子矩阵的大小,并通过训练压缩后的机器学习模型,以得到满足预设精度的第二权值矩阵。Step S23: adjust the size of each submatrix of the at least two submatrices, and obtain a second weight matrix that meets a preset accuracy by training the compressed machine learning model.

具体实现中,调整至少两个子矩阵的每个子矩阵的大小的过程,其实质是压缩参数K值的动态变化过程,以寻找最佳的压缩参数K。随着压缩参数K值发生变化,针对第一权值矩阵与第二权值矩阵之间的压缩比也会发生变化。In a specific implementation, the process of adjusting the size of each of the at least two sub-matrices is actually a dynamic change process of the compression parameter K value to find the optimal compression parameter K. As the compression parameter K value changes, the compression ratio between the first weight matrix and the second weight matrix will also change.

以语音识别的应用场景为例,在某一段单词序列中,可能存在一些单词被错误地插入、删除或替换的情况。例如,对于包含N个单词的一段初始识别文字而言,如果有I个单词被插入、D个单词被删除以及E个单词被替换,那么,词错误率(Word Error Rate,WER)为:Taking the application scenario of speech recognition as an example, in a given word sequence some words may be erroneously inserted, deleted, or substituted. For example, for a segment of initially recognized text containing N words, if I words are inserted, D words are deleted, and E words are substituted, then the word error rate (Word Error Rate, WER) is:

WER=(I+D+E)/N (1.3)WER=(I+D+E)/N (1.3)

其中,错误率WER通常用百分比表示。Among them, the error rate WER is usually expressed as a percentage.
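Formula (1.3) translates directly into code; the helper name below is assumed for illustration only.

```python
def word_error_rate(i, d, e, n):
    """WER = (I + D + E) / N, per formula (1.3): I insertions,
    D deletions, and E substitutions over N reference words."""
    return (i + d + e) / n

# 2 insertions, 1 deletion, 3 substitutions over 20 words -> 30% WER.
wer = word_error_rate(2, 1, 3, 20)
```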

在采用神经网络模型识别该段单词序列时,可以得到该段单词序列的词错误率的检测精度。在本发明实施例中,这里所涉及的预设精度为压缩前的神经网络模型针对词错误率WER的检测精度。例如,该预设精度为70%。在一般情况下,压缩后的神经网络的错误率WER会变大,这意味着压缩后的神经网络的精度会变差。When the neural network model is used to recognize the word sequence, the detection accuracy of the word error rate of the word sequence can be obtained. In the embodiment of the present invention, the preset accuracy involved here is the detection accuracy of the neural network model for the word error rate WER before compression. For example, the preset accuracy is 70%. In general, the error rate WER of the compressed neural network will increase, which means that the accuracy of the compressed neural network will deteriorate.

在本发明实施例中,通过测量不同压缩比(压缩参数K值不同)所对应的神经网络模型的词错误率的检测精度,以得到满足预设精度的第二权值矩阵。In the embodiment of the present invention, the detection accuracy of the word error rate of the neural network model corresponding to different compression ratios (different values of the compression parameter K) is measured to obtain a second weight matrix that meets the preset accuracy.

在一种优选的实施方式中,所述训练单元,用于调整所述至少两个子矩阵中的每个子矩阵的大小,并通过训练压缩后的机器学习模型,以得到满足预设精度的第二权值矩阵,包括:In a preferred embodiment, the training unit is used to adjust the size of each submatrix of the at least two submatrices, and obtain a second weight matrix that meets a preset accuracy by training the compressed machine learning model, including:

所述训练单元,具体用于调整所述至少两个子矩阵中的每个子矩阵的大小,并通过训练压缩后的机器学习模型,以得到满足预设精度并且与所述第一权值矩阵之间的压缩比满足预设压缩比的第二权值矩阵。The training unit is specifically used to adjust the size of each submatrix of the at least two submatrices, and to obtain a second weight matrix that meets a preset accuracy and has a compression ratio that meets a preset compression ratio with the first weight matrix by training the compressed machine learning model.

可以理解的是,在该实施方式中,当前状态下的压缩参数K值不仅可以使得神经网络模型获得最优的压缩效果,还可以使得该压缩后的神经网络模型在检测词错误率WER时满足预设精度。在神经网络模型处于最优的压缩效果时,可以进一步地减少神经网络的运算量。It can be understood that, in this embodiment, the compression parameter K value in the current state can not only enable the neural network model to obtain the best compression effect, but also enable the compressed neural network model to meet the preset accuracy when detecting the word error rate WER. When the neural network model is in the best compression effect, the amount of computation of the neural network can be further reduced.

以神经网络的全连接层为例,全连接层是指对n-1层和n层而言,n-1层的任意一个节点,都和n层的所有节点有连接。具体地,参见图5A,是本发明实施例提供的一种神经网络的一维全连接层的结构示意图,如图5A所示,该神经网络包括输入层、隐含层以及输出层,其中,输入层到隐含层之间的这一全连接层的二维参数矩阵为(3,4),该二维参数矩阵(3,4)表示在输入层到隐含层之间的全连接层结构中,输入神经元的个数为3,输出神经元的个数为4,权值数量为12。具体实现中,这12个权值可以表示为4行3列的权值矩阵,其权值矩阵的表现形式可以如图5B所示。Taking the fully connected layer of a neural network as an example, the fully connected layer means that for the n-1 layer and the n layer, any node in the n-1 layer is connected to all the nodes in the n layer. Specifically, referring to FIG5A, it is a schematic diagram of the structure of a one-dimensional fully connected layer of a neural network provided in an embodiment of the present invention. As shown in FIG5A, the neural network includes an input layer, a hidden layer, and an output layer, wherein the two-dimensional parameter matrix of the fully connected layer between the input layer and the hidden layer is (3,4), and the two-dimensional parameter matrix (3,4) indicates that in the fully connected layer structure between the input layer and the hidden layer, the number of input neurons is 3, the number of output neurons is 4, and the number of weights is 12. In a specific implementation, these 12 weights can be represented as a weight matrix of 4 rows and 3 columns, and the expression of the weight matrix can be shown in FIG5B.

在全连接层神经网络中,所述第一公式包括:M≈M1*M2;所述两个子矩阵包括第一子矩阵M1和第二子矩阵M2,所述M1为Nin*K矩阵,所述M2为K*Nout矩阵;其中,K为压缩参数,Nin为所述神经网络的输入神经元的个数,Nout为所述神经网络的输出神经元的个数;所述压缩参数用于表征所述M1的输出神经元的个数以及所述M2的输入神经元的个数,所述K为大于0且小于等于min(Nin,Nout)的正整数。In a fully connected layer neural network, the first formula includes: M≈M1*M2; the two sub-matrices are a first sub-matrix M1 and a second sub-matrix M2, where M1 is an Nin*K matrix and M2 is a K*Nout matrix; K is the compression parameter, Nin is the number of input neurons of the neural network, and Nout is the number of output neurons; the compression parameter characterizes the number of output neurons of M1 and the number of input neurons of M2, and K is a positive integer greater than 0 and less than or equal to min(Nin,Nout).

如前所述,调整两个子矩阵的每个子矩阵的大小的过程,其实质是压缩参数K值的动态变化过程,以寻找最佳的压缩参数K。在实际应用中,可以采用二分查找的方式来确定全连接层神经网络中的压缩参数K值,从而得到满足预设精度的第二权值矩阵。在其中一个实施方式中,利用二分查找方式确定的压缩参数K可以使得第二权值矩阵满足预设精度。在其中另一个实施方式中,利用二分查找方式确定的压缩参数K可以使得第二权值矩阵满足预设精度的同时,第一权值矩阵与第二权值矩阵的压缩比满足预设压缩比,也即,针对该神经网络模型的压缩获得较优的压缩效果。As mentioned above, the process of adjusting the size of each of the two sub-matrices is actually a dynamic change process of the compression parameter K value to find the optimal compression parameter K. In practical applications, a binary search method can be used to determine the compression parameter K value in the fully connected layer neural network, so as to obtain a second weight matrix that meets the preset accuracy. In one embodiment, the compression parameter K determined by the binary search method can make the second weight matrix meet the preset accuracy. In another embodiment, the compression parameter K determined by the binary search method can make the second weight matrix meet the preset accuracy while the compression ratio of the first weight matrix to the second weight matrix meets the preset compression ratio, that is, a better compression effect is obtained for the compression of the neural network model.

具体实现中,压缩参数K值不同,也即基于多个不同压缩比对第一权值矩阵进行压缩,这里,在全连接层神经网络中,压缩比为Nin*Nout/(K*(Nin+Nout))。In a specific implementation, different values of the compression parameter K correspond to compressing the first weight matrix at different compression ratios; here, in a fully connected layer neural network, the compression ratio is Nin*Nout/(K*(Nin+Nout)).

接下来具体阐述如何采用二分查找的方式来确定压缩参数K值。首先,设定两个参数KL和KR。初始化情况下,令KL=1,KR=min(Nin,Nout)。在调整参数过程中K=(KL+KR)/2。如果M1*M2表示的第二权值矩阵导致压缩后的神经网络模型的精度下降X%(这里,X=1~10等等),则调整参数KL,使得KL=K;如果M1*M2表示的第二权值矩阵导致压缩后的神经网络模型满足预设精度,那么调整KR,使得KR=K。重复执行上述步骤,直至满足结束条件K=KL或者K=KR。Next, we explain how binary search determines the compression parameter K. First, set two parameters KL and KR; initially KL=1 and KR=min(Nin,Nout). During adjustment, K=(KL+KR)/2. If the second weight matrix represented by M1*M2 causes the accuracy of the compressed neural network model to drop by X% (here, X=1 to 10, etc.), adjust KL so that KL=K; if it allows the compressed model to meet the preset accuracy, adjust KR so that KR=K. Repeat the above steps until the end condition K=KL or K=KR is met.
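The binary-search loop just described can be sketched as follows. `meets_accuracy(k)` is a hypothetical stand-in for the expensive step of compressing with parameter k, retraining, and checking the WER-based accuracy against the preset threshold; only the search control flow follows the text.

```python
def search_k(n_in, n_out, meets_accuracy):
    """Binary search for the compression parameter K as described above:
    KL starts at 1 and KR at min(Nin, Nout); when the compressed model
    loses accuracy, KL is raised to K, otherwise KR is lowered to K,
    until K equals KL or KR."""
    kl, kr = 1, min(n_in, n_out)
    k = (kl + kr) // 2
    while k != kl and k != kr:
        if meets_accuracy(k):
            kr = k   # accuracy preserved: try a smaller K
        else:
            kl = k   # accuracy lost: K must grow
        k = (kl + kr) // 2
    return kr        # smallest K observed to keep the preset accuracy

# Toy stand-in: suppose accuracy is preserved whenever K >= 2.
# With Nin=3, Nout=4 this yields K=2, matching the FIG. 5C example.
k = search_k(3, 4, lambda k: k >= 2)
```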

以图5A中输入层到隐含层之间的这一全连接层为例,压缩参数K值为大于0且小于等于3的正整数。通过上述二分查找的方式确定压缩参数K=2,也即,满足预设精度的第二权值矩阵中的第一子矩阵M1为(3,2)矩阵,第二子矩阵M2为(2,4)矩阵。具体地,针对图5A中输入层到隐含层之间的这一全连接层的压缩可以如图5C所示。Taking the fully connected layer between the input layer and the hidden layer in FIG5A as an example, the compression parameter K value is a positive integer greater than 0 and less than or equal to 3. The compression parameter K=2 is determined by the above binary search method, that is, the first submatrix M1 in the second weight matrix that meets the preset accuracy is a (3,2) matrix, and the second submatrix M2 is a (2,4) matrix. Specifically, the compression of the fully connected layer between the input layer and the hidden layer in FIG5A can be shown in FIG5C.

在其中一个实施方式中,当第一公式的表现形式如公式(1.2)所示时,也即第三权值矩阵中的子矩阵的数量为n个,这里,n为大于2的正整数,此时,压缩参数K的数量为(n-1)个,可以表示为K1,K2,……,Kn-1。在实际应用中,可以采用自适应算法(例如,遗传算法)来确定全连接层神经网络中的(n-1)个压缩参数K值,从而得到满足预设精度和/或满足压缩效果的第二权值矩阵。接下来具体阐述是如何采用遗传算法来确定全连接层神经网络中的(n-1)个压缩参数K值的:In one embodiment, when the first formula takes the form of formula (1.2), that is, the number of sub-matrices in the third weight matrix is n, where n is a positive integer greater than 2, the number of compression parameters K is (n-1), denoted K1, K2, ..., Kn-1. In practical applications, an adaptive algorithm (for example, a genetic algorithm) can be used to determine the (n-1) compression parameter K values in the fully connected layer neural network, so as to obtain a second weight matrix that meets the preset accuracy and/or the desired compression effect. The following describes how a genetic algorithm determines the (n-1) compression parameter K values:

步骤1:随机产生种群:设定种群的规模为P个,设置最大迭代次数Tmax,例如,Tmax=100。在初始状态下,设置迭代次数计数器t=0;交叉概率Pc=A(例如,A=0.4),变异概率Pm=B(例如,B=0.6),种群的矩阵每一行表示一个基因串个体,每一列表示个体的数目;这里,每一个个体是一组关于压缩参数K(例如,Kj)值的解;Step 1: Randomly generate a population: set the population size to P and the maximum number of iterations Tmax, for example, Tmax=100. In the initial state, set the iteration counter t=0, the crossover probability Pc=A (for example, A=0.4), and the mutation probability Pm=B (for example, B=0.6). Each row of the population matrix represents one gene-string individual, and each column represents the number of individuals; here, each individual is one set of candidate values for the compression parameters K (for example, Kj);

步骤2:计算种群中每个个体的适应度;这里,适应度是指该个体对应的第一权值矩阵与第二权值矩阵的压缩比和/或精度,其中,压缩比用于表征针对神经网络的压缩效果。Step 2: Calculate the fitness of each individual in the population; here, fitness refers to the compression ratio and/or accuracy of the first weight matrix and the second weight matrix corresponding to the individual, where the compression ratio is used to characterize the compression effect on the neural network.

步骤3:将选择算子作用于种群,把优化的个体直接遗传到下一代;Step 3: Apply the selection operator to the population and directly pass the optimized individuals to the next generation;

步骤4:将交叉算子作用于种群,对于任意两个个体,随机产生若干基因串的位置点,交换两个个体在这些位置上的值;Step 4: Apply the crossover operator to the population: for any two individuals, randomly generate several positions in the gene string and exchange the values of the two individuals at those positions;

步骤5:将变异算子作用于种群,对于任意个体,随机产生若干基因串的位置,然后改动这些位置上的值;这里,变异是指随机改变Kj的值;Step 5: Apply the mutation operator to the population: for any individual, randomly generate several positions in the gene string and change the values at those positions; here, mutation means randomly changing the value of Kj;

步骤6:保留每一代中适应度最高的个体,进入下一代;Step 6: Keep the individuals with the highest fitness in each generation and enter the next generation;

步骤7:判断是否达到最大迭代次数Tmax,若t=Tmax,则输出具有最大适应度的个体,终止计算;否则,跳到步骤2继续执行。Step 7: Determine whether the maximum number of iterations Tmax has been reached. If t=Tmax, output the individual with the maximum fitness and terminate; otherwise, return to step 2.

从而可以根据上述遗传算法来确定全连接层神经网络中的(n-1)个压缩参数K值。Therefore, the (n-1) compression parameter K values in the fully connected layer neural network can be determined according to the above genetic algorithm.
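Steps 1 to 7 above can be condensed into the following sketch. The `fitness` function is a hypothetical stand-in for evaluating the compression ratio and/or accuracy of the network compressed with a candidate (K1, ..., Kn-1); the selection, crossover, mutation, and elitism operators follow the text only loosely, and all names are assumptions for illustration.

```python
import random

def genetic_search(bounds, fitness, pop_size=8, t_max=50, pc=0.4, pm=0.6, seed=0):
    """Search for (n-1) compression parameters with a simple genetic
    algorithm: each individual is one candidate K vector, and bounds[i]
    is the largest admissible value of the i-th parameter."""
    rng = random.Random(seed)
    n = len(bounds)
    # Step 1: random initial population of candidate K vectors.
    pop = [[rng.randint(1, b) for b in bounds] for _ in range(pop_size)]
    best = list(max(pop, key=fitness))          # elite individual so far
    for _ in range(t_max):                      # Step 7: iteration cap
        # Steps 2-3: rank by fitness and keep the fitter half (selection).
        pop.sort(key=fitness, reverse=True)
        half = pop[:pop_size // 2]
        pop = [list(ind) for ind in half] + [list(ind) for ind in half]
        # Step 4: crossover -- swap one random gene between paired individuals.
        for a, b in zip(pop[0::2], pop[1::2]):
            if rng.random() < pc:
                i = rng.randrange(n)
                a[i], b[i] = b[i], a[i]
        # Step 5: mutation -- reassign one random gene within its bounds.
        for ind in pop:
            if rng.random() < pm:
                i = rng.randrange(n)
                ind[i] = rng.randint(1, bounds[i])
        # Step 6: keep the fittest individual seen so far (elitism).
        cand = max(pop, key=fitness)
        if fitness(cand) > fitness(best):
            best = list(cand)
    return best

# Toy fitness: reward high compression (small K) but require every K >= 2.
fit = lambda ks: -sum(ks) if all(k >= 2 for k in ks) else -100
best = genetic_search([4, 4], fit)
```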

以神经网络的卷积层为例,如图5D所示,卷积层可以认为是一个四维矩阵(Nfin,Nfout,Kx,Ky),其中,Nfin为输入特征图像的数量,Nfout为输出特征图像的数量,(Kx,Ky)为卷积层中卷积核的大小。Taking the convolution layer of a neural network as an example, as shown in FIG. 5D, the convolution layer can be regarded as a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where Nfin is the number of input feature images, Nfout is the number of output feature images, and (Kx, Ky) is the size of the convolution kernels in the layer.

在卷积层神经网络中,所述卷积层神经网络包括Nfin*Nfout个卷积核;所述第一公式包括:F≈F1*F2;其中,F表示所述Nfin*Nfout个卷积核中的任意一个卷积核;所述F1为第一子卷积核;所述F2为第二子卷积核;所述第一子卷积核F1为(Kx,R),所述第二子卷积核F2为(R,Ky),(Kx,Ky)表示卷积核的大小,R为压缩参数,所述R为大于0且小于等于min(Kx,Ky)的正整数。In a convolutional layer neural network, the convolutional layer includes Nfin*Nfout convolution kernels; the first formula includes: F≈F1*F2; where F represents any one of the Nfin*Nfout convolution kernels, F1 is the first sub-convolution kernel of size (Kx,R), F2 is the second sub-convolution kernel of size (R,Ky), (Kx,Ky) is the size of the convolution kernel, R is the compression parameter, and R is a positive integer greater than 0 and less than or equal to min(Kx,Ky).

如前所述,调整两个子矩阵的每个子矩阵的大小的过程,其实质是压缩参数R值的动态变化过程,以寻找最佳的压缩参数R。在实际应用中,可以采用二分查找的方式来确定卷积层神经网络中的压缩参数R值,从而得到满足预设精度的第二权值矩阵。As mentioned above, the process of adjusting the size of each of the two sub-matrices is actually a dynamic change process of the compression parameter R value to find the best compression parameter R. In practical applications, a binary search method can be used to determine the compression parameter R value in the convolutional layer neural network, so as to obtain a second weight matrix that meets the preset accuracy.

具体实现中,压缩参数R值不同,也即基于多个不同压缩比对第一权值矩阵进行压缩,这里,在卷积层神经网络中,压缩比为Kx*Ky/(R*(Kx+Ky))。In a specific implementation, different values of the compression parameter R correspond to compressing the first weight matrix at different compression ratios; here, in a convolutional layer neural network, the compression ratio is Kx*Ky/(R*(Kx+Ky)).

在本发明实施例中,采用二分查找的方式确定压缩参数R值的实现过程参考前述文字描述,此处不多加赘述。In the embodiment of the present invention, the implementation process of determining the value of the compression parameter R by using a binary search method is described in the foregoing text and will not be elaborated herein.

例如,在图5D所示的卷积层神经网络结构中,该卷积层中包括4个卷积核,卷积核大小为3*3,其中,第1个卷积核中,Nfin=4,Nfout=6,压缩参数R值为大于0且小于等于4的正整数。通过上述二分查找的方式确定压缩参数R=4,也即,满足预设精度的第1个卷积核中的第一子卷积核F1为(3,4)矩阵,第二子卷积核F2为(4,3)矩阵。在其中一个实施方式中,针对图5D所示的其它卷积核,可以采用与第1个卷积核相同的压缩方法,也可以采用与第1个卷积核不同的压缩方法,本发明实施例不作具体限定。For example, in the convolutional layer structure shown in FIG. 5D, the layer includes 4 convolution kernels of size 3*3, where, for the first kernel, Nfin=4, Nfout=6, and the compression parameter R is a positive integer greater than 0 and less than or equal to 4. The compression parameter R=4 is determined by the above binary search, that is, the first sub-convolution kernel F1 of the first kernel that meets the preset accuracy is a (3,4) matrix and the second sub-convolution kernel F2 is a (4,3) matrix. In one embodiment, the other kernels shown in FIG. 5D may be compressed with the same method as the first kernel or with a different method, which is not specifically limited in the embodiments of the present invention.
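As a quick parameter-count check for the convolutional case (an illustrative helper, not from the patent text): storing a kernel directly costs Kx*Ky weights, while the factored form F1 (Kx x R) times F2 (R x Ky) costs R*(Kx+Ky), so the factorization only saves weights when R < Kx*Ky/(Kx+Ky).

```python
def conv_kernel_params(kx, ky, r=None):
    """Weights held by one convolution kernel: Kx*Ky when stored
    directly, R*(Kx + Ky) when factored as F1 (Kx x R) @ F2 (R x Ky)."""
    return kx * ky if r is None else r * (kx + ky)

# A 3x3 kernel holds 9 weights; factored with R=1 it needs only 6,
# while R=2 already needs 12 and therefore does not compress.
direct = conv_kernel_params(3, 3)
factored = conv_kernel_params(3, 3, r=1)
```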

在其中一个实施方式中,当第一公式的表现形式如公式(1.2)所示时,可以采用自适应算法(例如,遗传算法)确定卷积层神经网络中的压缩参数R值,其具体实现过程请参考前述描述,此处不多加赘述。In one embodiment, when the first formula is expressed as shown in formula (1.2), an adaptive algorithm (for example, a genetic algorithm) can be used to determine the value of the compression parameter R in the convolutional layer neural network. Please refer to the above description for the specific implementation process, which will not be elaborated here.

以神经网络的长短时记忆LSTM层(LSTM,Long Short-term Memory)为例,LSTM层的权值由多个全连接层权值组成。假设LSTM层的权值由t个全连接层权值组成,t为大于0的正整数。例如,第j个全连接层权值为(Nin_j,Nout_j),其中,Nin_j表示第j个全连接层输入神经元个数,Nout_j表示第j个全连接层输出神经元个数,第j个全连接层的权值数量为Nin_j*Nout_j。Taking the long short-term memory (LSTM) layer of the neural network as an example, the weights of the LSTM layer are composed of the weights of multiple fully connected layers. Assume the LSTM layer is composed of t fully connected layer weight sets, where t is a positive integer greater than 0. For example, the weights of the j-th fully connected layer are (Nin_j, Nout_j), where Nin_j is the number of input neurons of the j-th fully connected layer, Nout_j is the number of its output neurons, and the number of weights of the j-th fully connected layer is Nin_j*Nout_j.

在LSTM层神经网络中,所述LSTM层包括N个全连接层,所述N为大于0的正整数;针对第j个全连接层,所述第一公式包括:Mj≈Mj_1*Mj_2;所述第j个全连接层中的两个子矩阵包括第一子矩阵Mj_1和第二子矩阵Mj_2,所述Mj_1为Nin_j*S矩阵,所述Mj_2为S*Nout_j矩阵;其中,S为压缩参数,Nin_j为所述神经网络第j个全连接层的输入神经元的个数,Nout_j为所述神经网络第j个全连接层的输出神经元的个数;所述压缩参数用于表征所述Mj_1的输出神经元的个数以及所述Mj_2的输入神经元的个数,所述S为大于0且小于等于min(Nin_j,Nout_j)的正整数。In an LSTM layer neural network, the LSTM layer includes N fully connected layers, where N is a positive integer greater than 0. For the j-th fully connected layer, the first formula includes: Mj≈Mj_1*Mj_2; the two sub-matrices of the j-th fully connected layer are a first sub-matrix Mj_1 and a second sub-matrix Mj_2, where Mj_1 is an Nin_j*S matrix and Mj_2 is an S*Nout_j matrix; S is the compression parameter, Nin_j is the number of input neurons of the j-th fully connected layer, and Nout_j is the number of its output neurons; the compression parameter characterizes the number of output neurons of Mj_1 and the number of input neurons of Mj_2, and S is a positive integer greater than 0 and less than or equal to min(Nin_j, Nout_j).

如前所述,调整两个子矩阵的每个子矩阵的大小的过程,其实质是压缩参数S值的动态变化过程,以寻找最佳的压缩参数S。在实际应用中,可以采用二分查找的方式来确定LSTM层神经网络中的压缩参数S值,从而得到满足预设精度的第二权值矩阵。As mentioned above, the process of adjusting the size of each of the two sub-matrices is in essence a dynamic adjustment of the compression parameter S to find its optimal value. In practical applications, binary search can be used to determine the compression parameter S in the LSTM layer neural network, thereby obtaining a second weight matrix that meets the preset accuracy.

具体实现中,针对第j个全连接层,压缩参数S值不同,也即基于多个不同压缩比对第一权值矩阵进行压缩,这里,在第j个全连接层中,压缩比为Nin_j*Nout_j/(S*(Nin_j+Nout_j))。In a specific implementation, for the j-th fully connected layer, different values of the compression parameter S correspond to compressing the first weight matrix at different compression ratios; here, in the j-th fully connected layer, the compression ratio is Nin_j*Nout_j/(S*(Nin_j+Nout_j)).

在本发明实施例中,采用二分查找的方式确定第j个全连接层中的压缩参数S值的实现过程参考前述文字描述,此处不多加赘述。In the embodiment of the present invention, the implementation process of determining the value of the compression parameter S in the j-th fully connected layer by using a binary search method is described in the above text and will not be repeated here.

以图5A所示的神经网络架构为例，该神经网络包括输入层、隐含层以及输出层，其中，输入层到隐含层之间为第1个全连接层，隐含层到输出层之间为第2个全连接层。针对输入层到隐含层之间的这一全连接层结构（也即，第1个全连接层）的具体阐述请参考前述描述，此处不多加赘述。由图5A可知，隐含层到输出层之间的这一全连接层的二维参数矩阵为(4,2)，该二维参数矩阵(4,2)表示在隐含层到输出层之间的全连接层结构中，输入神经元的个数为4，输出神经元的个数为2，权值数量为8。具体实现中，这8个权值可以表示为2行4列权值矩阵，其权值矩阵的表现形式可以如图5E所示。那么，在这种情况下，压缩参数S值为大于0且小于等于2的正整数。通过二分查找的方式确定压缩参数S=2，也即，在第2个全连接层中，满足预设精度的第二权值矩阵中的第一子矩阵M2_1为(4,2)矩阵，第二子矩阵M2_2为(2,2)矩阵。具体地，针对图5A中输入层到隐含层之间的这一全连接层以及隐含层到输出层之间的这一全连接层的压缩可以如图5F所示。Taking the neural network architecture shown in FIG5A as an example, the neural network includes an input layer, a hidden layer, and an output layer, wherein the first fully connected layer is between the input layer and the hidden layer, and the second fully connected layer is between the hidden layer and the output layer. For the specific elaboration of this fully connected layer structure between the input layer and the hidden layer (that is, the first fully connected layer), please refer to the above description, and no further elaboration is given here. As can be seen from FIG5A, the two-dimensional parameter matrix of this fully connected layer between the hidden layer and the output layer is (4,2), and the two-dimensional parameter matrix (4,2) indicates that in the fully connected layer structure between the hidden layer and the output layer, the number of input neurons is 4, the number of output neurons is 2, and the number of weights is 8. In a specific implementation, these 8 weights can be represented as a 2-row 4-column weight matrix, and the expression of the weight matrix can be shown in FIG5E. Then, in this case, the value of the compression parameter S is a positive integer greater than 0 and less than or equal to 2. The compression parameter S=2 is determined by binary search, that is, in the second fully connected layer, the first submatrix M 2 _ 1 in the second weight matrix that meets the preset accuracy is a (4,2) matrix, and the second submatrix M 2 _ 2 is a (2,2) matrix.
Specifically, the compression of the fully connected layer between the input layer and the hidden layer in FIG5A and the fully connected layer between the hidden layer and the output layer can be shown in FIG5F.

在其中一个实施方式中,当第一公式的表现形式如公式(1.2)所示时,可以采用自适应算法(例如,遗传算法)确定LSTM层神经网络中的压缩参数S值,其具体实现过程请参考前述描述,此处不多加赘述。In one embodiment, when the first formula is expressed as shown in formula (1.2), an adaptive algorithm (for example, a genetic algorithm) can be used to determine the value of the compression parameter S in the LSTM layer neural network. For the specific implementation process, please refer to the above description and no further details will be given here.

通过本发明实施例，控制器单元在获取到压缩指令后，将其进行解析可以得到多个操作指令，之后，将这多个操作指令以及第一权值矩阵发送给压缩单元，继而压缩单元通过对第一权值矩阵进行分解，可以得到第二权值矩阵。具体实现中，第二权值矩阵包括至少两个子矩阵，继而通过调整这至少两个子矩阵中的每个子矩阵的大小，以及结合训练压缩后的机器学习模型，以得到满足预设精度的第二权值矩阵，解决了现有技术中采用神经网络剪枝算法容易带来的神经网络的拓扑结构出现不规则的情形。此外，对神经网络进行压缩，可以减少神经网络的运算量，进而提高运算速度。Through the embodiment of the present invention, after obtaining the compression instruction, the controller unit can parse it to obtain multiple operation instructions, and then send these multiple operation instructions and the first weight matrix to the compression unit, and then the compression unit can obtain the second weight matrix by decomposing the first weight matrix. In a specific implementation, the second weight matrix includes at least two sub-matrices, and then by adjusting the size of each of the at least two sub-matrices, and combining the machine learning model after training compression, a second weight matrix that meets the preset accuracy is obtained, which solves the problem of irregular topological structure of the neural network that is easily caused by the use of neural network pruning algorithms in the prior art. In addition, compressing the neural network can reduce the amount of calculation of the neural network, thereby improving the calculation speed.

S3、控制器单元获取第二输入数据以及计算指令,其中,第二输入数据包括第二权值矩阵及输入神经元数据。S3. The controller unit obtains second input data and calculation instructions, wherein the second input data includes a second weight matrix and input neuron data.

S4、控制器单元将计算指令解析为运算指令,将运算指令以及第二输入数据发送给运算单元。S4. The controller unit parses the calculation instruction into an operation instruction, and sends the operation instruction and the second input data to the operation unit.

具体实现中,针对控制器单元获取计算指令,并将计算指令进行解析,以得到多个运算指令的实现方式,请参考前述控制器单元获取压缩指令的文字描述,此处不多加赘述。In a specific implementation, the controller unit obtains calculation instructions and parses the calculation instructions to obtain implementation methods of multiple operation instructions. Please refer to the text description of the controller unit obtaining compression instructions mentioned above, and no further details will be given here.

S5、运算单元接收控制器单元发送的运算指令,并根据运算指令以及第二输入数据执行神经网络计算。S5. The operation unit receives the operation instruction sent by the controller unit, and performs neural network calculation according to the operation instruction and the second input data.

在实际应用中,这里所涉及的神经网络计算可以包括人工神经网络运算,也可以包括卷积神经网络运算等等。In practical applications, the neural network calculations involved here may include artificial neural network operations, convolutional neural network operations, and so on.

以人工神经网络运算为例,对于人工神经网络运算来说,如果该人工神经网络运算具有多层运算,多层运算的输入神经元和输出神经元并非是指整个神经网络的输入层中神经元和输出层中神经元,而是对于网络中任意相邻的两层,处于网络正向运算下层中的神经元即为输入神经元,处于网络正向运算上层中的神经元即为输出神经元。以卷积神经网络为例,设一个卷积神经网络有L层,K=1,2,...,L-1,对于第K层和第K+1层来说,我们将第K层称为输入层,其中的神经元为所述输入神经元,第K+1层称为输出层,其中的神经元为所述输出神经元。即除最顶层外,每一层都可以作为输入层,其下一层为对应的输出层。Taking artificial neural network operation as an example, for artificial neural network operation, if the artificial neural network operation has multi-layer operation, the input neurons and output neurons of the multi-layer operation do not refer to the neurons in the input layer and the neurons in the output layer of the entire neural network, but for any two adjacent layers in the network, the neurons in the lower layer of the network forward operation are the input neurons, and the neurons in the upper layer of the network forward operation are the output neurons. Taking convolutional neural network as an example, suppose a convolutional neural network has L layers, K=1,2,...,L-1, for the Kth layer and the K+1th layer, we call the Kth layer the input layer, the neurons therein are the input neurons, and the K+1th layer is called the output layer, the neurons therein are the output neurons. That is, except for the top layer, each layer can be used as an input layer, and the next layer is the corresponding output layer.

具体实现中,对于神经网络中的运算可以为神经网络中的一层的运算,对于多层神经网络,其实现过程是,在正向运算中,当上一层人工神经网络执行完成之后,下一层的运算指令会将运算单元中计算出的输出神经元作为下一层的输入神经元进行运算(或者是对该输出神经元进行某些操作再作为下一层的输入神经元),同时,将权值也替换为下一层的权值;在反向运算中,当上一层人工神经网络的反向运算执行完成后,下一层运算指令会将运算单元中计算出的输入神经元梯度作为下一层的输出神经元梯度进行运算(或者是对该输入神经元梯度进行某些操作再作为下一层的输出神经元梯度),同时将权值替换为下一层的权值。In a specific implementation, the operation in the neural network can be the operation of a layer in the neural network. For a multi-layer neural network, the implementation process is that in the forward operation, after the execution of the previous layer of artificial neural network is completed, the operation instruction of the next layer will use the output neuron calculated in the operation unit as the input neuron of the next layer for operation (or perform certain operations on the output neuron and then use it as the input neuron of the next layer), and at the same time, the weights are also replaced by the weights of the next layer; in the reverse operation, when the reverse operation of the previous layer of artificial neural network is completed, the operation instruction of the next layer will use the input neuron gradient calculated in the operation unit as the output neuron gradient of the next layer for operation (or perform certain operations on the input neuron gradient and then use it as the output neuron gradient of the next layer), and at the same time, the weights are replaced by the weights of the next layer.
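上述逐层链式的正向运算可用如下Python代码示意（假设性示例：每层输出神经元作为下一层输入神经元，权值逐层替换，这里以ReLU作为示例激活函数）。The layer-by-layer chaining of the forward operation described above can be sketched in Python as follows (a hypothetical example: each layer's output neurons become the next layer's input neurons and the weights are replaced layer by layer; ReLU is chosen here as an example activation).

```python
import numpy as np

def forward(x, layers):
    """Forward pass over adjacent layers: the output neurons computed for
    one layer are fed as the input neurons of the next layer, and the
    weights are replaced with the next layer's weights at every step."""
    a = x
    for w, b in layers:                  # layers: [(weight, bias), ...]
        a = np.maximum(w @ a + b, 0.0)   # ReLU as the example activation
    return a
```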

以完成神经网络的正向运算过程为例,首先,运算单元从存储单元中读取第二输入数据,其中,第二输入数据包括第二权值矩阵以及输入神经元数据。Taking the forward operation process of the neural network as an example, first, the operation unit reads the second input data from the storage unit, wherein the second input data includes a second weight matrix and input neuron data.

其次,主处理电路读取相对应的神经元数据,并将所述神经元数据按照指定顺序依次广播给各个从处理电路。在实际应用中,神经元数据可以只广播一次,从处理电路接收该数据后暂存到缓存或寄存器中,便于对其进行复用。此外,神经元数据也可以进行多次广播,从处理电路接收到数据之后直接使用,无需复用。在一种可能的实施方式中,主处理电路读取所述神经元数据之后,直接将神经元数据进行广播。Secondly, the main processing circuit reads the corresponding neuron data and broadcasts the neuron data to each slave processing circuit in the specified order. In practical applications, the neuron data can be broadcast only once, and the slave processing circuit receives the data and temporarily stores it in a cache or register to facilitate reuse. In addition, the neuron data can also be broadcast multiple times, and the slave processing circuit uses the data directly after receiving it without multiplexing. In one possible implementation, the main processing circuit directly broadcasts the neuron data after reading it.

之后,每个从处理电路将读入的神经元数据和第二权值矩阵根据运算指令进行内积运算,而后将内积结果传递回主处理电路。Afterwards, each slave processing circuit performs an inner product operation on the read neuron data and the second weight matrix according to the operation instruction, and then transmits the inner product result back to the master processing circuit.

在其中一个实施方式中，从处理电路可以将每次执行内积运算得到的部分和传输回主处理电路进行累加；在其中一个实施方式中，也可以将每次从处理电路执行的内积运算得到的部分和保存在从处理电路的寄存器和/或片上缓存中，累加结束之后传输回主处理电路；在其中一个实施方式中，也可以将每次从处理电路执行的内积运算得到的部分和在部分情况下保存在从处理电路的寄存器和/或片上缓存中进行累加，部分情况下传输到主处理电路进行累加，累加结束之后传输回主处理电路。In one of the embodiments, the slave processing circuit may transmit the partial sum obtained from each inner product operation back to the main processing circuit for accumulation; in one of the embodiments, the partial sum obtained from each inner product operation performed by the slave processing circuit may be stored in the register and/or on-chip cache of the slave processing circuit, and transmitted back to the main processing circuit after the accumulation is completed; in one of the embodiments, the partial sum obtained from each inner product operation performed by the slave processing circuit may be stored in the register and/or on-chip cache of the slave processing circuit for accumulation in some cases, and transmitted to the main processing circuit for accumulation in some cases, and transmitted back to the main processing circuit after the accumulation is completed.
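前两种部分和处理策略可用如下Python代码对比示意（假设性示例：用普通Python对象代替寄存器/片上缓存，两种策略的最终累加结果一致，差别仅在回传次数）。The first two partial-sum strategies can be contrasted with the following Python sketch (a hypothetical example: plain Python objects stand in for registers/on-chip caches; both strategies yield the same accumulated result and differ only in the number of transfers back).

```python
import numpy as np

def slave_partials(neurons, weight_rows, accumulate_locally):
    """One slave's inner products. If accumulate_locally, the partial
    sums are added in the slave's local storage and transferred back
    once; otherwise each partial sum is transferred for the master."""
    if accumulate_locally:
        acc = 0.0
        for row in weight_rows:                 # accumulate in the slave
            acc += float(row @ neurons)
        return [acc]                            # single transfer back
    return [float(row @ neurons) for row in weight_rows]

def master_accumulate(partials):
    """The master processing circuit adds whatever it receives."""
    return sum(partials)
```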

最后，主处理电路将各从处理电路的结果进行累加、激活等操作，直到完成神经网络的正向运算过程；得到预测结果和实际结果间的误差值（即最后一层的神经元梯度数据）后，将其保存到存储单元。Finally, the main processing circuit accumulates and activates the results of each slave processing circuit until the forward operation process of the neural network is completed; the error value between the predicted result and the actual result, that is, the neuron gradient data of the last layer, is then obtained and saved in the storage unit.

在本发明实施例中,运算单元12可以设置成一主多从结构。在一种可选实施例中,运算单元12如图6所示,可以包括一个主处理电路101和多个从处理电路102。在一个实施例里,如图6所示,多个从处理电路呈阵列分布;每个从处理电路与相邻的其他从处理电路连接,主处理电路连接所述多个从处理电路中的k个从处理电路,所述k个从处理电路为:第1行的n个从处理电路、第m行的n个从处理电路以及第1列的m个从处理电路,需要说明的是,如图6所示的K个从处理电路仅包括第1行的n个从处理电路、第m行的n个从处理电路以及第1列的m个从处理电路,即该k个从处理电路为多个从处理电路中直接与主处理电路连接的从处理电路。In an embodiment of the present invention, the operation unit 12 can be set to a one-master-multiple-slave structure. In an optional embodiment, the operation unit 12, as shown in FIG6 , may include a master processing circuit 101 and multiple slave processing circuits 102. In one embodiment, as shown in FIG6 , multiple slave processing circuits are distributed in an array; each slave processing circuit is connected to other adjacent slave processing circuits, and the master processing circuit is connected to k slave processing circuits among the multiple slave processing circuits, and the k slave processing circuits are: n slave processing circuits in the first row, n slave processing circuits in the mth row, and m slave processing circuits in the first column. It should be noted that the K slave processing circuits shown in FIG6 only include n slave processing circuits in the first row, n slave processing circuits in the mth row, and m slave processing circuits in the first column, that is, the k slave processing circuits are slave processing circuits directly connected to the master processing circuit among the multiple slave processing circuits.

K个从处理电路,用于在所述主处理电路以及多个从处理电路之间的数据以及指令的转发。K slave processing circuits are used to forward data and instructions between the master processing circuit and multiple slave processing circuits.

可选的,如图7所示,该主处理电路还可以包括:转换处理电路120、激活处理电路121、加法处理电路122中的一种或任意组合;Optionally, as shown in FIG7 , the main processing circuit may further include: one or any combination of a conversion processing circuit 120 , an activation processing circuit 121 , and an addition processing circuit 122 ;

转换处理电路120,用于将主处理电路接收的数据块或中间结果执行第一数据结构与第二数据结构之间的互换(例如连续数据与离散数据的转换);或将主处理电路接收的数据块或中间结果执行第一数据类型与第二数据类型之间的互换(例如定点类型与浮点类型的转换);The conversion processing circuit 120 is used to perform an exchange between a first data structure and a second data structure (e.g., conversion between continuous data and discrete data) on the data block or intermediate result received by the main processing circuit; or perform an exchange between a first data type and a second data type (e.g., conversion between a fixed-point type and a floating-point type) on the data block or intermediate result received by the main processing circuit;

激活处理电路121,用于执行主处理电路内数据的激活运算;An activation processing circuit 121, used to perform activation operations on data in the main processing circuit;

加法处理电路122,用于执行加法运算或累加运算。The addition processing circuit 122 is used to perform addition operations or accumulation operations.

所述主处理电路，用于确定所述输入神经元为广播数据，权值为分发数据，将分发数据分配成多个数据块，将所述多个数据块中的至少一个数据块以及多个运算指令中的至少一个运算指令发送给所述从处理电路；The master processing circuit is used to determine that the input neuron is broadcast data and the weight is distribution data, distribute the distribution data into multiple data blocks, and send at least one data block of the multiple data blocks and at least one operation instruction of the multiple operation instructions to the slave processing circuit;

所述多个从处理电路，用于依据该运算指令对接收到的数据块执行运算得到中间结果，并将中间结果传输给所述主处理电路；The multiple slave processing circuits are used to perform operations on the received data blocks according to the operation instructions to obtain intermediate results, and transmit the intermediate results to the master processing circuit;

所述主处理电路,用于将多个从处理电路发送的中间结果进行处理得到该计算指令的结果,将该计算指令的结果发送给所述控制器单元。The main processing circuit is used to process the intermediate results sent by the multiple slave processing circuits to obtain the result of the calculation instruction, and send the result of the calculation instruction to the controller unit.

所述从处理电路包括:乘法处理电路;The slave processing circuit includes: a multiplication processing circuit;

所述乘法处理电路,用于对接收到的数据块执行乘积运算得到乘积结果;The multiplication processing circuit is used to perform a product operation on the received data block to obtain a product result;

转发处理电路(可选的),用于将接收到的数据块或乘积结果转发。A forwarding processing circuit (optional) is used to forward the received data block or product result.

累加处理电路，用于对该乘积结果执行累加运算得到该中间结果。The accumulation processing circuit is used to perform an accumulation operation on the product result to obtain the intermediate result.

另一个实施例里,该运算指令为矩阵乘以矩阵的指令、累加指令、激活指令等等计算指令。In another embodiment, the operation instruction is a matrix multiplication instruction, an accumulation instruction, an activation instruction, or the like.

下面通过神经网络运算指令来说明如图1B所示的计算装置的具体计算方法。对于神经网络运算指令来说，其实际需要执行的公式可以为：s=s(∑wxi+b)，即：将权值w乘以输入数据xi，进行求和，然后加上偏置b后做激活运算s(h)，得到最终的输出结果s。The specific calculation method of the computing device shown in FIG1B is explained below by using a neural network operation instruction. For the neural network operation instruction, the formula that actually needs to be executed may be: s=s( ∑wx i +b), that is, the weight w is multiplied by the input data x i , the products are summed, the bias b is added, and the activation operation s(h) is performed to obtain the final output result s.
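该公式可用如下Python代码示意（假设性示例：文中未指定激活函数s(h)的具体形式，这里以sigmoid为例）。The formula can be sketched in Python as follows (a hypothetical example: the text does not specify the form of the activation s(h), so sigmoid is used here for illustration).

```python
import numpy as np

def neuron_output(w, x, b):
    """s = s(sum_i w_i * x_i + b): weighted sum of the inputs plus the
    bias b, followed by the activation s(h); sigmoid as an example."""
    h = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-h))
```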

在一种可选的实施方案中，如图8所示，所述运算单元包括：树型模块40，所述树型模块包括：一个根端口401和多个支端口404，所述树型模块的根端口连接所述主处理电路，所述树型模块的多个支端口分别连接多个从处理电路中的一个从处理电路；所述树型模块具有收发功能，用于转发所述主处理电路与所述多个从处理电路之间的数据块、权值以及运算指令，既可以将主处理电路的数据传送给各个从处理电路，也可以将各个从处理电路的数据传送给主处理电路。In an optional implementation, as shown in FIG8, the operation unit includes a tree module 40, which includes a root port 401 and multiple branch ports 404; the root port of the tree module is connected to the main processing circuit, and each branch port of the tree module is connected to one of the multiple slave processing circuits. The tree module has a transceiver function for forwarding the data blocks, weights, and operation instructions between the main processing circuit and the multiple slave processing circuits: it can transmit data from the main processing circuit to each slave processing circuit, and can also transmit data from each slave processing circuit to the main processing circuit.

可选的，该树型模块为计算装置的可选择结构，其可以包括至少1层节点，该节点为具有转发功能的线结构，该节点本身可以不具有计算功能。如树型模块具有零层节点，即无需该树型模块。Optionally, the tree module is an optional structure of the computing device, which may include at least one layer of nodes; the nodes are line structures with a forwarding function, and the nodes themselves may not have a computing function. If the tree module has zero layers of nodes, the tree module is not needed.

可选的,该树型模块可以为n叉树结构,例如,如图9所示的二叉树结构,当然也可以为三叉树结构,该n可以为大于等于2的整数。本申请具体实施方式并不限制上述n的具体取值,上述层数也可以为2,从处理电路可以连接除倒数第二层节点以外的其他层的节点,例如可以连接如图9所示的倒数第一层的节点。Optionally, the tree module may be an n-ary tree structure, for example, a binary tree structure as shown in FIG9 , or a ternary tree structure, and n may be an integer greater than or equal to 2. The specific implementation of the present application does not limit the specific value of n, and the number of layers may also be 2, and the slave processing circuit may be connected to nodes of other layers except the penultimate layer nodes, for example, the penultimate layer nodes as shown in FIG9 may be connected.

可选的,上述运算单元可以携带单独的缓存,如图10所示,可以包括:神经元缓存单元,该神经元缓存单元63缓存该从处理电路的输入神经元向量数据和输出神经元值数据。Optionally, the above-mentioned operation unit may carry a separate cache, as shown in FIG10 , and may include: a neuron cache unit, wherein the neuron cache unit 63 caches the input neuron vector data and output neuron value data of the slave processing circuit.

如图11所示,该运算单元还可以包括:权值缓存单元64,用于缓存该从处理电路在计算过程中需要的权值数据。As shown in FIG. 11 , the operation unit may further include: a weight cache unit 64 for caching weight data required by the slave processing circuit during the calculation process.

在一种可选实施例中,运算单元12如图12所示,可以包括分支处理电路103;其具体的连接结构如图12所示,其中,In an optional embodiment, the operation unit 12 is shown in FIG. 12 and may include a branch processing circuit 103; its specific connection structure is shown in FIG. 12, wherein:

主处理电路101与分支处理电路103(一个或多个)连接,分支处理电路103与一个或多个从处理电路102连接;The master processing circuit 101 is connected to the branch processing circuit 103 (one or more), and the branch processing circuit 103 is connected to one or more slave processing circuits 102;

分支处理电路103,用于执行转发主处理电路101与从处理电路102之间的数据或指令。The branch processing circuit 103 is used to forward data or instructions between the main processing circuit 101 and the slave processing circuit 102 .

在一种可选实施例中，以神经网络运算中的全连接运算为例，过程可以为：y=f(wx+b)，其中，x为输入神经元矩阵，w为权值矩阵，b为偏置标量，f为激活函数，具体可以为sigmoid函数、tanh函数、relu函数、softmax函数中的任意一个。这里假设为二叉树结构，具有8个从处理电路，其实现的方法可以为：In an optional embodiment, taking the fully connected operation in the neural network operation as an example, the process can be: y = f(wx + b), where x is the input neuron matrix, w is the weight matrix, b is the bias scalar, and f is the activation function, which can specifically be any one of the sigmoid, tanh, relu, and softmax functions. Here, a binary tree structure with 8 slave processing circuits is assumed, and the implementation method can be:

控制器单元从存储单元内获取输入神经元矩阵x,权值矩阵w以及全连接运算指令,将输入神经元矩阵x,权值矩阵w以及全连接运算指令传输给主处理电路;The controller unit obtains the input neuron matrix x, the weight matrix w and the full connection operation instruction from the storage unit, and transmits the input neuron matrix x, the weight matrix w and the full connection operation instruction to the main processing circuit;

主处理电路确定该输入神经元矩阵x为广播数据,确定权值矩阵w为分发数据,将权值矩阵w拆分成8个子矩阵,然后将8个子矩阵通过树型模块分发给8个从处理电路,将输入神经元矩阵x广播给8个从处理电路,The master processing circuit determines that the input neuron matrix x is broadcast data, determines that the weight matrix w is distributed data, splits the weight matrix w into 8 sub-matrices, and then distributes the 8 sub-matrices to 8 slave processing circuits through the tree module, and broadcasts the input neuron matrix x to the 8 slave processing circuits.

从处理电路并行执行8个子矩阵与输入神经元矩阵x的乘法运算和累加运算得到8个中间结果,将8个中间结果发送给主处理电路;The slave processing circuit performs multiplication and accumulation operations of the eight sub-matrices and the input neuron matrix x in parallel to obtain eight intermediate results, and sends the eight intermediate results to the master processing circuit;

主处理电路,用于将8个中间结果排序得到wx的运算结果,将该运算结果执行偏置b的运算后执行激活操作得到最终结果y,将最终结果y发送至控制器单元,控制器单元将该最终结果y输出或存储至存储单元内。The main processing circuit is used to sort the 8 intermediate results to obtain the operation result of wx, perform the operation of bias b on the operation result and then perform the activation operation to obtain the final result y, and send the final result y to the controller unit, and the controller unit outputs or stores the final result y in the storage unit.
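上述主从分工可用如下NumPy代码示意（假设性示例：按行分块代替8个子矩阵的具体拆分方式，以tanh作为示例激活函数f）。The master-slave division of work above can be sketched in NumPy as follows (a hypothetical example: row-wise blocking stands in for the specific split into 8 sub-matrices, and tanh is used as an example activation f).

```python
import numpy as np

def fully_connected(x, w, b, n_slaves=8):
    """y = f(w @ x + b) computed as described above: w (the distribution
    data) is split into n_slaves row-blocks, x (the broadcast data) is
    sent to every slave, each slave multiplies its own block, and the
    master re-orders the intermediate results before bias + activation."""
    blocks = np.array_split(w, n_slaves, axis=0)  # the 8 sub-matrices
    partials = [blk @ x for blk in blocks]        # slave-side products
    wx = np.concatenate(partials)                 # master sorts results
    return np.tanh(wx + b)                        # bias b, activation f
```

拆分-并行-重排的结果与直接计算w@x完全一致。The split-compute-reorder result is identical to computing w @ x directly.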

如图1B所示的计算装置执行神经网络正向运算指令的方法具体可以为:The method for the computing device shown in FIG. 1B to execute the neural network forward operation instruction may specifically be:

控制器单元从指令存储单元内提取神经网络正向运算指令、神经网络运算指令对应的操作域以及至少一个操作码,控制器单元将该操作域传输至数据访问单元,将该至少一个操作码发送至运算单元。The controller unit extracts the neural network forward operation instruction, the operation domain corresponding to the neural network operation instruction and at least one operation code from the instruction storage unit, transmits the operation domain to the data access unit, and sends the at least one operation code to the operation unit.

控制器单元从存储单元内提取该操作域对应的权值w和偏置b(当b为0时,不需要提取偏置b),将权值w和偏置b传输至运算单元的主处理电路,控制器单元从存储单元内提取输入数据Xi,将该输入数据Xi发送至主处理电路。The controller unit extracts the weight w and bias b corresponding to the operation domain from the storage unit (when b is 0, there is no need to extract bias b), and transmits the weight w and bias b to the main processing circuit of the operation unit. The controller unit extracts the input data Xi from the storage unit and sends the input data Xi to the main processing circuit.

主处理电路依据该至少一个操作码确定为乘法运算,确定输入数据Xi为广播数据,确定权值数据为分发数据,将权值w拆分成n个数据块;The main processing circuit determines that the at least one operation code is a multiplication operation, determines that the input data Xi is broadcast data, determines that the weight data is distribution data, and splits the weight w into n data blocks;

控制器单元的指令处理单元依据该至少一个操作码确定乘法指令、偏置指令和累加指令，将乘法指令、偏置指令和累加指令发送至主处理电路，主处理电路将该乘法指令、输入数据Xi以广播的方式发送给多个从处理电路，将该n个数据块分发给该多个从处理电路（例如具有n个从处理电路，则向每个从处理电路发送一个数据块）；多个从处理电路，用于依据该乘法指令将该输入数据Xi与接收到的数据块执行乘法运算得到中间结果，将该中间结果发送至主处理电路，该主处理电路依据该累加指令将多个从处理电路发送的中间结果执行累加运算得到累加结果，依据该偏置指令将该累加结果执行加偏置b得到最终结果，将该最终结果发送至该控制器单元。The instruction processing unit of the controller unit determines the multiplication instruction, the bias instruction and the accumulation instruction according to the at least one operation code, and sends the multiplication instruction, the bias instruction and the accumulation instruction to the main processing circuit, the main processing circuit sends the multiplication instruction and the input data Xi to multiple slave processing circuits in a broadcast manner, and distributes the n data blocks to the multiple slave processing circuits (for example, if there are n slave processing circuits, one data block is sent to each slave processing circuit); multiple slave processing circuits are used to perform multiplication operations on the input data Xi and the received data blocks according to the multiplication instruction to obtain an intermediate result, and send the intermediate result to the main processing circuit, the main processing circuit performs accumulation operations on the intermediate results sent by the multiple slave processing circuits according to the accumulation instruction to obtain an accumulation result, adds the bias b to the accumulation result according to the bias instruction to obtain a final result, and sends the final result to the controller unit.

另外,加法运算和乘法运算的顺序可以调换。In addition, the order of addition and multiplication operations can be reversed.

本申请提供的技术方案通过一个指令即神经网络运算指令即实现了神经网络的乘法运算以及偏置运算,在神经网络计算的中间结果均无需存储或提取,减少了中间数据的存储以及提取操作,所以其具有减少对应的操作步骤,提高神经网络的计算效果的优点。The technical solution provided in the present application implements the multiplication operation and bias operation of the neural network through one instruction, namely the neural network operation instruction. The intermediate results of the neural network calculation do not need to be stored or extracted, which reduces the storage and extraction operations of the intermediate data. Therefore, it has the advantages of reducing the corresponding operation steps and improving the calculation effect of the neural network.

本申请还揭露了一个机器学习运算装置,其包括一个或多个在本申请中提到的计算装置,用于从其他处理装置中获取待运算数据和控制信息,执行指定的机器学习运算,执行结果通过I/O接口传递给外围设备。外围设备譬如摄像头,显示器,鼠标,键盘,网卡,wifi接口,服务器。当包含一个以上计算装置时,计算装置间可以通过特定的结构进行链接并传输数据,譬如,通过PCIE总线进行互联并传输数据,以支持更大规模的机器学习的运算。此时,可以共享同一控制系统,也可以有各自独立的控制系统;可以共享内存,也可以每个加速器有各自的内存。此外,其互联方式可以是任意互联拓扑。The present application also discloses a machine learning computing device, which includes one or more computing devices mentioned in the present application, and is used to obtain data to be calculated and control information from other processing devices, perform specified machine learning operations, and transmit the execution results to peripheral devices through I/O interfaces. Peripheral devices include cameras, displays, mice, keyboards, network cards, wifi interfaces, and servers. When more than one computing device is included, the computing devices can be linked and data can be transmitted through a specific structure, for example, interconnected and data can be transmitted through a PCIE bus to support larger-scale machine learning operations. At this time, the same control system can be shared, or each independent control system can be provided; memory can be shared, or each accelerator can have its own memory. In addition, the interconnection method can be any interconnection topology.

该机器学习运算装置具有较高的兼容性,可通过PCIE接口与各种类型的服务器相连接。The machine learning computing device has high compatibility and can be connected to various types of servers through a PCIE interface.

本申请还揭露了一个组合处理装置,其包括上述的机器学习运算装置,通用互联接口,和其他处理装置。机器学习运算装置与其他处理装置进行交互,共同完成用户指定的操作。图13为组合处理装置的示意图。The present application also discloses a combined processing device, which includes the above-mentioned machine learning computing device, a universal interconnection interface, and other processing devices. The machine learning computing device interacts with other processing devices to jointly complete the operation specified by the user. FIG13 is a schematic diagram of the combined processing device.

其他处理装置,包括中央处理器CPU、图形处理器GPU、神经网络处理器等通用/专用处理器中的一种或以上的处理器类型。其他处理装置所包括的处理器数量不做限制。其他处理装置作为机器学习运算装置与外部数据和控制的接口,包括数据搬运,完成对本机器学习运算装置的开启、停止等基本控制;其他处理装置也可以和机器学习运算装置协作共同完成运算任务。Other processing devices include one or more types of processors such as central processing unit (CPU), graphics processing unit (GPU), neural network processor, and other general/special processors. There is no limit on the number of processors included in other processing devices. Other processing devices serve as interfaces between the machine learning computing device and external data and control, including data handling, to complete basic control of the machine learning computing device such as starting and stopping; other processing devices can also collaborate with the machine learning computing device to complete computing tasks.

通用互联接口,用于在所述机器学习运算装置与其他处理装置间传输数据和控制指令。该机器学习运算装置从其他处理装置中获取所需的输入数据,写入机器学习运算装置片上的存储装置;可以从其他处理装置中获取控制指令,写入机器学习运算装置片上的控制缓存;也可以读取机器学习运算装置的存储模块中的数据并传输给其他处理装置。A universal interconnection interface is used to transmit data and control instructions between the machine learning computing device and other processing devices. The machine learning computing device can obtain the required input data from other processing devices and write it into the storage device on the machine learning computing device chip; it can obtain control instructions from other processing devices and write them into the control cache on the machine learning computing device chip; it can also read data in the storage module of the machine learning computing device and transmit it to other processing devices.

可选的，该结构如图14所示，还可以包括存储装置，存储装置分别与所述机器学习运算装置和所述其他处理装置连接。存储装置用于保存所述机器学习运算装置和所述其他处理装置的数据，尤其适用于所需要运算的数据在本机器学习运算装置或其他处理装置的内部存储中无法全部保存的情况。Optionally, as shown in FIG14, the structure may further include a storage device, which is connected to the machine learning operation device and the other processing device, respectively. The storage device is used to store data of the machine learning operation device and the other processing device, and is particularly suitable for the case where the data to be calculated cannot be fully stored in the internal storage of the machine learning operation device or the other processing devices.

该组合处理装置可以作为手机、机器人、无人机、视频监控设备等设备的SOC片上系统,有效降低控制部分的核心面积,提高处理速度,降低整体功耗。此情况时,该组合处理装置的通用互联接口与设备的某些部件相连接。某些部件譬如摄像头,显示器,鼠标,键盘,网卡,wifi接口。The combined processing device can be used as a SOC chip system for mobile phones, robots, drones, video surveillance equipment and other devices, effectively reducing the core area of the control part, improving the processing speed, and reducing the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the device. Certain components include cameras, displays, mice, keyboards, network cards, and wifi interfaces.

在一些实施例里,还申请了一种芯片,其包括了上述机器学习运算装置或组合处理装置。In some embodiments, a chip is also applied for, which includes the above-mentioned machine learning computing device or combined processing device.

在一些实施例里,申请了一种芯片封装结构,其包括了上述芯片。In some embodiments, a chip packaging structure is applied for, which includes the above-mentioned chip.

在一些实施例里,申请了一种板卡,其包括了上述芯片封装结构。参阅图15,图15提供了一种板卡,上述板卡除了包括上述芯片389以外,还可以包括其他的配套部件,该配套部件包括但不限于:存储器件390、接口装置391和控制器件392;In some embodiments, a board card is applied, which includes the above chip packaging structure. Referring to FIG. 15 , FIG. 15 provides a board card, which includes, in addition to the above chip 389 , other supporting components, including but not limited to: a storage device 390 , an interface device 391 and a control device 392 ;

所述存储器件390与所述芯片封装结构内的芯片通过总线连接,用于存储数据。所述存储器件可以包括多组存储单元393。每一组所述存储单元与所述芯片通过总线连接。可以理解,每一组所述存储单元可以是DDR SDRAM(英文:Double Data Rate SDRAM,双倍速率同步动态随机存储器)。The memory device 390 is connected to the chip in the chip package structure via a bus for storing data. The memory device may include multiple groups of memory cells 393. Each group of memory cells is connected to the chip via a bus. It is understood that each group of memory cells may be DDR SDRAM (English: Double Data Rate SDRAM, double rate synchronous dynamic random access memory).

DDR不需要提高时钟频率就能加倍提高SDRAM的速度。DDR允许在时钟脉冲的上升沿和下降沿读出数据。DDR的速度是标准SDRAM的两倍。在一个实施例中,所述存储装置可以包括4组所述存储单元。每一组所述存储单元可以包括多个DDR4颗粒(芯片)。在一个实施例中,所述芯片内部可以包括4个72位DDR4控制器,上述72位DDR4控制器中64bit用于传输数据,8bit用于ECC校验。可以理解,当每一组所述存储单元中采用DDR4-3200颗粒时,数据传输的理论带宽可达到25600MB/s。DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on the rising and falling edges of the clock pulse. The speed of DDR is twice that of standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units. Each group of storage units may include multiple DDR4 particles (chips). In one embodiment, the chip may include 4 72-bit DDR4 controllers, 64 bits of the above 72-bit DDR4 controllers are used to transmit data, and 8 bits are used for ECC verification. It can be understood that when DDR4-3200 particles are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600MB/s.
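25600MB/s这一理论带宽可用如下算式核对。The 25600MB/s theoretical bandwidth can be checked with the following arithmetic.

```python
# DDR4-3200 performs 3200 mega-transfers per second (data moves on both
# clock edges), and each transfer carries the 64 data bits of the 72-bit
# interface (the remaining 8 bits carry the ECC check, not payload).
mega_transfers_per_s = 3200
data_bytes_per_transfer = 64 // 8   # 64 data bits -> 8 bytes
bandwidth_mb_s = mega_transfers_per_s * data_bytes_per_transfer
print(bandwidth_mb_s)  # 25600
```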

在一个实施例中，每一组所述存储单元包括多个并联设置的双倍速率同步动态随机存储器。DDR在一个时钟周期内可以传输两次数据。在所述芯片中设置控制DDR的控制器，用于对每个所述存储单元的数据传输与数据存储的控制。In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories (DDR) arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is arranged in the chip to control the data transmission and data storage of each storage unit.

所述接口装置与所述芯片封装结构内的芯片电连接。所述接口装置用于实现所述芯片与外部设备(例如服务器或计算机)之间的数据传输。例如在一个实施例中，所述接口装置可以为标准PCIE接口。比如，待处理的数据由服务器通过标准PCIE接口传递至所述芯片，实现数据转移。优选的，当采用PCIE 3.0 x16接口传输时，理论带宽可达到16000MB/s。在另一个实施例中，所述接口装置还可以是其他的接口，本申请并不限制上述其他的接口的具体表现形式，所述接口单元能够实现转接功能即可。另外，所述芯片的计算结果仍由所述接口装置传送回外部设备(例如服务器)。The interface device is electrically connected to the chip in the chip packaging structure. The interface device is used to realize data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface: the data to be processed is transferred from the server to the chip through the standard PCIE interface. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; this application does not limit the specific form of such other interfaces, as long as the interface unit can realize the transfer function. In addition, the calculation results of the chip are still transmitted back to the external device (such as a server) by the interface device.

所述控制器件与所述芯片电连接。所述控制器件用于对所述芯片的状态进行监控。具体的，所述芯片与所述控制器件可以通过SPI接口电连接。所述控制器件可以包括单片机(Micro Controller Unit，MCU)。如所述芯片可以包括多个处理芯片、多个处理核或多个处理电路，可以带动多个负载。因此，所述芯片可以处于多负载和轻负载等不同的工作状态。通过所述控制器件可以实现对所述芯片中多个处理芯片、多个处理核和/或多个处理电路的工作状态的调控。The control device is electrically connected to the chip. The control device is used to monitor the state of the chip. Specifically, the chip and the control device may be electrically connected via an SPI interface. The control device may include a microcontroller (Micro Controller Unit, MCU). As the chip may include multiple processing chips, multiple processing cores or multiple processing circuits, it can drive multiple loads; the chip can therefore be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the chip.

在一些实施例里，申请了一种电子设备，其包括了上述板卡。In some embodiments, an electronic device is provided, which includes the above board card.

电子设备包括数据处理装置、机器人、电脑、打印机、扫描仪、平板电脑、智能终端、手机、行车记录仪、导航仪、传感器、摄像头、服务器、云端服务器、相机、摄像机、投影仪、手表、耳机、移动存储、可穿戴设备、交通工具、家用电器、和/或医疗设备。Electronic devices include data processing devices, robots, computers, printers, scanners, tablet computers, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, servers, cloud servers, cameras, camcorders, projectors, watches, headphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical devices.

所述交通工具包括飞机、轮船和/或车辆;所述家用电器包括电视、空调、微波炉、冰箱、电饭煲、加湿器、洗衣机、电灯、燃气灶、油烟机;所述医疗设备包括核磁共振仪、B超仪和/或心电图仪。The transportation means include airplanes, ships and/or vehicles; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; the medical equipment includes magnetic resonance imaging, ultrasound machines and/or electrocardiographs.

在本发明实施例中，考虑到针对神经网络的压缩方法可以包括但不限于应用在上述计算装置中，还可以应用在其它场景下，例如，减少神经网络的精度损失。基于此，下面结合图16所示的本发明实施例提供的神经网络压缩方法的流程示意图，具体说明本发明是如何实现针对第一权值矩阵的压缩，以得到第二权值矩阵的，可以包括但不限于如下步骤：In the embodiment of the present invention, the compression method for the neural network may be applied in, but is not limited to, the above computing device, and may also be applied in other scenarios, for example, to reduce the accuracy loss of the neural network. Based on this, with reference to the flowchart of the neural network compression method provided by the embodiment of the present invention shown in FIG. 16, the following specifically explains how the present invention compresses the first weight matrix to obtain the second weight matrix, which may include but is not limited to the following steps:

步骤S100、获取第一输入数据;其中,所述第一输入数据包括第一权值矩阵。Step S100: Obtain first input data; wherein the first input data includes a first weight matrix.

具体实现中,第一权值矩阵中的权值数据可以为任意实数。这里,权值数据是指神经网络层与层之间的连接值,也即神经元之间的信息传递强度。In a specific implementation, the weight data in the first weight matrix can be any real number. Here, the weight data refers to the connection value between the neural network layers, that is, the information transmission strength between neurons.

步骤S102、将所述第一权值矩阵压缩为第二权值矩阵;其中,第二权值矩阵中包括至少两个子矩阵。Step S102: compress the first weight matrix into a second weight matrix; wherein the second weight matrix includes at least two sub-matrices.

在其中一个实施方式中，所述将所述第一权值矩阵压缩为第二权值矩阵，包括：In one implementation manner, compressing the first weight matrix into the second weight matrix includes:

将所述第一权值矩阵分解成第三权值矩阵;其中,所述第三权值矩阵包括至少两个子矩阵;Decomposing the first weight matrix into a third weight matrix; wherein the third weight matrix includes at least two sub-matrices;

根据第一公式确定所述至少两个子矩阵中的每个子矩阵的大小，所述第一公式为Q≈Q1*Q2*......*Qn；其中，所述Q表示第一权值矩阵；所述Q1表示所述至少两个子矩阵中的第一子矩阵；所述Q2表示所述至少两个子矩阵中的第二子矩阵；所述Qn表示所述至少两个子矩阵中的第n子矩阵；The size of each submatrix of the at least two submatrices is determined according to a first formula Q≈Q1*Q2*......*Qn, where Q represents the first weight matrix, Q1 represents the first submatrix of the at least two submatrices, Q2 represents the second submatrix of the at least two submatrices, and Qn represents the nth submatrix of the at least two submatrices;

调整所述至少两个子矩阵中的每个子矩阵的大小,并通过训练压缩后的机器学习模型,以得到满足预设精度的第二权值矩阵。The size of each submatrix of the at least two submatrices is adjusted, and a second weight matrix satisfying a preset accuracy is obtained by training a compressed machine learning model.

具体实现中,第一公式中的运算符号“*”表示矩阵的乘法运算。In a specific implementation, the operation symbol “*” in the first formula represents a matrix multiplication operation.
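
The patent does not fix how the sub-matrices in Q≈Q1*Q2 are obtained; a truncated SVD is one common way to realize such a factorization and illustrates the shapes involved (a sketch under that assumption, with illustrative names):

```python
import numpy as np

def factorize(Q, k):
    # Truncated SVD: keep the k largest singular values so that
    # Q (m x n) is approximated by Q1 (m x k) times Q2 (k x n).
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    Q1 = U[:, :k] * s[:k]
    Q2 = Vt[:k, :]
    return Q1, Q2

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 3))   # e.g. the 4-row, 3-column matrix of FIG. 5B
Q1, Q2 = factorize(Q, 2)
print(Q1.shape, Q2.shape)         # (4, 2) (2, 3)
```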

在其中一个实施方式中,当第三权值矩阵中包括两个子矩阵时,第一公式可以表示为:In one implementation manner, when the third weight matrix includes two sub-matrices, the first formula can be expressed as:

Q ≈ Q1*Q2    (1.1)

其中,Q表示第一权值矩阵,所述Q1表示所述至少两个子矩阵中的第一子矩阵;所述Q2表示所述至少两个子矩阵中的第二子矩阵。Wherein, Q represents a first weight matrix, Q1 represents a first submatrix among the at least two submatrices, and Q2 represents a second submatrix among the at least two submatrices.

在其中另一个实施方式中,当第三权值矩阵中包括至少两个子矩阵时,第一公式可以表示为:In another embodiment, when the third weight matrix includes at least two sub-matrices, the first formula can be expressed as:

Q ≈ Q1*Q2*......*Qn    (1.2)

上述公式(1.2)中,n为大于2的正整数。In the above formula (1.2), n is a positive integer greater than 2.

具体实现中,将第一权值矩阵压缩为第二权值矩阵的实现过程中,当应用到不同的神经网络时(例如,全连接层神经网络、卷积层神经网络、LSTM层神经网络),上述所涉及的针对第一权值矩阵的分解操作、求解至少两个子矩阵中的每个子矩阵以及调整至少两个子矩阵中的每个子矩阵以获得满足预设精度的第二权值矩阵将有所差异,接下来将进行具体阐述:In a specific implementation, in the process of compressing the first weight matrix into the second weight matrix, when applied to different neural networks (for example, a fully connected layer neural network, a convolutional layer neural network, an LSTM layer neural network), the above-mentioned decomposition operation on the first weight matrix, solving each of the at least two sub-matrices, and adjusting each of the at least two sub-matrices to obtain a second weight matrix that meets the preset accuracy will be different, which will be specifically explained below:

(1)全连接层神经网络:(1) Fully connected layer neural network:

全连接层是指对n-1层和n层而言,n-1层的任意一个节点,都和n层的所有节点有连接。具体地,参见图5A,是本发明实施例提供的一种神经网络的一维全连接层的结构示意图,如图5A所示,该神经网络包括输入层、隐含层以及输出层,其中,输入层到隐含层之间的这一全连接层的二维参数矩阵为(3,4),该二维参数矩阵(3,4)表示在输入层到隐含层之间的全连接层结构中,输入神经元的个数为3,输出神经元的个数为4,权值数量为12。具体实现中,这12个权值可以表示为4行3列的权值矩阵,其权值矩阵的表现形式可以如图5B所示。A fully connected layer means that for the n-1 layer and the n layer, any node in the n-1 layer is connected to all the nodes in the n layer. Specifically, referring to FIG5A, it is a schematic diagram of the structure of a one-dimensional fully connected layer of a neural network provided in an embodiment of the present invention. As shown in FIG5A, the neural network includes an input layer, a hidden layer, and an output layer, wherein the two-dimensional parameter matrix of the fully connected layer between the input layer and the hidden layer is (3,4), and the two-dimensional parameter matrix (3,4) indicates that in the fully connected layer structure between the input layer and the hidden layer, the number of input neurons is 3, the number of output neurons is 4, and the number of weights is 12. In a specific implementation, these 12 weights can be represented as a weight matrix of 4 rows and 3 columns, and the representation of the weight matrix can be shown in FIG5B.

在全连接层神经网络中，所述第一公式包括：M≈M1*M2；所述两个子矩阵包括第一子矩阵M1和第二子矩阵M2，所述M1为Nin*K矩阵，所述M2为K*Nout矩阵；其中，K为压缩参数，Nin为所述神经网络的输入神经元的个数，Nout为所述神经网络的输出神经元的个数；所述压缩参数用于表征所述M1的输出神经元的个数以及所述M2的输入神经元的个数，所述K为大于0且小于等于min(Nin,Nout)的正整数。In a fully connected layer neural network, the first formula includes: M≈M1*M2; the two sub-matrices include a first sub-matrix M1 and a second sub-matrix M2, where M1 is an Nin*K matrix and M2 is a K*Nout matrix; K is a compression parameter, Nin is the number of input neurons of the neural network, and Nout is the number of output neurons of the neural network; the compression parameter characterizes the number of output neurons of M1 and the number of input neurons of M2, and K is a positive integer greater than 0 and less than or equal to min(Nin, Nout).

如前所述,调整两个子矩阵的每个子矩阵的大小的过程,其实质是压缩参数K值的动态变化过程,以寻找最佳的压缩参数K。在实际应用中,可以采用二分查找的方式来确定全连接层神经网络中的压缩参数K值,从而得到满足预设精度的第二权值矩阵。在其中一个实施方式中,利用二分查找方式确定的压缩参数K可以使得第二权值矩阵满足预设精度。在其中另一个实施方式中,利用二分查找方式确定的压缩参数K可以使得第二权值矩阵满足预设精度的同时,第一权值矩阵与第二权值矩阵的压缩比满足预设压缩比,也即,针对该神经网络模型的压缩获得较优的压缩效果。As mentioned above, the process of adjusting the size of each of the two sub-matrices is actually a dynamic change process of the compression parameter K value to find the optimal compression parameter K. In practical applications, a binary search method can be used to determine the compression parameter K value in the fully connected layer neural network, so as to obtain a second weight matrix that meets the preset accuracy. In one embodiment, the compression parameter K determined by the binary search method can make the second weight matrix meet the preset accuracy. In another embodiment, the compression parameter K determined by the binary search method can make the second weight matrix meet the preset accuracy while the compression ratio of the first weight matrix to the second weight matrix meets the preset compression ratio, that is, a better compression effect is obtained for the compression of the neural network model.

具体实现中，压缩参数K值不同，也即基于多个不同压缩比对第一权值矩阵进行压缩，这里，在全连接层神经网络中，压缩比为Nin*Nout/(K*(Nin+Nout))。In a specific implementation, different values of the compression parameter K correspond to compressing the first weight matrix at multiple different compression ratios. Here, in the fully connected layer neural network, the compression ratio is Nin*Nout/(K*(Nin+Nout)).

接下来具体阐述如何采用二分查找的方式来确定压缩参数K值。首先，设定两个参数KL和KR。初始化情况下，令KL=1，KR=min(Nin,Nout)。在调整参数过程中，K=(KL+KR)/2。如果M1*M2表示的第二权值矩阵导致压缩后的神经网络模型的精度下降X%（这里，X=1~10等等），则调整参数KL，使得KL=K；如果M1*M2表示的第二权值矩阵使压缩后的神经网络模型满足预设精度，那么调整KR，使得KR=K。重复执行上述步骤，直至满足结束条件K=KL或者K=KR。The following describes in detail how to determine the compression parameter K by binary search. First, set two parameters KL and KR. At initialization, let KL=1 and KR=min(Nin, Nout). During parameter adjustment, K=(KL+KR)/2. If the second weight matrix represented by M1*M2 causes the accuracy of the compressed neural network model to drop by X% (here, X=1~10, etc.), adjust KL so that KL=K; if the second weight matrix represented by M1*M2 allows the compressed neural network model to meet the preset accuracy, adjust KR so that KR=K. Repeat the above steps until the end condition K=KL or K=KR is met.
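
The KL/KR procedure above can be sketched directly; `meets_precision` stands in for retraining the compressed model and checking it against the preset accuracy (a hypothetical callback, not part of the patent):

```python
def find_k(n_in, n_out, meets_precision):
    # Binary search for the compression parameter K as described above:
    # KL tracks values of K that fail the accuracy test, KR tracks values
    # assumed to pass; stop when K collides with KL or KR.
    kl, kr = 1, min(n_in, n_out)
    while True:
        k = (kl + kr) // 2
        if k == kl or k == kr:
            return kr
        if meets_precision(k):
            kr = k
        else:
            kl = k

# FIG. 5A example: Nin=3, Nout=4, and the model stays accurate for K >= 2
print(find_k(3, 4, lambda k: k >= 2))  # 2
```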

以图5A中输入层到隐含层之间的这一全连接层为例,压缩参数K值为大于0且小于等于3的正整数。通过上述二分查找的方式确定压缩参数K=2,也即,满足预设精度的第二权值矩阵中的第一子矩阵M1为(3,2)矩阵,第二子矩阵M2为(2,4)矩阵。具体地,针对图5A中输入层到隐含层之间的这一全连接层的压缩可以如图5C所示。Taking the fully connected layer between the input layer and the hidden layer in FIG5A as an example, the compression parameter K value is a positive integer greater than 0 and less than or equal to 3. The compression parameter K=2 is determined by the above binary search method, that is, the first submatrix M1 in the second weight matrix that meets the preset accuracy is a (3,2) matrix, and the second submatrix M2 is a (2,4) matrix. Specifically, the compression of the fully connected layer between the input layer and the hidden layer in FIG5A can be shown in FIG5C.
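
Numerically, applying the compressed layer of FIG. 5C amounts to two small matrix products; the weight values below are made up and only the shapes matter:

```python
import numpy as np

rng = np.random.default_rng(1)
M1 = rng.standard_normal((3, 2))  # Nin x K  (first sub-matrix)
M2 = rng.standard_normal((2, 4))  # K x Nout (second sub-matrix)
x = rng.standard_normal(3)        # 3 input neurons

# The K=2 intermediate values produced by M1 act as the
# input neurons of M2, exactly as in FIG. 5C.
hidden = x @ M1
y = hidden @ M2
print(y.shape)  # (4,)
```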

在其中一个实施方式中,当第一公式的表现形式如公式(1.2)所示时,也即第三权值矩阵中的子矩阵的数量为n个,这里,n为大于2的正整数,此时,压缩参数K的数量为(n-1)个。在实际应用中,可以采用自适应算法(例如,遗传算法)来确定全连接层神经网络中的(n-1)个压缩参数K值,从而得到满足预设精度和/或满足压缩效果的第二权值矩阵。接下来具体阐述是如何采用遗传算法来确定全连接层神经网络中的(n-1)个压缩参数K值的:In one embodiment, when the first formula is expressed as shown in formula (1.2), that is, the number of sub-matrices in the third weight matrix is n, where n is a positive integer greater than 2, and the number of compression parameters K is (n-1). In practical applications, an adaptive algorithm (e.g., a genetic algorithm) can be used to determine the (n-1) compression parameter K values in the fully connected layer neural network, thereby obtaining a second weight matrix that meets the preset accuracy and/or meets the compression effect. The following specifically describes how to use a genetic algorithm to determine the (n-1) compression parameter K values in the fully connected layer neural network:

步骤1：随机产生种群：设定种群的规模为P个，设置最大迭代次数Tmax，例如，Tmax=100。在初始状态下，设置迭代次数计数器t=0；交叉概率Pc=A（例如，A=0.4），变异概率Pm=B（例如，B=0.6）；种群矩阵的每一行表示一个基因串个体，每一列表示个体的数目；这里，每一个个体是一组关于压缩参数K（例如，Kj）值的解；Step 1: Randomly generate a population: set the population size to P and the maximum number of iterations Tmax, for example, Tmax=100. In the initial state, set the iteration counter t=0, the crossover probability Pc=A (for example, A=0.4), and the mutation probability Pm=B (for example, B=0.6). Each row of the population matrix represents a gene-string individual, and each column represents the number of individuals; here, each individual is a set of solutions for the compression parameter K (for example, Kj) values;

步骤2:计算种群中每个个体的适应度;这里,适应度是指该个体对应的第一权值矩阵与第二权值矩阵的压缩比和/或精度,其中,压缩比用于表征针对神经网络的压缩效果。Step 2: Calculate the fitness of each individual in the population; here, fitness refers to the compression ratio and/or accuracy of the first weight matrix and the second weight matrix corresponding to the individual, where the compression ratio is used to characterize the compression effect on the neural network.

步骤3:将选择算子作用于种群,把优化的个体直接遗传到下一代;Step 3: Apply the selection operator to the population and directly pass the optimized individuals to the next generation;

步骤4：将交叉算子作用于种群，对于任意两个个体，随机产生若干基因串的位置点，交换两个个体该位置上的值；Step 4: Apply the crossover operator to the population: for any two individuals, randomly generate several gene-string position points and exchange the values of the two individuals at those positions;

步骤5：将变异算子作用于种群，对于任意个体，随机产生若干基因串的位置，然后改动这些位置上的值；这里，变异是指随机改变Kj的值；Step 5: Apply the mutation operator to the population: for any individual, randomly generate the positions of several gene strings and then change the values at these positions. Here, mutation refers to randomly changing the value of Kj;

步骤6:保留每一代中适应度最高的个体,进入下一代;Step 6: Keep the individuals with the highest fitness in each generation and enter the next generation;

步骤7：判断是否达到最大迭代次数Tmax，若t=Tmax，则输出具有最大适应度的个体，终止计算；否则，跳到步骤2继续执行。Step 7: Determine whether the maximum number of iterations Tmax is reached. If t=Tmax, output the individual with the maximum fitness and terminate the calculation; otherwise, jump to step 2 and continue.

从而可以根据上述遗传算法来确定全连接层神经网络中的(n-1)个压缩参数K值。Therefore, the (n-1) compression parameter K values in the fully connected layer neural network can be determined according to the above genetic algorithm.
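
Steps 1-7 can be condensed into a small sketch. The fitness callback (compression ratio and/or accuracy of the retrained model) is left abstract, and the concrete operators used here (elitist selection, uniform crossover, per-gene mutation) are simplifying assumptions rather than the patent's exact procedure:

```python
import random

def genetic_search(num_k, k_max, fitness, pop_size=8, t_max=30,
                   pc=0.4, pm=0.6, seed=0):
    # Each individual is one candidate list of (n-1) compression
    # parameters K; higher fitness means a better compression/accuracy
    # trade-off. Elitism keeps the fittest individuals (step 6).
    rng = random.Random(seed)
    pop = [[rng.randint(1, k_max) for _ in range(num_k)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(t_max):
        pop.sort(key=fitness, reverse=True)
        pop = pop[:pop_size // 2]                 # selection (step 3)
        while len(pop) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)
            child = [x if rng.random() > pc else y
                     for x, y in zip(a, b)]       # crossover (step 4)
            child = [rng.randint(1, k_max) if rng.random() < pm else g
                     for g in child]              # mutation (step 5)
            pop.append(child)
        best = max(pop + [best], key=fitness)     # keep the fittest (step 6)
    return best
```

For example, `genetic_search(3, 8, fitness)` searches three K values in the range 1..8 under a user-supplied fitness function.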

(2)卷积层神经网络:(2) Convolutional neural network:

以神经网络的卷积层为例，如图5D所示，卷积层可以认为是一个四维矩阵(Nfin,Nfout,Kx,Ky)，其中，Nfin为输入特征图像的数量，Nfout为输出特征图像的数量，(Kx,Ky)为卷积层中卷积核的大小。Taking the convolution layer of a neural network as an example, as shown in FIG. 5D, the convolution layer can be regarded as a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where Nfin is the number of input feature images, Nfout is the number of output feature images, and (Kx, Ky) is the size of the convolution kernels in the convolution layer.

在卷积层神经网络中，所述卷积层神经网络包括Nfin*Nfout个卷积核；所述第一公式包括：F≈F1*F2；其中，F表示所述Nfin*Nfout个卷积核中的任意一个卷积核；所述F1为第一子卷积核；所述F2为第二子卷积核；所述第一子卷积核F1为(Kx,R)，所述第二子卷积核F2为(R,Ky)，(Kx,Ky)表示卷积核的大小，R为压缩参数，所述R为大于0且小于等于min(Kx,Ky)的正整数。In a convolutional layer neural network, the convolutional layer neural network includes Nfin*Nfout convolution kernels; the first formula includes: F≈F1*F2, where F represents any one of the Nfin*Nfout convolution kernels, F1 is the first sub-convolution kernel, and F2 is the second sub-convolution kernel; the first sub-convolution kernel F1 is (Kx, R), the second sub-convolution kernel F2 is (R, Ky), (Kx, Ky) represents the size of the convolution kernel, R is a compression parameter, and R is a positive integer greater than 0 and less than or equal to min(Kx, Ky).

如前所述,调整两个子矩阵的每个子矩阵的大小的过程,其实质是压缩参数R值的动态变化过程,以寻找最佳的压缩参数R。在实际应用中,可以采用二分查找的方式来确定卷积层神经网络中的压缩参数R值,从而得到满足预设精度的第二权值矩阵。As mentioned above, the process of adjusting the size of each of the two sub-matrices is actually a dynamic change process of the compression parameter R value to find the best compression parameter R. In practical applications, a binary search method can be used to determine the compression parameter R value in the convolutional layer neural network, so as to obtain a second weight matrix that meets the preset accuracy.

具体实现中，压缩参数R值不同，也即基于多个不同压缩比对第一权值矩阵进行压缩，这里，在卷积层神经网络中，压缩比为(Kx*Ky)/(R*(Kx+Ky))。In a specific implementation, different values of the compression parameter R correspond to compressing the first weight matrix at multiple different compression ratios. Here, in the convolutional layer neural network, the compression ratio is (Kx*Ky)/(R*(Kx+Ky)).

在本发明实施例中,采用二分查找的方式确定压缩参数R值的实现过程参考前述文字描述,此处不多加赘述。In the embodiment of the present invention, the implementation process of determining the value of the compression parameter R by using a binary search method is described in the foregoing text and will not be elaborated herein.

例如,图5D所示的卷积层神经网络结构中,该卷积层中包括4个卷积核,卷积核大小为3*3,其中,第1个卷积核中,Nfin=4,Nfout=6,压缩参数R值为大于0且小于等于4的正整数。通过上述二分查找的方式确定压缩参数R=4,也即,满足预设精度的第1个卷积核中的第一子卷积核F1为(3,4)矩阵,第二子卷积核F2为(4,3)矩阵。在其中一个实施方式中,针对图5D所示的其它卷积核,可以采用与第1个卷积核相同的压缩方法,也可以采用与第1个卷积核不同的压缩方法,本发明实施例不作具体限定。For example, in the convolutional layer neural network structure shown in FIG5D , the convolutional layer includes 4 convolution kernels, and the convolution kernel size is 3*3, wherein, in the first convolution kernel, N fin =4, N fout =6, and the compression parameter R value is a positive integer greater than 0 and less than or equal to 4. The compression parameter R=4 is determined by the above-mentioned binary search method, that is, the first sub-convolution kernel F1 in the first convolution kernel that meets the preset accuracy is a (3,4) matrix, and the second sub-convolution kernel F2 is a (4,3) matrix. In one embodiment, for other convolution kernels shown in FIG5D , the same compression method as the first convolution kernel can be used, or a compression method different from the first convolution kernel can be used, which is not specifically limited in the embodiment of the present invention.

在其中一个实施方式中,当第一公式的表现形式如公式(1.2)所示时,可以采用自适应算法(例如,遗传算法)确定卷积层神经网络中的压缩参数R值,其具体实现过程请参考前述描述,此处不多加赘述。In one embodiment, when the first formula is expressed as shown in formula (1.2), an adaptive algorithm (for example, a genetic algorithm) can be used to determine the value of the compression parameter R in the convolutional layer neural network. Please refer to the above description for the specific implementation process, which will not be elaborated here.

(3)LSTM层神经网络:(3) LSTM layer neural network:

以神经网络的长短时记忆LSTM层(LSTM，Long Short-term Memory)为例，LSTM层的权值由多个全连接层权值组成。假设LSTM层的权值由t个全连接层权值组成，t为大于0的正整数。例如，第j个全连接层权值为(Nin_j,Nout_j)，其中，Nin_j表示第j个全连接层输入神经元个数，Nout_j表示第j个全连接层输出神经元个数，第j个全连接层的权值数量为Nin_j*Nout_j。Taking the long short-term memory (LSTM) layer of the neural network as an example, the weights of the LSTM layer are composed of multiple fully connected layer weights. Assume the LSTM layer weights consist of t fully connected layer weight matrices, where t is a positive integer greater than 0. For example, the weights of the j-th fully connected layer are (Nin_j, Nout_j), where Nin_j represents the number of input neurons of the j-th fully connected layer, Nout_j represents the number of output neurons of the j-th fully connected layer, and the number of weights of the j-th fully connected layer is Nin_j*Nout_j.

在LSTM层神经网络中，所述LSTM层包括N个全连接层，所述N为大于0的正整数；针对第j个全连接层，所述第一公式包括：Mj≈Mj_1*Mj_2；所述第j个全连接层中的两个子矩阵包括第一子矩阵Mj_1和第二子矩阵Mj_2，所述Mj_1为Nin_j*S矩阵，所述Mj_2为S*Nout_j矩阵；其中，S为压缩参数，Nin_j为所述神经网络第j个全连接层的输入神经元的个数，Nout_j为所述神经网络第j个全连接层的输出神经元的个数；所述压缩参数用于表征所述Mj_1的输出神经元的个数以及所述Mj_2的输入神经元的个数，所述S为大于0且小于等于min(Nin_j,Nout_j)的正整数。In an LSTM layer neural network, the LSTM layer includes N fully connected layers, where N is a positive integer greater than 0; for the j-th fully connected layer, the first formula includes: Mj≈Mj_1*Mj_2; the two sub-matrices of the j-th fully connected layer include a first sub-matrix Mj_1 and a second sub-matrix Mj_2, where Mj_1 is an Nin_j*S matrix and Mj_2 is an S*Nout_j matrix; S is a compression parameter, Nin_j is the number of input neurons of the j-th fully connected layer of the neural network, and Nout_j is the number of output neurons of the j-th fully connected layer; the compression parameter characterizes the number of output neurons of Mj_1 and the number of input neurons of Mj_2, and S is a positive integer greater than 0 and less than or equal to min(Nin_j, Nout_j).

如前所述，调整两个子矩阵的每个子矩阵的大小的过程，其实质是压缩参数S值的动态变化过程，以寻找最佳的压缩参数S。在实际应用中，可以采用二分查找的方式来确定LSTM层神经网络中的压缩参数S值，从而得到满足预设精度的第二权值矩阵。As mentioned above, the process of adjusting the size of each of the two sub-matrices is essentially a dynamic adjustment of the compression parameter S value to find the optimal compression parameter S. In practical applications, a binary search method can be used to determine the compression parameter S value in the LSTM layer neural network, so as to obtain a second weight matrix that meets the preset accuracy.

具体实现中，针对第j个全连接层，压缩参数S值不同，也即基于多个不同压缩比对第一权值矩阵进行压缩，这里，在第j个全连接层中，压缩比为Nin_j*Nout_j/(S*(Nin_j+Nout_j))。In a specific implementation, for the j-th fully connected layer, different values of the compression parameter S correspond to compressing the first weight matrix at multiple different compression ratios. Here, in the j-th fully connected layer, the compression ratio is Nin_j*Nout_j/(S*(Nin_j+Nout_j)).

在本发明实施例中,采用二分查找的方式确定第j个全连接层中的压缩参数S值的实现过程参考前述文字描述,此处不多加赘述。In the embodiment of the present invention, the implementation process of determining the value of the compression parameter S in the j-th fully connected layer by using a binary search method is described in the above text and will not be repeated here.

以图5A所示的神经网络架构为例，该神经网络包括输入层、隐含层以及输出层，其中，输入层到隐含层之间为第1个全连接层，隐含层到输出层之间为第2个全连接层。针对输入层到隐含层之间的这一全连接层结构（也即，第1个全连接层）的具体阐述请参考前述描述，此处不多加赘述。由图5A可知，隐含层到输出层之间的这一全连接层的二维参数矩阵为(4,2)，该二维参数矩阵(4,2)表示在隐含层到输出层之间的全连接层结构中，输入神经元的个数为4，输出神经元的个数为2，权值数量为8。具体实现中，这8个权值可以表示为2行4列权值矩阵，其权值矩阵的表现形式可以如图5E所示。那么，在这种情况下，压缩参数S值为大于0且小于等于2的正整数。通过二分查找的方式确定压缩参数S=2，也即，在第2个全连接层中，满足预设精度的第二权值矩阵中的第一子矩阵M2_1为(4,2)矩阵，第二子矩阵M2_2为(2,2)矩阵。具体地，针对图5A中输入层到隐含层之间的这一全连接层以及隐含层到输出层之间的这一全连接层的压缩可以如图5F所示。Taking the neural network architecture shown in FIG. 5A as an example, the neural network includes an input layer, a hidden layer and an output layer, where the first fully connected layer lies between the input layer and the hidden layer, and the second fully connected layer lies between the hidden layer and the output layer. For details of the fully connected layer between the input layer and the hidden layer (that is, the first fully connected layer), please refer to the foregoing description, which will not be repeated here. As can be seen from FIG. 5A, the two-dimensional parameter matrix of the fully connected layer between the hidden layer and the output layer is (4,2), which indicates that in this fully connected layer structure the number of input neurons is 4, the number of output neurons is 2, and the number of weights is 8. In a specific implementation, these 8 weights can be represented as a weight matrix of 2 rows and 4 columns, which can be shown in FIG. 5E. In this case, the compression parameter S is a positive integer greater than 0 and less than or equal to 2. The compression parameter S=2 is determined by binary search, that is, in the second fully connected layer, the first sub-matrix M2_1 of the second weight matrix that meets the preset accuracy is a (4,2) matrix, and the second sub-matrix M2_2 is a (2,2) matrix. Specifically, the compression of the fully connected layer between the input layer and the hidden layer and of the fully connected layer between the hidden layer and the output layer in FIG. 5A can be shown in FIG. 5F.

在其中一个实施方式中,当第一公式的表现形式如公式(1.2)所示时,可以采用自适应算法(例如,遗传算法)确定LSTM层神经网络中的压缩参数S值,其具体实现过程请参考前述描述,此处不多加赘述。In one embodiment, when the first formula is expressed as shown in formula (1.2), an adaptive algorithm (for example, a genetic algorithm) can be used to determine the value of the compression parameter S in the LSTM layer neural network. For the specific implementation process, please refer to the above description and no further details will be given here.

步骤S104、根据第二输入数据执行神经网络计算,其中,第二输入数据包括第二权值矩阵以及神经元数据。Step S104: performing neural network calculation according to second input data, wherein the second input data includes a second weight matrix and neuron data.

在实际应用中,这里所涉及的神经网络计算可以包括人工神经网络运算,也可以包括卷积神经网络运算等等。In practical applications, the neural network calculations involved here may include artificial neural network operations, convolutional neural network operations, and so on.

以人工神经网络运算为例,对于人工神经网络运算来说,如果该人工神经网络运算具有多层运算,多层运算的输入神经元和输出神经元并非是指整个神经网络的输入层中神经元和输出层中神经元,而是对于网络中任意相邻的两层,处于网络正向运算下层中的神经元即为输入神经元,处于网络正向运算上层中的神经元即为输出神经元。以卷积神经网络为例,设一个卷积神经网络有L层,K=1,2,...,L-1,对于第K层和第K+1层来说,我们将第K层称为输入层,其中的神经元为所述输入神经元,第K+1层称为输出层,其中的神经元为所述输出神经元。即除最顶层外,每一层都可以作为输入层,其下一层为对应的输出层。Taking artificial neural network operation as an example, for artificial neural network operation, if the artificial neural network operation has multi-layer operation, the input neurons and output neurons of the multi-layer operation do not refer to the neurons in the input layer and the neurons in the output layer of the entire neural network, but for any two adjacent layers in the network, the neurons in the lower layer of the network forward operation are the input neurons, and the neurons in the upper layer of the network forward operation are the output neurons. Taking convolutional neural network as an example, suppose a convolutional neural network has L layers, K=1,2,...,L-1, for the Kth layer and the K+1th layer, we call the Kth layer the input layer, the neurons therein are the input neurons, and the K+1th layer is called the output layer, the neurons therein are the output neurons. That is, except for the top layer, each layer can be used as an input layer, and the next layer is the corresponding output layer.

具体实现中,对于神经网络中的运算可以为神经网络中的一层的运算,对于多层神经网络,其实现过程是,在正向运算中,当上一层人工神经网络执行完成之后,下一层的运算指令会将运算单元中计算出的输出神经元作为下一层的输入神经元进行运算(或者是对该输出神经元进行某些操作再作为下一层的输入神经元),同时,将权值也替换为下一层的权值;在反向运算中,当上一层人工神经网络的反向运算执行完成后,下一层运算指令会将运算单元中计算出的输入神经元梯度作为下一层的输出神经元梯度进行运算(或者是对该输入神经元梯度进行某些操作再作为下一层的输出神经元梯度),同时将权值替换为下一层的权值。In a specific implementation, the operation in the neural network can be the operation of a layer in the neural network. For a multi-layer neural network, the implementation process is that in the forward operation, after the execution of the previous layer of artificial neural network is completed, the operation instruction of the next layer will use the output neuron calculated in the operation unit as the input neuron of the next layer for operation (or perform certain operations on the output neuron and then use it as the input neuron of the next layer), and at the same time, the weights are also replaced by the weights of the next layer; in the reverse operation, when the reverse operation of the previous layer of artificial neural network is completed, the operation instruction of the next layer will use the input neuron gradient calculated in the operation unit as the output neuron gradient of the next layer for operation (or perform certain operations on the input neuron gradient and then use it as the output neuron gradient of the next layer), and at the same time, the weights are replaced by the weights of the next layer.
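
In the forward direction, the layer-chaining described above (each layer's output neurons become the next layer's input neurons) reduces to a loop like the following; ReLU is assumed as the per-layer operation purely for illustration:

```python
import numpy as np

def forward(x, layer_weights):
    # Each weight matrix maps this layer's input neurons to its output
    # neurons; the result is fed straight into the next layer.
    for W in layer_weights:
        x = np.maximum(x @ W, 0.0)  # assumed activation
    return x

layers = [np.ones((3, 4)), np.ones((4, 2))]  # FIG. 5A layer shapes
out = forward(np.ones(3), layers)
print(out)  # [12. 12.]
```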

本发明实施例通过对第一权值矩阵进行分解,可以得到包含压缩参数的至少两个子矩阵,之后,根据公式求解这至少两个子矩阵中的每个子矩阵,并通过训练压缩后的神经网络以获得满足预设精度的第二权值矩阵,解决了现有技术中采用神经网络剪枝算法容易带来的神经网络的拓扑结构出现不规则的情形,可以对神经网络进行深度压缩,可以减少神经网络的计算量,提高运算速度。The embodiment of the present invention can obtain at least two sub-matrices containing compression parameters by decomposing the first weight matrix. Then, each of the at least two sub-matrices is solved according to the formula, and the compressed neural network is trained to obtain a second weight matrix that meets the preset accuracy. This solves the problem of irregular topological structure of the neural network that is easily caused by the use of the neural network pruning algorithm in the prior art, and can deeply compress the neural network, reduce the calculation amount of the neural network, and improve the operation speed.

为了便于更好地实施本发明实施例的上述方案,本发明还对应提供了一种神经网络压缩装置,下面结合附图来进行详细说明:In order to better implement the above-mentioned solution of the embodiment of the present invention, the present invention also provides a neural network compression device, which is described in detail below with reference to the accompanying drawings:

如图17A所示的本发明实施例提供的神经网络压缩装置的结构示意图,该神经网络压缩装置包括:获取单元300、压缩单元13以及计算单元304;FIG. 17A is a schematic diagram of the structure of a neural network compression device provided by an embodiment of the present invention, wherein the neural network compression device includes: an acquisition unit 300, a compression unit 13, and a calculation unit 304;

其中,所述获取单元300,用于获取第一输入数据;其中,所述第一输入数据包括第一权值矩阵;Wherein, the acquisition unit 300 is used to acquire first input data; wherein, the first input data includes a first weight matrix;

所述压缩单元13,用于将所述第一权值矩阵压缩为第二权值矩阵;其中,所述第二权值矩阵中包括至少两个子矩阵;The compression unit 13 is used to compress the first weight matrix into a second weight matrix; wherein the second weight matrix includes at least two sub-matrices;

所述计算单元304,用于根据第二输入数据执行神经网络计算,其中,所述第二输入数据包括所述第二权值矩阵以及输入神经元数据。The calculation unit 304 is used to perform neural network calculation according to the second input data, wherein the second input data includes the second weight matrix and input neuron data.

在其中一个实施方式中,如图17B所示,压缩单元13包括分解单元130、求解单元131以及训练单元132;In one embodiment, as shown in FIG17B , the compression unit 13 includes a decomposition unit 130 , a solution unit 131 , and a training unit 132 ;

其中,所述分解单元130,用于将所述第一权值矩阵分解成第三权值矩阵;其中,所述第三权值矩阵包括至少两个子矩阵;The decomposition unit 130 is used to decompose the first weight matrix into a third weight matrix; wherein the third weight matrix includes at least two sub-matrices;

所述求解单元131,用于根据第一公式确定所述至少两个子矩阵中的每个子矩阵的大小,所述第一公式为Q≈Q1*Q2*......*Qn;其中,所述Q表示第一权值矩阵;所述Q1表示所述至少两个子矩阵中的第一子矩阵;所述Q2表示所述至少两个子矩阵中的第二子矩阵;所述Qn表示所述至少两个子矩阵中的第n子矩阵;The solving unit 131 is used to determine the size of each submatrix in the at least two submatrices according to a first formula, wherein the first formula is Q≈Q1*Q2*......*Qn; wherein Q represents the first weight matrix; Q1 represents the first submatrix of the at least two submatrices; Q2 represents the second submatrix; and Qn represents the nth submatrix;

所述训练单元132,用于调整所述至少两个子矩阵中的每个子矩阵的大小,并通过训练压缩后的机器学习模型,以得到满足预设精度的第二权值矩阵。The training unit 132 is used to adjust the size of each sub-matrix of the at least two sub-matrices, and obtain a second weight matrix that meets a preset accuracy by training the compressed machine learning model.

可选的,所述求解单元131,具体用于根据所述第一公式和第二公式确定所述至少两个子矩阵中的每个子矩阵的大小,所述第二公式为||Q-Q1*Q2*......*Qn||≤T,其中,所述T表示预设的误差阈值。Optionally, the solving unit 131 is specifically configured to determine the size of each submatrix of the at least two submatrices according to the first formula and a second formula, wherein the second formula is ||Q-Q1*Q2*......*Qn||≤T, where T represents a preset error threshold.
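The two formulas above can be illustrated for the n = 2 case: find sub-matrices Q1 and Q2 whose product stays within the preset error threshold T of Q. The sketch below is an assumption: the patent does not fix a decomposition algorithm, so truncated SVD is used purely as one concrete way to pick the smallest inner size, with the Frobenius norm standing in for ||.||.

```python
import numpy as np

def decompose_with_threshold(q, t):
    """Pick the smallest inner size k with ||q - q1 @ q2|| <= t (n = 2 case).

    Hypothetical sketch: truncated SVD is one possible decomposition,
    used here only for illustration.
    """
    u, s, vt = np.linalg.svd(q, full_matrices=False)
    for k in range(1, len(s) + 1):
        q1 = u[:, :k] * s[:k]          # first sub-matrix, rows x k
        q2 = vt[:k, :]                 # second sub-matrix, k x cols
        if np.linalg.norm(q - q1 @ q2) <= t:  # Frobenius norm as ||.||
            return q1, q2
    return u * s, vt                   # fall back to an exact factorization

rng = np.random.default_rng(1)
# A matrix that is exactly rank 3, so a small inner size suffices.
q = rng.standard_normal((10, 3)) @ rng.standard_normal((3, 12))
q1, q2 = decompose_with_threshold(q, t=1e-6)
```

The inner dimension returned is the compression parameter: the smaller it can be made while respecting T, the fewer weights the two sub-matrices hold.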

可选的,所述训练单元132,具体用于调整所述至少两个子矩阵中的每个子矩阵的大小,并通过训练压缩后的机器学习模型,以得到满足预设精度并且与所述第一权值矩阵之间的压缩比满足预设压缩比的第二权值矩阵。Optionally, the training unit 132 is specifically used to adjust the size of each sub-matrix of the at least two sub-matrices, and to obtain a second weight matrix that meets a preset accuracy and has a compression ratio that meets a preset compression ratio with the first weight matrix by training the compressed machine learning model.

可选的,所述神经网络包括全连接层神经网络;所述第一公式包括:M≈M1*M2;所述两个子矩阵指包括第一子矩阵M1和第二子矩阵M2,所述M1为Nin*K矩阵,所述M2为K*Nout矩阵;其中,K为压缩参数,Nin为所述神经网络的输入神经元的个数,Nout为所述神经网络的输出神经元的个数;所述压缩参数用于表征所述M1的输出神经元的个数以及所述M2的输入神经元的个数,所述K为大于0且小于等于min(Nin,Nout)的正整数。Optionally, the neural network includes a fully connected layer neural network; the first formula includes: M≈M1*M2; the two sub-matrices are a first sub-matrix M1 and a second sub-matrix M2, where M1 is an Nin*K matrix and M2 is a K*Nout matrix; K is a compression parameter, Nin is the number of input neurons of the neural network, and Nout is the number of output neurons of the neural network; the compression parameter characterizes the number of output neurons of M1 and the number of input neurons of M2, and K is a positive integer greater than 0 and less than or equal to min(Nin, Nout).
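The parameter saving from M≈M1*M2 follows directly from the shapes: Nin*Nout weights become K*(Nin+Nout). A small worked example (the layer sizes below are illustrative, not from the patent):

```python
# Illustrative numbers only: a fully connected layer with
# N_in = 1024 inputs and N_out = 512 outputs, factored with K = 64.
n_in, n_out, k = 1024, 512, 64
assert 0 < k <= min(n_in, n_out)   # constraint on the compression parameter K

params_before = n_in * n_out            # one N_in x N_out matrix M
params_after = n_in * k + k * n_out     # M1 (N_in x K) plus M2 (K x N_out)
compression_ratio = params_before / params_after

print(params_before, params_after, round(compression_ratio, 2))
```

Here 524,288 weights shrink to 98,304, a compression ratio of about 5.3x; smaller K compresses more but approximates M less accurately, which is why the training unit then fine-tunes the compressed model.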

可选的,所述神经网络包括卷积层神经网络;所述卷积层神经网络包括Nfin*Nfout个卷积核;所述第一公式包括:F≈F1*F2;其中,F表示所述Nfin*Nfout个卷积核中的任意一个卷积核;所述F1为第一子卷积核;所述F2为第二子卷积核;所述第一子卷积核F1为(Kx,R),所述第二子卷积核F2为(R,Ky),(Kx,Ky)表示卷积核的大小,R为压缩参数,所述R为大于0且小于等于min(Kx,Ky)的正整数。Optionally, the neural network includes a convolutional layer neural network; the convolutional layer neural network includes Nfin*Nfout convolution kernels; the first formula includes: F≈F1*F2; where F represents any one of the Nfin*Nfout convolution kernels; F1 is the first sub-convolution kernel and F2 is the second sub-convolution kernel; the first sub-convolution kernel F1 is (Kx, R), the second sub-convolution kernel F2 is (R, Ky), where (Kx, Ky) represents the size of the convolution kernel, R is a compression parameter, and R is a positive integer greater than 0 and less than or equal to min(Kx, Ky).
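For convolution kernels, F≈F1*F2 is the classic separable-filter factorization: a Kx×Ky kernel becomes a (Kx, R) factor times an (R, Ky) factor. The sketch below again assumes SVD as the factorization method (the patent does not mandate one) and uses a box kernel, which happens to be exactly rank 1:

```python
import numpy as np

# Hypothetical sketch: factor a Kx x Ky convolution kernel F into
# F1 of shape (Kx, R) and F2 of shape (R, Ky) with F ≈ F1 @ F2.
def factor_kernel(f, r):
    u, s, vt = np.linalg.svd(f, full_matrices=False)
    f1 = u[:, :r] * s[:r]      # (Kx, R)
    f2 = vt[:r, :]             # (R, Ky)
    return f1, f2

# A 3x3 box (averaging) kernel is exactly rank 1, so R = 1 is lossless.
f = np.full((3, 3), 1.0 / 9.0)
f1, f2 = factor_kernel(f, r=1)
# Applying f1 then f2 as two 1-D passes costs Kx*R + R*Ky = 6 taps
# per output instead of Kx*Ky = 9 for the full 2-D kernel.
```

For general kernels R > 1 is needed, and the reconstruction is approximate, so compression trades accuracy for fewer multiply-accumulates per output pixel.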

可选的,所述神经网络包括LSTM层神经网络,所述LSTM层包括N个全连接层,所述N为大于0的正整数;针对第j个全连接层,所述第一公式包括:Mj≈Mj_1*Mj_2;所述第j个全连接层中的两个子矩阵包括第一子矩阵Mj_1和第二子矩阵Mj_2,所述Mj_1为Nin_j*S矩阵,所述Mj_2为S*Nout_j矩阵;其中,S为压缩参数,Nin_j为所述神经网络第j个全连接层的输入神经元的个数,Nout_j为所述神经网络第j个全连接层的输出神经元的个数;所述压缩参数用于表征所述Mj_1的输出神经元的个数以及所述Mj_2的输入神经元的个数,所述S为大于0且小于等于min(Nin_j,Nout_j)的正整数。Optionally, the neural network includes an LSTM layer neural network; the LSTM layer includes N fully connected layers, where N is a positive integer greater than 0. For the j-th fully connected layer, the first formula includes: Mj≈Mj_1*Mj_2; the two sub-matrices in the j-th fully connected layer include a first sub-matrix Mj_1 and a second sub-matrix Mj_2, where Mj_1 is an Nin_j*S matrix and Mj_2 is an S*Nout_j matrix; S is a compression parameter, Nin_j is the number of input neurons of the j-th fully connected layer of the neural network, and Nout_j is the number of output neurons of that layer; the compression parameter characterizes the number of output neurons of Mj_1 and the number of input neurons of Mj_2, and S is a positive integer greater than 0 and less than or equal to min(Nin_j, Nout_j).

本发明实施例通过对第一权值矩阵进行分解,可以得到包含压缩参数的至少两个子矩阵,之后,根据公式求解这至少两个子矩阵中的每个子矩阵,并通过训练压缩后的神经网络以获得满足预设精度的第二权值矩阵,解决了现有技术中采用神经网络剪枝算法容易带来的神经网络的拓扑结构出现不规则的情形,对神经网络进行深度压缩,可以减少神经网络的计算量,提高运算速度。By decomposing the first weight matrix, the embodiment of the present invention obtains at least two sub-matrices containing a compression parameter. Each of these sub-matrices is then solved according to a formula, and the compressed neural network is trained to obtain a second weight matrix that meets a preset accuracy. This avoids the irregular network topology that prior-art neural network pruning algorithms tend to produce; deeply compressing the neural network reduces its computational load and improves operation speed.

为了便于更好地实施本发明实施例的上述方案,本发明还对应提供了另一种电子设备,下面结合附图来进行详细说明:In order to better implement the above solution of the embodiment of the present invention, the present invention also provides another electronic device, which is described in detail below with reference to the accompanying drawings:

如图18示出的本发明实施例提供的电子设备的结构示意图,电子设备40可以包括处理器401、存储器404和通信模块405,处理器401、存储器404和通信模块405可以通过总线406相互连接。存储器404可以是高速随机存储记忆体(Random Access Memory,RAM)存储器,也可以是非易失性的存储器(non-volatile memory),例如至少一个磁盘存储器。存储器404可选的还可以是至少一个位于远离前述处理器401的存储系统。存储器404用于存储应用程序代码,可以包括操作系统、网络通信模块、用户接口模块以及数据处理程序,通信模块405用于与外部设备进行信息交互;处理器401被配置用于调用该程序代码,执行以下步骤:FIG. 18 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present invention. The electronic device 40 may include a processor 401, a memory 404, and a communication module 405, which may be interconnected via a bus 406. The memory 404 may be a high-speed Random Access Memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory 404 may also be at least one storage system remote from the aforementioned processor 401. The memory 404 is used to store application code, and may include an operating system, a network communication module, a user interface module, and a data processing program; the communication module 405 is used to exchange information with external devices; and the processor 401 is configured to call the program code and execute the following steps:

获取第一输入数据;其中,所述第一输入数据包括第一权值矩阵;Acquire first input data; wherein the first input data includes a first weight matrix;

将所述第一权值矩阵压缩为第二权值矩阵;其中,所述第二权值矩阵中包括至少两个子矩阵;Compressing the first weight matrix into a second weight matrix; wherein the second weight matrix includes at least two sub-matrices;

根据第二输入数据执行神经网络计算,其中,所述第二输入数据包括所述第二权值矩阵以及输入神经元数据。A neural network calculation is performed according to second input data, wherein the second input data includes the second weight matrix and input neuron data.

其中,处理器401将所述第一权值矩阵压缩为第二权值矩阵;其中,所述第二权值矩阵中包括至少两个子矩阵,可以包括:The processor 401 compresses the first weight matrix into a second weight matrix; wherein the second weight matrix includes at least two sub-matrices, which may include:

将所述第一权值矩阵分解成第三权值矩阵;其中,所述第三权值矩阵包括至少两个子矩阵;Decomposing the first weight matrix into a third weight matrix; wherein the third weight matrix includes at least two sub-matrices;

确定所述至少两个子矩阵中的每个子矩阵的大小,所述第一公式为Q≈Q1*Q2*......*Qn;其中,所述Q表示第一权值矩阵;所述Q1表示所述至少两个子矩阵中的第一子矩阵;所述Q2表示所述至少两个子矩阵中的第二子矩阵;所述Qn表示所述至少两个子矩阵中的第n子矩阵;Determine the size of each submatrix of the at least two submatrices, where the first formula is Q≈Q1*Q2*......*Qn; wherein Q represents the first weight matrix; Q1 represents the first submatrix of the at least two submatrices; Q2 represents the second submatrix; and Qn represents the nth submatrix;

调整所述至少两个子矩阵中的每个子矩阵的大小,并通过训练压缩后的机器学习模型,以得到满足预设精度的第二权值矩阵。The size of each submatrix of the at least two submatrices is adjusted, and a second weight matrix satisfying a preset accuracy is obtained by training a compressed machine learning model.

其中,处理器401根据第一公式确定所述至少两个子矩阵中的每个子矩阵,所述第一公式为Q≈Q1*Q2*......*Qn,可以包括:The processor 401 determining each sub-matrix of the at least two sub-matrices according to a first formula, where the first formula is Q≈Q1*Q2*......*Qn, may include:

根据所述第一公式和第二公式确定所述两个子矩阵中的每个子矩阵的大小,所述第二公式为||Q-Q1*Q2*......*Qn||≤T,其中,所述T表示预设的误差阈值。Determining the size of each of the two sub-matrices according to the first formula and a second formula, where the second formula is ||Q-Q1*Q2*......*Qn||≤T, and T represents a preset error threshold.

其中,处理器401调整所述至少两个子矩阵中的每个子矩阵的大小,并通过训练压缩后的机器学习模型,以得到满足预设精度的第二权值矩阵,可以包括:The processor 401 adjusts the size of each of the at least two sub-matrices and obtains a second weight matrix that meets a preset accuracy by training the compressed machine learning model, which may include:

调整所述至少两个子矩阵中的每个子矩阵的大小,并通过训练压缩后的机器学习模型,以得到满足预设精度并且与所述第一权值矩阵之间的压缩比满足预设压缩比的第二权值矩阵。The size of each submatrix of the at least two submatrices is adjusted, and the compressed machine learning model is trained to obtain a second weight matrix that meets the preset accuracy and has a compression ratio that meets the preset compression ratio with the first weight matrix.
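The joint requirement above — meeting a preset accuracy while also reaching a preset compression ratio against the first weight matrix — can be sketched as a search over the compression parameter K. The code below is an assumption-laden illustration: it uses truncated SVD as the decomposition and reconstruction error as a stand-in for model accuracy, whereas the embodiment retrains the compressed machine learning model and measures real accuracy.

```python
import numpy as np

def rank_k_approx(q, k):
    u, s, vt = np.linalg.svd(q, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]

def pick_k(q, max_error, min_ratio):
    """Smallest K meeting BOTH constraints (sketch; error proxies accuracy)."""
    n_in, n_out = q.shape
    for k in range(1, min(n_in, n_out) + 1):
        ratio = (n_in * n_out) / (k * (n_in + n_out))   # compression ratio
        err = np.linalg.norm(q - rank_k_approx(q, k))   # accuracy stand-in
        if err <= max_error and ratio >= min_ratio:
            return k
    return None  # no K satisfies both constraints at once

rng = np.random.default_rng(2)
q = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))  # rank 4
k = pick_k(q, max_error=1e-6, min_ratio=2.0)
```

Since larger K lowers the error but also lowers the compression ratio, the two constraints pull in opposite directions; the search simply returns the first K where both hold.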

其中,所述神经网络为全连接层神经网络;所述第一公式包括:M≈M1*M2;所述两个子矩阵指包括第一子矩阵M1和第二子矩阵M2,所述M1为Nin*K矩阵,所述M2为K*Nout矩阵;其中,K为压缩参数,Nin为所述神经网络的输入神经元的个数,Nout为所述神经网络的输出神经元的个数;所述压缩参数用于表征所述M1的输出神经元的个数以及所述M2的输入神经元的个数,所述K为大于0且小于等于min(Nin,Nout)的正整数。Wherein, the neural network is a fully connected layer neural network; the first formula includes: M≈M1*M2; the two sub-matrices are a first sub-matrix M1 and a second sub-matrix M2, where M1 is an Nin*K matrix and M2 is a K*Nout matrix; K is a compression parameter, Nin is the number of input neurons of the neural network, and Nout is the number of output neurons of the neural network; the compression parameter characterizes the number of output neurons of M1 and the number of input neurons of M2, and K is a positive integer greater than 0 and less than or equal to min(Nin, Nout).

其中,所述神经网络为卷积层神经网络;所述卷积层神经网络包括Nfin*Nfout个卷积核;所述第一公式包括:F≈F1*F2;其中,F表示所述Nfin*Nfout个卷积核中的任意一个卷积核;所述F1为第一子卷积核;所述F2为第二子卷积核;所述第一子卷积核F1为(Kx,R),所述第二子卷积核F2为(R,Ky),(Kx,Ky)表示卷积核的大小,R为压缩参数,所述R为大于0且小于等于min(Kx,Ky)的正整数。Wherein, the neural network is a convolutional layer neural network; the convolutional layer neural network includes Nfin*Nfout convolution kernels; the first formula includes: F≈F1*F2; where F represents any one of the Nfin*Nfout convolution kernels; F1 is the first sub-convolution kernel and F2 is the second sub-convolution kernel; the first sub-convolution kernel F1 is (Kx, R), the second sub-convolution kernel F2 is (R, Ky), where (Kx, Ky) represents the size of the convolution kernel, R is a compression parameter, and R is a positive integer greater than 0 and less than or equal to min(Kx, Ky).

其中,所述神经网络为LSTM层神经网络;所述LSTM层神经网络包括N个全连接层,所述N为大于0的正整数;针对第j个全连接层,所述第一公式包括:Mj≈Mj_1*Mj_2;所述第j个全连接层中的两个子矩阵包括第一子矩阵Mj_1和第二子矩阵Mj_2,所述Mj_1为Nin_j*S矩阵,所述Mj_2为S*Nout_j矩阵;其中,S为压缩参数,Nin_j为所述神经网络第j个全连接层的输入神经元的个数,Nout_j为所述神经网络第j个全连接层的输出神经元的个数;所述压缩参数用于表征所述Mj_1的输出神经元的个数以及所述Mj_2的输入神经元的个数,所述S为大于0且小于等于min(Nin_j,Nout_j)的正整数。Wherein, the neural network is an LSTM layer neural network; the LSTM layer neural network includes N fully connected layers, where N is a positive integer greater than 0. For the j-th fully connected layer, the first formula includes: Mj≈Mj_1*Mj_2; the two sub-matrices in the j-th fully connected layer include a first sub-matrix Mj_1 and a second sub-matrix Mj_2, where Mj_1 is an Nin_j*S matrix and Mj_2 is an S*Nout_j matrix; S is a compression parameter, Nin_j is the number of input neurons of the j-th fully connected layer of the neural network, and Nout_j is the number of output neurons of that layer; the compression parameter characterizes the number of output neurons of Mj_1 and the number of input neurons of Mj_2, and S is a positive integer greater than 0 and less than or equal to min(Nin_j, Nout_j).

需要说明的是,本发明实施例中的电子设备40中处理器的执行步骤可参考上述各方法实施例中图16实施例中的电子设备运行的具体实现方式,这里不再赘述。It should be noted that the execution steps of the processor in the electronic device 40 in the embodiment of the present invention can refer to the specific implementation methods of the operation of the electronic device in the embodiment of Figure 16 in the above-mentioned method embodiments, which will not be repeated here.

在实际应用中,电子设备40中的处理器401包括但不限于只有一个。在其中一个实施方式中,电子设备40中还包括处理图像的图形处理器(GPU,Graphics Processing Unit),也还可以包括嵌入式神经网络处理器(NPU,Neural-network Processing Unit)。此时,针对神经网络的压缩方法可以被集成在NPU中。在其中一个实施方式中,处理器401可以控制NPU执行针对第一权值矩阵的压缩方法。In practical applications, the number of processors 401 in the electronic device 40 is not limited to one. In one embodiment, the electronic device 40 also includes a Graphics Processing Unit (GPU) for processing images, and may further include an embedded Neural-network Processing Unit (NPU). In this case, the compression method for the neural network can be integrated into the NPU. In one embodiment, the processor 401 can control the NPU to execute the compression method for the first weight matrix.

在具体实现中,如前所述,电子设备40可以包括数据处理装置、机器人、电脑、打印机、扫描仪、平板电脑、智能终端、手机、行车记录仪、导航仪、传感器、摄像头、服务器、云端服务器、相机、摄像机、投影仪、手表、耳机、移动存储、可穿戴设备、交通工具、家用电器、和/或医疗设备,本发明实施例不作具体限定。In a specific implementation, as mentioned above, the electronic device 40 may include a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a camera, a camcorder, a projector, a watch, a headset, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device, which is not specifically limited in the embodiments of the present invention.

本发明实施例还提供了一种计算机存储介质,用于存储为上述图16所示的电子设备所用的计算机软件指令,其包含用于执行上述方法实施例所涉及的程序。通过执行存储的程序,可以实现针对第一权值矩阵的压缩,以得到满足预设精度的第二权值矩阵,从而避免了神经网络模型的拓扑结构出现不规则,减少了神经网络的运算量。The embodiment of the present invention further provides a computer storage medium for storing computer software instructions used by the electronic device shown in FIG. 16, which includes a program for executing the method embodiment. By executing the stored program, the first weight matrix can be compressed to obtain a second weight matrix that meets the preset accuracy, thereby avoiding irregular topological structure of the neural network model and reducing the amount of computation of the neural network.

需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本申请所必须的。It should be noted that, for the aforementioned method embodiments, for the sake of simplicity, they are all expressed as a series of action combinations, but those skilled in the art should be aware that the present application is not limited by the described order of actions, because according to the present application, certain steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also be aware that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present application.

在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, reference can be made to the relevant descriptions of other embodiments.

在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。In the several embodiments provided in the present application, it should be understood that the disclosed device can be implemented in other ways. For example, the device embodiments described above are only schematic, such as the division of the units, which is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, and the indirect coupling or communication connection of the device or unit can be electrical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件程序模块的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of a software program module.

所述集成的单元如果以软件程序模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储器中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储器包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art or all or part of the technical solution can be embodied in the form of a software product, which is stored in a memory and includes several instructions for a computer device (which can be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the method described in each embodiment of the present application. The aforementioned memory includes: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, disk or optical disk and other media that can store program codes.

本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于一计算机可读存储器中,存储器可以包括:闪存盘、只读存储器(英文:Read-Only Memory,简称:ROM)、随机存取器(英文:Random Access Memory,简称:RAM)、磁盘或光盘等。A person skilled in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by instructing related hardware through a program, and the program can be stored in a computer-readable memory, and the memory can include: a flash drive, a read-only memory (English: Read-Only Memory, abbreviated as: ROM), a random access memory (English: Random Access Memory, abbreviated as: RAM), a magnetic disk or an optical disk, etc.

以上对本申请实施例进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。The embodiments of the present application are introduced in detail above. Specific examples are used in this article to illustrate the principles and implementation methods of the present application. The description of the above embodiments is only used to help understand the method of the present application and its core idea. At the same time, for general technical personnel in this field, according to the idea of the present application, there will be changes in the specific implementation method and application scope. In summary, the content of this specification should not be understood as a limitation on the present application.

Claims (19)

1. A computing device for performing machine learning calculations, the computing device comprising: a compression unit, an operation unit and a controller unit;
the controller unit is used for acquiring a compression request for first input data and indicating the compression unit to compress the first input data according to the compression request; wherein the first input data comprises a first weight matrix;
The compression unit is used for compressing the first weight matrix into a second weight matrix;
the controller unit is also used for acquiring second input data and a calculation instruction; the second input data includes the second weight matrix and input neuron data;
the controller unit is further configured to parse the calculation instruction to obtain a plurality of operation instructions, and send the plurality of operation instructions and the second input data to the operation unit;
the operation unit is used for acquiring the operation instruction and executing neural network calculation according to the operation instruction and the second input data; the neural network is a convolutional layer neural network; the compression unit comprises: a decomposition unit, a solving unit and a training unit;
The decomposing unit is used for decomposing the first weight matrix into a third weight matrix; wherein the third weight matrix comprises at least two sub-matrices;
the solving unit is used for determining the size of each sub-matrix in the at least two sub-matrices according to a first formula,
The training unit is used for adjusting the size of each of the at least two submatrices to obtain a second weight matrix;
The computing device is used for executing convolutional neural network computation; the convolutional layer neural network comprises Nfin*Nfout convolution kernels; the first formula includes: F≈F1*F2; wherein F represents any one of the Nfin*Nfout convolution kernels; the F1 is a first sub convolution kernel; the F2 is a second sub convolution kernel; the first sub convolution kernel F1 is (Kx, R), the second sub convolution kernel F2 is (R, Ky), (Kx, Ky) representing the size of the convolution kernel, R is a compression parameter, and R is a positive integer greater than 0 and less than or equal to min(Kx, Ky);
The arithmetic unit includes: a master processing circuit and a plurality of slave processing circuits;
the main processing circuit executes preamble processing on the second input data and transmits data and operation instructions with the plurality of auxiliary processing circuits;
the plurality of slave processing circuits execute intermediate operation in parallel according to the data and operation instructions transmitted from the master processing circuit to obtain a plurality of intermediate results, and the plurality of intermediate results are transmitted to the master processing circuit;
and the main processing circuit executes subsequent processing on the plurality of intermediate results to obtain a calculation result of the calculation instruction.
2. The computing device of claim 1, wherein the computing device is configured to,
The first formula is Q≈Q1*Q2*......*Qn; wherein Q represents a first weight matrix; the Q1 represents a first sub-matrix of the at least two sub-matrices; the Q2 represents a second sub-matrix of the at least two sub-matrices; the Qn represents an nth sub-matrix of the at least two sub-matrices;
the training unit is used for adjusting the size of each of the at least two submatrices and obtaining a second weight matrix meeting preset precision by training the compressed machine learning model.
3. The computing device of claim 2, wherein the solving unit is configured to determine each of the at least two submatrices according to a first formula, where the first formula is Q≈Q1*Q2*......*Qn, and includes:
The solving unit is specifically configured to determine a size of each of the at least two submatrices according to the first formula and a second formula, where the second formula is ||Q-Q1*Q2*......*Qn||≤T, and the T represents a preset error threshold.
4. The computing device of claim 2, wherein the training unit to resize each of the at least two submatrices and to derive the second weight matrix satisfying the preset accuracy by training the compressed machine learning model comprises:
the training unit is specifically configured to adjust a size of each of the at least two submatrices, and obtain a second weight matrix that meets a preset precision and a compression ratio with the first weight matrix meets a preset compression ratio by training a compressed machine learning model.
5. The computing device of any of claims 2 to 4, wherein the computing device is configured to perform full-connection layer neural network computations; the at least two sub-matrices include two sub-matrices; the first formula includes: M≈M1*M2; the two submatrices comprise a first submatrix M1 and a second submatrix M2, wherein M1 is an Nin x K matrix, and M2 is a K x Nout matrix; wherein K is a compression parameter, Nin is the number of input neurons of the neural network, and Nout is the number of output neurons of the neural network; the compression parameter is used for representing the number of output neurons of the M1 and the number of input neurons of the M2, and K is a positive integer which is more than 0 and less than or equal to min(Nin, Nout).
6. The computing device of any of claims 2-4, wherein the computing device is configured to perform LSTM layer neural network computations, the LSTM layer comprising N fully connected layers, the N being a positive integer greater than 0; for the j-th fully connected layer, the first formula includes: Mj≈Mj_1*Mj_2; the two submatrices in the j-th full connection layer include a first submatrix Mj_1 and a second submatrix Mj_2, where Mj_1 is an Nin_j x S matrix, and Mj_2 is an S x Nout_j matrix; wherein S is a compression parameter, Nin_j is the number of input neurons of the j-th full-connection layer of the neural network, and Nout_j is the number of output neurons of the j-th full-connection layer of the neural network; the compression parameter is used for representing the number of output neurons of the Mj_1 and the number of input neurons of the Mj_2, and S is a positive integer greater than 0 and less than or equal to min(Nin_j, Nout_j).
7. The computing device of claim 1, wherein the computing device further comprises: a storage unit and a direct memory access unit, the storage unit comprising: a register, a cache, or any combination thereof;
The buffer is used for storing the first input data and the second input data;
The register is used for storing the first input data and the second input data;
The cache includes a scratch pad cache;
The controller unit includes: an instruction storage unit, an instruction processing unit and a storage queue unit;
the instruction storage unit is used for storing calculation instructions related to the artificial neural network operation;
the instruction processing unit is used for analyzing the calculation instructions to obtain a plurality of operation instructions;
The store queue unit is configured to store an instruction queue, where the instruction queue includes: a plurality of operation instructions or calculation instructions to be executed according to the front-back sequence of the queue;
The controller unit includes a main processing circuit including: a dependency relationship processing unit;
The dependency relation processing unit is used for determining whether a first operation instruction and a zeroth operation instruction before the first operation instruction have an association relation, if so, caching the first operation instruction in the instruction storage unit, and after the zeroth operation instruction is executed, extracting the first operation instruction from the instruction storage unit and transmitting the first operation instruction to the operation unit;
the determining whether the association relationship exists between the first operation instruction and the zeroth operation instruction before the first operation instruction includes:
Extracting a first storage address interval of required data in the first operation instruction according to the first operation instruction, extracting a zeroth storage address interval of required data in the zeroth operation instruction according to the zeroth operation instruction, determining that the first operation instruction and the zeroth operation instruction have an association relation if the first storage address interval and the zeroth storage address interval have overlapping areas, and determining that the first operation instruction and the zeroth operation instruction do not have an association relation if the first storage address interval and the zeroth storage address interval do not have overlapping areas.
8. A machine learning computing device, characterized in that the machine learning computing device comprises one or more computing devices according to any one of claims 1-7, and is configured to obtain input data and control information to be computed from other processing devices, perform specified machine learning operations, and transmit the execution results to the other processing devices through I/O interfaces;
when the machine learning computing device comprises a plurality of computing devices, the computing devices are connected through a specific structure and data are transmitted;
the computing devices are interconnected and transmit data through a PCIE (Peripheral Component Interconnect Express) bus to support larger-scale machine learning operations; a plurality of the computing devices share the same control system or have respective control systems; a plurality of computing devices share memory or have respective memories; the manner in which the plurality of computing devices are interconnected is an arbitrary interconnection topology.
9. A combination processing device, comprising the machine learning computing device of claim 8, a universal interconnect interface, a storage device, and other processing devices;
The machine learning operation device interacts with the other processing devices to jointly complete the calculation operation designated by the user; the storage device is connected with the machine learning operation device and the other processing device respectively and used for storing data of the machine learning operation device and the other processing device.
10. A neural network chip, characterized in that the neural network chip includes the machine learning arithmetic device according to claim 8 or the combination processing device according to claim 9.
11. An electronic device, characterized in that the electronic device comprises the chip of claim 10.
12. A board, characterized in that, the board includes: a memory device, an interface device and a control device, and a neural network chip as claimed in claim 10;
The neural network chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
The interface device is used for realizing data transmission between the chip and external equipment;
the control device is used for monitoring the state of the chip.
13. A computing method for executing a machine learning model, characterized in that the computing method is applied to a computing device for executing machine learning calculations; the computing device includes: a compression unit, an operation unit and a controller unit; the method comprises the following steps:
the controller unit obtains a compression request for first input data and instructs the compression unit to compress the first input data according to the compression request; wherein the first input data comprises a first weight matrix;
the compression unit compresses the first weight matrix into a second weight matrix;
the controller unit acquires second input data and a calculation instruction; the second input data includes the second weight matrix and input neuron data;
the controller unit parses the calculation instruction to obtain a plurality of operation instructions, and sends the plurality of operation instructions and the second input data to the operation unit;
the operation unit obtains the operation instructions and executes the neural network calculation according to the operation instructions and the second input data; the neural network is a convolutional layer neural network;
the compression unit includes: the device comprises a decomposition unit, a solving unit and a training unit;
the decomposing unit is used for decomposing the first weight matrix into a third weight matrix, wherein the third weight matrix comprises at least two sub-matrices;
the solving unit is used for determining the size of each sub-matrix of the at least two sub-matrices according to a first formula;
the training unit is used for adjusting the size of each of the at least two sub-matrices to obtain the second weight matrix;
the computing device is used for executing convolutional neural network computation; the convolutional layer neural network comprises Nfin*Nfout convolution kernels; the first formula includes: F ≈ F1*F2, wherein F represents any one of the Nfin*Nfout convolution kernels, F1 is a first sub convolution kernel of size (Kx, R), F2 is a second sub convolution kernel of size (R, Ky), (Kx, Ky) represents the size of the convolution kernel, R is a compression parameter, and R is a positive integer greater than 0 and less than or equal to min(Kx, Ky);
the operation unit includes: a main processing circuit and a plurality of slave processing circuits;
the main processing circuit executes preprocessing on the second input data and transmits data and operation instructions to the plurality of slave processing circuits;
the plurality of slave processing circuits execute intermediate operations in parallel according to the data and operation instructions transmitted by the main processing circuit to obtain a plurality of intermediate results, and transmit the plurality of intermediate results to the main processing circuit;
and the main processing circuit executes subsequent processing on the plurality of intermediate results to obtain a calculation result of the calculation instruction.
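The convolution-kernel factorization F ≈ F1*F2 described in claim 13 can be illustrated with a short sketch. This is not part of the patent; a truncated SVD is assumed here as one possible way to obtain sub-kernels of the claimed shapes (Kx, R) and (R, Ky):

```python
import numpy as np

def decompose_kernel(F, R):
    """Decompose a (Kx, Ky) convolution kernel F into a first sub
    convolution kernel F1 of size (Kx, R) and a second sub convolution
    kernel F2 of size (R, Ky) so that F is approximately F1 @ F2.
    Truncated SVD is an assumption; the claim only fixes the shapes."""
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    F1 = U[:, :R] * s[:R]   # (Kx, R): left factors scaled by singular values
    F2 = Vt[:R, :]          # (R, Ky)
    return F1, F2

# A kernel of size (Kx, Ky) = (5, 5) compressed with R = 2 stores
# R*(Kx+Ky) = 20 parameters instead of Kx*Ky = 25.
F = np.random.randn(5, 5)
F1, F2 = decompose_kernel(F, 2)
assert F1.shape == (5, 2) and F2.shape == (2, 5)
```

With R strictly below min(Kx, Ky), each kernel stores R*(Kx+Ky) values instead of Kx*Ky, which is where the compression comes from; at R = min(Kx, Ky) the reconstruction is exact.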
14. The method of claim 13, wherein the first formula is Q ≈ Q1*Q2*...*Qn, wherein Q represents the first weight matrix, Q1 represents a first sub-matrix of the at least two sub-matrices, Q2 represents a second sub-matrix of the at least two sub-matrices, and Qn represents an n-th sub-matrix of the at least two sub-matrices;
the training unit is used for adjusting the size of each of the at least two submatrices and obtaining a second weight matrix meeting preset precision by training the compressed machine learning model.
15. The method of claim 14, wherein the solving unit is configured to determine the size of each of the at least two sub-matrices according to the first formula, where the first formula is Q ≈ Q1*Q2*...*Qn, and this includes:
the solving unit is specifically configured to determine the size of each of the at least two sub-matrices according to the first formula and a second formula, where the second formula is ||Q - Q1*Q2*...*Qn|| ≤ T, and T represents a preset error threshold.
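The second formula of claim 15 can be checked directly. The following sketch is an illustration rather than the patent's implementation, and it assumes the unspecified norm is the Frobenius norm:

```python
import numpy as np

def within_threshold(Q, factors, T):
    """Check the second formula of claim 15: the reconstruction error
    ||Q - Q1*Q2*...*Qn|| must not exceed the preset error threshold T.
    The norm type is not fixed by the claim; Frobenius is assumed."""
    approx = factors[0]
    for Qi in factors[1:]:
        approx = approx @ Qi        # accumulate the product Q1*Q2*...*Qn
    return np.linalg.norm(Q - approx) <= T
```

A factorization that reconstructs Q exactly passes for any nonnegative T; a poor factorization fails once its Frobenius error exceeds T.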
16. The method according to claim 14, wherein the training unit, configured to adjust the size of each of the at least two sub-matrices, and obtain the second weight matrix satisfying the preset precision by training the compressed machine learning model, includes:
the training unit is specifically configured to adjust the size of each of the at least two sub-matrices, and to obtain, by training the compressed machine learning model, a second weight matrix that meets a preset precision and whose compression ratio relative to the first weight matrix meets a preset compression ratio.
17. The method of any of claims 14-16, wherein the computing device is configured to perform a fully connected layer neural network calculation; the at least two sub-matrices include two sub-matrices; the first formula includes: M ≈ M1*M2, wherein the two sub-matrices comprise a first sub-matrix M1 and a second sub-matrix M2, M1 is an Nin x K matrix and M2 is a K x Nout matrix; K is a compression parameter, Nin is the number of input neurons of the neural network, and Nout is the number of output neurons of the neural network; the compression parameter represents the number of output neurons of M1 and the number of input neurons of M2, and K is a positive integer greater than 0 and less than or equal to min(Nin, Nout).
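For the fully connected factorization M ≈ M1*M2 of claim 17, the effect of the compression parameter K on the parameter count can be sketched as follows (an illustration; the function name and the ratio metric are not from the patent):

```python
def fc_compression_ratio(N_in, N_out, K):
    """Parameter count of the original Nin x Nout matrix M versus the
    factors M1 (Nin x K) and M2 (K x Nout); the claim requires
    0 < K <= min(Nin, Nout)."""
    assert 0 < K <= min(N_in, N_out)
    original = N_in * N_out
    compressed = K * (N_in + N_out)
    return original / compressed

# e.g. Nin = Nout = 1024 with K = 64: 1024*1024 / (64*2048) = 8x fewer weights
```

Compression is only achieved when K*(Nin+Nout) < Nin*Nout; for very large K the factorized form can even be larger than the original matrix.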
18. The method of any of claims 14-16, wherein the computing device is configured to perform an LSTM layer neural network calculation, the LSTM layer comprising N fully connected layers, N being a positive integer greater than 0; for the j-th fully connected layer, the first formula includes: Mj ≈ Mj_1*Mj_2, wherein the two sub-matrices in the j-th fully connected layer comprise a first sub-matrix Mj_1 and a second sub-matrix Mj_2, Mj_1 is an Nin_j x S matrix and Mj_2 is an S x Nout_j matrix; S is a compression parameter, Nin_j is the number of input neurons of the j-th fully connected layer of the neural network, and Nout_j is the number of output neurons of the j-th fully connected layer of the neural network; the compression parameter represents the number of output neurons of Mj_1 and the number of input neurons of Mj_2, and S is a positive integer greater than 0 and less than or equal to min(Nin_j, Nout_j).
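The per-layer factorization of claim 18 can be sketched by factoring each of the N fully connected weight matrices independently. This is a sketch under two assumptions not fixed by the claim: truncated SVD supplies the factors, and a single compression parameter S is shared across layers:

```python
import numpy as np

def compress_lstm_layers(weights, S):
    """Factor each fully connected weight matrix Mj (Nin_j x Nout_j)
    of an LSTM layer into Mj_1 (Nin_j x S) and Mj_2 (S x Nout_j).
    Truncated SVD and a shared S are assumptions; the claim allows
    the compression parameter to differ per layer."""
    pairs = []
    for M in weights:
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        pairs.append((U[:, :S] * s[:S], Vt[:S, :]))  # (Mj_1, Mj_2)
    return pairs
```

Each gate matrix of the LSTM is compressed on its own, so layers with different Nin_j and Nout_j still yield factor shapes (Nin_j, S) and (S, Nout_j).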
19. The method of claim 13, wherein the computing device further comprises: a storage unit and a direct memory access unit, the storage unit comprising: a register, a cache, or any combination thereof;
the cache stores the first input data and the second input data;
The register stores scalar quantities in the first input data and the second input data; the cache includes a scratch pad cache;
The controller unit includes: an instruction storage unit, an instruction processing unit and a storage queue unit;
The instruction storage unit stores calculation instructions related to the artificial neural network operation;
the instruction processing unit analyzes the calculation instructions to obtain a plurality of operation instructions;
the store queue unit stores an instruction queue, the instruction queue comprising: a plurality of operation instructions or calculation instructions to be executed in the front-to-back order of the queue;
The controller unit includes a main processing circuit including: a dependency relationship processing unit;
The dependency relation processing unit determines whether a first operation instruction and a zeroth operation instruction before the first operation instruction have an association relation, if so, the first operation instruction is cached in the instruction storage unit, and after the execution of the zeroth operation instruction is finished, the first operation instruction is extracted from the instruction storage unit and transmitted to the operation unit;
the determining whether the association relationship exists between the first operation instruction and the zeroth operation instruction before the first operation instruction includes:
extracting, according to the first operation instruction, a first storage address interval of the data required by the first operation instruction; extracting, according to the zeroth operation instruction, a zeroth storage address interval of the data required by the zeroth operation instruction; if the first storage address interval and the zeroth storage address interval have an overlapping area, determining that the first operation instruction and the zeroth operation instruction have an association relation; and if the first storage address interval and the zeroth storage address interval have no overlapping area, determining that the first operation instruction and the zeroth operation instruction have no association relation.
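The address-interval overlap test used by the dependency relation processing unit in claim 19 can be sketched as follows (an illustration; half-open (start, end) intervals are an assumption, since the claim does not fix the interval convention):

```python
def has_dependency(first_interval, zeroth_interval):
    """Association check from claim 19: the first operation instruction
    depends on the zeroth iff their required-data storage address
    intervals overlap. Intervals are half-open (start, end) pairs with
    start <= end -- an assumed convention, not stated in the claim."""
    f_start, f_end = first_interval
    z_start, z_end = zeroth_interval
    return f_start < z_end and z_start < f_end

# When the intervals overlap, the first instruction is cached until the
# zeroth instruction finishes executing, then released to the operation unit.
```

Under the half-open convention, intervals that merely touch at an endpoint (e.g. (0, 10) and (10, 20)) do not overlap, so no dependency is recorded.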
CN201811566331.6A 2018-12-20 2018-12-20 Computing device and related product Active CN111353591B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811585964.1A CN111353598B (en) 2018-12-20 2018-12-20 Neural network compression method, electronic equipment and computer readable medium
CN201811566331.6A CN111353591B (en) 2018-12-20 2018-12-20 Computing device and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811566331.6A CN111353591B (en) 2018-12-20 2018-12-20 Computing device and related product

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201811585964.1A Division CN111353598B (en) 2018-12-20 2018-12-20 Neural network compression method, electronic equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN111353591A CN111353591A (en) 2020-06-30
CN111353591B true CN111353591B (en) 2024-08-20

Family

ID=71193691

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201811585964.1A Active CN111353598B (en) 2018-12-20 2018-12-20 Neural network compression method, electronic equipment and computer readable medium
CN201811566331.6A Active CN111353591B (en) 2018-12-20 2018-12-20 Computing device and related product

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201811585964.1A Active CN111353598B (en) 2018-12-20 2018-12-20 Neural network compression method, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (2) CN111353598B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065899A (en) * 2020-07-29 2022-02-18 中科亿海微电子科技(苏州)有限公司 Full-connection layer compression method, full-connection layer compression device, electronic equipment, accelerator and storage medium
CN114168895B (en) * 2020-09-11 2025-04-08 广州希姆半导体科技有限公司 Matrix computing circuit, method, electronic device, and computer-readable storage medium
CN114168894A (en) * 2020-09-11 2022-03-11 北京希姆计算科技有限公司 Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN112200301B (en) * 2020-09-18 2024-04-09 星宸科技股份有限公司 Convolution computing device and method
CN112101487B (en) * 2020-11-17 2021-07-16 深圳感臻科技有限公司 Compression method and device for fine-grained recognition model
CN112329926B (en) * 2020-11-30 2024-09-10 珠海采筑电子商务有限公司 Quality improvement method and system for intelligent robot
CN112580639B (en) * 2021-03-01 2021-08-13 四川大学 An image recognition method for early gastric cancer based on evolutionary neural network model compression
CN113255253B (en) * 2021-06-03 2022-05-24 北京华大九天科技股份有限公司 Matrix fast decomposition method based on resistance-capacitance network

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101795344B (en) * 2010-03-02 2013-03-27 北京大学 Digital hologram compression method and system, decoding method and system, and transmission method and system
US10223635B2 (en) * 2015-01-22 2019-03-05 Qualcomm Incorporated Model compression and fine-tuning
US10373050B2 (en) * 2015-05-08 2019-08-06 Qualcomm Incorporated Fixed point neural network based on floating point neural network quantization
US10515307B2 (en) * 2015-06-05 2019-12-24 Google Llc Compressed recurrent neural network models
CN106991477B (en) * 2016-01-20 2020-08-14 中科寒武纪科技股份有限公司 Artificial neural network compression coding device and method
CN107329936A (en) * 2016-04-29 2017-11-07 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing neural network computing and matrix/vector computing
US10832123B2 (en) * 2016-08-12 2020-11-10 Xilinx Technology Beijing Limited Compression of deep neural networks with proper use of mask
CN107239825B (en) * 2016-08-22 2021-04-09 赛灵思电子科技(北京)有限公司 Deep neural network compression method considering load balance
US10762426B2 (en) * 2016-08-12 2020-09-01 Beijing Deephi Intelligent Technology Co., Ltd. Multi-iteration compression for deep neural networks
US10984308B2 (en) * 2016-08-12 2021-04-20 Xilinx Technology Beijing Limited Compression method for deep neural networks with load balance
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator
WO2018107414A1 (en) * 2016-12-15 2018-06-21 上海寒武纪信息科技有限公司 Apparatus, equipment and method for compressing/decompressing neural network model
CN107798697A (en) * 2017-10-26 2018-03-13 中国科学院深圳先进技术研究院 A kind of medical image registration method based on convolutional neural networks, system and electronic equipment
CN107832835A (en) * 2017-11-14 2018-03-23 贵阳海信网络科技有限公司 The light weight method and device of a kind of convolutional neural networks
CN108280514B (en) * 2018-01-05 2020-10-16 中国科学技术大学 FPGA-based sparse neural network acceleration system and design method
CN108090560A (en) * 2018-01-05 2018-05-29 中国科学技术大学苏州研究院 The design method of LSTM recurrent neural network hardware accelerators based on FPGA
CN108932548A (en) * 2018-05-22 2018-12-04 中国科学技术大学苏州研究院 A kind of degree of rarefication neural network acceleration system based on FPGA

Also Published As

Publication number Publication date
CN111353598B (en) 2024-09-24
CN111353598A (en) 2020-06-30
CN111353591A (en) 2020-06-30

Similar Documents

Publication Publication Date Title
CN111353591B (en) Computing device and related product
CN110383300B (en) A computing device and method
US20210117810A1 (en) On-chip code breakpoint debugging method, on-chip processor, and chip breakpoint debugging system
CN109522052B (en) Computing device and board card
CN110163363B (en) Computing device and method
CN107832845A (en) A kind of information processing method and Related product
CN110276447B (en) Computing device and method
CN111047022B (en) Computing device and related product
US12094456B2 (en) Information processing method and system
CN111488963B (en) Neural network computing device and method
CN109753319B (en) Device for releasing dynamic link library and related product
CN110059797B (en) Computing device and related product
CN109711540B (en) Computing device and board card
CN110059809B (en) Computing device and related product
CN111047021B (en) Computing device and related product
CN111930681A (en) Computing device and related product
CN111382848B (en) Computing device and related product
CN111291871B (en) Computing device and related product
CN111198714B (en) Retraining method and related product
CN111047024A (en) Computing device and related product
CN111222632B (en) Computing device, computing method and related product
CN111368985B (en) A neural network computing device and method
CN111738429B (en) Computing device and related product
CN118278472A (en) Quantization processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TG01 Patent term adjustment