CN111723920B - Artificial intelligence computing devices and related products - Google Patents
Artificial intelligence computing devices and related products
- Publication number
- CN111723920B (application CN201910226552.7A)
- Authority
- CN
- China
- Prior art keywords
- instruction
- storage
- preset
- calculation
- load
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30043—LOAD or STORE instructions; Clear instruction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3005—Arrangements for executing specific machine instructions to perform operations for flow control
- G06F9/30069—Instruction skipping instructions, e.g. SKIP
Abstract
The present application provides an artificial intelligence computing device and related products. The device is used to perform machine learning calculations. For the instructions in two or more instruction sets that constitute a loop body, the embodiments of the present application have repeated instructions use the same opcode in the opcode storage area. This saves opcode storage space, reduces the amount of code for each instruction in the instruction set of the second time slice, saves instruction storage space, and improves computing efficiency.
Description
Technical Field
The present application relates to the field of information processing technology, and in particular to an artificial intelligence computing device and related products.
Background
Artificial neural networks are powerful algorithms that in recent years have been applied in fields such as image and language processing. Artificial intelligence computing devices give neural networks hardware support so that they can compute more efficiently. Such a device generally has its own instruction set, which contains many instructions to be executed; executing all of them takes a long time and reduces efficiency. The instruction set also contains instructions that are executed repeatedly. For example, when data is loaded and the data scale is large, the data must be moved in several passes to complete the address space conversion; another example is the repeated addition and multiplication operations in template operations. In normal operation these repeated calculations are fully unrolled: each instruction corresponds to a segment of execution code, so the code for repeated instructions occupies considerable storage space.
Summary of the Invention
The embodiments of the present application provide an artificial intelligence computing device and related products, which can reduce the amount of code in the instruction information of an instruction and improve the efficiency of instruction calculation.
In a first aspect, an artificial intelligence computing device is provided, the artificial intelligence computing device comprising a controller unit and an execution unit, wherein:
the controller unit is configured to obtain a first instruction set to be executed, and to obtain a second instruction set;
the controller unit is further configured to determine whether the first instruction set and the second instruction set form a loop body;
the execution unit is configured to execute the instructions in the second instruction set according to the instruction information of the first instruction set when the first instruction set and the second instruction set form a loop body.
In a second aspect, an embodiment of the present application provides an artificial intelligence computing method applied to an artificial intelligence computing device, the method comprising:
obtaining a first instruction set to be executed, and obtaining a second instruction set;
determining whether the first instruction set and the second instruction set form a loop body;
when the first instruction set and the second instruction set form a loop body, executing the instructions in the second instruction set according to the instruction information of the first instruction set.
In a third aspect, an embodiment of the present application provides a machine learning computing device, which includes one or more of the artificial intelligence computing devices described in the first aspect. The machine learning computing device is used to obtain data to be computed and control information from other processing devices, perform the specified machine learning operations, and transmit the execution results to peripheral devices through an I/O interface;
when the machine learning computing device includes a plurality of the computing devices, the computing devices may be linked and transmit data through a specific structure;
wherein the plurality of computing devices are interconnected and transmit data through a PCIe bus to support larger-scale machine learning operations; the plurality of computing devices share the same control system or have their own control systems; the plurality of computing devices share memory or have their own memories; and the plurality of computing devices may be interconnected in any interconnection topology.
In a fourth aspect, an embodiment of the present application provides a combined processing device, which includes the machine learning computing device described in the third aspect, a universal interconnection interface, and other processing devices. The machine learning computing device interacts with the other processing devices to jointly complete the operations specified by the user. The combined processing device may also include a storage device, which is connected to the machine learning computing device and to the other processing devices and is used to store data of the machine learning computing device and the other processing devices.
In a fifth aspect, an embodiment of the present application provides a neural network chip, which includes the computing device described in the first aspect, the machine learning computing device described in the third aspect, or the combined processing device described in the fourth aspect.
In a sixth aspect, an embodiment of the present application provides a neural network chip packaging structure, which includes the neural network chip described in the fifth aspect.
In a seventh aspect, an embodiment of the present application provides a board card, which includes the neural network chip packaging structure described in the sixth aspect.
In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method steps described in the second aspect.
In a ninth aspect, an embodiment of the present application provides a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, wherein the computer program is operable to cause a computer to execute the method steps described in the second aspect.
In a tenth aspect, an embodiment of the present application provides an electronic device, which includes the neural network chip described in the fifth aspect or the board card described in the seventh aspect.
In some embodiments, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, a headset, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle includes an airplane, a ship, and/or a road vehicle; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric light, a gas stove, and a range hood; the medical device includes a magnetic resonance imaging scanner, a B-mode ultrasound scanner, and/or an electrocardiograph.
It can be seen that, in the scheme of the embodiments of the present application, the computing device obtains through the controller unit a first instruction set to be executed and a second instruction set, and determines whether the first instruction set and the second instruction set form a loop body. When they do, the execution unit executes the instructions in the second instruction set according to the instruction information of the first instruction set, which reduces the amount of code in the instruction information and improves the efficiency of instruction calculation.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of an artificial intelligence computing device provided in an embodiment of the present application;
FIG. 2A is a schematic flow chart of an artificial intelligence computing method provided in an embodiment of the present application;
FIG. 2B is a schematic diagram illustrating the parallel execution of instructions in the instruction set of a neural network, provided in an embodiment of the present application;
FIG. 2C is a schematic diagram illustrating the arrangement of instructions in an instruction set in a tree structure, provided in an embodiment of the present application;
FIG. 3 is a structural diagram of a combined processing device provided in an embodiment of the present application;
FIG. 4 is a structural diagram of another combined processing device provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a board card provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in the present application without creative effort fall within the scope of protection of the present application.
The terms "first", "second", "third", "fourth" and so on in the specification, the claims, and the drawings of the present application are used to distinguish different objects, not to describe a specific order. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to independent or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
First, the computing device used in the present application is introduced. Referring to FIG. 1, an artificial intelligence computing device is provided for performing machine learning calculations. The computing device includes a controller unit 11, a storage unit 10, and an execution unit 12, wherein the storage unit 10 is connected to an external storage device, and the execution unit 12 includes a load execution unit 121, a calculation execution unit 122, and a storage execution unit 123; wherein:
the controller unit is configured to obtain a first instruction set to be executed, and to obtain a second instruction set;
the controller unit is further configured to determine whether the first instruction set and the second instruction set form a loop body;
the execution unit is configured to execute the instructions in the second instruction set according to the instruction information of the first instruction set when the first instruction set and the second instruction set form a loop body.
In a possible embodiment, in executing the instructions in the second instruction set according to the instruction information of the first instruction set, the execution unit is specifically configured to:
jump, according to a jump instruction, to the opcode storage area of the first instruction in the first instruction set that corresponds to a second instruction in the second instruction set, obtain the opcode of the first instruction from the opcode storage area, and use that opcode as the opcode of the second instruction, wherein the opcode includes an identifier of the first instruction.
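The opcode reuse described above can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the patented hardware design: `OpcodeStore`, `Instruction`, `link_loop_body`, and the opcode strings are hypothetical names, and the "jump" is modeled as a shared slot index into the opcode storage area.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    name: str         # hypothetical label, e.g. "load_2"
    opcode_slot: int  # index into the opcode storage area (the jump target)


class OpcodeStore:
    """Hypothetical opcode storage area: one slot per distinct opcode."""

    def __init__(self):
        self.slots = []

    def add(self, opcode):
        self.slots.append(opcode)
        return len(self.slots) - 1

    def fetch(self, slot):
        return self.slots[slot]


def link_loop_body(first_set, second_names):
    """Build the second instruction set so that each instruction points at
    the opcode slot of its counterpart in the first set; no opcode is copied."""
    return [
        Instruction(name, first.opcode_slot)
        for name, first in zip(second_names, first_set)
    ]


store = OpcodeStore()
first_set = [
    Instruction(f"{kind}_1", store.add(kind.upper()))
    for kind in ("load", "compute", "store")
]
second_set = link_loop_body(first_set, ["load_2", "compute_2", "store_2"])
fetched = [store.fetch(instr.opcode_slot) for instr in second_set]
```

In this sketch the opcode storage area still holds three entries after the second instruction set is built, which is the storage saving the embodiment aims at.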
In a possible embodiment, the first instruction set includes a first load instruction, a first calculation instruction, and a first storage instruction of a first computing task; the second instruction set includes a second load instruction, a second calculation instruction, and a second storage instruction of a second computing task. In determining whether the first instruction set and the second instruction set form a loop body, the controller unit is specifically configured to:
obtain the preset instruction information corresponding to each instruction in the first instruction set and the second instruction set, yielding a plurality of pieces of preset instruction information, wherein the preset instruction information includes at least one of the following: the instruction type, the remaining number of executions, and whether the parity is flipped;
compare the first preset instruction information corresponding to the first load instruction with the second preset instruction information corresponding to the second load instruction; compare the third preset instruction information corresponding to the first calculation instruction with the fourth preset instruction information corresponding to the second calculation instruction; and compare the fifth preset instruction information corresponding to the first storage instruction with the sixth preset instruction information corresponding to the second storage instruction;
if the first and second preset instruction information differ only in the number of operations, the third and fourth preset instruction information differ only in the number of operations, and the fifth and sixth preset instruction information differ only in the number of operations, determine that the first instruction set and the second instruction set form a loop body.
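The comparison above can be sketched as follows. The `PresetInfo` record and the function names are hypothetical. The sketch assumes that the "number of operations" is the remaining execution count, and it treats that count as the only field allowed to differ without requiring it to differ; both are assumptions, not statements from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetInfo:
    kind: str          # instruction type: "load", "compute" or "store"
    remaining: int     # remaining number of executions
    parity_flip: bool  # whether the parity is flipped


def differs_only_in_count(a, b):
    """True if every field other than the remaining count matches."""
    return a.kind == b.kind and a.parity_flip == b.parity_flip


def forms_loop_body(first_set, second_set):
    """Compare the two instruction sets position by position."""
    return len(first_set) == len(second_set) and all(
        differs_only_in_count(a, b) for a, b in zip(first_set, second_set)
    )


first = [PresetInfo("load", 3, False), PresetInfo("compute", 3, False),
         PresetInfo("store", 3, False)]
second = [PresetInfo("load", 2, False), PresetInfo("compute", 2, False),
          PresetInfo("store", 2, False)]
is_loop = forms_loop_body(first, second)
```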
In a possible embodiment, the first instruction set includes a first storage instruction of a first computing task, a second calculation instruction of a second computing task, and a third load instruction of a third computing task; the second instruction set includes a second storage instruction of the second computing task, a third calculation instruction of the third computing task, and a fourth load instruction of a fourth computing task. In determining whether the first instruction set and the second instruction set form a loop body, the controller unit is specifically configured to:
obtain the preset instruction information corresponding to each instruction in the first instruction set and the second instruction set, yielding a plurality of pieces of preset instruction information, wherein the preset instruction information includes at least one of the following: the instruction type, the remaining number of executions, and whether the parity is flipped;
compare the fifth preset instruction information corresponding to the first storage instruction with the sixth preset instruction information corresponding to the second storage instruction; compare the seventh preset instruction information corresponding to the second calculation instruction with the eighth preset instruction information corresponding to the third calculation instruction; and compare the ninth preset instruction information corresponding to the third load instruction with the tenth preset instruction information corresponding to the fourth load instruction;
if the fifth and sixth preset instruction information differ only in the number of operations, the seventh and eighth preset instruction information differ only in the number of operations, and the ninth and tenth preset instruction information differ only in the number of operations, determine that the first instruction set and the second instruction set form a loop body.
In a possible embodiment, the controller unit is further configured to:
determine whether there is an association relationship among the first storage instruction, the second calculation instruction, and the third load instruction;
the execution unit is further configured to execute the first storage instruction, the second calculation instruction, and the third load instruction in parallel within a first time slice when there is no association relationship among them.
In a possible embodiment, in determining whether there is an association relationship among the first storage instruction, the second calculation instruction, and the third load instruction, the controller unit is specifically configured to:
extract the first storage address interval of the data required by the first storage instruction, the second storage address interval of the data required by the second calculation instruction, and the third storage address interval of the data required by the third load instruction; if no two of the first, second, and third storage address intervals overlap, determine that there is no association relationship among the first storage instruction, the second calculation instruction, and the third load instruction.
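The pairwise overlap test can be sketched as below. The representation of an address interval as a half-open `(start, end)` tuple, the function names, and the example addresses are all assumptions made for illustration; the source does not specify how intervals are encoded.

```python
from itertools import combinations


def overlaps(a, b):
    """True if half-open intervals [a0, a1) and [b0, b1) share any address."""
    return a[0] < b[1] and b[0] < a[1]


def no_association(*intervals):
    """True if no two of the given address intervals overlap."""
    return not any(overlaps(a, b) for a, b in combinations(intervals, 2))


# Hypothetical address intervals for the three instructions.
store_1 = (0x000, 0x100)
compute_2 = (0x100, 0x200)
load_3 = (0x200, 0x300)
can_run_in_parallel = no_association(store_1, compute_2, load_3)
```

The same helper accepts any number of regions, so it could also be applied to the read and write regions of the instructions if those are tracked separately.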
In a possible embodiment, in determining whether there is an association relationship among the first storage instruction, the second calculation instruction, and the third load instruction, the controller unit is specifically configured to:
extract the first write region corresponding to the first storage instruction, the second read region and the second write region corresponding to the second calculation instruction, and the third read region corresponding to the third load instruction;
if no overlap exists among the first write region, the second read region, the second write region, and the third read region, determine that there is no association relationship among the first storage instruction, the second calculation instruction, and the third load instruction.
In a possible embodiment, the artificial intelligence computing device further includes a storage unit connected to an external storage device; the execution unit includes a load execution unit, a calculation execution unit, and a storage execution unit;
in executing the first storage instruction, the second calculation instruction, and the third load instruction in parallel within the first time slice, the storage execution unit is configured to transfer, according to the first storage instruction, the first calculation result corresponding to the first input data of the first computing task from the storage unit to the external storage device; the calculation execution unit is configured to perform calculation on the second input data of the second computing task according to the second calculation instruction to obtain a second calculation result; and the load execution unit is configured to transfer, according to the third load instruction, the third input data of the third computing task from the external storage device to the storage unit.
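The parallel execution described above amounts to a three-stage pipeline in which each task is loaded, computed, and stored in consecutive time slices. The sketch below only generates such a schedule; the slice numbering, the function name, and the inclusion of warm-up and drain slices are assumptions for illustration.

```python
def pipeline_schedule(num_tasks):
    """Return one dict per time slice naming the task each unit works on.

    Task k is loaded in slice k, computed in slice k + 1 and stored in
    slice k + 2 (hypothetical numbering); None means the unit is idle.
    """
    def task_or_none(n):
        return n if 1 <= n <= num_tasks else None

    return [
        {
            "load": task_or_none(s),
            "compute": task_or_none(s - 1),
            "store": task_or_none(s - 2),
        }
        for s in range(1, num_tasks + 3)
    ]


schedule = pipeline_schedule(4)
steady_state = schedule[2]  # store task 1, compute task 2, load task 3
```

The entry `steady_state` corresponds to the time slice described in the text, where the first task's result is stored while the second task is computed and the third task's input is loaded.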
在一个可能的实施例中,所述存储单元包括第一存储区域和第二存储区域,在所述根据所述第三加载指令将所述第三运算任务中的第三输入数据从所述外部存储装置传输至所述存储单元方面,所述加载执行单元具体用于:In a possible embodiment, the storage unit includes a first storage area and a second storage area, and in terms of transmitting the third input data in the third computing task from the external storage device to the storage unit according to the third load instruction, the load execution unit is specifically used to:
within the first time slice, transfer the third input data of the third computing task from the external storage device to the first storage area by a ping-pong operation according to the third load instruction.
In a possible embodiment, the third input data includes a plurality of third input sub-data, and with respect to transferring the third input data of the third computing task from the external storage device to the first storage area by a ping-pong operation, the load execution unit is specifically configured to:
estimate, for each of the plurality of third input sub-data, a target storage duration in the first storage area, to obtain a plurality of target storage durations;
transfer the plurality of third input sub-data to the first storage area in descending order of their target storage durations, storing them from the two ends of the first storage area toward the middle.
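The placement rule above can be sketched as follows. This is only an illustration under stated assumptions: each sub-data item is assumed to have a known size and an already estimated storage duration, and "from the two ends toward the middle" is interpreted as alternating between the low-address end and the high-address end. The names `SubData` and `place_from_ends` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass


@dataclass
class SubData:
    name: str
    size: int       # space occupied in the first storage area
    duration: int   # estimated target storage duration (arbitrary units)


def place_from_ends(items, area_size):
    """Return {name: (start, end)} offsets inside the first storage area.

    Items are sorted by estimated duration, longest first, and placed
    alternately at the low end and the high end, so the shortest-lived
    items end up in the middle. The alternation is an assumption.
    """
    if sum(it.size for it in items) > area_size:
        raise ValueError("sub-data does not fit in the storage area")
    low, high = 0, area_size
    layout = {}
    ordered = sorted(items, key=lambda it: it.duration, reverse=True)
    for index, it in enumerate(ordered):
        if index % 2 == 0:              # place at the low-address end
            layout[it.name] = (low, low + it.size)
            low += it.size
        else:                           # place at the high-address end
            high -= it.size
            layout[it.name] = (high, high + it.size)
    return layout


layout = place_from_ends(
    [SubData("a", 4, 10), SubData("b", 2, 30),
     SubData("c", 3, 20), SubData("d", 1, 5)],
    area_size=16,
)
```

One plausible motivation for this layout is that long-lived data sits at the edges while short-lived data churns in the middle, which may reduce fragmentation; the application does not state the rationale explicitly.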
FIG. 2A is a schematic flowchart of an artificial intelligence computing method provided in an embodiment of the present application. As shown in FIG. 2A, the method is applied to an artificial intelligence computing device that includes a controller unit, a storage unit, and an execution unit; the storage unit is connected to an external storage device, and the execution unit includes a load execution unit, a calculation execution unit, and a storage execution unit. The method includes:
201. Obtain a first instruction set to be executed, and obtain a second instruction set.
In the embodiments of the present application, the instructions in the instruction set of a neural network may be divided into input/output instructions and calculation instructions, and the input/output instructions may be further divided into load instructions and storage instructions. The execution unit of the artificial intelligence computing device transfers input data from the external storage device to the storage unit of the artificial intelligence computing device according to a load instruction; it then reads the input data directly from the storage unit according to a calculation instruction, performs the calculation to obtain a calculation result, and caches the result in the storage unit; finally, it transfers the calculation result from the storage unit to the external storage device according to a storage instruction.
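The load, calculate, store sequence described above can be illustrated with a small sketch. It assumes, purely for illustration, that the external storage device and the on-chip storage unit are modeled as Python dictionaries and that the calculation is `w * x + b`; the function `run_task` and the key names are hypothetical.

```python
def run_task(external, storage_unit, in_key, out_key, w, b):
    """Run one computing task as three stages: load, calculate, store."""
    # Load instruction: external storage device -> on-chip storage unit.
    storage_unit[in_key] = external[in_key]
    # Calculation instruction: read the input from the storage unit and
    # cache the result back into the storage unit.
    storage_unit[out_key] = w * storage_unit[in_key] + b
    # Storage instruction: on-chip storage unit -> external storage device.
    external[out_key] = storage_unit[out_key]


external = {"x1": 3.0}     # stands in for the external storage device
storage_unit = {}          # stands in for the on-chip storage unit
run_task(external, storage_unit, "x1", "y1", w=2.0, b=1.0)
```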
The division of the instruction set of the neural network is not limited to the three stages of load, calculation, and storage instructions; instructions may also be divided according to other criteria, which is not limited in the embodiments of the present application.
Optionally, the first instruction set may include a first load instruction, a first calculation instruction, and a first storage instruction of a first computing task, and the second instruction set may include a second load instruction, a second calculation instruction, and a second storage instruction of a second computing task. The first load instruction is used to transfer the first input data of the first computing task from the external storage device to the storage unit, the first calculation instruction is used to perform a calculation on the first input data to obtain a first calculation result, and the first storage instruction is used to transfer the first calculation result from the storage unit to the external storage device. Likewise, the second load instruction is used to transfer the second input data of the second computing task from the external storage device to the storage unit, the second calculation instruction is used to perform a calculation on the second input data to obtain a second calculation result, and the second storage instruction is used to transfer the second calculation result from the storage unit to the external storage device.
Optionally, the first instruction set may include a first storage instruction of a first computing task, a second calculation instruction of a second computing task, and a third load instruction of a third computing task, and the second instruction set may include a second storage instruction of the second computing task, a third calculation instruction of the third computing task, and a fourth load instruction of a fourth computing task. The first storage instruction is used to transfer the first calculation result from the storage unit to the external storage device, the second calculation instruction is used to perform a calculation on the second input data of the second computing task to obtain a second calculation result, and the third load instruction is used to transfer the third input data of the third computing task from the external storage device to the storage unit. The second storage instruction is used to transfer the second calculation result from the storage unit to the external storage device, the third calculation instruction is used to perform a calculation on the third input data to obtain a third calculation result, and the fourth load instruction is used to transfer the fourth input data of the fourth computing task from the external storage device to the storage unit.
202. Determine whether the first instruction set and the second instruction set constitute a loop body.
Optionally, the first instruction set includes a first load instruction, a first calculation instruction, and a first storage instruction of a first computing task, and the second instruction set includes a second load instruction, a second calculation instruction, and a second storage instruction of a second computing task. In step 202 above, determining whether the first instruction set and the second instruction set constitute a loop body may include the following steps:
obtaining preset instruction information corresponding to each instruction in the first instruction set and the second instruction set, to obtain a plurality of pieces of preset instruction information, where the preset instruction information includes at least one of the following: instruction type, remaining execution count, and whether parity is flipped;
comparing first preset instruction information corresponding to the first load instruction with second preset instruction information corresponding to the second load instruction; comparing third preset instruction information corresponding to the first calculation instruction with fourth preset instruction information corresponding to the second calculation instruction; and comparing fifth preset instruction information corresponding to the first storage instruction with sixth preset instruction information corresponding to the second storage instruction;
if the first and second preset instruction information differ only in the number of operations, the third and fourth preset instruction information differ only in the number of operations, and the fifth and sixth preset instruction information differ only in the number of operations, determining that the first instruction set and the second instruction set constitute a loop body.
The preset instruction information may include at least one of the following: instruction type, remaining execution count, and whether parity is flipped. The instruction type indicates whether the instruction is a load instruction, a calculation instruction, or a storage instruction and, when the instruction is a calculation instruction, the operator types it contains. The operator types may include at least one of the following: addition, subtraction, multiplication, division, convolution, and combinations of these operators. The remaining execution count is the number of executions still remaining for an operation that must be repeated multiple times within a computation.
In the embodiments of the present application, the preset instruction information of the first load instruction, the first calculation instruction, and the first storage instruction of the first computing task may be compared with that of the second load instruction, the second calculation instruction, and the second storage instruction of the second computing task to determine whether the first instruction set and the second instruction set constitute a loop body. For example, when computing $Y_i = \sum (w x_i + b)$, $i = 1, 2, 3, \ldots, 100$, assume that $Y_1 = w x_1 + b$ is the first computing task and $Y_2 = w x_2 + b$ is the second computing task. The first load, calculation, and storage instructions of $Y_1 = w x_1 + b$ form the first instruction set, and the second load, calculation, and storage instructions of $Y_2 = w x_2 + b$ form the second instruction set. In the corresponding preset instruction information, the remaining calculation count of the first calculation instruction is 99 and that of the second calculation instruction is 98. Comparing the two instruction sets shows that the first and second load instructions have the same type and differ only in remaining load count; the first and second storage instructions have the same type and differ only in remaining storage count; and the first and second calculation instructions both contain addition and multiplication operators applied in the same order and differ only in remaining calculation count. Therefore, it can be determined that the first instruction set and the second instruction set constitute a loop body.
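The comparison in this example can be sketched in code. The sketch assumes that the preset instruction information is a record with the three fields named in the text plus an ordered operator list, and that "differ only in the number of operations" means every field other than the remaining count is equal. The class `PresetInfo`, the helper names, and the load/store counts of 99 and 98 are illustrative assumptions; only the calculation counts come from the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PresetInfo:
    kind: str           # "load", "compute" or "store"
    operators: tuple    # operator types in execution order (empty for load/store)
    remaining: int      # remaining execution count
    parity_flip: bool   # whether parity is flipped


def differs_only_in_remaining(a, b):
    """True if a and b are identical except for their remaining counts."""
    same_rest = replace(a, remaining=0) == replace(b, remaining=0)
    return same_rest and a.remaining != b.remaining


def forms_loop_body(first_set, second_set):
    """Compare the two instruction sets pairwise (load, compute, store)."""
    return len(first_set) == len(second_set) and all(
        differs_only_in_remaining(a, b) for a, b in zip(first_set, second_set)
    )


MUL_ADD = ("mul", "add")    # Y_i = w * x_i + b
first_set = [
    PresetInfo("load", (), 99, False),
    PresetInfo("compute", MUL_ADD, 99, False),
    PresetInfo("store", (), 99, False),
]
second_set = [replace(info, remaining=98) for info in first_set]

is_loop = forms_loop_body(first_set, second_set)

# A second set whose calculation uses a different operator is not a loop body.
other_set = [second_set[0], replace(second_set[1], operators=("conv",)), second_set[2]]
is_other_loop = forms_loop_body(first_set, other_set)
```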
Optionally, the first instruction set includes a first storage instruction of a first computing task, a second calculation instruction of a second computing task, and a third load instruction of a third computing task, and the second instruction set includes a second storage instruction of the second computing task, a third calculation instruction of the third computing task, and a fourth load instruction of a fourth computing task. In step 202 above, determining whether the first instruction set and the second instruction set constitute a loop body may include the following steps:
obtaining preset instruction information corresponding to each instruction in the first instruction set and the second instruction set, to obtain a plurality of pieces of preset instruction information, where the preset instruction information includes at least one of the following: instruction type, remaining execution count, and whether parity is flipped;
comparing fifth preset instruction information corresponding to the first storage instruction with sixth preset instruction information corresponding to the second storage instruction; comparing seventh preset instruction information corresponding to the second calculation instruction with eighth preset instruction information corresponding to the third calculation instruction; and comparing ninth preset instruction information corresponding to the third load instruction with tenth preset instruction information corresponding to the fourth load instruction;
if the fifth and sixth preset instruction information differ only in the number of operations, the seventh and eighth preset instruction information differ only in the number of operations, and the ninth and tenth preset instruction information differ only in the number of operations, determining that the first instruction set and the second instruction set constitute a loop body.
In the embodiments of the present application, the instructions in the instruction set of the neural network may be arranged in a tree structure. FIG. 2B is a schematic diagram, provided in an embodiment of the present application, of arranging the instructions of an instruction set in a tree structure. As shown in FIG. 2B, the numbers in the first layer of the tree represent chip information; for example, "1" denotes the first chip. The numbers in the second layer represent time slices; for example, "1" denotes the first time slice, "2" denotes the second time slice, and so on. The letters in the third layer represent the load, calculation, and storage instructions within each time slice, where L denotes a load instruction, C denotes a calculation instruction, and S denotes a storage instruction. Each instruction corresponds to one piece of preset instruction information. For example, when computing $Y_i = \sum (w x_i + b)$, $i = 1, 2, 3, \ldots, 100$, the value of $i$ runs from 1 to 100, so the operation $Y_i = w x_i + b$ is repeated 100 times in total, and each repetition performs an addition and a multiplication. Therefore, the 100 operations $Y_i = w x_i + b$ can be determined to form a loop body.
The loop body corresponding to the instruction set of each time slice may be parsed in advance to obtain the preset instruction information of each node in the tree structure. For an adjacent first time slice and second time slice, it can be determined whether the first instruction set corresponding to the first time slice and the second instruction set corresponding to the second time slice constitute a loop body. Specifically, the fifth preset instruction information corresponding to the first storage instruction of the first computing task is compared with the sixth preset instruction information corresponding to the second storage instruction of the second computing task; the seventh preset instruction information corresponding to the second calculation instruction of the second computing task is compared with the eighth preset instruction information corresponding to the third calculation instruction of the third computing task; and the ninth preset instruction information corresponding to the third load instruction of the third computing task is compared with the tenth preset instruction information corresponding to the fourth load instruction of the fourth computing task. If the only difference is the remaining execution count, with the instructions of the second time slice having the smaller remaining execution count, and all other information is identical, it can be determined that the second instruction set corresponding to the second time slice and the first instruction set corresponding to the first time slice constitute a loop body. For example, suppose the first time slice contains a load instruction, a calculation instruction, and a storage instruction, where the calculation instruction contains addition and multiplication operators, the load instruction has 5 remaining operations, the calculation instruction has 9, and the storage instruction has 3. Suppose the second instruction set in the second time slice also contains a load instruction, a calculation instruction, and a storage instruction, where the calculation instruction contains addition and multiplication operators, the load instruction has 4 remaining operations, the calculation instruction has 8, and the storage instruction has 2. It can then be determined that the first instruction set corresponding to the first time slice and the second instruction set corresponding to the second time slice constitute a loop body.
Furthermore, it can be determined whether the instruction sets corresponding to a plurality of consecutive time slices constitute a loop body. If they do, the instructions of the same type in those consecutive time slices are repeatedly executed instructions. The starting point of the loop body is the time slice containing the node with the largest remaining operation count, and the length of the loop body is the difference between the farthest time slice that satisfies the loop condition and the starting time slice.
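The search for the extent of a loop body over consecutive time slices can be sketched as below. The first two time slices reuse the counts from the example above (5, 9, 3 and 4, 8, 2); the third and fourth slices, the dictionary layout, and the function names are assumptions added for illustration. The sketch assumes the scan starts at the slice with the largest remaining counts.

```python
def same_except_remaining(prev, cur):
    """prev and cur map 'L'/'C'/'S' to (operators, remaining count).

    The loop condition holds when every instruction has the same operators
    and the later time slice has the smaller remaining count.
    """
    return prev.keys() == cur.keys() and all(
        prev[k][0] == cur[k][0] and cur[k][1] < prev[k][1] for k in prev
    )


def loop_span(slices, start=0):
    """Return (start, length), where length is the index of the farthest
    time slice that still satisfies the loop condition minus the start."""
    end = start
    while end + 1 < len(slices) and same_except_remaining(slices[end], slices[end + 1]):
        end += 1
    return start, end - start


MUL_ADD = ("mul", "add")
slices = [
    {"L": ((), 5), "C": (MUL_ADD, 9), "S": ((), 3)},    # first time slice
    {"L": ((), 4), "C": (MUL_ADD, 8), "S": ((), 2)},    # second time slice
    {"L": ((), 3), "C": (MUL_ADD, 7), "S": ((), 1)},    # assumed third slice
    {"L": ((), 2), "C": (("conv",), 6), "S": ((), 0)},  # assumed: different operator
]
span = loop_span(slices)
```

Here the loop starts at index 0 and has length 2, because the fourth slice uses a different operator and therefore breaks the loop condition.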
203. When the first instruction set and the second instruction set constitute a loop body, execute the instructions in the second instruction set according to the instruction information of the first instruction set.
The instruction information may include the operation code and the operation field of an instruction. In a specific implementation, if the first instruction set and the second instruction set constitute a loop body, the operation codes and operation fields of the instructions in the first instruction set may be stored. Then, when an instruction in the second instruction set is executed, execution jumps directly to the operation code of the corresponding instruction in the first instruction set, and the instruction in the second instruction set is executed according to that operation code.
For example, when computing $Y_i = \sum (w x_i + b)$, $i = 1, 2, 3, \ldots, 100$, the value of $i$ runs from 1 to 100, so the operation $Y_i = w x_i + b$ is repeated 100 times in total, and each repetition performs an addition and a multiplication. Therefore, the 100 operations $Y_i = w x_i + b$ can be determined to form a loop body. In the embodiments of the present application, the operation code of the first calculation instruction corresponding to the first time slice may be stored in an operation code storage area, so that the operation codes of the instructions for the 100 operations $Y_i = w x_i + b$ need not be stored repeatedly. While the second time slice is executed, a jump instruction may be used to jump to the operation code storage area and obtain the operation code of the instruction in the first instruction set that corresponds to the instruction in the second instruction set. The operation codes in the operation code storage area can thus be reused, which saves operation code storage space, reduces the code size of the instructions in the instruction set of the second time slice, saves instruction storage space, and improves computing efficiency.
Optionally, in the embodiments of the present application, assume that $Y_1 = w x_1 + b$ is the first computing task, $Y_2 = w x_2 + b$ is the second computing task, and $Y_3 = w x_3 + b$ is the third computing task. The first instruction set includes a first storage instruction corresponding to $Y_1 = w x_1 + b$, a first calculation instruction corresponding to $Y_2 = w x_2 + b$, and a first load instruction corresponding to $Y_3 = w x_3 + b$. The second instruction set includes a second storage instruction corresponding to $Y_2 = w x_2 + b$, a second calculation instruction corresponding to $Y_3 = w x_3 + b$, and a second load instruction corresponding to $Y_4 = w x_4 + b$. In the corresponding preset instruction information, the remaining calculation count of the calculation instruction for $Y_1 = w x_1 + b$ is 99, and the remaining calculation count of the first calculation instruction for $Y_2 = w x_2 + b$ is 98. Comparing the first instruction set of the first time slice with the second instruction set of the second time slice shows that the first and second load instructions have the same type and differ only in remaining load count; the first and second storage instructions have the same type and differ only in remaining storage count; and the first and second calculation instructions both contain addition and multiplication operators applied in the same order and differ only in remaining calculation count. Therefore, it can be determined that the first instruction set and the second instruction set constitute a loop body.
Optionally, in step 203 above, executing the instructions in the second instruction set according to the instruction information of the first instruction set may include the following step:
jumping, according to a jump instruction, to the operation code storage area of a first instruction in the first instruction set that corresponds to a second instruction in the second instruction set, obtaining the operation code of the first instruction from the operation code storage area, and using that operation code as the operation code of the second instruction, where the operation code includes an identifier of the first instruction.
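The reuse of stored operation codes can be sketched as a lookup table. This is a software analogy under assumptions, not the hardware mechanism of the application: the operation code storage area is modeled as a dictionary keyed by instruction identifier, a "jump" is modeled as a dictionary lookup, and the field names (`id`, `opcode`, `jump_to`, `operands`) and values are hypothetical.

```python
def store_opcodes(first_set):
    """Build the operation code storage area: identifier -> operation code."""
    return {ins["id"]: ins["opcode"] for ins in first_set}


def resolve(second_ins, opcode_area):
    """Follow the jump target of a second-set instruction and reuse the
    operation code stored for the corresponding first-set instruction."""
    opcode = opcode_area[second_ins["jump_to"]]
    return {"opcode": opcode, "operands": second_ins["operands"]}


# First instruction set: full instructions, stored once.
first_set = [
    {"id": "L1", "opcode": "LOAD", "operands": ("x1",)},
    {"id": "C1", "opcode": "MUL_ADD", "operands": ("w", "x1", "b")},
    {"id": "S1", "opcode": "STORE", "operands": ("y1",)},
]
opcode_area = store_opcodes(first_set)

# Second instruction set: only a jump target and its own operands.
second_compute = {"jump_to": "C1", "operands": ("w", "x2", "b")}
resolved = resolve(second_compute, opcode_area)
```

The second-set instruction carries no operation code of its own, which is the source of the storage saving described in the text.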
Optionally, the embodiments of the present application may further include the following steps:
A1. Determine whether there is an association relationship among the first storage instruction, the second calculation instruction, and the third load instruction;
A2. When there is no association relationship among the first storage instruction, the second calculation instruction, and the third load instruction, execute the first storage instruction, the second calculation instruction, and the third load instruction in parallel within a first time slice.
In the embodiments of the present application, a load instruction and a storage instruction, a load instruction and a calculation instruction, or a storage instruction and a calculation instruction may be executed in parallel. Two load instructions, two calculation instructions, or two storage instructions cannot be executed in parallel and must be executed serially.
During instruction execution, if executing one instruction requires data from another instruction, the two instructions have an association relationship. For example, if a calculation instruction needs data loaded by a load instruction, the calculation instruction can be executed only after the load instruction has completed, so the load instruction and the calculation instruction have an association relationship. The association relationships among the instructions to be executed can therefore be determined, and if the instructions to be executed have no association relationship, two or three such instructions are executed in parallel by the load execution unit, the calculation execution unit, and the storage execution unit of the execution unit. In the embodiments of the present application, instructions can be executed in parallel in the following cases: a load instruction with a storage instruction; a load instruction with a calculation instruction; a storage instruction with a calculation instruction; and a load instruction, a calculation instruction, and a storage instruction together.
Accordingly, the instructions in the instruction set of the neural network may be arranged as a pipeline. FIG. 2C is a schematic diagram, provided in an embodiment of the present application, of executing the instructions of a neural network instruction set in parallel. As shown in FIG. 2C, L denotes a load instruction, C denotes a calculation instruction, and S denotes a storage instruction. Each horizontal row of load, calculation, and storage instructions corresponds to one computing task, which loads the input data, calculates a result, and stores the result. Each vertical column of load, calculation, and storage instructions corresponds to one time slice, in which the load, calculation, and storage instructions that have no association relationship are executed in parallel. By executing unassociated instructions in parallel, multiple unassociated computing tasks can run concurrently, which saves computing time and improves computing efficiency.
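The pipelined arrangement described for FIG. 2C can be sketched as a schedule generator. The figure itself is not reproduced here, so the exact layout is an assumption: task `i` is taken to load in one time slice, calculate in the next, and store in the one after that. The function name `pipeline_schedule` is hypothetical.

```python
def pipeline_schedule(num_tasks):
    """Return a list of time slices; each slice lists the instructions
    executed in parallel, for example ['S1', 'C2', 'L3']."""
    stages = ["L", "C", "S"]                    # load, calculate, store
    num_slices = num_tasks + len(stages) - 1
    schedule = []
    for t in range(num_slices):
        column = []
        # Walk the stages from store back to load so the oldest task comes first.
        for offset, stage in reversed(list(enumerate(stages))):
            task = t - offset + 1               # 1-based task index
            if 1 <= task <= num_tasks:
                column.append(f"{stage}{task}")
        schedule.append(column)
    return schedule


schedule = pipeline_schedule(4)
for index, column in enumerate(schedule, start=1):
    print(f"time slice {index}: {' '.join(column)}")
```

Each time slice contains at most one load, one calculation, and one storage instruction, which matches the rule that instructions of the same type are executed serially. Four tasks finish in six time slices instead of the twelve that fully serial execution would need, assuming each instruction takes one time slice.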
Optionally, in step A1 above, determining whether there is an association relationship among the first storage instruction, the second calculation instruction, and the third load instruction may include the following steps:
A11. Extract a first storage address interval of the data required by the first storage instruction, a second storage address interval of the data required by the second calculation instruction, and a third storage address interval of the data required by the third load instruction;
A12、若所述第一存储地址区间、所述第二存储地址区间和所述第三存储地址区间两两之间不具有重叠的区域,确定所述第一存储指令、所述第二计算指令和所述第三加载指令之间不存在关联关系。A12: If the first storage address interval, the second storage address interval and the third storage address interval do not have any overlapping areas, determine that there is no association relationship between the first storage instruction, the second calculation instruction and the third load instruction.
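Steps A11 and A12 amount to a pairwise overlap test on address intervals. The sketch below assumes each interval is a half-open pair `(start, end)` of integer addresses; that representation and the names `overlaps` and `no_association` are illustrative choices, not taken from the present application.

```python
from itertools import combinations


def overlaps(a, b):
    """True if half-open intervals [start, end) share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def no_association(intervals):
    """True if no two of the given address intervals overlap (step A12)."""
    return not any(overlaps(a, b) for a, b in combinations(intervals, 2))


# Hypothetical intervals for the store, calculation, and load instructions.
first, second, third = (0, 16), (16, 32), (32, 48)
print(no_association([first, second, third]))
```

Half-open intervals are used so that two regions that merely touch, such as `(0, 16)` and `(16, 32)`, are not reported as overlapping.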
Optionally, in step A1 above, determining whether an association relationship exists among the first storage instruction, the second calculation instruction, and the third load instruction may include the following steps:
A13. Extract a first write area corresponding to the first storage instruction, a second read area and a second write area corresponding to the second calculation instruction, and a third read area corresponding to the third load instruction;
A14. If no overlapping area exists between any of the first write area, the second read area, the second write area, and the third read area, determine that no association relationship exists among the first storage instruction, the second calculation instruction, and the third load instruction.
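Steps A13 and A14 can be sketched by attaching read and write areas to each instruction and checking every pair of areas. The sketch follows the condition in the text literally, so any overlap between any two areas counts as an association. It assumes all areas lie in one address space and are half-open `(start, end)` pairs; the `Instr` class and the example addresses are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Instr:
    name: str
    reads: list = field(default_factory=list)   # list of (start, end) areas
    writes: list = field(default_factory=list)  # list of (start, end) areas


def areas_overlap(a, b):
    """True if half-open areas [start, end) share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def can_run_in_parallel(instrs):
    """Step A14: no two read/write areas of the given instructions overlap."""
    areas = [area for i in instrs for area in i.reads + i.writes]
    return not any(areas_overlap(a, b) for a, b in combinations(areas, 2))


sa = Instr("Sa", writes=[(0x000, 0x100)])
cb = Instr("Cb", reads=[(0x100, 0x200)], writes=[(0x200, 0x300)])
lc = Instr("Lc", reads=[(0x300, 0x400)])
print(can_run_in_parallel([sa, cb, lc]))
```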
Optionally, the artificial intelligence computing device further includes a storage unit, and the storage unit is connected to an external storage device. In step A2 above, executing the first storage instruction, the second calculation instruction, and the third load instruction in parallel within the first time slice may include the following steps:
B1. Transfer the first calculation result corresponding to the first input data in the first computing task from the storage unit to the external storage device according to the first storage instruction;
B2. Calculate on the second input data in the second computing task according to the second calculation instruction to obtain a second calculation result;
B3. Transfer the third input data in the third computing task from the external storage device to the storage unit according to the third load instruction.
Optionally, the storage unit includes a first storage area and a second storage area. In step B3 above, transferring the third input data in the third computing task from the external storage device to the storage unit according to the third load instruction may include the following step:
Within the first time slice, perform a ping-pong operation on the third input data in the third computing task according to the third load instruction, transferring it from the external storage device to the first storage area.
The storage unit can be divided into a first storage area and a second storage area. When the load instructions in the instruction set of the neural network are executed, a ping-pong operation can be performed so that input data is transferred from the external storage device to the first storage area and the second storage area in turn. Specifically, in the first time slice, the third input data can be stored in the first storage area according to the third load instruction. In the second time slice, the fourth input data can be stored in the second storage area according to the fourth load instruction; at the same time, the third calculation instruction can be executed in parallel, reading the third input data from the first storage area and calculating on it to obtain a calculation result. In the next time slice, the next input data can be stored in the first storage area while the calculation instruction corresponding to the fourth load instruction is executed in parallel, and so on in a cycle. In this way, the storage space of the storage unit can be saved.
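The ping-pong operation described above can be sketched as a two-buffer loop in which each time slice loads into one area while the calculation reads the other. This is a sequential simulation under simplifying assumptions (one load and one calculation per slice, and a hypothetical `compute` callback); on real hardware the two actions in a slice would run concurrently.

```python
def ping_pong(inputs, compute):
    """Simulate loads alternating between two storage areas.

    In slice t the load writes area t % 2 while the calculation reads
    area (t - 1) % 2, so the two never touch the same area.
    Returns the calculation results and a per-slice trace of actions.
    """
    areas = [None, None]
    results, trace = [], []
    for t in range(len(inputs) + 1):
        events = []
        if t >= 1:  # calculate on the data loaded in the previous slice
            src = (t - 1) % 2
            results.append(compute(areas[src]))
            events.append(f"compute from area {src + 1}")
        if t < len(inputs):  # load the next input into the other area
            dst = t % 2
            areas[dst] = inputs[t]
            events.append(f"load into area {dst + 1}")
        trace.append(events)
    return results, trace


results, trace = ping_pong([1, 2, 3, 4], lambda x: x * x)
for t, events in enumerate(trace, start=1):
    print(f"time slice {t}: {', '.join(events)}")
```

Only two areas are needed no matter how many inputs are processed, which is one way to read the storage-saving remark in the text.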
Optionally, the third input data includes a plurality of third input sub-data. Performing the ping-pong operation on the third input data in the third computing task and transferring it from the external storage device to the first storage area may specifically include the following steps:
C1. Estimate a target storage duration in the first storage area for each of the plurality of third input sub-data to obtain a plurality of target storage durations;
C2. Transfer the plurality of third input sub-data corresponding to the plurality of target storage durations to the first storage area in descending order of storage duration, storing them from the two ends of the first storage area toward the middle.
When input data is stored in the first storage area, the closer its storage position is to the middle, the longer it takes to read that data during calculation. Therefore, when the plurality of third input sub-data are stored, the target storage duration of each third input sub-data can be determined first, and the sub-data can then be stored from the two ends of the first storage area toward the middle in descending order of storage duration. In this way, when the third input data is fetched for calculation, the read time of the third input sub-data with longer target storage durations is reduced, which improves computing efficiency.
Similarly, when input data is transferred from the external storage device to the second storage area, it can also be stored from the two ends of the second storage area toward the middle in descending order of storage duration.
For example, when executing the operation Y_i = ∑(w·x_i + b), w and b are read repeatedly, so their storage durations can be determined to be long. w and b can be stored at the two ends of the first storage area or the second storage area, and x_i can be stored in the middle of that area. When data is then read from the first storage area or the second storage area, each read of w and b takes less time, which reduces the time spent reading data.
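Steps C1 and C2 can be sketched as a sort followed by alternating placement at the two ends of the area. The sketch assumes that each sub-data item occupies one equally sized slot and that the estimated durations are already known; the function name `place_ends_to_middle` and the duration values are hypothetical.

```python
def place_ends_to_middle(durations):
    """Lay out items from both ends of a storage area toward the middle.

    durations: dict mapping an item name to its estimated target
    storage duration. Items with longer durations are placed first,
    alternating between the left end and the right end.
    Returns the item names in slot order.
    """
    ordered = sorted(durations, key=durations.get, reverse=True)
    layout = [None] * len(ordered)
    left, right = 0, len(ordered) - 1
    for k, name in enumerate(ordered):
        if k % 2 == 0:
            layout[left] = name
            left += 1
        else:
            layout[right] = name
            right -= 1
    return layout


# w and b are reused for every x_i, so they get the longest durations.
layout = place_ends_to_middle({"w": 10, "b": 9, "x1": 1, "x2": 1, "x3": 1})
print(layout)
```

In this example `w` and `b` land in the two end slots and the `x` items fill the middle, matching the arrangement described for Y_i = ∑(w·x_i + b).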
As another example, as shown in Figure 2C, each horizontal row of load, calculation, and storage instructions corresponds to one computing task. For example, the first computing task may include a first load instruction La, a first calculation instruction Ca, and a first storage instruction Sa. The first load instruction La loads input data from the external storage device into area a1 of the storage unit on the artificial intelligence computing device. The first calculation instruction Ca then reads the input data from area a1, calculates on it, and stores the calculation result in area a2 of the storage unit. Finally, the first storage instruction Sa reads the calculation result from area a2 and transfers it from the storage unit to the external storage device. Similarly, the second computing task may include a second load instruction Lb, a second calculation instruction Cb, and a second storage instruction Sb; the third computing task may include a third load instruction Lc, a third calculation instruction Cc, and a third storage instruction Sc; and the fourth computing task may include a fourth load instruction Ld, a fourth calculation instruction Cd, and a fourth storage instruction Sd.
It can be seen that, if no association relationship exists among the first storage instruction Sa of the first computing task, the second calculation instruction Cb of the second computing task, and the third load instruction Lc of the third computing task, these three instructions can be executed in parallel in the first time slice. In addition, if no association relationship exists among the second, third, and fourth computing tasks, the second storage instruction Sb of the second computing task, the third calculation instruction Cc of the third computing task, and the fourth load instruction Ld of the fourth computing task can be executed in parallel in the second time slice.
Further, suppose the first instruction set, formed by the first storage instruction Sa, the second calculation instruction Cb, and the third load instruction Lc executed in parallel in the first time slice, and the second instruction set, formed by the second storage instruction Sb, the third calculation instruction Cc, and the fourth load instruction Ld executed in parallel in the second time slice, constitute a loop body. When the instructions in the instruction set of the second time slice are executed, a jump instruction can be used to jump to the operation code storage area of the instructions in the first instruction set. Specifically, the first operation code of the third load instruction Lc, the second operation code of the second calculation instruction Cb, and the third operation code of the first storage instruction Sa are obtained from the operation code storage area. The first operation code is then used as the operation code of the fourth load instruction Ld, the second operation code as the operation code of the third calculation instruction Cc, and the third operation code as the operation code of the second storage instruction Sb. In addition, the first operation domain corresponding to the fourth load instruction Ld, the second operation domain corresponding to the third calculation instruction Cc, and the third operation domain corresponding to the second storage instruction Sb can be obtained.
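The folding of a loop body can be illustrated by comparing an unrolled instruction stream with one that stores the operation codes once and jumps back to them for each iteration, pairing them with fresh operation domains. This is only a conceptual sketch: the opcode strings, the operand names, and the way the jump is modeled (resetting a program counter) are assumptions, not the encoding used by the present application.

```python
def unrolled(opcodes, operand_sets):
    """Every iteration carries its own copy of the operation codes."""
    return [(op, operand) for operands in operand_sets
            for op, operand in zip(opcodes, operands)]


def folded(opcodes, operand_sets):
    """Operation codes are stored once; a jump reuses them per iteration."""
    opcode_area = list(opcodes)  # single stored copy of the opcodes
    executed = []
    pc, iteration = 0, 0
    while iteration < len(operand_sets):
        # Pair the stored opcode with this iteration's operation domain.
        executed.append((opcode_area[pc], operand_sets[iteration][pc]))
        pc += 1
        if pc == len(opcode_area):  # jump back to the opcode storage area
            pc = 0
            iteration += 1
    return executed


opcodes = ["LOAD", "COMPUTE", "STORE"]
operand_sets = [
    ("Lc operands", "Cb operands", "Sa operands"),  # first time slice
    ("Ld operands", "Cc operands", "Sb operands"),  # second time slice
]
print(folded(opcodes, operand_sets) == unrolled(opcodes, operand_sets))
```

Both forms execute the same sequence, but the folded form keeps 3 stored opcodes where the unrolled form keeps 6 for two iterations.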
The technical solution provided by the present application folds repeated instructions in the instruction set of the neural network and executes them through jump instructions, which reduces the amount of code that unrolling the repeated instructions would require. It also stores the data of the neural network in separate divided areas, which improves the efficiency of data access and thereby the computing efficiency of the neural network.
The present application also discloses a machine learning computing device that includes one or more of the artificial intelligence computing devices described in the present application. It obtains data to be processed and control information from other processing devices, performs the specified machine learning operations, and passes the results to peripheral devices through an I/O interface. Examples of peripheral devices are cameras, displays, mice, keyboards, network cards, wifi interfaces, and servers. When more than one artificial intelligence computing device is included, the devices can be linked and exchange data through a specific structure, for example interconnected through a PCIE bus, to support larger-scale machine learning operations. In this case, the devices may share one control system or have independent control systems, and they may share memory or each accelerator may have its own memory. Their interconnection may use any interconnection topology.
The machine learning computing device has high compatibility and can be connected to various types of servers through a PCIE interface.
The present application also discloses a combined processing device, which includes the above machine learning computing device, a universal interconnection interface, and other processing devices. The machine learning computing device interacts with the other processing devices to jointly complete the operation specified by the user. Figure 3 is a schematic diagram of the combined processing device.
The other processing devices include one or more types of general-purpose or special-purpose processors, such as a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning computing device and external data and control; they handle data transfer and perform basic control of the machine learning computing device, such as starting and stopping it. The other processing devices can also cooperate with the machine learning computing device to complete computing tasks.
The universal interconnection interface is used to transmit data and control instructions between the machine learning computing device and the other processing devices. The machine learning computing device obtains the required input data from the other processing devices and writes it to the on-chip storage device of the machine learning computing device; it can obtain control instructions from the other processing devices and write them to the on-chip control cache; and it can read data from its storage module and transmit it to the other processing devices.
Optionally, as shown in Figure 4, the structure may further include a storage device connected to both the machine learning computing device and the other processing devices. The storage device stores data of the machine learning computing device and the other processing devices, and is particularly suitable when the data to be processed cannot be held entirely in the internal storage of the machine learning computing device or the other processing devices.
The combined processing device can serve as the SOC (system on chip) of devices such as mobile phones, robots, drones, and video surveillance equipment, which effectively reduces the core area of the control part, increases processing speed, and lowers overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the device, such as cameras, displays, mice, keyboards, network cards, and wifi interfaces.
In some embodiments, a chip is also disclosed, which includes the above machine learning computing device or combined processing device.
In some embodiments, a chip packaging structure is disclosed, which includes the above chip.
In some embodiments, a board is disclosed, which includes the above chip packaging structure. Referring to Figure 5, the board may include, in addition to the chip 389, other supporting components, including but not limited to a memory device 390, an interface device 391, and a control device 392.
The memory device 390 is connected through a bus to the chip in the chip packaging structure and is used to store data. The memory device may include multiple groups of storage units 393, each of which is connected to the chip through a bus. Each group of storage units may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).
DDR doubles the speed of SDRAM without raising the clock frequency, because it allows data to be read on both the rising edge and the falling edge of the clock pulse. DDR is therefore twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of the storage units, and each group may include multiple DDR4 chips. In one embodiment, the chip may contain four 72-bit DDR4 controllers, in which 64 bits are used for data transmission and 8 bits for ECC checking. When DDR4-3200 chips are used in each group of storage units, the theoretical data transmission bandwidth can reach 25600 MB/s.
In one embodiment, each group of the storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice in one clock cycle. A controller for the DDR is provided in the chip to control the data transmission and data storage of each storage unit.
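The 25600 MB/s figure for DDR4-3200 follows from the transfer rate and the 64-bit data width. The short calculation below reproduces it; it assumes the standard DDR4-3200 I/O clock of 1600 MHz and counts only the 64 data bits of a 72-bit controller, since the 8 ECC bits carry no payload. The function name is hypothetical.

```python
def ddr_peak_bandwidth_mb_s(io_clock_mhz, data_bits, transfers_per_clock=2):
    """Peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    mega_transfers_per_s = io_clock_mhz * transfers_per_clock
    return mega_transfers_per_s * data_bits // 8


# DDR4-3200: 1600 MHz I/O clock, two transfers per clock, 64 data bits.
print(ddr_peak_bandwidth_mb_s(1600, 64))  # 25600
```

This is the theoretical peak for one 64-bit channel; sustained bandwidth in practice is lower.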
The interface device is electrically connected to the chip in the chip packaging structure. The interface device is used to transfer data between the chip and an external device (for example, a server or a computer). In one embodiment, the interface device may be a standard PCIE interface: the data to be processed is passed from the server to the chip through the standard PCIE interface. Preferably, when a PCIE 3.0 x16 interface is used, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another kind of interface; the present application does not limit its specific form, as long as the interface unit can implement the transfer function. In addition, the calculation results of the chip are transmitted back to the external device (for example, a server) by the interface device.
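The 16000 MB/s figure can be checked against the PCIe 3.0 signaling rate of 8 GT/s per lane. The sketch below computes both the raw rate and the rate after the 128b/130b line encoding that PCIe 3.0 uses. It suggests that the figure in the text corresponds to the raw rate in one direction, which is an interpretation rather than something the text states. The function name is hypothetical.

```python
def pcie_bandwidth_mb_s(gt_per_s, lanes, payload_bits=128, encoded_bits=130):
    """Return (raw, effective) one-direction bandwidth in MB/s.

    raw: one bit per transfer per lane, before line-encoding overhead.
    effective: raw scaled by payload_bits / encoded_bits.
    """
    raw = gt_per_s * 1000 * lanes / 8
    effective = raw * payload_bits / encoded_bits
    return raw, effective


raw, effective = pcie_bandwidth_mb_s(8, 16)
print(f"raw: {raw:.0f} MB/s, after 128b/130b: {effective:.0f} MB/s")
```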
The control device is electrically connected to the chip and is used to monitor the state of the chip. Specifically, the chip and the control device can be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads; it can therefore be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
In some embodiments, an electronic device is provided, which includes the above board.
The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, driving recorder, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, headphones, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph.
It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application certain steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in one embodiment, refer to the relevant descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and an actual implementation may divide them differently. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
A person of ordinary skill in the art can understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable memory, which may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, or the like.
The embodiments of the present application have been described in detail above, and specific examples have been used to explain the principles and implementations of the present application. The description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, a person of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.
Claims (16)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910226552.7A CN111723920B (en) | 2019-03-22 | 2019-03-22 | Artificial intelligence computing devices and related products |
PCT/CN2020/080447 WO2020192587A1 (en) | 2019-03-22 | 2020-03-20 | Artificial intelligence computing device and related product |
US17/440,529 US11983535B2 (en) | 2019-03-22 | 2020-03-20 | Artificial intelligence computing device and related product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910226552.7A CN111723920B (en) | 2019-03-22 | 2019-03-22 | Artificial intelligence computing devices and related products |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111723920A CN111723920A (en) | 2020-09-29 |
CN111723920B true CN111723920B (en) | 2024-05-17 |
Family
ID=72563777
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910226552.7A Active CN111723920B (en) | 2019-03-22 | 2019-03-22 | Artificial intelligence computing devices and related products |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111723920B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113900509A (en) * | 2021-09-03 | 2022-01-07 | 重庆科创职业学院 | Artificial intelligence computing device |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US2950464A (en) * | 1958-08-29 | 1960-08-23 | Itt | Error detection systems |
US3401376A (en) * | 1965-11-26 | 1968-09-10 | Burroughs Corp | Central processor |
US6282633B1 (en) * | 1998-11-13 | 2001-08-28 | Tensilica, Inc. | High data density RISC processor |
CN101078979A (en) * | 2007-06-29 | 2007-11-28 | 东南大学 | Storage control circuit with multiple-passage instruction pre-fetching function |
CN103957463A (en) * | 2014-05-28 | 2014-07-30 | 谭兆红 | Preschool education high-definition anime playing system |
CN104395876A (en) * | 2012-07-06 | 2015-03-04 | 皇家飞利浦有限公司 | Electric connection system |
US9443192B1 (en) * | 2015-08-30 | 2016-09-13 | Jasmin Cosic | Universal artificial intelligence engine for autonomous computing devices and software applications |
CN107608715A (en) * | 2017-07-20 | 2018-01-19 | 上海寒武纪信息科技有限公司 | For performing the device and method of artificial neural network forward operation |
CN108875926A (en) * | 2017-10-30 | 2018-11-23 | 上海寒武纪信息科技有限公司 | Interaction language translating method and Related product |
CN109062611A (en) * | 2018-02-05 | 2018-12-21 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing vector scaling instruction |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8505002B2 (en) * | 2006-09-29 | 2013-08-06 | Arm Limited | Translation of SIMD instructions in a data processing system |
US20080162399A1 (en) * | 2006-12-31 | 2008-07-03 | Think Passenger, Inc. | Consumer marketing platform |
US20100122066A1 (en) * | 2008-11-12 | 2010-05-13 | Freescale Semiconductor, Inc. | Instruction method for facilitating efficient coding and instruction fetch of loop construct |
US20120079303A1 (en) * | 2010-09-24 | 2012-03-29 | Madduri Venkateswara R | Method and apparatus for reducing power consumption in a processor by powering down an instruction fetch unit |
US8479185B2 (en) * | 2010-12-09 | 2013-07-02 | Oracle International Corporation | Method and system for utilizing parallelism across loops |
US9898289B2 (en) * | 2014-10-20 | 2018-02-20 | International Business Machines Corporation | Coordinated start interpretive execution exit for a multithreaded processor |
US10762164B2 (en) * | 2016-01-20 | 2020-09-01 | Cambricon Technologies Corporation Limited | Vector and matrix computing device |
2019-03-22: CN application CN201910226552.7A, patent CN111723920B (en), status Active
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US2950464A (en) * | 1958-08-29 | 1960-08-23 | Itt | Error detection systems |
US3401376A (en) * | 1965-11-26 | 1968-09-10 | Burroughs Corp | Central processor |
US6282633B1 (en) * | 1998-11-13 | 2001-08-28 | Tensilica, Inc. | High data density RISC processor |
CN101078979A (en) * | 2007-06-29 | 2007-11-28 | 东南大学 | Storage control circuit with multiple-passage instruction pre-fetching function |
CN104395876A (en) * | 2012-07-06 | 2015-03-04 | 皇家飞利浦有限公司 | Electric connection system |
CN103957463A (en) * | 2014-05-28 | 2014-07-30 | 谭兆红 | Preschool education high-definition anime playing system |
US9443192B1 (en) * | 2015-08-30 | 2016-09-13 | Jasmin Cosic | Universal artificial intelligence engine for autonomous computing devices and software applications |
CN107608715A (en) * | 2017-07-20 | 2018-01-19 | 上海寒武纪信息科技有限公司 | Device and method for performing artificial neural network forward operation |
CN107992329A (en) * | 2017-07-20 | 2018-05-04 | 上海寒武纪信息科技有限公司 | Computing method and related product |
CN108875926A (en) * | 2017-10-30 | 2018-11-23 | 上海寒武纪信息科技有限公司 | Interaction language translating method and Related product |
CN109117947A (en) * | 2017-10-30 | 2019-01-01 | 上海寒武纪信息科技有限公司 | Profile testing method and Related product |
CN109062611A (en) * | 2018-02-05 | 2018-12-21 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing vector scaling instruction |
Also Published As
Publication number | Publication date |
---|---|
CN111723920A (en) | 2020-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110096309B (en) | | Operation method, operation device, computer equipment and storage medium |
CN110096310B (en) | | Operation method, operation device, computer equipment and storage medium |
TW201805858A (en) | | Method for performing neural network computation and apparatus |
CN111047022B (en) | | Computing device and related product |
CN111767995B (en) | | Computing methods, devices and related products |
CN111209243B (en) | | Data processing device, method and related product |
CN111047021B (en) | | Computing device and related product |
CN111723920B (en) | | Artificial intelligence computing devices and related products |
CN112052040B (en) | | Processing method, device, computer equipment and storage medium |
CN113033789B (en) | | Bus system, integrated circuit device, board card and order preserving method for order preserving |
CN111832714B (en) | | Computing methods and devices |
CN109739514B (en) | | Parameter processing method and related product |
CN111813449A (en) | | Computing method, device and related products |
CN111275197B (en) | | Operation method, device, computer equipment and storage medium |
CN111723921B (en) | | Artificial intelligence computing device and related products |
US20220156077A1 (en) | | Artificial intelligence computing device and related product |
CN118012505A (en) | | Artificial intelligence processors, integrated circuit chips, boards, electronic devices |
CN112395009A (en) | | Operation method, operation device, computer equipment and storage medium |
CN111290789B (en) | | Operation method, operation device, computer equipment and storage medium |
CN111382850A (en) | | Operation method, device and related product |
CN111338694B (en) | | Operation method, device, computer equipment and storage medium |
CN111026440B (en) | | Operation method, operation device, computer equipment and storage medium |
CN111260045B (en) | | Decoder and atomic instruction analysis method |
CN111124497B (en) | | Operation method, operation device, computer equipment and storage medium |
CN111399905B (en) | | Operation method, device and related product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TG01 | Patent term adjustment | ||