CN112396186B - Execution method, execution device and related product - Google Patents
- Publication number
- CN112396186B CN112396186B CN201910740813.7A CN201910740813A CN112396186B CN 112396186 B CN112396186 B CN 112396186B CN 201910740813 A CN201910740813 A CN 201910740813A CN 112396186 B CN112396186 B CN 112396186B
- Authority
- CN
- China
- Prior art keywords
- data
- execution
- executed
- address
- instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present disclosure relates to an execution method, an execution device, and related products. The machine learning execution device comprises one or more instruction processing devices, configured to obtain data to be operated on and control information from other processing devices, perform specified machine learning operations, and transmit the execution results to the other processing devices through an I/O interface. When the machine learning execution device includes a plurality of instruction processing devices, the instruction processing devices can be connected to one another through a specific structure and transmit data: they are interconnected, and transmit data, through a PCIe (Peripheral Component Interconnect Express) bus; they share a single control system or have their own control systems, and share memory or have their own memories; and their interconnection may be any interconnection topology. The execution method, execution device, and related products provided by embodiments of the present disclosure have a wide application range and high processing efficiency and speed for gather instructions.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an execution method, an execution device, and related products.
Background
With the continuous development of technology, machine learning, and neural network algorithms in particular, is increasingly widely used, with strong results in fields such as image recognition, speech recognition, and natural language processing. As the complexity of neural network algorithms grows, however, so do the variety and number of data operations involved. In the related art, gather operations on data are performed with low efficiency and low speed.
Disclosure of Invention
In view of this, the present disclosure proposes an execution method, an execution device, and related products to improve the efficiency and speed of gather operations on data.
According to a first aspect of the present disclosure, there is provided a gather instruction processing apparatus, the apparatus comprising:
a control module configured to parse an obtained gather instruction to obtain the opcode and operation field of the gather instruction, and to obtain, according to the opcode and the operation field, at least one piece of index data, at least one piece of data to be operated on, and the target address required to execute the gather instruction; and
an execution module configured to determine selected data from the data to be operated on according to the index data, and to store the selected data and the count of the selected data, as the execution result of the gather instruction, at the target address,
wherein the opcode indicates that the operation the gather instruction performs on the data is a gather operation, and the operation field includes the address of the data to be operated on, the index data address, and the target address.
According to a second aspect of the present disclosure, there is provided a machine learning execution device, the device comprising:
one or more of the gather instruction processing apparatuses of the first aspect of the present disclosure, configured to obtain data to be operated on and control information from other processing devices, perform specified machine learning operations, and transmit execution results to the other processing devices through an I/O interface;
when the machine learning execution device comprises a plurality of the gather instruction processing apparatuses, the gather instruction processing apparatuses can be connected to one another through a specific structure and transmit data;
wherein the gather instruction processing apparatuses are interconnected, and transmit data, through a PCIe (Peripheral Component Interconnect Express) bus to support machine learning operations at a larger scale; the plurality of gather instruction processing apparatuses share a single control system or have their own control systems; they share memory or have their own memories; and their interconnection may be any interconnection topology.
According to a third aspect of the present disclosure, there is provided a combination processing apparatus, the apparatus comprising:
the machine learning execution device of the second aspect, a universal interconnect interface, and another processing device;
wherein the machine learning execution device interacts with the other processing device to jointly complete computing operations specified by the user.
According to a fourth aspect of the present disclosure, there is provided a machine learning chip including the machine learning execution device of the second aspect or the combination processing apparatus of the third aspect.
According to a fifth aspect of the present disclosure, there is provided a machine learning chip package structure including the machine learning chip of the fourth aspect described above.
According to a sixth aspect of the present disclosure, there is provided a board card including the machine learning chip package structure of the fifth aspect.
According to a seventh aspect of the present disclosure, there is provided an electronic device including the machine learning chip described in the fourth aspect or the board described in the sixth aspect.
According to an eighth aspect of the present disclosure, there is provided a gather instruction processing method applied to a gather instruction processing apparatus, the method comprising:
parsing an obtained gather instruction to obtain the opcode and operation field of the gather instruction, and obtaining, according to the opcode and the operation field, at least one piece of index data, at least one piece of data to be operated on, and the target address required to execute the gather instruction; and
determining selected data from the data to be operated on according to the index data, and storing the selected data and the count of the selected data, as the execution result of the gather instruction, at the target address,
wherein the opcode indicates that the operation the gather instruction performs on the data is a gather operation, and the operation field includes the address of the data to be operated on, the index data address, and the target address.
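The parse-select-store flow of the method above can be sketched end to end as follows. This is a minimal Python illustration over a dictionary-addressed memory; the addresses, the nonzero preset condition, and the function name are assumptions for illustration, not part of the disclosed hardware.

```python
def execute_gather(memory, src_addr, index_addr, target_addr, length):
    """Core of the gather method: read operands, select by index, store result."""
    data = [memory[src_addr + i] for i in range(length)]     # data to be operated on
    index = [memory[index_addr + i] for i in range(length)]  # index data
    # select the data whose index satisfies the preset condition (nonzero here)
    selected = [d for i, d in zip(index, data) if i != 0]
    # store the count at the target address, then the selected data after it
    memory[target_addr] = len(selected)
    for off, v in enumerate(selected, start=1):
        memory[target_addr + off] = v
    return memory

mem = {0x100 + i: v for i, v in enumerate([10, 11, 12, 13])}
mem.update({0x200 + i: v for i, v in enumerate([0, 1, 0, 1])})
execute_gather(mem, 0x100, 0x200, 0x300, 4)
# mem[0x300] == 2 (count), mem[0x301] == 11, mem[0x302] == 13
```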
In some embodiments, the electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle comprises an aircraft, a ship, and/or a car; the household appliance comprises a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and/or a range hood; the medical device comprises a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph.
Embodiments of the present disclosure provide a gather instruction processing method, apparatus, and related products; the apparatus comprises a control module and an execution module. The control module is configured to parse an obtained gather instruction to obtain the opcode and operation field of the gather instruction, and to obtain, according to the opcode and the operation field, at least one piece of index data, at least one piece of data to be operated on, and the target address required to execute the gather instruction. The execution module is configured to determine selected data from the data to be operated on according to the index data, and to store the selected data and the count of the selected data, as the execution result of the gather instruction, at the target address.
The gather instruction processing method, apparatus, and related products provided by embodiments of the present disclosure have a wide application range and high processing efficiency and speed for gather instructions.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 illustrates a block diagram of a gather instruction processing apparatus according to an embodiment of the present disclosure;
FIG. 2 illustrates a block diagram of a gather instruction processing apparatus according to an embodiment of the present disclosure;
FIGS. 2a to 2e illustrate block diagrams of a gather instruction processing apparatus according to an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of an application scenario of a gather instruction processing apparatus according to an embodiment of the present disclosure;
FIGS. 4a, 4b show block diagrams of a combined processing apparatus according to an embodiment of the present disclosure;
FIG. 5 shows a schematic structural view of a board card according to an embodiment of the present disclosure;
fig. 6 shows a flowchart of a gather instruction processing method according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
Fig. 1 shows a block diagram of a gather instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 1, the apparatus includes a control module 11 and an execution module 12.
The control module 11 is configured to parse the obtained gather instruction to obtain the opcode and operation field of the gather instruction, and to obtain, according to the opcode and the operation field, at least one piece of index data, at least one piece of data to be operated on, and the target address required to execute the gather instruction. The opcode indicates that the operation the gather instruction performs on the data is a gather operation, and the operation field includes the address of the data to be operated on, the index data address, and the target address.
The execution module 12 is configured to determine selected data from the data to be operated on according to the index data, and to store the selected data and the count of the selected data, as the execution result of the gather instruction, at the target address.
In this embodiment, the control module may obtain the at least one piece of data to be operated on and the at least one piece of index data from the data address and the index data address, respectively. The control module may receive the gather instruction, the data to be operated on, and the index data through a data input/output unit, which may be one or more data I/O interfaces or I/O pins.
In this embodiment, the opcode may be the portion of an instruction or field (usually represented by a code) that specifies the operation to be performed; it identifies to the device executing the instruction which operation is required. The operation field may be the source of all data required to execute the corresponding instruction, including parameter data, the data to be operated on, and the corresponding processing method, or the addresses at which these are stored. A gather instruction must include an opcode and an operation field, where the operation field includes at least the address of the data to be operated on, the index data address, and the target address.
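As an illustration of the opcode/operation-field split described above, the following sketch models a gather instruction as an opcode plus three addresses. The field names and the tuple encoding are hypothetical; the disclosure does not fix a concrete instruction format.

```python
from dataclasses import dataclass

@dataclass
class GatherInstruction:
    opcode: str       # identifies the operation to perform (a gather here)
    src_addr: int     # address of the data to be operated on
    index_addr: int   # address of the index data
    target_addr: int  # address where the execution result is stored

def parse(fields):
    """Split a raw instruction tuple into its opcode and operation field."""
    opcode, *operands = fields
    if opcode != "GATHER":
        raise ValueError(f"unsupported opcode: {opcode}")
    return GatherInstruction(opcode, *operands)

inst = parse(("GATHER", 0x1000, 0x2000, 0x3000))
# inst.opcode == "GATHER", inst.index_addr == 0x2000
```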
It should be appreciated that the instruction format of the gather instruction, as well as the opcode and operation fields contained, may be set by those skilled in the art as desired, and this disclosure is not limited in this regard.
Optionally, as shown in FIG. 2, the execution module includes one or more comparators, selectors, and counters: the one or more comparators compare the index data against a preset condition and determine whether the index data satisfies it; the one or more selectors take, when a piece of index data satisfies the preset condition, the corresponding data to be operated on as selected data; and the counter determines the count of the selected data.
Alternatively, the preset condition may be that the index data is not zero.
In this implementation, when the index data is nonzero, the count of the nonzero index data and the data to be operated on corresponding to the nonzero index data are stored, in order, at the first address and the second address of the target address. The preset condition may also be that the index data is not a specified value, where the specified value may be, for example, 1. The preset condition can be set by those skilled in the art according to actual needs; the present disclosure is not limited in this regard.
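A minimal sketch of the nonzero preset condition, assuming list-valued index data and data to be operated on (the function name is illustrative):

```python
def gather_nonzero(index_data, data_to_operate):
    # preset condition: the index value is nonzero; the matching elements of
    # the data to be operated on become the selected data
    selected = [d for i, d in zip(index_data, data_to_operate) if i != 0]
    return len(selected), selected

count, selected = gather_nonzero([0, 3, 0, 7], [10, 11, 12, 13])
# count == 2, selected == [11, 13]
```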
In this implementation, the condition or index data may be preset as needed to store the data required in the data to be executed to the target address. For example, to collect the data to be executed according to different collection needs, different preset conditions may be set, or different index data may be set to implement different collection of the data to be executed.
Alternatively, the data to be operated on may be tensor data, the index data may be tensor data corresponding to it, and the amount of index data may be greater than or equal to the amount of data to be operated on. In that case a preset mapping relationship exists between the index data and the data to be operated on, so that the execution module can select the selected data according to the index data and the mapping. In particular, when the amounts are equal, the index data corresponds one-to-one with the data to be operated on, and a piece of data to be operated on is taken as selected data when its index data satisfies the preset condition.
The tensor data may be neural network data, such as the neuron data or weight data of a neural network. Tensor data is data of zero or more dimensions: 0-dimensional tensor data is scalar data, 1-dimensional tensor data is vector data, and 2-dimensional tensor data may be matrix data, and so on. In other embodiments, the index data and the data to be operated on may also be scalar data or the like; this is not particularly limited here.
Further optionally, the index data may take a one-bit value, i.e., 0 or 1. When the value of a piece of index data is 0, the corresponding data to be operated on is discarded; when the value is 1, the corresponding data to be operated on is taken as selected data.
Alternatively, the target address in the instruction's operation field may be a start address, divided into a first address for storing the count of the selected data and a second address for storing the selected data. The execution module may determine the size of the memory space required for the execution result from the start address and the size of the execution result, and store the execution result in that space.
Optionally, the target address includes a first address for storing the count of the selected data and a second address for storing the selected data, where the size of the address space pointed to by the first address is smaller than or equal to that pointed to by the second address. For example, the target address may comprise a plurality of row addresses, where the memory space pointed to by the first row address stores the count and the memory spaces pointed to by the remaining row addresses store the selected data.
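The first-address/second-address layout can be illustrated as follows, assuming a dictionary-addressed memory where the count occupies one word at the target address and the selected data follow it:

```python
def store_result(memory, target_addr, selected):
    # first address: the count of the selected data (a single word, so its
    # space is no larger than the space needed for the data itself)
    memory[target_addr] = len(selected)
    # second address onward: the selected data, in order
    for offset, value in enumerate(selected, start=1):
        memory[target_addr + offset] = value
    return memory

mem = store_result({}, 0x100, [11, 13])
# mem == {0x100: 2, 0x101: 11, 0x102: 13}
```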
Alternatively, the operation field may contain one or more source addresses of the data to be operated on and one or more destination addresses, with one piece of data to be operated on per source address. Each destination address includes a first address for storing the count of the selected data and a second address for storing the selected data, where the address space pointed to by the first address is no larger than that pointed to by the second address. That is, in this embodiment the apparatus may include one or more control modules and one or more execution modules, whose numbers may be set according to actual needs; the present disclosure is not limited in this regard.
In other alternative embodiments, the execution result may include only the selected data, with the target address accordingly pointing to the memory space for the selected data.
Embodiments of the present disclosure provide a gather instruction processing apparatus comprising a control module and an execution module. The control module is configured to parse an obtained gather instruction to obtain its opcode and operation field, and to obtain, according to them, at least one piece of index data, at least one piece of data to be operated on, and the target address required to execute the gather instruction. The execution module is configured to determine selected data from the data to be operated on according to the index data, and to store the selected data and their count, as the execution result of the gather instruction, at the target address. The execution module comprises one or more comparators, which compare the index data against a preset condition and determine whether it is satisfied, and one or more selectors, which take the data to be operated on corresponding to satisfying index data as the selected data.
In one possible implementation, the apparatus further includes a storage module configured to store the at least one piece of index data, the at least one piece of data to be operated on, and the preset condition.
The collection instruction processing device provided by the embodiment of the disclosure is wide in application range, high in processing efficiency and high in processing speed for collection instructions.
Fig. 2a shows a block diagram of a gather instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in FIG. 2a, the execution module 12 may include a master execution sub-module 121 containing a counter and at least one slave execution sub-module 122 containing a comparator and a selector. The execution module may further include a data access circuit, which may fetch the data to be operated on from the storage module and store the execution result back into it. Alternatively, the data access circuit may be a direct memory access module.
The control module 11 is further configured to parse the collection instruction to obtain at least one execution instruction, and send the at least one index data, the at least one data to be executed, and the at least one execution instruction to the main execution sub-module 121.
The comparators of the slave execution sub-modules 122 are configured to compare the index data against the preset condition and determine whether it is satisfied. The selectors of the slave execution sub-modules take, when a piece of index data satisfies the preset condition, the corresponding data to be operated on as selected data and send it to the master execution sub-module.
In the master execution sub-module 121, the counter determines the count of the selected data, and the sub-module stores the count and the selected data at the target address.
Alternatively, the target address may include a first address and a second address, where the first address points to the storage space for the count of the selected data, the second address points to the storage space for the selected data, and the former space is no larger than the latter.
Fig. 2b shows a block diagram of a gather instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2b, the execution module 12 may further include one or more branch execution sub-modules 123, where the branch execution sub-modules 123 are configured to forward data and/or execution instructions between the master execution sub-module 121 and the slave execution sub-module 122. Wherein the main execution sub-module 121 is connected to one or more branch execution sub-modules 123.
Fig. 2c shows a block diagram of a gather instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in FIG. 2c, at least one slave execution sub-module 122 is distributed in an array.
Each of the slave execution sub-modules 122 is connected to other adjacent slave execution sub-modules 122, and the master execution sub-module 121 is connected to k slave execution sub-modules 122 among the plurality of slave execution sub-modules 122, where the k slave execution sub-modules 122 are: n slave execution sub-modules 122 of row 1, n slave execution sub-modules 122 of row m, and m slave execution sub-modules 122 of column 1.
As shown in FIG. 2c, the k slave execution sub-modules comprise exactly the n slave execution sub-modules of row 1, the n slave execution sub-modules of row m, and the m slave execution sub-modules of column 1; that is, they are the slave execution sub-modules, among the plurality, that are directly connected to the master execution sub-module. These k slave execution sub-modules forward data and instructions between the master execution sub-module and the remaining slave execution sub-modules.
Fig. 2d shows a block diagram of a gather instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in FIG. 2d, the execution module may also include a tree submodule 124. The tree submodule 124 includes a root port 401 and a plurality of branch ports 402. The root port 401 is connected to the master execution sub-module 121, and the plurality of branch ports 402 are connected to the plurality of slave execution sub-modules 122, respectively.
The tree submodule 124 has a transceiver function and is used for forwarding data and/or execution instructions between the master execution submodule 121 and the slave execution submodule 122.
In one possible implementation, the tree submodule 124 is an optional component of the apparatus and may include at least one layer of nodes. A node is a wiring structure with a forwarding function and has no execution function of its own. The lowest-level nodes are connected to the slave execution sub-modules to forward data and/or operation instructions between the master execution sub-module 121 and the slave execution sub-modules 122. In particular, if the tree submodule has zero layers of nodes, the apparatus does not require a tree submodule.
In one possible implementation, tree submodule 124 may include a plurality of nodes of an n-ary tree structure, which may have a plurality of layers.
For example, fig. 2e shows a block diagram of a gather instruction processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 2e, the n-ary tree structure may be a binary tree structure, the tree submodule comprising two layers of nodes 01. The lowest-level nodes 01 are connected to the slave execution sub-modules 122 to forward data and/or operation instructions between the master execution sub-module 121 and the slave execution sub-modules 122.
In this implementation, the n-ary tree structure may also be a ternary tree structure or the like, where n is a positive integer greater than or equal to 2. The value of n and the number of layers of nodes in the n-ary tree structure can be set as desired by those skilled in the art; the present disclosure is not limited in this regard.
In one possible implementation, the master execution sub-module 121 may include one or more comparators for performing comparison operations in pre-processing and/or post-processing.
In this implementation, the numbers of comparators and selectors in the slave execution sub-modules may be set according to the volume of data to be compared and the required processing speed and efficiency of the comparison operation; the present disclosure is not limited in this regard. For example, taking the preset condition that the index data is nonzero, a comparator compares the index data with 0 to obtain a comparison result; the slave execution sub-module then determines, from the comparison result, the data to be operated on corresponding to nonzero index data as the selected data and transmits it to the master execution sub-module. The master execution sub-module determines the count of the selected data and stores the count and the selected data, in order, at the first address and the second address of the target address.
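The comparator/selector division of labor between the slave and master sub-modules might be sketched as below, assuming the index data and data to be operated on are pre-split into per-slave chunks; the function names and the chunking are illustrative assumptions.

```python
def slave_compare_select(index_chunk, data_chunk, predicate):
    # each slave sub-module's comparator tests the preset condition, and its
    # selector forwards only the matching data to the master sub-module
    return [d for i, d in zip(index_chunk, data_chunk) if predicate(i)]

def master_collect(chunks):
    # the master sub-module's counter tallies the selected data, which are
    # then stored (count first, data second) at the target address
    selected = [d for chunk in chunks for d in chunk]
    return len(selected), selected

chunks = [
    slave_compare_select([0, 5], [10, 11], lambda i: i != 0),
    slave_compare_select([2, 0], [12, 13], lambda i: i != 0),
]
count, selected = master_collect(chunks)
# count == 2, selected == [11, 12]
```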
In one possible implementation, the operation field may also include a read-in amount or the storage address of a read-in amount. The control module 11 is further configured to obtain the read-in amount and obtain the at least one piece of data to be operated on according to it, where the data volume of the data to be operated on is smaller than or equal to the read-in amount, which in turn is smaller than or equal to the data volume of the at least one piece of index data.
In this implementation, the read-in amount is the data volume of the at least one piece of data to be operated on to acquire, i.e., the size of the fetched data. When the operation field directly contains a specific value for the read-in amount, that value is the read-in amount; when it contains a storage address of the read-in amount, the read-in amount is fetched from that address.
In one possible implementation, when the operation field does not include a read-in amount, the at least one piece of data to be operated on may be acquired according to a preset default read-in amount. The acquired data volume is smaller than or equal to the default read-in amount, which is smaller than or equal to the data volume of the at least one piece of index data.
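A sketch of how the read-in amount (or a default) could bound the fetch; the default value and the clamping against the index data volume are illustrative assumptions, since the disclosure leaves the concrete value open.

```python
DEFAULT_READ_IN = 4  # assumed default; the disclosure does not fix a value

def fetch_data_to_operate(source, index_data, read_in=None):
    # the fetched amount never exceeds the read-in amount, which in turn
    # never exceeds the amount of index data
    limit = read_in if read_in is not None else DEFAULT_READ_IN
    limit = min(limit, len(index_data))
    return source[:limit]

fetched = fetch_data_to_operate([10, 11, 12, 13, 14], [1, 0, 1], read_in=4)
# limit = min(4, 3) = 3, so fetched == [10, 11, 12]
```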
In this implementation, the data volume of the at least one piece of data to be operated on, the data volume of the at least one piece of index data, and the storage capacity of the target address may all be the same, equal to the read-in amount or the default read-in amount.
In this way, the execution module can store the data to be operated on corresponding to index data satisfying the preset condition into the target address in order, avoiding problems such as insufficient or wasted target-address space.
In one possible implementation, as shown in FIGS. 2a to 2e, the apparatus may further comprise a storage module 13 configured to store the at least one piece of index data, the at least one piece of data to be operated on, and the preset condition.
In this implementation, the storage module may include one or more of a memory, a cache, and registers, and the cache may include a scratchpad cache. The index data, the data to be operated on, and the preset condition may be stored in the memory, cache, and/or registers as needed; the present disclosure is not limited in this regard.
In one possible implementation, the apparatus may further include a direct memory access module for reading or storing data from the storage module.
In one possible implementation, as shown in fig. 2a-2e, the control module 11 may include an instruction storage sub-module 111, an instruction processing sub-module 112, and a queue storage sub-module 113.
The instruction storage sub-module 111 is used for storing the collection instruction.
The instruction processing sub-module 112 is configured to parse the collected instruction to obtain an operation code and an operation domain of the collected instruction.
The queue storage submodule 113 is configured to store an instruction queue including a plurality of collection instructions sequentially arranged in order of execution.
In this implementation, the execution order of the plurality of collected instructions may be arranged according to the reception time, priority level, and the like of the collected instructions to obtain an instruction queue, so that the plurality of collected instructions are sequentially executed according to the instruction queue.
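The ordering described above can be sketched in software as follows. This is an illustrative model only; the record layout, the field names, and the convention that a lower priority value is more urgent are assumptions for illustration, not the patent's hardware interface.

```python
from collections import namedtuple

# Hypothetical record for a pending gather/collect instruction.
Instr = namedtuple("Instr", "name priority recv_time")

pending = [Instr("g2", priority=1, recv_time=2),
           Instr("g0", priority=0, recv_time=0),
           Instr("g1", priority=0, recv_time=1)]

# Arrange the instruction queue by priority level, breaking ties by
# reception time, so instructions are then executed in queue order.
queue = sorted(pending, key=lambda i: (i.priority, i.recv_time))
# queue order: g0, g1, g2
```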
In one possible implementation, as shown in fig. 2a-2e, the execution module 12 may include a dependency processing sub-module 122.
The dependency relationship processing sub-module 122 is configured to cache the first collection instruction in the instruction storage sub-module 111 when it determines that the first collection instruction has an association relationship with a zeroth collection instruction preceding it, and, after execution of the zeroth collection instruction is completed, to extract the first collection instruction from the instruction storage sub-module 111 and send it to the execution module 12. The first collection instruction and the zeroth collection instruction are both instructions among the plurality of collection instructions.
Wherein the association relationship between the first collection instruction and the zeroth collection instruction preceding it means that: the first storage address interval storing the data required by the first collection instruction and the zeroth storage address interval storing the data required by the zeroth collection instruction have an overlapping area. Conversely, when the first storage address interval and the zeroth storage address interval have no overlapping area, the first collection instruction and the zeroth collection instruction have no association relationship.
By the method, the following collection instructions can be executed after the previous collection instructions are executed according to the dependency relationship among the collection instructions, and the accuracy of the execution results is ensured.
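The dependency test described above can be sketched as a simple interval-overlap check. This is a minimal software model under an assumed representation (half-open `(start, end)` address intervals), not the patent's hardware logic.

```python
def intervals_overlap(first, zeroth):
    """Return True when two storage address intervals overlap.

    Each interval is a (start, end) pair with start <= end, treated as
    half-open [start, end). Overlap means the first collection instruction
    depends on the zeroth and must wait for it to finish.
    """
    return first[0] < zeroth[1] and zeroth[0] < first[1]

# Overlapping intervals -> association relationship (must wait):
assert intervals_overlap((100, 200), (150, 250))
# Disjoint intervals -> no association relationship (may proceed):
assert not intervals_overlap((100, 200), (200, 300))
```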
In one possible implementation, the instruction format of the gather instruction may be:
collect dst,src0,src1,size
Wherein collect is the operation code of the gather instruction, and dst, src0, src1, and size are its operation fields. dst is the target address, which includes a first address and a second address; src0 is the data address to be executed; src1 is the index data address; and size is the read-in amount. According to the parsed instruction, the execution module can acquire size index data and data to be executed from the storage module, and when index data satisfies the preset condition, take the data to be executed corresponding to that index data as the selected data. The execution module may further determine the number of selected data and store the selected data and their number in the memory space pointed to by the target address. Alternatively, the value of the index data may also be a bit value represented by 0 or 1.
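The semantics of the gather instruction described above can be modeled in software as follows. This is a hedged sketch: the function name and the return convention (count first, then the selected values, modeling the first and second addresses of dst) are illustrative assumptions, not the patent's actual interface.

```python
def gather(data_to_be_executed, index_data):
    """Model of the collect/gather semantics.

    Selects each data item whose corresponding index datum satisfies the
    preset condition (here: is not 0). Returns (count, selected), where
    'count' models the value stored at the first address of dst and
    'selected' the values stored at the second address, in order.
    """
    selected = [d for d, i in zip(data_to_be_executed, index_data) if i != 0]
    return len(selected), selected

# With data 1, 5, 6, 7, 3 and index data 1, 8, 0, 6, 9, the third index
# datum is 0, so 6 is skipped and four items are selected.
count, selected = gather([1, 5, 6, 7, 3], [1, 8, 0, 6, 9])
# count == 4, selected == [1, 5, 7, 3]
```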
When the data to be operated on is plural, src0 may include one data address to be operated on, src00, or plural data addresses to be operated on, src00, src01, src02, …, src0n, where one data address to be operated on may contain one data to be operated on or a set of data to be operated on, which is not limited in this disclosure. When the data to be operated on is plural, the index data may be one or plural. The index data may be the same number of data of the same data type as the data to be operated on, in which case src1 may include plural index data addresses src10, src11, src12, …, src1n. The index data may also be bit data, in which case src1 is bit data whose bit width equals the number of data to be operated on, which is not limited by the present disclosure.
When the data to be operated is plural, the instruction format may include plural data addresses to be operated and one or more index data addresses, and taking the case of including two data to be operated as an example, the instruction format of the gather instruction may be:
collect dst,src00,src01,src1,size
The instruction format of the gather instruction may also be:
collect dst,src00,src01,src10,src11,size
It should be appreciated that one skilled in the art may set the operation code, and the positions of the operation code and the operation field within the instruction format of the gather instruction, as desired; this disclosure is not limited in this regard.
In one possible implementation, the apparatus may be provided in one or more of a graphics processing unit (Graphics Processing Unit, GPU for short), a central processing unit (Central Processing Unit, CPU for short), and an embedded neural network processor (Neural-network Processing Unit, NPU for short).
It should be noted that, although the above embodiment has been described as an example of the gather instruction processing apparatus, those skilled in the art will understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each module according to personal preference and/or actual application scene, so long as the technical scheme of the disclosure is met.
Application example
An application example according to an embodiment of the present disclosure is given below in conjunction with "data collection with a collection instruction processing apparatus" as one exemplary application scenario, in order to facilitate understanding of the flow of the collection instruction processing apparatus. It will be appreciated by those skilled in the art that the following application examples are for purposes of facilitating understanding of the embodiments of the present disclosure only and should not be construed as limiting the embodiments of the present disclosure.
Fig. 3 shows a schematic diagram of an application scenario of a gather instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the procedure of the gather instruction processing apparatus processing the gather instruction is as follows:
The control module 11 parses the acquired collection instruction 1 (for example, collection instruction 1 is: select 500 100 200 5) to obtain the operation code and the operation field of collection instruction 1: the operation code is select, the target address is 500 (which includes a first address and a second address), the data address to be executed is 100, the index data address is 200, and the read-in amount is 5. The control module 11 then acquires, with a read-in amount of 5, a plurality of data to be executed and a plurality of index data from the data address to be executed 100 and the index data address 200, respectively.
It is assumed that the acquired plurality of data to be executed is 1, 5, 6, 7, 3 and the plurality of index data is 1, 8, 0, 6, 9. The preset condition is that the index data is not 0.
The comparator of the execution module 12 determines whether each of the plurality of index data is 0, and the data to be executed corresponding to index data other than 0 are sequentially stored in the target address 500. Specifically, the execution module 12 determines whether each of the plurality of index data "1, 8, 0, 6, 9" is 0; because the third index data is 0, the selected data "1, 5, 7, 3" among the plurality of data to be executed, which are determined to satisfy the preset condition, are sequentially stored in the second address of the target address 500. The counter of the execution module 12 then determines from the selected data stored in the second address that the number of selected data is 4, and stores this number in the first address of the target address 500. The operation of the above modules may be understood with reference to the relevant description above. As shown in fig. 3, the first address may be the memory space pointed to by the first row of the target address, and the second address may be the memory space pointed to by the other rows of the target address.
In one possible implementation manner, the obtained index data may be a bit value 11011, where each bit of the bit value corresponds to one data to be executed, and the preset condition is that the value of the bit of the index data corresponding to the data to be executed is not 0.
The comparator of the execution module 12 determines whether each bit of the index data is 0, and the selector of the execution module sequentially stores the data to be executed corresponding to the bits of the index data that are not 0 in the target address 500. Specifically, the comparator of the execution module 12 determines whether each bit of the index data "1, 1, 0, 1, 1" is 0; since the third bit of the index data is 0, the selector sequentially stores the selected data "1, 5, 7, 3" among the plurality of data to be executed, which are determined to satisfy the preset condition, into the second address of the target address 500, and the counter of the execution module 12 determines from the selected data stored in the second address that the number of selected data is 4, and stores this number into the first address of the target address 500. As shown in fig. 3, the first address may be the memory space pointed to by the first row of the target address, and the second address may be the memory space pointed to by the other rows of the target address.
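The bit-value index variant described above can be sketched as follows. This is an assumed software model: the bit ordering (most-significant bit selects the first datum) and the function name are illustrative choices, not mandated by the patent.

```python
def gather_bits(data_to_be_executed, index_bits):
    """Model of the gather semantics with a bit-valued index.

    Each bit of 'index_bits' corresponds to one datum (most-significant
    bit first); a datum is selected when its bit is not 0. Returns
    (count, selected), modeling the first and second addresses of dst.
    """
    n = len(data_to_be_executed)
    bits = [(index_bits >> (n - 1 - k)) & 1 for k in range(n)]
    selected = [d for d, b in zip(data_to_be_executed, bits) if b]
    return len(selected), selected

# Index bit value 11011: the third bit is 0, so 6 is skipped.
count, selected = gather_bits([1, 5, 6, 7, 3], 0b11011)
# count == 4, selected == [1, 5, 7, 3]
```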
Thus, the collection instruction processing device can efficiently and quickly process the collection instruction.
The present disclosure provides a machine learning execution device that may include one or more of the above-described collection instruction processing devices, used to acquire data and control information to be executed from other processing devices and to perform specified machine learning execution. The machine learning execution device may obtain the collection instruction from other machine learning execution devices or non-machine-learning execution devices, and transmit the execution result to peripheral devices (also referred to as other processing devices) through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces, and servers. When more than one collection instruction processing device is included, the collection instruction processing devices may be linked through a specific structure and transmit data through it, for example interconnected and transmitting data through a PCIE bus, so as to support larger-scale neural network execution. In this case, the devices may share the same control system or have independent control systems, and may share memory or each have its own memory. In addition, the interconnection mode may be any interconnection topology.
The machine learning execution device has higher compatibility and can be connected with various types of servers through PCIE interfaces.
Fig. 4a shows a block diagram of a combined processing apparatus according to an embodiment of the disclosure. As shown in fig. 4a, the combined processing device includes the machine learning execution device, the universal interconnect interface, and other processing devices described above. The machine learning execution device interacts with other processing devices to jointly complete the operation designated by the user.
Other processing means may include one or more types of general-purpose/special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs), neural network processors, etc. The number of processors included in the other processing means is not limited. The other processing means serve as an interface between the machine learning execution device and external data and control, including data transfer, and complete basic control of the machine learning execution device such as starting and stopping; the other processing means may also cooperate with the machine learning execution device to complete execution tasks.
And the universal interconnection interface is used for transmitting data and control instructions between the machine learning execution device and other processing devices. The machine learning execution device acquires required input data from other processing devices and writes the required input data into a storage device on a machine learning execution device chip; the control instruction can be obtained from other processing devices and written into a control cache on a machine learning execution device; the data in the memory module of the machine learning execution device may also be read and transmitted to other processing devices.
Fig. 4b shows a block diagram of a combined processing apparatus according to an embodiment of the disclosure. In a possible implementation, as shown in fig. 4b, the combined processing device may further comprise a storage device, which is connected to the machine learning execution device and the other processing device, respectively. The storage device is used for storing the data of the machine learning execution device and the other processing devices, and is especially suitable for the data which is needed to be executed and cannot be stored in the internal storage of the machine learning execution device or the other processing devices.
The combined processing device can be used as an SOC (system on chip) of equipment such as mobile phones, robots, unmanned aerial vehicles, and video monitoring equipment, effectively reducing the core area of the control part, improving the processing speed, and reducing the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as cameras, displays, mice, keyboards, network cards, and Wi-Fi interfaces.
The present disclosure provides a machine learning chip including the machine learning execution device or the combination processing device described above.
The present disclosure provides a machine learning chip packaging structure including the machine learning chip described above.
The present disclosure provides a board card, and fig. 5 shows a schematic structural diagram of the board card according to an embodiment of the present disclosure. As shown in fig. 5, the board card includes the above machine learning chip package structure or the above machine learning chip. In addition to including machine learning chip 389, the board card may include other kits including, but not limited to: a memory device 390, an interface device 391 and a control device 392.
The memory device 390 is connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure) via a bus for storing data. Memory device 390 may include multiple sets of memory units 393. Each set of memory units 393 is connected to the machine learning chip 389 via a bus. It is understood that each set of memory units 393 may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be transferred on both the rising and falling edges of the clock pulse; DDR is thus twice as fast as standard SDRAM.
In one embodiment, memory device 390 may include 4 sets of memory units 393. Each set of memory units 393 may include a plurality of DDR4 granules (chips). In one embodiment, the machine learning chip 389 may internally include four 72-bit DDR4 controllers, where 64 bits of each 72-bit DDR4 controller are used to transfer data and 8 bits are used for ECC checking. It is appreciated that when DDR4-3200 granules are employed in each set of memory units 393, the theoretical bandwidth of data transfer may reach 25600 MB/s.
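The 25600 MB/s figure quoted above follows directly from the DDR4-3200 transfer rate and the 64 data bits of each controller, as this small arithmetic check shows:

```python
# DDR4-3200 performs 3200 mega-transfers per second; of each controller's
# 72 bits, 64 carry data (8 are ECC), i.e. 8 bytes per transfer.
transfers_per_second_mt = 3200       # MT/s for DDR4-3200
bytes_per_transfer = 64 // 8         # 64 data bits = 8 bytes

theoretical_bandwidth_mb_s = transfers_per_second_mt * bytes_per_transfer
# theoretical_bandwidth_mb_s == 25600, matching the quoted 25600 MB/s
```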
In one embodiment, each set of memory cells 393 includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling DDR is provided in the machine learning chip 389 for controlling data transfer and data storage for each memory unit 393.
The interface device 391 is electrically connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure). The interface device 391 is used to enable data transfer between the machine learning chip 389 and an external device (e.g., a server or computer). For example, in one embodiment, the interface device 391 may be a standard PCIE interface: the data to be processed is transferred from the server to the machine learning chip 389 through the standard PCIE interface to implement data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 391 may be another interface; the present disclosure does not limit the specific form of the other interface, as long as the interface device can implement the transfer function. In addition, the calculation result of the machine learning chip is still transmitted back to the external device (e.g., a server) by the interface device.
The control device 392 is electrically connected to the machine learning chip 389. The control device 392 is configured to monitor the status of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected via an SPI interface. The control device 392 may include a single-chip microcomputer (Micro Controller Unit, MCU). The machine learning chip 389 may include a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may drive a plurality of loads; therefore, the machine learning chip 389 may be in different operating states such as heavy load and light load. The control device can regulate the operating states of the processing chips, processing cores, and/or processing circuits in the machine learning chip.
The present disclosure provides an electronic device including the machine learning chip or the board card described above.
The electronic device may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a vehicle recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may include an aircraft, a ship, and/or a vehicle. The household appliances may include televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers, range hoods. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus.
Fig. 6 shows a flowchart of a gather instruction processing method according to an embodiment of the present disclosure. As shown in fig. 6, the method is applied to the above-described collection instruction processing device, and includes step S51 and step S52.
In step S51, the acquired collection instruction is parsed to obtain an operation code and an operation field of the collection instruction, and at least one index data, at least one data to be executed, and a target address required for executing the collection instruction are acquired according to the operation code and the operation field. The operation code is used for indicating the execution of the data by the collection instruction to be collection execution, and the operation domain comprises a data address to be executed, an index data address and a target address.
In step S52, selected data is determined from the data to be executed according to the index data, and the selected data and the number of selected data are stored in the target address as the execution result of the collection instruction. In one possible implementation, the method may further include: parsing the collection instruction to obtain a plurality of execution instructions.
Wherein, step S52 may include:
One or more comparators compare the index data with a preset condition and determine whether the index data meets the preset condition;
And when the index data meets the preset condition, one or more selectors take the data to be executed corresponding to the index data meeting the preset condition as the selected data, and store the selected data in a target address.
In one possible implementation, step S52 may include: the number of the selected data is determined by a counter and stored at the target address.
In one possible implementation, the target address includes a first address for storing the number of selected data and a second address for storing the selected data;
Wherein the size of the address space pointed by the first address is smaller than or equal to the size of the address space pointed by the second address.
In one possible implementation, the operation domain may further include a read-in amount or a storage address of the read-in amount, and step S51 may include: acquiring the read-in amount, and acquiring the at least one data to be executed according to the read-in amount. Wherein the data amount of the at least one data to be executed is less than or equal to the read-in amount, and the read-in amount is less than or equal to the data amount of the at least one index data.
In one possible implementation, the method may further include: at least one index data, at least one data to be executed, and a preset condition are stored.
In one possible implementation manner, the data to be executed is tensor data, the index data is tensor data corresponding to the data to be executed, and the number of the index data is greater than or equal to the number of the data to be executed.
In one possible implementation, the value of the index data is a bit value.
In one possible implementation, the execution module includes a master execution sub-module including a counter and at least one slave execution sub-module including a comparator and a selector;
The control module analyzes the compiled collection instruction to obtain at least one execution instruction, and sends the at least one index data, the at least one data to be executed and the at least one execution instruction to the slave execution sub-module;
The comparator in the slave execution sub-module compares the index data with preset conditions to determine whether the index data meets the preset conditions; when the index data meets the preset condition, the selector in the slave execution sub-module takes the data to be executed corresponding to the index data meeting the preset condition as the selected data, and sends the selected data to the master execution sub-module;
the counter of the main execution sub-module determines the number of the selected data according to the selected data transmitted by at least one of the auxiliary execution sub-modules, and stores the number of the selected data and the selected data into the target address.
In one possible implementation, step S51 may include:
Storing a collection instruction;
analyzing the collection instruction to obtain an operation code and an operation domain of the collection instruction;
an instruction queue is stored, the instruction queue comprising a plurality of gather instructions arranged in order of execution.
In one possible implementation, the method may further include:
When the association relation between the first collection instruction and the zeroth collection instruction before the first collection instruction is determined, the first collection instruction is cached, after the execution of the zeroth collection instruction is finished, the first collection instruction is executed,
Wherein, the association relationship between the first collection instruction and the zeroth collection instruction before the first collection instruction includes:
the first storage address interval storing the data required by the first collect instruction and the zeroth storage address interval storing the data required by the zeroth collect instruction have overlapping areas.
In one possible implementation, the preset condition may include the index data being non-zero.
It should be noted that, although the above embodiment has been described as an example of the gather instruction processing method, those skilled in the art will understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each step according to personal preference and/or actual application scene, so long as the technical scheme of the disclosure is met.
The collection instruction processing method provided by the embodiment of the disclosure has the advantages of wide application range, high processing efficiency and high processing speed on collection instructions.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present disclosure, it should be understood that the disclosed system and apparatus may be implemented in other manners. For example, the system and apparatus embodiments described above are merely illustrative: the division into devices, apparatuses, and modules is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple modules may be combined or integrated into another system or apparatus, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, apparatuses, or modules, and may be electrical or in other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present disclosure may be integrated in one processing unit, or each module may exist alone physically, or two or more modules may be integrated in one module. The integrated modules may be implemented in hardware or in software program modules.
The integrated modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned memory includes: a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program that instructs associated hardware, and the program may be stored in a computer-readable memory, which may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing may be better understood in light of the following clauses:
clause 1: a gather instruction processing apparatus, the apparatus comprising:
the control module is used for analyzing the acquired collection instruction to obtain an operation code and an operation domain of the collection instruction, and acquiring at least one index data, at least one data to be operated and a target address required by executing the collection instruction according to the operation code and the operation domain;
The execution module is used for determining selected data from data to be operated according to the index data, and storing the selected data and the number of the selected data as the execution result of the collection instruction into the target address;
The operation code is used for indicating the execution of the data by the collection instruction to be collection execution, and the operation domain comprises a data address to be operated, an index data address and the target address.
Clause 2: the apparatus of clause 1, the execution module comprising:
one or more comparators for comparing the index data with a preset condition, and determining whether the index data satisfies the preset condition;
One or more selectors for, when the index data satisfies the preset condition, taking data to be executed corresponding to the index data satisfying the preset condition as the selected data; and a counter for determining the number of selected data.
Clause 3: the apparatus of clause 1 or 2, wherein the data to be executed is tensor data, the index data is tensor data corresponding to the data to be executed, and the number of the index data is greater than or equal to the number of the data to be executed.
Clause 4: The apparatus of clause 3, wherein the value of the index data is a bit value.
Clause 5: the apparatus of clause 1 or 2, the target address comprising a first address for storing the number of selected data and a second address for storing the selected data;
Wherein the size of the address space pointed by the first address is smaller than or equal to the size of the address space pointed by the second address.
Clause 6: the apparatus of any of clauses 1-5, the execution module comprising a master execution sub-module and at least one slave execution sub-module, the master execution sub-module comprising the counter, the at least one slave execution sub-module comprising the comparator and the selector;
the control module is further configured to parse the compiled collection instruction to obtain at least one execution instruction, and send the at least one index data, the at least one data to be executed, and the at least one execution instruction to the slave execution sub-module;
The one or more comparators of the slave execution sub-module are used for comparing the index data with preset conditions to determine whether the index data meets the preset conditions, and the selector is used for taking data to be executed corresponding to the index data meeting the preset conditions as the selected data when the index data meets the preset conditions and sending the selected data to the master execution sub-module;
the counter of the master execution sub-module is used for determining the number of the selected data and storing the number of the selected data and the selected data into the first address and the second address of the target address, respectively.
Clause 7: the apparatus of any of clauses 1-6, the operation field further comprising a read-in amount or a storage address of the read-in amount,
Wherein the control module is further configured to obtain the read-in amount, and obtain the at least one data to be executed according to the read-in amount,
Wherein the data amount of the at least one data to be executed is smaller than or equal to the read-in amount, which is smaller than or equal to the data amount of the at least one index data.
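The read-in amount of clause 7 bounds how much data is fetched. A brief illustrative sketch (function and parameter names are assumptions, with memory again modeled as a list): the fetch returns at most `read_in_amount` elements, and the read-in amount itself may not exceed the amount of index data.

```python
def load_operands(memory, data_addr, read_in_amount, index_count):
    """Clause 7 sketch: fetch at most `read_in_amount` elements of the data to
    be executed, where the read-in amount is bounded by the index data amount."""
    assert read_in_amount <= index_count, "read-in amount must not exceed index data amount"
    return memory[data_addr:data_addr + read_in_amount]

mem = [10, 20, 30, 40, 50, 60]
ops = load_operands(mem, 1, 3, 4)  # fetch 3 elements starting at address 1
# ops == [20, 30, 40]
```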
Clause 8: the apparatus of any of clauses 1-7, further comprising:
And the storage module is used for storing at least one of the index data, the data to be executed and the preset condition.
Clause 9: the apparatus of any of clauses 1-8, the control module comprising:
An instruction storage sub-module storing the collection instruction;
The instruction processing sub-module analyzes the collection instruction to obtain an operation code and an operation domain of the collection instruction;
The queue storage sub-module stores an instruction queue, the instruction queue comprising a plurality of collection instructions arranged sequentially in execution order.
Clause 10: the apparatus of any of clauses 1-9, the control module further comprising:
A dependency relationship processing sub-module, configured to cache a first to-be-executed instruction in the plurality of to-be-executed instructions in the instruction storage sub-module when determining that there is an association relationship between the first to-be-executed instruction and a zeroth to-be-executed instruction before the first to-be-executed instruction, extract the first to-be-executed instruction from the instruction storage sub-module after the execution of the zeroth to-be-executed instruction is completed, send the first to-be-executed instruction to the execution module,
Wherein, the association relationship between the first to-be-executed instruction and the zeroth to-be-executed instruction before the first to-be-executed instruction includes:
and the first storage address interval for storing the data required by the first instruction to be executed and the zeroth storage address interval for storing the data required by the zeroth instruction to be executed have overlapping areas.
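The association relationship in clause 10 reduces to an interval-overlap test on storage address ranges. A minimal sketch, assuming half-open `(start, end)` intervals (the representation is an assumption, not stated in the clause):

```python
def has_dependency(first_interval, zeroth_interval):
    """Clause 10 sketch: the first instruction depends on the zeroth when their
    storage address intervals overlap. Intervals are (start, end), end-exclusive."""
    s1, e1 = first_interval
    s0, e0 = zeroth_interval
    return s1 < e0 and s0 < e1  # standard half-open interval overlap test

# Overlap -> buffer the first instruction until the zeroth finishes executing.
# has_dependency((100, 200), (150, 250)) -> True  (overlapping region 150..200)
# has_dependency((100, 200), (200, 300)) -> False (adjacent, no overlap)
```

When the test returns true, the dependency relationship processing sub-module caches the first instruction and releases it only after the zeroth completes.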
Clause 11: the apparatus of any one of clauses 1 to 10, wherein the preset condition comprises index data being non-zero.
Clause 12: a gather instruction processing method, the method being applied to a gather instruction processing apparatus, the method comprising:
Analyzing the acquired collection instruction to obtain an operation code and an operation domain of the collection instruction, and acquiring at least one index data, at least one data to be executed and a target address required by executing the collection instruction according to the operation code and the operation domain;
Determining selected data from the data to be executed according to the index data, and storing the selected data and the number of the selected data as an execution result of the collection instruction into the target address;
Wherein the operation code indicates that the operation performed by the collection instruction on the data is a collection (gather) operation, and the operation field comprises an address of the data to be executed, an index data address, and the target address.
Clause 13: the method according to clause 12, wherein determining selected data from the data to be executed according to the index data, and storing the selected data and the number of the selected data as the execution result of the collection instruction in the target address includes:
One or more comparators compare the index data with a preset condition and determine whether the index data meets the preset condition;
When the index data meet the preset conditions, one or more selectors take data to be executed corresponding to the index data meeting the preset conditions as the selected data, and store the selected data into the target address;
a counter determines the number of selected data and stores the number of selected data in the target address.
Clause 14: the method of clause 12 or 13, wherein the data to be executed is tensor data, the index data is tensor data corresponding to the data to be executed, and the number of the index data is greater than or equal to the number of the data to be executed.
Clause 15: the method of clause 14, wherein the value of the index data is a bit value (0 or 1).
Clause 16: the method of clause 13 or 14, the target address comprising a first address for storing the number of selected data and a second address for storing the selected data;
Wherein the size of the address space pointed by the first address is smaller than or equal to the size of the address space pointed by the second address.
Clause 17: the method of clause 13, the execution module comprising a master execution sub-module and at least one slave execution sub-module, the master execution sub-module comprising the counter, the at least one slave execution sub-module comprising the comparator and the selector;
The control module analyzes the compiled collection instruction to obtain at least one execution instruction, and sends the at least one index data, the at least one data to be executed and the at least one execution instruction to the slave execution sub-module;
The comparator of the slave execution sub-module compares the index data with preset conditions to determine whether the index data meets the preset conditions;
When the index data meets the preset condition, the selector of the slave execution sub-module takes the data to be executed corresponding to the index data meeting the preset condition as the selected data, and sends the selected data to the master execution sub-module;
The counter of the master execution sub-module determines the number of the selected data and stores the number of the selected data and the selected data into the first address and the second address of the target address, respectively.
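The master/slave split of clause 17 can be sketched in software (a simplified model, not the patented hardware; the contiguous partitioning and all names are assumptions): each slave sub-module runs the comparator and selector over its slice of the operands, and the master sub-module merges the slave outputs and counts them.

```python
def slave_select(indices, data, condition):
    """One slave sub-module: comparator + selector over its assigned slice."""
    return [v for i, v in zip(indices, data) if condition(i)]

def master_gather(index_data, data, n_slaves=2, condition=lambda i: i != 0):
    """Master sub-module: partition the work, merge slave outputs in order, count."""
    chunk = (len(data) + n_slaves - 1) // n_slaves  # ceiling division per slave
    selected = []
    for s in range(n_slaves):
        lo, hi = s * chunk, (s + 1) * chunk
        selected += slave_select(index_data[lo:hi], data[lo:hi], condition)
    return len(selected), selected  # master's counter produces the final count

# master_gather([1, 0, 1, 1, 0, 1], [1, 2, 3, 4, 5, 6]) -> (4, [1, 3, 4, 6])
```

Partitioning the data contiguously and merging slave results in slave order preserves the original element order in the gathered output.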
Clause 18: the method of clause 12, the operation field further comprising a read-in amount or a storage address of the read-in amount,
Wherein the control module obtains the read-in quantity and obtains the at least one data to be executed according to the read-in quantity,
Wherein the data amount of the at least one data to be executed is smaller than or equal to the read-in amount, which is smaller than or equal to the data amount of the at least one index data.
Clause 19: the method of clause 12, further comprising:
The storage module stores at least one of the plurality of index data, the plurality of data to be executed, and the preset condition.
Clause 20: the method according to clause 12, wherein the parsing the obtained collection instruction to obtain the operation code and the operation field of the collection instruction includes:
The instruction storage submodule stores the collection instruction;
the instruction processing sub-module analyzes the collection instruction to obtain an operation code and an operation domain of the collection instruction;
The queue storage submodule stores an instruction queue which comprises a plurality of collection instructions which are sequentially arranged according to an execution sequence.
Clause 21: the method of any of clauses 12-20, further comprising:
When the dependency relation processing sub-module determines that the first to-be-executed instruction in the plurality of to-be-executed instructions has an association relation with a zeroth to-be-executed instruction before the first to-be-executed instruction, the first to-be-executed instruction is cached in the instruction storage sub-module, after the execution of the zeroth to-be-executed instruction is finished, the first to-be-executed instruction is extracted from the instruction storage sub-module and sent to the execution module,
Wherein, the association relationship between the first to-be-executed instruction and the zeroth to-be-executed instruction before the first to-be-executed instruction includes:
and the first storage address interval for storing the data required by the first instruction to be executed and the zeroth storage address interval for storing the data required by the zeroth instruction to be executed have overlapping areas.
Clause 22: the method of any of clauses 12 to 21, wherein the preset condition includes the index data being non-zero.
Clause 22: a computer readable medium, wherein the computer readable medium has stored therein a computer program which, when executed by one or more processing devices, performs the method steps of clauses 12-21.
The embodiments of the present application have been described in detail above. The principles and implementations of the application are explained herein using specific examples, which are provided solely to facilitate understanding of the method and its core concepts. A person skilled in the art may make changes to the specific implementations and the scope of application in accordance with the ideas of the present application. In view of the above, the contents of this description should not be construed as limiting the present application.
Claims (8)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910740813.7A CN112396186B (en) | 2019-08-12 | 2019-08-12 | Execution method, execution device and related product |
| PCT/CN2020/088248 WO2020233387A1 (en) | 2019-05-17 | 2020-04-30 | Command processing method and apparatus, and related products |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910740813.7A CN112396186B (en) | 2019-08-12 | 2019-08-12 | Execution method, execution device and related product |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112396186A CN112396186A (en) | 2021-02-23 |
| CN112396186B true CN112396186B (en) | 2024-05-03 |
Family
ID=74602379
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910740813.7A Active CN112396186B (en) | 2019-05-17 | 2019-08-12 | Execution method, execution device and related product |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112396186B (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6360220B1 (en) * | 1998-08-04 | 2002-03-19 | Microsoft Corporation | Lock-free methods and systems for accessing and storing information in an indexed computer data structure having modifiable entries |
| CN101131719A (en) * | 2006-08-23 | 2008-02-27 | 北京同方微电子有限公司 | Micro-processor kernel used for cryptography arithmetic |
| CN109032670A (en) * | 2018-08-08 | 2018-12-18 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing vector duplicate instructions |
| CN109492241A (en) * | 2018-08-10 | 2019-03-19 | 北京中科寒武纪科技有限公司 | Conversion method, apparatus, computer equipment and storage medium |
| CN109657782A (en) * | 2018-12-14 | 2019-04-19 | 北京中科寒武纪科技有限公司 | Operation method, device and Related product |
| CN109726822A (en) * | 2018-12-14 | 2019-05-07 | 北京中科寒武纪科技有限公司 | Operation method, device and Related product |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8229879B2 (en) * | 2005-07-08 | 2012-07-24 | Brainlike, Inc. | System and method for auto-adaptive network |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112396186B (en) | Execution method, execution device and related product | |
| CN111813449A (en) | Computing method, device and related products | |
| CN111047030A (en) | Computing method, apparatus, computer equipment and storage medium | |
| CN111949317A (en) | Instruction processing method and device and related product | |
| CN111381873A (en) | Computing method, device and related products | |
| CN112394985B (en) | Execution method, device and related products | |
| CN111949318B (en) | Instruction processing method, device and related products | |
| CN111290789B (en) | Operation method, operation device, computer equipment and storage medium | |
| CN111026440B (en) | Operation method, operation device, computer equipment and storage medium | |
| CN111275197B (en) | Computing methods, devices, computer equipment and storage media | |
| CN111124497B (en) | Operation method, operation device, computer equipment and storage medium | |
| CN112395003A (en) | Operation method, device and related product | |
| CN111353595A (en) | Operation method, device and related product | |
| CN111339060B (en) | Computing methods, devices, computer equipment and storage media | |
| CN111966401A (en) | Instruction processing method, device and related products | |
| CN111382850A (en) | Computing method, device and related products | |
| CN111382851A (en) | Operation method, device and related product | |
| CN111382390A (en) | Computing method, device and related products | |
| CN111338694B (en) | Operation method, device, computer equipment and storage medium | |
| CN111079914B (en) | Operation method, system and related product | |
| CN111078125B (en) | Operation method, device and related product | |
| CN111078285B (en) | Operation method, system and related product | |
| CN112346781A (en) | Instruction processing method and device and related product | |
| CN111966325A (en) | Instruction processing method and device and related product | |
| CN112346707A (en) | Instruction processing method, device and related products |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |