
CN112148291A - Instruction block processing method and device, storage medium, and electronic device - Google Patents


Info

Publication number
CN112148291A
Authority
CN
China
Prior art keywords
instruction
instruction block
jump
neural network
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910562823.6A
Other languages
Chinese (zh)
Inventor
姚海东
徐东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201910562823.6A
Priority to PCT/CN2020/085180 (WO2020259020A1)
Publication of CN112148291A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/40 - Transformation of program code
    • G06F8/41 - Compilation
    • G06F8/60 - Software deployment
    • G06F8/61 - Installation
    • G06F8/63 - Image based installation; Cloning; Build to order

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

An embodiment of the invention provides an instruction block processing method and device, a storage medium, and an electronic device. The method includes: compiling a description file of a neural network model through a compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table. The instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table indicates the running order of the instruction blocks and the execution device that runs each block, where the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed. The image package is then loaded, and the instruction block group is processed according to the instruction block sequence table and the jump instruction mapping table.

Description

Instruction block processing method and device, storage medium, and electronic device

Technical Field

The present invention relates to the field of neural networks, and in particular to an instruction block processing method and device, a storage medium, and an electronic device.

Background

With the great increase in computing power and the ease of obtaining big data, deep learning has made enormous progress; more and more problems, such as image processing and natural language analysis, can now be solved well by deep learning techniques.

Solving business problems with deep neural network models requires performing an inference process. Devices that perform inference operations generally include the central processing unit (CPU), the graphics processing unit (GPU), and the field programmable gate array (FPGA). When deploying such services, using resources efficiently and obtaining results quickly requires a deep understanding both of the computing and storage architecture of the inference device and of the computational requirements described by the deep neural network. This is often quite difficult and takes a long time.

In particular, some business functions require a combination of multiple neural networks. For example, in a face recognition scenario, a deep neural network model is first called to detect whether an image contains a human face (face detection); if it does, the image is fed into another deep neural network for inference, which extracts detailed feature information of the face and identifies it (face identification), finally producing the result the business requires.

In the related art, there is as yet no effective solution to problems such as how to schedule and process different instruction blocks for one or more neural network systems.

Summary of the Invention

Embodiments of the present invention provide an instruction block processing method and device, a storage medium, and an electronic device, to solve the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems.

According to one embodiment of the present invention, an instruction block processing method is provided, including: compiling a description file of a neural network model through a compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table indicates the running order of the multiple instruction blocks and the execution device that runs each instruction block, where the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and loading the image package and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
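The structures named in this embodiment can be sketched as follows. This is an illustrative Python sketch only; the concrete layout of the image package, and all names and fields below, are hypothetical and not specified by the patent:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class InstructionBlock:
    name: str                 # e.g. "NET_C0" (hypothetical naming)
    device: str               # execution device: "cpu" or "accelerator"
    instructions: List[str]   # the block's instructions (opaque here)
    jump: str                 # jump instruction set after the block

@dataclass
class ImagePackage:
    blocks: Dict[str, InstructionBlock]   # instruction block group
    sequence_table: List[str]             # running order of block names
    jump_map: Dict[str, str]              # jump instruction -> next block name

def next_block(pkg: ImagePackage, jump: str) -> InstructionBlock:
    """Resolve a jump instruction to the next block via the mapping table."""
    return pkg.blocks[pkg.jump_map[jump]]
```

Here `next_block` models only the lookup step: the jump instruction placed after a finished block is looked up in the jump instruction mapping table to find the block executed next.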

In an embodiment of the present invention, compiling the description file of the neural network model through the compilation module to obtain the image package includes: compiling the description files of multiple neural network models through the compilation module to obtain image packages corresponding to the multiple neural network models.

In an embodiment of the present invention, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices listed in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order in the sequence table and the jump instruction mapping table.

In an embodiment of the present invention, after the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction has been processed in the running order, caching the resulting data in a pre-allocated buffer.

According to another embodiment of the present invention, an instruction block processing apparatus is also provided, including: a compilation module, configured to compile a description file of a neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table indicates the running order of the multiple instruction blocks and the execution device that runs each instruction block, where the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and a processing module, configured to load the image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.

In an embodiment of the present invention, the compilation module is configured to compile the description files of multiple neural network models to obtain image packages corresponding to the multiple neural network models.

In an embodiment of the present invention, the processing module is further configured to instruct the execution devices listed in the instruction block sequence table to process the instruction block group according to the running order in the instruction block sequence table and the jump instruction mapping table.

In an embodiment of the present invention, the processing module is further configured to, for each instruction, after the instruction has been processed in the running order, cache the resulting data in a pre-allocated buffer.

According to yet another embodiment of the present invention, a storage medium is also provided, in which a computer program is stored, where the computer program is configured to perform the steps in any one of the above method embodiments when run.

According to yet another embodiment of the present invention, an electronic device is also provided, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.

Through the present invention, the description file of the neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table indicates the running order of the multiple instruction blocks and the execution device that runs each instruction block, where the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed. The image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, so that the multiple instruction blocks in the instruction block group can be processed flexibly.

Brief Description of the Drawings

The accompanying drawings described here provide a further understanding of the present invention and form a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the invention and do not improperly limit it. In the drawings:

FIG. 1 is a block diagram of the hardware structure of a terminal implementing an instruction block processing method according to an embodiment of the present invention;

FIG. 2 is a flowchart of an instruction block processing method according to an embodiment of the present invention;

FIG. 3 is a structural block diagram of an instruction block processing apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the workflow of the compilation module according to a preferred embodiment of the present invention;

FIG. 5 is a schematic diagram of the composition of the image package according to a preferred embodiment of the present invention;

FIG. 6 is a functional block diagram of the runtime module according to a preferred embodiment of the present invention;

FIG. 7 is a schematic diagram of input/output buffering and control information according to a preferred embodiment of the present invention;

FIG. 8 is a schematic diagram of adding acceleration device instruction blocks and jump instructions according to a preferred embodiment of the present invention;

FIG. 9 is a jump position mapping table for acceleration device instruction blocks according to a preferred embodiment of the present invention;

FIG. 10 is a flowchart of the acceleration device running by instruction according to a preferred embodiment of the present invention;

FIG. 11 is an internal functional block diagram of the acceleration device according to a preferred embodiment of the present invention;

FIG. 12 is a flowchart of the interaction between the host and the acceleration device according to a preferred embodiment of the present invention;

FIG. 13 is an overall system block diagram according to a preferred embodiment of the present invention.

Detailed Description

Hereinafter, the present invention is described in detail with reference to the accompanying drawings and in conjunction with embodiments. It should be noted that, where there is no conflict, the embodiments of this application and the features in the embodiments may be combined with one another.

It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

Embodiment 1

The method embodiment provided in Embodiment 1 of this application may be executed on a terminal or a similar computing device. Taking running on a terminal as an example, FIG. 1 is a block diagram of the hardware structure of a terminal implementing an instruction block processing method according to an embodiment of the present invention. As shown in FIG. 1, the terminal 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller (MCU) or a programmable logic device such as an FPGA) and a memory 104 for storing data. Optionally, the terminal may further include a transmission device 106 for communication and an input/output device 108. A person of ordinary skill in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the terminal. For example, the terminal 10 may include more or fewer components than shown in FIG. 1, or a different configuration with functions equivalent to or greater than those shown in FIG. 1.

The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the instruction block processing method in the embodiment of the present invention. By running the computer programs stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory located remotely from the processor 102, and these remote memories may be connected to the terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The transmission device 106 is used to receive or send data via a network. A specific example of the above network may include a wireless network provided by the communication provider of the terminal 10. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.

This embodiment provides an instruction block processing method running on a terminal. FIG. 2 is a flowchart of an instruction block processing method according to an embodiment of the present invention. As shown in FIG. 2, the process includes the following steps:

Step S202: compile the description file of the neural network model through the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table indicates the running order of the multiple instruction blocks and the execution device that runs each instruction block, where the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;

Step S204: load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
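Steps S202 and S204 together amount to a table-driven dispatch loop. The following minimal sketch assumes plain-dict data shapes and hypothetical block names; it shows only how execution is driven by each block's trailing jump instruction and the jump instruction mapping table:

```python
# Each block records its execution device and the jump instruction set after it;
# the jump map gives the next block for each jump instruction (names hypothetical).
blocks = {
    "B0": {"device": "cpu",         "jump": "J0"},
    "B1": {"device": "accelerator", "jump": "J1"},
    "B2": {"device": "cpu",         "jump": "HALT"},
}
jump_map = {"J0": "B1", "J1": "B2"}  # "HALT" has no mapping: processing ends

def process(start="B0"):
    """Walk the blocks from `start`, following each block's jump instruction."""
    order = []
    name = start
    while name is not None:
        blk = blocks[name]
        order.append((blk["device"], name))   # stand-in for dispatching the block
        name = jump_map.get(blk["jump"])      # jump instruction mapping table lookup
    return order
```

A block whose jump instruction has no entry in the mapping table (here `"HALT"`) ends the processing.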

Through the present invention, the description file of the neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table indicates the running order of the multiple instruction blocks and the execution device that runs each instruction block, where the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed. The image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, so that the multiple instruction blocks in the instruction block group can be processed flexibly.

In an embodiment of the present invention, compiling the description file of the neural network model through the compilation module to obtain the image package includes: compiling the description files of multiple neural network models through the compilation module to obtain image packages corresponding to the multiple neural network models.

In an embodiment of the present invention, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices listed in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order in the sequence table and the jump instruction mapping table.

In an embodiment of the present invention, after the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction has been processed in the running order, caching the resulting data in a pre-allocated buffer.
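The buffering described here can be sketched as a buffer allocated once before processing begins, into which each instruction's output is cached. A minimal sketch, assuming a fixed number of slots (the embodiment does not specify the buffer's organization, so the slot scheme below is hypothetical):

```python
class PreallocatedBuffer:
    """Cache instruction outputs in storage allocated up front, so data is
    exchanged between processing steps without per-step allocation."""

    def __init__(self, slots):
        self.slots = [None] * slots   # allocated once, before processing starts
        self.write = 0                # total number of cached outputs

    def cache(self, data):
        # Write into the next slot, wrapping around when all slots are used.
        self.slots[self.write % len(self.slots)] = data
        self.write += 1

    def latest(self):
        # Most recently cached output (e.g. for the next instruction to consume).
        return self.slots[(self.write - 1) % len(self.slots)]
```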

From the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored on a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the various embodiments of the present invention.

This embodiment also provides an instruction block processing apparatus, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

FIG. 3 is a structural block diagram of an instruction block processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus includes:

a compilation module 30, configured to compile a description file of a neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table indicates the running order of the multiple instruction blocks and the execution device that runs each instruction block, where the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and

a processing module 32, configured to load the image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.

Through the present invention, the description file of the neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table indicates the running order of the multiple instruction blocks and the execution device that runs each instruction block, where the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed. The image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, so that the multiple instruction blocks in the instruction block group can be processed flexibly.

In an embodiment of the present invention, the compilation module 30 is configured to compile the description files of multiple neural network models to obtain image packages corresponding to the multiple neural network models.

In an embodiment of the present invention, the processing module 32 is further configured to instruct the execution devices listed in the instruction block sequence table to process the instruction block group according to the running order in the instruction block sequence table and the jump instruction mapping table.

In an embodiment of the present invention, the processing module 32 is further configured to, for each instruction, after the instruction has been processed in the running order, cache the resulting data in a pre-allocated buffer.

The process of processing the above instruction blocks is outlined below with reference to a preferred embodiment, which is not intended to limit the technical solutions of the embodiments of the present invention.

Preferred Embodiment 1

This preferred embodiment of the present invention focuses on implementing the inference function of a face detection service. It should be noted that neural network model 1 performs face detection: it determines whether a picture contains a human face and gives the position of the face in the picture. Neural network model 2 performs face recognition: it extracts features of the face given by neural network model 1, compares them with a database, and gives the recognition result. In general, some preprocessing is needed before input to a neural network model, and some post-processing is performed after the model has run.

基于上述神经网络模型1和神经网络模型2所完成的功能,本发明优选实施例的技术方案包括以下步骤:Based on the functions completed by the above-mentioned neural network model 1 and neural network model 2, the technical solution of the preferred embodiment of the present invention includes the following steps:

步骤1:两个神经网络模型输入编译模块进行编译;Step 1: The two neural network models are input into the compilation module for compilation;

步骤1.1:两个神经网络模型描述文件输入编码模块进行编译;Step 1.1: Input the two neural network model description files into the encoding module for compilation;

步骤1.2:编译模块进行编译,最后输出镜像包,如图4所示;Step 1.2: The compilation module performs the compilation and finally outputs the image package, as shown in Figure 4;

其中,如图5所示,镜像包包括指令块组和指令块序描述,以及加速设备跳转映射表。Wherein, as shown in FIG. 5 , the image package includes an instruction block group and instruction block sequence description, and an acceleration device jump mapping table.

具体地,指令块组包括NET_C0,NET_D1,NET_C1,NET_D2,NET_C2;分别对应CPU人脸检测预处理(NET_C0),加速设备进行人脸检测(NET_D1),CPU进行人脸检测后处理及人脸识别预处理(NET_C1),加速设备进行人脸识别处理(NET_D2),CPU进行人脸识别后处理,完成业务(NET_C2);Specifically, the instruction block group includes NET_C0, NET_D1, NET_C1, NET_D2, and NET_C2, which respectively correspond to: CPU face detection preprocessing (NET_C0); face detection on the acceleration device (NET_D1); CPU face detection post-processing and face recognition preprocessing (NET_C1); face recognition processing on the acceleration device (NET_D2); and CPU face recognition post-processing, completing the service (NET_C2);

指令块序表给出C0-D1-C1-D2-C2,其中,Cx表示CPU进行序号为x的计算处理;Dx表示加速设备Device进行顺序号x段指令处理。The instruction block sequence table gives C0-D1-C1-D2-C2, where Cx indicates that the CPU performs the computation with sequence number x, and Dx indicates that the acceleration device (Device) processes the instruction segment with sequence number x.
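The Cx/Dx naming above can be illustrated with a short sketch. This is not part of the patent's disclosure, merely a hypothetical parser showing how a sequence table string such as C0-D1-C1-D2-C2 decomposes into (device type, sequence number) pairs:

```python
def parse_sequence_table(table: str):
    """Split a table such as 'C0-D1-C1-D2-C2' into (device, index) pairs.

    'C' denotes the host-side CPU, 'D' denotes the acceleration device.
    """
    entries = []
    for token in table.split("-"):
        device, index = token[0], int(token[1:])
        if device not in ("C", "D"):
            raise ValueError(f"unknown device type: {device}")
        entries.append((device, index))
    return entries

schedule = parse_sequence_table("C0-D1-C1-D2-C2")
```

Running this yields the ordered schedule [('C', 0), ('D', 1), ('C', 1), ('D', 2), ('C', 2)], which a host runtime can walk step by step.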

需要说明的是,本发明优选实施例假定神经网络模型1描述的相关运算整体由加速设备进行处理(NET_D1),神经网络模型2描述的计算要求由加速设备进行处理(NET_D2),输入首先经过人脸检测预处理(NET_C0),输出结果提交给加速设备处理,要求进行人脸检测。加速设备接收相关输入后,完成人脸检测NET_D1运算,输出人脸位置等信息,CPU获取,进行人脸检测后处理及人脸识别预处理(NET_C1描述),处理完成后的数据提交到加速设备进行人脸识别(NET_D2)处理,处理完成后,提交给CPU进行神经网络模型人脸识别后处理(NET_C2),完成整体业务功能;It should be noted that the preferred embodiment of the present invention assumes that the operations described by neural network model 1 are processed entirely by the acceleration device (NET_D1), and that the computations described by neural network model 2 are likewise processed by the acceleration device (NET_D2). The input first goes through face detection preprocessing (NET_C0), and the output is submitted to the acceleration device with a request for face detection. After receiving the input, the acceleration device completes the face detection operation NET_D1 and outputs the face position and other information, which the CPU obtains in order to perform face detection post-processing and face recognition preprocessing (described by NET_C1). The processed data is then submitted to the acceleration device for face recognition (NET_D2) processing; when that completes, the result is submitted to the CPU for face recognition post-processing (NET_C2), completing the overall service function;

需要说明的是,此处假定单神经网络模型单指令块的划分并不失一般性,加速设备如果支持神经神经网络模型描述的一部分运算,可以划分成多个模块,具体可参照优选实施例2。It should be noted that dividing each single neural network model into a single instruction block is assumed here without loss of generality. If the acceleration device supports only part of the operations described by a neural network model, the model can be divided into multiple blocks; for details, refer to preferred embodiment 2.

步骤2运行态相关处理流程;Step 2 is the processing flow related to the running state;

编译阶段完成镜像包输出,提交到运行态进行运行。In the compilation phase, the image package output is completed and submitted to the running state for operation.

如附图6所示,运行态包括如下模块:加载模块,加速设备控制管理模块,输入输出管理模块,上层API接口等,其中,As shown in Figure 6, the running state includes the following modules: a loading module, an acceleration device control management module, an input and output management module, an upper-layer API interface, etc., wherein,

加载模块完成指令块组到相应设备的加载,加速设备管理模块对加速设备的启动、停止、复位等进行控制;API接口完成同上层用户的交互;The loading module completes the loading of the instruction block group to the corresponding device, and the acceleration device management module controls the start, stop, reset, etc. of the acceleration device; the API interface completes the interaction with the upper-layer user;

输入输出管理模块,完成同加速设备(如图11所示为加速设备的内部框图)的输入输出交互,并通过缓存项中所包含的控制信息对运行项进行组织,具体地:从加速设备侧看,有输入缓存区(InBuffer)和输出缓存区(OutBuffer);附图7所示:缓存区内容有两块,一块控制信息,一块数据信息;控制信息包括图片序号Px,指令块处理设备及指令块序号Tx,其中,x是数字,T是设备类型,有C和D,C表示HOST侧CPU,D表示加速设备Device。The input and output management module handles input/output interaction with the acceleration device (Figure 11 shows the internal block diagram of the acceleration device) and organizes the running items through the control information contained in the cache items. Specifically, seen from the acceleration device side, there is an input buffer (InBuffer) and an output buffer (OutBuffer). As shown in Figure 7, each buffer entry has two parts: a piece of control information and a piece of data. The control information includes the picture serial number Px and the instruction-block processing device plus instruction block serial number Tx, where x is a number and T is the device type, either C or D: C denotes the HOST-side CPU and D denotes the acceleration device (Device).
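As an illustration of the buffer entry layout just described (control information plus data), the following hypothetical structure pairs the picture serial number Px with the device type and instruction block serial number Tx; all field names and the control-word string format are assumptions made for this sketch, not the patent's actual encoding:

```python
from dataclasses import dataclass

@dataclass
class BufferItem:
    # Control information: picture serial number (x in Px) plus the
    # processing device type and instruction block serial number (Tx).
    picture_id: int
    device: str        # 'C' = host-side CPU, 'D' = acceleration device
    block_id: int
    data: bytes        # payload produced by the previous instruction block

    @property
    def control_word(self) -> str:
        # e.g. picture 1 routed to device block 1 -> "P1-NET_D1"
        return f"P{self.picture_id}-NET_{self.device}{self.block_id}"

item = BufferItem(picture_id=1, device="D", block_id=1, data=b"...")
```

Here item.control_word evaluates to "P1-NET_D1", matching the style of control word used in the runtime flow.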

步骤2.1:使用API接口,进行输入输出设置,完成编程;Step 2.1: Use the API interface to set the input and output, and complete the programming;

步骤2.2:通用编译工具(如gcc)对代码进行编译,生成可执行文件;Step 2.2: Compile the code with a general compilation tool (such as gcc) to generate an executable file;

步骤2.3:运行可执行文件:运行过程如附图12所示。Step 2.3: Running the executable file: The running process is shown in Figure 12.

HOST侧运行态按照指令块序表,先调度CPU进行人脸检测预处理(NET_C0)处理,计算完成后,将数据填充到InBuffer,填充控制信息为P1-Net-D1,表示需要进行人脸检测神经网络模型推理过程。In the running state, the HOST side follows the instruction block sequence table: it first schedules the CPU to perform face detection preprocessing (NET_C0). After the computation completes, the data is filled into InBuffer with the control information P1-Net-D1, indicating that the face detection neural network model inference process is required.

加速器获取到InBuffer中该项内容,进行人脸检测(Net-D1)的指令处理,处理完成后,数据填充到OutBuffer,并同时将该输入控制信息(P1-Net-D1)复制。The accelerator obtains this entry from InBuffer and performs the face detection (Net-D1) instruction processing. After processing completes, the data is filled into OutBuffer and the input control information (P1-Net-D1) is copied along with it.

HOST侧从设备OutBuffer获取到该项内容后,根据指令块序表,判断进行人脸检测后处理及人脸识别神经网络模型预处理(Net-C1)处理,处理完成后,填充数据到InBuffer,并根据指令集序表填充控制信息为P1-Net_D2(人脸识别神经网络模型运行)。After the HOST side obtains this entry from the device's OutBuffer, it determines from the instruction block sequence table that face detection post-processing and face recognition neural network model preprocessing (Net-C1) should be performed. When that completes, the data is filled into InBuffer and, according to the instruction block sequence table, the control information is set to P1-Net_D2 (run the face recognition neural network model).

加速设备获取该项进行人脸识别计算(Net_d2)处理,输出数据,复制控制信息P1-Net_D2。The acceleration device obtains this entry, performs the face recognition computation (NET_D2), outputs the data, and copies the control information P1-Net_D2.

CPU侧获取到该项,根据指令块序表,进行人脸识别数据后处理(NET_C2)处理,完成总体推理。The CPU side obtains this entry and, according to the instruction block sequence table, performs face recognition data post-processing (NET_C2), completing the overall inference.

从上述流程可见,使用指令块序表的过程中,加速设备侧对控制信息的处理,仅进行拷贝。主机侧根据指令块序表进行控制信息的维护、更改。As can be seen from the above flow, when the instruction block sequence table is used, the acceleration device side only copies the control information; the host side maintains and updates the control information according to the instruction block sequence table.
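The host-side scheduling just described can be sketched as a loop over the sequence table, with stand-in kernels for the NET_Cx and NET_Dx blocks; the kernel functions and control-word format are illustrative assumptions, not the patent's implementation:

```python
SEQUENCE = ["C0", "D1", "C1", "D2", "C2"]  # instruction block sequence table

def run_pipeline(picture_id, data, cpu_kernels, device_kernels):
    """Walk the sequence table; return the final data and the control words."""
    trace = []
    for step in SEQUENCE:
        kernels = cpu_kernels if step[0] == "C" else device_kernels
        data = kernels[step](data)
        # The host maintains the control word; the device side only copies it.
        trace.append(f"P{picture_id}-NET_{step}")
    return data, trace

# Trivial stand-in kernels that merely record which block ran
# (note the s=s default argument to bind the loop variable):
cpu = {s: (lambda d, s=s: d + [s]) for s in ("C0", "C1", "C2")}
dev = {s: (lambda d, s=s: d + [s]) for s in ("D1", "D2")}
out, trace = run_pipeline(1, [], cpu, dev)
```

With these stand-ins, out is ["C0", "D1", "C1", "D2", "C2"] and trace runs from "P1-NET_C0" through "P1-NET_C2", mirroring the NET_C0 → NET_D1 → NET_C1 → NET_D2 → NET_C2 flow above.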

此处,加速设备有进行两个不同功能,人脸检测(NET-D1)和人脸识别(NET-D2)功能,这其中的时分复用是用跳转指令、跳转映射表完成神经网络模型功能推理切换的。相关情况如下:Here, the acceleration device performs two different functions, face detection (NET-D1) and face recognition (NET-D2). The time-division multiplexing between them, i.e., switching which neural network model's inference is being performed, is accomplished with jump instructions and the jump mapping table. The details are as follows:

第一种情况:编译态处理:如下四个步骤:The first case: compile-time processing, in the following four steps:

步骤a:编译模块根据设备情况生成神经网络模型调度到不同设备的指令块NET_Tx,(本例中人脸检测处理(NET_D1)和人脸识别处理(NET_D2);Step a: the compilation module generates, according to the device situation, the instruction blocks NET_Tx by which the neural network models are dispatched to the different devices (in this example, face detection processing (NET_D1) and face recognition processing (NET_D2));

步骤b:编译模块在加速设备指令块NET_Dx后增加跳转指令JMP 0;Step b: The compilation module adds the jump instruction JMP 0 after the acceleration device instruction block NET_Dx;

步骤c:编译模块生成加速设备跳转映射表(也可运行态生成,此处描述编译模块生成)。跳转映射表如附图9。Step c: the compilation module generates the acceleration device jump mapping table (it may also be generated in the running state; generation by the compilation module is described here). The jump mapping table is shown in Figure 9.

步骤d:编译模块在加速设备指令块NET-D0前增加buff获取指令和JMP Rj指令;(附图8)Step d: the compilation module adds a buffer-fetch instruction and a JMP Rj instruction before the acceleration device instruction block NET-D0 (Figure 8);
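Steps a-d can be sketched as follows. The instruction mnemonics (GET_INBUFFER, JMP) and the addressing scheme are hypothetical stand-ins; the point is only how each device block is appended after the prologue, terminated with JMP 0, and recorded in the jump mapping table:

```python
def build_device_image(blocks):
    """blocks: ordered mapping of block name -> list of instructions.

    Returns (instruction stream, jump mapping table). The stream starts at
    base address 0 with a buffer-fetch + JMP Rj prologue (step d); each
    block is recorded in the table at its base address (steps a and c) and
    ends with JMP 0 back to the prologue (step b).
    """
    stream = ["GET_INBUFFER", "JMP Rj"]   # prologue at base address 0
    jump_table = {}
    for name, instrs in blocks.items():
        jump_table[name] = len(stream)    # base address of this block
        stream.extend(instrs)
        stream.append("JMP 0")            # return to the prologue
    return stream, jump_table

stream, table = build_device_image({"NET_D1": ["CONV", "RELU"],
                                    "NET_D2": ["FC", "SOFTMAX"]})
```

For this input the jump mapping table is {"NET_D1": 2, "NET_D2": 5}: NET_D1's instructions start right after the two prologue instructions, and NET_D2's start after NET_D1's two instructions plus its trailing JMP 0.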

第二种情况:加速设备对指令的运行过程如下:(附图10)The second case: the acceleration device runs the instructions as follows (Figure 10):

1)执行指令,从指令基址0执行;1) Execute the instruction, execute from the instruction base address 0;

2)从InBuffer获取数据(控制信息+数据输入),从控制字中解析神经网络模型索引,查找映射表,获取执行指令基址,填充到Rj。2) Obtain the data (control information + data input) from InBuffer, parse the neural network model index from the control word, look up the mapping table to obtain the execution base address, and fill it into Rj.

3)跳转到指令基址开始执行;3) Jump to the instruction base address to start execution;

4)直至执行到该段神经网络模型处理尾端,输出执行结果到OutBuffer,将输入控制字写入;4) Until the execution reaches the end of the neural network model processing, output the execution result to OutBuffer, and write the input control word;

5)然后执行JMP 0,重复执行。5) Then execute JMP 0 and repeat.
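The five steps above amount to a dispatch loop, simulated here with a jump mapping table like the one from the compile-time sketch; the kernels stand in for the functional units, and the control-word parsing assumes a Px-NET_Dx format, which is an assumption of this sketch:

```python
def accelerator_loop(in_buffer, jump_table, kernels):
    """Toy model of steps 1)-5): fetch, look up base address, run, emit."""
    out_buffer = []
    while in_buffer:                          # each round restarts at base 0
        control, data = in_buffer.pop(0)      # 2) control word + input data
        net = control.split("-", 1)[1]        #    parse model index, e.g. NET_D1
        base = jump_table[net]                #    mapping table -> base address (Rj)
        result = kernels[base](data)          # 3)-4) execute that block's segment
        out_buffer.append((control, result))  # 4) output + copied control word
    return out_buffer                         # 5) JMP 0 until InBuffer is empty

table = {"NET_D1": 2, "NET_D2": 5}            # base address per model
kernels = {2: lambda d: d + ":faces", 5: lambda d: d + ":ids"}
out = accelerator_loop([("P1-NET_D1", "img")], table, kernels)
```

Here out is [("P1-NET_D1", "img:faces")]: the device selected the face detection segment purely from the control word and the jump mapping table, without any host intervention beyond filling InBuffer.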

优选实施例2:单神经网络模型需要在主机和加速设备共同运行Preferred Embodiment 2: The single neural network model needs to run together on the host and the acceleration device

步骤2.1:神经网络模型输入编译模块,进行编译,输出镜像包;Step 2.1: Input the neural network model into the compilation module, compile it, and output the image package;

假设神经网络模型指令块组合及指令块序表为C0-D1-C1-D2-C2;表示神经网络模型需要先经主机CPU预处理,然后加速设备处理,然后CPU处理,然后加速设备再处理,最后CPU处理;Assume the neural network model's instruction block combination and instruction block sequence table are C0-D1-C1-D2-C2, meaning the neural network model must first be preprocessed by the host CPU, then processed by the acceleration device, then by the CPU, then by the acceleration device again, and finally by the CPU;

设备跳转表例同优选实施例1,此处不再赘述。The example of the device jumping table is the same as that of the preferred embodiment 1, and details are not repeated here.

步骤2.2:根据API接口编码,编译生成可执行文件;Step 2.2: Compile and generate executable files according to the API interface code;

步骤2.3:HOST侧运行态运行,进行镜像加载,运行;过程大体同优选实施例1中的步骤2;Step 2.3: The HOST side runs in a running state, performs image loading, and runs; the process is generally the same as Step 2 in the preferred embodiment 1;

步骤2.4:运行态持续运行,持续给出推理运算结果。Step 2.4: The running state continues to run, and the inference operation results are continuously given.

优选实施例3:多神经网络模型组合,且单神经网络模型需要多拆分Preferred Embodiment 3: Combination of multiple neural network models, and a single neural network model needs multiple splits

步骤3.1:多神经网络模型输入编译模块,进行编译,输出镜像包;Step 3.1: Input the multi-neural network model into the compilation module, compile it, and output the image package;

不失一般性,可设神经网络模型指令块组合及指令块序表为C0-D1-C1-D2-C2-D3-C3-C4-D4-C5-D5-C6;Without loss of generality, the neural network model instruction block combination and instruction block sequence table can be set as C0-D1-C1-D2-C2-D3-C3-C4-D4-C5-D5-C6;

设备跳转表例同优选实施例1。The example of the device jump table is the same as that of the preferred embodiment 1.

步骤3.2:同优选实施例2的其他步骤。Step 3.2: the same as other steps of preferred embodiment 2.

从本发明优选实施例来看,本发明实施例以及优选实施例所公开内容不仅适用于多神经网络模型组合推理业务,单神经网络模型需HOST和加速设备组合完成的业务也能应用。此外,本身有HOST和加速设备组合完成的神经网络模型的多神经网络模型组合也能适用;主机和多个加速设备组合也能适用;这些都在本发明的保护范围内。As the preferred embodiments show, the content disclosed in the embodiments and preferred embodiments of the present invention applies not only to combined inference services of multiple neural network models, but also to services in which a single neural network model must be completed jointly by the HOST and an acceleration device. In addition, combinations of multiple neural network models in which individual models are themselves completed jointly by the HOST and an acceleration device are also applicable, as are combinations of a host with multiple acceleration devices; all of these fall within the protection scope of the present invention.

进一步地,本发明上述实施例以及优选实施例的技术方案,针对多神经神经网络模型组合相关的业务推理难落地实现的问题,提供了一种通过编译和运行两阶段、用跳转指令及映射表和指令块序表实现多神经神经网络模型组合业务推理的方法、装置和系统。Further, the technical solutions of the above embodiments and preferred embodiments of the present invention address the difficulty of implementing business inference that involves combinations of multiple neural network models, and provide a method, apparatus, and system that realize combined multi-neural-network-model business inference through two phases, compilation and running, using jump instructions together with a jump mapping table and an instruction block sequence table.

在本发明一可选实施例中,提供了一种编译生成使用指令块序表,运行时根据该表进行多设备调度、协同运算的方法;In an optional embodiment of the present invention, a method is provided in which an instruction block sequence table is generated at compile time and used at runtime to perform multi-device scheduling and cooperative computation according to that table;

在本发明一可选实施例中,提供了一种加速设备基于跳转映射表使用简单跳转指令,进行时分复用,完成不同运算功能的方法和装置;In an optional embodiment of the present invention, there is provided a method and a device for an acceleration device to use a simple jump instruction based on a jump map to perform time division multiplexing to complete different computing functions;

在本发明一可选实施例中,提供了一种神经神经网络模型加速设备,包括:指令缓存,用于存储相关指令;功能单元集模块,实现神经神经网络模型相关计算模块;跳转指令,跳转映射表,用于实现神经网络模型功能组的跳转;寄存器组等;In an optional embodiment of the present invention, a neural network model acceleration device is provided, including: an instruction cache for storing the related instructions; a functional unit set module implementing the computation modules related to neural network models; jump instructions and a jump mapping table for switching between neural network model function groups; a register set; etc.;

在本发明一可选实施例中,提供了一种深度神经神经网络模型的编译模块,对深度神经神经网络模型进行编译转换为相关指令集;并生成相关指令块序表及跳转映射表;In an optional embodiment of the present invention, a compiling module for a deep neural network model is provided, which compiles and converts the deep neural network model into a relevant instruction set; and generates a relevant instruction block sequence table and a jump mapping table;

在本发明一可选实施例中,提供了一种神经神经网络模型推理运行的运行态模块:包括加载模块,加载相关镜像到具体位置;设备控制,控制加速设备的启动、停止、复位等;设备的输入输出管理,提供要处理的数据和要求给设备,获取设备的处理结果;包括提供给业务用户的编程接口(API)等;In an optional embodiment of the present invention, a running-state module for neural network model inference is provided, including: a loading module, which loads the relevant image to a specific location; device control, which controls the start, stop, reset, etc. of the acceleration device; device input/output management, which provides the data to be processed and the requirements to the device and obtains the device's processing results; and a programming interface (API) provided to business users, etc.;

综上,通过本发明实施例以及优选实施例的指令块的处理方法及装置,采用编译和运行态系统(如图13所示),可方便、快速、高效的完成多神经神经网络模型的推理落地,简便完成相关业务功能。To sum up, with the instruction block processing method and apparatus of the embodiments and preferred embodiments of the present invention, using the compilation and running-state system (as shown in Figure 13), inference with combined multiple neural network models can be put into practice conveniently, quickly, and efficiently, and the related business functions can be completed simply.

本发明的实施例还提供了一种存储介质,该存储介质中存储有计算机程序,其中,该计算机程序被设置为运行时执行上述任一项方法实施例中的步骤。An embodiment of the present invention further provides a storage medium, where a computer program is stored in the storage medium, wherein the computer program is configured to execute the steps in any one of the above method embodiments when running.

可选地,在本实施例中,上述存储介质可以被设置为存储用于执行以下步骤的计算机程序:Optionally, in this embodiment, the above-mentioned storage medium may be configured to store a computer program for executing the following steps:

S1,通过编译模块对神经网络模型的描述文件进行编译,得到镜像包,其中,镜像包包括:指令块组,指令块序表,跳转指令映射表,指令块组包括待处理的多个指令块,指令块序表用于指示多个指令块的运行顺序,以及运行指令块的执行设备,执行设备包括:处理器,加速设备,每一个指令块后设置有跳转指令,跳转指令映射表包括:跳转指令和下一个执行的指令块;S1, compiling, through a compiling module, the description file of a neural network model to obtain an image package, wherein the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the multiple instruction blocks and the execution devices that run the instruction blocks, the execution devices including a processor and an acceleration device; a jump instruction is arranged after each instruction block; and the jump instruction mapping table includes the jump instructions and the next instruction block to be executed;

S2,加载所述镜像包,并按照所述指令块序表和所述跳转指令映射表处理所述指令块组的指令块。S2: Load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.

可选地,在本实施例中,上述存储介质可以包括但不限于:U盘、只读存储器(Read-Only Memory,简称为ROM)、随机存取存储器(Random Access Memory,简称为RAM)、移动硬盘、磁碟或者光盘等各种可以存储计算机程序的介质。Optionally, in this embodiment, the above storage medium may include, but is not limited to, various media that can store a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

本发明的实施例还提供了一种电子装置,包括存储器和处理器,该存储器中存储有计算机程序,该处理器被设置为运行计算机程序以执行上述任一项方法实施例中的步骤。An embodiment of the present invention also provides an electronic device, comprising a memory and a processor, where a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.

可选地,上述电子装置还可以包括传输设备以及输入输出设备,其中,该传输设备和上述处理器连接,该输入输出设备和上述处理器连接。Optionally, the above-mentioned electronic device may further include a transmission device and an input-output device, wherein the transmission device is connected to the above-mentioned processor, and the input-output device is connected to the above-mentioned processor.

可选地,在本实施例中,上述处理器可以被设置为通过计算机程序执行以下步骤:Optionally, in this embodiment, the above-mentioned processor may be configured to execute the following steps through a computer program:

S1,通过编译模块对神经网络模型的描述文件进行编译,得到镜像包,其中,镜像包包括:指令块组,指令块序表,跳转指令映射表,指令块组包括待处理的多个指令块,指令块序表用于指示多个指令块的运行顺序,以及运行指令块的执行设备,执行设备包括:处理器,加速设备,每一个指令块后设置有跳转指令,跳转指令映射表包括:跳转指令和下一个执行的指令块;S1, compiling, through a compiling module, the description file of a neural network model to obtain an image package, wherein the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the multiple instruction blocks and the execution devices that run the instruction blocks, the execution devices including a processor and an acceleration device; a jump instruction is arranged after each instruction block; and the jump instruction mapping table includes the jump instructions and the next instruction block to be executed;

S2,加载所述镜像包,并按照所述指令块序表和所述跳转指令映射表处理所述指令块组的指令块。S2: Load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.

可选地,本实施例中的具体示例可以参考上述实施例及可选实施方式中所描述的示例,本实施例在此不再赘述。Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementation manners, and details are not described herein again in this embodiment.


显然,本领域的技术人员应该明白,上述的本发明的各模块或各步骤可以用通用的计算装置来实现,它们可以集中在单个的计算装置上,或者分布在多个计算装置所组成的神经网络模型上,可选地,它们可以用计算装置可执行的程序代码来实现,从而,可以将它们存储在存储装置中由计算装置来执行,并且在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤,或者将它们分别制作成各个集成电路模块,或者将它们中的多个模块或步骤制作成单个集成电路模块来实现。这样,本发明不限制于任何特定的硬件和软件结合。Obviously, those skilled in the art should understand that the above modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described may be performed in an order different from that given here, or they may be made into individual integrated circuit modules separately, or multiple of the modules or steps may be made into a single integrated circuit module. Thus, the present invention is not limited to any particular combination of hardware and software.

以上所述仅为本发明的优选实施例而已,并不用于限制本发明,对于本领域的技术人员来说,本发明可以有各种更改和变化。凡在本发明的原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A method for processing an instruction block, comprising:
compiling a description file of a neural network model through a compiling module to obtain a mirror image package, wherein the mirror image package comprises: an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group comprises a plurality of instruction blocks to be processed; the instruction block sequence table is used for indicating a running order of the plurality of instruction blocks and execution devices that run the instruction blocks, the execution devices comprising: a processor and an acceleration device; a jump instruction is arranged after each instruction block; and the jump instruction mapping table comprises: the jump instruction and a next instruction block to be executed;
and loading the mirror image packet, and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
2. The method of claim 1, wherein compiling the description file of the neural network model by the compiling module to obtain a mirror package comprises:
and compiling the description files of the plurality of neural network models through a compiling module to obtain mirror image packages corresponding to the plurality of neural network models.
3. The method of claim 1, wherein processing the instruction blocks of the group of instruction blocks according to the instruction block order table and the jump instruction mapping table comprises:
and instructing the execution equipment in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running sequence of the instruction block sequence table and the jump instruction mapping table.
4. The method according to any of claims 1 to 3, wherein after processing the instruction blocks of the group of instruction blocks according to the instruction block order table and the jump instruction mapping table, the method further comprises:
and for each instruction, after one instruction is processed according to the running sequence, caching the obtained data into a pre-allocated cache region.
5. An apparatus for processing an instruction block, comprising:
the compiling module is used for compiling a description file of a neural network model to obtain a mirror image package, wherein the mirror image package comprises: an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group comprises a plurality of instruction blocks to be processed; the instruction block sequence table is used for indicating a running order of the plurality of instruction blocks and execution devices that run the instruction blocks, the execution devices comprising: a processor and an acceleration device; a jump instruction is arranged after each instruction block; and the jump instruction mapping table comprises: the jump instruction and a next instruction block to be executed;
and the processing module is used for loading the mirror image packet and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
6. The apparatus of claim 5, wherein the compiling module is configured to compile the description files of the plurality of neural network models through the compiling module to obtain the mirror image packages corresponding to the plurality of neural network models.
7. The apparatus of claim 5, wherein the processing module is further configured to instruct an execution device in the instruction block ordered table to process the instruction block groups according to the operation order of the instruction block ordered table and the jump instruction mapping table.
8. The apparatus of any of claims 5 to 7, wherein the processing module is further configured to, for each instruction, cache the obtained data in a pre-allocated cache area after processing one instruction according to the execution order.
9. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 4 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 4.
CN201910562823.6A 2019-06-26 2019-06-26 Instruction block processing method and device, storage medium, and electronic device Pending CN112148291A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910562823.6A CN112148291A (en) 2019-06-26 2019-06-26 Instruction block processing method and device, storage medium, and electronic device
PCT/CN2020/085180 WO2020259020A1 (en) 2019-06-26 2020-04-16 Instruction block processing method and apparatus, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910562823.6A CN112148291A (en) 2019-06-26 2019-06-26 Instruction block processing method and device, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
CN112148291A true CN112148291A (en) 2020-12-29

Family

ID=73869963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910562823.6A Pending CN112148291A (en) 2019-06-26 2019-06-26 Instruction block processing method and device, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN112148291A (en)
WO (1) WO2020259020A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119046192A (en) * 2024-10-28 2024-11-29 上海灵动微电子股份有限公司 Direct memory access circuit and integrated circuit
CN119226224A (en) * 2024-11-18 2024-12-31 上海朔集半导体科技有限公司 Computing accelerators for MCU chips, MCU chips

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105373410A (en) * 2015-12-22 2016-03-02 京信通信技术(广州)有限公司 Differential upgrading method and device for base station software
WO2017048647A1 (en) * 2015-09-19 2017-03-23 Microsoft Technology Licensing, Llc Implicit program order
CN108027731A (en) * 2015-09-19 2018-05-11 微软技术许可有限责任公司 Debugging for block-based processor is supported
CN109272109A (en) * 2018-10-30 2019-01-25 北京地平线机器人技术研发有限公司 The instruction dispatching method and device of neural network model
CN109547575A (en) * 2019-01-04 2019-03-29 中国银行股份有限公司 A kind of data dispatching method, device and equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874212B (en) * 2015-06-30 2021-08-20 华为技术有限公司 A hardware acceleration method, compiler and apparatus
CN106227507B (en) * 2016-07-11 2019-10-18 北京深鉴智能科技有限公司 Computing system and its controller
CN107239315B (en) * 2017-04-11 2019-11-15 赛灵思公司 Programming Model for Neural Network Heterogeneous Computing Platform
CN109919311B (en) * 2019-03-13 2020-04-10 北京地平线机器人技术研发有限公司 Method for generating instruction sequence, method and device for executing neural network operation

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119046192A (en) * 2024-10-28 2024-11-29 上海灵动微电子股份有限公司 Direct memory access circuit and integrated circuit
CN119226224A (en) * 2024-11-18 2024-12-31 上海朔集半导体科技有限公司 Computing accelerator for MCU chip, and MCU chip
CN119226224B (en) * 2024-11-18 2025-03-21 上海朔集半导体科技有限公司 Computing accelerator for MCU chip, and MCU chip

Also Published As

Publication number Publication date
WO2020259020A1 (en) 2020-12-30

Similar Documents

Publication Publication Date Title
KR102501992B1 (en) Data processing method and related products
KR102479264B1 (en) Dynamic batching for inference system for transformer-based generation tasks
US10372428B1 (en) Dynamic computational acceleration using a heterogeneous hardware infrastructure
US12182688B2 (en) Hierarchical partitioning of operators
CN111651207B (en) A neural network model computing chip, method, device, device and medium
KR102498595B1 (en) Selective batching for inference system for transformer-based generation tasks
US11733983B2 (en) Method and apparatus for generating metadata by a compiler
US11748622B1 (en) Saving intermediate outputs of a neural network
CN110430444A (en) Video stream processing method and system
US9898873B2 (en) Methods and systems for processing 3D graphic objects at a content processor
CN109491664A (en) Method, device, equipment and storage medium for generating an iOS application
US20240330666A1 (en) Method and electronic apparatus for generating instructions of artificial intelligence accelerator
CN111124685A (en) Big data processing method and device, electronic equipment and storage medium
CN118536565A (en) AI algorithm acceleration method, device, equipment and readable storage medium
CN115576699A (en) Data processing method, data processing device, AI chip, electronic device and storage medium
CN112148291A (en) Instruction block processing method and device, storage medium, and electronic device
CN110580527A (en) Method, device and storage medium for generating a general machine learning model
WO2023071509A1 (en) Model compilation method and apparatus, and model running system
CN112633502A (en) Cross-platform execution method and device of deep learning model and electronic equipment
CN119539075A (en) Model training reasoning method, device, equipment, medium and program product
CN119311253A (en) Task execution method based on domain specific language and software development tool chain
CN114127681B (en) Method and apparatus for autonomous acceleration of data stream AI applications
CN117519709A (en) Computational graph compilation method, compilation device, computing equipment and storage medium
CN116720567A (en) A model optimization method and related equipment
US20250147740A1 (en) Method of Compiling Neural Network Model, Compiler, and Storage Medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination