CN107103113A - Automated design method, device and optimization method for neural network processors - Google Patents
Automated design method, device and optimization method for neural network processors
- Publication number
- CN107103113A CN107103113A CN201710178281.3A CN201710178281A CN107103113A CN 107103113 A CN107103113 A CN 107103113A CN 201710178281 A CN201710178281 A CN 201710178281A CN 107103113 A CN107103113 A CN 107103113A
- Authority
- CN
- China
- Prior art keywords
- neural network
- unit
- data
- hardware
- network processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06F—ELECTRIC DIGITAL DATA PROCESSING › G06F30/00—Computer-aided design [CAD] › G06F30/30—Circuit design
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks
Abstract
The present invention proposes an automated design method, device and optimization method for neural network processors. The method comprises: step 1, obtaining a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and the target running speed; step 2, according to the neural network model description file and the hardware resource constraint parameters, looking up a unit library in a pre-built neural network component library, and generating, from that unit library, hardware description language code for a neural network processor corresponding to the neural network model; and step 3, converting the hardware description language code into the hardware circuit of the neural network processor.
Description
Technical Field
The present invention relates to the technical field of neural network processor architecture, and in particular to an automated design method, device and optimization method for neural network processors.
Background Art
The rapid development of deep learning and neural network technology has opened new avenues for large-scale data processing tasks. Various new neural network models perform remarkably well on complex, abstract problems, and new applications keep emerging in fields such as visual image processing, speech recognition and intelligent robotics.
At present, real-time task analysis with deep neural networks mostly relies on large-scale high-performance processors or general-purpose graphics processors. These devices are costly and power-hungry; when applied to portable smart devices, they suffer from a series of problems such as large circuit scale, high energy consumption and expensive products. Therefore, for energy-efficient real-time processing in application areas such as embedded devices and small low-cost data centers, accelerating neural network model computation with a dedicated neural network processor rather than software is a more effective solution. However, the topology and parameter design of a neural network model change with the application scenario, and neural network models themselves evolve rapidly, so providing a general-purpose, efficient neural network processor that serves all application scenarios and covers all neural network models is very difficult. This causes great inconvenience for high-level application developers who must design hardware acceleration solutions for different application requirements.
Existing neural network hardware acceleration technologies fall into two categories: Application Specific Integrated Circuit (ASIC) chips and Field Programmable Gate Arrays (FPGA). Under the same process conditions, an ASIC chip runs fast with low power consumption, but its design flow is complex, its tape-out cycle long and its development cost high, so it cannot keep pace with rapidly updated neural network models; an FPGA offers flexible circuit configuration and a short development cycle, but runs relatively slowly with comparatively large hardware overhead and power consumption. Whichever acceleration technology is adopted, neural network model and algorithm developers must master hardware development — processor architecture design, hardware code writing, simulation verification, placement and routing, and so on — in addition to understanding network topologies and dataflow patterns. These skills present a high barrier for high-level application developers who focus on neural network models and structure design but lack hardware design expertise. Therefore, to let high-level developers efficiently build neural network applications, an automated design method and tool for neural network processors that serves multiple neural network models is urgently needed.
Summary of the Invention
In view of the deficiencies of the prior art, the present invention proposes an automated design method, device and optimization method for neural network processors.
The present invention proposes an automated design method for neural network processors, comprising:
Step 1: obtaining a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and the target running speed;
Step 2: according to the neural network model description file and the hardware resource constraint parameters, looking up a unit library in a pre-built neural network component library, and generating, from that unit library, hardware description language code for a neural network processor corresponding to the neural network model;
Step 3: converting the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor comprises a storage structure, a control structure and a computation structure.
The neural network model description file comprises three parts: basic attributes, parameter description and connection information. The basic attributes include the layer name and layer type; the parameter description includes the number of output layers, the convolution kernel size and the stride; the connection information includes the connection name, connection direction and connection type.
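As an illustrative sketch only (the patent does not fix a concrete file schema), a single layer entry of such a description file might look as follows; all field names here are assumptions:

```python
# Hypothetical layer entry of a neural network model description file.
# Field names are illustrative; the patent does not specify a schema.
conv1_layer = {
    # Basic attributes: layer name and layer type
    "name": "conv1",
    "type": "convolution",
    # Parameter description: output layer count, kernel size, stride
    "params": {
        "output_maps": 32,
        "kernel_size": 3,
        "stride": 1,
    },
    # Connection information: connection name, direction and type
    "connections": [
        {"name": "data_to_conv1", "direction": "in", "type": "feature_map"},
        {"name": "conv1_to_pool1", "direction": "out", "type": "feature_map"},
    ],
}
```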
The neural network reusable unit library comprises two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit and a control unit.
The neural network processor includes a main address generation unit, a data address generation unit and a weight address generation unit.
The method further comprises: determining the data path according to the user-specified neural network model and the hardware resource constraint parameters, and determining the data resource sharing scheme according to the characteristics of the intermediate layers of the neural network;
generating the memory address access stream according to the hardware configuration and the network characteristics, the address access stream being described by means of a finite state machine; and
generating hardware description language code, which is then converted into the hardware circuit of the neural network processor.
The method further comprises generating a data storage map and a control instruction stream according to the neural network model, the hardware resource constraint parameters and the hardware description language code.
The present invention further provides an automated design device for neural network processors, comprising:
a data acquisition module, configured to obtain a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and the target running speed;
a hardware description language code generation module, configured to look up a unit library in a pre-built neural network component library according to the neural network model description file and the hardware resource constraint parameters, and to generate, from that unit library, hardware description language code for a neural network processor corresponding to the neural network model; and
a hardware circuit generation module, configured to convert the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor comprises a storage structure, a control structure and a computation structure.
The neural network model description file comprises three parts: basic attributes, parameter description and connection information. The basic attributes include the layer name and layer type; the parameter description includes the number of output layers, the convolution kernel size and the stride; the connection information includes the connection name, connection direction and connection type.
The neural network reusable unit library comprises two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit and a control unit.
The neural network processor includes a main address generation unit, a data address generation unit and a weight address generation unit.
The device further determines the data path according to the user-specified neural network model and the hardware resource constraint parameters, and determines the data resource sharing scheme according to the characteristics of the intermediate layers of the neural network;
generates the memory address access stream according to the hardware configuration and the network characteristics, the address access stream being described by means of a finite state machine; and
generates hardware description language code, which is then converted into the hardware circuit of the neural network processor.
The device further generates a data storage map and a control instruction stream according to the neural network model, the hardware resource constraint parameters and the hardware description language code.
The present invention also proposes an optimization method based on the automated design method for neural network processors described above, comprising:
Step 1: define the convolution kernel size as k*k, the stride as s, the memory width as d, and the number of data maps as t. If k^2 = d^2, partition the data into blocks of size k*k, so that the data width matches the memory width and the data is stored contiguously in memory;
Step 2: if k^2 ≠ d^2 and the stride s is the greatest common divisor of k and the memory width d, partition the data into blocks of size s*s, so that within one data map the data is stored contiguously in memory;
Step 3: if neither of the above holds, compute the greatest common divisor f of the stride s, the kernel size k and the memory width d, partition the data into blocks of size f*f, and store the t data maps interleaved.
As can be seen from the above solutions, the advantages of the present invention are:
The present invention can map a neural network model to a hardware circuit, automatically optimize the circuit structure and data storage scheme according to hardware resource constraints and network characteristics, and simultaneously generate the corresponding control instruction stream. It thus realizes automated hardware-software co-design of neural network hardware accelerators, shortening the design cycle of a neural network processor while improving its computational energy efficiency.
Brief Description of the Drawings
Fig. 1 is a workflow diagram of the FPGA automatic implementation tool for neural network processors provided by the present invention;
Fig. 2 is a schematic diagram of a neural network processor system that can be automatically generated by the present invention;
Fig. 3 is a schematic diagram of the neural network reusable unit library used by the present invention;
Fig. 4 is a schematic diagram of the address generation circuit interface used by the present invention.
Detailed Description
To make the objectives, technical solutions, design methods and advantages of the present invention clearer, the present invention is described in further detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The present invention aims to provide an automated design method, device and optimization method for neural network processors. The device comprises a hardware generator and a compiler. The hardware generator automatically generates hardware description language code for a neural network processor according to the neural network type and the hardware resource constraints; designers then use existing hardware circuit design methods to produce the processor hardware circuit from the hardware description language. The compiler generates the control and data scheduling instruction stream according to the circuit structure of the neural network processor.
Fig. 1 is a schematic diagram of the automated neural network processor generation technique provided by the present invention. The specific steps are:
Step 1: the device reads the neural network model description file, which contains the network topology and the definition of each computation layer;
Step 2: the device reads in the hardware resource constraint parameters, which include the hardware resource size and the target running speed; the device can generate a corresponding circuit structure according to these constraints;
Step 3: according to the neural network model description script and the hardware resource constraints, the device indexes a suitable unit library from the pre-built neural network component library, and the hardware circuit generator contained in the tool uses this unit library to generate hardware description language code for a neural network processor corresponding to the neural network model;
Step 4: the compiler contained in the device generates the data storage map and the control instruction stream according to the neural network model, the logic resource constraints and the generated hardware description language code;
Step 5: the hardware description language is converted into a hardware circuit by existing hardware design methods.
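The following self-contained Python sketch illustrates how these five steps might fit together; every function, field and output format in it is an assumption made for illustration, not the patent's actual tool:

```python
# Minimal sketch of the five-step generation flow. All names and the
# textual "model description" format are hypothetical placeholders.

def parse_model_description(text):
    """Step 1: parse layer entries of the form 'name,type,kernel,stride'."""
    layers = []
    for line in text.strip().splitlines():
        name, ltype, kernel, stride = line.split(",")
        layers.append({"name": name, "type": ltype,
                       "kernel": int(kernel), "stride": int(stride)})
    return layers

def select_units(layers):
    """Step 3 (first half): pick reusable units the model needs."""
    units = {"control_unit", "address_generation_unit"}
    for layer in layers:
        if layer["type"] == "convolution":
            units.update({"neuron_unit", "accumulator_unit"})
        elif layer["type"] == "pooling":
            units.add("pooling_unit")
    return units

def generate(model_text, constraints):
    layers = parse_model_description(model_text)            # step 1
    # step 2: constraints = {"resource_size": ..., "target_speed": ...}
    units = select_units(layers)                            # step 3
    hdl = f"// HDL instantiating: {', '.join(sorted(units))}"
    instructions = [f"RUN {l['name']}" for l in layers]     # step 4 (compiler)
    return hdl, instructions            # step 5: hand hdl to a synthesis flow

hdl, instrs = generate("conv1,convolution,3,1\npool1,pooling,2,2",
                       {"resource_size": 4096, "target_speed": 100})
```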
The neural network processor that the present invention can automatically generate is based on a storage-control-computation structure:
the storage structure stores the data participating in the computation, the neural network weights and the processor operation instructions;
the control structure comprises a decoding circuit and a control logic circuit, which parse the operation instructions and generate control signals used to control the scheduling and storage of on-chip data and the neural network computation process;
the computation structure comprises computation units that carry out the neural network computation operations in the processor.
Fig. 2 is a schematic diagram of the neural network processor system 101 that can be automatically generated by the present invention. The architecture of the neural network processor system 101 consists of six parts: an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106 and a computation unit 107.
The input data storage unit 102 stores the data participating in the computation, which includes the original feature map data and the data participating in intermediate-layer computation; the output data storage unit 104 stores the computed neuron response values; the instruction storage unit 106 stores the instruction information participating in the computation, and the instructions are parsed into a control flow to schedule the neural network computation; the weight storage unit 105 stores the trained neural network weights.
The control unit 103 is connected to the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106 and the computation unit 107. The control unit 103 fetches the instructions held in the instruction storage unit 106, parses them, and controls the computation unit to perform the neural network computation according to the control signals obtained from the parsed instructions.
The computation unit 107 performs the corresponding neural network computation according to the control signals produced by the control unit 103. The computation unit 107 is associated with one or more storage units; it can obtain data for computation from the data storage components of its associated input data storage unit 102 and can write data to its associated output data storage unit 104. The computation unit 107 carries out most of the operations in the neural network algorithm, i.e. vector multiply-accumulate operations and the like.
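As a minimal illustration of the core operation the computation unit performs, the following Python sketch computes a single neuron response as a vector multiply-accumulate followed by an activation (the ReLU default is an assumption, not specified by the patent):

```python
def neuron_response(inputs, weights, bias, activation=lambda v: max(v, 0.0)):
    """One neuron's response: vector multiply-accumulate plus activation."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w                      # the MAC operation
    return activation(acc)

# e.g. neuron_response([1.0, 2.0], [0.5, -0.25], 0.1) -> 0.1
```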
The present invention describes the characteristics of a neural network model through the neural network description file format provided above. The description file comprises three parts: basic attributes, parameter description and connection information, where the basic attributes include the layer name and layer type, the parameter description includes the number of output layers, the convolution kernel size and the stride, and the connection information includes the connection name, connection direction and connection type.
To accommodate hardware implementations of various neural network models, the neural network reusable unit library provided by the present invention is shown in Fig. 3. The unit library comprises two parts: hardware description files and configuration scripts. The reusable unit library provided by the present invention includes, but is not limited to: a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit and a control unit.
When composing a neural network processor system from the above reusable unit library, the present invention reads the neural network model description file and the hardware resource constraints and invokes the unit library in an appropriately optimized way.
While the neural network processor is working, it must automatically obtain the address streams for on-chip and off-chip memory data. In the present invention, the memory address stream is determined and generated by the compiler, and the memory access patterns determined by the address stream are passed to the hardware generator through text interaction. The memory access patterns include the main access pattern, the data access pattern and the weight access pattern, among others.
The hardware generator generates the address generation unit (AGU) according to the memory access patterns.
A neural network processor circuit designed with the automated design tool provided by the present invention contains three types of address generation units: a main address generation unit, a data address generation unit and a weight address generation unit. The main address generation unit is responsible for data exchange between on-chip and off-chip memory; the data address generation unit is responsible for two kinds of data exchange, namely reading data from on-chip memory into the computation unit and storing the computation unit's intermediate and final results back into the storage unit; the weight address generation unit is responsible for reading weight data from on-chip memory into the computation unit.
In the present invention, the hardware circuit generator and the compiler work together to design the address generation circuit. The specific design steps are:
Step 1: the device determines the data path according to the neural network model and the hardware constraints specified by the designer, and determines the data resource sharing scheme according to the characteristics of the intermediate layers of the neural network;
Step 2: the compiler generates the memory address access stream according to the hardware configuration and the network characteristics, describing the address access stream by means of a finite state machine;
Step 3: the finite state machine is mapped by the hardware generator into hardware description language for the address generation circuit, which is then mapped into a hardware circuit through hardware circuit design methods.
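A minimal Python sketch of such an address-stream state machine is shown below; the row-major walk and the parameter names are assumptions for illustration, since the patent does not disclose a concrete FSM:

```python
def address_stream(base, height, width, stride):
    """Yield read addresses for a row-major feature map, one per FSM step.
    The states correspond to the (row, col) indices; the hardware generator
    would emit an HDL state machine with the same transitions."""
    for row in range(0, height, stride):
        for col in range(0, width, stride):
            yield base + row * width + col

# list(address_stream(base=0x100, height=4, width=4, stride=2))
# -> [256, 258, 264, 266]
```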
Fig. 4 is a schematic diagram of the general structure of the address generation circuit provided by the present invention. The address generation circuit has a generic signal interface comprising the following signals:
start address signal: the first address of the data;
data block size signal: the amount of data fetched per access;
memory flag signal: identifies the memory in which the data is stored;
work mode signal: distinguishes the large-kernel fetch mode, small-kernel fetch mode, pooling mode, full convolution mode, etc.;
kernel size signal: defines the convolution kernel size;
length signal: defines the output image size;
input layer count signal: marks the number of input layers;
output layer count signal: marks the number of output layers;
reset signal: when set to 1, initializes the address generation circuit;
write enable signal: directs the addressed memory to perform a write operation;
read enable signal: directs the addressed memory to perform a read operation;
address signal: gives the memory address being accessed;
end signal: indicates that the access has finished.
These parameters ensure that the AGU supports multiple work modes and can generate correct read and write address streams in the different work modes and throughout neural network propagation.
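The following Python dataclass mirrors the signal list above as a plain data structure; the field names are illustrative and are not RTL port names taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class AguInterface:
    """Mirror of the generic AGU signal interface listed above."""
    start_address: int      # first address of the data
    block_size: int         # amount of data fetched per access
    memory_id: int          # which memory holds the data
    work_mode: str          # 'large_kernel' | 'small_kernel' | 'pooling' | 'full_conv'
    kernel_size: int        # convolution kernel size
    length: int             # output image size
    input_layers: int       # number of input layers
    output_layers: int      # number of output layers
    reset: bool             # True initializes the address generation circuit
    write_enable: bool      # write access to the addressed memory
    read_enable: bool       # read access to the addressed memory
    address: int            # current memory address
    done: bool              # access-finished flag
```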
For different target networks, the tool selects the necessary parameters from this template to build the address generator and provides the on-chip and off-chip memory access patterns.
The neural network processor provided by the present invention builds its architecture in a data-driven manner, so the address generation circuit not only provides access addresses but also drives the execution of the different neural layers and of the data blocks within each layer.
Owing to resource constraints, a neural network model cannot be fully unrolled in the form of its model description when mapped to a hardware circuit. The automated design tool proposed by the present invention therefore optimizes the data storage and access mechanism through hardware-software cooperation, in two parts: first, the compiler analyzes the computational throughput and on-chip memory size of the neural network processor and partitions the neural network feature data and weight data into appropriately sized blocks for grouped storage and access; second, the data within each block is further partitioned according to the computation unit scale, the memory width and the data bit width.
Based on the above optimization mechanism, the present invention proposes an optimization method for data storage and access. The specific implementation steps are:
Step 1: define the convolution kernel size as k*k, the stride as s, the memory width as d, and the number of data maps as t. If k^2 = d^2, partition the data into blocks of size k*k, so that the data width matches the memory width and the data is stored contiguously in memory;
Step 2: if k^2 ≠ d^2 and s is the greatest common divisor of k and d, partition the data into blocks of size s*s, so that within one data map the data can be stored contiguously in memory;
Step 3: if neither of the above holds, compute the greatest common divisor f of s, k and d, partition the data into blocks of size f*f, and store the t data maps interleaved.
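The three-case rule above can be summarized in a short Python sketch; the function name and interface are assumptions made for illustration:

```python
from math import gcd

def choose_block_size(k, s, d):
    """Pick the tile edge length per the three cases above:
    k = kernel size, s = stride, d = memory width."""
    if k * k == d * d:                 # case 1: kernel matches memory width
        return k                       # k*k blocks, stored contiguously
    if s == gcd(k, d):                 # case 2: stride is gcd of k and d
        return s                       # s*s blocks, contiguous within one map
    return gcd(gcd(s, k), d)           # case 3: f*f blocks, t maps interleaved
```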
The computation data of a neural network comprises the input feature data and the trained weight data. A good data storage layout reduces the internal data bandwidth of the processor and improves storage space utilization. The automated design tool provided by the present invention improves the processor's computational efficiency by increasing the locality of its data storage.
In summary, the present invention provides an automated design tool for neural network processors. The tool can map a neural network model to hardware code describing a neural network processor, optimize the processor architecture according to hardware resource constraints, and automatically generate control flow instructions. It realizes the automated design of neural network processors, shortens their design cycle, and suits the application characteristics of neural network technology: rapidly updated network models, demanding computation speed and high energy efficiency requirements.
It should be understood that although this specification is described in terms of individual embodiments, each embodiment does not necessarily contain only one independent technical solution; this manner of presentation is adopted merely for clarity. Those skilled in the art should take the specification as a whole, and the technical solutions in the various embodiments may also be suitably combined to form other implementations understandable to those skilled in the art.
The present invention further proposes an automated design device for neural network processors, comprising:
a data acquisition module, configured to obtain a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and the target running speed;
a hardware description language code generation module, configured to look up a unit library in a pre-built neural network component library according to the neural network model description file and the hardware resource constraint parameters, and to generate, from that unit library, hardware description language code for a neural network processor corresponding to the neural network model; and
a hardware circuit generation module, configured to convert the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor comprises a storage structure, a control structure and a computation structure.
The neural network model description file comprises three parts: basic attributes, parameter description and connection information. The basic attributes include the layer name and layer type; the parameter description includes the number of output layers, the convolution kernel size and the stride; the connection information includes the connection name, connection direction and connection type.
The neural network processor includes a main address generation unit, a data address generation unit and a weight address generation unit.
The device further determines the data path according to the user-specified neural network model and the hardware resource constraint parameters, and determines the data resource sharing scheme according to the characteristics of the intermediate layers of the neural network;
and generates the memory address access stream according to the hardware configuration and the network characteristics, the address access stream being described by means of a finite state machine.
The neural network reusable unit library comprises two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit and a control unit.
The finite state machine is mapped to the address generation circuit, and hardware description language code is generated, which is then converted into the hardware circuit of the neural network processor.
The device further generates a data storage map and a control instruction stream according to the neural network model, the hardware resource constraint parameters and the hardware description language code.
The above descriptions are merely illustrative specific embodiments of the present invention and are not intended to limit its scope. Any equivalent changes, modifications and combinations made by those skilled in the art without departing from the concept and principles of the present invention shall fall within the protection scope of the present invention.
Claims (17)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710178281.3A CN107103113B (en) | 2017-03-23 | 2017-03-23 | The Automation Design method, apparatus and optimization method towards neural network processor |
PCT/CN2018/080207 WO2018171717A1 (en) | 2017-03-23 | 2018-03-23 | Automated design method and system for neural network processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710178281.3A CN107103113B (en) | 2017-03-23 | 2017-03-23 | The Automation Design method, apparatus and optimization method towards neural network processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107103113A true CN107103113A (en) | 2017-08-29 |
CN107103113B CN107103113B (en) | 2019-01-11 |
Family
ID=59676152
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710178281.3A Active CN107103113B (en) | 2017-03-23 | 2017-03-23 | The Automation Design method, apparatus and optimization method towards neural network processor |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107103113B (en) |
WO (1) | WO2018171717A1 (en) |
Cited By (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341761A (en) * | 2017-07-12 | 2017-11-10 | 成都品果科技有限公司 | A kind of calculating of deep neural network performs method and system |
CN107633295A (en) * | 2017-09-25 | 2018-01-26 | 北京地平线信息技术有限公司 | For the method and apparatus for the parameter for being adapted to neutral net |
CN108154229A (en) * | 2018-01-10 | 2018-06-12 | 西安电子科技大学 | Accelerate the image processing method of convolutional neural networks frame based on FPGA |
CN108388943A (en) * | 2018-01-08 | 2018-08-10 | 中国科学院计算技术研究所 | A kind of pond device and method suitable for neural network |
CN108389183A (en) * | 2018-01-24 | 2018-08-10 | 上海交通大学 | Pulmonary nodule detects neural network accelerator and its control method |
CN108563808A (en) * | 2018-01-05 | 2018-09-21 | 中国科学技术大学 | The design method of heterogeneous reconfigurable figure computation accelerator system based on FPGA |
WO2018171717A1 (en) * | 2017-03-23 | 2018-09-27 | 中国科学院计算技术研究所 | Automated design method and system for neural network processor |
CN108921289A (en) * | 2018-06-20 | 2018-11-30 | 郑州云海信息技术有限公司 | A kind of FPGA isomery accelerated method, apparatus and system |
CN109685203A (en) * | 2018-12-21 | 2019-04-26 | 北京中科寒武纪科技有限公司 | Data processing method, device, computer system and storage medium |
CN109697509A (en) * | 2017-10-24 | 2019-04-30 | 上海寒武纪信息科技有限公司 | Processing method and processing device, operation method and device |
CN109726805A (en) * | 2017-10-30 | 2019-05-07 | 上海寒武纪信息科技有限公司 | The method for carrying out neural network processor design using black box simulator |
CN109726797A (en) * | 2018-12-21 | 2019-05-07 | 北京中科寒武纪科技有限公司 | Data processing method, device, computer system and storage medium |
CN109739802A (en) * | 2019-04-01 | 2019-05-10 | 上海燧原智能科技有限公司 | Computing cluster and computing cluster configuration method |
CN109754084A (en) * | 2018-12-29 | 2019-05-14 | 北京中科寒武纪科技有限公司 | Processing method, device and the Related product of network structure |
CN109754073A (en) * | 2018-12-29 | 2019-05-14 | 北京中科寒武纪科技有限公司 | Data processing method, device, electronic equipment and readable storage medium storing program for executing |
CN109978160A (en) * | 2019-03-25 | 2019-07-05 | 北京中科寒武纪科技有限公司 | Configuration device, method and the Related product of artificial intelligence process device |
CN109993288A (en) * | 2017-12-29 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Processing with Neural Network method, computer system and storage medium |
CN110097179A (en) * | 2018-01-29 | 2019-08-06 | 上海寒武纪信息科技有限公司 | Computer equipment, data processing method and storage medium |
CN110097180A (en) * | 2018-01-29 | 2019-08-06 | 上海寒武纪信息科技有限公司 | Computer equipment, data processing method and storage medium |
CN110955380A (en) * | 2018-09-21 | 2020-04-03 | 中科寒武纪科技股份有限公司 | Access data generation method, storage medium, computer device and apparatus |
WO2020078446A1 (en) * | 2018-10-19 | 2020-04-23 | 中科寒武纪科技股份有限公司 | Computation method and apparatus, and related product |
CN111079909A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111079914A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111079924A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111078293A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN111079912A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111079910A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN111079907A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN111079911A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111079916A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111126572A (en) * | 2019-12-26 | 2020-05-08 | 北京奇艺世纪科技有限公司 | Model parameter processing method and device, electronic equipment and storage medium |
CN111144561A (en) * | 2018-11-05 | 2020-05-12 | 杭州海康威视数字技术股份有限公司 | Neural network model determining method and device |
WO2020093885A1 (en) * | 2018-11-09 | 2020-05-14 | 北京灵汐科技有限公司 | Heterogeneous collaborative computing system |
CN111325311A (en) * | 2018-12-14 | 2020-06-23 | 深圳云天励飞技术有限公司 | Neural network model generation method, device, electronic device and storage medium |
CN111339027A (en) * | 2020-02-25 | 2020-06-26 | 中国科学院苏州纳米技术与纳米仿生研究所 | Automatic design method of reconfigurable artificial intelligence core and heterogeneous multi-core chip |
CN111488969A (en) * | 2020-04-03 | 2020-08-04 | 北京思朗科技有限责任公司 | Execution optimization method and device based on neural network accelerator |
KR20200100528A (en) * | 2017-12-29 | 2020-08-26 | 캠브리콘 테크놀로지스 코퍼레이션 리미티드 | Neural network processing method, computer system and storage medium |
CN111868754A (en) * | 2018-03-23 | 2020-10-30 | 索尼公司 | Information processing apparatus and information processing method |
CN111931926A (en) * | 2020-10-12 | 2020-11-13 | 南京风兴科技有限公司 | Hardware acceleration system and control method for convolutional neural network CNN |
CN111949405A (en) * | 2020-08-13 | 2020-11-17 | Oppo广东移动通信有限公司 | Resource scheduling method, hardware accelerator and electronic device |
CN112052943A (en) * | 2019-06-05 | 2020-12-08 | 三星电子株式会社 | Electronic device and method for performing operation of the same |
CN112132271A (en) * | 2019-06-25 | 2020-12-25 | Oppo广东移动通信有限公司 | Neural network accelerator operation method, architecture and related device |
CN112912837A (en) * | 2018-11-08 | 2021-06-04 | 北京比特大陆科技有限公司 | Neural network compiling method, device, equipment, storage medium and program product |
US11113104B2 (en) | 2017-11-20 | 2021-09-07 | Shanghai Cambricon Information Technology Co., Ltd | Task parallel processing method, apparatus and system, storage medium and computer device |
US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine |
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems |
US11521046B2 (en) | 2017-11-08 | 2022-12-06 | Samsung Electronics Co., Ltd. | Time-delayed convolutions for neural network device and method |
CN115462079A (en) * | 2019-08-13 | 2022-12-09 | 深圳鲲云信息科技有限公司 | Neural network data stream acceleration method and device, computer equipment and storage medium |
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications |
- US11681899B2 | 2018-12-07 | 2023-06-20 | Samsung Electronics Co., Ltd. | Dividing neural networks |
US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US12014553B2 (en) | 2019-02-01 | 2024-06-18 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
US11361457B2 (en) | 2018-07-20 | 2022-06-14 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
JP7539872B2 (en) | 2018-10-11 | 2024-08-26 | テスラ,インコーポレイテッド | SYSTEM AND METHOD FOR TRAINING MACHINE MODELS WITH AUGMENTED DATA - Patent application |
US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
US10956755B2 (en) | 2019-02-19 | 2021-03-23 | Tesla, Inc. | Estimating object properties using visual image data |
US12112112B2 (en) | 2020-11-12 | 2024-10-08 | Samsung Electronics Co., Ltd. | Method for co-design of hardware and neural network architectures using coarse-to-fine search, two-phased block distillation and neural hardware predictor |
JP2023032348A (en) * | 2021-08-26 | 2023-03-09 | 国立大学法人 東京大学 | Information processing device, and program |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106022468A (en) * | 2016-05-17 | 2016-10-12 | 成都启英泰伦科技有限公司 | Artificial neural network processor integrated circuit and design method therefor |
WO2016179533A1 (en) * | 2015-05-06 | 2016-11-10 | Indiana University Research And Technology Corporation | Sensor signal processing using an analog neural network |
- CN106447034A (en) * | 2016-10-27 | 2017-02-22 | 中国科学院计算技术研究所 | Neural network processor based on data compression, design method and chip |
CN106529670A (en) * | 2016-10-27 | 2017-03-22 | 中国科学院计算技术研究所 | Neural network processor based on weight compression, design method, and chip |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107103113B (en) * | 2017-03-23 | 2019-01-11 | 中国科学院计算技术研究所 | The Automation Design method, apparatus and optimization method towards neural network processor |
- 2017-03-23 CN CN201710178281.3A patent/CN107103113B/en active Active
- 2018-03-23 WO PCT/CN2018/080207 patent/WO2018171717A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016179533A1 (en) * | 2015-05-06 | 2016-11-10 | Indiana University Research And Technology Corporation | Sensor signal processing using an analog neural network |
CN106022468A (en) * | 2016-05-17 | 2016-10-12 | 成都启英泰伦科技有限公司 | Artificial neural network processor integrated circuit and design method therefor |
- CN106447034A (en) * | 2016-10-27 | 2017-02-22 | 中国科学院计算技术研究所 | Neural network processor based on data compression, design method and chip |
CN106529670A (en) * | 2016-10-27 | 2017-03-22 | 中国科学院计算技术研究所 | Neural network processor based on weight compression, design method, and chip |
Non-Patent Citations (2)
Title |
---|
- YING WANG et al.: "DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family", 《DESIGN AUTOMATION CONFERENCE》 *
- YE Liya et al.: "Research on the Architecture of Neural Network-Based Embedded Systems", 《Journal of Hangzhou Dianzi University》 *
Cited By (93)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018171717A1 (en) * | 2017-03-23 | 2018-09-27 | 中国科学院计算技术研究所 | Automated design method and system for neural network processor |
US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems |
US12020476B2 (en) | 2017-03-23 | 2024-06-25 | Tesla, Inc. | Data synthesis for autonomous control systems |
CN107341761A (en) * | 2017-07-12 | 2017-11-10 | 成都品果科技有限公司 | A kind of calculating of deep neural network performs method and system |
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine |
US12086097B2 (en) | 2017-07-24 | 2024-09-10 | Tesla, Inc. | Vector computational unit |
US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
CN107633295A (en) * | 2017-09-25 | 2018-01-26 | 北京地平线信息技术有限公司 | For the method and apparatus for the parameter for being adapted to neutral net |
US11461632B2 (en) | 2017-09-25 | 2022-10-04 | Nanjing Horizon Robotics Technology Co., Ltd. | Method and apparatus for adapting parameters of neural network |
CN109697509B (en) * | 2017-10-24 | 2020-10-20 | 上海寒武纪信息科技有限公司 | Processing method and device, and operation method and device |
CN109697509A (en) * | 2017-10-24 | 2019-04-30 | 上海寒武纪信息科技有限公司 | Processing method and processing device, operation method and device |
CN109726805A (en) * | 2017-10-30 | 2019-05-07 | 上海寒武纪信息科技有限公司 | The method for carrying out neural network processor design using black box simulator |
CN109726805B (en) * | 2017-10-30 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Method for designing neural network processor by using black box simulator |
US11521046B2 (en) | 2017-11-08 | 2022-12-06 | Samsung Electronics Co., Ltd. | Time-delayed convolutions for neural network device and method |
US11113104B2 (en) | 2017-11-20 | 2021-09-07 | Shanghai Cambricon Information Technology Co., Ltd | Task parallel processing method, apparatus and system, storage medium and computer device |
US11221877B2 (en) | 2017-11-20 | 2022-01-11 | Shanghai Cambricon Information Technology Co., Ltd | Task parallel processing method, apparatus and system, storage medium and computer device |
US11360811B2 (en) | 2017-11-20 | 2022-06-14 | Shanghai Cambricon Information Technology Co., Ltd | Task parallel processing method, apparatus and system, storage medium and computer device |
US11113103B2 (en) | 2017-11-20 | 2021-09-07 | Shanghai Cambricon Information Technology Co., Ltd | Task parallel processing method, apparatus and system, storage medium and computer device |
CN111582464B (en) * | 2017-12-29 | 2023-09-29 | 中科寒武纪科技股份有限公司 | Neural network processing method, computer system and storage medium |
CN111582464A (en) * | 2017-12-29 | 2020-08-25 | 中科寒武纪科技股份有限公司 | Neural network processing method, computer system, and storage medium |
KR20200100528A (en) * | 2017-12-29 | 2020-08-26 | 캠브리콘 테크놀로지스 코퍼레이션 리미티드 | Neural network processing method, computer system and storage medium |
CN109993288A (en) * | 2017-12-29 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Processing with Neural Network method, computer system and storage medium |
KR102720330B1 (en) | 2017-12-29 | 2024-10-22 | 캠브리콘 테크놀로지스 코퍼레이션 리미티드 | Neural network processing method, computer system and storage medium |
EP3629251A4 (en) * | 2017-12-29 | 2020-11-25 | Cambricon Technologies Corporation Limited | PROCESSING METHODS FOR NEURONAL NETWORK, COMPUTER SYSTEM AND STORAGE MEDIUM |
CN108563808A (en) * | 2018-01-05 | 2018-09-21 | 中国科学技术大学 | The design method of heterogeneous reconfigurable figure computation accelerator system based on FPGA |
CN108563808B (en) * | 2018-01-05 | 2020-12-04 | 中国科学技术大学 | Design Method of Heterogeneous Reconfigurable Graph Computation Accelerator System Based on FPGA |
CN108388943A (en) * | 2018-01-08 | 2018-08-10 | 中国科学院计算技术研究所 | A kind of pond device and method suitable for neural network |
CN108388943B (en) * | 2018-01-08 | 2020-12-29 | 中国科学院计算技术研究所 | A pooling device and method suitable for neural networks |
CN108154229B (en) * | 2018-01-10 | 2022-04-08 | 西安电子科技大学 | Image processing method based on FPGA accelerated convolutional neural network framework |
CN108154229A (en) * | 2018-01-10 | 2018-06-12 | 西安电子科技大学 | Accelerate the image processing method of convolutional neural networks frame based on FPGA |
CN108389183A (en) * | 2018-01-24 | 2018-08-10 | 上海交通大学 | Pulmonary nodule detects neural network accelerator and its control method |
CN110097179B (en) * | 2018-01-29 | 2020-03-10 | 上海寒武纪信息科技有限公司 | Computer device, data processing method, and storage medium |
CN110097179A (en) * | 2018-01-29 | 2019-08-06 | 上海寒武纪信息科技有限公司 | Computer equipment, data processing method and storage medium |
CN110097180A (en) * | 2018-01-29 | 2019-08-06 | 上海寒武纪信息科技有限公司 | Computer equipment, data processing method and storage medium |
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
CN111868754A (en) * | 2018-03-23 | 2020-10-30 | 索尼公司 | Information processing apparatus and information processing method |
CN108921289B (en) * | 2018-06-20 | 2021-10-29 | 郑州云海信息技术有限公司 | A kind of FPGA heterogeneous acceleration method, device and system |
US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
CN108921289A (en) * | 2018-06-20 | 2018-11-30 | 郑州云海信息技术有限公司 | A kind of FPGA isomery accelerated method, apparatus and system |
US12079723B2 (en) | 2018-07-26 | 2024-09-03 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
US11983630B2 (en) | 2018-09-03 | 2024-05-14 | Tesla, Inc. | Neural networks for embedded devices |
CN110955380A (en) * | 2018-09-21 | 2020-04-03 | Cambricon Technologies Corporation Limited | Access data generation method, storage medium, computer device and apparatus |
CN111079914B (en) * | 2018-10-19 | 2021-02-09 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111079907A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, device and related product |
WO2020078446A1 (en) * | 2018-10-19 | 2020-04-23 | Cambricon Technologies Corporation Limited | Computation method and apparatus, and related product |
CN111079909A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111079914A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111079924A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111078293A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, device and related product |
CN111078293B (en) * | 2018-10-19 | 2021-03-16 | Cambricon Technologies Corporation Limited | Operation method, device and related product |
CN111079912A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111079910A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, device and related product |
CN111079911A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111079916A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications |
CN111144561A (en) * | 2018-11-05 | 2020-05-12 | Hangzhou Hikvision Digital Technology Co., Ltd. | Neural network model determining method and device |
CN111144561B (en) * | 2018-11-05 | 2023-05-02 | Hangzhou Hikvision Digital Technology Co., Ltd. | Neural network model determining method and device |
CN112912837B (en) * | 2018-11-08 | 2024-02-13 | Beijing Bitmain Technology Co., Ltd. | Neural network compiling method, device, equipment, storage medium and program product |
CN112912837A (en) * | 2018-11-08 | 2021-06-04 | Beijing Bitmain Technology Co., Ltd. | Neural network compiling method, device, equipment, storage medium and program product |
WO2020093885A1 (en) * | 2018-11-09 | 2020-05-14 | Beijing Lynxi Technology Co., Ltd. | Heterogeneous collaborative computing system |
US11908171B2 (en) | 2018-12-04 | 2024-02-20 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11681899B2 (en) | 2018-12-07 | 2023-06-20 | Samsung Electronics Co., Ltd. | Dividing neural networks |
CN111325311B (en) * | 2018-12-14 | 2024-03-29 | Shenzhen Intellifusion Technologies Co., Ltd. | Neural network model generation method for image recognition and related equipment |
CN111325311A (en) * | 2018-12-14 | 2020-06-23 | Shenzhen Intellifusion Technologies Co., Ltd. | Neural network model generation method, device, electronic device and storage medium |
CN109726797A (en) * | 2018-12-21 | 2019-05-07 | Beijing Zhongke Cambricon Technology Co., Ltd. | Data processing method, device, computer system and storage medium |
CN109685203A (en) * | 2018-12-21 | 2019-04-26 | Beijing Zhongke Cambricon Technology Co., Ltd. | Data processing method, device, computer system and storage medium |
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
US12136030B2 (en) | 2018-12-27 | 2024-11-05 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
CN109754073B (en) * | 2018-12-29 | 2020-03-10 | Cambricon Technologies Corporation Limited | Data processing method and device, electronic equipment and readable storage medium |
CN109754084A (en) * | 2018-12-29 | 2019-05-14 | Beijing Zhongke Cambricon Technology Co., Ltd. | Processing method and device for network structure, and related product |
CN109754073A (en) * | 2018-12-29 | 2019-05-14 | Beijing Zhongke Cambricon Technology Co., Ltd. | Data processing method, device, electronic equipment and readable storage medium |
US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US12014553B2 (en) | 2019-02-01 | 2024-06-18 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
CN109978160A (en) * | 2019-03-25 | 2019-07-05 | Beijing Zhongke Cambricon Technology Co., Ltd. | Configuration device and method for artificial intelligence processor, and related product |
CN109739802A (en) * | 2019-04-01 | 2019-05-10 | Shanghai Suiyuan Intelligent Technology Co., Ltd. | Computing cluster and computing cluster configuration method |
US11734577B2 (en) | 2019-06-05 | 2023-08-22 | Samsung Electronics Co., Ltd | Electronic apparatus and method of performing operations thereof |
CN112052943A (en) * | 2019-06-05 | 2020-12-08 | Samsung Electronics Co., Ltd. | Electronic apparatus and method of performing operations thereof |
WO2020246724A1 (en) * | 2019-06-05 | 2020-12-10 | Samsung Electronics Co., Ltd. | Electronic apparatus and method of performing operations thereof |
CN112132271A (en) * | 2019-06-25 | 2020-12-25 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Neural network accelerator operation method, architecture and related device |
CN115462079A (en) * | 2019-08-13 | 2022-12-09 | Shenzhen Corerain Technologies Co., Ltd. | Neural network data stream acceleration method and device, computer equipment and storage medium |
CN111126572B (en) * | 2019-12-26 | 2023-12-08 | Beijing QIYI Century Science and Technology Co., Ltd. | Model parameter processing method and device, electronic equipment and storage medium |
CN111126572A (en) * | 2019-12-26 | 2020-05-08 | Beijing QIYI Century Science and Technology Co., Ltd. | Model parameter processing method and device, electronic equipment and storage medium |
CN111339027A (en) * | 2020-02-25 | 2020-06-26 | Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences | Automatic design method of reconfigurable artificial intelligence core and heterogeneous multi-core chip |
CN111339027B (en) * | 2020-02-25 | 2023-11-28 | Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences | Reconfigurable artificial intelligence core and automatic design method for heterogeneous multi-core chips |
CN111488969B (en) * | 2020-04-03 | 2024-01-19 | Beijing Jilang Semiconductor Technology Co., Ltd. | Execution optimization method and device based on neural network accelerator |
CN111488969A (en) * | 2020-04-03 | 2020-08-04 | Beijing Silang Technology Co., Ltd. | Execution optimization method and device based on neural network accelerator |
CN111949405A (en) * | 2020-08-13 | 2020-11-17 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Resource scheduling method, hardware accelerator and electronic device |
CN111931926A (en) * | 2020-10-12 | 2020-11-13 | Nanjing Fengxing Technology Co., Ltd. | Hardware acceleration system and control method for convolutional neural networks (CNN) |
Also Published As
Publication number | Publication date |
---|---|
CN107103113B (en) | 2019-01-11 |
WO2018171717A1 (en) | 2018-09-27 |
Similar Documents
Publication | Title |
---|---|
CN107103113A (en) | Towards the Automation Design method, device and the optimization method of neural network processor | |
CN107016175B (en) | Automated design method, apparatus and optimization method applicable to neural network processor | |
EP3884435A1 (en) | System and method for automated precision configuration for deep neural networks | |
CN112070202B (en) | Fusion graph generation method and device and computer readable storage medium | |
CN114035916B (en) | Compilation and scheduling methods of computational graphs and related products | |
CN108932135A (en) | FPGA-based acceleration platform design method for sorting algorithms | |
CN111563582A (en) | A method for implementing and optimizing accelerated convolutional neural network on FPGA | |
JP6503072B2 (en) | Semiconductor system and calculation method | |
CN116126341A (en) | Model compiling method, device, computer equipment and computer readable storage medium | |
Xu et al. | FCLNN: A flexible framework for fast CNN prototyping on FPGA with OpenCL and Caffe | |
CN115345285B (en) | GPU-based temporal graph neural network training method and system, and electronic device | |
CN114968362B (en) | Heterogeneous fusion computing instruction set and method of use | |
CN105700933A (en) | Parallelization and loop optimization method and system for a high-level language of reconfigurable processor | |
CN104239630B (en) | A simulation scheduling system supporting test design | |
CN111667060B (en) | Deep learning algorithm compiling method and device and related products | |
WO2023030507A1 (en) | Compilation optimization method and apparatus, computer device and storage medium | |
CN116402091A (en) | Hybrid engine intelligent computing method and device for artificial intelligence chip | |
Ali et al. | RISC-V based MPSoC design exploration for FPGAs: area, power and performance | |
Odetola et al. | 2L-3W: 2-level 3-way hardware–software co-verification for the mapping of convolutional neural network (CNN) onto FPGA boards | |
CN114127681B (en) | Method and apparatus for autonomous acceleration of data stream AI applications | |
CN105893660B (en) | A CPU design method and computing system for symbolic BDD operations | |
CN115858092A (en) | Timing simulation method, device and system | |
CN111143208B (en) | Verification method based on processor technology for assisting FPGA implementation of AI algorithms | |
CN114691457A (en) | A method, apparatus, storage medium and electronic device for determining hardware performance | |
Kuga et al. | Streaming Accelerator Design for Regular Expression on CPU+FPGA Embedded System | |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |