CN107103113A - Automated design method, device and optimization method for neural network processors - Google Patents
Automated design method, device and optimization method for neural network processors
- Publication number
- CN107103113A CN107103113A CN201710178281.3A CN201710178281A CN107103113A CN 107103113 A CN107103113 A CN 107103113A CN 201710178281 A CN201710178281 A CN 201710178281A CN 107103113 A CN107103113 A CN 107103113A
- Authority
- CN
- China
- Prior art keywords
- neural network
- unit
- data
- hardware
- network processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06F—ELECTRIC DIGITAL DATA PROCESSING › G06F30/00—Computer-aided design [CAD] › G06F30/30—Circuit design
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks
Abstract
The present invention proposes an automated design method, device and optimization method for neural network processors. The method comprises: step 1, obtaining a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and the target running speed; step 2, according to the neural network model description file and the hardware resource constraint parameters, looking up a unit library in a pre-built neural network component library, and generating, from that unit library, hardware description language code for a neural network processor corresponding to the neural network model; and step 3, converting the hardware description language code into the hardware circuit of the neural network processor.
Description
Technical Field
The present invention relates to the technical field of neural network processor architecture, and in particular to an automated design method, device and optimization method for neural network processors.
Background Art
The rapid development of deep learning and neural network technology has opened new avenues for large-scale data processing tasks. Various new neural network models perform remarkably well on complex, abstract problems, and new applications keep emerging in fields such as visual image processing, speech recognition and intelligent robotics.
At present, real-time task analysis with deep neural networks mostly relies on large-scale high-performance processors or general-purpose graphics processors. These devices are costly and power-hungry; when applied to portable smart devices, they suffer from a series of problems such as large circuit scale, high energy consumption and expensive products. Therefore, for energy-efficient real-time processing in application areas such as embedded devices and small low-cost data centers, accelerating neural network model computation with a dedicated neural network processor rather than software is a more effective solution. However, the topology and parameter design of a neural network model change with the application scenario, and neural network models themselves evolve rapidly, so providing a general-purpose, efficient neural network processor that serves all application scenarios and covers all neural network models is very difficult. This causes great inconvenience for high-level application developers who must design hardware acceleration solutions for different application requirements.
Existing neural network hardware acceleration technologies fall into two categories: Application Specific Integrated Circuit (ASIC) chips and Field Programmable Gate Arrays (FPGA). Under the same process conditions, an ASIC chip runs fast with low power consumption, but its design flow is complex, its tape-out cycle long and its development cost high, so it cannot keep pace with rapidly updated neural network models; an FPGA offers flexible circuit configuration and a short development cycle, but runs relatively slowly with comparatively large hardware overhead and power consumption. Whichever acceleration technology is adopted, neural network model and algorithm developers must master hardware development — processor architecture design, hardware code writing, simulation verification, placement and routing, and so on — in addition to understanding network topologies and dataflow patterns. These skills present a high barrier for high-level application developers who focus on neural network models and structure design but lack hardware design expertise. Therefore, to let high-level developers efficiently build neural network applications, an automated design method and tool for neural network processors that serves multiple neural network models is urgently needed.
Summary of the Invention
In view of the deficiencies of the prior art, the present invention proposes an automated design method, device and optimization method for neural network processors.
The present invention proposes an automated design method for neural network processors, comprising:
Step 1: obtaining a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and the target running speed;
Step 2: according to the neural network model description file and the hardware resource constraint parameters, looking up a unit library in a pre-built neural network component library, and generating, from that unit library, hardware description language code for a neural network processor corresponding to the neural network model;
Step 3: converting the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor comprises a storage structure, a control structure and a computation structure.
The neural network model description file comprises three parts: basic attributes, parameter description and connection information. The basic attributes include the layer name and layer type; the parameter description includes the number of output layers, the convolution kernel size and the stride; the connection information includes the connection name, connection direction and connection type.
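As an illustrative sketch only (the patent does not fix a concrete file schema), a single layer entry of such a description file might look as follows; all field names here are assumptions:

```python
# Hypothetical layer entry of a neural network model description file.
# Field names are illustrative; the patent does not specify a schema.
conv1_layer = {
    # Basic attributes: layer name and layer type
    "name": "conv1",
    "type": "convolution",
    # Parameter description: output layer count, kernel size, stride
    "params": {
        "output_maps": 32,
        "kernel_size": 3,
        "stride": 1,
    },
    # Connection information: connection name, direction and type
    "connections": [
        {"name": "data_to_conv1", "direction": "in", "type": "feature_map"},
        {"name": "conv1_to_pool1", "direction": "out", "type": "feature_map"},
    ],
}
```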
The neural network reusable unit library comprises two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit and a control unit.
The neural network processor includes a main address generation unit, a data address generation unit and a weight address generation unit.
The method further comprises: determining the data path according to the user-specified neural network model and the hardware resource constraint parameters, and determining the data resource sharing scheme according to the characteristics of the intermediate layers of the neural network;
generating the memory address access stream according to the hardware configuration and the network characteristics, the address access stream being described by means of a finite state machine; and
generating hardware description language code, which is then converted into the hardware circuit of the neural network processor.
The method further comprises generating a data storage map and a control instruction stream according to the neural network model, the hardware resource constraint parameters and the hardware description language code.
The present invention further provides an automated design device for neural network processors, comprising:
a data acquisition module, configured to obtain a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and the target running speed;
a hardware description language code generation module, configured to look up a unit library in a pre-built neural network component library according to the neural network model description file and the hardware resource constraint parameters, and to generate, from that unit library, hardware description language code for a neural network processor corresponding to the neural network model; and
a hardware circuit generation module, configured to convert the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor comprises a storage structure, a control structure and a computation structure.
The neural network model description file comprises three parts: basic attributes, parameter description and connection information. The basic attributes include the layer name and layer type; the parameter description includes the number of output layers, the convolution kernel size and the stride; the connection information includes the connection name, connection direction and connection type.
The neural network reusable unit library comprises two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit and a control unit.
The neural network processor includes a main address generation unit, a data address generation unit and a weight address generation unit.
The device further determines the data path according to the user-specified neural network model and the hardware resource constraint parameters, and determines the data resource sharing scheme according to the characteristics of the intermediate layers of the neural network;
generates the memory address access stream according to the hardware configuration and the network characteristics, the address access stream being described by means of a finite state machine; and
generates hardware description language code, which is then converted into the hardware circuit of the neural network processor.
The device further generates a data storage map and a control instruction stream according to the neural network model, the hardware resource constraint parameters and the hardware description language code.
The present invention also proposes an optimization method based on the automated design method for neural network processors described above, comprising:
Step 1: define the convolution kernel size as k*k, the stride as s, the memory width as d, and the number of data maps as t. If k^2 = d^2, partition the data into blocks of size k*k, so that the data width matches the memory width and the data is stored contiguously in memory;
Step 2: if k^2 ≠ d^2 and the stride s is the greatest common divisor of k and the memory width d, partition the data into blocks of size s*s, so that within one data map the data is stored contiguously in memory;
Step 3: if neither of the above holds, compute the greatest common divisor f of the stride s, the kernel size k and the memory width d, partition the data into blocks of size f*f, and store the t data maps interleaved.
As can be seen from the above solutions, the advantages of the present invention are:
The present invention can map a neural network model to a hardware circuit, automatically optimize the circuit structure and data storage scheme according to hardware resource constraints and network characteristics, and simultaneously generate the corresponding control instruction stream. It thus realizes automated hardware-software co-design of neural network hardware accelerators, shortening the design cycle of a neural network processor while improving its computational energy efficiency.
Brief Description of the Drawings
Fig. 1 is a workflow diagram of the FPGA automatic implementation tool for neural network processors provided by the present invention;
Fig. 2 is a schematic diagram of a neural network processor system that can be automatically generated by the present invention;
Fig. 3 is a schematic diagram of the neural network reusable unit library used by the present invention;
Fig. 4 is a schematic diagram of the address generation circuit interface used by the present invention.
Detailed Description
To make the objectives, technical solutions, design methods and advantages of the present invention clearer, the present invention is described in further detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The present invention aims to provide an automated design method, device and optimization method for neural network processors. The device comprises a hardware generator and a compiler. The hardware generator automatically generates hardware description language code for a neural network processor according to the neural network type and the hardware resource constraints; designers then use existing hardware circuit design methods to produce the processor hardware circuit from the hardware description language. The compiler generates the control and data scheduling instruction stream according to the circuit structure of the neural network processor.
Fig. 1 is a schematic diagram of the automated neural network processor generation technique provided by the present invention. The specific steps are:
Step 1: the device reads the neural network model description file, which contains the network topology and the definition of each computation layer;
Step 2: the device reads in the hardware resource constraint parameters, which include the hardware resource size and the target running speed; the device can generate a corresponding circuit structure according to these constraints;
Step 3: according to the neural network model description script and the hardware resource constraints, the device indexes a suitable unit library from the pre-built neural network component library, and the hardware circuit generator contained in the tool uses this unit library to generate hardware description language code for a neural network processor corresponding to the neural network model;
Step 4: the compiler contained in the device generates the data storage map and the control instruction stream according to the neural network model, the logic resource constraints and the generated hardware description language code;
Step 5: the hardware description language is converted into a hardware circuit by existing hardware design methods.
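The following self-contained Python sketch illustrates how these five steps might fit together; every function, field and output format in it is an assumption made for illustration, not the patent's actual tool:

```python
# Minimal sketch of the five-step generation flow. All names and the
# textual "model description" format are hypothetical placeholders.

def parse_model_description(text):
    """Step 1: parse layer entries of the form 'name,type,kernel,stride'."""
    layers = []
    for line in text.strip().splitlines():
        name, ltype, kernel, stride = line.split(",")
        layers.append({"name": name, "type": ltype,
                       "kernel": int(kernel), "stride": int(stride)})
    return layers

def select_units(layers):
    """Step 3 (first half): pick reusable units the model needs."""
    units = {"control_unit", "address_generation_unit"}
    for layer in layers:
        if layer["type"] == "convolution":
            units.update({"neuron_unit", "accumulator_unit"})
        elif layer["type"] == "pooling":
            units.add("pooling_unit")
    return units

def generate(model_text, constraints):
    layers = parse_model_description(model_text)            # step 1
    # step 2: constraints = {"resource_size": ..., "target_speed": ...}
    units = select_units(layers)                            # step 3
    hdl = f"// HDL instantiating: {', '.join(sorted(units))}"
    instructions = [f"RUN {l['name']}" for l in layers]     # step 4 (compiler)
    return hdl, instructions            # step 5: hand hdl to a synthesis flow

hdl, instrs = generate("conv1,convolution,3,1\npool1,pooling,2,2",
                       {"resource_size": 4096, "target_speed": 100})
```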
The neural network processor that the present invention can automatically generate is based on a storage-control-computation structure:
the storage structure stores the data participating in the computation, the neural network weights and the processor operation instructions;
the control structure comprises a decoding circuit and a control logic circuit, which parse the operation instructions and generate control signals used to control the scheduling and storage of on-chip data and the neural network computation process;
the computation structure comprises computation units that carry out the neural network computation operations in the processor.
Fig. 2 is a schematic diagram of the neural network processor system 101 that can be automatically generated by the present invention. The architecture of the neural network processor system 101 consists of six parts: an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106 and a computation unit 107.
The input data storage unit 102 stores the data participating in the computation, which includes the original feature map data and the data participating in intermediate-layer computation; the output data storage unit 104 stores the computed neuron response values; the instruction storage unit 106 stores the instruction information participating in the computation, and the instructions are parsed into a control flow to schedule the neural network computation; the weight storage unit 105 stores the trained neural network weights.
The control unit 103 is connected to the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106 and the computation unit 107. The control unit 103 fetches the instructions held in the instruction storage unit 106, parses them, and controls the computation unit to perform the neural network computation according to the control signals obtained from the parsed instructions.
The computation unit 107 performs the corresponding neural network computation according to the control signals produced by the control unit 103. The computation unit 107 is associated with one or more storage units; it can obtain data for computation from the data storage components of its associated input data storage unit 102 and can write data to its associated output data storage unit 104. The computation unit 107 carries out most of the operations in the neural network algorithm, i.e. vector multiply-accumulate operations and the like.
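As a minimal illustration of the core operation the computation unit performs, the following Python sketch computes a single neuron response as a vector multiply-accumulate followed by an activation (the ReLU default is an assumption, not specified by the patent):

```python
def neuron_response(inputs, weights, bias, activation=lambda v: max(v, 0.0)):
    """One neuron's response: vector multiply-accumulate plus activation."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w                      # the MAC operation
    return activation(acc)

# e.g. neuron_response([1.0, 2.0], [0.5, -0.25], 0.1) -> 0.1
```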
The present invention describes the characteristics of a neural network model through the neural network description file format provided above. The description file comprises three parts: basic attributes, parameter description and connection information, where the basic attributes include the layer name and layer type, the parameter description includes the number of output layers, the convolution kernel size and the stride, and the connection information includes the connection name, connection direction and connection type.
To accommodate hardware implementations of various neural network models, the neural network reusable unit library provided by the present invention is shown in Fig. 3. The unit library comprises two parts: hardware description files and configuration scripts. The reusable unit library provided by the present invention includes, but is not limited to: a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit and a control unit.
When composing a neural network processor system from the above reusable unit library, the present invention reads the neural network model description file and the hardware resource constraints and invokes the unit library in an appropriately optimized way.
While the neural network processor is working, it must automatically obtain the address streams for on-chip and off-chip memory data. In the present invention, the memory address stream is determined and generated by the compiler, and the memory access patterns determined by the address stream are passed to the hardware generator through text interaction. The memory access patterns include the main access pattern, the data access pattern and the weight access pattern, among others.
The hardware generator generates the address generation unit (AGU) according to the memory access patterns.
A neural network processor circuit designed with the automated design tool provided by the present invention contains three types of address generation units: a main address generation unit, a data address generation unit and a weight address generation unit. The main address generation unit is responsible for data exchange between on-chip and off-chip memory; the data address generation unit is responsible for two kinds of data exchange, namely reading data from on-chip memory into the computation unit and storing the computation unit's intermediate and final results back into the storage unit; the weight address generation unit is responsible for reading weight data from on-chip memory into the computation unit.
In the present invention, the hardware circuit generator and the compiler work together to design the address generation circuit. The specific design steps are:
Step 1: the device determines the data path according to the neural network model and the hardware constraints specified by the designer, and determines the data resource sharing scheme according to the characteristics of the intermediate layers of the neural network;
Step 2: the compiler generates the memory address access stream according to the hardware configuration and the network characteristics, describing the address access stream by means of a finite state machine;
Step 3: the finite state machine is mapped by the hardware generator into hardware description language for the address generation circuit, which is then mapped into a hardware circuit through hardware circuit design methods.
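A minimal Python sketch of such an address-stream state machine is shown below; the row-major walk and the parameter names are assumptions for illustration, since the patent does not disclose a concrete FSM:

```python
def address_stream(base, height, width, stride):
    """Yield read addresses for a row-major feature map, one per FSM step.
    The states correspond to the (row, col) indices; the hardware generator
    would emit an HDL state machine with the same transitions."""
    for row in range(0, height, stride):
        for col in range(0, width, stride):
            yield base + row * width + col

# list(address_stream(base=0x100, height=4, width=4, stride=2))
# -> [256, 258, 264, 266]
```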
Fig. 4 is a schematic diagram of the general structure of the address generation circuit provided by the present invention. The address generation circuit has a generic signal interface comprising the following signals:
start address signal: the first address of the data;
data block size signal: the amount of data fetched per access;
memory flag signal: identifies the memory in which the data is stored;
work mode signal: distinguishes the large-kernel fetch mode, small-kernel fetch mode, pooling mode, full convolution mode, etc.;
kernel size signal: defines the convolution kernel size;
length signal: defines the output image size;
input layer count signal: marks the number of input layers;
output layer count signal: marks the number of output layers;
reset signal: when set to 1, initializes the address generation circuit;
write enable signal: directs the addressed memory to perform a write operation;
read enable signal: directs the addressed memory to perform a read operation;
address signal: gives the memory address being accessed;
end signal: indicates that the access has finished.
These parameters ensure that the AGU supports multiple work modes and can generate correct read and write address streams in the different work modes and throughout neural network propagation.
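The following Python dataclass mirrors the signal list above as a plain data structure; the field names are illustrative and are not RTL port names taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class AguInterface:
    """Mirror of the generic AGU signal interface listed above."""
    start_address: int      # first address of the data
    block_size: int         # amount of data fetched per access
    memory_id: int          # which memory holds the data
    work_mode: str          # 'large_kernel' | 'small_kernel' | 'pooling' | 'full_conv'
    kernel_size: int        # convolution kernel size
    length: int             # output image size
    input_layers: int       # number of input layers
    output_layers: int      # number of output layers
    reset: bool             # True initializes the address generation circuit
    write_enable: bool      # write access to the addressed memory
    read_enable: bool       # read access to the addressed memory
    address: int            # current memory address
    done: bool              # access-finished flag
```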
For different target networks, the tool selects the necessary parameters from this template to build the address generator and provides the on-chip and off-chip memory access patterns.
The neural network processor provided by the present invention builds its architecture in a data-driven manner, so the address generation circuit not only provides access addresses but also drives the execution of the different neural layers and of the data blocks within each layer.
Owing to resource constraints, a neural network model cannot be fully unrolled in the form of its model description when mapped to a hardware circuit. The automated design tool proposed by the present invention therefore optimizes the data storage and access mechanism through hardware-software cooperation, in two parts: first, the compiler analyzes the computational throughput and on-chip memory size of the neural network processor and partitions the neural network feature data and weight data into appropriately sized blocks for grouped storage and access; second, the data within each block is further partitioned according to the computation unit scale, the memory width and the data bit width.
Based on the above optimization mechanism, the present invention proposes an optimization method for data storage and access. The specific implementation steps are:
Step 1: define the convolution kernel size as k*k, the stride as s, the memory width as d, and the number of data maps as t. If k^2 = d^2, partition the data into blocks of size k*k, so that the data width matches the memory width and the data is stored contiguously in memory;
Step 2: if k^2 ≠ d^2 and s is the greatest common divisor of k and d, partition the data into blocks of size s*s, so that within one data map the data can be stored contiguously in memory;
Step 3: if neither of the above holds, compute the greatest common divisor f of s, k and d, partition the data into blocks of size f*f, and store the t data maps interleaved.
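The three-case rule above can be summarized in a short Python sketch; the function name and interface are assumptions made for illustration:

```python
from math import gcd

def choose_block_size(k, s, d):
    """Pick the tile edge length per the three cases above:
    k = kernel size, s = stride, d = memory width."""
    if k * k == d * d:                 # case 1: kernel matches memory width
        return k                       # k*k blocks, stored contiguously
    if s == gcd(k, d):                 # case 2: stride is gcd of k and d
        return s                       # s*s blocks, contiguous within one map
    return gcd(gcd(s, k), d)           # case 3: f*f blocks, t maps interleaved
```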
The computation data of a neural network comprises the input feature data and the trained weight data. A good data storage layout reduces the internal data bandwidth of the processor and improves storage space utilization. The automated design tool provided by the present invention improves the processor's computational efficiency by increasing the locality of its data storage.
In summary, the present invention provides an automated design tool for neural network processors. The tool can map a neural network model to hardware code describing a neural network processor, optimize the processor architecture according to hardware resource constraints, and automatically generate control flow instructions. It realizes the automated design of neural network processors, shortens their design cycle, and suits the application characteristics of neural network technology: rapidly updated network models, demanding computation speed and high energy efficiency requirements.
It should be understood that although this specification is described in terms of individual embodiments, each embodiment does not necessarily contain only one independent technical solution; this manner of presentation is adopted merely for clarity. Those skilled in the art should take the specification as a whole, and the technical solutions in the various embodiments may also be suitably combined to form other implementations understandable to those skilled in the art.
The present invention further proposes an automated design device for neural network processors, comprising:
a data acquisition module, configured to obtain a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and the target running speed;
a hardware description language code generation module, configured to look up a unit library in a pre-built neural network component library according to the neural network model description file and the hardware resource constraint parameters, and to generate, from that unit library, hardware description language code for a neural network processor corresponding to the neural network model; and
a hardware circuit generation module, configured to convert the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor comprises a storage structure, a control structure and a computation structure.
The neural network model description file comprises three parts: basic attributes, parameter description and connection information. The basic attributes include the layer name and layer type; the parameter description includes the number of output layers, the convolution kernel size and the stride; the connection information includes the connection name, connection direction and connection type.
The neural network processor includes a main address generation unit, a data address generation unit and a weight address generation unit.
The device further determines the data path according to the user-specified neural network model and the hardware resource constraint parameters, and determines the data resource sharing scheme according to the characteristics of the intermediate layers of the neural network;
and generates the memory address access stream according to the hardware configuration and the network characteristics, the address access stream being described by means of a finite state machine.
The neural network reusable unit library comprises two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit and a control unit.
The finite state machine is mapped to the address generation circuit, and hardware description language code is generated, which is then converted into the hardware circuit of the neural network processor.
The device further generates a data storage map and a control instruction stream according to the neural network model, the hardware resource constraint parameters and the hardware description language code.
The above descriptions are merely illustrative specific embodiments of the present invention and are not intended to limit its scope. Any equivalent changes, modifications and combinations made by those skilled in the art without departing from the concept and principles of the present invention shall fall within the protection scope of the present invention.
Claims (17)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710178281.3A CN107103113B (en) | 2017-03-23 | 2017-03-23 | The Automation Design method, apparatus and optimization method towards neural network processor |
PCT/CN2018/080207 WO2018171717A1 (en) | 2017-03-23 | 2018-03-23 | Automated design method and system for neural network processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710178281.3A CN107103113B (en) | 2017-03-23 | 2017-03-23 | The Automation Design method, apparatus and optimization method towards neural network processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107103113A true CN107103113A (en) | 2017-08-29 |
CN107103113B CN107103113B (en) | 2019-01-11 |
Family
ID=59676152
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710178281.3A Active CN107103113B (en) | 2017-03-23 | 2017-03-23 | The Automation Design method, apparatus and optimization method towards neural network processor |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107103113B (en) |
WO (1) | WO2018171717A1 (en) |
Cited By (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341761A (en) * | 2017-07-12 | 2017-11-10 | 成都品果科技有限公司 | A kind of calculating of deep neural network performs method and system |
CN107633295A (en) * | 2017-09-25 | 2018-01-26 | 北京地平线信息技术有限公司 | For the method and apparatus for the parameter for being adapted to neutral net |
CN108154229A (en) * | 2018-01-10 | 2018-06-12 | 西安电子科技大学 | Accelerate the image processing method of convolutional neural networks frame based on FPGA |
CN108388943A (en) * | 2018-01-08 | 2018-08-10 | 中国科学院计算技术研究所 | A kind of pond device and method suitable for neural network |
CN108389183A (en) * | 2018-01-24 | 2018-08-10 | 上海交通大学 | Pulmonary nodule detects neural network accelerator and its control method |
CN108563808A (en) * | 2018-01-05 | 2018-09-21 | 中国科学技术大学 | The design method of heterogeneous reconfigurable figure computation accelerator system based on FPGA |
WO2018171717A1 (en) * | 2017-03-23 | 2018-09-27 | 中国科学院计算技术研究所 | Automated design method and system for neural network processor |
CN108921289A (en) * | 2018-06-20 | 2018-11-30 | 郑州云海信息技术有限公司 | A kind of FPGA isomery accelerated method, apparatus and system |
CN109685203A (en) * | 2018-12-21 | 2019-04-26 | 北京中科寒武纪科技有限公司 | Data processing method, device, computer system and storage medium |
CN109697509A (en) * | 2017-10-24 | 2019-04-30 | 上海寒武纪信息科技有限公司 | Processing method and processing device, operation method and device |
CN109726805A (en) * | 2017-10-30 | 2019-05-07 | 上海寒武纪信息科技有限公司 | The method for carrying out neural network processor design using black box simulator |
CN109726797A (en) * | 2018-12-21 | 2019-05-07 | 北京中科寒武纪科技有限公司 | Data processing method, device, computer system and storage medium |
CN109739802A (en) * | 2019-04-01 | 2019-05-10 | 上海燧原智能科技有限公司 | Computing cluster and computing cluster configuration method |
CN109754084A (en) * | 2018-12-29 | 2019-05-14 | 北京中科寒武纪科技有限公司 | Processing method, device and the Related product of network structure |
CN109754073A (en) * | 2018-12-29 | 2019-05-14 | 北京中科寒武纪科技有限公司 | Data processing method, device, electronic equipment and readable storage medium storing program for executing |
CN109978160A (en) * | 2019-03-25 | 2019-07-05 | 北京中科寒武纪科技有限公司 | Configuration device, method and the Related product of artificial intelligence process device |
CN109993288A (en) * | 2017-12-29 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Processing with Neural Network method, computer system and storage medium |
CN110097179A (en) * | 2018-01-29 | 2019-08-06 | 上海寒武纪信息科技有限公司 | Computer equipment, data processing method and storage medium |
CN110097180A (en) * | 2018-01-29 | 2019-08-06 | 上海寒武纪信息科技有限公司 | Computer equipment, data processing method and storage medium |
CN110955380A (en) * | 2018-09-21 | 2020-04-03 | 中科寒武纪科技股份有限公司 | Access data generation method, storage medium, computer device and apparatus |
WO2020078446A1 (en) * | 2018-10-19 | 2020-04-23 | 中科寒武纪科技股份有限公司 | Computation method and apparatus, and related product |
CN111079909A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111079914A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111079924A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111078293A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN111079912A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111079910A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN111079907A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN111079911A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111079916A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111126572A (en) * | 2019-12-26 | 2020-05-08 | 北京奇艺世纪科技有限公司 | Model parameter processing method and device, electronic equipment and storage medium |
CN111144561A (en) * | 2018-11-05 | 2020-05-12 | 杭州海康威视数字技术股份有限公司 | Neural network model determining method and device |
WO2020093885A1 (en) * | 2018-11-09 | 2020-05-14 | 北京灵汐科技有限公司 | Heterogeneous collaborative computing system |
CN111325311A (en) * | 2018-12-14 | 2020-06-23 | 深圳云天励飞技术有限公司 | Neural network model generation method, device, electronic device and storage medium |
CN111339027A (en) * | 2020-02-25 | 2020-06-26 | 中国科学院苏州纳米技术与纳米仿生研究所 | Automatic design method of reconfigurable artificial intelligence core and heterogeneous multi-core chip |
CN111488969A (en) * | 2020-04-03 | 2020-08-04 | 北京思朗科技有限责任公司 | Execution optimization method and device based on neural network accelerator |
KR20200100528A (en) * | 2017-12-29 | 2020-08-26 | 캠브리콘 테크놀로지스 코퍼레이션 리미티드 | Neural network processing method, computer system and storage medium |
CN111868754A (en) * | 2018-03-23 | 2020-10-30 | 索尼公司 | Information processing apparatus and information processing method |
CN111931926A (en) * | 2020-10-12 | 2020-11-13 | 南京风兴科技有限公司 | Hardware acceleration system and control method for convolutional neural network CNN |
CN111949405A (en) * | 2020-08-13 | 2020-11-17 | Oppo广东移动通信有限公司 | Resource scheduling method, hardware accelerator and electronic device |
CN112052943A (en) * | 2019-06-05 | 2020-12-08 | 三星电子株式会社 | Electronic device and method for performing operation of the same |
CN112132271A (en) * | 2019-06-25 | 2020-12-25 | Oppo广东移动通信有限公司 | Neural network accelerator operation method, architecture and related device |
CN112912837A (en) * | 2018-11-08 | 2021-06-04 | 北京比特大陆科技有限公司 | Neural network compiling method, device, equipment, storage medium and program product |
US11113104B2 (en) | 2017-11-20 | 2021-09-07 | Shanghai Cambricon Information Technology Co., Ltd | Task parallel processing method, apparatus and system, storage medium and computer device |
US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine |
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems |
US11521046B2 (en) | 2017-11-08 | 2022-12-06 | Samsung Electronics Co., Ltd. | Time-delayed convolutions for neural network device and method |
CN115462079A (en) * | 2019-08-13 | 2022-12-09 | 深圳鲲云信息科技有限公司 | Neural network data stream acceleration method and device, computer equipment and storage medium |
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications |
- US11681899B2 | 2018-12-07 | 2023-06-20 | Samsung Electronics Co., Ltd. | Dividing neural networks |
US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US12014553B2 (en) | 2019-02-01 | 2024-06-18 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
US11361457B2 (en) | 2018-07-20 | 2022-06-14 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
JP7539872B2 (en) | 2018-10-11 | 2024-08-26 | テスラ,インコーポレイテッド | SYSTEM AND METHOD FOR TRAINING MACHINE MODELS WITH AUGMENTED DATA - Patent application |
US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
US10956755B2 (en) | 2019-02-19 | 2021-03-23 | Tesla, Inc. | Estimating object properties using visual image data |
US12112112B2 (en) | 2020-11-12 | 2024-10-08 | Samsung Electronics Co., Ltd. | Method for co-design of hardware and neural network architectures using coarse-to-fine search, two-phased block distillation and neural hardware predictor |
JP2023032348A (en) * | 2021-08-26 | 2023-03-09 | 国立大学法人 東京大学 | Information processing device, and program |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106022468A (en) * | 2016-05-17 | 2016-10-12 | 成都启英泰伦科技有限公司 | Artificial neural network processor integrated circuit and design method therefor |
WO2016179533A1 (en) * | 2015-05-06 | 2016-11-10 | Indiana University Research And Technology Corporation | Sensor signal processing using an analog neural network |
- CN106447034A (en) * | 2016-10-27 | 2017-02-22 | 中国科学院计算技术研究所 | Neural network processor based on data compression, design method and chip |
CN106529670A (en) * | 2016-10-27 | 2017-03-22 | 中国科学院计算技术研究所 | Neural network processor based on weight compression, design method, and chip |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107103113B (en) * | 2017-03-23 | 2019-01-11 | 中国科学院计算技术研究所 | The Automation Design method, apparatus and optimization method towards neural network processor |
- 2017-03-23 CN CN201710178281.3A patent/CN107103113B/en active Active
- 2018-03-23 WO PCT/CN2018/080207 patent/WO2018171717A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016179533A1 (en) * | 2015-05-06 | 2016-11-10 | Indiana University Research And Technology Corporation | Sensor signal processing using an analog neural network |
CN106022468A (en) * | 2016-05-17 | 2016-10-12 | 成都启英泰伦科技有限公司 | Artificial neural network processor integrated circuit and design method therefor |
- CN106447034A (en) * | 2016-10-27 | 2017-02-22 | 中国科学院计算技术研究所 | Neural network processor based on data compression, design method and chip |
CN106529670A (en) * | 2016-10-27 | 2017-03-22 | 中国科学院计算技术研究所 | Neural network processor based on weight compression, design method, and chip |
Non-Patent Citations (2)
Title |
---|
- YING WANG et al.: "DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family", 《DESIGN AUTOMATION CONFERENCE》 *
- YE Liya et al.: "Research on the Architecture of Neural Network-Based Embedded Systems", 《Journal of Hangzhou Dianzi University》 *
Cited By (93)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018171717A1 (en) * | 2017-03-23 | 2018-09-27 | 中国科学院计算技术研究所 | Automated design method and system for neural network processor |
US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems |
US12020476B2 (en) | 2017-03-23 | 2024-06-25 | Tesla, Inc. | Data synthesis for autonomous control systems |
CN107341761A (en) * | 2017-07-12 | 2017-11-10 | 成都品果科技有限公司 | A kind of calculating of deep neural network performs method and system |
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine |
US12086097B2 (en) | 2017-07-24 | 2024-09-10 | Tesla, Inc. | Vector computational unit |
US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
CN107633295A (en) * | 2017-09-25 | 2018-01-26 | 北京地平线信息技术有限公司 | For the method and apparatus for the parameter for being adapted to neutral net |
US11461632B2 (en) | 2017-09-25 | 2022-10-04 | Nanjing Horizon Robotics Technology Co., Ltd. | Method and apparatus for adapting parameters of neural network |
CN109697509B (en) * | 2017-10-24 | 2020-10-20 | 上海寒武纪信息科技有限公司 | Processing method and device, and operation method and device |
CN109697509A (en) * | 2017-10-24 | 2019-04-30 | 上海寒武纪信息科技有限公司 | Processing method and processing device, operation method and device |
CN109726805A (en) * | 2017-10-30 | 2019-05-07 | 上海寒武纪信息科技有限公司 | The method for carrying out neural network processor design using black box simulator |
CN109726805B (en) * | 2017-10-30 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Method for designing neural network processor by using black box simulator |
US11521046B2 (en) | 2017-11-08 | 2022-12-06 | Samsung Electronics Co., Ltd. | Time-delayed convolutions for neural network device and method |
US11113104B2 (en) | 2017-11-20 | 2021-09-07 | Shanghai Cambricon Information Technology Co., Ltd | Task parallel processing method, apparatus and system, storage medium and computer device |
US11221877B2 (en) | 2017-11-20 | 2022-01-11 | Shanghai Cambricon Information Technology Co., Ltd | Task parallel processing method, apparatus and system, storage medium and computer device |
US11360811B2 (en) | 2017-11-20 | 2022-06-14 | Shanghai Cambricon Information Technology Co., Ltd | Task parallel processing method, apparatus and system, storage medium and computer device |
US11113103B2 (en) | 2017-11-20 | 2021-09-07 | Shanghai Cambricon Information Technology Co., Ltd | Task parallel processing method, apparatus and system, storage medium and computer device |
CN111582464B (en) * | 2017-12-29 | 2023-09-29 | 中科寒武纪科技股份有限公司 | Neural network processing method, computer system and storage medium |
CN111582464A (en) * | 2017-12-29 | 2020-08-25 | 中科寒武纪科技股份有限公司 | Neural network processing method, computer system, and storage medium |
KR20200100528A (en) * | 2017-12-29 | 2020-08-26 | 캠브리콘 테크놀로지스 코퍼레이션 리미티드 | Neural network processing method, computer system and storage medium |
CN109993288A (en) * | 2017-12-29 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Processing with Neural Network method, computer system and storage medium |
KR102720330B1 (en) | 2017-12-29 | 2024-10-22 | 캠브리콘 테크놀로지스 코퍼레이션 리미티드 | Neural network processing method, computer system and storage medium |
EP3629251A4 (en) * | 2017-12-29 | 2020-11-25 | Cambricon Technologies Corporation Limited | PROCESSING METHODS FOR NEURONAL NETWORK, COMPUTER SYSTEM AND STORAGE MEDIUM |
CN108563808A (en) * | 2018-01-05 | 2018-09-21 | 中国科学技术大学 | The design method of heterogeneous reconfigurable figure computation accelerator system based on FPGA |
CN108563808B (en) * | 2018-01-05 | 2020-12-04 | 中国科学技术大学 | Design Method of Heterogeneous Reconfigurable Graph Computation Accelerator System Based on FPGA |
CN108388943A (en) * | 2018-01-08 | 2018-08-10 | 中国科学院计算技术研究所 | A kind of pond device and method suitable for neural network |
CN108388943B (en) * | 2018-01-08 | 2020-12-29 | 中国科学院计算技术研究所 | A pooling device and method suitable for neural networks |
CN108154229B (en) * | 2018-01-10 | 2022-04-08 | 西安电子科技大学 | Image processing method based on FPGA accelerated convolutional neural network framework |
CN108154229A (en) * | 2018-01-10 | 2018-06-12 | 西安电子科技大学 | Accelerate the image processing method of convolutional neural networks frame based on FPGA |
CN108389183A (en) * | 2018-01-24 | 2018-08-10 | 上海交通大学 | Pulmonary nodule detects neural network accelerator and its control method |
CN110097179B (en) * | 2018-01-29 | 2020-03-10 | 上海寒武纪信息科技有限公司 | Computer device, data processing method, and storage medium |
CN110097179A (en) * | 2018-01-29 | 2019-08-06 | 上海寒武纪信息科技有限公司 | Computer equipment, data processing method and storage medium |
CN110097180A (en) * | 2018-01-29 | 2019-08-06 | 上海寒武纪信息科技有限公司 | Computer equipment, data processing method and storage medium |
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
CN111868754A (en) * | 2018-03-23 | 2020-10-30 | 索尼公司 | Information processing apparatus and information processing method |
CN108921289B (en) * | 2018-06-20 | 2021-10-29 | 郑州云海信息技术有限公司 | A kind of FPGA heterogeneous acceleration method, device and system |
US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
CN108921289A (en) * | 2018-06-20 | 2018-11-30 | 郑州云海信息技术有限公司 | A kind of FPGA isomery accelerated method, apparatus and system |
US12079723B2 (en) | 2018-07-26 | 2024-09-03 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
US11983630B2 (en) | 2018-09-03 | 2024-05-14 | Tesla, Inc. | Neural networks for embedded devices |
CN110955380A (en) * | 2018-09-21 | 2020-04-03 | Cambricon Technologies Corporation Limited | Access data generation method, storage medium, computer device and apparatus |
CN111079914B (en) * | 2018-10-19 | 2021-02-09 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111079907A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, device and related product |
WO2020078446A1 (en) * | 2018-10-19 | 2020-04-23 | Cambricon Technologies Corporation Limited | Computation method and apparatus, and related product |
CN111079909A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111079914A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111079924A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111078293A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, device and related product |
CN111078293B (en) * | 2018-10-19 | 2021-03-16 | Cambricon Technologies Corporation Limited | Operation method, device and related product |
CN111079912A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111079910A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, device and related product |
CN111079911A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
CN111079916A (en) * | 2018-10-19 | 2020-04-28 | Cambricon Technologies Corporation Limited | Operation method, system and related product |
US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications |
CN111144561A (en) * | 2018-11-05 | 2020-05-12 | Hangzhou Hikvision Digital Technology Co., Ltd. | Neural network model determining method and device |
CN111144561B (en) * | 2018-11-05 | 2023-05-02 | Hangzhou Hikvision Digital Technology Co., Ltd. | Neural network model determining method and device |
CN112912837B (en) * | 2018-11-08 | 2024-02-13 | Beijing Bitmain Technology Co., Ltd. | Neural network compiling method, device, equipment, storage medium and program product |
CN112912837A (en) * | 2018-11-08 | 2021-06-04 | Beijing Bitmain Technology Co., Ltd. | Neural network compiling method, device, equipment, storage medium and program product |
WO2020093885A1 (en) * | 2018-11-09 | 2020-05-14 | Beijing Lynxi Technology Co., Ltd. | Heterogeneous collaborative computing system |
US11908171B2 (en) | 2018-12-04 | 2024-02-20 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11681899B2 (en) | 2018-12-07 | 2023-06-20 | Samsung Electronics Co., Ltd. | Dividing neural networks |
CN111325311B (en) * | 2018-12-14 | 2024-03-29 | Shenzhen Intellifusion Technologies Co., Ltd. | Neural network model generation method for image recognition and related equipment |
CN111325311A (en) * | 2018-12-14 | 2020-06-23 | Shenzhen Intellifusion Technologies Co., Ltd. | Neural network model generation method, device, electronic device and storage medium |
CN109726797A (en) * | 2018-12-21 | 2019-05-07 | Beijing Zhongke Cambricon Technology Co., Ltd. | Data processing method, device, computer system and storage medium |
CN109685203A (en) * | 2018-12-21 | 2019-04-26 | Beijing Zhongke Cambricon Technology Co., Ltd. | Data processing method, device, computer system and storage medium |
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
US12136030B2 (en) | 2018-12-27 | 2024-11-05 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
CN109754073B (en) * | 2018-12-29 | 2020-03-10 | Cambricon Technologies Corporation Limited | Data processing method and device, electronic equipment and readable storage medium |
CN109754084A (en) * | 2018-12-29 | 2019-05-14 | Beijing Zhongke Cambricon Technology Co., Ltd. | Processing method and device for network structure, and related product |
CN109754073A (en) * | 2018-12-29 | 2019-05-14 | Beijing Zhongke Cambricon Technology Co., Ltd. | Data processing method, device, electronic equipment and readable storage medium |
US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US12014553B2 (en) | 2019-02-01 | 2024-06-18 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
CN109978160A (en) * | 2019-03-25 | 2019-07-05 | Beijing Zhongke Cambricon Technology Co., Ltd. | Configuration device and method for artificial intelligence processor, and related product |
CN109739802A (en) * | 2019-04-01 | 2019-05-10 | Shanghai Suiyuan Intelligent Technology Co., Ltd. | Computing cluster and computing cluster configuration method |
US11734577B2 (en) | 2019-06-05 | 2023-08-22 | Samsung Electronics Co., Ltd | Electronic apparatus and method of performing operations thereof |
CN112052943A (en) * | 2019-06-05 | 2020-12-08 | Samsung Electronics Co., Ltd. | Electronic apparatus and method of performing operations thereof |
WO2020246724A1 (en) * | 2019-06-05 | 2020-12-10 | Samsung Electronics Co., Ltd. | Electronic apparatus and method of performing operations thereof |
CN112132271A (en) * | 2019-06-25 | 2020-12-25 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Neural network accelerator operation method, architecture and related device |
CN115462079A (en) * | 2019-08-13 | 2022-12-09 | Shenzhen Corerain Technologies Co., Ltd. | Neural network data stream acceleration method and device, computer equipment and storage medium |
CN111126572B (en) * | 2019-12-26 | 2023-12-08 | Beijing QIYI Century Science and Technology Co., Ltd. | Model parameter processing method and device, electronic equipment and storage medium |
CN111126572A (en) * | 2019-12-26 | 2020-05-08 | Beijing QIYI Century Science and Technology Co., Ltd. | Model parameter processing method and device, electronic equipment and storage medium |
CN111339027A (en) * | 2020-02-25 | 2020-06-26 | Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences | Automatic design method of reconfigurable artificial intelligence core and heterogeneous multi-core chip |
CN111339027B (en) * | 2020-02-25 | 2023-11-28 | Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences | Reconfigurable artificial intelligence core and automatic design method for heterogeneous multi-core chips |
CN111488969B (en) * | 2020-04-03 | 2024-01-19 | Beijing Jilang Semiconductor Technology Co., Ltd. | Execution optimization method and device based on neural network accelerator |
CN111488969A (en) * | 2020-04-03 | 2020-08-04 | Beijing Silang Technology Co., Ltd. | Execution optimization method and device based on neural network accelerator |
CN111949405A (en) * | 2020-08-13 | 2020-11-17 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Resource scheduling method, hardware accelerator and electronic device |
CN111931926A (en) * | 2020-10-12 | 2020-11-13 | Nanjing Fengxing Technology Co., Ltd. | Hardware acceleration system and control method for convolutional neural networks (CNN) |
Also Published As
Publication number | Publication date |
---|---|
CN107103113B (en) | 2019-01-11 |
WO2018171717A1 (en) | 2018-09-27 |
Similar Documents
Publication | Title |
---|---|
CN107103113A (en) | Towards the Automation Design method, device and the optimization method of neural network processor | |
CN107016175B (en) | Automated design method, apparatus and optimization method applicable to neural network processor | |
EP3884435A1 (en) | System and method for automated precision configuration for deep neural networks | |
CN112070202B (en) | Fusion graph generation method and device and computer readable storage medium | |
CN114035916B (en) | Compilation and scheduling methods of computational graphs and related products | |
CN108932135A (en) | FPGA-based acceleration platform design method for sorting algorithms | |
CN111563582A (en) | A method for implementing and optimizing accelerated convolutional neural network on FPGA | |
JP6503072B2 (en) | Semiconductor system and calculation method | |
CN116126341A (en) | Model compiling method, device, computer equipment and computer readable storage medium | |
Xu et al. | FCLNN: A flexible framework for fast CNN prototyping on FPGA with OpenCL and Caffe | |
CN115345285B (en) | GPU-based temporal graph neural network training method and system, and electronic device | |
CN114968362B (en) | Heterogeneous fusion computing instruction set and method of use | |
CN105700933A (en) | Parallelization and loop optimization method and system for a high-level language of reconfigurable processor | |
CN104239630B (en) | A simulation scheduling system supporting test design | |
CN111667060B (en) | Deep learning algorithm compiling method and device and related products | |
WO2023030507A1 (en) | Compilation optimization method and apparatus, computer device and storage medium | |
CN116402091A (en) | Hybrid engine intelligent computing method and device for artificial intelligence chip | |
Ali et al. | RISC-V based MPSoC design exploration for FPGAs: area, power and performance | |
Odetola et al. | 2L-3W: 2-level 3-way hardware–software co-verification for the mapping of convolutional neural network (CNN) onto FPGA boards | |
CN114127681B (en) | Method and apparatus for autonomous acceleration of data stream AI applications | |
CN105893660B (en) | A CPU design method and computing system for symbolic BDD operations | |
CN115858092A (en) | Timing simulation method, device and system | |
CN111143208B (en) | Verification method based on processor technology for assisting FPGA implementation of AI algorithms | |
CN114691457A (en) | A method, apparatus, storage medium and electronic device for determining hardware performance | |
Kuga et al. | Streaming Accelerator Design for Regular Expression on CPU+FPGA Embedded System | |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |