Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes an automated design method, an apparatus, and an optimization method for neural network processors.
The present invention proposes an automated design method for a neural network processor, comprising:
Step 1: obtaining a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and a target operating speed;
Step 2: retrieving units from a pre-built neural network component library according to the neural network model description file and the hardware resource constraint parameters, and generating, from the retrieved unit library, hardware description language code for a neural network processor corresponding to the neural network model;
Step 3: converting the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor includes a storage structure, a control structure, and a computing structure.
The neural network model description file includes three parts: basic attributes, parameter descriptions, and connection information, wherein the basic attributes include the layer name and layer type, the parameter descriptions include the number of output layers, the convolution kernel size, and the step size, and the connection information includes the connection name, the connection direction, and the connection type.
The neural network reusable unit library includes two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, and control units.
The neural network processor includes a main address generation unit, a data address generation unit, and a weight address generation unit.
The method further includes: determining a data path according to the neural network model and the hardware resource constraint parameters specified by the user, and determining a data resource sharing mode according to the features of the intermediate layers of the neural network;
generating a memory address access stream according to the hardware structure and the network features, the address access stream being described in the form of a finite state machine;
and generating hardware description language code, which is then converted into the hardware circuit of the neural network processor.
The method further includes generating a data storage mapping and a control instruction stream according to the neural network model, the hardware resource constraint parameters, and the hardware description language code.
The invention also includes an automated design apparatus for a neural network processor, comprising:
a data acquisition module for obtaining a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and a target operating speed;
a hardware description language code generation module for retrieving units from the pre-built neural network component library according to the neural network model description file and the hardware resource constraint parameters, and generating, from the unit library, hardware description language code for the neural network processor corresponding to the neural network model;
a hardware circuit generation module for converting the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor includes a storage structure, a control structure, and a computing structure.
The neural network model description file includes three parts: basic attributes, parameter descriptions, and connection information, wherein the basic attributes include the layer name and layer type, the parameter descriptions include the number of output layers, the convolution kernel size, and the step size, and the connection information includes the connection name, the connection direction, and the connection type.
The neural network reusable unit library includes two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, and control units.
The neural network processor includes a main address generation unit, a data address generation unit, and a weight address generation unit.
The apparatus further includes: determining a data path according to the neural network model and the hardware resource constraint parameters specified by the user, and determining a data resource sharing mode according to the features of the intermediate layers of the neural network;
generating a memory address access stream according to the hardware structure and the network features, the address access stream being described in the form of a finite state machine;
and generating hardware description language code, which is then converted into the hardware circuit of the neural network processor.
The apparatus further includes generating a data storage mapping and a control instruction stream according to the neural network model, the hardware resource constraint parameters, and the hardware description language code.
The present invention also proposes an optimization method based on the aforementioned automated design method for a neural network processor, comprising:
Step 1: defining the convolution kernel size as k*k, the stride as s, the memory width as d, and the number of data maps as t; if k^2 = d^2, dividing the data into data blocks of size k*k, so that the data width is consistent with the memory width and the data are stored contiguously in memory;
Step 2: if k^2 != d^2 and the stride s is the greatest common divisor of k and the memory width d, dividing the data into data blocks of size s*s, so that within one data map the data are stored contiguously in memory;
Step 3: if neither of the above two conditions is satisfied, finding the greatest common divisor f of the stride s, the kernel size k, and the memory width d, dividing the data into data blocks of size f*f, the t data maps being stored alternately.
As can be seen from the above solutions, the present invention has the following advantages:
The present invention can map a neural network model to a hardware circuit, automatically optimize the circuit structure and the data storage scheme according to the hardware resource constraints and the network features, and generate the corresponding control instruction stream, thereby realizing automated hardware/software co-design of a neural network hardware accelerator, shortening the design cycle of the neural network processor while improving its operating energy efficiency.
Specific embodiment
In order to make the purpose, technical solutions, design method, and advantages of the present invention clearer, the present invention is described in more detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
The present invention is intended to provide an automated design method, an apparatus, and an optimization method for a neural network processor. The apparatus includes a hardware generator and a compiler: the hardware generator can automatically generate the hardware description language code of a neural network processor according to the neural network type and the hardware resource constraints, after which a designer generates the processor hardware circuit from the hardware description language using existing hardware circuit design methods; the compiler can generate the control and data scheduling instruction stream according to the circuit structure of the neural network processor.
Fig. 1 is a schematic diagram of the automated generation technique for the neural network processor provided by the present invention. The specific steps are as follows:
Step 1: the apparatus of the present invention reads a neural network model description file, which contains the network topology and the definition of each operation layer;
Step 2: the apparatus reads in the hardware resource constraint parameters, which include the hardware resource size, the target operating speed, and the like; the apparatus can generate a corresponding circuit structure according to these constraint parameters;
Step 3: the apparatus indexes suitable units from a pre-built neural network component library according to the neural network model description script and the hardware resource constraints; the hardware circuit generator included in the apparatus uses this unit library to generate the neural network processor hardware description language code corresponding to the neural network model;
Step 4: the compiler included in the apparatus generates a data storage mapping and a control instruction stream according to the neural network model, the logic resource constraints, and the generated hardware description language code;
Step 5: the hardware description language is converted into a hardware circuit by existing hardware design methods.
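The five steps above can be illustrated with a minimal software sketch. This is purely an illustration: all names and data shapes (automated_design_flow, unit_library, the layer dictionaries) are hypothetical assumptions and do not describe the actual tool's interfaces.

```python
# Hypothetical, self-contained sketch of the five-step generation flow.
# All names and data shapes are illustrative assumptions, not the tool's API.

def automated_design_flow(model, constraints):
    """model: list of layer dicts; constraints: hardware resource parameters."""
    # Step 1: read the model description (topology and per-layer definitions).
    layer_types = [layer["type"] for layer in model]
    # Steps 2-3: under the given constraints, index one unit from the
    # component library per layer type (a trivial stand-in for unit selection).
    unit_library = {"conv": "neuron_unit", "pool": "pool_unit", "fc": "classifier_unit"}
    units = [unit_library[t] for t in layer_types]
    # Step 3 (cont.): emit a placeholder HDL module per selected unit.
    hdl = "\n".join(f"module {u}_{i}(); endmodule" for i, u in enumerate(units))
    # Step 4: a trivial stand-in for the compiler's control instruction stream.
    instructions = [("run", u) for u in units]
    # Step 5 (synthesis into a circuit) is left to existing EDA tools.
    return hdl, instructions

model = [{"type": "conv"}, {"type": "pool"}, {"type": "fc"}]
hdl, program = automated_design_flow(model, {"resource_size": 1024})
```

The sketch only conveys the division of labor: the model file drives unit selection, unit selection drives HDL emission, and the instruction stream is derived alongside the hardware.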
The neural network processor that the present invention can automatically generate is based on a storage-control-computation structure:
the storage structure is used to store the data participating in the computation, the neural network weights, and the processor's operation instructions;
the control structure includes a decoding circuit and a control logic circuit for parsing the operation instructions and generating control signals, these signals being used to control the on-chip scheduling and storage of data and the neural network computation process;
the computing structure includes computing units for carrying out the neural network computations in the processor.
Fig. 2 is a schematic diagram of the neural network processor system 101 that the present invention can automatically generate. The architecture of the neural network processor system 101 is composed of seven parts, including an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106, and a computing unit 107.
The input data storage unit 102 is used to store the data participating in the computation, including the original feature map data and the data involved in intermediate-layer computations; the output data storage unit 104 stores the computed neuron responses; the instruction storage unit 106 stores the instruction information participating in the computation, the instructions being parsed into a control stream to schedule the neural network computation; the weight storage unit 105 is used to store the trained neural network weights.
The control unit 103 is connected to the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106, and the computing unit 107, respectively. The control unit 103 obtains the instructions stored in the instruction storage unit 106 and parses them, and can control the computing unit to carry out the neural network computation according to the control signals obtained by parsing the instructions.
The computing unit 107 is used to perform the corresponding neural network computation according to the control signals generated by the control unit 103. The computing unit 107 is associated with one or more storage units: it can obtain data to be computed from the data storage part of its associated input data storage unit 102, and can write data to its associated output data storage unit 104. The computing unit 107 performs most of the operations in the neural network algorithm, i.e., vector multiply-accumulate operations and the like.
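The vector multiply-accumulate operation mentioned above can be shown with a minimal software model; the actual computing unit 107 is of course a hardware circuit, and this sketch is for illustration only.

```python
def vector_mac(inputs, weights, acc=0):
    """Minimal software model of a vector multiply-accumulate step, the core
    operation performed by the computing unit: acc += sum(x_i * w_i)."""
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc

# One neuron response: the dot product of a feature vector with its weights.
response = vector_mac([1, 2, 3], [4, 5, 6])  # 1*4 + 2*5 + 3*6 = 32
```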
The present invention describes the features of the neural network model through the neural network description file format provided herein. The content of this description file includes three parts: basic attributes, parameter descriptions, and connection information, wherein the basic attributes include the layer name and layer type, the parameter descriptions include the number of output layers, the convolution kernel size, and the step size, and the connection information includes the connection name, the connection direction, and the connection type.
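Purely as an illustration, one layer entry of such a description file might carry the three parts named above as follows; the patent does not fix a concrete syntax, so every field name here is an assumption.

```python
# Hypothetical example of one layer entry in a model description file,
# covering the three parts named above. All field names are assumptions.
layer_entry = {
    "basic_attributes": {"layer_name": "conv1", "layer_type": "convolution"},
    "parameter_description": {"output_layers": 32, "kernel_size": 3, "step_size": 1},
    "connection_info": {"connection_name": "conv1_to_pool1",
                        "connection_direction": "forward",
                        "connection_type": "full"},
}
```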
In order to adapt to the hardware design of various neural network models, the present invention provides a neural network reusable unit library, as shown in Fig. 3; the unit library includes two parts: hardware description files and configuration scripts. The reusable unit library provided by the present invention includes, but is not limited to: neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, control units, and the like.
When composing a neural network processor system from the above reusable unit library, the present invention reads the neural network model description file and the hardware resource constraints, and reasonably optimizes the invocation of the unit library.
During operation of the neural network processor, the processor needs to automatically obtain the address streams of the on-chip and off-chip memory data. In the present invention, the storage address streams are determined and generated by the compiler, and the memory access patterns determined by the storage address streams are passed to the hardware generator through text interaction; the memory access patterns include a main access pattern, a data access pattern, a weight access pattern, and the like.
The hardware generator generates the address generation units (AGUs) according to the memory access patterns.
The neural network processor circuit designed using the neural network processor design tool provided by the present invention includes three types of address generation units, namely: a main address generation unit, a data address generation unit, and a weight address generation unit, wherein the main address generation unit is responsible for data exchange between the on-chip memory and the off-chip memory; the data address generation unit is responsible for two kinds of data exchange, namely reading data from the on-chip memory to the computing units and storing the intermediate and final computation results of the computing units back to the storage units; and the weight address generation unit is responsible for reading weight data from the on-chip memory to the computing units.
In the present invention, the hardware circuit generator and the compiler cooperate to realize the design of the address generation circuit. The specific design steps are as follows:
Step 1: the apparatus of the present invention determines the data path according to the neural network model and the hardware constraints specified by the designer, and determines the data resource sharing mode according to the features of the intermediate layers of the neural network;
Step 2: the compiler generates the storage address access stream according to the hardware structure and the network features, the address access stream being described by the compiler in the form of a finite state machine;
Step 3: the hardware generator maps the finite state machine to the hardware description language of the address generation circuit, which is then mapped to a hardware circuit by hardware circuit design methods.
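The finite-state-machine description of an address access stream in Step 2 can be illustrated with a simplified software model. The three states and the contiguous row-major addressing used here are simplifying assumptions for illustration, not the compiler's actual state encoding.

```python
# Illustrative model of a compiler-generated address access stream described
# as a finite state machine. States and addressing are simplifying assumptions.

def address_stream(base, width, height, block):
    """Yield read addresses for a width x height data map, block elements per
    fetch, walking a three-state machine: IDLE -> FETCH -> DONE."""
    state = "IDLE"
    addr = base
    end = base + width * height
    stream = []
    while state != "DONE":
        if state == "IDLE":
            state = "FETCH"          # start of access after reset
        elif state == "FETCH":
            stream.extend(range(addr, min(addr + block, end)))
            addr += block
            if addr >= end:
                state = "DONE"       # end signal would be asserted here
    return stream

addrs = address_stream(base=0x100, width=4, height=2, block=4)  # 8 addresses
```

In the actual flow this state machine would be emitted as hardware description language rather than executed in software; the sketch only shows the kind of deterministic address sequence the AGU must reproduce.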
Fig. 4 is a schematic diagram of the general structure of the address generation circuit provided by the present invention. The address generation circuit of the present invention has a general signal interface, and the interface signals of the circuit include:
a start address signal, i.e., the first address of the data;
a block size signal, giving the amount of data in one fetch;
a memory flag signal, determining the number of the memory in which the data are stored;
an operating mode signal, divided into a large-convolution-kernel fetch mode, a small-convolution-kernel fetch mode, a pooling mode, a full convolution mode, and the like;
a convolution kernel size signal, defining the convolution kernel size;
a length signal, defining the output image size;
an input layer number signal, marking the number of input layers;
an output layer number signal, marking the number of output layers;
a reset signal, which initializes the address generation circuit when set to 1;
a write enable signal, specifying that a write operation is performed on the accessed memory;
a read enable signal, specifying that a read operation is performed on the accessed memory;
an address signal, providing the access storage address;
an end signal, indicating the end of the access.
These parameters ensure that the AGU supports multiple operating modes and guarantee that correct read/write address streams can be generated in the different operating modes and during neural network data transfers.
For different target networks, the tool selects the necessary parameters from this template to construct the address generator and to provide the on-chip and off-chip memory access patterns.
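The configuration-time portion of the interface above can be pictured, purely for illustration, as a parameter record from which an address generator template would be instantiated. The field names mirror the signal list; the types, encodings, and example values are assumptions.

```python
from dataclasses import dataclass

# Illustrative record of the AGU template parameters listed above.
# Field names mirror the signal list; types and encodings are assumptions.
@dataclass
class AGUConfig:
    start_address: int   # first address of the data
    block_size: int      # amount of data per fetch
    memory_flag: int     # number of the memory holding the data
    mode: str            # e.g. "large_kernel", "small_kernel", "pool", "full_conv"
    kernel_size: int     # convolution kernel size
    length: int          # output image size
    input_layers: int    # number of input layers
    output_layers: int   # number of output layers

# A hypothetical instantiation for a small-kernel convolution layer.
cfg = AGUConfig(start_address=0x0, block_size=16, memory_flag=1,
                mode="small_kernel", kernel_size=3, length=28,
                input_layers=3, output_layers=32)
```

The run-time signals (reset, read/write enable, address, end) are omitted here because they are driven by the generated circuit itself rather than chosen per network.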
The neural network processor provided by the present invention constructs the processor architecture in a data-driven manner; therefore, the address generation circuit not only provides the access addresses but also drives the execution of the different neural layers and of the data blocks within a layer.
Owing to resource constraints, a neural network model cannot be fully unrolled according to its model description form when mapped to a hardware circuit. The design tool proposed by the present invention therefore optimizes the data storage and access mechanism using a hardware/software co-design method, which includes two parts: first, the compiler analyzes the computation throughput of the neural network processor and the on-chip memory size, and divides the neural network feature data and weight data into appropriate sets of data blocks for storage and access; second, the data within a data block are partitioned according to the computing unit scale, the memory, and the data bit width.
Based on the above optimization mechanism, the present invention proposes an optimization method for data storage and access, the specific implementation steps of which are as follows:
Step 1: defining the convolution kernel size as k*k, the stride as s, the memory width as d, and the number of data maps as t; if k^2 = d^2, dividing the data into data blocks of size k*k, so that the data width is consistent with the memory width and the data are stored contiguously in memory;
Step 2: if k^2 != d^2 and s is the greatest common divisor of k and d, dividing the data into data blocks of size s*s, so that within one data map the data can be stored contiguously in memory;
Step 3: if neither of the above two conditions is satisfied, finding the greatest common divisor f of s, k, and d, dividing the data into data blocks of size f*f, the t data maps being stored alternately.
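The three cases above reduce to a small selection rule for the block side length. The sketch below follows the stated definitions (k = kernel side, s = stride, d = memory width); it is an illustration of the case analysis only, not the tool's implementation.

```python
from math import gcd

def choose_block_size(k, s, d):
    """Select the data block side length per the three cases above:
    k when k^2 == d^2 (block matches memory width, contiguous storage),
    s when s is the greatest common divisor of k and d (contiguous within
    one data map), otherwise f = gcd(s, k, d) with maps stored alternately."""
    if k * k == d * d:
        return k                    # Step 1: k*k blocks
    if s == gcd(k, d):
        return s                    # Step 2: s*s blocks
    return gcd(gcd(s, k), d)        # Step 3: f*f blocks

# Examples: a 3x3 kernel with memory width 3 keeps 3x3 blocks (Step 1);
# stride 2, kernel 4, width 6 gives 2x2 blocks (Step 2).
```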
The computation data of the neural network include the input feature data and the trained weight data; a good data storage layout can reduce the internal data bandwidth of the processor and improve the utilization efficiency of the storage space. The automated design tool provided by the present invention improves the computational efficiency of the processor by increasing the locality of the processor's data storage.
In conclusion the present invention provides a the Automation Design tool towards neural network processor, which has
The hardware identification code of description neural network processor is mapped as, according to hardware resource constraints optimized processor frame from neural network model
Structure flows the functions such as instruction with control is automatically generated, and realizes the Automation Design of neural network processor, reduces neural network
The design cycle of processor has adapted to nerual network technique network model updating decision, arithmetic speed requires block, energy efficiency requirement
High application characteristic.
Although this specification is described in terms of various embodiments, not every embodiment includes only one independent technical solution; this manner of description is adopted merely for the sake of clarity. Those skilled in the art should take the specification as a whole; the technical solutions in the various embodiments may also be suitably combined to form other embodiments that can be understood by those skilled in the art.
The present invention also proposes an automated design apparatus for a neural network processor, comprising:
a data acquisition module for obtaining a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and a target operating speed;
a hardware description language code generation module for retrieving units from the pre-built neural network component library according to the neural network model description file and the hardware resource constraint parameters, and generating, from the unit library, hardware description language code for the neural network processor corresponding to the neural network model;
a hardware circuit generation module for converting the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor includes a storage structure, a control structure, and a computing structure.
The neural network model description file includes three parts: basic attributes, parameter descriptions, and connection information, wherein the basic attributes include the layer name and layer type, the parameter descriptions include the number of output layers, the convolution kernel size, and the step size, and the connection information includes the connection name, the connection direction, and the connection type.
The neural network processor includes a main address generation unit, a data address generation unit, and a weight address generation unit.
The apparatus further includes: determining a data path according to the neural network model and the hardware resource constraint parameters specified by the user, and determining a data resource sharing mode according to the features of the intermediate layers of the neural network;
generating a memory address access stream according to the hardware structure and the network features, the address access stream being described in the form of a finite state machine.
The neural network reusable unit library includes two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, and control units.
The finite state machine is mapped to an address generation circuit, and hardware description language code is generated, which is then converted into the hardware circuit of the neural network processor.
The apparatus further includes generating a data storage mapping and a control instruction stream according to the neural network model, the hardware resource constraint parameters, and the hardware description language code.
The foregoing are merely illustrative specific embodiments of the present invention and are not intended to limit its scope. Any equivalent variations, modifications, and combinations made by those skilled in the art without departing from the concept and principles of the present invention shall fall within the scope of protection of the present invention.