Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes an automated design method, an apparatus, and an optimization method for neural network processors.
The present invention proposes an automated design method for a neural network processor, comprising:
Step 1: obtaining a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and a target operating speed;
Step 2: retrieving units from a pre-built neural network component library according to the neural network model description file and the hardware resource constraint parameters, and generating, from the retrieved unit library, hardware description language code for a neural network processor corresponding to the neural network model;
Step 3: converting the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor includes a storage structure, a control structure, and a computing structure.
The neural network model description file includes three parts: basic attributes, parameter descriptions, and connection information, wherein the basic attributes include the layer name and layer type, the parameter descriptions include the number of output layers, the convolution kernel size, and the step size, and the connection information includes the connection name, the connection direction, and the connection type.
The neural network reusable unit library includes two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, and control units.
The neural network processor includes a main address generation unit, a data address generation unit, and a weight address generation unit.
The method further includes: determining a data path according to the neural network model and the hardware resource constraint parameters specified by the user, and determining a data resource sharing mode according to the features of the intermediate layers of the neural network;
generating a memory address access stream according to the hardware structure and the network features, the address access stream being described in the form of a finite state machine;
and generating hardware description language code, which is then converted into the hardware circuit of the neural network processor.
The method further includes generating a data storage mapping and a control instruction stream according to the neural network model, the hardware resource constraint parameters, and the hardware description language code.
The invention also includes an automated design apparatus for a neural network processor, comprising:
a data acquisition module for obtaining a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and a target operating speed;
a hardware description language code generation module for retrieving units from the pre-built neural network component library according to the neural network model description file and the hardware resource constraint parameters, and generating, from the unit library, hardware description language code for the neural network processor corresponding to the neural network model;
a hardware circuit generation module for converting the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor includes a storage structure, a control structure, and a computing structure.
The neural network model description file includes three parts: basic attributes, parameter descriptions, and connection information, wherein the basic attributes include the layer name and layer type, the parameter descriptions include the number of output layers, the convolution kernel size, and the step size, and the connection information includes the connection name, the connection direction, and the connection type.
The neural network reusable unit library includes two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, and control units.
The neural network processor includes a main address generation unit, a data address generation unit, and a weight address generation unit.
The apparatus further includes: determining a data path according to the neural network model and the hardware resource constraint parameters specified by the user, and determining a data resource sharing mode according to the features of the intermediate layers of the neural network;
generating a memory address access stream according to the hardware structure and the network features, the address access stream being described in the form of a finite state machine;
and generating hardware description language code, which is then converted into the hardware circuit of the neural network processor.
The apparatus further includes generating a data storage mapping and a control instruction stream according to the neural network model, the hardware resource constraint parameters, and the hardware description language code.
The present invention also proposes an optimization method based on the aforementioned automated design method for a neural network processor, comprising:
Step 1: defining the convolution kernel size as k*k, the stride as s, the memory width as d, and the number of data maps as t; if k^2 = d^2, dividing the data into data blocks of size k*k, so that the data width is consistent with the memory width and the data are stored contiguously in memory;
Step 2: if k^2 != d^2 and the stride s is the greatest common divisor of k and the memory width d, dividing the data into data blocks of size s*s, so that within one data map the data are stored contiguously in memory;
Step 3: if neither of the above two conditions is satisfied, finding the greatest common divisor f of the stride s, the kernel size k, and the memory width d, dividing the data into data blocks of size f*f, the t data maps being stored alternately.
As can be seen from the above solutions, the present invention has the following advantages:
The present invention can map a neural network model to a hardware circuit, automatically optimize the circuit structure and the data storage scheme according to the hardware resource constraints and the network features, and generate the corresponding control instruction stream, thereby realizing automated hardware/software co-design of a neural network hardware accelerator, shortening the design cycle of the neural network processor while improving its operating energy efficiency.
Specific embodiment
In order to make the purpose, technical solutions, design method, and advantages of the present invention clearer, the present invention is described in more detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
The present invention is intended to provide an automated design method, an apparatus, and an optimization method for a neural network processor. The apparatus includes a hardware generator and a compiler: the hardware generator can automatically generate the hardware description language code of a neural network processor according to the neural network type and the hardware resource constraints, after which a designer generates the processor hardware circuit from the hardware description language using existing hardware circuit design methods; the compiler can generate the control and data scheduling instruction stream according to the circuit structure of the neural network processor.
Fig. 1 is a schematic diagram of the automated generation technique for the neural network processor provided by the present invention. The specific steps are as follows:
Step 1: the apparatus of the present invention reads a neural network model description file, which contains the network topology and the definition of each operation layer;
Step 2: the apparatus reads in the hardware resource constraint parameters, which include the hardware resource size, the target operating speed, and the like; the apparatus can generate a corresponding circuit structure according to these constraint parameters;
Step 3: the apparatus indexes suitable units from a pre-built neural network component library according to the neural network model description script and the hardware resource constraints; the hardware circuit generator included in the apparatus uses this unit library to generate the neural network processor hardware description language code corresponding to the neural network model;
Step 4: the compiler included in the apparatus generates a data storage mapping and a control instruction stream according to the neural network model, the logic resource constraints, and the generated hardware description language code;
Step 5: the hardware description language is converted into a hardware circuit by existing hardware design methods.
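The five steps above can be illustrated with a minimal software sketch. This is purely an illustration: all names and data shapes (automated_design_flow, unit_library, the layer dictionaries) are hypothetical assumptions and do not describe the actual tool's interfaces.

```python
# Hypothetical, self-contained sketch of the five-step generation flow.
# All names and data shapes are illustrative assumptions, not the tool's API.

def automated_design_flow(model, constraints):
    """model: list of layer dicts; constraints: hardware resource parameters."""
    # Step 1: read the model description (topology and per-layer definitions).
    layer_types = [layer["type"] for layer in model]
    # Steps 2-3: under the given constraints, index one unit from the
    # component library per layer type (a trivial stand-in for unit selection).
    unit_library = {"conv": "neuron_unit", "pool": "pool_unit", "fc": "classifier_unit"}
    units = [unit_library[t] for t in layer_types]
    # Step 3 (cont.): emit a placeholder HDL module per selected unit.
    hdl = "\n".join(f"module {u}_{i}(); endmodule" for i, u in enumerate(units))
    # Step 4: a trivial stand-in for the compiler's control instruction stream.
    instructions = [("run", u) for u in units]
    # Step 5 (synthesis into a circuit) is left to existing EDA tools.
    return hdl, instructions

model = [{"type": "conv"}, {"type": "pool"}, {"type": "fc"}]
hdl, program = automated_design_flow(model, {"resource_size": 1024})
```

The sketch only conveys the division of labor: the model file drives unit selection, unit selection drives HDL emission, and the instruction stream is derived alongside the hardware.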
The neural network processor that the present invention can automatically generate is based on a storage-control-computation structure:
the storage structure is used to store the data participating in the computation, the neural network weights, and the processor's operation instructions;
the control structure includes a decoding circuit and a control logic circuit for parsing the operation instructions and generating control signals, these signals being used to control the on-chip scheduling and storage of data and the neural network computation process;
the computing structure includes computing units for carrying out the neural network computations in the processor.
Fig. 2 is a schematic diagram of the neural network processor system 101 that the present invention can automatically generate. The architecture of the neural network processor system 101 is composed of seven parts, including an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106, and a computing unit 107.
The input data storage unit 102 is used to store the data participating in the computation, including the original feature map data and the data involved in intermediate-layer computations; the output data storage unit 104 stores the computed neuron responses; the instruction storage unit 106 stores the instruction information participating in the computation, the instructions being parsed into a control stream to schedule the neural network computation; the weight storage unit 105 is used to store the trained neural network weights.
The control unit 103 is connected to the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106, and the computing unit 107, respectively. The control unit 103 obtains the instructions stored in the instruction storage unit 106 and parses them, and can control the computing unit to carry out the neural network computation according to the control signals obtained by parsing the instructions.
The computing unit 107 is used to perform the corresponding neural network computation according to the control signals generated by the control unit 103. The computing unit 107 is associated with one or more storage units: it can obtain data to be computed from the data storage part of its associated input data storage unit 102, and can write data to its associated output data storage unit 104. The computing unit 107 performs most of the operations in the neural network algorithm, i.e., vector multiply-accumulate operations and the like.
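The vector multiply-accumulate operation mentioned above can be shown with a minimal software model; the actual computing unit 107 is of course a hardware circuit, and this sketch is for illustration only.

```python
def vector_mac(inputs, weights, acc=0):
    """Minimal software model of a vector multiply-accumulate step, the core
    operation performed by the computing unit: acc += sum(x_i * w_i)."""
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc

# One neuron response: the dot product of a feature vector with its weights.
response = vector_mac([1, 2, 3], [4, 5, 6])  # 1*4 + 2*5 + 3*6 = 32
```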
The present invention describes the features of the neural network model through the neural network description file format provided herein. The content of this description file includes three parts: basic attributes, parameter descriptions, and connection information, wherein the basic attributes include the layer name and layer type, the parameter descriptions include the number of output layers, the convolution kernel size, and the step size, and the connection information includes the connection name, the connection direction, and the connection type.
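Purely as an illustration, one layer entry of such a description file might carry the three parts named above as follows; the patent does not fix a concrete syntax, so every field name here is an assumption.

```python
# Hypothetical example of one layer entry in a model description file,
# covering the three parts named above. All field names are assumptions.
layer_entry = {
    "basic_attributes": {"layer_name": "conv1", "layer_type": "convolution"},
    "parameter_description": {"output_layers": 32, "kernel_size": 3, "step_size": 1},
    "connection_info": {"connection_name": "conv1_to_pool1",
                        "connection_direction": "forward",
                        "connection_type": "full"},
}
```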
In order to adapt to the hardware design of various neural network models, the present invention provides a neural network reusable unit library, as shown in Fig. 3; the unit library includes two parts: hardware description files and configuration scripts. The reusable unit library provided by the present invention includes, but is not limited to: neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, control units, and the like.
When composing a neural network processor system from the above reusable unit library, the present invention reads the neural network model description file and the hardware resource constraints, and reasonably optimizes the invocation of the unit library.
During operation of the neural network processor, the processor needs to automatically obtain the address streams of the on-chip and off-chip memory data. In the present invention, the storage address streams are determined and generated by the compiler, and the memory access patterns determined by the storage address streams are passed to the hardware generator through text interaction; the memory access patterns include a main access pattern, a data access pattern, a weight access pattern, and the like.
The hardware generator generates the address generation units (AGUs) according to the memory access patterns.
The neural network processor circuit designed using the neural network processor design tool provided by the present invention includes three types of address generation units, namely: a main address generation unit, a data address generation unit, and a weight address generation unit, wherein the main address generation unit is responsible for data exchange between the on-chip memory and the off-chip memory; the data address generation unit is responsible for two kinds of data exchange, namely reading data from the on-chip memory to the computing units and storing the intermediate and final computation results of the computing units back to the storage units; and the weight address generation unit is responsible for reading weight data from the on-chip memory to the computing units.
In the present invention, the hardware circuit generator and the compiler cooperate to realize the design of the address generation circuit. The specific design steps are as follows:
Step 1: the apparatus of the present invention determines the data path according to the neural network model and the hardware constraints specified by the designer, and determines the data resource sharing mode according to the features of the intermediate layers of the neural network;
Step 2: the compiler generates the storage address access stream according to the hardware structure and the network features, the address access stream being described by the compiler in the form of a finite state machine;
Step 3: the hardware generator maps the finite state machine to the hardware description language of the address generation circuit, which is then mapped to a hardware circuit by hardware circuit design methods.
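The finite-state-machine description of an address access stream in Step 2 can be illustrated with a simplified software model. The three states and the contiguous row-major addressing used here are simplifying assumptions for illustration, not the compiler's actual state encoding.

```python
# Illustrative model of a compiler-generated address access stream described
# as a finite state machine. States and addressing are simplifying assumptions.

def address_stream(base, width, height, block):
    """Yield read addresses for a width x height data map, block elements per
    fetch, walking a three-state machine: IDLE -> FETCH -> DONE."""
    state = "IDLE"
    addr = base
    end = base + width * height
    stream = []
    while state != "DONE":
        if state == "IDLE":
            state = "FETCH"          # start of access after reset
        elif state == "FETCH":
            stream.extend(range(addr, min(addr + block, end)))
            addr += block
            if addr >= end:
                state = "DONE"       # end signal would be asserted here
    return stream

addrs = address_stream(base=0x100, width=4, height=2, block=4)  # 8 addresses
```

In the actual flow this state machine would be emitted as hardware description language rather than executed in software; the sketch only shows the kind of deterministic address sequence the AGU must reproduce.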
Fig. 4 is a schematic diagram of the general structure of the address generation circuit provided by the present invention. The address generation circuit of the present invention has a general signal interface, and the interface signals of the circuit include:
a start address signal, i.e., the first address of the data;
a block size signal, giving the amount of data in one fetch;
a memory flag signal, determining the number of the memory in which the data are stored;
an operating mode signal, divided into a large-convolution-kernel fetch mode, a small-convolution-kernel fetch mode, a pooling mode, a full convolution mode, and the like;
a convolution kernel size signal, defining the convolution kernel size;
a length signal, defining the output image size;
an input layer number signal, marking the number of input layers;
an output layer number signal, marking the number of output layers;
a reset signal, which initializes the address generation circuit when set to 1;
a write enable signal, specifying that a write operation is performed on the accessed memory;
a read enable signal, specifying that a read operation is performed on the accessed memory;
an address signal, providing the access storage address;
an end signal, indicating the end of the access.
These parameters ensure that the AGU supports multiple operating modes and guarantee that correct read/write address streams can be generated in the different operating modes and during neural network data transfers.
For different target networks, the tool selects the necessary parameters from this template to construct the address generator and to provide the on-chip and off-chip memory access patterns.
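The configuration-time portion of the interface above can be pictured, purely for illustration, as a parameter record from which an address generator template would be instantiated. The field names mirror the signal list; the types, encodings, and example values are assumptions.

```python
from dataclasses import dataclass

# Illustrative record of the AGU template parameters listed above.
# Field names mirror the signal list; types and encodings are assumptions.
@dataclass
class AGUConfig:
    start_address: int   # first address of the data
    block_size: int      # amount of data per fetch
    memory_flag: int     # number of the memory holding the data
    mode: str            # e.g. "large_kernel", "small_kernel", "pool", "full_conv"
    kernel_size: int     # convolution kernel size
    length: int          # output image size
    input_layers: int    # number of input layers
    output_layers: int   # number of output layers

# A hypothetical instantiation for a small-kernel convolution layer.
cfg = AGUConfig(start_address=0x0, block_size=16, memory_flag=1,
                mode="small_kernel", kernel_size=3, length=28,
                input_layers=3, output_layers=32)
```

The run-time signals (reset, read/write enable, address, end) are omitted here because they are driven by the generated circuit itself rather than chosen per network.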
The neural network processor provided by the present invention constructs the processor architecture in a data-driven manner; therefore, the address generation circuit not only provides the access addresses but also drives the execution of the different neural layers and of the data blocks within a layer.
Owing to resource constraints, a neural network model cannot be fully unrolled according to its model description form when mapped to a hardware circuit. The design tool proposed by the present invention therefore optimizes the data storage and access mechanism using a hardware/software co-design method, which includes two parts: first, the compiler analyzes the computation throughput of the neural network processor and the on-chip memory size, and divides the neural network feature data and weight data into appropriate sets of data blocks for storage and access; second, the data within a data block are partitioned according to the computing unit scale, the memory, and the data bit width.
Based on the above optimization mechanism, the present invention proposes an optimization method for data storage and access, the specific implementation steps of which are as follows:
Step 1: defining the convolution kernel size as k*k, the stride as s, the memory width as d, and the number of data maps as t; if k^2 = d^2, dividing the data into data blocks of size k*k, so that the data width is consistent with the memory width and the data are stored contiguously in memory;
Step 2: if k^2 != d^2 and s is the greatest common divisor of k and d, dividing the data into data blocks of size s*s, so that within one data map the data can be stored contiguously in memory;
Step 3: if neither of the above two conditions is satisfied, finding the greatest common divisor f of s, k, and d, dividing the data into data blocks of size f*f, the t data maps being stored alternately.
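The three cases above reduce to a small selection rule for the block side length. The sketch below follows the stated definitions (k = kernel side, s = stride, d = memory width); it is an illustration of the case analysis only, not the tool's implementation.

```python
from math import gcd

def choose_block_size(k, s, d):
    """Select the data block side length per the three cases above:
    k when k^2 == d^2 (block matches memory width, contiguous storage),
    s when s is the greatest common divisor of k and d (contiguous within
    one data map), otherwise f = gcd(s, k, d) with maps stored alternately."""
    if k * k == d * d:
        return k                    # Step 1: k*k blocks
    if s == gcd(k, d):
        return s                    # Step 2: s*s blocks
    return gcd(gcd(s, k), d)        # Step 3: f*f blocks

# Examples: a 3x3 kernel with memory width 3 keeps 3x3 blocks (Step 1);
# stride 2, kernel 4, width 6 gives 2x2 blocks (Step 2).
```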
The computation data of the neural network include the input feature data and the trained weight data; a good data storage layout can reduce the internal data bandwidth of the processor and improve the utilization efficiency of the storage space. The automated design tool provided by the present invention improves the computational efficiency of the processor by increasing the locality of the processor's data storage.
In conclusion the present invention provides a the Automation Design tool towards neural network processor, which has
The hardware identification code of description neural network processor is mapped as, according to hardware resource constraints optimized processor frame from neural network model
Structure flows the functions such as instruction with control is automatically generated, and realizes the Automation Design of neural network processor, reduces neural network
The design cycle of processor has adapted to nerual network technique network model updating decision, arithmetic speed requires block, energy efficiency requirement
High application characteristic.
Although this specification is described in terms of various embodiments, not every embodiment includes only one independent technical solution; this manner of description is adopted merely for the sake of clarity. Those skilled in the art should take the specification as a whole; the technical solutions in the various embodiments may also be suitably combined to form other embodiments that can be understood by those skilled in the art.
The present invention also proposes an automated design apparatus for a neural network processor, comprising:
a data acquisition module for obtaining a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and a target operating speed;
a hardware description language code generation module for retrieving units from the pre-built neural network component library according to the neural network model description file and the hardware resource constraint parameters, and generating, from the unit library, hardware description language code for the neural network processor corresponding to the neural network model;
a hardware circuit generation module for converting the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor includes a storage structure, a control structure, and a computing structure.
The neural network model description file includes three parts: basic attributes, parameter descriptions, and connection information, wherein the basic attributes include the layer name and layer type, the parameter descriptions include the number of output layers, the convolution kernel size, and the step size, and the connection information includes the connection name, the connection direction, and the connection type.
The neural network processor includes a main address generation unit, a data address generation unit, and a weight address generation unit.
The apparatus further includes: determining a data path according to the neural network model and the hardware resource constraint parameters specified by the user, and determining a data resource sharing mode according to the features of the intermediate layers of the neural network;
generating a memory address access stream according to the hardware structure and the network features, the address access stream being described in the form of a finite state machine.
The neural network reusable unit library includes two parts: hardware description files and configuration scripts.
The neural network reusable unit library includes neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, and control units.
The finite state machine is mapped to an address generation circuit, and hardware description language code is generated, which is then converted into the hardware circuit of the neural network processor.
The apparatus further includes generating a data storage mapping and a control instruction stream according to the neural network model, the hardware resource constraint parameters, and the hardware description language code.
The foregoing are merely illustrative specific embodiments of the present invention and are not intended to limit its scope. Any equivalent variations, modifications, and combinations made by those skilled in the art without departing from the concept and principles of the present invention shall fall within the scope of protection of the present invention.