
CN107103113B - Automated design method, apparatus, and optimization method for a neural network processor - Google Patents

Automated design method, apparatus, and optimization method for a neural network processor

Info

Publication number
CN107103113B
CN107103113B (application CN201710178281.3A)
Authority
CN
China
Prior art keywords
neural network
data
hardware
network processor
automated design
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710178281.3A
Other languages
Chinese (zh)
Other versions
CN107103113A
Inventor
韩银和
许浩博
王颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201710178281.3A priority Critical patent/CN107103113B/en
Publication of CN107103113A publication Critical patent/CN107103113A/en
Priority to PCT/CN2018/080207 priority patent/WO2018171717A1/en
Application granted granted Critical
Publication of CN107103113B publication Critical patent/CN107103113B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/30: Circuit design
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks


Abstract


The present invention provides an automated design method, apparatus, and optimization method for a neural network processor. The method comprises: step 1, obtaining a neural network model description file and hardware resource constraint parameters, where the constraint parameters include the hardware resource size and the target running speed; step 2, according to the model description file and the constraint parameters, looking up a unit library in a pre-built neural network component library and generating, from that unit library, the hardware description language code of a neural network processor corresponding to the model; step 3, converting the hardware description language code into the hardware circuit of the neural network processor.

Description

Automated design method, apparatus, and optimization method for a neural network processor
Technical field
The present invention relates to the field of neural network processor architecture, and in particular to an automated design method, apparatus, and optimization method for neural network processors.
Background art
The rapid development of deep learning and neural network techniques has provided new ways to handle large-scale data processing tasks. Novel neural network models perform outstandingly on complex abstract problems, and new applications keep emerging in fields such as visual image processing, speech recognition, and intelligent robotics.
At present, real-time task analysis with deep neural networks mostly relies on large-scale high-performance processors or general-purpose graphics processors. These devices are costly and power-hungry; when applied to portable intelligent devices they face a series of problems such as large circuit scale, high energy consumption, and expensive products. Therefore, for applications in embedded devices, small low-cost data centers, and other fields that require energy-efficient real-time processing, using a dedicated neural network processor to accelerate neural network computation is a more effective solution than a pure software implementation. However, the topology and parameters of a neural network model vary with the application scenario, and neural network models themselves evolve rapidly, so providing a single general-purpose, efficient neural network processor that covers all application scenarios and all neural network models is extremely difficult. This makes it very inconvenient for high-level application developers to design hardware-accelerated solutions for different application requirements.
Existing neural network hardware acceleration techniques fall into two categories: application-specific integrated circuit (ASIC) chips and field-programmable gate arrays (FPGAs). Under the same process conditions, an ASIC chip runs fast and consumes little power, but its design flow is complex, its tape-out cycle is long, and its development cost is high, so it cannot keep up with the rapid iteration of neural network models. An FPGA offers flexible circuit configuration and a short development cycle, but its running speed is relatively low and its hardware overhead and power consumption are relatively large. With either technique, neural network model and algorithm developers must master both the network topology and workload patterns and the whole hardware development flow, including processor architecture design, hardware code writing, simulation and verification, and place-and-route. These skills are difficult to acquire for higher-level application developers who focus on neural network models and structure design and lack hardware design experience. Therefore, to let high-level developers build neural network applications efficiently, an automated design method and tool for neural network processors that supports a variety of neural network models is urgently needed.
Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes an automated design method and apparatus for neural network processors, together with an optimization method.
The present invention proposes an automated design method for a neural network processor, comprising:
Step 1: obtain a neural network model description file and hardware resource constraint parameters, where the constraint parameters include the hardware resource size and the target running speed;
Step 2: according to the model description file and the constraint parameters, look up a unit library in the pre-built neural network component library, and generate from that unit library the hardware description language code of a neural network processor corresponding to the model;
Step 3: convert the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor comprises a storage structure, a control structure, and a computation structure.
The neural network model description file consists of three parts: basic attributes, parameter description, and connection information. The basic attributes include the layer name and layer type; the parameter description includes the number of output layers, the convolution kernel size, and the stride; the connection information includes the connection name, connection direction, and connection type.
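As a concrete illustration, a single layer entry under this three-part scheme might look like the following sketch. The dictionary layout and field values are assumptions for illustration only; the text fixes the content of the three parts but not a file syntax.

```python
# Hypothetical layer entry in a neural network model description file,
# following the three-part structure described in the text: basic
# attributes, parameter description, and connection information.
conv1 = {
    "basic_attributes": {
        "layer_name": "conv1",
        "layer_type": "convolution",
    },
    "parameter_description": {
        "output_layers": 32,   # number of output feature maps
        "kernel_size": 3,      # convolution kernel is 3x3
        "stride": 1,
    },
    "connection_information": {
        "connection_name": "conv1_to_pool1",
        "direction": "forward",
        "connection_type": "full",
    },
}
```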
The neural network reusable unit library consists of two parts: hardware description files and configuration scripts.
The reusable unit library includes neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, and control units.
The neural network processor includes a main address generation unit, a data address generation unit, and a weight address generation unit.
The method further includes determining the data path according to the neural network model and hardware resource constraint parameters specified by the user, and determining the data resource sharing mode according to the features of the intermediate network layers;
generating the memory address access stream according to the hardware structure and network features, the access stream being described as a finite state machine;
and generating hardware description language code, which is then converted into the hardware circuit of the neural network processor.
The method further includes generating the data storage mapping and the control instruction stream according to the neural network model, the hardware resource constraint parameters, and the hardware description language code.
The invention also provides an automated design apparatus for a neural network processor, comprising:
a data acquisition module for obtaining the neural network model description file and the hardware resource constraint parameters, where the constraint parameters include the hardware resource size and the target running speed;
a hardware description language code generation module for looking up a unit library in the pre-built neural network component library according to the model description file and the constraint parameters, and generating from that unit library the hardware description language code of a neural network processor corresponding to the model;
a hardware circuit generation module for converting the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor comprises a storage structure, a control structure, and a computation structure.
The neural network model description file consists of three parts: basic attributes, parameter description, and connection information. The basic attributes include the layer name and layer type; the parameter description includes the number of output layers, the convolution kernel size, and the stride; the connection information includes the connection name, connection direction, and connection type.
The neural network reusable unit library consists of two parts: hardware description files and configuration scripts.
The reusable unit library includes neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, and control units.
The neural network processor includes a main address generation unit, a data address generation unit, and a weight address generation unit.
The apparatus further determines the data path according to the neural network model and hardware resource constraint parameters specified by the user, and determines the data resource sharing mode according to the features of the intermediate network layers;
it generates the memory address access stream according to the hardware structure and network features, the access stream being described as a finite state machine;
and it generates hardware description language code, which is then converted into the hardware circuit of the neural network processor.
The apparatus further generates the data storage mapping and the control instruction stream according to the neural network model, the hardware resource constraint parameters, and the hardware description language code.
The present invention further proposes an optimization method based on the automated design method above, comprising:
Step 1: let the convolution kernel size be k*k, the stride be s, the memory width be d, and the number of feature maps be t. If k^2 = d^2, divide the data into blocks of size k*k; the data width then matches the memory width, which guarantees contiguous storage in memory;
Step 2: if k^2 != d^2 and the stride s is the greatest common divisor of k and the memory width d, divide the data into blocks of size s*s, which guarantees that the data of one feature map are stored contiguously in memory;
Step 3: if neither condition holds, compute the greatest common divisor f of the stride s, the kernel size k, and the memory width d, divide the data into blocks of size f*f, and store the t feature maps alternately.
As can be seen from the above scheme, the present invention has the following advantages:
The present invention can map a neural network model to a hardware circuit, automatically optimize the circuit structure and data storage method according to the hardware resource constraints and network features, and generate the corresponding control instruction stream. It realizes automated hardware/software co-design of neural network accelerators, shortening the neural network processor design cycle while improving the processor's operating energy efficiency.
Brief description of the drawings
Fig. 1 is the workflow diagram of the automatic FPGA implementation tool for neural network processors provided by the present invention;
Fig. 2 is a schematic diagram of a neural network processor system that the present invention can generate automatically;
Fig. 3 is a schematic diagram of the neural network reusable unit library used by the present invention;
Fig. 4 is a schematic diagram of the address generation circuit interface used by the present invention.
Specific embodiments
To make the purpose, technical solution, design method, and advantages of the present invention clearer, the invention is described in more detail below through specific embodiments with reference to the drawings. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
The present invention provides an automated design method, apparatus, and optimization method for neural network processors. The apparatus includes a hardware generator and a compiler. The hardware generator automatically produces the hardware description language code of a neural network processor according to the neural network type and the hardware resource constraints; a designer then generates the processor hardware circuit from that code using existing hardware circuit design methods. The compiler generates the control and data scheduling instruction stream according to the circuit structure of the neural network processor.
Fig. 1 is a schematic diagram of the automated generation technique for neural network processors provided by the present invention. The specific steps are as follows:
Step 1: the apparatus reads the neural network model description file, which contains the network topology and the definition of each operation layer;
Step 2: the apparatus reads the hardware resource constraint parameters, which include the hardware resource size, the target running speed, and so on; the apparatus generates a corresponding circuit structure according to these constraints;
Step 3: the apparatus indexes a suitable unit library in the pre-built neural network component library according to the model description script and the hardware resource constraints; the hardware circuit generator included in the apparatus uses this unit library to generate the neural network processor hardware description language code corresponding to the model;
Step 4: the compiler included in the apparatus generates the data storage mapping and the control instruction stream according to the neural network model, the circuit resource constraints, and the generated hardware description language code;
Step 5: the hardware description language code is converted into a hardware circuit by existing hardware design methods.
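The five steps above can be sketched as a toy pipeline. Every function here is a stub with illustrative names and return values (the patent specifies the stages, not a tool API), so only the data flow between stages is meaningful.

```python
# Toy end-to-end sketch of the five-step generation flow.
def parse_model_description(path):
    # Step 1: read network topology and per-layer definitions.
    return {"layers": [{"name": "conv1", "type": "convolution"}]}

def parse_constraints(raw):
    # Step 2: hardware resource size and target running speed.
    return {"resource_size": raw["resource_size"], "target_speed": raw["target_speed"]}

def index_component_library(model, params):
    # Step 3a: select reusable units matching the model and constraints.
    return ["neuron_unit", "accumulator_unit", "pooling_unit"]

def hardware_generator(model, units, params):
    # Step 3b: emit hardware description language code from the unit library.
    return "// HDL for " + ", ".join(layer["name"] for layer in model["layers"])

def compile_schedule(model, params, hdl):
    # Step 4: data storage mapping plus control instruction stream.
    return {"conv1": 0}, ["LOAD", "CONV", "STORE"]

def generate_processor(model_path, constraints):
    model = parse_model_description(model_path)
    params = parse_constraints(constraints)
    units = index_component_library(model, params)
    hdl = hardware_generator(model, units, params)
    mapping, instrs = compile_schedule(model, params, hdl)
    # Step 5 hands the HDL to a standard synthesis flow (not modeled here).
    return hdl, mapping, instrs
```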
The neural network processor that the present invention can generate automatically is based on a storage-control-computation structure:
the storage structure stores the data participating in the computation, the neural network weights, and the processor operation instructions;
the control structure includes a decoding circuit and a control logic circuit; it parses operation instructions and generates control signals that schedule and store on-chip data and control the neural network computation process;
the computation structure includes computing units that carry out the neural network computation in the processor.
Fig. 2 is a schematic diagram of a neural network processor system 101 that the present invention can generate automatically. The system architecture is composed of the following parts: an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106, and a computing unit 107.
The input data storage unit 102 stores the data participating in the computation, including the original feature map data and the data involved in intermediate-layer computation; the output data storage unit 104 stores the computed neuron responses; the instruction storage unit 106 stores the instruction information of the computation, the instructions being parsed into a control stream to schedule the neural network computation; the weight storage unit 105 stores the trained neural network weights.
The control unit 103 is connected to the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106, and the computing unit 107. It fetches the instructions stored in the instruction storage unit 106, parses them, and controls the computing unit to perform the neural network computation according to the parsed control signals.
The computing unit 107 performs the corresponding neural network computation according to the control signals generated by the control unit 103. It is associated with one or more storage units: it obtains the data to be computed from the associated input data storage unit 102, and writes results to the associated output data storage unit 104. The computing unit carries out most of the operations of the neural network algorithm, i.e. vector multiply-accumulate operations and the like.
The present invention describes the features of a neural network model through the neural network description file format it provides. The file content consists of three parts: basic attributes, parameter description, and connection information. The basic attributes include the layer name and layer type; the parameter description includes the number of output layers, the convolution kernel size, and the stride; the connection information includes the connection name, connection direction, and connection type.
To adapt to the hardware design of various neural network models, the invention provides a neural network reusable unit library, shown in Fig. 3, consisting of hardware description files and configuration scripts. The reusable unit library includes, but is not limited to: neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, and control units.
When composing a neural network processor system from the reusable unit library, the invention reads the neural network model description file and the hardware resource constraints to select and reasonably optimize the called units.
While the neural network processor is working, it must automatically obtain the address streams of on-chip and off-chip memory data. In the present invention, the storage address stream is determined and generated by the compiler; the memory access patterns determined by the address stream are passed to the hardware generator through text interaction. The memory access patterns include the main access pattern, the data access pattern, the weight access pattern, and so on.
The hardware generator produces the address generation units (AGUs) according to these memory access patterns.
The neural network processor circuit designed with the design tool provided by the invention includes three types of address generation units: a main address generation unit, a data address generation unit, and a weight address generation unit. The main address generation unit handles the data exchange between on-chip and off-chip memory; the data address generation unit handles reading data from on-chip memory to the computing units and storing the computing units' intermediate and final results back to the storage units; the weight address generation unit handles reading weight data from on-chip memory to the computing units.
In the present invention, the hardware circuit generator and the compiler cooperate to realize the design of the address generation circuit. The specific design steps are:
Step 1: the apparatus determines the data path according to the neural network model and hardware constraints specified by the designer, and determines the data resource sharing mode according to the features of the intermediate network layers;
Step 2: the compiler generates the storage address access stream according to the hardware structure and network features, and describes this access stream as a finite state machine;
Step 3: the hardware generator maps the finite state machine to the hardware description language of an address generation circuit, which is then mapped to a hardware circuit by hardware circuit design methods.
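As a rough model of such a compiler-generated access stream, the following sketch enumerates the read addresses for sliding a k*k convolution window over a w*h feature map with stride s. Row-major layout and a zero base address are assumptions; a hardware generator would realize the equivalent loop nest as a finite state machine in HDL rather than as Python loops.

```python
# Enumerate the flattened read addresses for a k*k convolution window
# sliding over a w*h feature map with stride s. Each loop level
# corresponds to one state variable of the finite state machine the
# text describes: window row, window column, and offsets inside the window.
def conv_read_addresses(w, h, k, s, base=0):
    for row in range(0, h - k + 1, s):        # window position: row state
        for col in range(0, w - k + 1, s):    # window position: column state
            for dy in range(k):               # sweep the k*k window
                for dx in range(k):
                    yield base + (row + dy) * w + (col + dx)
```

For a 4x4 feature map with a 2x2 kernel and stride 2, the stream visits four non-overlapping windows, reading each window's four elements contiguously by row.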
Fig. 4 is a schematic diagram of the general structure of the address generation circuit provided by the invention. The circuit has a universal signal interface comprising the following signals:
start address signal: the first address of the data;
block size signal: the amount of data per fetch;
memory flag signal: selects the number of the memory that stores the data;
work mode signal: distinguishes large-kernel fetch mode, small-kernel fetch mode, pooling mode, full convolution mode, and so on;
convolution kernel size signal: defines the convolution kernel size;
length signal: defines the output image size;
input layer count signal: labels the number of input layers;
output layer count signal: labels the number of output layers;
reset signal: when set to 1, initializes the address generation circuit;
write enable signal: directs the accessed memory to perform a write operation;
read enable signal: directs the accessed memory to perform a read operation;
address signal: provides the memory access address;
end signal: signals the end of the access.
These parameters ensure that the AGU supports multiple working modes and generates correct read/write address streams in different working modes and throughout the neural network communication process.
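For reference, the interface can be transcribed as a plain record. The field names are direct translations of the signals listed above; the Python types stand in for signal widths, which the text does not specify.

```python
# The address generation circuit's universal signal interface,
# transcribed as a dataclass. Types and the example values below are
# assumptions; the patent lists the signals but not their widths.
from dataclasses import dataclass

@dataclass
class AddressGeneratorInterface:
    start_address: int    # first address of the data
    block_size: int       # amount of data per fetch
    memory_select: int    # number of the memory that stores the data
    work_mode: str        # "large_kernel", "small_kernel", "pooling", "full_conv"
    kernel_size: int      # convolution kernel edge length
    length: int           # output image size
    input_layers: int     # number of input layers
    output_layers: int    # number of output layers
    reset: bool           # when set, initializes the circuit
    write_enable: bool    # write access to the selected memory
    read_enable: bool     # read access to the selected memory
    address: int          # generated access address (output)
    done: bool            # access-finished flag (output)
```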
For different target networks, the tool selects the necessary parameters to construct the address generator from this template and provides the on-chip and off-chip memory access patterns.
The neural network processor provided by the invention builds its architecture in a data-driven manner; therefore the address generation circuit not only supplies access addresses but also drives the execution of the different neural layers and of the data blocks within a layer.
Due to resource constraints, a neural network model cannot be fully unrolled in the form of its model description when mapped to a hardware circuit. The design tool proposed by the invention therefore optimizes the data storage and access mechanism with a hardware/software co-design method, in two parts: first, the compiler analyzes the computation throughput and on-chip memory size of the neural network processor and partitions the neural network feature data and weight data into appropriate data-block sets for storage and access; second, the data inside a block are partitioned according to the computing unit scale, the memory, and the data bit width.
Based on the optimization mechanism above, the present invention proposes an optimization method for data storage and access. The specific implementation steps are:
Step 1: let the convolution kernel size be k*k, the stride be s, the memory width be d, and the number of feature maps be t. If k^2 = d^2, divide the data into blocks of size k*k; the data width then matches the memory width, which guarantees contiguous storage in memory;
Step 2: if k^2 != d^2 and s is the greatest common divisor of k and d, divide the data into blocks of size s*s, which guarantees that within one feature map the data are stored contiguously in memory;
Step 3: if neither condition holds, compute the greatest common divisor f of s, k, and d, divide the data into blocks of size f*f, and store the t feature maps alternately.
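The three cases reduce to a small block-size selection rule, sketched below. The function name and the boolean interleave flag are illustrative, but the case analysis follows steps 1 to 3 directly.

```python
from math import gcd

def choose_block_size(k, s, d):
    """Pick the data-block edge length for on-chip storage.

    k: convolution kernel edge (kernel is k*k)
    s: stride
    d: memory width
    Returns (block_edge, interleave), where interleave indicates whether
    the t feature maps must be stored alternately (case 3).
    """
    if k * k == d * d:       # case 1: kernel size matches memory width
        return k, False      # k*k blocks, contiguous storage in memory
    if s == gcd(k, d):       # case 2: stride is gcd of kernel and memory width
        return s, False      # s*s blocks, contiguous within one feature map
    f = gcd(gcd(s, k), d)    # case 3: fall back to gcd of all three
    return f, True           # f*f blocks, feature maps stored alternately
```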
The computation data of a neural network include the input feature data and the trained weight data. A good data storage layout reduces the processor's internal data bandwidth requirements and improves storage space utilization. The automated design tool provided by the invention improves the processor's computation efficiency by increasing processor data locality.
In summary, the present invention provides an automated design tool for neural network processors. The tool maps a neural network model description to the hardware code of a neural network processor, optimizes the processor architecture according to the hardware resource constraints, and automatically generates the control and instruction streams, thereby realizing the automated design of neural network processors. It shortens the design cycle of neural network processors and adapts to the application characteristics of neural network techniques: rapidly changing network models, demands for fast operation speed, and demands for high energy efficiency.
Although this specification is described according to various embodiments, and not every embodiment contains only one independent technical solution, this style of description is adopted merely for clarity. Those skilled in the art should take the specification as a whole; the technical solutions in the embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.
The present invention also proposes an automated design apparatus for a neural network processor, comprising:
a data acquisition module for obtaining the neural network model description file and the hardware resource constraint parameters, where the constraint parameters include the hardware resource size and the target running speed;
a hardware description language code generation module for looking up a unit library in the pre-built neural network component library according to the model description file and the constraint parameters, and generating from that unit library the hardware description language code of a neural network processor corresponding to the model;
a hardware circuit generation module for converting the hardware description language code into the hardware circuit of the neural network processor.
The neural network processor comprises a storage structure, a control structure, and a computation structure.
The neural network model description file consists of three parts: basic attributes, parameter description, and connection information. The basic attributes include the layer name and layer type; the parameter description includes the number of output layers, the convolution kernel size, and the stride; the connection information includes the connection name, connection direction, and connection type.
The neural network processor includes a main address generation unit, a data address generation unit, and a weight address generation unit.
The apparatus further determines the data path according to the neural network model and hardware resource constraint parameters specified by the user, and determines the data resource sharing mode according to the features of the intermediate network layers;
it generates the memory address access stream according to the hardware structure and network features, the access stream being described as a finite state machine;
the neural network reusable unit library consists of two parts: hardware description files and configuration scripts;
the reusable unit library includes neuron units, accumulator units, pooling units, classifier units, local response normalization units, look-up table units, address generation units, and control units;
the finite state machine is mapped to an address generation circuit, hardware description language code is generated, and the code is then converted into the hardware circuit of the neural network processor.
The apparatus further generates the data storage mapping and the control instruction stream according to the neural network model, the hardware resource constraint parameters, and the hardware description language code.
The above are merely illustrative specific embodiments of the present invention and are not intended to limit its scope. Any equivalent variations, modifications, and combinations made by those skilled in the art without departing from the concept and principle of the invention shall fall within the protection scope of the invention.

Claims (17)

1. An automated design method for a neural network processor, characterized by comprising:
step 1: obtaining a neural network model description file and hardware resource constraint parameters, where the constraint parameters include the hardware resource size and the target running speed;
step 2: according to the model description file and the constraint parameters, looking up a unit library in the pre-built neural network component library, and generating from that unit library the hardware description language code of a neural network processor corresponding to the model;
step 3: determining the data path according to the neural network model and hardware resource constraint parameters specified by the user, determining the data resource sharing mode according to the features of the intermediate network layers, the compiler generating the storage address access stream according to the hardware structure and network features, and the hardware circuit generator and the compiler cooperating to realize the address generation circuit; the address generation circuit includes a work mode signal distinguishing large-kernel fetch mode, small-kernel fetch mode, pooling mode, and full convolution mode; the address generation circuit further includes a block size signal giving the amount of data per fetch;
step 4: converting the hardware description language code into the hardware circuit of the neural network processor;
wherein step 4 further includes: partitioning the neural network feature data and weight data into data-block sets for storage and access according to the computation throughput and on-chip memory size of the neural network processor, and partitioning the data within a block according to the computing unit scale, the memory, and the data bit width of the neural network processor.
2. The automated design method for a neural network processor according to claim 1, wherein the neural network processor includes a storage structure, a control structure and a computing structure.
3. The automated design method for a neural network processor according to claim 1, wherein the neural network model description file consists of three parts: basic attributes, a parameter description and connection information, wherein the basic attributes include the layer name and layer type, the parameter description includes the number of output layers, the convolution kernel size and the step size, and the connection information includes the connection name, connection direction and connection type.
4. The automated design method for a neural network processor according to claim 1, wherein the unit library consists of two parts: hardware description files and configuration scripts.
5. The automated design method for a neural network processor according to claim 1, wherein the unit library includes a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a look-up table unit, an address generation unit and a control unit.
6. The automated design method for a neural network processor according to claim 1, wherein the neural network processor includes a main address generation unit, a data address generation unit and a weight address generation unit.
7. The automated design method for a neural network processor according to claim 1, further comprising: determining the data path according to the user-specified neural network model and hardware resource constraint parameters, and determining the data resource sharing mode according to the features of the intermediate layers of the neural network;
generating the memory address access stream according to the hardware configuration and network features, the address access stream being described by means of a finite state machine;
and generating hardware description language code, which is then converted into the hardware circuit of the neural network processor.
8. The automated design method for a neural network processor according to claim 1, further comprising: generating a data storage mapping and a control instruction stream according to the neural network model, the hardware resource constraint parameters and the hardware description language code.
9. An automated design apparatus for a neural network processor, characterized by comprising:
a data acquisition module for obtaining a neural network model description file and hardware resource constraint parameters, wherein the hardware resource constraint parameters include the hardware resource size and the target operating speed;
a hardware description language code generation module for searching a unit library from a constructed neural network component library according to the neural network model description file and the hardware resource constraint parameters, and generating from the unit library the hardware description language code of a neural network processor corresponding to the neural network model;
a hardware/software co-design module, which determines a data path according to the user-specified neural network model and hardware resource constraint parameters, and determines a data resource sharing mode according to the features of the intermediate layers of the neural network; a compiler generates a memory address access stream according to the hardware configuration and network features, and a hardware circuit generator and the compiler cooperate to realize an address generation circuit; the address generation circuit includes a working mode signal, with modes divided into a large-convolution-kernel data-fetch mode, a small-convolution-kernel data-fetch mode, a pooling mode and a full-convolution mode; the address generation circuit further includes a block size signal indicating the amount of data fetched at one time;
a hardware circuit generation module for converting the hardware description language code into the hardware circuit of the neural network processor;
wherein the hardware circuit generation module further divides the neural network feature data and weight data into data block sets for storage and access according to the computing throughput and on-chip memory size of the neural network processor, and performs data partitioning within each data block according to the computing unit scale, memory and data bit width of the neural network processor.
10. The automated design apparatus for a neural network processor according to claim 9, wherein the neural network processor includes a storage structure, a control structure and a computing structure.
11. The automated design apparatus for a neural network processor according to claim 9, wherein the neural network model description file consists of three parts: basic attributes, a parameter description and connection information, wherein the basic attributes include the layer name and layer type, the parameter description includes the number of output layers, the convolution kernel size and the step size, and the connection information includes the connection name, connection direction and connection type.
12. The automated design apparatus for a neural network processor according to claim 9, wherein the unit library consists of two parts: hardware description files and configuration scripts.
13. The automated design apparatus for a neural network processor according to claim 9, wherein the unit library includes a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a look-up table unit, an address generation unit and a control unit.
14. The automated design apparatus for a neural network processor according to claim 9, wherein the neural network processor includes a main address generation unit, a data address generation unit and a weight address generation unit.
15. The automated design apparatus for a neural network processor according to claim 9, further comprising: determining the data path according to the user-specified neural network model and hardware resource constraint parameters, and determining the data resource sharing mode according to the features of the intermediate layers of the neural network;
generating the memory address access stream according to the hardware configuration and network features, the address access stream being described by means of a finite state machine;
and generating hardware description language code, which is then converted into the hardware circuit of the neural network processor.
16. The automated design apparatus for a neural network processor according to claim 9, further comprising: generating a data storage mapping and a control instruction stream according to the neural network model, the hardware resource constraint parameters and the hardware description language code.
17. An optimization method based on the automated design method for a neural network processor according to any one of claims 1-8, characterized by comprising:
defining the convolution kernel size as k*k, the step size as s, the memory width as d, and the number of data maps as t; if k^2 = d^2, dividing the data into data blocks of size k*k, so that the data width is consistent with the memory width, ensuring that the data are stored contiguously in memory;
if k^2 != d^2 and the step size s is the greatest common divisor of k and the memory width d, dividing the data into data blocks of size s*s, ensuring that within one data map the data are stored contiguously in memory;
if k^2 != d^2 and the step size s is not the greatest common divisor of k and the memory width d, finding the greatest common divisor f of the step size s, k and the memory width d, dividing the data into data blocks of size f*f, with the t data maps stored alternately.
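The three partitioning rules of the optimization method can be expressed compactly. The sketch below follows the stated definitions (kernel size k*k, step size s, memory width d); the helper function name is my own, and only the block side length is computed, not the storage layout itself.

```python
# Sketch of the three block-size rules from the optimization method above.
from math import gcd

def block_size(k, s, d):
    """Return the side length of the square data block to partition into."""
    if k * k == d * d:
        return k                      # rule 1: block width matches memory width
    if s == gcd(k, d):
        return s                      # rule 2: stride is gcd of k and d
    return gcd(gcd(s, k), d)          # rule 3: gcd of stride, kernel, memory width

assert block_size(k=4, s=2, d=4) == 4   # k^2 == d^2
assert block_size(k=4, s=2, d=6) == 2   # s == gcd(4, 6)
assert block_size(k=6, s=4, d=8) == 2   # gcd(4, 6, 8)
```

Rules 1 and 2 keep each block contiguous in memory; rule 3 falls back to the largest block that still divides the stride, the kernel and the memory width evenly, at the cost of alternating the t data maps.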
CN201710178281.3A 2017-03-23 2017-03-23 The Automation Design method, apparatus and optimization method towards neural network processor Active CN107103113B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710178281.3A CN107103113B (en) 2017-03-23 2017-03-23 The Automation Design method, apparatus and optimization method towards neural network processor
PCT/CN2018/080207 WO2018171717A1 (en) 2017-03-23 2018-03-23 Automated design method and system for neural network processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710178281.3A CN107103113B (en) 2017-03-23 2017-03-23 The Automation Design method, apparatus and optimization method towards neural network processor

Publications (2)

Publication Number Publication Date
CN107103113A CN107103113A (en) 2017-08-29
CN107103113B true CN107103113B (en) 2019-01-11

Family

ID=59676152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710178281.3A Active CN107103113B (en) 2017-03-23 2017-03-23 The Automation Design method, apparatus and optimization method towards neural network processor

Country Status (2)

Country Link
CN (1) CN107103113B (en)
WO (1) WO2018171717A1 (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11403069B2 (en) 2017-07-24 2022-08-02 Tesla, Inc. Accelerated mathematical engine
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US11487288B2 (en) 2017-03-23 2022-11-01 Tesla, Inc. Data synthesis for autonomous control systems
US11537811B2 (en) 2018-12-04 2022-12-27 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11562231B2 (en) 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US11567514B2 (en) 2019-02-11 2023-01-31 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US11610117B2 (en) 2018-12-27 2023-03-21 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
US11636333B2 (en) 2018-07-26 2023-04-25 Tesla, Inc. Optimizing neural network structures for embedded systems
US11665108B2 (en) 2018-10-25 2023-05-30 Tesla, Inc. QoS manager for system on a chip communications
US11681649B2 (en) 2017-07-24 2023-06-20 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US11734577B2 (en) 2019-06-05 2023-08-22 Samsung Electronics Co., Ltd Electronic apparatus and method of performing operations thereof
US11734562B2 (en) 2018-06-20 2023-08-22 Tesla, Inc. Data pipeline and deep learning system for autonomous driving
US11748620B2 (en) 2019-02-01 2023-09-05 Tesla, Inc. Generating ground truth for machine learning from time series elements
US11790664B2 (en) 2019-02-19 2023-10-17 Tesla, Inc. Estimating object properties using visual image data
US11816585B2 (en) 2018-12-03 2023-11-14 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
US11841434B2 (en) 2018-07-20 2023-12-12 Tesla, Inc. Annotation cross-labeling for autonomous control systems
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
US11893774B2 (en) 2018-10-11 2024-02-06 Tesla, Inc. Systems and methods for training machine models with augmented data
US12014553B2 (en) 2019-02-01 2024-06-18 Tesla, Inc. Predicting three-dimensional features for autonomous driving
US12307350B2 (en) 2018-01-04 2025-05-20 Tesla, Inc. Systems and methods for hardware-based pooling
US12462575B2 (en) 2021-08-19 2025-11-04 Tesla, Inc. Vision-based machine learning model for autonomous driving with adjustable virtual camera
US12522243B2 (en) 2021-08-19 2026-01-13 Tesla, Inc. Vision-based system training with simulated content

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103113B (en) * 2017-03-23 2019-01-11 中国科学院计算技术研究所 The Automation Design method, apparatus and optimization method towards neural network processor
CN107341761A (en) * 2017-07-12 2017-11-10 成都品果科技有限公司 A kind of calculating of deep neural network performs method and system
CN107633295B (en) * 2017-09-25 2020-04-28 南京地平线机器人技术有限公司 Method and device for adapting parameters of a neural network
CN109697509B (en) * 2017-10-24 2020-10-20 上海寒武纪信息科技有限公司 Processing method and device, computing method and device
CN109726805B (en) * 2017-10-30 2021-02-09 上海寒武纪信息科技有限公司 Method for designing neural network processor by using black box simulator
US11521046B2 (en) 2017-11-08 2022-12-06 Samsung Electronics Co., Ltd. Time-delayed convolutions for neural network device and method
CN110097180B (en) * 2018-01-29 2020-02-21 上海寒武纪信息科技有限公司 Computer equipment, data processing method and storage medium
CN110097179B (en) * 2018-01-29 2020-03-10 上海寒武纪信息科技有限公司 Computer equipment, data processing method and storage medium
EP3651020A1 (en) 2017-11-20 2020-05-13 Shanghai Cambricon Information Technology Co., Ltd Computer equipment, data processing method, and storage medium
CN109993288B (en) * 2017-12-29 2020-04-28 中科寒武纪科技股份有限公司 Neural network processing method, computer system and storage medium
WO2019128752A1 (en) * 2017-12-29 2019-07-04 北京中科寒武纪科技有限公司 Neural network processing method, computer system, and storage medium
CN108563808B (en) * 2018-01-05 2020-12-04 中国科学技术大学 Design Method of Heterogeneous Reconfigurable Graph Computation Accelerator System Based on FPGA
CN108388943B (en) * 2018-01-08 2020-12-29 中国科学院计算技术研究所 A pooling device and method suitable for neural networks
CN108154229B (en) * 2018-01-10 2022-04-08 西安电子科技大学 Image processing method based on FPGA accelerated convolutional neural network framework
CN108389183A (en) * 2018-01-24 2018-08-10 上海交通大学 Pulmonary nodule detects neural network accelerator and its control method
CN111868754A (en) * 2018-03-23 2020-10-30 索尼公司 Information processing apparatus and information processing method
CN108921289B (en) * 2018-06-20 2021-10-29 郑州云海信息技术有限公司 A kind of FPGA heterogeneous acceleration method, device and system
CN110955380B (en) * 2018-09-21 2021-01-12 中科寒武纪科技股份有限公司 Access data generation method, storage medium, computer device and apparatus
CN111079910B (en) * 2018-10-19 2021-01-26 中科寒武纪科技股份有限公司 Operation method, device and related product
WO2020078446A1 (en) * 2018-10-19 2020-04-23 中科寒武纪科技股份有限公司 Computation method and apparatus, and related product
CN111079909B (en) * 2018-10-19 2021-01-26 安徽寒武纪信息科技有限公司 Operation method, system and related product
CN111079907B (en) * 2018-10-19 2021-01-26 安徽寒武纪信息科技有限公司 Operation method, device and related product
CN111079924B (en) * 2018-10-19 2021-01-08 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111079914B (en) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111078293B (en) * 2018-10-19 2021-03-16 中科寒武纪科技股份有限公司 Operation method, device and related product
CN111079911B (en) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111079912B (en) * 2018-10-19 2021-02-12 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111079916B (en) * 2018-10-19 2021-01-15 安徽寒武纪信息科技有限公司 Operation method, system and related product
CN111144561B (en) * 2018-11-05 2023-05-02 杭州海康威视数字技术股份有限公司 Neural network model determining method and device
WO2020093304A1 (en) * 2018-11-08 2020-05-14 北京比特大陆科技有限公司 Method, apparatus, and device for compiling neural network, storage medium, and program product
CN109491956B (en) * 2018-11-09 2021-04-23 北京灵汐科技有限公司 A Heterogeneous Collaborative Computing System
KR102744306B1 (en) 2018-12-07 2024-12-18 삼성전자주식회사 A method for slicing a neural network and a neuromorphic apparatus
CN111325311B (en) * 2018-12-14 2024-03-29 深圳云天励飞技术有限公司 Neural network model generation method for image recognition and related equipment
CN109685203B (en) * 2018-12-21 2020-01-17 中科寒武纪科技股份有限公司 Data processing method, device, computer system and storage medium
CN109726797B (en) * 2018-12-21 2019-11-19 北京中科寒武纪科技有限公司 Data processing method, device, computer system and storage medium
CN109754084B (en) * 2018-12-29 2020-06-12 中科寒武纪科技股份有限公司 Network structure processing method, device and related products
CN111461296B (en) * 2018-12-29 2023-09-22 中科寒武纪科技股份有限公司 Data processing methods, electronic devices and readable storage media
CN109978160B (en) * 2019-03-25 2021-03-02 中科寒武纪科技股份有限公司 Configuration device and method of artificial intelligence processor and related products
CN109739802B (en) * 2019-04-01 2019-06-18 上海燧原智能科技有限公司 Computing cluster and computing cluster configuration method
CN112132271A (en) * 2019-06-25 2020-12-25 Oppo广东移动通信有限公司 Neural network accelerator operation method, architecture and related device
WO2021026775A1 (en) * 2019-08-13 2021-02-18 深圳鲲云信息科技有限公司 Neural network data stream acceleration method and apparatus, computer device, and storage medium
CN111126572B (en) * 2019-12-26 2023-12-08 北京奇艺世纪科技有限公司 Model parameter processing method and device, electronic equipment and storage medium
CN111339027B (en) * 2020-02-25 2023-11-28 中国科学院苏州纳米技术与纳米仿生研究所 Reconfigurable artificial intelligence core and automatic design method for heterogeneous multi-core chips
CN111488969B (en) * 2020-04-03 2024-01-19 北京集朗半导体科技有限公司 Execution optimization method and device based on neural network accelerator
CN111949405A (en) * 2020-08-13 2020-11-17 Oppo广东移动通信有限公司 Resource scheduling method, hardware accelerator and electronic device
CN111931926A (en) * 2020-10-12 2020-11-13 南京风兴科技有限公司 Hardware acceleration system and control method for convolutional neural network CNN
US12112112B2 (en) 2020-11-12 2024-10-08 Samsung Electronics Co., Ltd. Method for co-design of hardware and neural network architectures using coarse-to-fine search, two-phased block distillation and neural hardware predictor
JP7686273B2 (en) * 2021-08-26 2025-06-02 国立大学法人 東京大学 Information processing device and program
CN120850944B (en) * 2025-09-23 2025-12-23 蓝芯算力(深圳)科技有限公司 Processor circuit generation method, system and related equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022468A (en) * 2016-05-17 2016-10-12 成都启英泰伦科技有限公司 Artificial neural network processor integrated circuit and design method therefor
WO2016179533A1 (en) * 2015-05-06 2016-11-10 Indiana University Research And Technology Corporation Sensor signal processing using an analog neural network
CN106447034A (en) * 2016-10-27 2017-02-22 中国科学院计算技术研究所 Neutral network processor based on data compression, design method and chip
CN106529670A (en) * 2016-10-27 2017-03-22 中国科学院计算技术研究所 Neural network processor based on weight compression, design method, and chip

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103113B (en) * 2017-03-23 2019-01-11 中国科学院计算技术研究所 The Automation Design method, apparatus and optimization method towards neural network processor

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016179533A1 (en) * 2015-05-06 2016-11-10 Indiana University Research And Technology Corporation Sensor signal processing using an analog neural network
CN106022468A (en) * 2016-05-17 2016-10-12 成都启英泰伦科技有限公司 Artificial neural network processor integrated circuit and design method therefor
CN106447034A (en) * 2016-10-27 2017-02-22 中国科学院计算技术研究所 Neutral network processor based on data compression, design method and chip
CN106529670A (en) * 2016-10-27 2017-03-22 中国科学院计算技术研究所 Neural network processor based on weight compression, design method, and chip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Neural-Network-Based Embedded System Architecture; Ye Liya et al.; Journal of Hangzhou Dianzi University; 2005-04-30; pp. 61-64 *

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11487288B2 (en) 2017-03-23 2022-11-01 Tesla, Inc. Data synthesis for autonomous control systems
US12020476B2 (en) 2017-03-23 2024-06-25 Tesla, Inc. Data synthesis for autonomous control systems
US11681649B2 (en) 2017-07-24 2023-06-20 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US12536131B2 (en) 2017-07-24 2026-01-27 Tesla, Inc. Vector computational unit
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
US12086097B2 (en) 2017-07-24 2024-09-10 Tesla, Inc. Vector computational unit
US12216610B2 (en) 2017-07-24 2025-02-04 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US11403069B2 (en) 2017-07-24 2022-08-02 Tesla, Inc. Accelerated mathematical engine
US12307350B2 (en) 2018-01-04 2025-05-20 Tesla, Inc. Systems and methods for hardware-based pooling
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US12455739B2 (en) 2018-02-01 2025-10-28 Tesla, Inc. Instruction set architecture for a vector computational unit
US11797304B2 (en) 2018-02-01 2023-10-24 Tesla, Inc. Instruction set architecture for a vector computational unit
US11734562B2 (en) 2018-06-20 2023-08-22 Tesla, Inc. Data pipeline and deep learning system for autonomous driving
US11841434B2 (en) 2018-07-20 2023-12-12 Tesla, Inc. Annotation cross-labeling for autonomous control systems
US11636333B2 (en) 2018-07-26 2023-04-25 Tesla, Inc. Optimizing neural network structures for embedded systems
US12079723B2 (en) 2018-07-26 2024-09-03 Tesla, Inc. Optimizing neural network structures for embedded systems
US12346816B2 (en) 2018-09-03 2025-07-01 Tesla, Inc. Neural networks for embedded devices
US11983630B2 (en) 2018-09-03 2024-05-14 Tesla, Inc. Neural networks for embedded devices
US11562231B2 (en) 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
US11893774B2 (en) 2018-10-11 2024-02-06 Tesla, Inc. Systems and methods for training machine models with augmented data
US11665108B2 (en) 2018-10-25 2023-05-30 Tesla, Inc. QoS manager for system on a chip communications
US11816585B2 (en) 2018-12-03 2023-11-14 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
US12367405B2 (en) 2018-12-03 2025-07-22 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
US12198396B2 (en) 2018-12-04 2025-01-14 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11908171B2 (en) 2018-12-04 2024-02-20 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11537811B2 (en) 2018-12-04 2022-12-27 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US12136030B2 (en) 2018-12-27 2024-11-05 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
US11610117B2 (en) 2018-12-27 2023-03-21 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
US11748620B2 (en) 2019-02-01 2023-09-05 Tesla, Inc. Generating ground truth for machine learning from time series elements
US12223428B2 (en) 2019-02-01 2025-02-11 Tesla, Inc. Generating ground truth for machine learning from time series elements
US12014553B2 (en) 2019-02-01 2024-06-18 Tesla, Inc. Predicting three-dimensional features for autonomous driving
US12164310B2 (en) 2019-02-11 2024-12-10 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US11567514B2 (en) 2019-02-11 2023-01-31 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US12236689B2 (en) 2019-02-19 2025-02-25 Tesla, Inc. Estimating object properties using visual image data
US11790664B2 (en) 2019-02-19 2023-10-17 Tesla, Inc. Estimating object properties using visual image data
US11734577B2 (en) 2019-06-05 2023-08-22 Samsung Electronics Co., Ltd Electronic apparatus and method of performing operations thereof
US12462575B2 (en) 2021-08-19 2025-11-04 Tesla, Inc. Vision-based machine learning model for autonomous driving with adjustable virtual camera
US12522243B2 (en) 2021-08-19 2026-01-13 Tesla, Inc. Vision-based system training with simulated content

Also Published As

Publication number Publication date
WO2018171717A1 (en) 2018-09-27
CN107103113A (en) 2017-08-29

Similar Documents

Publication Publication Date Title
CN107103113B (en) The Automation Design method, apparatus and optimization method towards neural network processor
CN107016175B (en) It is applicable in the Automation Design method, apparatus and optimization method of neural network processor
US11783227B2 (en) Method, apparatus, device and readable medium for transfer learning in machine learning
CN106529670A (en) Neural network processor based on weight compression, design method, and chip
CN112433819A (en) Heterogeneous cluster scheduling simulation method and device, computer equipment and storage medium
CN108710941A (en) The hard acceleration method and device of neural network model for electronic equipment
CN109902798A (en) The training method and device of deep neural network
CN107403141A (en) Method for detecting human face and device, computer-readable recording medium, equipment
CN108470190A (en) The image-recognizing method of impulsive neural networks is customized based on FPGA
CN110991362A (en) Pedestrian detection model based on attention mechanism
CN102176200A (en) Software test case automatic generating method
CN113168552A (en) Artificial intelligence application development system, computer device and storage medium
WO2023051678A1 (en) Recommendation method and related device
CN116665219A (en) A data processing method and device thereof
CN114613159A (en) Traffic signal lamp control method, device and equipment based on deep reinforcement learning
US12198037B2 (en) Hardware architecture determination based on a neural network and a network compilation process
WO2025189504A1 (en) Arithmetic unit chip configuration method, computing subsystem, and intelligent computing platform
CN114566048A (en) Traffic control method based on multi-view self-adaptive space-time diagram network
CN107292385A (en) The model training method and device of one species Alexnet networks
CN117574767A (en) In-memory computing architecture software and hardware system simulation method and simulator
CN112270083B (en) A multi-resolution modeling and simulation method and system
CN110263328A (en) A kind of disciplinary capability type mask method, device, storage medium and terminal device
CN110442753A (en) A kind of chart database auto-creating method and device based on OPC UA
CN117829242B (en) Model processing methods and related equipment
CN109359650A (en) Target detection method and device, embedded device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant