WO2020258529A1 - Bnrp-based configurable parallel general convolutional neural network accelerator - Google Patents
- Publication number
- WO2020258529A1 (PCT/CN2019/105534)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mode
- pooling
- data
- parameters
- calculation
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Definitions
- The invention discloses a configurable parallel general convolutional neural network accelerator based on BNRP, which belongs to the technical field of computing, calculating, and counting.
- The network topology of neural networks is constantly evolving, and network scale has expanded dramatically; examples include Baidu Brain, with 100 billion neuron connections, and Google's cat-recognition system, with 1 billion neuron connections. How to realize large-scale deep-learning neural network models at low power and high speed through computational acceleration and advanced technology has therefore become an important issue in machine learning and artificial intelligence.
- Deep neural networks not only require a large amount of computation but also need to store millions or even hundreds of millions of network parameters, so real-time detection and recognition based on deep neural networks is currently performed mainly on high-performance multi-core CPUs (Central Processing Units) and GPUs (Graphics Processing Units). However, for robots, consumer electronics, smart cars, and other mobile devices with limited power, size, and cost budgets, it is almost impossible to port complex and diverse convolutional neural network models to a CPU or GPU. Building a flexibly configurable, high-performance, low-power general-purpose hardware accelerator from general-purpose devices can therefore meet the large-scale computing and storage requirements of convolutional neural networks.
- ASICs have the disadvantages of a long development cycle, high cost, and low flexibility; however, because an ASIC is custom-built, it outperforms GPUs and FPGAs in both performance and power consumption.
- The TPU series of AI ASIC chips released by Google in 2016 delivers 14 to 16 times the performance of a traditional GPU, and the NPU released by Vimicro delivers 118 times the performance of a GPU.
- Applying an FPGA or ASIC to a mobile work platform, and designing a configurable general-purpose convolutional neural network hardware accelerator based on a systolic convolution array and a highly parallel pipeline that achieves high computing throughput with only moderate storage and communication bandwidth, is therefore an effective solution.
- The purpose of the present invention is to address the deficiencies of the above background technology and provide a BNRP-based configurable parallel general convolutional neural network accelerator that supports accelerated computation of convolutional neural network structures of various scales, offers good versatility, and places low demands on on-chip storage resources and I/O bandwidth, thereby improving computing parallelism and throughput and solving the technical problem that the limited on-chip storage and I/O bandwidth of existing hardware accelerators cannot meet the high-throughput computing requirements of convolutional neural networks.
- A BNRP-based configurable parallel general convolutional neural network accelerator includes: a mode configurator; a parallel computing acceleration unit (convolution calculator and BNRP calculator); a data cache unit (input/output feature map caches and weight parameter cache); a data communication unit (AXI4 bus interface and AHB bus interface); and a data compression encoder/decoder.
- The input feature map data In_Map, weight parameters, and BN parameters arriving through the AXI4 bus interface of the data communication unit are compressed and encoded by the data compression encoder/decoder and then buffered into the corresponding In_Map buffer, weight buffer, and BN parameter buffer; the accelerator's calculation mode and function configuration information is transmitted to the mode configurator through the AHB bus interface of the data communication unit.
- The mode configurator configures the calculation modes and functions of the parallel computing acceleration unit according to the received configuration information; the parallel computing acceleration unit reads data from the In_Map buffer, weight buffer, and BN parameter buffer and performs the corresponding convolution, batch normalization, nonlinear activation, or pooling operations layer by layer, by row, column, and channel, in parallel pipeline mode according to the configuration parameters; after each network layer's features have been extracted, the output feature map data is sent back to the data compression encoder/decoder for decoding and then returned through the AXI4 bus interface to the data storage unit outside the accelerator.
- The network configuration information that the mode configurator reads from the AHB bus interface, such as the network layer of the data currently being processed, the network model parameters, and the buffer read/write addresses, is cached in the data buffer of the convolution calculator; the flags that the mode configurator reads from the AHB bus interface indicating whether to perform batch normalization (BN), nonlinear activation (ReLU), pooling, or data compression encoding/decoding operations, together with the calculation mode configuration parameters and function configuration parameters, are transferred to the BNRP calculator.
- The BNRP calculator executes batch normalization (BN), nonlinear activation (ReLU), and four kinds of pooling operations in parallel in a pipelined manner; flag bits configure it to execute one or several of these operations, and the configuration parameters select the calculation mode. Mode 1: perform the BN operation, then the pooling operation, then the ReLU operation. Mode 2: perform the BN operation, then the ReLU operation, then the pooling operation. A behavioral sketch of the two modes follows.
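As a minimal behavioral sketch of the two execution modes (in Python/NumPy rather than hardware description; the folded affine form of BN, the function names, and the 2*2 max pooling default are illustrative assumptions, not the patent's circuit):

```python
import numpy as np

def batch_norm(x, a, b):
    # Inference-time BN folded into one affine op per element: y = a*x + b.
    return a * x + b

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2d(x, k=2, s=2):
    # x: (channels, height, width) feature map.
    c, h, w = x.shape
    oh, ow = (h - k) // s + 1, (w - k) // s + 1
    out = np.empty((c, oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            out[:, i, j] = x[:, i*s:i*s+k, j*s:j*s+k].max(axis=(1, 2))
    return out

def bnrp(conv_out, a, b, mode=1, do_bn=True, do_relu=True, do_pool=True):
    # Flag bits select which of BN / ReLU / pooling actually run.
    x = batch_norm(conv_out, a, b) if do_bn else conv_out
    if mode == 1:                       # Mode 1: BN -> pooling -> ReLU
        if do_pool:
            x = max_pool2d(x)
        if do_relu:
            x = relu(x)
    else:                               # Mode 2: BN -> ReLU -> pooling
        if do_relu:
            x = relu(x)
        if do_pool:
            x = max_pool2d(x)
    return x
```

Because ReLU is monotonic and commutes with taking a maximum, the two modes produce identical results for max pooling; they differ for average pooling, which is one reason the execution order is worth configuring per network.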
- When the input feature map size map_size > R and the configuration calls for a pooling operation, the BNRP calculator, according to the network model, the number of rows R of the systolic convolution array, and the configuration parameters, interleaves and caches m rows of input feature map data into 2m on-chip Block RAMs.
- Of the 2*R*T poolers, some are enabled according to the configuration information and the others are turned off; among them, a "2*2 pooler" executes 2*2 average pooling (AP) or 2*2 max pooling (MP) according to the configuration parameters.
- For the convolution calculation array and the BNRP calculator, if the configuration requires the BN operation, then before the ReLU operation is performed the feature map data map[i][j] and the BN weight parameters a[i][j] and b[i][j] are first compared against 0. If map[i][j] ≤ 0, a[i][j] ≥ 0, and b[i][j] ≤ 0, the convolution calculation array does not need to multiply map[i][j] by a[i][j] or add b[i][j]: the corresponding output value of the BN operation in BNRP calculator Mode 1 and the corresponding output value of the ReLU operation in BNRP calculator Mode 2 are 0. This sign check is sketched below.
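The check can be expressed as follows (a scalar Python sketch under the same conditions; in hardware the three comparisons are simple sign-bit tests performed by three comparators, per claim 4):

```python
def can_skip(map_ij: float, a_ij: float, b_ij: float) -> bool:
    # If map <= 0, a >= 0 and b <= 0, then a*map + b <= 0, so the value is
    # guaranteed to be clipped to 0 by ReLU: the multiply and the add can
    # both be skipped and the result hard-wired to zero.
    return map_ij <= 0.0 and a_ij >= 0.0 and b_ij <= 0.0

def bn_relu(map_ij: float, a_ij: float, b_ij: float) -> float:
    if can_skip(map_ij, a_ij, b_ij):
        return 0.0                          # no multiplier/adder activity
    return max(a_ij * map_ij + b_ij, 0.0)   # full BN + ReLU path
```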
- The present invention designs the BNRP calculator with a parallel pipeline method and reduces the computation load of the neural network accelerator by dynamically configuring the parameters of the parallel calculators, especially the calculation execution mode of the BNRP calculator; for convolutional neural networks with larger structures in particular, this greatly accelerates the accelerator's computation while cutting repetitive calculations and reducing power consumption. The convolution calculation array is designed on a systolic array architecture, achieving high computing throughput with only moderate storage and I/O communication bandwidth, effectively improving the data reuse rate, and further reducing data transmission time; a toy model of the systolic dataflow follows.
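To illustrate why a systolic organization needs only moderate bandwidth, here is a toy clock-level model of a 1-D multiply-accumulate chain with stationary weights and pulsing partial sums; it is a didactic sketch, not the patent's 2-D convolution array:

```python
def systolic_fir(x, w):
    """Sliding dot product y[j] = sum_c w[c] * x[j + c], one MAC cell per tap."""
    k = len(w)
    s = [0.0] * k                   # partial-sum register inside each cell
    y = []
    for t, xin in enumerate(x):     # one input word fetched per clock...
        # ...and shared by every cell: K-fold reuse of each memory read.
        for c in range(k - 1, 0, -1):        # right-to-left register update
            s[c] = s[c - 1] + w[c] * xin
        s[0] = w[0] * xin
        if t >= k - 1:
            y.append(s[k - 1])      # one finished window drains every clock
    return y

# systolic_fir([1, 2, 3, 4], [10, 1]) -> [12.0, 23.0, 34.0]
```

Each input word is read from external memory once per clock yet feeds all K cells, which is the data-reuse property that lets a systolic array sustain high throughput on modest I/O bandwidth.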
- The calculation execution mode of the BNRP calculator can be dynamically configured according to the characteristics of the network structure, which makes the accelerator more versatile and no longer restricted by the network model structure or the number of layers; unnecessary intermediate-value caches are also eliminated, reducing memory resource usage.
- Figure 1 is a schematic structural diagram of the accelerator disclosed in the present invention.
- Figure 2 is a schematic structural diagram of the BNRP calculator of the present invention.
- Figure 3 is a schematic diagram of the working process of the BNRP calculator of the present invention.
- Figure 4 is a schematic diagram of the 3*3 pooler of the present invention performing a pooling operation.
- The BNRP-based configurable parallel general convolutional neural network accelerator disclosed in the present invention, shown in Figure 1, includes: a mode configurator; a parallel computing acceleration unit composed of a convolution calculator and a BNRP calculator; a data buffer unit composed of the input/output feature map caches and the weight parameter buffer; a data communication unit composed of the AXI4 bus interface and the AHB bus interface; and a data compression encoder/decoder.
- The working states of the accelerator include the read-configuration-parameters state, the read-data state, the calculation state, and the send-data state.
- The mode configurator reads the mode configuration parameters from outside the accelerator through the AHB bus. Whether to perform the BN, ReLU, or pooling operation, together with configuration information such as the execution mode, network layer number, and feature map size, is transmitted to the BNRP calculator; information such as the network layer number, feature map size, batch size, and convolution kernel size is transmitted to the data buffer of the convolution calculator; and configuration information such as the network layer number, data read/write enables, and addresses is transmitted to the data compression encoder/decoder. These fields are grouped in the sketch below.
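Grouping the fields named above gives one picture of the configuration word; the field names and types here are assumptions made for the sketch, not the patent's register layout:

```python
from dataclasses import dataclass

@dataclass
class ModeConfig:
    layer_index: int     # network layer currently being processed
    map_size: int        # feature map size
    batch_size: int
    kernel_size: int     # convolution kernel size
    do_bn: bool          # perform batch normalization?
    do_relu: bool        # perform nonlinear activation?
    do_pool: bool        # perform pooling?
    exec_mode: int       # 1: BN -> pooling -> ReLU; 2: BN -> ReLU -> pooling
    rw_enable: bool      # data read/write enable for the encoder/decoder
    rw_address: int      # external read/write address for the encoder/decoder
```

The BNRP calculator consumes the operation flags and exec_mode; the convolution calculator's data buffer consumes the layer, map, batch, and kernel fields; and the encoder/decoder consumes the layer index, enables, and addresses.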
- After the data compression encoder/decoder receives the data read enable and address signals, it reads the corresponding weight parameters (convolution kernels and biases) from outside the accelerator through the AXI4 bus and transmits them to the weight parameter buffer, and reads the corresponding input feature map data and transfers it to the In_Map buffer.
- After the convolution calculator receives the calculation enable signal, it reads the network layer number, feature map size, batch size, and convolution kernel size from the data buffer, then reads the weight parameters and input feature map data in a systolic manner and performs the corresponding convolution calculation. When the calculation completes, an end flag is output to the BNRP calculator and the convolution result is output to the Out_Map buffer.
- After receiving the mode configuration parameters, the BNRP calculator waits for the calculation completion flag sent by the convolution calculator. If the configuration requires the BN operation, it issues a BN parameter read request and reads the corresponding BN parameters from the BN parameter buffer; otherwise, no BN operation is performed.
- The BNRP calculator determines the calculation mode to execute according to the configuration information. If Mode 1 is configured, the pooling operation is executed first: the input feature map pixel values that need caching are sent to the corresponding Block RAMs according to the received network model parameters (pooling step size) and feature map size, the corresponding poolers are enabled, and once the pooling calculation completes the ReLU operation is executed. If Mode 2 is configured, the ReLU operation is executed first.
- The max pooler computes OMap[c][i][j] = max{ IMap[c][i·s+m][j·s+n] : 0 ≤ m, n < k }, where k is the size of the pooler (a k*k window), s is the pooling step size, IMap denotes input feature map pixel values, OMap denotes output feature map pixel values, and OMap[c][i][j] is the pixel value at row i, column j of the c-th output feature map.
- The first convolution calculation outputs rows 1, 2, 3, 4, 5, and 6 of the feature map to the corresponding BlockRAM1 through BlockRAM6, additionally caches row 5 to BlockRAM5B and row 6 to BlockRAM6B, and enables poolers No. 1C, No. 3, and No. 5.
- The first output value of pooler No. 1C is invalid;
- pooler No. 3 performs the three-row pooling calculation over R1, R2, and R3 and outputs the first row of Out_Map pixel values;
- pooler No. 5 performs the three-row pooling calculation over R3, R4, and R5 and outputs the second row of Out_Map pixel values.
- The second convolution calculation outputs rows 7, 8, 9, 10, 11, and 12 of the feature map to the corresponding BlockRAM1 through BlockRAM6, additionally caches row 11 to BlockRAM5B and row 12 to BlockRAM6B, and enables poolers No. 1B, No. 3, and No. 5.
- Pooler No. 1B performs the three-row pooling calculation over R5, R6, and R7 and outputs the third row of Out_Map pixel values;
- pooler No. 3 performs the three-row pooling calculation over R7, R8, and R9 and outputs the fourth row of Out_Map pixel values;
- pooler No. 5 performs the three-row pooling calculation over R9, R10, and R11 and outputs the fifth row of Out_Map pixel values.
- The third convolution calculation outputs row 13 of the feature map, together with five rows of random (invalid) data, to the corresponding BlockRAM1 through BlockRAM6.
- Because the remaining convolution output satisfies map_size < R, no further caching is needed, and pooler No. 1C is enabled.
- Pooler No. 1C performs the three-row pooling calculation over R11, R12, and R13 and outputs the sixth row of Out_Map pixel values, completing the pooling operation for this layer's input image. The helper sketched below reproduces this row schedule.
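A small helper reproduces the row schedule of this walkthrough; the function and the batch size R = 6 are illustrative assumptions:

```python
def pooling_row_triples(num_rows, k=3, stride=2):
    # Input-row triples feeding each output row of k*k pooling at the given
    # stride: (1,2,3) -> out row 1, (3,4,5) -> out row 2, (5,6,7) -> out row 3...
    triples = []
    r = 1
    while r + k - 1 <= num_rows:
        triples.append(tuple(range(r, r + k)))
        r += stride
    return triples

# pooling_row_triples(13) ->
# [(1,2,3), (3,4,5), (5,6,7), (7,8,9), (9,10,11), (11,12,13)]
```

For a 13-row map this yields six output rows, and rows 5-6 and 11-12 straddle two convolution batches of R = 6 rows; those are exactly the rows the walkthrough double-buffers in BlockRAM5B and BlockRAM6B.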
Claims (9)
- 1. A BNRP-based configurable parallel general convolutional neural network accelerator, characterized by comprising:
a mode configurator, which reads network parameters, feature map parameters, calculation mode and function configuration parameters from the outside, and outputs instructions for switching the accelerator's working state according to the parameters read;
a data compression encoder/decoder, which, after receiving the network parameters, data read/write enable instructions and address configuration information sent by the mode configurator, encodes the feature map data, weight data and BN parameters read from the outside, and decodes the calculation results when receiving the calculation results output by the BNRP calculator;
a BN parameter buffer, for storing the encoded BN parameters;
an input feature map buffer, for storing the encoded input feature map data;
a weight parameter buffer, for storing the encoded weight data;
a data buffer, for storing the network parameters and feature map size parameters read from the outside by the mode configurator, which reads the encoded weight data from the weight parameter buffer after entering the calculation state;
a convolution calculator, which, after receiving the calculation enable instruction sent by the mode configurator, reads network parameters, feature map parameters and weight data from the data buffer, reads input feature map data and weight data from the input feature map buffer and the weight parameter buffer, and performs the convolution calculation;
an output feature map buffer, for storing the convolution results output by the convolution calculator; and
a BNRP calculator, which, after receiving the calculation mode sent by the mode configurator and the convolution-calculation end flag output by the convolution calculator, executes on the convolution results output by the convolution calculator, according to the function configuration parameters sent by the mode configurator, either the calculation mode of batch normalization first, then pooling, then nonlinear activation, or the calculation mode of batch normalization first, then nonlinear activation, then pooling.
- 2. The BNRP-based configurable parallel general convolutional neural network accelerator according to claim 1, characterized in that the BNRP calculator comprises:
R*T data input interfaces, which receive the R rows of feature maps output by the T convolution arrays of the convolution calculator;
a BN operation module, which, when the function configuration parameters sent by the mode configurator include a batch normalization operation instruction, reads the BN parameters from the BN parameter buffer and performs the batch normalization operation on the data received by the data input interfaces;
a ReLU operation module, which applies nonlinear activation to the pooling results when the calculation mode sent by the mode configurator is batch normalization first, then pooling, then nonlinear activation, and applies nonlinear activation to the batch-normalized data when the calculation mode sent by the mode configurator is batch normalization first, then nonlinear activation, then pooling; and
R*T poolers, which output the pooling results of the batch-normalized data when the calculation mode sent by the mode configurator is batch normalization first, then pooling, then nonlinear activation, and output the pooling results of the nonlinearly activated batch-normalized data when the calculation mode sent by the mode configurator is batch normalization first, then nonlinear activation, then pooling.
- 3. The BNRP-based configurable parallel general convolutional neural network accelerator according to claim 2, characterized in that the BNRP calculator further comprises a mode simplification module: before the nonlinear activation operation is executed, the mode selector reads the feature map data received by the data input interfaces of the BNRP calculator together with the BN weight parameters and bias parameters, and when the feature map data requires neither the multiplication nor the bias addition, it zeroes the batch normalization instruction in the calculation mode of batch normalization first, then pooling, then nonlinear activation, or zeroes both the batch normalization operation instruction and the nonlinear activation instruction in the calculation mode of batch normalization first, then nonlinear activation, then pooling.
- 4. The BNRP-based configurable parallel general convolutional neural network accelerator according to claim 3, characterized in that the mode simplification module comprises three comparators that respectively compare the feature map data, the BN weight parameter and the bias parameter against 0; when the three conditions that the feature map data is less than or equal to 0, the BN weight parameter is greater than or equal to 0, and the bias parameter is less than or equal to 0 are satisfied simultaneously, it outputs a configuration parameter that zeroes the batch normalization instruction in the calculation mode of batch normalization first, then pooling, then nonlinear activation, or a configuration parameter that zeroes both the batch normalization operation instruction and the nonlinear activation instruction in the calculation mode of batch normalization first, then nonlinear activation, then pooling.
- 5. The BNRP-based configurable parallel general convolutional neural network accelerator according to claim 2, characterized in that, when the function configuration parameters sent by the mode configurator include an instruction to execute 2*2 max pooling, the R*T poolers are R*T 2*2 poolers; a 2*2 pooler is a 4-to-1 comparator composed of a first 2-to-1 comparator and a second 2-to-1 comparator; each clock, two feature map data are input to the two 2-to-1 comparators, and the 4-to-1 comparator outputs one 2*2 pooling value every 2 clocks; when the pooling step size is 1, the output value of the second 2-to-1 comparator is saved as the output value of the first 2-to-1 comparator for the next clock; when the function configuration parameters sent by the mode configurator include an instruction to execute 2*2 average pooling, the comparators of the max pooling mode are configured as 1/2 dividers.
- 6. The BNRP-based configurable parallel general convolutional neural network accelerator according to claim 2, characterized in that, when the function configuration parameters sent by the mode configurator include an instruction to execute 3*3 max pooling, the R*T poolers are R*T 3*3 poolers; a 3*3 pooler is a 9-to-1 comparator composed of a first 3-to-1 comparator, a second 3-to-1 comparator and a third 3-to-1 comparator; each clock, three feature map data are input to the input terminals of the three 3-to-1 comparators, and the 9-to-1 comparator outputs one 3*3 pooling value every 3 clocks; when the pooling step size is 1, the output value of the second 3-to-1 comparator is saved as the output value of the first 3-to-1 comparator for the next clock, and the output value of the third 3-to-1 comparator is saved as the output value of the second 3-to-1 comparator for the next clock; when the pooling step size is 2, the output value of the third 3-to-1 comparator is saved as the output value of the first 3-to-1 comparator for the next clock; when the function configuration parameters sent by the mode configurator include an instruction to execute 3*3 average pooling, the comparators of the max pooling mode are configured as 1/3 dividers.
- 7. The BNRP-based configurable parallel general convolutional neural network accelerator according to claim 1, characterized in that the mode configurator reads the network parameters, feature map parameters, calculation mode and function configuration parameters from the outside through the AHB bus; the network parameters include the number of network layers and the convolution kernel size; the feature map parameters include the feature map size parameter and the batch size; the calculation mode is to execute, on the convolution results output by the convolution calculator, either batch normalization first, then pooling, then nonlinear activation, or batch normalization first, then nonlinear activation, then pooling; and the function configuration parameters include whether to perform the batch normalization operation, whether to perform the nonlinear activation operation, and whether to perform the pooling operation.
- 8. The BNRP-based configurable parallel general convolutional neural network accelerator according to claim 1, characterized in that the data compression encoder/decoder reads the feature map data, weight data and BN parameters from the outside through the AXI4 bus.
- 9. The BNRP-based configurable parallel general convolutional neural network accelerator according to claim 1, characterized in that, when the input feature map is larger than the number of array rows of the convolution calculator and a pooling operation needs to be executed, m rows of input feature map data are interleaved and cached into 2m on-chip Block RAMs.
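The comparator-based 2*2 pooler of claim 5 can be modeled behaviorally as below; one "clock" per column pair, with the stride-1 reuse of the second comparator's output mirroring the claim. The names are illustrative, the average mode stands in for the claim's "1/2 divider" configuration, and the 3*3 pooler of claim 6 extends the same idea to column triples:

```python
def pool_2x2_stream(row_a, row_b, stride=2, average=False):
    # Each clock, one pixel from each of two adjacent feature map rows enters
    # the first-stage 2-to-1 comparator (or an add-and-halve stage in average
    # mode); a finished 2*2 value leaves the final stage every 2 clocks.
    out, prev = [], None
    for j, (a, b) in enumerate(zip(row_a, row_b)):
        col = (a + b) / 2 if average else max(a, b)    # first-stage result
        if stride == 2:
            if j % 2 == 1:              # second clock: final-stage combine
                out.append((prev + col) / 2 if average else max(prev, col))
            prev = col
        else:                           # stride 1: reuse previous column value
            if prev is not None:
                out.append((prev + col) / 2 if average else max(prev, col))
            prev = col                  # saved for the next window
    return out

# pool_2x2_stream([1, 5, 2, 8], [3, 4, 7, 6]) -> [5, 8]   (2*2 max, stride 2)
```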
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910572582.3 | 2019-06-28 | ||
CN201910572582.3A CN110390385B (en) | 2019-06-28 | 2019-06-28 | BNRP-based configurable parallel general convolutional neural network accelerator |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020258529A1 true WO2020258529A1 (en) | 2020-12-30 |
Family
ID=68285909
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/105534 WO2020258529A1 (en) | 2019-06-28 | 2019-09-12 | Bnrp-based configurable parallel general convolutional neural network accelerator |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110390385B (en) |
WO (1) | WO2020258529A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111158756B (en) * | 2019-12-31 | 2021-06-29 | 百度在线网络技术(北京)有限公司 | Method and apparatus for processing information |
CN111242295B (en) * | 2020-01-20 | 2022-11-25 | 清华大学 | Method and circuit capable of configuring pooling operator |
CN111142808B (en) * | 2020-04-08 | 2020-08-04 | 浙江欣奕华智能科技有限公司 | Access device and access method |
CN111832717B (en) * | 2020-06-24 | 2021-09-28 | 上海西井信息科技有限公司 | Chip and processing device for convolution calculation |
CN111736904B (en) * | 2020-08-03 | 2020-12-08 | 北京灵汐科技有限公司 | Multitask parallel processing method and device, computer equipment and storage medium |
CN112905530B (en) * | 2021-03-29 | 2023-05-26 | 上海西井信息科技有限公司 | On-chip architecture, pooled computing accelerator array, unit and control method |
CN113065647B (en) * | 2021-03-30 | 2023-04-25 | 西安电子科技大学 | Calculation-storage communication system and communication method for accelerating neural network |
CN114004351B (en) * | 2021-11-22 | 2025-04-18 | 浙江大学 | A convolutional neural network hardware acceleration platform |
CN115470164B (en) * | 2022-09-30 | 2025-07-08 | 上海安路信息科技股份有限公司 | A hybrid system based on FPGA+NPU architecture |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108229647A (en) * | 2017-08-18 | 2018-06-29 | 北京市商汤科技开发有限公司 | The generation method and device of neural network structure, electronic equipment, storage medium |
US11568218B2 (en) * | 2017-10-17 | 2023-01-31 | Xilinx, Inc. | Neural network processing system having host controlled kernel acclerators |
CN109389212B (en) * | 2018-12-30 | 2022-03-25 | 南京大学 | Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network |
CN109767002B (en) * | 2019-01-17 | 2023-04-21 | 山东浪潮科学研究院有限公司 | A neural network acceleration method based on multi-block FPGA co-processing |
CN109934339B (en) * | 2019-03-06 | 2023-05-16 | 东南大学 | A Universal Convolutional Neural Network Accelerator Based on a 1D Systolic Array |
2019
- 2019-06-28: CN application CN201910572582.3A, patent CN110390385B/en, status Active
- 2019-09-12: WO application PCT/CN2019/105534, publication WO2020258529A1/en, Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184366A (en) * | 2015-09-15 | 2015-12-23 | 中国科学院计算技术研究所 | Time-division-multiplexing general neural network processor |
CN105631519A (en) * | 2015-12-31 | 2016-06-01 | 北京工业大学 | Convolution nerve network acceleration method based on pre-deciding and system |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
US20180341495A1 (en) * | 2017-05-26 | 2018-11-29 | Purdue Research Foundation | Hardware Accelerator for Convolutional Neural Networks and Method of Operation Thereof |
CN109635944A (en) * | 2018-12-24 | 2019-04-16 | 西安交通大学 | A kind of sparse convolution neural network accelerator and implementation method |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112905239B (en) * | 2021-02-19 | 2024-01-12 | 北京超星未来科技有限公司 | Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment |
CN112905239A (en) * | 2021-02-19 | 2021-06-04 | 北京超星未来科技有限公司 | Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment |
CN113052299B (en) * | 2021-03-17 | 2022-05-31 | 浙江大学 | Neural network memory computing device based on lower communication bound and acceleration method |
CN113052299A (en) * | 2021-03-17 | 2021-06-29 | 浙江大学 | Neural network memory computing device based on lower communication bound and acceleration method |
CN115145839A (en) * | 2021-03-31 | 2022-10-04 | 广东高云半导体科技股份有限公司 | Deep convolution accelerator and method for accelerating deep convolution by using same |
CN115145839B (en) * | 2021-03-31 | 2024-05-14 | 广东高云半导体科技股份有限公司 | Depth convolution accelerator and method for accelerating depth convolution |
CN113051216A (en) * | 2021-04-22 | 2021-06-29 | 南京工业大学 | MobileNet-SSD target detection device and method based on FPGA acceleration |
CN113051216B (en) * | 2021-04-22 | 2023-07-11 | 南京工业大学 | MobileNet-SSD target detection device and method based on FPGA acceleration |
CN113255897A (en) * | 2021-06-11 | 2021-08-13 | 西安微电子技术研究所 | Pooling computing unit of convolutional neural network |
CN113255897B (en) * | 2021-06-11 | 2023-07-07 | 西安微电子技术研究所 | Pooling calculation unit of convolutional neural network |
CN113592067B (en) * | 2021-07-16 | 2024-02-06 | 华中科技大学 | A configurable convolution computing circuit for convolutional neural networks |
CN113592067A (en) * | 2021-07-16 | 2021-11-02 | 华中科技大学 | Configurable convolution calculation circuit for convolution neural network |
CN113516236A (en) * | 2021-07-16 | 2021-10-19 | 西安电子科技大学 | VGG16 network parallel acceleration processing method based on ZYNQ platform |
CN113592086A (en) * | 2021-07-30 | 2021-11-02 | 中科亿海微电子科技(苏州)有限公司 | Method and system for obtaining optimal solution of parallelism of FPGA CNN accelerator |
CN113792621B (en) * | 2021-08-27 | 2024-04-05 | 杭州电子科技大学 | FPGA-based target detection accelerator design method |
CN113792621A (en) * | 2021-08-27 | 2021-12-14 | 杭州电子科技大学 | A Design Method of Target Detection Accelerator Based on FPGA |
CN113743587A (en) * | 2021-09-09 | 2021-12-03 | 苏州浪潮智能科技有限公司 | Convolutional neural network pooling calculation method, system and storage medium |
CN113743587B (en) * | 2021-09-09 | 2024-02-13 | 苏州浪潮智能科技有限公司 | A convolutional neural network pooling calculation method, system, and storage medium |
CN114239816A (en) * | 2021-12-09 | 2022-03-25 | 电子科技大学 | Reconfigurable hardware acceleration architecture of convolutional neural network-graph convolutional neural network |
CN114239816B (en) * | 2021-12-09 | 2023-04-07 | 电子科技大学 | Reconfigurable hardware acceleration architecture of convolutional neural network-graph convolutional neural network |
CN114265696A (en) * | 2021-12-28 | 2022-04-01 | 北京航天自动控制研究所 | Pooler and Pooling Acceleration Circuit for Max Pooling Layer of Convolutional Neural Network |
CN114936636A (en) * | 2022-04-29 | 2022-08-23 | 西安电子科技大学广州研究院 | General lightweight convolutional neural network acceleration method based on FPGA |
CN114819129A (en) * | 2022-05-10 | 2022-07-29 | 福州大学 | Convolution neural network hardware acceleration method of parallel computing unit |
CN114911628A (en) * | 2022-06-15 | 2022-08-16 | 福州大学 | An FPGA-based MobileNet Hardware Acceleration System |
CN115204364A (en) * | 2022-06-28 | 2022-10-18 | 中国电子科技集团公司第五十二研究所 | A convolutional neural network hardware acceleration device with dynamic allocation of cache space |
CN116309520A (en) * | 2023-04-03 | 2023-06-23 | 江南大学 | A strip steel surface defect detection system |
CN117933345A (en) * | 2024-03-22 | 2024-04-26 | 长春理工大学 | A training method for medical image segmentation model |
CN117933345B (en) * | 2024-03-22 | 2024-06-11 | 长春理工大学 | A training method for medical image segmentation model |
CN118070855A (en) * | 2024-04-18 | 2024-05-24 | 南京邮电大学 | Convolutional neural network accelerator based on RISC-V architecture |
CN118070855B (en) * | 2024-04-18 | 2024-07-09 | 南京邮电大学 | A convolutional neural network accelerator based on RISC-V architecture |
Also Published As
Publication number | Publication date |
---|---|
CN110390385A (en) | 2019-10-29 |
CN110390385B (en) | 2021-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020258529A1 (en) | Bnrp-based configurable parallel general convolutional neural network accelerator | |
CN109284817B (en) | Deep separable convolutional neural network processing architecture/method/system and medium | |
CN109934339B (en) | A Universal Convolutional Neural Network Accelerator Based on a 1D Systolic Array | |
WO2020258841A1 (en) | Deep neural network hardware accelerator based on power exponent quantisation | |
CN109447241B (en) | A Dynamic Reconfigurable Convolutional Neural Network Accelerator Architecture for the Internet of Things | |
CN106991477B (en) | Artificial neural network compression coding device and method | |
US10936941B2 (en) | Efficient data access control device for neural network hardware acceleration system | |
US20190026626A1 (en) | Neural network accelerator and operation method thereof | |
CN110516801A (en) | A High Throughput Dynamically Reconfigurable Convolutional Neural Network Accelerator Architecture | |
CN107169563A (en) | Processing system and method applied to two-value weight convolutional network | |
CN109389212B (en) | Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network | |
CN106228240A (en) | Degree of depth convolutional neural networks implementation method based on FPGA | |
CN107085562B (en) | Neural network processor based on efficient multiplexing data stream and design method | |
CN110991630A (en) | Convolutional neural network processor for edge calculation | |
CN111626403B (en) | Convolutional neural network accelerator based on CPU-FPGA memory sharing | |
CN115880132B (en) | Graphics processor, matrix multiplication task processing method, device and storage medium | |
CN111507465B (en) | A Configurable Convolutional Neural Network Processor Circuit | |
CN107844829A (en) | Method and system and neural network processor for accelerans network processing unit | |
CN107729995A (en) | Method and system and neural network processor for accelerans network processing unit | |
CN108345934B (en) | A kind of activation device and method for neural network processor | |
CN111860773A (en) | Processing apparatus and method for information processing | |
CN113392963B (en) | FPGA-based CNN hardware acceleration system design method | |
CN115983348A (en) | RISC-V Accelerator System Supporting Extended Instructions for Convolutional Neural Networks | |
CN117632844A (en) | Reconfigurable AI algorithm hardware accelerator | |
CN108647780A (en) | Restructural pond operation module structure towards neural network and its implementation |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19935380; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19935380; Country of ref document: EP; Kind code of ref document: A1
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.08.2022)