Method for reducing low-bit convolutional neural network inference operation complexity
Technical Field
The invention relates to the technical field of neural network acceleration, in particular to a method for reducing the inference operation complexity of a low-bit convolutional neural network.
Background
In recent years, with the rapid development of technology, the era of big data has arrived. Deep learning takes the Deep Neural Network (DNN) as its model and has achieved quite remarkable results in many key fields of artificial intelligence, such as image recognition, reinforcement learning, and semantic analysis. The Convolutional Neural Network (CNN), as a typical DNN structure, can effectively extract the hidden-layer features of images and classify images accurately, and has been widely applied to the fields of image recognition and detection in recent years.
In particular, quantization from 32 bits to low bits is realized in the prior art by multiplication and shifting: the result of the quantized convolution operation is stored as a 32-bit integer, and multiplication and shift operations are then performed with pre-computed parameters to convert the 32-bit value to a low-bit one.
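A minimal sketch of this prior-art multiply-and-shift path (the function name and the sample multiplier, shift, and scale values are illustrative, not from the original):

```python
import numpy as np

def requantize_multiply_shift(acc32, multiplier, shift, k):
    """Rescale a 32-bit accumulator to k bits via (acc * m) >> s."""
    acc = np.asarray(acc32, dtype=np.int64)   # widen to avoid overflow
    scaled = (acc * multiplier) >> shift      # fixed-point multiply-shift
    return np.clip(scaled, 0, 2**k - 1)       # saturate to the k-bit range

# Example: a real-valued scale of about 0.05 encoded as 13 / 2**8
out = requantize_multiply_shift([0, 10, 30, 100], multiplier=13, shift=8, k=2)
# out == [0, 0, 1, 3]
```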
However, when 32 bits are quantized to low bits in the prior art, a series of addition and comparison operations must be performed during quantization to preserve accuracy after quantization, which greatly increases the computational complexity and the computational resources required; the cost is often excessive, especially when quantizing to 2 bits.
Furthermore, the common terminology in the prior art is as follows:
Convolutional Neural Network (CNN): a type of feedforward neural network that includes convolution calculations and has a deep structure.
Quantization: the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite number of (or fewer) discrete values.
Low bits: data quantized to an 8-bit, 4-bit, or 2-bit width.
Inference: after the neural network training is finished, operations are performed using the stored data.
Disclosure of Invention
The application provides a method for reducing the inference operation complexity of a low-bit convolutional neural network, which aims to overcome the defects of the prior art and to solve the problems of high computational complexity and high computational resource requirements in the existing low-bit model inference process.
Specifically, the invention provides a method for reducing the inference operation complexity of a low-bit convolutional neural network, which comprises the following steps:
S1, after the neural network training is finished, quantization is performed using the stored data.
Let the quantization of the i-th layer be:
x_{i+1} = Q_A(δ_i(s_BN·(Q_w(w_i)·Q_A(x_i) + b_i)))
wherein δ_i is the activation function, Q_A is the quantization formula of the feature map, and Q_w is the quantization formula of the weight;
S2, when the parameters of the formula in S1 meet the following conditions:
1) Q_w(w) = s_w·w_int, i.e. the quantized weight can be represented as a fixed-point number scaled by a floating-point scalar, where w_int is a fixed-point number expressed as an integer;
2) Q_A(x) = s_x·x_int, i.e. the quantized feature map can be represented as a fixed-point number scaled by a floating-point scalar, where x_int is a fixed-point number expressed as an integer;
3) δ_i is a monotonic function;
then the quantity to be quantized can be obtained by fixed-point operation, namely:
Q_A(δ_i(s_w·s_x·s_BN·(w_int·x_int + b_i/(s_w·s_x·s_BN))));
S3, determining the thresholds from the quantization of the feature map:
The quantization formula of the feature map is:
Q_A(x) = clip(round(x), 0, 2^k − 1)
The thresholds (0.5, 1.5, …, (2^k − 1) − 0.5) can be directly deduced from the above quantization formula of the feature map, where k is the quantized bit width;
since the distance between adjacent thresholds is 1.0, only the thresholds T_n = n − 0.5, n = 1, 2, …, 2^k − 1 need to be preserved in the final quantization, where k is the quantized bit width;
S4, since the values of the quantized feature map are already determined when quantizing to low bits, and Q_A is a uniform quantizer, δ_i(s_w·s_x·s_BN·(w_int·x_int + b_i/(s_w·s_x·s_BN))) in S2 obtains the final quantization result by comparison with the series of thresholds (T_1, T_2, …, T_n) of step S3.
When quantizing to the low bit width of 2 bits in step S2, the quantized feature map takes the values 0, 1, 2 and 3.
Since δ_i is a monotonic function in step S2 and s_w·s_x > 0, the quantized result can also be obtained by comparing (w_int·x_int + b_i/(s_w·s_x·s_BN)) with δ_i⁻¹(T_n)/(s_w·s_x·s_BN).
In step S4, since s_BN differs for each channel, one set of thresholds needs to be saved for each channel when the thresholds are saved.
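Steps S1-S4 can be sketched as follows (a single channel with a ReLU activation; the scale, bias, and accumulator values are illustrative assumptions): the quantized output is simply the number of precomputed thresholds that the integer accumulator exceeds, so no multiplication or shifting remains in the inner loop:

```python
import numpy as np

k = 2                                  # quantized bit width
s = 0.1                                # s_w * s_x * s_BN (illustrative)
b = 0.0                                # bias b_i (illustrative)

# Delta-domain thresholds 0.5, 1.5, ..., spaced by 1.0 (step S3)
t_delta = np.arange(1, 2**k) - 0.5
# Pre-scaled into the integer-accumulator domain; for ReLU the inverse
# activation is the identity above zero, so delta_i^{-1} drops out
T = t_delta / s - b / s                # approx. [5, 15, 25]

acc = np.array([-4, 7, 20, 300])       # w_int * x_int accumulators
q = np.searchsorted(T, acc)            # count thresholds below each acc
# q == [0, 1, 2, 3], matching clip(round(relu(s*acc + b)), 0, 2**k - 1)
```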
Thus, the present application has the advantages that:
1. The 32-bit values are quantized to low bits directly through threshold comparison, reducing the complexity of the operation;
2. The overall running time of the quantized model is reduced;
3. The demand for computational resources is reduced;
4. 64-bit by 64-bit multiplication operations are avoided.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate the application and together with the description serve to explain the application.
FIG. 1 is a schematic flow chart of the method of the present invention.
Detailed Description
In order that the technical content and advantages of the present invention may be more clearly understood, a further detailed description of the present invention will now be made with reference to the accompanying drawings.
As shown in FIG. 1, the method for reducing the inference operation complexity of the low-bit convolutional neural network comprises the following steps:
S1, after the neural network training is finished, quantization is performed using the stored data.
Let the quantization of the i-th layer be:
x_{i+1} = Q_A(δ_i(s_BN·(Q_w(w_i)·Q_A(x_i) + b_i)))
wherein δ_i is the activation function, Q_A is the quantization formula of the feature map, and Q_w is the quantization formula of the weight;
S2, when the parameters of the formula in S1 meet the following conditions:
1) Q_w(w) = s_w·w_int, i.e. the quantized weight can be represented as a fixed-point number scaled by a floating-point scalar, where w_int is a fixed-point number expressed as an integer;
2) Q_A(x) = s_x·x_int, i.e. the quantized feature map can be represented as a fixed-point number scaled by a floating-point scalar, where x_int is a fixed-point number expressed as an integer;
3) δ_i is a monotonic function;
then the quantity to be quantized can be obtained by fixed-point operation, namely:
Q_A(δ_i(s_w·s_x·s_BN·(w_int·x_int + b_i/(s_w·s_x·s_BN))));
S3, determining the thresholds from the quantization of the feature map:
The quantization formula of the feature map is:
Q_A(x) = clip(round(x), 0, 2^k − 1)
The thresholds (0.5, 1.5, …, (2^k − 1) − 0.5) can be directly deduced from the above quantization formula of the feature map, where k is the quantized bit width;
since the distance between adjacent thresholds is 1.0, only the thresholds T_n = n − 0.5, n = 1, 2, …, 2^k − 1 need to be preserved in the final quantization, where k is the quantized bit width;
S4, since the values of the quantized feature map are already determined when quantizing to low bits, and Q_A is a uniform quantizer, δ_i(s_w·s_x·s_BN·(w_int·x_int + b_i/(s_w·s_x·s_BN))) in S2 obtains the final quantization result by comparison with the series of thresholds (T_1, T_2, …, T_n) of step S3.
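The threshold derivation of step S3 can be reproduced in a short sketch, assuming the uniform feature-map quantizer Q_A(x) = clip(round(x), 0, 2^k − 1):

```python
import numpy as np

def q_a(x, k):
    # Uniform feature-map quantizer assumed above
    return np.clip(np.round(x), 0, 2**k - 1)

def thresholds(k):
    # Decision boundaries of q_a: 0.5, 1.5, ..., (2^k - 1) - 0.5,
    # spaced exactly 1.0 apart
    return np.arange(1, 2**k) - 0.5

t = thresholds(2)   # [0.5, 1.5, 2.5] separate the 2-bit outputs 0, 1, 2, 3
```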
In particular, the method of the application can also be expressed as follows:
Suppose the quantization calculation for the i-th layer is as follows:
x_{i+1} = Q_A(δ_i(s_BN·(Q_w(w_i)·Q_A(x_i) + b_i)))
wherein δ_i is the activation function, Q_A is the quantization formula of the feature map, and Q_w is the quantization formula of the weight.
The parameters in the above formula meet the following conditions:
1. Q_w(w) = s_w·w_int, i.e. the quantized weight can be represented by a fixed-point number scaled with a floating-point scalar, where w_int is a fixed-point number expressed as an integer;
2. Q_A(x) = s_x·x_int, i.e. the quantized feature map can be represented by a fixed-point number scaled with a floating-point scalar, where x_int is a fixed-point number expressed as an integer;
3. δ_i is a monotonic function.
So the final quantization can be obtained by calculating with fixed-point numbers:
Q_A(δ_i(s_w·s_x·s_BN·(w_int·x_int + b_i/(s_w·s_x·s_BN))))
Since the values of the quantized feature map are actually determined when quantizing to low bits (taking 2 bits as an example, the feature map takes the values 0, 1, 2, 3) and Q_A is a uniform quantizer, δ_i(s_w·s_x·s_BN·(w_int·x_int + b_i/(s_w·s_x·s_BN))) can be compared with a series of thresholds (T_1, T_2, …, T_n) to obtain the quantized result; and since δ_i is a monotonic function and s_w·s_x > 0, the quantized result can also be obtained by comparing (w_int·x_int + b_i/(s_w·s_x·s_BN)) with δ_i⁻¹(T_n)/(s_w·s_x·s_BN).
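This equivalence can be checked numerically (the ReLU activation and all scale, bias, and accumulator values below are illustrative assumptions):

```python
import numpy as np

k = 2
s = 0.1                                   # s_w * s_x * s_BN, positive
b = 0.3                                   # bias b_i
acc = np.array([-50, 3, 13, 27, 80])      # w_int * x_int accumulators

# Floating-point path: quantize delta_i(s * (acc + b/s)) to k bits
relu = lambda t: np.maximum(t, 0.0)
float_path = np.clip(np.round(relu(s * (acc + b / s))), 0, 2**k - 1)

# Integer-domain path: compare acc with delta_i^{-1}(T_n)/s - b/s
# (ReLU inverse is the identity above zero)
T = (np.arange(1, 2**k) - 0.5) / s - b / s
int_path = np.searchsorted(T, acc)
# Both paths give [0, 1, 2, 3, 3]
```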
The determination of the threshold needs to start with the quantization formula of the feature map.
The quantization formula of the feature map is:
Q_A(x) = clip(round(x), 0, 2^k − 1)
From the above equation, the thresholds can be directly deduced to be (0.5, 1.5, …, (2^k − 1) − 0.5), where k is the quantized bit width. Since the distance between adjacent thresholds is 1.0, only the thresholds T_n = n − 0.5, n = 1, 2, …, 2^k − 1 need to be preserved in the final quantization, where k is the quantized bit width; since s_BN differs for each channel, one set of thresholds needs to be saved per channel.
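Because s_BN differs per channel, the stored table is two-dimensional, with one threshold row per output channel; a small sketch with illustrative scale values:

```python
import numpy as np

k = 2
s_wx = 0.5                               # s_w * s_x, shared across channels
s_bn = np.array([0.5, 0.25, 0.125])      # per-channel BN scales (illustrative)

base = np.arange(1, 2**k) - 0.5          # 0.5, 1.5, 2.5 in the delta domain
# One threshold row per output channel: shape (channels, 2^k - 1)
T = base[None, :] / (s_wx * s_bn[:, None])
# e.g. channel 1 stores [0.5, 1.5, 2.5] / 0.125 == [4, 12, 20];
# at inference, channel c's accumulators are compared only against T[c]
```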
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.