
CN114841325B - Data processing method and medium of neural network model and electronic equipment - Google Patents

Data processing method and medium of neural network model and electronic equipment

Info

Publication number
CN114841325B
CN114841325B
Authority
CN
China
Prior art keywords
data
parameter
quantization parameter
fixed point
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210556167.0A
Other languages
Chinese (zh)
Other versions
CN114841325A (en)
Inventor
章小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Technology China Co Ltd filed Critical ARM Technology China Co Ltd
Priority to CN202210556167.0A
Publication of CN114841325A
Application granted
Publication of CN114841325B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Optimization (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present application relates to the field of artificial intelligence, and in particular to a data processing method, medium, and electronic device for a neural network model. The method comprises the following steps: determining floating point data to be processed in a neural network model; quantizing the floating point data with a post-quantization parameter to obtain a fixed point number, wherein the post-quantization parameter is determined according to a preset floating point threshold parameter; and using the fixed point number as fixed point input data of a first ThresholdReLU activation operation included in the neural network model to obtain fixed point output data, wherein the first ThresholdReLU activation operation is associated with the post-quantization parameter. By adopting the post-quantization parameter, the application ensures that the fixed point number of any floating point datum greater than the floating point threshold parameter is greater than the fixed point threshold parameter obtained by quantizing the floating point threshold parameter, thereby improving the accuracy of the model operation result.

Description

Data processing method and medium of neural network model and electronic equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a data processing method, medium and electronic device for a neural network model.
Background
With the development of artificial intelligence (AI) technology, neural network models are increasingly used in the field of artificial intelligence, for example in application scenarios such as image recognition, target detection, reinforcement learning, information recommendation, and user behavior prediction. Because a neural network is a resource-intensive algorithm with high computation cost and memory consumption, the neural network model usually needs to be quantized in order to reduce the number of model parameters, the amount of computation at run time, the computation cost, and the memory footprint: high-precision floating point operations in the neural network model are converted into fixed point operations, yielding a fixed point neural network model. The Threshold Rectified Linear Unit (ThresholdReLU) activation function is widely used as a common activation function in neural network models, for example as the activation layer operator of a neural network performing a nonlinear activation operation. Therefore, quantizing a neural network model often involves quantizing ThresholdReLU activation functions.
During operation of the neural network model, some data to be subjected to the ThresholdReLU activation operation may have a value in the floating point domain that is greater than the threshold parameter of the ThresholdReLU activation operation, while its value in the fixed point domain equals the quantized threshold parameter. For such data, the result of the ThresholdReLU activation operation in the floating point domain is not 0, but the floating point value corresponding to the fixed point result of the ThresholdReLU activation operation in the fixed point domain is 0, which affects the accuracy of the operation result of the neural network model.
Disclosure of Invention
To solve the above problem, the present application provides a data processing method, medium, and electronic device for a neural network model.
In a first aspect, an embodiment of the present application provides a data processing method for a neural network model, applied to an electronic device, the method comprising:
Determining floating point data to be processed in a neural network model;
Quantizing floating point data through a post quantization parameter to obtain fixed point numbers, wherein the post quantization parameter is determined according to a preset floating point threshold value parameter;
And taking the fixed point number as fixed point input data of a first Threshold Rectified Linear Unit (ThresholdReLU) activation operation included in the neural network model to obtain fixed point output data, wherein the first ThresholdReLU activation operation is associated with the post-quantization parameter.
It may be appreciated that in the embodiment of the present application, the post-quantization parameter is a second quantization parameter, the preset floating point threshold parameter is a threshold parameter preset in the floating point domain, and the first ThresholdReLU activation operation is a ThresholdReLU activation operation performed in the fixed point domain.
It can be appreciated that, in the data processing method for a neural network model provided by the embodiment of the present application, floating point data is quantized with the post-quantization parameter to obtain a fixed point number, and the fixed point number is used as fixed point input data of the first ThresholdReLU activation operation included in the neural network model to obtain fixed point output data. During operation of the neural network model, floating point data greater than the floating point threshold parameter would produce a non-zero output under the floating point ThresholdReLU activation operation; under the original quantization parameter, however, the fixed point number of such data may be no greater than the fixed point number of the floating point threshold parameter, so the output obtained in the fixed point domain corresponds to a floating point value of 0. By merely adjusting the quantization parameter, this situation is conveniently avoided: the fixed point number obtained by quantizing floating point data greater than the threshold parameter with the post-quantization parameter is guaranteed to be greater than the fixed point threshold parameter obtained by quantizing the floating point threshold parameter with the post-quantization parameter, so the floating point value corresponding to the fixed point output data obtained through the ThresholdReLU activation operation in the fixed point domain is not 0, which improves the accuracy of the operation result of the neural network model.
In a possible implementation of the first aspect, the post-quantization parameter is determined according to a floating point threshold parameter and a pre-quantization parameter.
It can be appreciated that in the embodiment of the present application, the front quantization parameter is a first quantization parameter.
It will be appreciated that, in the embodiment of the present application, the post-quantization parameter is determined according to the floating point threshold parameter and the pre-quantization parameter; that is, the post-quantization parameter can be determined from the floating point threshold parameter and the pre-quantization parameter alone, which keeps the logic simple.
In a possible implementation of the first aspect, the post-quantization parameter is determined by a method comprising,
Determining a pre-quantization parameter according to the range of floating point data to be processed in the neural network model and the data type of the fixed point number;
and obtaining a post quantization parameter according to the pre quantization parameter and the floating point threshold parameter.
It can be understood that, in the embodiment of the present application, the pre-quantization parameter is determined according to the range of the floating point data to be processed in the neural network model and the data type of the fixed point number, and the post-quantization parameter is then obtained from the pre-quantization parameter and the floating point threshold parameter. This reflects the correlation between the two parameters and makes the distinction and the connection between the post-quantization parameter and the pre-quantization parameter easier to understand.
In a possible implementation of the first aspect, the obtaining the post-quantization parameter according to the pre-quantization parameter and the floating point threshold parameter includes,
Quantizing the floating point threshold parameter according to the front quantization parameter to obtain a first fixed point threshold parameter;
and obtaining a post-quantization parameter according to the first fixed-point threshold parameter and the floating-point threshold parameter.
It can be appreciated that, in the embodiment of the present application, the floating point threshold parameter is quantized according to the pre-quantization parameter to obtain the first fixed point threshold parameter, and the post-quantization parameter is then obtained from the first fixed point threshold parameter and the floating point threshold parameter. The relationship between the first fixed point threshold parameter and the post-quantization parameter can thus be analyzed, which helps in discussing how the post-quantization parameter can be obtained quickly for first fixed point threshold parameters of different parity.
In a possible implementation manner of the first aspect, the obtaining the post-quantization parameter according to the first fixed-point threshold parameter and the floating-point threshold parameter includes,
Under the condition that the first fixed point threshold value parameter is an odd number, obtaining a post-quantization parameter according to an odd number quantization parameter formula, wherein the odd number quantization parameter formula comprises the first fixed point threshold value parameter and a floating point threshold value parameter;
And under the condition that the first fixed-point threshold value parameter is even, obtaining a post-quantization parameter according to an even-number quantization parameter formula, wherein the even-number quantization parameter formula comprises the first fixed-point threshold value parameter and a floating-point threshold value parameter.
It can be understood that, in the embodiment of the present application, the rounding operation used when quantizing floating point data into fixed point numbers behaves differently depending on the parity of the first fixed point threshold parameter. A corresponding odd quantization parameter formula and even quantization parameter formula therefore exist for the two cases, so that the corresponding post-quantization parameter can be obtained accurately in each case, providing multiple ways of obtaining the post-quantization parameter.
In a possible implementation of the first aspect, the obtaining of the post-quantization parameter according to the first fixed-point threshold parameter and the floating-point threshold parameter includes,
And obtaining a post-quantization parameter according to a quantization parameter formula, wherein the quantization parameter formula comprises a first fixed-point threshold parameter and a floating-point threshold parameter.
It can be appreciated that, in the embodiment of the present application, the quantization parameter formula combines the odd quantization parameter formula and the even quantization parameter formula, so that the parity of the first fixed point threshold parameter need not be considered and the post-quantization parameter can be obtained more directly.
In a possible implementation of the first aspect, the first ThresholdReLU activation operation is determined according to a post-quantization parameter and a floating point threshold parameter.
It will be appreciated that, in the embodiment of the present application, the first ThresholdReLU activation operation is determined according to the post-quantization parameter and the floating point threshold parameter; that is, only the floating point threshold parameter and the post-quantization parameter need to be known to determine the first ThresholdReLU activation operation, which keeps the logic simple.
In a possible implementation of the first aspect, the first ThresholdReLU activation operation is determined by a method that includes,
Quantizing the floating point threshold parameters according to the post quantization parameters to obtain second fixed point threshold parameters;
obtaining a ThresholdReLU activation function for performing the first ThresholdReLU activation operation according to the second fixed point threshold parameter.
It can be understood that, in the embodiment of the present application, the floating point threshold parameter is quantized according to the post-quantization parameter to obtain the second fixed point threshold parameter, in the same manner as the floating point data to be processed is quantized, which is logically consistent; the ThresholdReLU activation function for performing the first ThresholdReLU activation operation can then be obtained from the second fixed point threshold parameter, which is simple in logic and convenient to extend and apply.
In a possible implementation of the first aspect, the range of floating point data to be processed in the neural network model is determined during operation of the neural network model.
It can be understood that in the embodiment of the application, the range of the floating point type data to be processed can be obtained by operating the neural network model, and the obtaining mode is convenient.
In a possible implementation of the first aspect, the fixed point number data type includes at least one of the following: int32, int16, int8, int4, uint32, uint16, uint8, or uint4.
It can be appreciated that in the embodiment of the present application, the data types of the fixed point number may include multiple data types, so that the application scope is wider.
In a possible implementation manner of the first aspect, the floating point data to be processed in the neural network model is obtained based on at least one of image data, audio data, text data, and video data.
It can be appreciated that in the embodiment of the application, the neural network has a self-learning function, an associative memory function and a capability of efficiently searching for an optimal solution, and can well process image data, audio data, text data and video data.
In a possible implementation of the first aspect, the quantization is a symmetric quantization.
It can be appreciated that in the embodiment of the application, the floating point data is quantized into the fixed point data by using symmetric quantization, so that the calculation speed of the neural network model can be increased, and the power consumption of equipment can be reduced.
In a second aspect, an embodiment of the present application provides a data processing apparatus for a neural network model, including a first determining unit configured to determine floating point type data to be processed in the neural network model;
the first acquisition unit is used for quantizing floating point type data through post quantization parameters to obtain fixed point numbers, wherein the post quantization parameters are determined according to preset floating point threshold parameters;
And the second acquisition unit is used for taking the fixed point number as fixed point input data of a first ThresholdReLU activation operation included in the neural network model to obtain fixed point output data, wherein the first ThresholdReLU activation operation is associated with a post-quantization parameter.
It is understood that in the embodiment of the present application, the post-quantization parameter is the second quantization parameter, and the first ThresholdReLU activation operation is the ThresholdReLU activation operation performed in the fixed point domain.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, on which are stored instructions for performing the method of the first aspect described above and any of its various possible implementations.
In a fourth aspect, embodiments of the present application provide a computer program product comprising: the computer program product comprises instructions for implementing the method according to the first aspect described above and any of the various possible implementations of the first aspect when executed by one or more processors.
In a fifth aspect, an embodiment of the present application provides an electronic device, including:
a memory for storing instructions, an
one or more processors, wherein the instructions, when executed by the one or more processors, cause the electronic device to perform the method of the first aspect and any of the various possible implementations of the first aspect described above.
Based on the scheme, the application has the following beneficial effects:
According to the data processing method of the neural network model provided by the application, the pre-quantization parameter is adjusted to obtain the post-quantization parameter, floating point data to be processed is quantized according to the post-quantization parameter to obtain a fixed point number, and the fixed point number is used as fixed point input data of the first ThresholdReLU activation operation included in the neural network model to obtain fixed point output data. Because the rounding function introduces errors when quantizing floating point data into fixed point data, using the original quantization parameter can cause floating point data greater than the floating point threshold parameter to yield, under the first ThresholdReLU activation operation, fixed point output data whose corresponding floating point value is 0, making the error in the operation result of the neural network model larger; the present method avoids this problem. In addition, the method of adjusting the quantization parameter is simple in logic and easy to operate, requiring attention only to the preset floating point threshold parameter and the original quantization parameter, and it can improve the accuracy of the operation result of the neural network model in processing image data, audio data, text data, and video data.
Drawings
FIG. 1 illustrates a schematic diagram of a ThresholdReLU activation function, according to some embodiments of the application;
FIG. 2 illustrates a schematic diagram of symmetric quantization, according to some embodiments of the application;
FIG. 3 illustrates a scene graph of a terminal 100 identifying an acquired face image through a face recognition model, according to some embodiments of the application;
FIG. 4 is a graph illustrating the relationship between input data and fixed point input data for a quantization parameter of 2, according to some embodiments of the present application;
FIG. 5 illustrates a flow chart for deriving a quantized ThresholdReLU activation function using an adjusted quantization parameter, according to some embodiments of the application;
FIG. 6a illustrates a schematic diagram of a first fixed point threshold parameter value that is even, according to some embodiments of the application;
FIG. 6b illustrates a schematic diagram of a first fixed point threshold parameter value that is odd, according to some embodiments of the application;
FIG. 7a illustrates a schematic diagram of obtaining a corresponding second fixed point threshold parameter when the first fixed point threshold parameter value is even, according to some embodiments of the application;
FIG. 7b illustrates a schematic diagram of obtaining a corresponding second fixed point threshold parameter when the first fixed point threshold parameter value is odd, according to some embodiments of the application;
FIG. 8 illustrates a flowchart of a neural network model data process, according to some embodiments of the application;
FIG. 9 illustrates a schematic diagram of a data processing apparatus of a neural network model, according to some embodiments of the application;
fig. 10 illustrates a schematic structure of a terminal 100 according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the application include, but are not limited to, a data processing method, medium, and electronic device for a neural network model.
In order to more clearly illustrate the aspects of the embodiments of the present application, some terms related to the embodiments of the present application are explained below.
Quantification of neural network model: floating point operations in the neural network model are converted to fixed point operations.
Threshold Rectified Linear Unit (ThresholdReLU) activation function
The expression of the ThresholdReLU activation function in the floating point domain is shown in formula 1 below:
y = x, if x > a; y = 0, if x ≤ a (formula 1)
where x is the input data of the ThresholdReLU activation function, a is the threshold parameter (a ≥ 0) of the ThresholdReLU activation function, and y is the output data of the ThresholdReLU activation function. FIG. 1 shows a schematic diagram of the ThresholdReLU activation function in some embodiments of the application. For example, if the threshold parameter a is 10.1, then input data 10.5 produces output data 10.5, while input data 9.5 produces output data 0.
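As an illustration, a minimal Python sketch of formula 1 (the function name threshold_relu is ours, not from the patent):

```python
def threshold_relu(x: float, a: float) -> float:
    """ThresholdReLU in the floating point domain (formula 1):
    pass x through unchanged when it exceeds the threshold a, else output 0."""
    return x if x > a else 0.0

# The example above: threshold parameter a = 10.1
assert threshold_relu(10.5, 10.1) == 10.5  # input above the threshold passes through
assert threshold_relu(9.5, 10.1) == 0.0    # input at or below the threshold is zeroed
```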
Threshold Rectified Linear Unit (ThresholdReLU) activation function quantization
ThresholdReLU activation function quantization may be implemented by symmetric quantization or asymmetric quantization. In the present application, the ThresholdReLU activation function is quantized by symmetric quantization to obtain a quantized ThresholdReLU activation function.
For ease of understanding, the symmetrical quantization scheme is described first.
Symmetric quantization:
A floating point number X in the range [Xmin, Xmax] is mapped to a fixed point number Q in the range [Qmin, Qmax] by formula 2:
Q = round(X / S) (formula 2)
where the fixed point number Q is a signed integer (integral numeric type, int) whose range [Qmin, Qmax] is [-2^(n-1), 2^(n-1) - 1], where n is the number of quantization bits; e.g., if the fixed point number Q is of type int8, then [Qmin, Qmax] is [-128, 127], and the floating point number X is quantized to a fixed point number in the range [-128, 127] as shown in FIG. 2.
The round() function is a rounding function that rounds the value of X/S so that the resulting value is an integer. In some embodiments of the present application, the rounding rule of the round() function may be: let A be the largest integer smaller than the value of X/S; when the value of X/S is greater than A + 0.5, round(X/S) is A + 1; when the value of X/S is less than A + 0.5, round(X/S) is A; and when the value of X/S is equal to A + 0.5, round(X/S) is the nearest even number to X/S. Specifically, when the value of X/S equals A + 0.5 and A is even, round(X/S) is A; when the value of X/S equals A + 0.5 and A is odd, round(X/S) is A + 1. For example, if the value of X/S is 1.5, then round(X/S) is 2, since 2 is the nearest even number to 1.5; if the value of X/S is 3.5, then round(X/S) is 4, since 4 is the nearest even number to 3.5. It will be appreciated that, in other embodiments, the same result may be achieved by other rounding rules, which are not described herein. It will also be appreciated that, in other implementations, the rounding function may be the ceil() round-up function.
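The rule described above is round-half-to-even. As an illustration, Python's built-in round() follows the same tie-breaking rule, so the behavior can be checked directly (a sketch, not the patent's implementation):

```python
# Python's built-in round() implements round-half-to-even on ties,
# matching the rounding rule of the round() function described above.
assert round(1.5) == 2  # 2 is the nearest even number to 1.5
assert round(3.5) == 4  # 4 is the nearest even number to 3.5
assert round(2.5) == 2  # tie with A = 2 even, so the result is A
assert round(2.3) == 2  # below A + 0.5, so the result is A
assert round(2.7) == 3  # above A + 0.5, so the result is A + 1
```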
S is the quantization parameter for quantizing floating point numbers to fixed point numbers; it is the minimum scale at which different floating point numbers X quantize to the same fixed point number Q. It can be obtained according to a quantization parameter formula, for example the following formula 3, or other modified formulas, which are not particularly limited herein:
S = max(|Xmin|, |Xmax|) / Qmax (formula 3)
The max() function in formula 3 is the maximum function, and |·| denotes the absolute value. It will be appreciated that for a fixed point number, the corresponding floating point number may be determined by inverse quantization. In some embodiments, formula 2 above is rearranged to obtain the inverse quantization formula 2', and the floating point number corresponding to the fixed point number is obtained according to formula 2':
X = Q × S (formula 2')
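Putting formulas 2, 2', and 3 together, a minimal symmetric quantization sketch in Python (the helper names are ours; clamping to [Qmin, Qmax] is omitted for brevity):

```python
def quant_param(x_min: float, x_max: float, q_max: int) -> float:
    """Formula 3: S = max(|Xmin|, |Xmax|) / Qmax."""
    return max(abs(x_min), abs(x_max)) / q_max

def quantize(x: float, s: float) -> int:
    """Formula 2: Q = round(X / S), using round-half-to-even."""
    return round(x / s)

def dequantize(q: int, s: float) -> float:
    """Formula 2': X = Q * S (inverse quantization)."""
    return q * s

# With the int8 range and the input data range used in the examples below:
s = quant_param(-6.2, 254.0, 127)   # S = 254.0 / 127 = 2.0
assert s == 2.0
assert quantize(4.1, s) == 2        # round(2.05) = 2
assert dequantize(2, s) == 4.0
```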
As can be seen from formula 1, quantizing the ThresholdReLU activation function requires quantizing the input data x, the threshold parameter a, and the output data y of the activation function; the fixed point numbers obtained by quantizing them are denoted fixed point input data xq, fixed point threshold parameter aq, and fixed point output data yq, respectively.
In some implementations, symmetric quantization of the input data x, the threshold parameter a, and the output data y yields the fixed point input data xq, the fixed point threshold parameter aq, and the fixed point output data yq, respectively. It can be understood that the relationship of the following formula 4 exists among the fixed point threshold parameter aq, the fixed point input data xq, and the fixed point output data yq.
It should be explained that in the following formula 4, the data types of the fixed point input data and the fixed point output data are the same, and the quantization parameter Sa corresponding to the threshold parameter a under symmetric quantization, the quantization parameter Sy corresponding to the output data y, and the quantization parameter Sx corresponding to the input data x are the same. Therefore, for convenience of explanation, the quantization parameters Sa, Sy, and Sx are hereinafter collectively referred to as the quantization parameter S.
yq = xq, if xq > aq; yq = 0, if xq ≤ aq (formula 4)
The quantized ThresholdReLU activation function may be a modified form of formula 4; no specific form is required here.
The received fixed point input data xq is compared with the fixed point threshold parameter aq according to formula 4 to obtain the fixed point output data yq. For example, if the fixed point threshold parameter aq is 10, then fixed point input data xq of 11 yields fixed point output data yq of 11, and fixed point input data xq of 9 yields fixed point output data yq of 0.
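A corresponding fixed point domain sketch of formula 4 (again, the function name threshold_relu_q is ours):

```python
def threshold_relu_q(x_q: int, a_q: int) -> int:
    """Quantized ThresholdReLU (formula 4), operating purely on fixed point numbers."""
    return x_q if x_q > a_q else 0

# The example above: fixed point threshold parameter aq = 10
assert threshold_relu_q(11, 10) == 11
assert threshold_relu_q(9, 10) == 0
```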
In order to better understand the present solution, an application scenario of the technical solution of the present application will be described first.
Fig. 3 shows a scene in which the terminal 100 recognizes an acquired face image through a face recognition model. As shown in fig. 3, the terminal 100 is deployed with a face recognition model quantized by the server 200. After the terminal 100 collects a face image of a user, it may perform face recognition on the collected image using the quantized face recognition model, thereby obtaining a face recognition result.
The face recognition model is usually quantized in order to reduce the amount of data. For example, the trained face recognition model is quantized by the server 200, and the quantized face recognition model (hereinafter referred to as the quantization model) is then deployed to the terminal 100. In the process of face recognition using the quantization model, the terminal 100 typically uses the quantized ThresholdReLU activation function to perform the activation operation in the activation layer operator. After receiving floating point data to be subjected to the ThresholdReLU activation operation, it quantizes the floating point data in the same manner used to obtain the quantized ThresholdReLU activation function, obtaining fixed point data, and uses the fixed point data as fixed point input data of the ThresholdReLU activation function to obtain the fixed point output data of the ThresholdReLU activation operation. It will be appreciated that in this face recognition scenario, the input data of the activation operation in the face recognition model may be face image data.
In some implementations, the server 200 performs multiple rounds of model training in advance using the unquantized face recognition model to obtain the range of input data of the ThresholdReLU activation operation required by the quantization formula. It substitutes this input data range, together with the value range corresponding to the data type of the fixed point input data to be quantized, into the quantization parameter formula to obtain the quantization parameter S, and quantizes the threshold parameter a according to S to obtain the fixed point threshold parameter aq. After receiving floating point data to be subjected to the ThresholdReLU activation operation, the floating point data is quantized according to the obtained quantization parameter S to obtain fixed point data, which is used as the fixed point input data xq in formula 4 and compared with the fixed point threshold parameter aq to obtain the fixed point output data. It can be appreciated that the fixed point output data may be dequantized according to the quantization parameter S to obtain the corresponding floating point output data.
For example, take the threshold parameter a of the ThresholdReLU activation function to be 4.1. In the scenario shown in fig. 3, the server 200 performs multiple rounds of model training in advance using the unquantized face recognition model and obtains that, during training, the range of input data of the ThresholdReLU activation function in the floating point domain in the face recognition model is [-6.2, 254.0], the fixed point input data type is int8, and the corresponding value range is [-128, 127]. The quantization parameter S in the symmetric quantization mode is obtained according to formula 3: S = 254.0/127 = 2, so the corresponding symmetric quantization formula 2 is Q = round(X/2), from which the relationship between the input data x and the fixed point input data xq can be obtained. As shown in fig. 4, fig. 4 illustrates the relationship between the input data x and the fixed point input data xq when the quantization parameter is 2. Symmetric quantization of the threshold parameter a of value 4.1 yields the fixed point threshold parameter aq = round(4.1/2) = 2, thereby determining the quantized ThresholdReLU activation function. After receiving floating point data to be subjected to the ThresholdReLU activation operation, the floating point data is quantized according to the quantization parameter S of value 2 to obtain fixed point data, which is substituted into formula 4 as the fixed point input data xq to obtain the fixed point output data yq. It can be understood that the fixed point output data is dequantized according to formula 2' to obtain the corresponding floating point output data.
However, when the quantized ThresholdReLU activation function is used to activate floating point data during operation of the face recognition model, there may be cases in which the output data obtained for some floating point data using the unquantized ThresholdReLU activation function is not 0, while the floating point output data corresponding to the fixed point output data obtained using the quantized ThresholdReLU activation function is 0, thereby affecting the face recognition result.
For example, as shown in fig. 4, the fixed point threshold parameter aq corresponding to the threshold parameter a = 4.1 is 2. From the characteristics of the ThresholdReLU activation function, when the input data x is less than or equal to the threshold parameter a, the output of the activation operation is 0, and when the input data x is greater than the threshold parameter a, the output is the original value. If, during operation of the face recognition model, some floating point data to be activated is greater than the threshold parameter a, yet the fixed point input data xq obtained after quantization has the same value as the fixed point threshold parameter aq, then according to formula 4 the fixed point output data yq obtained from that xq is 0.
For convenience of explanation, input data whose quantized fixed point input data xq has the same value as the fixed point threshold parameter aq is hereinafter referred to as same-value data; same-value data whose value is less than or equal to the threshold parameter a is referred to as left same-value data; and same-value data whose value is greater than the threshold parameter a is referred to as right same-value data. For the same threshold parameter a, different quantization parameters result in different same-value data ranges.
For example, the input data in the interval l1 shown in fig. 4 are all same-value data; the specific range of l1 is [3, 5], and the fixed point value after quantizing data in this interval is 2, equal to the fixed point threshold parameter aq. The input data in the interval l2 are all left same-value data; the specific range of l2 is [3, 4.1], whose values are less than or equal to the threshold parameter value 4.1, and the quantized fixed point value of data in this interval is 2, equal to the fixed point threshold parameter aq. The input data in the interval l3 are all right same-value data; the specific range of l3 is (4.1, 5], whose values are greater than the threshold parameter value 4.1, and the quantized fixed point value of data in this interval is 2, equal to the fixed point threshold parameter aq.
Take the right same-value data 4.3 in the interval l3, which is to be activated, as an example. Symmetric quantization of 4.3 is computed as round(4.3/2) = 2, giving the fixed point datum 2, which is used as the fixed point input data of the quantized ThresholdReLU activation function. According to formula 4, since the fixed point value 2 equals the fixed point threshold parameter aq value 2, the fixed point output data obtained through the quantized ThresholdReLU activation function is 0; dequantizing this output according to formula 2' gives floating point output data 0. In contrast, when the same datum 4.3 is activated using the unquantized ThresholdReLU activation function, according to formula 1, since 4.3 is greater than the threshold parameter 4.1, the output data is 4.3. The floating point output obtained through the fixed point operation thus differs greatly from that obtained through the floating point operation, which affects the face recognition result.
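The mismatch in this example can be reproduced with a few lines of Python (a self-contained illustration using only built-ins and the numbers above):

```python
s, a = 2.0, 4.1                # first quantization parameter and threshold parameter
a_q = round(a / s)             # fixed point threshold: round(2.05) = 2
x = 4.3                        # right same-value data in interval l3, x > a
x_q = round(x / s)             # fixed point input: round(2.15) = 2, equal to a_q
y_q = x_q if x_q > a_q else 0  # formula 4 yields 0
print(y_q * s)                 # dequantized fixed point output: 0.0
print(x if x > a else 0.0)     # unquantized formula 1 output: 4.3
```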
Therefore, the present application provides a data processing method for a neural network model in which the quantization parameter is adjusted so that the maximum value of the new same-value data range is the threshold parameter a; consequently, the fixed point value obtained by quantizing the original right same-value data according to the adjusted quantization parameter is greater than the fixed point value obtained by quantizing the threshold parameter according to the adjusted quantization parameter. It will be appreciated that after the quantization parameter is changed, the fixed point value obtained from floating point data according to the changed quantization parameter may be the same as or different from the fixed point value obtained according to the original quantization parameter. For the same threshold parameter a, different quantization parameters correspond to different same-value data ranges.
It should be noted that when the neural network model is applied to video, voice, image, or text, the input data of the activation operation during operation of the model may be the corresponding data or intermediate data obtained from it, and the output data of the activation operation is data for processing that modality. For example, in the face recognition scenario, the input data of the activation operation during operation of the face recognition model may be face image data or intermediate data obtained based on the face image data, and the output data of the activation operation is data for face image processing. When the neural network model is applied to audio and video processing, the input data of the activation operation may be audio and video data or intermediate data obtained based on the audio and video data; for example, in a smart home voice recognition scenario, the input data of the activation operation during operation of the voice recognition model may be voice data or intermediate data obtained based on the voice data, and the output data of the activation operation is data for voice processing. When the neural network model is applied to word processing, the input data of the activation operation may be text data or intermediate data obtained based on the text data, and the output data of the activation operation is data for text processing, for example in a character recognition scenario.
For example, continuing the example above, the quantization parameter S in symmetric quantization is adjusted to obtain a new quantization parameter S', so that when the right same-value data in the interval l3 shown in fig. 4 is quantized according to the adjusted quantization parameter S', the resulting fixed point input data is greater than the fixed point threshold parameter aq obtained by quantizing the threshold parameter a according to the adjusted quantization parameter.
Specifically, the original quantization parameter (hereinafter referred to as the first quantization parameter) is obtained first, and the threshold parameter a is quantized according to the first quantization parameter to obtain the first fixed point threshold parameter aq1. According to the first fixed point threshold parameter aq1 and the threshold parameter a, the first quantization parameter is adjusted in a preset manner to obtain an adjusted quantization parameter (hereinafter referred to as the second quantization parameter), and the threshold parameter a is quantized according to the second quantization parameter to obtain the second fixed point threshold parameter aq2. The maximum value of the same-value data range corresponding to the second quantization parameter is the threshold parameter a; that is, the right same-value data range is empty, so the fixed point value obtained by quantizing the original right same-value data according to the second quantization parameter is necessarily greater than the second fixed point threshold parameter aq2. The quantized ThresholdReLU activation function is then determined according to the second fixed point threshold parameter aq2.
For example, in the example above, the threshold parameter 4.1 is less than the maximum value 5 of the same-value data range. The first quantization parameter S is adjusted according to the first fixed point threshold parameter aq1 of value 2 and the threshold parameter a of value 4.1 to obtain the second quantization parameter S', such that quantizing the threshold parameter a according to the second quantization parameter yields the second fixed point threshold parameter aq2, and the maximum value of the same-value data range corresponding to the second quantization parameter is the threshold parameter value 4.1. When 4.3 is quantized using the second quantization parameter S', the resulting fixed point value is greater than the second fixed point threshold parameter. The quantized ThresholdReLU activation function is then determined according to the second fixed point threshold parameter aq2.
Therefore, in actual operation of the model, after floating point data to be subjected to the ThresholdReLU activation operation is received, the floating point data is quantized using the second quantization parameter to obtain fixed point data, and the fixed point data is used as fixed point input data of the ThresholdReLU activation function, thereby obtaining the fixed point output data of the activation operation.
Thus, in the technical scheme of the present application, when the quantized ThresholdReLU activation function is used to activate floating point data during operation of the face recognition model, the situation is avoided in which the output obtained with the unquantized ThresholdReLU activation function is not 0 while the floating point value corresponding to the fixed point output obtained with the quantized ThresholdReLU activation function is 0, so the accuracy of the operation result of the model is improved.
For example, in the example above, for the floating point datum 4.3 to be subjected to the ThresholdReLU activation operation, the fixed point output data obtained using the quantized ThresholdReLU activation function derived from the first quantization parameter is 0. Since the fixed point value that 4.3 yields under the second quantization parameter is greater than the second fixed point threshold parameter aq2, the fixed point output data obtained using the quantized ThresholdReLU activation function derived from the second quantization parameter is not 0, thereby improving the accuracy of the model operation result.
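Continuing the illustration, the adjusted (second) quantization parameter removes the mismatch; the value S' = 1.64 used here comes from formula 5, introduced in step S503 below (again an illustration, not the patent's code):

```python
a, a_q1 = 4.1, 2                    # threshold parameter and first fixed point threshold
s2 = a / (a_q1 + 0.5)               # second quantization parameter S' = 1.64
a_q2 = round(a / s2)                # second fixed point threshold: round(2.5) = 2
x_q2 = round(4.3 / s2)              # fixed point input: round(2.62...) = 3 > a_q2
y_q2 = x_q2 if x_q2 > a_q2 else 0   # formula 4 now passes the value through
print(y_q2 * s2)                    # dequantized output ~= 4.92, no longer 0
```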
It can be understood that ensuring, by adjusting the quantization parameter, that the fixed point number obtained by quantizing floating point data greater than the threshold parameter is greater than the fixed point threshold parameter obtained by quantizing the floating point threshold parameter is not limited to symmetric quantization using the round() rounding function; symmetric quantization using other rounding functions that achieves this effect also falls within the protection scope of the present application.
It should be noted that, the quantized ThresholdReLU activation function in the neural network model may be determined by the terminal 100, or the quantized ThresholdReLU activation function in the neural network model may be determined by the server 200. The terminal 100 includes, but is not limited to, one of a cell phone, a tablet computer, a smart screen, a wearable device (e.g., a wristwatch, a bracelet, a helmet, an earphone, etc.), a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), etc. electronic devices. The server 200 may be a single server or a server cluster composed of a plurality of servers.
The process of obtaining the quantized ThresholdReLU activation function will be described below with reference to the face recognition scenario shown in fig. 3, taking the example that the server 200 determines the quantized activation function in the neural network model. Specifically, referring to the flowchart shown in fig. 5, a process for obtaining a quantized ThresholdReLU activation function using an adjusted quantization parameter according to an embodiment of the present application includes the following steps:
s501, determining a first quantization parameter based on floating point input data for performing an activation operation in the operation process of the neural network model.
It will be appreciated that the floating point input data for the activation operation during operation of the neural network model may be video data, audio data, image data, text data, or data associated therewith, etc. Taking the application of the neural network model in the aspect of video as an example, floating point input data for performing an activation operation in the operation process of the neural network model is video data or other intermediate data obtained based on the video data, such as feature data, vectors and the like; when the neural network model is applied to other aspects, floating point input data for performing an activation operation in the operation process of the neural network model is also similar to the floating point input data, and will not be described herein.
In some implementations, when the activation operation is performed during neural network model operation, the first quantization parameter is obtained from the input data range of the ThresholdReLU activation operation in the floating point domain, the value range corresponding to the data type of the fixed point input data of the ThresholdReLU activation operation in the fixed point domain, and the quantization parameter formula. The input data range of the ThresholdReLU activation operation in the floating point domain is determined during neural network model training; the data type of the fixed point input data of the ThresholdReLU activation operation in the fixed point domain may be int32, int16, int8, int4, uint32, uint16, uint8, or uint4.
For example, in the scenario shown in fig. 3, the server 200 performs multiple rounds of model training in advance using the unquantized face recognition model, obtaining a maximum value of 254 and a minimum value of -6 for the floating point input data of the ThresholdReLU activation operation in the face recognition model during training, i.e., a floating point input data range of [-6, 254]. The data type of the fixed point input data of the ThresholdReLU activation operation in the fixed point domain is determined to be int8, whose value range is [-128, 127]. Substituting the floating point input data maximum value 254 and minimum value -6 and the fixed point input data maximum value 127 into formula 3 gives the first quantization parameter S of 2.
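As an illustration of S501 with these numbers (int8, observed floating point range [-6, 254]; the variable names are ours):

```python
INT8_MAX = 127                                 # Qmax for the int8 type
x_min, x_max = -6.0, 254.0                     # range observed during training
s1 = max(abs(x_min), abs(x_max)) / INT8_MAX    # formula 3
assert s1 == 2.0                               # the first quantization parameter
```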
S502, according to the first quantization parameter, the threshold parameter a is quantized to obtain a first fixed-point threshold parameter aq1.
In some implementations, substituting the first quantization parameter into the symmetric quantization formula results in a first fixed point threshold parameter aq1 corresponding to the threshold parameter a.
For example, the threshold parameter a is 4.1, and the first quantization parameter value 2 obtained in the above step is substituted into the symmetric quantization formula 2 to obtain a value of round (4.1/2) =2 for the first fixed-point threshold parameter aq1 corresponding to the threshold parameter a with the value of 4.1.
For example, the threshold parameter a is 5.8, and the first quantization parameter value 2 obtained in the above step is substituted into the symmetric quantization formula 2 to obtain a value of round (5.8/2) =3 for the first fixed-point threshold parameter aq1 corresponding to the threshold parameter a with a value of 5.8.
And S503, adjusting the first quantization parameter according to the first fixed-point threshold parameter aq1 and the threshold parameter a to obtain a second quantization parameter, wherein the maximum value of the same fixed-value data range corresponding to the second quantization parameter is the threshold parameter a.
It will be appreciated that, in some implementations, the second quantization parameter may be obtained according to the parity of the first fixed point threshold parameter aq1 and the quantization parameter formula corresponding to that parity; in other implementations, the second quantization parameter may be obtained directly from a unified quantization parameter formula without considering the parity of the first fixed point threshold parameter aq1.
In some implementations, when the obtained value of the first fixed point threshold parameter aq1 is even, the first fixed point threshold parameter aq1 and the threshold parameter a are substituted into the following formula 5 to obtain the second quantization parameter, so that the maximum value of the same-value data range corresponding to the second quantization parameter is the threshold parameter a. For convenience of explanation, formula 5, used to obtain the second quantization parameter when the value of the first fixed point threshold parameter aq1 is even, is referred to as the even quantization parameter formula.
S' = a / (aq1 + 0.5) (formula 5)
In formula 5, S' is the second quantization parameter, aq1 is the first fixed point threshold parameter, and a is the threshold parameter.
For example, in fig. 6a, the threshold parameter a is 4.1 and the first fixed point threshold parameter aq1 has the value 2, which is even; substituting aq1 = 2 and a = 4.1 into formula 5 yields the second quantization parameter S' = 4.1/(2 + 0.5) = 1.64. The correspondence between the floating point input data x and the fixed point input data xq obtained according to the second quantization parameter S' of value 1.64 is shown in fig. 7a.
In some implementations, when the obtained value of the first fixed-point threshold parameter aq1 is odd, the first fixed-point threshold parameter aq1 and the threshold parameter a are substituted into the following formula 6 to obtain the second quantization parameter S ', so that the maximum value of the same fixed-value data range corresponding to the second quantization parameter S' is the threshold parameter a. For convenience of explanation, equation 6 used to obtain the second quantization parameter in the case where the value of the first fixed point threshold parameter aq1 is even is referred to as an odd quantization parameter equation.
S' = a/(aq1 - 0.5)   (formula 6)
In formula 6, S' is the second quantization parameter, aq1 is the first fixed-point threshold parameter, and a is the threshold parameter.
For example, in fig. 6b, the threshold parameter a is 5.8 and the first fixed-point threshold parameter aq1 is 3, an odd number; substituting aq1 = 3 and a = 5.8 into formula 6 yields a second quantization parameter of S' = 5.8/(3-0.5) = 2.32. The correspondence between the floating point input data x and the fixed point input data xq obtained under the second quantization parameter S' = 2.32 is shown in fig. 7b.
It can be appreciated that, in other implementations, the first fixed-point threshold parameter aq1 and the threshold parameter a are substituted into formula 7 below to obtain the second quantization parameter S', so that the maximum value of the floating point data range that quantizes to the same fixed-point value as the threshold is the threshold parameter a. For convenience of explanation, this formula, which does not depend on the parity of the value of the first fixed-point threshold parameter aq1, is simply referred to as the quantization parameter formula.
S' = a/(aq1 - mod(aq1, 2) + 0.5)   (formula 7)
In formula 7, S' is the second quantization parameter, aq1 is the first fixed-point threshold parameter, a is the threshold parameter, and mod(aq1, 2) is the remainder function, that is, the remainder obtained when aq1 (the dividend) is divided by 2 (the divisor).
For example, in fig. 6a, the threshold parameter a is 4.1 and the first fixed-point threshold parameter aq1 is 2; substituting aq1 = 2 and a = 4.1 into formula 7, with mod(2, 2) = 0, yields S' = 4.1/(2-0+0.5) = 1.64. The correspondence between the floating point input data x and the fixed point input data xq obtained under the second quantization parameter S' = 1.64 is shown in fig. 7a.
For example, in fig. 6b, the threshold parameter a is 5.8 and the first fixed-point threshold parameter aq1 is 3; substituting aq1 = 3 and a = 5.8 into formula 7, with mod(3, 2) = 1, yields S' = 5.8/(3-1+0.5) = 2.32. The correspondence between the floating point input data x and the fixed point input data xq obtained under the second quantization parameter S' = 2.32 is shown in fig. 7b.
It will be appreciated that, for the same first fixed-point threshold parameter aq1 and threshold parameter a, the odd/even quantization parameter formulas and the quantization parameter formula yield the same value of the second quantization parameter.
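As an illustrative sketch of this equivalence (hypothetical Python, with the % operator standing in for the mod function of formula 7):

    def even_quant_param(a, aq1):
        # Even quantization parameter formula (formula 5)
        return a / (aq1 + 0.5)

    def odd_quant_param(a, aq1):
        # Odd quantization parameter formula (formula 6)
        return a / (aq1 - 0.5)

    def quant_param(a, aq1):
        # Quantization parameter formula (formula 7); aq1 % 2 is mod(aq1, 2)
        return a / (aq1 - aq1 % 2 + 0.5)

    # Even case (fig. 6a) and odd case (fig. 6b), up to floating point rounding:
    print(even_quant_param(4.1, 2), quant_param(4.1, 2))  # 1.64 1.64
    print(odd_quant_param(5.8, 3), quant_param(5.8, 3))   # 2.32 2.32

For an even aq1, mod(aq1, 2) = 0 and formula 7 reduces to formula 5; for an odd aq1, mod(aq1, 2) = 1 and it reduces to formula 6.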
S504, quantizing the threshold parameter a according to the second quantization parameter to obtain a second fixed-point threshold parameter aq2.
In some implementations, the second quantization parameter obtained according to the quantization parameter formula, or the second quantization parameter obtained according to the odd/even quantization parameter formula, is substituted into the symmetric quantization formula 2, and the threshold parameter a is quantized to obtain the second fixed-point threshold parameter aq2.
For example, when the threshold parameter a is 4.1, the second quantization parameter obtained in the above step from the quantization parameter formula or the even quantization parameter formula is 1.64; substituting it into symmetric quantization formula 2 gives a second fixed-point threshold parameter aq2 of round(4.1/1.64) = 2. Fig. 7a shows that when the threshold parameter a is 4.1, the corresponding second fixed-point threshold parameter aq2 is 2, and the maximum value of the floating point input data range whose quantized fixed-point value equals aq2 is 4.1; that is, under the second quantization parameter of 1.64, the maximum value of the floating point data range that quantizes to the same fixed-point value as the threshold is 4.1.
For another example, when the threshold parameter a is 5.8, the second quantization parameter obtained in the above step from the quantization parameter formula or the odd quantization parameter formula is 2.32; substituting it into symmetric quantization formula 2 gives a second fixed-point threshold parameter aq2 of round(5.8/2.32) = 2. Fig. 7b shows that when the threshold parameter a is 5.8, the corresponding second fixed-point threshold parameter aq2 is 2, and the maximum value of the floating point input data range whose quantized fixed-point value equals aq2 is 5.8; that is, under the second quantization parameter of 2.32, the maximum value of the floating point data range that quantizes to the same fixed-point value as the threshold is 5.8.
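A minimal sketch of step S504 under the same assumptions (second_fixed_point_threshold is a hypothetical name): by construction, a/S' lands exactly on a half-way point (aq1 + 0.5 in the even case, aq1 - 0.5 in the odd case), and resolving the tie to the nearest even integer, as Python's round does, reproduces the value aq2 = 2 from both examples above.

    def second_fixed_point_threshold(a, s_prime):
        # Step S504: quantize the floating point threshold parameter a
        # with the second quantization parameter S' (formula 2).
        # Exact halves resolve to the nearest even integer, matching
        # the worked examples in the text.
        return round(a / s_prime)

    print(second_fixed_point_threshold(4.1, 1.64))  # 2
    print(second_fixed_point_threshold(5.8, 2.32))  # 2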
S505, obtaining, according to the second fixed-point threshold parameter aq2, a quantized ThresholdReLU activation function for performing the activation operation in the operation process of the neural network model.
It can be appreciated that during the operation of the neural network model, the quantized ThresholdReLU activation functions are used to perform an activation operation on the data to be activated.
In some implementations, a second fixed point threshold parameter aq2 obtained from the second quantization parameter is substituted into equation 4 to obtain the quantized ThresholdReLU activation function.
For example, when the threshold parameter a is 4.1, the second fixed-point threshold parameter aq2 obtained in the above example is 2; substituting aq2 = 2 into equation 4 yields the quantized ThresholdReLU activation function shown in equation 8 below. The neural network model performs the activation operation on the data to be activated using the operation logic of equation 8 during operation.
yq = xq, if xq > 2; yq = 0, otherwise   (equation 8)
For another example, when the threshold parameter a is 5.8, the second fixed-point threshold parameter aq2 obtained in the above example is also 2; substituting aq2 = 2 into equation 4 yields the quantized ThresholdReLU activation function shown in equation 9 below. The neural network model performs the activation operation on the data to be activated using the operation logic of equation 9 during operation.
yq = xq, if xq > 2; yq = 0, otherwise   (equation 9)
In equations 8 and 9, xq is the fixed point input data and yq is the fixed point output data.
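For illustration (hypothetical Python, not the patent's own notation), the quantized ThresholdReLU of equations 8 and 9 reduces to a single fixed-point comparison, so no floating point arithmetic is needed at activation time:

    def quantized_threshold_relu(xq, aq2):
        # Quantized ThresholdReLU: pass the fixed-point input through
        # unchanged when it exceeds the second fixed-point threshold
        # parameter, otherwise output 0.
        return xq if xq > aq2 else 0

    # Equations 8 and 9 both have aq2 = 2:
    print(quantized_threshold_relu(3, 2))  # 3 (passed through)
    print(quantized_threshold_relu(2, 2))  # 0 (suppressed)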
It should be understood that the execution sequence of steps S501 to S505 described above is only one example of the method for obtaining the quantized ThresholdReLU activation function; in other embodiments, other execution sequences may be adopted, and some steps may be split or combined, which is not limited herein.
It may be appreciated that, after the quantized ThresholdReLU activation function is obtained, it is used during the operation of the neural network model to perform the ThresholdReLU activation operation and obtain fixed point output data. The fixed point output data may be data obtained by performing the ThresholdReLU activation operation on video data, audio data, image data, text data, and so on. For example, when a face recognition model processes a face image in the scene shown in fig. 3, the fixed point output data may be image data obtained by performing the ThresholdReLU activation operation on the face image data.
To better understand the technical solution of the embodiments of the present application, the following describes an example of performing the activation operation while the terminal 100 processes data during operation of the neural network model. Taking the flowchart shown in fig. 8 as an example, the server 200 creates the quantized ThresholdReLU activation function using the flow shown in fig. 5 and sends it to the terminal 100. When the terminal 100 runs the deployed neural network model and the model processes data, a fixed point output result is obtained from the floating point input data to be subjected to the ThresholdReLU activation operation, using the determined quantized ThresholdReLU activation function. The specific process includes the following steps:
S801, receiving floating point data to be subjected to the ThresholdReLU activation operation in the operation process of the neural network model.
It is understood that the floating point data to be subjected to the ThresholdReLU activation operation in the operation process of the neural network model may be floating point video data, audio data, image data, text data, or data related thereto.
For example, in the face recognition scenario shown in fig. 3, floating point data to be subjected to ThresholdReLU activation operations in the running process of the received face recognition model is image data, and the value of the floating point data is 4.3.
S802, quantizing the floating point data to obtain a fixed point number, wherein floating point data whose value is larger than the threshold parameter quantizes to a fixed point number larger than the quantized threshold parameter used in the ThresholdReLU activation function.
It is understood that the fixed point number obtained by quantizing the floating point data is fixed point video data, audio data, image data, text data, or the like.
For example, in the process of obtaining the quantized ThresholdReLU activation function shown in fig. 5, the threshold parameter a is 4.1 and the corresponding second quantization parameter is 1.64; quantizing the floating point image data with the value 4.3 from step S801 according to symmetric quantization formula 2 gives a fixed point number of round(4.3/1.64) = 3, that is, fixed point image data with the value 3.
S803, taking the obtained fixed point number as fixed point input data of the quantized ThresholdReLU activation function, to obtain the fixed point output data of the activation operation in the operation process of the neural network model.
It can be understood that in the operation process of the neural network model, fixed point data such as video, audio, image, and text data are used as the fixed point input data of the quantized ThresholdReLU activation function, and the activation operation is performed to obtain the fixed point output data of the activation operation. The fixed point output data is likewise video data, audio data, image data, text data, and so on: when the input data of the activation operation is image data, the obtained fixed point output data is also image data; when it is video data, the output is video data; when it is audio data, the output is audio data; and when it is text data, the output is text data.
For example, according to the flow shown in fig. 5, the quantized ThresholdReLU activation function corresponding to the threshold parameter a of 4.1 is given by equation 8. Substituting the fixed point image data value 3 into equation 8, the value 3 is greater than the second fixed-point threshold parameter aq2 value 2, so the fixed point output data yq is 3, and the value of the image data obtained by performing the activation operation is 3.
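Steps S802 and S803 can be combined into one illustrative sketch (hypothetical Python; activate is not a name from the patent), using the values from this example (S' = 1.64, aq2 = 2):

    def activate(x, s_prime, aq2):
        # S802: quantize the floating point input with the second
        # quantization parameter (symmetric quantization formula 2).
        xq = round(x / s_prime)
        # S803: apply the quantized ThresholdReLU activation function.
        return xq if xq > aq2 else 0

    print(activate(4.3, 1.64, 2))  # round(4.3/1.64) = 3 > 2 -> 3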
It should be understood that the execution sequence of steps S801 to S803 is merely an example, and in other embodiments, other execution sequences may be adopted, and partial steps may be split or combined, which is not limited herein.
It will be appreciated that the neural network model deployed to the terminal 100 may also be run by the server 200.
FIG. 9 illustrates a schematic diagram of a data processing apparatus for a neural network model, according to some embodiments of the application. As shown, the apparatus includes:
The first determining unit is used for determining floating point data to be processed in the neural network model;
The first acquisition unit is used for quantizing the determined floating point data through a second quantization parameter to obtain fixed point numbers, wherein the second quantization parameter is determined according to a floating point threshold parameter;
The second obtaining unit is used for obtaining fixed-point output data by taking the obtained fixed point number as fixed point input data of the ThresholdReLU activation operation included in the neural network model, wherein the ThresholdReLU activation operation is associated with the second quantization parameter.
In order to facilitate understanding of the technical solutions of the embodiments of the present application, the following describes the hardware structure of the terminal 100.
Further, fig. 10 illustrates a schematic structural diagram of a terminal 100 according to some embodiments of the present application. As shown in fig. 10, terminal 100 includes one or more processors 101, a system Memory 102, a Non-Volatile Memory (NVM) 103, a communication interface 104, input/output (I/O) devices 105, system control logic 106, and instructions 107.
Wherein: the processor 101 may include one or more processing units, for example, a processing module or processing circuit, which may include a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a micro-programmed control unit (MCU), an artificial intelligence (AI) processor, or a field programmable gate array (FPGA), and may include one or more single-core or multi-core processors. Among other things, AI processors include neural network processing units (NPUs), the Zhouyi AIPU, and the like.
The system Memory 102 is a volatile Memory such as Random-Access Memory (RAM), double data rate synchronous dynamic Random Access Memory (Double Data Rate Synchronous Dynamic Random Access Memory, DDR SDRAM), or the like. The system memory is used to temporarily store data and/or instructions, for example, in some embodiments, the system memory 102 may be used to store instructions related to performing neural network model data processing methods, and the like, as described above.
Nonvolatile memory 103 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the non-volatile memory 103 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as a hard disk drive (HDD), compact disc (CD), digital versatile disc (DVD), solid state drive (SSD), and the like. In some embodiments, the nonvolatile memory 103 may also be a removable storage medium, such as a secure digital (SD) memory card or the like.
In particular, the system memory 102 and the nonvolatile memory 103 may each include a temporary copy and a permanent copy of the instructions 107. The instructions 107 may include instructions that, when executed by at least one of the processors 101, cause the terminal 100 to implement the data processing method of the neural network model provided by the embodiments of the present application.
The communication interface 104 may include a transceiver to provide a wired or wireless communication interface for the terminal 100 to communicate with any other suitable device via one or more networks. In some embodiments, the communication interface 104 may be integrated with other components of the terminal 100, e.g., the communication interface 104 may be integrated in the processor 101. In some embodiments, terminal 100 may communicate with other devices via communication interface 104.
Input/output (I/O) devices 105 may include input devices such as a keyboard, mouse, etc., output devices such as a display, etc., through which a user may interact with terminal 100.
The system control logic 106 may include any suitable interface controller to provide any suitable interface with other modules of the terminal 100. For example, in some embodiments, the system control logic 106 may include one or more memory controllers to provide an interface to the system memory 102 and the non-volatile memory 103.
In some embodiments, at least one of the processors 101 may be packaged together with logic for one or more controllers of the system control logic 106 to form a system in package (SiP). In other embodiments, at least one of the processors 101 may also be integrated on the same chip with logic for one or more controllers of the system control logic 106 to form a system on chip (SoC).
It is understood that terminal 100 may be any electronic device capable of running a neural network model, including, but not limited to, a cell phone, a wearable device (e.g., a smart watch), a tablet, a desktop, a laptop, a handheld computer, a notebook, an ultra-mobile personal computer (UMPC), a netbook, a cellular telephone, a personal digital assistant (PDA), or an augmented reality (AR)/virtual reality (VR) device; embodiments of the present application are not limited in this respect.
It is to be understood that the configuration of terminal 100 shown in fig. 10 is merely an example, and in other embodiments, terminal 100 may include more or fewer components than shown, or certain components may be combined, certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Embodiments of the disclosed mechanisms may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as a computer program or program code that is executed on a programmable system comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of the present application, a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), microcontroller, application specific integrated circuit (ASIC), or microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. Program code may also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in the present application are not limited in scope by any particular programming language. In either case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed over a network or through other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including but not limited to floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some structural or methodological features may be shown in a particular arrangement and/or order. However, it should be understood that such a particular arrangement and/or ordering may not be required. Rather, in some embodiments, these features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of structural or methodological features in a particular figure is not meant to imply that such features are required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the embodiments of the present application, each unit/module mentioned in each device is a logic unit/module. Physically, one logic unit/module may be one physical unit/module, may be part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules; the physical implementation of the logic unit/module itself is not essential, and the combination of functions implemented by these logic units/modules is the key to solving the technical problem posed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above device embodiments do not introduce units/modules that are less closely related to solving the technical problem posed by the present application, which does not mean that no other units/modules exist in the above device embodiments.
It should be noted that, in the examples and descriptions of this patent, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
While the application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the application.

Claims (11)

1. A data processing method of a neural network model, applied to an electronic device, comprising:
Determining floating point data to be processed in a neural network model;
Quantizing the floating point data through a post quantization parameter to obtain fixed point numbers, wherein the post quantization parameter is determined according to a preset floating point threshold parameter;
the fixed point number is used as fixed point input data of a first threshold correction linear unit activation operation included in the neural network model, fixed point output data is obtained, and the first threshold correction linear unit activation operation is associated with the post-quantization parameter;
wherein floating point type data to be processed in the neural network model is obtained based on at least one data of image data, audio data, text data and video data,
The post-quantization parameter is determined according to a preset floating point threshold parameter, and comprises the following steps:
Obtaining the post-quantization parameter based on a quantization parameter formula, or obtaining the post-quantization parameter based on an odd quantization parameter formula and an even quantization parameter formula, and obtaining the post-quantization parameter according to the odd quantization parameter formula when a first fixed point threshold parameter is odd, obtaining the post-quantization parameter according to the even quantization parameter formula when the first fixed point threshold parameter is even, wherein the first fixed point threshold parameter is a fixed point number obtained by quantizing the floating point threshold parameter based on a previous quantization parameter;
The quantization parameter formula is:
S' = a/(aq1 - mod(aq1, 2) + 0.5)
wherein S' is the post-quantization parameter, a is the floating point threshold parameter, aq1 is the first fixed point threshold parameter, and mod(aq1, 2) is the remainder function with aq1 as the dividend and 2 as the divisor;
the odd quantization parameter formula is:
S' = a/(aq1 - 0.5);
the even quantization parameter formula is:
S' = a/(aq1 + 0.5).
2. The method of claim 1, wherein the pre-quantization parameter is determined by a method comprising,
And determining the front quantization parameter according to the range of floating point data to be processed in the neural network model and the data type of the fixed point number.
3. The method of claim 1, wherein the first threshold modifying linear unit activation operation is determined based on the post-quantization parameter and the floating point threshold parameter.
4. A method according to claim 1 or 3, wherein the first threshold modifying linear unit activation operation is determined by a method comprising,
Quantizing the floating point threshold parameter according to the post quantization parameter to obtain a second fixed point threshold parameter;
And obtaining a threshold correction linear unit activation function for performing the first threshold correction linear unit activation operation according to the second fixed-point threshold parameter.
5. The method of claim 2, wherein the range of floating point type data to be processed in the neural network model is determined during operation of the neural network model.
6. The method of claim 2, wherein the fixed point number data types include at least one of: int32, int16, int8, int4, uint32, uint16, uint8, or uint4.
7. The method of claim 1, wherein the quantization is symmetric quantization.
8. A data processing apparatus of a neural network model for performing the data processing method of a neural network model as claimed in any one of claims 1 to 7, comprising,
The first determining unit is used for determining floating point type data to be processed in the neural network model;
The first acquisition unit is used for quantizing the floating point type data through a post quantization parameter to obtain fixed point numbers, wherein the post quantization parameter is determined according to a preset floating point threshold parameter;
The second obtaining unit is used for taking the fixed point number as fixed point input data of a first threshold correction linear unit activation operation included in the neural network model to obtain fixed point output data, wherein the first threshold correction linear unit activation operation is associated with the post-quantization parameter;
The floating point type data to be processed in the neural network model is obtained based on at least one of image data, audio data, text data and video data.
9. A computer readable storage medium having stored thereon instructions for performing the method of any of claims 1-7 on an electronic device.
10. A computer program product, characterized in that the computer program product comprises instructions, which instructions, when executed by one or more processors, are adapted to carry out the method of any one of claims 1-7.
11. An electronic device, comprising:
a memory for storing instructions, an
One or more processors that, when executed by the one or more processors, perform the method of any of claims 1-7.