CN114698395A - Method and device for quantification of neural network model, method and device for data processing - Google Patents
- Publication number
- CN114698395A CN114698395A CN202080016479.1A CN202080016479A CN114698395A CN 114698395 A CN114698395 A CN 114698395A CN 202080016479 A CN202080016479 A CN 202080016479A CN 114698395 A CN114698395 A CN 114698395A
- Authority
- CN
- China
- Prior art keywords
- operator
- data
- input data
- training
- quantized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
A neural network model quantization method and apparatus, and a data processing method and apparatus, belonging to the field of artificial intelligence. The original neural network model comprises a first operator, a second operator and a first operation module, where the first operation module operates on the output of the first operator and the output of the second operator. The quantization method comprises: determining a data quantization parameter according to the range of first training input data of the first operator and the range of second training input data of the second operator; and determining a quantized neural network model that uses the data quantization parameter to quantize the first input data of the quantized first operator and the second input data of the quantized second operator, respectively. Because both inputs share a single quantization parameter, the processing result of the quantized first operator and the processing result of the quantized second operator can be operated on directly, thereby improving both the data processing precision and the data processing efficiency of the neural network model.
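The core idea in the abstract, deriving one quantization scale from the combined range of two operators' inputs so their integer outputs can be combined directly, can be sketched as follows. This is an illustrative reading, not the patent's implementation; the function names (`shared_scale`, `quantize`) and the symmetric int8 scheme are assumptions.

```python
import numpy as np

def shared_scale(a, b, num_bits=8):
    """Derive ONE quantization scale from the combined range of two tensors.

    Because both operator inputs are quantized with the same scale, their
    integer results live on the same grid and can be added directly,
    without an intermediate rescaling step.
    """
    max_abs = max(np.abs(a).max(), np.abs(b).max())
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize(x, scale, num_bits=8):
    """Symmetric rounding quantization to a signed num_bits integer."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)

a = np.array([0.5, -1.2, 3.0])
b = np.array([2.0, -0.1, 0.7])
s = shared_scale(a, b)
qa, qb = quantize(a, s), quantize(b, s)
# Same scale => the integer sum dequantizes to (approximately) the float sum.
approx_sum = (qa.astype(np.int32) + qb.astype(np.int32)) * s
```

With per-operator scales the two integer outputs would sit on different grids and would have to be rescaled before the addition; the shared parameter is what removes that step.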
Description
PCT national-phase application; the description has been published.
Claims (18)
- A neural network model quantization method, comprising: obtaining an original neural network model, wherein the original neural network model comprises a first operator, a second operator and a first operation module, the first operator and the second operator are used for performing the same type of operation, and the first operation module is used for performing a first operation on the output of the first operator and the output of the second operator; determining a data quantization parameter according to a range of first training input data and a range of second training input data, wherein the first training input data is input data of the first operator, and the second training input data is input data of the second operator; and determining a quantized neural network model according to the original neural network model, wherein the quantized neural network model comprises a quantization module, a third operator, a fourth operator and a second operation module, the quantization module is used for quantizing first input data of the third operator and second input data of the fourth operator respectively by using the data quantization parameter, the third operator is the quantized first operator, the fourth operator is the quantized second operator, and the second operation module is used for performing the first operation.
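The structure claim 1 describes, a quantization module feeding two quantized operators whose outputs meet in a second operation module, can be mocked up in a few lines. Everything here is illustrative: the class name, the toy linear operators, and the choice of elementwise addition as the "first operation" are assumptions, not taken from the patent.

```python
import numpy as np

class QuantizedModel:
    """Minimal stand-in for the quantized network of claim 1.

    original:  y = op1(x1) + op2(x2)
    quantized: y = dequant( op3(quant(x1)) + op4(quant(x2)) )

    `w3`/`w4` play the role of the third/fourth operator's parameters and
    `data_scale` is the single shared data quantization parameter.
    """
    def __init__(self, w3, w4, data_scale):
        self.w3, self.w4 = w3, w4
        self.data_scale = data_scale

    def quant(self, x):                       # quantization module
        return np.round(x / self.data_scale).astype(np.int32)

    def forward(self, x1, x2):
        y3 = self.quant(x1) * self.w3         # third operator (toy scaling op)
        y4 = self.quant(x2) * self.w4         # fourth operator, same op type
        return (y3 + y4) * self.data_scale    # second operation + dequantize
```

The key structural point is that `y3` and `y4` are combined as raw integers; only the final result is scaled back to floating point.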
- The method of claim 1, further comprising: acquiring preset training output data corresponding to a training input data set, wherein the training input data set comprises the first training input data and the second training input data; quantizing the first training input data and the second training input data respectively by using the data quantization parameter; processing the quantized first training input data and the quantized second training input data by using the quantized neural network model to obtain actual training output data; and adjusting the data quantization parameter according to the difference between the actual training output data and the preset training output data to minimize the difference; wherein the quantization module is configured to quantize the first input data of the third operator and the second input data of the fourth operator respectively by using the adjusted data quantization parameter.
- The method of claim 2, further comprising: determining an operator quantization parameter according to the parameter range of the first operator and the parameter range of the second operator; quantizing the parameter of the first operator by using the operator quantization parameter to obtain a parameter of the third operator; and quantizing the parameter of the second operator by using the operator quantization parameter to obtain a parameter of the fourth operator.
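Claim 3 applies the same shared-parameter idea to the operators' own weights: one operator quantization parameter derived from both parameter ranges. A sketch under the same symmetric-int8 assumption as before (rounding scheme and names are not from the patent):

```python
import numpy as np

def quantize_operator_params(w1, w2, num_bits=8):
    """Quantize two operators' parameters with ONE shared scale (claim 3).

    The scale is taken from the combined parameter range, so the third and
    fourth operators' integer outputs share a common grid, matching the
    shared data quantization parameter on the input side.
    """
    max_abs = max(np.abs(w1).max(), np.abs(w2).max())
    qmax = 2 ** (num_bits - 1) - 1
    scale = max_abs / qmax if max_abs else 1.0
    q = lambda w: np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q(w1), q(w2), scale   # parameters of the third and fourth operator
```

The trade-off mirrors the data side: the operator with the smaller range loses some resolution, but no per-operator rescaling is needed before the first operation.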
- The method according to any one of claims 1 to 3, wherein the quantized neural network model further comprises a compression module, the compression module is configured to compress the output of the third operator and the output of the fourth operator respectively according to an offset parameter, the offset parameter indicates the position, in the data before compression, of the highest bit of the compressed data, and the second operation module is configured to perform the first operation on the compressed data; the method further comprising: quantizing the first training input data and the second training input data respectively by using the data quantization parameter; processing the quantized first training input data by using the third operator, the third operator outputting first training operation data; processing the quantized second training input data by using the fourth operator, the fourth operator outputting second training operation data; and determining the offset parameter according to the significand of the first training operation data and the significand of the second training operation data.
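One plausible reading of claim 4: the wide accumulator outputs of the two operators are narrowed before the first operation, and the offset parameter records where the kept window of bits sits inside the pre-compression data, chosen from the significant bits observed on training data. The helper names and the 8-bit window are assumptions.

```python
import numpy as np

def significant_bits(x):
    """Bits needed to represent the magnitude of the largest value."""
    return int(np.abs(x).max()).bit_length()

def choose_offset(train_out3, train_out4, keep_bits=8):
    """Pick the shift so the top `keep_bits` significant bits survive.

    The same offset is used for both operators' outputs, determined from
    whichever training operation data needs more bits (claim 4's
    'significand of the first and second training operation data').
    """
    top = max(significant_bits(train_out3), significant_bits(train_out4))
    return max(top - keep_bits, 0)

def compress(acc, offset):
    """Drop the low `offset` bits of a wide accumulator output."""
    return acc >> offset
```

Because both outputs are shifted by the same offset, the second operation module can still combine them directly; only the `offset` low-order bits of precision are sacrificed.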
- A data processing method, comprising: obtaining a quantized neural network model, wherein the quantized neural network model is obtained by quantizing an original neural network model, the original neural network model comprises a first operator, a second operator and a first operation module, the first operator and the second operator are used for performing the same type of operation, and the first operation module is used for performing a first operation on the output of the first operator and the output of the second operator; and processing first input data of a third operator and second input data of a fourth operator by using the quantized neural network model, wherein the quantized neural network model comprises a quantization module, the third operator, the fourth operator and a second operation module, the quantization module is configured to quantize the first input data and the second input data by using a data quantization parameter, the second operation module is configured to perform the first operation, the third operator is the quantized first operator, the fourth operator is the quantized second operator, and the data quantization parameter is determined according to a range of first training input data of the first operator and a range of second training input data of the second operator.
- The method of claim 5, wherein the data quantization parameter is obtained by adjusting an initial data quantization parameter, the adjustment minimizing the difference between actual training output data and preset training output data; the initial data quantization parameter is determined from a range of the first training input data and a range of the second training input data; the preset training output data corresponds to a training input data set comprising the first training input data and the second training input data; and the actual training output data is obtained by processing the first training input data and the second training input data by using the quantized neural network model, the quantization module being used for quantizing the first training input data and the second training input data by using the initial data quantization parameter.
- The method according to claim 6, wherein the parameter of the third operator is obtained by quantizing the parameter of the first operator by using an operator quantization parameter, and the parameter of the fourth operator is obtained by quantizing the parameter of the second operator by using the operator quantization parameter; and the operator quantization parameter is determined according to the parameter range of the first operator and the parameter range of the second operator.
- The method according to any one of claims 5 to 7, wherein the quantized neural network model further comprises a compression module, the compression module is configured to compress the output of the third operator and the output of the fourth operator respectively according to an offset parameter, the offset parameter indicates the position, in the data before compression, of the highest bit of the compressed data, and the second operation module is configured to perform the first operation on the compressed data; and the offset parameter is determined according to a significand of first training operation data and a significand of second training operation data, the first training operation data being obtained by processing, using the third operator, first training input data quantized by the data quantization parameter, and the second training operation data being obtained by processing, using the fourth operator, second training input data quantized by the data quantization parameter.
- An apparatus for quantizing a neural network model, the apparatus comprising a storage module and a processing module, wherein the storage module is used for storing a program, and when the program is run in the processing module, the processing module is configured to: obtain an original neural network model, wherein the original neural network model comprises a first operator, a second operator and a first operation module, the first operator and the second operator are used for performing the same type of operation, and the first operation module is used for performing a first operation on the output of the first operator and the output of the second operator; determine a data quantization parameter according to a range of first training input data and a range of second training input data, wherein the first training input data is input data of the first operator, and the second training input data is input data of the second operator; and determine a quantized neural network model according to the original neural network model, wherein the quantized neural network model comprises a quantization module, a third operator, a fourth operator and a second operation module, the quantization module is used for quantizing first input data of the third operator and second input data of the fourth operator respectively by using the data quantization parameter, the third operator is the quantized first operator, the fourth operator is the quantized second operator, and the second operation module is used for performing the first operation.
- The apparatus of claim 9, wherein the processing module is further configured to: obtain preset training output data corresponding to a training input data set, wherein the training input data set includes the first training input data and the second training input data; quantize the first training input data and the second training input data respectively by using the data quantization parameter; process the quantized first training input data and the quantized second training input data by using the quantized neural network model to obtain actual training output data; and adjust the data quantization parameter according to a difference between the actual training output data and the preset training output data to minimize the difference; wherein the quantization module is configured to quantize the first input data of the third operator and the second input data of the fourth operator respectively by using the adjusted data quantization parameter.
- The apparatus of claim 10, wherein the processing module is further configured to: determine an operator quantization parameter according to the parameter range of the first operator and the parameter range of the second operator; quantize the parameter of the first operator by using the operator quantization parameter to obtain a parameter of the third operator; and quantize the parameter of the second operator by using the operator quantization parameter to obtain a parameter of the fourth operator.
- The apparatus according to any one of claims 9 to 11, wherein the quantized neural network model further comprises a compression module, the compression module is configured to compress the output of the third operator and the output of the fourth operator respectively according to an offset parameter, the offset parameter indicates the position, in the data before compression, of the highest bit of the compressed data, and the second operation module is configured to perform the first operation on the compressed data; and the processing module is further configured to: quantize the first training input data and the second training input data respectively by using the data quantization parameter; process the quantized first training input data by using the third operator, the third operator outputting first training operation data; process the quantized second training input data by using the fourth operator, the fourth operator outputting second training operation data; and determine the offset parameter according to the significand of the first training operation data and the significand of the second training operation data.
- A data processing apparatus, comprising a storage module and a processing module, wherein the storage module is used for storing a program, and when the program is run in the processing module, the processing module is configured to: obtain a quantized neural network model, wherein the quantized neural network model is obtained by quantizing an original neural network model, the original neural network model comprises a first operator, a second operator and a first operation module, the first operator and the second operator are used for performing the same type of operation, and the first operation module is used for performing a first operation on the output of the first operator and the output of the second operator; and process first input data of a third operator and second input data of a fourth operator by using the quantized neural network model, wherein the quantized neural network model comprises a quantization module, the third operator, the fourth operator and a second operation module, the quantization module is configured to quantize the first input data and the second input data by using a data quantization parameter, the second operation module is configured to perform the first operation, the third operator is the quantized first operator, the fourth operator is the quantized second operator, and the data quantization parameter is determined according to a range of first training input data of the first operator and a range of second training input data of the second operator.
- The apparatus of claim 13, wherein the data quantization parameter is obtained by adjusting an initial data quantization parameter, the adjustment minimizing the difference between actual training output data and preset training output data; the initial data quantization parameter is determined from a range of the first training input data and a range of the second training input data; the preset training output data corresponds to a training input data set comprising the first training input data and the second training input data; and the actual training output data is obtained by processing the first training input data and the second training input data by using the quantized neural network model, the quantization module being used for quantizing the first training input data and the second training input data by using the initial data quantization parameter.
- The apparatus according to claim 14, wherein the parameter of the third operator is obtained by quantizing the parameter of the first operator by using an operator quantization parameter, and the parameter of the fourth operator is obtained by quantizing the parameter of the second operator by using the operator quantization parameter; and the operator quantization parameter is determined according to a parameter range of the first operator and a parameter range of the second operator.
- The apparatus according to any one of claims 13 to 15, wherein the quantized neural network model further comprises a compression module, the compression module is configured to compress the output of the third operator and the output of the fourth operator respectively according to an offset parameter, the offset parameter indicates the position, in the data before compression, of the highest bit of the compressed data, and the second operation module is configured to perform the first operation on the compressed data; and the offset parameter is determined according to a significand of first training operation data and a significand of second training operation data, the first training operation data being obtained by processing, using the third operator, first training input data quantized by the data quantization parameter, and the second training operation data being obtained by processing, using the fourth operator, second training input data quantized by the data quantization parameter.
- A computer-readable storage medium, wherein the computer-readable storage medium stores program code for execution by a device, and when the program code is executed by the device, the device performs the method of any one of claims 1 to 8.
- A chip comprising a processor and a data interface, the processor reading instructions stored on a memory through the data interface to perform the method of any one of claims 1 to 8.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/125370 WO2022088063A1 (en) | 2020-10-30 | 2020-10-30 | Method and apparatus for quantizing neural network model, and method and apparatus for processing data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114698395A true CN114698395A (en) | 2022-07-01 |
Family
ID=81381775
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202080016479.1A Pending CN114698395A (en) | 2020-10-30 | 2020-10-30 | Method and device for quantification of neural network model, method and device for data processing |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114698395A (en) |
WO (1) | WO2022088063A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114841325B (en) * | 2022-05-20 | 2024-09-24 | 安谋科技(中国)有限公司 | Data processing method and medium of neural network model and electronic equipment |
CN116258178B (en) * | 2023-03-24 | 2023-09-22 | 美的集团(上海)有限公司 | Model conversion method, device, electronic equipment and readable storage medium |
CN116579400B (en) * | 2023-05-19 | 2024-02-23 | 北京百度网讯科技有限公司 | Quantization method, data processing method and device of deep learning model |
CN116975628A (en) * | 2023-06-19 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Data processing method, apparatus, electronic device, storage medium, and program product |
CN117634577B (en) * | 2024-01-25 | 2024-06-07 | 深圳市九天睿芯科技有限公司 | Vector processor, neural network accelerator, chip and electronic equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111738403A (en) * | 2020-04-26 | 2020-10-02 | 华为技术有限公司 | A neural network optimization method and related equipment |
CN111814955A (en) * | 2020-06-19 | 2020-10-23 | 浙江大华技术股份有限公司 | Method and apparatus for quantizing neural network model, and computer storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110598839A (en) * | 2018-06-12 | 2019-12-20 | 华为技术有限公司 | Convolutional neural network system and method for quantizing convolutional neural network |
US20200097818A1 (en) * | 2018-09-26 | 2020-03-26 | Xinlin LI | Method and system for training binary quantized weight and activation function for deep neural networks |
US11651192B2 (en) * | 2019-02-12 | 2023-05-16 | Apple Inc. | Compressed convolutional neural network models |
CN110322008A (en) * | 2019-07-10 | 2019-10-11 | 杭州嘉楠耘智信息科技有限公司 | Residual convolution neural network-based quantization processing method and device |
CN111176853A (en) * | 2020-02-19 | 2020-05-19 | 珠海市杰理科技股份有限公司 | Data quantization method and device, computer equipment and storage medium |
CN111652366A (en) * | 2020-05-09 | 2020-09-11 | 哈尔滨工业大学 | A joint neural network model compression method based on channel pruning and quantization training |
2020
- 2020-10-30 CN CN202080016479.1A patent/CN114698395A/en active Pending
- 2020-10-30 WO PCT/CN2020/125370 patent/WO2022088063A1/en active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111738403A (en) * | 2020-04-26 | 2020-10-02 | 华为技术有限公司 | A neural network optimization method and related equipment |
CN111814955A (en) * | 2020-06-19 | 2020-10-23 | 浙江大华技术股份有限公司 | Method and apparatus for quantizing neural network model, and computer storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2022088063A1 (en) | 2022-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114698395A (en) | Method and device for quantification of neural network model, method and device for data processing | |
TWI722434B (en) | Self-tuning incremental model compression method in deep neural network | |
Krishnamoorthi | Quantizing deep convolutional networks for efficient inference: A whitepaper | |
CN109840589B (en) | Method and device for operating convolutional neural network on FPGA | |
CN110334802B (en) | A method, device, equipment and storage medium for constructing a neural network model | |
JP7424078B2 (en) | Image encoding method and device and image decoding method and device | |
CN113132723B (en) | Image compression method and device | |
EP4535240A3 (en) | Data compression using conditional entropy models | |
TW202145142A (en) | Method and apparatus of quantization training, image processing, and storage medium | |
JP2020191077A (en) | Image coding method and apparatus and image decoding method and apparatus | |
CN111937011A (en) | A method and device for determining weight parameters of a neural network model | |
JP2003283844A5 (en) | ||
EP4044070A3 (en) | Neural network processing unit, neural network processing method and device | |
CN113222098B (en) | Data processing method and related product | |
KR20210143093A (en) | Electronic apparatus and control method thereof | |
CN109978144B (en) | Model compression method and system | |
US20240078411A1 (en) | Information processing system, encoding device, decoding device, model learning device, information processing method, encoding method, decoding method, model learning method, and program storage medium | |
CN112800813A (en) | Target identification method and device | |
JPWO2022190195A5 (en) | Information processing system, encoding device, decoding device, model learning device, information processing method, encoding method, decoding method, model learning method, and program | |
KR20200049422A (en) | Effective Network Compression using Simulation-guided Iterative Pruning | |
CN101198041A (en) | Vector quantization method and device | |
CN119577371A (en) | A highly efficient and intelligent data compression and feature extraction method and system | |
CN112183726A (en) | Neural network full-quantization method and system | |
CA2561435A1 (en) | Reduced computational complexity of bit allocation for perceptual coding | |
US20210397962A1 (en) | Effective network compression using simulation-guided iterative pruning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||