
CN114698395A - Method and device for quantification of neural network model, method and device for data processing - Google Patents

Method and device for quantification of neural network model, method and device for data processing

Info

Publication number
CN114698395A
CN114698395A
Authority
CN
China
Prior art keywords
operator
data
input data
training
quantized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080016479.1A
Other languages
Chinese (zh)
Inventor
昌晶
连朔
孙方轩
王晨曦
周君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN114698395A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A neural network model quantization method and apparatus, and a data processing method and apparatus, belonging to the field of artificial intelligence. The original neural network model comprises a first operator, a second operator, and a first operation module that operates on the outputs of the first operator and the second operator. The quantization method comprises: determining a data quantization parameter according to the range of first training input data of the first operator and the range of second training input data of the second operator; and determining a quantized neural network model, in which the first input data of the quantized first operator and the second input data of the quantized second operator are each quantized using that shared data quantization parameter. Because both operators use the same quantization parameter, the processing result of the quantized first operator and the processing result of the quantized second operator can be operated on directly, which improves data processing efficiency while preserving the data processing accuracy of the neural network model.
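The core idea of the abstract — deriving one quantization parameter from the ranges of both operators' inputs, so that the two operators' integer outputs can be combined without rescaling — can be sketched as follows. This is a minimal illustration assuming symmetric int8 quantization; the function names, the rounding scheme, and the example values are hypothetical, not taken from the patent:

```python
import numpy as np

def shared_scale(x1: np.ndarray, x2: np.ndarray, num_bits: int = 8) -> float:
    """Derive one quantization scale covering the ranges of both inputs."""
    max_abs = max(np.abs(x1).max(), np.abs(x2).max())
    qmax = 2 ** (num_bits - 1) - 1  # symmetric signed range, e.g. 127 for int8
    return max_abs / qmax

def quantize(x: np.ndarray, scale: float, num_bits: int = 8) -> np.ndarray:
    """Round to integers and clip to the signed num_bits range."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)

# Because both inputs share one scale, the two operators' integer outputs
# can be combined (here: added) directly, with a single dequantization at the end.
x1 = np.array([0.5, -1.2, 3.0])
x2 = np.array([2.0, -0.3, 1.5])
s = shared_scale(x1, x2)
q_sum = quantize(x1, s).astype(np.int32) + quantize(x2, s).astype(np.int32)
approx = q_sum * s  # dequantized result of the first operation
```

With per-operator scales the integer sums would first have to be rescaled to a common grid; the shared parameter removes that step, which is the efficiency gain the abstract refers to.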

Description

PCT national-phase application; the description has been published.

Claims (18)

  1. A neural network model quantization method, comprising:
obtaining an original neural network model, wherein the original neural network model comprises a first operator, a second operator and a first operation module, the first operator and the second operator are configured to perform the same type of operation, and the first operation module is configured to perform a first operation on the output of the first operator and the output of the second operator;
    determining a data quantization parameter according to a range of first training input data and a range of second training input data, wherein the first training input data is input data of the first operator, and the second training input data is input data of the second operator;
    determining a quantized neural network model according to the original neural network model, wherein the quantized neural network model comprises a quantization module, a third operator, a fourth operator and a second operation module, the quantization module is used for quantizing first input data of the third operator and second input data of the fourth operator respectively by using the data quantization parameter, the third operator is the quantized first operator, the fourth operator is the quantized second operator, and the second operation module is used for performing the first operation.
  2. The method of claim 1, further comprising:
    acquiring preset training output data corresponding to a training input data set, wherein the training input data set comprises the first training input data and the second training input data;
    quantizing the first training input data and the second training input data respectively by using the data quantization parameter;
    processing the quantized first training input data and the quantized second training input data by using the quantized neural network model to obtain actual training output data;
    adjusting the data quantization parameter according to the difference between the actual training output data and the preset training output data to minimize the difference;
    the quantization module is configured to quantize the first input data of the third operator and the second input data of the fourth operator respectively by using the adjusted data quantization parameter.
  3. The method of claim 2, further comprising:
    determining operator quantization parameters according to the parameter range of the first operator and the parameter range of the second operator;
    quantizing the parameter of the first operator by using the operator quantization parameter to obtain a parameter of the third operator;
    and quantizing the parameters of the second operator by using the operator quantization parameters to obtain the parameters of the fourth operator.
  4. The method according to any one of claims 1 to 3, wherein the quantized neural network model further comprises a compression module configured to compress the output of the third operator and the output of the fourth operator respectively according to an offset parameter, the offset parameter indicating the position, within the data before compression, of the highest bit retained in the compressed data, and the second operation module is configured to perform the first operation on the compressed data;
    the method further comprises the following steps:
    quantizing the first training input data and the second training input data respectively by using the data quantization parameters;
    processing the quantized first training input data by using the third operator, wherein the third operator outputs first training operation data;
    processing the quantized second training input data by using the fourth operator, wherein the fourth operator outputs second training operation data;
    determining the offset parameter according to the number of significant bits of the first training operation data and the number of significant bits of the second training operation data.
  5. A method of data processing, the method comprising:
    obtaining a quantized neural network model, wherein the quantized neural network model is obtained by quantizing an original neural network model, the original neural network model comprises a first operator, a second operator and a first operation module, the first operator and the second operator are configured to perform the same type of operation, and the first operation module is configured to perform a first operation on the output of the first operator and the output of the second operator;
    processing first input data and second input data by using the quantized neural network model, wherein the quantized neural network model comprises a quantization module, a third operator, a fourth operator and a second operation module, the quantization module is configured to quantize the first input data of the third operator and the second input data of the fourth operator respectively by using a data quantization parameter, the second operation module is configured to perform the first operation, the third operator is the quantized first operator, the fourth operator is the quantized second operator, and the data quantization parameter is determined according to a range of first training input data of the first operator and a range of second training input data of the second operator.
  6. The method of claim 5,
    the data quantization parameter is obtained by adjusting an initial data quantization parameter, the adjustment minimizing the difference between actual training output data and preset training output data,
    the initial data quantization parameter is determined from a range of the first training input data and a range of the second training input data,
    the preset training output data corresponds to a training input data set comprising the first training input data and the second training input data,
    the actual training output data is obtained by processing the first training input data and the second training input data by using the quantized neural network model, and the quantization module is used for quantizing the first training input data and the second training input data by using the initial data quantization parameter.
  7. The method according to claim 6, wherein the parameter of the third operator is obtained by quantizing the parameter of the first operator by using an operator quantization parameter, and the parameter of the fourth operator is obtained by quantizing the parameter of the second operator by using the operator quantization parameter, and the operator quantization parameter is determined according to the parameter range of the first operator and the parameter range of the second operator.
  8. The method according to any one of claims 5 to 7, wherein the quantized neural network model further comprises a compression module configured to compress the output of the third operator and the output of the fourth operator respectively according to an offset parameter, the offset parameter indicating the position, within the data before compression, of the highest bit retained in the compressed data, and the second operation module is configured to perform the first operation on the compressed data;
    the offset parameter is determined according to the number of significant bits of first training operation data, obtained by processing, with the third operator, first training input data quantized by the data quantization parameter, and the number of significant bits of second training operation data, obtained by processing, with the fourth operator, second training input data quantized by the data quantization parameter.
  9. An apparatus for quantizing a neural network model, the apparatus comprising: a storage module and a processing module, wherein,
    the storage module is used for storing programs;
    when the program is executed by the processing module, the processing module is configured to:
    obtain an original neural network model, wherein the original neural network model comprises a first operator, a second operator and a first operation module, the first operator and the second operator are configured to perform the same type of operation, and the first operation module is configured to perform a first operation on the output of the first operator and the output of the second operator;
    determining a data quantization parameter according to a range of first training input data and a range of second training input data, wherein the first training input data is input data of the first operator, and the second training input data is input data of the second operator;
    determining a quantized neural network model according to the original neural network model, wherein the quantized neural network model comprises a quantization module, a third operator, a fourth operator and a second operation module, the quantization module is used for quantizing first input data of the third operator and second input data of the fourth operator respectively by using the data quantization parameter, the third operator is the quantized first operator, the fourth operator is the quantized second operator, and the second operation module is used for performing the first operation.
  10. The apparatus of claim 9,
    the processing module is further configured to obtain preset training output data corresponding to a training input data set, where the training input data set includes the first training input data and the second training input data;
    the processing module is further configured to quantize the first training input data and the second training input data respectively by using the data quantization parameter;
    the processing module is further used for processing the quantized first training input data and the quantized second training input data by using the quantized neural network model to obtain actual training output data;
    the processing module is further configured to adjust the data quantization parameter according to a difference between the actual training output data and the preset training output data to minimize the difference;
    the quantization module is configured to quantize the first input data of the third operator and the second input data of the fourth operator respectively by using the adjusted data quantization parameter.
  11. The apparatus of claim 10,
    the processing module is further used for determining operator quantization parameters according to the parameter range of the first operator and the parameter range of the second operator;
    the processing module is further configured to quantize the parameter of the first operator by using the operator quantization parameter to obtain a parameter of the third operator;
    the processing module is further configured to quantize the parameter of the second operator by using the operator quantization parameter to obtain a parameter of the fourth operator.
  12. The apparatus according to any one of claims 9-11, wherein the quantized neural network model further comprises a compression module configured to compress the output of the third operator and the output of the fourth operator respectively according to an offset parameter, the offset parameter indicating the position, within the data before compression, of the highest bit retained in the compressed data, and the second operation module is configured to perform the first operation on the compressed data;
    the processing module is further configured to quantize the first training input data and the second training input data using the data quantization parameter, respectively;
    the processing module is further configured to process the quantized first training input data by using the third operator, and the third operator outputs first training operation data;
    the processing module is further configured to process the quantized second training input data by using the fourth operator, and the fourth operator outputs second training operation data;
    the processing module is further configured to determine the offset parameter according to the number of significant bits of the first training operation data and the number of significant bits of the second training operation data.
  13. A data processing apparatus, comprising: a storage module and a processing module, wherein,
    the storage module is used for storing programs;
    when the program is executed by the processing module, the processing module is configured to:
    obtain a quantized neural network model, wherein the quantized neural network model is obtained by quantizing an original neural network model, the original neural network model comprises a first operator, a second operator and a first operation module, the first operator and the second operator are configured to perform the same type of operation, and the first operation module is configured to perform a first operation on the output of the first operator and the output of the second operator;
    process first input data and second input data by using the quantized neural network model, wherein the quantized neural network model comprises a quantization module, a third operator, a fourth operator and a second operation module, the quantization module is configured to quantize the first input data of the third operator and the second input data of the fourth operator respectively by using a data quantization parameter, the second operation module is configured to perform the first operation, the third operator is the quantized first operator, the fourth operator is the quantized second operator, and the data quantization parameter is determined according to a range of first training input data of the first operator and a range of second training input data of the second operator.
  14. The apparatus of claim 13,
    the data quantization parameter is obtained by adjusting an initial data quantization parameter, the adjustment minimizing the difference between actual training output data and preset training output data,
    the initial data quantization parameter is determined from a range of the first training input data and a range of the second training input data,
    the preset training output data corresponds to a training input data set comprising the first training input data and the second training input data,
    the actual training output data is obtained by processing the first training input data and the second training input data by using the quantized neural network model, and the quantization module is used for quantizing the first training input data and the second training input data by using the initial data quantization parameter.
  15. The apparatus according to claim 14, wherein the parameter of the third operator is obtained by quantizing the parameter of the first operator by using an operator quantization parameter, and the parameter of the fourth operator is obtained by quantizing the parameter of the second operator by using the operator quantization parameter, and the operator quantization parameter is determined according to a parameter range of the first operator and a parameter range of the second operator.
  16. The apparatus according to any one of claims 13-15, wherein the quantized neural network model further comprises a compression module configured to compress the output of the third operator and the output of the fourth operator respectively according to an offset parameter, the offset parameter indicating the position, within the data before compression, of the highest bit retained in the compressed data, and the second operation module is configured to perform the first operation on the compressed data;
    the offset parameter is determined according to the number of significant bits of first training operation data, obtained by processing, with the third operator, first training input data quantized by the data quantization parameter, and the number of significant bits of second training operation data, obtained by processing, with the fourth operator, second training input data quantized by the data quantization parameter.
  17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code for execution by a device, and when the program code is executed by the device, the device performs the method of any one of claims 1 to 8.
  18. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory through the data interface to perform the method of any one of claims 1 to 8.
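The compression described in claims 4, 8, 12 and 16 narrows the operators' wide (e.g. 32-bit accumulator) outputs using an offset parameter derived, during training, from the number of significant bits observed in both operators' outputs. The following is a rough sketch of that bookkeeping, with hypothetical names, an assumed 16-bit target width, and invented example values; it is not the patented implementation:

```python
def highest_bit(x: int) -> int:
    """Number of significant bits of |x| (0 for zero)."""
    return abs(x).bit_length()

def determine_offset(outputs1, outputs2, target_bits: int = 16) -> int:
    """Offset = how far the retained bit window sits below the highest
    significant bit seen in either operator's training-time outputs."""
    msb = max(max(highest_bit(v) for v in outputs1),
              max(highest_bit(v) for v in outputs2))
    return max(0, msb - target_bits)

def compress(x: int, offset: int) -> int:
    """Discard the low `offset` bits so the value fits the narrower format."""
    return x >> offset if x >= 0 else -((-x) >> offset)

out1 = [70000, -1200, 33000]   # hypothetical wide operator outputs
out2 = [15000, 91000, -500]
off = determine_offset(out1, out2, target_bits=16)
compressed = [compress(v, off) for v in out1 + out2]
```

Because one offset is computed over both operators' outputs, the two compressed streams remain on the same scale, so the second operation module can combine them directly, mirroring the shared data quantization parameter used for the inputs.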
CN202080016479.1A · Priority 2020-10-30 · Filed 2020-10-30 · Method and device for quantification of neural network model, method and device for data processing · Pending · published as CN114698395A

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/125370 WO2022088063A1 (en) 2020-10-30 2020-10-30 Method and apparatus for quantizing neural network model, and method and apparatus for processing data

Publications (1)

Publication Number Publication Date
CN114698395A 2022-07-01

Family

ID=81381775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080016479.1A · Method and device for quantification of neural network model, method and device for data processing · Priority/Filing date 2020-10-30 · Pending

Country Status (2)

Country Link
CN (1) CN114698395A (en)
WO (1) WO2022088063A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114841325B (en) * 2022-05-20 2024-09-24 安谋科技(中国)有限公司 Data processing method and medium of neural network model and electronic equipment
CN116258178B (en) * 2023-03-24 2023-09-22 美的集团(上海)有限公司 Model conversion method, device, electronic equipment and readable storage medium
CN116579400B (en) * 2023-05-19 2024-02-23 北京百度网讯科技有限公司 Quantization method, data processing method and device of deep learning model
CN116975628A (en) * 2023-06-19 2023-10-31 腾讯科技(深圳)有限公司 Data processing method, apparatus, electronic device, storage medium, and program product
CN117634577B (en) * 2024-01-25 2024-06-07 深圳市九天睿芯科技有限公司 Vector processor, neural network accelerator, chip and electronic equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111738403A (en) * 2020-04-26 2020-10-02 华为技术有限公司 A neural network optimization method and related equipment
CN111814955A (en) * 2020-06-19 2020-10-23 浙江大华技术股份有限公司 Method and apparatus for quantizing neural network model, and computer storage medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN110598839A (en) * 2018-06-12 2019-12-20 华为技术有限公司 Convolutional neural network system and method for quantizing convolutional neural network
US20200097818A1 (en) * 2018-09-26 2020-03-26 Xinlin LI Method and system for training binary quantized weight and activation function for deep neural networks
US11651192B2 (en) * 2019-02-12 2023-05-16 Apple Inc. Compressed convolutional neural network models
CN110322008A (en) * 2019-07-10 2019-10-11 杭州嘉楠耘智信息科技有限公司 Residual convolution neural network-based quantization processing method and device
CN111176853A (en) * 2020-02-19 2020-05-19 珠海市杰理科技股份有限公司 Data quantization method and device, computer equipment and storage medium
CN111652366A (en) * 2020-05-09 2020-09-11 哈尔滨工业大学 A joint neural network model compression method based on channel pruning and quantization training


Also Published As

Publication number Publication date
WO2022088063A1 (en) 2022-05-05

Similar Documents

Publication Publication Date Title
CN114698395A (en) Method and device for quantification of neural network model, method and device for data processing
TWI722434B (en) Self-tuning incremental model compression method in deep neural network
Krishnamoorthi Quantizing deep convolutional networks for efficient inference: A whitepaper
CN109840589B (en) Method and device for operating convolutional neural network on FPGA
CN110334802B (en) A method, device, equipment and storage medium for constructing a neural network model
JP7424078B2 (en) Image encoding method and device and image decoding method and device
CN113132723B (en) Image compression method and device
EP4535240A3 (en) Data compression using conditional entropy models
TW202145142A (en) Method and apparatus of quantization training, image processing, and storage medium
JP2020191077A (en) Image coding method and apparatus and image decoding method and apparatus
CN111937011A (en) A method and device for determining weight parameters of a neural network model
JP2003283844A5 (en)
EP4044070A3 (en) Neural network processing unit, neural network processing method and device
CN113222098B (en) Data processing method and related product
KR20210143093A (en) Electronic apparatus and control method thereof
CN109978144B (en) Model compression method and system
US20240078411A1 (en) Information processing system, encoding device, decoding device, model learning device, information processing method, encoding method, decoding method, model learning method, and program storage medium
CN112800813A (en) Target identification method and device
JPWO2022190195A5 (en) Information processing system, encoding device, decoding device, model learning device, information processing method, encoding method, decoding method, model learning method, and program
KR20200049422A (en) Effective Network Compression using Simulation-guided Iterative Pruning
CN101198041A (en) Vector quantization method and device
CN119577371A (en) A highly efficient and intelligent data compression and feature extraction method and system
CN112183726A (en) Neural network full-quantization method and system
CA2561435A1 (en) Reduced computational complexity of bit allocation for perceptual coding
US20210397962A1 (en) Effective network compression using simulation-guided iterative pruning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination