WO2024175079A1 - Model quantization method and related equipment - Google Patents
Model quantization method and related equipment
- Publication number
- WO2024175079A1 PCT/CN2024/078233 CN2024078233W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sub
- feature information
- quantization
- machine learning
- learning model
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Definitions
- the present application relates to the field of artificial intelligence, and in particular to a model quantization method and related equipment.
- Artificial Intelligence is the theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
- artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence.
- Research in artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
- the embodiment of the present application provides a model quantization method and related equipment.
- different quantization step sizes are used in this scheme to quantize the sub-activation values corresponding to different channels. This is beneficial for retaining the abnormality of the quantized sub-activation values corresponding to the channels with abnormal distribution, and is beneficial for avoiding the loss of accuracy of the quantized sub-activation values corresponding to the channels with normal distribution.
- an embodiment of the present application provides a model quantization method, which can be used to compress a model in the field of artificial intelligence.
- the method is applied to a process of processing data using a first machine learning model.
- the model quantization method includes quantizing an activation value generated by at least one activation layer in the first machine learning model.
- the at least one activation layer includes a first activation layer, that is, the first activation value generated by the first activation layer is any activation value that needs to be quantized.
- the electronic device quantizes the first activation value generated by the first activation layer, including:
- the electronic device quantizes a first sub-activation value in a first activation value using a first quantization step size; and quantizes a second sub-activation value in the first activation value using a second quantization step size, wherein the first machine learning model includes multiple channels, the multiple channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size and the second quantization step size are different.
- the electronic device can be a training device for the first machine learning model, or an execution device on which the first machine learning model is deployed.
- quantizing the activation values generated by the activation layer in the first machine learning model reduces the computational complexity of the model and the storage space occupied when the model is used to process data. In addition, the multiple channels may include channels whose sub-activation values are abnormally distributed, for example channels whose sub-activation values are consistently too large or too small. If the same quantization step size were used to quantize the sub-activation values of every channel, that step size would have to be large, and the accuracy of the quantized sub-activation values corresponding to the normally distributed channels would be greatly reduced.
- different quantization step sizes are used in this scheme to quantize the sub-activation values corresponding to different channels, which is beneficial to retaining the abnormality of the quantized sub-activation values corresponding to the channels with abnormal distributions, and is beneficial to avoiding the loss of accuracy of the quantized sub-activation values corresponding to the channels with normal distributions.
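The contrast between one shared step size and a separate step size per channel can be sketched as follows. This is a minimal NumPy illustration, not the application's implementation: the function names, the 8-bit signed range, the rule of deriving each step size from a channel's maximum magnitude, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def quantize(x, step, bits=8):
    """Uniform quantization: divide by the step size, round, clip to the signed range."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / step), -qmax - 1, qmax).astype(np.int32)

def dequantize(q, step):
    """Map integer codes back to real values."""
    return q.astype(np.float64) * step

def per_channel_steps(act, bits=8):
    """Assumed rule: one step size per channel (last axis), from its max magnitude."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(act), axis=0)
    return np.maximum(max_abs, 1e-12) / qmax

rng = np.random.default_rng(0)
act = rng.uniform(-1.0, 1.0, size=(64, 4))    # 64 tokens, 4 channels
act[:, 0] = rng.uniform(50.0, 60.0, size=64)  # channel 0 is consistently large ("abnormal")

# A single step size must cover the abnormal channel, so it is large for everyone.
shared_step = np.max(np.abs(act)) / 127
# Separate step sizes let the normal channels keep a fine resolution.
steps = per_channel_steps(act)

err_shared = np.abs(dequantize(quantize(act, shared_step), shared_step) - act)
err_channel = np.abs(dequantize(quantize(act, steps), steps) - act)

print("step sizes per channel:", steps)
print("max error on normal channels, shared step:", err_shared[:, 1:].max())
print("max error on normal channels, per-channel step:", err_channel[:, 1:].max())
```

In this toy setup the abnormal channel keeps its large values under either scheme, while the normal channels lose far less precision once they are no longer forced to share the abnormal channel's step size.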
- the distribution of the first sub-activation value is different from the distribution of the second sub-activation value.
- the distribution of the first sub-activation values corresponding to the first channel is abnormal
- the distribution of the second sub-activation values corresponding to the second channel is normal
- the first channel can also be called an abnormal channel
- more than a second proportion of all the second sub-activation values corresponding to the second channel fall within the normal value range
- the second channel can also be called a normal channel
- the values of the first proportion and the second proportion can be the same or different.
- the first proportion and the second proportion can both be 80%, 85%, 90%, or another value, or the two proportions can differ, which is not limited here.
- in one example, more than 90% of all the second sub-activation values corresponding to the second channel are between 20 and 30, and more than 90% of all the first sub-activation values corresponding to the first channel are greater than or equal to 50.
- in another example, more than 85% of all the second sub-activation values corresponding to the second channel are between 10 and 20, and more than 85% of all the first sub-activation values corresponding to the first channel are less than or equal to 1.
- in a further example, more than 90% of all the second sub-activation values corresponding to the second channel are between 10 and 20, and more than 90% of all the first sub-activation values corresponding to the first channel are either greater than or equal to 60 or less than or equal to 1. It should be noted that the examples given here are only for the convenience of understanding the concept that "the distribution of the first sub-activation values corresponding to the first channel" is different from "the distribution of the second sub-activation values corresponding to the second channel", and are not used to limit this solution.
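The proportion test used in these examples can be sketched as a small check. The code below is only an illustration under assumptions: the function name `is_normal_channel` is hypothetical, and the normal value range (20 to 30) and the 90% proportion are taken from the first example above rather than from any rule the application prescribes for choosing them.

```python
import numpy as np

def is_normal_channel(values, low, high, proportion):
    """Return True if more than `proportion` of the values lie in [low, high]."""
    inside = np.mean((values >= low) & (values <= high))
    return bool(inside > proportion)

rng = np.random.default_rng(0)
# Synthetic sub-activation values for two channels (illustrative data only).
normal_channel = rng.uniform(20.0, 30.0, size=1000)    # stays in the normal range
abnormal_channel = rng.uniform(50.0, 70.0, size=1000)  # consistently too large

print(is_normal_channel(normal_channel, 20.0, 30.0, 0.9))
print(is_normal_channel(abnormal_channel, 20.0, 30.0, 0.9))
```

A channel that fails the check would be treated as an abnormal channel and given its own quantization step size.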
- the first machine learning model is a Transformer model.
- the technicians found in their research that when the first machine learning model uses the Transformer model, the difference between the sub-activation values corresponding to the channels with abnormal distribution and the sub-activation values corresponding to the channels with normal distribution is more obvious.
- the scheme of "using the first quantization step size to quantize the sub-activation values corresponding to the first channel, and using the second quantization step size to quantize the sub-activation values corresponding to the second channel" is more compatible with the Transformer model, and can reduce the amount of calculation of the Transformer model, reduce the number of parameters in the Transformer model, and avoid reducing the accuracy of the prediction results output by the Transformer model.
- multiple feature information of the input data can be obtained, the multiple feature information includes the first feature information, and the quantization method of the model also includes quantizing the first feature information.
- the electronic device quantizing the first feature information includes:
- the electronic device divides the first feature information into at least two sub-feature information, the at least two sub-feature information include first sub-feature information and second sub-feature information; the electronic device quantizes the first sub-feature information using a first quantization parameter, and quantizes the second sub-feature information using a second quantization parameter, and the first quantization parameter and the second quantization parameter are different.
- the quantization parameters used when quantizing the model may include a quantization step size, a quantization bias, or other types of quantization parameters, etc., which are not exhaustive here.
- the same input data may include semantically different parts
- the same image may include multiple semantically different areas
- the same text may include multiple semantically different words
- the distribution of values of sub-feature information corresponding to semantically different parts in the same input data has a large difference
- the distribution of values of sub-feature information corresponding to semantically identical parts has a small difference.
- the first feature information is divided into at least two sub-feature information, and different sub-feature information is quantized using different quantization parameters, which is beneficial to improving the matching degree between the value in the first feature information and the adopted quantization parameter.
- the distribution characteristics of the sub-feature information corresponding to the semantically identical parts are retained, and the difference of the sub-feature information corresponding to the semantically different parts is retained, which is beneficial to avoid reducing the accuracy of the prediction result output by the first machine learning model.
- in one case, "the first quantization parameter is different from the second quantization parameter" may mean that the quantization step size 1 used when quantizing the first sub-feature information differs from the quantization step size 2 used when quantizing the second sub-feature information, while the different sub-feature information among the M sub-feature information uses the same quantization offset.
- in another case, "the first quantization parameter is different from the second quantization parameter" may mean that both the quantization step size 1 and the quantization offset 1 used when quantizing the first sub-feature information differ from the quantization step size 2 and the quantization offset 2 used when quantizing the second sub-feature information.
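Quantizing each sub-feature information with its own step size and offset can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the 8-bit unsigned affine scheme and the min/max rule for deriving the parameters are choices made for the example, and `group_ids` stands in for whatever division into sub-feature information the method actually uses.

```python
import numpy as np

def affine_params(x, bits=8):
    """Assumed rule: derive step size and offset from the group's min and max."""
    qmax = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    step = max(hi - lo, 1e-12) / qmax
    offset = int(round(-lo / step))
    return step, offset

def affine_quantize(x, step, offset, bits=8):
    """q = clip(round(x / step) + offset, 0, 2^bits - 1)."""
    qmax = 2 ** bits - 1
    return np.clip(np.round(x / step) + offset, 0, qmax).astype(np.int32)

def affine_dequantize(q, step, offset):
    """x is approximately (q - offset) * step."""
    return (q - offset).astype(np.float64) * step

def quantize_by_group(features, group_ids):
    """features: (tokens, dim); group_ids: (tokens,) id of each token's sub-feature group."""
    restored = np.empty_like(features, dtype=np.float64)
    params = {}
    for g in np.unique(group_ids):
        mask = group_ids == g
        step, offset = affine_params(features[mask])
        q = affine_quantize(features[mask], step, offset)
        restored[mask] = affine_dequantize(q, step, offset)
        params[int(g)] = (step, offset)
    return restored, params

rng = np.random.default_rng(0)
# Two groups of tokens whose values are distributed very differently.
features = np.vstack([
    rng.uniform(0.0, 1.0, size=(4, 8)),
    rng.uniform(10.0, 30.0, size=(4, 8)),
])
group_ids = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # hypothetical grouping

restored, params = quantize_by_group(features, group_ids)
print(params)
print("max reconstruction error:", np.abs(restored - features).max())
```

Here both the step size and the offset differ between the groups, which corresponds to the second case above; fixing one shared offset and varying only the step size would correspond to the first case.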
- the input data is an image
- the task of the first machine learning model is to perform target detection on the image.
- the image is likely to include multiple objects, and usually several tokens in the first feature information focus on the same object in the image. Different tokens in the first feature information may focus on different objects in the image.
- the distribution of the values of the sub-feature information corresponding to the same object is similar, and the distribution of the sub-feature information corresponding to different objects is different.
- the input data of the machine learning model is likely to include multiple semantically different regions, so quantizing different sub-feature information "using different quantization parameters" is well suited to the specific task of target detection.
- multiple feature information of the input data can be obtained, the multiple feature information includes second feature information, the second feature information includes feature maps of different scales, and the quantization method of the model also includes quantizing the second feature information.
- the electronic device quantizing the second feature information includes: the electronic device divides the second feature information into multiple groups, each of the multiple groups includes at least one feature map, and different groups in the multiple groups include feature maps of different scales; different quantization parameters are used to quantize different groups.
- the multiple feature maps of different scales of the training sample have the same size, and "feature maps of different scales" refers to the feature information of the training sample at different granularities. More details of the training sample can be seen in a feature map with a smaller granularity (also called denser), and the overall information of the training sample can be seen in a feature map with a larger granularity (also called sparser).
- the second feature information is obtained in the process of using a machine learning model to process the training samples
- since the second feature information includes feature maps of multiple scales, the second feature information is grouped based on the scale of each feature map, and different groups are quantized using different quantization parameters. That is, feature maps of different scales are quantized using different quantization parameters, which helps retain the information carried by the feature maps of each scale and avoids reducing the accuracy of the prediction results output by the machine learning model.
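Grouping feature maps by scale and giving each group its own quantization parameter can be sketched as follows. The code is an assumption-laden illustration rather than the application's procedure: the function name, the scale labels "fine" and "coarse", the use of a single step size as the quantization parameter, and the synthetic feature maps are all made up for the example.

```python
import numpy as np

def quantize_groups_by_scale(feature_maps, bits=8):
    """feature_maps: dict mapping a scale label to a list of arrays.

    Each scale forms one group; the group shares one step size derived from
    the largest magnitude found in its feature maps (assumed rule).
    """
    qmax = 2 ** (bits - 1) - 1
    result = {}
    for scale, maps in feature_maps.items():
        max_abs = max(float(np.max(np.abs(m))) for m in maps)
        step = max(max_abs, 1e-12) / qmax
        codes = [
            np.clip(np.round(m / step), -qmax - 1, qmax).astype(np.int8)
            for m in maps
        ]
        result[scale] = (step, codes)
    return result

rng = np.random.default_rng(0)
# Feature maps of the same size but with different value ranges per scale.
feature_maps = {
    "fine": [rng.normal(0.0, 0.1, size=(8, 8)) for _ in range(2)],
    "coarse": [rng.normal(0.0, 5.0, size=(8, 8))],
}

result = quantize_groups_by_scale(feature_maps)
for scale, (step, codes) in result.items():
    print(scale, "step size:", step, "maps:", len(codes))
```

Because the step size is chosen per group, the small-valued feature maps are not flattened toward zero by a step size that was sized for the large-valued ones.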
- the input data is an image
- the task of the first machine learning model is any of the following: object detection on the image, semantic segmentation on the image, or super-resolution on the image.
- the process of using the first machine learning model to process data is in the inference phase of the first machine learning model, or the process of using the first machine learning model to process data is in the training phase of the first machine learning model.
- the model quantization method provided in this application can be used in both the inference phase and the training phase, that is, it can reduce the amount of calculation not only when the machine learning model processes data on the execution device, but also when the machine learning model processes data on the training device.
- an embodiment of the present application provides a model quantization method, which can be used to compress a model in the field of artificial intelligence.
- the method is applied to a process of performing data processing using a machine learning model.
- multiple feature information of the input data can be obtained.
- the multiple feature information includes first feature information.
- the model quantization method includes quantizing the first feature information; wherein, the electronic device quantizes the first feature information, including: the electronic device divides the first feature information into at least two sub-feature information, the at least two sub-feature information include first sub-feature information and second sub-feature information; the electronic device uses a first quantization parameter to quantize the first sub-feature information, and uses a second quantization parameter to quantize the second sub-feature information, and the first quantization parameter and the second quantization parameter are different.
- the model quantization method also includes quantizing the activation value generated by at least one activation layer in the machine learning model, and the at least one activation layer includes a first activation layer; wherein quantizing the first activation value generated by the first activation layer includes: quantizing the first sub-activation value in the first activation value using a first quantization step size; and quantizing the second sub-activation value in the first activation value using a second quantization step size, wherein the machine learning model includes multiple channels, the multiple channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size and the second quantization step size are different.
- the distribution of the first sub-activation values is different from the distribution of the second sub-activation values.
- the above machine learning model is a Transformer model.
- the input data is an image
- the task of the machine learning model is to detect objects in the image.
- multiple feature information of the input data can be obtained, the multiple feature information includes second feature information, the second feature information includes feature maps of different scales, and the quantization method of the model also includes quantizing the second feature information; wherein quantizing the second feature information includes: dividing the second feature information into multiple groups, each of the multiple groups includes at least one feature map, and different groups in the multiple groups include feature maps of different scales; and using different quantization parameters for quantization of different groups.
- the input data is an image
- the task of the machine learning model is any of the following: performing object detection on the image, performing semantic segmentation on the image, or performing super-resolution on the image.
- the process of using a machine learning model to process data is in the inference phase of the machine learning model, or the process of using a machine learning model to process data is in the training phase of the machine learning model.
- an embodiment of the present application provides a model quantization device, which can be used to compress a model in the field of artificial intelligence.
- the model quantization device is applied in the process of data processing using a machine learning model.
- the model quantization device is used to quantize the activation value generated by at least one activation layer in the machine learning model, and the at least one activation layer includes a first activation layer; wherein the model quantization device includes:
- a quantization module is used to quantize a first sub-activation value in a first activation value using a first quantization step size; the quantization module is also used to quantize a second sub-activation value in the first activation value using a second quantization step size, wherein the machine learning model includes multiple channels, the multiple channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size and the second quantization step size are different.
- the quantization device of the model can also be used to execute the steps performed by the electronic device in the first aspect and various possible implementation methods of the first aspect.
- the specific implementation methods, meanings of terms and beneficial effects brought about by the steps in various possible implementation methods of the third aspect can all be referred to the first aspect and will not be repeated here.
- an embodiment of the present application provides a model quantization device, which can be used to compress a model in the field of artificial intelligence.
- the model quantization device is applied to a process of data processing using a machine learning model.
- multiple feature information of the input data can be obtained.
- the multiple feature information includes first feature information.
- the model quantization device is used to quantize the first feature information; wherein the model quantization device includes:
- a grouping module, which is used to divide the first feature information into at least two sub-feature information, and the at least two sub-feature information include first sub-feature information and second sub-feature information
- a quantization module, which is used to quantize the first sub-feature information using a first quantization parameter; the quantization module is also used to quantize the second sub-feature information using a second quantization parameter, and the first quantization parameter and the second quantization parameter are different.
- the quantization device of the model can also be used to execute the steps performed by the electronic device in the first aspect and each possible implementation method of the first aspect.
- the specific implementation methods, meanings of terms and beneficial effects brought about by the steps in each possible implementation method of the fourth aspect can all be referred to the first aspect and will not be repeated here.
- an embodiment of the present application provides a computer program product, which includes a program.
- when the program is run on a computer, the computer executes the quantization method of the model described in the first aspect above.
- an embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored.
- when the computer program is run on a computer, the computer executes the quantization method of the model described in the first aspect above.
- an embodiment of the present application provides an electronic device, including a processor and a memory, wherein the processor is coupled to the memory, the memory is used to store programs; the processor is used to execute the programs in the memory, so that the electronic device executes the quantization method of the model of the first aspect mentioned above.
- the present application provides a chip system, which includes a processor for supporting a terminal device or a communication device to implement the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
- the chip system also includes a memory, which is used to store program instructions and data necessary for the terminal device or the communication device.
- the chip system can be composed of a chip, or it can include a chip and other discrete devices.
- FIG. 1 is a schematic diagram of a structure of an artificial intelligence main framework provided in an embodiment of the present application.
- FIG. 2 is a system architecture diagram of a quantization system of a model provided in an embodiment of the present application.
- FIG. 3 is a schematic diagram of a process for quantizing a first activation value generated by a first activation layer according to an embodiment of the present application.
- FIG. 4 is a schematic diagram of a quantization method of a model provided in an embodiment of the present application.
- FIG. 5 is a schematic diagram of a flow chart of a method for quantizing a model provided in an embodiment of the present application.
- FIG. 6 is a schematic diagram of quantizing first feature information provided in an embodiment of the present application.
- FIG. 7 is a schematic diagram of images of different scales provided in an embodiment of the present application.
- FIG. 8 is a schematic diagram of a quantization method of a model provided in an embodiment of the present application.
- FIG. 9 is a schematic diagram of a flow chart of a method for quantizing a model provided in an embodiment of the present application.
- FIG. 10 is a schematic diagram of a structure of a quantization device of a model provided in an embodiment of the present application.
- FIG. 11 is another schematic diagram of the structure of a quantization device for a model provided in an embodiment of the present application.
- FIG. 12 is another schematic diagram of the structure of a quantization device for a model provided in an embodiment of the present application.
- FIG. 13 is a schematic diagram of a structure of an execution device provided in an embodiment of the present application.
- FIG. 14 is a schematic diagram of a structure of a training device provided in an embodiment of the present application.
- FIG. 15 is a schematic diagram of the structure of a chip provided in an embodiment of the present application.
- Figure 1 shows a structural diagram of the main framework of artificial intelligence.
- the following is an explanation of the above artificial intelligence main framework from the two dimensions of "intelligent information chain" (horizontal axis) and "IT value chain" (vertical axis).
- the "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be a general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has undergone a condensation process of "data-information-knowledge-wisdom".
- the "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementation) to the industrial ecology process of the system.
- the infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and supports it through the basic platform. It communicates with the outside world through sensors; computing power is provided by smart chips, which can be hardware acceleration chips such as central processing units (CPU), embedded neural network processors (NPU), graphics processing units (GPU), application specific integrated circuits (ASIC) or field programmable gate arrays (FPGA); the basic platform includes distributed computing frameworks and networks and other related platform guarantees and support, which can include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
- the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
- the data involves graphics, images, voice, text, and IoT data of traditional devices, including business data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
- Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
- machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, and training.
- Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formalized information to perform machine thinking and solve problems based on reasoning control strategies. Typical functions are search and matching.
- Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
- some general capabilities can be further formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
- Smart products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, which productizes intelligent information decision-making and realizes practical applications. Its application areas mainly include: smart terminals, smart manufacturing, smart transportation, smart homes, smart medical care, smart security, autonomous driving, smart cities, etc.
- Model quantization is a term in the field of artificial intelligence model acceleration, which refers to the discretization of continuous values (such as activation values, weight parameters or other information) in machine learning models.
- the quantization method of the model provided in the present application can be applied to visual perception tasks, natural language synthesis tasks related to speech semantics, audio and video processing tasks, and other fields that require neural network implementation.
- the following provides examples of multiple application scenarios of the embodiments of the present application.
- an autonomous driving vehicle can collect point cloud data corresponding to the environment around the vehicle through sensors, and perform target detection through a machine learning model based on the collected point cloud data to obtain a prediction result corresponding to the point cloud data.
- The prediction result is used to indicate the position of at least one object in the environment around the vehicle.
- the autonomous driving vehicle can plan its driving path based on the aforementioned prediction result.
- the above-mentioned vehicle can be a car, truck, motorcycle, bus, ship, airplane, helicopter, lawn mower, entertainment vehicle, amusement park vehicle, construction equipment, tram, golf cart or train, etc., and the embodiments of the present application are not particularly limited.
- mobile robots such as sweeping robots, tutoring robots, or other movable robots, etc.
- the prediction result is used to indicate the position of at least one obstacle around the mobile robot.
- the model quantization method provided in this application can be used to compress the above-mentioned machine learning models, thereby ensuring that some larger models can also perform reasoning tasks better on the terminal devices.
- Semantic segmentation refers to the use of a machine learning model to classify all pixels in an image.
- the model quantization method provided in this application can be used to compress the aforementioned machine learning model.
- After obtaining the image to be classified, the terminal device (such as a mobile phone, tablet or laptop computer, etc.) can use a machine learning model to obtain the category of the object in the image to be classified, and then classify the image to be classified according to that category.
- the quantization method of the model provided in this application can be used to compress the aforementioned machine learning model.
- Natural language processing is the processing of human language. Natural language processing is the process of using machine learning models to systematically analyze, understand and extract information from text data.
- the quantization method of the model provided in this application can be used to compress the aforementioned machine learning model.
- natural language processing tasks can be classified into the following categories.
- Sequence labeling: each word in a sentence requires the model to give a classification category based on the context, such as Chinese word segmentation, part-of-speech tagging, named entity recognition, and semantic role labeling.
- Classification task: the entire sentence outputs a classification value, such as text classification.
- Sentence relationship inference: given two sentences, determine whether the two sentences have a certain nominal relationship. For example, question answering system, semantic rewriting, natural language inference.
- Generative tasks: output a piece of text and generate another piece of text, such as machine translation, text summarization, poetry writing, and picture description.
- FIG. 2 is a system architecture diagram of the quantization system of the model provided in the embodiment of the present application.
- The quantization system 200 of the model includes a training device 210, a database 220, an execution device 230, a data storage system 240 and a client device 250, wherein the execution device 230 includes a computing module 231.
- the database 220 stores a training data set.
- In the training stage of the first machine learning model 201, the training device 210 generates the first machine learning model 201 and iteratively trains it using the training data set to obtain the trained first machine learning model 201.
- the first machine learning model 201 can be specifically expressed as a neural network or a non-neural network model.
- the first machine learning model 201 obtained by the training device 210 can be deployed in the computing module 231 of the execution device 230.
- The execution device 230 can be a mobile phone, a tablet, a laptop, a VR device, a vehicle or a monitoring system, etc.
- the execution device 230 can input the data to be processed into the first machine learning model 201 to obtain the prediction result output by the first machine learning model 201 corresponding to the data to be processed.
- the execution device 230 can call data, codes, etc. in the data storage system 240, or store data, instructions, etc. in the data storage system 240.
- the data storage system 240 can be placed in the execution device 230, or the data storage system 240 can be an external memory relative to the execution device 230.
- the quantization method of the model provided in the present application can be used in both the training stage and the inference stage of the first machine learning model 201, that is, the training device 210 and the execution device 230 can be the execution body of the quantization method of the model provided in the present application.
- the aforementioned method can be applied to the process of data processing using a machine learning model, and the quantization method of the model includes quantizing the activation value generated by at least one activation layer in the machine learning model, and the at least one activation layer includes a first activation layer, and the first activation layer is any one of the aforementioned at least one activation layer.
- Figure 3 is a flow chart of quantizing the first activation value generated by the first activation layer provided in an embodiment of the present application.
- 301. The electronic device quantizes the first sub-activation value in the first activation value using a first quantization step size.
- 302. The electronic device quantizes the second sub-activation value in the first activation value using a second quantization step size; wherein the machine learning model includes multiple channels, the multiple channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size and the second quantization step size are different.
- the electronic device that executes steps 301 and 302 may be the training device 210 or the execution device 230 .
- Figure 4 is a schematic diagram of a quantization method of a model provided in an embodiment of the present application.
- the electronic device can divide the first activation value into a first sub-activation value corresponding to the first channel and a second sub-activation value corresponding to the second channel.
- the electronic device quantizes the first sub-activation value using a first quantization step size to obtain a quantized first sub-activation value; and quantizes the second sub-activation value using a second quantization step size to obtain a quantized second sub-activation value; the quantized first sub-activation value and the quantized second sub-activation value constitute the quantized first activation value.
- the example in Figure 4 is only for the convenience of understanding the present solution and is not used to limit the present solution.
- The above provides a method for quantizing the activation values generated by an activation layer in a machine learning model, which can reduce the computational complexity of the machine learning model and the storage space occupied when the machine learning model is used for data processing. In addition, there may be channels with an abnormal sub-activation value distribution among the multiple channels; for example, the sub-activation values corresponding to such a channel are stably too large or too small. If the same quantization step size is used to quantize the sub-activation values corresponding to every channel, that quantization step size needs to be large, and the accuracy of the quantized sub-activation values corresponding to the channels with normal distribution will be greatly reduced.
- different quantization step sizes are used in this scheme to quantize the sub-activation values corresponding to different channels, which is beneficial to retaining the abnormality of the quantized sub-activation value corresponding to the channel with abnormal distribution, and is beneficial to avoiding the loss of accuracy of the quantized sub-activation value corresponding to the channel with normal distribution.
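- The effect described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the application: the activation values, the 4-bit level range, and the rule of deriving a step size from the largest absolute value. It compares one shared quantization step size against one step size per channel when one channel is stably large.

```python
def quantize(values, step, t_n=-8, t_p=7):
    """Round each value to the nearest integer level, clamp the level to
    [t_n, t_p], and rescale by the step size (hypothetical 4-bit range)."""
    return [max(t_n, min(t_p, round(v / step))) * step for v in values]


def step_for(values, t_p=7):
    """Assumed rule for this sketch only: the largest magnitude maps to level t_p."""
    return max(abs(v) for v in values) / t_p


def mean_abs_error(original, quantized):
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)


# Hypothetical sub-activation values: one normal channel, one stably large channel.
activation = {
    "normal": [0.11, 0.27, 0.34, 0.42],
    "abnormal": [50.3, 61.8, 55.1, 58.6],
}

# Same step size for every channel: it has to cover the abnormal channel.
shared_step = step_for([v for vals in activation.values() for v in vals])
shared = {name: quantize(vals, shared_step) for name, vals in activation.items()}

# A different step size for each channel.
per_channel = {name: quantize(vals, step_for(vals)) for name, vals in activation.items()}

err_shared = mean_abs_error(activation["normal"], shared["normal"])
err_per_channel = mean_abs_error(activation["normal"], per_channel["normal"])
print(shared["normal"], err_shared, err_per_channel)
```

- With the shared step size, every value of the normal channel collapses to the same level, while the per-channel step size keeps the values distinguishable.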
- the execution device 230 and the client device 250 can be independent devices respectively.
- the execution device 230 is configured with an input/output (I/O) interface to interact with the client device 250 for data.
- the "user" can input the data to be processed through the client device 250, and the client device 250 sends the data to be processed to the execution device 230 through the I/O interface.
- After the execution device 230 generates the prediction decision information corresponding to the data to be processed through the first machine learning model/rule 201 in the computing module 231, the prediction decision information can be returned to the client device 250 through the I/O interface and provided to the user.
- FIG. 2 is only a schematic diagram of the architecture of the quantization system of the model provided in the embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
- the execution device 230 can be configured in the client device 250.
- the execution device 230 can be a module for performing array image processing in the host processor (Host CPU) of the mobile phone or tablet.
- the execution device 230 can also be a graphics processing unit (GPU) or a neural network processor (NPU) in the mobile phone or tablet.
- The GPU or NPU is mounted on the main processor as a coprocessor, and tasks are assigned to it by the main processor.
- the training stage describes the process in which the training device 210 trains the first machine learning model 201 using the training data in the database 220.
- FIG. 5 is a flow chart of a quantization method of the model provided in the embodiment of the present application.
- the quantization method of the model provided in the embodiment of the present application may include:
- the training device inputs the training samples into the first machine learning model, wherein in the process of processing the training samples using the first machine learning model, multiple feature information of the training samples and the activation values generated by the activation layer in the first machine learning model can be obtained.
- a training data set may be stored in a training device, and the training data set may include multiple training samples and the expected results corresponding to each training sample; wherein the specific expressions of "training samples” and “expected results corresponding to training samples” need to be determined in combination with the actual application scenario;
- the task performed by the first machine learning model may be any of the following: target detection, semantic segmentation of images, super-resolution processing of images, image classification, natural language processing or other types of tasks, etc.; for the description of "natural language processing" tasks, please refer to the above description, and they will not be listed one by one here.
- the "training samples” can be represented as images, and the "expected results corresponding to the training samples” can be represented as the correct position information of at least one object in the image.
- the "training samples” can be represented as images, and the "expected results corresponding to the training samples” can be represented as the correct category of each pixel in the image.
- the correct category can be foreground or background, etc.
- The training device can input training samples (which is also another name for "input data" in the training stage) into the first machine learning model to process the training samples through the first machine learning model, and then obtain the predicted results corresponding to the training samples; the specific expressions of "the predicted results corresponding to the training samples" and "the expected results corresponding to the training samples" are similar and will not be elaborated here.
- the process of using the first machine learning model to process data of training samples includes the process of using the first machine learning model to extract features of training samples.
- multiple feature information of the training samples and the activation value generated by the activation layer in the machine learning model can be obtained.
- the first machine learning model can be expressed as a Transformer model, a convolutional neural network (CNN), a recurrent neural network or other types of neural networks, etc., which are not limited here.
- the training device obtains first feature information and divides the first feature information into M sub-feature information, where the first feature information is included in multiple feature information of the training sample, and M is an integer greater than or equal to 2.
- step 502 and step 503 are optional steps.
- multiple feature information of the training sample can be obtained.
- the training device can obtain the first feature information of the training sample in the process of feature extraction of the training sample through the first machine learning model.
- the first feature information is included in the multiple feature information of the training sample.
- the training device divides the first feature information into M sub-feature information, where M is an integer greater than or equal to 2.
- One piece of first feature information includes multiple feature maps, and the scales of the different feature maps in the first feature information are the same; or, one piece of first feature information includes only one feature map.
- the training device uses a first quantization parameter to quantize the first sub-feature information, and uses a second quantization parameter to quantize the second sub-feature information, the M sub-feature information includes the first sub-feature information and the second sub-feature information, and the first quantization parameter and the second quantization parameter are different.
- the training device can store M groups of quantization parameters corresponding to the M sub-feature information one by one, and when quantizing any one of the M sub-feature information (hereinafter referred to as "target sub-feature information" for the convenience of description), the target quantization parameter corresponding to the target sub-feature information can be obtained from the M groups of quantization parameters, and the target sub-feature information is quantized using the target quantization parameter to obtain the quantized target sub-feature information.
- the M sub-feature information includes first sub-feature information and second sub-feature information
- the training device uses a first quantization parameter to quantize the first sub-feature information and uses a second quantization parameter to quantize the second sub-feature information, and the first quantization parameter and the second quantization parameter are different.
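- The lookup of a target quantization parameter for each piece of target sub-feature information can be sketched as follows. This is only an illustrative Python sketch: the names `QuantParams`, `split_into_groups`, and `quantize_feature`, the contiguous way of dividing the feature, the rounding formula, and all numeric values are assumptions and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class QuantParams:
    step: float  # quantization step size
    bias: float  # quantization bias


def split_into_groups(feature, m):
    """Divide a flat list of feature values into m contiguous sub-feature groups."""
    size = -(-len(feature) // m)  # ceiling division
    return [feature[i * size:(i + 1) * size] for i in range(m)]


def quantize_group(values, params):
    """Assumed uniform quantizer: shift by the bias, round to a step multiple, shift back."""
    return [round((v - params.bias) / params.step) * params.step + params.bias
            for v in values]


def quantize_feature(feature, param_groups):
    """Quantize each of the M sub-feature groups with its own stored parameters."""
    groups = split_into_groups(feature, len(param_groups))
    quantized = []
    for target_sub_feature, target_params in zip(groups, param_groups):
        quantized.extend(quantize_group(target_sub_feature, target_params))
    return quantized


# Hypothetical first feature information and M = 3 groups of quantization parameters.
first_feature = [0.12, 0.31, 4.2, 5.9, 40.0, 62.0]
param_groups = [QuantParams(0.1, 0.0), QuantParams(1.0, 0.0), QuantParams(10.0, 0.0)]
quantized = quantize_feature(first_feature, param_groups)
print(quantized)
```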
- the quantization parameters used when quantizing the model may include a quantization step size, a quantization bias, or other types of quantization parameters, which are not exhaustively listed here.
- Any one or more of the quantization step size and the quantization bias can be set as learnable parameters, that is, during the training process of the first machine learning model, the quantization step size and/or the quantization bias are continuously updated; or, both the quantization step size and the quantization bias can be set as hyperparameters.
- $q_s(X)$ represents the quantization of $X$, and its result is the quantized value obtained after quantizing $X$; $s$ represents the quantization step size in the quantization parameter, and the quantization parameter may further include the quantization bias; $\lfloor \cdot \rceil$ represents the nearest rounding operation; $\mathrm{clamp}(\cdot, t_n, t_p)$ represents a clamping operation, in which the maximum value of its argument is limited to at most $t_p$ and the minimum value of its argument is limited to at least $t_n$.
- It should be noted that the above example is only an example for facilitating the understanding of the present solution. Other quantization algorithms may also be used. The examples given here are not used to limit the present solution.
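- As a hedged illustration of how a quantization step size, a quantization bias, nearest rounding, and clamping can fit together, the following Python sketch assumes the commonly used form `s * clamp(round((x - bias) / s), t_n, t_p) + bias`. This composition, the function names, and the default 8-bit bounds are assumptions for illustration; the application may combine the terms differently.

```python
def clamp(value, t_n, t_p):
    """Limit value to the closed interval [t_n, t_p]."""
    return max(t_n, min(t_p, value))


def quantize(x, s, bias=0.0, t_n=-128, t_p=127):
    """Quantize a single value x with step size s and an optional bias.

    Note: Python's round() resolves exact ties to the nearest even integer.
    """
    level = clamp(round((x - bias) / s), t_n, t_p)
    return level * s + bias


print(quantize(0.26, 0.1))               # rounds to the nearest multiple of 0.1
print(quantize(100.0, 0.1))              # level is clamped at t_p
print(quantize(-100.0, 0.1))             # level is clamped at t_n
print(quantize(10.26, 0.1, bias=10.0))   # bias shifts the quantization grid
```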
- "The first quantization parameter and the second quantization parameter are different" may mean that quantization step size 1, used when quantizing the first sub-feature information, is different from quantization step size 2, used when quantizing the second sub-feature information, while the different sub-feature information in the M sub-feature information uses the same quantization bias.
- Alternatively, "the first quantization parameter and the second quantization parameter are different" may mean that quantization step size 1 and quantization bias 1, used when quantizing the first sub-feature information, are different from quantization step size 2 and quantization bias 2, used when quantizing the second sub-feature information.
- the training device may adopt the method in steps 502 and 503 to quantize any first feature information to obtain quantized first feature information.
- Before the first feature information is used for matrix multiplication, the first feature information may be quantized using the method in steps 502 and 503, and the quantized first feature information may then be used for the matrix multiplication.
- the feature processing module of the first machine learning model uses a Transformer module, and the attention mechanism is used in the process of extracting features from training samples using the Transformer module; the query matrix, key matrix, and value matrix used in the data processing process based on the attention mechanism may be quantized using the method in steps 502 and 503, and the quantized query matrix, key matrix, and value matrix may be used for data processing; that is, the query matrix, key matrix, and value matrix are all examples of the first feature information.
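- A minimal Python sketch of quantizing a query matrix and a key matrix before they are multiplied is given below. It assumes, purely for illustration, that each row (token) is one piece of sub-feature information with its own quantization step size; the application does not state that the division is made per row, and the matrices and step sizes are hypothetical.

```python
def quantize_rows(matrix, row_steps):
    """Quantize each row (token) of a matrix with its own step size."""
    return [[round(v / step) * step for v in row]
            for row, step in zip(matrix, row_steps)]


def matmul_transposed(a, b):
    """Return a @ b^T for two row-major matrices given as lists of lists."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


# Hypothetical query and key matrices with two tokens each.
query = [[0.12, 0.48], [3.9, 8.2]]
key = [[0.25, 0.51], [6.1, 2.2]]

# Token 0 holds small values and token 1 large values, so they get different steps.
q_hat = quantize_rows(query, [0.25, 2.0])
k_hat = quantize_rows(key, [0.25, 2.0])

# The quantized matrices, not the original ones, enter the matrix multiplication.
scores = matmul_transposed(q_hat, k_hat)
print(scores)
```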
- Figure 6 is a schematic diagram of quantizing the first feature information provided by an embodiment of the present application.
- the first feature information includes three feature maps corresponding to three channels as an example, and different regions in the same feature map are quantized using different quantization steps.
- Each feature map is divided into three groups of sub-feature information: the sub-feature information in the upper region of each feature map (that is, the upper two rows in Figure 6) is quantized using quantization step size 1, the sub-feature information in the middle region of each feature map (that is, the middle two rows in Figure 6) is quantized using quantization step size 2, and the sub-feature information in the lower region of each feature map (that is, the lower two rows in Figure 6) is quantized using quantization step size 3.
- It should be understood that the example in Figure 6 is only for the convenience of understanding this solution and is not used to limit this solution.
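- The region-wise scheme of Figure 6 can be sketched in Python as follows. The feature values, the two-column width, and the numeric values chosen for the three quantization step sizes are hypothetical; only the layout (three channels, six rows, two rows per region) follows the description.

```python
# Hypothetical numeric values for quantization step sizes 1, 2 and 3.
REGION_STEPS = [0.1, 0.5, 2.0]
ROWS_PER_REGION = 2


def quantize_feature_map(feature_map):
    """Quantize each row with the step size of the region the row belongs to."""
    quantized = []
    for row_index, row in enumerate(feature_map):
        step = REGION_STEPS[row_index // ROWS_PER_REGION]
        quantized.append([round(v / step) * step for v in row])
    return quantized


# One hypothetical 6 x 2 feature map, reused for all three channels.
feature_map = [
    [0.12, 0.31], [0.22, 0.09],   # upper region
    [1.4, 2.6], [0.9, 3.1],       # middle region
    [7.2, 12.6], [20.8, 9.4],     # lower region
]
first_feature = [feature_map, feature_map, feature_map]

quantized = [quantize_feature_map(fm) for fm in first_feature]
print(quantized[0])
```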
- The same input data may include parts with different semantics; for example, the same image may include multiple areas with different semantics, and the same text may include multiple words with different semantics. The distribution of values of sub-feature information corresponding to parts with different semantics in the same input data has a large difference, while the distribution of values of sub-feature information corresponding to parts with the same semantics has a small difference.
- the first feature information is divided into at least two sub-feature information, and different sub-feature information is quantized using different quantization parameters, which is beneficial to improving the matching degree between the value in the first feature information and the used quantization parameter.
- the distribution characteristics of the sub-feature information corresponding to the parts with the same semantics are retained, and the difference of the sub-feature information corresponding to the parts with different semantics is retained, which is beneficial to avoid reducing the accuracy of the prediction result output by the first machine learning model.
- the image is likely to include multiple objects.
- several tokens in the first feature information focus on the same object in the image. Different tokens in the first feature information may focus on different objects in the image.
- the distribution of the values of the sub-feature information corresponding to the same object is similar, and the distribution of the sub-feature information corresponding to different objects is different. That is, when the machine learning model is used to perform target detection tasks, the input data of the machine learning model is likely to include multiple semantically different areas. "Using different quantization parameters" to quantize different sub-feature information is more adaptable to the specific task of "target detection task".
- The training device quantizes a first sub-activation value in the first activation value using a first quantization step size, and quantizes a second sub-activation value in the first activation value using a second quantization step size; wherein the first machine learning model includes multiple channels, the multiple channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size and the second quantization step size are different.
- step 504 is an optional step. Since the first machine learning model may include one or more activation layers, the training device can generate activation values corresponding to all channels through each activation layer in the first machine learning model during the process of processing data of training samples using the first machine learning model.
- the first activation value can be an activation value generated by any activation layer, that is, the activation values generated by each activation layer in the first machine learning model are quantized in the manner of step 504; or, only the activation values generated by some preset activation layers in the first machine learning model are quantized in the manner of step 504, etc.
- Which activation layers in the first machine learning model are quantized in the manner of step 504 can be flexibly determined based on actual conditions, and is not limited in the embodiments of the present application.
- the training device can divide the first activation value into N groups corresponding to N channels, where N is an integer greater than or equal to 2; optionally, the distribution of sub-activation values corresponding to different channels in the N channels is different, that is, the distribution of sub-activation values included in different groups in the N groups is different.
- the training device uses different quantization step sizes to quantize the values of different groups in the N groups.
- the quantization offsets used by values of different groups in the N groups may be the same or different.
- the N groups include a first sub-activation value and a second sub-activation value.
- the training device can obtain the first sub-activation value and the second sub-activation value from the first activation value, quantize the first sub-activation value in the first activation value using a first quantization step size, and quantize the second sub-activation value in the first activation value using a second quantization step size.
- the first machine learning model includes multiple channels, the multiple channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size is different from the second quantization step size.
- the distribution of the first sub-activation value is different from the distribution of the second sub-activation value.
- the distribution of the first sub-activation value corresponding to the first channel is abnormal, and the distribution of the second sub-activation value corresponding to the second channel is normal.
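- If the first sub-activation values outside the normal value range exceed a first proportion of all the first sub-activation values corresponding to the first channel, the first channel can also be called an abnormal channel; if the second sub-activation values within the normal value range exceed a second proportion of all the second sub-activation values corresponding to the second channel, the second channel can also be called a normal channel.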
- the values of the first proportion and the second proportion can be the same or different.
- the values of the first proportion and the second proportion can both be 80%, 85%, 90%, or other proportion values, etc., or the values of the first proportion and the second proportion can be different, etc., which are not limited here.
- For example, more than 90% of all the second sub-activation values corresponding to the second channel are between 20 and 30, and more than 90% of all the first sub-activation values corresponding to the first channel are greater than or equal to 50.
- As another example, more than 85% of all the second sub-activation values corresponding to the second channel are between 10 and 20, and more than 85% of all the first sub-activation values corresponding to the first channel are less than or equal to 1.
- As another example, more than 90% of all the second sub-activation values corresponding to the second channel are between 10 and 20, and more than 90% of all the first sub-activation values corresponding to the first channel are either greater than or equal to 60, or less than or equal to 1. It should be noted that the examples given here are only for the convenience of understanding the concept that "the distribution of the first sub-activation values corresponding to the first channel" is different from "the distribution of the second sub-activation values corresponding to the second channel", and are not used to limit this solution.
- the training device can input a small amount of training samples into the first machine learning model, and count the distribution of sub-activation values corresponding to different channels each time a training sample is input, so as to determine which channels in the first machine learning model have abnormal distributions and which channels have normal distributions, and then mark which channels in the first machine learning model are first channels and which channels are second channels.
- the training device may input a training sample into the first machine learning model to obtain all sub-activation values corresponding to each channel.
- For any target channel, the training device determines a first number of sub-activation values greater than a first value among all sub-activation values corresponding to the target channel, and determines a second number of all sub-activation values corresponding to the target channel; if the ratio between the first number and the second number is greater than or equal to the first ratio, the target channel is determined to be a channel with abnormal distribution; if the ratio between the first number and the second number is less than the first ratio, the target channel is determined to be a channel with normal distribution.
- the training device performs the above operation on each channel to determine which channels are normally distributed and which channels are abnormally distributed in the first machine learning model.
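- The single-sample check described above can be sketched in Python as follows. The function name, the channel names, the sub-activation values, and the chosen first value and first ratio are hypothetical and serve only to illustrate the counting rule.

```python
def is_abnormal_channel(sub_activations, first_value, first_ratio):
    """Return True if the share of values above first_value reaches first_ratio."""
    first_number = sum(1 for v in sub_activations if v > first_value)
    second_number = len(sub_activations)  # number of all sub-activation values
    return first_number / second_number >= first_ratio


# Hypothetical sub-activation values obtained from one training sample.
channels = {
    "c0": [22.0, 25.5, 28.1, 24.3, 61.0],
    "c1": [55.0, 62.3, 58.8, 70.1, 27.0],
}

labels = {
    name: is_abnormal_channel(values, first_value=50.0, first_ratio=0.8)
    for name, values in channels.items()
}
print(labels)
```

- In this sketch only values that are too large are counted, matching the "greater than a first value" rule; detecting values that are stably too small would need an additional threshold.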
- the training device can obtain T training samples, and after inputting one of the T training samples into the first machine learning model, all sub-activation values corresponding to each channel are obtained.
- the training device determines the first number and the second number corresponding to the target channel. If the ratio between the first number and the second number is greater than or equal to the first ratio, the number of times the target channel is determined to be a channel with abnormal distribution is increased by one; if the ratio between the first number and the second number is less than the first ratio, it is not increased. After the training device inputs all T training samples into the first machine learning model, the total number of times the target channel is determined to be a channel with abnormal distribution can be obtained.
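- if the ratio between the total number of times the target channel is determined to be a channel with abnormal distribution and T is greater than or equal to the second ratio, the target channel is determined to be a channel with abnormal distribution; if the ratio between the total number of times the target channel is determined to be a channel with abnormal distribution and T is less than the second ratio, the target channel is determined to be a channel with normal distribution.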
- the training device performs the above operations on each channel to determine which are channels with normal distribution and which are channels with abnormal distribution in the first machine learning model.
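The T-sample variant can be sketched as a per-channel vote over the training samples. The code below is only an illustration under assumptions: the names `count_abnormal_votes` and `classify_channels`, the data layout (one dictionary of channel values per sample, with the same channels in every sample), and the threshold values are hypothetical.

```python
def count_abnormal_votes(samples, first_value, first_ratio):
    """For each channel, count the samples in which it looks abnormal."""
    votes = {}
    for activations_by_channel in samples:  # one dict per training sample
        for channel, values in activations_by_channel.items():
            second_number = len(values)
            first_number = sum(1 for v in values if v > first_value)
            abnormal = (
                second_number > 0
                and first_number / second_number >= first_ratio
            )
            votes[channel] = votes.get(channel, 0) + (1 if abnormal else 0)
    return votes


def classify_channels(samples, first_value, first_ratio, second_ratio):
    """Compare each channel's vote count against T using the second ratio."""
    total = len(samples)  # T
    votes = count_abnormal_votes(samples, first_value, first_ratio)
    return {
        channel: "abnormal" if count / total >= second_ratio else "normal"
        for channel, count in votes.items()
    }


# Hypothetical sub-activation values for T = 3 training samples.
samples = [
    {0: [0.1, 0.2], 1: [9.0, 8.0]},
    {0: [6.0, 7.0], 1: [9.5, 0.1]},
    {0: [0.3, 0.1], 1: [7.5, 6.5]},
]
votes = count_abnormal_votes(samples, first_value=5.0, first_ratio=0.5)
result = classify_channels(
    samples, first_value=5.0, first_ratio=0.5, second_ratio=0.6
)
```

Voting across several samples makes the labeling less sensitive to one unusual input: channel 0 looks abnormal in only one of the three samples and is therefore still treated as normal.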
- the training device may also use other methods to determine which channels in the first machine learning model have abnormal distribution and which channels have normal distribution.
- the examples given here are only to prove the feasibility of this solution and are not used to limit this solution.
- the first activation value can be split into two parts, one part only includes multiple first sub-activation values, and the other part only includes multiple second sub-activation values. It should be understood that the examples here are only for the convenience of understanding this solution and are not used to limit this solution.
- because the sub-activation values corresponding to a channel with abnormal distribution are consistently too large or too small, if the same quantization step size is used to quantize the sub-activation values corresponding to every channel, that quantization step size needs to be larger, and the accuracy of the quantized sub-activation values corresponding to the channels with normal distribution will be greatly reduced.
- different quantization step sizes are used in this scheme to quantize the sub-activation values corresponding to different channels, which is beneficial to retaining the abnormality of the quantized sub-activation value corresponding to the channel with abnormal distribution, and is beneficial to avoiding the loss of accuracy of the quantized sub-activation value corresponding to the channel with normal distribution.
- the scheme of "using the first step size to quantize the sub-activation values corresponding to the first channel, and using the second step size to quantize the sub-activation values corresponding to the second channel" is more compatible with the Transformer model. It can reduce the amount of computation of the Transformer model, reduce the number of parameters in the Transformer model, and avoid a reduction in the accuracy of the prediction results output by the Transformer model.
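The effect of using two step sizes can be illustrated with a small numerical sketch. Everything here is an assumption made for illustration: the symmetric uniform quantizer, the rule that derives a step size from the largest magnitude, the 4-bit width, and the sample values. In the scheme described above the step sizes may instead be obtained during training.

```python
def step_for(values, bits):
    """Pick a step so the largest magnitude maps to the top level (assumed rule)."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def quantize(values, step, bits):
    """Uniform symmetric quantization: scale, round, clip, and scale back."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax - 1, min(qmax, round(v / step))) * step for v in values]


def mean_abs_error(original, quantized):
    """Average absolute difference between original and quantized values."""
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)


BITS = 4
normal_channel = [0.10, -0.25, 0.40, 0.05]    # second channel (normal)
outlier_channel = [60.0, -45.0, 70.0, 52.0]   # first channel (abnormal)

# One shared step size must cover the outliers, which flattens the normal channel.
shared_step = step_for(normal_channel + outlier_channel, BITS)
error_shared = mean_abs_error(
    normal_channel, quantize(normal_channel, shared_step, BITS)
)

# Separate step sizes: first step for the outlier channel, second for the normal one.
first_step = step_for(outlier_channel, BITS)
second_step = step_for(normal_channel, BITS)
quantized_outlier = quantize(outlier_channel, first_step, BITS)
error_separate = mean_abs_error(
    normal_channel, quantize(normal_channel, second_step, BITS)
)
```

With the shared step size every value of the normal channel rounds to zero, whereas the smaller second step size keeps its values distinguishable, so `error_separate` is well below `error_shared`.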
- the training device may impose linear constraints on the first quantization step size and the second quantization step size, so that the values of the first quantization step size and the second quantization step size are more hardware-friendly.
- the following uses the quantization step size used for the sub-activation value corresponding to the channel with normal distribution and the quantization step size used for the sub-activation value corresponding to the channel with abnormal distribution as an example to disclose an example of the constraint relationship between the first quantization step size and the second quantization step size:
- s_outlier represents the first quantization step size
- s_normal represents the second quantization step size
- max(X_outlier) represents the maximum value of all the first sub-activation values corresponding to all the channels with abnormal distribution
- max(X_normal) represents the maximum value of all the second sub-activation values corresponding to all the channels with normal distribution
- the training device divides the second feature information into multiple groups, each of the multiple groups includes at least one feature map, and different groups in the multiple groups include feature maps with different scales, wherein the second feature information is included in the multiple feature information of the training sample, and the second feature information includes feature maps of different scales.
- steps 505 and 506 are optional steps. They apply if the training device can obtain second feature information of the training sample in the process of data processing of the training sample, where one piece of second feature information includes multiple feature maps of different scales; that is, the multiple feature information of the training sample obtained by the training device in the process of data processing includes the second feature information.
- the training device can divide the second feature information into multiple groups, each of the multiple groups includes at least one feature map, and different groups in the multiple groups include feature maps of different scales.
- the second feature information of the training sample can be obtained in the process of data processing of the training sample when the task is any of the following: target detection on the image, semantic segmentation on the image, super-resolution processing on the image, or other types of image processing tasks.
- first feature information and the second feature information are different feature information.
- One first feature information includes one feature map, or one first feature information includes multiple feature maps with the same scale; and one second feature information includes multiple feature maps with different scales.
- feature maps of different scales of a training sample have the same size
- “feature maps of different scales” refer to feature information of the training sample at different granularities. More details of the training sample can be seen in feature maps with smaller granularity (also called denser granularity), and the overall information of the training sample can be seen in feature maps with larger granularity (also called sparser granularity).
- Figure 7 is a schematic diagram of images of different scales provided in an embodiment of the present application.
- Figure 7 includes two left and right sub-schematic diagrams, and the left sub-schematic diagram and the right sub-schematic diagram of Figure 7 have the same size.
- in the image shown in the left sub-schematic diagram of Figure 7, there are dogs, bushes, trees and houses in the background.
- Feature extraction of the right sub-schematic diagram of Figure 7 obtains detailed feature information of the local area; the left sub-schematic diagram and the right sub-schematic diagram of Figure 7 represent images of different scales. It should be understood that the example in Figure 7 is only for the convenience of understanding the present solution and is not used to limit the present solution.
- the training device uses different quantization parameters to quantize different groups in the second feature information.
- that is, different quantization parameters may be used to quantize different groups in the second feature information.
- for example, if one piece of second feature information of a training sample includes feature maps of L scales, the training device can divide that second feature information into L groups, and L groups of quantization parameters corresponding one-to-one to the L groups (that is, corresponding one-to-one to the L scales) are stored on the training device, so that different groups in the second feature information are quantized using different quantization parameters; the meaning of "different quantization parameters" can be found in the above description and will not be repeated here.
- $$q_s(X) = \left[\, q_{s_1}(X_1);\ q_{s_2}(X_2);\ q_{s_3}(X_3);\ \ldots;\ q_{s_L}(X_L) \,\right]$$
- $q_s(X)$ represents quantizing one piece of second feature information $X$
- $X_1, X_2, \ldots, X_L$ represent the L groups included in the second feature information $X$
- different groups in the L groups include feature maps with different scales
- $q_{s_1}(X_1)$ represents quantizing the values in $X_1$ using the first group of quantization parameters
- $q_{s_2}(X_2)$ represents quantizing the values in $X_2$ using the second group of quantization parameters
- $q_{s_3}(X_3)$ represents quantizing the values in $X_3$ using the third group of quantization parameters
- $q_{s_L}(X_L)$ represents quantizing the values in $X_L$ using the Lth group of quantization parameters, that is, different groups in the L groups are quantized using different quantization parameters.
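The per-scale grouping can be sketched as applying one quantizer per group and concatenating the results. This is a hedged illustration only: each group is represented as a flat list of values, each group's quantization parameters are reduced to a single step size, and the symmetric uniform quantizer and the names `quantize` and `quantize_by_scale` are assumptions that the text above does not specify.

```python
def quantize(values, step, bits=8):
    """Uniform symmetric quantization: scale, round, clip, and scale back."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax - 1, min(qmax, round(v / step))) * step for v in values]


def quantize_by_scale(groups, steps, bits=8):
    """Quantize group l with its own step size, for l = 1..L."""
    if len(groups) != len(steps):
        raise ValueError("one group of quantization parameters per group is required")
    return [quantize(group, step, bits) for group, step in zip(groups, steps)]


# Hypothetical second feature information with L = 3 scales (flattened feature maps).
groups = [
    [0.6, 1.0, -0.75],  # finest scale, small value range
    [4.0, -8.0],        # middle scale
    [32.0],             # coarsest scale, large value range
]
steps = [0.25, 2.0, 16.0]  # one step size per scale (assumed values)

quantized = quantize_by_scale(groups, steps)
```

Because each scale has its own step size, the small values in the finest group are not forced onto the coarse grid that the largest group requires.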
- the training device may quantize each second feature information of the training sample using the method in steps 505 and 506; alternatively, the training device may quantize part of the second feature information of the training sample using the method in steps 505 and 506.
- the specific second feature information to be quantized using the method in steps 505 and 506 may be flexibly determined based on the actual application scenario, and is not limited in the embodiments of the present application.
- when the second feature information is obtained in the process of processing the training sample using the machine learning model, since the second feature information includes multiple feature maps with different scales, the second feature information is grouped based on the scale of each feature map.
- Different groups are quantized using different quantization parameters, that is, feature maps of different scales are quantized using different quantization parameters, which helps retain the information carried by feature maps of different scales and avoids reducing the accuracy of the prediction results output by the machine learning model.
- steps 502 and 503, step 504, and steps 505 and 506 are all optional steps. If steps 502 and 503, step 504, and/or steps 505 and 506 are executed, the execution order of steps 502 and 503, step 504, and steps 505 and 506 is not limited in the embodiment of the present application, nor is the number of executions of steps 502 and 503, step 504, and steps 505 and 506 limited. The specific execution times can be flexibly determined based on actual conditions and are not limited here.
- Figure 8 is a schematic diagram of a quantization method of a model provided in an embodiment of the present application.
- Figure 8 takes as an example the case in which the task of the first model is target detection and the first model includes a feature extraction module (backbone), a feature processing module (neck) and a detection head module (head).
- the detection head module (head) is used to detect the category of the object and the position of the object respectively.
- the feature extraction module (backbone), the feature processing module (neck) and the detection head module (head) in the first model can generate the first feature information of the training sample, and the first feature information can be quantized by the methods in steps 502 and 503.
- the activation layer in the feature extraction module (backbone) and the activation layer in the feature processing module (neck) can both generate activation values, and the method in step 504 can be used to quantize the aforementioned activation values. It should be understood that the example in FIG. 8 is only for the convenience of understanding the present solution and is not used to limit the present solution.
- the training device obtains the prediction result output by the first machine learning model, and trains the first machine learning model according to the expected result, prediction result and loss function corresponding to the training sample.
- after the training device inputs the training sample into the first machine learning model and processes the training sample through the first machine learning model, it can obtain the prediction result output by the first machine learning model and generate the function value of the loss function according to the expected result corresponding to the training sample and the aforementioned prediction result; the loss function indicates the similarity between the prediction result corresponding to the training sample and the expected result.
- the training device uses the back propagation algorithm to update the weight parameters in the first machine learning model and some quantization parameters in steps 502 to 506, thereby realizing one training of the first machine learning model.
- the predicted value output by the first machine learning model can be compared with the desired expected value, and then the weight vector of each layer of the neural network can be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, pre-configuring parameters for each layer in the first machine learning model). For example, if the predicted value of the model is high, the weight vector is adjusted to make it predict lower, and it is adjusted continuously until the first machine learning model can predict the desired target value or a value very close to the desired target value.
- the training device may also quantize the weight parameters in the first machine learning model, and use the quantized weight parameters in the next round of training of the first machine learning model.
- the training device repeatedly executes steps 501 to 507 until the convergence condition of the loss function is met, so as to realize iterative training of the first machine learning model and obtain the trained first machine learning model and multiple sets of quantization parameters; the aforementioned multiple sets of quantization parameters are used to quantize the feature information and/or activation values generated by the first machine learning model during the inference stage of the first machine learning model.
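The iterative training with quantization in the forward pass can be sketched on a toy problem. The code below is an assumption-laden illustration rather than the method of this application: it uses a one-parameter model, a mean squared error loss, a fixed quantization step size (the text above also allows some quantization parameters to be updated), and a straight-through estimator, which treats rounding as the identity when computing gradients. The straight-through estimator is a common technique swapped in here; it is not named in the text.

```python
def fake_quantize(value, step):
    """Round to the nearest multiple of step (quantize, then dequantize)."""
    return round(value / step) * step


def train(inputs, targets, step, learning_rate=0.02, tolerance=1e-9, max_iters=100):
    """Gradient descent on y = w * x with a quantized output in the forward pass."""
    w = 0.0  # initialization before the first update
    loss = float("inf")
    iterations = 0
    for iterations in range(1, max_iters + 1):
        # Forward pass: the model output is quantized before the loss is computed.
        predictions = [fake_quantize(w * x, step) for x in inputs]
        errors = [p - t for p, t in zip(predictions, targets)]
        loss = sum(e * e for e in errors) / len(errors)
        if loss < tolerance:  # convergence condition of the loss function
            break
        # Backward pass with the straight-through estimator:
        # d(fake_quantize(w * x)) / dw is approximated by x.
        grad = sum(2 * e * x for e, x in zip(errors, inputs)) / len(errors)
        w -= learning_rate * grad
    return w, loss, iterations


# Hypothetical data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]
w, final_loss, iterations = train(xs, ys, step=0.5)
```

The loop stops once the quantized predictions match the targets, which can happen before `w` reaches exactly 2.0 because nearby weights produce the same quantized outputs.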
- the inference stage describes the process in which the execution device 230 uses the trained first machine learning model 201 to process the data to be processed and outputs a prediction result corresponding to the data to be processed.
- FIG. 9 is a flow chart of a quantization method of the model provided in the embodiment of the present application.
- the quantization method of the model provided in the embodiment of the present application may include:
- the execution device inputs the data to be processed into the first machine learning model, wherein, in the process of processing the data to be processed using the first machine learning model, multiple feature information of the data to be processed and the activation value generated by the activation layer in the first machine learning model can be obtained.
- the execution device obtains first feature information and divides the first feature information into M sub-feature information, where the first feature information is included in a plurality of feature information of the data to be processed, and M is an integer greater than or equal to 2.
- the execution device quantizes the first sub-feature information using the first quantization parameter and quantizes the second sub-feature information using the second quantization parameter.
- the M sub-feature information includes first sub-feature information and second sub-feature information, and the first quantization parameter and the second quantization parameter are different.
- the execution device quantizes a first sub-activation value in the first activation value using a first quantization step size, and quantizes a second sub-activation value in the first activation value using a second quantization step size
- the first machine learning model includes multiple channels, the multiple channels include a first channel and a second channel, the distribution of the first sub-activation value corresponding to the first channel is different from the distribution of the second sub-activation value corresponding to the second channel, and the first quantization step size and the second quantization step size are different.
- the execution device divides the second feature information into multiple groups, each of the multiple groups includes at least one feature map, and different groups in the multiple groups include feature maps with different scales, wherein the second feature information is included in the multiple feature information of the data to be processed, and the second feature information includes feature maps of different scales.
- the execution device quantizes different groups in the second feature information using different quantization parameters.
- the execution device obtains the prediction result output by the first machine learning model.
- the specific implementation method of steps 901 to 907 and the meaning of each noun in steps 901 to 907 can refer to the description in the embodiments corresponding to Figure 5.
- the difference is that the "training sample" in the embodiment corresponding to Figure 5 is replaced by the "data to be processed" in the embodiment corresponding to Figure 9.
- the "data to be processed" is another name for the "input data" in the inference stage; the details will not be repeated here one by one.
- steps 902 and 903, step 904, and steps 905 and 906 are all optional steps. If steps 902 and 903, step 904, and/or steps 905 and 906 are executed, the execution order of steps 902 and 903, step 904, and steps 905 and 906 is not limited in the embodiment of the present application, nor is the number of executions of steps 902 and 903, step 904, and steps 905 and 906 limited. The specific execution times can be flexibly determined based on actual conditions and are not limited here.
- the quantization method of the model provided in the present application can be adopted in both the training phase and the inference phase of the machine learning model; that is, it can reduce the amount of computation of the machine learning model not only when processing data on the execution device but also when processing data on the training device.
- DETR with improved denoising anchor boxes for end-to-end object detection (DINO) and deformable detection transformer (Deformable DETR) are two machine learning models for performing object detection tasks
- floating point operations (FLOPs) is an indicator representing the amount of calculation
- mean average precision (mAP) is an indicator representing the accuracy
- channel-wise is an existing quantization method.
- the quantization method of the model provided by the present application not only greatly reduces the number of parameters of the machine learning model, reduces the amount of calculation of the machine learning model, but also avoids the reduction of the accuracy of the prediction results output by the model.
- FIG. 10 is a structural schematic diagram of a quantization device of a model provided in an embodiment of the present application.
- the quantization device 1000 of the model is applied in the process of data processing using a machine learning model.
- the quantization device of the model is used to quantize the activation value generated by at least one activation layer in the machine learning model, and the at least one activation layer includes a first activation layer;
- the quantization device 1000 of the model includes: a quantization module 1001, which is used to quantize the first sub-activation value in the first activation value using a first quantization step size; the quantization module 1001 is also used to quantize the second sub-activation value in the first activation value using a second quantization step size, wherein the machine learning model includes multiple channels, the multiple channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size and the second quantization step size are different.
- the distribution of the first sub-activation values is different from the distribution of the second sub-activation values.
- the machine learning model is a Transformer model.
- multiple feature information of the input data can be obtained in the process of processing the input data using the machine learning model, the multiple feature information includes the first feature information, and the quantization device 1000 of the model is also used to quantize the first feature information;
- the quantization device 1000 of the model also includes: a grouping module 1002, which is used to divide the first feature information into at least two sub-feature information, and the at least two sub-feature information include first sub-feature information and second sub-feature information; the quantization module 1001 is also used to quantize the first sub-feature information using a first quantization parameter; the quantization module 1001 is also used to quantize the second sub-feature information using a second quantization parameter, and the first quantization parameter and the second quantization parameter are different.
- the input data is an image
- the task of the machine learning model is to detect objects in the image.
- multiple feature information of the input data can be obtained in the process of processing the input data using the machine learning model, the multiple feature information includes second feature information, the second feature information includes feature maps of different scales, and the quantization device 1000 of the model is also used to quantize the second feature information;
- the quantization device 1000 of the model also includes: a grouping module 1002, used to divide the second feature information into multiple groups, each of the multiple groups includes at least one feature map, and different groups in the multiple groups include feature maps with different scales; the quantization module 1001 is also used to quantize different groups using different quantization parameters.
- the input data is an image
- the task of the machine learning model is any of the following: object detection on the image, semantic segmentation on the image, or super-resolution on the image.
- the process of using a machine learning model to process data is in the inference phase of the machine learning model, or the process of using a machine learning model to process data is in the training phase of the machine learning model.
- FIG. 12 is another schematic diagram of the structure of a quantization device of a model provided in an embodiment of the present application.
- the quantization device 1200 of the model is applied to a process of performing data processing using a machine learning model.
- multiple feature information of the input data can be obtained.
- the multiple feature information includes first feature information.
- the quantization device 1200 of the model is used to quantize the first feature information.
- the quantization device 1200 of the model includes: a grouping module 1201, which is used to divide the first feature information into at least two sub-feature information, and the at least two sub-feature information include first sub-feature information and second sub-feature information; a quantization module 1202, which is used to quantize the first sub-feature information using a first quantization parameter; the quantization module 1202 is also used to quantize the second sub-feature information using a second quantization parameter, and the first quantization parameter and the second quantization parameter are different.
- the model quantization device 1200 is further used to quantize activation values generated by at least one activation layer in the machine learning model, the at least one activation layer including a first activation layer;
- the quantization module 1202 is also used to quantize the first sub-activation value in the first activation value using a first quantization step size; the quantization module 1202 is also used to quantize the second sub-activation value in the first activation value using a second quantization step size, wherein the machine learning model includes multiple channels, the first sub-activation value corresponds to a first channel among the multiple channels, the second sub-activation value corresponds to a second channel among the multiple channels, and the first quantization step size is different from the second quantization step size.
- the electronic device can be a training device for a first machine learning model.
- the device may also be an execution device configured with the first machine learning model.
- FIG. 13 is a schematic diagram of a structure of an execution device provided in an embodiment of the present application.
- the execution device 1300 may be specifically a virtual reality (VR) device, a mobile phone, a tablet, a laptop, a smart wearable device, a monitoring data processing device or a radar data processing device, etc., which are not limited here.
- the execution device 1300 includes: a receiver 1301, a transmitter 1302, a processor 1303 and a memory 1304 (wherein the number of processors 1303 in the execution device 1300 may be one or more, and one processor is taken as an example in FIG. 13), wherein the processor 1303 may include an application processor 13031 and a communication processor 13032.
- the receiver 1301, the transmitter 1302, the processor 1303 and the memory 1304 may be connected via a bus or other means.
- the memory 1304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1303. A portion of the memory 1304 may also include a non-volatile random access memory (NVRAM).
- the memory 1304 stores processor and operation instructions, executable modules or data structures, or subsets thereof, or extended sets thereof, wherein the operation instructions may include various operation instructions for implementing various operations.
- the processor 1303 controls the operation of the execution device.
- the various components of the execution device are coupled together through a bus system, wherein the bus system includes not only a data bus but also a power bus, a control bus, and a status signal bus, etc.
- for clarity of description, the various buses are all referred to as the bus system in the figure.
- the method disclosed in the above embodiment of the present application can be applied to the processor 1303, or implemented by the processor 1303.
- the processor 1303 can be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method can be completed by the hardware integrated logic circuit in the processor 1303 or the instruction in the form of software.
- the above processor 1303 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and can further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
- the processor 1303 can implement or execute the various methods, steps and logic block diagrams disclosed in the embodiment of the present application.
- the general processor can be a microprocessor or the processor can also be any conventional processor, etc.
- the steps of the method disclosed in the embodiment of the present application can be directly embodied as a hardware decoding processor to be executed, or a combination of hardware and software modules in the decoding processor can be executed.
- the software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, etc.
- the storage medium is located in the memory 1304, and the processor 1303 reads the information in the memory 1304 and completes the steps of the above method in combination with its hardware.
- the receiver 1301 can be used to receive input digital or character information and generate signal input related to the relevant settings and function control of the execution device.
- the transmitter 1302 can be used to output digital or character information through the first interface; the transmitter 1302 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1302 can also include a display device such as a display screen.
- the application processor 13031 in the processor 1303 is used to execute the quantization method of the model executed by the execution device in the embodiments corresponding to Figures 3 to 9.
- the specific manner in which the application processor 13031 executes the above steps is based on the same concept as the various method embodiments corresponding to Figures 3 to 9 in the present application, and the technical effects brought by it are the same as the various method embodiments corresponding to Figures 3 to 9 in the present application.
- FIG. 14 is a schematic diagram of a structure of a training device provided in an embodiment of the present application.
- the training device 1400 is implemented by one or more servers.
- the training device 1400 may have relatively large differences due to different configurations or performances, and may include one or more central processing units (CPU) 1422 (for example, one or more processors) and memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices) storing application programs 1442 or data 1444.
- the memory 1432 and the storage medium 1430 can be short-term storage or permanent storage.
- the program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the training device. Furthermore, the central processor 1422 can be configured to communicate with the storage medium 1430 to execute a series of instruction operations in the storage medium 1430 on the training device 1400.
- the training device 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input and output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
- the central processing unit 1422 is used to execute the quantization method of the model executed by the training device in the embodiment corresponding to Figure 12. It should be noted that the specific manner in which the central processing unit 1422 executes the above steps is based on the same concept as the various method embodiments corresponding to Figure 12 in the present application, and the technical effects brought by it are the same as the various method embodiments corresponding to Figure 12 in the present application. For specific contents, please refer to the description in the method embodiments shown in the previous embodiment of the present application, which will not be repeated here.
- Also provided in an embodiment of the present application is a computer program product, which, when executed on a computer, enables the computer to execute the steps executed by the training device in the method described in the embodiments shown in the aforementioned Figures 3 to 8, or enables the computer to execute the steps executed by the execution device in the method described in the embodiment shown in the aforementioned Figure 9.
- a computer-readable storage medium is also provided in an embodiment of the present application, which stores a program for signal processing.
- when the program runs on a computer, the computer executes the steps executed by the training device in the method described in the embodiments shown in Figures 3 to 8 above, or the computer executes the steps executed by the execution device in the method described in the embodiment shown in Figure 9 above.
- the execution device, training device or model quantization device may be a chip, and the chip includes: a processing unit and a communication unit, wherein the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit, etc.
- the processing unit may execute the computer execution instructions stored in the storage unit so that the chip executes the quantization method of the model described in the embodiments shown in Figures 3 to 9 above.
- the storage unit is a storage unit in the chip, such as a register, a cache, etc.
- the storage unit may also be a storage unit located outside the chip in the wireless access device end, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.
- FIG. 15 is a schematic diagram of a structure of a chip provided in an embodiment of the present application.
- the chip can be expressed as a neural network processor NPU 150.
- NPU 150 is mounted on the host CPU (Host CPU) as a coprocessor, and tasks are assigned by the Host CPU.
- the core part of the NPU is the operation circuit 1503; the controller 1504 controls the operation circuit 1503 to fetch matrix data from memory and perform multiplication operations.
- the operation circuit 1503 includes multiple processing units (Process Engine, PE) inside.
- the operation circuit 1503 is a two-dimensional systolic array.
- the operation circuit 1503 can also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition.
- the operation circuit 1503 is a general-purpose matrix processor.
- the operation circuit takes the corresponding data of matrix B from the weight memory 1502 and caches it on each PE in the operation circuit.
- the operation circuit takes the matrix A data from the input memory 1501 and performs matrix operation with matrix B, and the partial result or final result of the matrix is stored in the accumulator 1508.
- Unified memory 1506 is used to store input data and output data. Weight data is directly transferred to weight memory 1502 through Direct Memory Access Controller (DMAC) 1505. Input data is also transferred to unified memory 1506 through DMAC.
- BIU stands for Bus Interface Unit, that is, the bus interface unit 1510, which is used for the interaction between AXI bus and DMAC and instruction fetch buffer (IFB) 1509.
- the bus interface unit 1510 (Bus Interface Unit, BIU for short) is used by the instruction fetch buffer 1509 to obtain instructions from the external memory, and is also used by the direct memory access controller 1505 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
- DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1506 or to transfer weight data to the weight memory 1502 or to transfer input data to the input memory 1501.
- the vector calculation unit 1507 includes multiple operation processing units, which further process the output of the operation circuit when necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as Batch Normalization, pixel-level summation, upsampling of feature planes, etc.
- the vector calculation unit 1507 can store the processed output vector to the unified memory 1506.
- the vector calculation unit 1507 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1503, for example linear interpolation of the feature plane extracted by the convolution layer, or, as another example, a function applied to a vector of accumulated values to generate an activation value.
- the vector calculation unit 1507 generates a normalized value, a pixel-level summed value, or both.
- the processed output vector can be used as an activation input to the operation circuit 1503, for example, for use in a subsequent layer in a neural network.
- the controller 1504 is connected to an instruction fetch buffer 1509, which is used to store instructions used by the controller 1504.
- Unified memory 1506, input memory 1501, weight memory 1502 and instruction fetch buffer 1509 are all on-chip memories. External memories are private to the NPU hardware architecture.
- each layer in the first machine learning model shown in the above embodiments can be performed by the operation circuit 1503 or the vector calculation unit 1507.
- the processor mentioned in any of the above places may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the above-mentioned first aspect method.
- the device embodiments described above are merely schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
- the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disk, and includes a number of instructions that enable a computer device (which can be a personal computer, a training device, or a network device, etc.) to execute the methods described in each embodiment of the present application.
- all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof.
- all or part of the embodiments may be implemented in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave, etc.).
- the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more available media.
- the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), etc.
Abstract
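A model quantization method and a related device. The method can be used to compress models in the field of artificial intelligence. Quantizing a first activation value generated by a first activation layer in a machine learning model includes: quantizing a first sub-activation value in the first activation value with a first quantization step size, and quantizing a second sub-activation value in the first activation value with a second quantization step size, where a first channel in the machine learning model corresponds to the first sub-activation value, a second channel in the machine learning model corresponds to the second sub-activation value, and the first quantization step size differs from the second quantization step size. Because different quantization step sizes are used for the sub-activation values of different channels, the solution helps preserve the abnormality of the quantized sub-activation values of abnormally distributed channels while avoiding a loss of precision in the quantized sub-activation values of normally distributed channels.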
Description
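This application claims priority to Chinese Patent Application No. 202310215082.0, filed with the China National Intellectual Property Administration on February 25, 2023 and entitled "Model quantization method and related device", which is incorporated herein by reference in its entirety.
This application relates to the field of artificial intelligence, and in particular, to a model quantization method and a related device.
Artificial intelligence (AI) is the theory, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that seeks to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines can perceive, reason, and make decisions.
As artificial intelligence technology develops, machine learning models are deployed on terminal devices in more and more scenarios. However, many machine learning models are very complex and have a huge number of parameters, which places high demands on the hardware of terminal devices. Because terminal devices have limited resources, a solution for compressing machine learning models is urgently needed.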
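Summary
Embodiments of this application provide a model quantization method and a related device. Because the sub-activation values of different channels are distributed differently, this solution quantizes the sub-activation values of different channels with different quantization step sizes, which helps preserve the abnormality of the quantized sub-activation values of abnormally distributed channels while avoiding a loss of precision in the quantized sub-activation values of normally distributed channels.
To solve the foregoing technical problem, embodiments of this application provide the following technical solutions:
According to a first aspect, an embodiment of this application provides a model quantization method that can be used to compress models in the field of artificial intelligence. The method is applied in a process of performing data processing with a first machine learning model, and includes quantizing the activation values generated by at least one activation layer in the first machine learning model. The at least one activation layer includes a first activation layer; that is, the first activation value generated by the first activation layer is any activation value that needs to be quantized. Quantizing, by an electronic device, the first activation value generated by the first activation layer includes:
The electronic device quantizes a first sub-activation value in the first activation value with a first quantization step size, and quantizes a second sub-activation value in the first activation value with a second quantization step size. The first machine learning model includes a plurality of channels, the plurality of channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size differs from the second quantization step size. The electronic device may be a training device of the first model or an execution device on which the first model is deployed.
In this implementation, a method is provided for quantizing the activation values generated by an activation layer in the first machine learning model, which reduces the computational complexity of the first machine learning model and the storage space occupied when the model is used for data processing. In addition, the plurality of channels may include channels whose sub-activation values are abnormally distributed, for example channels whose sub-activation values are consistently very large or very small. If the same quantization step size were used for the sub-activation values of every channel, that step size would have to be large, and the precision of the quantized sub-activation values of the normally distributed channels would drop sharply. Because the sub-activation values of different channels are distributed differently, this solution quantizes them with different quantization step sizes, which helps preserve the abnormality of the quantized sub-activation values of the abnormally distributed channels while avoiding a loss of precision in the quantized sub-activation values of the normally distributed channels.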
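In a possible implementation, the distribution of the first sub-activation values differs from the distribution of the second sub-activation values. Take the case in which the first sub-activation values of the first channel are abnormally distributed and the second sub-activation values of the second channel are normally distributed. For example, if more than a first proportion of all first sub-activation values of the first channel are consistently very large or very small, the first channel may be called an abnormal channel; if more than a second proportion of all second sub-activation values of the second channel fall within a normal value range, the second channel may be called a normal channel. The first proportion and the second proportion may be the same or different. For example, both may be 80%, 85%, 90%, or another proportion, or the two may take different values; this is not limited here.
For example, more than 90% of all second sub-activation values of the second channel lie between 20 and 30, while more than 90% of all first sub-activation values of the first channel are greater than or equal to 50. For another example, more than 85% of all second sub-activation values of the second channel lie between 10 and 20, while more than 85% of all first sub-activation values of the first channel are less than or equal to 1. For still another example, more than 90% of all second sub-activation values of the second channel lie between 10 and 20, while more than 90% of all first sub-activation values of the first channel are either greater than or equal to 60 or less than or equal to 1. Note that these examples only illustrate what it means for the distribution of the first sub-activation values of the first channel to differ from the distribution of the second sub-activation values of the second channel, and do not limit the solution.
In a possible implementation, the first machine learning model is a Transformer model. In this implementation, researchers found that when the first machine learning model is a Transformer model, the difference between the sub-activation values of abnormally distributed channels and those of normally distributed channels is more pronounced. The solution of quantizing the sub-activation values of the first channel with the first step size and quantizing the sub-activation values of the second channel with the second step size is therefore particularly well suited to Transformer models: it reduces the computation and the number of parameters of the Transformer model while avoiding a drop in the precision of the prediction results output by the model.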
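In a possible implementation, a plurality of pieces of feature information of the input data can be obtained in the process of performing data processing on the input data with the first machine learning model, and the plurality of pieces of feature information include first feature information. The model quantization method further includes quantizing the first feature information. Quantizing, by the electronic device, the first feature information includes:
The electronic device divides the first feature information into at least two pieces of sub-feature information, which include first sub-feature information and second sub-feature information. The electronic device quantizes the first sub-feature information with a first quantization parameter and quantizes the second sub-feature information with a second quantization parameter, where the first quantization parameter differs from the second quantization parameter. For example, the quantization parameters used for model quantization may include a quantization step size, a quantization offset, or other types of quantization parameters; the list is not exhaustive.
In this implementation, the same input data may include semantically different parts. For example, one image may include several semantically different regions, and one text may include several semantically different words. The values of the sub-feature information corresponding to semantically different parts of the same input data are distributed quite differently, whereas the values of the sub-feature information corresponding to semantically identical parts are distributed similarly. This solution divides the first feature information into at least two pieces of sub-feature information and quantizes different pieces with different quantization parameters, which improves the match between the values in the first feature information and the quantization parameters used. After the first feature information is quantized in this way, the distribution characteristics of the sub-feature information of semantically identical parts are preserved, and so are the differences between the sub-feature information of semantically different parts, which helps avoid reducing the accuracy of the prediction results output by the first machine learning model.
In a possible implementation, "the first quantization parameter differs from the second quantization parameter" may mean that quantization step size 1, used for the first sub-feature information, differs from quantization step size 2, used for the second sub-feature information, while the different pieces of sub-feature information share the same quantization offset. Alternatively, it may mean that quantization step size 1 and quantization offset 1, used for the first sub-feature information, differ from quantization step size 2 and quantization offset 2, used for the second sub-feature information.
In a possible implementation, the input data is an image, and the task of the first machine learning model is to perform object detection on the image. In this implementation, when a machine learning model performs an object detection task on an image, the image very likely contains several objects. Typically several tokens in the first feature information attend to the same object in the image, while different tokens may attend to different objects. The values of the sub-feature information of the same object are distributed similarly, and the sub-feature information of different objects is distributed differently. In other words, when the machine learning model is used for object detection, its input data very likely includes several semantically different regions, so quantizing different sub-feature information with different quantization parameters is particularly well suited to the object detection task.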
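In a possible implementation, a plurality of pieces of feature information of the input data can be obtained in the process of performing data processing on the input data with the first machine learning model, the plurality of pieces of feature information include second feature information, and the second feature information includes feature maps of different scales. The model quantization method further includes quantizing the second feature information. Quantizing, by the electronic device, the second feature information includes: dividing the second feature information into a plurality of groups, where each group includes at least one feature map and different groups include feature maps of different scales; and quantizing different groups with different quantization parameters. For example, the feature maps of different scales of a training sample have the same size. "Feature maps of different scales" refers to feature information of the training sample at different granularities: a feature map with a finer (denser) granularity shows more details of the training sample, and a feature map with a coarser (sparser) granularity shows the overall information of the training sample.
In this implementation, if second feature information is obtained in the process of performing data processing on a training sample with the machine learning model, the second feature information, which includes several feature maps of different scales, is grouped by the scale of each feature map, and different groups are quantized with different quantization parameters. That is, feature maps of different scales are quantized with different quantization parameters, which helps preserve the information carried by feature maps of different scales and avoids reducing the accuracy of the prediction results output by the machine learning model.
In a possible implementation, the input data is an image, and the task of the first machine learning model is any one of the following: performing object detection on the image, performing semantic segmentation on the image, or performing super-resolution processing on the image. This implementation provides several application scenarios, which improves the implementation flexibility of the solution.
In a possible implementation, the process of performing data processing with the first machine learning model takes place in the inference phase of the first machine learning model, or in the training phase of the first machine learning model. In this implementation, the model quantization method provided in this application can be used whenever the machine learning model processes input data, in either the training phase or the inference phase. The method therefore reduces the computation of the machine learning model not only on the execution device but also on the training device.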
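According to a second aspect, an embodiment of this application provides a model quantization method that can be used to compress models in the field of artificial intelligence. The method is applied in a process of performing data processing with a machine learning model. A plurality of pieces of feature information of the input data can be obtained in the process of performing data processing on the input data with the machine learning model, and the plurality of pieces of feature information include first feature information. The model quantization method includes quantizing the first feature information, where quantizing, by an electronic device, the first feature information includes: dividing the first feature information into at least two pieces of sub-feature information, which include first sub-feature information and second sub-feature information; and quantizing the first sub-feature information with a first quantization parameter and the second sub-feature information with a second quantization parameter, where the first quantization parameter differs from the second quantization parameter.
In a possible implementation, the model quantization method further includes quantizing the activation values generated by at least one activation layer in the machine learning model, where the at least one activation layer includes a first activation layer. Quantizing the first activation value generated by the first activation layer includes: quantizing a first sub-activation value in the first activation value with a first quantization step size; and quantizing a second sub-activation value in the first activation value with a second quantization step size. The machine learning model includes a plurality of channels, the plurality of channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size differs from the second quantization step size.
In a possible implementation, the distribution of the first sub-activation values differs from the distribution of the second sub-activation values.
In a possible implementation, the machine learning model is a Transformer model.
In a possible implementation, the input data is an image, and the task of the machine learning model is to perform object detection on the image.
In a possible implementation, a plurality of pieces of feature information of the input data can be obtained in the process of performing data processing on the input data with the machine learning model, the plurality of pieces of feature information include second feature information, and the second feature information includes feature maps of different scales. The model quantization method further includes quantizing the second feature information, where quantizing the second feature information includes: dividing the second feature information into a plurality of groups, where each group includes at least one feature map and different groups include feature maps of different scales; and quantizing different groups with different quantization parameters.
In a possible implementation, the input data is an image, and the task of the machine learning model is any one of the following: performing object detection on the image, performing semantic segmentation on the image, or performing super-resolution processing on the image.
In a possible implementation, the process of performing data processing with the machine learning model takes place in the inference phase of the machine learning model, or in the training phase of the machine learning model.
For the specific implementation of the steps in the possible implementations of the second aspect, the meanings of the terms, and the beneficial effects, refer to the first aspect. Details are not repeated here.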
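According to a third aspect, an embodiment of this application provides a model quantization apparatus that can be used to compress models in the field of artificial intelligence. The apparatus is applied in a process of performing data processing with a machine learning model and is configured to quantize the activation values generated by at least one activation layer in the machine learning model, where the at least one activation layer includes a first activation layer. The model quantization apparatus includes:
a quantization module, configured to quantize a first sub-activation value in the first activation value with a first quantization step size, where the quantization module is further configured to quantize a second sub-activation value in the first activation value with a second quantization step size. The machine learning model includes a plurality of channels, the plurality of channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size differs from the second quantization step size.
In the third aspect of this application, the model quantization apparatus may further be configured to perform the steps performed by the electronic device in the first aspect and its possible implementations. For the specific implementation of the steps in the possible implementations of the third aspect, the meanings of the terms, and the beneficial effects, refer to the first aspect. Details are not repeated here.
According to a fourth aspect, an embodiment of this application provides a model quantization apparatus that can be used to compress models in the field of artificial intelligence. The apparatus is applied in a process of performing data processing with a machine learning model. A plurality of pieces of feature information of the input data can be obtained in the process of performing data processing on the input data with the machine learning model, the plurality of pieces of feature information include first feature information, and the apparatus is configured to quantize the first feature information. The model quantization apparatus includes: a grouping module, configured to divide the first feature information into at least two pieces of sub-feature information, which include first sub-feature information and second sub-feature information; and a quantization module, configured to quantize the first sub-feature information with a first quantization parameter, where the quantization module is further configured to quantize the second sub-feature information with a second quantization parameter, and the first quantization parameter differs from the second quantization parameter.
In the fourth aspect of this application, the model quantization apparatus may further be configured to perform the steps performed by the electronic device in the first aspect and its possible implementations. For the specific implementation of the steps in the possible implementations of the fourth aspect, the meanings of the terms, and the beneficial effects, refer to the first aspect. Details are not repeated here.
According to a fifth aspect, an embodiment of this application provides a computer program product. The computer program product includes a program that, when run on a computer, causes the computer to perform the model quantization method according to the first aspect.
According to a sixth aspect, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the model quantization method according to the first aspect.
According to a seventh aspect, an embodiment of this application provides an electronic device, including a processor and a memory coupled to the processor. The memory is configured to store a program, and the processor is configured to execute the program in the memory so that the electronic device performs the model quantization method according to the first aspect.
According to an eighth aspect, this application provides a chip system. The chip system includes a processor configured to support a terminal device or a communication device in implementing the functions involved in the foregoing aspects, for example, sending or processing the data and/or information involved in the foregoing methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the terminal device or communication device. The chip system may consist of a chip, or may include a chip and other discrete components.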
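FIG. 1 is a schematic structural diagram of an artificial intelligence main framework according to an embodiment of this application;
FIG. 2 is a system architecture diagram of a model quantization system according to an embodiment of this application;
FIG. 3 is a schematic flowchart of quantizing a first activation value generated by a first activation layer according to an embodiment of this application;
FIG. 4 is a schematic diagram of a model quantization method according to an embodiment of this application;
FIG. 5 is a schematic flowchart of a model quantization method according to an embodiment of this application;
FIG. 6 is a schematic diagram of quantizing first feature information according to an embodiment of this application;
FIG. 7 is a schematic diagram of images of different scales according to an embodiment of this application;
FIG. 8 is a schematic diagram of a model quantization method according to an embodiment of this application;
FIG. 9 is a schematic flowchart of a model quantization method according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a model quantization apparatus according to an embodiment of this application;
FIG. 11 is another schematic structural diagram of a model quantization apparatus according to an embodiment of this application;
FIG. 12 is another schematic structural diagram of a model quantization apparatus according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of an execution device according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of a training device according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a chip according to an embodiment of this application.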
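The following describes embodiments of this application with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects and do not necessarily describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; they are merely a way of distinguishing objects with the same attributes when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to the process, method, product, or device.
First, the overall workflow of an artificial intelligence system is described. Refer to FIG. 1, which is a schematic structural diagram of an artificial intelligence main framework. The framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition to data processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data is refined from data to information, then to knowledge, and then to wisdom. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provision and processing technology implementation) to the industrial ecosystem of the system.
(1) Infrastructure
The infrastructure provides computing power for the artificial intelligence system, enables communication with the outside world, and provides support through a basic platform. Communication with the outside world takes place through sensors. Computing power is provided by intelligent chips, which may be hardware acceleration chips such as a central processing unit (CPU), an embedded neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA). The basic platform includes platform assurance and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside world to obtain data, and the data is provided to intelligent chips in a distributed computing system provided by the basic platform for computation.
(2) Data
The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, as well as Internet of Things data from conventional devices, including service data of existing systems and sensing data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning is the process in which a computer or intelligent system simulates human intelligent reasoning and, according to a reasoning control strategy, uses formalized information to think and solve problems. Its typical functions are searching and matching.
Decision-making is the process of making decisions after intelligent information has been reasoned over, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data has undergone the data processing described above, some general capabilities can be formed based on the results, such as an algorithm or a general-purpose system, for example translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Intelligent products and industry applications
Intelligent products and industry applications are the products and applications of artificial intelligence systems in various fields. They package the overall artificial intelligence solution, turn intelligent information decision-making into products, and put it into practical use. The main application fields include intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, and smart cities.
The model quantization method provided in this application can be applied in the various application fields of artificial intelligence technology, specifically to compress the machine learning models in those fields. This application compresses machine learning models by means of quantization. Model quantization is a term from the model acceleration field of artificial intelligence and refers to discretizing the continuous values in a machine learning model (for example, activation values, weight parameters, or other information).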
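As a minimal illustration of what "discretizing continuous values" means, the sketch below applies plain uniform quantization with a single step size. The function names, the sample values, and the step of 0.25 are illustrative assumptions and do not come from this application.

```python
def quantize(values, step):
    """Map each continuous value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(codes, step):
    """Recover the approximate real values represented by the integer codes."""
    return [c * step for c in codes]


# Hypothetical continuous values (for example, a few weights or activations).
values = [0.12, -0.47, 0.98, 0.31]
step = 0.25

codes = quantize(values, step)        # small integers that are cheap to store
restored = dequantize(codes, step)    # discretized approximation of `values`
max_error = max(abs(v - r) for v, r in zip(values, restored))
```

With rounding to the nearest level and no clamping, the reconstruction error of each value is at most half a step, which is why the choice of step size directly controls the precision that survives quantization.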
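For example, the model quantization method provided in this application can be applied in fields that rely on neural networks, such as visual perception tasks, speech- and semantics-related natural language synthesis tasks, and audio and video processing tasks. Several application scenarios of the embodiments of this application are described below.
Application scenario 1: object detection
For example, in the field of autonomous driving, an autonomous vehicle can use sensors to collect point cloud data of its surroundings and perform object detection on the collected point cloud data with a machine learning model to obtain a prediction result. The prediction result indicates the position of at least one object in the surroundings of the vehicle, and the autonomous vehicle can plan its driving path based on the prediction result.
Note that the vehicle may be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, an amusement park vehicle, construction equipment, a tram, a golf cart, a train, or the like. This is not specifically limited in the embodiments of this application.
For another example, in the field of intelligent surveillance, many cameras are installed in public places and on roads, and a small number of intelligent cameras can perform an object detection task on the images they capture of their surroundings.
For another example, in the smart home field, a mobile robot (such as a robot vacuum cleaner, a tutoring robot, or another mobile robot) can capture three-dimensional images of its surroundings and perform object detection on them with a machine learning model to obtain a prediction result that indicates the position of at least one obstacle around the mobile robot.
Because the computing power of autonomous vehicles, intelligent cameras, mobile robots, and other types of terminal devices is limited, the model quantization method provided in this application can be used to compress the foregoing machine learning models so that relatively large models can also perform inference well on terminal devices.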
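Application scenario 2: semantic segmentation of images
Semantic segmentation refers to using a machine learning model to classify all the pixels in an image. The model quantization method provided in this application can be used to compress such a machine learning model.
Application scenario 3: super-resolution processing of images
For example, in scenarios such as intelligent surveillance, intelligent healthcare, and video coding and communication, there is a need to use a machine learning model to reconstruct a high-resolution image from an observed low-resolution image. The model quantization method provided in this application can be used to compress such a machine learning model.
Application scenario 4: image classification
After obtaining an image to be classified, a terminal device (for example, a mobile phone, a tablet, or a laptop) can use a machine learning model to obtain the category of the object in the image and then classify the image according to that category. The model quantization method provided in this application can be used to compress such a machine learning model.
Application scenario 5: natural language processing (NLP)
Natural language processing is the processing of human language. It is the process of using a machine learning model to systematically analyze, understand, and extract information from text data, and the model quantization method provided in this application can be used to compress such a machine learning model. With such a model, very large blocks of text data can be managed, a large number of automated tasks can be performed, and a wide variety of problems can be solved, such as automatic summarization, machine translation (MT), named entity recognition (NER), relation extraction (RE), information extraction (IE), sentiment analysis, speech recognition, question answering, and topic segmentation.
For example, natural language processing tasks can be divided into the following categories.
Sequence labeling: for each word in a sentence, the model is required to output a classification category based on the context. Examples include Chinese word segmentation, part-of-speech tagging, named entity recognition, and semantic role labeling.
Classification tasks: one classification value is output for the whole sentence, as in text classification.
Sentence relation inference: given two sentences, determine whether they have a certain nominal relationship. Examples include question answering, semantic rewriting, and natural language inference.
Generative tasks: one piece of text is input and another piece of text is generated. Examples include machine translation, text summarization, writing poems and making sentences, and describing pictures.
Note that the foregoing application scenarios are given only to aid understanding of the solution and do not limit the solution.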
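Before the specific implementations of the model quantization method provided in this application are described, refer to FIG. 2, which is a system architecture diagram of a model quantization system according to an embodiment of this application. In FIG. 2, the model quantization system 200 includes a training device 210, a database 220, an execution device 230, a data storage system 240, and a client device 250. The execution device 230 includes a computing module 231.
The database 220 stores a training data set. In the training phase of the first machine learning model 201, the training device 210 generates the first machine learning model 201 and iteratively trains it with the training data set to obtain the trained first machine learning model 201. The first machine learning model 201 may be a neural network or a model other than a neural network.
The first machine learning model 201 obtained by the training device 210 can be deployed in the computing module 231 of the execution device 230. For example, the execution device 230 may be a mobile phone, a tablet, a laptop, a VR device, a vehicle, a surveillance system, or the like. In the inference phase of the first machine learning model 201, the execution device 230 can input data to be processed into the first machine learning model 201 and obtain the prediction result that the first machine learning model 201 outputs for that data.
The execution device 230 can call data, code, and the like in the data storage system 240, and can also store data, instructions, and the like in the data storage system 240. The data storage system 240 may be located in the execution device 230, or may be an external memory relative to the execution device 230.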
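The model quantization method provided in this application can be used in both the training phase and the inference phase of the first machine learning model 201; that is, both the training device 210 and the execution device 230 can perform the method. The method can be applied in a process of performing data processing with a machine learning model and includes quantizing the activation values generated by at least one activation layer in the machine learning model. The at least one activation layer includes a first activation layer, which is any one of the at least one activation layer. Refer to FIG. 3, which is a schematic flowchart of quantizing a first activation value generated by a first activation layer according to an embodiment of this application. 301. The electronic device quantizes a first sub-activation value in the first activation value with a first quantization step size. 302. The electronic device quantizes a second sub-activation value in the first activation value with a second quantization step size. The machine learning model includes a plurality of channels, the plurality of channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size differs from the second quantization step size. The electronic device that performs steps 301 and 302 may be the training device 210 or the execution device 230.
For a more intuitive understanding of the solution, refer to FIG. 4, which is a schematic diagram of a model quantization method according to an embodiment of this application. As shown in FIG. 4, after obtaining the first activation value, the electronic device can divide it into the first sub-activation values corresponding to the first channel and the second sub-activation values corresponding to the second channel. The electronic device quantizes the first sub-activation values with the first quantization step size to obtain quantized first sub-activation values, and quantizes the second sub-activation values with the second quantization step size to obtain quantized second sub-activation values. The quantized first sub-activation values and the quantized second sub-activation values together form the quantized first activation value. It should be understood that the example in FIG. 4 is provided only to aid understanding and does not limit the solution.
In the embodiments of this application, a method is provided for quantizing the activation values generated by an activation layer in a machine learning model, which reduces the computational complexity of the machine learning model and the storage space occupied when the model is used for data processing. In addition, the plurality of channels may include channels whose sub-activation values are abnormally distributed, for example channels whose sub-activation values are consistently very large or very small. If the same quantization step size were used for the sub-activation values of every channel, that step size would have to be large, and the precision of the quantized sub-activation values of the normally distributed channels would drop sharply. Because the sub-activation values of different channels are distributed differently, this solution quantizes them with different quantization step sizes, which helps preserve the abnormality of the quantized sub-activation values of the abnormally distributed channels while avoiding a loss of precision in the quantized sub-activation values of the normally distributed channels.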
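The effect described above can be checked on a small numeric example. The sketch below compares a single step size shared by all channels with one step size per channel group. The 4-bit range, the way each step is derived from the maximum value, and the sample activations are all illustrative assumptions, not values taken from this application.

```python
def fake_quantize(values, step, levels):
    """Quantize to integer codes in [0, levels], then map back to real values."""
    out = []
    for v in values:
        code = min(max(round(v / step), 0), levels)
        out.append(code * step)
    return out


def mean_abs_error(original, restored):
    return sum(abs(x - y) for x, y in zip(original, restored)) / len(original)


# Hypothetical sub-activation values of one normal and one abnormal channel.
normal_channel = [20.3, 24.7, 27.1, 29.6]
outlier_channel = [310.0, 455.0, 498.0, 502.0]

levels = 15  # 4-bit unsigned codes (assumption)

# One step for everything: the outliers force a large step.
shared_step = max(normal_channel + outlier_channel) / levels
# One step per channel group: each step matches its own value range.
step_normal = max(normal_channel) / levels
step_outlier = max(outlier_channel) / levels

err_shared = mean_abs_error(
    normal_channel, fake_quantize(normal_channel, shared_step, levels))
err_split = mean_abs_error(
    normal_channel, fake_quantize(normal_channel, step_normal, levels))
```

With the shared step, every value of the normal channel collapses onto the same quantization level, so its distribution is lost; with its own step, the normal channel keeps distinct levels while the abnormal channel still covers its large range with `step_outlier`.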
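In some embodiments of this application, referring to FIG. 2, the execution device 230 and the client device 250 may be separate devices. The execution device 230 is provided with an input/output (I/O) interface for exchanging data with the client device 250. A user can input data to be processed through the client device 250, and the client device 250 sends the data to the execution device 230 through the I/O interface. After generating the prediction or decision information corresponding to the data with the first machine learning model/rule 201 in the computing module 231, the execution device 230 can return that information to the client device 250 through the I/O interface so that it is provided to the user.
Note that FIG. 2 is only a schematic architecture diagram of a model quantization system according to an embodiment of the present invention, and the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in other embodiments of this application, the execution device 230 may be configured in the client device 250. As an example, when the client device is a mobile phone or a tablet, the execution device 230 may be a module for array image processing in the host CPU of the phone or tablet, or may be a graphics processing unit (GPU) or a neural-network processing unit (NPU) in the phone or tablet, where the GPU or NPU is mounted on the host CPU as a coprocessor and the host CPU assigns tasks to it.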
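With reference to the foregoing description, the following describes the specific implementation procedures of the training phase and the inference phase of the machine learning model provided in the embodiments of this application.
I. Training phase
In the embodiments of this application, the training phase is the process in which the training device 210 trains the first machine learning model 201 with the training data in the database 220. Specifically, refer to FIG. 5, which is a schematic flowchart of a model quantization method according to an embodiment of this application. The model quantization method provided in the embodiments of this application may include the following steps.
501. The training device inputs a training sample into the first machine learning model, where a plurality of pieces of feature information of the training sample and the activation values generated by the activation layers in the first machine learning model can be obtained in the process of performing data processing on the training sample with the first machine learning model.
In the embodiments of this application, the training device may store a training data set, which may include a plurality of training samples and the expected result corresponding to each training sample. The specific forms of the "training sample" and the "expected result corresponding to the training sample" depend on the actual application scenario. For example, the task performed by the first machine learning model may be any one of the following: object detection, semantic segmentation of images, super-resolution processing of images, image classification, natural language processing, or another type of task. For a description of natural language processing tasks, refer to the foregoing description; they are not listed again here.
For example, if the task of the first machine learning model is object detection on images, the training sample may be an image, and the expected result corresponding to the training sample may be the correct position information of at least one object in the image. For another example, if the task of the first machine learning model is semantic segmentation of images, the training sample may be an image, and the expected result corresponding to the training sample may be the correct category of each pixel in the image, where the correct category may be foreground, background, or the like. It should be understood that these examples are provided only to aid understanding and do not limit the solution.
The training device can input the training sample (that is, the "input data" in the training phase) into the first machine learning model so that the model performs data processing on the training sample and outputs the prediction result corresponding to the training sample. The specific form of the "prediction result corresponding to the training sample" is similar to that of the "expected result corresponding to the training sample" and is not described again here.
The process of performing data processing on the training sample with the first machine learning model includes a process of performing feature extraction on the training sample with the model. Therefore, a plurality of pieces of feature information of the training sample and the activation values generated by the activation layers in the machine learning model can be obtained in the process of performing data processing on the training sample.
For example, the first machine learning model may be a Transformer model, a convolutional neural network (CNN), a recurrent neural network, or another type of neural network. This is not limited here.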
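502. The training device obtains first feature information and divides it into M pieces of sub-feature information, where the first feature information is included in the plurality of pieces of feature information of the training sample, and M is an integer greater than or equal to 2.
In the embodiments of this application, steps 502 and 503 are optional. A plurality of pieces of feature information of the training sample can be obtained while the training device performs data processing on the training sample, so the training device can obtain the first feature information of the training sample during feature extraction with the first machine learning model, where the first feature information is included in the plurality of pieces of feature information of the training sample. The training device divides the first feature information into M pieces of sub-feature information, where M is an integer greater than or equal to 2.
Optionally, if one piece of first feature information includes several feature maps, the different feature maps in the first feature information have the same scale; alternatively, one piece of first feature information includes only one feature map.
503. The training device quantizes first sub-feature information with a first quantization parameter and quantizes second sub-feature information with a second quantization parameter, where the M pieces of sub-feature information include the first sub-feature information and the second sub-feature information, and the first quantization parameter differs from the second quantization parameter.
In the embodiments of this application, after dividing the first feature information into M pieces of sub-feature information, the training device can quantize different pieces of sub-feature information with different quantization parameters. That is, the training device may store M sets of quantization parameters in one-to-one correspondence with the M pieces of sub-feature information. When quantizing any one of the M pieces of sub-feature information (referred to below as the "target sub-feature information" for ease of description), the training device can obtain, from the M sets of quantization parameters, the target quantization parameter corresponding to the target sub-feature information and quantize the target sub-feature information with it to obtain quantized target sub-feature information.
For example, the M pieces of sub-feature information include first sub-feature information and second sub-feature information. The training device quantizes the first sub-feature information with the first quantization parameter and the second sub-feature information with the second quantization parameter, where the first quantization parameter differs from the second quantization parameter.
For example, the quantization parameters used for model quantization may include a quantization step size, a quantization offset, or other types of quantization parameters; the list is not exhaustive. Either or both of the quantization step size and the quantization offset can be set as learnable parameters, in which case the quantization step size and/or the quantization offset are continuously updated during the training of the first machine learning model. Alternatively, both the quantization step size and the quantization offset can be set as hyperparameters.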
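For a further understanding of the solution, the model quantization process is introduced first. For example, when learned step size quantization+ (LSQ+), which uses differentiable quantization parameters, is adopted as the quantization algorithm, the formula used is as follows:
Here, $q_s(X)$ denotes quantizing $X$ to obtain its quantized value, $s$ denotes the quantization step size among the quantization parameters, $\beta$ denotes the quantization offset among the quantization parameters, $\lfloor\cdot\rceil$ denotes rounding to the nearest integer, and $\mathrm{clamp}(\cdot, t_n, t_p)$ denotes a clamping operation that limits the quantized value to at most $t_p$ and at least $t_n$. Note that this is only one example given to aid understanding of the solution; other quantization algorithms can also be used, and the example does not limit the solution.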
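The formula itself does not survive in this text. For orientation, the LSQ+ quantizer as it is usually written in the literature is reproduced below using the symbols defined above. This is a reconstruction under that assumption rather than a copy of the application's own formula; in particular, the symbol $\bar{X}$ for the integer code and the dequantized value $\hat{X}$ are assumed notation.

```latex
\bar{X} = q_s(X) = \operatorname{clamp}\!\left(\left\lfloor \frac{X - \beta}{s} \right\rceil,\; t_n,\; t_p\right),
\qquad
\hat{X} = \bar{X} \cdot s + \beta
```

Under this form, $s$ sets the spacing of the quantization levels, $\beta$ shifts the levels toward the range the values actually occupy, and $t_n$ and $t_p$ are fixed by the bit width.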
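In one implementation, "the first quantization parameter differs from the second quantization parameter" may mean that quantization step size 1, used for the first sub-feature information, differs from quantization step size 2, used for the second sub-feature information, while the different pieces of sub-feature information among the M pieces share the same quantization offset.
In another implementation, "the first quantization parameter differs from the second quantization parameter" may mean that quantization step size 1 and quantization offset 1, used for the first sub-feature information, differ from quantization step size 2 and quantization offset 2, used for the second sub-feature information.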
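The two variants can be compared on a small example. The sketch below assumes a quantizer of the form clamp(round((x - offset) / step), t_n, t_p) with 4-bit codes; the sample values, step sizes, and offsets are hypothetical and chosen only to make the difference visible.

```python
def quantize(values, step, offset=0.0, t_n=0, t_p=15):
    """Integer codes clamp(round((x - offset) / step), t_n, t_p)."""
    return [min(max(round((x - offset) / step), t_n), t_p) for x in values]


def dequantize(codes, step, offset=0.0):
    return [c * step + offset for c in codes]


def max_abs_error(values, step, offset):
    restored = dequantize(quantize(values, step, offset), step, offset)
    return max(abs(v - r) for v, r in zip(values, restored))


# Two pieces of sub-feature information with different value ranges.
sub_feature_1 = [0.1, 0.4, 0.9, 1.5]
sub_feature_2 = [10.2, 10.9, 11.4, 11.8]

# Both variants: sub_feature_1 uses step 0.1 and offset 0.
codes_1 = quantize(sub_feature_1, step=0.1)

# Variant 1: different step (1.0), same offset (0.0) as sub_feature_1.
err_shared_offset = max_abs_error(sub_feature_2, step=1.0, offset=0.0)

# Variant 2: different step (0.125) and different offset (10.0).
err_own_offset = max_abs_error(sub_feature_2, step=0.125, offset=10.0)
```

When a piece of sub-feature information occupies a narrow range far from zero, giving it its own offset lets the available codes cover only that range, so a much smaller step fits into the same number of bits.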
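In the embodiments of this application, in one implementation, the training device can quantize any piece of first feature information in the manner of steps 502 and 503 to obtain quantized first feature information.
In another implementation, before the first feature information is used in a matrix multiplication, it can be quantized in the manner of steps 502 and 503, and the quantized first feature information is then used in the matrix multiplication. For example, the feature processing module of the first machine learning model uses a Transformer module, and an attention mechanism is used when the Transformer module extracts features from the training sample. The query matrix, key matrix, and value matrix used in attention-based data processing can each be quantized in the manner of steps 502 and 503, and the quantized query, key, and value matrices are then used for data processing. That is, the query matrix, key matrix, and value matrix are each an example of first feature information.
For a more intuitive understanding of the solution, refer to FIG. 6, which is a schematic diagram of quantizing first feature information according to an embodiment of this application. FIG. 6 uses an example in which the first feature information includes three feature maps corresponding to three channels, and different regions of the same feature map are quantized with different quantization step sizes. As shown in FIG. 6, each feature map is divided into three groups of sub-feature information: the sub-feature information in the upper region of each feature map (the top two rows in FIG. 6) is quantized with quantization step size 1, the sub-feature information in the middle region (the middle two rows in FIG. 6) is quantized with quantization step size 2, and the sub-feature information in the lower region (the bottom two rows in FIG. 6) is quantized with quantization step size 3. It should be understood that the example in FIG. 6 is provided only to aid understanding and does not limit the solution.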
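The region-wise scheme of FIG. 6 can be sketched as follows, assuming six-row feature maps split into three two-row regions. The feature values and the three step sizes are hypothetical, and the same toy feature map is reused for all three channels purely to keep the example short.

```python
def quantize_rows(feature_map, row_groups, steps):
    """Quantize each row group of a 2-D feature map with its own step size."""
    out = []
    for (start, stop), step in zip(row_groups, steps):
        for row in feature_map[start:stop]:
            out.append([round(v / step) * step for v in row])
    return out


# Upper, middle, and lower regions (two rows each), as in FIG. 6.
row_groups = [(0, 2), (2, 4), (4, 6)]
steps = [0.5, 2.0, 8.0]  # quantization step sizes 1, 2, 3 (hypothetical)

feature_map = [
    [0.4, 1.1, 0.7],
    [1.6, 0.2, 0.9],
    [5.2, 7.5, 3.1],
    [6.2, 4.4, 8.8],
    [40.0, 55.0, 31.0],
    [62.0, 47.0, 58.0],
]

# First feature information with three feature maps (one per channel).
first_feature_info = [feature_map, feature_map, feature_map]
quantized = [quantize_rows(fm, row_groups, steps) for fm in first_feature_info]
```

Each region keeps a resolution that matches its own value range: the small values in the upper rows are not flattened by the coarse step that the large values in the lower rows require.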
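In the embodiments of this application, the same input data may include semantically different parts. For example, one image may include several semantically different regions, and one text may include several semantically different words. The values of the sub-feature information corresponding to semantically different parts of the same input data are distributed quite differently, whereas the values of the sub-feature information corresponding to semantically identical parts are distributed similarly. This solution divides the first feature information into at least two pieces of sub-feature information and quantizes different pieces with different quantization parameters, which improves the match between the values in the first feature information and the quantization parameters used. After the first feature information is quantized in this way, the distribution characteristics of the sub-feature information of semantically identical parts are preserved, and so are the differences between the sub-feature information of semantically different parts, which helps avoid reducing the accuracy of the prediction results output by the first machine learning model.
When a machine learning model performs an object detection task on an image, the image very likely contains several objects. Typically several tokens in the first feature information attend to the same object in the image, while different tokens may attend to different objects. The values of the sub-feature information of the same object are distributed similarly, and the sub-feature information of different objects is distributed differently. In other words, when the machine learning model is used for object detection, its input data very likely includes several semantically different regions, so quantizing different sub-feature information with different quantization parameters is particularly well suited to the object detection task.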
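504. The training device quantizes a first sub-activation value in the first activation value with a first quantization step size and quantizes a second sub-activation value in the first activation value with a second quantization step size, where the first machine learning model includes a plurality of channels, the plurality of channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size differs from the second quantization step size.
In the embodiments of this application, step 504 is optional. Because the first machine learning model may include one or more activation layers, the training device can, while performing data processing on the training sample with the first machine learning model, generate activation values corresponding to all channels through each activation layer in the model.
For example, the first activation value may be the activation value generated by any activation layer; that is, the activation values generated by every activation layer in the first machine learning model are quantized in the manner of step 504. Alternatively, only the activation values generated by some preset activation layers in the first machine learning model are quantized in the manner of step 504. Which activation layers in the first machine learning model are quantized in the manner of step 504 can be determined flexibly according to the actual situation and is not limited in the embodiments of this application.
After obtaining the first activation value generated by the first activation layer, the training device can divide it into N groups corresponding to N types of channels, where N is an integer greater than or equal to 2. Optionally, the sub-activation values of different channel types among the N types are distributed differently; that is, the sub-activation values in different groups among the N groups are distributed differently. The training device quantizes the values of different groups among the N groups with different quantization step sizes.
Optionally, the quantization offsets used for the values of different groups among the N groups may be the same or different.
For example, the N groups include the first sub-activation values and the second sub-activation values. The training device can obtain the first sub-activation values and the second sub-activation values from the first activation value, quantize the first sub-activation values with the first quantization step size, and quantize the second sub-activation values with the second quantization step size.
The first machine learning model includes a plurality of channels, the plurality of channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size differs from the second quantization step size. Optionally, the distribution of the first sub-activation values differs from the distribution of the second sub-activation values.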
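Take the case in which the first sub-activation values of the first channel are abnormally distributed and the second sub-activation values of the second channel are normally distributed. For example, if more than a first proportion of all first sub-activation values of the first channel are consistently very large or very small, the first channel may be called an abnormal channel; if more than a second proportion of all second sub-activation values of the second channel fall within a normal value range, the second channel may be called a normal channel. The first proportion and the second proportion may be the same or different. For example, both may be 80%, 85%, 90%, or another proportion, or the two may take different values; this is not limited here.
For example, more than 90% of all second sub-activation values of the second channel lie between 20 and 30, while more than 90% of all first sub-activation values of the first channel are greater than or equal to 50.
For another example, more than 85% of all second sub-activation values of the second channel lie between 10 and 20, while more than 85% of all first sub-activation values of the first channel are less than or equal to 1.
For still another example, more than 90% of all second sub-activation values of the second channel lie between 10 and 20, while more than 90% of all first sub-activation values of the first channel are either greater than or equal to 60 or less than or equal to 1. Note that these examples only illustrate what it means for the distribution of the first sub-activation values of the first channel to differ from the distribution of the second sub-activation values of the second channel, and do not limit the solution.
Optionally, the training device can input a small number of training samples into the first machine learning model and collect statistics on the distribution of the sub-activation values of each channel for each input training sample, so as to determine which channels in the first machine learning model are abnormally distributed and which are normally distributed, and then mark which channels in the model are first channels and which are second channels.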
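For example, in one implementation, the training device can input one training sample into the first machine learning model to obtain all the sub-activation values of each channel. For any channel in the first machine learning model (referred to below as the "target channel" for ease of description), the training device determines a first quantity, which is the number of sub-activation values of the target channel that are greater than a first value, and a second quantity, which is the total number of sub-activation values of the target channel. If the ratio of the first quantity to the second quantity is greater than or equal to a first ratio, the target channel is determined to be an abnormally distributed channel; if the ratio is less than the first ratio, the target channel is determined to be a normally distributed channel. The training device performs this operation on every channel to determine which channels in the first machine learning model are normally distributed and which are abnormally distributed.
In another implementation, the training device can obtain T training samples. After one of the T training samples is input into the first machine learning model, all the sub-activation values of each channel are obtained. The training device determines the first quantity and the second quantity for the target channel. If the ratio of the first quantity to the second quantity is greater than or equal to the first ratio, the number of times the target channel has been determined to be abnormally distributed is increased by one; if the ratio is less than the first ratio, the count is not increased. After all T training samples have been input into the first machine learning model, the training device obtains the total number of times the target channel was determined to be abnormally distributed. If the ratio of this total to T is greater than or equal to a second ratio, the target channel is determined to be an abnormally distributed channel; if the ratio of this total to T is less than the second ratio, the target channel is determined to be a normally distributed channel. The training device performs this operation on every channel to determine which channels in the first machine learning model are normally distributed and which are abnormally distributed.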
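The counting procedure over T training samples can be sketched as follows. The data layout (one list of sub-activation values per channel per sample), the function names, the threshold values, and the sample data are assumptions made for illustration only.

```python
def exceeds_first_ratio(channel_values, first_value, first_ratio):
    """True if the share of values greater than `first_value` reaches `first_ratio`."""
    first_quantity = sum(1 for v in channel_values if v > first_value)
    second_quantity = len(channel_values)
    return first_quantity / second_quantity >= first_ratio


def find_outlier_channels(samples, first_value, first_ratio, second_ratio):
    """samples[t][c] holds the sub-activation values of channel c for sample t."""
    num_channels = len(samples[0])
    hits = [0] * num_channels  # times each channel looked abnormal
    for sample in samples:
        for c, values in enumerate(sample):
            if exceeds_first_ratio(values, first_value, first_ratio):
                hits[c] += 1
    # A channel is abnormal if it looked abnormal in enough of the T samples.
    return [h / len(samples) >= second_ratio for h in hits]


# T = 3 hypothetical samples, each with two channels.
samples = [
    [[21, 25, 28, 22], [60, 75, 90, 55]],
    [[24, 29, 20, 27], [52, 66, 81, 70]],
    [[23, 26, 30, 21], [95, 58, 64, 45]],
]
outlier = find_outlier_channels(
    samples, first_value=50, first_ratio=0.9, second_ratio=0.6)
```

In this data, channel 1 exceeds the first ratio in two of the three samples, which is enough to reach the second ratio, so it is marked as abnormally distributed; channel 0 never exceeds the first value and is marked as normally distributed.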
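Note that the training device may also determine in other ways which channels in the first machine learning model are abnormally distributed and which are normally distributed. The examples here only demonstrate that the solution is implementable and do not limit it.
For a further understanding of the solution, an example of a formula for obtaining the first sub-activation values and the second sub-activation values from the first activation value is given below, taking the case in which the first sub-activation values are those of normally distributed channels and the second sub-activation values are those of abnormally distributed channels:
Here, $X_{normal}$ denotes the first sub-activation values, $X_{outlier}$ denotes the second sub-activation values, $X_c$ denotes any sub-activation value in the first activation value, and $\mathrm{outlier}(c)$ denotes the class label of each sub-activation value in the first activation value. When $\mathrm{outlier}(c)$ is 0, the sub-activation value corresponds to a normally distributed channel; when $\mathrm{outlier}(c)$ is 1, the sub-activation value corresponds to an abnormally distributed channel. With this formula, the first activation value can be split into two parts, one containing only the first sub-activation values and the other containing only the second sub-activation values. It should be understood that this example is provided only to aid understanding and does not limit the solution.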
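The formula itself is not present in this text. One way to write a split that is consistent with the symbol definitions above is shown below; it is an assumed reconstruction, and the set notation in particular is not taken from the application.

```latex
X_{normal} = \{\, X_c \mid \mathrm{outlier}(c) = 0 \,\},
\qquad
X_{outlier} = \{\, X_c \mid \mathrm{outlier}(c) = 1 \,\}
```

Every sub-activation value falls into exactly one of the two parts, so recombining the two quantized parts yields the complete quantized first activation value.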
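In the embodiments of this application, the plurality of channels may include channels whose sub-activation values are abnormally distributed, for example channels whose sub-activation values are consistently very large or very small. If the same quantization step size were used for the sub-activation values of every channel, that step size would have to be large, and the precision of the quantized sub-activation values of the normally distributed channels would drop sharply. Because the sub-activation values of different channels are distributed differently, this solution quantizes them with different quantization step sizes, which helps preserve the abnormality of the quantized sub-activation values of the abnormally distributed channels while avoiding a loss of precision in the quantized sub-activation values of the normally distributed channels.
Researchers found that when the first machine learning model is a Transformer model, the difference between the sub-activation values of abnormally distributed channels and those of normally distributed channels is more pronounced. The solution of quantizing the sub-activation values of the first channel with the first step size and quantizing the sub-activation values of the second channel with the second step size is therefore particularly well suited to Transformer models: it reduces the computation and the number of parameters of the Transformer model while avoiding a drop in the precision of the prediction results output by the model.
Optionally, the training device can apply a linear constraint to the first quantization step size and the second quantization step size so that their values are more hardware-friendly. For a more intuitive understanding of the solution, an example of the constraint between the first quantization step size and the second quantization step size is given below, taking the case in which the first quantization step size is the one used for the sub-activation values of normally distributed channels and the second quantization step size is the one used for the sub-activation values of abnormally distributed channels:
Here, $s_{normal}$ denotes the first quantization step size, $s_{outlier}$ denotes the second quantization step size, $\max(X_{outlier})$ denotes the maximum of all second sub-activation values of all abnormally distributed channels, and $\max(X_{normal})$ denotes the maximum of all first sub-activation values of all normally distributed channels. It should be understood that this example is provided only to aid understanding and does not limit the solution.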
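505. The training device divides second feature information into a plurality of groups, where each group includes at least one feature map and different groups include feature maps of different scales. The second feature information is included in the plurality of pieces of feature information of the training sample and includes feature maps of different scales.
In the embodiments of this application, steps 505 and 506 are optional. If second feature information of the training sample can be obtained while the training device performs data processing on the training sample, one piece of second feature information of the training sample includes several feature maps of different scales; that is, the plurality of pieces of feature information obtained during data processing of the training sample include the second feature information. The training device can divide the second feature information into a plurality of groups, where each group includes at least one feature map and different groups include feature maps of different scales.
For example, second feature information of the training sample can be obtained during data processing of the training sample when the task of the first machine learning model is any one of the following: object detection on images, semantic segmentation of images, super-resolution processing of images, or another type of image processing task. The embodiments of this application thus provide several application scenarios, which improves the implementation flexibility of the solution.
Note that the first feature information and the second feature information are different feature information. One piece of first feature information includes one feature map or several feature maps of the same scale, whereas one piece of second feature information includes several feature maps of different scales.
For example, the feature maps of different scales of a training sample have the same size. "Feature maps of different scales" refers to feature information of the training sample at different granularities: a feature map with a finer (denser) granularity shows more details of the training sample, and a feature map with a coarser (sparser) granularity shows the overall information of the training sample.
For a more intuitive understanding of the solution, refer to FIG. 7, which is a schematic diagram of images of different scales according to an embodiment of this application. FIG. 7 includes a left sub-diagram and a right sub-diagram of the same size. The image in the left sub-diagram contains a dog, grass, a tree, and a house in the background, and feature extraction on the left sub-diagram yields global features. The right sub-diagram is a region extracted from the left sub-diagram and enlarged to the same size as the left sub-diagram, and feature extraction on the right sub-diagram yields detailed feature information of the local region. The left and right sub-diagrams of FIG. 7 thus represent images of different scales. It should be understood that the example in FIG. 7 is provided only to aid understanding and does not limit the solution.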
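506. The training device quantizes different groups in the second feature information with different quantization parameters.
In the embodiments of this application, after dividing one piece of second feature information of the training sample into a plurality of groups, the training device can quantize different groups in the second feature information with different quantization parameters.
Optionally, if one piece of second feature information of the training sample includes feature maps of L scales, the training device can divide the second feature information into L groups and store L sets of quantization parameters in one-to-one correspondence with the L groups (that is, with the L scales), so that different groups in the second feature information are quantized with different quantization parameters. For the meaning of "different quantization parameters", refer to the foregoing description; details are not repeated here.
For a further understanding of the solution, an example of the formula used when quantizing the second feature information is given below: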
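$$q_s(X) = \left[\, q_{s_1}(X_1);\; q_{s_2}(X_2);\; q_{s_3}(X_3);\; \dots;\; q_{s_L}(X_L) \,\right]$$
Here, $q_s(X)$ denotes quantizing one piece of second feature information $X$, and $X_1, X_2, X_3, \dots, X_L$ denote the L groups included in $X$, where different groups include feature maps of different scales. $q_{s_1}(X_1)$ denotes quantizing the values in $X_1$ with the first set of quantization parameters, $q_{s_2}(X_2)$ denotes quantizing the values in $X_2$ with the second set, $q_{s_3}(X_3)$ denotes quantizing the values in $X_3$ with the third set, and $q_{s_L}(X_L)$ denotes quantizing the values in $X_L$ with the L-th set. That is, different groups among the L groups are quantized with different quantization parameters. It should be understood that this example is provided only to aid understanding and does not limit the solution.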
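The grouped quantization in the formula can be sketched with one step size per scale group. The flattened feature maps, the group layout, and the three step sizes below are hypothetical, and only the step size is varied here although a set of quantization parameters may contain more than a step size.

```python
def quantize_group(feature_maps, step):
    """Quantize every feature map of one scale group with that group's step."""
    return [[round(v / step) * step for v in fm] for fm in feature_maps]


def quantize_multiscale(groups, steps):
    """groups[l] holds the feature maps of scale l; steps[l] is its step size."""
    if len(groups) != len(steps):
        raise ValueError("one quantization step per scale group is required")
    return [quantize_group(g, s) for g, s in zip(groups, steps)]


# L = 3 scale groups; each feature map is flattened to a list of values.
groups = [
    [[0.11, 0.38, 0.52], [0.27, 0.44, 0.09]],  # scale 1: two feature maps
    [[2.3, 4.9, 3.6]],                         # scale 2: one feature map
    [[17.0, 42.0, 29.0]],                      # scale 3: one feature map
]
steps = [0.1, 1.0, 10.0]  # one (hypothetical) step size per scale

quantized = quantize_multiscale(groups, steps)
```

Feature maps of the same scale share a step size, so the L sets of parameters stay small in number while each scale is still quantized at a resolution that fits its own value range.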
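For example, the training device can quantize every piece of second feature information of the training sample in the manner of steps 505 and 506, or can quantize only some pieces of second feature information of the training sample in that manner. Which pieces of second feature information are quantized in the manner of steps 505 and 506 can be determined flexibly according to the actual application scenario and is not limited in the embodiments of this application.
In the embodiments of this application, if second feature information is obtained in the process of performing data processing on a training sample with the machine learning model, the second feature information, which includes several feature maps of different scales, is grouped by the scale of each feature map, and different groups are quantized with different quantization parameters. That is, feature maps of different scales are quantized with different quantization parameters, which helps preserve the information carried by feature maps of different scales and avoids reducing the accuracy of the prediction results output by the machine learning model.
Note that steps 502 and 503, step 504, and steps 505 and 506 are all optional. If steps 502 and 503, step 504, and/or steps 505 and 506 are performed, the embodiments of this application do not limit the order in which they are performed or the number of times they are performed. These can be determined flexibly according to the actual situation and are not limited here.
For a more intuitive understanding of the solution, refer to FIG. 8, which is a schematic diagram of a model quantization method according to an embodiment of this application. FIG. 8 uses an example in which the task of the first model is object detection and the first model includes a feature extraction module (backbone), a feature processing module (neck), and a detection head module (head). Both the backbone and the neck contain attention-based Transformer modules, and the head is used to detect the category and the position of objects.
As shown in FIG. 8, while the first model performs data processing on the training sample, the backbone, the neck, and the head of the first model can each produce first feature information of the training sample, which can be quantized with the method of steps 502 and 503. The activation layers in the backbone and in the neck can each generate activation values, which can be quantized with the method of step 504. It should be understood that the example in FIG. 8 is provided only to aid understanding and does not limit the solution.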
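507. The training device obtains the prediction result output by the first machine learning model and trains the first machine learning model according to the expected result corresponding to the training sample, the prediction result, and a loss function.
In the embodiments of this application, after inputting the training sample into the first machine learning model and performing data processing on it with the model, the training device obtains the prediction result output by the model and generates the value of the loss function from the expected result corresponding to the training sample and the prediction result. The loss function indicates the similarity between the prediction result and the expected result corresponding to the training sample. Based on the value of the loss function, the training device uses the backpropagation algorithm to update the weight parameters of the first machine learning model and some of the quantization parameters in steps 502 to 506, which completes one round of training of the first machine learning model.
In the process of training the first machine learning model, the output of the model should be as close as possible to the value that is actually desired. Therefore, the predicted value output by the model can be compared with the desired expected value, and the weight vector of each layer of the neural network can be updated according to the difference between the two. (An initialization process usually takes place before the first update, in which parameters are preconfigured for each layer of the first machine learning model.) For example, if the predicted value of the model is too high, the weight vector is adjusted to make the prediction lower, and the adjustment continues until the first machine learning model can predict the desired target value or a value very close to it. It is therefore necessary to define in advance how to compare the difference between the predicted value and the target value. This is the loss function or objective function, an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training a deep neural network becomes a process of reducing this loss as much as possible.
Optionally, the training device can also quantize the weight parameters of the first machine learning model and use the quantized weight parameters in the next round of training of the model.
The training device repeats steps 501 to 507 until the convergence condition of the loss function is met, thereby iteratively training the first machine learning model and obtaining the trained first machine learning model and several sets of quantization parameters. These sets of quantization parameters are used in the inference phase of the first machine learning model to quantize the feature information and/or activation values generated by the model.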
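How a quantization step size can be updated by gradient descent is sketched below on a toy problem. The sketch assumes an LSQ-style straight-through gradient for the rounding operation and, instead of a task loss on model predictions, a simple reconstruction loss on a fixed set of values; the data, the learning rate, and the iteration count are all hypothetical.

```python
def quantize(x, step, t_n=0, t_p=15):
    """Return the integer code and the dequantized value of x."""
    code = min(max(round(x / step), t_n), t_p)
    return code, code * step


def loss_and_grad(values, step, t_n=0, t_p=15):
    """Mean squared reconstruction error and its gradient w.r.t. the step size."""
    loss, grad = 0.0, 0.0
    for x in values:
        code, x_hat = quantize(x, step, t_n, t_p)
        if t_n < x / step < t_p:
            # Inside the range: straight-through estimate for rounding.
            d_xhat_d_step = code - x / step
        else:
            # Clamped: x_hat = code * step, so the derivative is the code.
            d_xhat_d_step = code
        loss += (x_hat - x) ** 2
        grad += 2.0 * (x_hat - x) * d_xhat_d_step
    return loss / len(values), grad / len(values)


values = [0.1 * k for k in range(1, 31)]  # toy activations from 0.1 to 3.0
step = 1.0                                # deliberately poor initial step
initial_loss, _ = loss_and_grad(values, step)

for _ in range(2000):
    _, grad = loss_and_grad(values, step)
    step -= 0.05 * grad                   # plain gradient descent on the step

final_loss, _ = loss_and_grad(values, step)
```

The step shrinks while rounding error dominates and stops shrinking once the largest values start to be clamped, so it settles where resolution and range are balanced. In the setting described above, the same kind of update would be driven by the loss on the model's predictions and applied jointly with the weight updates.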
II. Inference stage
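In the embodiments of this application, the inference stage describes the process in which the execution device 230 uses the trained first machine learning model 201 to process to-be-processed data and outputs the prediction result corresponding to that data. Specifically, refer to FIG. 9, which is a schematic flowchart of the model quantization method provided in an embodiment of this application. The method may include the following steps: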
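901. The execution device inputs the to-be-processed data into the first machine learning model, where multiple pieces of feature information of the to-be-processed data and activation values generated by the activation layers of the first machine learning model can be obtained while the first machine learning model processes the to-be-processed data.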
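902. The execution device obtains first feature information and divides it into M pieces of sub-feature information, where the first feature information is included in the multiple pieces of feature information of the to-be-processed data and M is an integer greater than or equal to 2.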
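903. The execution device quantizes first sub-feature information using a first quantization parameter and quantizes second sub-feature information using a second quantization parameter, where the M pieces of sub-feature information include the first sub-feature information and the second sub-feature information, and the first quantization parameter differs from the second quantization parameter.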
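904. The execution device quantizes a first sub-activation value in a first activation value using a first quantization step size and quantizes a second sub-activation value in the first activation value using a second quantization step size, where the first machine learning model includes multiple channels, the multiple channels include a first channel and a second channel, the distribution of the first sub-activation value corresponding to the first channel differs from the distribution of the second sub-activation value corresponding to the second channel, and the first quantization step size differs from the second quantization step size.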
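905. The execution device divides second feature information into multiple groups, where each group includes at least one feature map and different groups include feature maps of different scales; the second feature information is included in the multiple pieces of feature information of the to-be-processed data and includes feature maps of different scales.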
906. The execution device quantizes different groups of the second feature information using different quantization parameters.
907. The execution device obtains the prediction result output by the first machine learning model.
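Steps 902 and 903 can be sketched as follows. This is only an illustration under stated assumptions: how the first feature information is actually split is defined elsewhere in the application, so the contiguous split used here, the helper names, and the min/max rule that derives each quantization parameter (a step size and an offset) are all hypothetical.

```python
def split_into_sub_features(feature, m):
    """Split a flat feature vector into m contiguous, nearly equal parts (assumed split rule)."""
    if m < 2 or len(feature) < m:
        raise ValueError("need an integer M >= 2 and at least M values")
    size, rem = divmod(len(feature), m)
    parts, start = [], 0
    for i in range(m):
        end = start + size + (1 if i < rem else 0)
        parts.append(feature[start:end])
        start = end
    return parts


def calibrate(sub, bits=8):
    """Derive a (step, offset) quantization parameter from one sub-feature's range."""
    lo, hi = min(sub), max(sub)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    return step, lo


def quantize_sub_feature(sub, step, offset, bits=8):
    """Map each value to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    return [max(0, min(levels, round((v - offset) / step))) for v in sub]


def dequantize(codes, step, offset):
    """Recover approximate real values from integer codes."""
    return [c * step + offset for c in codes]


if __name__ == "__main__":
    feature = [0.0, 0.1, 0.2, 0.3, 10.0, 20.0, 30.0, 40.0]
    for sub in split_into_sub_features(feature, 2):
        step, offset = calibrate(sub)
        print(step, offset, quantize_sub_feature(sub, step, offset))
```

In this example the two sub-features have very different value ranges, so each receives a different step and offset while still using the full range of integer codes.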
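In the embodiments of this application, for the specific implementation of steps 901 to 907 and the meaning of each term used in them, refer to the descriptions in the embodiments corresponding to FIG. 5. The difference is that the "training sample" in the embodiments corresponding to FIG. 5 is replaced by the "to-be-processed data" in the embodiment corresponding to FIG. 9, where "to-be-processed data" is another name for the "input data" in the inference stage. Details are not repeated here.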
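It should be noted that steps 902 and 903, step 904, and steps 905 and 906 are all optional. If steps 902 and 903, step 904, and/or steps 905 and 906 are performed, the embodiments of this application limit neither the order in which they are performed nor the number of times each is performed; both can be determined flexibly according to the actual situation and are not limited here.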
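In the embodiments of this application, the model quantization method provided in this application can be used whenever a machine learning model processes input data, whether in the training stage or in the inference stage. That is, the method reduces the amount of computation not only when the machine learning model processes data on the execution device but also when it processes data on the training device.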
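For a more intuitive understanding of the beneficial effects of the model quantization method provided in this application, the following description refers to the experimental data shown in Table 1.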
Table 1
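Here, DETR with improved denoising anchor boxes for end-to-end object detection (DINO) and the deformable detection transformer (Deformable DETR), a variant of the Transformer model used for object detection tasks, are two machine learning models; floating point operations per second (FLOPs) is a metric representing the amount of computation; mAP is a metric representing accuracy; and channel-wise is an existing quantization method. As shown in Table 1, the model quantization method provided in this application not only greatly reduces the number of parameters and the amount of computation of the machine learning model but also avoids a drop in the accuracy of the prediction results output by the model.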
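On the basis of the embodiments corresponding to FIG. 1 to FIG. 9, related devices for implementing the above solutions are provided below. Refer to FIG. 10, which is a schematic structural diagram of a model quantization apparatus provided in an embodiment of this application. The model quantization apparatus 1000 is applied in a process in which a machine learning model is used for data processing, and is used to quantize activation values generated by at least one activation layer of the machine learning model, where the at least one activation layer includes a first activation layer.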
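The model quantization apparatus 1000 includes a quantization module 1001, which is used to quantize a first sub-activation value in a first activation value using a first quantization step size. The quantization module 1001 is further used to quantize a second sub-activation value in the first activation value using a second quantization step size, where the machine learning model includes multiple channels, the multiple channels include a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size differs from the second quantization step size.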
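The behaviour of the quantization module on activation values can be sketched as below. It is a minimal sketch under assumptions, not the module's implementation: activations are modelled as a list of rows with one value per channel, the function names are hypothetical, and each channel's step is taken from that channel's largest absolute value, whereas the embodiment obtains quantization parameters through training.

```python
def per_channel_steps(activations, bits=8):
    """Choose one quantization step per channel from that channel's value range."""
    qmax = 2 ** (bits - 1) - 1
    steps = []
    for c in range(len(activations[0])):
        max_abs = max(abs(row[c]) for row in activations)
        steps.append(max_abs / qmax if max_abs > 0 else 1.0)
    return steps


def quantize_per_channel(activations, steps, bits=8):
    """Quantize column c of every row with steps[c]; returns de-quantized values."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    return [
        [max(qmin, min(qmax, round(v / s))) * s for v, s in zip(row, steps)]
        for row in activations
    ]


if __name__ == "__main__":
    # Channel 0 holds small values, channel 1 holds large values.
    acts = [[0.01, 50.0], [-0.02, -100.0], [0.015, 25.0]]
    steps = per_channel_steps(acts)
    print(steps)
    print(quantize_per_channel(acts, steps))
    # For comparison: forcing both channels to share the coarse step.
    print(quantize_per_channel(acts, [steps[1], steps[1]]))
```

With a shared step sized for the large channel, every value in the small channel rounds to zero; giving each channel its own step keeps the rounding error within half a step of that channel.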
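In one possible design, the distribution of the first sub-activation value differs from the distribution of the second sub-activation value.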
In one possible design, the machine learning model is a Transformer model.
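In one possible design, multiple pieces of feature information of the input data can be obtained while the machine learning model processes the input data, the multiple pieces of feature information include first feature information, and the model quantization apparatus 1000 is further used to quantize the first feature information.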
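Refer to FIG. 11, which is another schematic structural diagram of the model quantization apparatus provided in an embodiment of this application. The model quantization apparatus 1000 further includes a grouping module 1002, which is used to divide the first feature information into at least two pieces of sub-feature information, where the at least two pieces of sub-feature information include first sub-feature information and second sub-feature information. The quantization module 1001 is further used to quantize the first sub-feature information using a first quantization parameter, and is further used to quantize the second sub-feature information using a second quantization parameter, where the first quantization parameter differs from the second quantization parameter.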
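In one possible design, the input data is an image and the task of the machine learning model is to perform object detection on the image.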
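In one possible design, multiple pieces of feature information of the input data can be obtained while the machine learning model processes the input data, the multiple pieces of feature information include second feature information, the second feature information includes feature maps of different scales, and the model quantization apparatus 1000 is further used to quantize the second feature information.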
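Refer to FIG. 11. The model quantization apparatus 1000 further includes a grouping module 1002, which is used to divide the second feature information into multiple groups, where each group includes at least one feature map and different groups include feature maps of different scales. The quantization module 1001 is further used to quantize different groups using different quantization parameters.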
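In one possible design, the input data is an image and the task of the machine learning model is any one of the following: performing object detection on the image, performing semantic segmentation on the image, or performing super-resolution processing on the image.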
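In one possible design, the process of using the machine learning model for data processing takes place in the inference stage of the machine learning model, or it takes place in the training stage of the machine learning model.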
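It should be noted that the information exchange between the modules/units of the model quantization apparatus 1000, their execution processes, and related content are based on the same concept as the method embodiments corresponding to FIG. 3 to FIG. 9 of this application. For details, refer to the descriptions in the foregoing method embodiments, which are not repeated here.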
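Refer to FIG. 12, which is another schematic structural diagram of a model quantization apparatus provided in an embodiment of this application. The model quantization apparatus 1200 is applied in a process in which a machine learning model is used for data processing. Multiple pieces of feature information of the input data can be obtained while the machine learning model processes the input data, the multiple pieces of feature information include first feature information, and the model quantization apparatus 1200 is used to quantize the first feature information.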
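The model quantization apparatus 1200 includes a grouping module 1201, which is used to divide the first feature information into at least two pieces of sub-feature information, where the at least two pieces of sub-feature information include first sub-feature information and second sub-feature information; and a quantization module 1202, which is used to quantize the first sub-feature information using a first quantization parameter. The quantization module 1202 is further used to quantize the second sub-feature information using a second quantization parameter, where the first quantization parameter differs from the second quantization parameter.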
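In one possible design, the model quantization apparatus 1200 is further used to quantize activation values generated by at least one activation layer of the machine learning model, where the at least one activation layer includes a first activation layer.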
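The quantization module 1202 is further used to quantize a first sub-activation value in a first activation value using a first quantization step size, and is further used to quantize a second sub-activation value in the first activation value using a second quantization step size, where the machine learning model includes multiple channels, the first sub-activation value corresponds to a first channel among the multiple channels, the second sub-activation value corresponds to a second channel among the multiple channels, and the first quantization step size differs from the second quantization step size.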
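It should be noted that the information exchange between the modules/units of the model quantization apparatus 1200, their execution processes, and related content are based on the same concept as the method embodiments corresponding to FIG. 3 to FIG. 9 of this application. For details, refer to the descriptions in the foregoing method embodiments, which are not repeated here.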
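An electronic device provided in an embodiment of this application is described next. The electronic device may be a training device for the first machine learning model, or an execution device on which the first machine learning model is deployed. When the electronic device is an execution device, refer to FIG. 13, which is a schematic structural diagram of the execution device provided in an embodiment of this application. The execution device 1300 may specifically be a virtual reality (VR) device, a mobile phone, a tablet, a laptop computer, a smart wearable device, a monitoring data processing device, a radar data processing device, or the like, which is not limited here. Specifically, the execution device 1300 includes a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304 (the execution device 1300 may have one or more processors 1303; FIG. 13 takes one processor as an example), where the processor 1303 may include an application processor 13031 and a communication processor 13032. In some embodiments of this application, the receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 may be connected by a bus or in another manner.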
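The memory 1304 may include read-only memory and random access memory, and provides instructions and data to the processor 1303. A part of the memory 1304 may also include non-volatile random access memory (NVRAM). The memory 1304 stores operation instructions for the processor, executable modules, or data structures, or a subset or an extended set of them, where the operation instructions may include various operation instructions used to implement various operations.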
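The processor 1303 controls the operation of the execution device. In a specific application, the components of the execution device are coupled together through a bus system, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of description, however, the various buses are all referred to as the bus system in the figure.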
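The methods disclosed in the foregoing embodiments of this application may be applied to the processor 1303 or implemented by the processor 1303. The processor 1303 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by integrated logic circuits of hardware in the processor 1303 or by instructions in the form of software. The processor 1303 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1303 can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 1304; the processor 1303 reads the information in the memory 1304 and completes the steps of the foregoing methods in combination with its hardware.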
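The receiver 1301 may be used to receive input numeric or character information and to generate signal input related to the settings and function control of the execution device. The transmitter 1302 may be used to output numeric or character information through a first interface; the transmitter 1302 may also be used to send instructions to a disk group through the first interface in order to modify data in the disk group; and the transmitter 1302 may also include a display device such as a display screen.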
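In the embodiments of this application, in one case, the application processor 13031 in the processor 1303 is used to perform the model quantization method performed by the execution device in the embodiments corresponding to FIG. 3 to FIG. 9. It should be noted that the specific manner in which the application processor 13031 performs the foregoing steps is based on the same concept as the method embodiments corresponding to FIG. 3 to FIG. 9 of this application, and brings the same technical effects. For details, refer to the descriptions in the foregoing method embodiments, which are not repeated here.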
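When the electronic device is a training device, refer to FIG. 14, which is a schematic structural diagram of the training device provided in an embodiment of this application. Specifically, the training device 1400 is implemented by one or more servers. The training device 1400 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 1422 (for example, one or more processors), a memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices) that store application programs 1442 or data 1444. The memory 1432 and the storage medium 1430 may provide transient or persistent storage. A program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the training device. Furthermore, the central processing unit 1422 may be configured to communicate with the storage medium 1430 and execute, on the training device 1400, the series of instruction operations in the storage medium 1430.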
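The training device 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.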
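In the embodiments of this application, the central processing unit 1422 is used to perform the model quantization method performed by the training device in the embodiments corresponding to FIG. 3 to FIG. 8. It should be noted that the specific manner in which the central processing unit 1422 performs the foregoing steps is based on the same concept as the method embodiments corresponding to FIG. 3 to FIG. 8 of this application, and brings the same technical effects. For details, refer to the descriptions in the foregoing method embodiments, which are not repeated here.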
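An embodiment of this application further provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the training device in the methods described in the embodiments shown in FIG. 3 to FIG. 8, or causes the computer to perform the steps performed by the execution device in the method described in the embodiment shown in FIG. 9.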
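An embodiment of this application further provides a computer-readable storage medium that stores a program for signal processing. When the program runs on a computer, it causes the computer to perform the steps performed by the training device in the methods described in the embodiments shown in FIG. 3 to FIG. 8, or causes the computer to perform the steps performed by the execution device in the method described in the embodiment shown in FIG. 9.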
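The execution device, training device, or model quantization apparatus provided in the embodiments of this application may be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute computer-executable instructions stored in a storage unit so that the chip performs the model quantization method described in the embodiments shown in FIG. 3 to FIG. 9. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip within the wireless access device, such as read-only memory (ROM) or another type of static storage device that can store static information and instructions, or random access memory (RAM).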
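Specifically, refer to FIG. 15, which is a schematic structural diagram of a chip provided in an embodiment of this application. The chip may be a neural network processing unit NPU 150. The NPU 150 is mounted on a host CPU as a coprocessor, and the host CPU assigns tasks to it. The core of the NPU is the arithmetic circuit 1503; the controller 1504 controls the arithmetic circuit 1503 to fetch matrix data from memory and perform multiplication.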
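In some implementations, the arithmetic circuit 1503 contains multiple processing engines (PE). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. The arithmetic circuit 1503 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1503 is a general-purpose matrix processor.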
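For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1502 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit then fetches the data of matrix A from the input memory 1501, performs the matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the accumulator 1508.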
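The role of the accumulator in this example can be sketched in software. The code below is an illustrative model only and does not describe the actual hardware: the function name and the `tile` parameter are assumptions, and the inner dimension is processed in slices simply to show partial results being added into an accumulator until the final matrix C is complete.

```python
def matmul_with_accumulator(a, b, tile=2):
    """Compute C = A x B by accumulating partial products over slices of the inner dimension."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    acc = [[0] * cols for _ in range(rows)]  # plays the role of the accumulator
    for k0 in range(0, inner, tile):
        k1 = min(k0 + tile, inner)
        for i in range(rows):
            for j in range(cols):
                # Partial result for this slice is added to what is already stored.
                acc[i][j] += sum(a[i][k] * b[k][j] for k in range(k0, k1))
    return acc


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_with_accumulator(a, b))
```

After the first slice the accumulator holds only a partial result; it holds the final matrix once every slice of the inner dimension has been processed.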
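The unified memory 1506 is used to store input data and output data. Weight data is moved directly into the weight memory 1502 through the direct memory access controller (DMAC) 1505. Input data is also moved into the unified memory 1506 through the DMAC.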
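The bus interface unit (BIU) 1510 is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1509. Specifically, the bus interface unit 1510 is used by the instruction fetch buffer 1509 to obtain instructions from external memory, and is also used by the direct memory access controller 1505 to obtain the original data of the input matrix A or the weight matrix B from external memory.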
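The DMAC is mainly used to move input data from the external memory (DDR) to the unified memory 1506, to move weight data to the weight memory 1502, or to move input data to the input memory 1501.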
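The vector computation unit 1507 includes multiple arithmetic processing units and, when needed, further processes the output of the arithmetic circuit, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for computation in the non-convolutional/fully connected layers of a neural network, such as batch normalization, pixel-wise summation, and upsampling of feature planes.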
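In some implementations, the vector computation unit 1507 can store the processed output vector in the unified memory 1506. For example, the vector computation unit 1507 may apply a linear function and/or a nonlinear function to the output of the arithmetic circuit 1503, for example performing linear interpolation on the feature planes extracted by a convolutional layer, or applying a function to a vector of accumulated values to generate activation values. In some implementations, the vector computation unit 1507 generates normalized values, pixel-wise summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1503, for example for use in a subsequent layer of the neural network.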
The instruction fetch buffer 1509 connected to the controller 1504 is used to store the instructions used by the controller 1504.
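The unified memory 1506, the input memory 1501, the weight memory 1502, and the instruction fetch buffer 1509 are all on-chip memories. The external memory is private to the NPU hardware architecture.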
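The operations of each layer of the first machine learning model shown in the foregoing embodiments may be performed by the arithmetic circuit 1503 or the vector computation unit 1507.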
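The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control execution of the program of the method of the first aspect.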
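It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in this application, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines.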
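From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus the necessary general-purpose hardware, and of course also by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structure used to implement the same function can take many forms, such as analog circuits, digital circuits, or dedicated circuits. For this application, however, a software implementation is the better choice in most cases. Based on this understanding, the technical solution of this application, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.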
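The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented wholly or partly in the form of a computer program product.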
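The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a training device or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).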
Claims (23)
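- A model quantization method, characterized in that the method is applied in a process in which a machine learning model is used for data processing, and the model quantization method comprises quantizing activation values generated by at least one activation layer of the machine learning model, the at least one activation layer comprising a first activation layer; wherein quantizing a first activation value generated by the first activation layer comprises: quantizing a first sub-activation value in the first activation value using a first quantization step size; and quantizing a second sub-activation value in the first activation value using a second quantization step size, wherein the machine learning model comprises multiple channels, the multiple channels comprise a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size differs from the second quantization step size.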
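- The method according to claim 1, characterized in that the distribution of the first sub-activation value differs from the distribution of the second sub-activation value.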
- The method according to claim 1 or 2, characterized in that the machine learning model is a Transformer model.
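- The method according to any one of claims 1 to 3, characterized in that multiple pieces of feature information of input data can be obtained while the machine learning model processes the input data, the multiple pieces of feature information comprise first feature information, and the model quantization method further comprises quantizing the first feature information; wherein quantizing the first feature information comprises: dividing the first feature information into at least two pieces of sub-feature information, the at least two pieces of sub-feature information comprising first sub-feature information and second sub-feature information; quantizing the first sub-feature information using a first quantization parameter; and quantizing the second sub-feature information using a second quantization parameter, the first quantization parameter differing from the second quantization parameter.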
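- The method according to claim 4, characterized in that the input data is an image and the task of the machine learning model is to perform object detection on the image.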
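- The method according to any one of claims 1 to 5, characterized in that multiple pieces of feature information of input data can be obtained while the machine learning model processes the input data, the multiple pieces of feature information comprise second feature information, the second feature information comprises feature maps of different scales, and the model quantization method further comprises quantizing the second feature information; wherein quantizing the second feature information comprises: dividing the second feature information into multiple groups, each of the multiple groups comprising at least one feature map, and different groups among the multiple groups comprising feature maps of different scales; and quantizing the different groups using different quantization parameters.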
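- The method according to claim 6, characterized in that the input data is an image and the task of the machine learning model is any one of the following: performing object detection on the image, performing semantic segmentation on the image, or performing super-resolution processing on the image.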
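- The method according to any one of claims 1 to 7, characterized in that the process of using the machine learning model for data processing takes place in the inference stage of the machine learning model, or the process of using the machine learning model for data processing takes place in the training stage of the machine learning model.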
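- A model quantization method, characterized in that the method is applied in a process in which a machine learning model is used for data processing, multiple pieces of feature information of input data can be obtained while the machine learning model processes the input data, the multiple pieces of feature information comprise first feature information, and the model quantization method comprises quantizing the first feature information; wherein quantizing the first feature information comprises: dividing the first feature information into at least two pieces of sub-feature information, the at least two pieces of sub-feature information comprising first sub-feature information and second sub-feature information; quantizing the first sub-feature information using a first quantization parameter; and quantizing the second sub-feature information using a second quantization parameter, the first quantization parameter differing from the second quantization parameter.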
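- The method according to claim 9, characterized in that the model quantization method further comprises quantizing activation values generated by at least one activation layer of the machine learning model, the at least one activation layer comprising a first activation layer; wherein quantizing a first activation value generated by the first activation layer comprises: quantizing a first sub-activation value in the first activation value using a first quantization step size; and quantizing a second sub-activation value in the first activation value using a second quantization step size, wherein the machine learning model comprises multiple channels, the multiple channels comprise a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size differs from the second quantization step size.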
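- A model quantization apparatus, characterized in that the model quantization apparatus is applied in a process in which a machine learning model is used for data processing, and the model quantization apparatus is used to quantize activation values generated by at least one activation layer of the machine learning model, the at least one activation layer comprising a first activation layer; wherein the model quantization apparatus comprises: a quantization module, used to quantize a first sub-activation value in a first activation value using a first quantization step size; the quantization module being further used to quantize a second sub-activation value in the first activation value using a second quantization step size, wherein the machine learning model comprises multiple channels, the multiple channels comprise a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size differs from the second quantization step size.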
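- The apparatus according to claim 11, characterized in that the distribution of the first sub-activation value differs from the distribution of the second sub-activation value.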
- The apparatus according to claim 11 or 12, characterized in that the machine learning model is a Transformer model.
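- The apparatus according to any one of claims 11 to 13, characterized in that multiple pieces of feature information of input data can be obtained while the machine learning model processes the input data, the multiple pieces of feature information comprise first feature information, and the model quantization apparatus is further used to quantize the first feature information; the model quantization apparatus further comprises: a grouping module, used to divide the first feature information into at least two pieces of sub-feature information, the at least two pieces of sub-feature information comprising first sub-feature information and second sub-feature information; the quantization module being further used to quantize the first sub-feature information using a first quantization parameter; and the quantization module being further used to quantize the second sub-feature information using a second quantization parameter, the first quantization parameter differing from the second quantization parameter.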
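- The apparatus according to claim 14, characterized in that the input data is an image and the task of the machine learning model is to perform object detection on the image.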
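- The apparatus according to any one of claims 11 to 15, characterized in that multiple pieces of feature information of input data can be obtained while the machine learning model processes the input data, the multiple pieces of feature information comprise second feature information, the second feature information comprises feature maps of different scales, and the model quantization apparatus is further used to quantize the second feature information; the model quantization apparatus further comprises: a grouping module, used to divide the second feature information into multiple groups, each of the multiple groups comprising at least one feature map, and different groups among the multiple groups comprising feature maps of different scales; the quantization module being further used to quantize the different groups using different quantization parameters.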
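- The apparatus according to claim 16, characterized in that the input data is an image and the task of the machine learning model is any one of the following: performing object detection on the image, performing semantic segmentation on the image, or performing super-resolution processing on the image.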
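- The apparatus according to any one of claims 11 to 17, characterized in that the process of using the machine learning model for data processing takes place in the inference stage of the machine learning model, or the process of using the machine learning model for data processing takes place in the training stage of the machine learning model.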
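- A model quantization apparatus, characterized in that the model quantization apparatus is applied in a process in which a machine learning model is used for data processing, multiple pieces of feature information of input data can be obtained while the machine learning model processes the input data, the multiple pieces of feature information comprise first feature information, and the model quantization apparatus is used to quantize the first feature information; wherein the model quantization apparatus comprises: a grouping module, used to divide the first feature information into at least two pieces of sub-feature information, the at least two pieces of sub-feature information comprising first sub-feature information and second sub-feature information; and a quantization module, used to quantize the first sub-feature information using a first quantization parameter; the quantization module being further used to quantize the second sub-feature information using a second quantization parameter, the first quantization parameter differing from the second quantization parameter.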
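- The apparatus according to claim 19, characterized in that the model quantization apparatus is further used to quantize activation values generated by at least one activation layer of the machine learning model, the at least one activation layer comprising a first activation layer; wherein the quantization module is further used to quantize a first sub-activation value in a first activation value using a first quantization step size; and the quantization module is further used to quantize a second sub-activation value in the first activation value using a second quantization step size, wherein the machine learning model comprises multiple channels, the multiple channels comprise a first channel and a second channel, the first sub-activation value corresponds to the first channel, the second sub-activation value corresponds to the second channel, and the first quantization step size differs from the second quantization step size.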
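- A computer program product, characterized in that the computer program product comprises a program that, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 10.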
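- A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program that, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 10.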
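- An electronic device, characterized by comprising a processor and a memory, the processor being coupled to the memory; the memory is used to store a program; and the processor is used to execute the program in the memory so that the electronic device performs the method according to any one of claims 1 to 10.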
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310215082.0A CN116362301A (zh) | 2023-02-25 | 2023-02-25 | Model quantization method and related device |
CN202310215082.0 | 2023-02-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024175079A1 (zh) | 2024-08-29 |
Family
ID=86913021
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2024/078233 WO2024175079A1 (zh) | 2023-02-25 | 2024-02-23 | Model quantization method and related device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116362301A (zh) |
WO (1) | WO2024175079A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116362301A (zh) * | 2023-02-25 | 2023-06-30 | Huawei Technologies Co., Ltd. | Model quantization method and related device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
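CN113163203A (zh) * | 2021-04-29 | 2021-07-23 | Shanghai University | Deep learning feature compression and decompression method, system and terminal |
CN115053230A (zh) * | 2020-03-13 | 2022-09-13 | Intel Corporation | Optimizing low-precision inference models for deployment of deep neural networks |
US20230139347A1 (en) * | 2021-10-29 | 2023-05-04 | Qualcomm Incorporated | Per-embedding-group activation quantization |
CN116362301A (zh) * | 2023-02-25 | 2023-06-30 | Huawei Technologies Co., Ltd. | Model quantization method and related device |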
-
2023
- 2023-02-25 CN CN202310215082.0A patent/CN116362301A/zh active Pending
-
2024
- 2024-02-23 WO PCT/CN2024/078233 patent/WO2024175079A1/zh unknown
Also Published As
Publication number | Publication date |
---|---|
CN116362301A (zh) | 2023-06-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 24759753 Country of ref document: EP Kind code of ref document: A1 |