
CN112446888A - Processing method and processing device for image segmentation model - Google Patents


Info

Publication number
CN112446888A
Authority
CN
China
Prior art keywords
feature extraction
submodel
model
image segmentation
binarization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910845625.0A
Other languages
Chinese (zh)
Other versions
CN112446888B (en)
Inventor
韩凯
闻长远
舒晗
陈翼翼
苏霞
王云鹤
许春景
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201910845625.0A priority Critical patent/CN112446888B/en
Priority to PCT/CN2020/100058 priority patent/WO2021042857A1/en
Publication of CN112446888A publication Critical patent/CN112446888A/en
Application granted granted Critical
Publication of CN112446888B publication Critical patent/CN112446888B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to image segmentation technology in the field of artificial intelligence, and provides a processing method and a processing apparatus for an image segmentation model. The image segmentation model includes a feature extraction sub-model and an image segmentation sub-model, where the feature extraction sub-model is used to extract the features of an image, and the image segmentation sub-model is used to segment the image according to the extracted features. The processing method includes the following steps: adjusting the layer width of the feature extraction sub-model to obtain a first feature extraction sub-model; and obtaining a target image segmentation model according to the first feature extraction sub-model and the image segmentation sub-model. The processing method and processing apparatus provided by the present application help improve the segmentation accuracy of the image segmentation model, thereby facilitating the implementation of image segmentation on edge devices.

Description

Processing method and processing device for image segmentation model
Technical Field
The present application relates to the field of image processing, and more particularly, to a processing method and a processing apparatus for an image segmentation model.
Background
Image segmentation also has application requirements on edge devices, such as vehicles or cell phones. However, processing images using neural networks on these edge devices presents difficulties due to the computational performance and cache limitations of the edge devices.
Therefore, how to obtain an image segmentation model applicable to edge devices is a technical problem to be solved urgently.
Disclosure of Invention
The present application provides a processing method and a processing apparatus for an image segmentation model, which help improve the segmentation accuracy of the image segmentation model and thus facilitate image segmentation on edge devices.
In a first aspect, a method for processing an image segmentation model is provided, where the image segmentation model includes a feature extraction sub-model and an image segmentation sub-model, the feature extraction sub-model is used to extract features of an image, and the image segmentation sub-model is used to segment the image according to the extracted features, and the method includes: adjusting the layer width of the feature extraction submodel to obtain a first feature extraction submodel; and obtaining a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel.
The image segmentation model may be trained or untrained.
In the method, the layer width of the feature extraction submodel is adjusted, so that the accuracy of the feature extraction submodel is improved, the segmentation accuracy of the target image segmentation model is improved, and the image segmentation model is easier to apply to edge equipment.
In particular, when the feature extraction submodel is a binarized neural network model, the precision loss caused by binarization can be reduced, thereby improving the segmentation precision of the image segmentation model.
In some possible implementations of the first aspect, the performing layer width adjustment on the feature extraction submodel includes: increasing the channel number of the feature extraction submodel to obtain a second feature extraction submodel; generating K different first binarization codes for the second feature extraction submodel, wherein the first binarization codes comprise a plurality of binarization values, the binarization values correspond to a plurality of channels of the second feature extraction submodel in a one-to-one manner, each binarization value in the binarization values is used for indicating whether the channel corresponding to each binarization value is reserved or removed, and K is an integer greater than 1; reserving or removing channels of the second feature extraction submodel according to the K different first binary codes to obtain K third feature extraction submodels; selecting M third feature extraction submodels from the K third feature extraction submodels according to the intersection ratio and the calculated amount when each third feature extraction submodel in the K third feature extraction submodels extracts the features of the image, wherein M is an integer greater than 1; performing cross and/or variation processing on the M first binarization codes corresponding to the M third feature extraction submodels to obtain S second binarization codes, wherein S is an integer greater than 1; reserving or removing a channel of the second feature extraction submodel according to each second binarization code in the S second binarization codes to obtain S fourth feature extraction submodels; taking the fourth feature extraction sub-model as the third feature extraction sub-model, taking K as S, and repeatedly executing the fourth operation to the sixth operation for T times; and taking one of the S fourth feature extraction submodels obtained at the last time as the first feature extraction submodel.
Optionally, a probability that a jth third feature extraction sub-model of the M third feature extraction sub-models is selected satisfies the following formula:
Pr(b_j) = f(b_j) / ∑_m f(b_m)

where Pr(b_j) represents the probability of the jth third feature extraction submodel being selected, the sum runs over the candidate third feature extraction submodels, and f(b_j) satisfies the following formula:

f(b_j) = mIoU(j) + α/N(j)

where mIoU(j) represents the intersection ratio when the jth third feature extraction sub-model performs feature extraction on an image, N(j) represents the computation amount when the jth third feature extraction sub-model performs feature extraction on the image, and α is a preset parameter.
In some possible implementation manners of the first aspect, the obtaining a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel includes: performing knowledge distillation on the first feature extraction model by using a teacher feature extraction model to obtain a fifth feature extraction submodel; and obtaining the target image segmentation model according to the fifth feature extraction submodel and the image segmentation submodel.
The teacher feature extraction model may include a feature extraction model trained using a conventional method.
In the implementation mode, the trained teacher model is used for carrying out knowledge distillation on the feature extraction submodel in the image segmentation model, so that the precision loss caused by binarization can be reduced, and the precision loss of image segmentation is reduced.
Optionally, the loss function of the fifth feature extraction submodel satisfies the following relationship:
L = ∑_{i=1}^{T} ||G_ti − G_si||

where G_ti represents the ith scale feature among the T scale features output by the teacher feature extraction model, G_si represents the ith scale feature among the T scale features output by the fifth feature extraction submodel, T is a positive integer, the ith scale feature output by the teacher feature extraction model has the same scale as the ith scale feature output by the fifth feature extraction submodel, and "|| ||" represents a matrix norm.
In some possible implementation manners of the first aspect, the obtaining the target image segmentation model according to the fifth feature extraction sub-model and the image segmentation sub-model includes: and carrying out knowledge distillation on the image segmentation submodel according to a teacher image segmentation model to obtain a target image segmentation submodel, wherein the target image segmentation model comprises the fifth feature extraction submodel and the target image segmentation submodel.
The teacher image segmentation model may include an image segmentation model trained using a conventional method.
In this implementation, knowledge distillation is performed on the image segmentation submodel through the trained teacher model, and the accuracy of the image segmentation model can be improved.
In particular, when the feature extraction submodel is a binarized neural network model, the precision loss caused by binarization can be reduced, thereby improving the precision of the image segmentation model.
Optionally, the loss function of the target image segmentation sub-model satisfies the following relationship:
τ(P_T) = softmax(a_T/τ), τ(P_S) = softmax(a_S/τ), L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S))

where P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the target image segmentation submodel, a_T and a_S represent the outputs of the teacher image segmentation model and the target image segmentation submodel respectively, H represents the cross-entropy loss function, y indicates whether the segmentation result of the target image segmentation submodel is correct, λ is a preset weighting coefficient, and softmax represents the softmax (flexible maximum) function.
Alternatively, the feature extraction sub-model may be a binarization neural network model, which may reduce the parameters and the computation amount of the image segmentation model, thereby facilitating the application of the image segmentation model on the edge device.
In a second aspect, there is provided an apparatus for processing an image segmentation model, the image segmentation model including a feature extraction sub-model and an image segmentation sub-model, the feature extraction sub-model being used for extracting features of an image, and the image segmentation sub-model being used for segmenting the image according to the extracted features, the apparatus comprising: a layer width adjusting module and a knowledge distilling module.
The layer width adjustment module is configured to adjust the layer width of the feature extraction submodel in the image segmentation model to obtain a first feature extraction submodel; and the knowledge distillation module is configured to obtain a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel.
The device adjusts the layer width of the feature extraction submodel, and is beneficial to improving the precision of the feature extraction submodel, so that the segmentation precision of the image segmentation model is improved, and the image segmentation model is further beneficial to being applied to edge equipment.
In particular, when the feature extraction submodel is a binarized neural network model, the precision loss caused by binarization can be reduced, thereby improving the segmentation precision of the image segmentation model.
In some possible implementations of the second aspect, the layer width adjusting module is specifically configured to: increasing the channel number of the feature extraction submodel to obtain a second feature extraction submodel; generating K different first binarization codes for the second feature extraction submodel, wherein the first binarization codes comprise a plurality of binarization values, the binarization values correspond to a plurality of channels of the second feature extraction submodel in a one-to-one manner, each binarization value in the binarization values is used for indicating whether the channel corresponding to each binarization value is reserved or removed, and K is an integer greater than 1; reserving or removing channels of the second feature extraction submodel according to the K different first binary codes to obtain K third feature extraction submodels; selecting M third feature extraction submodels from the K third feature extraction submodels according to the intersection ratio and the calculated amount when each third feature extraction submodel in the K third feature extraction submodels extracts the features of the image, wherein M is an integer greater than 1; performing cross and/or variation processing on the M first binarization codes corresponding to the M third feature extraction submodels to obtain S second binarization codes, wherein S is an integer greater than 1; reserving or removing a channel of the second feature extraction submodel according to each second binarization code in the S second binarization codes to obtain S fourth feature extraction submodels; taking the fourth feature extraction sub-model as the third feature extraction sub-model, taking K as S, and repeatedly executing the fourth operation to the sixth operation for T times; and taking one of the S fourth feature extraction submodels obtained at the last time as the first feature extraction submodel.
Optionally, a probability that a jth third feature extraction sub-model of the M third feature extraction sub-models is selected satisfies the following formula:
Pr(b_j) = f(b_j) / ∑_m f(b_m)

where Pr(b_j) represents the probability of the jth third feature extraction submodel being selected, the sum runs over the candidate third feature extraction submodels, and f(b_j) satisfies the following formula:

f(b_j) = mIoU(j) + α/N(j)

where mIoU(j) represents the intersection ratio when the jth third feature extraction sub-model performs feature extraction on an image, N(j) represents the computation amount when the jth third feature extraction sub-model performs feature extraction on the image, and α is a preset parameter.
In some possible implementations of the second aspect, the knowledge distillation module is specifically configured to: performing knowledge distillation on the first feature extraction model by using a teacher feature extraction model to obtain a fifth feature extraction submodel; and obtaining the target image segmentation model according to the fifth feature extraction submodel and the image segmentation submodel.
The teacher feature extraction model may include a feature extraction model trained using a conventional method.
In the implementation mode, the trained teacher model is used for carrying out knowledge distillation on the feature extraction submodel in the image segmentation model, so that the image segmentation precision can be improved.
In particular, when the feature extraction submodel is a binarized neural network model, the precision loss caused by binarization can be reduced, thereby improving the precision of the image segmentation model.
Optionally, the loss function of the fifth feature extraction submodel satisfies the following relationship:
L = ∑_{i=1}^{T} ||G_ti − G_si||

where G_ti represents the ith scale feature among the T scale features output by the teacher feature extraction model, G_si represents the ith scale feature among the T scale features output by the fifth feature extraction submodel, T is a positive integer, the ith scale feature output by the teacher feature extraction model has the same scale as the ith scale feature output by the fifth feature extraction submodel, and "|| ||" represents a matrix norm.
In some possible implementations of the second aspect, the knowledge distillation module is specifically configured to: and carrying out knowledge distillation on the image segmentation submodel according to a teacher image segmentation model to obtain a target image segmentation submodel, wherein the target image segmentation model comprises the fifth feature extraction submodel and the target image segmentation submodel.
The teacher image segmentation model may include an image segmentation model trained using a conventional method.
In the implementation mode, knowledge distillation is carried out on the image segmentation sub-model through the trained teacher model, so that precision loss caused by binarization can be reduced, and further, precision loss of image segmentation is reduced.
Optionally, the loss function of the target image segmentation sub-model satisfies the following relationship:
τ(P_T) = softmax(a_T/τ), τ(P_S) = softmax(a_S/τ), L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S))

where P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the target image segmentation submodel, a_T and a_S represent the outputs of the teacher image segmentation model and the target image segmentation submodel respectively, H represents the cross-entropy loss function, y indicates whether the segmentation result of the target image segmentation submodel is correct, λ is a preset weighting coefficient, and softmax represents the softmax (flexible maximum) function.
Alternatively, the feature extraction sub-model may be a binarization neural network model, which may reduce the parameters and the computation amount of the image segmentation model, thereby facilitating the application of the image segmentation model on the edge device.
In a third aspect, there is provided an apparatus for processing an image segmentation model, the apparatus comprising: a memory for storing a program; a processor configured to execute the program stored in the memory, and when the program stored in the memory is executed, the processor is configured to perform the method in any one of the implementations of the first aspect.
The processor in the third aspect may be a central processing unit (CPU), or may be a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a graphics processing unit (GPU), a neural network processing unit (NPU), a tensor processing unit (TPU), and the like. Among them, the TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a fourth aspect, a computer readable medium is provided, which stores program code for execution by a device, the program code comprising instructions for performing the method of any one of the implementations of the first aspect.
In a fifth aspect, a computer program product containing instructions is provided, which when run on a computer causes the computer to perform the method of any one of the implementations of the first aspect.
In a sixth aspect, a chip is provided, where the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to execute the method in any one of the implementation manners in the first aspect.
Optionally, as an implementation manner, the chip may further include a memory, where instructions are stored in the memory, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one of the implementation manners of the first aspect.
The chip may be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
In a seventh aspect, an electronic device is provided, which includes the processing apparatus in the second aspect, or includes the processing apparatus in the third aspect.
Drawings
FIG. 1 is a schematic deployment of a processing device of the present application;
FIG. 2 is a schematic block diagram of a computing device of the present application;
FIG. 3 is a schematic block diagram of a processing device according to one embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of a processing method of one embodiment of the present application;
FIG. 5 is a schematic block diagram of a chip according to one embodiment of the present application.
Detailed Description
The embodiments of the present application relate to a large number of related applications of neural networks, and in order to better understand the solution of the embodiments of the present application, the following first introduces related terms and other related concepts of neural networks that may be related to the embodiments of the present application.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept of 1 as inputs, and its output may be as shown in equation (1-1):

h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s·x_s + b)    (1-1)

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
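As an illustrative sketch (not part of the patent), the computation in equation (1-1) for a single neural unit can be written as follows; the sigmoid activation and all variable names are assumptions chosen for illustration.

```python
import numpy as np

def neural_unit(x, w, b, activation=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Single neural unit: output = f(sum_s W_s * x_s + b).

    x and w are 1-D arrays of the same length n; the default
    activation is the sigmoid function mentioned in the text.
    """
    return activation(np.dot(w, x) + b)

# Example with three inputs and a bias b
x = np.array([0.5, -1.2, 0.3])
w = np.array([0.8, 0.1, -0.4])
print(neural_unit(x, w, b=1.0))
```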
(2) Deep neural network
Deep Neural Networks (DNNs), also called multi-layer neural networks, can be understood as neural networks with multiple hidden layers. The DNNs are divided according to the positions of different layers, and the neural networks inside the DNNs can be divided into three categories: input layer, hidden layer, output layer. Generally, the first layer is an input layer, the last layer is an output layer, and the middle layers are hidden layers. The layers are all connected, that is, any neuron of the ith layer is necessarily connected with any neuron of the (i + 1) th layer.
Although a DNN appears complex, the work of each layer is not complex; each layer simply performs the following linear relational expression:

y = α(W·x + b)

where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because the DNN has many layers, the number of coefficients W and offset vectors b is also large. These parameters are defined in the DNN as follows, taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}. The superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.

In summary, the coefficient from the kth neuron at layer L−1 to the jth neuron at layer L is defined as W^L_{jk}.
Note that the input layer is without the W parameter. In deep neural networks, more hidden layers make the network more able to depict complex situations in the real world. Theoretically, the more parameters the higher the model complexity, the larger the "capacity", which means that it can accomplish more complex learning tasks. The final goal of the process of training the deep neural network, i.e., learning the weight matrix, is to obtain the weight matrix (the weight matrix formed by the vectors W of many layers) of all the layers of the deep neural network that is trained.
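The layer-by-layer computation y = α(W·x + b) can be sketched as follows; the layer sizes and random parameter values are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def dnn_forward(x, weights, biases, alpha=np.tanh):
    """Forward pass of a fully connected DNN: each layer computes
    y = alpha(W @ x + b).  weights[l][j, k] plays the role of the
    coefficient from neuron k of one layer to neuron j of the next
    layer; the input layer itself has no W parameter.
    """
    h = x
    for W, b in zip(weights, biases):
        h = alpha(W @ h + b)
    return h

# Example: a 3 -> 4 -> 2 network with random parameters
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [rng.standard_normal(4), rng.standard_normal(2)]
print(dnn_forward(rng.standard_normal(3), weights, biases))
```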
(3) Convolutional neural network
A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of convolutional layers and sub-sampling layers, which can be regarded as a filter. The convolutional layer is a neuron layer for performing convolutional processing on an input signal in a convolutional neural network. In convolutional layers of convolutional neural networks, one neuron may be connected to only a portion of the neighbor neurons. In a convolutional layer, there are usually several characteristic planes, and each characteristic plane may be composed of several neural units arranged in a rectangular shape. The neural units of the same feature plane share weights, where the shared weights are convolution kernels. Sharing weights may be understood as the way image information is extracted is location independent. The convolution kernel can be initialized in the form of a matrix of random size, and can be learned to obtain reasonable weights in the training process of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(4) Loss function
In the process of training a deep neural network, the output of the network is expected to be as close as possible to the value that is really desired to be predicted. Therefore, the weight vector of each layer of the neural network can be updated according to the difference between the predicted value of the current network and the really desired target value (of course, an initialization process is usually carried out before the first update, that is, parameters are preset for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the really desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function or objective function, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(5) Back propagation algorithm
The neural network can adopt a Back Propagation (BP) algorithm to correct the size of parameters in the initial neural network model in the training process, so that the reconstruction error loss of the neural network model is smaller and smaller. Specifically, the error loss is generated by transmitting the input signal in the forward direction until the output, and the parameters in the initial neural network model are updated by reversely propagating the error loss information, so that the error loss is converged. The back propagation algorithm is a back propagation motion with error loss as a dominant factor, aiming at obtaining the optimal parameters of the neural network model, such as a weight matrix.
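A minimal sketch of the idea of updating parameters against a loss by following back-propagated gradients, shown here for a single linear unit with a squared-error loss rather than a full network; everything in it is illustrative.

```python
import numpy as np

def gradient_step(w, b, x, y, lr=0.1):
    """One training step for y_hat = w @ x + b with loss 0.5 * (y_hat - y)**2.

    The 'backward' part computes the gradients of the loss with respect
    to w and b and updates them so that the loss decreases.
    """
    y_hat = w @ x + b                  # forward pass
    err = y_hat - y                    # dLoss/dy_hat
    grad_w, grad_b = err * x, err      # back-propagated gradients
    return w - lr * grad_w, b - lr * grad_b

w, b = np.zeros(3), 0.0
x, y = np.array([1.0, 2.0, -1.0]), 2.0
for _ in range(50):
    w, b = gradient_step(w, b, x, y)
print(w @ x + b)                       # approaches the target value 2.0
```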
(6) Binary neural network
The binarization neural network is a neural network in which the weights and activation values of some or all layers are binarized to 1 or -1. A binarization neural network generally only binarizes the weight values and activation values of a floating-point neural network, and does not change the structure of the network.

The weights and activation values in a binarization neural network occupy less storage space; theoretically, the memory consumption is reduced to 1/32 of that of a floating-point neural network. In addition, a binarization neural network replaces the multiply-add operations of a floating-point neural network with bit operations, which greatly reduces the operation time. The binarization neural network may also be referred to as a binary neural network or a binary network.
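A minimal sketch of the binarization idea, assuming the common sign-based binarization of weights and activations to +1/-1 (illustrative, not the patent's specific scheme):

```python
import numpy as np

def binarize(t):
    """Binarize a floating-point tensor to +1 / -1 (sign function, with
    zero mapped to +1), as a binarized neural network does for the
    weights and activations of some or all layers."""
    return np.where(t >= 0, 1.0, -1.0)

w_float = np.array([[0.34, -0.72], [-0.05, 1.20]])
x_float = np.array([0.9, -0.1])

w_bin, x_bin = binarize(w_float), binarize(x_float)
# On {+1, -1} values, multiply-accumulate can be replaced by bit
# operations (XNOR + popcount) on suitable hardware.
print(w_bin @ x_bin)
```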
(7) Knowledge distillation
A complex neural network model is a set of multiple independent neural network models, or a larger and more complex network model obtained by training under strong constraint conditions (such as a high dropout rate). The way in which a smaller and simpler network model (for example, a scaled-down model that can be deployed at the application site) is trained from the larger and more complex network model is known as knowledge distillation. The larger and more complex network model is called the teacher network, and the smaller and simpler network model is called the student network. The teacher network can provide more accurate supervision information than the student network. The student network has greater computational throughput and fewer model parameters than the teacher network.
(8) Neural network model
Neural network models are a class of mathematical computational models that mimic the structure and function of biological neural networks (the central nervous system of animals). A neural network model may include a number of different functional neural network layers, each layer including parameters and computational formulas. Different layers in the neural network model have different names according to different calculation formulas or different functions, for example: the layers that are subjected to convolution calculations are called convolutional layers, which are commonly used for feature extraction of input signals (e.g., images).
One neural network model may also be composed of a combination of a plurality of existing neural network models. Neural network models of different structures may be used for different scenes (e.g., classification, recognition, or image segmentation) or to provide different effects when used for the same scene. The neural network model structure specifically includes one or more of the following: the neural network model has different network layers, different sequences of the network layers, and different weights, parameters or calculation formulas in each network layer.
There are many different neural network models with higher accuracy for identifying or classifying or image segmentation application scenarios in the industry. Some neural network models can be trained by a specific training set and then perform a task alone or in combination with other neural network models (or other functional modules). Some neural network models may also be used directly to perform a task alone or in combination with other neural network models (or other functional modules).
(9) Edge device
An edge device refers to any device having computing resources and network resources located between the data generation source and the cloud center. For example, a mobile phone is an edge device between a person and the cloud center, and a gateway is an edge device between a smart home and the cloud center. Ideally, an edge device is a device that analyzes or processes data near the source where the data is generated. Because little or no data needs to be transferred onward, network traffic and response time are further reduced.
The edge device in the embodiment of the present application may be a mobile phone with computing capability, a Tablet Personal Computer (TPC), a media player, a smart home, a notebook computer (LC), a Personal Digital Assistant (PDA), a Personal Computer (PC), a camera, a camcorder, a smart watch, a Wearable Device (WD), an autonomous vehicle, or the like. It is understood that the embodiments of the present application do not limit the specific form of the edge device.
Image processing, such as image segmentation or image classification (recognition), also has application requirements in edge devices, such as vehicles or cell phones. However, processing images using neural networks on these edge devices presents difficulties due to the computational performance and cache limitations of the edge devices. To process an image using a neural network on an edge device, a reduction process is required for an image processing model formed by the neural network, which reduces the accuracy of the image processing model. Examples of image processing models in embodiments of the present application are image segmentation models or image classification models.
In order to solve the above problem, the present application provides a processing method and a processing apparatus for an image segmentation model. The processing method and the processing apparatus improve the precision of the feature extraction submodel by adjusting the layer width of the feature extraction submodel in the image segmentation model. Furthermore, the processing method and the processing apparatus also improve the precision by performing knowledge distillation on the feature extraction submodel and the image segmentation submodel in the image segmentation model. That is, the processing method and the processing apparatus provided in the present application help improve the segmentation accuracy of the image segmentation model, and thus help apply the image segmentation model to edge devices.
Fig. 1 is a schematic deployment diagram of a processing device according to an embodiment of the present application, where the processing device may be deployed in a cloud environment, and the cloud environment is an entity that provides a cloud service to a user by using a base resource in a cloud computing mode. A cloud environment includes a cloud data center that includes a large number of infrastructure resources (including computing resources, storage resources, and network resources) owned by a cloud service provider, which may include a large number of computing devices (e.g., servers), and a cloud service platform. The processing device may be a server in the cloud data center for adjusting the image segmentation model; the processing device may also be a virtual machine created in the cloud data center for adapting the image processing model; the processing means may also be a software means deployed on a server or a virtual machine in the cloud data center for adjusting the image segmentation model, and the software means may be deployed in a distributed manner on a plurality of servers, or in a distributed manner on a plurality of virtual machines, or in a distributed manner on a virtual machine and a server.
As shown in fig. 1, the processing apparatus may be abstracted by the cloud service provider into a cloud service for adjusting an image processing model on the cloud service platform and provided to users. After a user purchases the cloud service on the cloud service platform, the cloud environment uses the processing apparatus to provide the service of adjusting the image processing model to the user. The user may upload the image processing model to be adjusted to the cloud environment through an application program interface (API) or a web interface provided by the cloud service platform; the processing apparatus receives and adjusts the image processing model to be adjusted, and the adjustment result is returned by the processing apparatus to the edge device where the user is located, or is stored in the cloud environment, for example, presented on a web page of the cloud service platform for the user to view.
When the processing means is a software means, the processing means may also be deployed separately on a computing device in any environment, for example, on an edge device or on a computing device in a data center. As shown in fig. 2, computing device 200 includes a bus 201, a processor 202, a communication interface 203, and a memory 204.
The processor 202, the memory 204, and the communication interface 203 communicate via the bus 201. The processor 202 may be a central processing unit (CPU). The memory 204 may include a volatile memory, such as a random access memory (RAM). The memory 204 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD). The memory 204 stores executable code included in the processing apparatus, and the processor 202 reads the executable code in the memory 204 to perform the processing method of the image segmentation model. The memory 204 may also include other software modules required to run processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.
Fig. 3 is a schematic structural diagram of a processing device 300 according to an embodiment of the present application. The processing device is used for processing the image segmentation model, so that the target image segmentation model with relatively less calculation amount and less precision loss is obtained. This may make the target image segmentation model more suitable for application on edge devices.
The processing apparatus 300 includes a layer width adjustment module 310 and a knowledge distillation module 320. The image segmentation model to be adjusted may include a feature extraction submodel and an image segmentation submodel, where the feature extraction submodel is used to extract features of an image, and the image segmentation submodel is used to segment the image according to the extracted features.
One example of this feature extraction submodel is a convolutional neural network; an example of this image segmentation sub-model is a convolutional neural network.
It is understood that the embodiment of the present application is only an exemplary division of the structure and the functional modules of the processing apparatus 300, and the present application does not limit the specific division.
Fig. 4 is a schematic flowchart of a processing method of an image segmentation model according to an embodiment of the present application. The method shown in fig. 4 includes S410 and S420. The processing method of the present application is described below by taking the processing apparatus shown in fig. 3 performing the method as an example.
S410, adjusting the layer width of the feature extraction submodel in the image segmentation model to obtain a first feature extraction submodel. This step may be performed by the layer width adjustment module 310.
The layer width adjustment of the feature extraction submodel can be understood as adjusting all or part of the layers of channels of the feature extraction submodel, for example, some channels are added first, and then some channels are removed according to a reasonable method, so that the precision loss when the feature extraction submodel extracts features is reduced, and the segmentation precision of the image segmentation model can be improved.
The feature extraction submodel can extract features of different scales from the image.
In some possible implementations, a genetic algorithm may be used to adjust the layer width of the feature extraction submodel.

When the layer width of the feature extraction submodel is adjusted using a genetic algorithm, one implementation includes the following operations.
And increasing the channel number of the feature extraction submodel to obtain a second feature extraction submodel.
Binarization is performed K times on the plurality of channels of the second feature extraction submodel to obtain K different first binarization codes, where each first binarization code includes a plurality of binarization values, the plurality of binarization values correspond one-to-one to the plurality of channels, each of the binarization values is used to indicate whether the channel corresponding to it is reserved or removed, and K is an integer greater than 1.
And reserving or removing channels of the second feature extraction submodel according to the K different first binary codes to obtain K third feature extraction submodels.
And selecting M third feature extraction submodels from the K third feature extraction submodels according to the intersection ratio and the calculated amount when each of the K third feature extraction submodels performs feature extraction on the images (such as the images in the verification image set), wherein M is a positive integer. In this case, if M is 1, that is, only one third feature extraction submodel is selected, the third feature extraction submodel may be regarded as the first feature extraction submodel. If M is greater than 1, the following operations may continue.
And carrying out cross and/or variation processing on the M first binarization codes corresponding to the M third feature extraction submodels to obtain S second binarization codes, wherein S is an integer greater than 1. The value of S may be empirically set in advance.
And reserving or removing a channel of the second feature extraction submodel according to each second binarization code in the S second binarization codes to obtain S fourth feature extraction submodels.
And taking the fourth feature extraction submodel as a third feature extraction submodel, taking K as S, and repeatedly executing the fourth operation to the sixth operation.
Wherein, binarizing the multiple channels of the second feature extraction submodel to obtain a first binarized code, which can be understood as: and generating a string of values, wherein the string of values comprises a plurality of values, each value is any one of two preset values, the number of the string of values is the same as the total number of the plurality of channels of the second feature extraction submodel, the plurality of values in the string of values are in one-to-one correspondence with the plurality of channels in the second feature extraction submodel, and each value in the string of values is used for indicating whether the corresponding channel is reserved or removed. The string of values is the binary code.
For example, two values "0" and "1" are preset, where "0" indicates that the corresponding channel is removed, and "1" indicates that the corresponding channel is reserved, and when 10 channels of the second feature extraction submodel are binarized once, a string of codes, for example, "1001101101", is randomly generated, and the string of codes is the first binarized code obtained by binarizing once by the second feature extraction submodel.
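A minimal sketch of how such a binarization code could be applied to keep or remove channels; the tensor shapes and the helper name are illustrative assumptions.

```python
import numpy as np

def apply_channel_code(conv_weight, code):
    """Keep or remove output channels of a convolution weight tensor
    according to a binarization code.

    conv_weight: array of shape (out_channels, in_channels, k, k)
    code:        string such as "1001101101"; '1' keeps the channel at
                 that position, '0' removes it.
    """
    keep = np.array([c == "1" for c in code])
    assert keep.size == conv_weight.shape[0]
    return conv_weight[keep]

weight = np.random.randn(10, 16, 3, 3)       # a layer with 10 output channels
pruned = apply_channel_code(weight, "1001101101")
print(pruned.shape)                           # (6, 16, 3, 3): 6 channels kept
```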
In some possible designs, the number of times that the fourth to sixth operations are repeatedly executed may be preset. When all executions are finished, if multiple fourth feature extraction submodels are obtained, one of them may be selected as the first feature extraction submodel, for example, the fourth feature extraction submodel with the highest selection probability.
In other possible designs, the fourth operation to the sixth operation are repeatedly performed until the S second binarization codes are the same, and a fourth feature extraction submodel obtained according to the second binarization codes is used as the first feature extraction submodel.
When M third feature extraction submodels are selected from the K third feature extraction submodels, the fitness of the third feature submodels can be calculated according to the intersection ratio and the calculated quantity when each third feature extraction submodel performs feature extraction on the image, and then the probability that the third feature extraction submodels are selected is calculated according to the fitness.
The fitness f(b_j) of the jth third feature extraction submodel among the M third feature extraction submodels satisfies the following formula:

f(b_j) = mIoU(j) + α/N(j)

where mIoU(j) represents the intersection ratio when the jth third feature extraction submodel performs feature extraction on an image, N(j) represents the computation amount when the jth third feature extraction submodel performs feature extraction on the image, and α is a preset parameter, also called a hyper-parameter.

The probability Pr(b_j) that the jth third feature extraction submodel is selected satisfies the following formula:

Pr(b_j) = f(b_j) / ∑_m f(b_m)

where the sum runs over the candidate third feature extraction submodels.
after the probabilities that the K third feature extraction submodels are respectively selected are obtained through calculation, in some possible implementation manners, the K third feature submodels may be ranked according to the order of the probability values from large to small, and then the first M third feature submodels are selected as fourth feature extraction submodels, where the size of M may be preset according to experience.
Since the average intersection ratio of the adjusted feature extraction submodels is consistent with that before the layer width is adjusted, the precision loss can be reduced.
In one implementation of performing cross processing on the M first binarization codes corresponding to the M third feature extraction submodels, each time cross processing is performed, any two first binarization codes are selected from the M first binarization codes, and then partial codes of the same length in the two first binarization codes are exchanged, so that two new binarization codes are obtained. For example, if one first binarization code is 0101011100100101 and the other first binarization code is 0101101010110110, swapping the sixth to twelfth values of the two first binarization codes yields 0101001010110101 and 0101111100100110, respectively.

In one implementation of performing mutation processing on the S binarization codes obtained by cross processing the M first binarization codes, for one binarization code, values of an arbitrary length are replaced with other values, thereby obtaining a different binarization code. For example, for the binarization code 10010010101101010, the fourth to eleventh bit values 10010101 are selected and replaced with 01101010, thereby obtaining the binarization code 10001101010101010. Mutation processing is performed on the S binarization codes to obtain S new binarization codes. For convenience of the following description, the mutated binarization codes are referred to as second binarization codes.
Optionally, the M first binarization codes corresponding to the M third feature extraction submodels may be directly subjected to cross processing to obtain M second binarization codes, where M is equal to S.
Optionally, the cross processing may be performed on the M first binarized codes corresponding to the M third feature extraction submodels, and after S new binarized codes are obtained, the mutation processing may not be performed. At this time, the S binary codes are S second binary codes.
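A sketch of the crossover and mutation operations on binarization codes, reproducing the worked crossover example above; the helper names are illustrative.

```python
import random

def crossover(code_a, code_b, start, end):
    """Swap the segment [start, end) between two binarization codes."""
    a, b = list(code_a), list(code_b)
    a[start:end], b[start:end] = b[start:end], a[start:end]
    return "".join(a), "".join(b)

def mutate(code, start, end, rng=random):
    """Replace the segment [start, end) with freshly sampled bits."""
    new = "".join(rng.choice("01") for _ in range(end - start))
    return code[:start] + new + code[end:]

a = "0101011100100101"
b = "0101101010110110"
a2, b2 = crossover(a, b, 5, 12)   # swap the sixth to twelfth values
print(a2, b2)                      # 0101001010110101 0101111100100110
print(mutate(a2, 3, 11))           # random replacement of the 4th-11th values
```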
S420, obtaining a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel in the image segmentation model. This step may be performed by the knowledge distillation module 320.
In some possible implementations, an image segmentation model composed of the first feature extraction sub-model and the image segmentation sub-model may be used as the target image segmentation model.
In other possible implementations, knowledge distillation may be performed on the image segmentation model by a trained teacher image segmentation model, and the distilled image segmentation model is used as the target image segmentation model. The implementation mode can improve the segmentation precision of the target image segmentation model.
Optionally, in this implementation, the knowledge distillation may be performed on the first feature extraction sub-model according to a feature extraction sub-model in the teacher image segmentation model, and the knowledge distillation may be performed on an image segmentation sub-model in the to-be-trained image segmentation model according to an image segmentation sub-model in the teacher image segmentation model. The characteristic extraction submodel in the teacher model can be called as a teacher characteristic extraction submodel, and the image segmentation submodel in the teacher model can be called as a teacher image segmentation submodel.
The following describes an implementation manner of knowledge distillation of the teacher image segmentation model to the image segmentation model, taking as an example that the first feature extraction sub-model can perform multi-scale feature extraction on the image. Among them, knowledge distillation is also called guiding training.
The feature map of the ith scale extracted by the teacher feature extraction sub-model has c_ti channels, and the feature map of the ith scale extracted by the first feature extraction submodel has c_si channels, where h_i and w_i are the height and width of the feature maps at the ith scale, so the two feature maps are matrices of dimension c_ti × h_i × w_i and c_si × h_i × w_i respectively, and i takes values from 1 to 4.

The feature values of all channels of the ith scale feature map output by the teacher feature extraction submodel are added to obtain G_ti, and the feature values of all channels of the ith scale feature map output by the first feature extraction submodel are added to obtain G_si. The loss function of the first feature extraction submodel at the ith scale is

||G_ti − G_si||

where "|| ||" can be any norm of the matrix. Taking i from 1 to 4 gives loss functions at four different scales.
And guiding the first feature extraction submodel to train according to the four loss functions of the first feature extraction submodel, namely performing knowledge distillation to obtain a fifth feature extraction submodel.
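A sketch of the per-scale feature distillation loss described above, assuming the Frobenius norm as the matrix norm; the feature map shapes are illustrative.

```python
import numpy as np

def scale_distillation_loss(teacher_fmap, student_fmap):
    """Per-scale feature distillation loss.

    teacher_fmap: array of shape (c_t, h, w), the teacher feature map
    student_fmap: array of shape (c_s, h, w), the student feature map
    The feature values are summed over all channels to obtain G_t and
    G_s, and the loss is ||G_t - G_s|| (Frobenius norm chosen here).
    """
    g_t = teacher_fmap.sum(axis=0)            # (h, w)
    g_s = student_fmap.sum(axis=0)            # (h, w)
    return np.linalg.norm(g_t - g_s)

# Four scales: the per-scale losses guide the training of the student
rng = np.random.default_rng(0)
losses = [scale_distillation_loss(rng.standard_normal((64, 32, 32)),
                                  rng.standard_normal((16, 32, 32)))
          for _ in range(4)]
print(losses, sum(losses))
```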
The output of the teacher image segmentation submodel is denoted a_T, and the output of the image segmentation submodel is denoted a_S. The loss function of the image segmentation sub-model satisfies the following formula:

τ(P_T) = softmax(a_T/τ), τ(P_S) = softmax(a_S/τ), L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S))

where P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the image segmentation submodel, H represents the cross-entropy loss function, y indicates whether the segmentation result of the image segmentation submodel is correct, λ is a preset weighting coefficient, and softmax represents the softmax (flexible maximum) function.
And guiding the image segmentation submodel to train according to the loss function of the image segmentation submodel so as to obtain the target image segmentation submodel.
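A sketch of the loss L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S)) evaluated per pixel; the temperature τ, the weighting coefficient λ, and the averaging over pixels are illustrative assumptions.

```python
import numpy as np

def softmax(a, axis=0):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_c p * log q, averaged over all pixels."""
    return float(-(p * np.log(q + eps)).sum(axis=0).mean())

def distillation_loss(a_t, a_s, y_onehot, tau=2.0, lam=0.5):
    """a_t, a_s: teacher / student segmentation outputs, shape (classes, h, w)
    y_onehot:  one-hot ground-truth labels of the same shape
    """
    p_s = softmax(a_s)                                        # P_S
    soft_t, soft_s = softmax(a_t / tau), softmax(a_s / tau)   # tau(P_T), tau(P_S)
    return cross_entropy(y_onehot, p_s) + lam * cross_entropy(soft_t, soft_s)

rng = np.random.default_rng(0)
a_t, a_s = rng.standard_normal((5, 8, 8)), rng.standard_normal((5, 8, 8))
labels = rng.integers(0, 5, size=(8, 8))
y = np.eye(5)[labels].transpose(2, 0, 1)                      # one-hot, (5, 8, 8)
print(distillation_loss(a_t, a_s, y))
```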
And the fifth feature extraction submodel and the target image segmentation submodel form a target image segmentation model. The target image segmentation model obtained by knowledge distillation can greatly improve the accuracy of the output result, namely the accuracy of the segmentation result.
In other possible implementations, knowledge distillation may be performed only on the first feature extraction submodel, and an image segmentation model composed of the distilled feature extraction submodel and the image segmentation submodel is used as the target image segmentation model. This implementation may also improve the segmentation accuracy of the target image segmentation model.
In other possible implementations, knowledge distillation may be performed only on the image segmentation sub-model, and an image segmentation model composed of the distilled image segmentation sub-model and the first feature extraction sub-model is used as the target image segmentation model. This implementation may also improve the segmentation accuracy of the target image segmentation model.
The feature extraction submodel in the embodiments of the present application may be a binarized neural network model. This reduces the number of parameters and the amount of computation of the feature extraction submodel, and therefore of the target image segmentation model, which facilitates deploying the image segmentation model on edge devices.

Although binarization may reduce the accuracy of the feature extraction submodel, the layer width adjustment performed on the feature extraction submodel in the embodiments of the present application can reduce this accuracy loss. In addition, the knowledge distillation performed on the image segmentation model by the teacher model in the embodiments of the present application can further reduce the accuracy loss.
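As a rough illustration of what a binarized layer can look like, the sketch below binarizes convolution weights to ±1 with a per-layer scaling factor and a straight-through estimator; the class name and the scaling scheme are assumptions and are not details specified by the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryConv2d(nn.Conv2d):
    """Convolution whose weights are binarized to {-1, +1} in the forward pass."""

    def forward(self, x):
        # Scale factor preserves the average magnitude of the real-valued weights
        alpha = self.weight.abs().mean()
        w_bin = torch.sign(self.weight) * alpha
        # Straight-through estimator: forward uses w_bin, gradients flow to self.weight
        w = self.weight + (w_bin - self.weight).detach()
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```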
The present application further provides an image segmentation method, which includes: segmenting an image to be processed by using the image segmentation model obtained by the processing method in S410, to obtain a segmentation result.

The present application further provides an image segmentation model, namely the image segmentation model obtained by the processing method in S410.
The present application also provides a computing device 200 as shown in fig. 2, wherein a processor 202 in the computing device 200 reads executable code stored in a memory 204 to perform the processing method described in fig. 4.
The present application also provides a chip 500 as shown in fig. 5, where the chip 500 may include a processor 502, and the processor 502 reads executable codes stored in a memory to execute the steps executed by the layer width adjusting module 310 and the knowledge distilling module 320, so as to implement the processing method described in fig. 4.
Chip 500 may also include a memory 504 for storing executable code.
The chip 500 may further comprise a communication interface 503 for inputting an image segmentation model to be trained and/or outputting a target image segmentation model. Optionally, it can also be used to input a teacher model.
The present application further provides a processing method for a neural network model, where the neural network model may be an image classification model, an image recognition model, a speech recognition model, or the like.

Optionally, the neural network model may include a binarized neural network submodel.

The processing method includes: performing layer width adjustment on the neural network model. For the manner of performing layer width adjustment on the neural network model, reference may be made to S410.

The processing method may further include performing knowledge distillation on the neural network model based on a teacher model. For the manner of performing knowledge distillation, reference may be made to S420.
The present application also provides a computing device, like computing device 200, in which a processor reads executable code stored in a memory to perform the aforementioned neural network model processing method.
The present application also provides a processing apparatus similar to the processing apparatus 300, which is used for executing the processing method of the neural network model.
The present application also provides a chip similar to the chip 500, which is used for executing the processing method of the neural network model.
The descriptions of the flows corresponding to the above figures each have their own emphasis; for parts that are not described in detail in one flow, reference may be made to the related descriptions of the other flows.
In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, wholly or partly, of a computer program product. The computer program product includes one or more computer program instructions which, when loaded and executed on a computer, produce, wholly or partly, the processes or functions described with reference to fig. 4 in the embodiments of the present application.

The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, for example a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., SSD).

Claims (19)

1. A processing method of an image segmentation model, wherein the image segmentation model comprises a feature extraction submodel and an image segmentation submodel, the feature extraction submodel is used for extracting features of an image, and the image segmentation submodel is used for segmenting the image according to the extracted features, and the method comprises the following steps:
adjusting the layer width of the feature extraction submodel to obtain a first feature extraction submodel;
and obtaining a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel.
2. The processing method of claim 1, wherein said layer width adjusting said feature extraction submodel comprises:
increasing the channel number of the feature extraction submodel to obtain a second feature extraction submodel;
generating K different first binarization codes for the second feature extraction submodel, wherein the first binarization codes comprise a plurality of binarization values, the binarization values correspond to a plurality of channels of the second feature extraction submodel in a one-to-one manner, each binarization value in the binarization values is used for indicating whether the channel corresponding to each binarization value is reserved or removed, and K is an integer greater than 1;
reserving or removing channels of the second feature extraction submodel according to the K different first binary codes to obtain K third feature extraction submodels;
selecting M third feature extraction submodels from the K third feature extraction submodels according to the intersection ratio and the calculated amount when each third feature extraction submodel in the K third feature extraction submodels extracts the features of the image, wherein M is an integer greater than 1;
performing cross and/or variation processing on the M first binarization codes corresponding to the M third feature extraction submodels to obtain S second binarization codes, wherein S is an integer greater than 1;
reserving or removing a channel of the second feature extraction submodel according to each second binarization code in the S second binarization codes to obtain S fourth feature extraction submodels;
taking the fourth feature extraction sub-model as the third feature extraction sub-model, taking K as S, and repeatedly executing the fourth operation to the sixth operation for T times;
and taking one of the S fourth feature extraction submodels obtained at the last time as the first feature extraction submodel.
3. The processing method of claim 2, wherein the probability that the jth of the M third feature extraction submodels is selected satisfies the following formula:
Pr(b_j) = f(b_j) / Σ_{k=1}^{M} f(b_k)

wherein Pr(b_j) represents the probability that the jth third feature extraction submodel is selected, and f(b_j) satisfies the following formula:

f(b_j) = mIoU(j) + α/N(j)

wherein mIoU(j) represents the intersection-over-union when the jth third feature extraction submodel performs feature extraction on an image, N(j) represents the amount of computation when the jth third feature extraction submodel performs feature extraction on the image, and α is a preset parameter.
4. The processing method according to any one of claims 1 to 3, wherein the deriving a target image segmentation model from the first feature extraction submodel and the image segmentation submodel comprises:
performing knowledge distillation on the first feature extraction model by using a teacher feature extraction model to obtain a fifth feature extraction submodel;
and obtaining the target image segmentation model according to the fifth feature extraction submodel and the image segmentation submodel.
5. The processing method of claim 4, wherein the loss function of the fifth feature extraction submodel satisfies the following relationship:
L = Σ_{i=1}^{T} ||G_ti − G_si||

wherein G_ti represents the ith scale feature among the T scale features output by the teacher feature extraction model, G_si represents the ith scale feature among the T scale features output by the fifth feature extraction submodel, T is a positive integer, the ith scale feature output by the teacher feature extraction model has the same scale as the ith scale feature output by the fifth feature extraction submodel, and ||·|| represents a matrix norm.
6. The processing method of claim 4 or 5, wherein said deriving the target image segmentation model from the fifth feature extraction submodel and the image segmentation submodel comprises:
and carrying out knowledge distillation on the image segmentation submodel according to a teacher image segmentation model to obtain a target image segmentation submodel, wherein the target image segmentation model comprises the fifth feature extraction submodel and the target image segmentation submodel.
7. The processing method of claim 6, wherein the loss function of the target image segmentation submodel satisfies the relationship:
τ(P_T) = softmax(a_T/τ), τ(P_S) = softmax(a_S/τ), L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S))

wherein P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the target image segmentation submodel, H represents a cross-entropy loss function, y is used for indicating whether the segmentation result of the target image segmentation submodel is correct, λ is a preset weighting coefficient, and softmax represents the softmax function.
8. The processing method of any one of claims 1 to 7, wherein the feature extraction submodel is a binarized neural network model.
9. An apparatus for processing an image segmentation model, the image segmentation model comprising a feature extraction sub-model and an image segmentation sub-model, the feature extraction sub-model being configured to extract features of an image, the image segmentation sub-model being configured to segment the image according to the extracted features, the apparatus comprising:
the layer width adjusting module is used for adjusting the layer width of the feature extraction submodel to obtain a first feature extraction submodel;
and the knowledge distillation module is used for obtaining a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel.
10. The processing apparatus as in claim 9, wherein the layer width adjustment module is specifically to:
increasing the channel number of the feature extraction submodel to obtain a second feature extraction submodel;
generating K different first binarization codes for the second feature extraction submodel, wherein the first binarization codes comprise a plurality of binarization values, the binarization values correspond to a plurality of channels of the second feature extraction submodel in a one-to-one manner, each binarization value in the binarization values is used for indicating whether the channel corresponding to each binarization value is reserved or removed, and K is an integer greater than 1;
reserving or removing channels of the second feature extraction submodel according to the K different first binary codes to obtain K third feature extraction submodels;
selecting M third feature extraction submodels from the K third feature extraction submodels according to the intersection ratio and the calculated amount when each third feature extraction submodel in the K third feature extraction submodels extracts the features of the image, wherein M is an integer greater than 1;
performing cross and/or variation processing on the M first binarization codes corresponding to the M third feature extraction submodels to obtain S second binarization codes, wherein S is an integer greater than 1;
reserving or removing a channel of the second feature extraction submodel according to each second binarization code in the S second binarization codes to obtain S fourth feature extraction submodels;
taking the fourth feature extraction sub-model as the third feature extraction sub-model, taking K as S, and repeatedly executing the fourth operation to the sixth operation for T times;
and taking one of the S fourth feature extraction submodels obtained at the last time as the first feature extraction submodel.
11. The processing apparatus as claimed in claim 10, wherein the probability that the jth of the M third feature extraction submodels is selected satisfies the following formula:
Pr(b_j) = f(b_j) / Σ_{k=1}^{M} f(b_k)

wherein Pr(b_j) represents the probability that the jth third feature extraction submodel is selected, and f(b_j) satisfies the following formula:

f(b_j) = mIoU(j) + α/N(j)

wherein mIoU(j) represents the intersection-over-union when the jth third feature extraction submodel performs feature extraction on an image, N(j) represents the amount of computation when the jth third feature extraction submodel performs feature extraction on the image, and α is a preset parameter.
12. The processing apparatus of any of claims 9 to 11, wherein the knowledge distillation module is specifically configured to:
performing knowledge distillation on the first feature extraction model by using a teacher feature extraction model to obtain a fifth feature extraction submodel;
and determining the target image segmentation model according to the fifth feature extraction submodel and the image segmentation submodel.
13. The processing apparatus as claimed in claim 12, wherein the loss function of the fifth feature extraction submodel satisfies the following relationship:
L = Σ_{i=1}^{T} ||G_ti − G_si||

wherein G_ti represents the ith scale feature among the T scale features output by the teacher feature extraction model, G_si represents the ith scale feature among the T scale features output by the fifth feature extraction submodel, T is a positive integer, the ith scale feature output by the teacher feature extraction model has the same scale as the ith scale feature output by the fifth feature extraction submodel, and ||·|| represents a matrix norm.
14. The processing apparatus of claim 12 or 13, wherein the knowledge distillation module is specifically configured to:
and carrying out knowledge distillation on the image segmentation submodel according to a teacher image segmentation model to obtain a target image segmentation submodel, wherein the target image segmentation model comprises the fifth feature extraction submodel and the target image segmentation submodel.
15. The processing apparatus as claimed in claim 14, wherein the loss function of the target image segmentation submodel satisfies the following relationship:
τ(P_T) = softmax(a_T/τ), τ(P_S) = softmax(a_S/τ), L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S))

wherein P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the target image segmentation submodel, H represents a cross-entropy loss function, y is used for indicating whether the segmentation result of the target image segmentation submodel is correct, λ is a preset weighting coefficient, and softmax represents the softmax function.
16. The processing apparatus according to any one of claims 9 to 15, wherein the feature extraction submodel is a binarized neural network model.
17. An apparatus for processing an image segmentation model, comprising a processor and a memory, the memory being configured to store program instructions, the processor being configured to invoke the program instructions to perform the processing method of any one of claims 1 to 8.
18. A computer-readable storage medium, characterized in that the computer-readable medium stores instructions for implementing the processing method of any one of claims 1 to 8.
19. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory through the data interface to perform the processing method of any one of claims 1 to 8.
CN201910845625.0A 2019-09-02 2019-09-02 Image segmentation model processing method and processing device Active CN112446888B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910845625.0A CN112446888B (en) 2019-09-02 2019-09-02 Image segmentation model processing method and processing device
PCT/CN2020/100058 WO2021042857A1 (en) 2019-09-02 2020-07-03 Processing method and processing apparatus for image segmentation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910845625.0A CN112446888B (en) 2019-09-02 2019-09-02 Image segmentation model processing method and processing device

Publications (2)

Publication Number Publication Date
CN112446888A true CN112446888A (en) 2021-03-05
CN112446888B CN112446888B (en) 2024-09-13

Family

ID=74732997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910845625.0A Active CN112446888B (en) 2019-09-02 2019-09-02 Image segmentation model processing method and processing device

Country Status (2)

Country Link
CN (1) CN112446888B (en)
WO (1) WO2021042857A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114358206B (en) * 2022-01-12 2022-11-01 合肥工业大学 Binary neural network model training method and system, and image processing method and system
CN114549296B (en) * 2022-04-21 2022-07-12 北京世纪好未来教育科技有限公司 Training method of image processing model, image processing method and electronic equipment
CN115906651B (en) * 2022-12-06 2024-05-31 中电金信软件有限公司 Update method and device of binary neural network and electronic equipment
WO2024174998A1 (en) * 2023-02-20 2024-08-29 Peking University Systems and methods for image processing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108492286A (en) * 2018-03-13 2018-09-04 成都大学 A kind of medical image cutting method based on the U-shaped convolutional neural networks of binary channel
CN109544556A (en) * 2017-09-21 2019-03-29 江苏华夏知识产权服务有限公司 A kind of image characteristic extracting method
CN109741348A (en) * 2019-01-07 2019-05-10 哈尔滨理工大学 A method for segmentation of diabetic retinal images
CN110189334A (en) * 2019-05-28 2019-08-30 南京邮电大学 Medical Image Segmentation Method Based on Residual Fully Convolutional Neural Network Based on Attention Mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6767966B2 (en) * 2014-04-09 2020-10-14 エントルピー インコーポレーテッドEntrupy Inc. Authenticity of objects using machine learning from microscopic differences

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120420A (en) * 2021-12-01 2022-03-01 北京百度网讯科技有限公司 Image detection method and device
CN114120420B (en) * 2021-12-01 2024-02-13 北京百度网讯科技有限公司 Image detection method and device
CN117726541A (en) * 2024-02-08 2024-03-19 北京理工大学 A dark-light video enhancement method and device based on binary neural network

Also Published As

Publication number Publication date
WO2021042857A1 (en) 2021-03-11
CN112446888B (en) 2024-09-13

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant