
CN112446888A - Processing method and processing device for image segmentation model - Google Patents


Info

Publication number
CN112446888A
Authority
CN
China
Prior art keywords
feature extraction
submodel
model
image segmentation
binarization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910845625.0A
Other languages
Chinese (zh)
Other versions
CN112446888B (en)
Inventor
韩凯
闻长远
舒晗
陈翼翼
苏霞
王云鹤
许春景
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201910845625.0A priority Critical patent/CN112446888B/en
Priority to PCT/CN2020/100058 priority patent/WO2021042857A1/en
Publication of CN112446888A publication Critical patent/CN112446888A/en
Application granted granted Critical
Publication of CN112446888B publication Critical patent/CN112446888B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to image segmentation technology in the field of artificial intelligence, and provides a processing method and a processing apparatus for an image segmentation model. The image segmentation model includes a feature extraction sub-model and an image segmentation sub-model, where the feature extraction sub-model is used to extract the features of an image, and the image segmentation sub-model is used to segment the image according to the extracted features. The processing method includes the following steps: adjusting the layer width of the feature extraction sub-model to obtain a first feature extraction sub-model; and obtaining a target image segmentation model according to the first feature extraction sub-model and the image segmentation sub-model. The processing method and processing apparatus provided by the present application help improve the segmentation accuracy of the image segmentation model, thereby facilitating the implementation of image segmentation on edge devices.

Description

Processing method and processing device for image segmentation model
Technical Field
The present application relates to the field of image processing, and more particularly, to a processing method and a processing apparatus for an image segmentation model.
Background
Image segmentation also has application requirements on edge devices, such as vehicles or cell phones. However, processing images using neural networks on these edge devices presents difficulties due to the computational performance and cache limitations of the edge devices.
Therefore, how to obtain an image segmentation model applicable to edge devices is a technical problem to be solved urgently.
Disclosure of Invention
The present application provides a processing method and a processing apparatus for an image segmentation model, which help improve the segmentation accuracy of the image segmentation model and thus facilitate image segmentation on edge devices.
In a first aspect, a method for processing an image segmentation model is provided, where the image segmentation model includes a feature extraction sub-model and an image segmentation sub-model, the feature extraction sub-model is used to extract features of an image, and the image segmentation sub-model is used to segment the image according to the extracted features, and the method includes: adjusting the layer width of the feature extraction submodel to obtain a first feature extraction submodel; and obtaining a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel.
The image segmentation model may be trained or untrained.
In the method, the layer width of the feature extraction submodel is adjusted, so that the accuracy of the feature extraction submodel is improved, the segmentation accuracy of the target image segmentation model is improved, and the image segmentation model is easier to apply to edge equipment.
In particular, when the feature extraction submodel is a binarized neural network model, the precision loss caused by binarization can be reduced, thereby improving the segmentation precision of the image segmentation model.
In some possible implementations of the first aspect, the performing layer width adjustment on the feature extraction submodel includes: increasing the channel number of the feature extraction submodel to obtain a second feature extraction submodel; generating K different first binarization codes for the second feature extraction submodel, wherein the first binarization codes comprise a plurality of binarization values, the binarization values correspond to a plurality of channels of the second feature extraction submodel in a one-to-one manner, each binarization value in the binarization values is used for indicating whether the channel corresponding to each binarization value is reserved or removed, and K is an integer greater than 1; reserving or removing channels of the second feature extraction submodel according to the K different first binary codes to obtain K third feature extraction submodels; selecting M third feature extraction submodels from the K third feature extraction submodels according to the intersection ratio and the calculated amount when each third feature extraction submodel in the K third feature extraction submodels extracts the features of the image, wherein M is an integer greater than 1; performing cross and/or variation processing on the M first binarization codes corresponding to the M third feature extraction submodels to obtain S second binarization codes, wherein S is an integer greater than 1; reserving or removing a channel of the second feature extraction submodel according to each second binarization code in the S second binarization codes to obtain S fourth feature extraction submodels; taking the fourth feature extraction sub-model as the third feature extraction sub-model, taking K as S, and repeatedly executing the fourth operation to the sixth operation for T times; and taking one of the S fourth feature extraction submodels obtained at the last time as the first feature extraction submodel.
Optionally, a probability that a jth third feature extraction sub-model of the M third feature extraction sub-models is selected satisfies the following formula:
Pr(b_j) = f(b_j) / ∑_m f(b_m)

where Pr(b_j) represents the probability of the jth third feature extraction submodel being selected, the sum runs over the candidate third feature extraction submodels, and f(b_j) satisfies the following formula:

f(b_j) = mIoU(j) + α/N(j)

where mIoU(j) represents the intersection ratio when the jth third feature extraction sub-model performs feature extraction on an image, N(j) represents the computation amount when the jth third feature extraction sub-model performs feature extraction on the image, and α is a preset parameter.
In some possible implementation manners of the first aspect, the obtaining a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel includes: performing knowledge distillation on the first feature extraction model by using a teacher feature extraction model to obtain a fifth feature extraction submodel; and obtaining the target image segmentation model according to the fifth feature extraction submodel and the image segmentation submodel.
The teacher feature extraction model may include a feature extraction model trained using a conventional method.
In the implementation mode, the trained teacher model is used for carrying out knowledge distillation on the feature extraction submodel in the image segmentation model, so that the precision loss caused by binarization can be reduced, and the precision loss of image segmentation is reduced.
Optionally, the loss function of the fifth feature extraction submodel satisfies the following relationship:
L = ∑_{i=1}^{T} ||G_ti − G_si||

where G_ti represents the ith scale feature among the T scale features output by the teacher feature extraction model, G_si represents the ith scale feature among the T scale features output by the fifth feature extraction submodel, T is a positive integer, the ith scale feature output by the teacher feature extraction model has the same scale as the ith scale feature output by the fifth feature extraction submodel, and "|| ||" represents a matrix norm.
In some possible implementation manners of the first aspect, the obtaining the target image segmentation model according to the fifth feature extraction sub-model and the image segmentation sub-model includes: and carrying out knowledge distillation on the image segmentation submodel according to a teacher image segmentation model to obtain a target image segmentation submodel, wherein the target image segmentation model comprises the fifth feature extraction submodel and the target image segmentation submodel.
The teacher image segmentation model may include an image segmentation model trained using a conventional method.
In this implementation, knowledge distillation is performed on the image segmentation submodel through the trained teacher model, and the accuracy of the image segmentation model can be improved.
In particular, when the feature extraction submodel is a binarized neural network model, the precision loss caused by binarization can be reduced, thereby improving the precision of the image segmentation model.
Optionally, the loss function of the target image segmentation sub-model satisfies the following relationship:
τ(P_T) = softmax(a_T/τ), τ(P_S) = softmax(a_S/τ), L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S))

where P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the target image segmentation submodel, a_T and a_S represent the outputs of the teacher image segmentation model and the target image segmentation submodel respectively, H represents the cross-entropy loss function, y indicates whether the segmentation result of the target image segmentation submodel is correct, λ is a preset weighting coefficient, and softmax represents the softmax (flexible maximum) function.
Alternatively, the feature extraction sub-model may be a binarization neural network model, which may reduce the parameters and the computation amount of the image segmentation model, thereby facilitating the application of the image segmentation model on the edge device.
In a second aspect, there is provided an apparatus for processing an image segmentation model, the image segmentation model including a feature extraction sub-model and an image segmentation sub-model, the feature extraction sub-model being used for extracting features of an image, and the image segmentation sub-model being used for segmenting the image according to the extracted features, the apparatus comprising: a layer width adjusting module and a knowledge distilling module.
The layer width adjustment module is configured to adjust the layer width of the feature extraction submodel in the image segmentation model to obtain a first feature extraction submodel; and the knowledge distillation module is configured to obtain a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel.
The device adjusts the layer width of the feature extraction submodel, and is beneficial to improving the precision of the feature extraction submodel, so that the segmentation precision of the image segmentation model is improved, and the image segmentation model is further beneficial to being applied to edge equipment.
In particular, when the feature extraction submodel is a binarized neural network model, the precision loss caused by binarization can be reduced, thereby improving the segmentation precision of the image segmentation model.
In some possible implementations of the second aspect, the layer width adjusting module is specifically configured to: increasing the channel number of the feature extraction submodel to obtain a second feature extraction submodel; generating K different first binarization codes for the second feature extraction submodel, wherein the first binarization codes comprise a plurality of binarization values, the binarization values correspond to a plurality of channels of the second feature extraction submodel in a one-to-one manner, each binarization value in the binarization values is used for indicating whether the channel corresponding to each binarization value is reserved or removed, and K is an integer greater than 1; reserving or removing channels of the second feature extraction submodel according to the K different first binary codes to obtain K third feature extraction submodels; selecting M third feature extraction submodels from the K third feature extraction submodels according to the intersection ratio and the calculated amount when each third feature extraction submodel in the K third feature extraction submodels extracts the features of the image, wherein M is an integer greater than 1; performing cross and/or variation processing on the M first binarization codes corresponding to the M third feature extraction submodels to obtain S second binarization codes, wherein S is an integer greater than 1; reserving or removing a channel of the second feature extraction submodel according to each second binarization code in the S second binarization codes to obtain S fourth feature extraction submodels; taking the fourth feature extraction sub-model as the third feature extraction sub-model, taking K as S, and repeatedly executing the fourth operation to the sixth operation for T times; and taking one of the S fourth feature extraction submodels obtained at the last time as the first feature extraction submodel.
Optionally, a probability that a jth third feature extraction sub-model of the M third feature extraction sub-models is selected satisfies the following formula:
Pr(b_j) = f(b_j) / ∑_m f(b_m)

where Pr(b_j) represents the probability of the jth third feature extraction submodel being selected, the sum runs over the candidate third feature extraction submodels, and f(b_j) satisfies the following formula:

f(b_j) = mIoU(j) + α/N(j)

where mIoU(j) represents the intersection ratio when the jth third feature extraction sub-model performs feature extraction on an image, N(j) represents the computation amount when the jth third feature extraction sub-model performs feature extraction on the image, and α is a preset parameter.
In some possible implementations of the second aspect, the knowledge distillation module is specifically configured to: performing knowledge distillation on the first feature extraction model by using a teacher feature extraction model to obtain a fifth feature extraction submodel; and obtaining the target image segmentation model according to the fifth feature extraction submodel and the image segmentation submodel.
The teacher feature extraction model may include a feature extraction model trained using a conventional method.
In the implementation mode, the trained teacher model is used for carrying out knowledge distillation on the feature extraction submodel in the image segmentation model, so that the image segmentation precision can be improved.
In particular, when the feature extraction submodel is a binarized neural network model, the precision loss caused by binarization can be reduced, thereby improving the precision of the image segmentation model.
Optionally, the loss function of the fifth feature extraction submodel satisfies the following relationship:
L = ∑_{i=1}^{T} ||G_ti − G_si||

where G_ti represents the ith scale feature among the T scale features output by the teacher feature extraction model, G_si represents the ith scale feature among the T scale features output by the fifth feature extraction submodel, T is a positive integer, the ith scale feature output by the teacher feature extraction model has the same scale as the ith scale feature output by the fifth feature extraction submodel, and "|| ||" represents a matrix norm.
In some possible implementations of the second aspect, the knowledge distillation module is specifically configured to: and carrying out knowledge distillation on the image segmentation submodel according to a teacher image segmentation model to obtain a target image segmentation submodel, wherein the target image segmentation model comprises the fifth feature extraction submodel and the target image segmentation submodel.
The teacher image segmentation model may include an image segmentation model trained using a conventional method.
In the implementation mode, knowledge distillation is carried out on the image segmentation sub-model through the trained teacher model, so that precision loss caused by binarization can be reduced, and further, precision loss of image segmentation is reduced.
Optionally, the loss function of the target image segmentation sub-model satisfies the following relationship:
τ(P_T) = softmax(a_T/τ), τ(P_S) = softmax(a_S/τ), L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S))

where P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the target image segmentation submodel, a_T and a_S represent the outputs of the teacher image segmentation model and the target image segmentation submodel respectively, H represents the cross-entropy loss function, y indicates whether the segmentation result of the target image segmentation submodel is correct, λ is a preset weighting coefficient, and softmax represents the softmax (flexible maximum) function.
Alternatively, the feature extraction sub-model may be a binarization neural network model, which may reduce the parameters and the computation amount of the image segmentation model, thereby facilitating the application of the image segmentation model on the edge device.
In a third aspect, there is provided an apparatus for processing an image segmentation model, the apparatus comprising: a memory for storing a program; a processor configured to execute the program stored in the memory, and when the program stored in the memory is executed, the processor is configured to perform the method in any one of the implementations of the first aspect.
The processor in the third aspect may be a central processing unit (CPU), or may be a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a graphics processing unit (GPU), a neural network processing unit (NPU), a tensor processing unit (TPU), and the like. Among them, the TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a fourth aspect, a computer readable medium is provided, which stores program code for execution by a device, the program code comprising instructions for performing the method of any one of the implementations of the first aspect.
In a fifth aspect, a computer program product containing instructions is provided, which when run on a computer causes the computer to perform the method of any one of the implementations of the first aspect.
In a sixth aspect, a chip is provided, where the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to execute the method in any one of the implementation manners in the first aspect.
Optionally, as an implementation manner, the chip may further include a memory, where instructions are stored in the memory, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one of the implementation manners of the first aspect.
The chip may be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
In a seventh aspect, an electronic device is provided, which includes the processing apparatus in the second aspect, or includes the processing apparatus in the third aspect.
Drawings
FIG. 1 is a schematic deployment of a processing device of the present application;
FIG. 2 is a schematic block diagram of a computing device of the present application;
FIG. 3 is a schematic block diagram of a processing device according to one embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of a processing method of one embodiment of the present application;
FIG. 5 is a schematic block diagram of a chip according to one embodiment of the present application.
Detailed Description
The embodiments of the present application relate to a large number of related applications of neural networks, and in order to better understand the solution of the embodiments of the present application, the following first introduces related terms and other related concepts of neural networks that may be related to the embodiments of the present application.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept of 1 as inputs, and its output may be as shown in equation (1-1):

h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s·x_s + b)    (1-1)

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
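As an illustrative sketch (not part of the patent), the computation in equation (1-1) for a single neural unit can be written as follows; the sigmoid activation and all variable names are assumptions chosen for illustration.

```python
import numpy as np

def neural_unit(x, w, b, activation=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Single neural unit: output = f(sum_s W_s * x_s + b).

    x and w are 1-D arrays of the same length n; the default
    activation is the sigmoid function mentioned in the text.
    """
    return activation(np.dot(w, x) + b)

# Example with three inputs and a bias b
x = np.array([0.5, -1.2, 0.3])
w = np.array([0.8, 0.1, -0.4])
print(neural_unit(x, w, b=1.0))
```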
(2) Deep neural network
Deep Neural Networks (DNNs), also called multi-layer neural networks, can be understood as neural networks with multiple hidden layers. The DNNs are divided according to the positions of different layers, and the neural networks inside the DNNs can be divided into three categories: input layer, hidden layer, output layer. Generally, the first layer is an input layer, the last layer is an output layer, and the middle layers are hidden layers. The layers are all connected, that is, any neuron of the ith layer is necessarily connected with any neuron of the (i + 1) th layer.
Although a DNN appears complex, the work of each layer is not complex; each layer simply performs the following linear relational expression:

y = α(W·x + b)

where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because the DNN has many layers, the number of coefficients W and offset vectors b is also large. These parameters are defined in the DNN as follows, taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}. The superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.

In summary, the coefficient from the kth neuron at layer L−1 to the jth neuron at layer L is defined as W^L_{jk}.
Note that the input layer is without the W parameter. In deep neural networks, more hidden layers make the network more able to depict complex situations in the real world. Theoretically, the more parameters the higher the model complexity, the larger the "capacity", which means that it can accomplish more complex learning tasks. The final goal of the process of training the deep neural network, i.e., learning the weight matrix, is to obtain the weight matrix (the weight matrix formed by the vectors W of many layers) of all the layers of the deep neural network that is trained.
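The layer-by-layer computation y = α(W·x + b) can be sketched as follows; the layer sizes and random parameter values are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def dnn_forward(x, weights, biases, alpha=np.tanh):
    """Forward pass of a fully connected DNN: each layer computes
    y = alpha(W @ x + b).  weights[l][j, k] plays the role of the
    coefficient from neuron k of one layer to neuron j of the next
    layer; the input layer itself has no W parameter.
    """
    h = x
    for W, b in zip(weights, biases):
        h = alpha(W @ h + b)
    return h

# Example: a 3 -> 4 -> 2 network with random parameters
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [rng.standard_normal(4), rng.standard_normal(2)]
print(dnn_forward(rng.standard_normal(3), weights, biases))
```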
(3) Convolutional neural network
A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of convolutional layers and sub-sampling layers, which can be regarded as a filter. The convolutional layer is a neuron layer for performing convolutional processing on an input signal in a convolutional neural network. In convolutional layers of convolutional neural networks, one neuron may be connected to only a portion of the neighbor neurons. In a convolutional layer, there are usually several characteristic planes, and each characteristic plane may be composed of several neural units arranged in a rectangular shape. The neural units of the same feature plane share weights, where the shared weights are convolution kernels. Sharing weights may be understood as the way image information is extracted is location independent. The convolution kernel can be initialized in the form of a matrix of random size, and can be learned to obtain reasonable weights in the training process of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(4) Loss function
In the process of training a deep neural network, the output of the network is expected to be as close as possible to the value that is really desired to be predicted. Therefore, the weight vector of each layer of the neural network can be updated according to the difference between the predicted value of the current network and the really desired target value (of course, an initialization process is usually carried out before the first update, that is, parameters are preset for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the really desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function or objective function, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(5) Back propagation algorithm
The neural network can adopt a Back Propagation (BP) algorithm to correct the size of parameters in the initial neural network model in the training process, so that the reconstruction error loss of the neural network model is smaller and smaller. Specifically, the error loss is generated by transmitting the input signal in the forward direction until the output, and the parameters in the initial neural network model are updated by reversely propagating the error loss information, so that the error loss is converged. The back propagation algorithm is a back propagation motion with error loss as a dominant factor, aiming at obtaining the optimal parameters of the neural network model, such as a weight matrix.
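A minimal sketch of the idea of updating parameters against a loss by following back-propagated gradients, shown here for a single linear unit with a squared-error loss rather than a full network; everything in it is illustrative.

```python
import numpy as np

def gradient_step(w, b, x, y, lr=0.1):
    """One training step for y_hat = w @ x + b with loss 0.5 * (y_hat - y)**2.

    The 'backward' part computes the gradients of the loss with respect
    to w and b and updates them so that the loss decreases.
    """
    y_hat = w @ x + b                  # forward pass
    err = y_hat - y                    # dLoss/dy_hat
    grad_w, grad_b = err * x, err      # back-propagated gradients
    return w - lr * grad_w, b - lr * grad_b

w, b = np.zeros(3), 0.0
x, y = np.array([1.0, 2.0, -1.0]), 2.0
for _ in range(50):
    w, b = gradient_step(w, b, x, y)
print(w @ x + b)                       # approaches the target value 2.0
```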
(6) Binary neural network
The binarization neural network is a neural network in which the weights and activation values of some or all layers are binarized to 1 or -1. A binarization neural network generally only binarizes the weight values and activation values of a floating-point neural network, and does not change the structure of the network.

The weights and activation values in a binarization neural network occupy less storage space; theoretically, the memory consumption is reduced to 1/32 of that of a floating-point neural network. In addition, a binarization neural network replaces the multiply-add operations of a floating-point neural network with bit operations, which greatly reduces the operation time. The binarization neural network may also be referred to as a binary neural network or a binary network.
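A minimal sketch of the binarization idea, assuming the common sign-based binarization of weights and activations to +1/-1 (illustrative, not the patent's specific scheme):

```python
import numpy as np

def binarize(t):
    """Binarize a floating-point tensor to +1 / -1 (sign function, with
    zero mapped to +1), as a binarized neural network does for the
    weights and activations of some or all layers."""
    return np.where(t >= 0, 1.0, -1.0)

w_float = np.array([[0.34, -0.72], [-0.05, 1.20]])
x_float = np.array([0.9, -0.1])

w_bin, x_bin = binarize(w_float), binarize(x_float)
# On {+1, -1} values, multiply-accumulate can be replaced by bit
# operations (XNOR + popcount) on suitable hardware.
print(w_bin @ x_bin)
```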
(7) Knowledge distillation
A complex neural network model is a set of multiple independent neural network models, or a larger and more complex network model obtained by training under strong constraint conditions (such as a high dropout rate). The way in which a smaller and simpler network model (for example, a scaled-down model that can be deployed at the application site) is trained from the larger and more complex network model is known as knowledge distillation. The larger and more complex network model is called the teacher network, and the smaller and simpler network model is called the student network. The teacher network can provide more accurate supervision information than the student network. The student network has greater computational throughput and fewer model parameters than the teacher network.
(8) Neural network model
Neural network models are a class of mathematical computational models that mimic the structure and function of biological neural networks (the central nervous system of animals). A neural network model may include a number of different functional neural network layers, each layer including parameters and computational formulas. Different layers in the neural network model have different names according to different calculation formulas or different functions, for example: the layers that are subjected to convolution calculations are called convolutional layers, which are commonly used for feature extraction of input signals (e.g., images).
One neural network model may also be composed of a combination of a plurality of existing neural network models. Neural network models of different structures may be used for different scenes (e.g., classification, recognition, or image segmentation) or to provide different effects when used for the same scene. The neural network model structure specifically includes one or more of the following: the neural network model has different network layers, different sequences of the network layers, and different weights, parameters or calculation formulas in each network layer.
There are many different neural network models with higher accuracy for identifying or classifying or image segmentation application scenarios in the industry. Some neural network models can be trained by a specific training set and then perform a task alone or in combination with other neural network models (or other functional modules). Some neural network models may also be used directly to perform a task alone or in combination with other neural network models (or other functional modules).
(9) Edge device
An edge device refers to any device having computing resources and network resources located between the data generation source and the cloud center. For example, a mobile phone is an edge device between a person and the cloud center, and a gateway is an edge device between a smart home and the cloud center. Ideally, an edge device is a device that analyzes or processes data near the source where the data is generated. Because little or no data needs to be transferred onward, network traffic and response time are further reduced.
The edge device in the embodiment of the present application may be a mobile phone with computing capability, a Tablet Personal Computer (TPC), a media player, a smart home, a notebook computer (LC), a Personal Digital Assistant (PDA), a Personal Computer (PC), a camera, a camcorder, a smart watch, a Wearable Device (WD), an autonomous vehicle, or the like. It is understood that the embodiments of the present application do not limit the specific form of the edge device.
Image processing, such as image segmentation or image classification (recognition), also has application requirements in edge devices, such as vehicles or cell phones. However, processing images using neural networks on these edge devices presents difficulties due to the computational performance and cache limitations of the edge devices. To process an image using a neural network on an edge device, a reduction process is required for an image processing model formed by the neural network, which reduces the accuracy of the image processing model. Examples of image processing models in embodiments of the present application are image segmentation models or image classification models.
In order to solve the above problem, the present application provides a processing method and a processing apparatus for an image segmentation model. The processing method and the processing apparatus improve the precision of the feature extraction submodel by adjusting the layer width of the feature extraction submodel in the image segmentation model. Furthermore, the processing method and the processing apparatus also improve the precision by performing knowledge distillation on the feature extraction submodel and the image segmentation submodel in the image segmentation model. That is, the processing method and the processing apparatus provided in the present application help improve the segmentation accuracy of the image segmentation model, and thus help apply the image segmentation model to edge devices.
Fig. 1 is a schematic deployment diagram of a processing device according to an embodiment of the present application, where the processing device may be deployed in a cloud environment, and the cloud environment is an entity that provides a cloud service to a user by using a base resource in a cloud computing mode. A cloud environment includes a cloud data center that includes a large number of infrastructure resources (including computing resources, storage resources, and network resources) owned by a cloud service provider, which may include a large number of computing devices (e.g., servers), and a cloud service platform. The processing device may be a server in the cloud data center for adjusting the image segmentation model; the processing device may also be a virtual machine created in the cloud data center for adapting the image processing model; the processing means may also be a software means deployed on a server or a virtual machine in the cloud data center for adjusting the image segmentation model, and the software means may be deployed in a distributed manner on a plurality of servers, or in a distributed manner on a plurality of virtual machines, or in a distributed manner on a virtual machine and a server.
As shown in fig. 1, the processing apparatus may be abstracted by the cloud service provider into a cloud service for adjusting an image processing model on the cloud service platform and provided to users. After a user purchases the cloud service on the cloud service platform, the cloud environment uses the processing apparatus to provide the service of adjusting the image processing model to the user. The user may upload the image processing model to be adjusted to the cloud environment through an application program interface (API) or a web interface provided by the cloud service platform; the processing apparatus receives and adjusts the image processing model to be adjusted, and the adjustment result is returned by the processing apparatus to the edge device where the user is located, or is stored in the cloud environment, for example, presented on a web page of the cloud service platform for the user to view.
When the processing means is a software means, the processing means may also be deployed separately on a computing device in any environment, for example, on an edge device or on a computing device in a data center. As shown in fig. 2, computing device 200 includes a bus 201, a processor 202, a communication interface 203, and a memory 204.
The processor 202, the memory 204, and the communication interface 203 communicate via the bus 201. The processor 202 may be a central processing unit (CPU). The memory 204 may include a volatile memory, such as a random access memory (RAM). The memory 204 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD). The memory 204 stores executable code included in the processing apparatus, and the processor 202 reads the executable code in the memory 204 to perform the processing method of the image segmentation model. The memory 204 may also include other software modules required to run processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.
Fig. 3 is a schematic structural diagram of a processing device 300 according to an embodiment of the present application. The processing device is used for processing the image segmentation model, so that the target image segmentation model with relatively less calculation amount and less precision loss is obtained. This may make the target image segmentation model more suitable for application on edge devices.
The processing apparatus 300 includes a layer width adjustment module 310 and a knowledge distillation module 320. The image segmentation model to be adjusted may include a feature extraction submodel and an image segmentation submodel, where the feature extraction submodel is used to extract features of an image, and the image segmentation submodel is used to segment the image according to the extracted features.
One example of this feature extraction submodel is a convolutional neural network; an example of this image segmentation sub-model is a convolutional neural network.
It is understood that the embodiment of the present application is only an exemplary division of the structure and the functional modules of the processing apparatus 300, and the present application does not limit the specific division.
Fig. 4 is a schematic flowchart of a processing method of an image segmentation model according to an embodiment of the present application. The method shown in fig. 4 includes S410 and S420. The processing method of the present application is described below by taking the processing apparatus shown in fig. 3 performing the method as an example.
S410, adjusting the layer width of the feature extraction submodel in the image segmentation model to obtain a first feature extraction submodel. This step may be performed by the layer width adjustment module 310.
The layer width adjustment of the feature extraction submodel can be understood as adjusting all or part of the layers of channels of the feature extraction submodel, for example, some channels are added first, and then some channels are removed according to a reasonable method, so that the precision loss when the feature extraction submodel extracts features is reduced, and the segmentation precision of the image segmentation model can be improved.
The feature extraction submodel can extract features of different scales from the image.
In some possible implementations, a genetic algorithm may be used to adjust the layer width of the feature extraction submodel.

When the layer width of the feature extraction submodel is adjusted using a genetic algorithm, one implementation includes the following operations.
And increasing the channel number of the feature extraction submodel to obtain a second feature extraction submodel.
Binarization is performed K times on the plurality of channels of the second feature extraction submodel to obtain K different first binarization codes, where each first binarization code includes a plurality of binarization values, the plurality of binarization values correspond one-to-one to the plurality of channels, each of the binarization values is used to indicate whether the channel corresponding to it is reserved or removed, and K is an integer greater than 1.
And reserving or removing channels of the second feature extraction submodel according to the K different first binary codes to obtain K third feature extraction submodels.
And selecting M third feature extraction submodels from the K third feature extraction submodels according to the intersection ratio and the calculated amount when each of the K third feature extraction submodels performs feature extraction on the images (such as the images in the verification image set), wherein M is a positive integer. In this case, if M is 1, that is, only one third feature extraction submodel is selected, the third feature extraction submodel may be regarded as the first feature extraction submodel. If M is greater than 1, the following operations may continue.
And carrying out cross and/or variation processing on the M first binarization codes corresponding to the M third feature extraction submodels to obtain S second binarization codes, wherein S is an integer greater than 1. The value of S may be empirically set in advance.
And reserving or removing a channel of the second feature extraction submodel according to each second binarization code in the S second binarization codes to obtain S fourth feature extraction submodels.
And taking the fourth feature extraction submodel as a third feature extraction submodel, taking K as S, and repeatedly executing the fourth operation to the sixth operation.
Wherein, binarizing the multiple channels of the second feature extraction submodel to obtain a first binarized code, which can be understood as: and generating a string of values, wherein the string of values comprises a plurality of values, each value is any one of two preset values, the number of the string of values is the same as the total number of the plurality of channels of the second feature extraction submodel, the plurality of values in the string of values are in one-to-one correspondence with the plurality of channels in the second feature extraction submodel, and each value in the string of values is used for indicating whether the corresponding channel is reserved or removed. The string of values is the binary code.
For example, two values "0" and "1" are preset, where "0" indicates that the corresponding channel is removed, and "1" indicates that the corresponding channel is reserved, and when 10 channels of the second feature extraction submodel are binarized once, a string of codes, for example, "1001101101", is randomly generated, and the string of codes is the first binarized code obtained by binarizing once by the second feature extraction submodel.
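A minimal sketch of how such a binarization code could be applied to keep or remove channels; the tensor shapes and the helper name are illustrative assumptions.

```python
import numpy as np

def apply_channel_code(conv_weight, code):
    """Keep or remove output channels of a convolution weight tensor
    according to a binarization code.

    conv_weight: array of shape (out_channels, in_channels, k, k)
    code:        string such as "1001101101"; '1' keeps the channel at
                 that position, '0' removes it.
    """
    keep = np.array([c == "1" for c in code])
    assert keep.size == conv_weight.shape[0]
    return conv_weight[keep]

weight = np.random.randn(10, 16, 3, 3)       # a layer with 10 output channels
pruned = apply_channel_code(weight, "1001101101")
print(pruned.shape)                           # (6, 16, 3, 3): 6 channels kept
```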
In some possible designs, the number of times that the fourth to sixth operations are repeatedly executed may be preset. When all executions are finished, if multiple fourth feature extraction submodels are obtained, one of them may be selected as the first feature extraction submodel, for example, the fourth feature extraction submodel with the highest selection probability.
In other possible designs, the fourth operation to the sixth operation are repeatedly performed until the S second binarization codes are the same, and a fourth feature extraction submodel obtained according to the second binarization codes is used as the first feature extraction submodel.
When M third feature extraction submodels are selected from the K third feature extraction submodels, the fitness of the third feature submodels can be calculated according to the intersection ratio and the calculated quantity when each third feature extraction submodel performs feature extraction on the image, and then the probability that the third feature extraction submodels are selected is calculated according to the fitness.
The fitness f(b_j) of the jth third feature extraction submodel among the M third feature extraction submodels satisfies the following formula:

f(b_j) = mIoU(j) + α/N(j)

where mIoU(j) represents the intersection ratio when the jth third feature extraction submodel performs feature extraction on an image, N(j) represents the computation amount when the jth third feature extraction submodel performs feature extraction on the image, and α is a preset parameter, also called a hyper-parameter.

The probability Pr(b_j) that the jth third feature extraction submodel is selected satisfies the following formula:

Pr(b_j) = f(b_j) / ∑_m f(b_m)

where the sum runs over the candidate third feature extraction submodels.
after the probabilities that the K third feature extraction submodels are respectively selected are obtained through calculation, in some possible implementation manners, the K third feature submodels may be ranked according to the order of the probability values from large to small, and then the first M third feature submodels are selected as fourth feature extraction submodels, where the size of M may be preset according to experience.
Since the average intersection ratio of the adjusted feature extraction submodels is consistent with that before the layer width is adjusted, the precision loss can be reduced.
In one implementation of performing cross processing on the M first binarization codes corresponding to the M third feature extraction submodels, each time cross processing is performed, any two first binarization codes are selected from the M first binarization codes, and then partial codes of the same length in the two first binarization codes are exchanged, so that two new binarization codes are obtained. For example, if one first binarization code is 0101011100100101 and the other first binarization code is 0101101010110110, swapping the sixth to twelfth values of the two first binarization codes yields 0101001010110101 and 0101111100100110, respectively.

In one implementation of performing mutation processing on the S binarization codes obtained by cross processing the M first binarization codes, for one binarization code, values of an arbitrary length are replaced with other values, thereby obtaining a different binarization code. For example, for the binarization code 10010010101101010, the fourth to eleventh bit values 10010101 are selected and replaced with 01101010, thereby obtaining the binarization code 10001101010101010. Mutation processing is performed on the S binarization codes to obtain S new binarization codes. For convenience of the following description, the mutated binarization codes are referred to as second binarization codes.
Optionally, the M first binarization codes corresponding to the M third feature extraction submodels may be directly subjected to cross processing to obtain M second binarization codes, where M is equal to S.
Optionally, the cross processing may be performed on the M first binarized codes corresponding to the M third feature extraction submodels, and after S new binarized codes are obtained, the mutation processing may not be performed. At this time, the S binary codes are S second binary codes.
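A sketch of the crossover and mutation operations on binarization codes, reproducing the worked crossover example above; the helper names are illustrative.

```python
import random

def crossover(code_a, code_b, start, end):
    """Swap the segment [start, end) between two binarization codes."""
    a, b = list(code_a), list(code_b)
    a[start:end], b[start:end] = b[start:end], a[start:end]
    return "".join(a), "".join(b)

def mutate(code, start, end, rng=random):
    """Replace the segment [start, end) with freshly sampled bits."""
    new = "".join(rng.choice("01") for _ in range(end - start))
    return code[:start] + new + code[end:]

a = "0101011100100101"
b = "0101101010110110"
a2, b2 = crossover(a, b, 5, 12)   # swap the sixth to twelfth values
print(a2, b2)                      # 0101001010110101 0101111100100110
print(mutate(a2, 3, 11))           # random replacement of the 4th-11th values
```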
S420, obtaining a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel in the image segmentation model. This step may be performed by the knowledge distillation module 320.
In some possible implementations, an image segmentation model composed of the first feature extraction sub-model and the image segmentation sub-model may be used as the target image segmentation model.
In other possible implementations, knowledge distillation may be performed on the image segmentation model by a trained teacher image segmentation model, and the distilled image segmentation model is used as the target image segmentation model. The implementation mode can improve the segmentation precision of the target image segmentation model.
Optionally, in this implementation, the knowledge distillation may be performed on the first feature extraction sub-model according to a feature extraction sub-model in the teacher image segmentation model, and the knowledge distillation may be performed on an image segmentation sub-model in the to-be-trained image segmentation model according to an image segmentation sub-model in the teacher image segmentation model. The characteristic extraction submodel in the teacher model can be called as a teacher characteristic extraction submodel, and the image segmentation submodel in the teacher model can be called as a teacher image segmentation submodel.
The following describes an implementation manner of knowledge distillation of the teacher image segmentation model to the image segmentation model, taking as an example that the first feature extraction sub-model can perform multi-scale feature extraction on the image. Among them, knowledge distillation is also called guiding training.
The feature map of the ith scale extracted by the teacher feature extraction sub-model has c_ti channels, and the feature map of the ith scale extracted by the first feature extraction submodel has c_si channels, where h_i and w_i are the height and width of the feature maps at the ith scale, so the two feature maps are matrices of dimension c_ti × h_i × w_i and c_si × h_i × w_i respectively, and i takes values from 1 to 4.

The feature values of all channels of the ith scale feature map output by the teacher feature extraction submodel are added to obtain G_ti, and the feature values of all channels of the ith scale feature map output by the first feature extraction submodel are added to obtain G_si. The loss function of the first feature extraction submodel at the ith scale is

||G_ti − G_si||

where "|| ||" can be any norm of the matrix. Taking i from 1 to 4 gives loss functions at four different scales.
And guiding the first feature extraction submodel to train according to the four loss functions of the first feature extraction submodel, namely performing knowledge distillation to obtain a fifth feature extraction submodel.
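A sketch of the per-scale feature distillation loss described above, assuming the Frobenius norm as the matrix norm; the feature map shapes are illustrative.

```python
import numpy as np

def scale_distillation_loss(teacher_fmap, student_fmap):
    """Per-scale feature distillation loss.

    teacher_fmap: array of shape (c_t, h, w), the teacher feature map
    student_fmap: array of shape (c_s, h, w), the student feature map
    The feature values are summed over all channels to obtain G_t and
    G_s, and the loss is ||G_t - G_s|| (Frobenius norm chosen here).
    """
    g_t = teacher_fmap.sum(axis=0)            # (h, w)
    g_s = student_fmap.sum(axis=0)            # (h, w)
    return np.linalg.norm(g_t - g_s)

# Four scales: the per-scale losses guide the training of the student
rng = np.random.default_rng(0)
losses = [scale_distillation_loss(rng.standard_normal((64, 32, 32)),
                                  rng.standard_normal((16, 32, 32)))
          for _ in range(4)]
print(losses, sum(losses))
```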
The output of the teacher image segmentation submodel is denoted a_T, and the output of the image segmentation submodel is denoted a_S. The loss function of the image segmentation sub-model satisfies the following formula:

τ(P_T) = softmax(a_T/τ), τ(P_S) = softmax(a_S/τ), L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S))

where P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the image segmentation submodel, H represents the cross-entropy loss function, y indicates whether the segmentation result of the image segmentation submodel is correct, λ is a preset weighting coefficient, and softmax represents the softmax (flexible maximum) function.
And guiding the image segmentation submodel to train according to the loss function of the image segmentation submodel so as to obtain the target image segmentation submodel.
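A sketch of the loss L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S)) evaluated per pixel; the temperature τ, the weighting coefficient λ, and the averaging over pixels are illustrative assumptions.

```python
import numpy as np

def softmax(a, axis=0):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_c p * log q, averaged over all pixels."""
    return float(-(p * np.log(q + eps)).sum(axis=0).mean())

def distillation_loss(a_t, a_s, y_onehot, tau=2.0, lam=0.5):
    """a_t, a_s: teacher / student segmentation outputs, shape (classes, h, w)
    y_onehot:  one-hot ground-truth labels of the same shape
    """
    p_s = softmax(a_s)                                        # P_S
    soft_t, soft_s = softmax(a_t / tau), softmax(a_s / tau)   # tau(P_T), tau(P_S)
    return cross_entropy(y_onehot, p_s) + lam * cross_entropy(soft_t, soft_s)

rng = np.random.default_rng(0)
a_t, a_s = rng.standard_normal((5, 8, 8)), rng.standard_normal((5, 8, 8))
labels = rng.integers(0, 5, size=(8, 8))
y = np.eye(5)[labels].transpose(2, 0, 1)                      # one-hot, (5, 8, 8)
print(distillation_loss(a_t, a_s, y))
```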
And the fifth feature extraction submodel and the target image segmentation submodel form a target image segmentation model. The target image segmentation model obtained by knowledge distillation can greatly improve the accuracy of the output result, namely the accuracy of the segmentation result.
In other possible implementations, knowledge distillation may be performed only on the first feature extraction submodel, and an image segmentation model composed of the distilled feature extraction submodel and the image segmentation submodel is used as the target image segmentation model. This implementation may also improve the segmentation accuracy of the target image segmentation model.
In other possible implementations, knowledge distillation may be performed only on the image segmentation sub-model, and an image segmentation model composed of the distilled image segmentation sub-model and the first feature extraction sub-model is used as the target image segmentation model. This implementation may also improve the segmentation accuracy of the target image segmentation model.
The feature extraction submodel in the embodiments of the present application may be a binarized neural network model. This reduces the number of parameters and the amount of computation of the feature extraction submodel, and therefore of the target image segmentation model, which facilitates deploying the image segmentation model on edge devices.

Although binarization may reduce the accuracy of the feature extraction submodel, the layer width adjustment performed on the feature extraction submodel in the embodiments of the present application can reduce this accuracy loss. In addition, the knowledge distillation performed on the image segmentation model by the teacher model in the embodiments of the present application can further reduce the accuracy loss.
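As a rough illustration of what a binarized layer can look like, the sketch below binarizes convolution weights to ±1 with a per-layer scaling factor and a straight-through estimator; the class name and the scaling scheme are assumptions and are not details specified by the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryConv2d(nn.Conv2d):
    """Convolution whose weights are binarized to {-1, +1} in the forward pass."""

    def forward(self, x):
        # Scale factor preserves the average magnitude of the real-valued weights
        alpha = self.weight.abs().mean()
        w_bin = torch.sign(self.weight) * alpha
        # Straight-through estimator: forward uses w_bin, gradients flow to self.weight
        w = self.weight + (w_bin - self.weight).detach()
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```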
The present application further provides an image segmentation method, which includes: segmenting an image to be processed by using the image segmentation model obtained by the processing method in S410, to obtain a segmentation result.

The present application further provides an image segmentation model, namely the image segmentation model obtained by the processing method in S410.
The present application also provides a computing device 200 as shown in fig. 2, wherein a processor 202 in the computing device 200 reads executable code stored in a memory 204 to perform the processing method described in fig. 4.
The present application also provides a chip 500 as shown in fig. 5, where the chip 500 may include a processor 502, and the processor 502 reads executable codes stored in a memory to execute the steps executed by the layer width adjusting module 310 and the knowledge distilling module 320, so as to implement the processing method described in fig. 4.
Chip 500 may also include a memory 504 for storing executable code.
The chip 500 may further comprise a communication interface 503 for inputting an image segmentation model to be trained and/or outputting a target image segmentation model. Optionally, it can also be used to input a teacher model.
The present application further provides a processing method for a neural network model, where the neural network model may be an image classification model, an image recognition model, a speech recognition model, or the like.

Optionally, the neural network model may include a binarized neural network submodel.

The processing method includes: performing layer width adjustment on the neural network model. For the manner of performing layer width adjustment on the neural network model, reference may be made to S410.

The processing method may further include performing knowledge distillation on the neural network model based on a teacher model. For the manner of performing knowledge distillation, reference may be made to S420.
The present application also provides a computing device, like computing device 200, in which a processor reads executable code stored in a memory to perform the aforementioned neural network model processing method.
The present application also provides a processing apparatus similar to the processing apparatus 300, which is used for executing the processing method of the neural network model.
The present application also provides a chip similar to the chip 500, which is used for executing the processing method of the neural network model.
The descriptions of the flows corresponding to the above figures each have their own emphasis; for parts that are not described in detail in one flow, reference may be made to the related descriptions of the other flows.
In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, wholly or partly, of a computer program product. The computer program product includes one or more computer program instructions which, when loaded and executed on a computer, produce, wholly or partly, the processes or functions described with reference to fig. 4 in the embodiments of the present application.

The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, for example a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., SSD).

Claims (19)

1. A processing method of an image segmentation model, wherein the image segmentation model comprises a feature extraction submodel and an image segmentation submodel, the feature extraction submodel is used for extracting features of an image, and the image segmentation submodel is used for segmenting the image according to the extracted features, and the method comprises the following steps:
adjusting the layer width of the feature extraction submodel to obtain a first feature extraction submodel;
and obtaining a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel.
2. The processing method of claim 1, wherein said layer width adjusting said feature extraction submodel comprises:
increasing the channel number of the feature extraction submodel to obtain a second feature extraction submodel;
generating K different first binarization codes for the second feature extraction submodel, wherein the first binarization codes comprise a plurality of binarization values, the binarization values correspond to a plurality of channels of the second feature extraction submodel in a one-to-one manner, each binarization value in the binarization values is used for indicating whether the channel corresponding to each binarization value is reserved or removed, and K is an integer greater than 1;
reserving or removing channels of the second feature extraction submodel according to the K different first binary codes to obtain K third feature extraction submodels;
selecting M third feature extraction submodels from the K third feature extraction submodels according to the intersection ratio and the calculated amount when each third feature extraction submodel in the K third feature extraction submodels extracts the features of the image, wherein M is an integer greater than 1;
performing cross and/or variation processing on the M first binarization codes corresponding to the M third feature extraction submodels to obtain S second binarization codes, wherein S is an integer greater than 1;
reserving or removing a channel of the second feature extraction submodel according to each second binarization code in the S second binarization codes to obtain S fourth feature extraction submodels;
taking the fourth feature extraction sub-model as the third feature extraction sub-model, taking K as S, and repeatedly executing the fourth operation to the sixth operation for T times;
and taking one of the S fourth feature extraction submodels obtained at the last time as the first feature extraction submodel.
3. The processing method of claim 2, wherein the probability that the jth of the M third feature extraction submodels is selected satisfies the following formula:
Pr(b_j) = f(b_j) / Σ_{k=1}^{M} f(b_k)

wherein Pr(b_j) represents the probability that the jth third feature extraction submodel is selected, and f(b_j) satisfies the following formula:

f(b_j) = mIoU(j) + α/N(j)

wherein mIoU(j) represents the intersection-over-union when the jth third feature extraction submodel performs feature extraction on an image, N(j) represents the amount of computation when the jth third feature extraction submodel performs feature extraction on the image, and α is a preset parameter.
4. The processing method according to any one of claims 1 to 3, wherein the deriving a target image segmentation model from the first feature extraction submodel and the image segmentation submodel comprises:
performing knowledge distillation on the first feature extraction model by using a teacher feature extraction model to obtain a fifth feature extraction submodel;
and obtaining the target image segmentation model according to the fifth feature extraction submodel and the image segmentation submodel.
5. The processing method of claim 4, wherein the loss function of the fifth feature extraction submodel satisfies the following relationship:
L = Σ_{i=1}^{T} ||G_ti − G_si||

wherein G_ti represents the ith scale feature among the T scale features output by the teacher feature extraction model, G_si represents the ith scale feature among the T scale features output by the fifth feature extraction submodel, T is a positive integer, the ith scale feature output by the teacher feature extraction model has the same scale as the ith scale feature output by the fifth feature extraction submodel, and ||·|| represents a matrix norm.
6. The processing method of claim 4 or 5, wherein said deriving the target image segmentation model from the fifth feature extraction submodel and the image segmentation submodel comprises:
and carrying out knowledge distillation on the image segmentation submodel according to a teacher image segmentation model to obtain a target image segmentation submodel, wherein the target image segmentation model comprises the fifth feature extraction submodel and the target image segmentation submodel.
7. The processing method of claim 6, wherein the loss function of the target image segmentation submodel satisfies the relationship:
τ(P_T) = softmax(a_T/τ), τ(P_S) = softmax(a_S/τ), L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S))

wherein P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the target image segmentation submodel, H represents a cross-entropy loss function, y is used for indicating whether the segmentation result of the target image segmentation submodel is correct, λ is a preset weighting coefficient, and softmax represents the softmax function.
8. The processing method of any one of claims 1 to 7, wherein the feature extraction submodel is a binarized neural network model.
9. An apparatus for processing an image segmentation model, the image segmentation model comprising a feature extraction sub-model and an image segmentation sub-model, the feature extraction sub-model being configured to extract features of an image, the image segmentation sub-model being configured to segment the image according to the extracted features, the apparatus comprising:
the layer width adjusting module is used for adjusting the layer width of the feature extraction submodel to obtain a first feature extraction submodel;
and the knowledge distillation module is used for obtaining a target image segmentation model according to the first feature extraction submodel and the image segmentation submodel.
10. The processing apparatus as in claim 9, wherein the layer width adjustment module is specifically to:
increasing the channel number of the feature extraction submodel to obtain a second feature extraction submodel;
generating K different first binarization codes for the second feature extraction submodel, wherein the first binarization codes comprise a plurality of binarization values, the binarization values correspond to a plurality of channels of the second feature extraction submodel in a one-to-one manner, each binarization value in the binarization values is used for indicating whether the channel corresponding to each binarization value is reserved or removed, and K is an integer greater than 1;
reserving or removing channels of the second feature extraction submodel according to the K different first binary codes to obtain K third feature extraction submodels;
selecting M third feature extraction submodels from the K third feature extraction submodels according to the intersection ratio and the calculated amount when each third feature extraction submodel in the K third feature extraction submodels extracts the features of the image, wherein M is an integer greater than 1;
performing cross and/or variation processing on the M first binarization codes corresponding to the M third feature extraction submodels to obtain S second binarization codes, wherein S is an integer greater than 1;
reserving or removing a channel of the second feature extraction submodel according to each second binarization code in the S second binarization codes to obtain S fourth feature extraction submodels;
taking the fourth feature extraction sub-model as the third feature extraction sub-model, taking K as S, and repeatedly executing the fourth operation to the sixth operation for T times;
and taking one of the S fourth feature extraction submodels obtained at the last time as the first feature extraction submodel.
11. The processing apparatus as claimed in claim 10, wherein the probability that the jth of the M third feature extraction submodels is selected satisfies the following formula:
Pr(b_j) = f(b_j) / Σ_{k=1}^{M} f(b_k)

wherein Pr(b_j) represents the probability that the jth third feature extraction submodel is selected, and f(b_j) satisfies the following formula:

f(b_j) = mIoU(j) + α/N(j)

wherein mIoU(j) represents the intersection-over-union when the jth third feature extraction submodel performs feature extraction on an image, N(j) represents the amount of computation when the jth third feature extraction submodel performs feature extraction on the image, and α is a preset parameter.
12. The processing apparatus of any of claims 9 to 11, wherein the knowledge distillation module is specifically configured to:
performing knowledge distillation on the first feature extraction model by using a teacher feature extraction model to obtain a fifth feature extraction submodel;
and determining the target image segmentation model according to the fifth feature extraction submodel and the image segmentation submodel.
13. The processing apparatus as claimed in claim 12, wherein the loss function of the fifth feature extraction submodel satisfies the following relationship:
L = Σ_{i=1}^{T} ||G_ti − G_si||

wherein G_ti represents the ith scale feature among the T scale features output by the teacher feature extraction model, G_si represents the ith scale feature among the T scale features output by the fifth feature extraction submodel, T is a positive integer, the ith scale feature output by the teacher feature extraction model has the same scale as the ith scale feature output by the fifth feature extraction submodel, and ||·|| represents a matrix norm.
14. The processing apparatus of claim 12 or 13, wherein the knowledge distillation module is specifically configured to:
and carrying out knowledge distillation on the image segmentation submodel according to a teacher image segmentation model to obtain a target image segmentation submodel, wherein the target image segmentation model comprises the fifth feature extraction submodel and the target image segmentation submodel.
15. The processing apparatus as claimed in claim 14, wherein the loss function of the target image segmentation submodel satisfies the following relationship:
τ(P_T) = softmax(a_T/τ), τ(P_S) = softmax(a_S/τ), L_τ = H(y, P_S) + λ·H(τ(P_T), τ(P_S))

wherein P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the target image segmentation submodel, H represents a cross-entropy loss function, y is used for indicating whether the segmentation result of the target image segmentation submodel is correct, λ is a preset weighting coefficient, and softmax represents the softmax function.
16. The processing apparatus according to any one of claims 9 to 15, wherein the feature extraction submodel is a binarized neural network model.
17. An apparatus for processing an image segmentation model, comprising a processor and a memory, the memory being configured to store program instructions, the processor being configured to invoke the program instructions to perform the processing method of any one of claims 1 to 8.
18. A computer-readable storage medium, characterized in that the computer-readable medium stores instructions for implementing the processing method of any one of claims 1 to 8.
19. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory through the data interface to perform the processing method of any one of claims 1 to 8.
CN201910845625.0A 2019-09-02 2019-09-02 Image segmentation model processing method and processing device Active CN112446888B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910845625.0A CN112446888B (en) 2019-09-02 2019-09-02 Image segmentation model processing method and processing device
PCT/CN2020/100058 WO2021042857A1 (en) 2019-09-02 2020-07-03 Processing method and processing apparatus for image segmentation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910845625.0A CN112446888B (en) 2019-09-02 2019-09-02 Image segmentation model processing method and processing device

Publications (2)

Publication Number Publication Date
CN112446888A true CN112446888A (en) 2021-03-05
CN112446888B CN112446888B (en) 2024-09-13

Family

ID=74732997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910845625.0A Active CN112446888B (en) 2019-09-02 2019-09-02 Image segmentation model processing method and processing device

Country Status (2)

Country Link
CN (1) CN112446888B (en)
WO (1) WO2021042857A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114358206B (en) * 2022-01-12 2022-11-01 合肥工业大学 Binary neural network model training method and system, and image processing method and system
CN114549296B (en) * 2022-04-21 2022-07-12 北京世纪好未来教育科技有限公司 Training method of image processing model, image processing method and electronic equipment
CN115906651B (en) * 2022-12-06 2024-05-31 中电金信软件有限公司 Update method and device of binary neural network and electronic equipment
WO2024174998A1 (en) * 2023-02-20 2024-08-29 Peking University Systems and methods for image processing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108492286A (en) * 2018-03-13 2018-09-04 成都大学 A kind of medical image cutting method based on the U-shaped convolutional neural networks of binary channel
CN109544556A (en) * 2017-09-21 2019-03-29 江苏华夏知识产权服务有限公司 A kind of image characteristic extracting method
CN109741348A (en) * 2019-01-07 2019-05-10 哈尔滨理工大学 A method for segmentation of diabetic retinal images
CN110189334A (en) * 2019-05-28 2019-08-30 南京邮电大学 Medical Image Segmentation Method Based on Residual Fully Convolutional Neural Network Based on Attention Mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6767966B2 (en) * 2014-04-09 2020-10-14 エントルピー インコーポレーテッドEntrupy Inc. Authenticity of objects using machine learning from microscopic differences

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120420A (en) * 2021-12-01 2022-03-01 北京百度网讯科技有限公司 Image detection method and device
CN114120420B (en) * 2021-12-01 2024-02-13 北京百度网讯科技有限公司 Image detection method and device
CN117726541A (en) * 2024-02-08 2024-03-19 北京理工大学 A dark-light video enhancement method and device based on binary neural network

Also Published As

Publication number Publication date
WO2021042857A1 (en) 2021-03-11
CN112446888B (en) 2024-09-13

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant