Disclosure of Invention
The invention overcomes the defects of the prior art and provides a honeycomb lung identification method based on an improved MobileNet model, which identifies and classifies a honeycomb lung CT image data set through multi-scale feature fusion and an improved depthwise separable convolution module.
In order to achieve the above object, the present invention is achieved by the following technical solutions.
A honeycomb lung recognition method based on an improved MobileNet model comprises the following steps:
a) CT images of normal subjects and patients across different age groups are collected to generate a data set, and data labeling and preprocessing are carried out on the honeycomb lung CT images in the data set.
b) Performing data expansion on the preprocessed honeycomb lung CT image data set, and dividing the expanded data set into a training set and a verification set according to a preset proportion.
c) Constructing a network model based on the improved MobileNet and obtaining the output of the neural network model through training. The improved MobileNet network model automatically extracts feature information from the honeycomb lung CT image using dilated convolutions with different dilation rates, which enlarges the receptive field of feature extraction without losing feature information; the feature information of different layers is sent to a feature extraction module for channel concatenation to obtain a feature fusion vector, so that the multiple kinds of feature information obtained after the convolution operations are fused by channel concatenation; finally, a Sigmoid activation function is used to preserve the feature information of each channel.
d) Updating parameters of the network model according to the loss error between the predicted values and the true values of the recognition and classification network model. The loss error is obtained using a cross-entropy loss function, and its calculation formula is as follows:
J(θ)=−(1/m)Σi=1..m[y(i)·log hθ(x(i))+(1−y(i))·log(1−hθ(x(i)))]
wherein J(θ) is the loss as a function of the parameters θ; y(i) is the label of the ith sample x(i); m is the number of samples; and hθ(·) is the predicted probability that the sample is classified correctly.
e) And testing the verification set by adopting the network model of the MobileNet after the parameters are updated, and obtaining the overall performance of the network model through evaluation indexes.
f) And inputting the CT image to be predicted into the network model of the MobileNet after the parameters are updated to obtain a prediction recognition result.
Preferably, in step b, the preprocessed honeycomb lung CT images are subjected to data expansion and then to normalization processing.
Preferably, the data expansion processes the preprocessed data with one or any combination of flipping, translation, cropping and scaling.
Preferably, the preprocessing performs image processing through mean normalization and image denoising methods, and the data set is manually classified to distinguish normal images from lesion images.
Preferably, the CT images subjected to preprocessing and data expansion are summarized to construct the honeycomb lung CT image data set; meanwhile, a cross-validation method is used to divide the data set D into k mutually exclusive subsets of similar size, the union of (k−1) subsets is used as the training set, and the remaining subset is used as the test set.
Preferably, in the step d, parameters of the network model are updated by using an Adam algorithm according to the loss error.
Compared with the prior art, the invention has the following beneficial effects.
The invention constructs an automatic recognition and classification model using multi-scale feature fusion and an improved depthwise separable convolution module, automatically classifies honeycomb lung CT images, and improves the classification accuracy and the overall performance of the model.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail with reference to the embodiments and the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. The technical solution of the present invention is described in detail below with reference to the embodiments and the drawings, but the scope of protection is not limited thereto.
A honeycomb lung recognition method based on an improved MobileNet model specifically comprises the following steps:
Step S1: CT images of normal subjects and patients across different age groups are collected to generate a data set, and data labeling and preprocessing are carried out on the honeycomb lung CT images in the data set.
Step S2: and performing data expansion on the preprocessed cellular lung CT image data set, and dividing the data expansion into a training set and a verification set according to a preset proportion.
Step S3: and constructing a classification recognition model based on a MobileNet network to train the honeycomb lung CT image training set.
Step S4: in order to improve the recognition accuracy of the model and accelerate model training, the constructed network model is improved.
Step S5: and updating parameters of the neural network model according to the loss error between the predicted value and the true value of the recognition classification network model, so that the model recognition accuracy is improved.
Step S6: and training the marked test set by using the classification recognition model, and acquiring the recognition accuracy and the overall performance of the model according to the evaluation index.
Step S7: and inputting the prediction picture into the recognition classification network model to obtain a prediction result.
In step S1, a honeycomb lung CT image data set is obtained, comprising normal lung CT images and honeycomb lung lesion CT images. Because the original honeycomb lung CT images have high noise, low contrast, and variable segmentation-target shapes, an image enhancement method is required to preprocess the original images; mean normalization, image denoising and similar methods are used for image processing. In addition, the data set is manually classified to distinguish normal images from lesion images.
In step S2, because the data volume of honeycomb lung CT images is limited, the data set needs to be expanded. Here, data expansion is used: rotated and mirrored versions of the original CT images are obtained by applying flipping, translation, cropping or scaling to the images in the original data set, which effectively expands the data volume and provides a data guarantee for training the classification recognition model. The CT images subjected to preprocessing and data expansion are then summarized to construct the honeycomb lung CT image data set. Meanwhile, a cross-validation method is used to divide the data set D into k mutually exclusive subsets of similar size, where each subset keeps the data distribution as consistent as possible. The union of (k−1) subsets is then used as the training set and the remaining subset as the test set. This yields k training/test splits, allowing k rounds of training and testing, and the mean of the k test results is finally returned. Specifically, 90% of the samples in the honeycomb lung CT image data set are randomly selected as the training set and the remaining 10% as the test set for the classification test.
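The k-fold split described above can be sketched as follows (a minimal NumPy illustration, not from the patent; the 100-sample count and the random seed are assumed for the example, matching the 90%/10% split with k=10):

```python
import numpy as np

def k_fold_split(n_samples, k, seed=0):
    """Partition sample indices into k mutually exclusive subsets of similar size,
    yielding (train, test) index pairs: the union of (k-1) folds trains, one fold tests."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# Example: 10-fold split of 100 samples -> 90 training / 10 test samples per round
splits = list(k_fold_split(100, k=10))
```

Stratified splitting (keeping the normal/lesion ratio per fold) would better satisfy the distribution-consistency requirement, but is omitted here for brevity.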
In step S3, a classification recognition model based on MobileNet is constructed. The core idea of the MobileNet network is that the depthwise separable convolution module in the network model decomposes a standard convolution into a depthwise convolution and a pointwise convolution. The depthwise convolution is the filtering stage of the depthwise separable convolution, in which each channel is convolved with its corresponding convolution kernel; the pointwise convolution is the combining stage, which integrates the information of multiple feature maps into a concatenated output. The combination of the two stages separates the channel and spatial dimensions, which reduces the number of parameters required for model training, accelerates training, allows more feature information to be transmitted through the network, and improves the recognition and classification accuracy of the model. The depthwise separable convolution differs in structure from the standard convolution and reduces a large amount of computation while improving the classification performance of the system. The computation amount of the standard convolution is shown in formula (1):
DK×DK×M×N×DF×DF (1)
the computation amount of the depthwise separable convolution is shown in formula (2):
DK×DK×M×DF×DF+M×N×DF×DF (2)
In the above two formulas, DK is the size of the convolution kernel, M is the number of input channels, N is the number of convolution kernels, and DF is the size of the input feature map. The ratio of the computation amounts of the two is shown in formula (3):
(DK×DK×M×DF×DF+M×N×DF×DF)/(DK×DK×M×N×DF×DF)=1/N+1/DK² (3)
As can be seen from the above formula, the ratio is less than 1, so the computation required by the depthwise separable convolution is less than that of the standard convolution, and computer resource consumption can thus be reduced to some extent.
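The computation comparison of formulas (1)-(3) can be checked numerically (a sketch; the layer sizes DK=3, M=32, N=64, DF=56 are assumed, MobileNet-like values chosen only for illustration):

```python
def standard_conv_cost(DK, M, N, DF):
    """Multiply-accumulate count of a standard convolution, formula (1)."""
    return DK * DK * M * N * DF * DF

def separable_conv_cost(DK, M, N, DF):
    """Formula (2): depthwise (filtering) stage plus pointwise (combining) stage."""
    depthwise = DK * DK * M * DF * DF
    pointwise = M * N * DF * DF
    return depthwise + pointwise

DK, M, N, DF = 3, 32, 64, 56  # assumed example layer
ratio = separable_conv_cost(DK, M, N, DF) / standard_conv_cost(DK, M, N, DF)
# ratio equals 1/N + 1/DK**2, matching formula (3), and is well below 1
```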
In step S4, because the obtained honeycomb lung CT images have different sizes, receptive fields of different sizes are obtained through convolution kernels of different sizes. Features at lower layers have higher resolution and contain more detailed information, but have passed through too few convolutional layers, so the resulting maps contain too much irrelevant information and noise; high-level features have stronger semantic information, but their resolution is too low and their perception of detail is poor. If the network loses global or local information, the classification of the target by the model is hindered; therefore, different feature information is extracted from different receptive fields. Dilated convolutions with different dilation rates are used to automatically extract the feature information in the honeycomb lung CT image, enlarging the receptive field of feature extraction without losing feature information, and the feature information of different layers is sent to a feature extraction module for channel concatenation to obtain a feature fusion vector. The 4 kinds of feature information obtained after convolution are concatenated along the channel dimension; the multi-scale feature fusion calculation is shown in formula (4):
F=[F1,F2,F3,F4] (4)
wherein F is the fused feature vector, and Fi (i = 1, 2, 3, 4) are the feature vectors of the 4 different levels obtained by dilated convolution.
Since the depthwise convolution in a MobileNet network cannot change the number of channels, the features it extracts are single-channel, and the ReLU activation function may cause information loss when a convolutional layer with a small number of channels performs its output operation. Therefore, in order to ensure the recognition accuracy of model training, a Sigmoid activation function is used instead of the ReLU activation function to preserve the feature information of each channel.
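The multi-scale fusion of formula (4) and the Sigmoid activation can be sketched as follows (a naive single-channel NumPy illustration, not the patent's network; the 8×8 map, 3×3 kernel and dilation rates 1-4 are assumed for the example):

```python
import numpy as np

def dilated_conv1ch(x, kernel, rate):
    """Naive 2-D dilated convolution on one channel; zero padding keeps the
    spatial size so the four branches can be concatenated along channels."""
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + rate * k:rate, j:j + rate * k:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.random.rand(8, 8)            # toy single-channel feature map
kernel = np.random.rand(3, 3)
branches = [dilated_conv1ch(x, kernel, r) for r in (1, 2, 3, 4)]
F = np.stack(branches, axis=0)      # channel concatenation: F = [F1, F2, F3, F4]
F = sigmoid(F)                      # Sigmoid keeps every channel's information in (0, 1)
```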
In step S5, the loss value is obtained using a cross-entropy loss function, calculated as shown in formula (5):
J(θ)=−(1/m)Σi=1..m[y(i)·log hθ(x(i))+(1−y(i))·log(1−hθ(x(i)))] (5)
wherein J(θ) is the loss as a function of the parameters θ; y(i) is the label of the ith sample x(i); m is the number of samples; and hθ(·) is the predicted probability that the sample is classified correctly.
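Formula (5) can be computed directly (a minimal sketch; the toy labels and predicted probabilities are assumed for illustration):

```python
import numpy as np

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy J(theta) as in formula (5), averaged over m samples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 1.0])    # toy labels: lesion (1) vs normal (0)
y_pred = np.array([0.9, 0.1, 0.8, 0.7])    # predicted probabilities h_theta(x)
loss = cross_entropy_loss(y_true, y_pred)
```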
According to the loss value, parameters of the network model are updated using the Adam algorithm, which is based on adaptive estimation of low-order moments and performs first-order gradient-based optimization of a stochastic objective function. The Adam algorithm is easy to implement, computationally efficient, and has low memory requirements. Its updates are invariant to diagonal rescaling of the gradients, so it is well suited to problems with large-scale data or parameters.
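A single Adam update can be sketched as follows (a minimal NumPy illustration of the standard Adam rule, not the patent's training code; the toy objective f(θ)=θ², learning rate and step count are assumed):

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on parameters theta given the gradient grad."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad        # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2*theta
theta = np.array([1.0])
state = (np.zeros(1), np.zeros(1), 0)
for _ in range(2000):
    theta, state = adam_step(theta, 2 * theta, state, lr=0.05)
```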
In step S6, the evaluation indexes for training the deep learning model include accuracy, sensitivity, specificity and the F1 value (F1-score), as shown in formula (6):
accuracy=(TP+TN)/(TP+TN+FP+FN)
sensitivity=TP/(TP+FN)
specificity=TN/(TN+FP)
F1=2TP/(2TP+FP+FN) (6)
wherein TP, FP, FN and TN indicate the numbers of true positives, false positives, false negatives and true negatives, respectively.
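The evaluation indexes of formula (6) can be computed from confusion-matrix counts as follows (a sketch; the counts are assumed toy values for illustration):

```python
def classification_metrics(TP, FP, FN, TN):
    """Accuracy, sensitivity, specificity and F1 from confusion-matrix counts."""
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    sensitivity = TP / (TP + FN)   # recall on the lesion (positive) class
    specificity = TN / (TN + FP)   # recall on the normal (negative) class
    f1 = 2 * TP / (2 * TP + FP + FN)
    return accuracy, sensitivity, specificity, f1

# Assumed toy counts for illustration
acc, sen, spe, f1 = classification_metrics(TP=80, FP=10, FN=20, TN=90)
```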
In step S7, the picture to be predicted is input into the improved classification model described above to obtain the prediction result, so as to assist the physician in making a diagnosis.
While the invention has been described in further detail with reference to specific preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.