Disclosure of Invention
The invention provides an intelligent retinal vessel image segmentation method based on a GAN (generative adversarial network), aiming at the problems of strong subjectivity and low efficiency in manual retinal vessel segmentation and the low segmentation accuracy of existing supervised retinal vessel segmentation methods.
In order to solve the technical problems in the prior art, the technical scheme of the invention is as follows: an intelligent retinal vessel image segmentation method based on GAN comprises the following steps:
In step S1, a retinal image sample set is given, consisting of sample pairs of a retinal image and a reference blood vessel segmentation image, each pair defined herein as (a, b). A retinal image corpus is defined as C = {(a_i, b_i) | i ∈ [1, R]}, where R denotes the total number of samples, i denotes the sample index, a denotes the retinal image, and b denotes the reference blood vessel segmentation image. The retinal image sample set is copied and divided into a retinal image training set E = {(a_i, b_i) | i ∈ [1, M]} and a retinal image test set F = {(a_i, b_i) | i ∈ [1, N]}, where N + M = R, and M and N respectively denote the corresponding numbers of samples.
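As an illustrative sketch only (not part of the claimed method), the following Python fragment shows one way the copy-and-split of step S1 could be organized; the 80/20 split ratio, the fixed seed, and the function name are assumptions for demonstration.

```python
import random

def split_corpus(corpus, train_fraction=0.8, seed=42):
    """Split the retinal corpus C = [(a_i, b_i), ...] into a training set E
    and a test set F with M + N = R.  The 80/20 ratio and the fixed seed
    are illustrative assumptions, not values prescribed by the method."""
    pairs = list(corpus)                  # copy, so the corpus C is left intact
    random.Random(seed).shuffle(pairs)    # random division of the R samples
    m = int(len(pairs) * train_fraction)  # M training samples
    return pairs[:m], pairs[m:]           # E (size M), F (size N = R - M)
```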
Step S2, constructing a GAN-based intelligent retinal segmentation network model, the network model comprising a generator network G and a discriminator network D, and constructing an Adam optimizer to help the network training converge quickly:
the overall architecture of the generator network G includes two parts, a contracting path (contracting path) and an expanding path (expanding path). In order to utilize the characteristic diagram information in the network training process to a greater extent, the characteristic diagram extracted and processed on the network contraction path is spliced with the characteristic diagram in the expansion path with the same size in the process of upsampling. In addition, a cavity convolution structure is introduced into the bottom layer of network downsampling, and the structure can increase the receptive field of the generated network, so that the network can better grasp the global characteristics of the retinal vessel image, and accurate segmentation of the retinal vessel image is realized. The generator network G performs 4 downsampling, 4 upsampling, and 3 feature concatenating operations in total, and the selected feature map is the feature map after downsampling, so that the generator network performs only 3 feature map concatenating operations although downsampling is performed 4 times. The down-sampling operation used in the generator network G is performed using a convolution operation with a convolution kernel size of 3x3 steps of 2.
The discriminator network D is a deep convolutional neural network whose main role is to judge whether an input blood vessel segmentation image is a reference segmentation image or one generated by the generator network G. A residual block (ResBlock) structure is also used in the discriminator network; this structure allows the number of network layers to be increased while preventing overfitting and alleviating the difficulty of training, so that the network captures image features better and converges faster. In the discriminator network D, the convolution kernel size used is 3×3, and downsampling is then performed by a max-pooling (MaxPooling) operation with a stride of 2, which highlights the main features in the feature map.
It should be noted that the generative adversarial network is composed of the generator network G and the discriminator network D. The main process is that the generator network G continuously fits the distribution of the retina training set E: a retina training sample pair (a_i, b_i) is input, the generator network G generates a vessel segmentation image z_i, and a generated retina sample pair (a_i, z_i) is obtained. The discriminator network D simultaneously and separately discriminates the retina training set E sample pairs (a_i, b_i) and the generated retina sample pairs (a_i, z_i), where i = 1, 2, 3, …, M, in each case giving a discrimination confidence q in [0, 1] that represents the probability that the sample pair is a retina training set E sample pair. A loss function measures the distance between the generated vessel segmentation sample z_i and the reference blood vessel segmentation sample b_i, in preparation for the subsequent back propagation.
An Adam optimizer is constructed to assist the network training, with an initial learning rate of 0.0002 and β1 = 0.5; the learning rate can then be adjusted intelligently during the training process, so that the network converges quickly.
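A minimal PyTorch sketch of this optimizer construction is given below. Note that the text does not state the β2 value, so the 0.999 used here is only PyTorch's documented default, taken as an assumption; the function and argument names are ours.

```python
import torch

def build_optimizers(net_G, net_D, lr=2e-4):
    """net_G and net_D are assumed to be the generator and discriminator
    modules.  beta1 = 0.5 is specified by the method; beta2 = 0.999 is an
    assumed value (the PyTorch default), since the text does not state it."""
    opt_G = torch.optim.Adam(net_G.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_D = torch.optim.Adam(net_D.parameters(), lr=lr, betas=(0.5, 0.999))
    return opt_G, opt_D
```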
Step S3, loading the retina training set E into the computer memory as input, and randomly shuffling E = {(a_i, b_i) | i ∈ [1, M]} to prepare for the next training stage.
Steps S4, S5, and S6 form the main training phase of the generative adversarial network. The game between the discriminator network D and the generator network G can be regarded as a minimax problem, in which the two networks come to understand the image mapping relationship by learning the features that map retinal images to blood vessel segmentation images. The objective function is shown in equation (1):

$$L_{cGAN}(G, D) = \mathbb{E}_{a,b}\left[\log D(a, b)\right] + \mathbb{E}_{a}\left[\log\left(1 - D(a, G(a))\right)\right] \tag{1}$$
At each iteration, a pair of images (a_i, b_i) is extracted from the retina training set E, where a_i denotes a retinal image and b_i denotes the corresponding reference blood vessel segmentation image, i = 1, 2, 3, …, M.
In step S4, a_i is input to the generator, which generates the corresponding blood vessel segmentation image G(a_i), i.e. z_i. The generator network G tries to minimize the objective function L_cGAN(G, D); in order to make the output of the final objective function as small as possible, the generated vessel segmentation image z_i must be as similar as possible to the reference retinal segmentation image b_i in image style and vascular structure.
In step S5, the discriminator network D attempts to distinguish the distribution of the retina training set E from the distribution of the retina training synthetic set E' so as to maximize the objective function. The discriminator network D simultaneously and separately discriminates the sample pairs (a_i, b_i) of the retina training set E and the sample pairs (a_i, z_i) of the retina training synthetic set E', where i = 1, 2, 3, …, M, in each case giving a discrimination confidence q in [0, 1] that represents the probability that the sample pair is a retina training set E sample pair (a_i, b_i).
Finally, the game equilibrium point of the discriminator network D and the generator network G is a 'Nash equilibrium point': the discriminator network D can no longer judge whether an input image sample pair is a sample pair (a_i, b_i) of the retina training set E or a sample pair (a_i, z_i) of the retina training synthetic set E', and for any given sample pair (a_i, z_i) of the synthetic set E' the confidence q output by D is 0.5. At this point, the distribution of the segmented images generated by the generator network G fits the distribution of the reference blood vessel segmentation images, so the accurate mapping from retinal image to blood vessel segmentation image has been learned, and the generated segmentation images are the desired target images. The process of the game can be considered a minimax process, which can be expressed as:

$$G^{*} = \arg\min_{G}\max_{D} L_{cGAN}(G, D) \tag{2}$$
since the segmentation from the retina image into the blood vessel segmentation image is essentially a classification prediction of "black or white" for each pixel, which is actually a pixel-to-pixel classification task, the present invention additionally uses a class-two classification cross entropy loss in the generator network G to penalize the distance between the generated blood vessel segmentation image and the reference blood vessel segmentation image, so that the generated blood vessel segmentation image is more approximate to the reference blood vessel segmentation image. The class-two class cross entropy loss function is defined as follows:
where a is the retinal image, b is the reference vessel segmentation map, and G is the corresponding generator network G. In step S6, based on the loss functions given by equation (1) and equation (3), the total loss value at the current iteration can be calculated. In order to minimize this loss value, the gradient of each parameter at each step can be obtained from the computation graph, and a gradient-update method moves the whole function toward a minimum point, thereby achieving the goal of fitting. The corresponding parameter update formula is:

$$\theta_{t+1} = \theta_{t} - \eta \nabla_{\theta_{t}} L$$

where θ_t denotes the parameters of the generator network G and the discriminator network D at step t, η denotes the learning rate among the hyper-parameters, and ∇θ_t L denotes the gradient of the loss with respect to the corresponding parameters.
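The text does not state how equations (1) and (3) are combined into the total loss; a standard combination, with an assumed weighting factor λ as in common conditional-GAN practice, would read:

$$G^{*} = \arg\min_{G}\max_{D}\; L_{cGAN}(G, D) + \lambda\, L_{BCE}(G)$$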
In step S7, the generator is evaluated with the retinal image test samples, and the optimal model parameters are retained, as follows: at the end of the training phase, the retina test set F is input into the generator network G to produce a retina test synthetic set F'. The reference vessel segmentation map b_i in each retina test set F sample pair (a_i, b_i) and the generated segmentation map z_i in the corresponding generated retina sample pair (a_i, z_i) are compared pixel by pixel, where i = 1, 2, 3, …, N, and each pixel is classified as a vessel point or a non-vessel point. In order to test the performance of the current generator network G, quantitative analysis must be carried out objectively through performance indexes. Accuracy (Acc), Specificity (Sp), Sensitivity (Se), the Dice coefficient, F-measure, the area under the receiver operating characteristic (ROC) curve (AUC), and the area under the precision-recall (PR) curve (mAP) are adopted to measure the effectiveness of the model.
AUC is widely used as a performance measure in medical image processing; the closer its value is to 1, the better the segmentation effect.
Here, TP (true positive) denotes the number of correctly segmented vessel pixels; TN (true negative) denotes the number of correctly segmented non-vessel (background) pixels; FP (false positive) denotes the number of non-vessel pixels wrongly segmented as vessel; and FN (false negative) denotes the number of vessel pixels wrongly segmented as non-vessel. TP + FN + FP + TN is the total number of pixels in the region of interest of the image.
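For illustration, the standard index definitions built from these counts can be computed as in the following sketch; the formulas are the conventional ones, and the function and variable names are ours.

```python
def segmentation_metrics(tp, tn, fp, fn):
    """Standard pixel-level indexes from the confusion-matrix counts.
    These are the conventional definitions of Acc, Se, Sp, Precision,
    F-measure, and Dice; for binary segmentation, Dice equals the
    F-measure (F1) computed on the vessel class."""
    acc       = (tp + tn) / (tp + tn + fp + fn)   # Accuracy
    se        = tp / (tp + fn)                    # Sensitivity (Recall)
    sp        = tn / (tn + fp)                    # Specificity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * se / (precision + se)
    dice      = 2 * tp / (2 * tp + fp + fn)       # equal to f_measure here
    return dict(Acc=acc, Se=se, Sp=sp, Precision=precision,
                F=f_measure, Dice=dice)
```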
Since the above evaluation indexes depend on the threshold applied to the output, the ROC curve can be plotted by varying the threshold and recording the true positive rate (Sensitivity) against the false positive rate (1 − Specificity); AUC is the area under this ROC curve. All evaluation indexes are computed over all pixels within a mask representing the retinal optic disc region. The indexes are screened, and the model with the largest Acc, Se, Sp, Precision, Recall, and F-measure is selected as the optimal model. Finally, at the end of the parameter update stage, it is judged whether the number of training iterations has reached the maximum; if so, the training phase ends, the optimal generator network G' and optimal discriminator network D' are obtained, and the method proceeds to the next step. Otherwise, it returns to step S3 for further loop iteration training.
The training images and test images input during training and testing are whole images. The invention adopts the Adam optimization method to optimize the loss function, yielding the final parameters of the generative adversarial network model; once saved, these parameters can be reused in subsequent retinal segmentation tasks.
In step S8: given a retinal image sample set F1 = {a_i | i ∈ [1, R_F]}, each retinal image a_i is taken as input to the optimal generator network G, and the corresponding vessel segmentation image z_i is output, i = 1, 2, 3, …, R_F, where R_F denotes the number of samples in the retinal image sample set; the final segmentation images have good accuracy.
The invention provides an intelligent retinal vessel image segmentation method based on a generative adversarial network, which has the following beneficial effects:
the network is mainly characterized by being based on a countermeasure training mechanism, and has a larger receptive field, so that the global information of the image can be captured well. Compared with other segmentation networks, the network has deeper network layers and can better capture and utilize abstract features of images. The method achieves the advanced effects in the aspects of accuracy, sensitivity and specificity. In addition, good segmentation effect can be achieved in a blood vessel region and a lesion region with low contrast, the method achieves high precision and good robustness of retinal blood vessel segmentation, and has good value and prospect in practical application.
Detailed Description
For completeness and clarity, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it.
As shown in FIG. 1, the present invention provides a technical solution: an intelligent retinal vessel image segmentation method based on GAN comprising the following steps:
Step S1: a retinal image sample set is given, consisting of sample pairs of a retinal image and a reference blood vessel segmentation image, each pair defined herein as (a, b). A retinal image corpus is defined as C = {(a_i, b_i) | i ∈ [1, R]}, where R denotes the total number of samples, i denotes the sample index, a denotes the retinal image, and b denotes the reference blood vessel segmentation image. The retinal image sample set is copied and divided into a retinal image training set E = {(a_i, b_i) | i ∈ [1, M]} and a retinal image test set F = {(a_i, b_i) | i ∈ [1, N]}, where N + M = R, and M and N respectively denote the corresponding numbers of samples.
In the invention, the main steps comprise data division and the computation of the adversarial loss with the constructed generator network G and discriminator network D; finally the generator network G and discriminator network D reach the optimal equilibrium point, and accurate segmentation of the input retinal image by the generator network G is realized. The relationship between the overall network structure and the loss function is shown in FIG. 2.
Step S2: the generator network G is obtained by combining feature up/down-sampling, a residual block structure, and dilated convolution operations; a discriminator network D with a large receptive field and an improved residual structure is designed; the combination of the generator network G and the discriminator network D constitutes the generative adversarial network. The network parameters of the generator network G and the discriminator network D are initialized, the Adam optimization method in the PyTorch framework is used to obtain the initial parameters of the generator and discriminator networks, and the relevant training hyper-parameters are set for training and optimizing the network model.
The generator network G is constructed by combining the advantages of feature up/down-sampling and the ResBlock structure, so that the network fuses shallow image features while upsampling, thereby using the feature map information more comprehensively and preventing network degradation in a deep network. In addition, the generator uses dilated convolution operations, which enlarge its receptive field without increasing the number of network parameters. The convolution kernels in the generator are all of size 3×3; after each downsampling operation, the output feature map is fed through 2 residual blocks (ResBlock) with skip connections, and the feature map obtained after the 4 downsampling steps is processed by dilated convolutions with dilation rates of 5 and 3, respectively, to enlarge the receptive field of the network. During upsampling, the features extracted by the shallow layers are concatenated in, and the network finally uses a 1×1 convolution to make the number of output color channels equal to the number of input color channels. The network structure is shown in FIG. 3, where in the legend an "upward arrow" denotes a deconvolution operation with kernel size 3×3 and stride 2, a BatchNormalization operation, and a ReLU activation; a "downward arrow" denotes a convolution with kernel size 3×3 and stride 2, BatchNormalization, and ReLU activation; a "thin rightward arrow" denotes a convolution with kernel size 3×3 and stride 1, BatchNormalization, and ReLU activation; a "thick rightward arrow" denotes a convolution with kernel size 1×1 and stride 1, BatchNormalization, and ReLU activation; a "feature map" is the output image after the corresponding convolution operation; a "dotted arrow" indicates that the corresponding output feature maps are concatenated along the color-channel dimension, e.g. concatenating a feature map of size (m, h, w) with one of size (n, h, w) yields a feature map of size (m + n, h, w); a "residual block" denotes the residual-block operation shown in FIG. 4 (the arrows there carry no operational meaning and only indicate the flow), which contains two convolutions with kernel size 3×3 and stride 1, BatchNormalization, and ReLU activation, and whose final output adds the input feature map to the output feature map of the last convolution, giving an output with the same number of color channels, height, and width as the input; a "dilated residual block" is based on the residual block except that the two convolution kernels are set to different dilation rates, all other processing being identical, as shown in FIG. 5.
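A minimal PyTorch sketch of the residual block and its dilated variant as described above is given below; the class name and channel argument are our own, and this is an illustration rather than the exact implementation.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block of FIG. 4: two 3x3 stride-1 convolutions with
    BatchNorm and ReLU, whose last output is added to the block input.
    Setting dilations > 1 gives the dilated residual block of FIG. 5."""
    def __init__(self, channels, dilations=(1, 1)):
        super().__init__()
        d1, d2 = dilations
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=d1, dilation=d1)
        self.bn1   = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=d2, dilation=d2)
        self.bn2   = nn.BatchNorm2d(channels)
        self.relu  = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))        # output of the last convolution
        return x + y                       # same channels, height, width as input

# Dilated residual block with the dilation rates 5 and 3 named in the text
# (the 256-channel width is an assumption):
bottom = ResBlock(256, dilations=(5, 3))
```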
The overall architecture of the generator network G includes two parts, a contracting path and an expanding path. The contracting path is the U-shaped main operation path: from the image input, through four downsampling steps and the dilated residual block, to the four upsampling steps that produce the output. The expanding path denotes the concatenation paths outside this U shape. In order to exploit the feature information in the network more fully, the feature maps extracted on the contracting path are concatenated, during upsampling, with the feature maps of the same size in the expanding path.
In the generator network G, 4 downsampling operations, 4 upsampling operations, and 3 feature concatenation operations are performed in total, and the feature maps selected for concatenation are those taken after downsampling. The downsampling operations used in the generator are all convolutions with a kernel size of 3×3 and a stride of 2.
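The following condensed PyTorch sketch shows how these pieces (stride-2 downsampling convolutions, the dilated residual bottleneck, deconvolution upsampling with concatenation, and the final 1×1 convolution) could fit together. Only the wiring follows the description; the channel widths are illustrative assumptions, the spatial sizes are assumed divisible by 16, and the ResBlock module sketched after FIG. 4/5 above is reused.

```python
import torch
import torch.nn as nn

def down(cin, cout):   # 3x3 stride-2 conv + BN + ReLU (a "downward arrow")
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

def up(cin, cout):     # 3x3 stride-2 deconv + BN + ReLU (an "upward arrow")
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 3, stride=2,
                                            padding=1, output_padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class Generator(nn.Module):
    """Wiring sketch only: 4 downsamplings, a dilated residual bottleneck,
    4 upsamplings with 3 skip concatenations, and a final 1x1 convolution.
    Channel widths (32..256) and out_ch=1 (per step S4) are assumptions."""
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.d1, self.d2 = down(in_ch, 32), down(32, 64)
        self.d3, self.d4 = down(64, 128), down(128, 256)
        self.bottleneck  = ResBlock(256, dilations=(5, 3))
        self.u1 = up(256, 128)
        self.u2 = up(256, 64)      # input is u1 output concatenated with d3's
        self.u3 = up(128, 32)      # input is u2 output concatenated with d2's
        self.u4 = up(64, 16)       # input is u3 output concatenated with d1's
        self.out = nn.Sequential(nn.Conv2d(16, out_ch, 1), nn.Sigmoid())

    def forward(self, x):
        f1 = self.d1(x)
        f2 = self.d2(f1)
        f3 = self.d3(f2)
        f4 = self.d4(f3)
        y = self.u1(self.bottleneck(f4))
        y = self.u2(torch.cat([y, f3], dim=1))   # skip concatenation 1
        y = self.u3(torch.cat([y, f2], dim=1))   # skip concatenation 2
        y = self.u4(torch.cat([y, f1], dim=1))   # skip concatenation 3
        return self.out(y)                       # (out_ch, h, w) map in (0, 1)
```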
The discriminator network D is constructed by introducing the ResBlock structure to prevent the degradation problem of deep networks; the skip-connection structure it uses enables the network to keep learning effectively as the depth increases, while preventing network degradation. Every convolution kernel in the discriminator network has size 3×3; the network is downsampled by 4 MaxPooling operations in total, and finally a fully connected layer classifies the resulting vector, judging whether the image input to the discriminator network comes from a real image or an image generated by the generator. The structure of the discriminator network is shown in FIG. 6, where in the legend "Scalar" denotes a scalar, i.e. the final output value of the discriminator network D, lying in [0, 1] and representing its confidence that the input image is real; a "feature map" is the output image after a convolution operation; a "thin rightward arrow" denotes a convolution with kernel size 3×3 and stride 1, BatchNormalization, and ReLU activation; a "thick rightward arrow" denotes a convolution with kernel size 3×3 and stride 2, BatchNormalization, and ReLU activation; a "thin dashed rightward arrow" denotes a flattening operation that stretches a multi-channel feature map into a one-dimensional vector; the "residual block" here is the same as in the generator network G; a "maximum pooling layer" denotes a max-downsampling operation with kernel size 2×2 and stride 2, whose output is 1/2 of the input in both height and width; "global average pooling" denotes a downsampling operation that averages over a kernel the size of the whole image, so that an image of size (c, h, w) is pooled to (c, 1, 1), where c is the number of color channels, h the height, and w the width of the feature map; a "thick dashed rightward arrow" denotes a fully connected operation, i.e. the target vector is multiplied by a weight matrix to obtain the result vector.
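A correspondingly condensed PyTorch sketch of such a discriminator (3×3 convolutions, residual blocks, four 2×2 max-pooling downsamplings, global average pooling, and a fully connected output) could look as follows. The channel widths are assumptions, the ResBlock sketched earlier is reused, and concatenating the pair (a, z) along the channel axis is one common convention for conditional discriminators rather than a detail fixed by the text.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Wiring sketch: 3x3 convs + ResBlocks, 4 max-pooling downsamplings,
    global average pooling, and a fully connected layer producing a scalar
    confidence in [0, 1].  Channel widths are illustrative assumptions."""
    def __init__(self, in_ch=4):                  # 3 (retina) + 1 (segmentation)
        super().__init__()
        def conv(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        blocks, c = [], in_ch
        for cout in (32, 64, 128, 256):           # 4 downsampling stages
            blocks += [conv(c, cout), ResBlock(cout), nn.MaxPool2d(2, 2)]
            c = cout
        self.features = nn.Sequential(*blocks)
        self.gap = nn.AdaptiveAvgPool2d(1)        # global average pooling
        self.fc  = nn.Linear(256, 1)              # fully connected output

    def forward(self, a, z):
        x = torch.cat([a, z], dim=1)              # condition on the retina image
        x = self.gap(self.features(x)).flatten(1) # (batch, 256) vector
        return torch.sigmoid(self.fc(x))          # scalar confidence q in [0, 1]
```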
It should be noted that the numbers in the upper-left corner of all feature maps in the generator network (FIG. 3) and the discriminator network (FIG. 6) denote the number of color channels output after the corresponding "arrow operation". The "K" in the lower-left corner of a feature map denotes the size of the original image; after several downsamplings, the feature map size becomes "K/p", meaning the size is K divided by the corresponding factor p.
Meanwhile, the Adam optimization method in the PyTorch framework is adopted, and its hyper-parameters are tuned during the training process, assisting the update of the initial parameters of the generator network G and discriminator network D.
Step S3: the retina training set E is loaded into memory and randomly shuffled, and for each training iteration a retina training sample pair (a_i, b_i), comprising a retinal image and a reference blood vessel segmentation image, is extracted to prepare for the next training step.
Step S4: a pair of samples (a_i, b_i) is extracted from the retina training set E, i = 1, 2, 3, …, M. The retinal image a_i is loaded and input into the generator network G; the corresponding image size is (3 × h × w), where 3 denotes the number of color channels, the corresponding color channels d ∈ {red, green, blue}, h denotes the height of a single picture, and w denotes its width. Through the layer-by-layer computation of the network, the generated retinal vessel segmentation map z_i is finally obtained; its image size is (1 × h × w), corresponding to a single-channel gray-scale map in which the gray value of each pixel represents the degree to which that pixel belongs to a blood vessel.
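A quick shape check against the Generator sketch above illustrates these dimensions (batch dimension added; h = w = 512 is an arbitrary choice divisible by 16):

```python
import torch

G = Generator(in_ch=3, out_ch=1)
a_i = torch.randn(1, 3, 512, 512)   # one (3 x h x w) retinal image
z_i = G(a_i)
print(z_i.shape)                    # torch.Size([1, 1, 512, 512]): (1 x h x w)
```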
Step S5: the discriminator network D attempts to maximize the objective function by distinguishing the distribution of the retina training set E from the distribution of the retina training synthetic set E'. The discriminator network D simultaneously and separately discriminates the retina training set E sample pairs (a_i, b_i) and the retina training synthetic set E' sample pairs (a_i, z_i), where i = 1, 2, 3, …, M, in each case giving a discrimination confidence q in [0, 1] that represents the probability that the sample pair is a retina training set E sample pair.
Step S6: the error between the generated retina training synthetic set E' and the retina training set E is calculated through the loss functions to obtain a loss value. The loss value is then propagated backward, and the network parameters of the discriminator network D and the generator network G are adjusted respectively. According to the given loss functions, the gradients of the parameters in the generator network G and the discriminator network D are calculated by the chain rule of differentiation, and the corresponding parameters are updated by stochastic gradient descent. The corresponding parameter update formula is:

$$\theta_{t+1} = \theta_{t} - \eta \nabla_{\theta_{t}} L$$

where θ_t denotes the parameters of the generator network G and the discriminator network D at step t, η denotes the learning rate among the hyper-parameters, and ∇θ_t L denotes the gradient of the loss with respect to the corresponding parameters.
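One possible shape of this alternating update in PyTorch is sketched below, reusing the Generator, Discriminator, and optimizers from the earlier sketches. The detailed loss bookkeeping (target labels, loss weighting) is not specified by the text, so everything beyond the alternation itself is an assumption.

```python
import torch
import torch.nn as nn

bce_pixel  = nn.BCELoss()   # pixel-wise loss of eq. (3)
bce_scalar = nn.BCELoss()   # adversarial loss on the scalar confidence q

def train_step(G, D, opt_G, opt_D, a_i, b_i):
    real = torch.ones(a_i.size(0), 1)     # target q for training-set pairs
    fake = torch.zeros(a_i.size(0), 1)    # target q for generated pairs

    # --- discriminator update: D tries to maximize eq. (1) ---
    z_i = G(a_i).detach()                 # generated pair (a_i, z_i), G frozen
    loss_D = bce_scalar(D(a_i, b_i), real) + bce_scalar(D(a_i, z_i), fake)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- generator update: G tries to minimize eq. (1) plus eq. (3) ---
    z_i = G(a_i)
    loss_G = bce_scalar(D(a_i, z_i), real) + bce_pixel(z_i, b_i)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```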
Step S7: the generator is evaluated using the retina test set F, and the optimal model parameters are retained. Meanwhile, at the end of the parameter update stage, it is judged whether the number of training iterations has reached the maximum; if so, the training phase ends and the method proceeds to the next step. Otherwise, training continues by returning to step S3 for further loop iteration training.
At the end of the model training phase, the retina test set F is input into the generator network G to generate a retina test synthetic set F'. The reference vessel segmentation map b_i in each retina test set F sample pair (a_i, b_i) and the generated segmentation map z_i in the corresponding generated retina sample pair (a_i, z_i) are compared pixel by pixel, where i = 1, 2, 3, …, N, and each pixel is classified as a vessel point or a non-vessel point. In order to test the performance of the current generator network G, quantitative analysis must be carried out objectively through performance indexes. Accuracy (Acc), Specificity (Sp), Sensitivity (Se), the Dice coefficient, F-measure, the area under the receiver operating characteristic (ROC) curve (AUC), and the area under the precision-recall (PR) curve (mAP) are adopted to measure the effectiveness of the model.
AUC is widely used as a performance measure in medical image processing; the closer its value is to 1, the better the segmentation effect.
Here, TP (true positive) denotes the number of correctly segmented vessel pixels; TN (true negative) denotes the number of correctly segmented non-vessel (background) pixels; FP (false positive) denotes the number of non-vessel pixels wrongly segmented as vessel; and FN (false negative) denotes the number of vessel pixels wrongly segmented as non-vessel. TP + FN + FP + TN is the total number of pixels in the region of interest of the image.
Since the above evaluation indexes depend on the threshold applied to the output, the ROC curve can be plotted by varying the threshold and recording the true positive rate (Sensitivity) against the false positive rate (1 − Specificity); AUC is the area under this ROC curve. All evaluation indexes are computed over all pixels within a mask representing the retinal optic disc region. The indexes are screened, and the model with the largest Acc, Se, Sp, Precision, Recall, and F-measure is selected as the optimal model. Finally, at the end of the parameter update stage, it is judged whether the number of training iterations has reached the maximum; if so, the training phase ends, the optimal generator network G' and optimal discriminator network D' are obtained, and the method proceeds to the next step. Otherwise, it returns to step S3 for further loop iteration training.
Step S8: the trained generator network G is used to test the sample images; an original retinal image is input, and the corresponding retinal vessel segmentation image is output.
Given a retinal image sample set F1 = {a_i | i ∈ [1, R_F]}, each retinal image a_i is taken as input to the optimal generator network G, and the corresponding vessel segmentation image z_i is output, i = 1, 2, 3, …, R_F, where R_F denotes the number of samples in the retinal image sample set. The segmentation effect of the invention on a sample set of retinal images is shown in FIG. 7.
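A minimal inference sketch for this step, under the same assumptions as the earlier code (a trained `G`, images pre-loaded as tensors; the function name is ours), might be:

```python
import torch

@torch.no_grad()
def segment_all(G, retinal_images):
    """Run the optimal generator G over a sample set F1 = [a_1, ..., a_RF];
    each a_i is assumed to be a (3, h, w) tensor already normalized as in
    training.  Returns the corresponding (1, h, w) segmentation maps z_i."""
    G.eval()
    return [G(a_i.unsqueeze(0)).squeeze(0) for a_i in retinal_images]
```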
In summary, the invention adopts a GAN-based intelligent retinal vessel image segmentation method. The generator network uses a residual structure and dilated convolution operations, so the network enlarges its receptive field without introducing extra parameters, captures image features more comprehensively, and completes the retinal segmentation task better. In addition, the feature concatenation operation in the expanding path of the generator network lets it better exploit both the shallow and deep features of the image for the segmentation task. In the discriminator network, the residual module structure avoids the network degradation caused by increasing the number of layers, makes better use of the discrimination capability of a deep network, and strengthens the supervisory capability of the network.
It will be appreciated by persons skilled in the art that the invention is not limited to details of the foregoing embodiments, and that the invention can be embodied in other specific forms without departing from the spirit or scope of the invention. In addition, various modifications and alterations of this invention may be made by those skilled in the art without departing from the spirit and scope of this invention, and such modifications and alterations should also be viewed as being within the scope of this invention.