Disclosure of Invention
The invention provides an intelligent retinal vessel image segmentation method based on a GAN (generative adversarial network), aiming at the problems of strong subjectivity and low efficiency in manual retinal vessel segmentation and the low segmentation accuracy of existing supervised retinal vessel segmentation methods.
In order to solve the technical problems in the prior art, the technical scheme of the invention is as follows: an intelligent retinal vessel image segmentation method based on GAN comprises the following steps:
In step S1, a retinal image sample set is given, consisting of sample pairs of a retinal image and a reference blood vessel segmentation image, each pair defined herein as (a, b). A retinal image corpus is defined as C = {(a_i, b_i) | i ∈ [1, R]}, where R denotes the total number of samples, i denotes the sample index, a denotes the retinal image, and b denotes the reference blood vessel segmentation image. The retinal image sample set is copied and divided into a retinal image training set E = {(a_i, b_i) | i ∈ [1, M]} and a retinal image test set F = {(a_i, b_i) | i ∈ [1, N]}, where N + M = R, and M and N respectively denote the corresponding numbers of samples.
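As an illustrative sketch only (not part of the claimed method), the following Python fragment shows one way the copy-and-split of step S1 could be organized; the 80/20 split ratio, the fixed seed, and the function name are assumptions for demonstration.

```python
import random

def split_corpus(corpus, train_fraction=0.8, seed=42):
    """Split the retinal corpus C = [(a_i, b_i), ...] into a training set E
    and a test set F with M + N = R.  The 80/20 ratio and the fixed seed
    are illustrative assumptions, not values prescribed by the method."""
    pairs = list(corpus)                  # copy, so the corpus C is left intact
    random.Random(seed).shuffle(pairs)    # random division of the R samples
    m = int(len(pairs) * train_fraction)  # M training samples
    return pairs[:m], pairs[m:]           # E (size M), F (size N = R - M)
```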
Step S2, constructing a GAN-based intelligent retinal segmentation network model, the network model comprising a generator network G and a discriminator network D, and constructing an Adam optimizer to help the network training converge quickly:
the overall architecture of the generator network G includes two parts, a contracting path (contracting path) and an expanding path (expanding path). In order to utilize the characteristic diagram information in the network training process to a greater extent, the characteristic diagram extracted and processed on the network contraction path is spliced with the characteristic diagram in the expansion path with the same size in the process of upsampling. In addition, a cavity convolution structure is introduced into the bottom layer of network downsampling, and the structure can increase the receptive field of the generated network, so that the network can better grasp the global characteristics of the retinal vessel image, and accurate segmentation of the retinal vessel image is realized. The generator network G performs 4 downsampling, 4 upsampling, and 3 feature concatenating operations in total, and the selected feature map is the feature map after downsampling, so that the generator network performs only 3 feature map concatenating operations although downsampling is performed 4 times. The down-sampling operation used in the generator network G is performed using a convolution operation with a convolution kernel size of 3x3 steps of 2.
The discriminator network D is a deep convolutional neural network whose main role is to judge whether an input blood vessel segmentation image is a reference segmentation image or one generated by the generator network G. A residual block (ResBlock) structure is also used in the discriminator network; this structure allows the number of network layers to be increased while preventing overfitting and alleviating the difficulty of training, so that the network captures image features better and converges faster. In the discriminator network D, the convolution kernel size used is 3×3, and downsampling is then performed by a max-pooling (MaxPooling) operation with a stride of 2, which highlights the main features in the feature map.
It should be noted that the generative adversarial network is composed of the generator network G and the discriminator network D. The main process is that the generator network G continuously fits the distribution of the retina training set E: a retina training sample pair (a_i, b_i) is input, the generator network G generates a vessel segmentation image z_i, and a generated retina sample pair (a_i, z_i) is obtained. The discriminator network D simultaneously and separately discriminates the retina training set E sample pairs (a_i, b_i) and the generated retina sample pairs (a_i, z_i), where i = 1, 2, 3, …, M, in each case giving a discrimination confidence q in [0, 1] that represents the probability that the sample pair is a retina training set E sample pair. A loss function measures the distance between the generated vessel segmentation sample z_i and the reference blood vessel segmentation sample b_i, in preparation for the subsequent back propagation.
An Adam optimizer is constructed to assist the network training, with an initial learning rate of 0.0002 and β1 = 0.5; the learning rate can then be adjusted intelligently during the training process, so that the network converges quickly.
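A minimal PyTorch sketch of this optimizer construction is given below. Note that the text does not state the β2 value, so the 0.999 used here is only PyTorch's documented default, taken as an assumption; the function and argument names are ours.

```python
import torch

def build_optimizers(net_G, net_D, lr=2e-4):
    """net_G and net_D are assumed to be the generator and discriminator
    modules.  beta1 = 0.5 is specified by the method; beta2 = 0.999 is an
    assumed value (the PyTorch default), since the text does not state it."""
    opt_G = torch.optim.Adam(net_G.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_D = torch.optim.Adam(net_D.parameters(), lr=lr, betas=(0.5, 0.999))
    return opt_G, opt_D
```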
Step S3, loading the retina training set E into the computer memory as input, and randomly shuffling E = {(a_i, b_i) | i ∈ [1, M]} to prepare for the next training stage.
Steps S4, S5, and S6 form the main training phase of the generative adversarial network. The game between the discriminator network D and the generator network G can be regarded as a minimax problem, in which the two networks come to understand the image mapping relationship by learning the features that map retinal images to blood vessel segmentation images. The objective function is shown in equation (1):

$$L_{cGAN}(G, D) = \mathbb{E}_{a,b}\left[\log D(a, b)\right] + \mathbb{E}_{a}\left[\log\left(1 - D(a, G(a))\right)\right] \tag{1}$$
At each iteration, a pair of images (a_i, b_i) is extracted from the retina training set E, where a_i denotes a retinal image and b_i denotes the corresponding reference blood vessel segmentation image, i = 1, 2, 3, …, M.
In step S4, a_i is input to the generator, which generates the corresponding blood vessel segmentation image G(a_i), i.e. z_i. The generator network G tries to minimize the objective function L_cGAN(G, D); in order to make the output of the final objective function as small as possible, the generated vessel segmentation image z_i must be as similar as possible to the reference retinal segmentation image b_i in image style and vascular structure.
In step S5, the discriminator network D attempts to distinguish the distribution of the retina training set E from the distribution of the retina training synthetic set E' so as to maximize the objective function. The discriminator network D simultaneously and separately discriminates the sample pairs (a_i, b_i) of the retina training set E and the sample pairs (a_i, z_i) of the retina training synthetic set E', where i = 1, 2, 3, …, M, in each case giving a discrimination confidence q in [0, 1] that represents the probability that the sample pair is a retina training set E sample pair (a_i, b_i).
Finally, the game equilibrium point of the discriminator network D and the generator network G is a 'Nash equilibrium point': the discriminator network D can no longer judge whether an input image sample pair is a sample pair (a_i, b_i) of the retina training set E or a sample pair (a_i, z_i) of the retina training synthetic set E', and for any given sample pair (a_i, z_i) of the synthetic set E' the confidence q output by D is 0.5. At this point, the distribution of the segmented images generated by the generator network G fits the distribution of the reference blood vessel segmentation images, so the accurate mapping from retinal image to blood vessel segmentation image has been learned, and the generated segmentation images are the desired target images. The process of the game can be considered a minimax process, which can be expressed as:

$$G^{*} = \arg\min_{G}\max_{D} L_{cGAN}(G, D) \tag{2}$$
since the segmentation from the retina image into the blood vessel segmentation image is essentially a classification prediction of "black or white" for each pixel, which is actually a pixel-to-pixel classification task, the present invention additionally uses a class-two classification cross entropy loss in the generator network G to penalize the distance between the generated blood vessel segmentation image and the reference blood vessel segmentation image, so that the generated blood vessel segmentation image is more approximate to the reference blood vessel segmentation image. The class-two class cross entropy loss function is defined as follows:
where a is the retinal image, b is the reference vessel segmentation map, and G is the corresponding generator network G. In step S6, based on the loss functions given by equation (1) and equation (3), the total loss value at the current iteration can be calculated. In order to minimize this loss value, the gradient of each parameter at each step can be obtained from the computation graph, and a gradient-update method moves the whole function toward a minimum point, thereby achieving the goal of fitting. The corresponding parameter update formula is:

$$\theta_{t+1} = \theta_{t} - \eta \nabla_{\theta_{t}} L$$

where θ_t denotes the parameters of the generator network G and the discriminator network D at step t, η denotes the learning rate among the hyper-parameters, and ∇θ_t L denotes the gradient of the loss with respect to the corresponding parameters.
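The text does not state how equations (1) and (3) are combined into the total loss; a standard combination, with an assumed weighting factor λ as in common conditional-GAN practice, would read:

$$G^{*} = \arg\min_{G}\max_{D}\; L_{cGAN}(G, D) + \lambda\, L_{BCE}(G)$$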
In step S7, the generator is evaluated with the retinal image test samples, and the optimal model parameters are retained, as follows: at the end of the training phase, the retina test set F is input into the generator network G to produce a retina test synthetic set F'. The reference vessel segmentation map b_i in each retina test set F sample pair (a_i, b_i) and the generated segmentation map z_i in the corresponding generated retina sample pair (a_i, z_i) are compared pixel by pixel, where i = 1, 2, 3, …, N, and each pixel is classified as a vessel point or a non-vessel point. In order to test the performance of the current generator network G, quantitative analysis must be carried out objectively through performance indexes. Accuracy (Acc), Specificity (Sp), Sensitivity (Se), the Dice coefficient, F-measure, the area under the receiver operating characteristic (ROC) curve (AUC), and the area under the precision-recall (PR) curve (mAP) are adopted to measure the effectiveness of the model.
AUC is widely used as a performance measure in medical image processing; the closer its value is to 1, the better the segmentation effect.
Here, TP (true positive) denotes the number of correctly segmented vessel pixels; TN (true negative) denotes the number of correctly segmented non-vessel (background) pixels; FP (false positive) denotes the number of non-vessel pixels wrongly segmented as vessel; and FN (false negative) denotes the number of vessel pixels wrongly segmented as non-vessel. TP + FN + FP + TN is the total number of pixels in the region of interest of the image.
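For illustration, the standard index definitions built from these counts can be computed as in the following sketch; the formulas are the conventional ones, and the function and variable names are ours.

```python
def segmentation_metrics(tp, tn, fp, fn):
    """Standard pixel-level indexes from the confusion-matrix counts.
    These are the conventional definitions of Acc, Se, Sp, Precision,
    F-measure, and Dice; for binary segmentation, Dice equals the
    F-measure (F1) computed on the vessel class."""
    acc       = (tp + tn) / (tp + tn + fp + fn)   # Accuracy
    se        = tp / (tp + fn)                    # Sensitivity (Recall)
    sp        = tn / (tn + fp)                    # Specificity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * se / (precision + se)
    dice      = 2 * tp / (2 * tp + fp + fn)       # equal to f_measure here
    return dict(Acc=acc, Se=se, Sp=sp, Precision=precision,
                F=f_measure, Dice=dice)
```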
Since the above evaluation indexes depend on the threshold applied to the output, the ROC curve can be plotted by varying the threshold and recording the true positive rate (Sensitivity) against the false positive rate (1 − Specificity); AUC is the area under this ROC curve. All evaluation indexes are computed over all pixels within a mask representing the retinal optic disc region. The indexes are screened, and the model with the largest Acc, Se, Sp, Precision, Recall, and F-measure is selected as the optimal model. Finally, at the end of the parameter update stage, it is judged whether the number of training iterations has reached the maximum; if so, the training phase ends, the optimal generator network G' and optimal discriminator network D' are obtained, and the method proceeds to the next step. Otherwise, it returns to step S3 for further loop iteration training.
The training images and test images input during training and testing are whole images. The invention adopts the Adam optimization method to optimize the loss function, yielding the final parameters of the generative adversarial network model; once saved, these parameters can be reused in subsequent retinal segmentation tasks.
In step S8: given a retinal image sample set F1 = {a_i | i ∈ [1, R_F]}, each retinal image a_i is taken as input to the optimal generator network G, and the corresponding vessel segmentation image z_i is output, i = 1, 2, 3, …, R_F, where R_F denotes the number of samples in the retinal image sample set; the final segmentation images have good accuracy.
The invention provides an intelligent retinal vessel image segmentation method based on a generative adversarial network, which has the following beneficial effects:
the network is mainly characterized by being based on a countermeasure training mechanism, and has a larger receptive field, so that the global information of the image can be captured well. Compared with other segmentation networks, the network has deeper network layers and can better capture and utilize abstract features of images. The method achieves the advanced effects in the aspects of accuracy, sensitivity and specificity. In addition, good segmentation effect can be achieved in a blood vessel region and a lesion region with low contrast, the method achieves high precision and good robustness of retinal blood vessel segmentation, and has good value and prospect in practical application.
Detailed Description
For completeness and clarity, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it.
As shown in FIG. 1, the present invention provides a technical solution: an intelligent retinal vessel image segmentation method based on GAN comprising the following steps:
Step S1: a retinal image sample set is given, consisting of sample pairs of a retinal image and a reference blood vessel segmentation image, each pair defined herein as (a, b). A retinal image corpus is defined as C = {(a_i, b_i) | i ∈ [1, R]}, where R denotes the total number of samples, i denotes the sample index, a denotes the retinal image, and b denotes the reference blood vessel segmentation image. The retinal image sample set is copied and divided into a retinal image training set E = {(a_i, b_i) | i ∈ [1, M]} and a retinal image test set F = {(a_i, b_i) | i ∈ [1, N]}, where N + M = R, and M and N respectively denote the corresponding numbers of samples.
In the invention, the main steps comprise data division and the computation of the adversarial loss with the constructed generator network G and discriminator network D; finally the generator network G and discriminator network D reach the optimal equilibrium point, and accurate segmentation of the input retinal image by the generator network G is realized. The relationship between the overall network structure and the loss function is shown in FIG. 2.
Step S2: the generator network G is obtained by combining feature up/down-sampling, a residual block structure, and dilated convolution operations; a discriminator network D with a large receptive field and an improved residual structure is designed; the combination of the generator network G and the discriminator network D constitutes the generative adversarial network. The network parameters of the generator network G and the discriminator network D are initialized, the Adam optimization method in the PyTorch framework is used to obtain the initial parameters of the generator and discriminator networks, and the relevant training hyper-parameters are set for training and optimizing the network model.
The generator network G is constructed by combining the advantages of feature up/down-sampling and the ResBlock structure, so that the network fuses shallow image features while upsampling, thereby using the feature map information more comprehensively and preventing network degradation in a deep network. In addition, the generator uses dilated convolution operations, which enlarge its receptive field without increasing the number of network parameters. The convolution kernels in the generator are all of size 3×3; after each downsampling operation, the output feature map is fed through 2 residual blocks (ResBlock) with skip connections, and the feature map obtained after the 4 downsampling steps is processed by dilated convolutions with dilation rates of 5 and 3, respectively, to enlarge the receptive field of the network. During upsampling, the features extracted by the shallow layers are concatenated in, and the network finally uses a 1×1 convolution to make the number of output color channels equal to the number of input color channels. The network structure is shown in FIG. 3, where in the legend an "upward arrow" denotes a deconvolution operation with kernel size 3×3 and stride 2, a BatchNormalization operation, and a ReLU activation; a "downward arrow" denotes a convolution with kernel size 3×3 and stride 2, BatchNormalization, and ReLU activation; a "thin rightward arrow" denotes a convolution with kernel size 3×3 and stride 1, BatchNormalization, and ReLU activation; a "thick rightward arrow" denotes a convolution with kernel size 1×1 and stride 1, BatchNormalization, and ReLU activation; a "feature map" is the output image after the corresponding convolution operation; a "dotted arrow" indicates that the corresponding output feature maps are concatenated along the color-channel dimension, e.g. concatenating a feature map of size (m, h, w) with one of size (n, h, w) yields a feature map of size (m + n, h, w); a "residual block" denotes the residual-block operation shown in FIG. 4 (the arrows there carry no operational meaning and only indicate the flow), which contains two convolutions with kernel size 3×3 and stride 1, BatchNormalization, and ReLU activation, and whose final output adds the input feature map to the output feature map of the last convolution, giving an output with the same number of color channels, height, and width as the input; a "dilated residual block" is based on the residual block except that the two convolution kernels are set to different dilation rates, all other processing being identical, as shown in FIG. 5.
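A minimal PyTorch sketch of the residual block and its dilated variant as described above is given below; the class name and channel argument are our own, and this is an illustration rather than the exact implementation.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block of FIG. 4: two 3x3 stride-1 convolutions with
    BatchNorm and ReLU, whose last output is added to the block input.
    Setting dilations > 1 gives the dilated residual block of FIG. 5."""
    def __init__(self, channels, dilations=(1, 1)):
        super().__init__()
        d1, d2 = dilations
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=d1, dilation=d1)
        self.bn1   = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=d2, dilation=d2)
        self.bn2   = nn.BatchNorm2d(channels)
        self.relu  = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))        # output of the last convolution
        return x + y                       # same channels, height, width as input

# Dilated residual block with the dilation rates 5 and 3 named in the text
# (the 256-channel width is an assumption):
bottom = ResBlock(256, dilations=(5, 3))
```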
The overall architecture of the generator network G includes two parts, a contracting path and an expanding path. The contracting path is the U-shaped main operation path: from the image input, through four downsampling steps and the dilated residual block, to the four upsampling steps that produce the output. The expanding path denotes the concatenation paths outside this U shape. In order to exploit the feature information in the network more fully, the feature maps extracted on the contracting path are concatenated, during upsampling, with the feature maps of the same size in the expanding path.
In the generator network G, 4 downsampling operations, 4 upsampling operations, and 3 feature concatenation operations are performed in total, and the feature maps selected for concatenation are those taken after downsampling. The downsampling operations used in the generator are all convolutions with a kernel size of 3×3 and a stride of 2.
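The following condensed PyTorch sketch shows how these pieces (stride-2 downsampling convolutions, the dilated residual bottleneck, deconvolution upsampling with concatenation, and the final 1×1 convolution) could fit together. Only the wiring follows the description; the channel widths are illustrative assumptions, the spatial sizes are assumed divisible by 16, and the ResBlock module sketched after FIG. 4/5 above is reused.

```python
import torch
import torch.nn as nn

def down(cin, cout):   # 3x3 stride-2 conv + BN + ReLU (a "downward arrow")
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

def up(cin, cout):     # 3x3 stride-2 deconv + BN + ReLU (an "upward arrow")
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 3, stride=2,
                                            padding=1, output_padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class Generator(nn.Module):
    """Wiring sketch only: 4 downsamplings, a dilated residual bottleneck,
    4 upsamplings with 3 skip concatenations, and a final 1x1 convolution.
    Channel widths (32..256) and out_ch=1 (per step S4) are assumptions."""
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.d1, self.d2 = down(in_ch, 32), down(32, 64)
        self.d3, self.d4 = down(64, 128), down(128, 256)
        self.bottleneck  = ResBlock(256, dilations=(5, 3))
        self.u1 = up(256, 128)
        self.u2 = up(256, 64)      # input is u1 output concatenated with d3's
        self.u3 = up(128, 32)      # input is u2 output concatenated with d2's
        self.u4 = up(64, 16)       # input is u3 output concatenated with d1's
        self.out = nn.Sequential(nn.Conv2d(16, out_ch, 1), nn.Sigmoid())

    def forward(self, x):
        f1 = self.d1(x)
        f2 = self.d2(f1)
        f3 = self.d3(f2)
        f4 = self.d4(f3)
        y = self.u1(self.bottleneck(f4))
        y = self.u2(torch.cat([y, f3], dim=1))   # skip concatenation 1
        y = self.u3(torch.cat([y, f2], dim=1))   # skip concatenation 2
        y = self.u4(torch.cat([y, f1], dim=1))   # skip concatenation 3
        return self.out(y)                       # (out_ch, h, w) map in (0, 1)
```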
The discriminator network D is constructed by introducing the ResBlock structure to prevent the degradation problem of deep networks; the skip-connection structure it uses enables the network to keep learning effectively as the depth increases, while preventing network degradation. Every convolution kernel in the discriminator network has size 3×3; the network is downsampled by 4 MaxPooling operations in total, and finally a fully connected layer classifies the resulting vector, judging whether the image input to the discriminator network comes from a real image or an image generated by the generator. The structure of the discriminator network is shown in FIG. 6, where in the legend "Scalar" denotes a scalar, i.e. the final output value of the discriminator network D, lying in [0, 1] and representing its confidence that the input image is real; a "feature map" is the output image after a convolution operation; a "thin rightward arrow" denotes a convolution with kernel size 3×3 and stride 1, BatchNormalization, and ReLU activation; a "thick rightward arrow" denotes a convolution with kernel size 3×3 and stride 2, BatchNormalization, and ReLU activation; a "thin dashed rightward arrow" denotes a flattening operation that stretches a multi-channel feature map into a one-dimensional vector; the "residual block" here is the same as in the generator network G; a "maximum pooling layer" denotes a max-downsampling operation with kernel size 2×2 and stride 2, whose output is 1/2 of the input in both height and width; "global average pooling" denotes a downsampling operation that averages over a kernel the size of the whole image, so that an image of size (c, h, w) is pooled to (c, 1, 1), where c is the number of color channels, h the height, and w the width of the feature map; a "thick dashed rightward arrow" denotes a fully connected operation, i.e. the target vector is multiplied by a weight matrix to obtain the result vector.
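A correspondingly condensed PyTorch sketch of such a discriminator (3×3 convolutions, residual blocks, four 2×2 max-pooling downsamplings, global average pooling, and a fully connected output) could look as follows. The channel widths are assumptions, the ResBlock sketched earlier is reused, and concatenating the pair (a, z) along the channel axis is one common convention for conditional discriminators rather than a detail fixed by the text.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Wiring sketch: 3x3 convs + ResBlocks, 4 max-pooling downsamplings,
    global average pooling, and a fully connected layer producing a scalar
    confidence in [0, 1].  Channel widths are illustrative assumptions."""
    def __init__(self, in_ch=4):                  # 3 (retina) + 1 (segmentation)
        super().__init__()
        def conv(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        blocks, c = [], in_ch
        for cout in (32, 64, 128, 256):           # 4 downsampling stages
            blocks += [conv(c, cout), ResBlock(cout), nn.MaxPool2d(2, 2)]
            c = cout
        self.features = nn.Sequential(*blocks)
        self.gap = nn.AdaptiveAvgPool2d(1)        # global average pooling
        self.fc  = nn.Linear(256, 1)              # fully connected output

    def forward(self, a, z):
        x = torch.cat([a, z], dim=1)              # condition on the retina image
        x = self.gap(self.features(x)).flatten(1) # (batch, 256) vector
        return torch.sigmoid(self.fc(x))          # scalar confidence q in [0, 1]
```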
It should be noted that the numbers in the upper-left corner of all feature maps in the generator network (FIG. 3) and the discriminator network (FIG. 6) denote the number of color channels output after the corresponding "arrow operation". The "K" in the lower-left corner of a feature map denotes the size of the original image; after several downsamplings, the feature map size becomes "K/p", meaning the size is K divided by the corresponding factor p.
Meanwhile, the Adam optimization method in the PyTorch framework is adopted, and its hyper-parameters are tuned during the training process, assisting the update of the initial parameters of the generator network G and discriminator network D.
Step S3: the retina training set E is loaded into memory and randomly shuffled, and for each training iteration a retina training sample pair (a_i, b_i), comprising a retinal image and a reference blood vessel segmentation image, is extracted to prepare for the next training step.
Step S4: a pair of samples (a_i, b_i) is extracted from the retina training set E, i = 1, 2, 3, …, M. The retinal image a_i is loaded and input into the generator network G; the corresponding image size is (3 × h × w), where 3 denotes the number of color channels, the corresponding color channels d ∈ {red, green, blue}, h denotes the height of a single picture, and w denotes its width. Through the layer-by-layer computation of the network, the generated retinal vessel segmentation map z_i is finally obtained; its image size is (1 × h × w), corresponding to a single-channel gray-scale map in which the gray value of each pixel represents the degree to which that pixel belongs to a blood vessel.
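A quick shape check against the Generator sketch above illustrates these dimensions (batch dimension added; h = w = 512 is an arbitrary choice divisible by 16):

```python
import torch

G = Generator(in_ch=3, out_ch=1)
a_i = torch.randn(1, 3, 512, 512)   # one (3 x h x w) retinal image
z_i = G(a_i)
print(z_i.shape)                    # torch.Size([1, 1, 512, 512]): (1 x h x w)
```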
Step S5: the discriminator network D attempts to maximize the objective function by distinguishing the distribution of the retina training set E from the distribution of the retina training synthetic set E'. The discriminator network D simultaneously and separately discriminates the retina training set E sample pairs (a_i, b_i) and the retina training synthetic set E' sample pairs (a_i, z_i), where i = 1, 2, 3, …, M, in each case giving a discrimination confidence q in [0, 1] that represents the probability that the sample pair is a retina training set E sample pair.
Step S6: the error between the generated retina training synthetic set E' and the retina training set E is calculated through the loss functions to obtain a loss value. The loss value is then propagated backward, and the network parameters of the discriminator network D and the generator network G are adjusted respectively. According to the given loss functions, the gradients of the parameters in the generator network G and the discriminator network D are calculated by the chain rule of differentiation, and the corresponding parameters are updated by stochastic gradient descent. The corresponding parameter update formula is:

$$\theta_{t+1} = \theta_{t} - \eta \nabla_{\theta_{t}} L$$

where θ_t denotes the parameters of the generator network G and the discriminator network D at step t, η denotes the learning rate among the hyper-parameters, and ∇θ_t L denotes the gradient of the loss with respect to the corresponding parameters.
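One possible shape of this alternating update in PyTorch is sketched below, reusing the Generator, Discriminator, and optimizers from the earlier sketches. The detailed loss bookkeeping (target labels, loss weighting) is not specified by the text, so everything beyond the alternation itself is an assumption.

```python
import torch
import torch.nn as nn

bce_pixel  = nn.BCELoss()   # pixel-wise loss of eq. (3)
bce_scalar = nn.BCELoss()   # adversarial loss on the scalar confidence q

def train_step(G, D, opt_G, opt_D, a_i, b_i):
    real = torch.ones(a_i.size(0), 1)     # target q for training-set pairs
    fake = torch.zeros(a_i.size(0), 1)    # target q for generated pairs

    # --- discriminator update: D tries to maximize eq. (1) ---
    z_i = G(a_i).detach()                 # generated pair (a_i, z_i), G frozen
    loss_D = bce_scalar(D(a_i, b_i), real) + bce_scalar(D(a_i, z_i), fake)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- generator update: G tries to minimize eq. (1) plus eq. (3) ---
    z_i = G(a_i)
    loss_G = bce_scalar(D(a_i, z_i), real) + bce_pixel(z_i, b_i)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```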
Step S7: the generator is evaluated using the retina test set F, and the optimal model parameters are retained. Meanwhile, at the end of the parameter update stage, it is judged whether the number of training iterations has reached the maximum; if so, the training phase ends and the method proceeds to the next step. Otherwise, training continues by returning to step S3 for further loop iteration training.
At the end of the model training phase, the retina test set F is input into the generator network G to generate a retina test synthetic set F'. The reference vessel segmentation map b_i in each retina test set F sample pair (a_i, b_i) and the generated segmentation map z_i in the corresponding generated retina sample pair (a_i, z_i) are compared pixel by pixel, where i = 1, 2, 3, …, N, and each pixel is classified as a vessel point or a non-vessel point. In order to test the performance of the current generator network G, quantitative analysis must be carried out objectively through performance indexes. Accuracy (Acc), Specificity (Sp), Sensitivity (Se), the Dice coefficient, F-measure, the area under the receiver operating characteristic (ROC) curve (AUC), and the area under the precision-recall (PR) curve (mAP) are adopted to measure the effectiveness of the model.
AUC is widely used as a performance measure in medical image processing; the closer its value is to 1, the better the segmentation effect.
Here, TP (true positive) denotes the number of correctly segmented vessel pixels; TN (true negative) denotes the number of correctly segmented non-vessel (background) pixels; FP (false positive) denotes the number of non-vessel pixels wrongly segmented as vessel; and FN (false negative) denotes the number of vessel pixels wrongly segmented as non-vessel. TP + FN + FP + TN is the total number of pixels in the region of interest of the image.
Since the above evaluation indexes depend on the threshold applied to the output, the ROC curve can be plotted by varying the threshold and recording the true positive rate (Sensitivity) against the false positive rate (1 − Specificity); AUC is the area under this ROC curve. All evaluation indexes are computed over all pixels within a mask representing the retinal optic disc region. The indexes are screened, and the model with the largest Acc, Se, Sp, Precision, Recall, and F-measure is selected as the optimal model. Finally, at the end of the parameter update stage, it is judged whether the number of training iterations has reached the maximum; if so, the training phase ends, the optimal generator network G' and optimal discriminator network D' are obtained, and the method proceeds to the next step. Otherwise, it returns to step S3 for further loop iteration training.
Step S8: the trained generator network G is used to test the sample images; an original retinal image is input, and the corresponding retinal vessel segmentation image is output.
Given a retinal image sample set F1 = {a_i | i ∈ [1, R_F]}, each retinal image a_i is taken as input to the optimal generator network G, and the corresponding vessel segmentation image z_i is output, i = 1, 2, 3, …, R_F, where R_F denotes the number of samples in the retinal image sample set. The segmentation effect of the invention on a sample set of retinal images is shown in FIG. 7.
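A minimal inference sketch for this step, under the same assumptions as the earlier code (a trained `G`, images pre-loaded as tensors; the function name is ours), might be:

```python
import torch

@torch.no_grad()
def segment_all(G, retinal_images):
    """Run the optimal generator G over a sample set F1 = [a_1, ..., a_RF];
    each a_i is assumed to be a (3, h, w) tensor already normalized as in
    training.  Returns the corresponding (1, h, w) segmentation maps z_i."""
    G.eval()
    return [G(a_i.unsqueeze(0)).squeeze(0) for a_i in retinal_images]
```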
In summary, the invention adopts a GAN-based intelligent retinal vessel image segmentation method. The generator network uses a residual structure and dilated convolution operations, so the network enlarges its receptive field without introducing extra parameters, captures image features more comprehensively, and completes the retinal segmentation task better. In addition, the feature concatenation operation in the expanding path of the generator network lets it better exploit both the shallow and deep features of the image for the segmentation task. In the discriminator network, the residual module structure avoids the network degradation caused by increasing the number of layers, makes better use of the discrimination capability of a deep network, and strengthens the supervisory capability of the network.
It will be appreciated by persons skilled in the art that the invention is not limited to details of the foregoing embodiments, and that the invention can be embodied in other specific forms without departing from the spirit or scope of the invention. In addition, various modifications and alterations of this invention may be made by those skilled in the art without departing from the spirit and scope of this invention, and such modifications and alterations should also be viewed as being within the scope of this invention.