
CN113192069B - Semantic segmentation method and device for tree structure in three-dimensional tomographic image - Google Patents


Info

Publication number
CN113192069B
CN113192069B (application CN202110618838.7A)
Authority
CN
China
Prior art keywords
image
semantic segmentation
tree structure
network
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110618838.7A
Other languages
Chinese (zh)
Other versions
CN113192069A (en)
Inventor
冯建江
周杰
谭子萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110618838.7A priority Critical patent/CN113192069B/en
Publication of CN113192069A publication Critical patent/CN113192069A/en
Application granted granted Critical
Publication of CN113192069B publication Critical patent/CN113192069B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10116X-ray image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a semantic segmentation method for a tree structure in a three-dimensional tomographic image, relating to the technical field of medical image processing. The method comprises the following steps: acquiring a three-dimensional tomographic image to be tested; preprocessing the image to obtain a preprocessed image, wherein the preprocessing comprises unifying the image resolution, cropping the image to a uniform size, and normalizing the image gray values; and inputting the preprocessed image into a tree-structure semantic segmentation network to obtain the semantic segmentation prediction result corresponding to the image. The application uses a multi-task fully convolutional network and a graph convolution network to jointly complete feature extraction and obtain the semantic segmentation result of the tree structure, modeling spatial context information and explicitly introducing structural prior knowledge, thereby achieving good segmentation performance.

Description

Semantic segmentation method and device for tree structure in three-dimensional tomographic image
Technical Field
The present application relates to the field of medical image processing technologies, and in particular, to a semantic segmentation method and apparatus for a tree structure in a three-dimensional tomographic image.
Background
Computed tomography exploits the differences in X-ray transmittance and absorption among different human tissues to provide physicians with a three-dimensional anatomical view inside the body without surgery. With advantages such as fast scanning and high resolution, it is a mainstream medical imaging modality. Anatomical tree structures in three-dimensional tomographic images, including the trachea, arteries, and veins, are important aids in diagnosing and treating related diseases. Taking chest tomographic images as an example, they reflect the anatomical structure and physiological condition of a patient's lungs and tracheal tree, and play an important role in the diagnosis, treatment, prognosis, and follow-up of diseases such as novel coronavirus infection (COVID-19), interstitial pneumonia, and lung tumors. In recent years, with the rapid development of computer technology, computer-aided medical techniques based on medical images, such as automatic tracheal tree extraction, lung nodule detection, and lung tumor localization in chest tomographic images, have been widely studied and applied.
The semantic segmentation task for an anatomical tree structure can be seen as an extension of the traditional whole-tree extraction task. Taking tracheal tree semantic segmentation in chest tomographic images as an example, it means automatically segmenting the tracheal region from the image and, according to the topology of the tracheal tree, dividing it into 32 predefined parts with unique anatomical meanings and anatomical names; these parts are called semantic classes. Notably, for ease of definition, the 32 semantic classes here cover the trunk, main bronchi (primary bronchi), lobar bronchi (secondary bronchi), and segmental bronchi (tertiary bronchi) of the tracheal tree, with the terminal bronchioles omitted. Compared with the traditional whole-tree extraction task, the semantic segmentation task not only models the overall spatial structure of the tree, but also provides more meaningful information for clinical applications and subsequent medical image processing tasks, for example facilitating morphological measurement, localization of anatomical views of interest, lesion localization, and comparative analysis of the same anatomical region across patients. However, the semantic segmentation of tree structures faces great challenges: the anatomical tree structure has a complex distribution, its morphology varies between individuals, each semantic class often occupies a very small volume fraction of the tomographic image, there is no obvious boundary plane between classes, and the local appearance and gray-level distribution of the image may be affected by pathology.
On the other hand, the anatomical topology of the tree structure is relatively fixed, i.e., the relative positions of the semantic classes show strong regularity and consistency. Taking the tracheobronchial tree in chest tomographic images as an example, the trunk divides at the carina into the left and right main bronchi: the slender left main bronchus descends obliquely and divides into the left upper and lower lobe bronchi, while the short, thick right main bronchus enters the right upper lung almost vertically and divides into the right upper lobe bronchus and the bronchus intermedius, the latter further dividing into the middle and lower lobe bronchi. If such prior information about the anatomical structure is combined with the semantic segmentation algorithm to introduce spatial distribution constraints, better segmentation performance can be achieved. At present, there is no anatomical tree-structure semantic segmentation method that explicitly incorporates spatial prior information.
Disclosure of Invention
The present application aims to solve at least one of the technical problems in the related art to some extent.
Therefore, a first object of the present application is to provide a semantic segmentation method for tree structures in three-dimensional tomographic images, which solves the problems of complex distribution of anatomical tree structures, various morphological changes among different individuals, very small volume occupation ratio of each semantic class in tomographic images, no obvious boundary plane between classes, possible pathological influence on local appearance and gray distribution of images, etc. in the existing method, achieves the purpose of jointly completing feature extraction by using a multi-task full convolution network and a graph convolution network to obtain semantic segmentation results of tree structures, models spatial context information, explicitly introduces structure priori knowledge, and can realize good segmentation performance.
A second object of the present application is to provide a semantic segmentation apparatus for tree structures in three-dimensional tomographic images.
A third object of the present application is to propose a non-transitory computer readable storage medium.
To achieve the above object, an embodiment of a first aspect of the present application provides a semantic segmentation method for a tree structure in a three-dimensional tomographic image, including: acquiring a three-dimensional tomographic image to be tested; preprocessing an image to obtain a preprocessed image, wherein the preprocessing comprises unifying the resolution of the image, cutting the image to a uniform size and normalizing the gray value of the image; inputting the preprocessed image into a tree structure semantic segmentation network to obtain a semantic segmentation prediction result corresponding to the image.
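The three preprocessing steps named above (unify the resolution, crop to a uniform size, normalize the gray values) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the nearest-neighbor resampling, the intensity window, and the default target spacing and size are all assumed values.

```python
import numpy as np

def preprocess(volume, spacing, target_spacing=(1.0, 1.0, 1.0),
               target_shape=(128, 128, 128), window=(-1000.0, 600.0)):
    """Sketch of the preprocessing: resample to a unified resolution,
    normalize gray values to [0, 1], and center-crop/pad to a unified size.
    target_spacing, target_shape and window are assumed example values."""
    # 1) Resample to the target voxel spacing (nearest neighbor for brevity).
    new_shape = tuple(max(1, int(round(s * sp / tp)))
                      for s, sp, tp in zip(volume.shape, spacing, target_spacing))
    idx = [np.clip((np.arange(n) * volume.shape[d] / n).astype(int),
                   0, volume.shape[d] - 1) for d, n in enumerate(new_shape)]
    vol = volume[np.ix_(*idx)].astype(np.float32)
    # 2) Clip to an intensity window and normalize gray values to [0, 1].
    lo, hi = window
    vol = (np.clip(vol, lo, hi) - lo) / (hi - lo)
    # 3) Center-crop or zero-pad each axis to the unified size.
    out = np.zeros(target_shape, dtype=np.float32)
    src, dst = [], []
    for d in range(3):
        n, t = vol.shape[d], target_shape[d]
        if n >= t:
            s0 = (n - t) // 2
            src.append(slice(s0, s0 + t)); dst.append(slice(0, t))
        else:
            d0 = (t - n) // 2
            src.append(slice(0, n)); dst.append(slice(d0, d0 + n))
    out[tuple(dst)] = vol[tuple(src)]
    return out
```

A production pipeline would typically use trilinear or spline interpolation for the resampling step; nearest neighbor is used here only to keep the sketch dependency-free.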
Optionally, in one embodiment of the present application, a tree-structure semantic segmentation network is constructed, where the tree-structure semantic segmentation network includes a feature extraction module and an inference module, and the feature extraction module includes a multitasking network extraction module and a graph convolution network feature extraction module.
Optionally, in one embodiment of the present application, the tree-structured semantic segmentation network is trained offline, the offline training comprising the steps of:
Acquiring an original data set, preprocessing the original data set, and generating a preprocessed data set, wherein each image in the preprocessed data set has the same resolution and size as those of the preprocessed image, and comprises the same anatomical tree structure;
Labeling the preprocessed data set to generate a manual labeling result, wherein the manual labeling result comprises tree structure bifurcation key points, tree structure integral segmentation labels and tree structure semantic segmentation labels;
Performing data preparation on the preprocessed data set according to the manual labeling result to generate a training data set;
Inputting the training data set into a multi-task network extraction module for pre-training to obtain multi-task network parameters;
And carrying out overall training on the tree structure semantic segmentation network according to the multi-task network parameters to obtain semantic segmentation algorithm network parameters.
Optionally, in one embodiment of the present application, labeling the preprocessed data set includes the steps of:
Manually labeling each image in the preprocessed data set by using medical image processing software, wherein the content of the manual labeling comprises a predefined tree structure bifurcation key point and a tree structure integral segmentation label;
Based on the tree structure bifurcation key points and the tree structure integral segmentation labels, generating tree structure semantic segmentation labels corresponding to each image in the data set by using an automatic method, and then carrying out manual correction.
Optionally, in an embodiment of the present application, the data preparation is performed on the preprocessed data set, and the specific process is that a probability heat map, an overall segmentation probability map, and a semantic segmentation probability map are generated according to the manual labeling result, and each image in the preprocessed data set and the corresponding probability heat map, the overall segmentation probability map, and the semantic segmentation probability map form a training data pair, where all the training data pairs together form the training data set.
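One common way to turn the labeled bifurcation keypoints into the probability heat maps used as dense regression targets is to render a Gaussian around each point. The one-channel-per-keypoint layout and the sigma value below are assumptions for illustration, not specifics from the patent.

```python
import numpy as np

def keypoint_heatmap(shape, keypoints, sigma=2.0):
    """Render one Gaussian heat-map channel per bifurcation keypoint.
    shape is the (D, H, W) volume size; keypoints is a list of (z, y, x)
    voxel coordinates; sigma is an assumed spread parameter."""
    zz, yy, xx = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    maps = []
    for (z, y, x) in keypoints:
        d2 = (zz - z) ** 2 + (yy - y) ** 2 + (xx - x) ** 2
        maps.append(np.exp(-d2 / (2.0 * sigma ** 2)))  # peaks at 1 on the keypoint
    return np.stack(maps, axis=0)  # (num_keypoints, D, H, W)
```

Each image paired with its heat map and the two segmentation probability maps would then form one training data pair.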
Optionally, in one embodiment of the present application, inputting the training data set into the multi-tasking network extraction module for pre-training comprises the steps of:
Randomly selecting a training data pair from the training data set, inputting images in the training data pair into the multi-task network module, and outputting the prediction results of three tasks;
Respectively inputting the prediction results of the three tasks and the prediction targets of the three tasks in the training data pair into loss functions corresponding to the tasks to obtain loss function values, and completing one-time pre-training, wherein the loss functions are as follows:
L = L_kp + α·L_seg + β·L_sem
where L_kp, L_seg, and L_sem are the loss functions corresponding to the bifurcation keypoint detection, overall tree-structure segmentation, and semantic segmentation tasks, respectively, and α and β are hyperparameters;
And circularly executing the pre-training step, and completing the pre-training when the training times exceed the set upper limit times to obtain the multi-task network parameters.
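The three-task pre-training loss balanced by the hyperparameters α and β can be sketched as follows. The per-task loss choices (mean-squared error for the heat maps, binary cross-entropy for the segmentation maps) and the default weights are assumptions for illustration; the patent only states that α and β balance the terms.

```python
import numpy as np

def mse(pred, target):
    """Mean-squared error, an assumed choice for heat-map regression."""
    return float(np.mean((pred - target) ** 2))

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy, an assumed choice for segmentation maps."""
    p = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

def multitask_loss(kp_pred, kp_gt, seg_pred, seg_gt, sem_pred, sem_gt,
                   alpha=1.0, beta=1.0):
    """Sketch of L = L_kp + alpha * L_seg + beta * L_sem for one
    pre-training step of the multi-task network."""
    return (mse(kp_pred, kp_gt)
            + alpha * bce(seg_pred, seg_gt)
            + beta * bce(sem_pred, sem_gt))
```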
Optionally, in one embodiment of the application, the overall training comprises the steps of:
generating training parameters according to the multitasking network parameters, the random initialization graph convolution network parameters and the inference module network parameters, wherein the specific process is as follows:
θ = θ₁ ∪ θ₂ ∪ θ₃
where θ is the overall training parameter set, θ₁ the multi-task network parameters, θ₂ the randomly initialized graph convolution network parameters, and θ₃ the inference module network parameters;
Randomly selecting a training data pair from the training data set, inputting the images in the training data pair into a multi-task network feature extraction module, and obtaining an output result and an output feature diagram of a trunk part;
the graph convolution network feature extraction module performs graph model construction and feature extraction according to the output result to generate a new feature graph;
the inference module generates a final prediction result of semantic segmentation according to the output feature map of the trunk part and the new feature map;
inputting the final prediction result and the semantic segmentation probability map in the training data pair into a loss function to calculate to obtain a loss function value, and completing one-time integral training, wherein the loss function is as follows:
L_total = L_final + γ·L_mt
where L_final is the loss function of the final prediction result, for which the Dice loss function is selected, L_mt is the multi-task network loss function used in pre-training, and the hyperparameter γ balances their magnitudes;
and circularly executing the whole training step, and completing the whole training when the training times exceed the preset upper limit to obtain the semantic segmentation algorithm network parameters.
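The Dice loss selected for the final prediction result can be sketched as follows; pred and target are probability maps of identical shape, and the smoothing constant eps is an assumed implementation detail.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 minus twice the overlap divided by the total
    mass of the two maps. Approaches 0 for a perfect match and 1 for
    fully disjoint predictions."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)
```

The Dice loss is a common choice for segmentation targets with a very small foreground fraction, which matches the thin-branch classes described earlier.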
Optionally, in one embodiment of the present application, the preprocessed image is input into the tree-structure semantic segmentation network to obtain the semantic segmentation prediction result corresponding to the image, which specifically includes the following steps:
Inputting the preprocessed image into a multi-task network feature extraction module to obtain a corresponding bifurcation key point detection result, a tree structure integral segmentation result, a semantic segmentation prediction result and an output feature map of a trunk part;
inputting the bifurcation keypoint detection result, the overall tree-structure segmentation result, and the semantic segmentation prediction result into the graph convolution feature extraction module for graph model construction and further feature refinement, and outputting a feature map;
inputting this feature map together with the output feature map of the trunk part into the inference module to obtain a multi-channel prediction probability map;
and comparing the predicted probability values of all the channels for each voxel position in the multi-channel predicted probability map, wherein the channel corresponding to the maximum probability value is the semantic class to which the voxel belongs, so that the final predicted result of the semantic segmentation corresponding to the input image is obtained.
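The per-voxel channel comparison described above reduces to an argmax over the channel axis; a minimal sketch, with channel 0 taken as the background channel by assumed convention:

```python
import numpy as np

def probs_to_labels(prob_map):
    """Convert a multi-channel predicted probability map of shape
    (C, D, H, W) to a (D, H, W) label volume by selecting, at each voxel,
    the channel with the largest predicted probability. The winning
    channel index is the voxel's semantic class (background if it is
    the background channel)."""
    return np.argmax(prob_map, axis=0)
```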
To achieve the above object, a second aspect of the present application provides a semantic segmentation apparatus for a tree structure in a three-dimensional tomographic image, comprising:
The acquisition module is used for acquiring a three-dimensional tomographic image to be tested;
the preprocessing module is used for preprocessing the image to obtain a preprocessed image, wherein the preprocessing comprises unified image resolution, cutting the image to a unified size and normalizing an image gray value;
The prediction module is used for inputting the preprocessed image into the tree structure semantic segmentation network to obtain a semantic segmentation prediction result corresponding to the image.
In order to achieve the above object, a third aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions which, when executed by a processor, cause the processor to perform the semantic segmentation method for a tree structure in a three-dimensional tomographic image described above.
The semantic segmentation method, the semantic segmentation device and the non-transitory computer readable storage medium of the tree structure in the three-dimensional tomographic image solve the problems that the anatomical tree structure is complex in distribution, different inter-individual forms are varied, the volume ratio of each semantic class in the tomographic image is very small, no obvious boundary plane exists between classes, the local appearance and gray distribution of the image can be affected by pathology and the like in the conventional method, realize the purpose of jointly completing feature extraction by using a multi-task full convolution network and a graph convolution network to obtain the semantic segmentation result of the tree structure, model space context information, explicitly introduce structure priori knowledge, and can be widely applied to semantic segmentation tasks of various anatomical tree structures, such as trachea, artery, vein and the like, and realize good segmentation performance.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a semantic segmentation method for tree structures in a three-dimensional tomographic image according to an embodiment of the present application;
FIG. 2 is another flow chart of a semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the results of labeling and data generation of a tracheal tree of a semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an online stage tree structure semantic segmentation network of a semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an offline stage tree structure semantic segmentation network of a semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a multi-task network structure of a semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a semantic segmentation method of a tracheobronchial tree RMB semantic segment map of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application;
fig. 8 is a schematic diagram of a semantic segmentation prediction result of a tracheal tree of the semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
The following describes a semantic segmentation method and apparatus of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application with reference to the accompanying drawings.
Fig. 1 is a flowchart of a semantic segmentation method for a tree structure in a three-dimensional tomographic image according to an embodiment of the present application.
As shown in fig. 1, the semantic segmentation method of the tree structure in the three-dimensional tomographic image includes the following steps:
Step 101, acquiring a three-dimensional tomographic image to be tested;
102, preprocessing an image to obtain a preprocessed image, wherein the preprocessing comprises unifying the resolution of the image, cutting the image to a uniform size and normalizing the gray value of the image;
and step 103, inputting the preprocessed image into a tree structure semantic segmentation network to obtain a semantic segmentation prediction result corresponding to the image.
The semantic segmentation method of the tree structure in the three-dimensional tomographic image comprises the steps of obtaining a three-dimensional tomographic image to be tested; preprocessing an image to obtain a preprocessed image, wherein the preprocessing comprises unifying the resolution of the image, cutting the image to a uniform size and normalizing the gray value of the image; inputting the preprocessed image into a tree structure semantic segmentation network to obtain a semantic segmentation prediction result corresponding to the image. Therefore, the problems that the distribution of the anatomical tree structure is complex, the morphological changes among different individuals are various, the volume ratio of each semantic class in a tomographic image is very small, no obvious demarcation plane exists among classes, the local appearance and gray distribution of the image can be affected by pathology and the like can be solved, the purposes that the characteristic extraction is completed jointly by using a multi-task full convolution network and a graph convolution network to obtain the semantic segmentation result of the tree structure are realized, the spatial context information is modeled, the structure priori knowledge is explicitly introduced, the method can be widely applied to semantic segmentation tasks of various anatomical tree structures, such as trachea, artery, vein and the like, and good segmentation performance can be realized.
Further, in the embodiment of the application, a tree-structure semantic segmentation network is constructed, wherein the tree-structure semantic segmentation network comprises a feature extraction module and an inference module, and the feature extraction module comprises a multi-task network extraction module and a graph convolution network feature extraction module.
The algorithm's input is a preprocessed three-dimensional tomographic image; the feature extraction module comprises a multi-task network and a graph convolution network, and the inference module takes the output of the feature extraction module as input and outputs the tree-structure semantic segmentation prediction result.
The multi-task network feature extraction module takes a U-Net as its backbone and consists of a trunk and 3 parallel branches. The trunk comprises symmetric compression and expansion paths. The compression path consists of 5 residual modules and 4 max-pooling layers distributed between adjacent residual modules; each residual module contains two convolution operations and a shortcut connection bridging its input and output to prevent the vanishing-gradient problem that may occur during network training. During forward propagation, each residual module doubles the number of feature channels without changing the feature-map size, while each max-pooling layer halves each spatial dimension of the feature map without changing the channel count. The expansion path consists of 3 residual modules and 4 deconvolution layers, with the input of the 1st deconvolution layer being the output of the 5th residual module on the compression path; deconvolution layers and residual modules alternate. The expansion path uses residual modules with the same structure as the compression path, except that they do not change the number of feature channels, while each deconvolution layer doubles each spatial dimension of the feature map and halves the number of feature channels. Because the compression and expansion paths are distributed symmetrically, the output of the trunk keeps the same size as its input.
In addition, to better preserve local features and fuse global and local information, skip connections are added between symmetric layers of the compression and expansion paths, i.e., channel-wise concatenation of feature maps of the same spatial size. Specifically, arranged from deep to shallow and from small scale to large: the output feature map of the 4th residual module on the compression path is concatenated with that of the 1st deconvolution layer on the expansion path as the input of the 1st residual module on the expansion path; the output of the 3rd residual module on the compression path is concatenated with that of the 2nd deconvolution layer as the input of the 2nd residual module on the expansion path; the output of the 2nd residual module on the compression path is concatenated with that of the 3rd deconvolution layer as the input of the 3rd residual module on the expansion path; and the output of the 1st residual module on the compression path is concatenated with that of the 4th deconvolution layer to serve as the output feature map of the trunk.
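The channel and size bookkeeping of the compression path described above (each residual module doubles the channels, each max-pooling halves every spatial dimension) can be traced with a short script. The input size and starting channel count are assumed example values, not figures from the patent.

```python
def compression_path_shapes(in_size=128, in_channels=16):
    """Trace (channels, spatial size per dimension) after each of the 5
    residual modules on the compression path, with the 4 max-pooling
    layers applied between them."""
    shapes = []
    size, ch = in_size, in_channels
    for stage in range(5):      # 5 residual modules
        ch *= 2                 # residual module: channels x2, size unchanged
        shapes.append((ch, size))
        if stage < 4:           # 4 max-pooling layers between the modules
            size //= 2          # pooling: each spatial dimension halved
    return shapes
```

With the assumed starting values this yields (32, 128), (64, 64), (128, 32), (256, 16), (512, 8), and the symmetric expansion path reverses the progression so the trunk output matches the input size.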
Then, the trunk's output feature map is fed simultaneously into the network's three parallel branches, each consisting of a residual module and a three-dimensional convolution layer with kernel size 1×1×1. The outputs of the three branches are, respectively, the multi-channel bifurcation-keypoint heat-map prediction, the overall tree-structure segmentation probability map prediction, and the multi-channel tree-structure semantic segmentation probability map prediction corresponding to the input image. All three predictions keep the same spatial size as the input image.
In the tree structure semantic segmentation network, the graph convolutional network feature extraction module is located between the multi-task network feature extraction module and the inference module, and aims to further optimize the features learned by the multi-task network. Unlike a conventional convolutional network, a graph convolutional network allows information exchange between voxels that are not directly adjacent, and thus has a larger receptive field and can learn more discriminative features. In this module, a graph model G(V, E) is first constructed based on the output of the multi-task network, where V is the set of nodes and E is the set of edges. The multi-channel semantic segmentation probability map predicted by the multi-task network is processed by comparing, for each voxel, the predicted probability values of all channels; the channel with the maximum probability value is the semantic class to which the voxel belongs (when the background channel has the maximum probability value, the voxel does not belong to the tree structure region), yielding a predicted semantic label for every voxel. Random sampling is then performed on the resulting tree structure region, i.e. a subset with N_1 elements is randomly drawn from the set of voxels whose semantic label is not background (uniform sampling or furthest point sampling may be used), and this set is recorded as V_1. Obviously, the branch (i.e. semantic label) to which each voxel in V_1 belongs is known. In addition, from the key point heat map regression result of the multi-task network, the position of the voxel with the maximum predicted value is obtained channel by channel, namely the detected position of the corresponding key point.
A series of voxels is sampled near the detected position of each key point, with a total number of N_2; the set of all these sampled voxels is recorded as V_2. Together, the voxel sets V_1 and V_2 form the node set V = V_1 ∪ V_2 of the graph model G, where the total number of nodes is N = N_1 + N_2.
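The two-stage node sampling above can be sketched as follows (function names, sampling sizes, and the nearest-voxel neighbourhood rule are illustrative assumptions; the patent allows uniform or furthest point sampling for V_1):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_nodes(pred_labels, keypoints, n1=4, n2_per_kp=2):
    """Sample graph nodes from (a) the predicted tree region and (b) the
    neighbourhoods of detected bifurcation key points. `pred_labels` is a
    3-D array of per-voxel semantic labels with 0 = background."""
    fg = np.argwhere(pred_labels > 0)                  # non-background voxels
    v1 = fg[rng.choice(len(fg), size=min(n1, len(fg)), replace=False)]
    near = []
    for kp in keypoints:                               # voxels nearest each key point
        d = np.linalg.norm(fg - kp, axis=1)
        near.append(fg[np.argsort(d)[:n2_per_kp]])
    v2 = np.concatenate(near) if near else np.empty((0, 3), int)
    return v1, v2

labels = np.zeros((6, 6, 6), dtype=int)
labels[2:4, 2:4, 2:4] = 1                              # a tiny "branch" region
v1, v2 = sample_nodes(labels, keypoints=[np.array([2, 2, 2])])
nodes = np.concatenate([v1, v2])                       # node set V, N = N1 + N2
```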
In graph model construction for graph convolutional networks, one common edge structure is to set edges between neighbouring nodes in Euclidean space, i.e. to connect those nodes whose Euclidean distance is smaller than a certain threshold r (when nodes v_i and v_j satisfy ‖v_i − v_j‖ < r, there is an edge e_ij between the two nodes, i.e. e_ij = 1; otherwise there is no edge, i.e. e_ij = 0). In order to explicitly apply the structural prior information of the anatomical tree structure, a new edge structure is introduced in addition to the edges between neighbourhood nodes: nodes located on a semantic class region are connected with the nodes around the corresponding bifurcation key points, assisting feature learning for that semantic class region.
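A minimal sketch of the Euclidean-neighbourhood edge rule (the threshold value is illustrative):

```python
import numpy as np

def build_edges(coords, r=2.0):
    """Adjacency from the Euclidean-neighbourhood rule: e_ij = 1 when
    ||v_i - v_j|| < r, with no self-loops. `coords` has shape (N, 3)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < r).astype(int)
    np.fill_diagonal(adj, 0)                # no self-loops in the edge set
    return adj

coords = np.array([[0, 0, 0], [1, 0, 0], [5, 5, 5]], dtype=float)
A = build_edges(coords, r=2.0)
# nodes 0 and 1 are neighbours; node 2 is isolated
```

The patent's second edge type (semantic-region nodes to key-point-neighbourhood nodes) would be added on top of this matrix by setting the corresponding A_ij entries to 1.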
The graph convolutional network is built on the graph model G(V, E) and consists of two graph convolution layers. First, a feature map F is formed by sampling, from the output feature map F_0 of the trunk part of the multi-task network, the feature vector at the position of each node v_i ∈ V (i = 1, 2, …, N). To enrich the feature expression, the three-dimensional coordinates of each node and its distance to the centre of the tree structure are appended, expanding the feature map to F_1, which serves as the input feature map of the graph convolutional network. Denoting the output features of the graph convolutional network as F_2, the computation of the graph convolutional network can be expressed as:

F_2 = σ(Â · σ(Â · F_1 · W(0)) · W(1)), with Â = D̃^(-1/2) · Ã · D̃^(-1/2) and Ã = A + I

wherein A is the adjacency matrix of the graph model, Ã adds self-connections, D̃ is the corresponding diagonal degree matrix with D̃_ii = Σ_j Ã_ij, σ(·) is the ReLU activation function, and W(0) and W(1) are the trainable parameters of the two graph convolution layers, respectively.
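The standard two-layer graph-convolution propagation rule can be sketched in numpy as below (feature sizes and weights are random placeholders, not values from the patent):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_forward(A, F1, W0, W1):
    """Two-layer graph convolution: F2 = relu(A_hat @ relu(A_hat @ F1 @ W0) @ W1),
    with A_hat the symmetrically normalised adjacency of A + I."""
    A_tilde = A + np.eye(A.shape[0])        # add self-connections
    d = A_tilde.sum(axis=1)                 # degree of each node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
    return relu(A_hat @ relu(A_hat @ F1 @ W0) @ W1)

rng = np.random.default_rng(0)
A = np.array([[0, 1], [1, 0]], dtype=float)   # two connected nodes
F1 = rng.standard_normal((2, 5))              # N = 2 nodes, 5 input features each
W0 = rng.standard_normal((5, 8))              # trainable weights of layer 1
W1 = rng.standard_normal((8, 3))              # trainable weights of layer 2
F2 = gcn_forward(A, F1, W0, W1)               # per-node output features
```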
Further, in the embodiment of the present application, the tree-structured semantic segmentation network is trained offline, and the offline training includes the following steps:
Acquiring an original data set, preprocessing the original data set, and generating a preprocessed data set, wherein each image in the preprocessed data set has the same resolution and size as those of the preprocessed image, and comprises the same anatomical tree structure;
Labeling the preprocessed data set to generate a manual labeling result, wherein the manual labeling result comprises tree structure bifurcation key points, tree structure integral segmentation labels and tree structure semantic segmentation labels;
Performing data preparation on the preprocessed data set according to the manual labeling result to generate a training data set;
Inputting the training data set into a multi-task network extraction module for pre-training to obtain multi-task network parameters;
And carrying out overall training on the tree structure semantic segmentation network according to the multi-task network parameters to obtain semantic segmentation algorithm network parameters.
A large number of three-dimensional tomographic images containing the same anatomical tree structure is used as the original dataset, which may originate from a public database or a collaborating hospital; the number of images should be no fewer than 50. Each image in the original dataset is preprocessed; the preprocessing comprises three parts: unifying the image resolution, cropping the image to a unified size, and normalizing the image grey values. The cropped image should contain the anatomical tree structure in the original image while removing interference such as bones and other tissues, and the cropping region can be determined according to the average distribution range of the tree structure.
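A minimal sketch of the crop-and-normalise steps (the crop box and grey-value window are illustrative assumptions, e.g. a CT Hounsfield-style window, not values prescribed here):

```python
import numpy as np

def preprocess(volume, crop_box, clip_lo=-1000.0, clip_hi=400.0):
    """Crop a tomographic volume to the tree-structure region and
    normalise grey values to [0, 1]."""
    z0, z1, y0, y1, x0, x1 = crop_box
    v = volume[z0:z1, y0:y1, x0:x1].astype(float)
    v = np.clip(v, clip_lo, clip_hi)                   # clamp grey window
    return (v - clip_lo) / (clip_hi - clip_lo)         # scale to [0, 1]

vol = np.full((10, 10, 10), -1000.0)                   # toy "air" background
vol[4:6, 4:6, 4:6] = 400.0                             # toy dense region
out = preprocess(vol, (2, 8, 2, 8, 2, 8))
```

Resolution unification (resampling to a common voxel spacing) would precede this step and typically uses a dedicated interpolation routine.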
Further, in the embodiment of the present application, labeling the preprocessed data set includes the following steps:
Manually labeling each image in the preprocessed data set by using medical image processing software, wherein the content of the manual labeling comprises a predefined tree structure bifurcation key point and a tree structure integral segmentation label;
Based on the tree structure bifurcation key points and the tree structure integral segmentation labels, generating tree structure semantic segmentation labels corresponding to each image in the data set by using an automatic method, and then manually correcting to finish the labels.
Each image in the preprocessed dataset is labelled manually using medical image processing software (such as 3D Slicer); the labelling content comprises two parts: the predefined tree structure bifurcation key points and the whole tree structure segmentation. The labelling result for the bifurcation key points is a series of three-dimensional key point coordinates corresponding to the image (for example, stored in fcsv format), and the labelling result for the whole segmentation is a voxel-wise binary image corresponding to the image (in the binary image, voxels in the tree structure region have value 1, and voxels in the remaining background region have value 0).
Based on the tree structure bifurcation key points and the whole segmentation labels, generating tree structure semantic segmentation labels corresponding to each image in the data set by using an automatic method. Specifically, the overall segmentation labeling refinement is used to obtain the central line of the lumen of the tree-shaped structure, and for the tree-shaped structure (such as a tracheal tree) with small change of the curvature of the lumen, the central line can be approximated by using sequential connecting lines among bifurcation key points. According to the anatomical topology of the tree structure, the whole segmentation label is divided into a plurality of areas by using the positions of the bifurcation key points, the bifurcation key points are positioned at the central positions of two adjacent semantic class interface surfaces, and the interface surfaces are perpendicular to the central line at the key points. And then, manually adjusting semantic segmentation labels automatically generated by each image in medical image processing software to obtain final semantic segmentation labels.
Further, in the embodiment of the application, data preparation is performed on the preprocessed data set, and the specific process is that a probability heat map, an integral segmentation probability map and a semantic segmentation probability map are generated according to a manual labeling result, each image in the preprocessed data set, the corresponding probability heat map, the integral segmentation probability map and the corresponding semantic segmentation probability map form a training data pair, and all the training data pairs jointly form the training data set.
The training dataset is prepared from the preprocessed original dataset and the manual labelling results, i.e. the manual labelling result of each image is used to obtain the prediction targets of the three tasks in the multi-task network feature extraction module: bifurcation key point detection, whole tree structure segmentation, and semantic segmentation.
The bifurcation key point detection task is designed based on the Gaussian heat map regression method: the network is required to output, for each key point, a probability heat map with the same scale as the input image. Specifically, for a preprocessed image, each bifurcation key point in the image is taken in turn as the target key point; the corresponding probability heat map follows a Gaussian distribution centred on that key point, and the value of each voxel in the heat map reflects the probability that the voxel belongs to the target key point. The probability value is determined by the Euclidean distance from the voxel to the target key point and decreases outwards from 1 at the target key point position towards 0; the shape of the Gaussian distribution is determined by the standard deviation δ. The heat map corresponding to each preprocessed image is computed as:

H_k(x) = exp(−‖x − x_k‖² / (2δ²)), k = 1, 2, …, N_l

wherein x_k is the spatial coordinate of the kth bifurcation key point of any image in the preprocessed dataset, x is any voxel spatial position, H_k(x) is the heat map probability value of that key point at the voxel position, and N_l is the total number of bifurcation key points of the tree structure.
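The Gaussian heat-map target can be generated directly from this formula; a numpy sketch (grid size and δ are illustrative):

```python
import numpy as np

def gaussian_heatmap(shape, keypoint, delta=1.5):
    """H_k(x) = exp(-||x - x_k||^2 / (2 * delta^2)) over a 3-D voxel grid."""
    zz, yy, xx = np.indices(shape)
    grid = np.stack([zz, yy, xx], axis=-1).astype(float)
    sq = ((grid - np.asarray(keypoint, float)) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * delta ** 2))

H = gaussian_heatmap((9, 9, 9), keypoint=(4, 4, 4), delta=1.5)
# peak value 1 at the key point, decaying towards 0 with distance
```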
And generating an integral segmentation prediction target, namely a binary image with the same scale as the input image, by applying the tree structure integral segmentation labeling. Wherein, the corresponding region voxel value of the tree structure is 1, and the voxel values of the rest background regions are 0.
And generating a multi-channel semantic segmentation prediction target by applying tree structure semantic segmentation labels, wherein each channel is a binary image with the same scale as the input image. For N S semantic classes, the prediction target should contain N S +1 channels, where the first N S channels respectively reflect the distribution of each anatomical semantic segment in the input image (when a voxel belongs to the ith semantic class, the ith channel is 1, the rest of the channels are 0), and the last channel is the background channel (when a voxel does not belong to any semantic class, the background channel is 1, and the rest of the channels are 0).
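Building the (N_S + 1)-channel target from a per-voxel label map can be sketched as follows (the label encoding 0 = background, 1..N_S = semantic classes is an assumption for illustration):

```python
import numpy as np

def to_multichannel_target(label_map, n_classes):
    """Turn a per-voxel label map into an (n_classes + 1)-channel binary
    target; the last channel is the background channel."""
    target = np.zeros((n_classes + 1,) + label_map.shape, dtype=np.uint8)
    for c in range(1, n_classes + 1):
        target[c - 1] = (label_map == c)       # channel c-1: semantic class c
    target[n_classes] = (label_map == 0)       # last channel: background
    return target

labels = np.zeros((4, 4, 4), dtype=int)
labels[1, 1, 1] = 2                            # one voxel of semantic class 2
t = to_multichannel_target(labels, n_classes=3)
```

Exactly one channel is 1 at every voxel, matching the one-hot convention described above.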
The probability heat map, the whole segmentation probability map and the semantic segmentation probability map generated by each preprocessed image and the corresponding bifurcation key point label form a training data pair, and all training data pairs jointly form a training data set.
Further, in an embodiment of the present application, inputting the training data set into the multi-tasking network extraction module for pre-training comprises the steps of:
Randomly selecting a training data pair from the training data set, inputting images in the training data pair into the multi-task network module, and outputting the prediction results of three tasks;
Respectively inputting the prediction results of the three tasks and the prediction targets of the three tasks in the training data pair into the loss function corresponding to each task to obtain the loss function value, completing one pre-training iteration, wherein the loss function is as follows:

L_multi = L_key + α · L_whole + β · L_sem

wherein L_key, L_whole and L_sem are the loss functions corresponding to the key point detection, whole tree structure segmentation and semantic segmentation tasks, respectively, and α and β are hyperparameters;
And circularly executing the pre-training step, and completing the pre-training when the training times exceed the set upper limit times to obtain the multi-task network parameters.
The graph model of the graph convolution network characteristic extraction module is constructed by depending on the output result of the multi-task network, and the multi-task network is pre-trained firstly to ensure that the multi-task network can obtain a better prediction result.
Specifically, an L2 loss function is used for the key point detection task, and Dice loss functions are used for the whole segmentation and semantic segmentation tasks. Because the anatomical tree structure always occupies a small volume proportion in the three-dimensional tomographic image, the class-imbalance problem easily prevents the training process from converging; therefore each loss function is weighted, the weight being the ratio of the number of voxels in the input image to the number of voxels belonging to the foreground class of each task. Specifically, for the key point detection task, voxels with a non-zero probability value in the heat map prediction target are taken as foreground; for the whole segmentation and semantic segmentation tasks, the foreground class is the corresponding segmentation region.
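A minimal sketch of such a foreground-weighted Dice loss (the weighting scheme follows the description above; implementation details are assumptions):

```python
import numpy as np

def weighted_dice_loss(pred, target, eps=1e-6):
    """Dice loss scaled by a class-imbalance weight: the ratio of total
    voxels to foreground voxels in the target."""
    w = target.size / max(target.sum(), 1)            # imbalance weight
    inter = (pred * target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    return w * (1.0 - dice)

target = np.zeros((8, 8, 8))
target[3:5, 3:5, 3:5] = 1.0          # small foreground, as in a tree lumen
perfect = weighted_dice_loss(target, target)            # near-zero loss
bad = weighted_dice_loss(np.zeros_like(target), target)  # heavily penalised
```

The weight amplifies the gradient contribution of the rare foreground class, countering the imbalance discussed above.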
Each time a training data pair is input, the network computes the loss function value L_multi and adjusts its parameters by minimizing this value with gradient descent, completing one training iteration. When the number of training iterations exceeds the upper limit L_1, the pre-training of the multi-task network is complete, yielding the pre-trained multi-task network parameters θ_1.
Further, in an embodiment of the present application, the overall training includes the steps of:
generating training parameters according to the multitasking network parameters, the random initialization graph convolution network parameters and the inference module network parameters, wherein the specific process is as follows:
θ=θ1∪θ2∪θ3
Wherein θ is a training parameter, θ 1 is a multitasking network parameter, θ 2 is a random initialization graph convolution network parameter, and θ 3 is an inference module network parameter;
Randomly selecting a training data pair from the training data set, inputting the images in the training data pair into a multi-task network feature extraction module, and obtaining an output result and an output feature diagram of a trunk part;
the graph convolution network feature extraction module performs graph model construction and feature extraction according to the output result to generate a new feature graph;
the inference module generates a final prediction result of semantic segmentation according to the output feature map of the trunk part and the new feature map;
inputting the final prediction result and the semantic segmentation probability map in the training data pair into a loss function to obtain the loss function value, completing one overall training iteration, wherein the loss function is as follows:

L = L_final + γ · L_multi

wherein L_final is the loss function of the final prediction result, for which the Dice loss function is selected, L_multi is the multi-task network loss function, and the hyperparameter γ balances the magnitudes of the two terms;
and circularly executing the whole training step, and completing the whole training when the training times exceed the preset upper limit to obtain the semantic segmentation algorithm network parameters.
After the pre-training of the multi-task network feature extraction module is completed, the whole training is carried out on the tree-structure semantic segmentation network.
And selecting the Dice loss function as a loss function of a final prediction result, and weighting the loss function, wherein the weight is the ratio of the number of voxels in the input image to the number of voxels belonging to the foreground class in each task.
Each time a training data pair is input, the network completes one parameter adjustment. When the number of training iterations exceeds the upper limit L_2, the overall training is complete, yielding the trained semantic segmentation network parameters θ.
Further, in the embodiment of the present application, the preprocessed image is input into the tree structure semantic segmentation network to obtain the semantic segmentation prediction result corresponding to the image, which specifically comprises the following steps:
Inputting the preprocessed image into a multi-task network feature extraction module to obtain a corresponding bifurcation key point detection result, a tree structure integral segmentation result, a semantic segmentation prediction result and an output feature map of a trunk part;
inputting the bifurcation key point detection result, the tree structure integral segmentation result and the semantic segmentation prediction result into a graph convolution feature extraction module for graph model construction and further feature optimization, and outputting a feature graph;
Inputting the output feature map and the feature map of the trunk part into an inference module to obtain a multichannel predictive probability map;
and comparing the predicted probability values of all the channels for each voxel position in the multi-channel predicted probability map, wherein the channel corresponding to the maximum probability value is the semantic class to which the voxel belongs, so that the final predicted result of the semantic segmentation corresponding to the input image is obtained.
When the background channel probability value is maximum, the voxel does not belong to the tree structure region.
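The channel-wise argmax decoding in the last step can be sketched as follows (the channel ordering, with background last, follows the target layout described earlier; names are illustrative):

```python
import numpy as np

def decode_prediction(prob_map):
    """Per-voxel argmax over a multi-channel probability map whose last
    channel is the background: voxels where the background channel wins
    get label 0, others get the 1-based semantic class index."""
    n_classes = prob_map.shape[0] - 1
    winner = prob_map.argmax(axis=0)
    out = winner + 1
    out[winner == n_classes] = 0          # background channel won
    return out

# 2 semantic classes + background, on a 1x1x2 toy volume
probs = np.zeros((3, 1, 1, 2))
probs[0, 0, 0, 0] = 0.9                   # class 1 wins at voxel 0
probs[2, 0, 0, 1] = 0.8                   # background wins at voxel 1
seg = decode_prediction(probs)
```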
The inference module consists of 2 three-dimensional 1×1×1 convolution layers with nonlinear activation functions; it takes the output feature map F_0 of the trunk part of the multi-task network and the output feature map F_2 of the graph convolutional network together as input and completes the final prediction of tree structure semantic segmentation. Specifically, the output feature map F_2 of the graph convolutional network is first projected to a sparse matrix F_3, i.e. the feature vector of each node v_i ∈ V (i = 1, 2, …, N) is placed at the corresponding position of the matrix F_3 according to its three-dimensional coordinates. Thereafter, feature maps F_0 and F_3 are concatenated along the channel dimension into feature map F_4, which serves as the input feature map of the inference module. In the inference module, the first convolution layer does not change the number of channels of the feature map, while the output of the second convolution layer matches the output size of the semantic segmentation branch in the multi-task network, i.e. a multi-channel prediction probability map with the same size as the input image and a channel number equal to the number of semantic classes plus 1. The output of the inference module is the final prediction result of the tree structure semantic segmentation network.
Fig. 2 is another flowchart of a semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application.
As shown in fig. 2, the semantic segmentation method of the tree structure in the three-dimensional tomographic image comprises an offline stage and an online stage, wherein the offline stage comprises acquisition, preprocessing and manual labeling of an original data set; preparing a training data set; inputting a tree structure semantic segmentation deep learning network; outputting a tree structure semantic segmentation prediction result; calculating a loss function optimizing network function and the like, wherein the online stage comprises the steps of acquiring a three-dimensional tomographic image; preprocessing data; inputting a tree structure semantic segmentation deep learning network; outputting tree structure semantic segmentation prediction results and the like.
Fig. 3 is a schematic diagram of a result of labeling and data generation of a tracheal tree of a semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application.
As shown in fig. 3, the semantic segmentation method of the tree structure in the three-dimensional tomographic image performs preprocessing on each image in the original data set, and the preprocessing process comprises three parts, namely unified image resolution, image clipping to a unified size and image gray value normalization. The preprocessed chest tomographic image is shown in fig. 3 (a). The entire segmentation markers of the tracheal tree in the chest tomographic image are shown in fig. 3 (B). The overall segmentation labeling refinement is used for obtaining the central line of the lumen of the tree-shaped structure, and for the tree-shaped structure (such as a tracheal tree) with small change of the curvature of the lumen, the central line can be approximated by using the sequential connecting lines among the bifurcation key points. According to the anatomical topology of the tree structure, the whole segmentation label is divided into a plurality of areas by using the positions of the bifurcation key points, the bifurcation key points are positioned at the central positions of two adjacent semantic class interface surfaces, and the interface surfaces are perpendicular to the central line at the key points. Bifurcation keypoint labeling of the tracheal tree in the thoracic tomographic image is shown in fig. 3 (C). And manually adjusting semantic segmentation labels automatically generated by each image in medical image processing software to obtain final semantic segmentation labels. Semantic segmentation labels of the tracheal tree in the chest tomographic image are shown in fig. 3 (D). And for a preprocessed image, taking each bifurcation key point in the image as a target key point, wherein the corresponding probability heat map takes the key point as a center to be in Gaussian distribution, and the value of each voxel in the heat map reflects the probability that the voxel belongs to the target key point. 
A heat map generated for each labeling key point in the chest tomographic image is shown in fig. 3 (E). For convenient observation, the three-dimensional heat map of each key point is projected to the same plane. And generating an integral segmentation prediction target, namely a binary image with the same scale as the input image, by applying the tree structure integral segmentation labeling. An overall segmented binary image of the tracheal tree in the thoracic tomographic image is shown in fig. 3 (F). And generating a multi-channel semantic segmentation prediction target by applying tree structure semantic segmentation labels, wherein each channel is a binary image with the same scale as the input image. The semantic segmentation prediction target of the tracheal tree in the thoracic tomographic image is shown in fig. 3 (G).
Fig. 4 is a schematic diagram of an online stage tree-structure semantic segmentation network of the semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application.
As shown in fig. 4, the semantic segmentation method of the tree structure in the three-dimensional tomographic image constructs a tree structure semantic segmentation algorithm network, the network is composed of two parts of a feature extraction module and an inference module, wherein an algorithm input image is a preprocessed three-dimensional single tomographic image, the feature extraction module comprises a multi-task network and a graph convolution network, and the inference module takes the output of the feature extraction module as the input and outputs the prediction result of the tree structure semantic segmentation.
Fig. 5 is a schematic diagram of an offline-stage tree-structure semantic segmentation network of a semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application.
As shown in fig. 5, the semantic segmentation method of the tree structure in the three-dimensional tomographic image constructs a tree structure semantic segmentation algorithm network, wherein the network is composed of a feature extraction module and an inference module, the feature extraction module comprises a multi-task network and a graph convolution network, and the inference module takes the output of the feature extraction module as input and outputs the prediction result of the tree structure semantic segmentation.
Fig. 6 is a schematic diagram of a multi-task network structure of a semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application.
As shown in FIG. 6, in the semantic segmentation method of the tree structure in the three-dimensional tomographic image, the multi-task network feature extraction module takes a U-Net network as its main body and consists of a trunk part and 3 parallel branches. The trunk part comprises a symmetric compression path and expansion path; to better preserve local features and fuse global and local features, skip connections, i.e. channel-dimension concatenation between feature maps of the same scale, are added between symmetric layers on the compression and expansion paths. The output feature map of the trunk part is fed simultaneously into the three parallel branches of the network, each branch consisting of a residual module and a three-dimensional convolution layer with a convolution kernel size of 1×1×1. The outputs of the three branches are, respectively, the bifurcation key point multi-channel heat map prediction result, the tree structure whole segmentation probability map prediction result, and the tree structure semantic segmentation multi-channel probability map prediction result corresponding to the input image.
Fig. 7 is a schematic diagram of a semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application.
As shown in fig. 7, in the semantic segmentation method of the tree structure in the three-dimensional tomographic image, if a node v_i ∈ V_1 is predicted to lie on the RMB semantic segment in the semantic segmentation result of the multi-task network (i.e. the channel corresponding to the RMB class has the maximum prediction probability value at the node position), and the RMB semantic segment separates from the trunk of the tracheal tree at point A and splits downstream into the RUL and BronInt branches at point B, then an edge e_ij = 1 is set between node v_i and each sampled node v_j near the predicted position of key point A or B. The edges between neighbourhood nodes, together with the edges between nodes on a semantic class region and the nodes near the corresponding key points, form the edge set E of the graph model. The graph model G(V, E) is thus constructed. A sparse binary adjacency matrix A ∈ {0, 1}^(N×N) can be generated from the edge set E, where the element A_ij = 1 in A when edge e_ij = 1, and A_ij = 0 otherwise.
Fig. 8 is a schematic diagram of a semantic segmentation prediction result of a tracheal tree of the semantic segmentation method of a tree structure in a three-dimensional tomographic image according to an embodiment of the present application.
As shown in fig. 8, the semantic segmentation method of the tree structure in the three-dimensional tomographic image inputs the preprocessed three-dimensional tomographic image into the trained tree structure semantic segmentation network. Specifically, the multi-task network obtains the corresponding bifurcation key point detection, whole tree structure segmentation and semantic segmentation prediction results from the input image; the graph convolutional network constructs a graph model on the prediction results of the multi-task network to complete further feature optimization; and the inference module then combines the feature maps obtained by the multi-task network and the graph convolutional network to output the multi-channel prediction probability map of the semantic segmentation task. For each voxel position in the prediction probability map, the predicted probability values of all channels are compared, and the channel with the maximum probability value is the semantic class to which the voxel belongs (when the background channel has the maximum probability value, the voxel does not belong to the tree structure region), thereby obtaining the final semantic segmentation prediction result corresponding to the input image.
The second embodiment of the present application provides a semantic segmentation device for a tree structure in a three-dimensional tomographic image, including:
The acquisition module is used for acquiring a three-dimensional tomographic image to be tested;
the preprocessing module is used for preprocessing the image to obtain a preprocessed image, wherein the preprocessing comprises unified image resolution, cutting the image to a unified size and normalizing an image gray value;
The prediction module is used for inputting the preprocessed image into the tree structure semantic segmentation network to obtain a semantic segmentation prediction result corresponding to the image.
The semantic segmentation device of the tree structure in the three-dimensional tomographic image solves the problems faced by conventional methods: the anatomical tree structure is distributed in a complex way, its form varies widely between individuals, each semantic class occupies a very small volume proportion in the tomographic image, there is no obvious boundary plane between classes, and pathology can affect the local appearance and grey-value distribution of the image. It uses a multi-task fully convolutional network and a graph convolutional network to jointly complete feature extraction and obtain the tree structure semantic segmentation result, models spatial context information, and explicitly introduces structural prior knowledge; it can be widely applied to semantic segmentation tasks for various anatomical tree structures, such as trachea, arteries and veins, and achieves good segmentation performance. In order to implement the above embodiments, the present application further provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the semantic segmentation method and device for a tree structure in a three-dimensional tomographic image of the above embodiments.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method description in a flow chart, or otherwise described herein, may be understood as representing a module, segment, or portion of code which includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending on the functionality involved, as would be understood by those skilled in the art.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example, an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one of, or a combination of, the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (8)

1. A semantic segmentation method of a tree structure in a three-dimensional tomographic image, comprising the steps of:
acquiring a three-dimensional tomographic image to be tested;
preprocessing the image to obtain a preprocessed image, wherein the preprocessing comprises unifying the image resolution, cropping the image to a unified size and normalizing the gray values of the image;
inputting the preprocessed image into a tree structure semantic segmentation network to obtain a semantic segmentation prediction result corresponding to the image;
The tree structure semantic segmentation network is constructed in advance, wherein the tree structure semantic segmentation network comprises a feature extraction module and an inference module, and the feature extraction module comprises a multi-task network feature extraction module and a graph convolution network feature extraction module;
Inputting the preprocessed image into a tree-structure semantic segmentation network to obtain a semantic segmentation prediction result corresponding to the image, wherein the semantic segmentation prediction result comprises the following steps:
Inputting the preprocessed image into the multi-task network feature extraction module to obtain a corresponding bifurcation key point detection result, an overall tree structure segmentation result, a semantic segmentation prediction result and an output feature map of the trunk part;
Inputting the bifurcation key point detection result, the overall tree structure segmentation result and the semantic segmentation prediction result into the graph convolution network feature extraction module for graph model construction and further feature optimization, and outputting a feature map;
Inputting the output feature map of the trunk part and the feature map output by the graph convolution network feature extraction module into the inference module to obtain a multi-channel prediction probability map;
And for each voxel position in the multi-channel prediction probability map, comparing the predicted probability values of all channels, wherein the channel with the maximum probability value indicates the semantic class to which the voxel belongs, thereby obtaining the final semantic segmentation prediction result corresponding to the input image.
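The preprocessing and final argmax-decoding steps of claim 1 can be sketched as follows. This is an illustrative NumPy/SciPy sketch only: the target spacing, crop size, and normalization scheme are assumptions for demonstration, not values fixed by the patent.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(volume, spacing, target_spacing=(1.0, 1.0, 1.0),
               target_shape=(128, 128, 128)):
    """Unify resolution, crop/pad to a unified size, normalize gray values.

    `target_spacing` and `target_shape` are illustrative assumptions."""
    # Resample so every image has the same voxel spacing.
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    volume = zoom(volume, factors, order=1)
    # Center-crop (or zero-pad) to the unified size.
    out = np.zeros(target_shape, dtype=np.float32)
    src = tuple(slice(max((v - t) // 2, 0),
                      max((v - t) // 2, 0) + min(v, t))
                for v, t in zip(volume.shape, target_shape))
    dst = tuple(slice(max((t - v) // 2, 0),
                      max((t - v) // 2, 0) + min(v, t))
                for v, t in zip(volume.shape, target_shape))
    out[dst] = volume[src]
    # Normalize gray values to zero mean, unit variance.
    return (out - out.mean()) / (out.std() + 1e-8)

def decode_prediction(prob_map):
    """prob_map: (C, D, H, W) multi-channel prediction probability map.

    For each voxel, the channel with the maximum probability gives the
    semantic class, i.e. an argmax over the channel axis."""
    return np.argmax(prob_map, axis=0)
```

The voxel-wise comparison of channel probabilities in the claim is exactly a channel-axis `argmax`, so `decode_prediction` returns one class label per voxel.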
2. The method of claim 1, wherein the tree-structured semantic segmentation network is trained offline, the offline training comprising the steps of:
Acquiring an original data set, and carrying out the preprocessing on the original data set to generate a preprocessed data set, wherein each image in the preprocessed data set has the same resolution and size as those of the preprocessed image and contains the same anatomical tree structure;
labeling the preprocessed data set to generate a manual labeling result, wherein the manual labeling result comprises tree structure bifurcation key points, overall tree structure segmentation labels and tree structure semantic segmentation labels;
preparing data of the preprocessed data set according to the manual labeling result, and generating a training data set;
inputting the training data set into the multi-task network feature extraction module for pre-training to obtain the multi-task network parameters;
and carrying out overall training on the tree structure semantic segmentation network according to the multi-task network parameters to obtain the semantic segmentation algorithm network parameters.
3. The method of claim 2, wherein labeling the preprocessed data set comprises the steps of:
Applying medical image processing software to manually label each image in the preprocessed data set, wherein the manual labeling comprises the predefined tree structure bifurcation key points and an overall tree structure segmentation label;
based on the tree structure bifurcation key points and the overall tree structure segmentation labels, generating the tree structure semantic segmentation label corresponding to each image in the data set by an automatic method, and then manually correcting it to obtain the final label.
4. The method of claim 3, wherein the data preparation of the preprocessed data set comprises: generating a probability heat map, an overall segmentation probability map and a semantic segmentation probability map according to the manual labeling result; forming a training data pair from each image in the preprocessed data set together with its corresponding probability heat map, overall segmentation probability map and semantic segmentation probability map; and forming the training data set from all of the training data pairs.
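Probability heat maps for key point detection targets, as in the data preparation of claim 4, are commonly built by placing a Gaussian response at each annotated bifurcation key point. The following sketch assumes that convention; the Gaussian σ and the grid size are illustrative assumptions, not values from the patent.

```python
import numpy as np

def keypoint_heatmap(shape, keypoints, sigma=2.0):
    """Generate a 3D probability heat map for bifurcation key points.

    shape:     (D, H, W) of the preprocessed volume.
    keypoints: list of (z, y, x) voxel coordinates from the manual labels.
    sigma:     spread of each Gaussian bump (illustrative assumption)."""
    zz, yy, xx = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    heat = np.zeros(shape, dtype=np.float32)
    for z, y, x in keypoints:
        g = np.exp(-((zz - z) ** 2 + (yy - y) ** 2 + (xx - x) ** 2)
                   / (2.0 * sigma ** 2))
        # Overlapping bumps keep the maximum response at each voxel.
        heat = np.maximum(heat, g)
    return heat
```

A training data pair then bundles the preprocessed image with this heat map plus the overall segmentation and semantic segmentation probability maps derived from the labels.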
5. The method of claim 3, wherein inputting the training data set into the multi-tasking network extraction module for pre-training comprises the steps of:
Randomly selecting a training data pair from the training data set, inputting the image in the training data pair into the multi-task network feature extraction module, and outputting the prediction results of the three tasks;
Respectively inputting the prediction results of the three tasks and the corresponding prediction targets in the training data pair into the loss function of each task to obtain a loss function value, thereby completing one pre-training iteration, wherein the overall loss function is:
L_multi = λ_1 L_key + λ_2 L_whole + λ_3 L_sem
wherein L_key, L_whole and L_sem are the loss functions corresponding to the key point detection task, the overall tree structure segmentation task and the semantic segmentation task, respectively, and λ_1, λ_2 and λ_3 are hyperparameters;
And circularly executing the pre-training step, and completing the pre-training when the number of training iterations exceeds the set upper limit, to obtain the multi-task network parameters.
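The pre-training objective of claim 5 combines the three task losses into a single weighted sum. A minimal NumPy sketch follows; the choice of binary cross-entropy for the individual task losses and the weight values are illustrative assumptions, since the patent text does not fix them here.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Voxel-wise binary cross-entropy, a common choice for heat map and
    segmentation probability targets (an assumption, not specified here)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))

def multitask_loss(preds, targets, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the key point detection, overall segmentation and
    semantic segmentation losses; `weights` are the hyperparameters."""
    names = ("keypoint", "whole_seg", "semantic_seg")
    return sum(w * bce(preds[n], targets[n])
               for w, n in zip(weights, names))
```

One pre-training iteration evaluates this combined value on a randomly selected training data pair and backpropagates it through the multi-task network.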
6. The method of claim 3, wherein the overall training comprises the steps of:
Generating the training parameters from the multi-task network parameters, the randomly initialized graph convolution network parameters and the inference module network parameters, wherein the specific process is:
θ = (θ_M, θ_G, θ_I)
wherein θ denotes the training parameters, θ_M the multi-task network parameters, θ_G the randomly initialized graph convolution network parameters, and θ_I the inference module network parameters;
Randomly selecting a training data pair from the training data set, and inputting the image in the training data pair into the multi-task network feature extraction module to obtain the output results and the output feature map of the trunk part;
The graph convolution network feature extraction module performs graph model construction and feature extraction according to the output results to generate a new feature map;
The inference module generates a final prediction result of semantic segmentation according to the output feature map of the trunk part and the new feature map;
Inputting the final prediction result and the semantic segmentation probability map in the training data pair into a loss function to calculate a loss function value, thereby completing one overall training iteration, wherein the loss function is:
L = L_final + β L_multi
wherein L_final is the loss function of the final prediction result, for which the Dice loss is selected; L_multi is the multi-task network loss function; and β is a hyperparameter used to balance the magnitudes of the two terms;
And circularly executing the overall training step, and completing the overall training when the number of training iterations exceeds the preset upper limit, to obtain the semantic segmentation algorithm network parameters.
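Claim 6 selects the Dice loss for the final prediction and balances it against the multi-task loss with a hyperparameter. A minimal sketch of that combination follows; the smoothing constant and the balancing value used in the default argument are illustrative assumptions.

```python
import numpy as np

def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss for one class:
    1 - (2 * intersection + smooth) / (|P| + |T| + smooth).
    `smooth` avoids division by zero (illustrative assumption)."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + smooth) / (np.sum(pred) + np.sum(target) + smooth)

def overall_loss(final_pred, final_target, l_multi, beta=0.1):
    """L = L_final + beta * L_multi, where L_final is the Dice loss on the
    final prediction and beta balances the magnitudes of the two terms
    (beta = 0.1 is an illustrative value, not one from the patent)."""
    return dice_loss(final_pred, final_target) + beta * l_multi
```

A perfect prediction drives the Dice term to zero, so the overall loss then reduces to the weighted multi-task term alone.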
7. A semantic segmentation apparatus for a tree structure in a three-dimensional tomographic image, comprising:
The acquisition module is used for acquiring a three-dimensional tomographic image to be tested;
The preprocessing module is used for preprocessing the image to obtain a preprocessed image, wherein the preprocessing comprises unifying the image resolution, cropping the image to a unified size and normalizing the gray values of the image;
The prediction module is used for inputting the preprocessed image into a tree structure semantic segmentation network to obtain a semantic segmentation prediction result corresponding to the image;
The tree structure semantic segmentation network is constructed in advance, wherein the tree structure semantic segmentation network comprises a feature extraction module and an inference module, and the feature extraction module comprises a multi-task network feature extraction module and a graph convolution network feature extraction module;
The prediction module is further used for inputting the preprocessed image into the multi-task network feature extraction module to obtain a corresponding bifurcation key point detection result, an overall tree structure segmentation result, a semantic segmentation prediction result and an output feature map of the trunk part; inputting the bifurcation key point detection result, the overall tree structure segmentation result and the semantic segmentation prediction result into the graph convolution network feature extraction module for graph model construction and further feature optimization, and outputting a feature map; inputting the output feature map of the trunk part and the feature map output by the graph convolution network feature extraction module into the inference module to obtain a multi-channel prediction probability map; and for each voxel position in the multi-channel prediction probability map, comparing the predicted probability values of all channels, wherein the channel with the maximum probability value indicates the semantic class to which the voxel belongs, thereby obtaining the final semantic segmentation prediction result corresponding to the input image.
8. A non-transitory computer readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the method according to any of claims 1-6.
CN202110618838.7A 2021-06-03 2021-06-03 Semantic segmentation method and device for tree structure in three-dimensional tomographic image Active CN113192069B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110618838.7A CN113192069B (en) 2021-06-03 2021-06-03 Semantic segmentation method and device for tree structure in three-dimensional tomographic image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110618838.7A CN113192069B (en) 2021-06-03 2021-06-03 Semantic segmentation method and device for tree structure in three-dimensional tomographic image

Publications (2)

Publication Number Publication Date
CN113192069A (en) 2021-07-30
CN113192069B (en) 2024-07-12 (grant)

Family

ID=76975824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110618838.7A Active CN113192069B (en) 2021-06-03 2021-06-03 Semantic segmentation method and device for tree structure in three-dimensional tomographic image

Country Status (1)

Country Link
CN (1) CN113192069B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12094596B2 (en) * 2021-04-23 2024-09-17 Shenzhen Keya Medical Technology Corporation Method and system for anatomical labels generation
CN113610808B (en) * 2021-08-09 2023-11-03 中国科学院自动化研究所 Group brain map individualization method, system and device based on individual brain connection map
CN113744215B (en) * 2021-08-24 2024-05-31 清华大学 Extraction method and device for central line of tree-shaped lumen structure in three-dimensional tomographic image
CN113962959A (en) * 2021-10-21 2022-01-21 苏州微创畅行机器人有限公司 Three-dimensional image processing method, three-dimensional image processing device, computer equipment and storage medium
CN114943682B (en) * 2022-02-25 2024-11-26 清华大学 Method and device for detecting anatomical key points in three-dimensional angiography images
CN115031756A (en) * 2022-06-16 2022-09-09 北京百度网讯科技有限公司 Travel information determination method and device and computer program product
WO2023240584A1 (en) * 2022-06-17 2023-12-21 之江实验室 Cross-media knowledge semantic expression method and apparatus
CN118552504B (en) * 2024-05-30 2024-11-15 汕头大学医学院第一附属医院 Ultrasonic image detection method and system based on artificial intelligence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109272510A (en) * 2018-07-24 2019-01-25 清华大学 Segmentation method of tubular structure in three-dimensional medical image
CN110378913A (en) * 2019-07-18 2019-10-25 深圳先进技术研究院 Image partition method, device, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10430949B1 (en) * 2018-04-24 2019-10-01 Shenzhen Keya Medical Technology Corporation Automatic method and system for vessel refine segmentation in biomedical images using tree structure based deep learning model
CN112541893B (en) * 2020-12-11 2022-11-11 清华大学 Method for detecting tree-structure bifurcation key points in three-dimensional tomography image
CN112651969B (en) * 2021-02-08 2023-04-07 福州大学 Trachea tree hierarchical extraction method combining multi-information fusion network and regional growth

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109272510A (en) * 2018-07-24 2019-01-25 清华大学 Segmentation method of tubular structure in three-dimensional medical image
CN110378913A (en) * 2019-07-18 2019-10-25 深圳先进技术研究院 Image partition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113192069A (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN113192069B (en) Semantic segmentation method and device for tree structure in three-dimensional tomographic image
CN112541893B (en) Method for detecting tree-structure bifurcation key points in three-dimensional tomography image
US20230104173A1 (en) Method and system for determining blood vessel information in an image
WO2020001217A1 (en) Segmentation method for dissected aorta in ct image based on convolutional neural network
CN111476796B (en) Semi-supervised coronary artery segmentation system and segmentation method combining multiple networks
US11972571B2 (en) Method for image segmentation, method for training image segmentation model
CN111932554B (en) Lung vessel segmentation method, equipment and storage medium
CN114943682B (en) Method and device for detecting anatomical key points in three-dimensional angiography images
CN112308846B (en) Blood vessel segmentation method and device and electronic equipment
CN111798458B (en) Interactive medical image segmentation method based on uncertainty guidance
CN112734755A (en) Lung lobe segmentation method based on 3D full convolution neural network and multitask learning
CN113744215B (en) Extraction method and device for central line of tree-shaped lumen structure in three-dimensional tomographic image
US12125208B2 (en) Method and arrangement for automatically localizing organ segments in a three-dimensional image
WO2006000953A1 (en) Displaying a tracheobronchial tree
JP2023548041A (en) Method and system for segmentation and identification of at least one tubular structure in medical images
CN111583385A (en) A personalized deformation method and system for a deformable digital human anatomy model
CN111932495B (en) Medical image detection method, device and storage medium
CN111127487B (en) A Real-time Multi-Tissue Medical Image Segmentation Method
Guo et al. ELTS-Net: An enhanced liver tumor segmentation network with augmented receptive field and global contextual information
Heitz et al. Lubrav: a new framework for the segmentation of the lung’s tubular structures
CN117765081A (en) Preoperative target plane positioning method and device and medical scanning imaging equipment
US20240303927A1 (en) Systems and methods for automatic blood vessel extraction
Zhou et al. A new segment method for pulmonary artery and vein
CN117083631A (en) System and method for automatic vascular extraction
Cai et al. [Retracted] Detection of 3D Arterial Centerline Extraction in Spiral CT Coronary Angiography

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant