Semi-Supervised Left-Atrial Segmentation Based on Squeeze–Excitation and Triple Consistency Training

Dongsheng Wang^1,2, Tiezhen Xv^1,2,*, Jianshen Li^1,2, Jiehui Liu^1,2, Jinxi Guo^1,2 and Lijie Yang^1,2

^1 School of Mechanical and Equipment Engineering, Hebei University of Engineering, Handan 056038, China

^2 Key Laboratory of Intelligent Industrial Equipment Technology of Hebei Province, Hebei University of Engineering, Handan 056038, China

^* Author to whom correspondence should be addressed.

Symmetry 2024, 16(8), 1041; https://doi.org/10.3390/sym16081041

Submission received: 21 June 2024 / Revised: 4 August 2024 / Accepted: 6 August 2024 / Published: 14 August 2024

(This article belongs to the Section Computer)

Featured Application

This study focuses on the segmentation of 3D medical images. Due to the difficulty in obtaining labels for medical images, it is challenging to achieve effective segmentation using fully supervised networks. Therefore, semi-supervised networks have become a better choice. To enhance the segmentation performance of semi-supervised learning methods on the left-atrium database, this paper proposes a triple consistency segmentation network based on squeeze–excitation (SETC-Net). Our semi-supervised medical image segmentation network achieves state-of-the-art segmentation performance on the left-atrium database.

Abstract

Convolutional neural networks (CNNs) have achieved remarkable success in fully supervised medical image segmentation tasks. However, acquiring large quantities of homogeneous labeled data is challenging, which has made semi-supervised training methods that rely on a small amount of labeled data and pseudo-labels increasingly popular in recent years. Most existing semi-supervised learning methods, however, underestimate the importance of the unlabeled regions during training. This paper posits that these regions may contain crucial information for reducing the uncertainty of the model’s predictions. To enhance segmentation performance on the left-atrium database, this paper proposes a triple consistency segmentation network based on the squeeze-and-excitation mechanism (SETC-Net). Specifically, the paper constructs a symmetric architectural unit called SEConv, which adaptively recalibrates channel-wise feature responses by modeling inter-channel correlations. This allows the network to weigh each channel according to the task’s needs, thereby emphasizing or suppressing different feature channels. Moreover, SETC-Net is composed of an encoder and three slightly different decoders; the prediction discrepancies among the three decoders are converted into an unsupervised loss through an iterative pseudo-labeling scheme, thus encouraging consistent and low-entropy predictions. This allows the model to gradually capture generalized features from these challenging unlabeled regions. We evaluated the proposed SETC-Net on the public left-atrium (LA) database. The proposed method achieved a Dice score of 91.14% using only 20% of the labeled data. The experiments demonstrate that the proposed SETC-Net outperforms seven current semi-supervised methods in left-atrium segmentation and is one of the best semi-supervised segmentation methods on the LA database.

Keywords: squeeze–excitation; semi-supervised learning; medical image segmentation; attention; mutual consistency

1. Introduction

The precise segmentation of organs or tissues is a fundamental and crucial step in constructing computer-aided diagnosis (CAD) systems [1]. Various supervised learning-based medical image segmentation techniques have been proposed [2,3,4], which typically require large amounts of annotated data. However, obtaining such large annotated datasets in medical imaging is time-consuming and expensive, as annotations need to be performed by clinical experts. Because unlabeled data are much easier to collect, semi-supervised segmentation methods are highly desirable; they have received increasing attention in recent years and are gradually becoming a major trend for future development.

In semi-supervised medical image segmentation tasks, labeled and unlabeled images typically stem from the same distribution. However, in real-life scenarios, labeled data are scarce, so it is challenging to accurately estimate empirical information from them, leading to empirical errors between the abundant unlabeled data and the sparse labeled data [5]. Semi-supervised segmentation methods usually treat labeled and unlabeled data in a similar fashion during training. For instance, self-training [6,7] generates pseudo-labels from a small set of labeled data and then uses these pseudo-labels to supervise the unlabeled data. Mean Teacher-based methods [8] use a consistency loss to “supervise” highly augmented unlabeled data. Some researchers have proposed semi-supervised image segmentation models incorporating consistency regularization [9,10]. For instance, Sohn et al. [9] applied “weak” and “strong” augmentations to unlabeled samples, predicted labels using the model, and used a cross-entropy loss to ensure consistent outputs under different perturbations. They confirmed that involving highly confident unlabeled data in training yields better results. Lee et al. [11] used pseudo-labels as a form of entropy regularization that encourages low-density separation between classes during training. Wu et al. [12] designed a cyclic pseudo-labeling scheme that transforms the differences in predictions between two slightly different decoders into an unsupervised loss, promoting mutual consistency training and enabling models to capture more generalized features from unknown regions. Bai et al. [13], building on the Mean Teacher architecture, bidirectionally copied and pasted images between labeled and unlabeled data, encouraging unlabeled data to learn comprehensive semantic information from both internal and external aspects of labeled data. These semi-supervised methods have shown exciting progress [14].

However, these methods focus solely on utilizing unlabeled data that belong to the same domain as the labeled data, overlooking rich prior knowledge. Chen et al. proposed semi-supervised semantic segmentation based on cross-pseudo supervision, but target distribution strategies are typically fixed and lack adaptability. Fang et al. [15] developed a collaborative training framework to train each sub-model and employed adversarial loss, further enhancing performance. Nevertheless, these deep models either underestimate the importance of some unlabeled regions during training or are constrained by insufficiently high-quality pseudo-labels, resulting in suboptimal fitting of predicted results.

Currently, most networks learn from every part of the input information with equal weight and then fine-tune based on the importance assessed by the loss function. This approach is inefficient and fails to capture the relevant features according to task requirements. Furthermore, due to limitations in training data and label quality, deep models often exhibit ambiguous predictions in complex or unclear regions. One study [14] suggested that such ambiguous predictions can lead to meaningless and unreliable guidance. Extensive previous research has attempted to enhance the representational capability of CNNs by improving the spatial encoding quality of the entire feature hierarchy [16,17]. Building on this approach, this paper proposes a semi-supervised network based on squeeze-and-excitation and triple consistency training (SETC-Net) for segmenting 3D MR left-atrial images. Hu et al. [18] proposed the “Squeeze-and-Excitation” (SE) block, which adaptively recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels [19,20]. The SE block has previously been applied primarily to 2D images; we modified it for use with 3D images and integrated it into our network architecture. We stacked it alongside convolutional blocks at the input layer to form a new SEConv module. This module dynamically adjusts the feature representation of each channel by learning importance weights, and it replaces the original convolutional blocks in our network. In previous networks, certain feature maps inevitably play a larger role during feature extraction, while others play a smaller role, and the network struggles to distinguish useful from irrelevant information. Our approach applies weights extracted directly from each channel to the feature maps, ensuring adaptive channel weighting during feature extraction. This prioritizes the more influential feature maps, thereby enhancing feature extraction compared to regular convolutional layers. Consequently, it further improves the accuracy and generalization capability of convolutional neural network models in atrial segmentation. Additionally, our SETC-Net consists of an encoder and three slightly different decoders, which capture uncertainty information by exploiting discrepancies among the three outputs. We then sharpened the probability outputs using a sharpening function [9] to generate soft pseudo-labels, followed by entropy regularization. On this basis, we devised a cyclic pseudo-labeling scheme that encourages consistency among the outputs of the different decoders, leveraging uncertainty information to learn more critical features [21]. SETC-Net thus produces consistent and low-entropy outputs for the same input, thereby reducing model-based epistemic uncertainty [9,11]. We evaluated the proposed SETC-Net on the publicly available LA database [22] and compared it with seven state-of-the-art (SOTA) methods. The experimental results demonstrate that SETC-Net outperforms current semi-supervised segmentation methods in left-atrium segmentation. Subsequent ablation studies further validate the effectiveness of each component.

Our contributions include the following: (1) A SEConv module was constructed, allowing the network to emphasize or suppress different feature channels according to the task requirements. This aids in extracting features with greater representational capacity. (2) Three slightly different decoders are employed to simulate the model’s cognitive uncertainty. Additionally, a cyclic pseudo-labeling scheme was developed to promote model training by encouraging triplet consistency, enabling the sub-models to learn from each other. (3) The experiments demonstrate that the proposed SETC-Net outperforms seven popular semi-supervised methods in left-atrium segmentation, making it one of the state-of-the-art semi-supervised approaches on the LA database.

2. Methods

The proposed SETC-Net has three key features. First, a new SEConv module was constructed. This module initially compresses the channel dimensions of the image, thus reducing the image information to focus on the main features. Then, it expands the image back to its original size using a symmetric approach to the initial compression. This process filters out finer details, retaining the most core and distinctive information of the image. This module is used to replace the first and second convolutional blocks in the encoder of the V-Net. With this new module, the network can learn to utilize global information and selectively emphasize channel features based on their relevance to the task while suppressing less useful features. Second, two additional slightly different decoders were embedded into the original V-Net [23], and the differences in the outputs of these three decoders are utilized to capture uncertainty information. Finally, a cyclic pseudo-labeling scheme was developed to leverage the uncertainty information and learn more crucial features by encouraging triplet consistency during training.

2.1. Model Architecture

In this paper, we introduce a squeeze-and-excitation (SE) block, the structure of which is illustrated in Figure 1. The input X is mapped to the feature map U through the given transformation $F_{tr}$, where $U \in \mathbb{R}^{H \times W \times D \times C}$. The squeeze operation first aggregates the feature maps across the spatial dimensions (H × W × D) by global average pooling, compressing global spatial information into a channel descriptor of channel-wise statistics. This descriptor embeds the global distribution of channel-wise feature responses, allowing information from the network’s global receptive field to be utilized by all its layers. The excitation operation follows the aggregation: the descriptor is reduced to 1/16 of the original number of channels and activated by the ReLU function, which concentrates the relevant information and attenuates the irrelevant information, and it is then expanded 16-fold back to the original number of channels to generate a set of modulation weights, one for each channel. These weights are applied to the feature map U (as shown in Figure 1) to produce the output of the SE block, which is fed directly into the subsequent layers of the network. This sequence of reduction and expansion is symmetric. As shown in Figure 1, before the SE attention mechanism is applied (left grayscale image C’), each channel in the feature map carries equal importance. The result after the SE block is illustrated by the colored image C on the right, where different colors denote different weights, indicating that the importance of each feature channel has been adjusted. This adjustment enables the network to focus more on channels with higher weights. The number of channels remains unchanged compared to the initial state; the network is simply directed to focus on certain feature channels. After the symmetric reduce-then-expand operation, the feature map retains its original size but carries attention information specific to the segmentation task: the weights of channels that are more useful for the current task are boosted, while those with minimal impact are suppressed.

Stacking this SE block with 3D convolution blocks forms a new SEConv module, as shown in Figure 2. We used this module to replace the first and second layers of convolutional blocks in the V-Net encoder, resulting in a new encoder.

The SE block first employs an adaptive average pooling operation to summarize each channel of the feature map; the importance of each channel is then learned during training through the loss function. This importance value is used to assign a weight to each feature channel. To compute it, the number of channels is first reduced by a linear transformation and then restored by a second linear transformation. Finally, the learned weights y are multiplied with the original feature map x, resulting in a new feature map that highlights channels of special interest.
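The squeeze, excitation, and scaling steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the SEConv implementation used in the paper: the function name `se_block_3d`, the random untrained weights, the omission of bias terms, and the final sigmoid gate (taken from the original SE design of Hu et al. [18]) are all assumptions.

```python
import numpy as np


def se_block_3d(u, w_reduce, w_expand):
    """Recalibrate the channels of a 3D feature map u of shape (H, W, D, C)."""
    # Squeeze: global average pooling over the spatial axes, one value per channel.
    z = u.mean(axis=(0, 1, 2))                             # shape (C,)
    # Excitation, step 1: reduce to C // r channels, then ReLU.
    hidden = np.maximum(w_reduce @ z, 0.0)                 # shape (C // r,)
    # Excitation, step 2: expand back to C channels, then a sigmoid gate in (0, 1).
    weights = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))   # shape (C,)
    # Scale: multiply each channel of u by its weight (broadcast over H, W, D).
    return u * weights, weights


rng = np.random.default_rng(0)
C, r = 32, 16                                   # r = 16 is the reduction ratio from the text
u = rng.normal(size=(8, 8, 4, C))               # toy feature map
w_reduce = rng.normal(size=(C // r, C)) * 0.1   # hypothetical, untrained weights
w_expand = rng.normal(size=(C, C // r)) * 0.1   # hypothetical, untrained weights
out, weights = se_block_3d(u, w_reduce, w_expand)
```

The output has the same shape as the input, which matches the statement that the number of channels is unchanged; only the relative magnitude of each channel differs.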

Several effective methods exist for evaluating cognitive (epistemic) uncertainty, among which Monte Carlo dropout [24] is notably prominent. For a 3D input sample $X \in \mathbb{R}^{H \times W \times D \times C}$, we perform N stochastic forward passes with dropout enabled, where the dropout layers sample sub-models $\theta_n$ from the original model $\theta$. In this manner, the deep model $\theta$ outputs a set of probability vectors $\{P_n\}_{n=1}^{N}$. The uncertainty u can be approximated by statistically aggregating the predictions of all sub-models $\theta_n$. For instance, related work [25] utilizes Monte Carlo dropout to estimate the uncertainty u as follows:

$$\mu_c = \frac{1}{N} \sum_{n=1}^{N} P_n^c, \qquad u = -\sum_{c=1}^{C} \mu_c \log \mu_c \tag{1}$$

where $P_n^c$ denotes the probability output for class c in the n-th forward pass, $\mu_c$ represents the average value across the N predictions, C is the number of classes, and the uncertainty $u \in \mathbb{R}^{H \times W \times D}$ essentially corresponds to voxel-wise entropy. This method requires multiple inferences. For example, each iteration in [25] involves eight random forward passes to estimate uncertainty, resulting in significant computational costs.
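As a rough illustration of Equation (1), the sketch below averages N stochastic predictions and computes the voxel-wise entropy of the mean. The array layout `(N, C, H, W, D)`, the function name `entropy_uncertainty`, and the small constant `eps` that guards against `log(0)` are assumptions made for this example; the random inputs merely stand in for real dropout passes.

```python
import numpy as np


def entropy_uncertainty(probs, eps=1e-8):
    """probs has shape (N, C, H, W, D): N stochastic passes, C class probabilities."""
    mu = probs.mean(axis=0)                       # mean over passes, shape (C, H, W, D)
    # Entropy over the class axis; eps (an assumption) avoids log(0).
    return -(mu * np.log(mu + eps)).sum(axis=0)   # shape (H, W, D)


# Toy example: N = 8 passes, C = 2 classes, a 2 x 2 x 2 volume.
rng = np.random.default_rng(0)
fg = rng.uniform(size=(8, 1, 2, 2, 2))            # stand-in foreground probabilities
probs = np.concatenate([1.0 - fg, fg], axis=1)
u = entropy_uncertainty(probs)

# Passes that all agree on a confident label give (near) zero entropy.
confident = np.zeros((8, 2, 2, 2, 2))
confident[:, 1] = 1.0
u_confident = entropy_uncertainty(confident)
```

The cost noted in the text is visible here: every entry along the first axis of `probs` corresponds to one full forward pass of the network.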

Therefore, inspired by [26], we introduced auxiliary classifiers to simplify Monte Carlo dropout and reduce training and inference costs. Specifically, the proposed method designs three slightly different decoders to approximate cognitive uncertainty. The basic decoder $\theta_{dA}$ is almost symmetrical to the original encoder and uses transposed convolution for upsampling, as in V-Net. The second decoder $\theta_{dB}$ expands the transmitted feature maps using trilinear interpolation, followed by standard 3D convolution operations. The third decoder $\theta_{dC}$, which acts as an auxiliary classifier, expands the feature maps using nearest-neighbor interpolation, followed by standard 3D convolution operations; its other modules remain the same as in V-Net. These three decoders receive the same deep features $F_e$ from the encoder and generate three feature maps, $F_A$, $F_B$, and $F_C$. This process can be expressed as follows:

$$F_e = f_{\theta_e}(X), \quad F_A = f_{\theta_{dA}}(F_e), \quad F_B = f_{\theta_{dB}}(F_e), \quad F_C = f_{\theta_{dC}}(F_e) \tag{2}$$

Subsequently, the probability outputs $P_A$, $P_B$, and $P_C$ are obtained by applying the sigmoid activation function to the deep features $F_A$, $F_B$, and $F_C$, respectively. Incorporating three different decoders increases the diversity of the segmentation model, which helps reduce overfitting of the individual sub-models. Additionally, through the cross-entropy and consistency losses, the sub-models can learn from each other, thereby enhancing performance [27]. In contrast to Monte Carlo dropout [24,25], the sub-models $\theta_n$ in SETC-Net are fixed and do not require additional perturbations during training. The model’s cognitive uncertainty is approximated by the discrepancies between the outputs $P_A$, $P_B$, and $P_C$ of the three decoders.
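To give a feel for why decoders that differ only in their upsampling step produce different outputs, the sketch below compares nearest-neighbor and linear interpolation on a 1D signal. It is a simplified stand-in for the 3D nearest-neighbor and trilinear operations named above, not the decoder code itself; the function names and the aligned-endpoint convention in `upsample_linear` are assumptions.

```python
import numpy as np


def upsample_nearest(x):
    """Double the length of a 1D signal by repeating each sample."""
    return np.repeat(x, 2)


def upsample_linear(x):
    """Double the length of a 1D signal by linear interpolation between samples."""
    old = np.arange(len(x))
    # New sample positions; keeping the endpoints aligned is an assumption.
    new = np.linspace(0, len(x) - 1, 2 * len(x))
    return np.interp(new, old, x)


x = np.array([0.0, 1.0, 0.0, 2.0])
near = upsample_nearest(x)   # blocky: every value appears twice
lin = upsample_linear(x)     # smooth: neighboring values are blended
```

Both results have the same length, but they disagree at most positions, so convolutions applied afterwards see different inputs. That disagreement is the kind of prediction discrepancy the three decoders are designed to expose.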

2.2. Cyclic Pseudo-Labeling

Through employing three different decoders for prediction, we converted their prediction discrepancies into auxiliary supervision signals to facilitate model training. Here, we established a cyclic pseudo-labeling scheme. Initially, we applied a sharpening function [28] to transform the probability outputs $P_A$, $P_B$, and $P_C$ into soft pseudo-labels $sPL_A$, $sPL_B$, and $sPL_C \in [0, 1]^{H \times W \times D}$. The sharpening function is defined as follows:

$$sPL = \frac{P^{1/T}}{P^{1/T} + (1 - P)^{1/T}} \tag{3}$$

where T is a constant that controls the sharpening temperature. Soft pseudo-labels not only support entropy regularization during training but, compared to pseudo-labels generated with fixed thresholds, can also mitigate the influence of mislabeled training data [28]. Subsequently, we used the soft pseudo-labels $sPL_A$, $sPL_B$, and $sPL_C$ of each decoder to supervise the outputs of the other two decoders, training them mutually and aiming for consistency on the same input [21]. This approach encourages the predictions $P_A$, $P_B$, and $P_C$ to be consistent and to have low entropy. This consistency and entropy regularization make the model focus more on unlabeled and uncertain regions [9,10]. The total training loss of our SETC-Net is the weighted sum of the segmentation loss $L_{seg}$ and the consistency loss $L_c$, specifically as follows:

$$L_{seg} = \frac{1}{3}\left[\mathrm{Dice}(P_A, Y) + \mathrm{Dice}(P_B, Y) + \mathrm{Dice}(P_C, Y)\right] \tag{4}$$

$$L_c = \frac{1}{6}\left[L_2(P_A, sPL_B) + L_2(P_A, sPL_C) + L_2(P_B, sPL_A) + L_2(P_B, sPL_C) + L_2(P_C, sPL_A) + L_2(P_C, sPL_B)\right] \tag{5}$$

$$Loss = L_{seg} + \lambda \times L_c \tag{6}$$

where Dice represents the Dice loss, $L_2$ denotes the mean squared error (MSE) loss, Y stands for the ground truth, and $\lambda$ is the weight balancing $L_{seg}$ and $L_c$. It is important to note that $L_{seg}$ is computed solely on the labeled data, whereas $L_c$ is unsupervised and is applied to all training data. The final SETC-Net architecture is shown in Figure 3.
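Equations (3) to (6) can be put together in a short NumPy sketch. It is meant as a reading aid rather than the training code: the soft Dice formulation with smoothing constant `eps`, the function names, and the toy inputs are assumptions, and for simplicity the example applies both loss terms to one labeled volume, whereas in training the consistency term also covers unlabeled volumes.

```python
import numpy as np


def sharpen(p, T=0.1):
    """Eq. (3): push probabilities towards 0 or 1; T is the temperature."""
    a = p ** (1.0 / T)
    b = (1.0 - p) ** (1.0 / T)
    return a / (a + b)


def dice_loss(p, y, eps=1e-5):
    """Soft Dice loss (one common form, assumed here) between p and a binary mask y."""
    inter = (p * y).sum()
    return 1.0 - (2.0 * inter + eps) / (p.sum() + y.sum() + eps)


def mse(p, q):
    """Mean squared error, the L2 term in Eq. (5)."""
    return ((p - q) ** 2).mean()


def setc_loss(p_a, p_b, p_c, y, lam, T=0.1):
    preds = [p_a, p_b, p_c]
    # Eq. (4): average Dice loss of the three decoders.
    l_seg = sum(dice_loss(p, y) for p in preds) / 3.0
    # Eq. (5): each decoder is pulled towards the sharpened outputs of the other two.
    soft = [sharpen(p, T) for p in preds]
    pairs = [(i, j) for i in range(3) for j in range(3) if i != j]
    l_c = sum(mse(preds[i], soft[j]) for i, j in pairs) / 6.0
    # Eq. (6): weighted sum of both terms.
    return l_seg + lam * l_c, l_seg, l_c


rng = np.random.default_rng(0)
y = (rng.uniform(size=(4, 4, 4)) > 0.5).astype(float)   # toy ground truth
# Three noisy stand-ins for the decoder outputs P_A, P_B, P_C.
p_a, p_b, p_c = (y * 0.8 + rng.uniform(size=y.shape) * 0.2 for _ in range(3))
total, l_seg, l_c = setc_loss(p_a, p_b, p_c, y, lam=0.1)
```

With T = 0.1 the exponent 1/T is 10, so a probability of 0.7 is sharpened to a value very close to 1, while 0.5 stays at 0.5. This is how the sharpened labels encourage low-entropy predictions.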

3. Experiment and Results

3.1. Database

We evaluated the proposed SETC-Net on the LA dataset from the 2018 Atrial Segmentation Challenge [22]. This dataset consists of 100 annotated 3D gadolinium-enhanced MR images of the left atrium, with an isotropic voxel resolution of 0.625 × 0.625 × 0.625 mm³. We used 80 samples for training and reserved 20 samples for validation. Our experimental setup strictly followed that of SASSNet [29], DTC [30], and UA-MT [25].

3.2. Implementation Details

During the preprocessing phase, we first obtained 3D MR images with enhanced edges, which were then cropped to the target size as required. Subsequently, the images were normalized to zero mean and unit variance. For training, the preprocessed images were randomly cropped into 3D patches of size 112 × 112 × 80. Data augmentation was performed using 2D random rotations and flips. The batch size was set to 4, with each batch containing two labeled images and two unlabeled images. The temperature constant T was set to 0.1, and the weight λ was defined as a time-varying Gaussian warm-up function [31]. The proposed SETC-Net was trained with the SGD optimizer for 15 K iterations, with an initial learning rate of 0.01 and a decay of 10% every 2.5 K iterations. Most experimental settings align with recent SOTA methods [25,30,31]. During testing, features were extracted using a sliding window of size 112 × 112 × 80 with a fixed stride of 18 × 18 × 4, and the predictions of all patches were then recombined to form the complete result. Finally, the average of $P_A$ and $P_B$ was used as the final output. All experiments in this paper were conducted in the same device environment with a fixed random seed (hardware: Intel Core i7-13700HX CPU, NVIDIA GeForce RTX 4080 GPU; software: PyTorch 1.13.1+cu117 and Python 3.9.16).
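Two details of this setup lend themselves to a small sketch: the warm-up of the consistency weight λ and the placement of the sliding windows at test time. The text does not spell out either formula, so both functions below are assumptions: `gaussian_warmup` uses the ramp-up form exp(-5(1 - t)²) that is common in the consistency-training literature, with hypothetical `rampup_steps` and `max_weight` parameters, and `window_starts` uses one common tiling rule in which the last window is shifted to sit flush with the volume border. The volume size 160 × 160 × 88 is made up for the example.

```python
import math


def gaussian_warmup(step, rampup_steps, max_weight=1.0):
    """Ramp the consistency weight smoothly up to max_weight (assumed form)."""
    t = min(max(step / rampup_steps, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)


def window_starts(size, patch, stride):
    """Start indices of sliding windows along one axis, covering it completely."""
    if size <= patch:
        return [0]                     # the volume would be padded up to the patch size
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)    # extra window flush with the far edge
    return starts


# Patch 112 x 112 x 80 and stride 18 x 18 x 4 come from the text;
# the 160 x 160 x 88 volume is a hypothetical example.
xs = window_starts(160, 112, 18)
zs = window_starts(88, 80, 4)
weights = [gaussian_warmup(s, rampup_steps=150) for s in (0, 75, 150)]
```

Because neighboring windows overlap heavily at these strides, most voxels receive several predictions; how those are merged is not stated in the text, and averaging them is one common choice.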

3.3. Results

Figure 4 displays the results of UA-MT [25], SASSNet [31], DTC [30], the proposed SETC-Net, and the ground truth on the LA dataset. The experiments were conducted under two common semi-supervised settings. The first setting involved using 10% labeled data and 90% unlabeled data, where predictions from the initial 10% labeled data in a fully supervised scenario were used as pseudo-labels for training the remaining 90% unlabeled data. The second setting utilized 20% labeled data and 80% unlabeled data. As shown in Figure 4, the proposed model generated a more complete left atrium compared to several existing SOTA methods, both in 3D and 2D views. It is worth noting that no post-processing modules such as morphological algorithms were used to refine the result images in this study. SETC-Net naturally yields superior results in challenging regions (highlighted in yellow circles) and diminishes most isolated areas present in the LA dataset. This capability is crucial for advancing further quantitative medical analyses.

Table 1 presents the performance of various mainstream networks in our experiment, evaluated quantitatively using four different metrics: Dice coefficient, Jaccard index, 95% Hausdorff distance (95HD), and average surface distance (ASD). The results for V-Net under different supervision settings (10%, 20%, and fully annotated data) are also provided for reference. By focusing more on learning features that are more beneficial for the current task and leveraging multiple decoders to enhance predictions in uncertain regions, the proposed SETC-Net showed impressive performance gains. For instance, the proposed method achieved a Dice score of 91.14% using only 20% of the labeled training data, which is an improvement of 0.79 percentage points over the previous best MC-Net. Additionally, in the fully supervised setting, it achieved an impressive Dice coefficient of 92.28%. As shown in Table 1, SETC-Net outperformed the other seven recent SOTA methods on the LA database across all semi-supervised settings.
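For readers less familiar with the overlap metrics in Table 1, the following sketch computes the Dice coefficient and the Jaccard index for two binary masks. The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the two surface-distance metrics (95HD and ASD) are not covered here.

```python
import numpy as np


def dice_and_jaccard(pred, gt):
    """Overlap metrics for two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    # Both masks empty: treated as perfect agreement (an assumed convention).
    dice = 2.0 * inter / total if total > 0 else 1.0
    jaccard = inter / union if union > 0 else 1.0
    return float(dice), float(jaccard)


pred = np.zeros((4, 4, 4), dtype=bool)
gt = np.zeros((4, 4, 4), dtype=bool)
pred[:2] = True    # 32 voxels predicted as foreground
gt[1:3] = True     # 32 reference voxels, 16 of them shared with pred
dice, jaccard = dice_and_jaccard(pred, gt)   # 0.5 and 1/3
```

The two metrics are monotonically related by J = D / (2 - D), so they always rank methods identically on a single case; reporting both mainly eases comparison with earlier work.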

To further validate the generalization performance of SETC-Net, we conducted experiments on the Pancreas-NIH dataset [35], which comprises 80 contrast-enhanced abdominal CT volumes. Following the setup in MC-Net [12] for fair comparison, we employed data augmentation through random rotations and flips. During training, images were randomly cropped to a size of 96 × 96 × 96 before being fed into the network. The batch size, initial learning rate, and number of iterations were set to 4, 0.01, and 15,000 respectively, with a 10% learning rate decay every 2500 iterations. We also evaluated our model under 10% and 20% labeled data settings. In Table 2, we compare the proposed method with several current state-of-the-art algorithms. All networks were evaluated under the same experimental settings, and the results are presented without any postprocessing. As shown in the table, SETC-Net achieved the highest Dice score across all supervision settings. This indicates that our network is not only applicable to other datasets but also outperforms several top-performing algorithms. The experiments demonstrate that the proposed network exhibits good generalization and robustness.

3.4. Ablation Study

Wu et al. [12] demonstrated that, in semi-supervised settings, using two slightly different decoders (V2d-Net) yields better results than using two identical decoders (V2-Net). Building on this, we conducted further experiments. As shown in Table 3, using three slightly different decoders (V3d-Net) outperformed both two slightly different decoders (V2d-Net) and three identical decoders (V3-Net). This suggests that employing three slightly different decoders and encouraging consistency during training extracts more effective information, which clearly benefits the experimental results.

Additionally, we reverted the SEConv blocks in SETC-Net to regular 3D convolutional blocks and compared the resulting network with SETC-Net under the same experimental settings. As shown in Table 3, the network using SEConv blocks extracts features with greater representational power, further confirming the effectiveness of our improvements.

4. Conclusions

In this paper, we propose a semi-supervised left-atrial segmentation network based on squeeze-and-excitation (SE) and triple consistency training (SETC-Net). Our core idea stems from the observation that deep models often produce ambiguous predictions in complex or indistinct regions due to training data and convolutional block constraints. We constructed a symmetric architecture unit, SEConv, aimed at enhancing the network’s ability to emphasize or suppress different feature channels according to the task requirements, thereby improving the network’s information extraction capability. The experiments demonstrate its effectiveness in extracting features with greater representational power, reducing ambiguous predictions. Additionally, our model employs three slightly different decoders, encouraging consistent and low-entropy predictions to capture more informative features from critical areas and facilitate model training. Our experiments show that by focusing more on learning features that are more useful for the current task and combining multiple decoders to enhance predictions in uncertain areas, the proposed SETC-Net achieved impressive performance gains. It achieved a Dice score of 91.14% using only 20% of labeled data from the LA database, making it one of the best-performing semi-supervised segmentation networks on the LA database. Additionally, SETC-Net also demonstrated excellent performance on the pancreas CT database, further showcasing the generalization and stability of the proposed network.

Author Contributions

Conceptualization, D.W.; methodology, T.X.; software, L.Y.; validation, J.G.; formal analysis, D.W.; investigation, J.L. (Jianshen Li); resources, J.L. (Jiehui Liu); data curation, T.X.; writing—original draft preparation, T.X.; writing—review and editing, L.Y.; visualization, J.G.; supervision, D.W.; data analysis, J.L. (Jiehui Liu); encoding and experiments, T.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We are very grateful to all the teachers in our research group. Without their help, this research would not have been successful.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wang, Y.; Zhou, Y.; Shen, W.; Park, S.; Fishman, E.K.; Yuille, A.L. Abdominal multi-organ segmentation with organ-attention networks and statistical fusion. Med. Image Anal. 2019, 55, 88–102. [Google Scholar] [CrossRef] [PubMed]
Dou, Q.; Liu, Q.; Heng, P.-A.; Glocker, B. Unpaired multi-modal segmentation via knowledge distillation. IEEE Trans. Med. Imaging 2020, 39, 2415–2425. [Google Scholar] [CrossRef] [PubMed]
Li, X.; Chen, H.; Qi, X.; Dou, Q.; Fu, C.-W.; Heng, P.-A. H-denseunet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Trans. Med. Imaging 2018, 37, 2663–2674. [Google Scholar] [CrossRef] [PubMed]
Zhao, T.; Cao, K.; Yao, J.; Nogues, I.; Lu, L.; Huang, L.; Xiao, J.; Yin, Z.; Zhang, L. 3d graph anatomy geometry-integrated network for pancreatic mass segmentation, diagnosis, and quantitative patient management. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
Wang, Q.; Li, W.; Van Gool, L. Semi-supervised learning by augmented distribution alignment. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1466–1475. [Google Scholar]
Bai, W.; Oktay, O.; Sinclair, M.; Suzuki, H.; Rajchl, M.; Tarroni, G.; Glocker, B.; King, A.P.; Matthews, P.M.; Rueckert, D. Semisupervised learning for network-based cardiac MR image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Quebec City, QC, Canada, 10 September 2017. [Google Scholar]
Zhou, Y.; Wang, Y.; Tang, P.; Bai, S.; Shen, W.; Fishman, E.K.; Yuille, A.L. Semi-supervised 3d abdominal multi-organ segmentation via deep multi-planar co-training. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019. [Google Scholar]
You, C.; Zhou, Y.; Zhao, R.; Staib, L.; Duncan, J.S. Simcvd: Simple contrastive voxel-wise representation distillation for semi-supervised medical image segmentation. IEEE Trans. Med. Imaging 2022, 41, 2228–2237. [Google Scholar] [CrossRef] [PubMed]
Sohn, K.; Berthelot, D.; Li, C.-L.; Zhang, Z.; Carlini, N.; Cubuk, E.D.; Kurakin, A.; Zhang, H.; Raffel, C. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6–12 December 2020; Volume 33, pp. 596–608. [Google Scholar]
Ouali, Y.; Hudelot, C.; Tami, M. Semi-supervised semantic segmentation with cross-consistency training. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 12674–12684. [Google Scholar]
Lee, D.H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Proceedings of the ICML 2013 Workshop: Challenges in Representation Learning (WREPL), Atlanta, GA, USA, 21 June 2013; Volume 3. [Google Scholar]
Wu, Y.; Xu, M.; Ge, Z.; Cai, J.; Zhang, L. Semi-supervised left atrium segmentation with mutual consistency training. In Proceedings of the Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Proceedings, Part II, Volume 12902 of Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021; Volume 6, pp. 297–306.
Bai, Y.; Chen, D.; Li, Q.; Shen, W.; Wang, Y. Bidirectional copy-paste for semi-supervised medical image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 1234–1244.
Kowalski, M.; Naruniec, J.; Trzcinski, T. Deep Alignment Network: A convolutional neural network for robust face alignment. arXiv 2017, arXiv:1706.01789.
Fang, K.; Li, W.J. DMNet: Difference Minimization Network for Semi-Supervised Segmentation in Medical Images. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Lima, Peru, 4–8 October 2020; Springer: Cham, Switzerland, 2021; pp. 532–541.
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML'15), Lille, France, 6–11 July 2015.
Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
Bell, S.; Zitnick, C.L.; Bala, K.; Girshick, R. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
Newell, A.; Yang, K.; Deng, J. Stacked hourglass networks for human pose estimation. In Proceedings of the Computer Vision – ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016.
Zhang, Y.; Xiang, T.; Hospedales, T.M.; Lu, H. Deep mutual learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4320–4328.
Xiong, Z.; Xia, Q.; Hu, Z.; Huang, N.; Bian, C.; Zheng, Y.; Vesal, S.; Ravikumar, N.; Maier, A.; Yang, X.; et al. A global benchmark of algorithms for segmenting the left atrium from late gadolinium-enhanced cardiac magnetic resonance imaging. Med. Image Anal. 2021, 67, 101832.
Milletari, F.; Navab, N.; Ahmadi, S.A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571.
Kendall, A.; Gal, Y. What uncertainties do we need in Bayesian deep learning for computer vision? arXiv 2017, arXiv:1703.04977.
Yu, L.; Wang, S.; Li, X.; Fu, C.W.; Heng, P.A. Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019; Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.-T., Khan, A., Eds.; Springer: Cham, Switzerland, 2019; pp. 605–613.
Zheng, Z.; Yang, Y. Rectifying pseudo label learning via uncertainty estimation for domain adaptive semantic segmentation. Int. J. Comput. Vis. 2021, 129, 1106–1120.
Xia, Y.; Liu, F.; Yang, D.; Cai, J.; Yu, L.; Zhu, Z.; Xu, D.; Yuille, A.; Roth, H. 3D semi-supervised learning with uncertainty-aware multi-view co-training. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass, CO, USA, 1–5 March 2020; pp. 3646–3655.
Xie, Q.; Dai, Z.; Hovy, E.; Luong, M.T.; Le, Q.V. Unsupervised data augmentation for consistency training. arXiv 2019, arXiv:1904.12848.
Li, S.; Zhang, C.; He, X. Shape-aware Semi-Supervised 3D Semantic Segmentation for Medical Images. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2020, Proceedings of the 23rd International Conference, Lima, Peru, 4–8 October 2020; Martel, J., Wang, L., Ourselin, S., Reyes, M., Yang, G., Eds.; Springer: Cham, Switzerland, 2020; pp. 552–561.
Luo, X.; Chen, J.; Song, T.; Chen, Y.; Wang, G.; Zhang, S. Semi-supervised medical image segmentation through dual-task consistency. arXiv 2020, arXiv:2009.04448.
Laine, S.; Aila, T. Temporal ensembling for semi-supervised learning. arXiv 2016, arXiv:1610.02242.
Zheng, H.; Lin, L.; Hu, H.; Zhang, Q.; Chen, Q.; Iwamoto, Y.; Han, X.; Chen, Y.-W.; Tong, R. Semi-supervised segmentation of liver using adversarial learning with deep atlas prior. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, Proceedings of the 22nd International Conference, Shenzhen, China, 13–17 October 2019; Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.-T., Khan, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2019; pp. 148–156.
Hang, W.; Feng, W.; Liang, S.; Yu, L.; Wang, Q.; Choi, K.-S. Local and global structure-aware entropy regularized mean teacher model for 3D left atrium segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Proceedings of the 23rd International Conference, Lima, Peru, 4–8 October 2020; Martel, A.L., Abolmaesumi, P., Stoyanov, D., Mateus, D., Zuluaga, M.A., Zhou, S.K., Racoceanu, D., Joskowicz, L., Eds.; Springer: Cham, Switzerland, 2020; pp. 562–571.
Wang, Y.; Zhang, Y.; Tian, J.; Zhong, C.; Shi, Z.; Zhang, Y.; He, Z. Double-uncertainty weighted method for semi-supervised learning. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Proceedings of the 23rd International Conference, Lima, Peru, 4–8 October 2020; Martel, A.L., Abolmaesumi, P., Stoyanov, D., Mateus, D., Zuluaga, M.A., Zhou, S.K., Racoceanu, D., Joskowicz, L., Eds.; Springer: Cham, Switzerland, 2020; pp. 542–551.
Roth, H.R.; Lu, L.; Farag, A.; Shin, H.-C.; Liu, J.; Turkbey, E.B.; Summers, R.M. DeepOrgan: Multi-level deep convolutional networks for automated pancreas segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part I, Volume 9349 of Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; pp. 556–564.

Figure 1. Squeeze-and-excitation block.

Figure 2. SEConv block.

Figure 3. SETC-Net network structure diagram.

Figure 4. Segmentation results obtained on the LA database using UA-MT [25] (first column), SASSNet [31] (second column), DTC [30] (third column), and our SETC-Net (fourth column), along with the corresponding ground truth (fifth column). Comparisons between the use of 10% and 20% labeled data are shown in the top and bottom rows, respectively.

Table 1. Comparisons with seven state-of-the-art methods on the LA database.

Method	Labeled Scans	Unlabeled Scans	Dice (%)	Jaccard (%)	95HD (Voxel)	ASD (Voxel)
V-Net	8 (10%)	0	79.99	68.12	21.11	5.48
V-Net	16 (20%)	0	86.04	76.06	14.26	3.51
V-Net	80 (All)	0	90.72	82.74	7.65	2.34
MC-Net [12]	80 (All)	0	91.13	83.79	5.76	1.57
SETC-Net (Ours)	80 (All)	0	92.28 (↑1.15)	85.72 (↑1.93)	4.42 (↓1.34)	1.40 (↓0.17)
DAP [32]	8 (10%)	72	81.88	71.23	15.81	3.79
UA-MT [25]	8 (10%)	72	84.23	73.48	13.84	3.35
SASSNet [31]	8 (10%)	72	87.31	77.72	9.61	2.55
LG-ER-MT [33]	8 (10%)	72	85.53	75.12	13.29	3.77
DUWM [34]	8 (10%)	72	85.90	75.74	12.67	3.31
DTC [30]	8 (10%)	72	86.57	76.55	14.47	3.74
MC-Net [12]	8 (10%)	72	87.70	78.30	9.37	2.18
SETC-Net (Ours)	8 (10%)	72	87.76 (↑0.06)	78.42 (↑0.12)	9.50 (↑0.13)	2.02 (↓0.16)
DAP [32]	16 (20%)	64	87.89	78.72	9.29	2.74
UA-MT [25]	16 (20%)	64	88.87	80.21	7.32	2.26
SASSNet [31]	16 (20%)	64	89.54	81.24	8.24	2.20
LG-ER-MT [33]	16 (20%)	64	89.63	81.30	7.16	2.06
DUWM [34]	16 (20%)	64	89.65	81.33	7.04	2.03
DTC [30]	16 (20%)	64	89.42	80.97	7.32	2.10
BCP [13]	16 (20%)	64	90.31	82.51	6.97	1.84
MC-Net [12]	16 (20%)	64	90.35	82.47	6.01	1.77
SETC-Net (Ours)	16 (20%)	64	91.14 (↑0.79)	83.79 (↑1.32)	5.75 (↓0.26)	1.39 (↓0.38)
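
The two overlap columns in Table 1 can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code; the function names `dice_score` and `jaccard_score`, the empty-mask convention, and the toy volumes are assumptions made purely for illustration.

```python
import numpy as np

def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, target).sum() / total

def jaccard_score(pred, target):
    """Jaccard = |P ∩ T| / |P ∪ T| for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # same assumed convention as above
    return np.logical_and(pred, target).sum() / union

# Toy 3D volumes (hypothetical data, not taken from the LA database).
pred = np.zeros((4, 4, 4), dtype=bool)
target = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:3] = True    # 8 voxels
target[1:3, 1:3, 1:4] = True  # 12 voxels, 8 of them shared with pred

d = dice_score(pred, target)     # 2 * 8 / (8 + 12) = 0.8
j = jaccard_score(pred, target)  # 8 / 12, about 0.667
print(f"Dice = {100 * d:.2f}%, Jaccard = {100 * j:.2f}%")
```

For a single scan the two scores are tied by J = D / (2 − D). Values that are averaged over several test scans, as the table entries presumably are, need not satisfy this identity exactly.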

Table 2. Comparisons with four state-of-the-art methods on the pancreatic CT database.

Method	Labeled Scans	Unlabeled Scans	Dice (%)	Jaccard (%)	95HD (Voxel)	ASD (Voxel)
V-Net	6 (10%)	0	55.06	40.48	32.80	12.67
V-Net	12 (20%)	0	69.65	55.18	20.19	6.31
V-Net	62 (All)	0	83.01	71.35	5.18	1.19
UA-MT [25]	6 (10%)	56	68.70	54.65	13.89	3.23
SASSNet [31]	6 (10%)	56	66.52	52.23	17.11	2.27
DTC [30]	6 (10%)	56	66.27	52.07	15.00	4.43
MC-Net [12]	6 (10%)	56	68.94	54.74	16.28	3.16
SETC-Net (Ours)	6 (10%)	56	69.03 (↑0.09)	54.89 (↑0.15)	13.96 (↑0.07)	2.11 (↓0.16)
UA-MT [25]	12 (20%)	50	76.77	63.77	11.41	2.79
SASSNet [31]	12 (20%)	50	77.12	64.24	8.93	1.91
DTC [30]	12 (20%)	50	78.27	64.75	8.37	2.27
MC-Net [12]	12 (20%)	50	79.05	65.82	10.29	2.71
SETC-Net (Ours)	12 (20%)	50	80.01 (↑0.96)	66.55 (↑0.73)	7.40 (↓0.97)	1.54 (↓0.37)

All experiments were conducted in the same environment. Pancreas-CT: https://www.cancerimagingarchive.net/collection/pancreas-ct (accessed on 22 July 2021).
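
The two distance columns in the tables (95HD and ASD, both in voxels) can be sketched in the same spirit. The code below is a brute-force illustration under stated assumptions, not the authors' implementation: it assumes unit isotropic voxel spacing, defines the surface as mask voxels with at least one 6-connected background neighbour, and pools both directed distance sets before taking the 95th percentile and the mean. Other toolkits use slightly different conventions, and the helper names `surface_voxels`, `directed_distances`, and `hd95_and_asd` are hypothetical.

```python
import numpy as np

def surface_voxels(mask):
    """Coordinates of mask voxels that touch the background (6-connectivity)."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    crop = (slice(1, -1),) * mask.ndim
    interior = np.ones_like(mask)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            # A voxel is interior only if every axis-aligned neighbour is foreground.
            interior &= np.roll(padded, shift, axis=axis)[crop]
    return np.argwhere(mask & ~interior)

def directed_distances(a_pts, b_pts):
    """Euclidean distance from each point in a_pts to its nearest point in b_pts."""
    diff = a_pts[:, None, :].astype(float) - b_pts[None, :, :].astype(float)
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def hd95_and_asd(pred, target):
    """95th-percentile Hausdorff distance and average surface distance (voxels)."""
    p = surface_voxels(pred)
    t = surface_voxels(target)
    if len(p) == 0 or len(t) == 0:
        raise ValueError("both masks must contain at least one foreground voxel")
    # Assumed convention: pool both directed distance sets.
    both = np.concatenate([directed_distances(p, t), directed_distances(t, p)])
    return float(np.percentile(both, 95)), float(both.mean())

# Toy example (hypothetical data): two single-voxel masks three voxels apart.
a = np.zeros((6, 6, 6), dtype=bool)
b = np.zeros((6, 6, 6), dtype=bool)
a[1, 1, 1] = True
b[1, 1, 4] = True
hd95, asd = hd95_and_asd(a, b)  # both equal 3.0 for this pair
print(hd95, asd)
```

The pairwise distance matrix grows with the product of the two surface sizes, so this form only suits small examples. For full 3D scans a distance transform, such as `scipy.ndimage.distance_transform_edt`, is the usual substitute.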

Table 3. Ablation study of our proposed SETC-Net on the LA database.

Method	Labeled Scans	Unlabeled Scans	Dice (%)	Jaccard (%)	95HD (Voxel)	ASD (Voxel)
V2d-Net	8 (10%)	72	85.79	75.41	14.45	3.83
V3-Net	8 (10%)	72	86.61	76.65	13.39	3.93
V3d-Net	8 (10%)	72	87.56	78.25	9.93	2.16
V3d-Net+SEConv	8 (10%)	72	87.76	78.42	9.50	2.02
V2d-Net	16 (20%)	64	88.97	80.36	7.61	2.25
V3-Net	16 (20%)	64	85.56	79.65	14.06	3.58
V3d-Net	16 (20%)	64	90.79	83.22	7.94	2.03
V3d-Net+SEConv	16 (20%)	64	91.14	83.79	5.75	1.39

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

