Introduction

Digital subtraction angiography (DSA) effectively isolates vascular structures in X-ray angiography by subtracting a “mask image” (acquired without a contrast agent) from “live images” (acquired with a contrast agent). However, especially in coronary angiography, patient movement poses a challenge, leading to motion artefacts and blurred vessel appearances. (Unlike natural images, the structures in an X-ray image overlap and are transparent; herein, “background” refers to nonvascular anatomical structures, while “foreground” refers to blood vessels.) Traditional image registration algorithms aim to remove these artefacts through approaches such as deforming one image before executing subtraction to minimize motion disturbances1,2,3 or using probabilistic models that treat images as superpositions of layered motions4,5. Despite demonstrated advancements, including deep learning adaptations6,7, these approaches still cannot fully eliminate nonvascular structures from coronary angiograms.

Vessel segmentation is a parallel research domain in which each pixel is classified to identify vessels. This approach facilitates observation and lays the groundwork for automated tasks such as stenosis detection. Traditional segmentation relies on morphological features and uses techniques including Hessian matrix-based8, morphological9, and Gabor filter enhancements10. However, these methods struggle to differentiate vessels from complex backgrounds, such as bones or moving cardiopulmonary structures, particularly in angiocardiography images.

Convolutional neural networks (CNNs) have shown promise in the image segmentation field11. However, their training procedures require the manual annotation of coronary angiograms, which is a painstaking process. Even with significant efforts, such as Du et al.’s12 work with 20,612 annotated samples, challenges persist. Manual annotation struggles with overlapping structures or low-contrast frames, resulting in models that might overlook smaller vessels. Given these hurdles, unsupervised or weakly supervised techniques that leverage unannotated data seem promising13,14. Notable works include those of Plourde and Luc15, who employed machine learning after applying Hessian-based enhancement, and Vlontzos and Mikolajczyk16, who used the TopHat operation alongside the U-Net training process17,18. Additionally, Ma et al.19 proposed a self-supervised vessel segmentation method via adversarial learning. However, these methods still cannot compete with supervised learning.

Addressing these challenges, we propose a novel method for learning vascular representations from unannotated samples. This approach achieves single-frame subtraction that surpasses DSA in background removal efficiency. By effectively eliminating nonvascular structures, our method requires only a minimal number of annotated samples to fine-tune a vessel segmentation model, outperforming purely supervised learning approaches. Our work was motivated by the significant hurdles in coronary angiography vessel segmentation and single-frame subtraction, particularly the extreme difficulty and high cost of annotating fine vessels. The resulting technique leverages large amounts of unannotated data while minimizing manual annotation requirements. This innovative learning paradigm offers a practical solution to real-world challenges in medical image analysis, potentially providing new perspectives for vascular segmentation.

Methods

The traditional digital subtraction approach involves subtracting X-ray images captured before and after the injection of a contrast agent into blood vessels. This process aims to eliminate anatomical structures from the produced images. However, this technique is highly sensitive to motion. Any deviation from a stable position manifests as a visible artefact, diminishing the diagnostic utility of this method. Various registration and layering techniques have been proposed to improve the quality of subtraction images. While these methods exhibit some robustness against minor body motions, they struggle to cope with the continuous nonlinear movements of the heart and lungs. As a result, subtraction images derived from coronary angiograms often suffer from motion artefacts. Furthermore, DSA cannot generate subtraction images from a single frame.

We recognize that subtraction can be framed as an image-to-image (I2I) translation problem, which is a popular and versatile paradigm in machine learning. I2I translation has demonstrated remarkable utility in diverse applications ranging from computer graphics and style transfer to satellite imagery and photo enhancement. The central objective in a typical I2I translation problem is to learn a mapping function that translates images from one domain to a corresponding image in another domain. The existing methods often leverage either paired20 or unpaired21 training samples for this purpose. In our research, we extend this notion by treating a “mask image” (an image taken prior to the administration of a contrast agent) and a “live image” (an image captured after administering the contrast agent) as entities from two distinct domains. This setting is essential because the key difference between these two types of frames lies in whether blood vessels are displayed. If a neural network can learn to switch between them, it would inherently need to learn the representations of the blood vessels. With this framework, we are able to generate a corresponding mask for any frame within a coronary angiography sequence.

All methods were carried out in accordance with relevant guidelines and regulations. All experimental protocols were approved by the Ethics Committee of The Second Affiliated Hospital of Chongqing Medical University. Anonymous data were used in this study, and the Ethics Committee of The Second Affiliated Hospital of Chongqing Medical University waived the requirement for informed consent.

Data

We collect 58,128 coronary angiographic DICOM files from 3756 patients at The Second Affiliated Hospital of Chongqing Medical University. For each patient, the angiography procedure typically involves multiple imaging positions, including RAO (Right Anterior Oblique), LAO (Left Anterior Oblique), CRA (Cranial), CAU (Caudal), AP (Anteroposterior), and LAT (Lateral). These positions are used to visualize different coronary arteries and their branches, including the Left Main Coronary Artery (LM), Left Anterior Descending Branch (LAD), Left Circumflex Branch (LCX), Right Coronary Artery (RCA), and their subdivisions. This comprehensive imaging approach typically yields about 10–15 DICOM segments per patient, with each segment representing a different angle or coronary artery.

Each DICOM file consists of a single-angle continuous sequence captured after a single contrast agent injection. From each sequence, we extract the first frame, which has not yet been subjected to contrast injection, to serve as a mask image. Subsequent frames are then randomly selected from the middle portion of each sequence, with each frame varying in its degree of visible vascular structure. The live and mask frames constitute two domains, X and Y. We exclude samples where the imaged site is not the coronary artery or where the first frame already exhibits the presence of a contrast agent.

The dataset is finalized with 17,398 mask images and 38,930 live images. We designate this dataset as the Live-Mask Coronary Angiograms Dataset (LM-CAD), which is intended for pretraining neural networks. The significant reduction from 58,128 original DICOM files reflects our rigorous selection process: the first frame of each DICOM was initially selected as a potential mask frame, but many were discarded due to the presence of contrast agent, and live frames were extracted at a 2:1 ratio to mask frames and then manually screened for quality. This process, while reducing the dataset size, ensures high-quality, representative images for both background and vascular structures.

Finally, we randomly select 50 live images and manually annotate them with high granularity, identifying vessels with diameters as fine as a single pixel. We refer to these samples as the Fine Segmentation Coronary Angiograms Dataset (FS-CAD), which is suitable for fine-tuning or quantitative evaluation tasks. To facilitate ongoing research in this area, we have made the datasets publicly available.
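For illustration only, the sketch below shows how mask and live frames might be pulled from a single multi-frame DICOM sequence with pydicom; the file path and the “middle third” sampling heuristic are assumptions, and in the actual pipeline the extracted frames additionally underwent the manual screening described above.

```python
# Illustrative sketch (not the released pipeline) of mask/live frame
# extraction from one angiographic DICOM sequence using pydicom.
import random
import pydicom

def extract_frames(dicom_path, n_live=2, seed=0):
    """Take the first frame as a candidate mask and sample live frames
    from the middle portion of the sequence (heuristic: middle third)."""
    ds = pydicom.dcmread(dicom_path)
    frames = ds.pixel_array                 # shape: (num_frames, H, W)
    mask = frames[0]                        # pre-contrast candidate mask
    n = len(frames)
    middle = list(range(n // 3, 2 * n // 3))
    rng = random.Random(seed)
    live = [frames[i] for i in rng.sample(middle, k=min(n_live, len(middle)))]
    return mask, live

mask, live = extract_frames("sequence.dcm")   # hypothetical path
```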

Moreover, we utilize another publicly available dataset, known as XCAD19, for comparative analyses and evaluations against other methods. We restrict our usage to the test set of this dataset, which includes 126 manually annotated coronary angiograms for segmentation. Unlike FS-CAD, which focuses on finer vessels, XCAD mainly contains annotations for larger blood vessels.

Pretrained model

Initially, two neural networks are pretrained on the LM-CAD dataset, each with distinct objectives. The first network, denoted as \({G}_{yx}\), aims to transform a mask image into a live image by incorporating vascular structures. Conversely, the second network, \({G}_{xy}\), is designed to erase vascular structures from a live image, effectively creating a mask image.

Both \({G}_{xy}\) and \({G}_{yx}\) are based on a U-Net architecture with a base dimensionality of 32, comprising approximately 4.3 million parameters. This architecture allows for efficient learning of hierarchical features while maintaining spatial information through skip connections. The discriminators Dx and Dy use a PatchGAN structure to classify whether 46 × 46 overlapping image patches are real or fake.
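To make the discriminator design concrete, the following PyTorch sketch shows a PatchGAN-style network in which three stride-2 4 × 4 convolutions followed by a stride-1 4 × 4 output convolution give each output logit a 46 × 46 receptive field; the channel widths and the use of instance normalization are assumptions rather than details of the actual implementation.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator. Three stride-2 4x4 convolutions plus
    a stride-1 4x4 output convolution yield a 46x46 receptive field per
    output logit, matching the patch size quoted above."""
    def __init__(self, in_ch=1, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 4, 1, 4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.net(x)   # coarse map of per-patch real/fake logits

logits = PatchDiscriminator()(torch.randn(1, 1, 512, 512))
```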

These networks operate via cycle-consistent adversarial learning, as depicted in Fig. 1. The overall objective function combines adversarial losses for both generators and a cycle consistency loss. This formulation encourages the generators to produce realistic images while preserving the content of the input images. The detailed mathematical formulations of these loss functions and the complete objective function are provided in the Supplementary materials A.
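For orientation, a standard cycle-consistent objective of the kind referenced here21 takes the form

\[
\mathcal{L}(G_{xy}, G_{yx}, D_x, D_y) = \mathcal{L}_{\mathrm{GAN}}(G_{xy}, D_y) + \mathcal{L}_{\mathrm{GAN}}(G_{yx}, D_x) + \lambda \, \mathcal{L}_{\mathrm{cyc}},
\]

\[
\mathcal{L}_{\mathrm{cyc}} = \mathbb{E}_{x}\big[\lVert G_{yx}(G_{xy}(x)) - x \rVert_1\big] + \mathbb{E}_{y}\big[\lVert G_{xy}(G_{yx}(y)) - y \rVert_1\big],
\]

where \(\lambda\) weights the cycle consistency term; the exact losses and weights used in this work are those given in the Supplementary materials.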

Fig. 1. Schematic diagram of the coronary subtraction framework with cycle-consistent adversarial networks. Its structure is similar to that of CycleGAN21, but its training objective is not style transfer. The added skip connections are designed to encourage the network to remove or add blood vessels while minimally altering the background. This training process is designed to yield a pretrained model. Ultimately, we utilize the \({G}_{xy}\) network. The output of the final layer of this network, when normalized to pixel values between 0 and 1, directly yields the blood vessel contours with the background removed. Further thresholding of this output can produce vessel segmentation results.

Prior to feeding images into these networks, several data augmentation techniques are applied, including random TopHat, random cropping and resizing, and colour jitter. These augmentations help the model learn invariant features and improve generalization. The training process uses the Adam optimizer with a learning rate of 2e-4 for 60 epochs, with linear decay after the 40th epoch.
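One plausible implementation of this augmentation pipeline is sketched below; the top-hat kernel sizes, application probability, and jitter strengths are illustrative assumptions rather than the exact settings used.

```python
# Sketch of the augmentation pipeline (illustrative parameters).
import random
import cv2
import numpy as np
import torchvision.transforms as T
from PIL import Image

def random_tophat(img, p=0.5, sizes=(9, 15, 21)):
    """Apply a morphological top-hat with a randomly sized structuring
    element with probability p. For dark vessels on a bright background,
    MORPH_BLACKHAT may be the intended variant."""
    if random.random() > p:
        return img
    k = random.choice(sizes)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (k, k))
    out = cv2.morphologyEx(np.array(img), cv2.MORPH_TOPHAT, kernel)
    return Image.fromarray(out)

augment = T.Compose([
    T.Lambda(random_tophat),
    T.RandomResizedCrop(512, scale=(0.8, 1.0)),   # random crop and resize
    T.ColorJitter(brightness=0.2, contrast=0.2),  # colour jitter
    T.ToTensor(),
])
```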

This training process is referred to as pretraining. Specifically, \({G}_{xy}\), which we term the PT-Model, assimilates vascular representation features through extensive pretraining. These learned features can either be leveraged for subsequent vascular segmentation tasks or directly applied to perform single-frame subtraction.

An additional replica network, \({\text{G}}_{\text{E}}\), is introduced as a clone of \({G}_{xy}\) whose weights are updated via the exponential moving average (EMA) method22. This approach provides more stable model outputs, and \({\text{G}}_{\text{E}}\) serves as the final inference network.
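A minimal sketch of this update is given below; the decay value is a common choice and an assumption here.

```python
# EMA weight update for G_E, a smoothed clone of G_xy (sketch).
import copy
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):   # decay is an assumption
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

# G_E = copy.deepcopy(G_xy)   # once, at initialization
# ema_update(G_E, G_xy)       # after each optimizer step on G_xy
```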

Vessel segmentation

To adapt the PT-Model for use in vessel segmentation tasks, we explore two viable approaches: fine-tuning and automatic thresholding.

Fine-tuning for segmentation

After cycle-consistent pretraining, our U-Net model excels at performing single-frame subtraction. However, fine-grained vessel segmentation requires additional refinement of the initial output. We fine-tune the network on the FS-CAD dataset, specifically by optimizing the parameters of \({\text{G}}_{\text{E}}\). We employ a composite loss function that combines the binary cross-entropy loss and the Dice loss, in line with the standard training methodologies commonly used in conventional segmentation models. After fine-tuning, the model’s outputs can be directly thresholded at a value of 0.5 to yield the final segmentation results.
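A typical form of this composite loss is sketched below; the equal weighting of the two terms is an assumption.

```python
# Composite BCE + Dice loss for fine-tuning (sketch; NCHW tensors).
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, eps=1e-6):
    bce = F.binary_cross_entropy_with_logits(logits, target)
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum(dim=(2, 3))
    total = probs.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    dice = (2 * inter + eps) / (total + eps)
    return bce + (1 - dice).mean()   # equal weighting is an assumption
```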

Automatic thresholding for segmentation

For cases in which additional fine-tuning data are not available, we propose an alternative approach that enables direct vascular segmentation using the subtraction images generated by the PT-Model. However, the pixel value distributions in these subtraction images closely resemble those of the original images rather than converging to the polarized values of 0 or 1 expected of segmentation targets, so a fixed threshold value is not feasible. To address this limitation, we introduce a method called AutoThresh, which employs a joint segmentation approach combining the Threshold-Yen23 and Threshold-Local24 methods. This approach effectively transforms the outputs of the PT-Model into segmented results. The implementation details of this method can be found in the Supplementary materials E.
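As a rough approximation of AutoThresh (the authoritative procedure is in Supplementary materials E), the two thresholds can be combined with scikit-image as follows; the intersection rule and the block size are assumptions.

```python
# AutoThresh-style combination of global (Yen) and local thresholds (sketch).
from skimage.filters import threshold_yen, threshold_local

def auto_thresh(subtraction, block_size=51, offset=0.0):
    """subtraction: float image in [0, 1] produced by the PT-Model."""
    global_mask = subtraction > threshold_yen(subtraction)
    local_mask = subtraction > threshold_local(subtraction, block_size,
                                               offset=offset)
    return global_mask & local_mask   # keep pixels passing both tests
```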

Experiments and results

After training, the model is further fine-tuned on the FS-CAD or XCAD dataset. The former is used to produce the best-performing model for the fine-grained segmentation of small vessels; this model is referred to as the FS-Model. The latter serves as the basis for quantitative comparisons with models trained using other methods and is denoted as the XCAD-Model. Due to the limited sample sizes of the FS-CAD and XCAD datasets, we employ the previously described data augmentation strategies to mitigate overfitting during fine-tuning. The number of epochs used for fine-tuning is set to 100. For additional details regarding the training and testing procedures, readers are directed to our source code repository.

Deep subtraction results

Notably, our “deep subtraction” method differs fundamentally from traditional DSA. While we use the term “subtraction” for ease of comparison, our method does not perform pixel-wise subtraction between two frames. Instead, our generator \({G}_{xy}\) learns to produce a “virtual mask” for any given input frame, effectively addressing the challenges posed by cardiac motion and other dynamic factors in angiographic imaging.

Both the PT-Model and the FS-Model generate what we term “deep subtraction” outputs. These processed images exhibit subtracted angiograms in which the majority of nonvascular tissue is effectively eliminated. Given the absence of quantitative standards for assessing subtraction techniques, Fig. 2 serves as a visual comparative analysis between deep subtraction and DSA.

Fig. 2. Comparison between deep subtraction and DSA. The first column shows the original images, while the second column displays the results obtained through DSA. The third and fourth columns present the effects achieved using deep subtraction.

The second column of Fig. 2 displays the results achieved using DSA, a technique that relies on the identification of a mask frame from a given continuous video sequence for subtraction. While DSA can successfully remove static anatomical structures such as the ribs and vertebrae, it falters in terms of addressing artefacts induced by motion, particularly those originating from cardiopulmonary activities. Such artefacts become markedly visible in areas including lung markings and the diaphragm.

In sharp contrast, the third and fourth columns highlight the advantages of deep subtraction. This approach eliminates the need for a predetermined mask, leveraging I2I translation to implicitly generate a corresponding mask for each frame, and thereby overcomes the limitations of DSA. One primary issue with DSA is its inability to use an optimal mask for each frame, which leads to incomplete background removal; this shortcoming has fostered a certain hesitancy among cardiovascular physicians to fully embrace DSA. Deep subtraction removes the background more cleanly, attaining a performance level previously observed only in anatomically stable regions, such as cerebral angiograms. Moreover, the fine-tuned FS-Model demonstrates superior subtraction outcomes compared to those of the PT-Model, as evidenced by its clearer display of small blood vessels and more complete removal of catheters. In summary, both models markedly outperform DSA in coronary angiography.

Segmentation results

Evaluation metrics

Common metrics employed for evaluating medical image segmentation performance include the pixel accuracy (PA), intersection over union (IoU), and Dice coefficient. The PA quantifies the proportion of correctly classified pixels within an image. However, its reliability can be compromised in cases with class imbalance; for instance, in our dataset, the background comprises a more significant portion of the data, thereby disproportionately influencing the PA score. The IoU is a prevalent metric in the realm of semantic segmentation. It measures the area of overlap between the ground truth and the predicted segmentation outcome, normalized by the area of their union. Due to its straightforwardness and efficacy, the IoU is widely utilized. The Dice score is another related metric that is calculated as twice the area of overlap divided by the total number of pixels in both the segmented and ground-truth images. While IoU and Dice scores are closely related, we present both to facilitate comparisons across medical and computer vision domains. For primary analysis, readers may focus on the Dice score, which is more commonly used in medical image segmentation.
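For reference, with \(A\) the predicted vessel mask, \(B\) the ground-truth mask, and TP, TN, FP, and FN the usual pixel counts,

\[
\mathrm{PA} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{IoU} = \frac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert}, \qquad \mathrm{Dice} = \frac{2\,\lvert A \cap B \rvert}{\lvert A \rvert + \lvert B \rvert}.
\]

The two overlap metrics are monotonically related via \(\mathrm{Dice} = 2\,\mathrm{IoU} / (1 + \mathrm{IoU})\), so they rank models identically.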

Evaluation results

After applying thresholding to the outputs of the FS-Model, we obtain vessel segmentation images. Figure 3 presents qualitative results demonstrating the efficacy of our deep subtraction and segmentation algorithms on the test set of the LM-CAD dataset. Each column in the figure constitutes a sample organized as “original image”–“deep subtraction”–“segmentation”. The segmentation results clearly indicate that not only the primary branches of the coronary artery but also their secondary and tertiary vessels are precisely segmented. Furthermore, pathological alterations such as stenoses are effectively preserved in the segmentation outputs.

Fig. 3. The effects of deep subtraction and vessel segmentation. The first row contains the original images, the second row includes the deep subtraction images, and the third row provides the semantic vessel segmentation images. Both the deep subtraction and segmentation results enable enhanced visualizations of pathological alterations such as stenoses.

Given the limited sample size of the FS-CAD dataset, we use a fivefold cross-validation strategy for performance evaluation (Table 1). In this scheme, the dataset is partitioned into five equal subsets; one subset is held out for testing while the remaining four are used for training, and the process is iterated five times with a different subset serving as the test set each time. The final performance metric is the average of the five individual test results. Our FS-Model achieves a Dice score of 0.828, corroborating its robust segmentation capabilities. In comparison, the PT-Model, which is not fine-tuned on the FS-CAD dataset, achieves a respectable Dice coefficient of 0.792 via the AutoThresh method. A baseline U-Net model with the same network structure but random initialization, trained exclusively on the FS-CAD dataset, records a significantly lower Dice score of 0.657. This underscores the utility of pretraining: the PT-Model already captures a majority of the vascular features and requires only minimal fine-tuning on a small sample set to adapt effectively to a specific task.
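The protocol can be summarized by the following skeleton, in which the training and evaluation routines are placeholders standing in for the procedures described above.

```python
# Fivefold cross-validation skeleton for the 50 FS-CAD images (sketch).
import numpy as np
from sklearn.model_selection import KFold

def fine_tune(pretrained, train_ids):     # placeholder for the real loop
    return pretrained

def evaluate_dice(model, test_ids):       # placeholder metric
    return 0.0

pt_model = object()                       # stand-in for the PT-Model
samples = np.arange(50)                   # indices of the annotated images
scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=0).split(samples):
    model = fine_tune(pt_model, samples[train_idx])
    scores.append(evaluate_dice(model, samples[test_idx]))
print(np.mean(scores))                    # reported score: mean over folds
```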

Table 1 Model performance achieved on the FS-CAD dataset.

Considering that the ground-truth annotations for the vessels in the FS-CAD dataset are nearly within the limits of human visual discernment, the high Dice score achieved by the FS-Model attests to its ability to effectively segment even the most diminutive vascular structures.

Validation on the XCAD dataset

To further substantiate the advantages of pretraining, we extend our experimentation to the XCAD dataset, a publicly available coronary vessel segmentation dataset comprising 126 images with human-annotated vessel boundaries. Unlike those in the FS-CAD dataset, the ground-truth annotations in XCAD focus primarily on larger vessels. Given that the FS-Model is specifically designed for comprehensive vessel detection, directly comparing its performance on the XCAD dataset may not yield a fair assessment. However, the PT-Model is designed to learn generalized feature representations of coronary arteries, making it adaptable to various downstream tasks. To adapt our PT-Model to the specific characteristics of the XCAD dataset, we fine-tune it, producing what we refer to as the XCAD-Model. Following the original authors’ training and evaluation protocols, we employ threefold cross-validation. The resulting scores are documented in Table 2; the data for the other methods and models are taken directly from the work of Ma et al.19. The comparisons presented in this section, particularly in Tables 1 and 2, are designed to demonstrate the effectiveness of our approach in scenarios with limited annotated data. While traditional methods might achieve better results with larger annotated datasets, our goal is to show that good performance can be achieved with minimal manual annotation by leveraging pretraining on unannotated data, addressing the common challenge of limited annotations in medical imaging.

Table 2 Model performance achieved on the XCAD dataset.

To obtain supervised learning scores, we conduct a threefold cross-validation evaluation on the XCAD dataset. Domain adaptation methods, such as MMD (Bermudez et al., 2018) and YNet27, transfer knowledge from annotated datasets in the source domain to unannotated datasets in the target domain. Among the unsupervised methods, IIC28 is clustering based, while ReDO28 utilizes an adversarial architecture to extract the object mask of the input. Self-supervised vessel segmentation (SSVS)19, proposed by Ma et al., employs adversarial learning to acquire vascular representations from unlabelled samples and includes a fractal synthesis module to generate synthetic vessels. SSVS was previously the best-performing unsupervised method on XCAD but fails to surpass supervised methods.

Our tests show that the XCAD-Model achieves the highest Dice score of 0.755, surpassing the solely supervised learning methods. The high Dice score produced by the XCAD-Model aligns with our expectations, as it is fine-tuned based on the PT-Model that had already learned vascular features through cycle-consistent training. Intriguingly, the PT-Model, which is never trained on the XCAD dataset, still achieves a Dice coefficient of 0.715 after the AutoThresh method is implemented. This result slightly lags behind the performance achieved through supervised learning but confirms the robust generalization ability of the PT-Model and establishes it as the best-performing unsupervised learning method, significantly surpassing SSVS. In contrast, alternative methods such as MMD, YNet, IIC, and ReDO register Dice scores below 0.6, revealing a substantial performance gap between them and supervised learning.

Figure 4 shows a visual representation of the segmentation results produced on the XCAD dataset. Compared to the ground truth, all the models display only marginal differences when segmenting larger vessels; the disparities lie mainly in the identification of secondary and tertiary vessels as well as catheters. Remarkably, the XCAD-Model outperforms solely supervised learning methods in terms of recognizing catheters. The PT-Model, which has never been trained on this specific dataset, also approximates the performance of supervised learning methods and demonstrably outperforms the previously best unsupervised or self-supervised method, i.e., SSVS.

Fig. 4. Visualization of vessel segmentation results produced on the XCAD dataset.

Stenosis detection

In addition, we develop an additional task with potential clinical value. On the basis of the PT-Model, we fine-tune a network, referred to as SDNet, that is capable of identifying vascular stenosis locations within coronary angiography images. More precisely, we select 60 coronary angiography images with stenosis and manually annotate the stenosis sites. A subset of 10 images is designated as a test set, while the remaining images are allocated for training and validation. Figure 5 illustrates the efficacy of SDNet in detecting coronary stenosis within the test set. Despite the limited number of available training samples, SDNet proficiently identifies stenosis sites, attaining a Dice coefficient of 0.56 on the test set. For comparison, a U-Net model trained from scratch (without pretraining) converges more slowly and performs worse, with a Dice coefficient of merely 0.35. This outcome underscores the utility of the PT-Model as an exceptionally robust pretrained model. Comprehensive details concerning the training process and additional results from this experiment are provided in the Supplementary materials F.

Fig. 5. Performance of the vascular stenosis detection network (SDNet) on the test dataset. The first row shows the original images, while the second row presents the manual annotations (highlighted in white). The third row displays the detection outcomes of SDNet. Instead of a binary representation, the results are visualized using a pseudocolour scale, on which the intensity of the red colour indicates an increasing likelihood that a pixel is part of a vascular stenosis site.

Discussion

Our study introduces a novel pretraining approach and model that are specifically tailored for coronary angiograms. We uniquely employ an image-to-image (I2I) framework to conduct pretraining on large unlabelled angiographic image datasets. This overarching strategy resembles the highly successful “self-supervised plus fine-tuning” paradigm found in seminal models such as BERT29, GPTs30, and ViTs31. To enrich the vascular features extracted from angiograms, our pretraining tasks are intentionally designed to toggle between mask images and live images, thereby allowing us to acquire robust vascular representations.

Significance of deep subtraction

DSA has not gained widespread acceptance in cardiology, primarily because of its motion artefact removal limitations. As a result, clinicians are often more inclined to directly examine original angiograms. In contrast, our pretrained model inherently possesses single-frame subtraction capabilities, dramatically surpassing DSA in this specific context. This advancement allows for significantly enhanced subtraction images to be obtained, offering a more reliable alternative in clinical settings.

Fine-tuning

This pretrained model enables competitive vessel segmentation results to be obtained with a minimum of annotated samples. On the FS-CAD dataset, we realize fine-grained coronary angiography segmentation with just 40 training samples, achieving a Dice score of 0.828 when benchmarked against meticulous human annotations. On the XCAD dataset, our methodology surpasses traditional supervised learning techniques, establishing a new state-of-the-art (SOTA) approach.

Clinical applications

The model is inherently unaffected by motion, providing clearer coronary artery subtraction images. It can serve as an optional tool, offering radiologists and cardiologists a novel imaging display mode and allowing them to observe vessels in real time without the interference of background noise. On an RTX 3090, inference takes 0.3 s per frame, whereas on a single 2.0 GHz CPU it takes approximately 4 s per frame; CPU deployment therefore still requires speed optimization with inference frameworks such as ONNX Runtime. Moreover, the model provides highly accurate segmentation results for major vessels, making it invaluable for quantitative cardiovascular analyses that inform medical decision-making. While our implementation mitigates some of the computational challenges associated with U-Net architectures, future work could explore parallel computation strategies to further optimize performance and resource utilization. Recent studies have demonstrated the effectiveness of such approaches in medical image processing32,33,34,35.
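As a hypothetical sketch of that optimization path, the inference network could be exported to ONNX as follows; the stand-in module, input shape, and opset version are assumptions.

```python
# Hypothetical export of the inference network to ONNX for CPU deployment.
import torch

G_E = torch.nn.Conv2d(1, 1, 3, padding=1)   # stand-in for the real U-Net
dummy = torch.randn(1, 1, 512, 512)         # assumed input shape
torch.onnx.export(G_E, dummy, "subtraction.onnx", opset_version=17,
                  input_names=["frame"], output_names=["subtraction"])
```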

An additional significant aspect is that the pretrained model can substantially reduce the data requirements of downstream tasks. We conduct vessel stenosis detection with a small annotated dataset; the proposed approach functions as a real-time, automatic alert system that helps doctors pinpoint potential vessel stenosis during coronary angiography. This feature can be further utilized to automatically and quantitatively analyse vessel stenosis. Similarly, the model can be conveniently transferred to other downstream tasks without the need for large manually annotated datasets.

Limitations and future directions

While our model achieves competitive performance, particularly regarding the identification of larger vessels, it does encounter limitations when segmenting extremely fine distal vessels, as evidenced in Fig. 6. These overlooked vessels, represented in green, are often only 1–2 pixels in size and exhibit low contrast levels. Even upon performing meticulous manual annotation through image magnification, these diminutive vessels are easily neglected. Although these minor vessels may not be of paramount clinical interest, enhancing the ability of the model to operate in this area could offer advantages for automated analyses. Despite our efforts to mitigate class imbalance and improve small vessel detection, these remain ongoing challenges in medical image segmentation. Future work could explore advanced techniques to further address these issues, drawing inspiration from recent studies in other medical domains. For instance, Chandrasekar et al. have conducted comprehensive analyses of machine learning approaches to handle imbalanced datasets in the context of drug permeability studies, both across the placenta36 and the blood–brain barrier37. Their work on data balancing techniques and machine learning model investigations could provide valuable insights for improving our approach to small vessel detection and addressing class imbalance in coronary angiogram segmentation.

Fig. 6. Visual comparison between model-based segmentation results and human annotations. In the images, green denotes the ground truths, which correspond to blood vessels identified solely through manual annotations but missed by the model. Red indicates blood vessels detected exclusively by the model, and yellow marks the overlapping pixels. Notably, vessels with low contrast are prone to omission, as evidenced by the prevalence of green areas in regions where vessel visibility is poor.

Additional limitations of this study include the relatively limited scale and diversity of our dataset, the lack of multi-center validation, and the need for further clinical validation before widespread adoption. Future work should address these limitations through larger, more diverse datasets and extensive clinical trials.

In addition to refining the recognition abilities of the model, future research avenues include extending the model to other downstream tasks, such as multicategory vessel segmentation and the detection of various vascular abnormalities. This approach may also find applications in other medical vascular analysis tasks, thereby reducing the dependence on labelled data. Future research could also explore the integration of stochastic resonance (SR) techniques and other machine learning technologies to further enhance image contrast and potentially improve segmentation performance, particularly for fine vessel structures. Recent work has shown promise in applying SR to medical image analysis tasks38,39,40,41.

Conclusion

In this study, we introduce a novel vessel segmentation and extraction method for coronary angiography, specifically engineered to capture intricate vessel representations. A model pretrained via this approach excels in terms of background removal and single-frame subtraction, achieving effects similar to digital subtraction angiography (DSA) with only a single frame. Upon fine-tuning the model on a small annotated dataset, its vessel segmentation performance is further enhanced, even surpassing the performance of purely supervised methods. This approach significantly elevates the clarity of subtraction images and the accuracy of major vessel segmentation. These advances are not only academically significant but also have direct clinical value. They facilitate real-time vessel observation without interference from background noise and enable more precise quantitative cardiovascular analyses. Most importantly, this method offers a pathway for autonomously learning complex vessel representations from samples without the need for manual annotations, effectively reducing the demand for supervised samples in downstream tasks.