
CN116912498B - A 3D Medical Image Segmentation Method and System Based on Paired Attention - Google Patents

A 3D Medical Image Segmentation Method and System Based on Paired Attention

Info

Publication number
CN116912498B
CN116912498B (application CN202310921977.6A)
Authority
CN
China
Prior art keywords
attention
paired
medical image
module
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310921977.6A
Other languages
Chinese (zh)
Other versions
CN116912498A (en)
Inventor
赵晶
邹庆志
陈玲
张荣环
胡玉帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology
Priority to CN202310921977.6A
Publication of CN116912498A
Application granted
Publication of CN116912498B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Processing (AREA)

Abstract


This invention discloses a 3D medical image segmentation method and system based on paired attention, belonging to the field of 3D medical image segmentation technology. It includes acquiring a 3D medical image to be segmented; inputting the 3D medical image to be segmented into a trained 3D medical image segmentation model for processing to obtain segmentation results; wherein, the 3D medical image segmentation model is designed based on a paired attention transformer module, which can reduce the spatial dimension and effectively learn channel and spatial information in the 3D feature map, thereby improving the model's segmentation performance while reducing the number of model parameters and accelerating model computation. This solves the problem of "poor robustness and high computational resource requirements" in existing technologies for medical image segmentation models.

Description

3D medical image segmentation method and system based on paired attention
Technical Field
The invention relates to the technical field of 3D medical image segmentation, in particular to a 3D medical image segmentation method and system based on paired attention.
Background
The statements in this section merely relate to the background of the present disclosure and may not necessarily constitute prior art.
Convolutional Neural Networks (CNNs) are widely used in medical image segmentation tasks due to their powerful multi-scale representation capabilities and their ability to capture local semantic and texture information. Some U-Net variants, such as U-Net++, U-Net3+ and Residual U-Net, perform well on various datasets by processing 3D voxel data as 2D slices. 3D U-Net enables segmentation of sparsely labeled volumetric images by extending the U-Net architecture with 3D operations in place of 2D convolutions. Isensee et al. propose nnU-Net, a U-Net-based model with an automated-configuration, adaptive framework that can extract features from multi-level images. In addition, some work is directed to learning local-global information through pure CNN architectures, for example via deformable convolution, depthwise separable convolution and large-kernel convolution.
Vision Transformers successfully address the challenge of capturing long-range dependencies in medical image segmentation. The Transformer learns correlations among all input tokens using a self-attention mechanism, thereby capturing long-range dependencies. While much work has been done to improve the Transformer architecture for more accurate segmentation, little research has addressed the computational complexity of the self-attention mechanism in the Transformer. Swin-Unet is a U-shaped encoder-decoder structure consisting of Swin Transformer blocks. nnFormer by Zhou et al. still uses convolution layers to extract local image details and employs a hierarchical structure to model multi-scale features.
Recently, some efforts have attempted to design hybrid architectures that combine U-Net models with Transformers in order to learn global-local context information through global self-attention mechanisms while extracting local features using convolution. TransUNet introduces an encoder with a hybrid CNN-Transformer architecture, improving segmentation performance by introducing convolutional neural networks into the Transformer structure to recover local spatial information. UNETR introduces a novel Transformer-based method for semantic segmentation of medical images, redefining the task as a one-dimensional sequence-to-sequence prediction problem. 3D UX-Net proposes a lightweight volumetric ConvNet module with large-kernel depthwise separable convolutions that adapts hierarchical features to achieve better volumetric segmentation.
CNN-based approaches have certain limitations in capturing the geometric and structural information of medical image data, because the convolution operation can learn only a limited range of dependencies between pixels. Transformer-based methods introduce a self-attention mechanism that can learn long-range dependencies among pixels, solving the limited-receptive-field problem of CNN-based U-Net variants. However, the performance improvement of this approach is often accompanied by an increase in model complexity.
Recent research efforts have devised hybrid architectures that combine the U-Net model with a Transformer, aiming to learn global-local context information via global self-attention mechanisms while extracting local features using convolution. For example, UNETR uses a Transformer as the encoder and convolution and upsampling operations as the decoder. Models such as Swin-Unet and nnFormer design a hybrid module based on Transformer and convolution and apply it to both the encoder and the decoder. However, these studies focus mainly on improving the segmentation accuracy of the model at the cost of increasing its parameter count and computation, resulting in poor model robustness and a large demand for computing resources.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a 3D medical image segmentation method, system, electronic device and computer-readable storage medium based on paired attention, wherein the 3D medical image segmentation model is designed based on a Paired Attention Transformer (PAT) module that reduces the spatial dimension and effectively learns channel and spatial information in the 3D feature map, thereby improving model segmentation performance while reducing the model parameter count and increasing model computation speed.
In a first aspect, the present invention provides a paired attention based 3D medical image segmentation method;
A paired attention based 3D medical image segmentation method comprising:
Acquiring a 3D medical image to be segmented;
inputting the 3D medical image to be segmented into a trained 3D medical image segmentation model for processing so as to obtain a segmentation result;
the 3D medical image segmentation model comprises an encoder and a decoder, wherein the encoder is connected with the decoder, the encoder comprises a first encoding module and a plurality of second encoding modules which are sequentially connected, the first encoding module comprises a patch embedding layer and a paired attention transformer module, any second encoding module comprises a paired attention transformer module and a downsampling layer, the decoder comprises a plurality of decoding modules which are sequentially connected, and any decoding module comprises a jump connection module, a paired attention transformer module and an upsampling module.
Further, the paired attention transformer module is composed of a normalization layer, a multi-layer perceptron and a multi-head paired attention module, wherein data input into the paired attention transformer module sequentially passes through the normalization layer, the multi-head paired attention module and the multi-layer perceptron.
Preferably, the multi-head paired attention module is used for capturing the channel dependencies of the input data through channel attention to obtain a channel-attention output feature map, capturing the spatial dependencies of the input data through spatial attention to obtain a spatial-attention output feature map, merging the channel-attention and spatial-attention output feature maps with the original 3D voxel features of the input data, and performing a 3D convolution to obtain a deep feature representation of the input data.
Preferably, the channel attention formula in the multi-head paired attention module is expressed as:
X_C = V_channel · Softmax((Q_channel)^T · K_channel / √d)
wherein X_C represents the output obtained through channel attention, Q_channel is the channel query vector, K_channel is the channel key vector, V_channel is the channel value vector, and d is the size of each vector;
The spatial attention formula in the multi-head paired attention module is expressed as:
X_S = Softmax(Q_spatial · (K_spatial_proj)^T / √d) · V_spatial_proj
wherein X_S is the output obtained through spatial attention, Q_spatial is the spatial query vector, K_spatial_proj is the projection of the spatial key vector, V_spatial_proj is the projection of the spatial value vector, and d is the size of each vector.
Further, the first encoding module is used for embedding and partitioning the 3D medical image to be segmented, obtaining a 3D voxel feature map and adding position encoding; the second encoding modules are used for performing paired attention transformation and downsampling operations on the 3D voxel feature map so as to progressively reduce its dimensions; the decoding modules are used for upsampling the dimension-reduced 3D voxel feature maps, concatenating them with 3D voxel feature maps of matching dimensions and then performing paired attention transformation so as to progressively restore the dimensions of the concatenated 3D voxel feature maps, and outputting the predicted final segmentation result through a convolution operation.
Further, the inputting the 3D medical image to be segmented into the trained 3D medical image segmentation model for processing includes:
Carrying out embedding processing and segmentation on the 3D medical image to be segmented, obtaining a 3D voxel characteristic map and adding a position code;
performing pairing attention transformation and downsampling operation on the 3D voxel feature map so as to realize sequential dimension reduction of the 3D voxel feature map;
and upsampling the dimension-reduced 3D voxel feature map, concatenating it with 3D voxel feature maps of matching dimensions, performing paired attention transformation to progressively restore the dimensions of the concatenated 3D voxel feature map, and outputting the predicted final segmentation result through a convolution operation.
Further, the training mode for the 3D medical image segmentation model includes:
acquiring training data;
setting AdamW an optimizer, and adaptively adjusting the learning rate;
And training the 3D medical image segmentation model according to training data, the learning rate and a preset loss function.
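The training steps above can be sketched as follows. The patent states only that the learning rate is adjusted adaptively; the polynomial decay schedule, base learning rate and power used here are illustrative assumptions (a schedule of this form is common in 3D segmentation training, e.g. in nnU-Net), not the patent's specified configuration.

```python
def poly_lr(step: int, total_steps: int, base_lr: float = 1e-4, power: float = 0.9) -> float:
    """Polynomial learning-rate decay: falls smoothly from base_lr to 0.

    Assumed schedule; the patent does not name the exact adaptive rule.
    """
    return base_lr * (1.0 - step / total_steps) ** power
```

During training, the returned value would be assigned to each optimizer parameter group before every AdamW update step.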
In a second aspect, the present invention provides a paired attention based 3D medical image segmentation system;
A paired attention based 3D medical image segmentation system comprising:
the acquisition module is used for acquiring the 3D medical image to be segmented;
The 3D medical image segmentation module is used for inputting the 3D medical image to be segmented into a trained 3D medical image segmentation model for processing so as to obtain a segmentation result;
the 3D medical image segmentation model comprises an encoder and a decoder, wherein the encoder is connected with the decoder, the encoder comprises a first encoding module and a plurality of second encoding modules which are sequentially connected, the first encoding module comprises a patch embedding layer and a paired attention transformer module, any second encoding module comprises a paired attention transformer module and a downsampling layer, the decoder comprises a plurality of decoding modules which are sequentially connected, and any decoding module comprises a jump connection module, a paired attention transformer module and an upsampling module.
In a third aspect, the present invention provides an electronic device;
an electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the steps of the paired attention-based 3D medical image segmentation method described above.
In a fourth aspect, the present invention provides a computer-readable storage medium;
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the paired attention based 3D medical image segmentation method described above.
Compared with the prior art, the invention has the beneficial effects that:
according to the technical scheme provided by the invention, the 3D medical image segmentation model PAT-Unet is designed based on the Paired Attention Transformer module, which effectively combines the dependency information among channels with the rich information in the spatial dimension, improving the segmentation effect while reducing the parameter count of the model and accelerating model inference.
Compared with existing methods, the technical scheme provided by the invention can capture detailed texture information in the image and, while producing a more accurate segmentation map, reduces the model's parameter count and number of operations by more than 67%.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a schematic flow chart provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a 3D medical image segmentation model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a Paired Attention Transformer module according to an example embodiment of the present invention;
FIG. 4 is a schematic diagram of a multi-head paired attention module according to an embodiment of the present invention;
Fig. 5 is a visual comparison diagram of segmentation results of the method of the present invention with other models on an ACDC dataset according to an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms also are intended to include the plural forms, and furthermore, it is to be understood that the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions, such as, for example, processes, methods, systems, products or devices that comprise a series of steps or units, are not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
In prior-art 3D medical image segmentation methods, segmentation precision is improved at the cost of increased model parameter count and computation, resulting in poor model robustness and a high demand for computing resources, which in turn affects the efficiency and precision of 3D medical image segmentation. Accordingly, the present invention provides a 3D medical image segmentation method based on paired attention.
Next, a detailed description will be given of the paired attention-based 3D medical image segmentation method disclosed in this embodiment with reference to fig. 1 to 5. The 3D medical image segmentation method based on the paired attention comprises the following steps of:
S1, acquiring a 3D medical image to be segmented.
S2, inputting the 3D medical image to be segmented into a trained 3D medical image segmentation model for processing so as to obtain a segmentation result. The 3D medical image segmentation model comprises an encoder and a decoder, the encoder is connected with the decoder, the encoder comprises a first encoding module and 3 second encoding modules which are sequentially connected, the first encoding module comprises a patch embedding (Patch Embedding) layer and a pairing attention transformer (Paired Attention Transformer, PAT) module, any second encoding module comprises a pairing attention transformer (Paired Attention Transformer, PAT) module and a downsampling layer, the decoder comprises 4 decoding modules which are sequentially connected, and any decoding module comprises a jump connection module, a pairing attention transformer (Paired Attention Transformer, PAT) module and an upsampling module.
The specific flow of inputting the 3D medical image to be segmented into the trained 3D medical image segmentation model for processing is as follows:
The 3D medical image to be segmented is input into the encoder and processed by the encoder through the following steps:
In this embodiment, the 3D medical image to be segmented is a first 3D voxel feature map with a size of 128×128×64×1, i.e. height×width×depth×channel format, where 64 is the Depth of the data volume, 128×128 represents the Height and Width of the data volume, and 1 is the Channel number of the feature volume image.
In the first stage of the encoder, i.e. the first encoding module, the first 3D voxel feature map is subjected to Patch Embedding by the patch embedding layer: the three-dimensional data volume is split into a number of small data blocks with a low-dimensional representation, and position encoding is added to these blocks. Next, the encoded first 3D voxel feature map is input into the Paired Attention Transformer module, which focuses attention on the medical image feature regions, to obtain a second 3D voxel feature map of size 32×32×16×32.
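The patch-embedding step can be sketched with a plain NumPy reshape. The 4×4×4 patch size is an assumption chosen so that a 128×128×64×1 input yields the 32×32×16 token grid stated above; a learned linear projection (not shown) would then map each flattened patch vector to the 32 embedding channels before position encoding is added.

```python
import numpy as np

def patch_embed(vol: np.ndarray, p: int = 4) -> np.ndarray:
    """Split an (H, W, D, C) volume into non-overlapping p*p*p patches,
    flattening each patch into one token vector of length p*p*p*C."""
    h, w, d, c = vol.shape
    x = vol.reshape(h // p, p, w // p, p, d // p, p, c)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # bring the three patch axes together
    return x.reshape(h // p, w // p, d // p, p * p * p * c)
```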
In the second stage of the encoder, i.e. the first second encoding module, the second 3D voxel feature map first passes through a downsampling layer formed by a 3D convolution with stride 2 and kernel size 3×3×3 followed by a normalization operation, and is then processed by the Paired Attention Transformer module to obtain a third 3D voxel feature map of size 16×16×8×64.
In the third stage of the encoder, i.e. the second second encoding module, the third 3D voxel feature map first passes through a downsampling layer formed by a 3D convolution with stride 2 and kernel size 3×3×3 followed by a normalization operation, and is then processed by the Paired Attention Transformer module to obtain a fourth 3D voxel feature map of size 8×8×4×128.
In the fourth stage of the encoder, i.e. the third second encoding module, the fourth 3D voxel feature map first passes through a downsampling layer formed by a 3D convolution with stride 2 and kernel size 3×3×3 followed by a normalization operation, and is then processed by the Paired Attention Transformer module to obtain a fifth 3D voxel feature map of size 4×4×2×256.
The processing of the 3D voxel feature map by the Paired Attention Transformer module is the same as the processing of the Paired Attention Transformer module in the decoder section described below, and is described in detail below, and is not repeated here.
The first, second, third, fourth and fifth 3D voxel feature maps obtained by the encoder are then input into the decoder for processing. In the first stage of the decoder (the first decoding module), the fifth 3D voxel feature map is upsampled via an upsampling layer to a 3D voxel feature map of size 8×8×4×128, which is concatenated with the fourth 3D voxel feature map through the skip connection module; the concatenated result is then processed by the Paired Attention Transformer module to obtain a sixth 3D voxel feature map of size 8×8×4×128.
In the second stage (second decoding module) of the decoder, the sixth 3D voxel feature map is up-sampled to a 3D voxel feature map with the size of 16×16×8×64 through an up-sampling layer, then the 3D voxel feature map up-sampled to the size of 16×16×8×64 is spliced with the third 3D voxel feature map through a jump connection module, and then a seventh 3D voxel feature map with the size of 16×16×8×64 is obtained through processing of Paired Attention Transformer modules.
In the third stage of the decoder, i.e. the third decoding module, the seventh 3D voxel feature map is first upsampled via an upsampling layer to a 3D voxel feature map of size 32×32×16×32, which is concatenated with the second 3D voxel feature map through the skip connection module; the concatenated 3D voxel feature map is then processed by the Paired Attention Transformer module to obtain an eighth 3D voxel feature map of size 32×32×16×32.
In the fourth stage of the decoder, i.e. the fourth decoding module, the eighth 3D voxel feature map is first upsampled via an upsampling layer to a 3D voxel feature map of size 128×128×64×1, which is concatenated, through the skip connection module, with the result of processing the first 3D voxel feature map by a 3D convolution with kernel size 3×3×3; the concatenated result is processed by a 3D convolution with kernel size 3×3×3 to obtain the final prediction output of the model, i.e. the final segmentation result of the medical feature region: the ninth 3D voxel feature map.
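The stage-by-stage feature-map sizes above follow a simple rule: patch embedding divides each spatial extent by 4, and every subsequent stride-2 encoder stage halves the spatial extents while doubling the channels. A quick sanity check of the pipeline's shapes:

```python
def encoder_shapes(h=128, w=128, d=64, patch=4, base_c=32, num_down=3):
    """Compute (H, W, D, C) at each encoder stage of the described pipeline:
    128x128x64x1 input, 32x32x16x32 after patch embedding, then three
    stride-2 stages down to 4x4x2x256."""
    shapes = [(h, w, d, 1)]
    h, w, d, c = h // patch, w // patch, d // patch, base_c
    shapes.append((h, w, d, c))
    for _ in range(num_down):
        h, w, d, c = h // 2, w // 2, d // 2, c * 2
        shapes.append((h, w, d, c))
    return shapes
```

The decoder traverses the same list in reverse, which is why each skip connection finds a feature map of matching size.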
The Paired Attention Transformer module in the above operation is shown in fig. 3, and mainly consists of a normalization Layer (Layer Norm), a multi-Layer perceptron (MLP), and a multi-head pairing attention (MPA) module. The multi-head pairing attention module is shown in fig. 4, and consists of two parts, namely channel attention and space attention, which capture channel dependency and space dependency respectively.
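The block's data flow can be sketched as below, assuming the standard pre-norm Transformer ordering. Only the order Layer Norm → MPA → MLP is stated above; the residual connections are an assumption, and `mpa` and `mlp` are placeholder callables, not the patent's implementation.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Normalize each token over its channel axis (Layer Norm without the
    # learned scale/shift parameters, for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def pat_block(x: np.ndarray, mpa, mlp) -> np.ndarray:
    # Assumed pre-norm residual ordering: Norm -> MPA -> residual,
    # then Norm -> MLP -> residual.
    x = x + mpa(layer_norm(x))
    x = x + mlp(layer_norm(x))
    return x
```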
The channel attention operation in the multi-head paired attention first transposes the vector Q_channel and performs a scaled dot-product operation with the vector K_channel, using Softmax to measure the similarity between each channel feature and the remaining channel features, thereby obtaining the channel attention map. A dot-product operation between the channel attention map and the vector V_channel then captures the inter-channel dependencies in the feature map, yielding the output of the channel attention. The channel attention formula in the multi-head paired attention is given as (1):
X_C = V_channel · Softmax((Q_channel)^T · K_channel / √d)   (1)
where X_C represents the output obtained through channel attention, Q_channel, K_channel and V_channel represent the channel query vector, channel key vector and channel value vector, respectively, and d is the size of each vector.
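Equation (1) can be checked with a small NumPy sketch. The C×C attention map makes the cost independent of the number of voxels; the (N, C) token layout and the softmax axis are inferred from the description above rather than stated explicitly in the patent.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """X_C = V . Softmax(Q^T K / sqrt(d)); q, k, v have shape (N, C)
    with N = H*W*D voxel tokens and C channels."""
    d = q.shape[-1]
    attn = softmax(q.T @ k / np.sqrt(d))  # (C, C) channel-similarity map
    return v @ attn                        # (N, C)
```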
The spatial attention operation in the multi-head paired attention projects V_spatial and K_spatial, each of dimension HWD×C, onto the spatial dimension P×C, obtaining V_spatial_proj and K_spatial_proj, respectively. A scaled dot-product operation between Q_spatial and the transpose of K_spatial_proj (of dimension P×C), followed by Softmax, yields a spatial attention map of dimension HWD×P. Finally, a dot-product operation between this spatial attention map and the projected V_spatial_proj generates the spatial attention feature map of dimension HWD×C. The spatial attention in the multi-head paired attention is formulated as (2):
X_S = Softmax(Q_spatial · K_spatial_proj^T / √d) · V_spatial_proj    (2)
In equation (2), Q_spatial, K_spatial_proj and V_spatial_proj represent the spatial query vector, the projection of the spatial key vector and the projection of the spatial value vector, respectively, and d is the size of each vector.
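The projected spatial attention of equation (2) can be sketched the same way; the point of the P×C projection is that the attention map is HWD×P instead of HWD×HWD. The projection matrix `W_p` below is a hypothetical stand-in for the learned projection layers, and all sizes are toy values:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(Q, K_proj, V_proj):
    """Spatial attention per equation (2): K and V have been projected
    from (HWD, C) down to (P, C), so the attention map is (HWD, P)."""
    d = Q.shape[-1]
    attn = softmax(Q @ K_proj.T / np.sqrt(d), axis=-1)  # (HWD, P)
    return attn @ V_proj                                # (HWD, C)

# Toy sizes: HWD = 8 voxels, C = 4 channels, P = 3 projected tokens
rng = np.random.default_rng(1)
Q = rng.normal(size=(8, 4))
W_p = rng.normal(size=(8, 3))             # hypothetical projection HWD -> P
K_proj = W_p.T @ rng.normal(size=(8, 4))  # (3, 4)
V_proj = W_p.T @ rng.normal(size=(8, 4))  # (3, 4)
X_S = spatial_attention(Q, K_proj, V_proj)
print(X_S.shape)
```

With P fixed and small, the attention cost grows linearly rather than quadratically in the number of voxels, which is the usual motivation for this kind of projected attention.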
The outputs of the channel and spatial attention are then fused with the original 3D voxel features, and a 3D convolution operation is applied to the fused result to extract a deeper feature representation. The final output of the multi-head paired attention module is given in equation (3):
X = Conv1(Conv3(X_S + X_C))    (3)
where X_C and X_S represent the output feature maps of the channel and spatial attention, respectively, Conv1 denotes a 3D convolution with kernel size 1×1×1, and Conv3 denotes a 3D convolution with kernel size 3×3×3.
Further, training the 3D medical image segmentation model includes:
Step 1: training data are acquired.
Two disclosed 3D medical image segmentation datasets, synapse and ACDC, were selected as training data.
The Synapse dataset consists of CT scans of the abdominal organs of 30 patients. Following the dataset partition used by the TransUNet model, 18 cases are assigned to the training set and the remaining 12 cases to the test set. The experimental results section reports the Dice similarity coefficient (DSC) and the 95% Hausdorff distance (HD95) for 8 abdominal organs: spleen, left kidney, pancreas, stomach, aorta, liver, gall bladder and right kidney. The Automated Cardiac Diagnosis Challenge (ACDC) dataset is split into 70 training samples, 10 validation samples, and 20 test samples.
Step 2: data preprocessing and augmentation.
First, the Synapse and ACDC datasets are acquired, and each input three-dimensional data volume is cropped to a size of 128×128×64.
Next, random rotation and random flipping operations, each applied with 50% probability, are performed on the cropped training images and the corresponding ground-truth segmentation images. These preprocessing and augmentation operations effectively compensate for the small number of training images in the original datasets, strengthening the model's resistance to overfitting and improving its robustness.
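The random-flip and random-rotation augmentation described above might be sketched as follows. The axis choices and the use of 90-degree rotations are assumptions, since the text does not fix the rotation angles; what matters is that image and ground-truth mask receive the identical transform:

```python
import numpy as np

def augment(image, label, rng):
    """Apply a random flip and a random 90-degree in-plane rotation,
    each with 50% probability, identically to image and mask."""
    if rng.random() < 0.5:
        axis = int(rng.integers(0, 3))        # pick one spatial axis
        image = np.flip(image, axis=axis)
        label = np.flip(label, axis=axis)
    if rng.random() < 0.5:
        k = int(rng.integers(1, 4))           # 90, 180 or 270 degrees
        image = np.rot90(image, k=k, axes=(0, 1))
        label = np.rot90(label, k=k, axes=(0, 1))
    return image.copy(), label.copy()         # materialize the views

rng = np.random.default_rng(42)
img = np.zeros((128, 128, 64), dtype=np.float32)
msk = np.zeros((128, 128, 64), dtype=np.uint8)
img2, msk2 = augment(img, msk, rng)
print(img2.shape, msk2.shape)
```

Rotating in the (0, 1) plane keeps the 128×128×64 shape intact because the two in-plane axes are equal in size.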
Step 3: the augmented data are input into the 3D medical image segmentation model for processing to obtain the final model prediction map.
The loss is calculated using a combination of loss functions; the loss function measures the error between the model's prediction and the ground-truth segmentation image.
In this embodiment, the sum of the cross-entropy loss and the soft Dice loss is used to calculate the loss between the model's predicted 3D voxel result and the ground truth, combining the advantages of both loss functions. The loss function is shown in equation (4):
L(Z, P) = 1 − (2/N) Σ_{j=1}^{N} (Σ_{v=1}^{V} Z_{v,j} P_{v,j}) / (Σ_{v=1}^{V} Z_{v,j} + Σ_{v=1}^{V} P_{v,j}) − (1/V) Σ_{v=1}^{V} Σ_{j=1}^{N} Z_{v,j} log P_{v,j}    (4)
where V is the total number of voxels in the 3D voxel feature map, N is the number of predicted classes, Z_{v,j} is the ground-truth value of the j-th class at voxel v, and P_{v,j} is the predicted probability output by the model for the j-th class at voxel v.
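As a concrete illustration, the combined loss of equation (4) can be sketched in NumPy. The pairing of cross entropy with soft Dice comes from the text above; the array shapes, the `eps` smoothing term, and the toy data are illustrative assumptions:

```python
import numpy as np

def combined_loss(P, Z, eps=1e-5):
    """Sum of voxel-wise cross entropy and soft Dice loss.
    P: predicted probabilities, shape (V, N); Z: one-hot ground truth."""
    V, N = P.shape
    ce = -np.sum(Z * np.log(P + eps)) / V            # cross-entropy term
    inter = (Z * P).sum(axis=0)                      # per-class overlap
    dice = 1.0 - (2.0 / N) * np.sum(
        inter / (Z.sum(axis=0) + P.sum(axis=0) + eps))
    return ce + dice

# Toy check: near-perfect predictions score lower than uniform ones
Z = np.eye(3)[np.array([0, 1, 2, 1])]                # 4 voxels, 3 classes
P_good = np.clip(Z, 0.01, 0.98)
P_good /= P_good.sum(axis=1, keepdims=True)
print(combined_loss(P_good, Z) < combined_loss(np.full((4, 3), 1/3), Z))  # True
```

The Dice term directly rewards per-class overlap (useful for small organs), while the cross-entropy term keeps the per-voxel gradients well behaved.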
An AdamW optimizer is set with an initial learning rate of 0.001 and a weight decay of 3e-5; the weight decay coefficient helps prevent the model from overfitting, and adaptive learning rate adjustment speeds up model convergence.
The loss between the model's segmentation prediction map and the ground truth is calculated according to the combined loss function, and the AdamW optimizer performs gradient updates and adaptive learning rate adjustment; 8 samples are trained per batch, for a total of 1000 epochs. For the Synapse dataset, the average Dice score and the 95% Hausdorff distance are given as evaluation indices; the ACDC dataset uses only the average Dice score as the evaluation index.
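For illustration, one AdamW update with the stated hyperparameters (learning rate 0.001, weight decay 3e-5) can be sketched as below. The toy weights and gradients are illustrative assumptions; a real training loop would use a deep-learning framework's built-in AdamW:

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=3e-5):
    """One AdamW step: Adam moment estimates plus *decoupled* weight
    decay, i.e. the decay is applied to w directly, not to the gradient."""
    m = betas[0] * m + (1 - betas[0]) * g          # first moment
    v = betas[1] * v + (1 - betas[1]) * g * g      # second moment
    m_hat = m / (1 - betas[0] ** t)                # bias correction
    v_hat = v / (1 - betas[1] ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

w = np.ones(4)
m = np.zeros(4)
v = np.zeros(4)
g = np.full(4, 0.5)                # constant positive toy gradient
for t in range(1, 4):              # three toy steps
    w, m, v = adamw_step(w, g, m, v, t)
print(w.shape)
```

Decoupling the weight decay from the gradient-based update is what distinguishes AdamW from Adam with L2 regularization, and is the reason the decay coefficient acts as a clean overfitting control here.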
The comparison models used in the experiments were the state-of-the-art medical image segmentation models UNETR and nnFormer; the experimental comparison data for these models are shown in Table 1, and a visual comparison with other models on the ACDC dataset is shown in fig. 5.
Table 1 Experimental comparison of the proposed method with other models
Example Two
This embodiment discloses a 3D medical image segmentation system based on paired attention, comprising:
an acquisition module, used to acquire the 3D medical image to be segmented;
a 3D medical image segmentation module, used to input the 3D medical image to be segmented into a trained 3D medical image segmentation model for processing, so as to obtain a segmentation result;
the 3D medical image segmentation model comprises an encoder and a decoder, wherein the encoder is connected with the decoder, the encoder comprises a first encoding module and a plurality of second encoding modules which are sequentially connected, the first encoding module comprises a patch embedding layer and a paired attention transformer module, any second encoding module comprises a paired attention transformer module and a downsampling layer, the decoder comprises a plurality of decoding modules which are sequentially connected, and any decoding module comprises a jump connection module, a paired attention transformer module and an upsampling module.
It should be noted that the acquisition module and the 3D medical image segmentation module correspond to the steps in the first embodiment; the modules and the corresponding steps share the same examples and application scenarios, but are not limited to the disclosure of the first embodiment. It should also be noted that the above modules may be implemented as part of a system in a computer system, for example as a set of computer-executable instructions.
Example Three
An electronic device according to a third embodiment of the present invention includes a memory, a processor, and computer instructions stored in the memory and running on the processor, where the computer instructions, when executed by the processor, complete the steps of the 3D medical image segmentation method based on paired attention.
Example Four
A fourth embodiment of the present invention provides a computer readable storage medium storing computer instructions that, when executed by a processor, perform the steps of the paired attention-based 3D medical image segmentation method described above.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Each of the foregoing embodiments is described with its own emphasis; for details not elaborated in one embodiment, reference may be made to the related description of another embodiment.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A 3D medical image segmentation method based on paired attention, characterized by comprising:
acquiring a 3D medical image to be segmented; and
inputting the 3D medical image to be segmented into a trained 3D medical image segmentation model for processing, so as to obtain a segmentation result;
wherein the 3D medical image segmentation model comprises an encoder and a decoder, the encoder being connected to the decoder; the encoder comprises a first encoding module and a plurality of second encoding modules connected in sequence, the first encoding module consisting of a patch embedding layer and a paired attention transformer module, and each second encoding module consisting of a paired attention transformer module and a downsampling layer; the decoder comprises a plurality of decoding modules connected in sequence, each decoding module consisting of a skip connection module, a paired attention transformer module and an upsampling module;
the paired attention transformer module consists of a normalization layer, a multilayer perceptron and a multi-head paired attention module, wherein data input to the paired attention transformer module passes through the normalization layer, the multi-head paired attention module and the multilayer perceptron in sequence; and
the multi-head paired attention module is configured to capture channel dependencies of the input data through channel attention to obtain a channel attention output feature map, and to capture spatial dependencies of the input data through spatial attention to obtain a spatial attention output feature map; the channel attention output feature map and the spatial attention output feature map are fused with the original 3D voxel features of the input data and subjected to 3D convolution, so as to obtain a deep feature representation of the input data.
2. The 3D medical image segmentation method based on paired attention according to claim 1, characterized in that the channel attention in the multi-head paired attention module is formulated as:
X_C = V_channel · Softmax(Q_channel^T · K_channel / √d)
where X_C represents the output obtained through channel attention, Q_channel is the channel query vector, K_channel is the channel key vector, V_channel is the channel value vector, and d is the size of each vector;
the spatial attention in the multi-head paired attention module is formulated as:
X_S = Softmax(Q_spatial · K_spatial_proj^T / √d) · V_spatial_proj
where X_S represents the output obtained through spatial attention, Q_spatial is the spatial query vector, K_spatial_proj is the projection of the spatial key vector, V_spatial_proj is the projection of the spatial value vector, and d is the size of each vector.
3. The 3D medical image segmentation method based on paired attention according to claim 1, characterized in that the first encoding module is configured to embed and partition the 3D medical image to be segmented, obtain a 3D voxel feature map and add positional encoding; the plurality of second encoding modules are configured to perform paired attention transformation and downsampling operations on the 3D voxel feature map so as to successively reduce its dimensions; and the plurality of decoding modules are configured to upsample the dimension-reduced 3D voxel feature map, concatenate it with 3D voxel feature maps of different dimensions, and then perform paired attention transformation so as to successively restore the dimensions of the concatenated 3D voxel feature map, the predicted final segmentation result being output through a convolution operation.
4. The 3D medical image segmentation method based on paired attention according to claim 1, characterized in that inputting the 3D medical image to be segmented into the trained 3D medical image segmentation model for processing comprises:
embedding and partitioning the 3D medical image to be segmented, obtaining a 3D voxel feature map and adding positional encoding;
performing paired attention transformation and downsampling operations on the 3D voxel feature map so as to successively reduce its dimensions; and
upsampling the dimension-reduced 3D voxel feature map, concatenating it with 3D voxel feature maps of different dimensions, and then performing paired attention transformation so as to successively restore the dimensions of the concatenated 3D voxel feature map, the predicted final segmentation result being output through a convolution operation.
5. The 3D medical image segmentation method based on paired attention according to claim 1, characterized in that training the 3D medical image segmentation model comprises:
acquiring training data;
setting an AdamW optimizer and adaptively adjusting the learning rate; and
training the 3D medical image segmentation model according to the training data, the learning rate and a preset loss function.
6. A 3D medical image segmentation system based on paired attention, characterized by comprising:
an acquisition module, configured to acquire a 3D medical image to be segmented; and
a 3D medical image segmentation module, configured to input the 3D medical image to be segmented into a trained 3D medical image segmentation model for processing, so as to obtain a segmentation result;
wherein the 3D medical image segmentation model comprises an encoder and a decoder, the encoder being connected to the decoder; the encoder comprises a first encoding module and a plurality of second encoding modules connected in sequence, the first encoding module consisting of a patch embedding layer and a paired attention transformer module, and each second encoding module consisting of a paired attention transformer module and a downsampling layer; the decoder comprises a plurality of decoding modules connected in sequence, each decoding module consisting of a skip connection module, a paired attention transformer module and an upsampling module;
the paired attention transformer module consists of a normalization layer, a multilayer perceptron and a multi-head paired attention module, wherein data input to the paired attention transformer module passes through the normalization layer, the multi-head paired attention module and the multilayer perceptron in sequence; and
the multi-head paired attention module is configured to capture channel dependencies of the input data through channel attention to obtain a channel attention output feature map, and to capture spatial dependencies of the input data through spatial attention to obtain a spatial attention output feature map; the channel attention output feature map and the spatial attention output feature map are fused with the original 3D voxel features of the input data and subjected to 3D convolution, so as to obtain a deep feature representation of the input data.
7. An electronic device, characterized by comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method according to any one of claims 1 to 5.
8. A computer-readable storage medium, characterized in that it stores computer instructions which, when executed by a processor, perform the steps of the method according to any one of claims 1 to 5.
CN202310921977.6A 2023-07-25 2023-07-25 A 3D Medical Image Segmentation Method and System Based on Paired Attention Active CN116912498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310921977.6A CN116912498B (en) 2023-07-25 2023-07-25 A 3D Medical Image Segmentation Method and System Based on Paired Attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310921977.6A CN116912498B (en) 2023-07-25 2023-07-25 A 3D Medical Image Segmentation Method and System Based on Paired Attention

Publications (2)

Publication Number Publication Date
CN116912498A CN116912498A (en) 2023-10-20
CN116912498B true CN116912498B (en) 2025-11-18

Family

ID=88350861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310921977.6A Active CN116912498B (en) 2023-07-25 2023-07-25 A 3D Medical Image Segmentation Method and System Based on Paired Attention

Country Status (1)

Country Link
CN (1) CN116912498B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120219402B (en) * 2025-03-13 2025-12-09 北京大顶医疗科技有限公司 Pelvis fracture CT image segmentation method based on dynamic spatial feature enhancement module

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375698A (en) * 2022-08-22 2022-11-22 武汉理工大学重庆研究院 Medical image segmentation method and device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090012841A1 (en) * 2007-01-05 2009-01-08 Yahoo! Inc. Event communication platform for mobile device users
CN112560756A (en) * 2020-12-24 2021-03-26 北京嘀嘀无限科技发展有限公司 Method, device, electronic equipment and storage medium for recognizing human face
CN116309650B (en) * 2023-05-22 2023-08-01 湖南大学 Medical image segmentation method and system based on double-branch embedded attention mechanism

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375698A (en) * 2022-08-22 2022-11-22 武汉理工大学重庆研究院 Medical image segmentation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116912498A (en) 2023-10-20


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant