
CN119205549A - A medical image denoising model training method and device - Google Patents


Info

Publication number
CN119205549A
CN119205549A
Authority
CN
China
Prior art keywords
image
hessian
feature map
loss
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411308847.6A
Other languages
Chinese (zh)
Other versions
CN119205549B (en)
Inventor
项磊
高婕
张志浩
宫恩浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shentou Technology Co.,Ltd.
Original Assignee
Shenzhitou Medical Technology Development Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhitou Medical Technology Development Shanghai Co ltd
Priority to CN202411308847.6A
Publication of CN119205549A
Application granted
Publication of CN119205549B
Legal status: Active


Classifications

    • G06T 5/70: Denoising; Smoothing
    • G06T 3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T 5/60: Image enhancement or restoration using machine learning, e.g. neural networks
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2200/32: Indexing scheme involving image mosaicing
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a medical image denoising model training method and device. A noise image and a clean image are acquired and preprocessed to obtain an input image, a label image, and a mask image. The input image is fed into an encoder for feature extraction to obtain an original feature map; the Hessian matrix of the original feature map and its eigenvalues are computed; the Hessian response is calculated from the eigenvalues, and an edge feature map is extracted to obtain a depth Hessian attention feature map. The original feature map and the depth Hessian attention feature map are spliced and fed into a decoder for feature fusion to obtain an output image. The loss between the output image and the label image is calculated according to the mask image, and the weights are updated by back-propagation to obtain a trained medical image denoising model. The application is beneficial to tissue layering and improves image quality after denoising, making diagnostic results more accurate.

Description

Medical image denoising model training method and device
Technical Field
The application relates to the technical field of medical image processing, in particular to a medical image denoising model training method and device.
Background
Optical coherence tomography (Optical Coherence Tomography, OCT) is a low-coherence optical imaging modality that is widely used in clinical imaging in ophthalmology, dermatology, cardiology, and gastroenterology because it is non-invasive, non-radiative, high resolution, and capable of real-time imaging. In ophthalmology in particular, OCT is an effective tool for diagnosing ocular diseases such as retinal diseases and glaucoma. However, because OCT relies on low-coherence light scattering, inherent speckle noise reduces the signal-to-noise ratio, affecting image quality and accurate diagnosis. Classical algorithms such as filtering methods and non-local means can effectively remove speckle noise from OCT images, but their high computational complexity makes them very time-consuming.
With the popularity of deep learning in image processing, many classical convolutional neural network (Convolutional Neural Network, CNN) architectures, such as U-Net and ResNet, have been used for OCT image denoising in an effort to produce high-quality images. However, such methods are prone to producing OCT images in which tissue boundary layering is not evident. Generative adversarial networks (GANs) have also been used to remove OCT speckle noise, for example in the patent documents with application numbers 201910515611.2 and 202211149964.3, but they suffer from high model training complexity, strong dependence on large amounts of data, and insufficient generalization across different types of OCT images. Three-dimensional convolutional neural networks have likewise been used for denoising, for example in the patent document with application number 202311293845.X, but they consume substantial computational resources, require long training times, and may face memory bottlenecks when processing large-scale images, resulting in poor denoising performance.
Disclosure of Invention
Therefore, the application provides a medical image denoising model training method and device to solve the problems that existing OCT image denoising methods are prone to indistinct tissue boundary layering and poor denoising performance.
In order to achieve the above object, the present application provides the following technical solutions:
In a first aspect, a medical image denoising model training method includes:
Step 1, acquiring medical images of the same part, wherein the medical images comprise noise images and clean images;
Step 2, adjusting the sizes of the noise image and the clean image, and carrying out normalization processing to obtain an input image and a label image;
Step 3, preparing the clean image as a mask image;
Step 4, inputting the input image into an encoder of a U-Net network for feature extraction to obtain an original feature map;
Step 5, calculating a Hessian matrix of the original feature map, and calculating a matrix feature value of the Hessian matrix;
Step 6, calculating the Hessian response according to the matrix eigenvalues, and extracting the edge feature map to obtain a depth Hessian attention feature map;
Step 7, splicing the original feature map and the depth Hessian attention feature map, and inputting the spliced original feature map and the depth Hessian attention feature map into a decoder of the U-Net network for feature fusion to obtain an output image;
and 8, calculating the loss between the output image and the label image according to the mask image, and back-propagating the update weight so as to obtain a trained medical image denoising model.
Preferably, the step 3 specifically includes making an all-zero image of the same size as the clean image, generating a polygonal region in the target tissue area of the all-zero image, and filling the region with 1s to obtain the mask image.
Preferably, in the step 4, the U-Net network is a ResUNet network, an Attention U-Net network, or a Mamba-UNet network.
Preferably, in the step 5, the Hessian matrix calculation formula is:

$$H_i = \begin{bmatrix} \dfrac{\partial^2 E_i}{\partial x^2} & \dfrac{\partial^2 E_i}{\partial x \partial y} \\ \dfrac{\partial^2 E_i}{\partial x \partial y} & \dfrac{\partial^2 E_i}{\partial y^2} \end{bmatrix}$$

wherein $H_i$ is the Hessian matrix, $\partial$ denotes partial differentiation, $x$ the transverse coordinate, $y$ the longitudinal coordinate, and $E_i$ the original feature map; $\frac{\partial^2 E_i}{\partial x^2}$ is the second partial derivative of $E_i$ in the $x$ direction, $\frac{\partial^2 E_i}{\partial y^2}$ the second partial derivative in the $y$ direction, and $\frac{\partial^2 E_i}{\partial x \partial y}$ the mixed partial derivative in the $x$ and $y$ directions.
Preferably, in the step 6, the Jerman, Frangi, or Erdt method is used for calculating the Hessian response.
Preferably, in the step 7, when the original feature map and the depth Hessian attention feature map are spliced, a ratio of the original feature map to the depth Hessian attention feature map is 1:2.
Preferably, in the step 8, when the loss between the output image and the label image is calculated from the mask image, the loss function is any combination of a mean square error loss, an L1 loss, a PSNR loss, and an SSIM loss.
Preferably, when the loss function is a combination of the L1 loss and the SSIM loss, the loss function calculation formula is:

$$f = \alpha\,\mathcal{L}_{\mathrm{SSIM}}\big(I_{mask}\odot\hat{a},\ I_{mask}\odot S(b)\big) + \beta\,\mathcal{L}_{L1}\big(I_{mask}\odot\hat{a},\ I_{mask}\odot S(b)\big)$$

wherein $\hat{a}$ represents the output image, $b$ the label image, $\mathcal{L}_{\mathrm{SSIM}}$ the SSIM loss, $\mathcal{L}_{L1}$ the L1 loss, $I_{mask}$ the mask image, $\alpha$ and $\beta$ the weight coefficients, and $S(\cdot)$ the process of sharpening enhancement using a 3×3 convolution kernel.
Preferably, in the step 8, the label image is a sharpened and enhanced label image.
In a second aspect, a medical image denoising model training apparatus includes:
The medical image acquisition module is used for acquiring medical images of the same part, wherein the medical images comprise noise images and clean images;
The data preprocessing module is used for adjusting the sizes of the noise image and the clean image and carrying out normalization processing to obtain an input image and a label image;
a mask image making module, configured to make the clean image into a mask image;
the original feature map extraction module is used for inputting the input image into an encoder of the U-Net network to perform feature extraction to obtain an original feature map;
The calculation module is used for calculating a Hessian matrix of the original feature map and calculating matrix feature values of the Hessian matrix;
The depth Hessian attention feature map extraction module is used for calculating Hessian response according to the matrix feature values and extracting an edge feature map to obtain a depth Hessian attention feature map;
The feature fusion module is used for splicing the original feature image and the depth Hessian attention feature image, inputting the spliced original feature image and the depth Hessian attention feature image into a decoder of the U-Net network for feature fusion, and obtaining an output image;
And the training module is used for calculating the loss between the output image and the label image according to the mask image and back-propagating the update weight so as to obtain a trained medical image denoising model.
Compared with the prior art, the application has at least the following beneficial effects:
The application provides a medical image denoising model training method and device. A noise image and a clean image of the same part are acquired and preprocessed to obtain an input image, a label image, and a mask image. The input image is fed into the encoder of a U-Net network for feature extraction to obtain an original feature map; the Hessian matrix of the original feature map and its eigenvalues are computed; the Hessian response is calculated from the eigenvalues, and an edge feature map is extracted to obtain a depth Hessian attention feature map. The original feature map and the depth Hessian attention feature map are spliced and fed into the decoder of the U-Net network for feature fusion to obtain an output image. The loss between the output image and the label image is calculated according to the mask image, and the weights are updated by back-propagation to obtain a trained medical image denoising model. By combining the depth Hessian attention features, the application strengthens the U-Net network's attention to structural details; when removing the inherent speckle of a medical image and reconstructing its original tissue structure, the network attends more closely to tissue boundary information and texture details, which aids tissue layering and remarkably improves the image quality of the denoised medical image, making diagnostic results more accurate.
Drawings
In order to illustrate the prior art and the application more intuitively, exemplary drawings are presented below. It should be understood that the specific shapes and configurations shown in the drawings are not, in general, limiting conditions for implementing the present application; for example, those skilled in the art can, based on the technical concepts and exemplary drawings disclosed herein, make conventional adjustments or further optimizations to the addition, omission, or division of certain units (components), their specific shapes, positional relationships, connection manners, and dimensional proportions.
FIG. 1 is a flowchart of a training method for a denoising model of a medical image according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a data preprocessing structure according to a first embodiment of the present application;
Fig. 3 is a schematic diagram of a ResUNet network structure based on deep Hessian attention feature improvement according to an embodiment of the present application;
FIG. 4 is a schematic illustration of an improved deep Hessian attention feature supplemental connection configuration provided in accordance with an embodiment of the present application;
fig. 5 is a schematic diagram of a model training structure according to a first embodiment of the present application.
Detailed Description
The application will be further described in detail by means of specific embodiments with reference to the accompanying drawings.
In the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more. The terms "first," "second," "third," and the like in this disclosure are intended to distinguish between the referenced objects without a special meaning in terms of technical connotation (e.g., should not be construed as emphasis on the degree of importance or order, etc.). The expressions "comprising", "including", "having", etc. also mean "not limited to" (certain units, components, materials, steps, etc.).
The terms such as "upper", "lower", "left", "right", "middle", and the like, as used herein, are generally used for the purpose of facilitating an intuitive understanding with reference to the drawings and are not intended to be an absolute limitation of the positional relationship in actual products.
The application provides a novel medical image denoising method, namely a medical image denoising method based on a neural network improved with depth Hessian attention features. The main idea is to enhance the U-Net network's attention to structural details by incorporating depth Hessian attention features, which emphasize boundary information in OCT images, thereby injecting visual attention into the network. In addition, to help restore more structural detail, the application introduces a mask loss to improve OCT image quality, especially in clinically significant areas. The mask is a binary image used to mark a region of interest, such as the retinal region in an ophthalmic image. By incorporating the mask into the loss function, the denoising process is focused on these critical areas, enhancing overall image quality and diagnostic accuracy. The method effectively addresses the challenges of maintaining tissue boundary integrity and enhancing texture features in OCT images, thereby significantly improving denoising performance and diagnostic accuracy.
The method also has good application prospects for other imaging modalities such as CT (Computed Tomography). Details of vascular structures in CT images are important information for diagnosing lung diseases, and the clarity of the vessels during imaging directly influences diagnostic accuracy. However, noise and artifacts are often present in CT images and may degrade image quality. By introducing depth Hessian attention features, the method can strengthen the focus on fine structures in CT images, improving the visibility and resolution of these structures, particularly vessel edges and bifurcation points. Combined with the mask loss, the method ensures that denoising concentrates on key areas such as blood vessels in lung images, improving the image quality of those areas and thus diagnostic precision and reliability. This multi-modality image processing method not only improves OCT image quality but also markedly clarifies pulmonary vascular structures in CT images; its adoption can play an important role across different medical imaging fields.
Example 1
Referring to fig. 1, the embodiment provides a medical image denoising model training method, which includes:
S1, acquiring medical images of the same part, wherein the medical images comprise a noise image I_noisy and a clean image I_clean;
S2, adjusting the sizes of the noise image and the clean image, and carrying out normalization processing to obtain an input image and a label image;
Referring to fig. 2, the data preprocessing step operates on the acquired noise image I_noisy and clean image I_clean: both are normalized so that their value range lies between 0 and 1. The noise image I_noisy serves as the input image I_input, and the clean image I_clean serves as the label image I_label.
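The resize-and-normalize preprocessing can be sketched in plain NumPy. This is a minimal version under stated assumptions: the patent does not fix the interpolation method, the normalization scheme, or a target size, so nearest-neighbour sampling, min-max normalization, and the 256×256 target below are illustrative choices.

```python
import numpy as np

def preprocess(noisy: np.ndarray, clean: np.ndarray, size=(256, 256)):
    """Resize a noisy/clean 2D image pair and normalize each to [0, 1]."""
    def resize_nn(img, size):
        # Nearest-neighbour index sampling keeps the sketch dependency-free.
        h, w = img.shape
        rows = (np.arange(size[0]) * h / size[0]).astype(int)
        cols = (np.arange(size[1]) * w / size[1]).astype(int)
        return img[rows][:, cols]

    def normalize(img):
        lo, hi = img.min(), img.max()
        return (img - lo) / (hi - lo + 1e-8)  # guard against flat images

    input_image = normalize(resize_nn(noisy.astype(np.float64), size))
    label_image = normalize(resize_nn(clean.astype(np.float64), size))
    return input_image, label_image
```

A real pipeline would substitute a proper interpolation routine (bilinear or bicubic) for `resize_nn`.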
S3, preparing a clean image as a mask image;
Specifically, for each clean image I_clean, this step makes an all-zero image of the same size as I_clean, generates a polygonal region within the target tissue area of that image, and fills the region with 1s to obtain the mask image I_mask.
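Step S3 (all-zero image plus a polygon filled with 1s) can be sketched in pure NumPy with an even-odd ray-casting test; the vertex list used in practice would come from annotation of the target tissue region, so any concrete polygon here is a hypothetical example.

```python
import numpy as np

def polygon_mask(shape, vertices):
    """Binary mask: 1 inside the polygon given by (x, y) vertices, 0 outside."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    inside = np.zeros(shape, dtype=bool)
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        # Does this edge cross the horizontal scan line of each pixel row?
        crosses = (ys < y0) != (ys < y1)
        # x-coordinate where the edge intersects that row (epsilon avoids /0
        # on horizontal edges, which never satisfy `crosses` anyway)
        x_int = x0 + (ys - y0) * (x1 - x0) / (y1 - y0 + 1e-12)
        # Toggle parity for pixels left of the intersection (even-odd rule)
        inside ^= crosses & (xs < x_int)
    mask = np.zeros(shape, dtype=np.uint8)
    mask[inside] = 1
    return mask
```

A production pipeline would more likely use `cv2.fillPoly` or `skimage.draw.polygon`; the point of the sketch is only the shape of the data produced.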
S4, inputting the input image into an encoder of a U-Net network for feature extraction to obtain an original feature map;
Specifically, the U-Net network may be a ResUNet network, an Attention U-Net network, or a Mamba-UNet network; a ResUNet network is preferred. This embodiment is based on a ResUNet network in which the original skip connections are replaced with improved depth Hessian attention feature supplementary connections, so the ResUNet network comprises three parts: an encoder, a decoder, and the depth Hessian attention feature supplementary connections, where both the encoder and the decoder have residual structures, as shown in fig. 3.
Referring to fig. 4, assuming the encoder performs n downsampling steps, the feature maps extracted from the input image I_input by the successive downsampling stages of the encoder are E_i (i = 1, ..., n); that is, E_i is the original feature map. In this step, the number of downsampling steps of the encoder can be increased or decreased, e.g. set to four.
S5, calculating a Hessian matrix of the original feature map, and calculating matrix feature values of the Hessian matrix;
Specifically, the calculation formula of the Hessian matrix is as follows:

$$H_i = \begin{bmatrix} \dfrac{\partial^2 E_i}{\partial x^2} & \dfrac{\partial^2 E_i}{\partial x \partial y} \\ \dfrac{\partial^2 E_i}{\partial x \partial y} & \dfrac{\partial^2 E_i}{\partial y^2} \end{bmatrix}$$

wherein $H_i$ is the Hessian matrix, $\partial$ denotes partial differentiation, $x$ the transverse coordinate, $y$ the longitudinal coordinate, and $E_i$ the original feature map; $\frac{\partial^2 E_i}{\partial x^2}$ is the second partial derivative of $E_i$ in the $x$ direction, $\frac{\partial^2 E_i}{\partial y^2}$ the second partial derivative in the $y$ direction, and $\frac{\partial^2 E_i}{\partial x \partial y}$ the mixed partial derivative in the $x$ and $y$ directions.
S6, calculating the Hessian response according to the matrix eigenvalues, and extracting the edge feature map to obtain a depth Hessian attention feature map;
Specifically, this step may calculate the Hessian response using the Jerman, Frangi, or Erdt method.
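For a symmetric 2×2 Hessian the eigenvalues have a closed form, and a Frangi-style response (one of the three options the patent names) can be computed from them. The β and c parameters below are conventional defaults from the Frangi vesselness literature, not values specified in the patent.

```python
import numpy as np

def hessian_eigvals(hxx, hxy, hyy):
    """Closed-form eigenvalues of a symmetric 2x2 Hessian (element-wise)."""
    tr = hxx + hyy
    root = np.sqrt(((hxx - hyy) / 2) ** 2 + hxy ** 2)
    return tr / 2 - root, tr / 2 + root

def frangi_response(l1, l2, beta=0.5, c=15.0):
    """Frangi-style vesselness from the two Hessian eigenvalues."""
    # Sort so that |l1s| <= |l2s|
    swap = np.abs(l1) > np.abs(l2)
    l1s = np.where(swap, l2, l1)
    l2s = np.where(swap, l1, l2)
    rb2 = (l1s / (l2s + 1e-12)) ** 2           # blobness ratio, squared
    s2 = l1s ** 2 + l2s ** 2                   # structureness
    resp = np.exp(-rb2 / (2 * beta ** 2)) * (1 - np.exp(-s2 / (2 * c ** 2)))
    return np.where(l2s > 0, 0.0, resp)        # keep bright ridge responses
```

A ridge (one strongly negative eigenvalue, one near zero) scores higher than an isotropic blob, which is the property the edge feature map extraction relies on.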
S7, splicing the original feature map and the depth Hessian attention feature map, and inputting the spliced original feature map and the depth Hessian attention feature map into a decoder of a U-Net network for feature fusion to obtain an output image;
Specifically, this step splices the original feature map $E_i$ with the depth Hessian attention feature map $A_i$, and the spliced skip-connection input is:

$$F_i = \mathrm{concat}(E_i,\ A_i)$$

The feature map $D_{n-i}$ extracted by the decoder's $(n-i)$-th upsampling stage is spliced with $F_i$ and passed to the $(n-i+1)$-th upsampling stage, namely:

$$D_{n-i+1} = \mathrm{Up}\big(\mathrm{concat}(D_{n-i},\ F_i)\big)$$

In this step, the number of upsampling stages of the decoder can be increased or decreased accordingly;
When the original feature map and the extracted depth Hessian attention feature map are spliced as the skip-connection input, the ratio between them can be varied; for example, the ratio of the original feature map to the depth Hessian attention feature map can be 1:2.
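The 1:2 splice can be sketched as a channel-wise concatenation. Two points are assumptions about how the ratio is realised, since the patent does not spell them out: a channels-first layout, and duplicating the attention map to give it a 2× channel share.

```python
import numpy as np

def splice(original: np.ndarray, hessian_attn: np.ndarray) -> np.ndarray:
    """Concatenate original and depth-Hessian attention feature maps
    along the channel axis at a 1:2 ratio (attention map duplicated)."""
    return np.concatenate([original, hessian_attn, hessian_attn], axis=0)
```

In a PyTorch model the same operation would be `torch.cat` over the channel dimension before the decoder stage.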
And S8, calculating the loss between the output image and the label image according to the mask image, and back-propagating the update weight so as to obtain a trained medical image denoising model.
Specifically, referring to fig. 5, during training, to increase the sharpness of the output image, the loss is computed between the output image and the sharpening-enhanced label image, and the weights are updated by back-propagation. When calculating the loss between the output image and the label image according to the mask image, the loss function is any combination of mean square error loss, L1 loss, PSNR loss, and SSIM loss; other regularization or loss terms, such as contrastive loss (Contrastive Loss) and perceptual loss (Perceptual Loss), can also be introduced to further improve the denoising effect and image quality.
When the loss function is a combination of the L1 loss and the SSIM loss, the loss function f is calculated as:

$$f = \alpha\,\mathcal{L}_{\mathrm{SSIM}}\big(I_{mask}\odot\hat{a},\ I_{mask}\odot S(b)\big) + \beta\,\mathcal{L}_{L1}\big(I_{mask}\odot\hat{a},\ I_{mask}\odot S(b)\big)$$

wherein $\hat{a}$ represents the output image, i.e., I_output; $b$ represents the label image, i.e., I_label; $\mathcal{L}_{\mathrm{SSIM}}$ represents the SSIM loss, $\mathcal{L}_{L1}$ the L1 loss, $I_{mask}$ the mask image, and $\alpha$ and $\beta$ the weight coefficients, with $\alpha + \beta = 1$, $\alpha \ge 0$, $\beta \ge 0$.
The SSIM loss calculation formula is:

$$\mathcal{L}_{\mathrm{SSIM}}(\hat{a}, b) = 1 - \frac{(2\mu_{\hat{a}}\mu_b + c_1)(2\sigma_{\hat{a}b} + c_2)}{(\mu_{\hat{a}}^2 + \mu_b^2 + c_1)(\sigma_{\hat{a}}^2 + \sigma_b^2 + c_2)}$$

The L1 loss calculation formula is:

$$\mathcal{L}_{L1}(\hat{a}, b) = \frac{1}{n}\sum_{i=1}^{n} \left|\hat{a}_i - b_i\right|$$

wherein $S(\cdot)$ represents the process of sharpening enhancement using the 3×3 convolution kernel [[0, -0.5, 0], [-0.5, 3, -0.5], [0, -0.5, 0]]; $\mu_{\hat{a}}$, $\mu_b$ and $\sigma_{\hat{a}}$, $\sigma_b$ respectively represent the means and standard deviations of $\hat{a}$ and $b$; $\sigma_{\hat{a}b}$ is their covariance; $c_1$ and $c_2$ are two constants; $n$ represents the number of pixels in $\hat{a}$ and $b$; and $\hat{a}_i$, $b_i$ represent the $i$-th pixel values of $\hat{a}$ and $b$.
The sharpening convolution kernel [[0, -0.5, 0], [-0.5, 3, -0.5], [0, -0.5, 0]] may be replaced with other sharpening convolution kernels, such as [[0, -1, 0], [-1, 4, -1], [0, -1, 0]], and the like.
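Applying the 3×3 sharpening kernel from the text can be sketched as a plain sliding-window filter. Zero padding at the borders is an assumption the text does not specify, and because the kernel is symmetric, correlation and convolution give the same result here.

```python
import numpy as np

def sharpen(image: np.ndarray, kernel=None) -> np.ndarray:
    """Filter a 2D image with the 3x3 sharpening kernel, 'same' output size."""
    if kernel is None:
        kernel = np.array([[0.0, -0.5, 0.0],
                           [-0.5, 3.0, -0.5],
                           [0.0, -0.5, 0.0]])
    padded = np.pad(image, 1)  # zero padding (assumed border handling)
    out = np.zeros(image.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + image.shape[0],
                                           dx:dx + image.shape[1]]
    return out
```

Since the kernel entries sum to 1, flat regions are preserved while intensity transitions are amplified, which is the intended sharpening behaviour.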
When OCT image denoising is performed with the medical image denoising model trained in this embodiment, I_input is fed into the trained ResUNet network improved with depth Hessian attention features (i.e., the medical image denoising model) to obtain I_output; I_output is then enhanced with the sharpening convolution kernel, the value range of the enhanced result is adjusted to between 0 and 1, and the final result is obtained by inverse normalization.
According to the medical image denoising model training method of this embodiment, combining the depth Hessian attention features strengthens the U-Net network's attention to structural details. When removing the inherent speckle of an OCT image and reconstructing its original tissue structure, the network attends more closely to tissue boundary information and texture details, which aids tissue layering, remarkably improves the image quality of the denoised OCT image, raises the signal-to-noise ratio, and makes diagnostic results more accurate. In addition, the method is also applicable to other imaging modalities, such as CT images, where it can enhance the visibility of pulmonary vascular structures so that vessel edges and details are more evident, improving image quality and diagnostic accuracy.
Compared with the problem of long time consumption of the traditional denoising method (such as a filtering method and a non-local mean method), the embodiment utilizes a deep learning model to realize more efficient calculation. Even under the condition of large-scale data processing, the method can still quickly generate the denoising image with high quality, and has higher time efficiency. In addition, by introducing mask loss, the embodiment can concentrate on areas with important clinical significance, such as retina in OCT images and pulmonary vascular structures in CT images, so that the image quality and diagnosis accuracy of the areas are further improved, and the method has important significance for clinically and accurately positioning lesions and diagnosis.
Example two
The embodiment provides a medical image denoising model training apparatus, which comprises:
The medical image acquisition module is used for acquiring medical images of the same part, wherein the medical images comprise noise images and clean images;
The data preprocessing module is used for adjusting the sizes of the noise image and the clean image and carrying out normalization processing to obtain an input image and a label image;
a mask image making module, configured to make the clean image into a mask image;
the original feature map extraction module is used for inputting the input image into an encoder of the U-Net network to perform feature extraction to obtain an original feature map;
The calculation module is used for calculating a Hessian matrix of the original feature map and calculating matrix feature values of the Hessian matrix;
The depth Hessian attention feature map extraction module is used for calculating Hessian response according to the matrix feature values and extracting an edge feature map to obtain a depth Hessian attention feature map;
The feature fusion module is used for splicing the original feature image and the depth Hessian attention feature image, inputting the spliced original feature image and the depth Hessian attention feature image into a decoder of the U-Net network for feature fusion, and obtaining an output image;
And the training module is used for calculating the loss between the output image and the label image according to the mask image and back-propagating the update weight so as to obtain a trained medical image denoising model.
For details of implementation of each module in a medical image denoising model training apparatus, reference may be made to the above definition of a medical image denoising model training method, which is not repeated here.
Any combination of the features of the above embodiments may be used (as long as there is no contradiction between the combinations of the features), and for brevity of description, all of the possible combinations of the features of the above embodiments are not described, and all of the embodiments not explicitly described are also to be considered as being within the scope of the description.

Claims (10)

1. A medical image denoising model training method, comprising:
Step 1, acquiring medical images of the same part, wherein the medical images comprise noise images and clean images;
Step 2, adjusting the sizes of the noise image and the clean image, and carrying out normalization processing to obtain an input image and a label image;
Step 3, preparing the clean image as a mask image;
Step 4, inputting the input image into an encoder of a U-Net network for feature extraction to obtain an original feature map;
Step 5, calculating a Hessian matrix of the original feature map, and calculating the eigenvalues of the Hessian matrix;
Step 6, calculating the Hessian response according to the eigenvalues, and extracting an edge feature map to obtain a depth Hessian attention feature map;
Step 7, splicing the original feature map and the depth Hessian attention feature map, and inputting the spliced result into a decoder of the U-Net network for feature fusion to obtain an output image;
Step 8, calculating the loss between the output image and the label image according to the mask image, and back-propagating to update the weights, so as to obtain a trained medical image denoising model.
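As a minimal sketch of the preprocessing in step 2 above — the target size and the nearest-neighbour interpolation are illustrative assumptions, since the claim fixes neither:

```python
import numpy as np

def preprocess(img, size=(64, 64)):
    """Resize a 2D image to a fixed size and min-max normalize to [0, 1].
    Nearest-neighbour resizing keeps the sketch dependency-free; real
    pipelines typically use bilinear interpolation instead."""
    h, w = img.shape
    ry = (np.arange(size[0]) * h // size[0]).clip(0, h - 1)  # source rows
    rx = (np.arange(size[1]) * w // size[1]).clip(0, w - 1)  # source cols
    resized = img[np.ix_(ry, rx)].astype(float)
    lo, hi = resized.min(), resized.max()
    return (resized - lo) / (hi - lo + 1e-12)                # [0, 1] range
```

The same routine would be applied to both the noise image (giving the input image) and the clean image (giving the label image).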
2. The medical image denoising model training method according to claim 1, wherein step 3 specifically comprises: creating an all-zero image of the same size as the clean image, generating a polygonal region within the target tissue region of the all-zero image, and filling the polygonal region with the value 1 to obtain the mask image.
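A minimal sketch of this mask construction, assuming NumPy, an even-odd ray-casting fill, and (x, y) vertex tuples — none of which the claim prescribes:

```python
import numpy as np

def make_mask(shape, polygon):
    """All-zero image the size of the clean image; pixels inside the
    polygonal (target-tissue) region are set to 1, as in claim 2.
    `polygon` is a list of (x, y) vertices; the even-odd rule decides
    which pixels fall inside."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    inside = np.zeros(shape, dtype=bool)
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[i - 1]            # previous vertex closes the ring
        # Does this edge cross the horizontal scanline through the pixel row?
        crosses = (y0 > ys) != (y1 > ys)
        # x of the crossing point (epsilon avoids 0/0 on horizontal edges,
        # which `crosses` already masks out)
        xint = (x1 - x0) * (ys - y0) / (y1 - y0 + 1e-12) + x0
        inside ^= crosses & (xs < xint)    # toggle on each crossing
    mask = np.zeros(shape, dtype=np.uint8)
    mask[inside] = 1
    return mask
```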
3. The medical image denoising model training method according to claim 1, wherein in step 4, the U-Net network is a ResUNet, an Attention U-Net, or a Mamba-UNet network.
4. The medical image denoising model training method according to claim 1, wherein in step 5, the Hessian matrix calculation formula is:

$$H_i = \begin{bmatrix} \dfrac{\partial^2 E_i}{\partial x^2} & \dfrac{\partial^2 E_i}{\partial x \partial y} \\[1ex] \dfrac{\partial^2 E_i}{\partial x \partial y} & \dfrac{\partial^2 E_i}{\partial y^2} \end{bmatrix}$$

wherein $H_i$ is the Hessian matrix, $\partial$ denotes partial differentiation, $x$ denotes the transverse coordinate, $y$ denotes the longitudinal coordinate, $E_i$ denotes the original feature map, $\frac{\partial^2 E_i}{\partial x^2}$ denotes the second partial derivative of $E_i$ in the $x$ direction, $\frac{\partial^2 E_i}{\partial y^2}$ denotes the second partial derivative of $E_i$ in the $y$ direction, and $\frac{\partial^2 E_i}{\partial x \partial y}$ denotes the mixed partial derivative of $E_i$ in the $x$ and $y$ directions.
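The formula above, together with the eigenvalue computation of step 5, can be sketched with central finite differences; the function name and the use of `np.gradient` are illustrative assumptions:

```python
import numpy as np

def hessian_and_eigs(E):
    """Per-pixel Hessian of a 2D feature map E via finite differences,
    plus the closed-form eigenvalues of each symmetric 2x2 Hessian
    [[Hxx, Hxy], [Hxy, Hyy]]."""
    gy, gx = np.gradient(E.astype(float))   # first derivatives d/dy, d/dx
    Hxy, Hxx = np.gradient(gx)              # d(dE/dx)/dy, d(dE/dx)/dx
    Hyy, _ = np.gradient(gy)                # d(dE/dy)/dy
    half_trace = 0.5 * (Hxx + Hyy)
    root = np.sqrt((0.5 * (Hxx - Hyy)) ** 2 + Hxy ** 2)
    lam1, lam2 = half_trace - root, half_trace + root
    return Hxx, Hxy, Hyy, lam1, lam2
```

Central differences are exact for quadratic surfaces away from the borders, which makes the sketch easy to sanity-check.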
5. The medical image denoising model training method according to claim 1, wherein the Jerman, Frangi, or Erdt method is adopted for calculating the Hessian response in step 6.
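Of the three named methods, a single-scale Frangi-style response can be sketched as follows; practical implementations first smooth with Gaussians over several scales, and the parameters `beta` and `c` are illustrative defaults, not values from the patent:

```python
import numpy as np

def frangi_response(E, beta=0.5, c=0.5):
    """Single-scale Frangi-style vesselness from the per-pixel Hessian
    eigenvalues, assuming bright tubular structures on a dark background
    (so the dominant eigenvalue lambda2 must be negative)."""
    gy, gx = np.gradient(E.astype(float))
    Hxy, Hxx = np.gradient(gx)
    Hyy, _ = np.gradient(gy)
    half_trace = 0.5 * (Hxx + Hyy)
    root = np.sqrt((0.5 * (Hxx - Hyy)) ** 2 + Hxy ** 2)
    e1, e2 = half_trace - root, half_trace + root
    # Order eigenvalues by absolute value: |lam1| <= |lam2|
    swap = np.abs(e1) > np.abs(e2)
    lam1 = np.where(swap, e2, e1)
    lam2 = np.where(swap, e1, e2)
    Rb = np.abs(lam1) / (np.abs(lam2) + 1e-12)  # blobness ratio
    S = np.sqrt(lam1 ** 2 + lam2 ** 2)          # second-order structureness
    V = np.exp(-Rb ** 2 / (2 * beta ** 2)) * (1 - np.exp(-S ** 2 / (2 * c ** 2)))
    return np.where(lam2 < 0, V, 0.0)           # suppress non-ridge pixels
```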
6. The medical image denoising model training method according to claim 1, wherein in step 7, when the original feature map and the depth Hessian attention feature map are spliced, the ratio of the original feature map to the depth Hessian attention feature map is 1:2.
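The 1:2 splice can be illustrated by channel-wise concatenation in the usual (N, C, H, W) layout; the concrete shapes are illustrative assumptions:

```python
import numpy as np

orig = np.ones((1, 32, 16, 16))   # original feature map: C channels
attn = np.ones((1, 64, 16, 16))   # depth Hessian attention map: 2C channels
# Splice at a 1:2 channel ratio before feeding the decoder (3C total)
fused = np.concatenate([orig, attn], axis=1)
```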
7. The medical image denoising model training method according to claim 1, wherein in step 8, when the loss between the output image and the label image is calculated according to the mask image, the loss function is any combination of the mean square error loss, the L1 loss, the PSNR loss, and the SSIM loss.
8. The medical image denoising model training method according to claim 7, wherein when the loss function is a combination of the L1 loss and the SSIM loss, the loss function calculation formula is:

$$L = \alpha \cdot L_{SSIM}\big(a \odot I_{mask},\; S(b) \odot I_{mask}\big) + \beta \cdot L_{1}\big(a \odot I_{mask},\; S(b) \odot I_{mask}\big)$$

wherein $a$ represents the output image, $b$ represents the label image, $L_{SSIM}$ represents the SSIM loss, $L_{1}$ represents the L1 loss, $I_{mask}$ represents the mask image, $\alpha$ and $\beta$ represent weight coefficients, and $S(\cdot)$ represents a sharpening enhancement process using a 3×3 convolution kernel.
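A sketch of this combined loss, using a simplified global SSIM in place of the usual windowed SSIM; the sharpening kernel and the `alpha`/`beta` defaults are assumptions, not values from the patent:

```python
import numpy as np

SHARPEN_K = np.array([[0., -1., 0.],
                      [-1.,  5., -1.],
                      [0., -1., 0.]])  # a common 3x3 sharpening kernel (assumed)

def conv3x3(img, k):
    """3x3 correlation with edge padding, dependency-free."""
    p = np.pad(img, 1, mode='edge')
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + h, j:j + w]
    return out

def masked_l1_ssim_loss(a, b, mask, alpha=0.5, beta=0.5, sharpen=True):
    """L = alpha * (1 - SSIM) + beta * L1 between the masked output a and
    the masked, sharpening-enhanced label b (claims 8 and 9)."""
    t = conv3x3(b, SHARPEN_K) if sharpen else b.astype(float)
    am, tm = a * mask, t * mask
    l1 = np.mean(np.abs(am - tm))
    C1, C2 = 0.01 ** 2, 0.03 ** 2       # standard SSIM stabilizers
    mu_a, mu_t = am.mean(), tm.mean()
    va, vt = am.var(), tm.var()
    cov = ((am - mu_a) * (tm - mu_t)).mean()
    ssim = ((2 * mu_a * mu_t + C1) * (2 * cov + C2)) / \
           ((mu_a ** 2 + mu_t ** 2 + C1) * (va + vt + C2))
    return alpha * (1.0 - ssim) + beta * l1
```

In a training framework the same expression would be written with differentiable tensor ops so the loss can be back-propagated as in step 8.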
9. The medical image denoising model training method according to claim 1, wherein in step 8, the label image is a sharpening-enhanced label image.
10. A medical image denoising model training apparatus, comprising:
The medical image acquisition module is used for acquiring medical images of the same part, wherein the medical images comprise noise images and clean images;
The data preprocessing module is used for adjusting the sizes of the noise image and the clean image and carrying out normalization processing to obtain an input image and a label image;
a mask image making module, configured to make the clean image into a mask image;
the original feature map extraction module is used for inputting the input image into an encoder of the U-Net network to perform feature extraction to obtain an original feature map;
The calculation module is used for calculating a Hessian matrix of the original feature map and calculating the eigenvalues of the Hessian matrix;
The depth Hessian attention feature map extraction module is used for calculating the Hessian response according to the eigenvalues and extracting an edge feature map to obtain a depth Hessian attention feature map;
The feature fusion module is used for splicing the original feature map and the depth Hessian attention feature map, and inputting the spliced result into a decoder of the U-Net network for feature fusion to obtain an output image;
And the training module is used for calculating the loss between the output image and the label image according to the mask image and back-propagating to update the weights, so as to obtain a trained medical image denoising model.
CN202411308847.6A 2024-09-19 2024-09-19 A medical image denoising model training method and device Active CN119205549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411308847.6A CN119205549B (en) 2024-09-19 2024-09-19 A medical image denoising model training method and device


Publications (2)

Publication Number Publication Date
CN119205549A true CN119205549A (en) 2024-12-27
CN119205549B CN119205549B (en) 2025-08-22

Family

ID=94049563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411308847.6A Active CN119205549B (en) 2024-09-19 2024-09-19 A medical image denoising model training method and device

Country Status (1)

Country Link
CN (1) CN119205549B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119888446A (en) * 2024-12-30 2025-04-25 深智透医疗科技发展(上海)有限责任公司 Training method and device for 3D-TOF-MRA image generation model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180101766A1 (en) * 2016-10-11 2018-04-12 Siemens Aktiengesellschaft Efficient calculations of negative curvature in a hessian free deep learning framework
WO2022005336A1 (en) * 2020-06-29 2022-01-06 Autonomous Non-Profit Organization For Higher Education «Skolkovo Institute Of Science And Technology» Noise-resilient vasculature localization method with regularized segmentation
CN115797213A (en) * 2022-12-09 2023-03-14 武汉大学 Region-aware loop iterative raindrop removal method
CN116797790A (en) * 2023-06-14 2023-09-22 北京理工大学 Aortic dissection boundary determination method, system, electronic device and medium
CN117765262A (en) * 2023-12-27 2024-03-26 电子科技大学 A semantic segmentation method for magnetic resonance images for cerebral arteriovenous malformations
CN117911423A (en) * 2023-12-28 2024-04-19 上海师范大学 Edge enhancement medical image segmentation method based on double decoders


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
梁礼明; 黄朝林; 朱莎; 吴健: "Retinal vessel segmentation based on Hessian matrix and level set", Science Technology and Engineering, no. 10, 8 April 2016 (2016-04-08), pages 55-59 *


Also Published As

Publication number Publication date
CN119205549B (en) 2025-08-22

Similar Documents

Publication Publication Date Title
Kang et al. 3D-MRI super-resolution reconstruction using multi-modality based on multi-resolution CNN
JP2023540910A (en) Connected Machine Learning Model with Collaborative Training for Lesion Detection
CN115512110A (en) Medical image tumor segmentation method related to cross-modal attention mechanism
CN106339998A (en) Multi-focus image fusion method based on contrast pyramid transformation
CN116503607B (en) CT image segmentation method and system based on deep learning
CN117495882B (en) Liver tumor CT image segmentation method based on AGCH-Net and multi-scale fusion
CN112562058B (en) A method to quickly establish a three-dimensional model of intracranial blood vessel simulation based on transfer learning
CN114332278B (en) A deep learning-based OCTA image motion correction method
CN114399519A (en) MR image 3D semantic segmentation method and system based on multi-modal fusion
CN112102259A (en) Image segmentation algorithm based on boundary guide depth learning
CN119494863B (en) A cross-modal medical image registration method based on latent space diffusion model
CN114048806A (en) Alzheimer disease auxiliary diagnosis model classification method based on fine-grained deep learning
CN114494498A (en) Metal artifact removing method based on double-domain Fourier neural network
CN115410032A (en) OCTA image classification structure training method based on self-supervision learning
CN119205549B (en) A medical image denoising model training method and device
Zhao et al. Perception-oriented generative adversarial network for retinal fundus image super-resolution
Xu et al. A novel multi-modal fundus image fusion method for guiding the laser surgery of central serous chorioretinopathy
CN112102327A (en) Image processing method and device and computer readable storage medium
Bhandari et al. Soft attention mechanism based network to extract blood vessels from retinal image modality
CN109919098B (en) Target object recognition method and device
CN116433976A (en) Image processing method, device, device and storage medium
CN112529949B (en) A method and system for generating DWI images based on T2 images
CN115147694A (en) Three-image multi-mode medical image fusion method
CN119904633A (en) A method and system for skin lesion image segmentation with fuzzy edge enhancement
Xie et al. CFIFusion: Dual‐Branch Complementary Feature Injection Network for Medical Image Fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20251209

Address after: 100000 Beijing City, Daxing District, Beijing Economic and Technological Development Zone, Keji 13th Street, No. 12 Yard, Building 4, 2nd Floor, D210

Patentee after: Beijing Shentou Technology Co.,Ltd.

Country or region after: China

Address before: 200062 Room 1105-1106, 10th Floor, No. 938 Jinshajiang Road, Putuo District, Shanghai

Patentee before: Shenzhitou Medical Technology Development (Shanghai) Co.,Ltd.

Country or region before: China
