CN117808707B - Multi-scale image defogging method, system, device and storage medium - Google Patents
- Publication number
- CN117808707B (application CN202311861387.5A)
- Authority
- CN
- China
- Prior art keywords
- layer
- pixel
- image
- module
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0985—Hyperparameter optimisation; Meta-learning; Learning-to-learn
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a multi-scale image defogging method, system, device and storage medium. The method comprises: acquiring an image to be defogged; and inputting the image to be defogged into a trained multi-scale image defogging network, which outputs the defogged image. The trained multi-scale image defogging network performs multi-scale feature extraction on the image to be defogged to obtain a multi-scale feature map; performs feature aggregation on the multi-scale feature map to obtain an aggregated feature map; applies gated enhancement to the aggregated feature map to obtain an enhanced feature map; and adds the enhanced feature map to the image to be defogged pixel by pixel to obtain the defogged image.
Description
Technical Field
The present invention relates to the field of image defogging technology, and in particular, to a multi-scale image defogging method, system, device and storage medium.
Background
The statements in this section merely relate to the background of the present disclosure and may not necessarily constitute prior art.
Image defogging is a pre-processing step for other visual tasks; it aims to remove fog from a given foggy image and restore a clear, fog-free scene. The atmosphere contains large numbers of floating particles such as smoke, dust and fog droplets, which scatter and absorb light and reduce the albedo of the scene. As a result, images captured by an imaging sensor in a foggy scene inevitably suffer from limited visibility, low color saturation and loss of detail. How to recover high-quality haze-free images with balanced brightness, abundant detail and clear edges from images degraded by fog, and thereby provide high-quality input for downstream computer vision tasks and systems, has therefore become a research hotspot in the field of computer vision.
In 1977, McCartney first described the formation of foggy images in detail and proposed an atmospheric degradation model based on an attenuation model and an ambient-light model; the formation process is as follows:
I(x)=J(x)t(x)+A(1-t(x)) (1)
where I(x) is the acquired low-quality foggy-day image, J(x) is the clear fog-free image, t(x) denotes the transmittance, A denotes the global atmospheric light, and x denotes the pixel coordinates;
The transmittance t(x) is affected by the depth of field and can be expressed as:
t(x) = e^(−β·d(x)) (2)
where d(x) denotes the depth-of-field distance between the object and the camera, and β denotes the light attenuation coefficient. The haze-free image can therefore be recovered by accurately estimating the atmospheric light value A and the transmittance t(x).
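To make the model concrete, the following Python sketch synthesizes a hazy image with Eqs. (1)-(2) and inverts Eq. (1) given estimates of t(x) and A. The function names and default values (A = 0.9, β = 1.0, the t_min clamp) are illustrative assumptions, not values taken from this patent.

```python
import numpy as np

def synthesize_hazy(J, depth, A=0.9, beta=1.0):
    """Degrade a clear image J (H x W x 3, values in [0, 1]) with haze.

    t(x) = exp(-beta * d(x))           -- Eq. (2)
    I(x) = J(x) t(x) + A (1 - t(x))    -- Eq. (1)
    """
    t = np.exp(-beta * depth)[..., None]   # per-pixel transmittance, (H, W, 1)
    return J * t + A * (1.0 - t)

def recover_clear(I, t, A, t_min=0.1):
    """Invert Eq. (1): J = (I - A(1 - t)) / t, clamping t to avoid blow-up."""
    t = np.clip(t, t_min, 1.0)[..., None]
    return np.clip((I - A * (1.0 - t)) / t, 0.0, 1.0)
```

In practice the difficulty lies precisely in estimating t(x) and A from the hazy image alone, which is what the prior-based and learning-based methods discussed below attempt.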
Based on this idea, researchers at home and abroad have explored a number of novel image defogging methods (i.e., model-driven methods) built on different models and prior knowledge, and have obtained good defogging performance. Although these methods produce images with good visibility, they may introduce artifacts in regions that do not satisfy the prior.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a multi-scale image defogging method, system, device and storage medium.
In one aspect, a multi-scale image defogging method is provided, comprising:
Acquiring an image to be defogged;
Inputting the image to be defogged into a trained multi-scale image defogging network and outputting the defogged image; the trained multi-scale image defogging network performs multi-scale feature extraction on the image to be defogged to obtain a multi-scale feature map; performs feature aggregation on the multi-scale feature map to obtain an aggregated feature map; applies gated enhancement to the aggregated feature map to obtain an enhanced feature map; and adds the enhanced feature map to the image to be defogged pixel by pixel to obtain the defogged image.
In another aspect, a multi-scale image defogging system is provided, comprising:
An acquisition module configured to: acquiring an image to be defogged;
A defogging module configured to: input the image to be defogged into a trained multi-scale image defogging network and output the defogged image; the trained multi-scale image defogging network performs multi-scale feature extraction on the image to be defogged to obtain a multi-scale feature map; performs feature aggregation on the multi-scale feature map to obtain an aggregated feature map; applies gated enhancement to the aggregated feature map to obtain an enhanced feature map; and adds the enhanced feature map to the image to be defogged pixel by pixel to obtain the defogged image.
In still another aspect, there is provided an electronic device including:
a memory for non-transitory storage of computer readable instructions; and
a processor for executing the computer-readable instructions,
wherein the computer-readable instructions, when executed by the processor, perform the method of the first aspect.
In yet another aspect, there is also provided a non-transitory storage medium storing computer-readable instructions, wherein, when the computer-readable instructions are executed by a computer, the method of the first aspect is performed.
In a further aspect, there is also provided a computer program product comprising a computer program which, when run on one or more processors, implements the method of the first aspect.
The technical scheme has the following advantages or beneficial effects:
The invention provides a local-Transformer-based multi-scale image defogging network, MIDNet, which comprehensively extracts image features at different levels and exploits both the local information within each window and the long-range relations among pixels to process foggy images with uniform or non-uniform haze. The model effectively reduces the space (memory) consumption of the original ViT, achieves simple yet efficient single-image defogging, and ensures visual consistency between the reconstructed image and the ground-truth image.
For the decoding process, the invention designs a top-down feature aggregation module based on dense connections, which aggregates feature information of different scales during decoding while also fusing features of the same scale from the encoding process, thereby achieving a marked enhancement of the features.
The invention designs a gating enhancement module that assigns pixel-level weights to enhance feature information such as edges and textures, thereby preserving the details of the reconstructed image.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flow chart of a method according to a first embodiment;
FIG. 2 is a diagram showing the internal structure of a multi-scale image defogging network according to the first embodiment;
FIG. 3 is an internal structure diagram of the feature extraction module and the feature aggregation module of the first embodiment;
FIG. 4 is an internal structure diagram of the local Transformer layer of the first embodiment;
FIG. 5 is an internal structure diagram of the gating enhancement module of the first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
In recent years, with advances in computer software and hardware, learning-based image defogging methods (i.e., data-driven methods) have emerged. These methods learn network parameters through specifically designed network architectures and paired foggy-day datasets so as to obtain better defogging performance. A few learning models based on convolutional neural networks (CNNs) estimate the parameters of the atmospheric degradation model in an end-to-end manner to recover the haze-free image: DehazeNet estimates the transmittance map t(x) with a CNN while suppressing noise, and AODNet jointly estimates the atmospheric light value A and the transmittance t(x), achieving good defogging performance. Most CNN-based methods, however, obtain the fog-free image by learning the mapping between fog-free and foggy images under the constraint of a loss function; because CNN-based methods have difficulty exploiting global information effectively, the defogging results they produce are often unsatisfactory. In particular, for non-uniform haze images, the above methods cannot remove the effect of haze well, owing to the irregular haze distribution and the differences in haze concentration.
Recently, transformers have shown great potential in the field of artificial intelligence applications. Initially, researchers applied transformers to the field of natural language processing (natural language processing, NLP) and achieved excellent results in this field. Thus, inspired by the above, researchers extended it to computer vision tasks, presented vision transducers (vision transformer, viT), and made breakthroughs in vision fields such as object detection and image deblurring. However, the original visual transducer grows in square with the increase of the spatial resolution of the input image, and if the original visual transducer is used for visual tasks such as image defogging, the efficiency of the original visual transducer is greatly reduced, and the actual industrial requirement cannot be met.
The present embodiment proposes a novel multi-scale image defogging network, MIDNet, which processes foggy images with uniform and non-uniform haze by comprehensively extracting and aggregating multi-source, multi-level features through local Transformers and dense connections. This is an attempt to use the Transformer for processing foggy images with both uniform and non-uniform haze.
Example 1
The embodiment provides a multi-scale image defogging method;
as shown in fig. 1, the multi-scale image defogging method includes:
s101: acquiring an image to be defogged;
S102: inputting the image to be defogged into a trained multi-scale image defogging network and outputting the defogged image; the trained multi-scale image defogging network performs multi-scale feature extraction on the image to be defogged to obtain a multi-scale feature map; performs feature aggregation on the multi-scale feature map to obtain an aggregated feature map; applies gated enhancement to the aggregated feature map to obtain an enhanced feature map; and adds the enhanced feature map to the image to be defogged pixel by pixel to obtain the defogged image.
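The four stages of S102 compose as follows. This PyTorch sketch is only a schematic assumption of the wiring: the class names, the 256-channel head and the premise that the enhanced map is restored to the input resolution are illustrative, and the individual modules are detailed in the sections below.

```python
import torch
import torch.nn as nn

class MIDNetSketch(nn.Module):
    def __init__(self, extractor: nn.Module, aggregator: nn.Module, gate: nn.Module):
        super().__init__()
        self.extractor = extractor    # multi-scale feature extraction
        self.aggregator = aggregator  # feature aggregation across scales
        self.gate = gate              # gated enhancement of the aggregated map
        # maps features back to RGB; assumes the enhanced map is at input resolution
        self.head = nn.Conv2d(256, 3, kernel_size=3, padding=1)

    def forward(self, hazy: torch.Tensor) -> torch.Tensor:
        feats = self.extractor(hazy)       # list of multi-scale feature maps
        agg = self.aggregator(feats)       # aggregated feature map
        enhanced = self.gate(agg)          # enhanced feature map
        return self.head(enhanced) + hazy  # pixel-by-pixel addition with the input
```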
Further, as shown in fig. 2, the trained multiscale image defogging network comprises:
an input layer, a feature extraction module, a feature aggregation module, a gating enhancement module, a pixel-by-pixel addition module and an output layer connected in sequence; the input end of the pixel-by-pixel addition module is also connected to the output end of the input layer.
As can be seen from FIG. 2, the multi-scale image defogging network is composed of a feature extraction module, a feature aggregation module and a gating enhancement module.
The multi-scale image defogging network first projects the input image I_i ∈ R^(3×H×W) onto embedding vectors of dimension d through a patch-embedding operation (d = 256 in the multi-scale image defogging network); each embedded image has the shape d × (H/P) × (W/P), where P is the patch size of the patch embedding (P = 4 in the multi-scale image defogging network). Next, the feature extraction module extracts multi-scale features and the feature aggregation module aggregates features of different scales (i.e., multi-source, multi-level features); finally, the gating enhancement module restores the edge information.
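A minimal sketch of the patch-embedding step under the stated settings (d = 256, P = 4); realizing it as a strided convolution is a common choice assumed here, not an implementation prescribed by the patent.

```python
import torch
import torch.nn as nn

# P = 4, d = 256: a stride-P convolution maps I in R^(3 x H x W) to a d x (H/P) x (W/P) map
patch_embed = nn.Conv2d(in_channels=3, out_channels=256, kernel_size=4, stride=4)

x = torch.randn(1, 3, 256, 256)   # dummy 256x256 RGB image
tokens = patch_embed(x)           # shape (1, 256, 64, 64) = (B, d, H/P, W/P)
```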
Further, as shown in fig. 3, the feature extraction module includes:
a patch-embedding layer, a first local Transformer layer, a second local Transformer layer, a third local Transformer layer and a fourth local Transformer layer connected in sequence;
the input of the patch-embedding layer is the input of the feature extraction module.
Further, the patch-embedding layer is configured to divide the image to be defogged into a plurality of patches of a set size and to represent each patch as a vector.
Further, the feature extraction module is used for realizing multi-scale feature extraction of the image to be defogged.
It should be appreciated that, since the original ViT performs multi-head attention over all spatial positions, its computational complexity for processing an image of size H×W is O = 2(HW)²·d. This complexity grows quadratically with the image resolution, so the original ViT is inefficient and occupies relatively more resources when processing high-resolution images. At the same time, single-scale feature representations have certain limitations. Therefore, during the encoding stage, the invention replaces all convolutional layers in the FPN encoder with local Transformer layers.
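The gap can be checked with a back-of-the-envelope computation. The figure 2·(HW)²·d for global attention is taken from the text; the counterpart 2·HW·M²·d for attention restricted to M×M windows is the standard estimate and, like the example sizes below, is an assumption here.

```python
def global_attn_flops(h, w, d):
    return 2 * (h * w) ** 2 * d      # every token attends to every token

def window_attn_flops(h, w, d, m=8):
    return 2 * (h * w) * m * m * d   # every token attends within its M x M window

h = w = 256
d = 256
print(f"global: {global_attn_flops(h, w, d):.3e}")  # ~2.2e+12
print(f"window: {window_attn_flops(h, w, d):.3e}")  # ~2.1e+09
```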
Further, as shown in fig. 3, the feature aggregation module includes:
a first upsampling layer, a first pixel-by-pixel addition module, a second upsampling layer, a second pixel-by-pixel addition module, a first feature cascade layer, a first convolution layer, a third upsampling layer, a third pixel-by-pixel addition module, a second feature cascade layer and a second convolution layer connected in sequence;
The input end of the first upsampling layer is connected to the output end of the fourth local Transformer layer through a third convolution layer; the input end of the first pixel-by-pixel addition module is connected to the output end of the third local Transformer layer through a fourth convolution layer; the input end of the second pixel-by-pixel addition module is connected to the output end of the second local Transformer layer through a fifth convolution layer; the input end of the third pixel-by-pixel addition module is connected to the output end of the first local Transformer layer through a sixth convolution layer;
The input end of the first feature cascade layer is connected to the output end of the third convolution layer through a fourth upsampling layer; the input end of the second feature cascade layer is connected to the output end of the third convolution layer through a fifth upsampling layer; the input end of the second feature cascade layer is also connected to the output end of the first pixel-by-pixel addition module through a sixth upsampling layer;
the output of the second convolution layer is the output of the feature aggregation module.
Further, the first feature cascade layer and the second feature cascade layer have the same function: both concatenate their input features along the channel dimension.
Further, the first, second and third upsampling layers are each configured to perform double upsampling.
Further, the fourth upsampling layer is used for realizing four times upsampling, the fifth upsampling layer is used for realizing eight times upsampling, and the sixth upsampling layer is used for realizing four times upsampling.
Further, the feature aggregation module is used for realizing aggregation of features with different scales.
It should be understood that, for feature aggregation, the invention adopts a global strategy and adds dense-connection operations on top of the original FPN feature aggregation scheme. The addition operation loses some of the original feature information during fusion, whereas the concatenation operation is, strictly speaking, lossless. Dense connection passes the feature information of each layer to the following layers through concatenation, aggregating features of different levels and enabling feature reuse across scales. The feature aggregation module designed by the invention therefore fuses features of the same level from the encoding process and uses dense connections to aggregate features of different levels top-down during decoding. Through this global strategy, multi-source, multi-level feature information can be aggregated more comprehensively, and a clearer defogged image can thus be reconstructed.
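The following PyTorch sketch illustrates such top-down aggregation with dense connections on four encoder levels. The channel width, number of levels and fusion convolution are illustrative assumptions; the patented module shown in FIG. 3 is wired in more detail than this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAggregatorSketch(nn.Module):
    def __init__(self, ch: int = 256):
        super().__init__()
        # 1x1 convolutions projecting each encoder level before fusion
        self.lateral = nn.ModuleList([nn.Conv2d(ch, ch, 1) for _ in range(4)])
        self.fuse = nn.Conv2d(3 * ch, ch, 3, padding=1)  # applied after the dense concat

    def forward(self, feats):
        # feats: [c1, c2, c3, c4] from finest (largest) to coarsest (smallest)
        c1, c2, c3, c4 = [l(f) for l, f in zip(self.lateral, feats)]
        p4 = c4
        p3 = c3 + F.interpolate(p4, scale_factor=2)   # top-down add, 2x upsampling
        p2 = c2 + F.interpolate(p3, scale_factor=2)
        p1 = c1 + F.interpolate(p2, scale_factor=2)
        # dense connection: concatenate the upsampled coarser levels with p1
        dense = torch.cat([p1,
                           F.interpolate(p2, scale_factor=2),
                           F.interpolate(p3, scale_factor=4)], dim=1)
        return self.fuse(dense)
```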
Further, as shown in FIG. 4, the internal structures of the first local Transformer layer, the second local Transformer layer, the third local Transformer layer and the fourth local Transformer layer are the same, and the first local Transformer layer comprises:
The system comprises a first normalization operation layer, a window-based multi-head self-attention mechanism layer, a fourth pixel-by-pixel addition module, a second normalization operation layer, a first multi-layer perceptron, a fifth pixel-by-pixel addition module, a third normalization operation layer, a moving window-based multi-head self-attention mechanism layer, a sixth pixel-by-pixel addition module, a fourth normalization operation module, a second multi-layer perceptron and a seventh pixel-by-pixel addition module which are sequentially connected in series;
The input end of the fourth pixel-by-pixel adding module is also connected with the input end of the first normalization operation layer; the input end of the fifth pixel-by-pixel adding module is also connected with the input end of the second normalization operation layer; the input end of the sixth pixel-by-pixel adding module is also connected with the input end of the third normalization operation layer; the input end of the seventh pixel-by-pixel adding module is also connected with the input end of the fourth normalization operation layer.
Further, the window-based multi-head self-attention layer W-MSA (window-based multi-head self-attention) has a network structure comprising sequentially connected window partitioning, multi-head self-attention and concatenation modules.
Further, the working process of the window-based multi-head self-attention layer W-MSA is as follows: first, the window partitioning operation divides the input sequence into several fixed-size windows; second, the sub-sequence within each window is fed to multiple attention heads for attention computation, where each pixel can take inner products only with the other pixels in its current window to gather information; finally, the results computed by the attention heads are concatenated to obtain the final attention representation.
Further, the shifted-window-based multi-head self-attention layer SW-MSA (shifted-window multi-head self-attention) has a network structure comprising window partitioning, multi-head self-attention, cross-window self-attention and concatenation.
Further, the working process of the shifted-window-based multi-head self-attention layer SW-MSA is as follows: first, the window partitioning operation divides the input sequence into several fixed-size windows; second, the sub-sequence within each window is fed to multiple self-attention heads for attention computation; then, cross-window self-attention performs a weighted summation of the self-attention representation in each window with those in the other windows and computes self-attention again to obtain cross-window self-attention representations; finally, all cross-window self-attention representations are concatenated to obtain the final attention representation.
Further, the first to seventh pixel-by-pixel addition modules each add the pixel values at corresponding positions of two input feature maps, the sums serving as the pixel values of the corresponding positions of the output feature map.
Further, the first multi-layer perceptron and the second multi-layer perceptron each apply a nonlinear transformation to the input vector.
In the local Transformer layer, the feature map is divided into several disjoint window regions, and self-attention is performed within each window to capture the local information inside that window. Meanwhile, shifted windows are used to establish connections between local windows and to capture the long-range relations among pixels, so that the features of the input foggy image are extracted quickly and comprehensively, providing a strong feature basis for processing foggy images with uniform or non-uniform haze.
The local Transformer layer consists of a multi-layer perceptron (MLP), layer normalization (LayerNorm, LN), residual connections and local-window-based multi-head self-attention (MHSA).
The local Transformer layer performs self-attention within local windows to keep the computational cost linear, which is more efficient and consumes fewer resources.
The local Transformer layer works as follows: given an input feature map X, a linear layer projects X onto the self-attention Query (Q), Key (K) and Value (V) matrices, and the tokens are grouped by window partitioning; the local Transformer layer applies multi-head attention within each window, and the window partitions of adjacent blocks differ. Self-attention can thus be computed as:
Attention(Q, K, V) = SoftMax(QK^T/√d + B)·V (3)
where d denotes the number of channels, B is the relative position bias term, and SoftMax is the normalization function. A linear layer then projects the attention output.
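A minimal single-head sketch of window attention implementing SoftMax(QK^T/√d + B)V within disjoint windows follows. The single head, the simplified learnable bias table standing in for B and the window size are assumptions for illustration; the patent's layer is multi-head and is paired with the shifted-window variant described above.

```python
import torch
import torch.nn as nn

class WindowAttentionSketch(nn.Module):
    def __init__(self, dim: int = 256, window: int = 8):
        super().__init__()
        self.m = window
        self.qkv = nn.Linear(dim, dim * 3)   # project X to Q, K and V
        self.proj = nn.Linear(dim, dim)      # output projection
        # simplified learnable bias standing in for the relative position term B
        self.bias = nn.Parameter(torch.zeros(window * window, window * window))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape                 # assumes h and w divisible by the window size
        m = self.m
        # partition the map into disjoint m x m windows: (b * nWindows, m*m, c)
        x = x.view(b, c, h // m, m, w // m, m)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, m * m, c)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) / (c ** 0.5) + self.bias
        out = self.proj(attn.softmax(dim=-1) @ v)   # SoftMax(QK^T/sqrt(d) + B)V
        # merge the windows back into the (b, c, h, w) layout
        out = out.view(b, h // m, w // m, m, m, c)
        return out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
```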
Further, as shown in fig. 5, the gating enhancement module includes:
a fifth normalization operation layer, a seventh convolution layer, a first activation function layer, an eighth convolution layer, a sixth normalization operation layer, a second activation function layer, a gating unit and an eighth pixel-by-pixel addition module connected in sequence;
The input end of the gating unit is also connected to the input end of the fifth normalization operation layer; the input end of the eighth pixel-by-pixel addition module is also connected to the input end of the fifth normalization operation layer.
Further, the working process of the gating enhancement module is as follows: the fifth normalization operation layer normalizes the input features; the seventh convolution layer applies a linear transformation to the output features of the fifth normalization layer; the first activation function layer retains the positive values output by the seventh convolution layer; the eighth convolution layer applies a linear transformation to the output features of the first activation function layer; the sixth normalization operation layer normalizes the output of the eighth convolution layer; the second activation function layer maps the output of the sixth normalization layer to gating values between 0 and 1; the gating unit selectively retains or suppresses information in the feature map according to the gating values; and the eighth pixel-by-pixel addition module adds the output of the gating unit to the input of the fifth normalization operation layer pixel by pixel to obtain the final output.
Further, as shown in FIG. 5, the gating enhancement module is configured to enhance detail information such as edges in the image.
It should be understood that details such as contours and edges are important structural information of an image. Therefore, to obtain a defogged image with rich details and clear edges, the invention designs a gating enhancement module at the end of the decoding process, which enhances details such as edges by assigning them greater weights through pixel-by-pixel multiplication.
As shown in FIG. 5, the gating enhancement module performs a series of 1×1 convolution, ReLU nonlinear activation and Norm normalization operations on the input feature map, after which a weight map is generated by a Sigmoid activation function layer.
The weight map is then multiplied element-wise with the input feature map (i.e., the output of the feature aggregation module), and the input feature map is fused through the residual design and skip connection so as to focus on detail information such as edges, thereby recovering a defogged image with rich details and clear edges.
The formula of the enhancement unit is expressed as follows:
X̂ = X ⊗ Sigmoid(Norm(Conv(ReLU(Conv(Norm(X)))))) + X (4)
where X and X̂ denote the input and output feature maps respectively, ⊗ denotes pixel-by-pixel multiplication, Sigmoid is the Sigmoid activation function, Norm is batch normalization, ReLU denotes ReLU nonlinear activation, and Conv is a 1×1 convolution.
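A sketch of this enhancement unit following the working steps and Eq. (4) above; the use of BatchNorm2d for Norm and the channel width are assumptions consistent with the text.

```python
import torch
import torch.nn as nn

class GatedEnhanceSketch(nn.Module):
    def __init__(self, ch: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(ch),                # Norm
            nn.Conv2d(ch, ch, kernel_size=1),  # 1x1 Conv
            nn.ReLU(inplace=True),             # keep positive responses
            nn.Conv2d(ch, ch, kernel_size=1),  # 1x1 Conv
            nn.BatchNorm2d(ch),                # Norm
            nn.Sigmoid(),                      # per-pixel gating values in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = self.body(x)   # weight map
        return x * gate + x   # gate the input, then residual skip connection
```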
It should be appreciated that the invention proposes a local-Transformer-based multi-scale image defogging network called MIDNet (Multi-scale Image Dehazing Network based on local Transformer). This is an attempt to use the Transformer for processing foggy images with both uniform and non-uniform haze.
During the encoding process, unlike CNNs, which focus only on local features, the multi-scale image defogging network uses a local-Transformer-based multi-scale feature extractor that exploits the local information within local windows and the Transformer's long-range relations among pixels to extract features of different levels more comprehensively, so that it can process foggy images with both uniform and non-uniform haze. Compared with previous methods, the multi-scale image defogging network obtains global information while maintaining the efficiency of feature extraction.
During the decoding process, the multi-scale image defogging network combines a top-down pyramid structure with dense connections (DC); it not only fuses features of the same scale from the encoding process but also better fuses features of different scales during decoding, achieving comprehensive fusion and enhancement of multi-source (encoding and decoding) and multi-level (different scales) features.
At the end of the network, the multi-scale image defogging network assigns greater weights to details such as edges through the gating enhancement module so as to obtain defogged images rich in detail. The invention conducted extensive experiments on the RESIDE, I-HAZE, O-HAZE, NH-HAZE and NTIRE public datasets; compared with the state-of-the-art (SOTA) methods, the proposed MIDNet model achieves better defogging performance.
Further, the training process of the trained multi-scale image defogging network comprises the following steps:
constructing a training set, where the training set consists of original foggy images whose defogged (ground-truth) images are known;
inputting the training set into the multi-scale image defogging network and training the network; training stops when the total loss function value of the network no longer decreases, yielding the trained multi-scale image defogging network.
Further, the total loss function is expressed as:
L = ω1·L_smooth-L1 + ω2·L_vgg + ω3·L_ms-ssim (5)
where ω1, ω2 and ω3 are hyper-parameters, L denotes the total loss function, L_smooth-L1 denotes the smooth L1 loss function, L_vgg denotes the perceptual loss function, and L_ms-ssim denotes the multi-scale structural similarity loss function.
Further, the smooth L1 loss function is expressed as:
L_smooth-L1 = (1/N)·Σ_i f(Ŷ_i − Y_i), with f(e) = 0.5e² for |e| < 1 and |e| − 0.5 otherwise (6)
where i indexes the pixels, N is the total number of pixels, Ŷ is the defogged image, and Y is the ground-truth image.
It should be appreciated that many image restoration tasks trained with the L1 loss achieve better performance in terms of PSNR and SSIM than those trained with the L2 loss, and the smooth L1 loss converges quickly with relatively smooth gradient changes. The invention therefore adopts the smooth L1 loss to ensure that the predicted image is close to the real image.
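In PyTorch the smooth L1 term of Eq. (6) is available directly as a built-in loss that averages over all pixels:

```python
import torch
import torch.nn as nn

smooth_l1 = nn.SmoothL1Loss()        # mean reduction over all N pixels by default
y_hat = torch.rand(1, 3, 256, 256)   # dummy defogged prediction
y = torch.rand(1, 3, 256, 256)       # dummy ground-truth image
loss = smooth_l1(y_hat, y)
```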
Further, the perceptual loss function is expressed as:
L_vgg = Σ_i (1/(C_i·H_i·W_i))·‖φ_i(Ŷ) − φ_i(Y)‖₁ (7)
where Ŷ and Y denote the defogged image and the ground-truth image respectively, C_i denotes the number of channels, and H_i and W_i are the height and width of the i-th feature map; φ_i denotes the feature map of size C_i×H_i×W_i extracted by VGG-16 pre-trained on ImageNet.
It will be appreciated that in order to maintain perceptual and semantic fidelity, and better reconstruct and recover detailed information, the present invention exploits perceptual loss to provide additional supervision in the high-level feature space to measure high-level feature differences between blurred images and their corresponding defogged images.
Further, the multi-scale structural similarity loss function is expressed as:
L_ms-ssim = 1 − Π_{m=1..M} SSIM_m(Ŷ, Y) (8)
where Ŷ and Y denote the defogged image and the ground-truth image respectively, m = 1, …, M indexes the different scales, and SSIM(Ŷ, Y) is the structural similarity, which takes human visual perception (luminance, contrast, structure, etc.) into account;
The expression of the structural similarity is:
SSIM(Ŷ, Y) = ((2·μ_Ŷ·μ_Y + C1)·(2·σ_ŶY + C2)) / ((μ_Ŷ² + μ_Y² + C1)·(σ_Ŷ² + σ_Y² + C2)) (9)
where Gaussian filtering is applied to Ŷ and Y, the means of the filtered results are μ_Ŷ and μ_Y, the standard deviations are σ_Ŷ and σ_Y, and the covariance is σ_ŶY; C1 and C2 are constants for maintaining numerical stability.
It should be appreciated that the multi-scale structural similarity loss considers both human visual perception and resolution (multiple scales), and the value range of SSIM is [0, 1]. To preserve the structure of the defogged image, the invention uses the multi-scale structural similarity loss to measure the structural similarity between the defogged image and the ground-truth image.
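A compact sketch of a single-scale SSIM term and the weighted total of Eq. (5) follows. The uniform 11×11 window (in place of the Gaussian filter described above), the single-scale simplification of the MS-SSIM term, the stand-in `perceptual` callable for the VGG-16 loss, and the ω values are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ssim(y_hat, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-scale SSIM of Eq. (9), with a uniform 11x11 window."""
    pool = lambda t: F.avg_pool2d(t, kernel_size=11, stride=1, padding=5)
    mu_p, mu_t = pool(y_hat), pool(y)
    var_p = pool(y_hat * y_hat) - mu_p ** 2   # local variances
    var_t = pool(y * y) - mu_t ** 2
    cov = pool(y_hat * y) - mu_p * mu_t       # local covariance
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return (num / den).mean()

def total_loss(y_hat, y, perceptual, w1=1.0, w2=0.04, w3=0.5):
    """L = w1*L_smooth-L1 + w2*L_vgg + w3*L_ms-ssim, cf. Eq. (5); weights are placeholders."""
    return (w1 * F.smooth_l1_loss(y_hat, y)
            + w2 * perceptual(y_hat, y)
            + w3 * (1.0 - ssim(y_hat, y)))
```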
Example two
The embodiment provides a multi-scale image defogging system;
A multi-scale image defogging system, comprising:
An acquisition module configured to: acquiring an image to be defogged;
A defogging module configured to: input the image to be defogged into a trained multi-scale image defogging network and output the defogged image; the trained multi-scale image defogging network performs multi-scale feature extraction on the image to be defogged to obtain a multi-scale feature map; performs feature aggregation on the multi-scale feature map to obtain an aggregated feature map; applies gated enhancement to the aggregated feature map to obtain an enhanced feature map; and adds the enhanced feature map to the image to be defogged pixel by pixel to obtain the defogged image.
Here, it should be noted that the acquisition module and the defogging module described above correspond to steps S101 to S102 of the first embodiment; the examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to what is disclosed in the first embodiment. It should also be noted that the above modules may be implemented, as part of a system, in a computer system such as a set of computer-executable instructions.
The embodiments are described in a progressive manner; for details of one embodiment, reference may be made to the related description of another embodiment.
The proposed system may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into the above modules is merely a logical functional division, and there may be other divisions in actual implementation; for instance, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
Example III
The embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein the processor is coupled to the memory, the one or more computer programs being stored in the memory, the processor executing the one or more computer programs stored in the memory when the electronic device is running, to cause the electronic device to perform the method of the first embodiment.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), an off-the-shelf field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software.
The method of the first embodiment may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, a detailed description is not provided here.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Example IV
The present embodiment also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, perform the method of embodiment one.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311861387.5A CN117808707B (en) | 2023-12-28 | 2023-12-28 | Multi-scale image defogging method, system, device and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311861387.5A CN117808707B (en) | 2023-12-28 | 2023-12-28 | Multi-scale image defogging method, system, device and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN117808707A CN117808707A (en) | 2024-04-02 |
| CN117808707B true CN117808707B (en) | 2024-08-02 |
Family
ID=90419805
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311861387.5A Active CN117808707B (en) | 2023-12-28 | 2023-12-28 | Multi-scale image defogging method, system, device and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117808707B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN121010595A (en) * | 2025-10-27 | 2025-11-25 | 山东浪潮智慧建筑科技有限公司 | A method, equipment, and medium for fog detection and defogging in a security system. |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116342868A (en) * | 2023-03-22 | 2023-06-27 | 西安电子科技大学 | Small target detection method based on multi-scale feature compensation and gating enhancement |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113450273B (en) * | 2021-06-18 | 2022-10-14 | 暨南大学 | A method and system for image dehazing based on multi-scale and multi-stage neural network |
| CN114120253B (en) * | 2021-10-29 | 2023-11-14 | 北京百度网讯科技有限公司 | Image processing method, device, electronic equipment and storage medium |
| CN114049274B (en) * | 2021-11-13 | 2024-08-27 | 哈尔滨理工大学 | Defogging method for single image |
| CN114202696B (en) * | 2021-12-15 | 2023-01-24 | 安徽大学 | SAR target detection method and device based on context vision and storage medium |
| CN116051428B (en) * | 2023-03-31 | 2023-07-21 | 南京大学 | A low-light image enhancement method based on joint denoising and super-resolution of deep learning |
| CN117237608A (en) * | 2023-09-18 | 2023-12-15 | 江苏智能无人装备产业创新中心有限公司 | A multi-scale fog scene target detection method and system based on deep learning |
- 2023-12-28 CN CN202311861387.5A patent/CN117808707B/en active Active
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116342868A (en) * | 2023-03-22 | 2023-06-27 | 西安电子科技大学 | Small target detection method based on multi-scale feature compensation and gating enhancement |
Non-Patent Citations (1)
| Title |
|---|
| Li, S.; Yuan, Q.; Zhang, Y.; Lv, B.; Wei, F. Image Dehazing Algorithm Based on Deep Learning Coupled Local and Global Features. Appl. Sci. 2022, pp. 1-14. * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117808707A (en) | 2024-04-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN116797488B (en) | A Low-Light Image Enhancement Method Based on Feature Fusion and Attention Embedding | |
| Tu et al. | SWCGAN: Generative adversarial network combining swin transformer and CNN for remote sensing image super-resolution | |
| CN111951195B (en) | Image enhancement method and device | |
| Cherabier et al. | Learning priors for semantic 3d reconstruction | |
| CN113378775B (en) | Video shadow detection and elimination method based on deep learning | |
| CN116977208B (en) | Dual-branch fusion low-light image enhancement method | |
| CN113065645B (en) | Twin attention network, image processing method and device | |
| CN108510451B (en) | Method for reconstructing license plate based on double-layer convolutional neural network | |
| CN113947538B (en) | A multi-scale efficient convolutional self-attention single image rain removal method | |
| CN111091503A (en) | Image defocus blur method based on deep learning | |
| CN110503613A (en) | Single Image-Oriented Rain Removal Method Based on Cascaded Atrous Convolutional Neural Network | |
| CN115272438A (en) | High-precision monocular depth estimation system and method for three-dimensional scene reconstruction | |
| CN116645569B (en) | A method and system for colorizing infrared images based on generative adversarial networks | |
| CN111079764A (en) | Low-illumination license plate image recognition method and device based on deep learning | |
| WO2024002211A1 (en) | Image processing method and related apparatus | |
| CN110349087A (en) | RGB-D image superior quality grid generation method based on adaptability convolution | |
| CN117808706B (en) | Video rain removing method, system, equipment and storage medium | |
| Wang et al. | Multi-focus image fusion framework based on transformer and feedback mechanism | |
| CN117726544A (en) | An image deblurring method and system for complex motion scenes | |
| Ali et al. | Boundary-constrained robust regularization for single image dehazing | |
| CN117808707B (en) | Multi-scale image defogging method, system, device and storage medium | |
| Bai et al. | CEPDNet: a fast CNN-based image denoising network using edge computing platform | |
| CN115953312A (en) | A joint defogging detection method, device and storage medium based on a single image | |
| Zhao et al. | End‐to‐End Retinex‐Based Illumination Attention Low‐Light Enhancement Network for Autonomous Driving at Night | |
| Zhou et al. | Restoration of laser interference image based on large scale deep learning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |