
CN117808707B - Multi-scale image defogging method, system, device and storage medium - Google Patents

Info

Publication number
CN117808707B
Authority
CN
China
Prior art keywords
layer
pixel
image
module
feature
Prior art date
Legal status
Active
Application number
CN202311861387.5A
Other languages
Chinese (zh)
Other versions
CN117808707A (en)
Inventor
高珊珊
毛德乾
刘峥
刘慧
潘晓
Current Assignee
Shandong University of Finance and Economics
Original Assignee
Shandong University of Finance and Economics
Priority date
Filing date
Publication date
Application filed by Shandong University of Finance and Economics filed Critical Shandong University of Finance and Economics
Priority to CN202311861387.5A priority Critical patent/CN117808707B/en
Publication of CN117808707A publication Critical patent/CN117808707A/en
Application granted granted Critical
Publication of CN117808707B publication Critical patent/CN117808707B/en


Classifications

    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06N3/0455 Auto-encoder networks; encoder-decoder networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/0985 Hyperparameter optimisation; meta-learning; learning-to-learn


Abstract

The invention discloses a multi-scale image defogging method, system, device and storage medium. The method comprises the following steps: acquiring an image to be defogged; inputting the image to be defogged into a trained multi-scale image defogging network and outputting the defogged image. The trained multi-scale image defogging network performs multi-scale feature extraction on the image to be defogged to obtain multi-scale feature maps; performs feature aggregation on the multi-scale feature maps to obtain an aggregated feature map; applies gating enhancement to the aggregated feature map to obtain an enhanced feature map; and adds the enhanced feature map to the image to be defogged pixel by pixel to obtain the defogged image.

Description

Multi-scale image defogging method, system, device and storage medium
Technical Field
The present invention relates to the field of image defogging technology, and in particular, to a multi-scale image defogging method, system, device and storage medium.
Background
The statements in this section merely relate to the background of the present disclosure and may not necessarily constitute prior art.
Image defogging is a pre-processing step for other visual tasks; it aims to remove the haze from a given foggy image and restore a clear, haze-free scene. The atmosphere contains a large number of floating particles such as smoke, dust and fog. These particles scatter and absorb light, reducing the albedo of the scene, so images captured by an imaging sensor in a foggy scene inevitably suffer from limited visibility, low color saturation and loss of detail. How to recover high-quality haze-free images with balanced brightness, abundant detail and clear edges from images degraded by fog, and thereby provide high-quality input for downstream computer vision tasks and systems, has therefore become a research hotspot in the field of computer vision.
In 1977, McCartney first described the formation process of foggy images in detail and proposed an atmospheric degradation model based on an attenuation model and an ambient-light model. The formation process is:
I(x)=J(x)t(x)+A(1-t(x)) (1)
Wherein I (x) is an acquired foggy-day low-quality image, J (x) is a clear foggy-free image, t (x) represents transmissivity, A represents global atmospheric light, and x represents pixel coordinates;
wherein, the t (x) transmittance is affected by the depth of field, and can be expressed as:
t(x) = e^(-βd(x)) (2)
where d(x) represents the depth-of-field distance between the object and the camera and β represents the light attenuation coefficient. The haze-free image can therefore be recovered by accurately estimating the atmospheric light value A and the transmittance t(x).
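As a concrete illustration, the degradation model in Eqs. (1)-(2) can be used to synthesize a hazy image from a clear image and a depth map. The sketch below (NumPy; the values of A and β are arbitrary examples, not taken from the patent) shows how greater depth pushes pixel values toward the atmospheric light:

```python
import numpy as np

def synthesize_haze(J, d, A=0.9, beta=1.0):
    """Atmospheric degradation model (Eqs. 1-2):
    I(x) = J(x) * t(x) + A * (1 - t(x)), with t(x) = exp(-beta * d(x))."""
    t = np.exp(-beta * d)[..., None]   # per-pixel transmission, broadcast over RGB
    return J * t + A * (1.0 - t)

# toy example: a uniform mid-grey scene whose depth increases across the image
J = np.full((2, 2, 3), 0.5)
d = np.array([[0.0, 1.0],
              [2.0, 3.0]])
I = synthesize_haze(J, d)
```

At d = 0 the pixel is unchanged (t = 1); as depth grows, t decays exponentially and the observed color drifts toward A.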
Based on this idea, researchers at home and abroad have explored a number of image defogging methods (i.e., model-driven methods) built on different models and prior knowledge, achieving good defogging performance. Although these methods produce images with good visibility, they may introduce artifacts in regions that do not conform to the prior.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a multi-scale image defogging method, system, device and storage medium;
In one aspect, a multi-scale image defogging method is provided, comprising:
Acquiring an image to be defogged;
Inputting an image to be defogged into a trained multi-scale image defogging network, and outputting the defogged image; the trained multiscale image defogging network is used for carrying out multiscale feature extraction on the image to be defogged to obtain a multiscale feature map; performing feature aggregation on the multi-scale feature map to obtain an aggregated feature map; gating and enhancing the aggregation feature map to obtain an enhanced feature map; and adding the enhanced feature map and the image to be defogged pixel by pixel to obtain the defogged image.
In another aspect, a multi-scale image defogging system is provided, comprising:
An acquisition module configured to: acquiring an image to be defogged;
A defogging module configured to: inputting an image to be defogged into a trained multi-scale image defogging network, and outputting the defogged image; the trained multiscale image defogging network is used for carrying out multiscale feature extraction on the image to be defogged to obtain a multiscale feature map; performing feature aggregation on the multi-scale feature map to obtain an aggregated feature map; gating and enhancing the aggregation feature map to obtain an enhanced feature map; and adding the enhanced feature map and the image to be defogged pixel by pixel to obtain the defogged image.
In still another aspect, there is provided an electronic device including:
a memory for non-transitory storage of computer readable instructions; and
A processor for executing the computer-readable instructions,
Wherein the computer readable instructions, when executed by the processor, perform the method of the first aspect described above.
In yet another aspect, there is also provided a storage medium non-transitorily storing computer readable instructions, wherein the method of the first aspect is performed when the non-transitory computer readable instructions are executed by a computer.
In a further aspect, there is also provided a computer program product comprising a computer program for implementing the method of the first aspect described above when run on one or more processors.
The technical scheme has the following advantages or beneficial effects:
The invention provides a local-Transformer-based multi-scale image defogging network, MIDNet, which comprehensively extracts image features at different levels and processes foggy images with uniform or non-uniform haze by exploiting both the local information within windows and the long-range relations among pixels. The model effectively reduces the spatial resource consumption of the original ViT, achieves simple but efficient single-image defogging, and keeps the reconstructed image visually consistent with the ground-truth image.
For the decoding process, the invention designs a top-down feature aggregation module based on dense connections, which aggregates feature information at different scales during decoding while also fusing same-scale features from the encoding process, achieving a marked enhancement of the features.
The invention designs a gating enhancement module that assigns pixel-level weights to strengthen feature information such as edges and textures, thereby preserving detail in the reconstructed image.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flow chart of a method according to a first embodiment;
FIG. 2 is a diagram showing the internal structure of a multi-scale image defogging network according to the first embodiment;
fig. 3 is an internal structure diagram of a feature extraction module and a feature aggregation module according to the first embodiment;
FIG. 4 is an internal structure diagram of the local Transformer layer of the first embodiment;
fig. 5 is an internal structure diagram of a gating enhancement module according to the first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
In recent years, with advances in computer software and hardware, learning-based image defogging methods (i.e., data-driven methods) have emerged. These methods learn network parameters from a specific network architecture and paired foggy-day datasets to obtain a better defogging effect. Several learning models based on convolutional neural networks (CNNs) estimate the parameters of the atmospheric degradation model in an end-to-end manner to recover a haze-free image: DehazeNet estimates the transmittance map t(x) with a CNN while suppressing noise, and AODNet jointly estimates the atmospheric light value A and the transmittance t(x), achieving good defogging performance. Most CNN-based methods recover the haze-free image by learning a mapping between foggy and haze-free images under the constraint of a loss function. However, because CNN-based methods struggle to exploit global information effectively, their defogging results are often unsatisfactory. In particular, for non-uniform haze images, these methods cannot remove the haze well, owing to the irregular haze distribution and the variation in haze concentration.
Recently, transformers have shown great potential in the field of artificial intelligence applications. Initially, researchers applied transformers to the field of natural language processing (natural language processing, NLP) and achieved excellent results in this field. Thus, inspired by the above, researchers extended it to computer vision tasks, presented vision transducers (vision transformer, viT), and made breakthroughs in vision fields such as object detection and image deblurring. However, the original visual transducer grows in square with the increase of the spatial resolution of the input image, and if the original visual transducer is used for visual tasks such as image defogging, the efficiency of the original visual transducer is greatly reduced, and the actual industrial requirement cannot be met.
The present embodiment proposes a novel multi-scale image defogging network, MIDNet, which processes foggy images with uniform and non-uniform haze by comprehensively extracting and aggregating multi-source, multi-level features through local Transformers and dense connections. This is an attempt to apply Transformers to foggy images with both uniform and non-uniform haze.
Example 1
The embodiment provides a multi-scale image defogging method;
as shown in fig. 1, the multi-scale image defogging method includes:
s101: acquiring an image to be defogged;
S102: inputting an image to be defogged into a trained multi-scale image defogging network, and outputting the defogged image; the trained multiscale image defogging network is used for carrying out multiscale feature extraction on the image to be defogged to obtain a multiscale feature map; performing feature aggregation on the multi-scale feature map to obtain an aggregated feature map; gating and enhancing the aggregation feature map to obtain an enhanced feature map; and adding the enhanced feature map and the image to be defogged pixel by pixel to obtain the defogged image.
Further, as shown in fig. 2, the trained multiscale image defogging network comprises:
the device comprises an input layer, a feature extraction module, a feature aggregation module, a gate control enhancement module, a pixel-by-pixel addition module and an output layer which are connected in sequence; the input end of the pixel-by-pixel adding module is also connected with the output end of the input layer.
As can be seen from fig. 2, the multi-scale image defogging network is composed of a feature extraction module, a feature aggregation module and a gating enhancement module.
The multi-scale image defogging network first projects the input image I_i ∈ R^(3×H×W) onto embedding vectors of dimension d through a patch-embedding operation (d = 256 in the multi-scale image defogging network); each embedded feature map has shape d × (H/p) × (W/p) (p is the patch size of the patch embedding, p = 4 in the multi-scale image defogging network). Next, the feature extraction module extracts multi-scale features, and the feature aggregation module aggregates features at different scales (i.e., multi-source, multi-level); finally, the gating enhancement module restores the edge information.
Further, as shown in fig. 3, the feature extraction module includes:
a block embedding layer, a first local Transformer layer, a second local Transformer layer, a third local Transformer layer, and a fourth local Transformer layer connected in sequence;
the input of the block embedding layer is the input of the feature extraction module.
Further, the block embedding layer is configured to divide an image to be defogged into a plurality of blocks of a set size, and represent each block as a vector.
Further, the feature extraction module is used for realizing multi-scale feature extraction of the image to be defogged.
It should be appreciated that, since the original ViT performs multi-head attention at all spatial positions, its computational complexity for an image of size H×W is O(2(HW)²d). This cost grows quadratically as the image resolution increases, so the original ViT is inefficient and consumes relatively more resources when processing high-resolution images. At the same time, a single-scale feature representation has certain limitations. Therefore, during the encoding stage, the invention replaces all convolutional layers in the FPN encoder with local Transformer layers.
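The complexity difference can be made concrete with two small cost functions (a sketch based on the 2(HW)²d figure above; window attention over w×w windows costs 2(w²)²d per window):

```python
def attn_flops_global(H, W, d):
    """Global self-attention over all HW tokens: 2 * (HW)^2 * d."""
    return 2 * (H * W) ** 2 * d

def attn_flops_windowed(H, W, d, w):
    """Window attention: (HW / w^2) windows, each costing 2 * (w^2)^2 * d.
    For a fixed window size w, the total is linear in HW."""
    return (H * W // w ** 2) * 2 * (w ** 2) ** 2 * d
```

Doubling the resolution multiplies the global cost by 16 but the windowed cost only by 4, which is why the encoder adopts local (windowed) attention.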
Further, as shown in fig. 3, the feature aggregation module includes:
the device comprises a first upsampling layer, a first pixel-by-pixel adding module, a second upsampling layer, a second pixel-by-pixel adding module, a first feature cascade layer, a first convolution layer, a third upsampling layer, a third pixel-by-pixel adding module, a second feature cascade layer and a second convolution layer which are sequentially connected;
The input end of the first upsampling layer is connected with the output end of the fourth local Transformer layer through a third convolution layer; the input end of the first pixel-by-pixel adding module is connected with the output end of the third local Transformer layer through the fourth convolution layer; the input end of the second pixel-by-pixel adding module is connected with the output end of the second local Transformer layer through a fifth convolution layer; the input end of the third pixel-by-pixel adding module is connected with the output end of the first local Transformer layer through a sixth convolution layer;
The input end of the first characteristic cascade layer is connected with the output end of the third convolution layer through a fourth upsampling layer; the input end of the second characteristic cascade layer is connected with the output end of the third convolution layer through a fifth upsampling layer; the input end of the second characteristic cascade layer is connected with the output end of the first pixel-by-pixel addition module through a sixth upsampling layer;
the output of the second convolution layer is the output of the feature aggregation module.
Further, the functions of the first feature cascade layer and the second feature cascade layer are the same, and the first feature cascade layer and the second feature cascade layer are used for splicing input features in the channel dimension.
Further, the first, second and third upsampling layers are each configured to perform double upsampling.
Further, the fourth upsampling layer is used for realizing four times upsampling, the fifth upsampling layer is used for realizing eight times upsampling, and the sixth upsampling layer is used for realizing four times upsampling.
Further, the feature aggregation module is used for realizing aggregation of features with different scales.
It should be understood that, for feature aggregation, the invention adopts a global strategy and adds dense connections to the original FPN feature aggregation scheme. The addition operation loses some of the original feature information during fusion, whereas the concatenation operation is, strictly speaking, lossless. Dense connections feed the feature information of each layer into the next layer through concatenation, aggregating features at different levels and enabling feature reuse across scales. The feature aggregation module designed by the invention therefore fuses same-scale features from the encoding process and uses dense connections to aggregate features at different levels top-down during decoding. Through this global strategy, multi-source, multi-level feature information can be aggregated more comprehensively, and a clearer defogged image can be reconstructed.
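The top-down aggregation with dense connections can be sketched as follows (NumPy; nearest-neighbour upsampling stands in for the learned upsampling layers, and the four input maps are assumed to come from the four encoder scales, each half the resolution of the previous one):

```python
import numpy as np

def up(x, k):
    """Nearest-neighbour k-times upsampling of a C x H x W feature map."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def aggregate(f1, f2, f3, f4):
    """Top-down aggregation sketch: each coarser map is upsampled 2x and
    added to the same-scale encoder feature (FPN-style); dense connections
    then concatenate every level, upsampled to the finest resolution."""
    p3 = up(f4, 2) + f3
    p2 = up(p3, 2) + f2
    p1 = up(p2, 2) + f1
    # dense (lossless) concatenation of all levels along the channel axis
    return np.concatenate([p1, up(p2, 2), up(p3, 4), up(f4, 8)], axis=0)

out = aggregate(np.ones((4, 8, 8)), np.ones((4, 4, 4)),
                np.ones((4, 2, 2)), np.ones((4, 1, 1)))
```

Concatenation keeps every level's information intact, at the price of a wider channel dimension, which the subsequent convolution layers then reduce.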
Further, as shown in fig. 4, the first, second, third and fourth local Transformer layers share the same internal structure; the first local Transformer layer includes:
a first normalization layer, a window-based multi-head self-attention mechanism layer, a fourth pixel-by-pixel adding module, a second normalization layer, a first multi-layer perceptron, a fifth pixel-by-pixel adding module, a third normalization layer, a shifted-window multi-head self-attention mechanism layer, a sixth pixel-by-pixel adding module, a fourth normalization module, a second multi-layer perceptron and a seventh pixel-by-pixel adding module connected in series in sequence;
The input end of the fourth pixel-by-pixel adding module is also connected with the input end of the first normalization operation layer; the input end of the fifth pixel-by-pixel adding module is also connected with the input end of the second normalization operation layer; the input end of the sixth pixel-by-pixel adding module is also connected with the input end of the third normalization operation layer; the input end of the seventh pixel-by-pixel adding module is also connected with the input end of the fourth normalization operation layer.
Further, the window-based multi-head self-attention mechanism layer W-MSA (window-based multi-head self-attention) has the following network structure: window partitioning, multi-head self-attention and concatenation modules connected in sequence.
Further, the window-based multi-head self-attention mechanism layer W-MSA works as follows: first, the window partitioning operation divides the input sequence into several fixed-size windows; second, the subsequence in each window is sent to multiple attention heads for attention computation; the pixels in each window can only take inner products with other pixels in the same window to exchange information; finally, the results of all attention heads are concatenated to obtain the final attention representation.
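The window partitioning step can be sketched as a pure reshape (NumPy; tokens are laid out as an H × W × C map, an assumption for illustration):

```python
import numpy as np

def window_partition(x, w):
    """Split an H x W x C token map into non-overlapping w x w windows.
    Returns shape (num_windows, w*w, C) so that attention can run
    independently inside each window."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)

x = np.arange(16).reshape(4, 4, 1)   # a 4x4 map of token ids
wins = window_partition(x, 2)        # four 2x2 windows
```

The first window of the 4×4 toy map contains exactly the four top-left tokens, confirming that no window straddles a boundary.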
Further, the shifted-window multi-head self-attention mechanism layer SW-MSA (shifted-window multi-head self-attention) has the following network structure: window partitioning, multi-head self-attention, cross-window self-attention, and concatenation.
Further, the shifted-window multi-head self-attention mechanism layer SW-MSA works as follows: first, the window partitioning operation divides the input sequence into several fixed-size windows; second, the subsequence in each window is sent to multiple self-attention heads for attention computation; then, cross-window self-attention performs a weighted summation of the self-attention representation in each window with those in the other windows and computes self-attention again to obtain the cross-window self-attention representation; finally, all cross-window self-attention representations are concatenated to obtain the final attention representation.
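One common way to realize the cross-window interaction (assumed here; the patent does not spell out the mechanism) is to cyclically shift the token map by half a window before partitioning, so the new windows straddle the old window boundaries:

```python
import numpy as np

def cyclic_shift(x, w):
    """Cyclically shift an H x W x C token map by w//2 along both spatial
    axes, so that windows cut after the shift bridge the boundaries of
    the previous (unshifted) windows."""
    return np.roll(x, shift=(-(w // 2), -(w // 2)), axis=(0, 1))

x = np.arange(16).reshape(4, 4, 1)
y = cyclic_shift(x, 2)   # each token moves up-left by one position (wrapping)
```

After attention, the inverse shift (`np.roll` with positive offsets) restores the original layout.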
Further, the first, second, third, fourth, fifth, sixth and seventh pixel-by-pixel adding modules are used for adding pixel values of corresponding pixel points of the two input feature images, and the added pixel values are used as pixel values of corresponding pixel points of the output feature images.
Further, the first multi-layer perceptron and the second multi-layer perceptron are both used for carrying out nonlinear transformation on the input vector.
In the local Transformer layer, the feature map is divided into several disjoint window regions, and self-attention is performed within each window to capture the local information inside it. Meanwhile, shifted windows are used to establish connections between the local windows and capture the long-range relations among pixels, so that features of the input foggy image are extracted quickly and comprehensively, providing a strong feature foundation for processing foggy images with uniform or non-uniform haze.
The local Transformer layer consists of multi-layer perceptrons (MLP), layer normalization (LN), residual connections, and local-window-based multi-head self-attention (MHSA).
The local Transformer layer performs self-attention within local windows to keep the computational cost linear in image size, which is more efficient and consumes fewer resources.
The local Transformer layer operates as follows: given an input feature map X, X is projected by a linear layer to the self-attention matrices Q (Query), K (Key) and V (Value), and the tokens are grouped by window partitioning; the local Transformer layer applies multi-head attention within each window, and the window partitions of adjacent blocks differ. Self-attention can then be computed as:

Attention(Q, K, V) = SoftMax(QKᵀ/√d + B)V

where d represents the number of channels, B is the relative position bias term, and SoftMax is the normalization function. A linear layer then projects the attention output.
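The per-window attention computation can be sketched as follows (NumPy; a minimal single-head, single-window version that omits the learned Q/K/V projections and the multi-head split):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def window_attention(Q, K, V, B):
    """Scaled dot-product self-attention inside one window:
    SoftMax(Q K^T / sqrt(d) + B) V, with relative position bias B."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + B
    return softmax(scores) @ V

n, d = 4, 8                       # 4 tokens in the window, 8 channels
Q = np.zeros((n, d))              # degenerate queries -> uniform attention
K = np.random.default_rng(0).standard_normal((n, d))
V = np.eye(n, d)
out = window_attention(Q, K, V, B=np.zeros((n, n)))
```

With all-zero queries and zero bias the scores are uniform, so each output row is simply the mean of the value rows, a handy sanity check.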
Further, as shown in fig. 5, the gating enhancement module includes:
The fifth normalization operation layer, the seventh convolution layer, the first activation function layer, the eighth convolution layer, the sixth normalization operation layer, the second activation function layer, the gate control unit and the eighth pixel-by-pixel addition module are sequentially connected;
The input end of the gate control unit is also connected with the input end of the fifth normalization operation layer; the input end of the eighth pixel-by-pixel adding module is also connected with the input end of the fifth normalization operation layer.
Further, the gating enhancement module works as follows: the fifth normalization layer normalizes the input features; the seventh convolution layer applies a linear transformation to the output of the fifth normalization layer; the first activation function layer retains the positive values output by the seventh convolution layer; the eighth convolution layer applies a linear transformation to the output of the first activation function layer; the sixth normalization layer normalizes the output of the eighth convolution layer; the second activation function layer maps the output of the sixth normalization layer to gating values between 0 and 1; the gating unit selectively retains or suppresses information in the feature map according to the gating values; and the eighth pixel-by-pixel adding module adds the output of the gating unit to the input of the fifth normalization layer pixel by pixel to obtain the final output.
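The pipeline just described can be sketched in NumPy (scalar weights stand in for the two learned 1×1 convolutions, and per-map standardization stands in for the normalization layers; both are assumptions for illustration):

```python
import numpy as np

def norm(x):
    """Stand-in for a normalization layer: zero-mean, unit-variance."""
    return (x - x.mean()) / (x.std() + 1e-5)

def relu(x):
    return np.maximum(x, 0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_enhance(X, w1=1.0, w2=1.0):
    """Gating enhancement sketch: norm -> conv -> ReLU -> conv -> norm ->
    sigmoid yields a per-pixel gate G in (0, 1); the gated features X * G
    are added back to the input through the residual connection."""
    G = sigmoid(norm(w2 * relu(w1 * norm(X))))
    return X + X * G

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
out = gate_enhance(X)
```

Because the gate is strictly between 0 and 1, each positive pixel is amplified by a factor between 1 and 2, with the largest boost going to the pixels the gate emphasizes.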
Further, as shown in fig. 5, the gating enhancement module is configured to: and enhancing detail information such as edges in the image.
It should be understood that details such as contours and edges of the image are important structural information, so in order to obtain a defogging image with rich details and clear edges, at the end of the decoding process, the invention designs a gating enhancement module, which enhances details such as edges of the image by giving greater weight to the details such as edges and utilizing a pixel-by-pixel multiplication method.
The gating enhancement module is shown in fig. 5: after a series of 1×1 convolution, ReLU nonlinear activation and Norm normalization operations on the input feature map, a weight map is generated by a Sigmoid activation function layer.
Then, the weight map is multiplied element-wise with the input feature map (i.e., the output of the feature aggregation module), and the input feature map is fused through a residual design and skip connection, so that detail information such as edges is emphasized and a defogged image with rich detail and clear edges is recovered.
The enhancement unit is formulated as follows:
X̂ = X + X ⊙ Sigmoid(Norm(Conv(ReLU(Conv(Norm(X))))))
wherein X and X̂ represent the input and output feature maps, respectively, and ⊙ denotes pixel-by-pixel multiplication; Sigmoid is the Sigmoid activation function, Norm is batch normalization, ReLU denotes ReLU nonlinear activation, and Conv is a 1×1 convolution.
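The data flow described above can be sketched numerically. The following NumPy sketch is illustrative only: the patent's layers are learned 1×1 convolutions and batch normalization, whereas here the convolution weights are random and the normalization is a simple per-channel standardization. It shows how the Sigmoid gating value in (0, 1) scales the input before the residual pixel-by-pixel addition.

```python
import numpy as np

rng = np.random.default_rng(0)

def norm(x):
    # simple per-channel standardization standing in for batch normalization
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True) + 1e-5
    return (x - mu) / sigma

def conv1x1(x, w):
    # a 1x1 convolution is a linear map over the channel axis
    return np.einsum('chw,oc->ohw', x, w)

def gating_enhancement(x, w1, w2):
    h = norm(x)                    # fifth normalization layer
    h = conv1x1(h, w1)             # seventh convolution layer
    h = np.maximum(h, 0.0)         # first activation layer (ReLU keeps positives)
    h = conv1x1(h, w2)             # eighth convolution layer
    h = norm(h)                    # sixth normalization layer
    g = 1.0 / (1.0 + np.exp(-h))   # Sigmoid -> gating values in (0, 1)
    return x + x * g               # gate, then residual pixel-by-pixel addition

c, height, width = 4, 8, 8
x = rng.standard_normal((c, height, width))
w1 = rng.standard_normal((c, c)) * 0.1
w2 = rng.standard_normal((c, c)) * 0.1
y = gating_enhancement(x, w1, w2)
print(y.shape)  # same shape as the input
```

Because the gate lies in (0, 1), the output stays between X and 2X elementwise, which is one way to see why the residual design cannot suppress the input entirely.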
It should be appreciated that the present invention proposes a local Transformer-based multi-scale image defogging network, referred to as MIDNet (Multi-scale Image Dehazing Network based on local Transformer). This is an attempt to apply the Transformer to processing foggy images with both uniform and non-uniform haze.
In the encoding process, unlike a CNN, which focuses only on local features, the multi-scale image defogging network uses a multi-scale feature extractor based on the local Transformer: it exploits both the local information within each local window and the Transformer's long-range relations among pixels to extract features at different levels more comprehensively, so that it can handle foggy images with both uniform and non-uniform haze. Compared with prior methods, the multi-scale image defogging network obtains global information while preserving the efficiency of feature extraction.
In the decoding process, the multi-scale image defogging network combines a top-down pyramid structure with dense connection (DC) operations. It not only fuses features of the same scale from the encoding process, but also better fuses features of different scales within the decoding process, achieving comprehensive fusion and enhancement of multi-source (encoding and decoding) and multi-level (different scales) features.
At the end of the network, the multi-scale image defogging network assigns greater weight to details such as edges through the gating enhancement module, so as to obtain defogged images with rich details. The present invention conducted extensive experiments on the RESIDE, I-HAZE, O-HAZE, NH-HAZE and NTIRE public data sets; compared with state-of-the-art (SOTA) methods, the proposed MIDNet model achieves better defogging performance.
Further, the training process of the trained multi-scale image defogging network comprises the following steps:
constructing a training set, wherein the training set consists of original hazy images whose ground-truth defogged images are known;
inputting the training set into the multi-scale image defogging network and training it; when the total loss function value of the network no longer decreases, training is stopped, yielding the trained multi-scale image defogging network.
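The stopping criterion above, halting when the total loss no longer decreases, can be sketched as a simple early-stopping loop. This is a schematic illustration: `train_one_epoch` is a hypothetical stand-in for a real optimization step over the training set, and the `patience` and `max_epochs` values are assumptions, not taken from the patent.

```python
def train_until_converged(train_one_epoch, patience=3, max_epochs=100):
    """Run epochs until the total loss has not improved for `patience` epochs."""
    best_loss = float('inf')
    stale = 0
    for epoch in range(max_epochs):
        loss = train_one_epoch(epoch)
        if loss < best_loss:
            best_loss, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break  # total loss no longer decreasing: stop training
    return best_loss

# toy loss schedule: the loss falls, then plateaus, so training stops early
losses = [1.0, 0.5, 0.3, 0.3, 0.3, 0.3, 0.2]
print(train_until_converged(lambda e: losses[min(e, len(losses) - 1)]))
```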
Further, the total loss function is expressed as:
L = ω1·Lsmooth-L1 + ω2·Lvgg + ω3·Lms-ssim
where ω1, ω2, ω3 are hyper-parameters, L represents the total loss function, Lsmooth-L1 represents the smooth loss function, Lvgg represents the perceptual loss function, and Lms-ssim represents the multi-scale structural similarity loss function.
Further, the smooth loss function is expressed as:
Lsmooth-L1 = (1/N)·Σi smoothL1(Yi − yi), with smoothL1(e) = 0.5e² if |e| < 1, and |e| − 0.5 otherwise,
wherein i denotes a pixel, N is the total number of pixels, Y is the defogged image, and y is the ground-truth image.
It should be appreciated that many image restoration tasks trained with an L1 loss achieve better performance than with an L2 loss in terms of PSNR and SSIM metrics. The smooth L1 loss converges quickly, and its gradient varies more smoothly. The present invention therefore employs a smooth L1 loss to ensure that the predicted image stays close to the real image.
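As a concrete illustration, the smooth L1 loss described above can be written in a few lines of NumPy. The branch threshold of 1 follows the standard smooth L1 definition (quadratic for small errors, linear for large ones); it is an assumption that the patent uses this standard form.

```python
import numpy as np

def smooth_l1_loss(pred, target):
    """Mean smooth-L1 loss: quadratic near zero, linear for large errors."""
    e = np.abs(pred - target)
    per_pixel = np.where(e < 1.0, 0.5 * e ** 2, e - 0.5)
    return per_pixel.mean()

pred = np.array([0.0, 0.5, 3.0])
target = np.array([0.0, 0.0, 0.0])
# errors 0, 0.5, 3 -> per-pixel losses 0, 0.125, 2.5 -> mean 0.875
print(smooth_l1_loss(pred, target))
```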
Further, the perceptual loss function is expressed as:
Lvgg = Σi ||φi(Y) − φi(y)||1 / (Ci·Hi·Wi)
wherein Y and y represent the defogged image and the ground-truth image, respectively, Ci denotes the channel, and Hi and Wi are the height and width of the i-th feature map; φi denotes the feature map of size Ci×Hi×Wi extracted from VGG-16 pre-trained on ImageNet.
It will be appreciated that, in order to maintain perceptual and semantic fidelity and to better reconstruct and recover detailed information, the present invention exploits a perceptual loss to provide additional supervision in the high-level feature space, measuring the high-level feature differences between hazy images and their corresponding defogged images.
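A minimal sketch of this idea, assuming the VGG-16 activations have already been computed elsewhere: the loss is the L1 distance between corresponding feature maps, normalized by each map's size Ci×Hi×Wi. Random arrays stand in for real network activations here; an actual implementation would extract them from a pre-trained VGG-16.

```python
import numpy as np

def perceptual_loss(feats_pred, feats_true):
    """Sum of size-normalized L1 distances between feature-map pairs.

    feats_pred / feats_true stand in for VGG-16 activations phi_i(Y), phi_i(y);
    each entry has shape (C_i, H_i, W_i).
    """
    total = 0.0
    for fp, ft in zip(feats_pred, feats_true):
        c, h, w = fp.shape
        total += np.abs(fp - ft).sum() / (c * h * w)
    return total

rng = np.random.default_rng(1)
f_true = [rng.standard_normal((8, 16, 16)), rng.standard_normal((16, 8, 8))]
f_pred = [f + 0.1 for f in f_true]   # features uniformly shifted by 0.1
print(perceptual_loss(f_pred, f_true))  # each of the two terms contributes ~0.1
```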
Further, the multi-scale structural similarity loss function includes:
Lms-ssim = 1 − Πm=1..M SSIMm(Y, y)
wherein Y and y respectively represent the defogged image and the ground-truth image, M denotes the number of different scales, and SSIM(Y, y) is the structural similarity loss, which takes human visual perception (brightness, contrast, structure, etc.) into account.
The structural similarity loss function is expressed as:
SSIM(Y, y) = ((2μYμy + C1)·(2σYy + C2)) / ((μY² + μy² + C1)·(σY² + σy² + C2))
where Gaussian filtering is applied to Y and y; the means of the results are μY and μy, the standard deviations are σY and σy, and the covariance is σYy; C1 and C2 are constants used to maintain stability.
It should be appreciated that the multi-scale structural similarity loss considers both human visual perception and resolution (multiple scales), and that SSIM takes values in [0, 1]. To preserve the structure of the defogged image, the present invention therefore uses the multi-scale structural similarity loss to measure the structural similarity between the defogged image and the ground-truth image.
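As a single-scale illustration, the SSIM formula above can be evaluated from global image statistics. A faithful implementation applies Gaussian filtering and averages SSIM over local windows (and over the M scales); the constants C1 = 0.01² and C2 = 0.03² are the usual choices for a dynamic range of 1 and are assumptions here, not values stated in the patent.

```python
import numpy as np

def ssim_global(Y, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM from global statistics (real SSIM uses Gaussian-windowed local stats)."""
    mu_Y, mu_y = Y.mean(), y.mean()
    var_Y, var_y = Y.var(), y.var()
    cov = ((Y - mu_Y) * (y - mu_y)).mean()       # covariance sigma_Yy
    num = (2 * mu_Y * mu_y + c1) * (2 * cov + c2)
    den = (mu_Y ** 2 + mu_y ** 2 + c1) * (var_Y + var_y + c2)
    return num / den

rng = np.random.default_rng(2)
img = rng.random((32, 32))
print(ssim_global(img, img))               # identical images -> SSIM = 1
print(ssim_global(img, 1.0 - img) < 1.0)   # mismatched images score lower
```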
Example two
The embodiment provides a multi-scale image defogging system;
a multi-scale image defogging system, comprising:
An acquisition module configured to: acquiring an image to be defogged;
A defogging module configured to: inputting an image to be defogged into a trained multi-scale image defogging network, and outputting the defogged image; the trained multiscale image defogging network is used for carrying out multiscale feature extraction on the image to be defogged to obtain a multiscale feature map; performing feature aggregation on the multi-scale feature map to obtain an aggregated feature map; gating and enhancing the aggregation feature map to obtain an enhanced feature map; and adding the enhanced feature map and the image to be defogged pixel by pixel to obtain the defogged image.
Here, it should be noted that the acquisition module and the defogging module described above correspond to steps S101 to S102 in the first embodiment; the modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in the first embodiment. It should also be noted that the modules described above may be implemented as part of a system in a computer system, for example as a set of computer-executable instructions.
Each of the foregoing embodiments emphasizes different aspects; for details not described in one embodiment, reference may be made to the related description of another embodiment.
The proposed system may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules described above is merely a division by logical function, and other divisions are possible in actual implementation; for example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
Example III
The embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein the processor is coupled to the memory, the one or more computer programs being stored in the memory, the processor executing the one or more computer programs stored in the memory when the electronic device is running, to cause the electronic device to perform the method of the first embodiment.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate array FPGA or other programmable logic device, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software.
The method in the first embodiment may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided here.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Example IV
The present embodiment also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, perform the method of embodiment one.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A multi-scale image defogging method, characterized by comprising:
acquiring an image to be defogged;
inputting the image to be defogged into a trained multi-scale image defogging network, and outputting the defogged image; wherein the trained multi-scale image defogging network is used to perform multi-scale feature extraction on the image to be defogged to obtain a multi-scale feature map; perform feature aggregation on the multi-scale feature map to obtain an aggregated feature map; perform gated enhancement on the aggregated feature map to obtain an enhanced feature map; and add the enhanced feature map and the image to be defogged pixel by pixel to obtain the defogged image;
wherein the trained multi-scale image defogging network comprises:
an input layer, a feature extraction module, a feature aggregation module, a gating enhancement module, a pixel-by-pixel addition module and an output layer connected in sequence; the input end of the pixel-by-pixel addition module is also connected to the output end of the input layer;
wherein the feature extraction module comprises:
a block embedding layer, a first local Transformer layer, a second local Transformer layer, a third local Transformer layer and a fourth local Transformer layer connected in sequence;
the input end of the block embedding layer is the input end of the feature extraction module;
the block embedding layer is used to divide the image to be defogged into a plurality of blocks of a set size and to represent each block as a vector;
the feature extraction module is used to perform multi-scale feature extraction on the image to be defogged;
wherein the feature aggregation module comprises:
a first upsampling layer, a first pixel-by-pixel addition module, a second upsampling layer, a second pixel-by-pixel addition module, a first feature concatenation layer, a first convolution layer, a third upsampling layer, a third pixel-by-pixel addition module, a second feature concatenation layer and a second convolution layer connected in sequence;
the input end of the first upsampling layer is connected to the output end of the fourth local Transformer layer through a third convolution layer; the input end of the first pixel-by-pixel addition module is connected to the output end of the third local Transformer layer through a fourth convolution layer; the input end of the second pixel-by-pixel addition module is connected to the output end of the second local Transformer layer through a fifth convolution layer; the input end of the third pixel-by-pixel addition module is connected to the output end of the first local Transformer layer through a sixth convolution layer;
the input end of the first feature concatenation layer is connected to the output end of the third convolution layer through a fourth upsampling layer; the input end of the second feature concatenation layer is connected to the output end of the third convolution layer through a fifth upsampling layer; the input end of the second feature concatenation layer is connected to the output end of the first pixel-by-pixel addition module through a sixth upsampling layer;
the output end of the second convolution layer is the output end of the feature aggregation module;
wherein the gating enhancement module comprises:
a fifth normalization operation layer, a seventh convolution layer, a first activation function layer, an eighth convolution layer, a sixth normalization operation layer, a second activation function layer, a gating unit and an eighth pixel-by-pixel addition module connected in sequence; wherein the input end of the gating unit is also connected to the input end of the fifth normalization operation layer, and the input end of the eighth pixel-by-pixel addition module is also connected to the input end of the fifth normalization operation layer;
the working process of the gating enhancement module is as follows: the fifth normalization operation layer normalizes the input features; the seventh convolution layer applies a linear transformation to the output features of the fifth normalization layer; the first activation function layer retains the positive values output by the seventh convolution layer; the eighth convolution layer applies a linear transformation to the output features of the first activation function layer; the sixth normalization operation layer normalizes the output of the eighth convolution layer; the second activation function layer maps the output of the sixth normalization layer to produce a gating value between 0 and 1; the gating unit selectively retains or suppresses information in the feature map according to the gating value; and the eighth pixel-by-pixel addition module adds the output of the gating unit and the input of the fifth normalization operation layer pixel by pixel to obtain the final output;
the multi-scale image defogging network first projects the input image Ii ∈ R^(3×H×W) into embedding vectors of dimension d through a block embedding operation; then the feature extraction module extracts multi-scale features, the feature aggregation module aggregates features of different scales, and the gating enhancement module restores edge information.
2. The multi-scale image defogging method according to claim 1, wherein the first local Transformer layer, the second local Transformer layer, the third local Transformer layer and the fourth local Transformer layer have the same internal structure, and the first local Transformer layer comprises:
a first normalization operation layer, a window-based multi-head self-attention mechanism layer, a fourth pixel-by-pixel addition module, a second normalization operation layer, a first multi-layer perceptron, a fifth pixel-by-pixel addition module, a third normalization operation layer, a shifted-window-based multi-head self-attention mechanism layer, a sixth pixel-by-pixel addition module, a fourth normalization operation module, a second multi-layer perceptron and a seventh pixel-by-pixel addition module connected in series in sequence;
the input end of the fourth pixel-by-pixel addition module is also connected to the input end of the first normalization operation layer; the input end of the fifth pixel-by-pixel addition module is also connected to the input end of the second normalization operation layer; the input end of the sixth pixel-by-pixel addition module is also connected to the input end of the third normalization operation layer; the input end of the seventh pixel-by-pixel addition module is also connected to the input end of the fourth normalization operation layer.
3. The multi-scale image defogging method according to claim 1, wherein the training process of the trained multi-scale image defogging network comprises:
constructing a training set, the training set consisting of original hazy images whose ground-truth defogged images are known;
inputting the training set into the multi-scale image defogging network for training; when the total loss function value of the network no longer decreases, training is stopped, and the trained multi-scale image defogging network is obtained;
the total loss function is expressed as:
L = ω1·Lsmooth-L1 + ω2·Lvgg + ω3·Lms-ssim
wherein ω1, ω2, ω3 are hyper-parameters, L denotes the total loss function, Lsmooth-L1 denotes the smooth loss function, Lvgg denotes the perceptual loss function, and Lms-ssim denotes the multi-scale structural similarity loss function;
the smooth loss function is expressed as:
Lsmooth-L1 = (1/N)·Σi smoothL1(Yi − yi), with smoothL1(e) = 0.5e² if |e| < 1 and |e| − 0.5 otherwise,
wherein i denotes a pixel, N is the total number of pixels, Y is the defogged image, and y is the ground-truth image;
the perceptual loss function is expressed as:
Lvgg = Σi ||φi(Y) − φi(y)||1 / (Ci·Hi·Wi)
wherein Y and y denote the defogged image and the ground-truth image, respectively, Ci denotes the channel, and Hi and Wi are the height and width of the i-th feature map; φi denotes the feature map of size Ci×Hi×Wi from VGG-16 pre-trained on ImageNet;
the multi-scale structural similarity loss function comprises:
Lms-ssim = 1 − Πm=1..M SSIMm(Y, y)
wherein Y and y denote the defogged image and the ground-truth image, respectively, M denotes the different scales, and SSIM(Y, y) is the structural similarity loss, which takes human visual perception into account;
the structural similarity loss function is expressed as:
SSIM(Y, y) = ((2μYμy + C1)·(2σYy + C2)) / ((μY² + μy² + C1)·(σY² + σy² + C2))
wherein Gaussian filtering is applied to Y and y, the means of the results are μY and μy, the standard deviations are σY and σy, and the covariance is σYy; C1 and C2 are constants used to maintain stability.
4. A multi-scale image defogging system, characterized by comprising:
an acquisition module configured to acquire an image to be defogged;
a defogging module configured to input the image to be defogged into a trained multi-scale image defogging network and to output the defogged image; wherein the trained multi-scale image defogging network is used to perform multi-scale feature extraction on the image to be defogged to obtain a multi-scale feature map; perform feature aggregation on the multi-scale feature map to obtain an aggregated feature map; perform gated enhancement on the aggregated feature map to obtain an enhanced feature map; and add the enhanced feature map and the image to be defogged pixel by pixel to obtain the defogged image;
wherein the trained multi-scale image defogging network comprises:
an input layer, a feature extraction module, a feature aggregation module, a gating enhancement module, a pixel-by-pixel addition module and an output layer connected in sequence; the input end of the pixel-by-pixel addition module is also connected to the output end of the input layer;
wherein the feature extraction module comprises:
a block embedding layer, a first local Transformer layer, a second local Transformer layer, a third local Transformer layer and a fourth local Transformer layer connected in sequence;
the input end of the block embedding layer is the input end of the feature extraction module;
the block embedding layer is used to divide the image to be defogged into a plurality of blocks of a set size and to represent each block as a vector;
the feature extraction module is used to perform multi-scale feature extraction on the image to be defogged;
wherein the feature aggregation module comprises:
a first upsampling layer, a first pixel-by-pixel addition module, a second upsampling layer, a second pixel-by-pixel addition module, a first feature concatenation layer, a first convolution layer, a third upsampling layer, a third pixel-by-pixel addition module, a second feature concatenation layer and a second convolution layer connected in sequence;
the input end of the first upsampling layer is connected to the output end of the fourth local Transformer layer through a third convolution layer; the input end of the first pixel-by-pixel addition module is connected to the output end of the third local Transformer layer through a fourth convolution layer; the input end of the second pixel-by-pixel addition module is connected to the output end of the second local Transformer layer through a fifth convolution layer; the input end of the third pixel-by-pixel addition module is connected to the output end of the first local Transformer layer through a sixth convolution layer;
the input end of the first feature concatenation layer is connected to the output end of the third convolution layer through a fourth upsampling layer; the input end of the second feature concatenation layer is connected to the output end of the third convolution layer through a fifth upsampling layer; the input end of the second feature concatenation layer is connected to the output end of the first pixel-by-pixel addition module through a sixth upsampling layer;
the output end of the second convolution layer is the output end of the feature aggregation module;
wherein the gating enhancement module comprises:
a fifth normalization operation layer, a seventh convolution layer, a first activation function layer, an eighth convolution layer, a sixth normalization operation layer, a second activation function layer, a gating unit and an eighth pixel-by-pixel addition module connected in sequence; wherein the input end of the gating unit is also connected to the input end of the fifth normalization operation layer, and the input end of the eighth pixel-by-pixel addition module is also connected to the input end of the fifth normalization operation layer;
the working process of the gating enhancement module is as follows: the fifth normalization operation layer normalizes the input features; the seventh convolution layer applies a linear transformation to the output features of the fifth normalization layer; the first activation function layer retains the positive values output by the seventh convolution layer; the eighth convolution layer applies a linear transformation to the output features of the first activation function layer; the sixth normalization operation layer normalizes the output of the eighth convolution layer; the second activation function layer maps the output of the sixth normalization layer to produce a gating value between 0 and 1; the gating unit selectively retains or suppresses information in the feature map according to the gating value; and the eighth pixel-by-pixel addition module adds the output of the gating unit and the input of the fifth normalization operation layer pixel by pixel to obtain the final output;
the multi-scale image defogging network first projects the input image Ii ∈ R^(3×H×W) into embedding vectors of dimension d through a block embedding operation; then the feature extraction module extracts multi-scale features, the feature aggregation module aggregates features of different scales, and the gating enhancement module restores edge information.
5. An electronic device, characterized by comprising:
a memory for non-transitory storage of computer-readable instructions; and
a processor for executing the computer-readable instructions,
wherein, when the computer-readable instructions are executed by the processor, the method according to any one of claims 1 to 3 is performed.
6. A storage medium, characterized by non-transitorily storing computer-readable instructions, wherein, when the non-transitory computer-readable instructions are executed by a computer, the method according to any one of claims 1 to 3 is performed.
CN202311861387.5A 2023-12-28 2023-12-28 Multi-scale image defogging method, system, device and storage medium Active CN117808707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311861387.5A CN117808707B (en) 2023-12-28 2023-12-28 Multi-scale image defogging method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311861387.5A CN117808707B (en) 2023-12-28 2023-12-28 Multi-scale image defogging method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN117808707A (en) 2024-04-02
CN117808707B (en) 2024-08-02

Family

ID=90419805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311861387.5A Active CN117808707B (en) 2023-12-28 2023-12-28 Multi-scale image defogging method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN117808707B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121010595A (en) * 2025-10-27 2025-11-25 山东浪潮智慧建筑科技有限公司 A method, equipment, and medium for fog detection and defogging in a security system.

Citations (1)

Publication number Priority date Publication date Assignee Title
CN116342868A (en) * 2023-03-22 2023-06-27 西安电子科技大学 Small target detection method based on multi-scale feature compensation and gating enhancement

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN113450273B (en) * 2021-06-18 2022-10-14 暨南大学 A method and system for image dehazing based on multi-scale and multi-stage neural network
CN114120253B (en) * 2021-10-29 2023-11-14 北京百度网讯科技有限公司 Image processing method, device, electronic equipment and storage medium
CN114049274B (en) * 2021-11-13 2024-08-27 哈尔滨理工大学 Defogging method for single image
CN114202696B (en) * 2021-12-15 2023-01-24 安徽大学 SAR target detection method and device based on context vision and storage medium
CN116051428B (en) * 2023-03-31 2023-07-21 南京大学 A low-light image enhancement method based on joint denoising and super-resolution of deep learning
CN117237608A (en) * 2023-09-18 2023-12-15 江苏智能无人装备产业创新中心有限公司 A multi-scale fog scene target detection method and system based on deep learning

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN116342868A (en) * 2023-03-22 2023-06-27 西安电子科技大学 Small target detection method based on multi-scale feature compensation and gating enhancement

Non-Patent Citations (1)

Title
Li, S.; Yuan, Q.; Zhang, Y.; Lv, B.; Wei, F. Image Dehazing Algorithm Based on Deep Learning Coupled Local and Global Features. Appl. Sci. 2022, pp. 1-14. *

Also Published As

Publication number Publication date
CN117808707A (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN116797488B (en) A Low-Light Image Enhancement Method Based on Feature Fusion and Attention Embedding
Tu et al. SWCGAN: Generative adversarial network combining swin transformer and CNN for remote sensing image super-resolution
CN111951195B (en) Image enhancement method and device
Cherabier et al. Learning priors for semantic 3d reconstruction
CN113378775B (en) Video shadow detection and elimination method based on deep learning
CN116977208B (en) Dual-branch fusion low-light image enhancement method
CN113065645B (en) Twin attention network, image processing method and device
CN108510451B (en) Method for reconstructing license plate based on double-layer convolutional neural network
CN113947538B (en) A multi-scale efficient convolutional self-attention single image rain removal method
CN111091503A (en) Image defocus blur method based on deep learning
CN110503613A (en) Single Image-Oriented Rain Removal Method Based on Cascaded Atrous Convolutional Neural Network
CN115272438A (en) High-precision monocular depth estimation system and method for three-dimensional scene reconstruction
CN116645569B (en) A method and system for colorizing infrared images based on generative adversarial networks
CN111079764A (en) Low-illumination license plate image recognition method and device based on deep learning
WO2024002211A1 (en) Image processing method and related apparatus
CN110349087A (en) RGB-D image superior quality grid generation method based on adaptability convolution
CN117808706B (en) Video rain removing method, system, equipment and storage medium
Wang et al. Multi-focus image fusion framework based on transformer and feedback mechanism
CN117726544A (en) An image deblurring method and system for complex motion scenes
Ali et al. Boundary-constrained robust regularization for single image dehazing
CN117808707B (en) Multi-scale image defogging method, system, device and storage medium
Bai et al. CEPDNet: a fast CNN-based image denoising network using edge computing platform
CN115953312A (en) A joint defogging detection method, device and storage medium based on a single image
Zhao et al. End‐to‐End Retinex‐Based Illumination Attention Low‐Light Enhancement Network for Autonomous Driving at Night
Zhou et al. Restoration of laser interference image based on large scale deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant