Article

Enhancing Underwater Images through Multi-Frequency Detail Optimization and Adaptive Color Correction

1 School of Smart Marine Science and Technology, Fujian University of Technology, Fuzhou 350118, China
2 Institute of Smart Marine and Engineering, Fujian University of Technology, Fuzhou 350118, China
3 Fujian Provincial Key Laboratory of Marine Smart Equipment, Fuzhou 350118, China
4 State Key Laboratory of Advanced Design and Manufacturing Technology for Vehicle, Hunan University, Changsha 410008, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(10), 1790; https://doi.org/10.3390/jmse12101790
Submission received: 28 August 2024 / Revised: 3 October 2024 / Accepted: 4 October 2024 / Published: 8 October 2024
(This article belongs to the Section Ocean Engineering)
Figure 1. Raw underwater images. Underwater images commonly suffer from (a) color casts, (b) artifacts, and (c) blurred details.
Figure 2. The overview of our framework. First, the EFED module detects edge information in the image using an efficient network architecture. Subsequently, the original image and the extracted edge map are fed into the MCPFA module. The MCPFA module leverages an attention mechanism to fuse information from different color spaces and scales, enhancing the image and ultimately producing the enhanced result.
Figure 3. Pixel difference convolution flowchart [36]. * denotes point-wise multiplication. First, the difference between a target pixel and its neighboring pixels is calculated; these differences are then multiplied by the corresponding weights in the convolution kernel and summed; finally, the sum is output as the feature value of the target pixel.
Figure 4. Edge detection structure diagram. First, the original image undergoes multiple downsampling layers within the backbone network, extracting multi-scale edge features. Subsequently, these features are fed into four parallel auxiliary networks. The auxiliary networks utilize dilated convolutions to enlarge the receptive field, sampling global information and fusing features from different scales. This process enables refined edge processing. Finally, the auxiliary networks output a high-quality edge map.
Figure 5. MCSF module. It integrates information from the HSV, Lab, and RGB color spaces, along with edge information, to provide comprehensive features for subsequent image enhancement steps.
Figure 6. CF-MHA architecture. First, the input feature map is divided into frequency bands based on scale channels. Then, each band undergoes multi-head attention computation independently. Color-aware weights are learned based on the attenuation levels of different colors at different locations. Finally, the multi-head attention outputs, adjusted by the color-aware weights, are fused to produce the final enhanced feature, effectively mitigating the color attenuation issue in underwater images.
Figure 7. Visual comparison of the full-reference data on the test dataset of EUVP. From left to right: (a) original underwater image, (b) UDCP [10], (c) HE [47], (d) CLAHE [11], (e) LRS [48], (f) FUnIE-GAN [3], (g) U-shape [41], (h) Semi-UIR [49], (i) our method, and (j) reference image (regarded as the ground truth (GT)).
Figure 8. Visual comparison of non-reference data from RUIE on the UCCS, UIQS, and UTTS datasets. From top to bottom: (1) bluish-biased, (2) bluish-green-biased, and (3) greenish-biased images from the UCCS dataset with different color casts; (4) underwater images of various qualities from the UIQS dataset; and (5) task-specific underwater images from the UTTS dataset. From left to right: (a) original underwater image, (b) UDCP [10], (c) HE [47], (d) CLAHE [11], (e) LRS [48], (f) FUnIE-GAN [3], (g) U-shape [41], (h) Semi-UIR [49], and (i) our method.
Figure 9. Visual comparison of reference data on the test dataset of EUVP. From left to right: (a) the original image, (b) Sobel [19], (c) Canny [22], (d) Laplace [21], (e) RCF [53], (f) ours, and (g) ours on the ground truth.
Figure 10. Results of color space selection evaluation. Tests are performed on the test dataset of EUVP to obtain PSNR and SSIM results for each color space model.
Figure 11. Results of ablation experiments on different components. From left to right: (a) Input, (b) U-net, (c) U + EFED, (d) U + MCSF, (e) U + CF-MHA, (f) U + EFED + MCSF, (g) U + MCSF + CF-MHA, (h) U + CF-MHA + EFED, (i) MCPFA, (j) GT, with zoomed-in local details.
Figure 12. The results of underwater target recognition. From left to right: (a) original underwater image, (b) UDCP [10], (c) HE [47], (d) CLAHE [11], (e) LRS [48], (f) FUnIE-GAN [3], (g) U-shape [41], (h) Semi-UIR [49], and (i) our method.
Figure 13. The results of the Segment Anything Model. From left to right: (a) original underwater image, (b) UDCP [10], (c) HE [47], (d) CLAHE [11], (e) LRS [48], (f) FUnIE-GAN [3], (g) U-shape [41], (h) Semi-UIR [49], and (i) our method.
Figure 14. Enhancement results of a real underwater cage environment. From left to right: (a) original underwater image, (b) UDCP [10], (c) HE [47], (d) CLAHE [11], (e) LRS [48], (f) FUnIE-GAN [3], (g) U-shape [41], (h) Semi-UIR [49], and (i) our method.

Abstract

This paper presents a novel underwater image enhancement method addressing the challenges of low contrast, color distortion, and detail loss prevalent in underwater photography. Unlike existing methods that may introduce color bias or blur during enhancement, our approach leverages a two-pronged strategy. First, an Efficient Fusion Edge Detection (EFED) module preserves crucial edge information, ensuring detail clarity even in challenging turbidity and illumination conditions. Second, a Multi-scale Color Parallel Frequency-division Attention (MCPFA) module integrates multi-color space data with edge information. This module dynamically weights features based on their frequency domain positions, prioritizing high-frequency details and areas affected by light attenuation. Our method further incorporates a dual multi-color space structural loss function, optimizing the performance of the network across RGB, Lab, and HSV color spaces. This approach enhances structural alignment and minimizes color distortion, edge artifacts, and detail loss often observed in existing techniques. Comprehensive quantitative and qualitative evaluations using both full-reference and no-reference image quality metrics demonstrate that our proposed method effectively suppresses scattering noise, corrects color deviations, and significantly enhances image details. In terms of objective evaluation metrics, our method achieves the best performance in the test dataset of EUVP with a PSNR of 23.45, SSIM of 0.821, and UIQM of 3.211, indicating that it outperforms state-of-the-art methods in improving image quality.

1. Introduction

With the expanding applications of underwater imaging technology in fields like marine exploration, biological research, and underwater robotics [1], addressing the challenges posed by the underwater environment becomes crucial. The complex optical properties of water, including scattering by suspended particles and absorption, lead to significant degradation of underwater images [2]. This degradation manifests as color casts, artifacts, and blurred details, hindering accurate interpretation and analysis, as illustrated in Figure 1. Therefore, high-quality underwater image enhancement is essential for advancing our understanding and exploration of the ocean [3]. Underwater image enhancement aims to mitigate these optical distortions, improving image clarity, color accuracy, and overall visual quality [4,5]. This is vital for the effective utilization of underwater imagery in diverse applications, ranging from scientific research to resource management. Over the past decades, researchers have developed various enhancement algorithms, broadly categorized into physical model-based, non-physical model-based, and deep learning-based methods [6].
Physics-based underwater image enhancement methods [7,8,9,10] attempt to restore degraded imagery by modeling light attenuation and scattering within aquatic environments. However, real-world underwater scenes are complex and variable, making it difficult to describe every situation accurately with a unified model, which often limits the accuracy of these methods. Conversely, non-physics-based methods [11,12] directly manipulate pixel values in the spatial or transform domains, typically employing mathematical transformations or filters. While computationally efficient, these methods often neglect the underlying physics of image degradation, potentially leading to limited enhancement and unwanted artifacts [13].
In the field of underwater image enhancement, deep learning-based methods [3,14,15] have attracted much attention with the advancement of deep learning technology. Thanks to their powerful data-driven capabilities, deep learning provides a new approach to underwater image enhancement [16,17,18]. By training on large amounts of underwater image data, deep learning models can learn complex image degradation and enhancement mapping relationships, thereby achieving end-to-end image enhancement. However, existing underwater image datasets are limited in size, lack diverse scenes, and often lack real-world scenarios, which severely restricts the learning performance of underwater image enhancement methods.
Although traditional image enhancement methods and advanced deep learning techniques have achieved remarkable results in improving the quality of underwater images, the processed images still exhibit some stubborn problems. In particular, artifacts, detail blurring, and local color bias remain major challenges in underwater image enhancement. Artifacts usually originate from the scattering and absorption of light by the water column, as well as from sensor noise. These artifacts can seriously interfere with the recognition of image content and degrade the visual quality of the image. In addition, underwater images often suffer from blurred details due to the attenuation of light by the water column. This blurring can result in the loss of important texture and edge features in the image, hindering an accurate understanding of the underwater scene.
This paper introduces a novel underwater image enhancement method, leveraging Efficient Fusion Edge Detection (EFED) and Multi-scale Color Parallel Frequency-division Attention (MCPFA) to address the aforementioned challenges. The key contributions are summarized below:
(1)
We propose a novel multi-frequency information fusion architecture for image enhancement tasks. This architecture effectively extracts high-frequency and low-frequency information in the preprocessing stage, significantly improving the detail rendition and contour sharpness of images in complex and noisy environments;
(2)
We propose MCPFA, which effectively integrates multiple color spaces and high- and low-frequency information and dynamically adjusts the feature importance through the designed multi-scale channel attention to focus on the key areas and details to enhance the overall quality of the output image;
(3)
Our dual multi-color space image enhancement structure leverages RGB, Lab, and HSV color spaces in both the network architecture and loss function, promoting structural alignment and mitigating color distortion, edge artifacts, and detail loss common in existing methods.

2. Related Work

2.1. Underwater Image Enhancement

Underwater image enhancement techniques can be broadly categorized based on their underlying principles and methodologies. In the following subsections, we will delve into the three primary categories: physics-based methods, non-physical-based methods, and deep learning-based methods. Each category offers unique advantages and limitations, catering to different underwater imaging scenarios and requirements.

2.1.1. Physics-Based Methods

Most current underwater image enhancement methods based on physical models utilize the Jaffe–McGlamery imaging model to estimate accurate transmittance maps and background light parameters. The Jaffe–McGlamery imaging model, proposed by McGlamery [7] and Jaffe [8], simplifies the underwater imaging process into a linear superposition of three scattering components: direct, forward, and backward.
$$I_c(x) = t_c(x)\,J_c(x) + A_c\,\bigl(1 - t_c(x)\bigr)$$
where $I_c(x)$ is the observed underwater image; $c$ indexes the red, green, and blue color channels (R, G, B); $J_c(x)$ is the clear scene radiance to be recovered, i.e., the light reflected by the scene before absorption and scattering in the water; $t_c(x)$ is the transmission, describing how much light penetrates the water and reaches the camera; $A_c$ is the ambient background light of the water body; $t_c(x)\,J_c(x)$ is the direct component of light propagation in water; and $A_c\,(1 - t_c(x))$ denotes the back-scattered component responsible for blurring and color distortion in the captured image. Forward scattering is neglected due to its negligible contribution.
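To make Equation (1) concrete, here is a minimal NumPy sketch that synthesizes a degraded frame from a clear scene; the depth map, attenuation coefficients, and ambient light below are illustrative placeholders, not values estimated by any of the cited methods.

```python
import numpy as np

def degrade_underwater(J, t, A):
    """Apply the simplified imaging model I_c = t_c * J_c + A_c * (1 - t_c).

    J : clear scene radiance, float array of shape (H, W, 3) in [0, 1]
    t : per-channel transmission map, shape (H, W, 3) in [0, 1]
    A : ambient background light per channel, shape (3,)
    """
    direct = t * J                        # direct component t_c(x) * J_c(x)
    backscatter = A * (1.0 - t)           # back-scattered component A_c * (1 - t_c(x))
    return np.clip(direct + backscatter, 0.0, 1.0)

# Illustrative usage: red attenuates faster than green and blue underwater.
H, W = 64, 64
J = np.random.rand(H, W, 3)
depth = np.linspace(0.5, 3.0, H)[:, None, None]   # toy per-row depth in metres
beta = np.array([0.8, 0.3, 0.2])                  # assumed attenuation coefficients (R, G, B)
t = np.exp(-beta * depth) * np.ones((H, W, 3))    # Beer-Lambert style transmission
A = np.array([0.1, 0.5, 0.6])                     # bluish-green ambient light
I = degrade_underwater(J, t, A)
```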
The Dark Channel Prior (DCP) [9] effectively dehazes outdoor images by leveraging the darkest pixels to estimate atmospheric light and subsequently derive a transmission map. However, DCP performs poorly on underwater images due to the unique attenuation characteristics of light underwater. To address this, Drews [10] proposed an image transmission estimation method based on the underwater dark channel prior (UDCP). The method exploits the slower attenuation of the blue and green channels in the water body and combines this with the underwater dark channel prior to achieve a more accurate transmission map estimation, thus effectively restoring the true colors and details of the underwater scene.
These physical model-based methods employ complex underwater imaging models, which are challenging to generalize in practical applications because they require prior information related to imaging.

2.1.2. Non-Physical-Based Methods

The histogram is commonly used to depict the distribution of image tones. The aim of histogram equalization is to redistribute pixels across different gray levels, thereby balancing the histogram and improving the brightness and contrast of the image. Reza [11] proposed the Contrast Limited Adaptive Histogram Equalization (CLAHE) method, which effectively limits noise amplification but exhibits poor enhancement around image boundaries. While histogram equalization and its improved methods can enhance contrast, they overlook the underwater optical imaging model, often introducing noise and artifacts. Consequently, these methods are more suitable for post-processing in certain applications.
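For reference, the snippet below applies CLAHE to the luminance channel of an image using OpenCV's `cv2.createCLAHE`; the clip limit and tile grid size shown are common defaults rather than settings used in this paper.

```python
import cv2

def clahe_enhance(bgr, clip_limit=2.0, tile_grid=(8, 8)):
    """Contrast Limited Adaptive Histogram Equalization on the L channel of Lab."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    l_eq = clahe.apply(l)                 # equalize luminance only, leaving chroma untouched
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)

# Usage (path is a placeholder):
# enhanced = clahe_enhance(cv2.imread("underwater.jpg"))
```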
The Retinex algorithm posits that the observed image can be represented as the product of two components: the illumination component and the reflection component. Fan et al. [12] developed a scheme that combines color correction with multi-scale fusion for underwater image enhancement. This approach constructs a normalized weight map by calculating multiple weights for contrast-enhanced and edge-enhanced images. It then fuses the contrast-enhanced and sharpened versions of the color-corrected image at multiple scales, using multiple fusion weights to obtain an enhanced image. However, they do not cover all underwater complex scenes and may introduce artifacts or distortion, especially in areas with low lighting or high noise levels.
Non-physical model underwater image enhancement methods easily lead to contrast distortion, color shift, and noise amplification when dealing with complex underwater scenes due to the lack of consideration of physical factors such as underwater light attenuation and scattering, and the enhancement effect lacks a physical explanation.

2.1.3. Deep Learning-Based Methods

Deep learning-based underwater image enhancement methods are typically data-driven, requiring a dataset that contains original underwater images and their corresponding ideal reference images for training. Li et al. [14] propose the Underwater Image Enhancement Network (Water-Net), which individually applies white balancing, gamma correction, and histogram equalization to degraded images and then synthesizes the final enhanced image using confidence maps for each input. However, this method does not effectively address the problem of backward scattering in underwater images.
Inspired by U-Net, Islam et al. [3] introduce the Fast Underwater Enhancement Network (FUnIE-GAN) aimed at improving visual perception. It enhances the salient detail information by adding residual joins to the generator but has a high dependence on the dataset, limited generalization ability, and lacks physical model guidance. Li et al. [15] synthesize a dataset using an underwater imaging model and a priori knowledge and design a lightweight convolutional neural network to adapt to complex underwater environments, but the quality of the synthesized dataset affects the model’s generalization ability, and the dependence on prior knowledge is high. Lu et al. [16] combine the dark channel prior and CycleGAN and utilize multi-scale SSIM loss to preserve image structural details, but the dark channel prior may not be valid in some complex scenes and the training of CycleGAN is unstable.
Liu et al. [17] combine the Akkaynak–Treibitz model with a feature fusion network to generate clear image details by fusing multi-scale features, but the sensitivity to the feature fusion strategy is high and the computational complexity is high. Wu et al. [18] propose a two-stage underwater image enhancement framework that first balances the color and then refines the contrast, but with more parameters, the adaptability to different scenarios needs to be improved.
Deep learning-based methods have made significant progress in underwater image enhancement, but there are still some challenges, such as dataset limitations, lack of physical models, parameter tuning, and high consumption of computational resources.

2.2. Edge Detection

The evolution of edge detection methods can be broadly categorized into three stages: gradient-based methods, second-order derivative methods, and deep learning-based approaches. Early edge detection algorithms primarily relied on image gradient information, such as the Sobel [19] and Roberts [20] operators. However, these methods are susceptible to noise and often produce edges with inconsistent thickness, leading to blurred details. To address these limitations, researchers developed second-order derivative methods, including the Laplacian operator [21] and the Marr–Hildreth algorithm. However, second-order derivative methods tend to amplify noise, exhibit higher computational complexity, and are prone to edge discontinuities, making them less adaptable to complex scenes. The Canny edge detector [22], on the other hand, combines Gaussian smoothing, gradient computation, non-maximum suppression, and hysteresis thresholding to provide more refined and accurate edge detection results, albeit with a relatively high computational cost. Advancements in deep learning have propelled deep learning-based edge detection methods to the forefront, achieving significant performance gains.
In recent years, deep learning methods have further utilized various gradient information to produce more accurate boundaries [23]. In the early development of deep learning, researchers explored the use of Convolutional Neural Networks (CNNs) as classifiers for edge detection, where the core idea is to predict the probability that an input image block belongs to an edge by training a CNN model [24,25,26]. Holistically-nested edge detection (HED) [27], pioneering the use of fully convolutional networks (FCNs) [23] for end-to-end edge detection, significantly improved accuracy and robustness. These deep learning methods leverage multi-level image features to better capture complex edge information. However, these deep learning methods also have some limitations. For example, they can be computationally expensive to train and deploy. Additionally, they may not be able to accurately detect edges in images with complex backgrounds or occlusions.
Edge detection has evolved from gradient and second derivative-based methods to deep learning-based methods. Although deep learning methods have made significant progress, they still face challenges such as over-smoothing, computational complexity, and sensitivity to noise.

2.3. Attention Mechanisms

The attention mechanism is a method that simulates the resource allocation mechanism of the human brain, with the core idea being to focus more on important information while ignoring less important information. The earliest introduction of attention mechanisms into the field of computer vision is by Itti et al. [28], who propose a model based on saliency maps, selecting key regions through the combination of multi-scale image features. Yoshua Bengio et al. [29] further introduce the attention mechanism into image captioning, proposing two models based on attention mechanisms: the Soft Attention method, trained using basic backpropagation, and the Hard Attention method, trained using reinforcement learning.
In the field of natural language processing, Bahdanau et al. [30] are the first to introduce the attention mechanism into neural machine translation. By incorporating the attention mechanism into the encoder–decoder framework, they significantly improve translation quality. Subsequently, Vaswani et al. [31] propose the Transformer model, which completely abandons traditional recurrent neural networks (RNNs) and relies entirely on the attention mechanism for information processing, significantly enhancing the parallel computing capability and performance of the model.
Hu et al. [32] apply the attention mechanism to the channel dimension, proposing Squeeze-and-Excitation Networks (SE-Net), which improve model performance by adaptively recalibrating channel features. Following this, many works based on the channel domain have been adjusted and optimized on this foundation, such as CBAM (Convolutional Block Attention Module) [33] and ECA-Net (Efficient Channel Attention Network) [34].
However, the attention mechanism also has some limitations. For example, it can be computationally expensive to compute the attention weights. Additionally, the attention mechanism may not be able to capture long-range dependencies in the input data.

3. Methods

This section provides a detailed introduction to our proposed underwater image enhancement framework, which centrally integrates an Efficient Fusion Edge Detection (EFED) preprocessing module and a Multi-scale Color Parallel Frequency-division Attention (MCPFA) module, as shown in Figure 2. To preserve image details, we utilize efficient multi-scale edge detection to extract image edge information, providing accurate frequency domain information for the subsequent enhancement model. Subsequently, the MCPFA module integrates multi-scale color fusion techniques with frequency domain information and performs a downsampling operation before the data are fed into the attention module. Through the channel attention mechanism, the MCPFA module effectively enhances the severely attenuated color and detail information in underwater images. Finally, through feature remapping and upsampling, we successfully generate high-quality underwater images with accurate color restoration, complete image structure, and clear details. Notably, the multi-scale parallel feature frequency attention module employs a supervised learning approach for training and utilizes a composite loss function consisting of five components, $L_{RGB}$, $L_{Lab}$, $L_{HSV}$, $L_{SSIM}$, and $L_{UIQM}$, for optimization.

3.1. Efficient Fusion Edge Detection Pre-Training

In the field of edge detection, traditional methods, although interpretable and computationally efficient, perform poorly in specific images or scenes and have limited generalization capabilities, making it difficult to accurately distinguish boundaries. In contrast, deep learning methods based on CNNs, GANs, etc., can automatically learn complex features and exhibit superior generalization capabilities. However, they require significant computational resources and storage space, making them unsuitable for resource-limited devices. Inspired by [35], we propose an approach that combines traditional operators with deep learning techniques to develop an integrated, lightweight, and efficient edge detection model using pixel difference convolution. The method consists of the following two main components:

3.1.1. Pixel Difference Convolution

Traditional convolution, as in Equation (2), performs a single multiplication-and-summation over a kernel of a fixed size; its weights are randomly initialized and must be learned from large amounts of data before they capture gradient information, making the operation heavily data-dependent.
$$y = f(x, \theta) = \sum_{i=1}^{k \times k} w_i \cdot x_i$$
where $x_i$ is the input pixel value and $w_i$ is the weight of the $k \times k$ convolution kernel.
Pixel Difference Convolution (PDC) combines the learning capability of deep networks with the behavior of traditional edge operators. Instead of convolving the original pixel values directly, it first computes differences between pixel pairs within the $k \times k$ window and then multiplies these differences by the convolution kernel weights to generate the output features, as shown in Figure 3, with the following formula:
$$y = f(x, \theta) = \sum_{(x_i,\, x_i') \in \mathcal{P}} w_i \cdot (x_i - x_i')$$
where $x_i$ and $x_i'$ are input pixels, $w_i$ are the weights of the $k \times k$ convolution kernel, and $\mathcal{P}$ is the set of pixel pairs selected within the current kernel; pixel pairs can be selected according to different difference types.
Such pixel difference convolution can significantly encode gradient information and, combined with the feature extraction capabilities of convolution, more effectively capture gradient features in images, thereby enhancing the sensitivity of network-to-edge features. While Pixel Difference Convolution (PDC) requires substantial memory resources during training, it can be converted into a standard convolution layer after training, thereby maintaining high computational efficiency during inference. This approach not only offers broad applicability across diverse edge detection scenarios but also enhances the generalization ability of the model and reduces reliance on extensive pre-training datasets.
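As an illustration, the following PyTorch sketch implements only the central-difference variant of PDC (the module described above supports several difference types); it also shows why the operation can be folded back into a standard convolution, since it equals a vanilla convolution minus the centre pixel weighted by the kernel sum.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralPDC(nn.Module):
    """Central pixel-difference convolution: y = sum_i w_i * (x_i - x_center).

    Algebraically, sum_i w_i*(x_i - x_c) = (vanilla convolution) - x_c * sum_i w_i,
    which is how the operator can be converted to a standard convolution after training.
    """
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.padding = padding

    def forward(self, x):
        vanilla = F.conv2d(x, self.weight, padding=self.padding)
        # centre term: the input weighted by the per-(out, in) kernel sum, via a 1x1 convolution
        center = F.conv2d(x, self.weight.sum(dim=(2, 3), keepdim=True))
        return vanilla - center

# x = torch.randn(1, 3, 64, 64); y = CentralPDC(3, 16)(x)  # -> (1, 16, 64, 64)
```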

3.1.2. Efficient Multi-Scale Network

The pre-training network consists of an efficient backbone network and a task-specific auxiliary network, as shown in Figure 4. The backbone network [37] uses pixel difference convolution and shortcut connections to improve operational efficiency and recognition accuracy. The network contains four stages, with downsampling between stages performed by a max pooling layer [38]. The first stage contains an initial convolutional layer and three backbone blocks, while the remaining stages each contain four backbone blocks. Each backbone block uses pixel difference convolution (PDC), a ReLU function, and a 1 × 1 convolution. To constrain the model size, the number of channels in each stage of the backbone network is kept small, specifically set to $n \cdot l$ (where $n$ represents the number of stages and $l$ is a scaling factor).
The auxiliary network effectively enriches the multi-scale edge information by introducing the dilated convolution module and the spatial attention module and, at the same time, suppresses the interference of background noise, so as to generate high-quality phase edge maps.
The auxiliary module uses four dilated convolutions with different dilation rates, a ReLU function, and a 1 × 1 convolution, as shown in the lower right corner of Figure 4. We conducted experiments to evaluate the effect of different dilation rates on the performance of the auxiliary module; the best performance is obtained with the combination of dilation rates (3, 5, and 7), which demonstrates the effectiveness of capturing multi-scale edge information. The multi-scale edge information is enriched by converting the $n \cdot l$ input channels to $M$ ($M < l$) output channels, which greatly reduces the number of parameters, as shown in Equation (4):
$$O[i, j] = \mathrm{ReLU}\!\left(\sum_{m=1}^{M}\sum_{n=1}^{M} X[i + r \cdot m,\; j + r \cdot n] \cdot K[m, n]\right)$$
where $X$ is the input feature map, $K$ is the convolution kernel of size $M \times M$, $r$ is the dilation rate, $\mathrm{ReLU}$ is the activation function, and $i$, $j$ are the spatial indices of the feature map.
Subsequently, the edge feature maps from the four stages are fused. After processing with a 1 × 1 convolution, a 3 × 3 convolution, and a Sigmoid function, the feature dimension is reduced to a single channel, resulting in the final edge detection map $E \in \mathbb{R}^{H \times W}$.
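The sketch below outlines one plausible form of the auxiliary block under these assumptions: parallel dilated 3 × 3 convolutions with rates (3, 5, 7), a 1 × 1 fusion layer, and a Sigmoid head producing a single-channel edge map. Channel sizes are placeholders, and the spatial attention component mentioned above is omitted for brevity.

```python
import torch
import torch.nn as nn

class DilatedFusionBlock(nn.Module):
    """Illustrative auxiliary block: parallel dilated 3x3 convolutions followed by
    1x1 fusion, a 3x3 convolution, and a Sigmoid producing a single-channel edge map."""
    def __init__(self, in_ch, mid_ch, out_ch=1, rates=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, 3, padding=r, dilation=r),  # padding=r keeps spatial size
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.fuse = nn.Sequential(
            nn.Conv2d(mid_ch * len(rates), mid_ch, 1),   # 1x1 fusion of multi-scale branches
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fuse(feats)

# edge_map = DilatedFusionBlock(in_ch=60, mid_ch=16)(torch.randn(1, 60, 64, 64))
```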

3.2. Multi-Scale Color Parallel Frequency-Division Attention

Inspired by [39], in order to enhance the attention of the network to the more severely attenuated channels and to enhance the processing of edge artifacts and fine details, we design a Multi-scale Color Parallel Frequency-division Attention (MCPFA), which is divided into the following three main parts:

3.2.1. Multi-Color Space Fusion

Given the unique characteristics of underwater environments, the color deviation in the red channel is significantly more pronounced than in conventional environments, and the overall attenuation is deeper, producing a variety of casts ranging from blue to green to yellow. Because long-wavelength (red) light is absorbed most strongly by water, the red channel suffers the greatest information loss, making it challenging for traditional network architectures to handle these complex and diverse color deviations effectively, thereby limiting image quality and detail representation.
To address this issue, inspired by [40], we conduct an in-depth analysis of the color distribution characteristics of underwater images and propose a Multi-Color Space Fusion (MCSF) module, as shown in Figure 5. First, we transform the original image into three color spaces, RGB, HSV, and Lab, to obtain richer color information; this information is then fused together with the edge information obtained from edge detection, yielding a rich representation of the original image. Leveraging the strengths of the Lab color space in simulating human color perception and of the HSV space in highlighting color differences, the module extracts and fuses features from the Lab and HSV color spaces, exploits the complementary characteristics of different color spaces, and retains the original RGB space, achieving a more comprehensive red channel restoration, enhanced detail expression, and effective suppression of color deviation, ultimately improving image quality.
First, the original underwater RGB image $I_{RGB} \in \mathbb{R}^{H \times W \times 3}$ is converted to a Lab color image $I_{Lab} \in \mathbb{R}^{H \times W \times 3}$, as shown in Equations (5)–(7). The Lab color space is a perceptually uniform color space that is closer to the perception of color differences by the human visual system. Its advantage lies in separating the brightness information (L channel) of the image from the color information (a and b channels), allowing for independent adjustment and thereby handling complex color deviations in underwater images more precisely. Additionally, the Lab color space has a wider color gamut, capable of representing richer color details and effectively reducing the impact of illumination changes on color analysis.
$$\begin{aligned} X &= 0.412453\,R + 0.357580\,G + 0.180423\,B \\ Y &= 0.212671\,R + 0.715160\,G + 0.072169\,B \\ Z &= 0.019334\,R + 0.119193\,G + 0.950304\,B \end{aligned}$$
$$f(t) = \begin{cases} t^{1/3}, & \text{if } t > 0.008856 \\ 7.787\,t + 16/116, & \text{otherwise} \end{cases}$$
$$\begin{aligned} L &= \begin{cases} 116\,(Y/Y_n)^{1/3} - 16, & \text{if } Y/Y_n > 0.008856 \\ 903.3\,(Y/Y_n), & \text{otherwise} \end{cases} \\ a &= 500\,\bigl(f(X/X_n) - f(Y/Y_n)\bigr) \\ b &= 200\,\bigl(f(Y/Y_n) - f(Z/Z_n)\bigr) \end{aligned}$$
where $L$ represents luminance, $a$ represents the color axis from green to red, and $b$ represents the color axis from blue to yellow. $X_n = 95.047$, $Y_n = 100$, and $Z_n = 108.883$ are constants.
Next, the original RGB image $I_{RGB} \in \mathbb{R}^{H \times W \times 3}$ is converted to the HSV color space $I_{HSV} \in \mathbb{R}^{H \times W \times 3}$, as shown in Equations (8)–(10). The HSV color space uses hue (H), saturation (S), and value (V) to describe colors. This representation is more intuitive and easier to understand, facilitating color selection and adjustment. The HSV color space can effectively highlight color differences, helping the MCSF module better capture subtle color changes in underwater environments.
$$V = \max(R, G, B)$$
$$S = \begin{cases} \bigl[V - \min(R, G, B)\bigr]/V, & V > 0 \\ 0, & V = 0 \end{cases}$$
$$H = \begin{cases} 60\,(G - B)/(S V), & V = R \\ 60\,\bigl[2 + (B - R)/(S V)\bigr], & V = G \\ 60\,\bigl[4 + (R - G)/(S V)\bigr], & V = B \\ 0, & V = 0 \end{cases} \qquad H \leftarrow H + 360 \ \ \text{if } H < 0$$
where $H$, $S$, and $V$ represent hue, saturation, and value, respectively, and $R$, $G$, $B$ are the RGB channel values.
Finally, the MCSF module concatenates the Lab, HSV, and original RGB color space images with the edge detection result to form $A \in \mathbb{R}^{H \times W \times 10}$:
$$A = I_{RGB} \,ⓒ\, I_{Lab} \,ⓒ\, I_{HSV} \,ⓒ\, E$$
where ⓒ represents channel concatenation.
By integrating edge information and combining features from different color spaces and frequencies, this fusion strategy enhances detail expression, leading to clearer and sharper image restoration.
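A minimal sketch of assembling the 10-channel input $A$ with OpenCV conversions is given below; it assumes an edge probability map is already available from the EFED module, and the normalization constants are simple illustrative choices rather than the paper's preprocessing.

```python
import cv2
import numpy as np

def build_mcsf_input(rgb_uint8, edge_map):
    """Assemble the 10-channel MCSF input A = RGB (3) + Lab (3) + HSV (3) + edge (1).

    rgb_uint8 : uint8 RGB image, shape (H, W, 3)
    edge_map  : float edge probability map in [0, 1], shape (H, W)
    """
    rgb = rgb_uint8.astype(np.float32) / 255.0
    lab = cv2.cvtColor(rgb_uint8, cv2.COLOR_RGB2LAB).astype(np.float32) / 255.0  # rough [0, 1] scaling
    hsv = cv2.cvtColor(rgb_uint8, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv /= np.array([179.0, 255.0, 255.0], dtype=np.float32)  # OpenCV stores H in [0, 179] for uint8
    edge = edge_map.astype(np.float32)[..., None]
    return np.concatenate([rgb, lab, hsv, edge], axis=-1)     # (H, W, 10)
```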

3.2.2. Multi-Scale Positional Encoding

The fused image, denoted as $A$, undergoes a 1 × 1 convolution operation to produce the feature map $F_1 \in \mathbb{R}^{H \times W \times C_1}$, where $C_i$ represents the channel count at each stage. Subsequently, four downsampling operations are applied, generating four multi-scale feature maps $F_i \in \mathbb{R}^{\frac{H}{2^{i-1}} \times \frac{W}{2^{i-1}} \times C_i}$, $(i = 2, 3, 4, 5)$. Inspired by [41], we discard the traditional linear projection approach, which attends only to the global features of the image and overlooks important spatial information. To capture the spatial structure information within images, we employ a method that combines 2D learnable positional encoding with linear projection. This method enhances the ability of the model to perceive spatial relationships within the image by integrating pixel location data into the feature vectors. Consequently, the model can focus more on edge details during subsequent attention mechanisms. Specifically, for the first four input feature maps at different scales, we embed their corresponding positional information, denoted as $S_i \in \mathbb{R}^{\frac{H}{2^{i-1}} \times \frac{W}{2^{i-1}} \times C_i}$, $(i = 1, 2, 3, 4)$, into the feature representations. This results in four feature sequences $T_i \in \mathbb{R}^{\frac{H}{2^{i-1}} \times \frac{W}{2^{i-1}} \times C_i}$, $(i = 1, 2, 3, 4)$, enriched with spatial information.
$$T_i = F_i \oplus S_i$$
where $\oplus$ denotes element-wise addition of pixel values.
For the position-encoded feature sequences $T_i \in \mathbb{R}^{\frac{H}{2^{i-1}} \times \frac{W}{2^{i-1}} \times C_i}$, $(i = 1, 2, 3, 4)$ at different scales, we utilize linear projection for feature reshaping. We employ 2D patch sequences with a size of $\frac{P}{2^{i-1}} \times \frac{P}{2^{i-1}}$ and a stride of $\frac{P}{2^{i-1}}$, $(i = 1, 2, 3, 4)$, mapping them to the same region that encodes their features across the four scales, resulting in the feature sequences $U_i \in \mathbb{R}^{b \times C_i}$, where $b = \frac{HW}{P^2}$ denotes the sequence length after linear projection.
Then, using Equation (13), we obtain the query, key, and value vectors of the attention mechanism, $Q_i \in \mathbb{R}^{b \times C_i}$, $K \in \mathbb{R}^{b \times C}$, and $V \in \mathbb{R}^{b \times C}$:
$$Q_i = U_i W_{Q_i}, \qquad K = U W_K, \qquad V = U W_V$$
where $W_{Q_i}$, $W_K$, and $W_V$ denote the projection weights; $U \in \mathbb{R}^{C \times b}$ is the concatenation of the $U_i$ along the channel dimension, with $C = C_1 + C_2 + C_3 + C_4$, where $C_1 = 64$, $C_2 = 128$, $C_3 = 256$, $C_4 = 512$.
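The following PyTorch sketch illustrates the idea of Equation (12) plus patch projection: a learnable positional map is added to the feature map and a strided convolution acts as the linear projection into token sequences. Shapes and initialization are illustrative, not the authors' exact settings.

```python
import torch
import torch.nn as nn

class PatchEmbedWithPos(nn.Module):
    """Sketch of 2D learnable positional encoding followed by patch projection."""
    def __init__(self, channels, height, width, patch):
        super().__init__()
        # learnable positional map S_i with the same shape as the feature map F_i
        self.pos = nn.Parameter(torch.zeros(1, channels, height, width))
        # non-overlapping patches via a strided convolution acting as the linear projection
        self.proj = nn.Conv2d(channels, channels, kernel_size=patch, stride=patch)

    def forward(self, f):
        t = f + self.pos                       # T_i = F_i (+) S_i
        u = self.proj(t)                       # (B, C_i, H/P, W/P)
        return u.flatten(2).transpose(1, 2)    # (B, b, C_i) token sequence

# tokens = PatchEmbedWithPos(64, 256, 256, patch=16)(torch.randn(1, 64, 256, 256))
```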

3.2.3. Channel-Wise Frequency-Division Multi-Head Attention (CF-MHA)

The CF-MHA, as shown in Figure 6, takes six inputs: four queries $Q_i \in \mathbb{R}^{b \times C_i}$, one key $K \in \mathbb{R}^{b \times C}$, and one value $V \in \mathbb{R}^{b \times C}$. First, a similarity matrix $M_i$ is generated, and the value $V$ is weighted through cross-attention (CA) to produce an output, as shown in Equation (14).
$$CA_i = M_i V^T = \mathrm{SoftMax}\!\left(\mathrm{Norm}_{IN}\!\left(\frac{Q_i^T K}{\sqrt{C}}\right)\right) V^T$$
where $\mathrm{Norm}_{IN}$ [42] represents the instance normalization operation.
This channel-specific attention mechanism, unlike others, operates along the channel axis, focusing the network on predefined high-frequency edge components and channels with significant attenuation. Furthermore, instance normalization, unlike batch normalization, standardizes each image individually, facilitating smoother gradient propagation and simplifying the learning process.
For $N$-head attention, the output of the $i$-th CF-MHA is given by Equation (15):
$$\mathrm{CF\text{-}MHA}_i = \bigl(CA_i^1 + CA_i^2 + \cdots + CA_i^N\bigr) / N$$
Then, the outputs of CF-MHA at different scales are processed using an MLP, layer normalization, and a residual connection. The outputs at different scales are obtained according to Equation (16):
$$O_i = \mathrm{CF\text{-}MHA}_i + \mathrm{LN}\bigl(\mathrm{MLP}(Q_i + \mathrm{CF\text{-}MHA}_i)\bigr)$$
where MLP stands for Multi-Layer Perceptron, and LN stands for Layer Normalization.
To construct an L-layer Transformer, Equation (12) is iteratively applied L times. Based on empirical evidence and experimental results, L is set to 4, a value that balances efficiency and processing time. Finally, distinct output feature sequences are remapped to generate four mapped feature maps, which are then concatenated and fed as input to the decoder.
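A compact sketch of one head of the channel-wise cross-attention in Equation (14) is shown below; the $\sqrt{C}$ scaling and the way instance normalization is applied follow the reconstruction above and may differ in detail from the authors' implementation.

```python
import torch
import torch.nn.functional as F

def channel_cross_attention(q, k, v):
    """Sketch of one head of channel-wise cross-attention.

    Unlike spatial attention, the similarity matrix is computed between channels,
    so its size is (C_i, C) regardless of image resolution.
    q : (B, b, C_i)  query tokens for scale i
    k : (B, b, C)    shared key tokens
    v : (B, b, C)    shared value tokens
    """
    sim = torch.einsum("bnc,bnd->bcd", q, k) / k.shape[-1] ** 0.5  # (B, C_i, C) channel similarity
    sim = F.instance_norm(sim.unsqueeze(1)).squeeze(1)             # per-sample normalization
    attn = sim.softmax(dim=-1)
    return torch.einsum("bcd,bnd->bnc", attn, v)                   # (B, b, C_i)

# out = channel_cross_attention(torch.randn(2, 196, 64),
#                               torch.randn(2, 196, 960),
#                               torch.randn(2, 196, 960))
```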

3.3. Loss Function

To achieve underwater image enhancement through learning paired datasets, and to better enhance color, contrast, brightness, artifact removal, and detail in the images, we design a novel composite loss function. This function combines RGB loss, Lab loss, HSV loss, SSIM loss, and UIQM loss.
The Lab color space [43] is a device-independent three-dimensional color space that is widely used for color management and quality control; it attempts to map the color differences perceived by human vision uniformly. The Lab loss function is calculated as in Equation (17):
$$L_{Lab}(g(x), y) = w_L\,\bigl|L_{g(x)} - L_y\bigr| + w_a\,\bigl|a_{g(x)} - a_y\bigr| + w_b\,\bigl|b_{g(x)} - b_y\bigr|$$
where $w_L$, $w_a$, and $w_b$ are the channel weights.
The HSV loss function can be calculated by the following equation:
$$L_{HSV}(g(x), y) = w_H\,\bigl|H_{g(x)} - H_y\bigr| + w_S\,\bigl|S_{g(x)} - S_y\bigr| + w_V\,\bigl|V_{g(x)} - V_y\bigr|$$
where $w_H$, $w_S$, and $w_V$ are the channel weights.
The Underwater Image Quality Measure (UIQM) [44] is frequently used to quantify color, sharpness, and contrast enhancements in images. Its calculation, shown in Equation (19), serves as a loss function (Equation (20)) to evaluate and understand improvements in color clarity.
$$UIQM = c_1 \cdot UICM + c_2 \cdot UISM + c_3 \cdot UIConM$$
$$L_{UIQM}(g(x), y) = \bigl|UIQM(g(x)) - UIQM(y)\bigr|$$
where UICM, UISM, and UIConM represent color, sharpness, and contrast measures, respectively, with empirically derived constant coefficients $c_1 = 0.0282$, $c_2 = 0.2953$, and $c_3 = 3.5753$.
To mitigate the loss of structural information and visual fidelity often associated with image enhancement, we incorporate the Structural Similarity Index Measure (SSIM) loss [45]. This loss function, calculated as shown in Equation (21) and formulated in Equation (22), effectively preserves content and structural details between the original and enhanced images.
$$SSIM(g(x), y) = \frac{2\mu_{g(x)}\mu_y + C_1}{\mu_{g(x)}^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_{g(x)y} + C_2}{\sigma_{g(x)}^2 + \sigma_y^2 + C_2}$$
$$L_{SSIM}(g(x), y) = 1 - SSIM(g(x), y)$$
where $\mu$ represents the mean, $\sigma$ represents the standard deviation, $\sigma_{g(x)y}$ represents the covariance, and $C_1 = 0.02$ and $C_2 = 0.03$ are constants.
The final loss function can be expressed as:
$$L_{total} = \alpha\,L_{RGB}(g(x), y) + \beta\,L_{HSV}(g(x), y) + \gamma\,L_{Lab}(g(x), y) + \delta\,L_{SSIM}(g(x), y) + \mu\,L_{UIQM}(g(x), y)$$
where hyperparameters α , β , γ , δ , μ , and θ are set to 0.1, 1, 0.01, 1, 0.1, and 100, respectively. The order of these hyperparameters corresponds to their respective scaling factors when applied to the individual loss terms.
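A hedged sketch of Equation (23) is given below. The color-space conversions, SSIM, and UIQM terms are assumed to be supplied as differentiable helpers (for example, from kornia or a custom implementation), and the per-channel weights of Equations (17) and (18) are folded into plain L1 terms for brevity; the default weights follow the values reported above.

```python
import torch
import torch.nn.functional as F

def composite_loss(pred, target, to_lab, to_hsv, ssim_fn, uiqm_fn,
                   alpha=0.1, beta=1.0, gamma=0.01, delta=1.0, mu=0.1):
    """Illustrative composite loss: weighted sum of RGB, HSV, Lab, SSIM, and UIQM terms.

    `to_lab`, `to_hsv`, `ssim_fn`, and `uiqm_fn` are user-supplied differentiable helpers
    (assumptions, not functions defined in the paper).
    """
    l_rgb = F.l1_loss(pred, target)
    l_hsv = F.l1_loss(to_hsv(pred), to_hsv(target))
    l_lab = F.l1_loss(to_lab(pred), to_lab(target))
    l_ssim = 1.0 - ssim_fn(pred, target)                      # Eq. (22)
    l_uiqm = (uiqm_fn(pred) - uiqm_fn(target)).abs().mean()   # Eq. (20)
    return alpha * l_rgb + beta * l_hsv + gamma * l_lab + delta * l_ssim + mu * l_uiqm
```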

4. Experiments

This section outlines the implementation process, platform, and datasets used in this research. The proposed method is compared against seven representative methods using qualitative and quantitative full-reference and non-reference evaluation techniques. This comparison emphasizes the overall performance of the method and the efficacy of edge detection at both global and detailed levels. A series of experiments was designed to verify the effectiveness of each component, analyze runtime, and showcase advantages in detail handling. Finally, the effectiveness of the enhancement is further tested through image detection, segmentation, and field cage environment enhancement tasks, providing a comprehensive assessment of the performance of the method.

4.1. Implementation Details

Our methodology was implemented using Python in conjunction with the PyTorch 11.3 deep learning library and was run on an Ubuntu 22.04 system equipped with a single NVIDIA RTX 3090 GPU. All training images were preprocessed by cropping them to a size of 256 × 256 pixels and normalizing the pixel values. To expedite convergence, we chose the Adam optimizer. The training process was set to run for 500 epochs with a batch size of 8. The initial learning rate was set to $10^{-2}$, and an exponential decay strategy was used, decaying the rate to 0.8 times its value every 50 epochs. In addition, the evaluation tests in this section were run on a Windows 11 system equipped with an NVIDIA GTX 1660 SUPER GPU.
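The optimizer and decay schedule described above can be reproduced with a few lines of PyTorch; `model` below is a stand-in module, not the MCPFA network.

```python
import torch

# Sketch of the training schedule: Adam, initial learning rate 1e-2,
# multiplied by 0.8 every 50 epochs, for 500 epochs with a batch size of 8.
model = torch.nn.Conv2d(3, 3, 3, padding=1)   # placeholder module
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.8)

for epoch in range(500):
    # ... iterate over batches, compute the composite loss, call loss.backward(), then:
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```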

4.2. Datasets

The model training utilizes two publicly available underwater image datasets: the training dataset of EUVP created by Islam [3] (containing 7735 images) and the UIEB dataset created by Li [14] (containing 890 images). To expand the dataset size, we applied various data augmentation techniques, including cropping, rotation, and flipping. For model testing, we employed the testing dataset of EUVP [3] and the testing subset of the RUIE dataset [46], which encompasses multiple datasets such as UCCS (Underwater Color Cast Correction Simulation dataset), UIQS (Underwater Image Quality Evaluation Database), and UTTS (Underwater Turbidity Image and Transmission Map Training Set). This comprehensive test set allows for a more comprehensive performance evaluation.

4.3. Comparison Experiments

This subsection presents a comparative analysis of our proposed method against representative traditional and deep learning-based underwater image enhancement techniques, including UDCP [10], HE [47], CLAHE [11], RSL [48], FUnIE-GAN [3], U-shape [41], and Semi-UIR [49]. Both qualitative and quantitative assessments are provided.

4.3.1. Full-Reference Evaluation

To comprehensively evaluate the performance of the algorithm, we first conducted a full-reference image quality assessment on the EUVP test dataset, which comprises original underwater images and their corresponding high-quality ground truth (GT) images; the results are reported in Figure 7 and Table 1. The results of our proposed method and seven other underwater image enhancement algorithms on three randomly selected images, along with their GT counterparts, are shown in Figure 7. It is worth noting that noise amplification is a common challenge encountered during the image enhancement process. We assess the performance of our algorithm in noise suppression, structure preservation, and overall quality enhancement using several metrics, including Peak Signal-to-Noise Ratio (PSNR) [50] as in Equation (25), Structural Similarity Index (SSIM) [45] as in Equation (21), Mean Squared Error (MSE) as in Equation (24), and Underwater Image Quality Measure (UIQM) [44] as in Equation (19). PSNR is sensitive to image noise and effectively reflects the denoising capability of the algorithm. SSIM evaluates the ability of an algorithm to retain structural information in an image. MSE quantifies the pixel-wise differences between the original and enhanced images, with lower values indicating higher image quality. Compared to PSNR and SSIM, UIQM provides a more comprehensive assessment of underwater image enhancement quality as it considers crucial factors such as contrast, saturation, and vividness. Therefore, we select UIQM as our comprehensive evaluation metric.
$$MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \bigl[I(i, j) - K(i, j)\bigr]^2$$
where $m \times n$ is the image resolution, $I$ is the original image, and $K$ is the enhanced image.
$$PSNR = 10 \times \log_{10}\!\left(\frac{(2^n - 1)^2}{MSE}\right)$$
where $n$ is the number of bits per pixel.
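For completeness, Equations (24) and (25) translate directly into the following NumPy helper functions.

```python
import numpy as np

def mse(reference, enhanced):
    """Mean squared error, Eq. (24); inputs are arrays of identical shape."""
    diff = reference.astype(np.float64) - enhanced.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(reference, enhanced, bits=8):
    """Peak signal-to-noise ratio in dB, Eq. (25), for n-bit images."""
    err = mse(reference, enhanced)
    if err == 0:
        return float("inf")
    peak = (2 ** bits - 1) ** 2
    return 10.0 * np.log10(peak / err)

# psnr(gt_image, enhanced_image)  # averaged over EUVP, values around 23 dB are reported in Table 1
```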
Table 1 shows the performance of various image enhancement methods on the EUVP dataset. Our method generally achieves superior results in metrics like PSNR, MSE, and UIQM [44]. While its SSIM score matches the U-shape method, the overall performance remains strong.
The results in Figure 7 and Table 1 indicate that traditional methods like UDCP and HE, while effective at enhancing image brightness, have limitations in addressing color cast and contrast issues. The RSL method excels at removing scattering effects but falls short in color correction and detail restoration. Deep learning-based methods such as FUnIE-GAN and U-shape, leveraging their powerful data processing capabilities, have made significant progress in underwater image enhancement, but they occasionally produce artifacts and local over-enhancement. While the semi-supervised Semi-UIR technique holds great potential, it still faces challenges when dealing with complex underwater scenes. For instance, its performance may fluctuate and lead to inconsistent results, potentially with artifacts and color distortions, when handling drastic lighting changes, dense particle scattering, or intricate underwater structures.
Our method, in comparison, provides a more balanced and effective approach to scattering removal, color correction, and detail restoration. This results in high-quality images visually comparable to reference images.

4.3.2. Non-Reference Evaluation

We employed three sub-datasets from the RUIE dataset: UCCS, UIQS, and UTTS. The UCCS dataset offers a diverse collection of images with challenging color shifts, making it ideal for evaluating color correction algorithms. The UIQS dataset provides a benchmark for assessing image quality enhancement methods under various degradation conditions. The UTTS dataset is specifically tailored for underwater object detection, allowing us to evaluate the impact of enhancement on downstream tasks. Each sub-dataset consists of 5 groups of data. We conducted enhancement experiments on each sub-dataset, representing three different enhancement directions. For each sub-dataset, we randomly split the data into training and testing sets. We applied eight different enhancement methods to the training set and evaluated their performance on the testing set. After enhancement, we randomly selected images from each dataset and calculated the average metrics of all images to compare the performance of the eight methods, as shown in Figure 8 and Table 2. To evaluate the enhancement results, we employ two non-reference underwater image quality assessment (IQA) metrics: UIQM [44] and UCIQE [51]. Notably, UCIQE focuses on evaluating the color saturation, contrast, and brightness of an image. By removing the constraint of requiring high-quality reference images, our method demonstrates its versatility and suitability for a wider range of practical applications.
Our proposed method demonstrates leading performance in underwater image enhancement, achieving top results on both the UCCS and UTTS unpaired datasets. Specifically, on the UCCS dataset, our method effectively removes the blue-green and green casts commonly found in underwater images, significantly improving color fidelity and naturalness. On the UIQS dataset, our method substantially enhances image clarity and contrast while preserving a natural and smooth appearance, resulting in visually appealing images. Furthermore, experimental results on the UTTS dataset confirm the effectiveness of our method in boosting performance for underwater object detection and recognition tasks.
It is worth noting that the UIQM and UCIQE metrics may favor certain image properties, which can prevent them from comprehensively evaluating the overall effect of image enhancement. Consequently, they may not accurately reflect the actual visual quality of the restored underwater images or their impact on downstream machine processing [52]. Additionally, we conduct a qualitative comparison of all methods, as shown in Figure 8. Our method effectively corrects color deviations and distortions while preserving natural color information, resulting in more realistic enhanced images. Moreover, regarding edge information in the images, it sharpens edges and avoids artifacts, which is crucial for subsequent image processing tasks.

4.3.3. Evaluation of Computational Efficiency

Benchmarking our method against seven existing approaches, we evaluated the performance using FLOPs, parameter count, and processing time (Table 3). This analysis, combined with previous comparisons, highlights the ability of the method to achieve high-quality image enhancement while remaining computationally efficient.
Specifically, our method significantly reduces computational complexity (FLOPs) compared to other data-driven approaches. Although the number of model parameters increases, the processing time only increases by 0.07 s. This processing time still ranks among the top three among the eight methods, indicating that our method not only excels in image enhancement quality but also achieves optimization in terms of efficiency.

4.3.4. Evaluation of Edge Detection

To evaluate the performance of different edge detection methods in underwater image processing, we employ paired data from the EUVP test set to compare classical methods like Sobel [19], Canny [22], Laplace [21], RCF [53], and our proposed new method. Our proposed method is also applied to real-world images for detection, with the results shown in Figure 9.
Figure 9 shows that in underwater images, traditional methods like Sobel, Canny, and Laplace suffer from noise amplification due to light scattering and water absorption, leading to poor edge detection performance. While RCF performs well in detecting major target objects, it still has shortcomings in detecting background objects. In contrast, our new method, when applied to paired images, shows almost identical detection results for underwater and real images. Our method effectively suppresses noise interference caused by the underwater environment. It not only performs well in underwater images but also achieves almost the same results in ground truth, demonstrating greater versatility. It effectively preserves image details, avoids over-smoothing, and produces finer edge detection results.

4.4. Ablation Studies

Utilizing the test dataset of EUVP, we conducted a series of ablation experiments using PSNR and SSIM as evaluation metrics to validate the effectiveness of each of the components of our methodology. These experiments systematically analyzed the impact of color space selection, loss function choices, and the individual and cumulative contributions of each component: Efficient Fusion Edge Detection (EFED), Multi-Color Space Fusion (MCSF), and Channel-wise Frequency-division Multi-head Attention (CF-MHA) module, on the overall performance.

4.4.1. Experiments on Multi-Color Space Fusion

For the selection of color spaces in MCSF, we employed multiple color spaces for training and loss function calculation to achieve the best enhancement effect. A consistent U-net model architecture is trained on each individual color space, and model performance is then evaluated on the EUVP test dataset using PSNR and SSIM metrics. The experimental results are shown in Figure 10. We observe that, across the RGB, Lab, HSV, Lch, HSL, YCbCr, and HSI color spaces, the RGB, Lab, and HSV color spaces consistently achieve high rankings for both PSNR and SSIM, indicating their potential for enhancing underwater images.
To further investigate the influence of color spaces, we conducted ablation experiments by evaluating the performance of networks incorporating different color spaces. The results, as presented in Table 4, demonstrate that these three color spaces offer complementary advantages in representing distinct attributes of underwater images.
The RGB color space preserves the original color information of the image; the Lab color space separates luminance from chromaticity, which helps maintain image contrast; and the HSV color space emphasizes the representation of hue, saturation, and value. This complementarity allows the model to capture features of underwater images from different perspectives, resulting in more comprehensive and effective image enhancement. Therefore, we ultimately choose RGB, Lab, and HSV as the multi-color spaces for training and loss function to fully leverage their complementary advantages and improve the performance of underwater image enhancement.

4.4.2. Experiments on Loss Functions

An ablation study on loss functions assesses their effectiveness in enhancing image quality. Specifically, ablation experiments target the color loss function (composed of RGB, Lab, and HSV components), the SSIM loss function, and the UIQM loss function. Each loss function component is individually removed to evaluate its impact on final performance. The color loss function, SSIM loss function, and UIQM loss function are sequentially removed, with the performance of the model on the test dataset of EUVP recorded for each combination. Comparing the PSNR and SSIM values under different combinations provides a clearer understanding of the contribution of each loss function to the image quality improvement. The experimental results are presented in Table 5.
The results in the table demonstrate that using a single loss function leads to significantly inferior performance compared to using the combination of all loss functions. This suggests that integrating multiple loss functions provides better guidance during model training, resulting in improved performance across various evaluation metrics.
Eliminating any individual loss function degrades both PSNR and SSIM, highlighting the contribution of each term to overall image quality. The degradation is largest when the color loss is removed, underscoring its crucial role in our model.
Our ablation studies demonstrate the importance of our selected loss function combination in achieving high-quality image results, providing valuable guidance for future model development.

4.4.3. Overall Experiments

Our experiments use U-net [54] (b) as the baseline model. First, we incorporated the three components EFED, MCSF, and CF-MHA into U-net individually, yielding three models: (c) U + EFED, (d) U + MCSF, and (e) U + CF-MHA. Next, we combined the components in pairs, creating three further models: (f) U + EFED + MCSF, (g) U + MCSF + CF-MHA, and (h) U + CF-MHA + EFED. Finally, we integrated all three components into U-net, resulting in the (i) MCPFA model. All of these variants were evaluated experimentally.
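The variants can be summarized as a table of module switches, as in the hypothetical configuration below; `build_model` and `train_and_evaluate` are stand-ins for however the network is assembled and trained, not functions from our codebase.

```python
# Hypothetical flag table mirroring the ablation variants in Table 6.
ABLATION_VARIANTS = {
    "U-net":             dict(efed=False, mcsf=False, cf_mha=False),
    "U + EFED":          dict(efed=True,  mcsf=False, cf_mha=False),
    "U + MCSF":          dict(efed=False, mcsf=True,  cf_mha=False),
    "U + CF-MHA":        dict(efed=False, mcsf=False, cf_mha=True),
    "U + EFED + MCSF":   dict(efed=True,  mcsf=True,  cf_mha=False),
    "U + EFED + CF-MHA": dict(efed=True,  mcsf=False, cf_mha=True),
    "U + MCSF + CF-MHA": dict(efed=False, mcsf=True,  cf_mha=True),
    "MCPFA":             dict(efed=True,  mcsf=True,  cf_mha=True),
}

# for name, flags in ABLATION_VARIANTS.items():
#     model = build_model(**flags)            # hypothetical constructor
#     train_and_evaluate(model, name)         # hypothetical training/evaluation loop
```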
To validate the effectiveness of our method, we ran these experiments on the test dataset of EUVP; for a consistent evaluation, all experiments used the same platform. As Table 6 and Figure 11 show, the full model achieves significantly higher PSNR and SSIM than the other configurations while maintaining comparable computational efficiency.
Figure 11 showcases a randomly selected test image and its locally magnified regions, clearly demonstrating the contribution of each module to the image restoration results. The EFED module significantly enhances the ability of the model to capture local details, making details such as hands in the image sharper and clearer. The MCSF module improves the color performance of the image, making the colors more vivid and natural and closer to the real scene. The CF-MHA module effectively suppresses noise and artifacts in the image and works collaboratively with the EFED and MCSF modules to further enhance the clarity and naturalness of the image. Ultimately, our method excels at restoring details such as hand edges, producing results that are very close to the real image, which fully demonstrates the superiority of our method.

4.5. Application

Underwater image enhancement not only helps human operators observe underwater conditions but also enables underwater robots to perceive their environment better and thus complete underwater tasks more efficiently and reliably. Scores on evaluation metrics therefore cannot fully determine the effectiveness of an enhancement method. This section compares the performance of object detection and the Segment Anything Model (SAM) on original underwater images and on images processed with eight different enhancement methods, and it also presents enhanced results for real underwater cage images.
We train an underwater object detection model based on YOLOv8 that identifies marine life such as sea urchins, shellfish, sea cucumbers, and fish. To assess the different enhancement methods more intuitively, this detector is applied to the original underwater images as well as to the images processed by the eight enhancement methods. Figure 12 presents the detection results for each method, using the number of identified objects and their confidence scores as evaluation criteria. The results demonstrate that image enhancement has a substantial impact on detection performance; notably, our method considerably increases the number of detected objects, the recognition accuracy, and the confidence scores.
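A minimal sketch of this comparison with the ultralytics package is given below; the weight file, image paths, and confidence threshold are placeholders, not the assets used in our experiments.

```python
from ultralytics import YOLO

# Hypothetical weight file: a YOLOv8 detector trained on urchin/shellfish/sea-cucumber/fish classes.
model = YOLO("underwater_yolov8.pt")

for image_path in ["raw.jpg", "enhanced_ours.jpg"]:  # placeholder image paths
    result = model.predict(source=image_path, conf=0.25, verbose=False)[0]
    confs = result.boxes.conf
    mean_conf = float(confs.mean()) if len(confs) else 0.0
    print(f"{image_path}: {len(confs)} detections, mean confidence {mean_conf:.2f}")
```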
To further investigate the influence of different enhancement methods on image details, we used Meta's Segment Anything Model (SAM) [55] to compare segmentations of the images produced by each method. We applied SAM to both the original images and the enhanced images, with the results shown in Figure 13. The segmentation results indicate that our method excels at delineating object boundaries, producing refined and realistic segmentations, which demonstrates its effectiveness in enhancing both overall quality and fine detail in underwater images.
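The segmentation comparison can be sketched with the original segment-anything release as below; the checkpoint and image paths are placeholders, and the SAM 2 package cited in [55] exposes a different interface.

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Placeholder checkpoint path; any released SAM checkpoint of the matching model type works.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("enhanced_ours.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts: 'segmentation', 'area', 'bbox', ...
print("number of masks:", len(masks))
```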
In addition, we collected a batch of raw underwater images during a real underwater cage survey. These images were enhanced with our method and several other mainstream methods, as shown in Figure 14. The enhanced images exhibit strong performance in color restoration, detail preservation, and noise suppression, significantly improving the identifiability of target objects. However, under complex underwater lighting conditions, especially at long shooting distances, the red channel is prone to over-enhancement, which requires further research. Real-time enhancement under complex and variable underwater lighting also remains a challenging task that deserves attention.

5. Conclusions

This study proposes a novel underwater image enhancement method that combines Efficient Fusion Edge Detection (EFED) and a Multi-Scale Color Parallel Frequency-division Attention Module (MCPFA) to address the common issues of local color casts and blurred details in underwater images, achieving effective enhancement of underwater images. The EFED module provides critical guidance for subsequent feature extraction by quickly extracting image edge information. The MCPFA module effectively corrects color distortions in underwater images through parallel processing in multiple color spaces. Meanwhile, the introduction of the attention mechanism enables the network to adaptively focus on regions of interest in the image, further improving the enhancement effect. Experimental results show that compared with the existing state-of-the-art methods in the test dataset of EUVP, the proposed method improves the peak signal-to-noise ratio (PSNR) by 1.72 dB and the structural similarity index measure (SSIM) by 0.094, achieving significant improvements in both visual quality and detail clarity. This indicates that the method can effectively restore the true color and detail information of underwater images. However, under extremely complex lighting conditions, the method still has some shortcomings, such as excessive enhancement of the red channel and reduced computational efficiency. These issues will be further explored and addressed in future research.

Author Contributions

Conceptualization, methodology, and writing—original draft preparation, J.J.; funding acquisition, conceptualization, and writing—review and editing, X.G.; conceptualization, methodology, writing—review and editing, F.L.; resources, conceptualization, and funding acquisition, H.H.; software, validation, and supervision, J.Y. and B.Z.; data curation, Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Key Scientific and Technological Innovation Projects of Fujian Province (2023XQ015); Fujian Provincial Department of Science and Technology Announces Major Special Projects (2023HZ025003); Research Start-up Funding of Fujian University of Technology (GY-Z23203); and Natural Science Foundation of Fujian Province (2024J08217).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Raveendran, S.; Patil, M.D.; Birajdar, G.K. Underwater image enhancement: A comprehensive review, recent trends, challenges and applications. Artif. Intell. Rev. 2021, 54, 5413–5467. [Google Scholar] [CrossRef]
  2. Han, M.; Lyu, Z.; Qiu, T.; Xu, M. A review on intelligence dehazing and color restoration for underwater images. IEEE Trans. Syst. Man Cybern. Syst. 2018, 50, 1820–1832. [Google Scholar] [CrossRef]
  3. Islam, M.J.; Xia, Y.; Sattar, J. Fast underwater image enhancement for improved visual perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
  4. Schettini, R.; Corchs, S. Underwater image processing: State of the art of restoration and image enhancement methods. EURASIP J. Adv. Signal Process. 2010, 2010, 1–14. [Google Scholar] [CrossRef]
  5. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bekaert, P. Color balance and fusion for underwater image enhancement. IEEE Trans. Image Process. 2017, 27, 379–393. [Google Scholar] [CrossRef]
  6. Zhao, S.; Mei, X.; Ye, X.; Guo, S. MSFE-UIENet: A Multi-Scale Feature Extraction Network for Marine Underwater Image Enhancement. J. Mar. Sci. Eng. 2024, 12, 1472. [Google Scholar] [CrossRef]
  7. McGlamery, B.L. A computer model for underwater camera systems. In Ocean Optics VI; SPIE: Paris, France, 1980; Volume 208, pp. 221–231. [Google Scholar]
  8. Jaffe, J.S. Computer modeling and the design of optimal underwater imaging systems. IEEE J. Ocean. Eng. 1990, 15, 101–111. [Google Scholar] [CrossRef]
  9. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar] [PubMed]
  10. Drews, P.; Nascimento, E.; Moraes, F.; Botelho, S.; Campos, M. Transmission estimation in underwater single images. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Sydney, NSW, Australia, 2–8 December 2013; pp. 825–830. [Google Scholar]
  11. Reza, A.M. Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. J. VLSI Signal Process. Syst. Signal Image Video Technol. 2004, 38, 35–44. [Google Scholar] [CrossRef]
  12. Fan, X.; Lu, L.; Shi, P.; Zhang, X. Underwater image enhancement algorithm combining color correction and multi-scale fusion. In Proceedings of the 2021 18th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Chiang Mai, Thailand, 19–22 May 2021; pp. 140–143. [Google Scholar]
  13. Yang, J.; Huang, H.; Lin, F.; Gao, X.; Jin, J.; Zhang, B. Underwater Image Enhancement Fusion Method Guided by Salient Region Detection. J. Mar. Sci. Eng. 2024, 12, 1383. [Google Scholar] [CrossRef]
  14. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389. [Google Scholar] [CrossRef] [PubMed]
  15. Li, C.; Anwar, S.; Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 2020, 98, 107038. [Google Scholar] [CrossRef]
  16. Lu, J.; Li, N.; Zhang, S.; Yu, Z.; Zheng, H.; Zheng, B. Multi-scale adversarial network for underwater image restoration. Opt. Laser Technol. 2019, 110, 105–113. [Google Scholar] [CrossRef]
  17. Liu, X.; Gao, Z.; Chen, B.M. IPMGAN: Integrating physical model and generative adversarial network for underwater image enhancement. Neurocomputing 2021, 453, 538–551. [Google Scholar] [CrossRef]
  18. Wu, S.; Luo, T.; Jiang, G.; Yu, M.; Xu, H.; Zhu, Z.; Song, Y. A two-stage underwater enhancement network based on structure decomposition and characteristics of underwater imaging. IEEE J. Ocean. Eng. 2021, 46, 1213–1227. [Google Scholar] [CrossRef]
  19. Sobel, I.E. Camera Models and Machine Perception; Stanford University: Stanford, CA, USA, 1970. [Google Scholar]
  20. Roberts, L.G. Machine Perception of Three-Dimensional Solids; Massachusetts Institute of Technology: Cambridge, MA, USA, 1963. [Google Scholar]
  21. Marr, D.; Hildreth, E. Theory of edge detection. Proc. R. Soc. Lond. Ser. B Biol. Sci. 1980, 207, 187–217. [Google Scholar]
  22. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 6, 679–698. [Google Scholar] [CrossRef]
  23. Xie, S.; Tu, Z. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1395–1403. [Google Scholar]
  24. Farabet, C.; Couprie, C.; Najman, L.; LeCun, Y. Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 1915–1929. [Google Scholar] [CrossRef]
  25. Shen, W.; Wang, X.; Wang, Y.; Bai, X.; Zhang, Z. Deepcontour: A deep convolutional feature learned by positive-sharing loss for contour detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3982–3991. [Google Scholar]
  26. Bertasius, G.; Shi, J.; Torresani, L. Deepedge: A multi-scale bifurcated deep network for top-down contour detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4380–4389. [Google Scholar]
  27. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  28. Itti, L.; Koch, C.; Niebur, E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 1254–1259. [Google Scholar] [CrossRef]
  29. Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhudinov, R.; Zemel, R.; Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; PMLR: London, UK, 2015; pp. 2048–2057. [Google Scholar]
  30. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  32. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  33. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  34. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
  35. Su, Z.; Liu, W.; Yu, Z.; Hu, D.; Liao, Q.; Tian, Q.; Pietikäinen, M.; Liu, L. Pixel difference networks for efficient edge detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 5117–5127. [Google Scholar]
  36. Liu, L.; Fieguth, P.; Kuang, G.; Zha, H. Sorted random projections for robust texture classification. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 391–398. [Google Scholar]
  37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  38. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  39. Huang, Y.; Yuan, F.; Xiao, F.; Lu, J.; Cheng, E. Underwater image enhancement based on zero-reference deep network. IEEE J. Ocean. Eng. 2023, 48, 903–924. [Google Scholar] [CrossRef]
  40. Li, C.; Anwar, S.; Hou, J.; Cong, R.; Guo, C.; Ren, W. Underwater image enhancement via medium transmission-guided multi-color space embedding. IEEE Trans. Image Process. 2021, 30, 4985–5000. [Google Scholar] [CrossRef] [PubMed]
  41. Peng, L.; Zhu, C.; Bian, L. U-shape transformer for underwater image enhancement. IEEE Trans. Image Process. 2023, 32, 3066–3079. [Google Scholar] [CrossRef] [PubMed]
  42. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Instance normalization: The missing ingredient for fast stylization. arXiv 2016, arXiv:1607.08022. [Google Scholar]
  43. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 694–711. [Google Scholar]
  44. Panetta, K.; Gao, C.; Agaian, S. Human-visual-system-inspired underwater image quality measures. IEEE J. Ocean. Eng. 2015, 41, 541–551. [Google Scholar] [CrossRef]
  45. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
  46. Liu, R.; Fan, X.; Zhu, M.; Hou, M.; Luo, Z. Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4861–4875. [Google Scholar] [CrossRef]
  47. Pizer, S.M.; Amburn, E.P.; Austin, J.D.; Cromartie, R.; Geselowitz, A.; Greer, T.; Romeny, B.t.H.; Zimmerman, J.B.; Zuiderveld, K. Adaptive histogram equalization and its variations. Comput. Vision Graph. Image Process. 1987, 39, 355–368. [Google Scholar] [CrossRef]
  48. Srinivasan, S.; Balram, N. Adaptive contrast enhancement using local region stretching. In Proceedings of the 9th Asian Symposium on Information Display, New Delhi, India, 8–12 October 2006; pp. 152–155. [Google Scholar]
  49. Huang, S.; Wang, K.; Liu, H.; Chen, J.; Li, Y. Contrastive semi-supervised learning for underwater image restoration via reliable bank. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18145–18155. [Google Scholar]
  50. Korhonen, J.; You, J. Peak signal-to-noise ratio revisited: Is simple beautiful? In Proceedings of the 2012 Fourth International Workshop on Quality of Multimedia Experience, Melbourne, Australia, 5–7 July 2012; pp. 37–38. [Google Scholar]
  51. Yang, M.; Sowmya, A. An underwater color image quality evaluation metric. IEEE Trans. Image Process. 2015, 24, 6062–6071. [Google Scholar] [CrossRef]
  52. Berman, D.; Levy, D.; Avidan, S.; Treibitz, T. Underwater single image color restoration using haze-lines and a new quantitative dataset. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2822–2837. [Google Scholar] [CrossRef] [PubMed]
  53. Liu, Y.; Cheng, M.M.; Hu, X.; Wang, K.; Bai, X. Richer convolutional features for edge detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3000–3009. [Google Scholar]
  54. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  55. Ravi, N.; Gabeur, V.; Hu, Y.T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. Sam 2: Segment anything in images and videos. arXiv 2024, arXiv:2408.00714. [Google Scholar]
Figure 1. Raw underwater images. Underwater images commonly suffer from (a) color casts, (b) artifacts, and (c) blurred details.
Figure 2. Overview of our framework. First, the EFED module detects edge information in the image using an efficient network architecture. The original image and the extracted edge map are then fed into the MCPFA module, which leverages an attention mechanism to fuse information from different color spaces and scales and ultimately produces the enhanced result.
Figure 3. Pixel difference convolution flowchart [36]. * denotes element-wise multiplication. First, the differences between a target pixel and its neighboring pixels are calculated; these differences are then multiplied by the corresponding weights in the convolution kernel and summed; finally, the sum is output as the feature value of the target pixel.
Figure 4. Edge detection structure diagram. First, the original image undergoes multiple downsampling layers within the backbone network, extracting multi-scale edge features. These features are then fed into four parallel auxiliary networks, which use dilated convolutions to enlarge the receptive field, sample global information, and fuse features from different scales, enabling refined edge processing. Finally, the auxiliary networks output a high-quality edge map.
Figure 5. MCSF module. It integrates information from the HSV, Lab, and RGB color spaces, along with edge information, to provide comprehensive features for the subsequent image enhancement steps.
Figure 6. CF-MHA architecture. First, the input feature map is divided into frequency bands based on scale channels. Each band then undergoes multi-head attention computation independently. Color-aware weights are learned from the attenuation levels of different colors at different locations. Finally, the multi-head attention outputs, adjusted by the color-aware weights, are fused to produce the final enhanced feature, effectively mitigating color attenuation in underwater images.
Figure 7. Visual comparison of the full-reference data on the test dataset of EUVP. From left to right: (a) original underwater image, (b) UDCP [10], (c) HE [47], (d) CLAHE [11], (e) LRS [48], (f) FUnIE-GAN [3], (g) U-shape [41], (h) Semi-UIR [49], (i) our method, and (j) reference image (regarded as ground truth (GT)).
Figure 8. Visual comparison of non-reference data from RUIE on the UCCS, UIQS, and UTTS datasets. From top to bottom: (1) bluish-biased, (2) bluish-green-biased, and (3) greenish-biased images from the UCCS dataset; (4) images of varying quality from the UIQS dataset; and (5) task-oriented images from the UTTS dataset. From left to right: (a) original underwater image, (b) UDCP [10], (c) HE [47], (d) CLAHE [11], (e) LRS [48], (f) FUnIE-GAN [3], (g) U-shape [41], (h) Semi-UIR [49], and (i) our method.
Figure 9. Visual comparison of reference data on the test dataset of EUVP. From left to right: (a) the original image, (b) Sobel [19], (c) Canny [22], (d) Laplace [21], (e) RCF [53], (f) ours, and (g) ours on ground truth.
Figure 10. Results of the color space selection evaluation. Tests are performed on the test dataset of EUVP to obtain PSNR and SSIM results for each color space model.
Figure 11. Results of ablation experiments on different components, with zoomed-in local details. From left to right: (a) Input, (b) U-net, (c) U + EFED, (d) U + MCSF, (e) U + CF-MHA, (f) U + EFED + MCSF, (g) U + MCSF + CF-MHA, (h) U + CF-MHA + EFED, (i) MCPFA, and (j) GT.
Figure 12. Results of underwater target recognition. From left to right: (a) original underwater image, (b) UDCP [10], (c) HE [47], (d) CLAHE [11], (e) LRS [48], (f) FUnIE-GAN [3], (g) U-shape [41], (h) Semi-UIR [49], and (i) our method.
Figure 13. Results of the Segment Anything Model. From left to right: (a) original underwater image, (b) UDCP [10], (c) HE [47], (d) CLAHE [11], (e) LRS [48], (f) FUnIE-GAN [3], (g) U-shape [41], (h) Semi-UIR [49], and (i) our method.
Figure 14. Enhancement results in a real underwater cage environment. From left to right: (a) original underwater image, (b) UDCP [10], (c) HE [47], (d) CLAHE [11], (e) LRS [48], (f) FUnIE-GAN [3], (g) U-shape [41], (h) Semi-UIR [49], and (i) our method.
Table 1. Results of the full-reference metrics. Evaluation of the full-reference metrics PSNR, SSIM, and MSE on the paired test dataset of EUVP, together with the quality metric UIQM [44]. The top two results for each metric are marked in red and blue, respectively. ↑ indicates that higher is better; ↓ indicates that lower is better.

Method           PSNR↑    SSIM↑    MSE↓     UIQM↑
UDCP [10]        16.39    0.636    19.87    2.013
HE [47]          13.61    0.614    34.43    2.823
CLAHE [11]       12.82    0.490    35.78    2.705
LRS [48]         17.37    0.780    13.45    2.524
FUnIE-GAN [3]    19.41    0.810    8.478    2.856
U-shape [41]     20.34    0.821    7.706    2.918
Semi-UIR [49]    21.73    0.727    5.678    2.803
Ours             23.45    0.821    3.745    2.920
Table 2. Results of the non-reference metrics. Evaluation of the non-reference metrics UIQM [44] and UCIQE on the unpaired UCCS, UIQS, and UTTS subsets of RUIE. The top two results for each metric are marked in red and blue, respectively. ↑ indicates that higher is better.

                 UCCS               UIQS               UTTS
Method           UIQM↑    UCIQE↑    UIQM↑    UCIQE↑    UIQM↑    UCIQE↑
UDCP [10]        2.146    0.525     2.216    0.504     2.550    0.523
HE [47]          3.068    0.571     3.058    0.606     3.128    0.616
CLAHE [11]       3.049    0.594     3.005    0.586     2.930    0.604
LRS [48]         2.875    0.543     2.934    0.596     3.019    0.600
FUnIE-GAN [3]    3.087    0.503     3.020    0.510     3.064    0.526
U-shape [41]     3.031    0.538     2.956    0.546     3.100    0.545
Semi-UIR [49]    3.078    0.553     3.023    0.566     3.193    0.575
Ours             3.101    0.610     3.089    0.599     3.211    0.609
Table 3. Processing performance of each method, measured by FLOPs, parameter count, and processing time. ↓ indicates that lower is better; × denotes not applicable.

Method           FLOPs (GB)↓    Param. (MB)↓    Time (s)↓
UDCP [10]        ×              ×               0.329
HE [47]          ×              ×               0.038
CLAHE [11]       ×              ×               1.235
LRS [48]         ×              ×               0.341
FUnIE-GAN [3]    10.24          7.023           0.018
U-shape [41]     5.56           30.13           0.756
Semi-UIR [49]    33.94          19.39           0.362
Ours             3.705          27.01           0.088
Table 4. Results of different color space combinations. Models using different color space combinations are tested on the test dataset of EUVP to obtain their PSNR and SSIM scores. The top three results for each metric are marked in red, blue, and green, respectively. ↑ indicates that higher is better.

Color Space Combination      PSNR↑    SSIM↑
RGB                          21.26    0.720
Lab                          18.23    0.694
HSV                          16.02    0.672
RGB + Lab                    20.23    0.804
RGB + HSV                    19.94    0.812
Lab + HSV                    18.67    0.746
MCSF (RGB + Lab + HSV)       23.45    0.832
Table 5. Results of different loss function combinations. Models trained with different loss function combinations are tested on the test dataset of EUVP to obtain their PSNR and SSIM scores. The top three results for each metric are marked in red, blue, and green, respectively. ↑ indicates that higher is better.

Loss Function Combination           PSNR↑    SSIM↑
Color loss                          21.26    0.720
SSIM loss                           21.23    0.774
UIQM loss                           18.02    0.702
Color + SSIM loss                   22.95    0.812
Color + UIQM loss                   22.60    0.804
SSIM + UIQM loss                    21.63    0.776
All losses (Color + SSIM + UIQM)    23.45    0.821
Table 6. Statistical results of the ablation study. Model performance is evaluated by PSNR, SSIM, and processing time as components are added incrementally. The best results are highlighted in red. ↑ indicates that higher is better; ↓ indicates that lower is better.

Method                 PSNR↑    SSIM↑    Time (s)↓
U-net [54]             16.39    0.726    0.0456
U + EFED               15.61    0.750    0.0580
U + MCSF               17.34    0.490    0.0578
U + CF-MHA             12.82    0.710    0.0575
U + EFED + MCSF        16.41    0.481    0.0732
U + EFED + CF-MHA      20.34    0.811    0.0762
U + MCSF + CF-MHA      18.73    0.727    0.0679
MCPFA                  23.45    0.821    0.0885
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
