Infrared and Visible Image Fusion via Attention-Based Adaptive Feature Fusion
Figure 1. Schematic of the AAF strategy. (a) The multiscale channel attention module F. (b) The two-layer iterative AAF module.
Figure 2. Specific design of the structural residual module (SRM): a structural residual branch consisting of two 3 × 3 convolution kernels, two batch normalization layers, and two PReLU activation functions, in parallel with a Sobel operator.
Figure 3. Schematic diagram of the overall AAF framework. SRM is the structural residual module and AAF is the attention-based adaptive fusion module. conv1 consists of a 3 × 3 convolution kernel and a PReLU activation function; conv2 consists of a 3 × 3 convolution kernel, a batch normalization layer, and a Tanh activation function; conv3 consists of a 3 × 3 convolution kernel, a batch normalization layer, and a Sigmoid activation function.
Figure 4. Schematic diagram of the AAF test framework. M denotes the added modal features and R the fused image.
Figure 5. Qualitative comparison of the AAF model with DRF, Dual-branch, and DIDFuse on the TNO and RoadScene datasets. The first two rows show the infrared and visible images; the following rows show the images fused by DRF, Dual-branch, DIDFuse, and our AAF model, in that order. Left three columns: TNO dataset; right two columns: RoadScene dataset.
Figure 6. Qualitative comparison of the AAF model with five state-of-the-art IVIF methods on the TNO and RoadScene test datasets. The top two rows show the infrared and visible images; the following rows show the images fused by IFEVIP, HMSD, HMSD_GF, FusionGAN, DenseFuse, and our AAF model. Left three columns: TNO dataset; right two columns: RoadScene dataset.
Figure 7. Quantitative comparison of the AAF model with five state-of-the-art methods on six metrics for the TNO test dataset.
Figure 8. Quantitative comparison of the AAF model with five state-of-the-art methods on six metrics for the RoadScene test dataset.
Figure 9. Qualitative ablation study of the loss and structural AAF components. Left two columns: TNO test dataset; right three columns: RoadScene test dataset. The top two rows show the source images, followed by the images fused by the NO_Loss model, the NO_S_AAF model, and the full model, respectively.
Figure 10. Qualitative ablation study of the AAF strategy. Left three columns: TNO test dataset; right two columns: RoadScene test dataset. The top two rows show the source images; the following rows show the images fused by the model with the concatenation strategy and by the full AAF model.
Abstract
1. Introduction
1. To the best of our knowledge, this is the first time an adaptive fusion strategy has been introduced into IVIF. The AAF module, built on a multiscale channel attention mechanism, dynamically generates pixel-wise fusion weights for different features along the channel dimension;
2. Besides applying the AAF strategy to synthesize structural features and modal features, we also apply it to synthesize structural features from different sources before reconstruction, in contrast to the plain addition strategy that current methods apply to features of the same type. As a consequence, our synthesized structural features retain more complete structural information;
3. Our network pays more attention to the dominant features and generates fused images of high quality.
2. Related Work
2.1. Deep Learning Image Fusion Methods
2.2. Existing Fusion Strategies
2.3. Attention Mechanism
3. Method
3.1. Attention-Based Adaptive Feature Fusion Strategy
1. Single-layer attention fusion.
2. Multiple iterations of attention fusion, where the fused features generated in the n-th iteration serve as the input to the next. A minimal code sketch of both variants follows.
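To make the two-step strategy concrete, here is a minimal PyTorch sketch of single-layer attention fusion and its two-layer iterative extension, modeled on the multiscale channel attention of Dai et al. [22] and on Figure 1. The module names (`MSCAM`, `AAF`), the reduction ratio `r`, and the exact conv/BN layout are illustrative assumptions, not the paper's verbatim architecture.

```python
import torch
import torch.nn as nn

class MSCAM(nn.Module):
    """Multiscale channel attention (the module F in Figure 1a).

    A local branch of pointwise convolutions preserves per-pixel detail,
    while a global branch with global average pooling captures channel-level
    context; their sum is squashed to pixel-wise weights in (0, 1).
    The reduction ratio r is an assumed hyperparameter.
    """
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        mid = max(channels // r, 1)
        self.local = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels),
        )
        self.glob = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.local(x) + self.glob(x))

class AAF(nn.Module):
    """Iterative attention-based adaptive fusion (Figure 1b, two layers)."""
    def __init__(self, channels: int, iterations: int = 2):
        super().__init__()
        self.attn = nn.ModuleList([MSCAM(channels) for _ in range(iterations)])

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        s = x + y                          # initial integration of the two features
        for attn in self.attn:
            g = attn(s)                    # pixel-wise fusion weights G
            s = g * x + (1.0 - g) * y      # soft selection: G ⊗ X + (1 − G) ⊗ Y
        return s
```

With `iterations=1` this reduces to single-layer attention fusion; with `iterations=2` it matches the two-layer iterative module of Figure 1b.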
3.2. Structure Information Enhancement
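Following the caption of Figure 2, the SRM pairs a learnable branch (two 3 × 3 convolutions, each with batch normalization and PReLU) with a fixed Sobel branch. Below is a minimal PyTorch sketch; the depthwise application of the Sobel kernels and the summation of the two branches are assumptions about details the figure caption does not pin down.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRM(nn.Module):
    """Structural residual module (Figure 2): a learnable branch of two
    3x3 conv + BN + PReLU stages, in parallel with a fixed Sobel branch."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.PReLU(),
        )
        # Fixed (non-trainable) horizontal/vertical Sobel kernels,
        # applied depthwise so each channel keeps its own gradients.
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("sobel_x", kx.expand(channels, 1, 3, 3).clone())
        self.register_buffer("sobel_y", kx.t().expand(channels, 1, 3, 3).clone())
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gx = F.conv2d(x, self.sobel_x, padding=1, groups=self.channels)
        gy = F.conv2d(x, self.sobel_y, padding=1, groups=self.channels)
        return self.branch(x) + gx + gy   # assumed combination of the branches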
3.3. Overall Framework
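The three convolution blocks named in the Figure 3 caption translate directly to code. In this sketch the channel counts and the "same" padding are assumptions:

```python
import torch.nn as nn

def conv1(in_ch: int, out_ch: int) -> nn.Sequential:
    """conv1 in Figure 3: a 3x3 convolution followed by PReLU."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.PReLU())

def conv2(in_ch: int, out_ch: int) -> nn.Sequential:
    """conv2: 3x3 convolution, batch normalization, Tanh."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.Tanh())

def conv3(in_ch: int, out_ch: int) -> nn.Sequential:
    """conv3: 3x3 convolution, batch normalization, Sigmoid."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.Sigmoid())
```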
3.4. Loss Function
3.4.1. Image Decomposition Loss
3.4.2. Image Reconstruction Loss
3.4.3. Global Structure Loss
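The exact formulation of the global structure loss is not reproduced in this excerpt. Given the Sobel branch of the SRM, a plausible gradient-based stand-in, offered purely as an assumption, penalizes the distance between the fused image's Sobel gradients and the stronger of the two source gradients:

```python
import torch
import torch.nn.functional as F

# Horizontal Sobel kernel; its transpose gives the vertical kernel.
_KX = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)

def sobel_grad(img: torch.Tensor) -> torch.Tensor:
    """Gradient magnitude (|Gx| + |Gy|) of a single-channel image batch."""
    kx = _KX.to(img.device)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, kx.transpose(2, 3), padding=1)
    return gx.abs() + gy.abs()

def global_structure_loss(fused, ir, vis):
    """Push the fused image's gradients toward the stronger source gradient."""
    target = torch.maximum(sobel_grad(ir), sobel_grad(vis))
    return F.l1_loss(sobel_grad(fused), target)
```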
4. Experimental Results and Analysis
4.1. Experimental Details
4.2. Experimental Comparison of Similar Models
4.3. Fusion Performance Comparison Test
4.4. Ablation Experiments
4.4.1. Edge and Texture Retention Analysis
4.4.2. AAF Fusion Strategy Ablation Analysis
4.5. Efficiency Comparison
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Lu, Y.; Wu, Y.; Liu, B.; Zhang, T.; Li, B.; Chu, Q.; Yu, N. Cross-modality person re-identification with shared-specific feature transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13379–13389.
2. Luo, X.; Zhang, Z.; Wu, X. A novel algorithm of remote sensing image fusion based on shift-invariant Shearlet transform and regional selection. AEU-Int. J. Electron. Commun. 2016, 70, 186–197.
3. Rajah, P.; Odindi, J.; Mutanga, O. Feature level image fusion of optical imagery and Synthetic Aperture Radar (SAR) for invasive alien plant species detection and mapping. Remote Sens. Appl. Soc. Environ. 2018, 10, 198–208.
4. Ma, W.; Karakuş, O.; Rosin, P.L. AMM-FuseNet: Attention-based multi-modal image fusion network for land cover mapping. Remote Sens. 2022, 14, 4458.
5. Ying, J.; Shen, H.L.; Cao, S.Y. Unaligned hyperspectral image fusion via registration and interpolation modeling. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14.
6. Hu, Z.; Zhu, M.; Wang, Q.; Su, X.; Chen, F. SDGSAT-1 TIS prelaunch radiometric calibration and performance. Remote Sens. 2022, 14, 4543.
7. Niu, Y.; Xu, S.; Wu, L.; Hu, W. Airborne infrared and visible image fusion for target perception based on target region segmentation and discrete wavelet transform. Math. Probl. Eng. 2012, 2012, 1–10.
8. Yin, S.; Cao, L.; Ling, Y.; Jin, G. One color contrast enhanced infrared and visible image fusion method. Infrared Phys. Technol. 2010, 53, 146–150.
9. Pajares, G.; De La Cruz, J.M. A wavelet-based image fusion tutorial. Pattern Recognit. 2004, 37, 1855–1872.
10. Ben Hamza, A.; He, Y.; Krim, H.; Willsky, A. A multiscale approach to pixel-level image fusion. Integr. Comput.-Aided Eng. 2005, 12, 135–146.
11. Li, S.; Kang, X.; Hu, J. Image fusion with guided filtering. IEEE Trans. Image Process. 2013, 22, 2864–2875.
12. Zhu, Z.; Yin, H.; Chai, Y.; Li, Y.; Qi, G. A novel multi-modality image fusion method based on image decomposition and sparse representation. Inf. Sci. 2018, 432, 516–529.
13. Wang, K.; Qi, G.; Zhu, Z.; Chai, Y. A novel geometric dictionary construction approach for sparse representation based image fusion. Entropy 2017, 19, 306.
14. Kim, M.; Han, D.K.; Ko, H. Joint patch clustering-based dictionary learning for multimodal image fusion. Inf. Fusion 2016, 27, 198–214.
15. Liu, G.; Lin, Z.; Yu, Y. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning (ICML), Haifa, Israel, 21–24 June 2010.
16. Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 171–184.
17. Xu, H.; Wang, X.; Ma, J. DRF: Disentangled representation for visible and infrared image fusion. IEEE Trans. Instrum. Meas. 2021, 70, 1–13.
18. Xu, H.; Gong, M.; Tian, X.; Huang, J.; Ma, J. CUFD: An encoder-decoder network for visible and infrared image fusion based on common and unique feature decomposition. Comput. Vis. Image Underst. 2022, 218, 103407.
19. Fu, Y.; Wu, X.J. A dual-branch network for infrared and visible image fusion. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 10675–10680.
20. Zhao, Z.; Xu, S.; Zhang, C.; Liu, J.; Li, P.; Zhang, J. DIDFuse: Deep image decomposition for infrared and visible image fusion. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), Yokohama, Japan, 7–15 January 2021; p. 976.
21. Kong, Q.; Zhou, H.; Wu, Y. NormFuse: Infrared and visible image fusion with pixel-adaptive normalization. IEEE/CAA J. Autom. Sin. 2022, 9, 2190–2192.
22. Dai, Y.; Gieseke, F.; Oehmcke, S.; Wu, Y.; Barnard, K. Attentional feature fusion. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2021; pp. 3560–3569.
23. Li, H.; Wu, X.J.; Kittler, J. Infrared and visible image fusion using a deep learning framework. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 2705–2710.
24. Zhang, Y.; Liu, Y.; Sun, P.; Yan, H.; Zhao, X.; Zhang, L. IFCNN: A general image fusion framework based on convolutional neural network. Inf. Fusion 2020, 54, 99–118.
25. Xu, H.; Ma, J.; Jiang, J.; Guo, X.; Ling, H. U2Fusion: A unified unsupervised image fusion network. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 502–518.
26. Li, H.; Wu, X.J. DenseFuse: A fusion approach to infrared and visible images. IEEE Trans. Image Process. 2018, 28, 2614–2623.
27. Ma, J.; Liang, P.; Yu, W.; Chen, C.; Guo, X.; Wu, J.; Jiang, J. Infrared and visible image fusion via detail preserving adversarial learning. Inf. Fusion 2020, 54, 85–98.
28. Ma, J.; Xu, H.; Jiang, J.; Mei, X.; Zhang, X.P. DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Trans. Image Process. 2020, 29, 4980–4995.
29. Li, J.; Huo, H.; Li, C.; Wang, R.; Feng, Q. AttentionFGAN: Infrared and visible image fusion using attention-based generative adversarial networks. IEEE Trans. Multimed. 2020, 23, 1383–1396.
30. Zhou, H.; Wu, W.; Zhang, Y.; Ma, J.; Ling, H. Semantic-supervised infrared and visible image fusion via a dual-discriminator generative adversarial network. IEEE Trans. Multimed. 2021, 25, 635–648.
31. Zhou, H.; Hou, J.; Zhang, Y.; Ma, J.; Ling, H. Unified gradient- and intensity-discriminator generative adversarial network for image fusion. Inf. Fusion 2022, 88, 184–201.
32. Zhou, Z.; Dong, M.; Xie, X.; Gao, Z. Fusion of infrared and visible images for night-vision context enhancement. Appl. Opt. 2016, 55, 6480–6490.
33. Bavirisetti, D.P.; Dhuli, R. Two-scale image fusion of visible and infrared images using saliency detection. Infrared Phys. Technol. 2016, 76, 52–64.
34. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 510–519.
35. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
36. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
37. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 3146–3154.
38. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
39. Xu, H.; Ma, J.; Le, Z.; Jiang, J.; Guo, X. FusionDN: A unified densely connected network for image fusion. In Proceedings of the AAAI Conference on Artificial Intelligence, Palo Alto, CA, USA, 7–12 February 2020; Volume 34, pp. 12484–12491.
40. Toet, A.; Hogervorst, M.A. Progress in color night vision. Opt. Eng. 2012, 51, 010901.
41. Zhang, Y.; Zhang, L.; Bai, X.; Zhang, L. Infrared and visual image fusion through infrared feature extraction and visual information preservation. Infrared Phys. Technol. 2017, 83, 227–237.
42. Zhou, Z.; Wang, B.; Li, S.; Dong, M. Perceptual fusion of infrared and visible images through a hybrid multi-scale decomposition with Gaussian and bilateral filters. Inf. Fusion 2016, 30, 15–26.
43. Ma, J.; Yu, W.; Liang, P.; Li, C.; Jiang, J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Inf. Fusion 2019, 48, 11–26.
44. Roberts, J.W.; Van Aardt, J.A.; Ahmed, F.B. Assessment of image fusion procedures using entropy, image quality, and multispectral classification. J. Appl. Remote Sens. 2008, 2, 023522.
45. Han, Y.; Cai, Y.; Cao, Y.; Xu, X. A new image fusion performance metric based on visual information fidelity. Inf. Fusion 2013, 14, 127–135.
Table 1. Representative feature fusion strategies, grouped by degree of context-awareness.

| Context-Aware | Type | Formulation | Example |
|---|---|---|---|
| None | Addition | X + Y | DRF [17], Dual-branch [19] |
| None | Concatenation | [X, Y] | DIDFuse [20], CUFD [18] |
| None | Average | (X + Y)/2 | TIF [33] |
| None | Choose-max | max(X, Y) | GFF [11], IFCNN [24] |
| None | Max-ℓ1 | — | NGDC [13] |
| None | ℓ1-norm | — | DenseFuse [26] |
| Fully | Soft Selection | G(X + Y) ⊗ X + (1 − G(X + Y)) ⊗ Y | SKNet [34] |
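For concreteness, the non-learned strategies in Table 1 and the soft-selection rule each translate to a line or two of PyTorch. In this sketch, `gate` stands for any module that maps features to weights in (0, 1), e.g., a multiscale channel attention module like the one sketched for Section 3.1:

```python
import torch

def fuse_addition(x, y):            # X + Y (DRF, Dual-branch)
    return x + y

def fuse_concat(x, y):              # [X, Y] along channels (DIDFuse, CUFD)
    return torch.cat([x, y], dim=1)

def fuse_average(x, y):             # (X + Y)/2 (TIF)
    return 0.5 * (x + y)

def fuse_choose_max(x, y):          # element-wise maximum (GFF, IFCNN)
    return torch.maximum(x, y)

def fuse_soft_selection(x, y, gate):
    """G(X+Y) ⊗ X + (1 − G(X+Y)) ⊗ Y (SKNet-style soft selection)."""
    g = gate(x + y)
    return g * x + (1.0 - g) * y
```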
Table 2. Quantitative comparison of the AAF model with similar models on the TNO test dataset (EN: entropy; SF: spatial frequency; MG: mean gradient; PSNR: peak signal-to-noise ratio; SD: standard deviation; VIF: visual information fidelity).

| Methods | EN | SF | MG | PSNR | SD | VIF |
|---|---|---|---|---|---|---|
| DRF | 6.4773 | 3.0594 | 2.1140 | 14.0305 | 28.2402 | 0.3480 |
| Dual-branch | 6.3507 | 3.5606 | 2.3751 | 15.5899 | 24.4902 | 0.2988 |
| DIDFuse | 7.1002 | 6.0910 | 4.3534 | 13.9378 | 48.0636 | 0.6456 |
| OURS | 7.2234 | 7.5430 | 5.7878 | 14.4210 | 50.7800 | 0.6276 |
Table 3. Quantitative comparison of the AAF model with similar models on the RoadScene test dataset.

| Methods | EN | SF | MG | PSNR | SD | VIF |
|---|---|---|---|---|---|---|
| DRF | 7.2503 | 4.7228 | 3.4275 | 14.3983 | 44.6414 | 0.5403 |
| Dual-branch | 6.7988 | 4.9488 | 3.3732 | 16.4648 | 31.0218 | 0.4433 |
| DIDFuse | 7.3795 | 6.8482 | 5.6517 | 14.8007 | 52.0672 | 0.7935 |
| OURS | 7.5016 | 7.7386 | 6.9507 | 14.6679 | 55.4926 | 0.8484 |
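The metric columns are standard IVIF measures, but implementations differ across papers. The sketch below computes the three simplest ones (EN, SF, SD) under common definitions, as an assumption about, not a reproduction of, the paper's evaluation code; MG, PSNR, and VIF additionally need gradient kernels, a reference image, and a perceptual model, respectively.

```python
import numpy as np

def entropy(img: np.ndarray) -> float:
    """EN: Shannon entropy of the grayscale histogram (img in [0, 255])."""
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def spatial_frequency(img: np.ndarray) -> float:
    """SF: sqrt(RF^2 + CF^2) from row/column first differences."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    return float(np.hypot(rf, cf))

def standard_deviation(img: np.ndarray) -> float:
    """SD: standard deviation of the pixel intensities."""
    return float(img.astype(np.float64).std())
```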
Table 4. Quantitative comparison of the AAF model with five state-of-the-art methods on the TNO test dataset.

| Methods | EN | SF | MG | PSNR | SD | VIF |
|---|---|---|---|---|---|---|
| IFEVIP | 6.8462 | 5.7771 | 4.0711 | 13.6033 | 38.9830 | 0.4660 |
| HMSD | 7.1094 | 6.3570 | 4.5506 | 15.9629 | 39.8504 | 0.4650 |
| HMSD_GF | 7.1526 | 6.7203 | 4.9077 | 15.2930 | 44.0640 | 0.4720 |
| DenseFuse | 6.5797 | 4.2510 | 2.7574 | 15.3697 | 31.5165 | 0.3716 |
| FusionGAN | 6.6638 | 5.1906 | 3.6460 | 10.4271 | 30.5577 | 0.4299 |
| OURS | 7.2234 | 7.5430 | 5.7878 | 14.4210 | 50.7800 | 0.6276 |
Table 5. Quantitative comparison of the AAF model with five state-of-the-art methods on the RoadScene test dataset.

| Methods | EN | SF | MG | PSNR | SD | VIF |
|---|---|---|---|---|---|---|
| IFEVIP | 6.9767 | 6.5449 | 5.1182 | 14.0182 | 43.1022 | 0.6860 |
| HMSD | 7.1896 | 7.2915 | 6.3481 | 16.0270 | 40.2494 | 0.5596 |
| HMSD_GF | 7.4004 | 7.4812 | 6.6571 | 15.7815 | 48.0379 | 0.6702 |
| DenseFuse | 6.9147 | 5.4990 | 3.7439 | 16.6239 | 34.9110 | 0.4891 |
| FusionGAN | 7.1849 | 7.5082 | 6.6071 | 12.4593 | 39.9997 | 0.5565 |
| OURS | 7.5016 | 7.7386 | 6.9507 | 14.6679 | 55.4926 | 0.8484 |
Table 6. Ablation results for the loss and structural AAF components (NO_Loss, NO_S_AAF vs. the full model) on the TNO test dataset.

| Methods | EN | SF | MG | PSNR | SD | VIF |
|---|---|---|---|---|---|---|
| NO_Loss | 7.1109 | 6.4755 | 4.6123 | 12.3098 | 50.4393 | 0.6457 |
| NO_S_AAF | 7.2530 | 6.9701 | 5.2198 | 14.1635 | 49.9152 | 0.6470 |
| OURS | 7.2234 | 7.5430 | 5.7878 | 14.4210 | 50.7800 | 0.6276 |
Table 7. Ablation results for the loss and structural AAF components on the RoadScene test dataset.

| Methods | EN | SF | MG | PSNR | SD | VIF |
|---|---|---|---|---|---|---|
| NO_Loss | 7.3880 | 6.8765 | 5.7245 | 14.689 | 54.2375 | 0.8454 |
| NO_S_AAF | 7.4297 | 6.6505 | 6.0720 | 13.7379 | 54.8860 | 0.8520 |
| OURS | 7.5016 | 7.7386 | 6.9507 | 14.6697 | 55.4926 | 0.8484 |
Table 8. Ablation of the AAF fusion strategy (S_CAT: concatenation in place of AAF) on the TNO test dataset.

| Methods | EN | SF | MG | PSNR | SD | VIF |
|---|---|---|---|---|---|---|
| S_CAT | 7.1671 | 6.5785 | 4.8097 | 13.1636 | 45.6796 | 0.5864 |
| OURS | 7.2234 | 7.5430 | 5.7878 | 14.4210 | 50.7800 | 0.6276 |
Table 9. Ablation of the AAF fusion strategy on the RoadScene test dataset.

| Methods | EN | SF | MG | PSNR | SD | VIF |
|---|---|---|---|---|---|---|
| S_CAT | 7.0836 | 6.3428 | 4.9261 | 12.3407 | 44.0769 | 0.6438 |
| OURS | 7.5016 | 7.7386 | 6.9507 | 14.6679 | 55.4926 | 0.8484 |
Table 10. Average runtime (in seconds) of each method on the TNO and RoadScene test datasets.

| Datasets | IFEVIP | HMSD | HMSD_GF | DenseFuse | FusionGAN | OURS |
|---|---|---|---|---|---|---|
| TNO | 0.034 | 3.224 | 0.644 | 0.056 | 0.224 | 0.265 |
| RoadScene | 0.029 | 1.555 | 0.317 | 0.046 | 0.119 | 0.193 |
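The numbers in Table 10 are per-image averages. A minimal harness for producing such figures might look like the following sketch; the measurement protocol (device synchronization, dataset iteration, no warm-up) is an assumption, not the paper's actual setup.

```python
import time
import torch

@torch.no_grad()
def average_runtime(model, pairs, device="cuda"):
    """Average per-pair fusion time in seconds over an iterable of
    (infrared, visible) tensor pairs."""
    model.eval().to(device)
    total = 0.0
    for ir, vis in pairs:
        ir, vis = ir.to(device), vis.to(device)
        if device == "cuda":
            torch.cuda.synchronize()   # make sure prior GPU work is done
        t0 = time.perf_counter()
        model(ir, vis)
        if device == "cuda":
            torch.cuda.synchronize()   # wait for the fusion kernels to finish
        total += time.perf_counter() - t0
    return total / len(pairs)
```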
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).