Lightweight Cross-Modal Information Mutual Reinforcement Network for RGB-T Salient Object Detection
Figure 1. Some examples of RGB-T datasets. (a) Ours. (b) PCNet. (c) TAGF.
Figure 2. Overall architecture of our lightweight cross-modal information mutual reinforcement network for RGB-T salient object detection. ‘E1∼E5’ are the five modules of the encoder. ‘TDec’ and ‘RDec’ are the decoder modules of the auxiliary decoder. ‘CMIMR’ is the cross-modal information mutual reinforcement module. ‘SIGF’ is the semantic-information-guided fusion module.
Figure 3. Architecture of the cross-modal information mutual reinforcement (CMIMR) module. ‘Conv 1×1’ is the 1×1 convolution. ‘SA’ is the spatial attention. ‘DSConv 3×3’ is the depth-separable convolution with the 3×3 convolution kernel.
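The ‘DSConv 3×3’ block factors a standard convolution into a per-channel (depthwise) 3×3 convolution followed by a 1×1 pointwise convolution, which is where much of the parameter saving of a lightweight design comes from. A minimal pure-Python sketch; stride 1, zero padding, and the tiny nested-list tensors are illustrative assumptions, not the paper's actual layer shapes:

```python
# Depth-separable convolution = depthwise 3x3 + pointwise 1x1.
# Feature maps are nested lists shaped [C][H][W]; weights are assumptions.

def depthwise_conv3x3(x, kernels):
    """Apply one 3x3 kernel per channel (no channel mixing); zero padding."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                s = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < H and 0 <= jj < W:
                            s += x[c][ii][jj] * kernels[c][di + 1][dj + 1]
                out[c][i][j] = s
    return out

def pointwise_conv1x1(x, weights):
    """weights: [C_out][C_in]; mixes channels at each spatial position."""
    C_in, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weights[o][c] * x[c][i][j] for c in range(C_in))
              for j in range(W)] for i in range(H)] for o in range(len(weights))]

def ds_conv(x, dw_kernels, pw_weights):
    """Depth-separable convolution: depthwise filtering, then channel mixing."""
    return pointwise_conv1x1(depthwise_conv3x3(x, dw_kernels), pw_weights)
```

For C input and C′ output channels this factorization needs 9C + CC′ weights instead of the 9CC′ of a full 3×3 convolution, roughly a 9× reduction when C′ is large.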
Figure 4. Architecture of the semantic-information-guided fusion (SIGF) module. ‘DSConv 3×3’ is the depth-separable convolution with the 3×3 convolution kernel. ‘VAB’ is the visual attention block. ‘Up×2’ is the two-times upsample.
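The ‘Up×2’ operation doubles the spatial resolution so that a deeper (more semantic) decoder feature can be aligned with and guide the next, shallower stage. A minimal sketch; nearest-neighbour interpolation is an assumption here, since the caption does not specify the interpolation mode:

```python
def upsample2x(x):
    """Two-times nearest-neighbour upsample of a single [H][W] map:
    each value is repeated 2x horizontally and each row 2x vertically."""
    out = []
    for row in x:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out
```

Applying it per channel turns an H×W decoder feature into a 2H×2W one that matches the resolution of the preceding encoder stage.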
Figure 5. PR curves and F-measure curves of the compared methods on the RGB-T datasets.
Figure 6. Visual comparisons with other methods. (a) Ours. (b) ADF. (c) MIDD. (d) MMNet. (e) MIADPD. (f) OSRNet. (g) ECFFNet. (h) PCNet. (i) TAGF. (j) UMINet. (k) APNet.
Figure 7. Visual comparisons for the ablation experiments on the effectiveness of the CMIMR module. (a) Ours. (b) w/o CMIMR. (c) w/o PDFE. (d) w/o IMR.
Figure 8. Visual comparisons for the ablation experiments on the effectiveness of the SIGF module. (a) Ours. (b) w/o SIGF. (c) w/o SIE. (d) w/o VAB.
Figure 9. Visual comparisons for the ablation experiments on the effectiveness of the IoU loss and auxiliary decoder. (a) Ours. (b) w/o IoU. (c) w/o AD.
Abstract
1. Introduction
- We propose a lightweight cross-modal information mutual reinforcement network for RGB-T salient object detection. Our network comprises a lightweight encoder, the cross-modal information mutual reinforcement (CMIMR) module, and the semantic-information-guided fusion (SIGF) module.
- To fuse the complementary information between the two modalities, we introduce the CMIMR module, which mutually refines the two-modal features.
- Extensive experiments conducted on three RGB-T datasets demonstrate the effectiveness of our method.
2. Related Work
Salient Object Detection
3. Methodology
3.1. Architecture Overview
3.2. Cross-Modal Information Mutual Reinforcement Module
3.3. Semantic-Information-Guided Fusion Module
3.4. Loss Function
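Consistent with the ablation study in Section 4.4.3 (the ‘w/o IoU’ variant) and the cited cross-entropy and IoU-loss references, the training objective combines a binary cross-entropy term with an IoU term. A minimal pure-Python sketch over flattened saliency maps; the equal weighting of the two terms is an illustrative assumption, not necessarily the paper's setting:

```python
import math

def bce_loss(pred, gt, eps=1e-7):
    """Binary cross-entropy averaged over pixels; pred values in (0, 1)."""
    n = len(pred)
    return -sum(g * math.log(p + eps) + (1 - g) * math.log(1 - p + eps)
                for p, g in zip(pred, gt)) / n

def iou_loss(pred, gt, eps=1e-7):
    """Soft IoU loss: 1 - intersection/union over the predicted saliency map."""
    inter = sum(p * g for p, g in zip(pred, gt))
    union = sum(p + g - p * g for p, g in zip(pred, gt))
    return 1.0 - (inter + eps) / (union + eps)

def hybrid_loss(pred, gt):
    # Equal weighting of the two terms is an assumption for illustration.
    return bce_loss(pred, gt) + iou_loss(pred, gt)
```

BCE supervises every pixel independently, while the IoU term penalizes the map as a region, which is why removing it (‘w/o IoU’) mainly hurts whole-object completeness.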
4. Experiments
4.1. Experiment Settings
4.1.1. Datasets
4.1.2. Implementation Details
4.2. Evaluation Metrics
4.2.1.
4.2.2.
4.2.3.
4.2.4.
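The cited evaluation references (Achanta et al. for the F-measure; Fan et al. for the S-measure and E-measure) indicate the four standard saliency metrics. The two simplest, MAE and the F-measure, can be sketched in pure Python; the 0.5 binarization threshold and β² = 0.3 follow common practice in this literature and are assumptions here:

```python
def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth."""
    n = len(pred)
    return sum(abs(p - g) for p, g in zip(pred, gt)) / n

def f_measure(pred, gt, thresh=0.5, beta2=0.3, eps=1e-7):
    """F-measure at a fixed threshold; beta2 = 0.3 emphasizes precision."""
    binary = [1.0 if p >= thresh else 0.0 for p in pred]
    tp = sum(b * g for b, g in zip(binary, gt))          # true positives
    precision = tp / (sum(binary) + eps)
    recall = tp / (sum(gt) + eps)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
```

The S-measure and E-measure additionally score structural and alignment similarity and are defined in the corresponding references.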
4.3. Comparisons with the SOTA Methods
4.3.1. Quantitative Comparison
4.3.2. Qualitative Comparison
4.4. Ablation Study
4.4.1. Effectiveness of Cross-Modal Information Mutual Reinforcement Module
4.4.2. Effectiveness of Semantic-Information-Guided Fusion Module
4.4.3. Effectiveness of Hybrid Loss and Auxiliary Decoder
4.5. Scalability on RGB-D Datasets
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Liu, H.; Ma, M.; Wang, M.; Chen, Z.; Zhao, Y. SCFusion: Infrared and Visible Fusion Based on Salient Compensation. Entropy 2023, 25, 985. [Google Scholar] [CrossRef]
- Cui, X.; Peng, Z.; Jiang, G.; Chen, F.; Yu, M. Perceptual Video Coding Scheme Using Just Noticeable Distortion Model Based on Entropy Filter. Entropy 2019, 21, 1095. [Google Scholar] [CrossRef]
- Wang, W.; Wang, J.; Chen, J. Adaptive Block-Based Compressed Video Sensing Based on Saliency Detection and Side Information. Entropy 2021, 23, 1184. [Google Scholar] [CrossRef] [PubMed]
- Guan, X.; He, L.; Li, M.; Li, F. Entropy Based Data Expansion Method for Blind Image Quality Assessment. Entropy 2020, 22, 60. [Google Scholar] [CrossRef] [PubMed]
- Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
- Liu, J.J.; Hou, Q.; Cheng, M.M.; Feng, J.; Jiang, J. A simple pooling-based design for real-time salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3917–3926. [Google Scholar]
- Pang, Y.; Zhao, X.; Zhang, L.; Lu, H. Multi-scale interactive network for salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9413–9422. [Google Scholar]
- Zhao, J.X.; Liu, J.J.; Fan, D.P.; Cao, Y.; Yang, J.; Cheng, M.M. EGNet: Edge guidance network for salient object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8779–8788. [Google Scholar]
- Zhou, X.; Shen, K.; Liu, Z.; Gong, C.; Zhang, J.; Yan, C. Edge-aware multiscale feature integration network for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5605315. [Google Scholar] [CrossRef]
- Fan, D.P.; Zhai, Y.; Borji, A.; Yang, J.; Shao, L. BBS-Net: RGB-D salient object detection with a bifurcated backbone strategy network. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 275–292. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Computer Vision—ECCV 2018, Proceedings of the 15th European Conference, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 3–19. [Google Scholar]
- Tu, Z.; Ma, Y.; Li, Z.; Li, C.; Xu, J.; Liu, Y. RGBT salient object detection: A large-scale dataset and benchmark. IEEE Trans. Multimed. 2022, 25, 4163–4176. [Google Scholar] [CrossRef]
- Huo, F.; Zhu, X.; Zhang, L.; Liu, Q.; Shu, Y. Efficient Context-Guided Stacked Refinement Network for RGB-T Salient Object Detection. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 3111–3124. [Google Scholar] [CrossRef]
- Wu, R.; Bi, H.; Zhang, C.; Zhang, J.; Tong, Y.; Jin, W.; Liu, Z. Pyramid contract-based network for RGB-T salient object detection. Multimed. Tools Appl. 2023, 1–21. [Google Scholar] [CrossRef]
- Wang, H.; Song, K.; Huang, L.; Wen, H.; Yan, Y. Thermal images-aware guided early fusion network for cross-illumination RGB-T salient object detection. Eng. Appl. Artif. Intell. 2023, 118, 105640. [Google Scholar] [CrossRef]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Wu, Z.; Su, L.; Huang, Q. Cascaded partial decoder for fast and accurate salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3907–3916. [Google Scholar]
- Guo, M.H.; Lu, C.Z.; Liu, Z.N.; Cheng, M.M.; Hu, S.M. Visual attention network. Comput. Vis. Media 2023, 9, 733–752. [Google Scholar] [CrossRef]
- Gupta, A.K.; Seal, A.; Prasad, M.; Khanna, P. Salient Object Detection Techniques in Computer Vision—A Survey. Entropy 2020, 22, 1174. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y.; Chen, F.; Peng, Z.; Zou, W.; Zhang, C. Exploring Focus and Depth-Induced Saliency Detection for Light Field. Entropy 2023, 25, 1336. [Google Scholar] [CrossRef]
- Zhou, X.; Fang, H.; Liu, Z.; Zheng, B.; Sun, Y.; Zhang, J.; Yan, C. Dense attention-guided cascaded network for salient object detection of strip steel surface defects. IEEE Trans. Instrum. Meas. 2021, 71, 5004914. [Google Scholar] [CrossRef]
- Itti, L.; Koch, C.; Niebur, E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 1254–1259. [Google Scholar] [CrossRef]
- Liu, S.; Huang, D. Receptive field block net for accurate and fast object detection. In Computer Vision—ECCV 2018, Proceedings of the 15th European Conference, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 385–400. [Google Scholar]
- Zhou, X.; Shen, K.; Weng, L.; Cong, R.; Zheng, B.; Zhang, J.; Yan, C. Edge-guided recurrent positioning network for salient object detection in optical remote sensing images. IEEE Trans. Cybern. 2022, 53, 539–552. [Google Scholar] [CrossRef]
- Qin, X.; Zhang, Z.; Huang, C.; Gao, C.; Dehghan, M.; Jagersand, M. Basnet: Boundary-aware salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7479–7489. [Google Scholar]
- Li, G.; Liu, Z.; Zhang, X.; Lin, W. Lightweight salient object detection in optical remote-sensing images via semantic matching and edge alignment. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5601111. [Google Scholar] [CrossRef]
- Li, G.; Liu, Z.; Bai, Z.; Lin, W.; Ling, H. Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5617712. [Google Scholar] [CrossRef]
- Liu, N.; Zhang, N.; Han, J. Learning selective self-mutual attention for RGB-D saliency detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13756–13765. [Google Scholar]
- Li, G.; Liu, Z.; Ling, H. ICNet: Information conversion network for RGB-D based salient object detection. IEEE Trans. Image Process. 2020, 29, 4873–4884. [Google Scholar] [CrossRef]
- Wen, H.; Yan, C.; Zhou, X.; Cong, R.; Sun, Y.; Zheng, B.; Zhang, J.; Bao, Y.; Ding, G. Dynamic selective network for RGB-D salient object detection. IEEE Trans. Image Process. 2021, 30, 9179–9192. [Google Scholar] [CrossRef] [PubMed]
- Bi, H.; Wu, R.; Liu, Z.; Zhu, H.; Zhang, C.; Xiang, T.Z. Cross-modal hierarchical interaction network for RGB-D salient object detection. Pattern Recognit. 2023, 136, 109194. [Google Scholar] [CrossRef]
- Chen, T.; Hu, X.; Xiao, J.; Zhang, G.; Wang, S. CFIDNet: Cascaded feature interaction decoder for RGB-D salient object detection. Neural Comput. Appl. 2022, 34, 7547–7563. [Google Scholar] [CrossRef]
- Chen, H.; Deng, Y.; Li, Y.; Hung, T.Y.; Lin, G. RGBD salient object detection via disentangled cross-modal fusion. IEEE Trans. Image Process. 2020, 29, 8407–8416. [Google Scholar] [CrossRef]
- Wu, Z.; Allibert, G.; Meriaudeau, F.; Ma, C.; Demonceaux, C. Hidanet: Rgb-d salient object detection via hierarchical depth awareness. IEEE Trans. Image Process. 2023, 32, 2160–2173. [Google Scholar] [CrossRef]
- Jin, X.; Yi, K.; Xu, J. MoADNet: Mobile asymmetric dual-stream networks for real-time and lightweight RGB-D salient object detection. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7632–7645. [Google Scholar] [CrossRef]
- Wan, B.; Lv, C.; Zhou, X.; Sun, Y.; Zhu, Z.; Wang, H.; Yan, C. TMNet: Triple-modal interaction encoder and multi-scale fusion decoder network for V-D-T salient object detection. Pattern Recognit. 2024, 147, 110074. [Google Scholar] [CrossRef]
- Wan, B.; Zhou, X.; Sun, Y.; Wang, T.; Lv, C.; Wang, S.; Yin, H.; Yan, C. MFFNet: Multi-modal Feature Fusion Network for V-D-T Salient Object Detection. IEEE Trans. Multimed. 2023, 26, 2069–2081. [Google Scholar] [CrossRef]
- Zhang, Q.; Xiao, T.; Huang, N.; Zhang, D.; Han, J. Revisiting feature fusion for RGB-T salient object detection. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 1804–1818. [Google Scholar] [CrossRef]
- Gao, W.; Liao, G.; Ma, S.; Li, G.; Liang, Y.; Lin, W. Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 2091–2106. [Google Scholar] [CrossRef]
- Liang, Y.; Qin, G.; Sun, M.; Qin, J.; Yan, J.; Zhang, Z. Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection. Neurocomputing 2022, 490, 132–145. [Google Scholar] [CrossRef]
- Zhou, W.; Guo, Q.; Lei, J.; Yu, L.; Hwang, J.N. ECFFNet: Effective and consistent feature fusion network for RGB-T salient object detection. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1224–1235. [Google Scholar] [CrossRef]
- Cong, R.; Zhang, K.; Zhang, C.; Zheng, F.; Zhao, Y.; Huang, Q.; Kwong, S. Does thermal really always matter for RGB-T salient object detection? IEEE Trans. Multimed. 2022, 25, 6971–6982. [Google Scholar] [CrossRef]
- Chen, G.; Shao, F.; Chai, X.; Chen, H.; Jiang, Q.; Meng, X.; Ho, Y.S. CGMDRNet: Cross-guided modality difference reduction network for RGB-T salient object detection. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6308–6323. [Google Scholar] [CrossRef]
- Ma, S.; Song, K.; Dong, H.; Tian, H.; Yan, Y. Modal complementary fusion network for RGB-T salient object detection. Appl. Intell. 2023, 53, 9038–9055. [Google Scholar] [CrossRef]
- Tu, Z.; Li, Z.; Li, C.; Lang, Y.; Tang, J. Multi-Interactive dual-decoder for RGB-Thermal salient object detection. IEEE Trans. Image Process. 2021, 30, 5678–5691. [Google Scholar] [CrossRef] [PubMed]
- Zhou, W.; Zhu, Y.; Lei, J.; Yang, R.; Yu, L. LSNet: Lightweight spatial boosting network for detecting salient objects in RGB-thermal images. IEEE Trans. Image Process. 2023, 32, 1329–1340. [Google Scholar] [CrossRef] [PubMed]
- Zhou, T.; Fu, H.; Chen, G.; Zhou, Y.; Fan, D.P.; Shao, L. Specificity-preserving rgb-d saliency detection. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 4681–4691. [Google Scholar]
- Chen, L.; Zhang, H.; Xiao, J.; Nie, L.; Shao, J.; Liu, W.; Chua, T.S. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5659–5667. [Google Scholar]
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pvt v2: Improved baselines with pyramid vision transformer. Comput. Vis. Media 2022, 8, 415–424. [Google Scholar] [CrossRef]
- Hou, Q.; Cheng, M.M.; Hu, X.; Borji, A.; Tu, Z.; Torr, P.H. Deeply supervised salient object detection with short connections. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3203–3212. [Google Scholar]
- De Boer, P.T.; Kroese, D.P.; Mannor, S.; Rubinstein, R.Y. A tutorial on the cross-entropy method. Ann. Oper. Res. 2005, 134, 19–67. [Google Scholar] [CrossRef]
- Máttyus, G.; Luo, W.; Urtasun, R. Deeproadmapper: Extracting road topology from aerial images. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3438–3446. [Google Scholar]
- Wang, G.; Li, C.; Ma, Y.; Zheng, A.; Tang, J.; Luo, B. RGB-T saliency detection benchmark: Dataset, baselines, analysis and a novel approach. In Image and Graphics Technologies, Proceedings of the 13th Conference on Image and Graphics Technologies and Applications, IGTA 2018, Beijing, China, 8–10 April 2018; Springer: Singapore, 2018; pp. 359–369. [Google Scholar]
- Tu, Z.; Xia, T.; Li, C.; Wang, X.; Ma, Y.; Tang, J. RGB-T image saliency detection via collaborative graph learning. IEEE Trans. Multimed. 2019, 22, 160–173. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
- Achanta, R.; Hemami, S.; Estrada, F.; Susstrunk, S. Frequency-tuned salient region detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1597–1604. [Google Scholar]
- Fan, D.P.; Gong, C.; Cao, Y.; Ren, B.; Cheng, M.M.; Borji, A. Enhanced-alignment measure for binary foreground map evaluation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018; pp. 698–704. [Google Scholar]
- Fan, D.P.; Cheng, M.M.; Liu, Y.; Li, T.; Borji, A. Structure-measure: A new way to evaluate foreground maps. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4548–4557. [Google Scholar]
- Huo, F.; Zhu, X.; Zhang, Q.; Liu, Z.; Yu, W. Real-time one-stream semantic-guided refinement network for RGB-thermal salient object detection. IEEE Trans. Instrum. Meas. 2022, 71, 2512512. [Google Scholar] [CrossRef]
- Gao, L.; Fu, P.; Xu, M.; Wang, T.; Liu, B. UMINet: A unified multi-modality interaction network for RGB-D and RGB-T salient object detection. Vis. Comput. 2023, 1–18. [Google Scholar] [CrossRef]
- Song, K.; Huang, L.; Gong, A.; Yan, Y. Multiple graph affinity interactive network and a variable illumination dataset for RGBT image salient object detection. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 3104–3118. [Google Scholar] [CrossRef]
- Zhou, W.; Zhu, Y.; Lei, J.; Wan, J.; Yu, L. APNet: Adversarial learning assistance and perceived importance fusion network for all-day RGB-T salient object detection. IEEE Trans. Emerg. Top. Comput. Intell. 2021, 6, 957–968. [Google Scholar] [CrossRef]
- Wang, J.; Song, K.; Bao, Y.; Huang, L.; Yan, Y. CGFNet: Cross-Guided Fusion Network for RGB-T Salient Object Detection. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 2949–2961. [Google Scholar] [CrossRef]
- Fan, D.P.; Lin, Z.; Zhang, Z.; Zhu, M.; Cheng, M.M. Rethinking RGB-D salient object detection: Models, data sets, and large-scale benchmarks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 2075–2089. [Google Scholar] [CrossRef] [PubMed]
- Ju, R.; Ge, L.; Geng, W.; Ren, T.; Wu, G. Depth saliency based on anisotropic center-surround difference. In Proceedings of the IEEE International Conference on Image Processing, Paris, France, 27–30 October 2014; pp. 1115–1119. [Google Scholar]
- Peng, H.; Li, B.; Xiong, W.; Hu, W.; Ji, R. Rgbd salient object detection: A benchmark and algorithms. In Computer Vision—ECCV 2014, Proceedings of the 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 92–109. [Google Scholar]
- Niu, Y.; Geng, Y.; Li, X.; Liu, F. Leveraging stereopsis for saliency analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 454–461. [Google Scholar]
- Piao, Y.; Ji, W.; Li, J.; Zhang, M.; Lu, H. Depth-induced multi-scale recurrent attention network for saliency detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7254–7263. [Google Scholar]
- Wang, N.; Gong, X. Adaptive fusion for RGB-D salient object detection. IEEE Access 2019, 7, 55277–55284. [Google Scholar] [CrossRef]
- Bi, H.; Wu, R.; Liu, Z.; Zhang, J.; Zhang, C.; Xiang, T.Z.; Wang, X. PSNet: Parallel symmetric network for RGB-T salient object detection. Neurocomputing 2022, 511, 410–425. [Google Scholar] [CrossRef]
- Zhao, X.; Zhang, L.; Pang, Y.; Lu, H.; Zhang, L. A single stream network for robust and real-time RGB-D salient object detection. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 646–662. [Google Scholar]
| Type | Method | Pub. | Param (M) ↓ | FLOPs (G) ↓ | FPS (CPU) ↑ | FPS (GPU) ↑ | VT5000 MAE ↓ | VT5000 Fβ ↑ | VT5000 Sα ↑ | VT5000 Eξ ↑ | VT1000 MAE ↓ | VT1000 Fβ ↑ | VT1000 Sα ↑ | VT1000 Eξ ↑ | VT821 MAE ↓ | VT821 Fβ ↑ | VT821 Sα ↑ | VT821 Eξ ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RGB | BASNet | CVPR19 | 87.1 | 127.6 | 0.94 | 73.0 | 0.0542 | 0.762 | 0.8386 | 0.878 | 0.0305 | 0.8449 | 0.9086 | 0.9223 | 0.0673 | 0.7335 | 0.8228 | 0.8556 |
| RGB | EGNet | ICCV19 | 108.0 | 156.8 | 0.93 | 95.1 | 0.0511 | 0.7741 | 0.853 | 0.8886 | 0.0329 | 0.8474 | 0.9097 | 0.923 | 0.0637 | 0.7255 | 0.8301 | 0.8581 |
| RGB | CPD | CVPR19 | 47.9 | 17.8 | 3.97 | 38.2 | 0.0465 | 0.7859 | 0.8547 | 0.8964 | 0.0312 | 0.8617 | 0.9072 | 0.9308 | 0.0795 | 0.7173 | 0.8184 | 0.8474 |
| RGB-T | ADF | TMM22 | − | − | − | − | 0.0483 | 0.7775 | 0.8635 | 0.891 | 0.034 | 0.8458 | 0.9094 | 0.9222 | 0.0766 | 0.7159 | 0.8102 | 0.8443 |
| RGB-T | MIDD | TIP21 | 52.4 | 216.7 | 1.56 | 36.5 | 0.0461 | 0.7876 | 0.8561 | 0.8926 | 0.0293 | 0.8695 | 0.9069 | 0.9353 | 0.0446 | 0.8032 | 0.8712 | 0.8974 |
| RGB-T | MMNet | TCSVT21 | 64.1 | 42.5 | 1.79 | 31.1 | 0.0433 | 0.7809 | 0.8618 | 0.8894 | 0.0268 | 0.8626 | 0.9133 | 0.932 | 0.0397 | 0.7949 | 0.8731 | 0.8944 |
| RGB-T | MIADPD | NP22 | − | − | − | − | 0.0404 | 0.7925 | 0.8786 | 0.8968 | 0.0251 | 0.8674 | 0.9237 | 0.936 | 0.0699 | 0.7398 | 0.8444 | 0.8529 |
| RGB-T | OSRNet | TIM22 | 15.6 | 42.4 | 2.29 | 63.1 | 0.0399 | 0.8207 | 0.8752 | 0.9108 | 0.0221 | 0.8896 | 0.9258 | 0.9491 | 0.0426 | 0.8114 | 0.8751 | 0.9 |
| RGB-T | ECFFNet | TCSVT21 | − | − | − | − | 0.0376 | 0.8083 | 0.8736 | 0.9123 | 0.0214 | 0.8778 | 0.9224 | 0.9482 | 0.0344 | 0.8117 | 0.8761 | 0.9088 |
| RGB-T | PCNet | MTA23 | − | − | − | − | 0.0363 | 0.829 | 0.8749 | 0.9188 | 0.021 | 0.8865 | 0.932 | 0.9482 | 0.0362 | 0.8193 | 0.8734 | 0.9005 |
| RGB-T | TAGF | EAAI23 | 36.2 | 115.1 | 0.87 | 33.1 | 0.0359 | 0.8256 | 0.8836 | 0.9162 | 0.0211 | 0.8879 | 0.9264 | 0.9508 | 0.0346 | 0.8205 | 0.8805 | 0.9091 |
| RGB-T | UMINet | VC23 | − | − | − | − | 0.0354 | 0.8293 | 0.882 | 0.922 | 0.0212 | 0.8906 | 0.926 | 0.9561 | 0.0542 | 0.7891 | 0.8583 | 0.8866 |
| RGB-T | APNet | TETCI21 | 30.4 | 46.6 | 0.99 | 36.9 | 0.0345 | 0.8221 | 0.8751 | 0.9182 | 0.0213 | 0.8848 | 0.9204 | 0.9515 | 0.0341 | 0.8181 | 0.8669 | 0.9121 |
| RGB-T | Our | − | 6.1 | 1.5 | 6.5 | 34.9 | 0.0321 | 0.8463 | 0.8795 | 0.932 | 0.0205 | 0.9016 | 0.9229 | 0.9608 | 0.0311 | 0.841 | 0.8776 | 0.9262 |
| Method | Pub. | Param (M) ↓ | FLOPs (G) ↓ | FPS (CPU) ↑ | FPS (GPU) ↑ | VT5000 MAE ↓ | VT5000 Fβ ↑ | VT5000 Sα ↑ | VT5000 Eξ ↑ | VT1000 MAE ↓ | VT1000 Fβ ↑ | VT1000 Sα ↑ | VT1000 Eξ ↑ | VT821 MAE ↓ | VT821 Fβ ↑ | VT821 Sα ↑ | VT821 Eξ ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CSRNet | TCSVT21 | 1.0 | 4.4 | 2.7 | 24.8 | 0.0417 | 0.8093 | 0.8678 | 0.9068 | 0.0242 | 0.8751 | 0.9184 | 0.9393 | 0.0376 | 0.8289 | 0.8847 | 0.9116 |
| LSNet | TIP23 | 4.6 | 1.2 | 11.6 | 51.1 | 0.0367 | 0.8269 | 0.8764 | 0.9206 | 0.0224 | 0.8874 | 0.9244 | 0.9528 | 0.0329 | 0.8276 | 0.8777 | 0.9179 |
| Our | − | 6.1 | 1.5 | 6.5 | 34.9 | 0.0321 | 0.8463 | 0.8795 | 0.932 | 0.0205 | 0.9016 | 0.9229 | 0.9608 | 0.0311 | 0.841 | 0.8776 | 0.9262 |
| Method | VT5000 ↑ | VT5000 ↑ | VT1000 ↑ | VT1000 ↑ | VT821 ↑ | VT821 ↑ |
|---|---|---|---|---|---|---|
| LSNet | 0.7609 | 0.8411 | 0.8627 | 0.9137 | 0.7665 | 0.8393 |
| Our | 0.7721 | 0.8531 | 0.865 | 0.916 | 0.7684 | 0.8439 |
| | 0.7728 | 0.8531 | 0.863 | 0.9149 | 0.7676 | 0.8424 |
| | 0.7718 | 0.852 | 0.8649 | 0.9161 | 0.7608 | 0.8357 |
| | 0.7738 | 0.8538 | 0.8632 | 0.9151 | 0.7669 | 0.8416 |
| | 0.771 | 0.8519 | 0.8629 | 0.9141 | 0.7685 | 0.8432 |
| | 0.7703 | 0.8512 | 0.8624 | 0.9135 | 0.765 | 0.8398 |
| p-value | 1.9 | 4.7 | 0.0562 | 0.0154 | 0.5938 | 1.1 |
| No. | VT5000 MAE ↓ | VT5000 Fβ ↑ | VT5000 Sα ↑ | VT5000 Eξ ↑ | VT1000 MAE ↓ | VT1000 Fβ ↑ | VT1000 Sα ↑ | VT1000 Eξ ↑ | VT821 MAE ↓ | VT821 Fβ ↑ | VT821 Sα ↑ | VT821 Eξ ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.0321 | 0.8463 | 0.8795 | 0.932 | 0.0205 | 0.9016 | 0.9229 | 0.9608 | 0.0311 | 0.841 | 0.8776 | 0.9262 |
2 | 0.0325 | 0.843 | 0.8797 | 0.9311 | 0.0205 | 0.8978 | 0.9215 | 0.9589 | 0.0312 | 0.8385 | 0.8764 | 0.9251 |
3 | 0.0322 | 0.8451 | 0.8797 | 0.9318 | 0.0199 | 0.9004 | 0.9232 | 0.9608 | 0.032 | 0.8384 | 0.8735 | 0.9222 |
4 | 0.0324 | 0.8436 | 0.88 | 0.9319 | 0.0203 | 0.8973 | 0.9216 | 0.9591 | 0.0316 | 0.8369 | 0.8761 | 0.9244 |
5 | 0.0331 | 0.8401 | 0.8786 | 0.9299 | 0.0205 | 0.8972 | 0.9214 | 0.9597 | 0.0311 | 0.8361 | 0.8773 | 0.9242 |
6 | 0.0332 | 0.8407 | 0.8781 | 0.93 | 0.0205 | 0.8981 | 0.9214 | 0.9595 | 0.031 | 0.8369 | 0.8753 | 0.9242 |
| Compared Method | VT5000 MAE | VT5000 Fβ | VT5000 Sα | VT5000 Eξ | VT1000 MAE | VT1000 Fβ | VT1000 Sα | VT1000 Eξ | VT821 MAE | VT821 Fβ | VT821 Sα | VT821 Eξ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
BASNet | 4.8 × | 2.5 × | 2.2 × | 2.1 × | 8.4 × | 4.8 × | 9.3 × | 5.5 × | 1.6 × | 1.4 × | 1.9 × | 2.7 × |
EGNet | 1.0 × | 5.7 × | 2.0 × | 6.2 × | 2.9 × | 6.1 × | 1.4 × | 6.1 × | 2.7 × | 1.0 × | 3.9 × | 3.3 × |
CPD | 4.3 × | 1.4 × | 2.8 × | 1.7 × | 6.0 × | 3.1 × | 5.7 × | 2.0 × | 3.7 × | 7.0 × | 1.3 × | 1.6 × |
ADF | 2.4 × | 7.3 × | 2.5 × | 8.3 × | 1.9 × | 5.2 × | 1.3 × | 5.5 × | 5.0 × | 6.6 × | 6.5 × | 1.3 × |
MIDD | 5.0 × | 1.7 × | 3.7 × | 1.0 × | 1.6 × | 1.0 × | 5.1 × | 4.6 × | 2.3 × | 3.5 × | 0.0003 | 2.9 × |
MMNet | 1.6 × | 9.5 × | 1.5 × | 6.9 × | 8.1 × | 3.5 × | 8.0 × | 2.5 × | 2.3 × | 1.2 × | 0.0024 | 1.7 × |
MIADPD | 7.7 × | 2.7 × | 0.0399 | 1.8 × | 3.8 × | 7.2 × | 0.9980 | 5.4 × | 1.1 × | 2.0 × | 2.5 × | 2.3 × |
OSRNet | 1.1 × | 1.5 × | 2.1 × | 2.5 × | 5.5 × | 3.2 × | 0.9999 | 2.9 × | 5.2 × | 1.4 × | 0.0932 | 4.9 × |
ECFFNet | 7.0 × | 1.7 × | 4.1 × | 3.7 × | 6.9 × | 5.4 × | 0.8566 | 1.9 × | 3.4 × | 1.4 × | 0.5414 | 4.5 × |
PCNet | 3.1 × | 1.5 × | 1.5 × | 3.0 × | 0.0007 | 7.7 × | 1 | 1.9 × | 3.4 × | 7.8 × | 0.0038 | 5.4 × |
TAGF | 5.5 × | 5.2 × | 1.5 × | 1.2 × | 0.0004 | 1.4 × | 0.9999 | 6.8 × | 2.5 × | 1.1 × | 0.9996 | 5.0 × |
UMINet | 1.2 × | 1.7 × | 0.0001 | 1.4 × | 0.0002 | 5.6 × | 0.9999 | 5.4 × | 1.5 × | 6.4 × | 4.5 × | 5.5 × |
APNet | 7.9 × | 2.1 × | 1.9 × | 2.4 × | 0.0001 | 4.0 × | 0.0025 | 1.0 × | 5.6 × | 5.7 × | 1.2 × | 1.5 × |
CSRNet | 3.6 × | 2.0 × | 1.2 × | 1.0 × | 1.1 × | 2.9 × | 6.1 × | 1.1 × | 9.7 × | 2.8 × | 0.9999 | 1.2 × |
LSNet | 1.9 × | 7.6 × | 0.0001 | 6.6 × | 2.5 × | 1.1 × | 0.9996 | 2.4 × | 9.0 × | 1.4 × | 0.9794 | 3.4 × |
| Ablation Variant | VT5000 MAE ↓ | VT5000 Fβ ↑ | VT5000 Sα ↑ | VT5000 Eξ ↑ | VT1000 MAE ↓ | VT1000 Fβ ↑ | VT1000 Sα ↑ | VT1000 Eξ ↑ | VT821 MAE ↓ | VT821 Fβ ↑ | VT821 Sα ↑ | VT821 Eξ ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
w/o CMIMR | 0.0338 | 0.8321 | 0.8744 | 0.9274 | 0.0222 | 0.8881 | 0.9174 | 0.9556 | 0.0334 | 0.8249 | 0.8682 | 0.9163 |
w/o PDFE | 0.0328 | 0.8396 | 0.8762 | 0.9295 | 0.0211 | 0.8935 | 0.92 | 0.9571 | 0.033 | 0.8309 | 0.8693 | 0.9182 |
w/o IMR | 0.0331 | 0.8394 | 0.8777 | 0.9292 | 0.0208 | 0.8945 | 0.9203 | 0.9577 | 0.0321 | 0.8308 | 0.8712 | 0.9208 |
w ADF-TMF | 0.0329 | 0.8396 | 0.8778 | 0.9309 | 0.0208 | 0.8934 | 0.9189 | 0.9591 | 0.0314 | 0.8368 | 0.8766 | 0.9259 |
w/o SIGF | 0.0334 | 0.8366 | 0.8767 | 0.9287 | 0.0215 | 0.8853 | 0.9159 | 0.9541 | 0.0316 | 0.827 | 0.8747 | 0.9207 |
w/o SIE | 0.0327 | 0.8405 | 0.8784 | 0.93 | 0.0208 | 0.8927 | 0.9202 | 0.9571 | 0.0335 | 0.8308 | 0.8712 | 0.9201 |
w/o VAB | 0.033 | 0.8392 | 0.8771 | 0.9299 | 0.0208 | 0.894 | 0.9199 | 0.9572 | 0.0312 | 0.8327 | 0.8748 | 0.9229 |
w ADF-Decoder | 0.0328 | 0.8377 | 0.8783 | 0.9299 | 0.021 | 0.8941 | 0.9198 | 0.9582 | 0.0319 | 0.8354 | 0.8772 | 0.9238 |
w SIGF-FAM | 0.0328 | 0.8416 | 0.8795 | 0.9312 | 0.0205 | 0.8965 | 0.9215 | 0.9595 | 0.0316 | 0.8351 | 0.8775 | 0.9231 |
w SIGF-RFB | 0.0328 | 0.8411 | 0.8794 | 0.9302 | 0.0208 | 0.8966 | 0.9219 | 0.9584 | 0.0328 | 0.8354 | 0.8766 | 0.9221 |
w/o IoU | 0.0331 | 0.8344 | 0.8788 | 0.9276 | 0.0222 | 0.8828 | 0.9216 | 0.9488 | 0.0332 | 0.8259 | 0.8764 | 0.9165 |
| | 0.0327 | 0.8396 | 0.8847 | 0.9289 | 0.0211 | 0.8903 | 0.9269 | 0.9499 | 0.0304 | 0.8353 | 0.8872 | 0.9219 |
| | 0.0419 | 0.7967 | 0.8578 | 0.9065 | 0.0265 | 0.8727 | 0.9139 | 0.9403 | 0.0427 | 0.7716 | 0.8446 | 0.8914 |
| | 0.0461 | 0.7608 | 0.8389 | 0.8911 | 0.0354 | 0.8327 | 0.8864 | 0.9204 | 0.0518 | 0.745 | 0.8228 | 0.8751 |
+ + | 0.0402 | 0.7649 | 0.8774 | 0.8844 | 0.0276 | 0.844 | 0.9214 | 0.9216 | 0.0407 | 0.7677 | 0.8793 | 0.8802 |
w LPW | 0.0335 | 0.8316 | 0.8818 | 0.9255 | 0.0211 | 0.8861 | 0.9259 | 0.9493 | 0.0311 | 0.8296 | 0.8891 | 0.9199 |
w/o AD | 0.036 | 0.8294 | 0.8778 | 0.9228 | 0.0211 | 0.8902 | 0.9261 | 0.9522 | 0.0334 | 0.8277 | 0.8794 | 0.9198 |
RGB | 0.0419 | 0.8105 | 0.8616 | 0.9115 | 0.0257 | 0.8809 | 0.916 | 0.9467 | 0.0543 | 0.7638 | 0.8431 | 0.8939 |
T | 0.044 | 0.7766 | 0.8439 | 0.9007 | 0.0339 | 0.8444 | 0.8884 | 0.9286 | 0.0494 | 0.7595 | 0.8249 | 0.8853 |
Our | 0.0321 | 0.8463 | 0.8795 | 0.932 | 0.0205 | 0.9016 | 0.9229 | 0.9608 | 0.0311 | 0.841 | 0.8776 | 0.9262 |
| Ablation Variant | VT5000 MAE | VT5000 Fβ | VT5000 Sα | VT5000 Eξ | VT1000 MAE | VT1000 Fβ | VT1000 Sα | VT1000 Eξ | VT821 MAE | VT821 Fβ | VT821 Sα | VT821 Eξ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
w/o CMIMR | 0.0006 | 5.0 × | 8.6 × | 0.0001 | 4.2 × | 1.5 × | 1.8 × | 2.9 × | 2.4 × | 4.6 × | 2.5 × | 1.2 × |
w/o PDFE | 0.1514 | 0.0080 | 8.2 × | 0.0045 | 0.0004 | 0.0005 | 0.0010 | 0.0002 | 6.7 × | 9.2 × | 5.3 × | 4.3 × |
w/o IMR | 0.0204 | 0.0064 | 0.0018 | 0.0022 | 0.0036 | 0.0012 | 0.0019 | 0.0008 | 0.0024 | 8.6 × | 0.0003 | 0.0006 |
w ADF-TMF | 0.0771 | 0.0080 | 0.0024 | 0.3017 | 0.0036 | 0.0004 | 0.0001 | 0.0461 | 0.3457 | 0.0824 | 0.8023 | 0.9816 |
w/o SIGF | 0.0037 | 0.0006 | 0.0002 | 0.0008 | 4.4 × | 4.8 × | 4.6 × | 6.6 × | 0.0766 | 1.1 × | 0.0402 | 0.0005 |
w/o SIE | 0.2818 | 0.0223 | 0.0179 | 0.0178 | 0.0036 | 0.0002 | 0.0015 | 0.0002 | 1.9 × | 8.6 × | 0.0003 | 0.0002 |
w/o VAB | 0.0392 | 0.0052 | 0.0004 | 0.0133 | 0.0036 | 0.0007 | 0.0008 | 0.0003 | 0.7808 | 0.0004 | 0.0495 | 0.0199 |
w ADF-Decoder | 0.1514 | 0.0014 | 0.0123 | 0.0133 | 0.0007 | 0.0008 | 0.0006 | 0.0025 | 0.0080 | 0.0080 | 0.9431 | 0.1634 |
w SIGF-FAM | 0.1514 | 0.0906 | 0.7613 | 0.5802 | 0.1177 | 0.0151 | 0.0983 | 0.2068 | 0.0766 | 0.0052 | 0.9694 | 0.0312 |
w SIGF-RFB | 0.1514 | 0.0473 | 0.6604 | 0.0330 | 0.0036 | 0.0177 | 0.3889 | 0.0044 | 0.0001 | 0.0080 | 0.8023 | 0.0040 |
w/o IoU | 0.0204 | 0.0002 | 0.0927 | 0.0001 | 4.2 × | 2.1 × | 0.1434 | 2.5 × | 3.9 × | 6.8 × | 0.7131 | 1.3 × |
| | 0.2817 | 0.0080 | 0.9999 | 0.0012 | 0.0004 | 4.7 × | 0.9999 | 4.3 × | 0.999 | 0.0069 | 1 | 0.0029 |
| | 3.2 × | 4.1 × | 5.4 × | 9.6 × | 1.0 × | 1.8 × | 1.1 × | 1.5 × | 5.0 × | 1.4 × | 2.6 × | 1.1 × |
| | 5.0 × | 2.4 × | 2.3 × | 8.5 × | 1.2 × | 1.7 × | 7.1 × | 4.3 × | 2.6 × | 2.6 × | 1.9 × | 1.5 × |
+ + | 8.8 × | 3.0 × | 0.0008 | 3.9 × | 4.5 × | 4.4 × | 0.0669 | 5.0 × | 1.3 × | 1.1 × | 0.9988 | 2.5 × |
w LPW | 0.0023 | 4.0 × | 0.9998 | 1.5 × | 0.0004 | 6.5 × | 0.9999 | 3.2 × | 0.8996 | 4.1 × | 1 | 0.0002 |
w/o AD | 4.7 × | 1.7 × | 0.0024 | 2.1 × | 0.0004 | 4.5 × | 0.9999 | 1.6 × | 2.4 × | 1.5 × | 0.9987 | 0.0002 |
RGB | 3.2 × | 2.4 × | 1.4 × | 3.0 × | 2.1 × | 1.2 × | 5.0 × | 1.1 × | 1.5 × | 8.0 × | 2.1 × | 1.6 × |
T | 1.2 × | 6.8 × | 4.5 × | 3.3 × | 2.0 × | 4.6 × | 9.4 × | 1.4 × | 4.9 × | 6.1 × | 2.3 × | 4.6 × |
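The entries in the significance table above are p-values comparing each ablation variant against the full model on the three datasets; values below 0.05 indicate a statistically significant difference. A common way to obtain such p-values is a paired test over per-image metric scores. Below is a minimal sketch; the function name, the normal approximation to the t-distribution, and the input lists are illustrative assumptions, not the paper's actual procedure:

```python
import math
from statistics import mean, stdev

def paired_test(scores_a, scores_b):
    """Paired significance test on per-image metric scores of two model
    variants. Returns (t_statistic, two_sided_p). The p-value uses a
    normal approximation, which is adequate when the number of test
    images is large (hundreds, as in VT5000/VT1000/VT821)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample standard deviation of the differences
    t = d_mean / (d_sd / math.sqrt(n))
    # Two-sided tail probability under N(0, 1).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p
```

With hundreds of test images per dataset, a consistent per-image gap between two variants drives the p-value toward zero, which is how entries like those in the first column above arise.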
Dataset | Metric | S2MA | AFNet | ICNet | PSNet | DANet | DCMF | MoADNet | CFIDNet | HINet | LSNet | Our
---|---|---|---|---|---|---|---|---|---|---|---|---
NJU2K | ↓ | 0.0533 | 0.0533 | 0.052 | 0.0485 | 0.0464 | 0.0427 | 0.041 | 0.038 | 0.0387 | 0.0379 | 0.0367
 | ↑ | 0.8646 | 0.8672 | 0.8676 | 0.8659 | 0.8763 | 0.8804 | 0.8903 | 0.891 | 0.896 | 0.8998 | 0.901
 | ↑ | 0.8942 | 0.8801 | 0.8939 | 0.8898 | 0.8969 | 0.9125 | 0.9062 | 0.9141 | 0.9151 | 0.9107 | 0.9021
 | ↑ | 0.9163 | 0.9188 | 0.9127 | 0.9125 | 0.926 | 0.9246 | 0.9339 | 0.9289 | 0.9385 | 0.9401 | 0.9447
NLPR | ↓ | 0.03 | 0.033 | 0.0284 | 0.0287 | 0.0285 | 0.029 | 0.0274 | 0.0258 | 0.0259 | 0.0244 | 0.0242
 | ↑ | 0.8479 | 0.8203 | 0.865 | 0.8838 | 0.8662 | 0.849 | 0.8664 | 0.8803 | 0.8725 | 0.8824 | 0.8917
 | ↑ | 0.9145 | 0.8994 | 0.9215 | 0.9061 | 0.9137 | 0.921 | 0.9148 | 0.921 | 0.9212 | 0.9169 | 0.9136
 | ↑ | 0.9407 | 0.9306 | 0.9435 | 0.9457 | 0.9478 | 0.9381 | 0.9448 | 0.95 | 0.9491 | 0.9554 | 0.9564
DUT | ↓ | 0.044 | − | 0.0722 | − | 0.0467 | 0.0351 | 0.0313 | − | − | − | 0.0332
 | ↑ | 0.8847 | − | 0.8298 | − | 0.8836 | 0.9057 | 0.9214 | − | − | − | 0.9212
 | ↑ | 0.903 | − | 0.8524 | − | 0.8894 | 0.9279 | 0.9269 | − | − | − | 0.9154
 | ↑ | 0.9349 | − | 0.9012 | − | 0.929 | 0.9505 | 0.9589 | − | − | − | 0.9531
SIP | ↓ | − | − | 0.0697 | − | 0.054 | − | 0.0585 | 0.0603 | 0.0658 | 0.0492 | 0.0521
 | ↑ | − | − | 0.8334 | − | 0.8615 | − | 0.846 | 0.8565 | 0.8434 | 0.8819 | 0.8805
 | ↑ | − | − | 0.8527 | − | 0.8771 | − | 0.8648 | 0.8632 | 0.8552 | 0.8844 | 0.8709
 | ↑ | − | − | 0.899 | − | 0.9167 | − | 0.9102 | 0.9058 | 0.899 | 0.9271 | 0.9178
STERE1000 | ↓ | 0.0508 | 0.0472 | 0.0447 | 0.0521 | 0.0476 | 0.0427 | 0.0424 | 0.0427 | 0.049 | 0.0543 | 0.0439
 | ↑ | 0.8545 | 0.8718 | 0.8642 | 0.8522 | 0.8581 | 0.8659 | 0.8666 | 0.8789 | 0.8586 | 0.8542 | 0.874
 | ↑ | 0.8904 | 0.8914 | 0.9025 | 0.8678 | 0.8922 | 0.9097 | 0.8989 | 0.9012 | 0.8919 | 0.8707 | 0.8822
 | ↑ | 0.9254 | 0.9337 | 0.9256 | 0.9066 | 0.9263 | 0.9298 | 0.9343 | 0.9325 | 0.9273 | 0.9194 | 0.9364
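The ↓ rows in the comparison table above report mean absolute error (MAE) between the predicted saliency map and the ground-truth mask, the standard lower-is-better metric in salient object detection. A minimal sketch, assuming both maps are normalized to [0, 1] and share the same resolution (the function name and plain-list representation are illustrative):

```python
def mae(pred, gt):
    """Mean absolute error between a predicted saliency map and its
    ground-truth mask, both given as 2-D lists of values in [0, 1]."""
    total = 0.0
    count = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            total += abs(p - g)
            count += 1
    return total / count
```

In benchmark tables such as this one, the MAE is computed per image and then averaged over the whole test set, so differences in the third or fourth decimal place can still be consistent across hundreds of images.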
Our | S2MA | AFNet | ICNet | PSNet | DANet | DCMF | MoADNet | CFIDNet | HINet | LSNet | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NJU2K | ↓ | 0.0367 | 0.037 | 0.0363 | 0.0359 | 0.0361 | 0.0362 | 8.8 × | 8.8 × | 1.3 × | 4.6 × | 1.2 × | 1.2 × | 5.6 × | 9.4 × | 1.7 × | 0.0001 |
↑ | 0.901 | 0.9013 | 0.9013 | 0.9028 | 0.9034 | 0.9035 | 2.8 × | 4.0 × | 4.2 × | 3.3 × | 1.8 × | 4.3 × | 8.6 × | 1.2 × | 2.1 × | 0.0018 | |
↑ | 0.9021 | 0.9018 | 0.9027 | 0.9039 | 0.9034 | 0.9034 | 8.1 × | 6.6 × | 6.9 × | 1.1 × | 5.1 × | 1 | 0.9999 | 1 | 1 | 1 | |
↑ | 0.9447 | 0.9442 | 0.9447 | 0.9451 | 0.945 | 0.945 | 2.3 × | 3.6 × | 1.3 × | 1.2 × | 1.8 × | 1.3 × | 2.8 × | 4.2 × | 4.4 × | 1.9 × | |
NLPR | ↓ | 0.0242 | 0.0245 | 0.0247 | 0.0245 | 0.0243 | 0.0246 | 4.6 × | 5.3 × | 2.5 × | 1.8 × | 2.2 × | 1.3 × | 1.1 × | 5.5 × | 3.9 × | 0.7897 |
↑ | 0.8917 | 0.8888 | 0.8898 | 0.8922 | 0.8925 | 0.8927 | 7.4 × | 6.3 × | 9.1 × | 4.5 × | 1.1 × | 8.4 × | 1.2 × | 6.9 × | 4.8 × | 2.0 × | |
↑ | 0.9136 | 0.9119 | 0.9127 | 0.9129 | 0.913 | 0.9122 | 0.9996 | 2.1 × | 1 | 6.8 × | 0.9948 | 1 | 0.9998 | 1 | 1 | 0.9999 | |
↑ | 0.9564 | 0.9548 | 0.9551 | 0.9556 | 0.9561 | 0.9557 | 1.1 × | 8.4 × | 3.1 × | 8.5 × | 2.8 × | 5.0 × | 5.5 × | 1.4 × | 6.9 × | 0.2078 | |
DUT | ↓ | 0.0332 | 0.0331 | 0.0321 | 0.0324 | 0.0321 | 0.0326 | 1.4 × | - | 2.8 × | - | 4.8 × | 2.5 × | 0.9994 | - | - | - |
↑ | 0.9212 | 0.9192 | 0.9224 | 0.9214 | 0.9229 | 0.9205 | 6.8 × | - | 7.0 × | - | 5.9 × | 4.8 × | 0.5922 | - | - | - | |
↑ | 0.9154 | 0.9142 | 0.9156 | 0.9145 | 0.9156 | 0.9141 | 8.1 × | - | 2.0 × | - | 1.8 × | 1 | 1 | - | - | - | |
↑ | 0.9531 | 0.9546 | 0.9553 | 0.9544 | 0.9558 | 0.9545 | 2.4 × | - | 1.6 × | - | 6.4 × | 5.5 × | 0.9999 | - | - | - | |
SIP | ↓ | 0.0521 | 0.0507 | 0.0553 | 0.0536 | 0.0534 | 0.0542 | - | - | 9.6 × | - | 0.1443 | - | 0.0002 | 6.1 × | 3.7 × | 0.9991 |
↑ | 0.8805 | 0.8855 | 0.8759 | 0.8781 | 0.8798 | 0.8773 | - | - | 2.2 × | - | 2.3 × | - | 1.1 × | 7.0 × | 7.5 × | 0.9280 | |
↑ | 0.8709 | 0.8759 | 0.8661 | 0.8693 | 0.8697 | 0.868 | - | - | 2.7 × | - | 0.9983 | - | 0.0062 | 0.0021 | 5.7 × | 0.9999 | |
↑ | 0.9178 | 0.9211 | 0.9113 | 0.9155 | 0.915 | 0.9133 | - | - | 3.7 × | - | 0.7525 | - | 0.0058 | 0.0005 | 3.7 × | 0.9998 | |
STERE1000 | ↓ | 0.0439 | 0.0453 | 0.0443 | 0.0441 | 0.0445 | 0.0444 | 2.7 × | 1.6 × | 0.1052 | 1.1 × | 8.3 × | 0.9998 | 0.9999 | 0.9998 | 1.4 × | 3.0 × |
↑ | 0.874 | 0.8691 | 0.8728 | 0.8747 | 0.8758 | 0.877 | 6.1 × | 0.0608 | 0.0002 | 3.5 × | 1.7 × | 0.0004 | 0.0007 | 0.9966 | 1.9 × | 5.6 × | |
↑ | 0.8822 | 0.88 | 0.8807 | 0.8818 | 0.8809 | 0.8812 | 1 | 1 | 1 | 7.8 × | 1 | 1 | 1 | 1 | 1 | 2.6 × | |
↑ | 0.9364 | 0.9352 | 0.9353 | 0.9363 | 0.9359 | 0.9365 | 4.9 × | 0.0001 | 5.4 × | 2.9 × | 7.6 × | 7.2 × | 0.0004 | 1.3 × | 1.3 × | 5.1 × |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lv, C.; Wan, B.; Zhou, X.; Sun, Y.; Zhang, J.; Yan, C. Lightweight Cross-Modal Information Mutual Reinforcement Network for RGB-T Salient Object Detection. Entropy 2024, 26, 130. https://doi.org/10.3390/e26020130