Regulating Modality Utilization within Multimodal Fusion Networks
Figure 1. Computing modality utilization by randomly shuffling a modality $M_i$ within the dataset to break the association between the input modality $M_i$ and the output $Y$.
Figure 2. Modality utilization-based training targets the decision layers while using pre-trained feature extractors with frozen weights.
Figure 3. Visualization of the multimodal fusion network’s gradient descent on the loss surface of the fusion network task. Optimizing $L_{mu}$ from the very beginning can push the network into local minima.
Figure 4. Clipped exponential function-based loss factor warm-up for MU-based training.
Figure 5. NTIRE 2021 Multimodal Aerial View Object Classification Challenge Dataset [15].
Figure 6. NTIRE 2021 Multimodal Aerial View Object Classification Network with ResNet18 as the backbone.
Figure 7. Visualization of NTIRE21 dataset classification using the Multimodal Aerial View Object Classification Network.
Figure 8. MCubeS Multimodal Material Segmentation Dataset [16].
Figure 9. MCubeS Multimodal Material Segmentation Network with UNet as the backbone.
Figure 10. Visualization of MCubeS dataset image segmentation using the Multimodal Material Segmentation Network.
Figure 11. Effects of different target utilization $MU_{target}$ on modality utilization and classification accuracy with the modality utilization-based training method on the NTIRE21 dataset. Loss factor $\lambda_L = 100.0$ with SAR as the focus modality.
Figure 12. Effects of the loss factor $\lambda_L$ on modality utilization and classification accuracy with the modality utilization-based training method on the NTIRE21 dataset. Target utilization $MU_{target} = 50$% with SAR as the focus modality.
Figure 13. Effects of the loss factor buildup rate $\beta$ on modality utilization and classification accuracy with the modality utilization-based training method on the NTIRE21 dataset. Target utilization $MU_{target} = 50$%, maximum loss factor $\lambda_{L\_max} = 100$, and buildup delay $\delta = 0$ with SAR as the focus modality.
Figure 14. Effects of Gaussian noise with mean = 0 and variance = {0.06, 0.09, 0.12} in the EO modality, the SAR modality, and both modalities during inference on networks trained with different levels of SAR utilization.
Abstract
1. Introduction
- A modality utilization-based training method for multimodal fusion networks to regulate the network’s modality utilization;
- A demonstration that regulating modality utilization within a network improves overall noise robustness;
- A heuristic approach for selecting target utilization based on unimodal network performance.
2. Related Work
3. Method
3.1. Modality Utilization Metric
Algorithm 1: Modality Utilization
Initialize the multimodal fusion network, the learned model parameters, and the task dataset;
Compute the network loss L via a forward pass over the dataset, Equation (1);
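To make the shuffling procedure of Figure 1 and Algorithm 1 concrete, the sketch below estimates per-modality utilization by permuting one modality at a time across a batch, measuring how much the task loss degrades, and normalizing the degradations into percentages. It is a minimal PyTorch-style illustration, not the paper’s exact formulation: the function name `modality_utilization`, the assumption that the model takes a list of modality tensors, and the loss-increase normalization are illustrative choices.

```python
import torch

def modality_utilization(model, loss_fn, batches, n_modalities, device="cpu"):
    """Permutation-style estimate of per-modality utilization (illustrative
    sketch): shuffle one modality across the batch to break its association
    with the labels and record how much the task loss increases."""
    model.eval()
    base_loss = 0.0
    shuffled_loss = [0.0] * n_modalities
    with torch.no_grad():
        for inputs, y in batches:                 # inputs: list of modality tensors
            inputs = [x.to(device) for x in inputs]
            y = y.to(device)
            base_loss += loss_fn(model(inputs), y).item()
            for i in range(n_modalities):
                perm = torch.randperm(inputs[i].size(0), device=device)
                mixed = list(inputs)
                mixed[i] = inputs[i][perm]        # break the M_i <-> Y association
                shuffled_loss[i] += loss_fn(model(mixed), y).item()
    # Loss increase caused by destroying modality i, clipped at zero,
    # then normalized into percentages across modalities.
    gains = [max(l - base_loss, 0.0) for l in shuffled_loss]
    total = sum(gains) or 1.0
    return [100.0 * g / total for g in gains]
```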
3.2. Modality Utilization-Based Training
Algorithm 2: Modality utilization-based training
Initialize the multimodal fusion network, the pre-trained feature extractors, the task dataset, the focus modality, the loss factor, and the target modality utilization;
Load the parameters of the pre-trained feature extractors for each modality;
Freeze the pre-trained feature extractors;
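Algorithm 2 trains only the decision layers (Figure 2): the pre-trained feature extractors are loaded and frozen, and the task loss is combined with a modality utilization term that pulls the focus modality’s utilization toward the target $MU_{target}$, weighted by the loss factor $\lambda_L$. The sketch below shows one training step under stated assumptions: the squared-deviation form of $L_{mu}$, the differentiable `mu_estimator` helper, and the list-of-modalities model input are illustrative, not the paper’s exact implementation.

```python
def freeze_feature_extractors(feature_extractors):
    """Load-and-freeze step of Algorithm 2: the pre-trained feature extractors
    stop receiving gradients, so only the decision layers are trained."""
    for fe in feature_extractors:
        for p in fe.parameters():
            p.requires_grad = False

def mu_training_step(model, loss_fn, inputs, y, mu_estimator,
                     focus_idx, mu_target, lambda_l, optimizer):
    """One MU-regularized step. Assumed form:
    L = L_task + lambda_L * (MU_focus - MU_target)^2, where mu_estimator
    returns differentiable per-modality utilization estimates in percent."""
    optimizer.zero_grad()
    task_loss = loss_fn(model(inputs), y)
    mu = mu_estimator(model, inputs, y)           # e.g., [mu_EO, mu_SAR]
    mu_loss = (mu[focus_idx] - mu_target) ** 2    # pull the focus modality toward its target
    loss = task_loss + lambda_l * mu_loss
    loss.backward()                               # gradients flow only into unfrozen layers
    optimizer.step()
    return task_loss.item(), mu_loss.item()
```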
3.3. Loss Factor Warm-Up
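As Figure 3 illustrates, applying $L_{mu}$ at full strength from the first epoch can push the network into poor local minima, so the loss factor is ramped up with a clipped exponential schedule (Figure 4) governed by a buildup rate $\beta$, a buildup delay $\delta$, and a ceiling $\lambda_{L\_max}$. The exact expression is not reproduced in this excerpt; the function below is one plausible clipped-exponential form under those three parameters.

```python
import math

def loss_factor(epoch, beta, delta, lambda_max):
    """Clipped exponential warm-up for the MU loss factor (assumed form):
    zero until the buildup delay, exponential growth at rate beta afterwards,
    clipped at lambda_max."""
    if epoch < delta:
        return 0.0
    return min(lambda_max, math.exp(beta * (epoch - delta)) - 1.0)

# With lambda_max = 100, beta = 0.5, and delta = 0 the factor grows smoothly
# from 0 and saturates at the ceiling of 100 around epoch 10.
```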
3.4. Research Questions
- RQ1: Can we leverage the modality utilization metrics during training to regulate a network’s reliance on a dominant modality?
- RQ2: Does the modality utilization-based training method improve the overall noise robustness of multimodal fusion networks?
4. Experimental Design
4.1. Datasets and Network Architecture
4.1.1. Classification Task
4.1.2. Image Segmentation Task
5. Results
5.1. Ablation Studies
5.2. Validation of the Loss Factor Warm-Up
5.3. Studying Noise Robustness Properties of the Modality Utilization-Based Training Method
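The robustness study injects zero-mean Gaussian noise with variance in {0.06, 0.09, 0.12} into the EO modality, the SAR modality, or both at inference time, and compares accuracy across networks trained with different utilization targets (Figure 14). A sketch of that evaluation loop is given below; the `[eo, sar]` input layout and the helper names are assumptions made for illustration.

```python
import torch

def add_gaussian_noise(x, variance):
    """Zero-mean Gaussian noise with the given variance, applied only at inference."""
    return x + torch.randn_like(x) * variance ** 0.5

@torch.no_grad()
def noisy_accuracy(model, batches, variance, noisy_modalities, device="cpu"):
    """Classification accuracy with noise injected into the selected modalities (by index)."""
    model.eval()
    correct = total = 0
    for inputs, y in batches:                     # inputs: [eo, sar] tensors (assumed layout)
        inputs = [x.to(device) for x in inputs]
        y = y.to(device)
        noisy = [add_gaussian_noise(x, variance) if i in noisy_modalities else x
                 for i, x in enumerate(inputs)]
        pred = model(noisy).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return 100.0 * correct / total

# Example: accuracy with variance-0.09 noise in the SAR modality only.
# acc = noisy_accuracy(model, test_loader, variance=0.09, noisy_modalities={1})
```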
5.4. Determining Target Modality Utilization
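The heuristic for choosing $MU_{target}$ ties the target to each modality’s standalone performance (see the contribution list in Section 1 and the unimodal baselines reported in the results tables). The exact rule is not reproduced in this excerpt; the sketch below shows one plausible instantiation that simply normalizes unimodal validation scores into target utilization percentages.

```python
def target_utilization(unimodal_scores):
    """One plausible instantiation of the heuristic (an assumption, not the
    paper's exact rule): allocate target utilization in proportion to each
    modality's standalone validation performance."""
    total = sum(unimodal_scores.values())
    return {m: 100.0 * s / total for m, s in unimodal_scores.items()}

# Example with the NTIRE21 unimodal accuracies reported in the results:
# target_utilization({"EO": 97.5, "SAR": 84.9}) -> {"EO": ~53.5, "SAR": ~46.5}
```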
6. Discussion
7. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
MU | Modality Utilization
EO | Electro-Optical
SAR | Synthetic Aperture Radar
RGB | Red, Green and Blue
AoLP | Angle of Polarization
DoLP | Degree of Polarization
NIR | Near Infra-red
targ | Target
curr | Current
FE | Feature Extractor
 | Focus Modality
DL | Decision Layer
M | Modality
L | Loss
 | Dataset
 | Network parameters
References
- Xiao, Y.; Codevilla, F.; Gurram, A.; Urfalioglu, O.; López, A.M. Multimodal end-to-end autonomous driving. IEEE Trans. Intell. Transp. Syst. 2020, 23, 537–547. [Google Scholar] [CrossRef]
- Sharma, M.; Dhanaraj, M.; Karnam, S.; Chachlakis, D.G.; Ptucha, R.; Markopoulos, P.P.; Saber, E. YOLOrs: Object detection in multimodal remote sensing imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 1497–1508. [Google Scholar] [CrossRef]
- Doherty, K.; Fourie, D.; Leonard, J. Multimodal semantic slam with probabilistic data association. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2419–2425. [Google Scholar]
- Papanastasiou, S.; Kousi, N.; Karagiannis, P.; Gkournelos, C.; Papavasileiou, A.; Dimoulas, K.; Baris, K.; Koukas, S.; Michalos, G.; Makris, S. Towards seamless human robot collaboration: Integrating multimodal interaction. Int. J. Adv. Manuf. Technol. 2019, 105, 3881–3897. [Google Scholar] [CrossRef]
- Guo, Z.; Li, X.; Huang, H.; Guo, N.; Li, Q. Deep learning-based image segmentation on multimodal medical imaging. IEEE Trans. Radiat. Plasma Med. Sci. 2019, 3, 162–169. [Google Scholar] [CrossRef]
- He, M.; Hohil, M.; LaPeruta, T.; Nashed, K.; Lawrence, V.; Yao, Y.D. Performance evaluation of multimodal deep learning: Object identification using uav dataset. In Proceedings of the Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III, Online, 12–16 April 2021; SPIE: Bellingham, WA, USA, 2021; Volume 11746, pp. 602–608. [Google Scholar]
- Chen, W.; Li, X.; Wang, L. Multimodal Remote sensing science and technology. In Remote Sensing Intelligent Interpretation for Mine Geological Environment: From land Use and Land Cover Perspective; Springer: Berlin/Heidelberg, Germany, 2022; pp. 7–32. [Google Scholar]
- Liu, G.; Peng, B.; Liu, T.; Zhang, P.; Yuan, M.; Lu, C.; Cao, N.; Zhang, S.; Huang, S.; Wang, T.; et al. Large-Scale Fine-Grained Building Classification and Height Estimation for Semantic Urban Reconstruction: Outcome of the 2023 IEEE GRSS Data Fusion Contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 11194–11207. [Google Scholar] [CrossRef]
- Hänsch, R.; Arndt, J.; Lunga, D.; Pedelose, T.; Boedihardjo, A.; Pfefferkorn, J.; Petrie, D.; Bacastow, T.M. SpaceNet 8: Winning Approaches to Multi-Class Feature Segmentation from Satellite Imagery for Flood Disasters. In Proceedings of the IGARSS 2023–2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 16–21 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1241–1244. [Google Scholar]
- Low, S.; Nina, O.; Sappa, A.D.; Blasch, E.; Inkawhich, N. Multi-modal aerial view object classification challenge results-PBVS 2023. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 412–421. [Google Scholar]
- Meng, T.; Jing, X.; Yan, Z.; Pedrycz, W. A survey on machine learning for data fusion. Inf. Fusion 2020, 57, 115–129. [Google Scholar] [CrossRef]
- Wu, N.; Jastrzebski, S.; Cho, K.; Geras, K.J. Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; PMLR: London, UK, 2022; pp. 24043–24055. [Google Scholar]
- Li, J.; Hong, D.; Gao, L.; Yao, J.; Zheng, K.; Zhang, B.; Chanussot, J. Deep learning in multimodal remote sensing data fusion: A comprehensive review. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102926. [Google Scholar] [CrossRef]
- Singh, S.; Markopoulos, P.P.; Saber, E.; Lew, J.D.; Heard, J. Measuring Modality Utilization in Multi-Modal Neural Networks. In Proceedings of the 2023 IEEE Conference on Artificial Intelligence (CAI), Santa Clara, CA, USA, 5–6 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 11–14. [Google Scholar]
- Pérez-Pellitero, E.; Catley-Chandar, S.; Leonardis, A.; Timofte, R. NTIRE 2021 challenge on high dynamic range imaging: Dataset, methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Online, 19–25 June 2021; IEEE Computer Society: Washington, DC, USA, 2021; pp. 691–700. [Google Scholar] [CrossRef]
- Liang, Y.; Wakaki, R.; Nobuhara, S.; Nishino, K. Multimodal Material Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 19800–19808. [Google Scholar]
- Chakraborty, J.; Stolinski, M. Signal-Level Fusion Approach for Embedded Ultrasonic Sensors in Damage Detection of Real RC Structures. Mathematics 2022, 10, 724. [Google Scholar] [CrossRef]
- Cai, H.; Qu, Z.; Li, Z.; Zhang, Y.; Hu, X.; Hu, B. Feature-level fusion approaches based on multimodal EEG data for depression recognition. Inf. Fusion 2020, 59, 127–138. [Google Scholar] [CrossRef]
- Boulahia, S.Y.; Amamra, A.; Madi, M.R.; Daikh, S. Early, intermediate and late fusion strategies for robust deep learning-based multimodal action recognition. Mach. Vis. Appl. 2021, 32, 121. [Google Scholar] [CrossRef]
- Su, Y.; Zhang, K.; Wang, J.; Madani, K. Environment sound classification using a two-stream CNN based on decision-level fusion. Sensors 2019, 19, 1733. [Google Scholar] [CrossRef] [PubMed]
- Ding, J.; Xue, N.; Xia, G.S.; Bai, X.; Yang, W.; Yang, M.Y.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; et al. Object detection in aerial images: A large-scale benchmark and challenges. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7778–7796. [Google Scholar] [CrossRef] [PubMed]
- Yang, F.; Fan, H.; Chu, P.; Blasch, E.; Ling, H. Clustered object detection in aerial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8311–8320. [Google Scholar]
- Chen, C.; Zhong, J.; Tan, Y. Multiple-oriented and small object detection with convolutional neural networks for aerial image. Remote Sens. 2019, 11, 2176. [Google Scholar] [CrossRef]
- Hatamizadeh, A.; Sengupta, D.; Terzopoulos, D. End-to-end trainable deep active contour models for automated image segmentation: Delineating buildings in aerial imagery. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 730–746. [Google Scholar]
- Yue, K.; Yang, L.; Li, R.; Hu, W.; Zhang, F.; Li, W. TreeUNet: Adaptive tree convolutional neural networks for subdecimeter aerial image segmentation. ISPRS J. Photogramm. Remote Sens. 2019, 156, 1–13. [Google Scholar] [CrossRef]
- Guan, Z.; Miao, X.; Mu, Y.; Sun, Q.; Ye, Q.; Gao, D. Forest fire segmentation from aerial imagery data using an improved instance segmentation model. Remote Sens. 2022, 14, 3159. [Google Scholar] [CrossRef]
- Kyrkou, C.; Theocharides, T. Deep-Learning-Based Aerial Image Classification for Emergency Response Applications Using Unmanned Aerial Vehicles. In Proceedings of the CVPR Workshops, Long Beach, CA, USA, 16–20 June 2019; pp. 517–525. [Google Scholar]
- Zheng, X.; Yuan, Y.; Lu, X. A deep scene representation for aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4799–4809. [Google Scholar] [CrossRef]
- Cheng, G.; Xie, X.; Han, J.; Guo, L.; Xia, G.S. Remote sensing image scene classification meets deep learning: Challenges, methods, benchmarks, and opportunities. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3735–3756. [Google Scholar] [CrossRef]
- Gómez-Chova, L.; Tuia, D.; Moser, G.; Camps-Valls, G. Multimodal classification of remote sensing images: A review and future directions. Proc. IEEE 2015, 103, 1560–1584. [Google Scholar] [CrossRef]
- Huang, Z.; Li, W.; Tao, R. Multimodal knowledge distillation for arbitrary-oriented object detection in aerial images. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5. [Google Scholar]
- Singh, S.; Sharma, M.; Heard, J.; Lew, J.D.; Saber, E.; Markopoulos, P.P. Multimodal aerial view object classification with disjoint unimodal feature extraction and fully connected-layer fusion. In Proceedings of the Big Data V: Learning, Analytics, and Applications, Orlando, FL, USA, 1 April–5 May 2023; SPIE: Bellingham, WA, USA, 2023; Volume 12522, p. 1252206. [Google Scholar]
- Xiang, Y.; Tian, X.; Xu, Y.; Guan, X.; Chen, Z. EGMT-CD: Edge-Guided Multimodal Transformers Change Detection from Satellite and Aerial Images. Remote Sens. 2023, 16, 86. [Google Scholar] [CrossRef]
- Huang, Y.; Lin, J.; Zhou, C.; Yang, H.; Huang, L. Modality competition: What makes joint training of multi-modal network fail in deep learning? (provably). In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; PMLR: London, UK, 2022; pp. 9226–9259. [Google Scholar]
- Hafner, S.; Ban, Y.; Nascetti, A. Investigating Imbalances Between SAR and Optical Utilization for Multi-Modal Urban Mapping. In Proceedings of the 2023 Joint Urban Remote Sensing Event (JURSE), Heraklion, Greece, 17–19 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–4. [Google Scholar]
- Ghahremani Boozandani, M.; Wachinger, C. RegBN: Batch Normalization of Multimodal Data with Regularization. In Proceedings of the Advances in Neural Information Processing Systems 36, New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
- Gat, I.; Schwartz, I.; Schwing, A.; Hazan, T. Removing bias in multi-modal classifiers: Regularization by maximizing functional entropies. In Proceedings of the Advances in Neural Information Processing Systems 33, Online, 6–12 December 2020; pp. 3197–3208. [Google Scholar]
- Ma, H.; Zhang, Q.; Zhang, C.; Wu, B.; Fu, H.; Zhou, J.T.; Hu, Q. Calibrating multimodal learning. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; PMLR: London, UK; pp. 23429–23450. [Google Scholar]
- Cao, Y.; Bin, J.; Hamari, J.; Blasch, E.; Liu, Z. Multimodal object detection by channel switching and spatial attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 403–411. [Google Scholar]
- He, Y.; Cheng, R.; Balasubramaniam, G.; Tsai, Y.H.H.; Zhao, H. Efficient Modality Selection in Multimodal Learning. J. Mach. Learn. Res. 2024, 25, 1–39. [Google Scholar]
- Sun, Y.; Mai, S.; Hu, H. Learning to balance the learning rates between various modalities via adaptive tracking factor. IEEE Signal Process. Lett. 2021, 28, 1650–1654. [Google Scholar] [CrossRef]
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. (CSUR) 2022, 54, 1–41. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–7 December 2017. [Google Scholar]
- Xu, P.; Zhu, X.; Clifton, D.A. Multimodal learning with transformers: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12113–12132. [Google Scholar] [CrossRef] [PubMed]
- Roy, S.K.; Deria, A.; Hong, D.; Rasti, B.; Plaza, A.; Chanussot, J. Multimodal fusion transformer for remote sensing image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5515620. [Google Scholar] [CrossRef]
- Huang, J.; Tao, J.; Liu, B.; Lian, Z.; Niu, M. Multimodal transformer fusion for continuous emotion recognition. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 3507–3511. [Google Scholar]
- Boussioux, L.; Zeng, C.; Guénais, T.; Bertsimas, D. Hurricane forecasting: A novel multimodal machine learning framework. Weather. Forecast. 2022, 37, 817–831. [Google Scholar] [CrossRef]
- Luo, Y.; Guo, X.; Dong, M.; Yu, J. Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking. Sensors 2023, 23, 6609. [Google Scholar] [CrossRef]
- Ivanov, A.; Dryden, N.; Ben-Nun, T.; Li, S.; Hoefler, T. Data movement is all you need: A case study on optimizing transformers. Proc. Mach. Learn. Syst. 2021, 3, 711–732. [Google Scholar]
- Wang, W.; Zhang, J.; Cao, Y.; Shen, Y.; Tao, D. Towards data-efficient detection transformers. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 88–105. [Google Scholar]
- Quan, Y.; Zhang, R.; Li, J.; Ji, S.; Guo, H.; Yu, A. Learning SAR-Optical Cross Modal Features for Land Cover Classification. Remote Sens. 2024, 16, 431. [Google Scholar] [CrossRef]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Fisher, A.; Rudin, C.; Dominici, F. All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. J. Mach. Learn. Res. JMLR 2019, 20, 177. [Google Scholar]
- Singh, S.; Heard, J. Measuring State Utilization During Decision Making in Human-Robot Teams. In Proceedings of the Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, Boulder, CO, USA, 11–15 March 2024; pp. 985–989. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Rochester Institute of Technology. Research Computing Services. 2023. Available online: https://www.rit.edu/researchcomputing/ (accessed on 19 May 2022). [CrossRef]
Dataset | Modality | Accuracy (%) | $MU_{EO}$ (%) | $MU_{SAR}$ (%)
---|---|---|---|---
NTIRE21 | EO | 97.5 | 100.0 | -
 | SAR | 84.9 | - | 100.0
 | EO-SAR | 97.8 | 99.59 | 0.40

Dataset | Modality | mIoU | $MU_{RGB}$ (%) | $MU_{AoLP}$ (%) | $MU_{DoLP}$ (%) | $MU_{NIR}$ (%)
---|---|---|---|---|---|---
MCubeS | RGB | 0.318 | 100.0 | - | - | -
 | AoLP | 0.266 | - | 100.0 | - | -
 | DoLP | 0.262 | - | - | 100.0 | -
 | NIR | 0.270 | - | - | - | 100.0
 | RGB-AoLP-DoLP-NIR | 0.374 | 34.5 | 19.0 | 30.9 | 15.6
 | AoLP-DoLP-NIR | 0.351 | - | 67.3 | 21.0 | 11.7
Target $MU_{SAR}$ (%) | Acc. (%) | $MU_{EO}$ (%) | $MU_{SAR}$ (%)
---|---|---|---
0.0 | 97.1 | 99.4 | 0.6 |
12.5 | 97.6 | 92.8 | 7.2 |
25.0 | 97.7 | 82.1 | 17.9 |
37.5 | 97.7 | 63.5 | 36.5 |
50.0 | 97.2 | 47.0 | 53.0 |
62.5 | 96.8 | 31.7 | 68.3 |
75.0 | 95.3 | 19.7 | 80.3 |
87.5 | 92.2 | 13.1 | 86.9 |
100.0 | 84.4 | 4.8 | 95.1 |
Target $MU_{EO}$ (%) | Acc. (%) | $MU_{EO}$ (%) | $MU_{SAR}$ (%)
---|---|---|---
0.0 | 84.4 | 3.9 | 96.1 |
12.5 | 92.0 | 14.3 | 85.7 |
25.0 | 95.3 | 19.5 | 80.5 |
37.5 | 96.8 | 30.6 | 69.4 |
50.0 | 97.2 | 47.0 | 53.0 |
62.5 | 97.6 | 63.5 | 36.5 |
75.0 | 97.7 | 79.4 | 20.6 |
87.5 | 97.5 | 94.4 | 5.6 |
100.0 | 97.1 | 99.5 | 0.5 |
Target $MU_{RGB}$ (%) | mIoU | $MU_{RGB}$ (%) | $MU_{AoLP}$ (%) | $MU_{DoLP}$ (%) | $MU_{NIR}$ (%)
---|---|---|---|---|---
0.0 | 0.400 | 1.2 | 4.2 | 54.4 | 40.2 |
12.5 | 0.403 | 6.7 | 0.7 | 34.2 | 58.4 |
25.0 | 0.388 | 18.6 | 10.8 | 7.3 | 63.3 |
37.5 | 0.393 | 39.4 | 10.8 | 16.5 | 33.3 |
50.0 | 0.397 | 54.9 | 15.9 | 0.1 | 29.1 |
62.5 | 0.394 | 68.9 | 10.5 | 8.1 | 12.5 |
75.0 | 0.387 | 84.9 | 5.4 | 0.5 | 9.2 |
87.5 | 0.407 | 89.2 | 5.1 | 2.6 | 3.1 |
100.0 | 0.403 | 96.7 | 1.0 | 0.9 | 1.4 |
Target $MU_{SAR}$ (%) | Fold | Acc. (%) | $MU_{EO}$ (%) | $MU_{SAR}$ (%)
---|---|---|---|---
0.0 | 1 | 97.7 | 99.9 | 0.1 |
0.0 | 2 | 44.5 | 57.8 | 42.2 |
0.0 | 3 | 51.1 | 55.5 | 44.5 |
0.0 | 4 | 98.4 | 99.9 | 0.1 |
0.0 | 5 | 48.8 | 55.9 | 44.1 |
Target $MU_{SAR}$ (%) | Fold | Acc. (%) | $MU_{EO}$ (%) | $MU_{SAR}$ (%)
---|---|---|---|---
0.0 | 1 | 97.7 | 99.4 | 0.6 |
0.0 | 2 | 96.9 | 99.2 | 0.8 |
0.0 | 3 | 97.6 | 99.3 | 0.7 |
0.0 | 4 | 97.2 | 99.4 | 0.6 |
0.0 | 5 | 97.0 | 99.6 | 0.4 |
Target $MU_{EO}$ (%) | Acc. (%), No Noise | Acc. (%), Noise Var. = 0.06 | Acc. (%), Noise Var. = 0.09 | Acc. (%), Noise Var. = 0.12 | Average Acc. (%)
---|---|---|---|---|---
100.0 | 95.1 | 84.4 | 74.4 | 73.7 | 81.9 |
87.5 | 94.6 | 86.8 | 74.9 | 80.3 | 84.2 |
75.0 | 92.8 | 85.7 | 80.1 | 81.4 | 85.0 |
62.5 | 91.8 | 83.7 | 80.2 | 82.1 | 84.5 |
50.0 | 89.8 | 80.8 | 82.1 | 82.3 | 83.8 |
37.5 | 89.1 | 83.8 | 78.9 | 80.9 | 83.2 |
25.0 | 91.2 | 83.9 | 79.5 | 81.1 | 83.9 |
12.5 | 92.2 | 86.5 | 81.2 | 81.7 | 85.4 |
0.0 | 90.3 | 85.9 | 83.3 | 83.0 | 85.6 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).