An Attention-Guided Multilayer Feature Aggregation Network for Remote Sensing Image Scene Classification
"> Figure 1
<p>The characteristics of a remote sensing scene image. A remote sensing scene consists of many types of land cover units. However, to classify this scene, we only need to pay more attention to the key regions, i.e., bridge, while other regions can be regarded as interference.</p> "> Figure 2
<p>The overall architecture of our proposed AGMFA-Net.</p> "> Figure 3
<p>Grad-CAM visualization results. We compare the visualization results of our proposed channel–spatial attention with three other high-level convolutional feature maps of the last residual block of ResNet-50.</p> "> Figure 4
<p>Diagram of the channel–spatial attention module.</p> "> Figure 5
<p>Examples of the UCML dataset.</p> "> Figure 6
<p>Examples of the AID dataset.</p> "> Figure 7
<p>Examples of the NWPU-RESISC45 dataset.</p> "> Figure 8
<p>Grad-CAM visualization results. We compare the visualization results of the proposed AGMFA-Net (ResNet-50) with the baseline (ResNet-50) and three other multilayer feature aggregation methods. The Grad-CAM visualization is computed for the last convolutional outputs.</p> "> Figure 9
<p>Confusion matrix of the proposed method on the UCML dataset with a training ratio of 80%.</p> "> Figure 10
<p>Confusion matrix of the proposed method on the AID dataset with a training ratio of 50%.</p> "> Figure 11
<p>Confusion matrix of the proposed method on the AID dataset with a training ratio of 20%.</p> "> Figure 12
<p>Confusion matrix of the proposed method on the NWPU-RESISC45 dataset with a training ratio of 20%.</p> "> Figure 13
<p>Confusion matrix of the proposed method on the NWPU-RESISC45 dataset with a training ratio of 10%.</p> ">
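The paper specifies the channel–spatial attention module of Figure 4 in Section 3.3.1; as orientation only, below is a minimal CBAM-style sketch in PyTorch (the framework used in the experiments). The class name, reduction ratio, kernel size, and pooling choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ChannelSpatialAttention(nn.Module):
    """CBAM-style channel-then-spatial attention (illustrative sketch,
    not the exact AGMFA-Net module)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: a shared MLP over globally pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: a 7x7 conv over stacked channel statistics.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel weights from average- and max-pooled feature descriptors.
        channel = torch.sigmoid(
            self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3)))
        ).view(b, c, 1, 1)
        x = x * channel
        # Spatial weights from channel-wise average and max maps.
        stats = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return x * torch.sigmoid(self.spatial(stats))
```

For example, `ChannelSpatialAttention(2048)` would operate on the output of ResNet-50's last residual block, which has 2048 channels.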
Abstract
1. Introduction
2. Related Works
2.1. Deep-Learning-Based Remote Sensing Image Scene Classification
2.1.1. Fine-Tuning Methods
2.1.2. Deep Feature Encoding Methods
2.1.3. Multiple Feature Fusion Methods
2.1.4. Other Methods
2.2. Attention in CNNs
3. The Proposed Method
3.1. Overall Architecture
3.2. Multilayer Feature Extraction
3.3. Multilayer Feature Aggregation
3.3.1. Semantic Region Extraction
3.3.2. Multilayer Feature Aggregation
3.4. Loss Function
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Evaluation Metrics
4.4. Ablation Study
4.4.1. The Effectiveness of Semantic Region Extraction
4.4.2. The Effectiveness of Multilayer Feature Aggregation
4.5. State-of-the-Art Comparison and Analysis
4.5.1. Results on the UCML Dataset
4.5.2. Results on the AID Dataset
4.5.3. Results on the NWPU-RESISC45 Dataset
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Mou, L.; Ghamisi, P.; Zhu, X.X. Deep recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655.
- Li, X.; Lei, L.; Sun, Y.; Li, M.; Kuang, G. Multimodal bilinear fusion network with second-order attention-based channel selection for land cover classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1011–1026.
- Wu, C.; Zhang, L.; Zhang, L. A scene change detection framework for multi-temporal very high resolution remote sensing images. Signal Process. 2016, 124, 184–197.
- Hu, Q.; Wu, W.; Xia, T.; Yu, Q.; Yang, P.; Li, Z.; Song, Q. Exploring the use of Google Earth imagery and object-based methods in land use/cover mapping. Remote Sens. 2013, 5, 6026–6042.
- Wang, C.; Shi, J.; Yang, X.; Zhou, Y.; Wei, S.; Li, L.; Zhang, X. Geospatial object detection via deconvolutional region proposal network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 3014–3027.
- Xu, K.; Huang, H.; Li, Y.; Shi, G. Multilayer feature fusion network for scene classification in remote sensing. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1894–1898.
- Swain, M.J.; Ballard, D.H. Color indexing. Int. J. Comput. Vis. 1991, 7, 11–32.
- Haralick, R.M.; Shanmugam, K.; Dinstein, I.H. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
- Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
- Cheng, G.; Han, J.; Lu, X. Remote sensing image scene classification: Benchmark and state of the art. Proc. IEEE 2017, 105, 1865–1883.
- Xia, G.-S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981.
- Cheng, G.; Xie, X.; Han, J.; Guo, L.; Xia, G.S. Remote sensing image scene classification meets deep learning: Challenges, methods, benchmarks, and opportunities. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3735–3756.
- Nogueira, K.; Penatti, O.A.; Santos, J.A.D. Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognit. 2017, 61, 539–556.
- Cheng, G.; Li, Z.; Yao, X.; Guo, L.; Wei, Z. Remote sensing image scene classification using bag of convolutional features. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1735–1739.
- Wang, S.; Guan, Y.; Shao, L. Multi-granularity canonical appearance pooling for remote sensing scene classification. IEEE Trans. Image Process. 2020, 29, 5396–5407.
- Li, E.; Xia, J.; Du, P.; Lin, C.; Samat, A. Integrating multilayer features of convolutional neural networks for remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5653–5665.
- Lu, X.; Sun, H.; Zheng, X. A feature aggregation convolutional neural network for remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7894–7906.
- Chaib, S.; Liu, H.; Gu, Y.; Yao, H. Deep feature fusion for VHR remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4775–4784.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Liang, Y.; Monteiro, S.T.; Saber, E.S. Transfer learning for high resolution aerial image classification. In Proceedings of the 2016 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 18–20 October 2016; pp. 1–8.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Zhao, W.; Du, S. Scene classification using multi-scale deeply described visual words. Int. J. Remote Sens. 2016, 37, 4119–4131.
- Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land use classification. In Proceedings of the GIS '10: 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; ACM: New York, NY, USA, 2010; pp. 270–279.
- Zheng, X.; Yuan, Y.; Lu, X. A deep scene representation for aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4799–4809.
- Sanchez, J.; Perronnin, F.; Mensink, T.; Verbeek, J. Image classification with the Fisher vector: Theory and practice. Int. J. Comput. Vis. 2013, 105, 222–245.
- Wang, G.; Fan, B.; Xiang, S.; Pan, C. Aggregating rich hierarchical features for scene classification in remote sensing imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 4104–4115.
- Negrel, R.; Picard, D.; Gosselin, P.-H. Evaluation of second-order visual features for land use classification. In Proceedings of the 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), Klagenfurt, Austria, 18–20 June 2014; pp. 1–5.
- He, N.; Fang, L.; Li, S.; Plaza, A.; Plaza, J. Remote sensing scene classification using multilayer stacked covariance pooling. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6899–6910.
- Lu, X.; Ji, W.; Li, X.; Zheng, X. Bidirectional adaptive feature fusion for remote sensing scene classification. Neurocomputing 2019, 328, 135–146.
- Yu, Y.; Liu, F. Dense connectivity based two-stream deep feature fusion framework for aerial scene classification. Remote Sens. 2018, 10, 1158.
- Sun, H.; Li, S.; Zheng, X.; Lu, X. Remote sensing scene classification by gated bidirectional network. IEEE Trans. Geosci. Remote Sens. 2020, 58, 82–96.
- Yu, Y.; Liu, F. A two-stream deep fusion framework for high-resolution aerial scene classification. Comput. Intell. Neurosci. 2018, 2018, 8639367.
- Yu, Y.; Liu, F. Aerial scene classification via multilevel fusion based on deep convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 287–291.
- Du, P.; Li, E.; Xia, J.; Samat, A.; Bai, X. Feature and model level fusion of pretrained CNN for remote sensing scene classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2600–2611.
- Zeng, D.; Chen, S.; Chen, B.; Li, S. Improving remote sensing scene classification by integrating global-context and local-object features. Remote Sens. 2018, 10, 734.
- Liu, Y.; Zhong, Y.; Qin, Q. Scene classification based on multiscale convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2018, 56, 7109–7121.
- Ji, J.; Zhang, T.; Jiang, L.; Zhong, W.; Xiong, H. Combining multilevel features for remote sensing image scene classification with attention model. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1647–1651.
- Wang, Q.; Liu, S.; Chanussot, J.; Li, X. Scene classification with recurrent attention of VHR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 1155–1167.
- Cao, R.; Fang, L.; Lu, T.; He, N. Self-attention-based deep feature fusion for remote sensing scene classification. IEEE Geosci. Remote Sens. Lett. 2021, 18, 43–47.
- Zhao, Z.; Li, J.; Luo, Z.; Li, J.; Chen, C. Remote sensing image scene classification based on an enhanced attention module. IEEE Geosci. Remote Sens. Lett. 2020.
- Zhang, W.; Tang, P.; Zhao, L. Remote sensing image scene classification using CNN-CapsNet. Remote Sens. 2019, 11, 494.
- Yu, Y.; Li, X.; Liu, F. Attention GANs: Unsupervised deep feature learning for aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 519–531.
- Bazi, Y.; Al Rahhal, M.M.; Alhichri, H.; Alajlan, N. Simple yet effective fine-tuning of deep CNNs using an auxiliary classification loss for remote sensing scene classification. Remote Sens. 2019, 11, 2908.
- Li, E.; Samat, A.; Du, P.; Liu, W.; Hu, J. Improved bilinear CNN model for remote sensing scene classification. IEEE Geosci. Remote Sens. Lett. 2020.
- Peng, C.; Li, Y.; Jiao, L.; Shang, R. Efficient convolutional neural architecture search for remote sensing image scene classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6092–6105.
- Zhang, P.; Bai, Y.; Wang, D.; Bai, B.; Li, Y. Few-shot classification of aerial scene images via meta-learning. Remote Sens. 2021, 13, 108.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19.
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7794–7803.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Gu, Y.; Wang, L.; Wang, Z.; Liu, Y.; Cheng, M.-M.; Lu, S.-P. Pyramid constrained self-attention network for fast video salient object detection. Proc. AAAI Conf. Artif. Intell. 2020, 34, 10869–10876.
- Zhu, F.; Fang, C.; Ma, K.-K. PNEN: Pyramid non-local enhanced networks. IEEE Trans. Image Process. 2020, 29, 8831–8841.
- Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops (ICCV Workshops), Seoul, Korea, 27–28 October 2019; pp. 1971–1980.
- Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. CCNet: Criss-cross attention for semantic segmentation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 603–612.
- Zhang, D.; Li, N.; Ye, Q. Positional context aggregation network for remote sensing scene classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 943–947.
- Fu, L.; Zhang, D.; Ye, Q. Recurrent thrifty attention network for remote sensing scene recognition. IEEE Trans. Geosci. Remote Sens. 2020.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035.
- Wei, X.; Luo, J.; Wu, J.; Zhou, Z. Selective convolution descriptor aggregation for fine-grained image retrieval. IEEE Trans. Image Process. 2017, 26, 2868–2881.
Ablation results (overall accuracy, %, mean ± std) on the AID and NWPU-RESISC45 datasets:

Method | AID (20% train) | AID (50% train) | NWPU-RESISC45 (10% train) | NWPU-RESISC45 (20% train)
---|---|---|---|---
ResNet-50 (Baseline) | 92.93 ± 0.25 | 95.40 ± 0.18 | 89.06 ± 0.34 | 91.91 ± 0.09 |
ResNet-50+DA | 93.54 ± 0.30 | 96.08 ± 0.34 | 90.26 ± 0.04 | 93.21 ± 0.16 |
ResNet-50+WA | 93.66 ± 0.28 | 96.15 ± 0.28 | 90.24 ± 0.07 | 93.08 ± 0.04 |
ResNet-50+SA | 93.77 ± 0.31 | 96.32 ± 0.18 | 90.13 ± 0.59 | 93.22 ± 0.10 |
Ours (low-level features) | 93.51 ± 0.51 | 95.98 ± 0.20 | 89.16 ± 0.36 | 92.76 ± 0.11 |
Ours (high-level features) | 94.25 ± 0.13 | 96.68 ± 0.21 | 91.01 ± 0.18 | 93.70 ± 0.08 |
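The Grad-CAM maps in Figures 3 and 8 are computed for the last convolutional outputs, following Selvaraju et al. (see the references). For orientation, here is a minimal hook-based sketch, assuming a torchvision ResNet-50 backbone; the `grad_cam` helper is an illustration of the standard Grad-CAM formulation, not the authors' code.

```python
import torch
from torchvision.models import resnet50


def grad_cam(model, layer, image, class_idx):
    """Minimal Grad-CAM: weight the target layer's activations by the
    spatially pooled gradients of the chosen class score."""
    feats, grads = [], []
    fwd = layer.register_forward_hook(lambda m, i, o: feats.append(o))
    bwd = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        score = model(image)[0, class_idx]
        model.zero_grad()
        score.backward()
    finally:
        fwd.remove()
        bwd.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)  # GAP over gradients
    cam = torch.relu((weights * feats[0]).sum(dim=1))  # weighted sum + ReLU
    return cam / cam.max().clamp(min=1e-8)             # scale to [0, 1]


# Usage: CAM over the last residual stage of a ResNet-50
# (load trained weights in practice; random weights shown for brevity).
model = resnet50().eval()
image = torch.randn(1, 3, 224, 224)  # placeholder input tensor
heatmap = grad_cam(model, model.layer4, image, class_idx=0)
```

The resulting low-resolution heatmap is upsampled to the input size and overlaid on the image to produce visualizations like those in Figures 3 and 8.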
Overall accuracy (%, mean ± std) on the UCML dataset (80% training ratio):

Method | Accuracy (%)
---|---
VGGNet-16 [12] | 96.10 ± 0.46 |
ResNet-50 [15] | 98.76 ± 0.20 |
MCNN [43] | 96.66 ± 0.90 |
Multi-CNN [41] | 99.05 ± 0.48 |
Fusion by Addition [25] | 97.42 ± 1.79 |
Two-Stream Fusion [39] | 98.02 ± 1.03 |
VGG-VD16+MSCP [35] | 98.40 ± 0.34 |
VGG-VD16+MSCP+MRA [35] | 98.40 ± 0.34 |
ARCNet-VGG16 [45] | 99.12 ± 0.40 |
VGG-16-CapsNet [48] | 98.81 ± 0.22 |
MG-CAP (Bilinear) [22] | 98.60 ± 0.26 |
MG-CAP (Sqrt-E) [22] | 99.00 ± 0.10 |
GBNet+global feature [38] | 98.57 ± 0.48 |
EfficientNet-B0-aux [50] | 99.04 ± 0.33 |
EfficientNet-B3-aux [50] | 99.09 ± 0.17 |
IB-CNN(M) [51] | 98.90 ± 0.21 |
TEX-TS-Net [37] | 98.40 ± 0.76 |
SAL-TS-Net [37] | 98.90 ± 0.95 |
ResNet-50+EAM [47] | 98.98 ± 0.37 |
Ours (VGGNet-16) | 98.71 ± 0.49 |
Ours (ResNet-50) | 99.33 ± 0.31 |
Overall accuracy (%, mean ± std) on the AID dataset:

Method | 20% Training Ratio | 50% Training Ratio
---|---|---
VGGNet-16 [12] | 88.81 ± 0.35 | 92.84 ± 0.27 |
ResNet-50 [15] | 92.93 ± 0.25 | 95.40 ± 0.18 |
Fusion by Addition [25] | - | 91.87 ± 0.36 |
Two-Stream Fusion [39] | 80.22 ± 0.22 | 93.16 ± 0.18 |
Multilevel Fusion [40] | - | 95.36 ± 0.22 |
VGG-16+MSCP [35] | 91.52 ± 0.21 | 94.42 ± 0.17 |
ARCNet-VGG16 [45] | 88.75 ± 0.40 | 93.10 ± 0.55 |
MF Net [6] | 91.34 ± 0.35 | 94.84 ± 0.27 |
MSP [31] | 93.90 | - |
MCNN [43] | - | 91.80 ± 0.22 |
VGG-16-CapsNet [48] | 91.63 ± 0.19 | 94.74 ± 0.17 |
Inception-v3-CapsNet [48] | 93.79 ± 0.13 | 96.32 ± 0.12 |
MG-CAP (Bilinear) [22] | 92.11 ± 0.15 | 95.14 ± 0.12 |
MG-CAP (Sqrt-E) [22] | 93.34 ± 0.18 | 96.12 ± 0.12 |
EfficientNet-B0-aux [50] | 93.69 ± 0.11 | 96.17 ± 0.16 |
EfficientNet-B3-aux [50] | 94.19 ± 0.15 | 96.56 ± 0.14 |
IB-CNN(M) [51] | 94.23 ± 0.16 | 96.57 ± 0.28 |
TEX-TS-Net [37] | 93.31 ± 0.11 | 95.17 ± 0.21 |
SAL-TS-Net [37] | 94.09 ± 0.34 | 95.99 ± 0.35 |
ResNet-50+EAM [47] | 93.64 ± 0.25 | 96.62 ± 0.13 |
Ours (VGGNet-16) | 91.09 ± 0.30 | 95.10 ± 0.78 |
Ours (ResNet-50) | 94.25 ± 0.13 | 96.68 ± 0.21 |
Overall accuracy (%, mean ± std) on the NWPU-RESISC45 dataset:

Method | 10% Training Ratio | 20% Training Ratio
---|---|---
VGGNet-16 [12] | 81.15 ± 0.35 | 86.52 ± 0.21 |
ResNet-50 [15] | 89.06 ± 0.34 | 91.91 ± 0.09 |
Two-Stream [39] | 80.22 ± 0.22 | 83.16 ± 0.18 |
VGG-16+MSCP [35] | 85.33 ± 0.17 | 88.93 ± 0.14 |
MF Net [6] | 85.54 ± 0.36 | 89.76 ± 0.27 |
VGG-16-CapsNet [48] | 85.08 ± 0.13 | 89.18 ± 0.14 |
Inception-v3-CapsNet [48] | 89.03 ± 0.21 | 92.60 ± 0.11 |
MG-CAP (Bilinear) [22] | 89.42 ± 0.19 | 91.72 ± 0.16 |
MG-CAP (Sqrt-E) [22] | 90.83 ± 0.12 | 92.95 ± 0.13 |
EfficientNet-B0-aux [50] | 89.96 ± 0.27 | 92.89 ± 0.16 |
IB-CNN(M) [51] | 90.49 ± 0.17 | 93.33 ± 0.21 |
TEX-TS-Net [37] | 84.77 ± 0.24 | 86.36 ± 0.19 |
SAL-TS-Net [37] | 85.02 ± 0.25 | 87.01 ± 0.19 |
ResNet-50+EAM [47] | 90.87 ± 0.15 | 93.51 ± 0.12 |
Ours (VGGNet-16) | 86.87 ± 0.19 | 90.38 ± 0.16 |
Ours (ResNet-50) | 91.01 ± 0.18 | 93.70 ± 0.08 |
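All tables above report overall accuracy as mean ± standard deviation over repeated runs on random train/test splits at the stated training ratios. A minimal sketch of that evaluation protocol follows; `repeated_overall_accuracy` and the scikit-learn logistic-regression stand-in are illustrative assumptions, where the paper instead trains the full network on each split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def repeated_overall_accuracy(features, labels, train_ratio, runs=5, seed=0):
    """Overall accuracy as (mean, std) over repeated stratified random
    splits, mirroring the mean ± std entries in the tables above."""
    accs = []
    for run in range(runs):
        x_tr, x_te, y_tr, y_te = train_test_split(
            features, labels, train_size=train_ratio,
            stratify=labels, random_state=seed + run)
        # Stand-in classifier; the paper trains the full CNN here instead.
        clf = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
        accs.append(accuracy_score(y_te, clf.predict(x_te)))
    return float(np.mean(accs)), float(np.std(accs))


# Toy usage with random "deep features" for three scene classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 64))
labs = np.repeat([0, 1, 2], 100)
mean_oa, std_oa = repeated_overall_accuracy(feats, labs, train_ratio=0.2)
print(f"OA: {100 * mean_oa:.2f} ± {100 * std_oa:.2f}")
```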