Scene Classification Based on a Deep Random-Scale Stretched Convolutional Neural Network
Figure 1. An example of object scale variation that may be caused by variation in sensor altitude or angle.
Figure 2. Convolutional neural network for scene classification.
Figure 3. Object scale variation leading to difficulty in remote sensing scene classification.
Figure 4. Random-scale stretching of a scene image.
Figure 5. The testing framework for a scene image with multi-perspective fusion.
Figure 6. Class representatives of the UCM dataset: (a) agriculture, (b) airplane, (c) baseball diamond, (d) beach, (e) buildings, (f) chaparral, (g) dense residential, (h) forest, (i) freeway, (j) golf course, (k) harbor, (l) intersection, (m) medium residential, (n) mobile home park, (o) overpass, (p) parking lot, (q) river, (r) runway, (s) sparse residential, (t) storage tanks, and (u) tennis court.
Figure 7. (a) The confusion matrix for the UCM dataset based on SRSCNN. (b) Some of the classification results of CNNV and SRSCNN with the UCM dataset. Left: images correctly recognized by both strategies. Right: images correctly recognized by SRSCNN, but incorrectly classified by CNNV.
Figure 8. Class representatives of the Google dataset of SIRI-WHU: (a) meadow, (b) pond, (c) harbor, (d) industrial, (e) park, (f) river, (g) residential, (h) overpass, (i) agriculture, (j) commercial, (k) water, and (l) idle land.
Figure 9. (a) The confusion matrix for the Google dataset of SIRI-WHU based on SRSCNN. (b) Some of the classification results of CNNV and SRSCNN with the Google dataset of SIRI-WHU. Right: images correctly recognized by both strategies. Left: images correctly recognized by SRSCNN, but incorrectly classified by CNNV.
Figure 10. Class representatives of the Wuhan IKONOS dataset: (a) dense residential, (b) idle, (c) industrial, (d) medium residential, (e) parking lot, (f) commercial, (g) vegetation, and (h) water.
Figure 11. Image annotation using the Wuhan IKONOS dataset. (a) False-color image of the large image with 6150 × 8250 pixels. (b) Annotated large image.
Figure 12. The relationship between dropout rate and classification accuracy.
Figure 13. The relationship between feature length L and classification accuracy.
Figure 14. (a) The relationship between inf of the uniform distribution and classification accuracy. (b) The relationship between the standard deviation α of the Gaussian distribution and classification accuracy.
Figure 15. The class accuracy of RssCNN, RotCNN, and NrrCNN on the UCM dataset.
Figure 16. The class accuracy on OD and OD-8. (a) UCM dataset; (b) Google dataset of SIRI-WHU.
Figure 17. The overall accuracy of SRSCNN on the UCM dataset and the Google dataset of SIRI-WHU with different numbers of iterations.
Abstract
1. Introduction
2. Convolutional Neural Networks
3. Scene Classification for HSR Imagery Based on a Deep Random-Scale Stretched Convolutional Neural Network
3.1. SRSCNN
Algorithm 1. The SRSCNN procedure
Input:
- input dataset D = {(I, y) | I ∈ R^(H×W×3), y ∈ {1, 2, 3, …, C}}
- scale of patch R
- batch size m
- scale variation factor α obeying distribution d
- defined CNN structure Net
- maximum number of iterations L
Output:
- trained CNN model
Algorithm:
1. randomly initialize Net
2. for l = 1 to L do
3.   generate a small dataset P with |P| = m from D
4.   for i = 1 to m do
5.     normalize I_i ∈ P to [0, 1] by dividing by the maximum pixel value to obtain the normalized image I_i^n
6.     generate a random-scale stretched patch p_i from I_i^n
7.     P = P − {I_i}, P = P ∪ {p_i}
8.   end for
9.   feed P to Net, update parameters W and b in Net
10. end for
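The batch-preparation part of Algorithm 1 (steps 3 to 7) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`normalize`, `resize_nearest`, `random_scale_stretch`, `make_batch`) are hypothetical, the scale factor is drawn from a uniform distribution with illustrative bounds 0.8 and 1.2, nearest-neighbour interpolation stands in for whatever resampling is actually used, and the handling of stretched images that are larger or smaller than R (random crop or centred zero-padding) is a guess. The network update in step 9 is omitted.

```python
import numpy as np


def normalize(image):
    """Step 5: scale pixel values to [0, 1] by dividing by the maximum value."""
    image = image.astype(np.float32)
    peak = image.max()
    return image / peak if peak > 0 else image


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize (assumed; the interpolation is not specified)."""
    h, w = image.shape[:2]
    rows = np.minimum((np.arange(out_h) * h / out_h).astype(int), h - 1)
    cols = np.minimum((np.arange(out_w) * w / out_w).astype(int), w - 1)
    return image[rows][:, cols]


def random_scale_stretch(image, R, rng, low=0.8, high=1.2):
    """Step 6: return an R x R patch stretched by a random factor alpha.

    Assumptions: alpha ~ Uniform(low, high); a stretched image larger than
    R is randomly cropped, a smaller one is zero-padded around the centre.
    """
    alpha = rng.uniform(low, high)
    side = max(1, int(round(alpha * R)))
    stretched = resize_nearest(image, side, side)
    if side >= R:
        top = rng.integers(0, side - R + 1)
        left = rng.integers(0, side - R + 1)
        return stretched[top:top + R, left:left + R]
    patch = np.zeros((R, R, image.shape[2]), dtype=stretched.dtype)
    offset = (R - side) // 2
    patch[offset:offset + side, offset:offset + side] = stretched
    return patch


def make_batch(images, labels, m, R, rng):
    """Steps 3 to 7: sample m images and replace each by its stretched patch."""
    idx = rng.choice(len(images), size=m, replace=False)
    patches = np.stack(
        [random_scale_stretch(normalize(images[i]), R, rng) for i in idx]
    )
    return patches, np.asarray(labels)[idx]
```

Calling `make_batch(images, labels, m, R, np.random.default_rng())` once per iteration gives the set P that step 9 feeds to the network; swapping `rng.uniform` for `rng.normal` would give the Gaussian variant of the scale factor.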
3.2. Remote Sensing Scene Classification
4. Experiments and Results
4.1. Experiment 1: The UCM Dataset
4.2. Experiment 2: The Google Dataset of SIRI-WHU
4.3. Experiment 3: The Wuhan IKONOS Dataset
5. Discussion
5.1. Sensitivity Analysis in Relation to the Dropout Rate
5.2. Sensitivity Analysis in Relation to the Length of the Feature
5.3. Sensitivity Analysis in Relation to the Distribution of the Scale Variation Factor α
5.4. Sensitivity Analysis in Relation to the Crop Rate Cr
5.5. Sensitivity Analysis in Relation to the Rss and Rot
5.6. Sensitivity Analysis in Relation to the Spatial Resolution
5.7. Sensitivity Analysis in Relation to the Training Iteration
6. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
References
1. Bosch, A.; Munoz, X.; Marti, R. Which is the best way to organize/classify images by content? Image Vis. Comput. 2007, 25, 778–791.
2. Yang, Y.; Newsam, S. Spatial pyramid co-occurrence for image classification. In Proceedings of the IEEE International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011.
3. Zhao, L.; Tang, P.; Huo, L. A 2-D wavelet decomposition-based bag-of-visual-words model for land-use scene classification. Int. J. Remote Sens. 2014, 35, 2296–2310.
4. Zhao, L.; Tang, P.; Huo, L. Land-use scene classification using a concentric circle-structured multiscale bag-of-visual-words model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4620–4631.
5. Chen, S.; Tian, Y. Pyramid of spatial relations for scene-level land use classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1947–1957.
6. Hu, F.; Xia, G.; Wang, Z.; Huang, X.; Zhang, L.; Sun, H. Unsupervised feature learning via spectral clustering of multidimensional patches for remotely sensed scene classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8.
7. Zhao, B.; Zhong, Y.; Zhang, L. A spectral-structural bag-of-features scene classifier for very high spatial resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2016, 116, 73–85.
8. Zhong, Y.; Wu, S.; Zhao, B. Scene Semantic Understanding Based on the Spatial Context Relations of Multiple Objects. Remote Sens. 2017, 9, 1030.
9. Zhao, B.; Zhong, Y.; Zhang, L. Scene classification via latent Dirichlet allocation using a hybrid generative/discriminative strategy for high spatial resolution remote sensing imagery. Remote Sens. Lett. 2013, 4, 1204–1213.
10. Zhong, Y.; Zhu, Q.; Zhang, L. Scene classification based on the multifeature fusion probabilistic topic model for high spatial resolution remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6207–6222.
11. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
12. Lienou, M.; Maître, H.; Datcu, M. Semantic annotation of satellite images using latent Dirichlet allocation. IEEE Geosci. Remote Sens. Lett. 2010, 7, 28–32.
13. Luo, W.; Li, H.; Liu, G.; Zeng, L. Semantic annotation of satellite images using author–genre–topic model. IEEE Trans. Geosci. Remote Sens. 2014, 52, 1356–1368.
14. Zhao, B.; Zhong, Y.; Xia, G.S.; Zhang, L. Dirichlet-derived multiple topic scene classification model for high spatial resolution remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2016, 54, 2108–2123.
15. Zhu, Q.; Zhong, Y.; Zhang, L.; Li, D. Scene Classification Based on the Sparse Homogeneous-Heterogeneous Topic Feature Model. IEEE Trans. Geosci. Remote Sens. 2018.
16. Zhu, Q.; Zhong, Y.; Zhang, L.; Li, D. Scene Classification Based on the Fully Sparse Semantic Topic Model. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5525–5537.
17. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24–27 June 2014.
18. Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
19. Sun, Y.; Wang, X.; Tang, X. Deep learning face representation from predicting 10,000 classes. In Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24–27 June 2014.
20. Sun, Y.; Liang, D.; Wang, X.; Tang, X. Deepid3: Face recognition with very deep neural networks. arXiv 2015, arXiv:1502.00873.
21. Kontschieder, P.; Fiterau, M.; Criminisi, A.; Rota Bulo, S. Deep neural decision forests. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
22. Zhou, B.; Lapedriza, A.; Xiao, J.; Torralba, A.; Oliva, A. Learning deep features for scene recognition using places database. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014.
23. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307.
24. Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 221–231.
25. Wallach, I.; Dzamba, M.; Heifets, A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. arXiv 2015, arXiv:1510.02855.
26. Zhong, Y.; Ma, A.; Ong, Y.; Zhu, Z.; Zhang, L. Computational Intelligence in Optical Remote Sensing Image Processing. Appl. Soft Comput. 2018, 64, 75–93.
27. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107.
28. Ma, X.; Wang, H.; Geng, J. Spectral–Spatial Classification of Hyperspectral Image Based on Deep Auto-Encoder. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 4073–4085.
29. Slavkovikj, V.; Verstockt, S.; De Neve, W.; Van Hoecke, S.; Van de Walle, R. Hyperspectral image classification with convolutional neural networks. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015.
30. Han, J.; Zhang, D.; Cheng, G.; Guo, L.; Ren, J. Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3325–3337.
31. Salberg, A.B. Detection of seals in remote sensing images using features extracted from deep convolutional neural networks. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Milan, Italy, 26–31 July 2015.
32. Song, X.; Rui, T.; Zha, Z.; Wang, X.; Fang, H. The AdaBoost algorithm for vehicle detection based on CNN features. In Proceedings of the 7th International Conference on Internet Multimedia Computing and Service, Zhangjiajie, China, 19–21 August 2015.
33. Zhong, Y.; Fei, F.; Liu, Y.; Zhao, B.; Jiao, H.; Zhang, P. SatCNN: Satellite Image Dataset Classification Using Agile Convolutional Neural Networks. Remote Sens. Lett. 2017, 8, 136–145.
34. Hu, F.; Xia, G.S.; Hu, J.; Zhang, L. Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sens. 2015, 7, 14680–14707.
35. Castelluccio, M.; Poggi, G.; Sansone, C.; Verdoliva, L. Land use classification in remote sensing images by convolutional neural networks. arXiv 2015, arXiv:1508.00092.
36. Luus, F.P.S.; Salmon, B.P.; Van Den Bergh, F.; Maharaj, B.T.J. Multiview deep learning for land-use classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2448–2452.
37. Zhang, F.; Du, B.; Zhang, L. Saliency-guided unsupervised feature learning for scene classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2175–2184.
38. Romero, A.; Gatta, C.; Camps-Valls, G. Unsupervised deep feature extraction for remote sensing image classification. IEEE Trans. Geosci. Remote Sens. 2016, 54, 1349–1362.
39. Zou, Q.; Ni, L.; Zhang, T.; Wang, Q. Deep learning based feature selection for remote sensing scene classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2321–2325.
40. Zhang, F.; Du, B.; Zhang, L. Scene classification via a gradient boosting random convolutional network framework. IEEE Trans. Geosci. Remote Sens. 2016, 54, 1793–1802.
41. Zhong, Y.; Fei, F.; Zhang, L. Large patch convolutional neural networks for the scene classification of high spatial resolution imagery. J. Appl. Remote Sens. 2016, 10, 025006.
42. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
43. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
44. Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400.
45. Krizhevsky, A.; Sutskever, I.; Hinton, G. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012.
46. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. arXiv 2015, arXiv:1512.03385.
47. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks. arXiv 2016, arXiv:1603.05027.
48. Yosinski, J.; Clune, J.; Nguyen, A.; Fuchs, T.; Lipson, H. Understanding neural networks through deep visualization. arXiv 2015, arXiv:1506.06579.
49. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Object detectors emerge in deep scene CNNs. arXiv 2014, arXiv:1412.6856.
50. Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 11–13 April 2011.
51. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013.
52. Zeiler, M.D.; Ranzato, M.A.; Monga, R.; Mao, M.; Yang, K.; Le, Q.V.; Nguyen, P.; Senior, A.; Vanhoucke, V.; Dean, J.; et al. On rectified linear units for speech processing. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013.
53. Boureau, Y.L.; Ponce, J.; LeCun, Y. A theoretical analysis of feature pooling in visual recognition. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010.
54. Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv 2012, arXiv:1207.0580.
55. Cheriyadat, A.M. Unsupervised feature learning for aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2014, 52, 439–451.
56. Zhao, B.; Zhong, Y.; Zhang, L.; Huang, B. The Fisher Kernel coding framework for high spatial resolution scene classification. Remote Sens. 2016, 8, 157.
Layer |
---|
Input (181 × 181 RGB image) |
Conv3–32 |
Conv3–64 |
Maxpooling |
Conv3–96 |
Conv2–128 |
Maxpooling |
Conv3–160 |
Maxpooling |
FC-600 |
Softmax |
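To make the layer list above more concrete, the following sketch traces the feature-map size through the stack. Several things here are assumptions rather than facts from the source: "ConvK–N" is read as a K × K convolution with N filters, convolutions are taken to use stride 1 with no padding, and max pooling is taken to use a 2 × 2 window with stride 2. With different strides or padding the numbers change, so treat the result as illustrative only.

```python
# Layers from the list above, as (kind, kernel_size, output_channels).
# Reading "ConvK-N" as a K x K convolution with N filters is an assumption.
LAYERS = [
    ("conv", 3, 32),
    ("conv", 3, 64),
    ("pool", 2, None),
    ("conv", 3, 96),
    ("conv", 2, 128),
    ("pool", 2, None),
    ("conv", 3, 160),
    ("pool", 2, None),
]


def trace_shapes(layers, size=181, channels=3):
    """Return [(kind, spatial_size, channels), ...] after each layer.

    Assumed, not taken from the source: convolutions use stride 1 and no
    padding; max pooling uses a k x k window with stride k.
    """
    shapes = []
    for kind, k, out_channels in layers:
        if kind == "conv":
            size = size - k + 1          # "valid" convolution, stride 1
            channels = out_channels
        else:
            size = (size - k) // k + 1   # non-overlapping pooling
        shapes.append((kind, size, channels))
    return shapes


if __name__ == "__main__":
    shapes = trace_shapes(LAYERS)
    for kind, size, channels in shapes:
        print(f"{kind:4s} -> {size} x {size} x {channels}")
    _, size, channels = shapes[-1]
    print("inputs to FC-600:", size * size * channels)
```

Under these assumptions the last pooling layer feeds a flattened vector into the 600-unit fully connected layer, which is then followed by the softmax classifier.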
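Method Type | Classification Method | Classification Accuracy (%) |
---|---|---|
Non-deep learning methods | BoVW | 72.05 |
Non-deep learning methods | PLSA | 80.71 |
Non-deep learning methods | LDA | 81.92 |
Non-deep learning methods | SPCK++ [2] | 76.05 |
Non-deep learning methods | LLC [7] | 82.85 |
Non-deep learning methods | SPM+SIFT [7] | 82.30 |
Non-deep learning methods | SSBFC [7] | 91.67 |
Non-deep learning methods | SAL-PTM [10] | 88.33 |
Non-deep learning methods | DMTM [14] | 92.92 |
Non-deep learning methods | SIFT+SC [55] | 81.67 |
Non-deep learning methods | FK-S [56] | 91.63 |
Deep learning methods | M-DCNN [36] | 93.48 |
Deep learning methods | S-UFL [37] | 82.72 |
Deep learning methods | GBRCN [40] | 94.53 |
Deep learning methods | LPCNN [41] | 89.90 |
Deep learning methods | CCNN | 91.56 |
Deep learning methods | SRSCNN-NV | 92.58 |
Deep learning methods | CNNV | 93.92 |
Deep learning methods | SRSCNN | 95.57 |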
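Method Type | Classification Method | Classification Accuracy (%) |
---|---|---|
Non-deep learning methods | BoVW | 73.93 |
Non-deep learning methods | SPM-SIFT | 80.26 |
Non-deep learning methods | LLC | 70.89 |
Non-deep learning methods | LDA | 66.85 |
Deep learning methods | S-UFL [37] | 74.84 |
Deep learning methods | LPCNN [41] | 89.88 |
Deep learning methods | CCNN | 88.26 |
Deep learning methods | SRSCNN-NV | 91.06 |
Deep learning methods | CNNV | 90.69 |
Deep learning methods | SRSCNN | 94.76 |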
Classification Method | Classification Accuracy (%) |
---|---|
SSBFC [7] | 90.86 |
DMTM [14] | 91.52 |
LFK-Linear | 88.42 |
FK-Linear | 87.53 |
FK-S [56] | 90.40 |
SRSCNN | 93.44 |
Classification Method | Classification Accuracy (%) |
---|---|
BoVW | 80.75 |
LDA | 77.34 |
P-LDA | 84.69 |
FK-Linear | 78.23 |
LFK-Linear | 79.69 |
CCNN | 74.45 |
SRSCNN-NV | 74.97 |
CNNV | 79.60 |
SRSCNN | 85.00 |
Dataset | Cr = 50% | Cr = 70% | Cr = 90% |
---|---|---|---|
UCM | 0.9486 | 0.9557 | 0.9398 |
Google dataset of SIRI-WHU | 0.9476 | 0.9468 | 0.933 |
Dataset | OD | OD-4 | OD-8 |
---|---|---|---|
UCM | 0.9557 | 0.9500 | 0.9071 |
Google dataset of SIRI-WHU | 0.9476 | 0.9396 | 0.8958 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Liu, Y.; Zhong, Y.; Fei, F.; Zhu, Q.; Qin, Q. Scene Classification Based on a Deep Random-Scale Stretched Convolutional Neural Network. Remote Sens. 2018, 10, 444. https://doi.org/10.3390/rs10030444