Deep Hash Remote Sensing Image Retrieval with Hard Probability Sampling
Figure 1. Different colors represent different categories: the centered yellow point is the anchor. The yellow point marked with a red "1" is the positive sample farthest from the anchor, i.e., the hardest positive sample. The blue point marked with a red "0" is the negative sample closest to the anchor. We draw two semicircles through these two points; all points inside the resulting ring are informative samples because they provide more valuable information in loss computation and make the training process converge quickly.
Figure 2. (a) The distance distribution. (b) The sampling probability. (c) Sampling number vs. sample distance.
Figure 3. Four sampling methods are shown in (a–d). The upper plots show the probability density of the sample distance. Since the sample distance roughly follows a normal distribution N(√2, 1/√(2n)), the mean is placed at √2, and τ is a randomly selected boundary denoting a centrally symmetric distribution area (the shaded zone). In the lower subfigures, the centered yellow point represents the anchor, and the blue points represent the negative samples selected by each sampling method. (a) Random sampling: most of the selected samples gather in the shaded zone around √2. (b) Semi-hard sampling: the selected samples must be close to the anchor, but not as close as hard mining requires. (c) Hard sampling: a predefined boundary (the blue circle) filters hard negative samples, so the selected samples cluster more tightly around the anchor than in semi-hard sampling. (d) Our proposed sampling strategy: the selected samples have a scattered distribution, maintaining diversity in the training batch.
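The caption above notes that pairwise distances between high-dimensional unit-norm embeddings concentrate around √2, so uniform sampling keeps returning uninformative negatives. As a hedged sketch of the idea (the paper's exact Equation (4) is not reproduced here; this follows the distance-weighted sampling of Wu et al. [19], and the function names and `cutoff` parameter are illustrative assumptions), negatives can be drawn with probability inversely proportional to the distance density, which flattens the concentrated distribution into the scattered one shown in (d):

```python
import numpy as np

def sampling_weights(distances, n_dim, cutoff=0.5):
    # Density of pairwise distances between points uniformly distributed
    # on the unit (n-1)-sphere: q(d) ∝ d^(n-2) * (1 - d^2/4)^((n-3)/2).
    d = np.clip(distances, cutoff, 2.0 - 1e-6)
    log_q = (n_dim - 2.0) * np.log(d) \
            + ((n_dim - 3.0) / 2.0) * np.log(1.0 - 0.25 * d**2)
    w = np.exp(-log_q)   # inverse-density weighting flattens the distribution
    return w / w.sum()

def sample_negatives(distances, n_dim, k, rng=None):
    # Draw k distinct negative indices according to the flattened weights.
    rng = np.random.default_rng(rng)
    p = sampling_weights(distances, n_dim)
    return rng.choice(len(distances), size=k, replace=False, p=p)
```

Because the density peaks near √2, the inverse weighting assigns the largest probabilities to the rarer small distances while still leaving every distance a nonzero chance, which is what preserves batch diversity.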
Figure 4. The optimization process using the margin loss. The colored points represent samples belonging to different classes. The left shows the initial distribution; the right shows the final state after optimization. The black solid circle is the boundary for each class, which restricts samples of the same category to within a given distance. The black dotted circle is a between-class boundary that separates different categories.
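The pull-in/push-out behavior described in the caption above is exactly what a margin-based loss encodes. A minimal sketch follows; this is the margin loss of Wu et al. [19], not necessarily the paper's full Equation (10), and the default α = 0.3 and β = 0.6 are taken from the best rows of the ablation tables:

```python
import numpy as np

def margin_loss(d, same_class, alpha=0.3, beta=0.6):
    # Hinge on the signed deviation from the class boundary beta:
    # same-class pairs incur no loss once d < beta - alpha (inside the solid
    # circle), different-class pairs once d > beta + alpha (outside the
    # dotted circle).
    y = np.where(same_class, 1.0, -1.0)
    return np.maximum(0.0, alpha + y * (d - beta))
```

A same-class pair at distance 0.2 and a different-class pair at distance 1.2 both fall in the zero-loss region, so only pairs violating the margins contribute gradients.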
Figure 5. The architecture of our HPSH with hard probability sampling (HPSH+S) method. It can be divided into four parts. First, we use Inception Net to extract deep embeddings of the remote sensing images. Second, we obtain hard negative examples by the probability sampling method and build a training batch at the same time. Third, the training batch is fed to the deep hash network, which uses a deep neural network to obtain compact hash-like codes. Finally, we calculate the loss function and backpropagate through the network. An additional binarization step is used at test time.
Figure 6. The top 20 retrieval results of HPSH+S on the UCMD dataset. The black lines separate the search results for four query images. Each query image is bordered by a blue square, and each wrongly retrieved image is bordered by a red square. The words under the images give the class to which each image belongs.
Figure 7. Trend of the loss during training on the UCMD dataset.
Figure 8. t-SNE 2D scatterplots comparing the 2D projection of the L-dimensional binary hash codes of the test images in the UCMD dataset. The tiny image in the small red box is enlarged in the big red box; the number under the enlarged image is its category label, and the specific category is given in the blue box. From left to right: the t-SNE 2D scatterplots of KSLSH [9], MiLan, and HPSH+S.
Figure 9. The top 20 retrieval results of HPSH+S on the AID dataset. The black lines separate the search results for four query images from four different categories. Each query image is bordered by a blue square, and each wrongly retrieved image is bordered by a red square. The words under the images give the class to which each image belongs.
Figure 10. Loss trend during training on the AID dataset.
Figure 11. t-SNE 2D scatterplots comparing the 2D projection of the L-dimensional binary hash codes of the test images in the AID dataset. The tiny image in the small red box is enlarged in the big red box; the number under the enlarged image is its class label, and the specific class is given in the blue box. From left to right: the t-SNE 2D scatterplots of KSLSH, MiLan, and HPSH+S.
Abstract
1. Introduction
- (1) We designed a non-uniform hard negative sampling method to find more informative training samples. Meanwhile, the proposed mining strategy preserves the diversity of training samples, making the learned embedding discriminative and stable.
- (2) We presented a new deep hash structure that learns hash functions with a DNN. It uses hard samples to fully train the network weights, and combines a margin-based loss with two hash losses for better hash retrieval results.
- (3) Experimental results on two remote sensing datasets, UCMD and AID, show that our proposed HPSH achieves excellent retrieval performance compared with other state-of-the-art deep hash methods.
2. Related Work
3. HPSH Method
3.1. Symbol Interpretation
3.2. Probability Sample Method
3.3. Hash Loss Function with a Margin
Algorithm 1: Optimization algorithm for learning HPSH.
Input: A batch of RS images.
Output: The parameter W of the deep hash network.
Initialization: Initialize W from a random distribution.
Repeat:
1: Hard negative probability sampling: compute the distance matrix of all samples in the batch by Equation (1), compute the probability matrix by Equation (4), and then select negative samples according to these probabilities.
2: Compute the hash-like codes of the anchor, positive, and negative samples by forward propagation.
3: Apply the binarization function to compute the corresponding hash codes.
4: Use the hash-like codes and hash codes to calculate the loss function according to Equation (10).
5: Update W using Adam.
Until: a fixed number of iterations is reached or a stopping criterion is satisfied.
Return: W.
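The non-gradient steps of one iteration of Algorithm 1 can be sketched as follows. Two assumptions are made that the text does not pin down: `pairwise_distances` treats Equation (1) as the Euclidean distance between L2-normalized embeddings, and `binarize` thresholds sigmoid outputs at 0.5; both are illustrative, not the paper's exact definitions:

```python
import numpy as np

def pairwise_distances(emb):
    # Step 1 prerequisite: Euclidean distances between L2-normalized
    # embeddings (one embedding per row); assumed form of Equation (1).
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sq = np.clip(2.0 - 2.0 * (emb @ emb.T), 0.0, None)
    return np.sqrt(sq)

def binarize(hash_like, threshold=0.5):
    # Step 3: map sigmoid outputs in (0, 1) to binary codes in {0, 1}.
    return (hash_like >= threshold).astype(np.int8)
```

With these pieces, one iteration reduces to: build the distance matrix, sample negatives from it, run the forward pass to get hash-like codes, binarize, evaluate the loss, and take an Adam step on W.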
3.4. Global Architecture
4. Experiments
4.1. Dataset and Evaluation Protocols
4.2. Implementation Details
4.3. Comparison with Baselines
4.3.1. Results on UCMD
4.3.2. Results on Enhanced UCMD
4.3.3. Results on AID
4.4. Ablation Study
4.5. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Ma, Y.; Wu, H.; Wang, L.; Huang, B.; Ranjan, R.; Zomaya, A.; Jie, W. Remote sensing big data computing: Challenges and opportunities. Future Gener. Comput. Syst. 2015, 51, 47–60. [Google Scholar] [CrossRef]
- Mandal, D.; Annadani, Y.; Biswas, S. GrowBit: Incremental Hashing for Cross-Modal Retrieval. In Asian Conference on Computer Vision; Springer: Cham, Switzerland, 2018; pp. 305–321. [Google Scholar]
- Tong, X.-Y.; Xia, G.-S.; Hu, F.; Zhong, Y.; Datcu, M.; Zhang, L. Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation. IEEE Trans. Big Data 2019, 6. [Google Scholar] [CrossRef] [Green Version]
- Zhao, L.; Tang, J.; Yu, X.; Li, Y.; Mi, S.; Zhang, C. Content-Based Remote Sensing Image Retrieval Using Image Multi-feature Combination and SVM-Based Relevance Feedback. In Recent Advances in Computer Science and Information Engineering: Volume 1; Qian, Z., Cao, L., Su, W., Wang, T., Yang, H., Eds.; Springer: Berlin, Germany, 2012; pp. 761–767. [Google Scholar] [CrossRef]
- Daschiel, H.; Datcu, M. Information mining in remote sensing image archives: System evaluation. IEEE Trans. Geosci. Remote. Sens. 2005, 43, 188–199. [Google Scholar] [CrossRef]
- Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A Unified Embedding for Face Recognition and Clustering. arXiv 2015, arXiv:1503.03832. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. arXiv 2015, arXiv:1512.00567. [Google Scholar]
- Li, W.-J.; Wang, S.; Kang, W.-C. Feature Learning based Deep Supervised Hashing with Pairwise Labels. arXiv 2015, arXiv:1511.03855. [Google Scholar]
- Demir, B.; Bruzzone, L. Hashing-Based Scalable Remote Sensing Image Search and Retrieval in Large Archives. IEEE Trans. Geosci. Remote. Sens. 2015, 54, 892–904. [Google Scholar] [CrossRef]
- Dai, O.E.; Demir, B.; Sankur, B.; Bruzzone, L. A Novel System for Content-Based Retrieval of Single and Multi-Label High-Dimensional Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2018, 11, 2473–2490. [Google Scholar] [CrossRef] [Green Version]
- Xia, G.-S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification. IEEE Trans. Geosci. Remote. Sens. 2017, 55, 3965–3981. [Google Scholar] [CrossRef] [Green Version]
- Lowe, D.G. Similarity Metric Learning for a Variable-Kernel Classifier. Neural Comput. 1995, 7, 72–85. [Google Scholar] [CrossRef]
- Mika, S.; Ratsch, G.; Weston, J.; Scholkopf, B.; Mullers, K.R. Fisher Discriminant Analysis with Kernels. In Proceedings of the 1999 IEEE Signal Processing Society Workshop: Neural Networks for Signal Processing IX, Madison, WI, USA, 23–25 August 1999. [Google Scholar]
- Xing, E.P.; Ng, A.Y.; Jordan, M.I.; Russell, S.J. Distance Metric Learning with Application to Clustering with Side-Information. In Proceedings of the International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 9–14 December 2002. [Google Scholar]
- Hadsell, R.; Chopra, S.; Lecun, Y. Dimensionality reduction by learning an invariant mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006. [Google Scholar]
- Hoffer, E.; Ailon, N. Deep metric learning using Triplet network. arXiv 2014, arXiv:1412.6622. [Google Scholar]
- Song, H.O.; Xiang, Y.; Jegelka, S.; Savarese, S. Deep Metric Learning via Lifted Structured Feature Embedding. arXiv 2015, arXiv:1511.06452. [Google Scholar]
- Wang, X.; Hua, Y.; Kodirov, E.; Robertson, N.M. Ranked List Loss for Deep Metric Learning. arXiv 2019, arXiv:1903.03238. [Google Scholar]
- Wu, C.-Y.; Manmatha, R.; Smola, A.J.; Krähenbühl, P. Sampling Matters in Deep Embedding Learning. arXiv 2017, arXiv:1706.07567. [Google Scholar]
- Bell, S.; Bala, K. Learning visual similarity for product design with convolutional neural networks. ACM Trans. Graph. 2015, 34, 1–10. [Google Scholar] [CrossRef]
- Chopra, S.; Hadsell, R.; Lecun, Y. Learning a similarity metric discriminatively, with application to face verification. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005. [Google Scholar]
- Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep Face Recognition. In Proceedings of the British Machine Vision Conference 2015, Swansea, UK, 7–10 September 2015. [Google Scholar]
- Simo-Serra, E.; Trulls, E.; Ferraz, L.; Kokkinos, I.; Fua, P.; Moreno-Noguer, F. Discriminative Learning of Deep Convolutional Feature Point Descriptors. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 118–126. [Google Scholar]
- Zhao, P.; Zhang, T. Stochastic Optimization with Importance Sampling for Regularized Loss Minimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 7–9 July 2015; Volume 37, pp. 1–9. [Google Scholar]
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Oliva, A.; Torralba, A. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. Int. J. Comput. Vis. 2001, 42, 145–175. [Google Scholar] [CrossRef]
- Babenko, A.; Slesarev, A.; Chigorin, A.; Lempitsky, V.S.J.A. Neural Codes for Image Retrieval. arXiv 2014, arXiv:1404.1777. [Google Scholar]
- Krizhevsky, A.; Hinton, G.E. Using very deep autoencoders for content-based image retrieval. In Proceedings of the 19th European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, 27–29 April 2011. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Neural Information Processing Systems Conference (NIPS), Lake Tahoe, NV, USA, 3–8 December 2012. [Google Scholar]
- Salakhutdinov, R.; Hinton, G. Semantic hashing. Int. J. Approx. Reason. 2009, 50, 969–978. [Google Scholar] [CrossRef] [Green Version]
- Li, Y.-S.; Zhang, Y.; Huang, X.; Zhu, H.; Ma, J. Large-Scale Remote Sensing Image Retrieval by Deep Hashing Neural Networks. IEEE Trans. Geosci. Remote. Sens. 2018, 56, 950–965. [Google Scholar] [CrossRef]
- Li, Y.-S.; Zhang, Y.; Huang, X.; Ma, J. Learning Source-Invariant Deep Hashing Convolutional Neural Networks for Cross-Source Remote Sensing Image Retrieval. IEEE Trans. Geosci. Remote. Sens. 2018, 56, 6521–6536. [Google Scholar] [CrossRef]
- Roy, S.; Sangineto, E.; Demir, B.; Sebe, N. Deep Metric and Hash-Code Learning for Content-Based Retrieval of Remote Sensing Images. In Proceedings of the 2018 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2018), Valencia, Spain, 22–27 July 2018; pp. 4539–4542. [Google Scholar] [CrossRef] [Green Version]
- Chen, Y.; Lu, X. A Deep Hashing Technique for Remote Sensing Image-Sound Retrieval. Remote Sens. 2020, 12, 84. [Google Scholar] [CrossRef] [Green Version]
- Kulis, B.; Grauman, K. Kernelized Locality-Sensitive Hashing for Scalable Image Search. In Proceedings of the International Conference on Computer Vision (ICCV), Kyoto, Japan, 27 September–4 October 2009; pp. 2130–2137. [Google Scholar] [CrossRef]
- Weiss, Y.; Torralba, A.; Fergus, R. Spectral Hashing. In Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS’08), Vancouver, BC, Canada, 8–10 December 2008; Curran Associates Inc.: Red Hook, NY, USA, 2008; pp. 1753–1760. [Google Scholar]
- Liu, W.; Wang, J.; Ji, R.; Jiang, Y.-G.; Chang, S. Supervised Hashing with Kernels. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012. [Google Scholar] [CrossRef]
- Lehnen, A.; Wesenberg, G.E. The Sphere Game in n Dimensions. Available online: http://faculty.madisoncollege.edu/alehnen/sphere/hypers.htm (accessed on 3 May 2002).
- Gong, Y.; Lazebnik, S.; Gordo, A.; Perronnin, F. Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-Scale Image Retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 2916–2929. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Hyvärinen, A.; Hurri, J.; Hoyer, P.O. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision; Springer Publishing Company, Inc.: Berlin, Germany, 2009. [Google Scholar]
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the ICML Workshop on Deep Learning for Audio, Speech and Language, Atlanta, GA, USA, 16 June 2013. [Google Scholar]
| Symbol | Definition |
|---|---|
| X | The set of remote sensing (RS) images |
| Y | The set of labels |
| | The j-th image of the m-th class |
| | The p-th image of the m-th class |
| | The n-th image of the l-th class |
| | The high-dimensional embedding |
| | The low-dimensional embedding |
| | The i-th hash-like code |
| | The i-th hash code |
| | The Hamming distance |
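For illustration, the Hamming distance in the last row of the table is simply the number of bit positions in which two equal-length binary codes differ; `hamming_distance` below is a hypothetical helper, not a symbol from the paper:

```python
def hamming_distance(a, b):
    # Count of bit positions where the two binary codes differ.
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))
```

Because this distance is computed with bitwise comparisons rather than floating-point arithmetic, retrieval in Hamming space is what makes hash-based search fast on large archives.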
| Layers | Input Dimension | Output Dimension | Filter Size | Stride |
|---|---|---|---|---|
| FC_1 | 2048 | 1024 | 1 | 1 |
| LeakyReLU | 1024 | 1024 | - | - |
| FC_2 | 1024 | 512 | 1 | 1 |
| LeakyReLU | 512 | 512 | - | - |
| FC_3 | 512 | L | 1 | 1 |
| Sigmoid | L | L | - | - |
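The table above fully specifies the hash branch: three fully connected layers shrinking 2048 → 1024 → 512 → L, with LeakyReLU activations between them and a final sigmoid producing hash-like codes in (0, 1). A minimal NumPy forward-pass sketch follows; the weight initialization scheme and the LeakyReLU slope are our assumptions, as the paper does not state them:

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DeepHashNet:
    """FC 2048->1024, LeakyReLU, FC 1024->512, LeakyReLU, FC 512->L, Sigmoid."""

    def __init__(self, hash_bits=32, rng=None):
        rng = np.random.default_rng(rng)
        sizes = [2048, 1024, 512, hash_bits]
        # He-style initialization (an assumption, not from the paper).
        self.weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
                        for m, n in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(n) for n in sizes[1:]]

    def forward(self, x):
        x = leaky_relu(x @ self.weights[0] + self.biases[0])
        x = leaky_relu(x @ self.weights[1] + self.biases[1])
        # Hash-like codes in (0, 1); binarization at test time maps them to bits.
        return sigmoid(x @ self.weights[2] + self.biases[2])
```

The sigmoid output explains why a separate binarization step is needed at test time: training operates on continuous hash-like codes, and only retrieval uses the thresholded binary codes.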
| Methods | map@20 (L = 16 bits) | Time (ms) | map@20 (L = 24 bits) | Time (ms) | map@20 (L = 32 bits) | Time (ms) |
|---|---|---|---|---|---|---|
| KSLSH [9] | 0.557 | 25.3 | 0.594 | 25.5 | 0.630 | 25.6 |
| MiLan [33] | 0.875 | 25.3 | 0.890 | 25.5 | 0.904 | 25.6 |
| MiLan (Euclidean) [33] | 0.903 | 35.3 | 0.894 | 35.8 | 0.916 | 36.0 |
| MiLan+S | 0.904 | 25.3 | 0.911 | 25.5 | 0.918 | 25.6 |
| HPSH | 0.815 | 25.3 | 0.841 | 25.5 | 0.850 | 25.6 |
| HPSH+S | 0.909 | 25.3 | 0.922 | 25.5 | 0.928 | 25.6 |
| HPSH+S (Euclidean) | 0.923 | 35.3 | 0.929 | 35.8 | 0.930 | 36.0 |
| Methods | map@20 (L = 16 bits) | Time (ms) | map@20 (L = 24 bits) | Time (ms) | map@20 (L = 32 bits) | Time (ms) |
|---|---|---|---|---|---|---|
| KSLSH [9] | 0.426 | 115.3 | 0.467 | 116.1 | 0.495 | 117.5 |
| MiLan [33] | 0.876 | 117.5 | 0.891 | 116.0 | 0.926 | 114.5 |
| MiLan+S | 0.914 | 117.5 | 0.930 | 116.0 | 0.946 | 114.5 |
| HPSH | 0.723 | 117.5 | 0.767 | 116.0 | 0.807 | 114.5 |
| HPSH+S | 0.898 | 117.5 | 0.936 | 116.0 | 0.955 | 114.5 |
| mAP@K | 8 Bits | 16 Bits | 24 Bits | 32 Bits | 40 Bits |
|---|---|---|---|---|---|
| mAP@10 | 67.08 | 90.92 | 92.63 | 92.87 | 92.55 |
| mAP@20 | 76.76 | 90.90 | 92.25 | 92.80 | 92.85 |
| mAP@40 | 81.88 | 88.22 | 90.95 | 92.32 | 90.71 |
| Train/Test Ratio | 8 Bits | 16 Bits | 24 Bits | 32 Bits | 40 Bits |
|---|---|---|---|---|---|
| 4:6 | 70.87 | 88.41 | 89.62 | 90.61 | 90.87 |
| 5:5 | 71.71 | 89.30 | 91.22 | 91.45 | 91.83 |
| 6:4 | 76.82 | 90.91 | 92.25 | 92.80 | 92.88 |
| 7:3 | 79.88 | 91.92 | 93.22 | 93.25 | 93.69 |
| 8:2 | 82.82 | 92.74 | 93.28 | 93.29 | 93.89 |
| Loss Function | 8 Bits | 16 Bits | 24 Bits | 32 Bits | 40 Bits |
|---|---|---|---|---|---|
| Contrastive loss | 74.20 | 80.37 | 82.00 | 85.20 | 91.84 |
| Triplet loss | 37.20 | 81.22 | 85.40 | 89.23 | 90.26 |
| Margin loss | 76.76 | 90.90 | 92.25 | 92.80 | 92.85 |
| Margin-β | 8 Bits | 16 Bits | 24 Bits | 32 Bits | 40 Bits |
|---|---|---|---|---|---|
| 0.0 | 5.87 | 56.13 | 69.15 | 81.33 | 84.82 |
| 0.2 | 8.83 | 67.23 | 76.39 | 82.29 | 84.83 |
| 0.4 | 46.18 | 89.70 | 91.24 | 91.75 | 91.86 |
| 0.6 | 76.76 | 90.90 | 92.25 | 92.80 | 92.85 |
| 0.8 | 75.29 | 81.32 | 88.04 | 88.85 | 89.32 |
| 1.0 | 62.37 | 67.36 | 76.24 | 80.85 | 82.03 |
| Margin-α | 8 Bits | 16 Bits | 24 Bits | 32 Bits | 40 Bits |
|---|---|---|---|---|---|
| 0.0 | 65.04 | 76.61 | 77.00 | 77.64 | 79.08 |
| 0.1 | 69.01 | 85.52 | 87.00 | 87.25 | 89.57 |
| 0.2 | 69.55 | 87.89 | 90.16 | 90.38 | 91.21 |
| 0.3 | 76.76 | 90.90 | 92.25 | 92.80 | 92.85 |
| 0.4 | 72.27 | 86.83 | 90.28 | 90.52 | 91.15 |
| 0.5 | 71.57 | 85.21 | 87.41 | 89.52 | 89.88 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Shan, X.; Liu, P.; Gou, G.; Zhou, Q.; Wang, Z. Deep Hash Remote Sensing Image Retrieval with Hard Probability Sampling. Remote Sens. 2020, 12, 2789. https://doi.org/10.3390/rs12172789