An Instance Segmentation Based Framework for Large-Sized High-Resolution Remote Sensing Images Registration
"> Figure 1
<p>The problem of non-correspondence between the (<b>a</b>) sensed image with poor positioning accuracy and (<b>b</b>) the reference image.</p> "> Figure 2
<p>The registration result of the entire large-sized, high-resolution (<b>a</b>) sensed image and (<b>b</b>) reference image.</p> "> Figure 3
<p>The (<b>a</b>) correct matches on the ground and the (<b>b</b>) mismatches on the buildings.</p> "> Figure 4
<p>The detected key points on (<b>a</b>) summer greenhouses, (<b>b</b>) winter greenhouses, (<b>c</b>) summer ponds, and (<b>d</b>) winter ponds.</p> "> Figure 5
<p>The proposed instance segmentation based framework for large-sized, high-resolution, remote sensing image registration.</p> "> Figure 6
<p>The structure of our single-stage, fine-grained, instance segmentation network for high-resolution remote sensing images. The input size in this paper is 896 × 896 and note that the BiFPN requires the input size to be a multiple of 128 pixels. P2-P6 indicate the HRNetV2p backbone output layers. In the instance segmentation head of each BiFPN output layer, the three branches represent the classification branch, kernel branch, and feature branch, respectively, which is the same as in SOLOv2. ‘Align’ means the adaptive-pooling, which is the same as in Reference [<a href="#B38-remotesensing-13-01657" class="html-bibr">38</a>]. <span class="html-italic">C</span> refers to the number of the instance categories. <span class="html-italic">S × S</span> refers to the number of grids in SOLOv2, and <span class="html-italic">s</span> refers to the scale of the input size while 1/4<span class="html-italic">s</span> is 228 × 228 in this paper. In the feature branch, each BiFPN output layer is up-sampled by convolutions and bilinear interpolations until it reaches 1/4 scale, and <math display="inline"><semantics> <mo>⊕</mo> </semantics></math> indicates the element-wise summation of them to acquire the unified feature. <math display="inline"><semantics> <mo>⊛</mo> </semantics></math> denotes the dynamic convolution operation, which is the same as in Reference [<a href="#B39-remotesensing-13-01657" class="html-bibr">39</a>]. <math display="inline"><semantics> <mo>⊗</mo> </semantics></math> denotes the fusion of the boundary map and the direction map and ‘Refinement’ denotes the post-processing of the coarse mask, which is the same as in Reference [<a href="#B40-remotesensing-13-01657" class="html-bibr">40</a>].</p> "> Figure 7
<p>The illustration of instance matching. Note that our instance segmentation masks are binary, and, here, we painted them in color to make it easier to understand. The vector on the right side of each patch is its Hu moments (take their base 10 logarithms). The red number below each reference patch represents the Euclidean distance between its Hu moments vector and the Hu moments vector of the sensed patch. The figure shows the rotation invariance of the circular patch and the effectiveness of the matching approach.</p> "> Figure 8
<p>Illustration of the cross-validation. <math display="inline"><semantics> <mrow> <msubsup> <mi>p</mi> <mn>1</mn> <mi>s</mi> </msubsup> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msubsup> <mi>p</mi> <mn>1</mn> <mi>r</mi> </msubsup> </mrow> </semantics></math> are a pair of matching key points given by SIFT, as well as <math display="inline"><semantics> <mrow> <msubsup> <mi>p</mi> <mn>2</mn> <mi>s</mi> </msubsup> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msubsup> <mi>p</mi> <mn>2</mn> <mi>r</mi> </msubsup> </mrow> </semantics></math>. The dashed arrow indicates the SIFT major orientation of each key point, that is, the positive direction of the x-axis of its coordinate system. Each dashed line without an arrow indicates the positive direction of the y-axis (note that the original positive direction of the y-axis in the image coordinate system is downward). <math display="inline"><semantics> <mrow> <msubsup> <mi>c</mi> <mn>1</mn> <mi>s</mi> </msubsup> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msubsup> <mi>c</mi> <mn>1</mn> <mi>r</mi> </msubsup> </mrow> </semantics></math> denote the centers of the minimum enclosing circles of the center instances. The solid arrow indicates the relative position vector of each key point. In this example, <math display="inline"><semantics> <mrow> <mover accent="true"> <mrow> <msubsup> <mi>p</mi> <mn>1</mn> <mi>s</mi> </msubsup> <msubsup> <mi>c</mi> <mn>1</mn> <mi>s</mi> </msubsup> </mrow> <mo stretchy="true">→</mo> </mover> </mrow> </semantics></math> is (46.965, 110.409), and <math display="inline"><semantics> <mrow> <mover accent="true"> <mrow> <msubsup> <mi>p</mi> <mn>1</mn> <mi>r</mi> </msubsup> <msubsup> <mi>c</mi> <mn>1</mn> <mi>r</mi> </msubsup> </mrow> <mo stretchy="true">→</mo> </mover> </mrow> </semantics></math> is (260.657, 9.204). Their Euclidean distance is 236.446. <math display="inline"><semantics> <mrow> <mover accent="true"> <mrow> <msubsup> <mi>p</mi> <mn>2</mn> <mi>s</mi> </msubsup> <msubsup> <mi>c</mi> <mn>1</mn> <mi>s</mi> </msubsup> </mrow> <mo stretchy="true">→</mo> </mover> </mrow> </semantics></math> is (158.180, 99.205), and <math display="inline"><semantics> <mrow> <mover accent="true"> <mrow> <msubsup> <mi>p</mi> <mn>2</mn> <mi>r</mi> </msubsup> <msubsup> <mi>c</mi> <mn>1</mn> <mi>r</mi> </msubsup> </mrow> <mo stretchy="true">→</mo> </mover> </mrow> </semantics></math> is (160.201, 96.003). Their Euclidean distance is 3.787. Therefore, <math display="inline"><semantics> <mrow> <msubsup> <mi>p</mi> <mn>1</mn> <mi>s</mi> </msubsup> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msubsup> <mi>p</mi> <mn>1</mn> <mi>r</mi> </msubsup> </mrow> </semantics></math> are key points of mismatch and eliminated. The process only involves a basic analytical geometry method, so we do not elaborate on it here.</p> "> Figure 9
<p>The SSFG instance segmentation test results on high-resolution remote sensing images. The four columns from left to right are: the building instance segmentation on Vaihingen, the building instance segmentation on WHU Building Dataset, the pond instance segmentation on Jilin-1-HZ, and the greenhouse instance segmentation on Jilin-1-HZ. We set the classification score threshold as 0.6.</p> "> Figure 10
<p>The influence of the values of <math display="inline"><semantics> <mi>m</mi> </semantics></math>, <math display="inline"><semantics> <mrow> <mi>δ</mi> <mo>,</mo> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>t</mi> <msub> <mi>h</mi> <mrow> <mi>c</mi> <mi>v</mi> </mrow> </msub> </mrow> </semantics></math> in our framework for the registration result.</p> "> Figure 11
<p>Visualized result of our registration method on Christchurch.</p> "> Figure 12
<p>Visualized result of our registration method on Hangzhou-1.</p> "> Figure 13
<p>Visualized result of our registration method on Hangzhou-2.</p> ">
Abstract
1. Introduction
- We propose an automatic registration framework specifically for large-sized, high-resolution remote sensing images. The method enhances the applicability of sensed images with poor positioning accuracy, improves the accuracy and robustness of registration, and remains efficient. Furthermore, the framework supports embedding various feature-based methods to satisfy the requirements of more flexible applications. These points are illustrated in detail in Section 3 and Section 4.
- We propose a deep learning based instance segmentation algorithm that achieves fine-grained and efficient extraction of the objects of interest for the registration framework, which is introduced in Section 3. Even when it is not used for subsequent registration, this algorithm can be applied independently to the intelligent interpretation of remote sensing images, where it has broad significance and value.
2. Related Works
2.1. Feature-Based Methods for Remote Sensing Image Registration
2.2. Limitations of Related Works for Large-Sized High-Resolution Remote Sensing Images
3. Proposed Methods
3.1. Single-Stage Fine-Grained Instance Segmentation Network for High-Resolution Remote Sensing Images
- Backbone of high-resolution feature maps. Pooling in CNNs causes severe loss of object features (especially boundary features), so the subsequent up-sampling produces segmentation masks with coarse object boundaries. We use HRNetV2p [41] as the backbone of the SSFG, which keeps the feature maps at a high-resolution representation throughout feature extraction (the feature maps in the main branch are always one-fourth of the input size). This allows the final segmentation to be performed on high-resolution feature maps and yields finer boundaries. Note that the output of HRNetV2p in Reference [41] is a four-layer feature pyramid network (FPN, P2–P5); we add a layer after P5 through a 3 × 3 convolution, so the output of our backbone is P2–P6.
- Attention mechanism of bidirectional feature fusion. High-resolution remote sensing images cover a very wide area and contain rich, detailed textures, resulting in extremely complicated backgrounds [37]. This introduces a large amount of noise into the CNN feature maps and thus reduces the accuracy of object extraction and foreground segmentation [33]. An attention mechanism makes feature extraction focus on the object foreground and reduces the noise in the feature maps, which is especially suitable and effective for remote sensing images. We adopt the bidirectional cross-scale connection and weighted feature fusion network BiFPN [42] to realize this attention mechanism. Note that, as recommended in Reference [42], our model uses six layers of BiFPN (when the input size is 896 × 896), but only one layer is shown in Figure 6 for illustration. A sketch of the extra pyramid level and this weighted fusion is given after this list.
- Single-stage instance segmentation head. We adopt SOLOv2 [39] as the head of the SSFG, which directly segments the instances, without relying on bounding box detection, to generate the coarse instance segmentation mask.
- Post-processing for segmentation boundary refinement. We adopt the model-agnostic post-processing method SegFix [40], which predicts a boundary map and a direction map from the shallow feature maps (C2) and fuses the two into an offset map to refine the segmentation boundary. Note that the SegFix network is trained and used separately in Reference [40], whereas we integrate it into the instance segmentation process to achieve end-to-end training. The input of SegFix is P2 of the backbone, and the coarse mask is refined by the offset map to obtain the fine-grained result; a sketch of this offset-based refinement also follows this list.
- The training loss function combines the classification and mask losses of the SOLOv2 head with the boundary and direction losses of the SegFix branch; one plausible form is sketched after this list.
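As an illustration of the two feature pyramid choices above, the following minimal PyTorch sketch (not the authors' code) builds the extra P6 level from P5 with a 3 × 3 convolution and applies BiFPN's fast normalized weighted fusion [42] to two same-resolution feature maps. The stride-2 downsampling for P6 and the class name ExtraLevelAndFastFusion are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExtraLevelAndFastFusion(nn.Module):
    """Illustrative only: an extra pyramid level on top of P5 and a BiFPN-style
    fast normalized fusion of two same-resolution feature maps."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # P6 from P5: the paper states a 3x3 convolution; the stride-2
        # downsampling here is an assumption.
        self.p6_conv = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
        # Learnable non-negative fusion weights (fast normalized fusion, as in EfficientDet [42]).
        self.fusion_w = nn.Parameter(torch.ones(2))

    def extra_level(self, p5: torch.Tensor) -> torch.Tensor:
        return self.p6_conv(p5)

    def fuse(self, feat_a: torch.Tensor, feat_b: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
        w = F.relu(self.fusion_w)          # keep the fusion weights non-negative
        w = w / (w.sum() + eps)            # normalize without a softmax
        return w[0] * feat_a + w[1] * feat_b
```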
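The boundary-refinement step can be pictured with the following simplified NumPy sketch: for each pixel that the boundary map flags as boundary, the label of the interior pixel that the predicted direction points to replaces the coarse label. This is only the intuition behind SegFix's offset map [40]; the actual method discretizes directions differently and may apply longer offsets, and the function refine_coarse_mask is hypothetical.

```python
import numpy as np

# 8 discretized unit offsets (dx, dy), one per direction bin.
OFFSETS = np.array([(1, 0), (1, 1), (0, 1), (-1, 1),
                    (-1, 0), (-1, -1), (0, -1), (1, -1)])

def refine_coarse_mask(coarse_mask, boundary_prob, direction_logits, thresh=0.5):
    """coarse_mask: (H, W) label map; boundary_prob: (H, W) in [0, 1];
    direction_logits: (8, H, W). Returns a refined (H, W) label map."""
    direction = direction_logits.argmax(axis=0)      # (H, W) direction bin per pixel
    refined = coarse_mask.copy()
    h, w = coarse_mask.shape
    ys, xs = np.nonzero(boundary_prob > thresh)       # refine boundary pixels only
    for y, x in zip(ys, xs):
        dx, dy = OFFSETS[direction[y, x]]
        ny = int(np.clip(y + dy, 0, h - 1))
        nx = int(np.clip(x + dx, 0, w - 1))
        refined[y, x] = coarse_mask[ny, nx]           # borrow the interior label
    return refined
```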
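As a hedged sketch of the training objective, assuming the SSFG simply sums the SOLOv2 head losses and the SegFix branch losses (the exact terms and weights used by the authors may differ):

$$L = L_{\mathrm{cate}} + \lambda\, L_{\mathrm{mask}} + L_{\mathrm{bd}} + L_{\mathrm{dir}},$$

where $L_{\mathrm{cate}}$ is the Focal Loss [43] of the classification branch, $L_{\mathrm{mask}}$ is the mask loss weighted by $\lambda$ as in SOLOv2 [39], and $L_{\mathrm{bd}}$ and $L_{\mathrm{dir}}$ are the cross-entropy losses supervising the SegFix boundary and direction maps [40].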
3.2. Instance Matching and Corresponding Local Area Generation
- For both instance segmentation masks, we apply morphological operations, such as dilation followed by erosion, to fill gaps and remove noise.
- For each instance in the sensed image, we take its minimum enclosing circle (the result of instance segmentation includes the contour coordinates) and record the center and radius of the circle.
- We take a circular patch as the local area to make it rotation invariant. The patch is centered at the center of the enclosing circle, and its radius is the circle's radius multiplied by an expansion scale; the patch formed by the instance together with its neighbors matches more reliably than the single instance alone. Since we use OpenCV to calculate the Hu moments and it requires a matrix as input, we pad the circular patch with 0 to its smallest external square (the instances are filled with 1 and the background is 0 in our binary segmentation masks), and we can then obtain the Hu moments of each instance patch in the sensed image (see the sketch after this list).
- For each instance in the reference image, we likewise record the center and radius of its minimum enclosing circle and use the same approach to obtain the Hu moments of each instance patch. Note that, since the directly calculated Hu moments have very small orders of magnitude, we actually take their base 10 logarithms as the result, the same as in Reference [45].
- For each Hu moments vector in the sensed image, we use brute-force matching to find the vector with the smallest Euclidean distance in the reference image, thereby achieving instance matching. In practice, this step can be implemented efficiently with matrix operations. The above steps are depicted in Figure 7.
- For each pair of matched instances, we take the two circle centers as the box centers and generate a pair of square boxes of a fixed size in the sensed and reference images, respectively. We filter out boxes whose Intersection over Union (IoU) with another box in the same image is greater than 0.5, the same as in Reference [27]. Finally, we crop the remaining pairs of boxes into the corresponding image block pairs (a sketch of the box construction and IoU filter follows this list). This step is illustrated in Figure 5.
- We discuss the impact of the expansion scale and the box size on performance in the subsequent experiments; in this paper, we set the expansion scale to 2 and the box size to 600. Please refer to Section 4 for details.
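A minimal OpenCV/NumPy sketch (not the authors' code) of the patch construction and Hu-moment matching described above. The function names patch_hu_vector and match_by_hu are hypothetical, the default expansion scale of 2 follows the value used in this paper, and the signed log10 transform is one common way to take base-10 logarithms of Hu moments while preserving their signs.

```python
import cv2
import numpy as np

def patch_hu_vector(mask, center, radius, expand=2.0):
    """Log10 Hu-moment descriptor of the expanded circular patch around one
    instance's minimum enclosing circle, cropped from a binary mask (uint8, 0/1)."""
    h, w = mask.shape
    cx, cy = int(round(center[0])), int(round(center[1]))
    r = max(int(round(radius * expand)), 1)
    patch = np.zeros((2 * r, 2 * r), dtype=np.uint8)       # smallest external square, padded with 0
    x0, x1 = max(cx - r, 0), min(cx + r, w)
    y0, y1 = max(cy - r, 0), min(cy + r, h)
    patch[y0 - (cy - r):y1 - (cy - r), x0 - (cx - r):x1 - (cx - r)] = mask[y0:y1, x0:x1]
    yy, xx = np.ogrid[:2 * r, :2 * r]
    patch[(xx - r) ** 2 + (yy - r) ** 2 > r * r] = 0       # keep only the circular patch
    hu = cv2.HuMoments(cv2.moments(patch, binaryImage=True)).ravel()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)     # signed base-10 logarithm

def match_by_hu(sensed_vecs, ref_vecs):
    """Brute-force matching: for each sensed Hu vector, the index of (and distance
    to) the nearest reference Hu vector, via one broadcasted matrix operation."""
    d = np.linalg.norm(sensed_vecs[:, None, :] - ref_vecs[None, :, :], axis=2)
    return d.argmin(axis=1), d.min(axis=1)
```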
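The corresponding local areas are then square blocks centered on the matched circle centers. The sketch below (illustrative only) builds such boxes and applies the IoU > 0.5 filter; the greedy keep-first order is our assumption, and the default of 600 follows the box size used in this paper.

```python
def box_from_center(center, size=600):
    """Axis-aligned square box (x0, y0, x1, y1) of side `size` centered on `center`."""
    cx, cy = center
    half = size / 2.0
    return (cx - half, cy - half, cx + half, cy + half)

def iou(a, b):
    """Intersection over Union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix1 - ix0, 0.0) * max(iy1 - iy0, 0.0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_overlapping(boxes, thresh=0.5):
    """Drop any box whose IoU with an already kept box in the same image exceeds `thresh`."""
    kept = []
    for b in boxes:
        if all(iou(b, k) <= thresh for k in kept):
            kept.append(b)
    return kept
```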
3.3. Local Feature Matching, Outlier Removal, and Registration
- For each image block pair, we mask out the unwanted instances and adopt a classic feature-based method (such as SIFT) to obtain the initial matching key points. Key points are matched by the ratio of Euclidean distances between the nearest and the second-nearest neighbors of the corresponding features; the ratio threshold is set to 0.8, the same as in Reference [6]. This step is illustrated in Figure 5.
- We eliminate mismatched key points through multiple constraints. First, we introduce a cross-validation strategy, which identifies mismatches based on whether the center instances of the image block pair have similar positions relative to the key points. Since many classic hand-crafted features (SIFT, SURF, ORB, etc.) are rotation invariant, we can obtain the relative positions from the major orientations (which are included in their features) of the key points. Specifically, we denote a pair of matching key points in the sensed and reference image blocks as $p^s$ and $p^r$, respectively. Taking $p^s$ and $p^r$ as the origins and their major orientations as the positive directions of the x-axes, we obtain the locations of the center-instance circle centers $c^s$ and $c^r$ in the corresponding coordinate systems. The coordinates of $c^s$ and $c^r$ are exactly the relative position vectors of the key points and the corresponding center instances. We then calculate the Euclidean distance between the pair of vectors; if it is greater than the threshold $th_{cv}$, the match is considered false and eliminated. $th_{cv}$ is set to 10 in this paper; please refer to Section 4 for details. Note that the relative position vector is not scale invariant: if the spatial resolutions of the sensed and reference images differ, the vectors must be scaled by the ratio of the resolutions when they are calculated. Figure 8 illustrates an example of the cross-validation in detail, and a sketch of this check follows this list.
- RANSAC is used to further eliminate false matches and obtain a more accurate matching result. Finally, the affine matrix is computed by the least-squares algorithm (see the sketch after this list).
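A sketch of the cross-validation check, under the assumptions that each key point carries an orientation angle in degrees and that the y-axis is flipped to point upward as in Figure 8; the exact angle convention of the detector may require a sign change, and the function names are hypothetical.

```python
import numpy as np

def relative_position(keypoint_xy, angle_deg, center_xy, scale=1.0):
    """Instance-center coordinates in the key point's local frame: origin at the
    key point, x-axis along its major orientation, y-axis flipped to point upward."""
    dx = (center_xy[0] - keypoint_xy[0]) * scale
    dy = -(center_xy[1] - keypoint_xy[1]) * scale          # image y-axis points down
    t = np.deg2rad(angle_deg)
    return np.array([np.cos(t) * dx + np.sin(t) * dy,      # rotate into the local frame
                     -np.sin(t) * dx + np.cos(t) * dy])

def passes_cross_validation(kp_s, ang_s, c_s, kp_r, ang_r, c_r, th_cv=10.0, scale_ratio=1.0):
    """Keep the match only if the two relative position vectors differ by at most th_cv."""
    v_s = relative_position(kp_s, ang_s, c_s)
    v_r = relative_position(kp_r, ang_r, c_r, scale=scale_ratio)
    return np.linalg.norm(v_s - v_r) <= th_cv
```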
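For the final step, one standard OpenCV route is sketched below, under the assumption that the paper's RANSAC and least-squares stages can be approximated by cv2.estimateAffine2D, whose RANSAC stage is followed by a refinement over the inliers; the 3.0-pixel reprojection threshold is our assumption.

```python
import cv2
import numpy as np

def estimate_affine(src_pts, dst_pts, reproj_thresh=3.0):
    """src_pts, dst_pts: (N, 2) matched key point coordinates in the sensed and
    reference images (full-image coordinates after adding each block's offset)."""
    src = np.asarray(src_pts, dtype=np.float32)
    dst = np.asarray(dst_pts, dtype=np.float32)
    # RANSAC rejects remaining false matches; the 2x3 affine matrix is then
    # refined over the inlier set.
    affine, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                           ransacReprojThreshold=reproj_thresh)
    if affine is None:
        return None, None
    return affine, inliers.ravel().astype(bool)
```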
4. Experiments and Results
4.1. Instance Segmentation for High-Resolution Remote Sensing Images
4.1.1. Datasets and Metrics
4.1.2. Implementation Details of the SSFG
4.1.3. Results
4.2. High-Resolution Remote Sensing Image Registration
4.2.1. Test Data and Evaluation Metrics
4.2.2. Implementation Details of the Instance Segmentation Based Registration Framework
4.2.3. Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zitova, B.; Flusser, J. Image registration methods: A survey. Image Vis. Comput. 2003, 21, 977–1000.
- Brown, L.G. A survey of image registration techniques. ACM Comput. Surv. 1992, 24, 325–376.
- Le Moigne, J. Introduction to remote sensing image registration. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 2565–2568.
- Ma, W.; Zhang, J.; Wu, Y.; Jiao, L.; Zhu, H.; Zhao, W. A novel two-step registration method for remote sensing images based on deep and local features. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4834–4843.
- Zhu, H.; Jiao, L.; Ma, W.; Liu, F.; Zhao, W. A novel neural network for remote sensing image matching. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2853–2865.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded up robust features. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417.
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the IEEE International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
- Alcantarilla, P.F.; Nuevo, J.; Bartoli, A. Fast explicit diffusion for accelerated features in nonlinear scale spaces. In Proceedings of the British Machine Vision Conference (BMVC), Bristol, UK, 2013.
- Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395.
- Morel, J.M.; Yu, G. ASIFT: A new framework for fully affine invariant image comparison. SIAM J. Imaging Sci. 2009, 2, 438–469.
- Dellinger, F.; Delon, J.; Gousseau, Y.; Michel, J.; Tupin, F. SAR-SIFT: A SIFT-like algorithm for SAR images. IEEE Trans. Geosci. Remote Sens. 2014, 53, 453–466.
- Ma, W.; Wen, Z.; Wu, Y.; Jiao, L.; Gong, M.; Zheng, Y.; Liu, L. Remote sensing image registration with modified SIFT and enhanced feature matching. IEEE Geosci. Remote Sens. Lett. 2016, 14, 3–7.
- Ye, Y.; Bruzzone, L.; Shan, J.; Bovolo, F.; Qing, Z. Fast and robust matching for multimodal remote sensing image registration. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9059–9070.
- Huo, C.; Pan, C.; Huo, L.; Zhou, Z. Multilevel SIFT matching for large-size VHR image registration. IEEE Geosci. Remote Sens. Lett. 2011, 9, 171–175.
- Ma, J.; Chan, J.C.W.; Canters, F. Fully automatic subpixel image registration of multiangle CHRIS/Proba data. IEEE Trans. Geosci. Remote Sens. 2010, 48, 2829–2839.
- Sedaghat, A.; Mokhtarzade, M.; Ebadi, H. Uniform robust scale-invariant feature matching for optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4516–4527.
- Goncalves, H.; Corte-Real, L.; Goncalves, J.A. Automatic image registration through image segmentation and SIFT. IEEE Trans. Geosci. Remote Sens. 2011, 49, 2589–2600.
- Gong, M.; Zhao, S.; Jiao, L.; Tian, D.; Wang, S. A novel coarse-to-fine scheme for automatic image registration based on SIFT and mutual information. IEEE Trans. Geosci. Remote Sens. 2013, 52, 4328–4338.
- Ye, Y.; Shan, J. A local descriptor based registration method for multispectral remote sensing images with non-linear intensity differences. ISPRS J. Photogramm. Remote Sens. 2014, 90, 83–95.
- Kupfer, B.; Netanyahu, N.S.; Shimshoni, I. An efficient SIFT-based mode-seeking algorithm for sub-pixel registration of remotely sensed images. IEEE Geosci. Remote Sens. Lett. 2014, 12, 379–383.
- Wu, Y.; Ma, W.; Gong, M.; Su, L.; Jiao, L. A novel point-matching algorithm based on fast sample consensus for image registration. IEEE Geosci. Remote Sens. Lett. 2014, 12, 43–47.
- Ye, Y.; Shen, L. HOPC: A novel similarity metric based on geometric structural properties for multi-modal remote sensing image matching. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 3, 9.
- Ye, Y.; Shan, J.; Bruzzone, L.; Shen, L. Robust registration of multimodal remote sensing images based on structural similarity. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2941–2958.
- Wang, S.; Quan, D.; Liang, X.; Ning, M.; Guo, Y.; Jiao, L. A deep learning framework for remote sensing image registration. ISPRS J. Photogramm. Remote Sens. 2018, 145, 148–164.
- Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Marcos, D.; Tuia, D.; Kellenberger, B.; Zhang, L.; Bai, M.; Liao, R.; Urtasun, R. Learning deep structured active contours end-to-end. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8877–8885.
- Cheng, D.; Liao, R.; Fidler, S.; Urtasun, R. DARNet: Deep active ray network for building segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 7431–7439.
- Hatamizadeh, A.; Sengupta, D.; Terzopoulos, D. End-to-end trainable deep active contour models for automated image segmentation: Delineating buildings in aerial imagery. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 730–746.
- Hamaguchi, R.; Fujita, A.; Nemoto, K.; Imaizumi, T.; Hikosaka, S. Effective use of dilated convolutions for segmenting small object instances in remote sensing imagery. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, CA, USA, 12–15 March 2018; pp. 1442–1450.
- Mou, L.; Zhu, X.X. Vehicle instance segmentation from aerial image and video using a multitask learning residual fully convolutional network. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6699–6711.
- Feng, Y.; Diao, W.; Zhang, Y.; Li, H.; Chang, Z.; Yan, M.; Sun, X.; Gao, X. Ship instance segmentation from remote sensing images using sequence local context module. In Proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 1025–1028.
- Lu, J.; Jia, H.; Gao, F.; Li, W.; Lu, Q. Reconstruction of digital surface model of single-view remote sensing image by semantic segmentation network. J. Electr. Inf. Technol. 2021, 43, 974–981.
- Li, Z.; Zhu, R.; Ma, J.; Meng, X.; Wang, D.; Liu, S. Airport detection method combined with continuous learning of residual-based network on remote sensing image. Acta Opt. Sin. 2020, 40, 1628005.
- Zhu, R.; Ma, J.; Li, Z.; Wang, D.; An, Y.; Zhong, X.; Gao, F.; Meng, X. Domestic multispectral image classification based on multilayer perception convolutional neural network. Acta Opt. Sin. 2020, 40, 1528003.
- Lu, J.; Li, T.; Ma, J.; Li, Z.; Jia, H. SAR: Single-stage anchor-free rotating object detection. IEEE Access 2020, 8, 205902–205912.
- Wang, X.; Kong, T.; Shen, C.; Jiang, Y.; Li, L. SOLO: Segmenting objects by locations. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 649–665.
- Wang, X.; Zhang, R.; Kong, T.; Li, L.; Shen, C. SOLOv2: Dynamic and fast instance segmentation. Adv. Neural Inf. Process. Syst. 2020, 33, 1–17.
- Yuan, Y.; Xie, J.; Chen, X.; Wang, J. SegFix: Model-agnostic boundary refinement for segmentation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 489–506.
- Sun, K.; Zhao, Y.; Jiang, B.; Cheng, T.; Xiao, B.; Liu, D.; Mu, Y.; Wang, X.; Liu, W.; Wang, J. High-Resolution Representations for Labeling Pixels and Regions. 2019. Available online: https://arxiv.org/abs/1904.04514 (accessed on 9 April 2019).
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10781–10790.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Hu, M.K. Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory 1962, 8, 179–187.
- Huang, Z.; Leng, J. Analysis of Hu's moment invariants on image scaling and rotation. In Proceedings of the IEEE 2010 2nd International Conference on Computer Engineering and Technology, Chengdu, China, 16–18 April 2010; pp. 476–480.
- Cramer, M. The DGPF-test on digital airborne camera evaluation overview and test design. PFG Photogramm. Fernerkund. Geoinf. 2010, 2, 73–82.
- Silberman, N.; Sontag, D.; Fergus, R. Instance segmentation of indoor scenes using a coverage loss. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 616–631.
- Perazzi, F.; Pont-Tuset, J.; McWilliams, B.; Van Gool, L.; Gross, M.; Sorkine-Hornung, A. A benchmark dataset and evaluation methodology for video object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 724–732.
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. 2019. Available online: https://arxiv.org/abs/1906.07155 (accessed on 17 June 2019).
- Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT: Real-time instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9157–9166.
- Kirillov, A.; Wu, Y.; He, K.; Girshick, R. PointRend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9799–9808.
Vaihingen:

| Model | Backbone | Dice | mIoU | WCov | BoundF |
|---|---|---|---|---|---|
| DSAC [28] | DSAC | - | 71.10 | 70.70 | 36.40 |
| DarNet [29] | DarNet | 93.66 | 88.20 | 88.10 | 75.90 |
| TDAC [30] | TDAC | 94.26 | 89.16 | 90.54 | 78.12 |
| SSFG (ours) | HRNetV2p-W40 | 94.79 | 88.51 | 90.26 | 81.54 |
WHU Building Dataset:

| Model | Backbone | Dice | mIoU | WCov | BoundF |
|---|---|---|---|---|---|
| SiU-Net [26] | U-Net | - | 88.40 | - | - |
| SSFG (ours) | HRNetV2p-W40 | 93.87 | 89.18 | 89.93 | 80.02 |
Jilin-1-HZ Pond:

| Model | Backbone | Dice | mIoU | WCov | BoundF | FPS |
|---|---|---|---|---|---|---|
| Yolact [50] | ResNet101 | 78.15 | 72.06 | 73.11 | 39.45 | 29.1 |
| Mask RCNN [27] | ResNet101 | 86.70 | 77.36 | 82.55 | 56.80 | 10.4 |
| PointRend [51] | ResNet101 | 89.31 | 82.13 | 85.04 | 78.60 | 8.9 |
| SSFG (ours) | HRNetV2p-W40 | 94.88 | 90.68 | 92.54 | 83.10 | 20.3 |
Jilin-1-HZ Greenhouse:

| Model | Backbone | Dice | mIoU | WCov | BoundF | FPS |
|---|---|---|---|---|---|---|
| Yolact [50] | ResNet101 | 70.88 | 65.40 | 68.21 | 37.28 | 29.6 |
| Mask RCNN [27] | ResNet101 | 79.51 | 67.03 | 77.02 | 45.25 | 9.7 |
| PointRend [51] | ResNet101 | 86.52 | 80.90 | 83.29 | 71.45 | 9.0 |
| SSFG (ours) | HRNetV2p-W40 | 90.12 | 84.39 | 85.87 | 79.70 | 19.5 |
Test data:

| Image | | Pixel Size | Time | Sensor | Resolution | Instance | PE ¹ |
|---|---|---|---|---|---|---|---|
| Christchurch, New Zealand | Sensed | 32,507 × 15,354 | Apr 2012 | Aerial | 0.3 m | Building | 22.3 m |
| | Reference | 32,771 × 15,920 | 2016 | | | | |
| Hangzhou-1, China | Sensed | 36,004 × 24,002 | Aug 2020 | Jilin-1 Satellites | 0.75 m | Pond, greenhouse | 110.5 m |
| | Reference | 37,960 × 27,050 | Jan 2021 | | | | |
| Hangzhou-2, China | Sensed | 36,003 × 24,002 | Aug 2020 | Jilin-1 Satellites | 0.75 m | Pond, greenhouse | 1204.1 m |
| | Reference | 40,989 × 32,504 | Jan 2021 | | | | |
Christchurch:

| Method | RMSE | NOCC | ROCC | ET (min) |
|---|---|---|---|---|
| SIFT [6] | 1.6504 | 4419 | 0.39 | 9.48 |
| Affine-SIFT [11] | 6.2485 | 8506 | 0.32 | 27.19 |
| SAR-SIFT [12] | 9.8316 | 3081 | 0.40 | 62.52 |
| PSO-SIFT [13] | 1.6002 | 2165 | 0.31 | 46.31 |
| CFOG [14] | 2.3106 | 1303 | 0.69 | 3.60 |
| ours + SIFT | 1.5710 | 724 | 0.53 | 4.78 |
| ours + SURF | 2.0460 | 491 | 0.41 | 3.86 |
| ours + ORB | 2.5173 | 334 | 0.32 | 3.13 |
Citation: Lu, J.; Jia, H.; Li, T.; Li, Z.; Ma, J.; Zhu, R. An Instance Segmentation Based Framework for Large-Sized High-Resolution Remote Sensing Images Registration. Remote Sens. 2021, 13, 1657. https://doi.org/10.3390/rs13091657