Accurate Instance Segmentation for Remote Sensing Images via Adaptive and Dynamic Feature Learning
Figure 1. Characteristics of objects in HRSIs. (a) There are huge scale variations among different planes. (b) Harbors present complex boundaries. (c) Densely packed ships appear in the marina. Notice the size gap among objects in the three scenes and the shape differences among the harbors of (b,c).
Figure 2. The network structure of the proposed method, which is based on the PANet architecture and adds the cross-scale adaptive fusion (CSAF) module for multi-scale feature map fusion, the context attention upsampling (CAU) module to refine mask prediction, and the dynamic sample selection (DSS) module to select suitable positive/negative samples.
Figure 3. The structure of the proposed cross-scale adaptive fusion module. For each pyramidal feature map, the others are rescaled to the same shape and then spatially fused according to the learned fusion weights.
Figure 4. Illustration of the cross-scale adaptive fusion mechanism, taking the fusion to the target layer P̂4 as an example.
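The fusion step described in the captions above can be sketched as follows. This is a minimal NumPy illustration assuming the other pyramid levels have already been rescaled to the target shape and that the fusion weights are per-pixel softmax-normalized; in the actual module the weight logits come from learned convolutions, and `cross_scale_fuse` and its arguments are illustrative names, not the paper's code.

```python
import numpy as np

def softmax(x, axis=0):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_scale_fuse(feature_maps, weight_logits):
    """Fuse already-rescaled pyramid levels with per-pixel softmax weights.

    feature_maps: list of L arrays, each (H, W, C), rescaled to the target shape.
    weight_logits: array (L, H, W), one spatial weight map per level
                   (learned by convolutions in the module; plain inputs here).
    """
    stack = np.stack(feature_maps)             # (L, H, W, C)
    w = softmax(weight_logits, axis=0)         # weights sum to 1 at each pixel
    return (stack * w[..., None]).sum(axis=0)  # (H, W, C)
```

With equal logits the fusion degenerates to a plain average of the levels; the learned logits let each spatial location favor the scale that best represents the object there.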
Figure 5. Illustration of the context attention upsampling. A feature map Ψ of size W × H × C is upsampled by a factor η to the output feature map Ψ′; here we take η = 2 as an example.
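The reassembly step behind this kind of upsampling can be sketched as below, in the spirit of content-aware upsampling (CARAFE): each output location applies its own kernel to a local neighborhood of the source feature map. In the CAU module the kernels are predicted from contextual features; in this sketch they are supplied as plain, already-normalized inputs, and the function name is illustrative.

```python
import numpy as np

def context_attention_upsample(feat, kernels, eta=2, k=3):
    """Upsample feat by factor eta using per-location reassembly kernels.

    feat:    (H, W, C) input feature map.
    kernels: (eta*H, eta*W, k*k) per-output-location kernels, assumed
             softmax-normalized (predicted from context in the module).
    """
    H, W, C = feat.shape
    pad = k // 2
    padded = np.pad(feat, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros((eta * H, eta * W, C))
    for i in range(eta * H):
        for j in range(eta * W):
            src_i, src_j = i // eta, j // eta            # source neighborhood
            patch = padded[src_i:src_i + k, src_j:src_j + k, :]  # (k, k, C)
            w = kernels[i, j].reshape(k, k, 1)
            out[i, j] = (patch * w).sum(axis=(0, 1))     # weighted reassembly
    return out
```

Unlike a fixed deconvolution kernel, the per-location kernels let the upsampler sharpen boundaries where the surrounding context indicates an object edge.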
Figure 6. The mutual interference among densely packed instances. A candidate bounding box can contain multiple objects (the red box in the figure). Neighboring objects with similar appearances and structures act as interference noise during localization and classification, which degrades the network's predictions.
Figure 7. The calculation process of the penalty item. The yellow box and the light blue box denote a candidate positive sample and its corresponding ground-truth box; the blue dashed box R is their minimum enclosing rectangle.
Figure 8. The differences between IoU and IoU_p. The candidate box c in (a) is placed parallel to the ground truth g, while that in (b) is misplaced. Although the IoU in (a) and (b) is the same, the difficulty of coordinate regression differs.
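One plausible form of the penalized IoU in Figures 7 and 8 is sketched below, assuming a GIoU-style penalty built from the minimum enclosing rectangle R; the paper's exact formula may differ from this sketch.

```python
def iou_with_penalty(box_a, box_b):
    """IoU minus an enclosing-rectangle penalty (GIoU-style sketch).

    Boxes are (x1, y1, x2, y2). The penalty grows when the minimum
    enclosing rectangle R is much larger than the union, i.e. when the
    candidate is misplaced relative to the ground truth, so two candidates
    with equal IoU but different placements receive different scores.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection area
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # minimum enclosing rectangle R
    r_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (r_area - union) / r_area
```

For an aligned candidate (Figure 8a) the enclosing rectangle equals the union and the penalty vanishes, while a misplaced candidate of equal IoU (Figure 8b) is scored lower.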
Figure 9. Class-wise instance segmentation results of the proposed approach on the iSAID validation set.
Figure 10. Class-wise instance segmentation results of the proposed approach on the NWPU VHR-10 test set.
Figure 11. Visual instance segmentation results of the proposed method on the iSAID validation set. (a) Input images; (b) ground-truth masks; (c,d) predicted results of PANet and our method. The red rectangles indicate PANet's missed predictions and under-segmentation.
Figure 12. Visual instance segmentation results of the proposed method on the NWPU VHR-10 test set. (a) Input images; (b) ground-truth masks; (c,d) predicted results of PANet and our method.
Abstract
1. Introduction
- We propose the CSAF module, a novel multi-scale information fusion mechanism that adaptively learns fusion weights according to the different input feature maps from the FPN.
- We extend the original deconvolution layer in the segmentation branch into the CAU module, which produces more refined mask predictions by generating different upsampling kernels from contextual information.
- Instead of the traditional fixed threshold for determining positive and negative samples, the DSS module employs a dynamic threshold calculation algorithm to select more representative positive/negative samples.
- The proposed method achieves state-of-the-art performance on two challenging public datasets for the instance segmentation task in HRSIs.
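The dynamic-threshold idea in the DSS contribution can be sketched as follows: a minimal Python illustration that adapts the positive/negative split to the IoU statistics of the candidates (mean plus one standard deviation, in the spirit of ATSS). The paper's exact calculation rule is not reproduced here, and `dynamic_sample_selection` is a hypothetical name.

```python
import statistics

def dynamic_sample_selection(candidate_ious):
    """Split candidates into positives/negatives with a dynamic threshold.

    candidate_ious: dict mapping candidate id -> IoU with a ground-truth box.
    The threshold adapts to the candidate statistics (mean + one standard
    deviation here, ATSS-style); the paper's exact rule may differ.
    """
    ious = list(candidate_ious.values())
    thr = statistics.mean(ious) + statistics.pstdev(ious)
    positives = [c for c, v in candidate_ious.items() if v >= thr]
    negatives = [c for c, v in candidate_ious.items() if v < thr]
    return thr, positives, negatives
```

Because the threshold tracks the candidate distribution, a ground-truth object surrounded by many mediocre candidates still receives positives, while densely packed scenes do not flood training with ambiguous samples.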
2. Methodology
2.1. Cross-Scale Adaptive Fusion
2.2. Context Attention Upsampling
2.3. Dynamic Sample Selection
2.3.1. The DSS Strategy
2.3.2. Constrained IoU Calculation
3. Experiments
3.1. Datasets Description
3.2. Evaluation Metrics
3.3. Implementation Details
3.4. Quantitative Results
3.4.1. Results on iSAID
3.4.2. Results on NWPU VHR-10 Instance Segmentation Dataset
3.5. Qualitative Results
3.6. Ablation Study
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Mou, L.; Zhu, X.X. Vehicle instance segmentation from aerial image and video using a multitask learning residual fully convolutional network. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6699–6711.
2. Zheng, X.; Gong, T.; Li, X.; Lu, X. Generalized Scene Classification from Small-Scale Datasets with Multitask Learning. IEEE Trans. Geosci. Remote Sens. 2021, in press.
3. Feng, Y.; Diao, W.; Zhang, Y.; Li, H.; Chang, Z.; Yan, M.; Sun, X.; Gao, X. Ship Instance Segmentation from Remote Sensing Images Using Sequence Local Context Module. In Proceedings of the IGARSS 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 1025–1028.
4. Zhao, K.; Kang, J.; Jung, J.; Sohn, G. Building Extraction from Satellite Images Using Mask R-CNN with Building Boundary Regularization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 247–251.
5. Cheng, D.; Liao, R.; Fidler, S.; Urtasun, R. DARNet: Deep active ray network for building segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 7431–7439.
6. Luo, F.; Zou, Z.; Liu, J.; Lin, Z. Dimensionality reduction and classification of hyperspectral image via multi-structure unified discriminative embedding. IEEE Trans. Geosci. Remote Sens. 2021, in press.
7. Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI Transformer for oriented object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2849–2858.
8. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 3520–3529.
9. Bischke, B.; Helber, P.; Folz, J.; Borth, D.; Dengel, A. Multi-task learning for segmentation of building footprints with deep neural networks. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1480–1484.
10. Zheng, X.; Chen, X.; Lu, X.; Sun, B. Unsupervised Change Detection by Cross-Resolution Difference Learning. IEEE Trans. Geosci. Remote Sens. 2021, in press.
11. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
12. Chen, L.C.; Hermans, A.; Papandreou, G.; Schroff, F.; Wang, P.; Adam, H. MaskLab: Instance segmentation by refining object detection with semantic and direction features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4013–4022.
13. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
14. Huang, Z.; Huang, L.; Gong, Y.; Huang, C.; Wang, X. Mask Scoring R-CNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 6409–6418.
15. Kirillov, A.; Wu, Y.; He, K.; Girshick, R. PointRend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9799–9808.
16. Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid task cascade for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4974–4983.
17. Chen, H.; Sun, K.; Tian, Z.; Shen, C.; Huang, Y.; Yan, Y. BlendMask: Top-down meets bottom-up for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8573–8581.
18. Xie, S.; Chen, Z.; Xu, C.; Lu, C. Environment upgrade reinforcement learning for non-differentiable multi-stage pipelines. J. Chongqing Univ. Posts Telecommun. 2020, 32, 857–858.
19. Su, H.; Wei, S.; Yan, M.; Wang, C.; Shi, J.; Zhang, X. Object detection and instance segmentation in remote sensing imagery based on precise Mask R-CNN. In Proceedings of the IGARSS 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 1454–1457.
20. Su, H.; Wei, S.; Liu, S.; Liang, J.; Wang, C.; Shi, J.; Zhang, X. HQ-ISNet: High-quality instance segmentation for remote sensing imagery. Remote Sens. 2020, 12, 989.
21. Ran, J.; Yang, F.; Gao, C.; Zhao, Y.; Qin, A. Adaptive Fusion and Mask Refinement Instance Segmentation Network for High Resolution Remote Sensing Images. In Proceedings of the IGARSS 2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 2843–2846.
22. Zhang, T.; Zhang, X.; Zhu, P.; Tang, X.; Li, C.; Jiao, L.; Zhou, H. Semantic Attention and Scale Complementary Network for Instance Segmentation in Remote Sensing Images. IEEE Trans. Cybern. 2021, in press.
23. Zeng, X.; Wei, S.; Wei, J.; Zhou, Z.; Shi, J.; Zhang, X.; Fan, F. CPISNet: Delving into Consistent Proposals of Instance Segmentation Network for High-Resolution Aerial Images. Remote Sens. 2021, 13, 2788.
24. Luo, F.; Zhang, L.; Zhou, X.; Guo, T.; Cheng, Y.; Yin, T. Sparse-adaptive hypergraph discriminant analysis for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2019, 17, 1082–1086.
25. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 1440–1448.
26. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
27. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 379–387.
28. Carreira, J.; Sminchisescu, C. CPMC: Automatic object segmentation using constrained parametric min-cuts. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 1312–1328.
29. Arbeláez, P.; Pont-Tuset, J.; Barron, J.T.; Marques, F.; Malik, J. Multiscale combinatorial grouping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 328–335.
30. Pinheiro, P.O.; Collobert, R.; Dollár, P. Learning to segment object candidates. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 1990–1998.
31. Pinheiro, P.O.; Lin, T.Y.; Collobert, R.; Dollár, P. Learning to refine object segments. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 75–91.
32. Dai, J.; He, K.; Li, Y.; Ren, S.; Sun, J. Instance-sensitive fully convolutional networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 534–549.
33. Arnab, A.; Jayasumana, S.; Zheng, S.; Torr, P.H. Higher order conditional random fields in deep neural networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 524–540.
34. Li, Y.; Qi, H.; Dai, J.; Ji, X.; Wei, Y. Fully convolutional instance-aware semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2359–2367.
35. Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT: Real-time instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 27 October–2 November 2019; pp. 9157–9166.
36. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
37. Bai, M.; Urtasun, R. Deep watershed transform for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5221–5229.
38. Hsu, Y.C.; Xu, Z.; Kira, Z.; Huang, J. Learning to Cluster for Proposal-Free Instance Segmentation. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
39. Liu, S.; Jia, J.; Fidler, S.; Urtasun, R. SGN: Sequential grouping networks for instance segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3496–3504.
40. Neven, D.; De Brabandere, B.; Proesmans, M.; Van Gool, L. Instance segmentation by jointly optimizing spatial embeddings and clustering bandwidth. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 8837–8845.
41. Xie, E.; Sun, P.; Song, X.; Wang, W.; Liu, X.; Liang, D.; Shen, C.; Luo, P. PolarMask: Single shot instance segmentation with polar representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12193–12202.
42. Wang, X.; Kong, T.; Shen, C.; Jiang, Y.; Li, L. SOLO: Segmenting objects by locations. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 649–665.
43. Wang, X.; Zhang, R.; Kong, T.; Li, L.; Shen, C. SOLOv2: Dynamic and Fast Instance Segmentation. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; pp. 17721–17732.
44. Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132.
45. Cheng, G.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
46. Cheng, G.; Han, J. A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28.
47. Waqas Zamir, S.; Arora, A.; Gupta, A.; Khan, S.; Sun, G.; Shahbaz Khan, F.; Zhu, F.; Shao, L.; Xia, G.S.; Bai, X. iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–20 June 2019; pp. 28–37.
48. Luo, F.; Huang, H.; Ma, Z.; Liu, J. Semisupervised sparse manifold discriminative analysis for feature extraction of hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6197–6211.
49. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
50. Zheng, X.; Wang, B.; Du, X.; Lu, X. Mutual Attention Inception Network for Remote Sensing Visual Question Answering. IEEE Trans. Geosci. Remote Sens. 2021, in press.
51. Cao, J.; Cholakkal, H.; Anwer, R.M.; Khan, F.S.; Pang, Y.; Shao, L. D2Det: Towards high quality object detection and instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11485–11494.
52. Gao, C.; Chen, X. Deep learning based action detection: A survey. J. Chongqing Univ. Posts Telecommun. 2020, 32, 991–1002.
53. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162.
54. Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic R-CNN: Towards high quality object detection via dynamic training. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 260–275.
55. Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768.
56. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
57. Wang, G.; Wang, K.; Lin, L. Adaptively connected neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 1781–1790.
58. Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE: Content-aware reassembly of features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 27 October–2 November 2019; pp. 3007–3016.
59. iSAID: A Large-Scale Dataset for Instance Segmentation in Aerial Images. Available online: https://captain-whu.github.io/iSAID/evaluation.html (accessed on 18 November 2021).
60. Source Code for Accurate Instance Segmentation for Remote Sensing Images via Adaptive and Dynamic Feature Learning. Available online: https://github.com/yuanxiangyuee/ins_seg_HRSIs (accessed on 10 November 2021).
61. Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983.
Method | Mask AP | AP50 | AP75 | APS | APM | APL | Box AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Mask R-CNN [11] | 35.4 | 57.7 | 37.8 | 37.4 | 50.0 | 53.0 | 40.4 | 62.3 | 44.3 | 42.6 | 48.4 | 62.4 |
PANet [13] | 38.2 | 62.4 | 41.0 | 40.4 | 51.9 | 55.5 | 43.2 | 66.0 | 47.6 | 45.5 | 49.4 | 58.4 |
BlendMask [17] | 36.8 | 60.5 | 38.2 | 20.8 | 45.5 | 53.1 | 43.6 | 64.1 | 47.6 | 28.4 | 50.1 | 55.2 |
PointRend [15] | 38.5 | 62.2 | 40.8 | 39.9 | 56.0 | 53.5 | 43.6 | 65.2 | 47.9 | 44.9 | 58.0 | 48.8 |
Ours | 40.1 | 64.6 | 43.0 | 42.0 | 56.6 | 77.3 | 46.2 | 68.8 | 51.3 | 48.3 | 58.3 | 80.1 |
Method | Mask AP | SV | LV | PL | ST | Ship | SP | HB | TC | GTF | SBF | BD | BR | BC | RA | HC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | 35.4 | 15.5 | 37.3 | 51.5 | 38.5 | 44.3 | 33.5 | 28.3 | 76.8 | 26.0 | 38.7 | 49.8 | 20.3 | 34.2 | 31.7 | 4.8 |
PANet | 38.2 | 18.0 | 41.0 | 54.0 | 39.6 | 54.4 | 35.4 | 30.3 | 78.1 | 29.9 | 39.3 | 51.7 | 20.7 | 38.5 | 35.5 | 7.0 |
BlendMask | 36.8 | 14.7 | 37.8 | 54.7 | 40.3 | 41.0 | 34.2 | 30.4 | 78.4 | 23.6 | 41.4 | 51.9 | 20.9 | 38.5 | 35.6 | 8.1 |
PointRend | 38.5 | 16.4 | 40.7 | 54.4 | 37.4 | 49.2 | 32.6 | 31.3 | 79.2 | 35.3 | 43.0 | 52.7 | 22.1 | 39.4 | 35.6 | 8.3 |
Ours | 40.1 | 20.0 | 41.9 | 54.1 | 41.7 | 56.6 | 36.0 | 31.4 | 78.0 | 31.5 | 45.4 | 55.2 | 22.4 | 43.9 | 35.2 | 8.1 |
Method | Box AP | SV | LV | PL | ST | Ship | SP | HB | TC | GTF | SBF | BD | BR | BC | RA | HC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | 40.4 | 19.7 | 42.8 | 69.3 | 38.7 | 46.3 | 37.7 | 47.3 | 76.0 | 40.4 | 37.4 | 49.1 | 24.5 | 32.6 | 30.5 | 14.3 |
PANet | 43.2 | 22.4 | 45.8 | 70.6 | 39.5 | 57.0 | 40.2 | 50.1 | 77.6 | 41.8 | 37.1 | 51.4 | 24.6 | 36.6 | 35.3 | 18.5 |
BlendMask | 43.6 | 19.9 | 44.7 | 74.3 | 42.5 | 47.7 | 39.5 | 49.4 | 80.0 | 34.9 | 43.6 | 52.6 | 26.0 | 39.1 | 37.6 | 21.3 |
PointRend | 43.6 | 20.3 | 45.0 | 70.7 | 37.0 | 52.8 | 36.1 | 50.9 | 79.9 | 45.8 | 42.7 | 53.3 | 25.9 | 39.1 | 36.3 | 17.8 |
Ours | 46.2 | 25.8 | 47.5 | 71.0 | 42.3 | 60.6 | 41.6 | 53.6 | 78.3 | 44.2 | 45.0 | 55.8 | 27.8 | 42.9 | 34.8 | 21.5 |
Method | Mask AP | AP50 | AP75 | APS | APM | APL | Box AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | 33.4 | 56.8 | 34.7 | 35.8 | 46.5 | 23.9 | 37.2 | 60.8 | 40.7 | 39.8 | 43.7 | 16.0 |
PANet | 38.4 | 61.8 | 41.4 | 41.2 | 47.2 | 11.8 | 43.8 | 65.3 | 49.3 | 46.2 | 52.6 | 17.5 |
BlendMask | 36.7 | 59.5 | 39.0 | 39.5 | 44.7 | 10.6 | 43.1 | 62.3 | 48.4 | 45.8 | 49.0 | 19.1 |
PointRend | 38.1 | 61.3 | 41.0 | 40.7 | 48.6 | 16.9 | 43.5 | 64.6 | 48.5 | 45.8 | 53.0 | 22.8 |
D2Det | 37.5 | 61.0 | 39.8 | - | - | - | - | - | - | - | - | - |
Ours | 39.7 | 62.7 | 42.8 | 41.9 | 54.6 | 25.5 | 46.1 | 66.4 | 52.2 | 48.5 | 53.8 | 26.2 |
Method | Mask AP | SV | LV | PL | ST | Ship | SP | HB | TC | GTF | SBF | BD | BR | BC | RA | HC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | 33.4 | 16.9 | 30.4 | 41.7 | 32.0 | 48.8 | 36.7 | 29.6 | 72.9 | 25.9 | 26.7 | 39.6 | 15.2 | 43.1 | 36.0 | 5.6 |
PANet | 38.4 | 17.6 | 32.1 | 45.0 | 37.3 | 50.7 | 38.3 | 33.3 | 76.2 | 28.8 | 38.9 | 53.4 | 18.9 | 48.5 | 41.3 | 15.2 |
BlendMask | 36.7 | 15.8 | 31.1 | 45.4 | 35.8 | 47.0 | 38.5 | 34.5 | 74.8 | 22.7 | 33.9 | 52.9 | 16.4 | 49.1 | 40.4 | 11.8 |
PointRend | 38.1 | 16.3 | 32.3 | 46.2 | 36.3 | 47.7 | 37.8 | 36.5 | 75.0 | 28.6 | 36.9 | 51.8 | 17.5 | 48.2 | 41.5 | 13.2 |
D2Det | 37.5 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Ours | 39.7 | 18.9 | 34.4 | 47.2 | 37.5 | 54.7 | 40.8 | 35.7 | 77.6 | 31.2 | 40.0 | 54.5 | 19.3 | 49.4 | 41.9 | 15.5 |
Method | Mask AP | AP50 | AP75 | APS | APM | APL | Box AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | 62.8 | 90.2 | 69.0 | 42.3 | 61.3 | 69.5 | 60.6 | 90.8 | 69.2 | 52.2 | 60.9 | 54.8 |
PANet | 64.8 | 92.3 | 72.7 | 45.1 | 63.7 | 72.3 | 66.3 | 91.3 | 77.3 | 57.9 | 57.5 | 60.2 |
PointRend | 65.4 | 88.1 | 73.0 | 42.8 | 63.4 | 77.5 | 64.9 | 88.3 | 76.3 | 54.0 | 65.1 | 60.7 |
BlendMask | 65.7 | 91.3 | 73.7 | 41.2 | 64.5 | 69.8 | 68.0 | 91.1 | 79.0 | 56.7 | 68.1 | 57.7 |
Ours | 67.7 | 93.3 | 76.7 | 48.2 | 65.0 | 78.3 | 69.4 | 93.1 | 81.5 | 58.4 | 69.5 | 65.6 |
Method | Mask AP | PL | SH | ST | BD | TC | BC | GTF | HB | BR | VC |
---|---|---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | 62.8 | 43.6 | 50.4 | 79.0 | 82.9 | 73.0 | 76.4 | 83.1 | 53.9 | 35.1 | 50.8 |
PANet | 64.8 | 50.6 | 53.5 | 78.4 | 83.5 | 73.0 | 78.1 | 87.2 | 58.6 | 33.8 | 51.6 |
PointRend | 65.4 | 54.5 | 53.2 | 75.7 | 84.3 | 72.4 | 74.4 | 90.1 | 58.8 | 35.9 | 54.7 |
BlendMask | 65.7 | 48.1 | 51.1 | 79.8 | 84.0 | 72.4 | 76.7 | 91.5 | 58.9 | 39.6 | 54.6 |
Ours | 67.7 | 51.7 | 55.6 | 79.8 | 84.4 | 73.9 | 83.0 | 93.3 | 60.7 | 42.3 | 57.5 |
Method | Box AP | PL | SH | ST | BD | TC | BC | GTF | HB | BR | VC |
---|---|---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | 60.6 | 64.8 | 49.4 | 76.6 | 82.8 | 66.3 | 69.1 | 67.9 | 43.4 | 29.3 | 55.9 |
PANet | 66.2 | 79.0 | 61.5 | 75.4 | 80.7 | 76.9 | 74.9 | 74.1 | 51.5 | 32.3 | 56.3 |
PointRend | 63.8 | 75.6 | 60.1 | 76.3 | 81.5 | 71.4 | 68.4 | 72.8 | 43.4 | 31.9 | 56.2 |
BlendMask | 68.0 | 78.5 | 60.3 | 76.6 | 83.7 | 75.3 | 79.1 | 77.2 | 54.3 | 35.4 | 59.7 |
Ours | 69.4 | 82.9 | 63.2 | 77.0 | 83.9 | 76.2 | 81.4 | 77.4 | 54.8 | 36.2 | 60.6 |
AUG | CSAF | CAU | DSS | Mask AP | AP50 | AP75 | APS | APM | APL | Box AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
 | | | | 34.2 | 56.6 | 35.8 | 19.6 | 42.3 | 46.6 | 41.7 | 60.9 | 46.6 | 26.9 | 47.8 | 51.0 |
✓ | | | | 38.2 | 62.4 | 41.0 | 40.4 | 51.9 | 55.5 | 43.2 | 66.0 | 47.6 | 45.5 | 49.4 | 58.4 |
✓ | ✓ | | | 39.4 | 63.1 | 42.6 | 41.2 | 56.2 | 71.5 | 45.7 | 67.0 | 50.8 | 47.5 | 57.2 | 77.4 |
✓ | ✓ | ✓ | | 39.7 | 63.4 | 42.9 | 41.4 | 56.4 | 71.9 | 45.8 | 67.2 | 51.1 | 47.5 | 57.3 | 77.4 |
✓ | ✓ | ✓ | ✓ | 40.1 | 64.6 | 43.0 | 42.0 | 56.6 | 77.3 | 46.2 | 68.8 | 51.3 | 48.3 | 58.3 | 80.1 |
AUG | CSAF | CAU | DSS | Mask AP | AP50 | AP75 | APS | APM | APL | Box AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
 | | | | 63.1 | 90.5 | 68.6 | 42.0 | 61.9 | 69.6 | 62.3 | 90.3 | 73.6 | 54.2 | 63.5 | 52.2 |
✓ | | | | 64.8 | 91.3 | 72.7 | 45.1 | 63.7 | 72.3 | 66.3 | 91.3 | 77.3 | 57.9 | 57.5 | 60.2 |
✓ | ✓ | | | 65.4 | 92.4 | 72.6 | 45.8 | 64.1 | 73.4 | 68.0 | 92.5 | 79.0 | 58.3 | 68.6 | 61.8 |
✓ | ✓ | ✓ | | 66.6 | 91.3 | 75.7 | 46.9 | 64.8 | 74.8 | 68.2 | 92.8 | 79.6 | 58.4 | 68.6 | 62.2 |
✓ | ✓ | ✓ | ✓ | 67.7 | 93.3 | 76.7 | 48.2 | 65.0 | 78.3 | 69.4 | 93.1 | 81.5 | 58.5 | 69.5 | 65.6 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yang, F.; Yuan, X.; Ran, J.; Shu, W.; Zhao, Y.; Qin, A.; Gao, C. Accurate Instance Segmentation for Remote Sensing Images via Adaptive and Dynamic Feature Learning. Remote Sens. 2021, 13, 4774. https://doi.org/10.3390/rs13234774