HQ-ISNet: High-Quality Instance Segmentation for Remote Sensing Imagery
"> Figure 1
<p>Examples of objects in the high-resolution (HR) remote sensing imagery. (<b>a</b>) original images; (<b>b</b>) bounding box results; (<b>c</b>) rotational bounding box results; (<b>d</b>) instance mask results.</p> "> Figure 2
<p>The architecture of the Mask R-CNN.</p> "> Figure 3
<p>Illustration of the high-quality instance segmentation network (HQ-ISNet) approach where “HRFPN” indicates a backbone network; “RPN” indicates the proposals; “Cs” indicates the classification; “M” denotes the mask branch; “B” represents the bounding box; “H” denotes the detection head; “pool” means region feature extraction.</p> "> Figure 4
<p>Illustration of the HR feature pyramid network (HRFPN) block.</p> "> Figure 5
<p>The architecture of the ISNet method where “Cs” denotes the classification; “B” indicates the bounding box; “M” represents the mask branch; “H” denotes the detection head; “pool” represents the region feature extraction. (<b>a</b>) ISNetV1; (<b>b</b>) ISNetV2.</p> "> Figure 6
<p>The architecture of three-stage mask branches.</p> "> Figure 7
<p>Example images and annotated instance masks of the synthetic aperture radar ship detection dataset (SSDD). (<b>a</b>) original synthetic aperture radar (SAR) images; (<b>b</b>) labeling process; (<b>c</b>) visualization result.</p> "> Figure 8
<p>Example images and annotated instance masks of the NWPU VHR-10 data set.</p> "> Figure 9
<p>Instance segmentation outcomes of the proposed approach in SAR images and remote sensing optical images. (<b>a</b>) and (<b>c</b>) are ground-truth mask; (<b>b</b>) and (<b>d</b>) are predicted instance results.</p> "> Figure 10
<p>Instance segmentation outcomes with the HQ-ISNet on the SAR image from the port of Houston. (<b>a</b>) Final results; (<b>b</b>) mask instance labels; (<b>c</b>) Sentinel-1B sensor imaging area.</p> "> Figure 11
<p>Outcomes of FPN and HRFPN in SAR images and remote sensing optical images. Row 1 is ground-truth mask; Row 2 and Row 3 are the outcomes of FPN and HRFPN, respectively.</p> "> Figure 12
<p>Comparison results of ISNetV1 and ISNetV2 in SAR images and remote sensing optical images. Row 1 is ground-truth mask; Row 2 is the outcome of ISNetV1; Row 3 is the outcome of ISNetV2.</p> "> Figure 13
<p>Outcomes from six approaches on the SSDD dataset. Row 1 is ground-truth mask; Row 2-4 are the outcomes of Faster R-CNN, Cascade R-CNN, and Mask R-CNN, respectively; Row 5-7 are the outcomes of Cascade Mask R-CNN, Hybrid Task Cascade, and HQ-ISNet, respectively.</p> "> Figure 14
<p>Outcomes from six approaches on the NWPU VHR-10 dataset. Row 1 is ground-truth mask; Row 2-4 are the outcomes of Faster R-CNN, Cascade R-CNN, and Mask R-CNN, respectively; Row 5-7 are the outcomes of Cascade Mask R-CNN, Hybrid Task Cascade, and HQ-ISNet, respectively.</p> "> Figure 15
<p>Statistical results of the instances. (<b>a</b>) SSDD; (<b>b</b>) NWPU VHR-10.</p> ">
Abstract
1. Introduction
- We introduce HRFPN into remote sensing image instance segmentation to fully exploit multi-level feature maps while maintaining HR feature maps, thereby addressing the spatial-resolution loss of FPN;
- We design ISNetV2 to refine the mask information flow between the mask branches, thereby improving mask prediction accuracy;
- We construct a new, more challenging dataset based on the SSDD and the NWPU VHR-10 dataset for remote sensing image instance segmentation; it can serve as a benchmark for evaluating instance segmentation algorithms on HR remote sensing images. In addition, we provide a research baseline for instance segmentation in remote sensing images;
- Most importantly, we are the first to perform instance segmentation in SAR images.
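The core idea behind HRFPN is to keep a high-resolution stream alive and fuse the lower-resolution streams into it, rather than recovering resolution afterwards as FPN does. The snippet below is a hypothetical numpy sketch of that fusion step only (nearest-neighbor upsampling followed by channel concatenation); the branch shapes and channel widths are illustrative, not the authors' implementation:

```python
import numpy as np

def upsample_nn(feat, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def hrfpn_fuse(branches):
    """Fuse multi-resolution branches HRFPN-style: upsample every
    branch to the highest resolution and concatenate along the
    channel axis, so fine spatial detail is never thrown away."""
    target_h = branches[0].shape[1]  # branches sorted high-res first
    fused = [upsample_nn(f, target_h // f.shape[1]) for f in branches]
    return np.concatenate(fused, axis=0)

# Four branches at strides 4/8/16/32, HRNet-style widths (illustrative).
branches = [np.random.rand(c, s, s)
            for c, s in [(18, 64), (36, 32), (72, 16), (144, 8)]]
fused = hrfpn_fuse(branches)
print(fused.shape)  # (270, 64, 64): 18 + 36 + 72 + 144 channels
```

In the full HRFPN block, the fused high-resolution map is then pooled back down to several pyramid levels that feed the detector heads; the sketch stops at the fusion because that is where the contrast with FPN lies.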
2. Related Work
2.1. Object Detection
2.2. Instance Segmentation
3. The Methods
3.1. Detailed Description of the HQ-ISNet
3.1.1. Backbone Network and RPN
3.1.2. Instance Segmentation Network
3.2. Loss Function
4. Experiments
4.1. Dataset Description
4.1.1. The SSDD Dataset
4.1.2. The NWPU VHR-10 Dataset
4.2. Evaluation Metrics
4.3. Implementation Details
4.4. Results and Analysis of HQ-ISNet
4.4.1. Results of the HQ-ISNet
4.4.2. Effect of HRFPN
4.4.3. Effect of ISNetV2
4.5. Comparison with Other Approaches
5. Discussion
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Cui, Z.; Li, Q.; Cao, Z.; Liu, N. Dense Attention Pyramid Networks for Multi-Scale Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8983–8997.
- Mou, L.; Zhu, X. Vehicle instance segmentation from aerial image and video using a multitask learning residual fully convolutional network. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6699–6711.
- Su, H.; Wei, S.; Yan, M.; Wang, C.; Shi, J.; Zhang, X. Object Detection and Instance Segmentation in Remote Sensing Imagery Based on Precise Mask R-CNN. In Proceedings of the IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 1454–1457.
- Wang, Y.; Wang, C.; Zhang, H.; Dong, Y.; Wei, S. Automatic Ship Detection Based on RetinaNet Using Multi-Resolution Gaofen-3 Imagery. Remote Sens. 2019, 11, 531.
- Zhang, J.; Lin, S.; Ding, L.; Bruzzone, L. Multi-Scale Context Aggregation for Semantic Segmentation of Remote Sensing Images. Remote Sens. 2020, 12, 701.
- Cheng, G.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
- Ma, H.; Liu, Y.; Ren, Y.; Yu, J. Detection of Collapsed Buildings in Post-Earthquake Remote Sensing Images Based on the Improved YOLOv3. Remote Sens. 2020, 12, 44.
- Gong, Y.; Xiao, Z.; Tan, X.; Sui, H.; Xu, C.; Duan, H.; Li, D. Context-Aware Convolutional Neural Network for Object Detection in VHR Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2019.
- Liu, N.; Cao, Z.; Cui, Z.; Pi, Y.; Dang, S. Multi-Layer Abstraction Saliency for Airport Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9820–9831.
- Wei, S.; Su, H.; Ming, J.; Wang, C.; Yan, M.; Kumar, D.; Shi, J.; Zhang, X. Precise and Robust Ship Detection for High-Resolution SAR Imagery Based on HR-SDNet. Remote Sens. 2020, 12, 167.
- Deng, Z.; Sun, H.; Zhou, S.; Zhao, J.; Lei, L.; Zou, H. Multi-scale object detection in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2018, 145, 3–22.
- An, Q.; Pan, Z.; Liu, L.; You, H. DRBox-v2: An Improved Detector with Rotatable Boxes for Target Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8333–8349.
- Xiao, X.; Zhou, Z.; Wang, B.; Li, L.; Miao, L. Ship Detection under Complex Backgrounds Based on Accurate Rotated Anchor Boxes from Paired Semantic Segmentation. Remote Sens. 2019, 11, 2506.
- Shahzad, M.; Maurer, M.; Fraundorfer, F.; Wang, Y.; Zhu, X.X. Buildings Detection in VHR SAR Images Using Fully Convolution Neural Networks. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1100–1116.
- Chen, G.; Zhang, X.; Wang, Q.; Dai, F.; Gong, Y.; Zhu, K. Symmetrical dense-shortcut deep fully convolutional networks for semantic segmentation of very-high-resolution remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1633–1644.
- Yu, B.; Yang, L.; Chen, F. Semantic segmentation for high spatial resolution remote sensing images based on convolution neural network and pyramid pooling module. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3252–3261.
- Peng, C.; Li, Y.; Jiao, L.; Chen, Y.; Shang, R. Densely Based Multi-Scale and Multi-Modal Fully Convolutional Networks for High-Resolution Remote-Sensing Image Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019.
- Nogueira, K.; Dalla Mura, M.; Chanussot, J.; Schwartz, W.R.; Dos Santos, J.A. Dynamic multicontext segmentation of remote sensing images based on convolutional networks. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7503–7520.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Loy, C.C. Hybrid task cascade for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4974–4983.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: High Quality Object Detection and Instance Segmentation. arXiv 2019, arXiv:1906.09756.
- Huang, Z.; Huang, L.; Gong, Y.; Huang, C.; Wang, X. Mask Scoring R-CNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 6409–6418.
- Li, J.; Qu, C.; Shao, J. Ship detection in SAR images based on an improved Faster R-CNN. In Proceedings of the 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), Beijing, China, 13–14 November 2017; pp. 1–6.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
- Fu, C.Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional single shot detector. arXiv 2017, arXiv:1701.06659.
- Li, Z.; Zhou, F. FSSD: Feature fusion single shot multibox detector. arXiv 2017, arXiv:1712.00960.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015; pp. 91–99.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Chen, L.C.; Hermans, A.; Papandreou, G.; Schroff, F.; Wang, P.; Adam, H. MaskLab: Instance segmentation by refining object detection with semantic and direction features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4013–4022.
- Liang, X.; Lin, L.; Wei, Y.; Shen, X.; Yang, J.; Yan, S. Proposal-free network for instance-level object segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2978–2991.
- Bai, M.; Urtasun, R. Deep watershed transform for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5221–5229.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. arXiv 2019, arXiv:1902.09212.
- Sun, K.; Zhao, Y.; Jiang, B.; Cheng, T.; Xiao, B.; Liu, D.; Wang, J. High-Resolution Representations for Labeling Pixels and Regions. arXiv 2019, arXiv:1904.04514.
- Wada, K. labelme: Image Polygonal Annotation with Python. Available online: https://github.com/wkentaro/labelme (accessed on 20 July 2018).
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 740–755.
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Zhang, Z. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv 2019, arXiv:1906.07155.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016; pp. 630–645.
- Attema, E.; Cafforio, C.; Gottwald, M.; Guccione, P.; Guarnieri, A.M.; Rocca, F.; Snoeij, P. Flexible dynamic block adaptive quantization for Sentinel-1 SAR missions. IEEE Geosci. Remote Sens. Lett. 2010, 7, 766–770.
FPN | HRFPN | ISNetV1 | ISNetV2 | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---|---|---
✓ | | ✓ | | 65.1 | 94.8 | 83.4 | 65.7 | 65.0 | 20.0
 | ✓ | ✓ | | 67.2 | 95.6 | 85.0 | 66.7 | 68.9 | 16.7
✓ | | | ✓ | 66.4 | 96.1 | 84.4 | 66.3 | 67.7 | 53.6
 | ✓ | | ✓ | 67.4 | 96.4 | 85.8 | 67.2 | 69.5 | 54.5
FPN | HRFPN | ISNetV1 | ISNetV2 | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---|---|---
✓ | | ✓ | | 60.3 | 92.3 | 66.6 | 45.3 | 60.7 | 67.3
 | ✓ | ✓ | | 65.6 | 94.5 | 72.7 | 52.7 | 66.0 | 77.9
✓ | | | ✓ | 65.1 | 94.5 | 72.0 | 49.6 | 65.9 | 76.6
 | ✓ | | ✓ | 67.2 | 94.6 | 74.2 | 52.1 | 67.8 | 77.5
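The AP columns in these tables follow the standard COCO conventions: AP averages precision over IoU thresholds from 0.50 to 0.95, AP50 and AP75 fix the IoU threshold at 0.50 and 0.75, and APS/APM/APL split results by object area. All of them reduce to the mask IoU between a predicted and a ground-truth instance. A minimal sketch with toy 8×8 masks (not the paper's evaluation code):

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two binary instance masks of shape (H, W)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

# Toy example: a 16-pixel ground-truth square and a prediction
# covering 12 of those pixels and nothing else.
gt = np.zeros((8, 8), dtype=int); gt[2:6, 2:6] = 1
pred = np.zeros((8, 8), dtype=int); pred[3:6, 2:6] = 1
iou = mask_iou(pred, gt)
print(round(iou, 2))  # inter=12, union=16 -> 0.75
```

At IoU 0.75 this prediction would just count as a true positive for AP75, while a one-pixel-worse mask would not, which is why the stricter thresholds reward the higher-quality masks the paper targets.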
Backbone | AP | AP50 | AP75 | APS | APM | APL | Time (ms) | Param (M) | Flops |
---|---|---|---|---|---|---|---|---|---|
R-50-FPN | 64.4 | 95.1 | 81.0 | 65.4 | 62.3 | 12.7 | 51.8 | 43.75 | 198.02 |
R-101-FPN | 64.5 | 95.7 | 82.6 | 64.7 | 65.0 | 22.0 | 63.3 | 62.74 | 244.27 |
HRFPN-W18 | 65.0 | 95.7 | 82.7 | 65.8 | 63.5 | 13.4 | 65.8 | 29.71 | 186.13 |
HRFPN-W32 | 65.5 | 95.8 | 84.0 | 66.2 | 65.3 | 20.6 | 74.1 | 49.50 | 245.65 |
HRFPN-W40 | 66.0 | 96.2 | 85.0 | 66.5 | 65.5 | 15.1 | 86.2 | 65.75 | 293.56 |
Backbone | AP | AP50 | AP75 | APS | APM | APL | Time (ms) | Param (M) | Flops |
---|---|---|---|---|---|---|---|---|---|
R-50-FPN | 56.2 | 90.2 | 60.7 | 40.9 | 56.6 | 61.1 | 61.0 | 43.80 | 198.25 |
R-101-FPN | 57.4 | 91.7 | 62.8 | 41.0 | 57.5 | 60.5 | 71.4 | 62.79 | 244.5 |
HRFPN-W18 | 58.0 | 89.9 | 64.9 | 43.3 | 58.9 | 64.3 | 75.2 | 29.75 | 186.36 |
HRFPN-W32 | 59.7 | 91.1 | 64.7 | 46.3 | 60.1 | 64.0 | 83.3 | 49.55 | 245.87 |
HRFPN-W40 | 60.7 | 92.7 | 65.5 | 47.2 | 61.6 | 64.0 | 96.2 | 65.80 | 293.79 |
Backbone | ISNetV2 | AP | AP50 | AP75 | APS | APM | APL | Time (ms) | Param (M) | Flops
---|---|---|---|---|---|---|---|---|---|---
R-50-FPN | - | 65.1 | 94.8 | 82.6 | 65.7 | 64.4 | 20.0 | 72.5 | 76.80 | 359.65
R-50-FPN | ✓ | 65.9 | 96.1 | 83.5 | 66.0 | 66.9 | 30.8 | 71.9 | 76.99 | 362.24
R-101-FPN | - | 65.0 | 94.8 | 83.4 | 65.5 | 65.0 | 12.0 | 86.2 | 95.79 | 405.90
R-101-FPN | ✓ | 66.4 | 95.8 | 84.4 | 66.3 | 67.7 | 53.6 | 84.7 | 95.99 | 408.49
Backbone | ISNetV2 | AP | AP50 | AP75 | APS | APM | APL | Time (ms) | Param (M) | Flops
---|---|---|---|---|---|---|---|---|---|---
R-50-FPN | - | 59.8 | 91.9 | 66.6 | 45.3 | 60.0 | 67.3 | 81.3 | 76.83 | 360.22
R-50-FPN | ✓ | 64.2 | 93.9 | 72.0 | 49.2 | 64.7 | 69.3 | 104.1 | 77.03 | 362.81
R-101-FPN | - | 60.3 | 92.3 | 65.6 | 44.6 | 60.7 | 62.4 | 100.0 | 95.82 | 406.47
R-101-FPN | ✓ | 65.1 | 94.5 | 71.4 | 49.6 | 65.9 | 76.6 | 117.6 | 96.02 | 409.06
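The ISNetV2 gains reported above come from refining the information flow between the cascaded mask stages: each stage sees not only the ROI features but also what the previous stage predicted. The snippet below is a toy numpy sketch of such stage-to-stage mask flow with a one-layer "mask head"; the shapes and the head itself are illustrative stand-ins, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_head(features, weights):
    """Toy one-layer mask head: linear map plus sigmoid."""
    return 1.0 / (1.0 + np.exp(-(features @ weights)))

def cascaded_mask_prediction(roi_feat, stage_weights):
    """Three-stage mask branch with information flow: each stage
    receives the ROI features concatenated with the previous
    stage's mask prediction, so later stages refine earlier ones."""
    prev_mask = np.zeros((roi_feat.shape[0], 1))
    for w in stage_weights:
        inp = np.concatenate([roi_feat, prev_mask], axis=1)
        prev_mask = mask_head(inp, w)
    return prev_mask

roi = rng.standard_normal((49, 8))   # a flattened 7x7 ROI, 8 channels
weights = [rng.standard_normal((9, 1)) for _ in range(3)]  # 3 stages
masks = cascaded_mask_prediction(roi, weights)
print(masks.shape)  # (49, 1): one refined mask logit per ROI pixel
```

Without the concatenation of `prev_mask`, the three stages would be independent heads (the ISNetV1 situation); wiring the previous prediction in is the minimal version of the mask information flow the ablation measures.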
Model | Backbone | Time (ms) | Model Size | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---|---|---
Mask R-CNN | R-50-FPN | 51.8 | 351M | 64.4 | 95.1 | 81.0 | 65.4 | 62.3 | 8.5
Mask R-CNN | R-101-FPN | 63.3 | 503M | 64.5 | 95.7 | 82.6 | 64.7 | 65.0 | 22.0
Mask Scoring R-CNN | R-50-FPN | 50.8 | 481M | 64.1 | 94.2 | 81.0 | 64.8 | 62.8 | 11.7
Mask Scoring R-CNN | R-101-FPN | 63.3 | 633M | 64.8 | 95.0 | 82.4 | 65.0 | 64.7 | 13.4
Cascade Mask R-CNN | R-50-FPN | 72.5 | 615M | 65.1 | 94.8 | 82.6 | 65.7 | 64.4 | 20.0
Cascade Mask R-CNN | R-101-FPN | 86.2 | 768M | 65.0 | 94.8 | 83.4 | 65.5 | 65.0 | 12.0
Hybrid Task Cascade | R-50-FPN | 119.0 | 639M | 66.0 | 95.2 | 84.0 | 66.3 | 66.4 | 25.8
Hybrid Task Cascade | R-101-FPN | 133.3 | 791M | 66.9 | 95.3 | 85.0 | 66.6 | 67.7 | 29.6
HQ-ISNet | HRFPN-W18 | 87.0 | 504M | 66.5 | 96.2 | 84.4 | 66.5 | 67.7 | 29.7
HQ-ISNet | HRFPN-W32 | 93.5 | 662M | 67.3 | 96.3 | 85.8 | 67.2 | 68.8 | 24.1
HQ-ISNet | HRFPN-W40 | 106.4 | 792M | 67.4 | 96.4 | 84.5 | 66.9 | 69.5 | 54.5
Model | Backbone | Time (ms) | Model Size | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---|---|---
Mask R-CNN | R-50-FPN | 61.0 | 351M | 56.2 | 90.2 | 60.7 | 40.9 | 56.6 | 61.1
Mask R-CNN | R-101-FPN | 71.4 | 503M | 57.4 | 91.7 | 62.8 | 41.0 | 57.5 | 60.5
Mask Scoring R-CNN | R-50-FPN | 59.9 | 481M | 57.7 | 89.9 | 63.4 | 42.0 | 58.8 | 61.6
Mask Scoring R-CNN | R-101-FPN | 71.9 | 633M | 58.8 | 91.3 | 64.9 | 41.7 | 59.1 | 65.7
Cascade Mask R-CNN | R-50-FPN | 81.3 | 615M | 59.8 | 91.9 | 66.6 | 45.3 | 60.0 | 67.3
Cascade Mask R-CNN | R-101-FPN | 100.0 | 768M | 60.3 | 92.3 | 65.6 | 44.6 | 60.7 | 62.4
Hybrid Task Cascade | R-50-FPN | 156.2 | 639M | 65.0 | 94.1 | 72.9 | 48.3 | 65.5 | 69.8
Hybrid Task Cascade | R-101-FPN | 166.7 | 791M | 65.7 | 94.4 | 73.4 | 50.7 | 66.2 | 75.8
HQ-ISNet | HRFPN-W18 | 120.5 | 504M | 65.6 | 93.9 | 72.2 | 50.6 | 65.9 | 76.2
HQ-ISNet | HRFPN-W32 | 128.2 | 662M | 65.9 | 94.2 | 72.6 | 52.1 | 66.1 | 76.6
HQ-ISNet | HRFPN-W40 | 137.0 | 792M | 67.2 | 94.6 | 74.2 | 51.9 | 67.8 | 77.5
Data Set | Backbone | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---
SSDD | HRFPN-W18 | 0.072 | 0.143 | 0.373 | 0.047 | 0.077 | 47.712
SSDD | HRFPN-W32 | 0.018 | 0.003 | 0.453 | 0.033 | 0.087 | 45.837
SSDD | HRFPN-W40 | 0.073 | 0.003 | 0.208 | 0.035 | 0.357 | 106.652
NWPU VHR-10 | HRFPN-W18 | 0.053 | 0.063 | 0.360 | 0.522 | 0.093 | 1.175
NWPU VHR-10 | HRFPN-W32 | 0.082 | 0.497 | 0.493 | 1.36 | 0.128 | 2.022
NWPU VHR-10 | HRFPN-W40 | 0.093 | 0.075 | 1.007 | 1.333 | 0.035 | 1.717
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Su, H.; Wei, S.; Liu, S.; Liang, J.; Wang, C.; Shi, J.; Zhang, X. HQ-ISNet: High-Quality Instance Segmentation for Remote Sensing Imagery. Remote Sens. 2020, 12, 989. https://doi.org/10.3390/rs12060989