AFE-RCNN: Adaptive Feature Enhancement RCNN for 3D Object Detection
Figure 1. Diagrams of raw point clouds, voxels, and bird's eye view. $N_{in}$ is the number of input features; $N_{out}$ is the number of output features. $N$ is the number of gathered features in non-empty voxels. GEMM represents the general matrix multiplication-based algorithm [10].
Figure 2. The architecture of AFE-RCNN. (a) The residual of dual attention proposal generation module. (b) The multi-scale adaptive feature extraction module based on point clouds. (c) The refinement loss function module with vertex associativity.
Figure 3. The architecture of the RDA module. The symbol ⊗ represents element-wise addition, and the symbol ⊙ represents element-wise multiplication. The drawing of this figure is inspired by ref. [30]. Conv2D represents a 2D convolution operation, and Deconv represents a deconvolution operation. LayerNorm represents normalization over the channel dimension. The other symbols are described in this section.
Figure 4. The architecture of MSAA. In a local region $Q$ with $N$ points: (a) the feature map $F_i$, whose size is $C \times N$, where $C$ is the number of channels of each point feature; (b) the difference map $F_{dif}^{(i)}$: for each $F_i$, the differences between point-feature pairs are $F_{dif}^{(i)} = [F_1 - F_i, \ldots, F_N - F_i]$; (c) the impact map $\beta_i$: $F_{dif}^{(i)}$ learns the amount of impact among point features through an MLP to obtain the impact map; (d) the adjusted feature map $F_i'$, which is the final adjustment vector.
Figure 5. The RP curves of 3D detection for cars (IoU threshold = 0.7) and cyclists (IoU threshold = 0.5). (a) 3D detection for cars; (b) 3D detection for cyclists.
Figure 6. Visualization results of 3D object detection. (a–d) are four different scenes. The first block shows the results of PointRCNN, the second block shows the results of PV-RCNN, and the third block shows the results of our AFE-RCNN. The ground-truth box is green, the predicted box of the car is red, and the predicted box of the cyclist is yellow. The circled part is the region to focus on for comparison.
Abstract
1. Introduction
- We design a residual of dual attention proposal generation module, i.e., the RDA module. This module learns feature correlations in both the channel and spatial branches while reducing information loss during transmission. The RDA module yields higher-quality box proposals and enhances the BEV features, which in turn benefits box proposal refinement (a minimal sketch of such a block follows this list).
- We design a multi-scale feature extraction module based on adaptive feature adjustment, i.e., MSAA. The module uses multi-scale feature extraction to improve the robustness of sparse point cloud features. Meanwhile, to fully mine the neighboring contextual information among all points, we introduce an adaptive feature adjustment method so that the key points better describe their local neighborhood (see the second sketch after this list).
- We design a loss function module based on vertex associativity, i.e., the VA module. This module constructs a regression loss function based on the projection of the 3D detection box into the BEV coordinate system and on the DIoU loss.
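To make the structure of the RDA module concrete, the first sketch below shows a minimal residual dual-attention block over BEV feature maps. The SE-style channel gate, the 1×1-convolution spatial gate, and the reduction ratio are illustrative assumptions; the paper's exact branch design (Figure 3) uses its own convolution, deconvolution, and LayerNorm layout.

```python
import torch
import torch.nn as nn

class ResidualDualAttention(nn.Module):
    """Sketch of a residual dual-attention block for BEV features.

    Illustrates the RDA structure only: a channel branch and a spatial
    branch re-weight the input, and a residual path limits the loss of
    information during transmission. Layer choices are assumptions.
    """

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel branch: squeeze the spatial dims, then excite channels.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: squeeze channels, then excite spatial positions.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) BEV feature map.
        out = x * self.channel_gate(x)      # channel re-weighting
        out = out * self.spatial_gate(out)  # spatial re-weighting
        return x + out                      # residual connection
```

The residual path is what separates this from a plain dual-attention block: the input bypasses both gates, so the attention branches can only refine the BEV features rather than overwrite them.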
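The adaptive adjustment at the core of MSAA is spelled out in the Figure 4 caption: for each point feature $F_i$ in a local region, the pairwise difference map $F_{dif}^{(i)} = [F_1 - F_i, \ldots, F_N - F_i]$ is mapped by an MLP to an impact map $\beta_i$, which yields the adjusted feature $F_i'$. The following is a minimal sketch of that step; the MLP width and the mean aggregation over neighbors are our assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveFeatureAdjustment(nn.Module):
    """Sketch of the MSAA feature adjustment for one local region Q."""

    def __init__(self, channels: int):
        super().__init__()
        # Shared MLP that maps feature differences to per-pair impact weights.
        self.impact_mlp = nn.Sequential(
            nn.Linear(channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, C) features of the N points in the region.
        # Difference map: entry (i, j) holds F_j - F_i, shape (N, N, C).
        f_dif = feats.unsqueeze(0) - feats.unsqueeze(1)
        # Impact map beta_i: learned influence of each neighbor on point i.
        beta = self.impact_mlp(f_dif)      # (N, N, C)
        # Adjusted feature F_i': original feature plus aggregated impact.
        return feats + beta.mean(dim=1)    # (N, C)
```

Because the adjustment is driven by feature differences rather than absolute values, each key point ends up describing its local neighborhood relative to its neighbors, which is the stated goal of the module.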
2. Related Work
3. AFE-RCNN for Point Cloud Object Detection
3.1. The Residual of Dual Attention Proposal Generation Module
3.2. Multi-Scale Feature Extraction Module Based on Adaptive Feature Adjustment
3.3. Refinement Loss Function Module with Vertex Associativity
3.4. Training Losses
- 1. The proposal generation network performs the classification and regression of anchor boxes based on the BEV features, and is supervised by the region proposal loss.
- 2. The key point segmentation loss is used to filter the foreground; it is calculated in the same way as the classification loss.
- 3. The refinement network performs the confidence prediction and regression of the box proposals based on the rich feature information of the key points. One loss term is used for confidence prediction, while the loss defined in Section 3.3 is used for box regression; together they form the proposal refinement loss (a DIoU-based sketch follows this list).
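The box-regression loss referenced in item 3 (Section 3.3) builds on the DIoU loss [35], applied after projecting the 3D boxes into the BEV coordinate system. As a reference point, here is a minimal sketch of plain DIoU for axis-aligned BEV boxes in (x1, y1, x2, y2) form; box rotation and the vertex-associativity term of the VA module are omitted, so this is not the paper's full refinement loss.

```python
import torch

def diou_loss_bev(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """DIoU loss for axis-aligned BEV boxes of shape (..., 4): (x1, y1, x2, y2)."""
    # Intersection area.
    lt = torch.max(pred[..., :2], gt[..., :2])
    rb = torch.min(pred[..., 2:], gt[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    # Union area and IoU.
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_g = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    iou = inter / (area_p + area_g - inter).clamp(min=1e-7)
    # Squared distance between box centers (rho^2).
    center_p = (pred[..., :2] + pred[..., 2:]) / 2
    center_g = (gt[..., :2] + gt[..., 2:]) / 2
    rho2 = ((center_p - center_g) ** 2).sum(dim=-1)
    # Squared diagonal of the smallest enclosing box (c^2).
    enc_lt = torch.min(pred[..., :2], gt[..., :2])
    enc_rb = torch.max(pred[..., 2:], gt[..., 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(dim=-1).clamp(min=1e-7)
    # L_DIoU = 1 - IoU + rho^2 / c^2.
    return 1.0 - iou + rho2 / c2
```

Unlike a plain IoU loss, the ρ²/c² penalty stays informative even when the predicted and ground-truth boxes do not overlap, which is what makes DIoU converge faster for box regression [35].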
4. Experiments and Results
4.1. Dataset and Implementation Details
4.2. Evaluation on the KITTI Online Test Server
4.3. Ablation Experiments Based on KITTI Validation Set
4.4. Evaluation on the KITTI Validation Set
4.5. Qualitative Analysis on the KITTI Dataset
4.6. Validation on the Waymo Open Dataset
4.7. Efficiency and Robustness Analysis of the Proposed Algorithm
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef] [Green Version]
- Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in Perception for Autonomous Driving: Waymo Open Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2443–2451. [Google Scholar]
- Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991, 4, 251–257. [Google Scholar] [CrossRef]
- Wang, Q.; Chen, J.; Deng, J.; Zhang, X. 3D-CenterNet: 3D object detection network for point clouds with center estimation priority. Pattern Recognit. 2021, 115, 107884. [Google Scholar] [CrossRef]
- Wang, L.; Zhao, D.; Wu, T.; Fu, H.; Wang, Z.; Xiao, L.; Xu, X.; Dai, B. Drosophila-inspired 3D moving object detection based on point clouds. Inf. Sci. 2020, 534, 154–171. [Google Scholar] [CrossRef]
- Li, X.; Guivant, J.; Khan, S. Real-time 3D object proposal generation and classification using limited processing resources. Robot. Auton. Syst. 2020, 130, 103557. [Google Scholar] [CrossRef]
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 770–779. [Google Scholar]
- Shi, W.; Rajkumar, R. Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1708–1716. [Google Scholar]
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3DSSD: Point-Based 3D Single Stage Object Detector. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11037–11045. [Google Scholar]
- Yan, Y.; Mao, Y.X.; Li, B. SECOND: Sparsely Embedded Convolutional Detection. Sensors 2018, 18, 3337. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Shi, S.; Wang, Z.; Shi, J.; Wang, X.; Li, H. From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2647–2664. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Ye, Y.; Chen, H.; Zhang, C.; Hao, X.; Zhang, Z. SARPNET: Shape attention regional proposal network for LiDAR-based 3D object detection. Neurocomputing 2020, 379, 53–63. [Google Scholar] [CrossRef]
- Wang, L.; Fan, X.; Chen, J.; Cheng, J.; Tan, J.; Ma, X. 3D object detection based on sparse convolution neural network and feature fusion for autonomous driving in smart cities. Sustain. Cities Soc. 2020, 54, 102002. [Google Scholar] [CrossRef]
- Zheng, W.; Tang, W.; Chen, S.; Jiang, L.; Fu, C.W. CIA-SSD: Confident IoU-Aware Single-Stage Object Detector from Point Cloud. arXiv 2020, arXiv:2012.03015. [Google Scholar]
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast Encoders for Object Detection from Point Clouds. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 18–20 June 2019; pp. 12697–12705. [Google Scholar]
- Du, L.; Ye, X.; Tan, X.; Feng, J.; Xu, Z.; Ding, E.; Wen, S. Associate-3Ddet: Perceptual-to-conceptual association for 3D point cloud object detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13326–13335. [Google Scholar]
- Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10526–10535. [Google Scholar]
- Graham, B.; van der Maaten, L. Submanifold sparse convolutional networks. arXiv 2017, arXiv:1706.01307. [Google Scholar]
- Yang, B.; Liang, M.; Urtasun, R. HDNET: Exploiting HD Maps for 3D Object Detection. In Proceedings of the Conference on Robot Learning (CoRL), PMLR 2018, 87, 146–155. [Google Scholar]
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D Object Detection Network for Autonomous Driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915. [Google Scholar]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30, 5099–5108. [Google Scholar]
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 2019, 38, 1–12. [Google Scholar] [CrossRef] [Green Version]
- Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. PointCNN: Convolution on X-transformed points. Adv. Neural Inf. Process. Syst. 2018, 31, 820–830. [Google Scholar]
- Zhao, H.; Jiang, L.; Fu, C.-W.; Jia, J. PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 18–20 June 2019; pp. 5560–5568. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual Attention Network for Image Classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
- Liu, H.; Liu, F.; Fan, X.; Huang, D. Polarized Self-Attention: Towards High-quality Pixel-wise Regression. arXiv 2021, arXiv:2107.00782. [Google Scholar]
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 1440–1448. [Google Scholar]
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 516–520. [Google Scholar]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over union: A metric and a Loss for Bounding Box Regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 18–20 June 2019; pp. 658–666. [Google Scholar]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000. [Google Scholar]
- Simonelli, A.; Bulo, S.R.; Porzi, L.; Lopez-Antequera, M.; Kontschieder, P. Disentangling Monocular 3D Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 1991–1999. [Google Scholar]
- Zhang, H.; Yang, D.; Yurtsever, E.; Redmill, K.A.; Ozguner, O. Faraway-Frustum: Dealing with Lidar Sparsity for 3D Object Detection using Fusion. In Proceedings of the IEEE International Intelligent Transportation Systems Conference, Indianapolis, IN, USA, 19–22 September 2021; pp. 2646–2652. [Google Scholar]
- Deng, J.; Zhou, W.; Zhang, Y.; Li, H. From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4722–4734. [Google Scholar] [CrossRef]
- Gustafsson, F.K.; Danelljan, M.; Schon, T.B. Accurate 3D Object Detection using Energy-Based Models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 2855–2864. [Google Scholar]
- Huang, T.; Liu, Z.; Chen, X.; Bai, X. EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 35–52. [Google Scholar]
- Li, J.; Dai, H.; Shao, L.; Ding, Y. Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud. In Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 20–24 October 2021; pp. 553–562. [Google Scholar]
- Liu, Z.; Zhao, X.; Huang, T.; Hu, R.; Zhou, Y.; Bai, X. TANet: Robust 3D Object Detection from Point Clouds with Triple Attention. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11677–11684. [Google Scholar]
- He, C.; Zeng, H.; Huang, J.; Hua, X.-S.; Zhang, L. Structure Aware Single-Stage 3D Object Detection from Point Cloud. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11873–11882. [Google Scholar]
- Xu, Q.; Zhou, Y.; Wang, W.; Qi, C.R. SPG: Unsupervised domain adaptation for 3D object detection via semantic point generation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Virtual, 11–17 October 2021; pp. 15446–15456. [Google Scholar]
- He, Y.; Xia, G.; Luo, Y.; Su, L.; Zhang, Z.; Li, W.; Wang, P. DVFENet: Dual-branch voxel feature extraction network for 3D object detection. Neurocomputing 2021, 459, 201–211. [Google Scholar] [CrossRef]
- Zheng, W.; Tang, W.; Jiang, L.; Fu, C.-W. SE-SSD: Self-Ensembling Single-Stage Object Detector from Point Cloud. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14494–14503. [Google Scholar]
- Yin, T.; Zhou, X.; Krahenbuhl, P. Center-based 3D Object Detection and Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11784–11793. [Google Scholar]
Voxel Features | BEV Features | Raw Point Cloud Features | 3D mAP
---|---|---|---
√ | | | 84.54
√ | √ | | 84.69
√ | √ | √ | 84.72
Method | Modality | Car-3D (Easy) | Car-3D (Moderate) | Car-3D (Hard) | Cyclist-3D (Easy) | Cyclist-3D (Moderate) | Cyclist-3D (Hard)
---|---|---|---|---|---|---|---
EBM3DOD [39] (CVPR Workshops 2021) | Point | 91.05 | 80.12 | 72.78 | - | - | -
Faraway-Frustum [37] (IEEE ITSC 2021) | Point | 87.45 | 79.05 | 76.14 | 77.36 | 62.00 | 55.40
3DSSD [9] (CVPR 2020 Oral) | Point | 88.36 | 79.57 | 74.55 | 82.48 | 64.10 | 56.90
Point-GNN [8] (CVPR 2020) | Point | 88.33 | 79.47 | 72.29 | 78.60 | 63.48 | 57.08
EPNet [40] (ECCV 2020) | Point | 89.81 | 79.28 | 74.59 | - | - | -
PointRCNN [7] (CVPR 2019) | Point | 86.96 | 75.64 | 70.70 | 74.96 | 58.82 | 52.53
CIA-SSD [14] (AAAI 2021) | Voxel | 89.59 | 80.28 | 72.87 | 78.69 | 61.59 | 55.30
MGAF-3DSSD [41] (ACM MM 2021) | Voxel | 88.16 | 79.68 | 72.39 | 80.64 | 63.43 | 55.15
Part-A2 [11] (TPAMI 2020) | Voxel | 87.81 | 78.49 | 73.51 | 79.17 | 63.52 | 56.93
TANet [42] (AAAI 2020) | Voxel | 84.39 | 75.94 | 68.82 | 75.70 | 59.44 | 52.53
Associate-3Ddet [16] (CVPR 2020) | Voxel | 85.99 | 77.40 | 70.53 | - | - | -
PointPillars [15] (CVPR 2019) | Voxel | 82.58 | 74.31 | 68.99 | 77.10 | 58.65 | 51.92
SA-SSD [43] (CVPR 2020) | Point-Voxel combination | 88.75 | 79.79 | 74.16 | - | - | -
SPG [44] (ICCV 2021) | Point-Voxel combination | 90.50 | 82.13 | 78.90 | - | - | -
DVFENet [45] (Neurocomputing 2021) | Point-Voxel combination | 86.20 | 79.18 | 74.58 | 79.17 | 63.52 | 56.93
H²3D R-CNN [38] (TCSVT 2021) | Point-Voxel combination | 90.43 | 81.55 | 77.22 | 78.67 | 62.74 | 55.78
SE-SSD [46] (CVPR 2021) | Point-Voxel combination | 91.49 | 82.54 | 77.15 | - | - | -
PV-RCNN [17] (CVPR 2020) | Point-Voxel combination | 90.25 | 81.43 | 76.82 | 78.60 | 63.71 | 57.65
AFE-RCNN (Ours) | Point-Voxel combination | 88.41 | 81.53 | 77.03 | 82.78 | 67.50 | 61.18
Method | Modality | Car-Orientation (Easy) | Car-Orientation (Moderate) | Car-Orientation (Hard) | Cyclist-Orientation (Easy) | Cyclist-Orientation (Moderate) | Cyclist-Orientation (Hard)
---|---|---|---|---|---|---|---
EBM3DOD [39] (CVPR Workshops 2021) | Point | 96.39 | 92.88 | 87.58 | - | - | -
Point-GNN [8] (CVPR 2020) | Point | 88.33 | 79.47 | 72.29 | - | - | -
EPNet [40] (ECCV 2020) | Point | 96.13 | 94.22 | 89.68 | - | - | -
PointRCNN [7] (CVPR 2019) | Point | 95.90 | 91.77 | 86.92 | 85.94 | 72.81 | 65.84
CIA-SSD [14] (AAAI 2021) | Voxel | 96.65 | 93.34 | 85.76 | - | - | -
TANet [42] (AAAI 2020) | Voxel | 93.52 | 90.11 | 84.61 | 81.15 | 66.37 | 60.10
MGAF-3DSSD [41] (ACM MM 2021) | Voxel | 94.45 | 93.77 | 86.25 | 86.28 | 70.16 | 62.99
Part-A2 [11] (TPAMI 2020) | Voxel | 95.00 | 91.73 | 88.86 | 88.70 | 77.52 | 70.41
Associate-3Ddet [16] (CVPR 2020) | Voxel | 0.52 | 1.20 | 1.38 | - | - | -
PointPillars [15] (CVPR 2019) | Voxel | 93.84 | 90.70 | 87.47 | 83.97 | 68.55 | 61.71
SA-SSD [43] (CVPR 2020) | Point-Voxel combination | 39.40 | 38.30 | 37.07 | - | - | -
SPG [44] (ICCV 2021) | Point-Voxel combination | 40.02 | 38.73 | 38.52 | - | - | -
H²3D R-CNN [38] (TCSVT 2021) | Point-Voxel combination | 96.13 | 93.03 | 90.33 | 85.09 | 72.20 | 65.25
DVFENet [45] (Neurocomputing 2021) | Point-Voxel combination | 95.33 | 94.44 | 91.55 | 85.48 | 73.43 | 66.87
SE-SSD [46] (CVPR 2021) | Point-Voxel combination | 96.55 | 95.17 | 90.00 | - | - | -
PV-RCNN [17] (CVPR 2020) | Point-Voxel combination | 98.15 | 94.57 | 91.85 | 86.43 | 79.70 | 72.96
AFE-RCNN (Ours) | Point-Voxel combination | 95.84 | 94.63 | 92.07 | 88.89 | 79.18 | 73.65
3D Detection (mAP|40):

Method | RDA | MSAA | VA | Car (Easy) | Cyclist (Easy) | Car (Moderate) | Cyclist (Moderate) | Car (Hard) | Cyclist (Hard)
---|---|---|---|---|---|---|---|---|---
PV-RCNN Baseline | | | | 92.10 | 89.10 | 84.36 | 70.38 | 82.48 | 66.17
Ours1 | √ | | | 92.18 | 90.59 | 84.82 | 72.15 | 82.60 | 67.69
Ours2 | √ | √ | | 92.27 | 91.21 | 85.05 | 75.01 | 82.91 | 70.36
AFE-RCNN | √ | √ | √ | 92.35 | 91.54 | 85.31 | 75.39 | 83.08 | 70.79

Orientation Estimation (AOS|40):

Method | RDA | MSAA | VA | Car (Easy) | Cyclist (Easy) | Car (Moderate) | Cyclist (Moderate) | Car (Hard) | Cyclist (Hard)
---|---|---|---|---|---|---|---|---|---
PV-RCNN Baseline | | | | 98.25 | 97.04 | 94.26 | 82.11 | 92.07 | 77.88
AFE-RCNN | √ | √ | √ | 98.18 | 94.50 | 94.29 | 84.24 | 94.01 | 81.01
3D Detection (mAP|40):

Method | Car (Easy) | Cyclist (Easy) | Car (Moderate) | Cyclist (Moderate) | Car (Hard) | Cyclist (Hard)
---|---|---|---|---|---|---
PV-RCNN Baseline | 92.10 | 89.10 | 84.36 | 70.38 | 82.48 | 66.17
+DA block | 91.17 | 88.34 | 82.98 | 70.21 | 82.55 | 65.13
+RDA block | 92.18 | 90.59 | 84.82 | 72.15 | 82.60 | 67.69
Car (IoU threshold = 0.7):

Method | Easy | Moderate | Hard
---|---|---|---
PointRCNN [7] (CVPR 2019) | 88.88 | 78.63 | 77.38
3DSSD [9] (CVPR 2020 Oral) | 89.71 | 79.45 | 78.67
SA-SSD [43] (CVPR 2020) | 90.15 | 79.91 | 78.78
Part-A2 [11] (TPAMI 2020) | 89.47 | 79.47 | 78.54
CIA-SSD [14] (AAAI 2021) | 90.04 | 79.81 | 78.80
TANet [42] (AAAI 2020) | 87.52 | 76.64 | 73.86
DVFENet [45] (Neurocomputing 2021) | 89.81 | 79.52 | 78.35
H²3D R-CNN [38] (TCSVT 2021) | 89.63 | 85.20 | 79.08
SE-SSD [46] (CVPR 2021) | 90.21 | 86.25 | 79.22
PV-RCNN [17] (CVPR 2020) | 89.34 | 83.69 | 78.70
AFE-RCNN (Ours) | 89.61 | 83.99 | 79.18
Method | Vehicle (LEVEL 1) mAP | Vehicle (LEVEL 1) mAPH | Vehicle (LEVEL 2) mAP | Vehicle (LEVEL 2) mAPH | Cyclist (LEVEL 1) mAP | Cyclist (LEVEL 1) mAPH | Cyclist (LEVEL 2) mAP | Cyclist (LEVEL 2) mAPH
---|---|---|---|---|---|---|---|---
CenterPoint [47] (CVPR 2021) | 65.98 | 65.40 | 57.98 | 57.47 | 63.05 | 61.68 | 60.72 | 59.39
PointPillars [15] (CVPR 2019) | 65.06 | 64.29 | 57.11 | 56.41 | 49.95 | 43.47 | 48.05 | 41.82
SECOND [10] (Sensors 2018) | 65.83 | 65.12 | 57.80 | 57.17 | 47.44 | 38.68 | 45.65 | 37.21
PV-RCNN [17] (CVPR 2020) | 71.23 | 70.53 | 62.58 | 61.96 | 58.87 | 40.29 | 56.36 | 39.11
AFE-RCNN (Ours) | 71.23 | 70.54 | 62.62 | 61.99 | 59.69 | 43.14 | 57.44 | 41.51
Method | Running Time on KITTI (s) | Running Time on Waymo Open Dataset (s)
---|---|---
PV-RCNN Baseline | 0.0286 | 0.1033
AFE-RCNN (Ours) | |
Number of Object Classes | Car mAP | Cyclist mAP | Pedestrian mAP | Running Time (s)
---|---|---|---|---
1 class (Car) | 85.33 | - | - | 0.0279
2 classes (Car, Cyclist) | 85.29 | 75.39 | - | 0.0286
3 classes (Car, Cyclist, Pedestrian) | 85.31 | 75.39 | 59.67 | 0.0289
Pedestrian (mAP|40 and AOS|40):

Method | Easy mAP | Easy AOS | Moderate mAP | Moderate AOS | Hard mAP | Hard AOS
---|---|---|---|---|---|---
PV-RCNN Baseline | 62.71 | 67.82 | 54.49 | 62.17 | 49.88 | 58.07
AFE-RCNN | 66.19 | 71.94 | 59.67 | 66.28 | 54.97 | 63.02