Vision-Guided Object Recognition and 6D Pose Estimation System Based on Deep Neural Network for Unmanned Aerial Vehicles towards Intelligent Logistics
Figure 1. The hardware setup and workflow of the proposed vision system. (1) UAV: the UAV captures images with the RGB-D camera and transmits them to the server over Wi-Fi. (2) Server: the images are processed to detect, segment, and classify the target of interest and to estimate its pose parameters. (3) Workflow: object detection — detect the target of interest in flight, usually at a distance greater than 3 m; object tracking — approach and track the target at a distance of 1–3 m; semantic segmentation — separate the target pixels from the background and classify the target; 6D pose estimation — estimate the 6D pose parameters pixel by pixel in the target region. Segmentation and pose estimation are usually performed at distances under 1 m.

Figure 2. The UAV (left) and the Percipio RGB-D camera (right).

Figure 3. The object detection process. The original image is cut into several 300 × 300 sub-images, the SSD algorithm is applied to each sub-image, and the resulting bounding boxes are mapped back to the original image.

Figure 4. The overall framework of the segmentation network. The encoder uses convolutional and max-pooling layers to extract features at different scales from the RGB and depth images; the decoder uses convolution and upsampling to output pixel-level classification results.

Figure 5. Objects with the same texture but different shapes: the left object is a cup with a handle; the right object is the same cup viewed from the other side, with the handle hidden. Each depth image is shown next to its color image.

Figure 6. The overall framework of the classification network. Two convolutional networks extract color and geometric features, respectively; the features are then fused, and fully connected layers produce the result.

Figure 7. Illustration of object 6D pose estimation. The pose transformation between the object coordinate system and the camera coordinate system is composed of the rotation matrix R and the translation vector t.

Figure 8. The overall framework of the 6D pose estimation network. (A) 6D pose estimation network: color and geometric features are extracted by two backbones and fused at the pixel level; a predictor then regresses the pose, and an iterative refinement network refines it. (B) Color feature extraction backbone: based on ResNet18; the outputs of layer2, layer3, layer4, and the PPM each pass through an RRB module to strengthen the features, which are then fused across scales to improve their expressiveness.

Figure 9. The objects in the SIAT dataset.

Figure 10. The OptiTrack motion capture system and its software interface.

Figure 11. Qualitative results of our segmentation network and other state-of-the-art segmentation networks. Different colors represent different categories.

Figure 12. Quantitative evaluation of DenseFusion and the proposed pose estimation network on the SIAT pose dataset.

Figure 13. Qualitative comparison. All results are predicted from the same segmentation mask as PoseCNN. Dots of different colors represent objects of different categories; the name below each image indicates the method.
Abstract
1. Introduction
- Object detection: The RGB-D camera on the UAV continuously captures images during flight and transmits them to the server over Wi-Fi. The target of interest is detected and located by the single shot multibox detector (SSD) [5] algorithm.
- Object tracking: Once the target is detected, the UAV will continue to track and approach it steadily.
- Semantic segmentation: The semantic segmentation network processes the image using both color and depth information and outputs an accurate segmentation mask. For objects with the same textures but different 3D geometries, we introduce a classification network, described in the object classification subsection, to distinguish them.
- 6D Pose Estimation: The 6D pose estimation network calculates the pose parameters of the target in the segmented image and transmits them to the UAV.
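The detection stage above tiles each frame into 300 × 300 sub-images before running SSD (as in Figure 3). A minimal sketch of that tiling-and-remapping step is shown below; the `detector` callable is a stand-in for a real SSD model, and image dimensions are assumed to be multiples of the tile size:

```python
import numpy as np

TILE = 300  # sub-image size fed to the SSD detector

def split_into_tiles(image, tile=TILE):
    """Cut the image into tile x tile sub-images, keeping each tile's offset."""
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            tiles.append(((x, y), image[y:y + tile, x:x + tile]))
    return tiles

def detect_full_image(image, detector, tile=TILE):
    """Run the detector on each sub-image, then map boxes back to the full frame."""
    boxes = []
    for (ox, oy), sub in split_into_tiles(image, tile):
        for (x1, y1, x2, y2, score) in detector(sub):
            boxes.append((x1 + ox, y1 + oy, x2 + ox, y2 + oy, score))
    return boxes
```

With a 600 × 600 frame this yields four tiles; a box found in the second tile at (10, 10, 50, 50) maps back to (310, 10, 350, 50) in the original image.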
- We built a practical vision system that provides reliable visual assistance for express delivery, expanding collaborative work between humans and UAVs by supplying accurate localization and 6D pose parameters of the targets.
- We proposed a semantic segmentation network with a novel feature fusion structure, which provides more comprehensive semantic information for segmentation by connecting different features at different layers to fuse color and depth information.
- We proposed an innovative 6D pose estimation network that uses the pyramid pooling module (PPM) and the refined residual block (RRB) module in the color feature extraction backbone to enhance features to accurately generate 6D pose parameters for the target.
- We constructed a dataset, the SIAT dataset, with ground-truth segmentation masks and 6D pose parameters to evaluate the performance of the system's algorithms.
2. Related Work
3. System and Methodology
3.1. Hardware Setup
3.2. System Overview
3.3. Object Detection
3.4. Object Tracking
3.5. Semantic Segmentation
- To better utilize the geometric information of the environment, we feed the depth images into the segmentation network in addition to the RGB images. The encoder therefore consists of two branches, one extracting color features and the other geometric features. At each downsampling stage, the color and geometric features from the max-pooling layers are concatenated to reinforce the expressiveness of the features.
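The per-stage fusion described above can be sketched with plain numpy arrays standing in for feature maps (the convolutional layers are omitted; only the max-pooling and channel-wise concatenation are illustrated):

```python
import numpy as np

def max_pool_2x2(feat):
    """2x2 max-pooling over an (H, W, C) feature map."""
    h, w, c = feat.shape
    return feat.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def fuse_stage(color_feat, depth_feat):
    """One downsampling stage: pool each branch, then concatenate
    color and geometric features along the channel axis."""
    return np.concatenate([max_pool_2x2(color_feat),
                           max_pool_2x2(depth_feat)], axis=-1)
```

Stacking such stages halves the spatial resolution each time, while the concatenated channels carry both color and depth cues forward into the decoder.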
3.6. Object Classification
3.7. Object 6D Pose Estimation
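As Figure 7 illustrates, a 6D pose is the rigid transform (R, t) from the object coordinate system to the camera coordinate system. A minimal sketch of applying such a pose to model points follows; the specific rotation and translation values are illustrative only:

```python
import numpy as np

def to_camera_frame(points_obj, R, t):
    """Map (N, 3) object-frame points into the camera frame: p_cam = R p_obj + t."""
    return points_obj @ R.T + t

# Illustrative pose: 90-degree rotation about the camera z-axis,
# object roughly 0.5 m in front of the camera.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.1, 0.0, 0.5])
p_cam = to_camera_frame(np.array([[1.0, 0.0, 0.0]]), R, t)
```

The pose estimation network regresses R and t pixel by pixel in the segmented target region, so every foreground pixel votes for one such transform.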
4. Experiments
4.1. Datasets
4.2. Metrics
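The AUC and ADD-S columns in the result tables follow the standard pose-accuracy metrics from the PoseCNN/DenseFusion line of work. A minimal sketch of ADD-S (average distance to the closest ground-truth model point, which tolerates symmetric objects) is given below; both inputs are assumed to be the object's model points already transformed by the predicted and ground-truth poses, respectively:

```python
import numpy as np

def add_s(pred_pts, gt_pts):
    """ADD-S: for each predicted model point, take the distance to its
    closest ground-truth point, then average over all points."""
    # (N, M) pairwise distance matrix between the two point sets.
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```

A prediction is typically counted as correct when ADD-S falls below a distance threshold, and the AUC variant integrates that accuracy over a range of thresholds.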
4.3. Experiments on the SIAT Dataset
4.4. Experiments on the YCB-Video Dataset
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Yang, Q.; Ye, H.; Huang, K.; Zha, Y.; Shi, L. Estimation of leaf area index of sugarcane using crop surface model based on UAV image. Trans. Chin. Soc. Agric. Eng. 2017, 33, 104–111.
- Viguier, R.; Lin, C.C.; Aliakbarpour, H.; Bunyak, F.; Pankanti, S.; Seetharaman, G.; Palaniappan, K. Automatic Video Content Summarization Using Geospatial Mosaics of Aerial Imagery. In Proceedings of the 2015 IEEE International Symposium on Multimedia (ISM), Miami, FL, USA, 14–16 December 2015.
- Thomas, J.; Loianno, G.; Daniilidis, K.; Kumar, V. The role of vision in perching and grasping for MAVs. In Proceedings of Micro- and Nanotechnology Sensors, Systems, and Applications VIII, Baltimore, MD, USA, 17–21 April 2016.
- Thomas, J.; Loianno, G.; Daniilidis, K.; Kumar, V. Visual Servoing of Quadrotors for Perching by Hanging from Cylindrical Objects. IEEE Robot. Autom. Lett. 2016, 1, 57–64.
- Kehl, W.; Manhardt, F.; Tombari, F.; Ilic, S.; Navab, N. SSD-6D: Making RGB-based 3D detection and 6D pose estimation great again. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1521–1529.
- Smolyanskiy, N.; Kamenev, A.; Smith, J.; Birchfield, S. Toward low-flying autonomous MAV trail navigation using deep neural networks for environmental awareness. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 4241–4247.
- Kainuma, A.; Madokoro, H.; Sato, K.; Shimoi, N. Occlusion-robust segmentation for multiple objects using a micro air vehicle. In Proceedings of the 2016 16th International Conference on Control, Automation and Systems (ICCAS), Gyeongju, Republic of Korea, 16–19 October 2016.
- Ke, Y.; Sukthankar, R. PCA-SIFT: A more distinctive representation for local image descriptors. Proc. CVPR 2004, 2, 506–513.
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Viola, P.; Jones, M.J. Robust real-time face detection. Int. J. Comput. Vis. 2004, 57, 137–154.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 386–397.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1483–1498.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016, Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
- Fu, C.Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional Single Shot Detector. arXiv 2017, arXiv:1701.06659.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 583–596.
- Bruni, V.; Vitulano, D. An improvement of kernel-based object tracking based on human perception. IEEE Trans. Syst. Man Cybern. Syst. 2014, 44, 1474–1485.
- Xiao, C.; Yilmaz, A. Efficient tracking with distinctive target colors and silhouette. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 2728–2733.
- Lychkov, I.I.; Alfimtsev, A.N.; Sakulin, S.A. Tracking of moving objects with regeneration of object feature points. In Proceedings of the 2018 Global Smart Industry Conference (GloSIC), Chelyabinsk, Russia, 13–15 November 2018; pp. 1–6.
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. Fully-Convolutional Siamese Networks for Object Tracking. In Proceedings of the European Conference on Computer Vision Workshops, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016.
- Fan, H.; Ling, H. SANet: Structure-Aware Network for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017.
- Xu, Y.; Ban, Y.; Delorme, G.; Gan, C.; Rus, D.; Alameda-Pineda, X. TransCenter: Transformers with Dense Queries for Multiple-Object Tracking. arXiv 2021, arXiv:2103.15145.
- Meinhardt, T.; Kirillov, A.; Leal-Taixé, L.; Feichtenhofer, C. TrackFormer: Multi-Object Tracking with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 8844–8854.
- Ilea, D.E.; Whelan, P.F. Image segmentation based on the integration of colour–texture descriptors—A review. Pattern Recognit. 2011, 44, 2479–2501.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Strudel, R.; Garcia, R.; Laptev, I.; Schmid, C. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 7262–7272.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Nejatishahidin, N.; Fayyazsanavi, P.; Kosecka, J. Object pose estimation using mid-level visual representations. arXiv 2022, arXiv:2203.01449.
- Zhu, M.; Derpanis, K.G.; Yang, Y.; Brahmbhatt, S.; Zhang, M.; Phillips, C.; Lecce, M.; Daniilidis, K. Single image 3D object detection and pose estimation for grasping. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–5 June 2014.
- Tekin, B.; Sinha, S.N.; Fua, P. Real-time seamless single shot 6D object pose prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 292–301.
- Rad, M.; Lepetit, V. BB8: A scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3828–3836.
- Lepetit, V.; Moreno-Noguer, F.; Fua, P. EPnP: An accurate O(n) solution to the PnP problem. Int. J. Comput. Vis. 2009, 81, 155.
- Doumanoglou, A.; Kouskouridas, R.; Malassiotis, S.; Kim, T.K. Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Wang, C.; Xu, D.; Zhu, Y.; Martín-Martín, R.; Lu, C.; Fei-Fei, L.; Savarese, S. DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion. arXiv 2019, arXiv:1901.04780.
- Kuo, W.; Angelova, A.; Lin, T.Y.; Dai, A. Mask2CAD: 3D shape prediction by learning to segment and retrieve. In Computer Vision—ECCV 2020, Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 260–277.
- Kuo, W.; Angelova, A.; Lin, T.Y.; Dai, A. Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 12589–12599.
- Liang, G.; Chen, F.; Liang, Y.; Feng, Y.; Wang, C.; Wu, X. A manufacturing-oriented intelligent vision system based on deep neural network for object recognition and 6D pose estimation. Front. Neurorobot. 2021, 14, 616775.
- He, Y.; Wang, Y.; Fan, H.; Sun, J.; Chen, Q. FS6D: Few-Shot 6D Pose Estimation of Novel Objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 6814–6824.
- Wang, A.; Pruksachatkun, Y.; Nangia, N.; Singh, A.; Michael, J.; Hill, F.; Levy, O.; Bowman, S. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. In Advances in Neural Information Processing Systems, Proceedings of the Annual Conference on Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. Learning a discriminative feature network for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1857–1866.
- Xiang, Y.; Schmidt, T.; Narayanan, V.; Fox, D. PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes. arXiv 2017, arXiv:1711.00199.
- Zhan, S.; Chung, R.; Zhang, X.T. An Accurate and Robust Strip-Edge-Based Structured Light Means for Shiny Surface Micromeasurement in 3-D. IEEE Trans. Ind. Electron. 2013, 60, 1023–1032.
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
| Object | Dense(per) AUC | Dense(per) ADD-S | Dense(iter) AUC | Dense(iter) ADD-S | Ours(per) AUC | Ours(per) ADD-S | Ours(iter) AUC | Ours(iter) ADD-S |
|---|---|---|---|---|---|---|---|---|
| toy | 97.4 | 88.6 | 97.4 | 88.6 | 88.8 | 97.2 | 88.8 | 97.8 |
| Lay's | 68.0 | 73.8 | 72.7 | 75.3 | 76.0 | 74.0 | 78.6 | 82.2 |
| bowl | 97.4 | 91.5 | 97.4 | 91.5 | 92.2 | 97.7 | 92.3 | 99.1 |
| Thermos cup | 50.0 | 58.9 | 50.0 | 58.9 | 73.6 | 61.6 | 73.6 | 61.6 |
| Tea box | 69.2 | 82.0 | 69.2 | 82.0 | 75.2 | 68.8 | 79.5 | 68.8 |
| Blue moon | 52.2 | 75.7 | 60.8 | 76.4 | 78.1 | 62.5 | 84.0 | 73.1 |
| Metal block | 64.3 | 78.5 | 64.5 | 78.5 | 76.7 | 64.4 | 81.4 | 76.9 |
| Carton | 71.7 | 83.4 | 71.7 | 83.4 | 75.1 | 66.1 | 80.1 | 75.3 |
| cup | 96.3 | 85.9 | 97.6 | 87.6 | 87.9 | 97.7 | 88.9 | 99.5 |
| back of cup | 92.7 | 88.2 | 92.7 | 88.2 | 87.7 | 94.0 | 90.1 | 98.1 |
| MEAN | 75.7 | 79.5 | 77.3 | 81.0 | 81.4 | 78.7 | 83.7 | 83.2 |
| Object | PoseCNN+ICP AUC | PoseCNN+ICP ADD-S | Dense(per) AUC | Dense(per) ADD-S | Dense(iter) AUC | Dense(iter) ADD-S | Ours(per) AUC | Ours(per) ADD-S | Ours(iter) AUC | Ours(iter) ADD-S |
|---|---|---|---|---|---|---|---|---|---|---|
| 002 master chef can | 95.8 | 100.0 | 95.2 | 100.0 | 96.4 | 100.0 | 95.3 | 100.0 | 96.3 | 100.0 |
| 003 cracker box | 92.7 | 93.0 | 92.5 | 99.3 | 95.5 | 99.5 | 92.6 | 100.0 | 96.3 | 100.0 |
| 004 sugar box | 98.2 | 100.0 | 95.1 | 100.0 | 97.5 | 100.0 | 95.5 | 100.0 | 97.7 | 100.0 |
| 005 tomato soup can | 94.5 | 96.8 | 93.7 | 96.9 | 94.6 | 96.9 | 96.8 | 100.0 | 97.7 | 100.0 |
| 006 mustard bottle | 98.6 | 100.0 | 95.9 | 100.0 | 97.2 | 100.0 | 96.0 | 100.0 | 97.8 | 100.0 |
| 007 tuna fish can | 97.1 | 97.9 | 94.9 | 100.0 | 96.6 | 100.0 | 96.0 | 100.0 | 97.2 | 100.0 |
| 008 pudding box | 97.9 | 100.0 | 94.7 | 100.0 | 96.5 | 100.0 | 94.3 | 100.0 | 96.8 | 100.0 |
| 009 gelatin box | 98.8 | 100.0 | 95.8 | 100.0 | 98.1 | 100.0 | 97.3 | 100.0 | 98.2 | 100.0 |
| 010 potted meat can | 92.7 | 97.2 | 90.1 | 93.1 | 91.3 | 93.1 | 93.0 | 95.4 | 94.0 | 95.3 |
| 011 banana | 97.1 | 99.7 | 91.5 | 93.9 | 96.6 | 100.0 | 93.5 | 96.8 | 97.1 | 100.0 |
| 019 pitcher base | 97.8 | 100.0 | 94.6 | 100.0 | 97.1 | 100.0 | 93.4 | 99.5 | 97.9 | 100.0 |
| 021 bleach cleanser | 96.9 | 99.9 | 94.3 | 99.8 | 95.8 | 100.0 | 95.0 | 99.7 | 96.7 | 100.0 |
| 024 bowl | 81.0 | 58.8 | 86.6 | 69.5 | 88.2 | 98.8 | 84.4 | 73.9 | 88.8 | 96.8 |
| 025 mug | 95.0 | 99.5 | 95.5 | 100.0 | 97.1 | 100.0 | 96.0 | 100.0 | 97.3 | 100.0 |
| 035 power drill | 98.2 | 99.9 | 92.4 | 97.1 | 96.0 | 98.7 | 92.9 | 97.3 | 96.1 | 98.3 |
| 036 wood block | 87.6 | 82.6 | 85.5 | 93.4 | 89.7 | 94.6 | 85.8 | 84.3 | 91.7 | 96.7 |
| 037 scissors | 91.7 | 100.0 | 96.4 | 100.0 | 95.2 | 100.0 | 96.6 | 100.0 | 93.1 | 99.5 |
| 040 large marker | 97.2 | 98.0 | 94.7 | 99.2 | 97.5 | 100.0 | 95.9 | 99.7 | 97.8 | 100.0 |
| 051 large clamp | 75.2 | 75.6 | 71.6 | 78.5 | 72.9 | 79.2 | 73.7 | 79.2 | 75.7 | 80.1 |
| 052 extra large clamp | 64.4 | 55.6 | 69.0 | 69.5 | 69.8 | 76.3 | 83.4 | 83.6 | 83.3 | 88.9 |
| 061 foam brick | 97.2 | 99.6 | 92.4 | 100.0 | 92.5 | 100.0 | 94.8 | 100.0 | 96.4 | 100.0 |
| MEAN | 93.0 | 93.1 | 91.2 | 95.3 | 93.1 | 96.8 | 92.8 | 96.5 | 94.8 | 97.9 |
| | Detection | Segmentation | Classification | 6D Pose Estimation | Image Transmission |
|---|---|---|---|---|---|
| Time (s) | 0.049 | 0.02 | 0.002 | 0.023 | 0.11 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Luo, S.; Liang, Y.; Luo, Z.; Liang, G.; Wang, C.; Wu, X. Vision-Guided Object Recognition and 6D Pose Estimation System Based on Deep Neural Network for Unmanned Aerial Vehicles towards Intelligent Logistics. Appl. Sci. 2023, 13, 115. https://doi.org/10.3390/app13010115