Abstract
Camera pose estimation is paramount in robotic applications. Many recent algorithms based on convolutional neural networks predict the camera pose adequately; however, they typically suffer from computational complexity that prevents them from running in real time. Moreover, they are not robust to perturbations such as partial occlusion when they have not been trained on such cases beforehand. To address these limitations, this paper presents a fast and robust end-to-end Siamese convolutional model for robot-camera pose estimation. Two color frames are fed to the model simultaneously, and generic features are extracted, relying mainly on transfer learning. The extracted features are then concatenated, and the relative pose is obtained directly at the output. Furthermore, a new dataset is generated, comprising several videos recorded in various situations, for model evaluation. The proposed technique performs robustly even in challenging scenes that were not rehearsed during the training phase. In experiments conducted with an eye-in-hand KUKA robotic arm, the presented network yields fairly accurate camera pose estimates despite scene-illumination changes, and pose estimation remains reasonably accurate in the presence of partial camera occlusion. The results are further enhanced by a new dynamic weighted loss function. Finally, the proposed method is exploited in a visual servoing scenario.
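The architecture described above — a shared feature extractor applied to two frames, feature concatenation, and direct regression of the relative pose with a weighted translation/rotation loss — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the 6-DoF pose parameterization, and the fixed weighting factor `beta` are all assumptions (the paper uses transfer-learned backbone features and tunes the loss weighting dynamically).

```python
# Hypothetical sketch of a Siamese relative-pose regressor.
# Layer sizes and the fixed loss weight are illustrative assumptions.
import torch
import torch.nn as nn


class SiameseRelativePoseNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared feature extractor applied to both frames; the weight
        # sharing is what makes the network "Siamese".
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Regression head: concatenated features -> 6-DoF relative pose
        # (3 translation + 3 rotation parameters).
        self.head = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 6),
        )

    def forward(self, frame_a, frame_b):
        fa = self.backbone(frame_a)   # features of the first frame
        fb = self.backbone(frame_b)   # same weights, second frame
        return self.head(torch.cat([fa, fb], dim=1))


def weighted_pose_loss(pred, target, beta=1.0):
    # Weighted sum of translation and rotation errors. The paper adapts
    # this weighting dynamically during training; here beta is fixed.
    t_err = (pred[:, :3] - target[:, :3]).norm(dim=1).mean()
    r_err = (pred[:, 3:] - target[:, 3:]).norm(dim=1).mean()
    return t_err + beta * r_err


if __name__ == "__main__":
    net = SiameseRelativePoseNet()
    a = torch.randn(2, 3, 64, 64)     # batch of 2 RGB frames
    b = torch.randn(2, 3, 64, 64)
    pose = net(a, b)
    print(pose.shape)                 # torch.Size([2, 6])
```

Sharing the backbone halves the parameter count relative to two independent extractors and forces both frames into the same feature space, which is what makes the concatenated descriptor meaningful for relative-pose regression.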
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Additional information
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Kamranian, Z., Sadeghian, H., Naghsh Nilchi, A.R. et al. Fast, yet robust end-to-end camera pose estimation for robotic applications. Appl Intell 51, 3581–3599 (2021). https://doi.org/10.1007/s10489-020-01982-z