End-to-End Monocular Range Estimation for Forward Collision Warning
Figure 1. Overview of the architecture. The range is represented as a weighted sum over a set of potential distances, and the architecture consists of two parts: weight map generation and distance map generation. Solid arrows denote the forward pass of our method; dashed arrows denote the loss calculation and back-propagation in the training stage.
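As the Figure 1 caption describes, the final range is an expected value: each pixel carries a potential distance, and the generated weight map selects among them. A minimal NumPy sketch of that reduction follows; the map shapes and the assumption that the weights are softmax-normalized to sum to one are ours, not taken from the paper.

```python
import numpy as np

def expected_range(weight_map: np.ndarray, distance_map: np.ndarray) -> float:
    """Collapse a weight map and a distance map into a single range.

    weight_map   -- (H, W) non-negative weights, assumed softmax-normalized
                    so they sum to 1 over the image.
    distance_map -- (H, W) potential distance (in m) assigned to each pixel.
    """
    assert weight_map.shape == distance_map.shape
    # The estimated range is the weighted sum (expectation) of the
    # potential distances under the generated weight map.
    return float(np.sum(weight_map * distance_map))

# Toy 2x2 example: weights concentrated on the ~20 m pixels.
w = np.array([[0.1, 0.2], [0.3, 0.4]])
d = np.array([[50.0, 30.0], [20.0, 20.0]])
print(expected_range(w, d))  # 0.1*50 + 0.2*30 + 0.3*20 + 0.4*20 = 25.0
```

Because this reduction is differentiable, a loss on the scalar range can back-propagate through the weight map, which is what the dashed arrows in Figure 1 indicate.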
Figure 2. Our autonomous driving vehicle with the image coordinate system, LiDAR coordinate system, and world coordinate system used in our setting.
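The LiDAR frame in Figure 2 supplies the ground-truth range for training. The paper's exact labeling procedure is not reproduced here, but a common recipe is to project LiDAR returns into the image with the extrinsic and intrinsic matrices and take the nearest return inside the preset collision region as the target. All names below are hypothetical; this is a sketch of that recipe, not the authors' code.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """Project Nx3 LiDAR points to pixels.

    T_cam_lidar -- 4x4 extrinsic transform, LiDAR frame -> camera frame.
    K           -- 3x3 camera intrinsic matrix.
    Returns (u, v) pixel coordinates and camera-frame depths for the
    points that land in front of the camera.
    """
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]   # LiDAR -> camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]         # keep points ahead of camera
    uvw = (K @ pts_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3], pts_cam[:, 2]  # perspective divide

def target_range(uv, depth, region_mask):
    """Nearest LiDAR return falling inside the preset collision region."""
    h, w = region_mask.shape
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    inside = valid & region_mask[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)]
    return depth[inside].min() if inside.any() else np.inf  # inf: empty region
```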
Figure 3. The network structure of the weight generation network. The U-Net structure is an encoder-decoder network: the encoder and decoder consist of fully convolutional layers, and fully connected layers applied to the flattened encoded features provide spatial position information. See the text for more details.
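To make the Figure 3 description concrete, here is a PyTorch sketch of such a weight generation network. It is illustrative only: channel counts, depths, and the bottleneck size are assumptions, U-Net skip connections are omitted for brevity, and a spatial softmax is added at the end so the output behaves as a weight map.

```python
import torch
import torch.nn as nn

class WeightMapNet(nn.Module):
    """Encoder-decoder weight generator (sketch; sizes are assumptions)."""

    def __init__(self, in_ch=3, feats=(16, 32, 64), bottleneck_hw=(8, 16)):
        super().__init__()
        # Encoder: fully convolutional, stride-2 downsampling blocks.
        enc, ch = [], in_ch
        for f in feats:
            enc += [nn.Conv2d(ch, f, 3, stride=2, padding=1),
                    nn.BatchNorm2d(f), nn.ReLU(inplace=True)]
            ch = f
        self.encoder = nn.Sequential(*enc)
        # Fully connected layers on the flattened encoding: convolutions
        # are translation-invariant, so this is where absolute spatial
        # position information can be injected.
        h, w = bottleneck_hw
        n = feats[-1] * h * w
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(n, n),
                                nn.ReLU(inplace=True))
        self.bottleneck_shape = (feats[-1], h, w)
        # Decoder: transposed convolutions back to input resolution.
        dec = []
        for f_in, f_out in zip(feats[::-1], list(feats[::-1][1:]) + [1]):
            dec.append(nn.ConvTranspose2d(f_in, f_out, 4, stride=2, padding=1))
            if f_out != 1:
                dec.append(nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        z = self.encoder(x)
        z = self.fc(z).view(-1, *self.bottleneck_shape)
        logits = self.decoder(z)                        # (B, 1, H, W)
        b, _, hh, ww = logits.shape
        # Spatial softmax: weights are positive and sum to 1 per image.
        return torch.softmax(logits.view(b, -1), 1).view(b, 1, hh, ww)

# For bottleneck_hw=(8, 16) the input must be 64x128 (three stride-2 halvings).
net = WeightMapNet()
w_map = net(torch.randn(2, 3, 64, 128))
print(w_map.shape, w_map.sum(dim=(2, 3)))  # [2, 1, 64, 128], sums ~1.0
```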
Figure 4. Processed Apollo dataset. The masks represent different preset collision regions; even for the same color image, different collision regions may induce different target ranges.
Figure 5. Visualization of generated weight maps. The first row shows the color image, the second row the colorized weight map, and the last row the color image superimposed with the preset collision region and the colorized weight map. This interpretable representation in the last row is used to illustrate the experimental results that follow. Cyan rectangles highlight notable regions, which are zoomed in for better visualization.
Figure 6. Performance on target objects of various classes.
Figure 7. Results on objects of various categories.
Figure 8. Generalization capability. These six images are from the KITTI and Virtual KITTI datasets; they were collected with different cameras and viewpoints from those of the training dataset.
Figure 9. Results on images with multiple objects in front.
Figure 10. Performance on cars in the test set of the processed synthetic Apollo dataset.
Figure 11. Performance on a data sequence collected by our autonomous driving vehicle.
Figure 12. Samples of results on our collected sequence. The top row shows the results of the Inverse Perspective Mapping (IPM)-based method; the bottom row shows the results of our method.
Figure 13. Mean absolute error (MAE, in m) on the train and test sets under different ranges.
Figure 14. δ corresponding to the threshold of relative error on the test set.
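Figures 13 and 14 report two metrics over the same predictions. A minimal sketch of both, assuming the usual definitions (MAE in metres; δ as the fraction of samples whose relative error stays below the threshold on the x-axis):

```python
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error (in m) between predicted and ground-truth ranges."""
    return float(np.mean(np.abs(pred - gt)))

def delta(pred: np.ndarray, gt: np.ndarray, thresh: float) -> float:
    """Fraction of samples whose relative error |pred - gt| / gt < thresh."""
    return float(np.mean(np.abs(pred - gt) / gt < thresh))

pred = np.array([19.0, 42.0, 7.5])
gt = np.array([20.0, 40.0, 10.0])
print(mae(pred, gt))         # (1 + 2 + 2.5) / 3 ≈ 1.83
print(delta(pred, gt, 0.1))  # relative errors 0.05, 0.05, 0.25 -> 2/3 ≈ 0.67
```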
Figure 15. Examples of five types of failure cases of our method. See the text for more details.
Abstract
1. Introduction
2. Approach
2.1. Overview
2.2. Distance Map Generation
2.3. Weight Map Generation
2.4. End-to-End Learning
2.5. Training Settings
3. Experiment Setup
3.1. Synthetic Dataset
3.2. Real-World Data Collection
4. Results and Analyses
4.1. Interpretability
4.2. Class-Agnostic Property
4.3. Generalization Capability
4.4. Closest Object
4.5. Comparison
4.6. Failure Cases
5. Conclusions and Future Work
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Van Der Horst, R.; Hogema, J. Time-to-collision and collision avoidance systems. In Proceedings of the 6th ICTCT Workshop: Safety Evaluation of Traffic Systems: Traffic Conflicts and Other Measures, Salzburg, Austria, 27–29 October 1993; pp. 109–121.
- Dagan, E.; Mano, O.; Stein, G.P.; Shashua, A. Forward collision warning with a single camera. In Proceedings of the IEEE Intelligent Vehicles Symposium, Parma, Italy, 14–17 June 2004; pp. 37–42.
- Chen, Y.L.; Shen, K.Y.; Wang, S.C. Forward collision warning system considering both time-to-collision and safety braking distance. In Proceedings of the 2013 IEEE 8th Conference on Industrial Electronics and Applications (ICIEA), Melbourne, Australia, 19–21 June 2013; pp. 972–977.
- Eigen, D.; Puhrsch, C.; Fergus, R. Depth map prediction from a single image using a multi-scale deep network. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2366–2374.
- Kim, G.; Cho, J.S. Vision-based vehicle detection and inter-vehicle distance estimation. In Proceedings of the 2012 12th International Conference on Control, Automation and Systems, Jeju Island, Korea, 17–21 October 2012; pp. 625–629.
- Wu, B.F.; Chen, Y.L.; Chen, Y.H.; Chen, C.J. Real-Time Nighttime Vehicle Detection and Recognition System Based on Computer Vision. U.S. Patent No. 7,949,190, 24 May 2011.
- Kim, G.; Cho, J.S. Vision-based vehicle detection and inter-vehicle distance estimation for driver alarm system. Opt. Rev. 2012, 19, 388–393.
- Tuohy, S.; O’Cualain, D.; Jones, E.; Glavin, M. Distance determination for an automobile environment using inverse perspective mapping in OpenCV. In Proceedings of the IET Irish Signals and Systems Conference (ISSC 2010), Cork, Ireland, 23–24 June 2010; pp. 100–105.
- Mallot, H.A.; Bülthoff, H.H.; Little, J.; Bohrer, S. Inverse perspective mapping simplifies optical flow computation and obstacle detection. Biol. Cybern. 1991, 64, 177–185.
- Rezaei, M.; Terauchi, M.; Klette, R. Robust vehicle detection and distance estimation under challenging lighting conditions. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2723–2743.
- Kim, J.B. Efficient vehicle detection and distance estimation based on aggregated channel features and inverse perspective mapping from a single camera. Symmetry 2019, 11, 1205.
- Stein, G.P.; Mano, O.; Shashua, A. Vision-based ACC with a single camera: Bounds on range and range rate accuracy. In Proceedings of the IEEE IV2003 Intelligent Vehicles Symposium (Cat. No. 03TH8683), Columbus, OH, USA, 9–11 June 2003; pp. 120–125.
- Zhu, J.; Fang, Y. Learning object-specific distance from a monocular image. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3839–3848.
- Zhang, Y.; Li, Y.; Zhao, M.; Yu, X. A regional regression network for monocular object distance estimation. In Proceedings of the 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, UK, 6–10 July 2020; pp. 1–6.
- Zhe, T.; Huang, L.; Wu, Q.; Zhang, J.; Pei, C.; Li, L. Inter-vehicle distance estimation method based on monocular vision using 3D detection. IEEE Trans. Veh. Technol. 2020, 69, 4907–4919.
- Facil, J.M.; Ummenhofer, B.; Zhou, H.; Montesano, L.; Brox, T.; Civera, J. CAM-Convs: Camera-aware multi-scale convolutions for single-view depth. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 11826–11835.
- Zhao, Y.; Kong, S.; Fowlkes, C. When perspective comes for free: Improving depth prediction with camera pose encoding. arXiv 2020, arXiv:2007.03887.
- Dijk, T.V.; Croon, G.D. How do neural networks see depth in single images? In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 2183–2191.
- de Paula, M.B.; Jung, C.R.; da Silveira, L., Jr. Automatic on-the-fly extrinsic camera calibration of onboard vehicular cameras. Expert Syst. Appl. 2014, 41, 1997–2007.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the ICML, Haifa, Israel, 21–24 June 2010.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
- Dugas, C.; Bengio, Y.; Bélisle, F.; Nadeau, C.; Garcia, R. Incorporating second-order functional knowledge for better option pricing. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 3–8 December 2001; pp. 472–478.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
- Gaidon, A.; Wang, Q.; Cabon, Y.; Vig, E. Virtual worlds as proxy for multi-object tracking analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 4340–4349.
- Ren, J.; Chen, X.; Liu, J.; Sun, W.; Pang, J.; Yan, Q.; Tai, Y.W.; Xu, L. Accurate single stage detector using recurrent rolling convolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5420–5428.