Abstract
In this paper we propose a novel end-to-end learnable network that performs joint perception, prediction, and motion planning for self-driving vehicles and produces interpretable intermediate representations. Unlike existing neural motion planners, our motion-planning costs are consistent with our perception and prediction estimates. This is achieved through a novel differentiable semantic occupancy representation that the motion planner uses explicitly as a cost. Our network is learned end-to-end from human demonstrations. Experiments on a large-scale dataset of human driving and in closed-loop simulation show that the proposed model significantly outperforms state-of-the-art planners at imitating human driving behavior while producing much safer trajectories.
A. Sadat and S. Casas—Equal contribution.
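To make the key mechanism concrete, below is a minimal sketch (PyTorch-style, not the authors' code) of how a predicted semantic occupancy grid can serve directly as a differentiable motion-planning cost: a convolutional head outputs per-class occupancy probabilities over future timesteps, and each sampled ego trajectory is scored by the occupancy it overlaps. The names (`OccupancyHead`, `trajectory_cost`), the grid resolution, and the point-footprint simplification are all illustrative assumptions; a real planner would additionally weigh comfort and traffic-rule terms and rasterize the full vehicle footprint.

```python
# Illustrative sketch only: semantic occupancy as a differentiable
# planning cost. All names and parameters are hypothetical.
import torch
import torch.nn as nn


class OccupancyHead(nn.Module):
    """Predicts per-class occupancy probabilities on a BEV grid for T future steps."""

    def __init__(self, in_channels: int, num_classes: int, horizon: int):
        super().__init__()
        self.horizon = horizon
        self.num_classes = num_classes
        self.conv = nn.Conv2d(in_channels, num_classes * horizon, kernel_size=1)

    def forward(self, bev_features: torch.Tensor) -> torch.Tensor:
        # bev_features: (B, C, H, W) bird's-eye-view features from a backbone.
        B, _, H, W = bev_features.shape
        logits = self.conv(bev_features).view(
            B, self.horizon, self.num_classes, H, W
        )
        return torch.sigmoid(logits)  # occupancy probabilities in [0, 1]


def trajectory_cost(occupancy, trajectories, resolution=0.5):
    """Cost of each sampled trajectory = predicted occupancy it drives through.

    occupancy:    (T, K, H, W) per-timestep, per-class occupancy probabilities
    trajectories: (N, T, 2) sampled ego (x, y) waypoints in metres
    Differentiable w.r.t. occupancy, so the planning loss can backpropagate
    into perception and prediction. Treats the ego as a point for brevity.
    """
    T, K, H, W = occupancy.shape
    # Convert metric waypoints to grid indices (assumes ego at grid centre).
    cols = (trajectories[..., 0] / resolution + W / 2).long().clamp(0, W - 1)
    rows = (trajectories[..., 1] / resolution + H / 2).long().clamp(0, H - 1)
    t_idx = torch.arange(T)
    # Sum occupancy over semantic classes, then read it off at each visited cell.
    occ_all = occupancy.sum(dim=1)              # (T, H, W)
    costs = occ_all[t_idx, rows, cols].sum(-1)  # (N,)
    return costs


# Example: score 100 sampled trajectories over a 10-step horizon.
# head = OccupancyHead(in_channels=128, num_classes=3, horizon=10)
# occ = head(bev_features)[0]            # (10, 3, H, W)
# costs = trajectory_cost(occ, samples)  # execute samples[costs.argmin()]
```

Because the cost is read directly off the predicted occupancy, gradients from the planning objective flow back into perception and prediction, which is what keeps the three stages consistent rather than trained in isolation.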
Electronic Supplementary Material
Supplementary video (mp4, 9244 KB)
© 2020 Springer Nature Switzerland AG
Cite this paper
Sadat, A., Casas, S., Ren, M., Wu, X., Dhawan, P., Urtasun, R.: Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. LNCS, vol. 12368. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58592-1_25