Abstract
Motion planning of autonomous agents in partially known environments with incomplete information is a challenging problem, particularly for complex tasks. This paper proposes a model-free reinforcement learning approach to address this problem. We formulate motion planning as a probabilistic-labeled partially observable Markov decision process (PL-POMDP) problem and use linear temporal logic (LTL) to express the complex task. The LTL formula is then converted to a limit-deterministic generalized Büchi automaton (LDGBA). Using model-checking techniques, the problem is recast as finding an optimal policy on the product of the PL-POMDP with the LDGBA such that the complex task is satisfied. We implement deep Q learning with long short-term memory (LSTM) to process the observation history and to track task progress. Our contributions are the proposed model-free formulation, the use of LTL and LDGBA to encode complex tasks, and the LSTM-enhanced deep Q learning. We demonstrate the applicability of the proposed method through simulations in various environments, including grid worlds, a virtual office, and a multi-agent warehouse. The results show that our method effectively handles environment, action, and observation uncertainties, indicating its potential for real-world applications such as the control of unmanned aerial vehicles.
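To make the learning setup described above concrete, the following minimal sketch (not the authors' released code; see the repository linked under Code availability) illustrates a recurrent Q-network in PyTorch that conditions Q-values on both the observation history and the current LDGBA state. All class names, dimensions, and hyperparameters here are illustrative assumptions.

# Illustrative sketch, assuming a discrete action set, a vector observation,
# and an integer-indexed LDGBA state; not the authors' implementation.
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim, num_automaton_states, num_actions, hidden_dim=64):
        super().__init__()
        # Encode the observation together with a one-hot LDGBA state, so the
        # policy is conditioned on task progress tracked by the automaton.
        self.encoder = nn.Linear(obs_dim + num_automaton_states, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, num_actions)
        self.num_automaton_states = num_automaton_states

    def forward(self, obs_seq, automaton_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); automaton_seq: (batch, time) integer states
        aut_onehot = torch.nn.functional.one_hot(
            automaton_seq, self.num_automaton_states).float()
        x = torch.relu(self.encoder(torch.cat([obs_seq, aut_onehot], dim=-1)))
        out, hidden = self.lstm(x, hidden)          # summarize the observation history
        return self.q_head(out[:, -1]), hidden      # Q-values at the last time step

# Dummy forward pass: a batch of 4 observation histories, each 10 steps long.
net = RecurrentQNet(obs_dim=8, num_automaton_states=5, num_actions=4)
q_values, _ = net(torch.randn(4, 10, 8), torch.randint(0, 5, (4, 10)))
print(q_values.shape)  # torch.Size([4, 4])

The design intent mirrors the abstract: the LSTM hidden state summarizes the observation history to cope with partial observability, while the automaton state input carries the task-progress information supplied by the LDGBA in the product construction.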
Code availability
Code files are available at https://github.com/JunchaoLi001/Model-free_DRL_LSTM_on_POMDP_with_LDGBA.
Acknowledgements
Li and Xiao would like to thank the US Department of Education (ED#P116S210005) and the NSF (#2226936) for supporting this research.
Funding
This research was funded by the US Department of Education (ED#P116S210005) and the NSF (#2226936).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, J., Cai, M., Kan, Z. et al. Model-free reinforcement learning for motion planning of autonomous agents with complex tasks in partially observable environments. Auton Agent Multi-Agent Syst 38, 14 (2024). https://doi.org/10.1007/s10458-024-09641-0