Model-free reinforcement learning for motion planning of autonomous agents with complex tasks in partially observable environments

Published in: Autonomous Agents and Multi-Agent Systems (2024)

Abstract

Motion planning for autonomous agents in partially known environments with incomplete information is a challenging problem, particularly for complex tasks. This paper proposes a model-free reinforcement learning approach to address it. We formulate motion planning as a probabilistic-labeled partially observable Markov decision process (PL-POMDP) and use linear temporal logic (LTL) to express the complex task. The LTL formula is then converted to a limit-deterministic generalized Büchi automaton (LDGBA). Using model-checking techniques, the problem is recast as finding an optimal policy on the product of the PL-POMDP and the LDGBA that satisfies the complex task. We implement deep Q-learning with long short-term memory (LSTM) to process the observation history and to track task progress. Our contributions include the overall planning framework, the use of LTL and LDGBA to specify and monitor complex tasks, and the LSTM-enhanced deep Q-learning. We demonstrate the applicability of the proposed method through simulations in several environments, including grid worlds, a virtual office, and a multi-agent warehouse. The results show that the method effectively handles environment, action, and observation uncertainties, indicating its potential for real-world applications such as the control of unmanned aerial vehicles.
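To make the abstract's pipeline concrete, the following is a minimal, hypothetical sketch of pairing an LSTM with deep Q-learning on the product model, assuming PyTorch. The class name, argument names (e.g., obs_dim, num_ldgba_states), and the one-hot encoding of the automaton state are illustrative assumptions and do not reproduce the authors' released implementation.

```python
# Hypothetical sketch of an LSTM-based recurrent Q-network over observation
# histories. Names and dimensions are illustrative, not the authors' code.
import torch
import torch.nn as nn


class RecurrentQNetwork(nn.Module):
    def __init__(self, obs_dim, num_ldgba_states, num_actions, hidden_dim=64):
        super().__init__()
        # Encode the current observation together with a one-hot encoding of
        # the tracked LDGBA (automaton) state of the product model.
        self.encoder = nn.Linear(obs_dim + num_ldgba_states, hidden_dim)
        # The LSTM summarizes the observation history into a hidden state,
        # standing in for the unobservable environment state.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs_seq, ldgba_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); ldgba_seq: (batch, time, num_ldgba_states)
        x = torch.relu(self.encoder(torch.cat([obs_seq, ldgba_seq], dim=-1)))
        out, hidden = self.lstm(x, hidden)
        return self.q_head(out), hidden  # Q-values per time step, plus LSTM state


def select_action(q_net, obs_seq, ldgba_seq, epsilon=0.1):
    # Epsilon-greedy action selection from the Q-values of the latest time step
    # (assumes batch size 1 for illustration).
    with torch.no_grad():
        q_values, _ = q_net(obs_seq, ldgba_seq)
    if torch.rand(1).item() < epsilon:
        return torch.randint(q_values.shape[-1], (1,)).item()
    return q_values[0, -1].argmax().item()
```

In this sketch, conditioning the network on the LDGBA state exposes task progress to the learner, while the recurrent hidden state compensates for partial observability of the environment; the actual reward shaping and training loop would follow the paper's product-model construction.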




Code availability

Code files are available at https://github.com/JunchaoLi001/Model-free_DRL_LSTM_on_POMDP_with_LDGBA.



Acknowledgements

Li and Xiao would like to thank the US Department of Education (ED#P116S210005) and the NSF (#2226936) for supporting this research.

Funding

This research was funded by the US Department of Education (ED#P116S210005) and the NSF (#2226936).

Author information


Corresponding author

Correspondence to Junchao Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, J., Cai, M., Kan, Z. et al. Model-free reinforcement learning for motion planning of autonomous agents with complex tasks in partially observable environments. Auton Agent Multi-Agent Syst 38, 14 (2024). https://doi.org/10.1007/s10458-024-09641-0
