Abstract
Sparse-reward reinforcement learning environments pose a particular challenge: because the agent receives rewards only infrequently, it is difficult to learn an optimal policy. In this paper, we propose NSSE, a novel approach that combines stratified state space exploration with prioritised sweeping to make learning updates more informative and thereby accelerate convergence. We evaluate NSSE on three typical sparse-reward Atari environments. The results demonstrate that our state space exploration method performs strongly compared to two baseline algorithms: Deep Q-Network (DQN) and Noisy Deep Q-Network (Noisy DQN).
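The full NSSE algorithm is not reproduced on this page, so the sketch below is only a rough, tabular illustration of the two ingredients the abstract names: textbook prioritised sweeping (a priority queue over Bellman errors) on a toy sparse-reward chain MDP, paired with an assumed visit-count stratification that picks episode start states from the least-visited stratum. The chain environment, the quartile-based stratification rule, and the ability to reset to arbitrary states are all illustrative assumptions, not the authors' method, which operates on Atari with deep networks.

```python
import heapq
import random
from collections import defaultdict

# Minimal sketch (NOT the paper's NSSE implementation): a 1-D chain MDP
# with a single sparse reward at the right end, combining prioritised
# sweeping with an ASSUMED visit-count-stratified choice of start states.

N_STATES = 20
ACTIONS = (0, 1)          # 0 = left, 1 = right
GAMMA = 0.95
THETA = 1e-4              # minimum Bellman error worth queueing

def step(s, a):
    """Deterministic chain dynamics; reward 1 only on entering the last state."""
    ns = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return ns, (1.0 if ns == N_STATES - 1 else 0.0)

Q = defaultdict(float)     # state-action values
model = {}                 # learned deterministic model: (s, a) -> (ns, r)
preds = defaultdict(set)   # next state -> set of (s, a) pairs leading to it
visits = defaultdict(int)  # per-state visit counts (drives stratification)
pq = []                    # max-priority queue via negated priorities

def v(s):
    return max(Q[(s, a)] for a in ACTIONS)

def greedy(s):
    best = v(s)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

def push(s, priority):
    if priority > THETA:
        heapq.heappush(pq, (-priority, s))

random.seed(0)
for episode in range(300):
    # Stratified start (assumption): begin in the least-visited quartile
    # of states so exploration keeps reaching poorly covered regions.
    by_visits = sorted(range(N_STATES), key=lambda s: visits[s])
    s = random.choice(by_visits[: N_STATES // 4])
    for _ in range(2 * N_STATES):
        a = random.choice(ACTIONS) if random.random() < 0.1 else greedy(s)
        ns, r = step(s, a)
        visits[s] += 1
        model[(s, a)] = (ns, r)
        preds[ns].add((s, a))
        push(s, abs(r + GAMMA * v(ns) - Q[(s, a)]))

        # Prioritised sweeping: spend a small planning budget backing up
        # the states with the largest Bellman errors first.
        for _ in range(5):
            if not pq:
                break
            _, u = heapq.heappop(pq)
            for au in ACTIONS:
                if (u, au) in model:
                    nu, ru = model[(u, au)]
                    Q[(u, au)] = ru + GAMMA * v(nu)
            for (ps, pa) in preds[u]:
                pns, pr = model[(ps, pa)]
                push(ps, abs(pr + GAMMA * v(pns) - Q[(ps, pa)]))

        s = ns
        if s == N_STATES - 1:   # terminal: sparse reward collected
            break

print("V(start) after training:", round(v(0), 3))
```

In the deep-RL setting the paper actually targets, the analogous prioritisation would presumably act on TD errors in a replay buffer rather than a tabular queue; the tabular form is used here only because it makes the priority-propagation mechanics explicit.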
Dr Yang is supported by the Royal Academy of Engineering SHE project RAEng (IF2223-172) and the Royal Society of Edinburgh (961_Yang).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Liu, X. et al. (2023). A Novel State Space Exploration Method for the Sparse-Reward Reinforcement Learning Environment. In: Bramer, M., Stahl, F. (eds) Artificial Intelligence XL. SGAI 2023. Lecture Notes in Computer Science, vol. 14381. Springer, Cham. https://doi.org/10.1007/978-3-031-47994-6_18
Print ISBN: 978-3-031-47993-9
Online ISBN: 978-3-031-47994-6