
A critical state identification approach to inverse reinforcement learning for autonomous systems

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Inverse reinforcement learning recovers a reward function from a set of reward features and positive demonstrations. For complex learning tasks, the entire state space is typically used to form the set of reward features, but such a large feature set leads to long computation times. Retrieving only the important states from the full state space addresses this problem. This study formulates a method that increases learning efficiency by combining positive and negative demonstrations to search the entire state space for critical states and extract the corresponding critical features. The method uses two types of demonstrations: positive demonstrations, given by experts, which agents imitate; and negative demonstrations, which show incorrect motions that agents should avoid. All significant features are extracted by identifying critical states over the entire state space, which is achieved by comparing the differences between the negative and positive demonstrations. The identified critical states form the set of reward features, from which a reward function is derived that enables agents to quickly learn a policy through reinforcement learning. A speeding-car simulation was used to verify the proposed method. The simulation results demonstrate that the proposed approach allows an agent to find a successful strategy and subsequently display intelligent, expert-like behavior.
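As a rough illustration of the idea summarized above (not code from the paper), the following Python sketch identifies critical states by comparing state-visitation frequencies between positive and negative demonstrations and builds a linear reward over indicator features of those states. The function names, the simple frequency-difference criterion, and the threshold parameter are assumptions for illustration only, not the authors' exact formulation.

```python
import numpy as np

def visitation_frequencies(demos, n_states):
    """Average visitation count of each state over a set of trajectories."""
    freq = np.zeros(n_states)
    for traj in demos:          # traj is a sequence of state indices
        for s in traj:
            freq[s] += 1.0
    return freq / max(len(demos), 1)

def identify_critical_states(pos_demos, neg_demos, n_states, threshold=0.5):
    """States whose visitation differs most between positive and negative demos."""
    diff = np.abs(visitation_frequencies(pos_demos, n_states)
                  - visitation_frequencies(neg_demos, n_states))
    return np.where(diff > threshold * diff.max())[0]

def reward_from_critical_states(critical_states, weights, n_states):
    """Linear reward over indicator features of the critical states only."""
    reward = np.zeros(n_states)
    reward[critical_states] = weights   # one learned weight per critical-state feature
    return reward
```

In a task such as the speeding-car simulation mentioned in the abstract, pos_demos would hold expert trajectories and neg_demos trajectories ending in crashes or speed violations; the resulting compact reward could then be passed to any standard reinforcement learning algorithm.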





Author information

Corresponding author

Correspondence to Wei-Cheng Jiang.

Ethics declarations

Conflict of interest

Research support received by the authors is as follows: this study was funded by the Ministry of Science and Technology, Taiwan, under Grant MOST 109-2221-E-029-022-. No author has reported any other potential conflict of interest relevant to this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hwang, M., Jiang, WC. & Chen, YJ. A critical state identification approach to inverse reinforcement learning for autonomous systems. Int. J. Mach. Learn. & Cyber. 13, 1409–1423 (2022). https://doi.org/10.1007/s13042-021-01454-x

