1992
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states.
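A minimal sketch of the one-step update the abstract describes, in Python; the parameter values and function names here are illustrative assumptions, not the paper's code.

    import random
    from collections import defaultdict

    alpha, gamma, epsilon = 0.1, 0.95, 0.1   # assumed step size, discount, exploration rate
    Q = defaultdict(float)                   # Q[(state, action)] -> current value estimate

    def choose_action(state, actions):
        # Epsilon-greedy: mostly exploit the current estimates, sometimes explore.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(state, action, reward, next_state, actions):
        # One-step backup: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

Each call to q_update improves the estimate for a single state-action pair; under the standard conditions the estimates converge to the optimal action values.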
2001
In reinforcement learning an autonomous agent learns an optimal policy while interacting with the environment. In particular, in one-step Q-learning, with each action an agent updates its Q-values using the immediate reward. In this paper a new strategy for updating Q-values is proposed. The strategy, implemented in an algorithm called DQL, uses a set of agents, all searching for the same goal in the same space, to obtain the same optimal policy. Each agent leaves traces over a copy of the environment (copies of Q-values) while searching for a goal. These copies are used by the agents to decide which actions to take. Once all the agents reach a goal, the original Q-values of the best solution found by the agents are updated using Watkins’ Q-learning formula. DQL has some similarities with Gambardella’s Ant-Q algorithm [4]; however, it does not require the definition of a domain-dependent heuristic, and consequently avoids the tuning of additional parameters. Unlike Ant-Q, DQL also does not update the original Q-values with zero reward while the agents are searching. It is shown how DQL’s guided exploration by several agents with selective exploitation (updating only the best solution) produces faster convergence than Q-learning and Ant-Q on several testbed problems under similar conditions.
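A hedged sketch of the scheme the abstract describes, assuming a hypothetical env object with reset/step/actions/is_goal methods; the best-episode criterion (shortest trajectory) is also an assumption for illustration.

    import copy
    import random

    def epsilon_greedy(Q, s, actions, epsilon):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    def dql_iteration(env, Q, n_agents=10, alpha=0.1, gamma=0.95, epsilon=0.1):
        # env is a hypothetical environment object; Q is a defaultdict(float).
        trajectories = []
        for _ in range(n_agents):
            Qc = copy.deepcopy(Q)          # each agent leaves traces on its own copy
            s, traj = env.reset(), []
            while not env.is_goal(s):
                a = epsilon_greedy(Qc, s, env.actions(s), epsilon)
                s2, r = env.step(s, a)
                traj.append((s, a, r, s2))
                best = max(Qc[(s2, b)] for b in env.actions(s2))
                Qc[(s, a)] += alpha * (r + gamma * best - Qc[(s, a)])  # update the copy only
                s = s2
            trajectories.append(traj)
        # Only the best solution found by the agents updates the original
        # Q-values, using Watkins' one-step formula.
        for s, a, r, s2 in min(trajectories, key=len):
            best = max(Q[(s2, b)] for b in env.actions(s2))
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])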
Q-learning is a simple, powerful algorithm for behavior learning. It was derived in the context of single-agent decision making in Markov decision process environments, but its applicability is much broader: in experiments in multiagent environments, Q-learning has also performed well. Our preliminary analysis using dynamical systems finds that Q-learning's indirect control of behavior via estimates of value contributes to its beneficial performance in general-sum 2-player games like the Prisoner's Dilemma.
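A small illustration in the spirit of the abstract (not the paper's experiment): two independent Q-learners play the iterated Prisoner's Dilemma, each conditioning on the previous joint move; payoffs and parameters are assumed.

    import random

    PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
              ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}   # standard PD payoff matrix
    ACTIONS = ['C', 'D']
    alpha, gamma, epsilon = 0.1, 0.9, 0.1               # assumed parameters

    def pick(Q, state):
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

    Q1, Q2 = {}, {}
    state = ('C', 'C')                                  # previous joint move (agent 1, agent 2)
    for _ in range(20000):
        a1, a2 = pick(Q1, state), pick(Q2, state[::-1])
        r1, r2 = PAYOFF[(a1, a2)]
        nxt = (a1, a2)
        for Q, s, a, r, ns in ((Q1, state, a1, r1, nxt),
                               (Q2, state[::-1], a2, r2, nxt[::-1])):
            # Ordinary one-step Q-learning backup for each agent independently.
            old = Q.get((s, a), 0.0)
            best = max(Q.get((ns, b), 0.0) for b in ACTIONS)
            Q[(s, a)] = old + alpha * (r + gamma * best - old)
        state = nxt

Each agent treats the other as part of the environment and controls its behavior only indirectly, through the value estimates it accumulates.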
International Journal of Intelligent Systems
Concurrent Q-learning: Reinforcement learning for dynamic goals and environments (2005)
This article presents a powerful new algorithm for reinforcement learning in problems where the goals and also the environment may change. The algorithm is completely goal independent, allowing the mechanics of the environment to be learned independently of the task that is being undertaken. Conventional reinforcement learning techniques, such as Q-learning, are goal dependent. When the goal or reward conditions change, previous learning interferes with the new task that is being learned, resulting in very poor performance. Previously, the Concurrent Q-Learning algorithm was developed, based on Watkins' Q-learning, which learns the relative proximity of all states simultaneously. This learning is completely independent of the reward experienced at those states and, through a simple action selection strategy, may be applied to any given reward structure. Here it is shown that the extra information obtained may be used to replace the eligibility traces of Watkins' Q-learning, allowing many more value updates to be made at each time step. The new algorithm is compared to the previous version and also to DG-learning in tasks involving changing goals and environments. The new algorithm is shown to perform significantly better than these alternatives, especially in situations involving novel obstructions. The algorithm adapts quickly and intelligently to changes in both the environment and reward structure, and does not suffer interference from training undertaken prior to those changes. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 1037–1052, 2005.
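A hedged, goal-independent sketch in the spirit of the abstract, using an all-goals update akin to DG-learning: a proximity estimate toward every candidate goal state is refreshed on each transition, with no reward signal involved. The data layout and parameters are assumptions, not the paper's algorithm.

    from collections import defaultdict

    alpha, gamma = 0.2, 0.9
    Q = defaultdict(float)   # Q[(state, action, goal)] ~ discounted proximity to goal

    def all_goals_update(s, a, s2, states, actions):
        # One real transition (s, a) -> s2 informs the proximity estimate
        # for every candidate goal simultaneously.
        for g in states:
            target = 1.0 if s2 == g else gamma * max(Q[(s2, b, g)] for b in actions)
            Q[(s, a, g)] += alpha * (target - Q[(s, a, g)])

    def act(s, goal, actions):
        # When the reward structure changes, only the goal argument changes;
        # the learned proximities are reused as-is.
        return max(actions, key=lambda a: Q[(s, a, goal)])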
This paper introduces an approach to reinforcement learning by cooperating agents using a variation of the Q-learning method. Q-learning is a model-free method, i.e., the agent does not need to predict future conditions of the environment. The framework provided by approximation spaces makes it possible to minimize the overestimation caused by approximated Q-values. Because of this overestimation, the learning capability of the algorithm is not consistent, and it is observed that under this condition the tendency to take a particular action is decreased. The Rough Q-learning method therefore improves the performance of the algorithm, which is shown by comparing plots of the average Q-values for Q-learning and Rough Q-learning.
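A tiny numerical illustration (not the paper's method) of the overestimation the abstract refers to: a max taken over noisy value estimates is biased upward even when every true value is zero.

    import random

    true_values = [0.0] * 5                  # five actions, all truly worth 0
    trials, noise = 10000, 1.0
    mean_max = sum(max(v + random.gauss(0, noise) for v in true_values)
                   for _ in range(trials)) / trials
    print(f"mean of max over noisy estimates: {mean_max:.2f} (true max is 0.0)")
    # Prints roughly 1.16: the max operator inflates approximated Q-values.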
1997
Q-learning can greatly improve its convergence speed if helped by immediate reinforcements provided by a trainer able to judge the usefulness of actions as stage-setting with respect to the goal of the agent.
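A minimal sketch of the idea, assuming a hypothetical trainer(s, a) function that scores an action's usefulness: the trainer's immediate judgement is simply added to the environment reward inside the ordinary Q-learning backup.

    def shaped_update(Q, s, a, r, s2, actions, trainer, alpha=0.1, gamma=0.95):
        # Ordinary one-step Q-learning, with the trainer's immediate
        # reinforcement trainer(s, a) added to the environment reward r.
        old = Q.get((s, a), 0.0)
        best = max(Q.get((s2, b), 0.0) for b in actions)
        Q[(s, a)] = old + alpha * (r + trainer(s, a) + gamma * best - old)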