1997
Q-learning can greatly improve its convergence speed when helped by immediate reinforcements provided by a trainer who is able to judge the usefulness of actions as stage-setting steps toward the agent's goal.
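A minimal sketch of how such trainer feedback might enter the standard Q-learning backup; the shaping term r_trainer, the table sizes, and the learning parameters are illustrative assumptions rather than the paper's exact formulation:

    import numpy as np

    n_states, n_actions = 25, 4
    alpha, gamma = 0.1, 0.9
    Q = np.zeros((n_states, n_actions))

    def update(s, a, r_env, r_trainer, s_next):
        # The trainer's immediate judgement is simply added to the
        # environment reward before the usual Watkins backup.
        r = r_env + r_trainer
        td_target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])

A trainer that rewards useful stage-setting actions effectively densifies an otherwise sparse reward signal, which is why convergence can speed up.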
Q-learning is a simple, powerful algorithm for behavior learning. It was derived in the context of single-agent decision making in Markov decision process environments, but its applicability is much broader: in experiments in multiagent environments, Q-learning has also performed well. Our preliminary analysis using dynamical systems finds that Q-learning's indirect control of behavior, via estimates of value, contributes to its beneficial performance in general-sum 2-player games like the Prisoner's Dilemma.
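A minimal sketch of the kind of setup such an analysis concerns: two independent tabular Q-learners repeatedly playing the Prisoner's Dilemma. The payoffs and parameters below are illustrative assumptions:

    import random

    # Row player's payoff for (own_action, opponent_action);
    # 0 = cooperate, 1 = defect. The game is symmetric.
    PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}
    alpha, gamma, eps = 0.1, 0.9, 0.1
    Q1, Q2 = [0.0, 0.0], [0.0, 0.0]   # stateless values over the two actions

    def choose(Q):
        if random.random() < eps:
            return random.randrange(2)
        return max(range(2), key=lambda a: Q[a])

    for _ in range(10000):
        a1, a2 = choose(Q1), choose(Q2)
        r1, r2 = PAYOFF[(a1, a2)], PAYOFF[(a2, a1)]
        # Each agent steers its behavior only indirectly, through its
        # evolving value estimates rather than a directly adjusted policy.
        Q1[a1] += alpha * (r1 + gamma * max(Q1) - Q1[a1])
        Q2[a2] += alpha * (r2 + gamma * max(Q2) - Q2[a2])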
Learning from reinforcements is a promising approach for creating intelligent agents. However, reinforcement learning usually requires a large number of training episodes. We present a system called ratle that addresses this shortcoming by allowing a connectionist Q-learner to accept advice given, at any time and in a natural manner, by an external observer. In ratle, the advice-giver watches the learner and occasionally makes suggestions, expressed as instructions in a simple programming language. Based on techniques from knowledge-based neural networks, ratle inserts these programs directly into the agent's utility function. Subsequent reinforcement learning further integrates and refines the advice. We present empirical evidence that shows our approach leads to statistically significant gains in expected reward. Importantly, the advice improves the expected reward regardless of the stage of training at which it is given. A shorter version of this paper appears in the Pr...
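A rough sketch of the knowledge-based insertion idea: an IF-THEN piece of advice is compiled into a new hidden unit whose initial weights encode the rule, after which ordinary learning can refine or even overrule it. The rule, the network shapes, and the weight magnitudes here are illustrative assumptions, not ratle's actual advice language:

    import numpy as np

    # Tiny utility network: 4 input features -> hidden layer -> 2 action values.
    rng = np.random.default_rng(0)
    W_in = rng.normal(0.0, 0.1, (3, 4))    # 3 existing hidden units
    W_out = rng.normal(0.0, 0.1, (2, 3))
    b = np.zeros(3)

    # Advice: "IF feature 0 AND feature 2 THEN prefer action 1".
    # Compile it into one extra hidden unit with large weights (KBANN-style).
    W_in = np.vstack([W_in, [[5.0, 0.0, 5.0, 0.0]]])
    W_out = np.hstack([W_out, [[0.0], [5.0]]])   # unit boosts action 1's utility
    b = np.append(b, -7.5)                       # threshold makes the unit an AND

    def q_values(x):
        h = np.tanh(W_in @ x + b)   # advice unit fires only when both features are on
        return W_out @ h            # subsequent learning keeps refining all weights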
2001 •
In reinforcement learning an autonomous agent learns an optimal policy while interacting with the environment. In particular, in one-step Q-learning, with each action an agent updates its Q-values considering immediate rewards. In this paper a new strategy for updating Q-values is proposed. The strategy, implemented in an algorithm called DQL, uses a set of agents all searching for the same goal in the same space to obtain the same optimal policy. Each agent leaves traces over a copy of the environment (copies of Q-values) while searching for a goal. These copies are used by the agents to decide which actions to take. Once all the agents reach a goal, the original Q-values of the best solution found by all the agents are updated using Watkins’ Q-learning formula. DQL has some similarities with Gambardella’s Ant-Q algorithm [4]; however, it does not require the definition of a domain-dependent heuristic and, consequently, the tuning of additional parameters. DQL also does not update the original Q-values with zero reward while the agents are searching, as Ant-Q does. It is shown how DQL’s guided exploration by several agents with selected exploitation (updating only the best solution) produces faster convergence times than Q-learning and Ant-Q on several test-bed problems under similar conditions.
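A condensed sketch of the scheme as described: each of m agents explores epsilon-greedily over its own copy of the Q-table, and only the best episode found updates the original table with the Watkins backup. The toy gridworld, the parameter values, and "shortest trajectory" as the measure of the best solution are assumptions:

    import random

    n_states, n_actions, goal = 25, 4, 24
    alpha, gamma, eps, m = 0.5, 0.9, 0.2, 10
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def step(s, a):                     # toy 5x5 gridworld dynamics
        r, c = divmod(s, 5)
        r = max(0, min(4, r + (a == 1) - (a == 0)))
        c = max(0, min(4, c + (a == 3) - (a == 2)))
        s2 = r * 5 + c
        return s2, (1.0 if s2 == goal else 0.0)

    for episode in range(200):
        trajectories = []
        for _ in range(m):              # each agent works on its own copy of Q
            Qc = [row[:] for row in Q]
            s, traj = 0, []
            while s != goal and len(traj) < 1000:
                a = (random.randrange(n_actions) if random.random() < eps
                     else max(range(n_actions), key=lambda k: Qc[s][k]))
                s2, r = step(s, a)
                traj.append((s, a, r, s2))
                # Traces are left on the copy only; the original Q is untouched.
                Qc[s][a] += alpha * (r + gamma * max(Qc[s2]) - Qc[s][a])
                s = s2
            trajectories.append(traj)
        best = min(trajectories, key=len)   # best solution found this round
        for s, a, r, s2 in best:            # Watkins' update on the original Q
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])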
International Journal of Intelligent Systems
Concurrent Q-learning: Reinforcement learning for dynamic goals and environments
2005 •
This article presents a powerful new algorithm for reinforcement learning in problems where the goals and also the environment may change. The algorithm is completely goal independent, allowing the mechanics of the environment to be learned independently of the task that is being undertaken. Conventional reinforcement learning techniques, such as Q-learning, are goal dependent. When the goal or reward conditions change, previous learning interferes with the new task that is being learned, resulting in very poor performance. Previously, the Concurrent Q-Learning algorithm was developed, based on Watkins' Q-learning, which learns the relative proximity of all states simultaneously. This learning is completely independent of the reward experienced at those states and, through a simple action selection strategy, may be applied to any given reward structure. Here it is shown that the extra information obtained may be used to replace the eligibility traces of Watkins' Q-learning, allowing many more value updates to be made at each time step. The new algorithm is compared to the previous version and also to DG-learning in tasks involving changing goals and environments. The new algorithm is shown to perform significantly better than these alternatives, especially in situations involving novel obstructions. The algorithm adapts quickly and intelligently to changes in both the environment and reward structure, and does not suffer interference from training undertaken prior to those changes. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 1037–1052, 2005.
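A compressed sketch of the goal-independent idea, written in the DG-learning style the article compares against: discounted proximity values toward every state are learned concurrently from each transition, and the reward structure is consulted only at action-selection time. The table shapes, parameters, and reward_of interface are illustrative assumptions; the exact set of updates Concurrent Q-Learning performs per step differs:

    import random

    n_states, n_actions = 25, 4
    alpha, gamma = 0.5, 0.95
    # P[g][s][a]: estimated discounted proximity of (s, a) to candidate goal g.
    P = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(n_states)]

    def learn(s, a, s2):
        # One transition refines proximity estimates toward *every* state,
        # so the learned model of the environment is entirely goal independent.
        for g in range(n_states):
            target = 1.0 if s2 == g else gamma * max(P[g][s2])
            P[g][s][a] += alpha * (target - P[g][s][a])

    def act(s, reward_of):
        # Greedy action under the *current* reward structure: weight each
        # candidate goal's proximity by the reward now offered there.
        return max(range(n_actions),
                   key=lambda a: max(reward_of(g) * P[g][s][a]
                                     for g in range(n_states)))

When the goal or environment changes, only reward_of changes or new transitions arrive; the accumulated proximity knowledge is reused rather than unlearned, which is the source of the robustness the article reports.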
Oceanic Linguistics
The Greater West Bomberai Language Family
2022 •
Revista Metafísica y Persona
Philosophy and Neuroscience: Relation between Mirror Neurons and Empathy
2018 •
Review of Biblical Literature
Book Review: Christ's First Theologian
2018 •
Journal of Science and Medicine in Sport
Athletic performance and training characteristics in Junior Tennis Davis-Cup Players
2015 •
Archiv für Diplomatik
Sigillum Petri plebani de Glathovia. A Late Medieval Parish Priest's Seal from Klattau (Bohemia)
2004 •
Journal of Organic Chemistry
Photoinduced Skeletal Rearrangement of N-Substituted Colchicine Derivatives
2020 •
Academia Medicine
Racial and gender disparities in the effect of new drug approvals on U.S. cancer mortality
2024 •
1987 •
Information engineering express
Quantitative Measurement and Analysis to Computational Thinking for Elementary Schools in Japan
2022 •
Afyon Kocatepe Üniversitesi Sosyal Bilimler Dergisi
National Will in the State Founded by the Prophet (pbuh)