1992
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states.
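A minimal sketch of the one-step update the abstract describes, in Python; the parameter values and function names here are illustrative assumptions, not the paper's code.

    import random
    from collections import defaultdict

    alpha, gamma, epsilon = 0.1, 0.95, 0.1   # assumed step size, discount, exploration rate
    Q = defaultdict(float)                   # Q[(state, action)] -> current value estimate

    def choose_action(state, actions):
        # Epsilon-greedy: mostly exploit the current estimates, sometimes explore.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(state, action, reward, next_state, actions):
        # One-step backup: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

Each call to q_update improves the estimate for a single state-action pair; under the standard conditions the estimates converge to the optimal action values.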
2001
In reinforcement learning an autonomous agent learns an optimal policy while interacting with the environment. In particular, in one-step Q-learning, with each action an agent updates its Q-values using the immediate reward. In this paper a new strategy for updating Q-values is proposed. The strategy, implemented in an algorithm called DQL, uses a set of agents, all searching for the same goal in the same space, to obtain the same optimal policy. Each agent leaves traces over a copy of the environment (copies of Q-values) while searching for a goal. These copies are used by the agents to decide which actions to take. Once all the agents reach a goal, the original Q-values of the best solution found by the agents are updated using Watkins’ Q-learning formula. DQL has some similarities with Gambardella’s Ant-Q algorithm [4]; however, it does not require the definition of a domain-dependent heuristic, and consequently avoids the tuning of additional parameters. Unlike Ant-Q, DQL also does not update the original Q-values with zero reward while the agents are searching. It is shown how DQL’s guided exploration by several agents with selective exploitation (updating only the best solution) produces faster convergence than Q-learning and Ant-Q on several testbed problems under similar conditions.
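A hedged sketch of the scheme the abstract describes, assuming a hypothetical env object with reset/step/actions/is_goal methods; the best-episode criterion (shortest trajectory) is also an assumption for illustration.

    import copy
    import random

    def epsilon_greedy(Q, s, actions, epsilon):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    def dql_iteration(env, Q, n_agents=10, alpha=0.1, gamma=0.95, epsilon=0.1):
        # env is a hypothetical environment object; Q is a defaultdict(float).
        trajectories = []
        for _ in range(n_agents):
            Qc = copy.deepcopy(Q)          # each agent leaves traces on its own copy
            s, traj = env.reset(), []
            while not env.is_goal(s):
                a = epsilon_greedy(Qc, s, env.actions(s), epsilon)
                s2, r = env.step(s, a)
                traj.append((s, a, r, s2))
                best = max(Qc[(s2, b)] for b in env.actions(s2))
                Qc[(s, a)] += alpha * (r + gamma * best - Qc[(s, a)])  # update the copy only
                s = s2
            trajectories.append(traj)
        # Only the best solution found by the agents updates the original
        # Q-values, using Watkins' one-step formula.
        for s, a, r, s2 in min(trajectories, key=len):
            best = max(Q[(s2, b)] for b in env.actions(s2))
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])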
Q-learning is a simple, powerful algorithm for behavior learning. It was derived in the context of single-agent decision making in Markov decision process environments, but its applicability is much broader: in experiments in multiagent environments, Q-learning has also performed well. Our preliminary analysis using dynamical systems finds that Q-learning's indirect control of behavior via estimates of value contributes to its beneficial performance in general-sum 2-player games like the Prisoner's Dilemma.
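A small illustration in the spirit of the abstract (not the paper's experiment): two independent Q-learners play the iterated Prisoner's Dilemma, each conditioning on the previous joint move; payoffs and parameters are assumed.

    import random

    PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
              ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}   # standard PD payoff matrix
    ACTIONS = ['C', 'D']
    alpha, gamma, epsilon = 0.1, 0.9, 0.1               # assumed parameters

    def pick(Q, state):
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

    Q1, Q2 = {}, {}
    state = ('C', 'C')                                  # previous joint move (agent 1, agent 2)
    for _ in range(20000):
        a1, a2 = pick(Q1, state), pick(Q2, state[::-1])
        r1, r2 = PAYOFF[(a1, a2)]
        nxt = (a1, a2)
        for Q, s, a, r, ns in ((Q1, state, a1, r1, nxt),
                               (Q2, state[::-1], a2, r2, nxt[::-1])):
            # Ordinary one-step Q-learning backup for each agent independently.
            old = Q.get((s, a), 0.0)
            best = max(Q.get((ns, b), 0.0) for b in ACTIONS)
            Q[(s, a)] = old + alpha * (r + gamma * best - old)
        state = nxt

Each agent treats the other as part of the environment and controls its behavior only indirectly, through the value estimates it accumulates.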
International Journal of Intelligent Systems
Concurrent Q-learning: Reinforcement learning for dynamic goals and environments (2005)
This article presents a powerful new algorithm for reinforcement learning in problems where the goals and also the environment may change. The algorithm is completely goal independent, allowing the mechanics of the environment to be learned independently of the task that is being undertaken. Conventional reinforcement learning techniques, such as Q-learning, are goal dependent. When the goal or reward conditions change, previous learning interferes with the new task that is being learned, resulting in very poor performance. Previously, the Concurrent Q-Learning algorithm was developed, based on Watkins' Q-learning, which learns the relative proximity of all states simultaneously. This learning is completely independent of the reward experienced at those states and, through a simple action selection strategy, may be applied to any given reward structure. Here it is shown that the extra information obtained may be used to replace the eligibility traces of Watkins' Q-learning, allowing many more value updates to be made at each time step. The new algorithm is compared to the previous version and also to DG-learning in tasks involving changing goals and environments. The new algorithm is shown to perform significantly better than these alternatives, especially in situations involving novel obstructions. The algorithm adapts quickly and intelligently to changes in both the environment and reward structure, and does not suffer interference from training undertaken prior to those changes. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 1037–1052, 2005.
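A hedged, goal-independent sketch in the spirit of the abstract, using an all-goals update akin to DG-learning: a proximity estimate toward every candidate goal state is refreshed on each transition, with no reward signal involved. The data layout and parameters are assumptions, not the paper's algorithm.

    from collections import defaultdict

    alpha, gamma = 0.2, 0.9
    Q = defaultdict(float)   # Q[(state, action, goal)] ~ discounted proximity to goal

    def all_goals_update(s, a, s2, states, actions):
        # One real transition (s, a) -> s2 informs the proximity estimate
        # for every candidate goal simultaneously.
        for g in states:
            target = 1.0 if s2 == g else gamma * max(Q[(s2, b, g)] for b in actions)
            Q[(s, a, g)] += alpha * (target - Q[(s, a, g)])

    def act(s, goal, actions):
        # When the reward structure changes, only the goal argument changes;
        # the learned proximities are reused as-is.
        return max(actions, key=lambda a: Q[(s, a, goal)])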
This paper introduces an approach to reinforcement learning by cooperating agents using a variation of the Q-learning method. Q-learning is a model-free method, i.e., the agent does not need to predict future conditions of the environment. The framework provided by approximation spaces makes it possible to minimize the overestimation caused by approximated Q-values. Because of this overestimation, the learning capability of the algorithm is not consistent, and it is observed that under this condition the tendency to take a particular action is decreased. The Rough Q-learning method therefore improves the performance of the algorithm, which is shown by comparing plots of the average Q-values for Q-learning and Rough Q-learning.
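A tiny numerical illustration (not the paper's method) of the overestimation the abstract refers to: a max taken over noisy value estimates is biased upward even when every true value is zero.

    import random

    true_values = [0.0] * 5                  # five actions, all truly worth 0
    trials, noise = 10000, 1.0
    mean_max = sum(max(v + random.gauss(0, noise) for v in true_values)
                   for _ in range(trials)) / trials
    print(f"mean of max over noisy estimates: {mean_max:.2f} (true max is 0.0)")
    # Prints roughly 1.16: the max operator inflates approximated Q-values.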
1997
Q-learning can greatly improve its convergence speed if helped by immediate reinforcements provided by a trainer able to judge the usefulness of actions as stage-setting with respect to the goal of the agent.
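A minimal sketch of the idea, assuming a hypothetical trainer(s, a) function that scores an action's usefulness: the trainer's immediate judgement is simply added to the environment reward inside the ordinary Q-learning backup.

    def shaped_update(Q, s, a, r, s2, actions, trainer, alpha=0.1, gamma=0.95):
        # Ordinary one-step Q-learning, with the trainer's immediate
        # reinforcement trainer(s, a) added to the environment reward r.
        old = Q.get((s, a), 0.0)
        best = max(Q.get((s2, b), 0.0) for b in actions)
        Q[(s, a)] = old + alpha * (r + trainer(s, a) + gamma * best - old)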