Carlos Mariano
    Chapter 6: EVOLUTIONARY ALGORITHMS AND MULTIPLE OBJECTIVE OPTIMIZATION. Carlos A. Coello Coello, CINVESTAV-IPN, Departamento de Ingeniería Eléctrica, Av. Instituto Politécnico Nacional No. 2508, Col. ...
    ABSTRACT
    This paper describes a new algorithm, called MDQL, for the solution of multiple objective optimization problems. MDQL is based on a new distributed Q-learning algorithm, called DQL, which is also introduced in this paper. In DQL, a family of independent agents, exploring different options, finds a common policy in a common environment. Information about the goodness of actions is transmitted through traces over state-action pairs. MDQL extends this idea to multiple objectives, assigning a family of agents to each objective involved. A non-dominance criterion is used to construct Pareto fronts, and by delaying adjustments to the rewards MDQL achieves a better distribution of solutions. Furthermore, an extension for applying reinforcement learning to continuous functions is also given. Successful results of MDQL on several test-bed problems suggested in the literature are described.
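    To make the non-dominance criterion concrete, here is a minimal sketch of a Pareto filter, assuming all objectives are to be minimized; the function names and the sample data are illustrative, not taken from the paper:

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b under minimization:
    a is no worse in every objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the non-dominated objective vectors."""
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p)
                       for j, q in enumerate(points) if j != i)]

# Example with two competing objectives: (3.0, 3.0) is dominated by (2.0, 2.0)
candidates = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(pareto_front(candidates))  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```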
    In reinforcement learning an autonomous agent learns an optimal policy while interacting with the environment. In particular, in one-step Q-learning, with each action an agent updates its Q values considering immediate rewards. In this paper a new strategy for updating Q values is proposed. The strategy, implemented in an algorithm called DQL, uses a set of agents all searching for the same goal in the same space to obtain the same optimal policy. Each agent leaves traces over a copy of the environment (copies of Q-values) while searching for a goal. These copies are used by the agents to decide which actions to take. Once all the agents reach a goal, the original Q-values of the best solution found by all the agents are updated using Watkins' Q-learning formula. DQL has some similarities with Gambardella's Ant-Q algorithm [4]; however, it does not require the definition of a domain-dependent heuristic and, consequently, the tuning of additional parameters. DQL also does not update the original Q-values with zero reward while the agents are searching, as Ant-Q does. It is shown how DQL's guided exploration by several agents with selective exploitation (updating only the best solution) produces faster convergence than Q-learning and Ant-Q on several test-bed problems under similar conditions.
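    Watkins' one-step Q-learning update referred to above is Q(s,a) ← Q(s,a) + α[r + γ max_{a'} Q(s',a') − Q(s,a)]. The sketch below illustrates the multi-agent scheme the abstract describes: agents explore epsilon-greedily on copies of the Q-values, and only the best trace updates the originals. The `env` interface (methods `actions`, `step`, `is_goal`), the hyperparameter values, and the simplified trace handling are assumptions for illustration, not the paper's code:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate

def run_episode(q_copy, env, start):
    """One agent searches epsilon-greedily on its own copy of the Q-values,
    recording a trace of (state, action, reward, next_state) steps."""
    trace, s = [], start
    while not env.is_goal(s):
        acts = env.actions(s)
        a = (random.choice(acts) if random.random() < EPS
             else max(acts, key=lambda act: q_copy[(s, act)]))
        s2, r = env.step(s, a)
        trace.append((s, a, r, s2))
        s = s2
    return trace

def dql_iteration(q, env, start, n_agents=10):
    """All agents search on copies of Q; only the best trace (highest total
    reward) updates the original Q-values with Watkins' formula."""
    traces = [run_episode(defaultdict(float, q), env, start)
              for _ in range(n_agents)]
    best = max(traces, key=lambda t: sum(r for _, _, r, _ in t))
    for s, a, r, s2 in best:
        nxt = max((q[(s2, a2)] for a2 in env.actions(s2)), default=0.0)
        q[(s, a)] += ALPHA * (r + GAMMA * nxt - q[(s, a)])
```

    Updating only the best episode is the "selective exploitation" the abstract contrasts with Ant-Q, which also updates the original Q-values (with zero reward) during the search.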
    Abstract: The difficulty of solving multiple objective optimization problems with traditional techniques has urged researchers to use alternative approaches. Ant-Q algorithms have shown good results in the solution of combinatorial optimization problems; however, little work ...
    Many problems can be characterized by several competing objectives. Multiple objective optimization problems have recently received considerable attention, especially from the evolutionary algorithms community. Their proposals, however, require an adequate encoding of the problem into strings, which is not always easy to do. This paper introduces a new algorithm, called MDQL, for multiple objective optimization problems which does not suffer from this limitation. MDQL is based on a new distributed Q-learning algorithm, called DQL, which is also introduced in this paper. Furthermore, an extension for applying reinforcement learning to continuous functions is also given. Successful results of MDQL on a continuous unconstrained problem whose Pareto front is convex, and on a continuous, constrained, non-convex problem, are described.
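    The abstracts do not spell out how the continuous extension works; one common way to let a tabular Q-learner operate on a continuous decision space is to discretize each variable into a finite grid of states and map state indices back to continuous points for evaluating the objectives. The following is a minimal sketch under that assumption; the grid resolution, helper names, and example objectives are illustrative only:

```python
import numpy as np

def make_grid(lo: float, hi: float, n_states: int) -> np.ndarray:
    """Discretize one continuous decision variable into n_states cells."""
    return np.linspace(lo, hi, n_states)

def state_to_point(state_idx, grids) -> np.ndarray:
    """Map a tuple of per-variable state indices back to a continuous point,
    at which the objective functions can then be evaluated."""
    return np.array([g[i] for g, i in zip(grids, state_idx)])

# Example: two variables on [0, 1] x [-1, 1], 101 cells each
grids = [make_grid(0.0, 1.0, 101), make_grid(-1.0, 1.0, 101)]
x = state_to_point((50, 25), grids)      # -> array([ 0.5, -0.5])
f1, f2 = x[0] ** 2, (x[0] - x[1]) ** 2   # two competing objectives at x
```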