Abstract: In many reinforcement learning applications, the use of options based on sequences of low-level actions (macro-operators) has been reported to produce learning speedup due to more active exploration of the state space. In this paper we present an evaluation of the use of two sorts of option policies in a series of contexts: option policies O_{Π} (action sequences that depend on the states visited during execution) and option policies O_{S} (fixed sequences of actions that depend exclusively on the state in which the option is initiated). Our goals were (a) to analyze O_{S} policies and compare them to O_{Π} policies with respect to convergence and the learned policy, and (b) to study the use of a Termination Improvement technique for O_{S} policies, which allows interruption of option execution when a more promising option is found. Results show that an O_{S} policy can be more effective than an O_{Π} policy, unless the latter is designed using prior domain knowledge. Moreover, Termination Improvement for O_{S} options increases the effectiveness of learning through autonomous adaptation of the length of the action sequence to the state in which the option is initiated.
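To make the interruption rule concrete, the following is a minimal sketch of Termination Improvement for fixed-sequence (O_{S}) options, assuming a tabular Q-function over (state, option) pairs. The names (`Option`, `q_values`, `step`, `execute_with_interruption`) are illustrative assumptions, not identifiers from the paper; the transition function is abstracted as a callable.

```python
# Sketch only: termination improvement for O_S options under the stated
# assumptions (tabular Q over (state, option-index), abstract transition fn).
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = int
Action = int


@dataclass
class Option:
    """A fixed sequence of primitive actions chosen at initiation (O_S)."""
    actions: List[Action]


def execute_with_interruption(
    state: State,
    option: Option,
    options: List[Option],
    q_values: Dict[Tuple[State, int], float],
    step: Callable[[State, Action], State],
) -> State:
    """Run `option`, but stop early if another option looks more promising.

    After each primitive action, the value of continuing the current
    option at the new state is compared against the best Q-value over all
    options there; if a strictly better option exists, execution is
    interrupted. The effective sequence length thus adapts to the state in
    which the option is initiated, as described in the abstract.
    """
    current_id = options.index(option)
    for action in option.actions:
        state = step(state, action)
        continue_value = q_values.get((state, current_id), 0.0)
        best_value = max(
            q_values.get((state, i), 0.0) for i in range(len(options))
        )
        if best_value > continue_value:
            break  # interrupt: a more promising option is available here
    return state
```

In this sketch the interruption test runs once per primitive action, so the only extra cost is a max over the option set at each visited state; the learning rule itself is unchanged.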