arXiv:2303.11366v1 [cs.AI] 20 Mar 2023
Reflexion: an autonomous agent with dynamic
memory and self-reflection
Noah Shinn                  Beck Labash
Northeastern University     Northeastern University
Boston, MA                  Boston, MA
shinn.n@northeastern.edu    labash.b@northeastern.edu

Ashwin Gopinath
Massachusetts Institute of Technology
Cambridge, MA
agopi@mit.edu
Abstract
Recent advancements in decision-making large language model (LLM) agents have
demonstrated impressive performance across various benchmarks. However, these
state-of-the-art approaches typically necessitate internal model fine-tuning, external
model fine-tuning, or policy optimization over a defined state space. Implementing
these methods can prove challenging due to the scarcity of high-quality training
data or the lack of a well-defined state space. Moreover, these agents do not possess
certain qualities inherent to human decision-making processes, specifically the
ability to learn from mistakes. Self-reflection allows humans to efficiently solve
novel problems through a process of trial and error. Building on recent research, we
propose Reflexion, an approach that endows an agent with dynamic memory and
self-reflection capabilities to enhance its existing reasoning trace and task-specific
action choice abilities. To achieve full automation, we introduce a straightforward
yet effective heuristic that enables the agent to pinpoint hallucination instances,
avoid repetition in action sequences, and, in some environments, construct an inter-
nal memory map of the given environment. To assess our approach, we evaluate
the agent's ability to complete decision-making tasks in AlfWorld environments
and knowledge-intensive, search-based question-and-answer tasks in HotPotQA
environments. We observe success rates of 97% and 51%, respectively, and provide
a discussion on the emergent property of self-reflection.
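The repetition-detection heuristic mentioned above can be illustrated with a minimal sketch: if the same (action, observation) pair recurs in a trajectory beyond some threshold, the agent is likely stuck or hallucinating and should trigger a self-reflection step. The function name, threshold, and trajectory representation below are illustrative assumptions, not details from the paper.

```python
from collections import Counter

def should_reflect(trajectory, max_repeats=3):
    """Flag a trajectory for self-reflection when any (action, observation)
    pair recurs max_repeats or more times -- a simple proxy for the
    hallucination/repetition heuristic described in the abstract.
    The threshold and trajectory format are illustrative assumptions."""
    counts = Counter(trajectory)
    return any(c >= max_repeats for c in counts.values())

# A stuck agent repeating the same failed action triggers reflection;
# a varied trajectory does not.
stuck = [("open drawer 1", "nothing happens")] * 3
varied = [("go north", "you see a door"), ("open door", "the door opens")]
```

Calling `should_reflect(stuck)` returns True, while `should_reflect(varied)` returns False, so reflection is only invoked when the loop signature appears.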
1 Introduction
Mastering decision-making and knowledge-intensive search tasks in novel environments is a crucial
skill set for large-scale natural language agents. LLMs such as OpenAI's GPT-3 (Brown et al.,
2020), Google's PaLM (Chowdhery et al., 2022), and others have achieved impressive results on
various benchmarks (Kaplan et al., 2020; Rae et al., 2021; Nakano et al., 2021; Kojima et al., 2022;
Ouyang et al., 2022; Chung et al., 2022). These models exhibit human-like abilities to understand
tasks in given environments, marking significant progress in the field of natural language processing.
Grounding complex tasks in natural language allows agents to overcome high syntactic barriers that
may result in false-negative errors. However, learning optimal policies for natural language RL agents
is challenging due to vast and mostly unbounded state spaces.

Several decision-making approaches have been proposed to enable natural language agents to select
their next action without a learned policy in text-based environments. Chain-of-thought (CoT)
Preprint. Under review.