CoMAL - Collaborative Multi-Agent Large Language Models For
Mixed-Autonomy Traffic
Huaiyuan Yao¹, Longchao Da¹, Vishnu Nandam¹, Justin Turnau¹, Zhiwei Liu², Linsey Pang², Hua Wei¹
¹Arizona State University, ²Salesforce
[Figure 1 image: (a) Single-Agent Pipeline, (b) Multi-Agent Workflow, (c) Scenarios (Ring, Figure Eight, Merge).]
Figure 1: The overall framework of CoMAL. (a) Single-Agent Pipeline: The prompt generator integrates scenario descriptions, few-shot experiences, and shared messages, which are then fed into the LLM. The LLM subsequently allocates tasks and generates planners. (b) Multi-Agent Workflow: comprises three modules, the Collaboration Module, the Reason Engine, and the Execution Module. (c) The three benchmark scenarios for CoMAL. Ring: The ring road network consists of a closed-loop road where vehicles continuously travel in a circular fashion. Figure Eight (FE): An extension of the ring road, consisting of two circular loops connected by an intersection. Merge: The merge network simulates how vehicles entering from an on-ramp cause disturbances.
knowledge-driven, relying on common sense, verbal communication, and reasoning [22]. This is in contrast to data-driven methods that require vast amounts of training data and often struggle with generalization. Recent advancements in LLMs [23] offer promising knowledge-driven solutions for addressing the mixed-autonomy traffic problem. LLMs can emulate human-like knowledge, including the ability to make decisions, form agreements, analyze road situations, and collaborate in real time. By leveraging this embodied human knowledge, multiple intelligent LLM agents with distinct roles and attributes can work together to handle complex tasks more efficiently [24], offering new avenues for improving traffic systems through cooperation and common-sense reasoning.

Building upon these insights and recognizing the limitations of RL in generalizing to new traffic situations, we propose leveraging LLMs for their common-sense reasoning and adaptability. LLMs offer better generalization by drawing on embedded knowledge, allowing them to manage complex and unpredictable traffic dynamics more effectively. We develop an innovative framework named CoMAL, specifically designed to address mixed-autonomy traffic challenges, as depicted in Figure 1. CoMAL comprises a simulation environment that enables the interaction of individual LLM agents and allows agents to collect perceptual data from the traffic system. Leveraging the stored experiences within the Memory Module, the agents participate in a brainstorming session in the Collaboration Module, where they allocate tasks and establish their specific roles. Subsequently, each agent develops a driving plan according to its designated role through the Reason Engine. The Reason Engine then generates a rule-based driving planner grounded in the Intelligent Driver Model (IDM), which is implemented in the Execution Module to compensate for the inherent limitations of LLMs in control performance.

Our primary contributions are as follows:

1) To the best of our knowledge, we are the first to integrate the collaborative capability of multi-agent language models in autonomous driving. The Collaboration Module effectively establishes a session for multiple agents to engage in brainstorming and task allocation.

2) We propose CoMAL, a multi-agent framework designed to address mixed-autonomy traffic challenges. CoMAL integrates rule-based planners to enhance control while leveraging the cooperative and reasoning abilities of LLMs.
3) We evaluate CoMAL on the Flow benchmark in three classical traffic scenarios (Figure 1 (c)) and compare its performance against RL methods. Extensive experimental results demonstrate significant performance improvements in terms of average velocity and driving smoothness across various LLMs.

4) We conduct experiments using the latest LLM models, including GPT-4o-mini and Qwen-72B/32B/7B, showcasing the adaptability of CoMAL across a diverse range of LLMs and demonstrating its superior performance over state-of-the-art RL approaches in mixed-autonomy traffic scenarios.

2 Related Work

2.1 Mixed-Autonomy Traffic
Mixed-autonomy traffic, where connected autonomous vehicles (CAVs) coexist with human-driven vehicles in a system [7], presents a significant challenge in traffic dynamics modeling and control. A control strategy named the "slow-in, fast-out" approach [13] has demonstrated improvements in traffic throughput with a minimal percentage of autonomous vehicles. Reinforcement learning (RL) offers a more dynamic and adaptable solution. RL benchmarks [25] such as MuJoCo and the Arcade Learning Environment [26] provide systematic evaluation and comparison of algorithms. Specifically for mixed autonomy, the Flow benchmark [7] proposes four traffic scenarios that illustrate distinct RL problems, including shockwave minimization, inflow management, efficient merging, and intersection control. It evaluates and compares RL algorithms such as Trust Region Policy Optimization (TRPO) [15, 27], Proximal Policy Optimization (PPO) [28], Evolution Strategies (ES) [29], and Augmented Random Search (ARS) [30] in traffic scenarios. [31] also explores the mixed-autonomy scenario in a multi-agent traffic signal control system [32].

2.2 Large Language Model-based Multi-Agents
Large language models (LLMs) have become integral to multi-agent systems [23] due to their capabilities in generalization and common-sense reasoning. LLM-based multi-agent systems leverage these strengths to enhance decision-making and communication among agents [33]. This approach is particularly beneficial in complex scenarios like mixed-autonomy traffic, where effective interaction between human drivers and autonomous systems is essential.

The communication structure of LLM-based multi-agent systems varies across studies to address specific challenges [23, 34, 24]. For example, research [24] has explored both centralized and decentralized communication structures for LLM-based multi-agent systems. In traffic control, LLMs facilitate human-machine interaction and improve decision-making processes. The survey [23] discusses how LLMs can be integrated into multi-agent systems to enhance communication and coordination among agents in traffic environments. The study DiLu [22] explores a knowledge-driven approach using LLMs, while LLMLight [35] utilizes LLMs as decision-making agents for traffic signal control. The pioneering model Open-TI [36] integrates LLMs with external traffic analysis tools to perform comprehensive traffic simulations and task-specific operations. Findings from these studies suggest that incorporating domain-specific knowledge into LLMs can significantly enhance the performance and reliability of multi-agent systems in complex scenarios. This involves not only improving the performance of individual agents but also fostering effective communication and coordination among the agents.

3 Methodology

We introduce CoMAL, a framework designed for LLM agents integrated into connected autonomous vehicles (CAVs) to collaborate and enhance the overall velocity and driving smoothness of traffic flow. As illustrated in Figure 1, we delineate CoMAL at two distinct levels: the single-agent pipeline and the multi-agent workflow.

At the single-agent level, the LLM-based agents make decisions based on prompts that include few-shot experiences stored in memory, as well as scenario descriptions derived from environmental perception. The agents operate in two modes: task allocation within the Collaboration Module and planner generation within the Reason Engine. The multi-agent workflow consists of three modules: the Collaboration Module, the Reason Engine, and the Execution Module. In the Collaboration Module, CoMAL establishes a shared message pool that facilitates brainstorming and collaborative decision-making among agents. Within this shared space, agents collectively allocate tasks, define their respective roles, and formulate individual driving plans. Each agent then generates a rule-based driving planner in the Reason Engine, which is subsequently executed within the Execution Module to ensure coordinated driving behavior and smooth traffic flow.

3.1 Single-Agent Pipeline
The quality of prompts significantly influences the output quality of the LLM. CoMAL utilizes a prompt generator that integrates all essential information for effective decision-making.
[Figure 2 image: left, an example prompt with Driving Task, Map Description, and Motion State; right, an example Collaboration Module message and the Reason Engine's four reasoning steps: Role Clarification, Scene Understanding, Motion Instruction, and Planner Generation (e.g., driving_planner(a_max = 0.2, v_target = 6.0, min_headway = 2.0)).]
Figure 2: (a) Left: A detailed prompt example for CoMAL, consisting of a system prompt that specifies the
driving task, along with map description and motion state provided by the Perception Module. (b) Right: A case
of the collaboration and reasoning process. Following task allocation during brainstorming, a hierarchical chain
of thought breaks down the driving plan into incremental steps, ensuring consistency in decision-making. This
process includes role clarification, scene understanding, motion instruction, and planner generation.
The workflow for each individual agent involves several steps: (1) encode the scenario into a textual description within the Perception Module; (2) recall relevant driving experiences from the Memory Module; (3) receive shared messages from other agents through the Collaboration Module; (4) generate the prompt and feed it into the LLM; (5) decode the LLM response for task allocation or planning purposes. In this section, we detail the Environment Perception Module and the Memory Module.

3.1.1 Environment Perception Module To efficiently extract prompts from complex environmental data and enhance the scene understanding of LLMs, we design an Environment Perception Module. This module extracts key information from the simulation environment and constructs a textual scenario description. The description follows a set of standard rules to generate a thorough representation in natural language. The scene information is divided into two parts: the static map and the dynamic agents, as shown in Figure 2.

The static map information represents the scenario type, providing semantic priors for vehicle motion planning. The description of the map helps the LLM intuitively understand the scenario's geometry. The dynamic information describes the motion of the ego vehicle and surrounding agents, which directly influences the planning of the vehicles' movement.

3.1.2 Memory Module Similar to human drivers, the agent must make decisions based on reasoning processes that are informed by past driving experiences. To achieve this, we employ a Memory Module that stores experiences from previous driving scenarios and handcrafted instructions. Initially, the agent is provided with a set of predefined experiences, which the LLM then updates continuously as it engages in reasoning during new situations. This approach allows the agent to refine its decision-making over time, improving its performance in diverse driving contexts.
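To make the single-agent pipeline concrete, the sketch below shows one way a prompt generator could assemble the scenario description, retrieved experiences, and shared messages into a single prompt. It is a minimal illustration under stated assumptions; the class and method names (PromptGenerator, describe_scene, build_prompt) are hypothetical and not taken from the CoMAL codebase.

```python
from dataclasses import dataclass, field

@dataclass
class VehicleState:
    vid: str
    speed: float          # m/s
    lane_position: float  # m

@dataclass
class PromptGenerator:
    """Hypothetical sketch of a prompt generator; names are illustrative."""
    system_prompt: str                                     # driving task description
    memory: list = field(default_factory=list)             # few-shot experiences (strings)
    shared_messages: list = field(default_factory=list)    # messages from other agents

    def describe_scene(self, ego: VehicleState, others: list) -> str:
        # Perception Module: encode the dynamic agents as text.
        lines = [f"Your speed is {ego.speed:.2f} m/s, and lane position is "
                 f"{ego.lane_position:.2f} m."]
        for v in others:
            lines.append(f"- Vehicle {v.vid} speed {v.speed:.2f} m/s, "
                         f"lane position {v.lane_position:.2f} m.")
        return "\n".join(lines)

    def build_prompt(self, ego: VehicleState, others: list) -> str:
        # Combine system prompt, few-shot experiences, scene text, and shared messages.
        parts = [self.system_prompt]
        parts += [f"Experience: {e}" for e in self.memory]
        parts.append(self.describe_scene(ego, others))
        parts += [f"Shared message: {m}" for m in self.shared_messages]
        return "\n\n".join(parts)
```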
[Figure 3 image: example brainstorming exchange. CAV-01 proposes a queue order based on current positions (role: not decided yet), CAV-02 agrees and adopts the role "Follower", and CAV-01 then confirms the final order and takes the role "Leader".]
Figure 3: Demonstration of the interaction process of the agents in the Figure Eight 1 scenario. The agents decide to form a queue and subsequently allocate the roles of leader and follower.
3.2 Multi-Agent Workflow In a mixed-autonomy traffic setting, where CAVs operate alongside human drivers, the main objective is to optimize overall traffic flow. To achieve this, we propose a three-stage decision-making workflow. In the Collaboration Module, agents first discuss and allocate tasks within a public message pool. In the Reason Engine, each agent then independently determines its driving plan based on its assigned role and generates a driving planner. Finally, the driving planner is executed in the Execution Module.

3.2.1 Collaboration Module Collaborative agents work together towards a shared objective, typically exchanging information to enhance the collective solution. In the Collaboration Module, all CAVs participate interactively by forming a queue for brainstorming and communication. In the brainstorming session, the vehicles take turns speaking in a public channel to propose strategies and assign tasks among themselves. This collaborative effort ensures that each CAV understands its specific role in the traffic system.

Communication Structure Here, we introduce a shared message pool to boost communication efficiency, as shown in Figure 3. This communication structure maintains a shared message pool where agents can publish messages and subscribe to the latest messages from one another. Agents take turns to speak, one at a time, until the strategy is fully developed and each agent's role is clearly defined.
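A minimal sketch of such a shared message pool and the turn-taking brainstorming loop is given below. It assumes a hypothetical agent interface (propose_message returning a message and an optional role); the actual CoMAL implementation may differ.

```python
from collections import deque

class SharedMessagePool:
    """Public channel: agents publish messages and read the latest ones."""
    def __init__(self):
        self.messages = deque()

    def publish(self, sender: str, text: str):
        self.messages.append((sender, text))

    def subscribe(self, n: int = 10):
        # Return the n most recent messages.
        return list(self.messages)[-n:]

def brainstorm(agents, pool: SharedMessagePool, max_rounds: int = 3):
    """Agents speak one at a time until every agent has committed to a role."""
    roles = {agent.name: None for agent in agents}
    for _ in range(max_rounds):
        for agent in agents:
            context = pool.subscribe()
            # Hypothetical agent API: the LLM drafts a message and, optionally, a role.
            message, role = agent.propose_message(context)
            pool.publish(agent.name, message)
            if role is not None:
                roles[agent.name] = role
        if all(r is not None for r in roles.values()):
            break
    return roles
```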
3.2.2 Reason Engine During team brainstorming, each agent determines its role and formulates a strategy to collaborate with the other vehicles. Based on these defined roles, the Reason Engine generates an appropriate driving planner to effectively control the vehicle. The Reason Engine takes the scenario description and predefined system prompts as inputs. Subsequently, the LLM generates the driving planner based on IDM through reasoning guided by a hierarchical chain-of-thought prompt.

System Prompt The system prompt defines the planning task and associated driving knowledge. Its primary goal is to standardize the format of both input and output data, as well as to clarify the objectives of planner generation. Specifically, it ensures a clear understanding of the physical meaning of each parameter in the IDM planner, such as the speed limit ($v_0$), maximum acceleration ($a_{\max}$), and minimum headway ($s_0$). This provides a structured foundation for the decision-making process.

Hierarchical Chain-of-Thought The hierarchical chain-of-thought process involves four critical components: role clarification, scene understanding, motion instruction, and planner generation. Initially, it is crucial to clarify the role and task of the ego vehicle within a collaborative context. The LLM is then directed to focus on key information in the scenario, such as headway distance and lead vehicles. Based on the scenario analysis, the LLM is prompted to provide motion instructions for the ego vehicle. Finally, each agent utilizes the scenario analysis and motion instructions to generate a driving planner, parameterized by the IDM model.

3.2.3 Execution Module We utilize the rule-based IDM model as a planner to execute driving strategies by adjusting its parameters. IDM is a car-following model that computes longitudinal dynamics. In this model, the acceleration $a_k$ of vehicle $k$ is defined by its bumper-to-bumper headway $s_k$ (distance to the preceding vehicle), velocity $v_k$, and relative velocity $\Delta v_k$, via the following equation:
(3.1)   $a_k = \frac{dv_k}{dt} = a_{\max}\left[1 - \left(\frac{v_k}{v_0}\right)^{\delta} - \left(\frac{s^*(v_k, \Delta v_k)}{s_k}\right)^{2}\right]$

where $s^*$ is the desired headway of the vehicle, denoted by:

(3.2)   $s^*(v_k, \Delta v_k) = s_0 + \max\left(0,\; v_k T + \frac{v_k \Delta v_k}{2\sqrt{a_{\max} b}}\right)$

where $s_0, v_0, T, \delta, a_{\max}, b$ are given parameters. We set the desired time headway $T$, the comfortable braking deceleration $b$, and the acceleration exponent $\delta$ as constants while adjusting the desired velocity $v_0$, the minimum spacing $s_0$, and the maximum acceleration $a_{\max}$ to tailor the driving planners. Thus, the Reason Engine generates a driving planner by customizing IDM's parameters ($v_0$, $a_{\max}$, $s_0$).
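As an illustration of how the Execution Module can realize equations (3.1) and (3.2), the following sketch implements IDM with the three tunable parameters exposed. The fixed constants T, b, and delta below are placeholder values chosen for illustration, not the settings used in our experiments.

```python
import math

def idm_acceleration(v_k: float, s_k: float, dv_k: float,
                     v0: float, a_max: float, s0: float,
                     T: float = 1.0, b: float = 1.5, delta: float = 4.0) -> float:
    """IDM acceleration (Eq. 3.1) for a vehicle with speed v_k, headway s_k,
    and approach rate dv_k = v_k - v_lead. (v0, a_max, s0) are the parameters
    the Reason Engine customizes; T, b, and delta are kept constant."""
    # Desired headway s* (Eq. 3.2).
    s_star = s0 + max(0.0, v_k * T + v_k * dv_k / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v_k / v0) ** delta - (s_star / s_k) ** 2)

# Example: a planner emitted by the Reason Engine, e.g.
# driving_planner(a_max=0.2, v_target=6.0, min_headway=2.0), maps onto
# (a_max, v0, s0) and is then stepped by the Execution Module:
a = idm_acceleration(v_k=3.8, s_k=12.0, dv_k=-0.7, v0=6.0, a_max=0.2, s0=2.0)
print(f"commanded acceleration: {a:.3f} m/s^2")
```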
4 Experiments

In a mixed-autonomy setting, a subset of vehicles is tasked with the objective of improving overall traffic flow and mitigating the formation and propagation of stop-and-go waves. Thus, in our experiments, we aim to address several key questions:

• How can CAVs enhance traffic flow and eliminate stop-and-go shockwaves?

• How do multiple LLM-based agents collaborate to achieve this goal?

• Do different LLM models influence the results?

4.1 Implementation Details The experiments are conducted in Flow [15] with SUMO [37], a microscopic simulator for traffic and vehicle dynamics. For details on the architecture and on training autonomous vehicles to maximize system-level velocity, we refer the readers to [15]. The environment offers several driving models to simulate human drivers and realistic interactions between vehicles. We adopt OpenAI GPT-4o-mini and Qwen-72B/32B/7B in this paper.
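The sketch below shows one way the LLM backend could be made swappable across the models we evaluate, assuming an OpenAI-compatible chat endpoint for both GPT-4o-mini and locally served Qwen models (e.g., via vLLM); the endpoint URL and helper names are illustrative and not a prescription of our exact setup.

```python
from openai import OpenAI

def make_client(backend: str) -> tuple[OpenAI, str]:
    """Return a chat client and model name for the chosen backend.
    Assumes Qwen is served behind an OpenAI-compatible API (e.g., vLLM)."""
    if backend == "gpt-4o-mini":
        return OpenAI(), "gpt-4o-mini"
    # Illustrative local endpoint; adjust to the actual serving setup.
    return OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY"), backend

def query_agent(client: OpenAI, model: str, system_prompt: str, user_prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_prompt}],
        temperature=0.0,
    )
    return resp.choices[0].message.content
```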
4.2 Scenarios We evaluate our model on the Figure Eight (FE), Ring, and Merge scenarios from the Flow benchmark. Further details are provided below and illustrated in Figure 1 (c).

Ring The ring road network consists of a circular lane where vehicles continuously travel in a loop. It is commonly used to study traffic dynamics, as disturbances can cause stop-and-go waves. In mixed-autonomy scenarios, CAVs are deployed to reduce these waves and enhance traffic flow stability.

Figure Eight (FE) The FE network builds on the ring road by connecting two circular loops via an intersection. In mixed-autonomy scenarios, CAVs are introduced to smooth traffic and prevent stop-and-go waves.

Merge The merge network simulates highway disturbances caused by vehicles entering from an on-ramp, which creates stop-and-go waves. In mixed-autonomy scenarios, CAVs are tasked with mitigating these waves based on local observations and adjusting to fluctuating vehicle numbers in the open network.

We investigate different levels of difficulty for each proposed benchmark by adjusting their scenario-specific meta-parameters. Table 1 provides detailed descriptions of the selected meta-parameters for each benchmark.

Table 1: Configurations of Benchmarks

Scenario Name  Time (s)  Vehicle Distribution
FE 0           150       13 humans, 1 CAV
FE 1           150       7 humans, 7 CAVs
FE 2           150       0 humans, 14 CAVs
Ring 0         150       21 humans, 1 CAV
Ring 1         150       19 humans, 3 CAVs
Ring 2         150       11 humans, 11 CAVs
Merge 0        75        10.0% CAV penetration rate
Merge 1        75        25.0% CAV penetration rate
Merge 2        75        33.3% CAV penetration rate
Merge 3        75        50.0% CAV penetration rate
Merge 4        75        90.0% CAV penetration rate

4.3 Metrics To provide a comprehensive assessment of traffic flow and mitigate the occurrence of shockwaves, we utilize two metrics:

• Average vehicle speed in the network (m/s). Higher values indicate better overall traffic flow.

• Standard deviation of vehicle speed (m/s). Lower values reflect greater stability and consistency in traffic movement.
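Both metrics can be computed directly from logged per-step speeds; the sketch below assumes a speeds array of shape (timesteps, vehicles) and is an illustration rather than our exact evaluation script.

```python
import numpy as np

def traffic_metrics(speeds: np.ndarray) -> tuple[float, float]:
    """speeds: array of shape (timesteps, num_vehicles) in m/s.
    Returns (average speed, standard deviation of speed) over the episode."""
    return float(np.mean(speeds)), float(np.std(speeds))

# Example with synthetic data: 150 steps, 22 vehicles.
rng = np.random.default_rng(0)
speeds = rng.normal(loc=2.9, scale=0.3, size=(150, 22))
avg, std = traffic_metrics(speeds)
print(f"avg speed: {avg:.2f} m/s, std: {std:.2f} m/s")
```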
4.4 Specification on Communication In this section, we focus on the interactive process among agents as they work to solve the mixed-traffic problem. In the FE scenario, the agents recognize the need to form a queue, identify a leader, and designate the remaining agents as followers. The process of task allocation and leader selection is illustrated in Figure 3. Additionally, in the Ring and Merge scenarios, agents aim to eliminate shockwaves. Their reasoning is as follows: if there is relative traffic congestion ahead of the ego vehicle, the agent approaches the lead vehicle slowly; otherwise, it accelerates to follow the lead vehicle closely.
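This congestion-based rule can be expressed as a small heuristic; the sketch below is an illustrative rendering of that reasoning, with the threshold and parameter values chosen arbitrarily, and is not the planner actually emitted by the Reason Engine.

```python
def shockwave_damping_plan(headway: float, ego_speed: float, lead_speed: float,
                           congested_headway: float = 10.0) -> dict:
    """Return illustrative IDM parameters following the rule described above:
    approach slowly when traffic ahead is congested, otherwise close the gap."""
    congested = headway < congested_headway or lead_speed < ego_speed
    if congested:
        # Approach the lead vehicle slowly: low target speed, large spacing.
        return {"v0": max(lead_speed, 1.0), "a_max": 0.2, "s0": 4.0}
    # Open road ahead: follow the lead vehicle closely.
    return {"v0": ego_speed + 2.0, "a_max": 0.5, "s0": 2.0}
```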
Table 2: Quantitative Evaluation of CoMAL on Flow Benchmarks

Metric  Model         FE 0  FE 1  FE 2  Ring 0  Ring 1  Ring 2  Merge 0  Merge 1  Merge 2  Merge 3  Merge 4
Avg     Human Driver  5.61  5.61  5.61  2.88    2.88    2.88    6.40     6.40     6.40     6.40     6.40
        CoMAL         6.40  6.47  6.29  2.86    2.85    2.87    6.59     7.40     7.42     7.86     8.83
Std     Human Driver  4.55  4.55  4.55  0.79    0.79    0.79    3.12     3.12     3.12     3.12     3.12
        CoMAL         1.74  1.77  2.24  0.29    0.26    0.31    2.88     2.91     2.61     2.47     2.70
Figure 4: Visualization of vehicle trajectories in the Ring 0 setting. The ring road has a total length of 230 meters and contains 22 vehicles. Each line in the space-time diagrams shows the position of a specific vehicle over time. When a vehicle completes a full lap of the ring, its position resets to zero. Left: In the absence of automated vehicles, human-driven vehicles exhibit stop-and-go shockwaves due to inherent instability. Right: With three connected autonomous vehicles using the CoMAL framework, the unstable traffic is stabilized.