A Comprehensive Survey on Multi-Agent Reinforcement Learning for Connected and Automated Vehicles
Figure: A diagrammatic representation illustrating the difference between (a) the single-agent RL framework and (b) the multi-agent RL framework.
Figure: Schematic diagram of the three popular learning paradigms in the multi-agent RL setting.
Figure: A taxonomy of categories of MARL for CAVs.
Figure: Popular CAV simulators: (a) SUMO; (b) CityFlow; (c) CARLA; (d) OpenAI traffic simulator; (e) Multi-Car Racing Gym Environment; (f) INTERACTION dataset visualization tool; (g) Highway-env; (h) Gym-Gazebo; (i) PTV VISSIM; and (j) PRESCAN.
Abstract
1. Introduction
- This survey presents a comprehensive overview of the state-of-the-art (SOTA) DRL techniques used in MARL.
- It discusses the uncertainty-aware algorithms used in motion planning to tackle the challenges of real-time environments.
- It details the learning paradigms and techniques of MARL.
- It describes the open-source simulators available for CAV applications.
- It introduces useful datasets available in the public domain.
- It covers popular CAV applications and the techniques involved, along with their advantages and limitations.
- Finally, it presents the shortcomings and research gaps in the CAV domain and suggests future directions for filling these gaps using MARL techniques.
2. Multi-Agent Reinforcement Learning
2.1. Decentralized Training and Decentralized Execution (DTDE)
2.2. Centralized Training and Centralized Execution (CTCE)
2.3. Centralized Training and Decentralized Execution (CTDE)
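To make the structural difference between these paradigms concrete, the following minimal Python sketch (an illustration for this survey, not code from any particular algorithm) contrasts decentralized execution with centralized training: each agent's actor consumes only its own local observation, while a single critic is given the joint observation–action vector during training. The random linear maps merely stand in for neural networks.

```python
# A minimal structural sketch of the CTDE paradigm. Network internals are
# stubbed with random linear maps purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

class Actor:
    """Decentralized policy: maps a local observation to an action."""
    def __init__(self, obs_dim, act_dim):
        self.W = rng.normal(size=(act_dim, obs_dim)) * 0.1

    def act(self, obs):
        return np.tanh(self.W @ obs)          # executed independently per agent

class CentralCritic:
    """Centralized value estimate: consumes ALL observations and actions."""
    def __init__(self, joint_dim):
        self.w = rng.normal(size=joint_dim) * 0.1

    def q_value(self, joint_obs, joint_act):
        return float(self.w @ np.concatenate([joint_obs, joint_act]))

n_agents, obs_dim, act_dim = 3, 4, 2
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critic = CentralCritic(joint_dim=n_agents * (obs_dim + act_dim))

# Centralized training step: the critic sees every agent's observation/action.
observations = [rng.normal(size=obs_dim) for _ in range(n_agents)]
actions = [a.act(o) for a, o in zip(actors, observations)]
q = critic.q_value(np.concatenate(observations), np.concatenate(actions))
print("centralized Q estimate (training only):", round(q, 3))

# Decentralized execution: each agent uses only its own observation.
print("agent 0 acts from its local observation alone:", actors[0].act(observations[0]))
```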
3. MARL for CAVs
3.1. Simulators
| Reference No. | Category | Algorithmic Contributions | Test Scenario | Simulator/Framework |
|---|---|---|---|---|
| [47] | | Policy gradient without Markovian assumptions, option graph with a gating mechanism | Double merge scenario | Custom |
| [48] | Cooperative motion planning | LSTM #, REINFORCE | Vehicle platooning on freeway | Highway-env simulator |
| [49] | | Value function-based RL, kinematics constraint encoding | Two-agent collision avoidance | Custom |
| [50] | | Q-learning, graph convolution network | On-ramp-merging scenarios | SUMO |
| [51] | | Coordination graphs, max-plus algorithm | Lane-free driving | Flow, SUMO |
| [52] | | Curriculum learning, PPO $ | Stop-and-go wave | SUMO, Veins |
| [53] | | Hierarchical MARL | One-to-one racing | Kart racing environment |
| [54] | | Multi-agent advantage actor–critic, parameter sharing | Lane-changing scenario | Highway-env simulator |
| [55] | | Mean field multi-agent DQN % | Dynamic routing problem | SUMO |
| [56] | | Curriculum learning, PPO | Bidirectional driving on a narrow road | Multi-Car Racing Gym Environment |
| [57] | | Shapley value-based reward allocation | Lane-changing scenario | CARLA |
| [58] | | Altruism as convex optimization | Highway merging | Custom |
| [59] | Trajectory prediction | Latent representation learning for RL | Lane-merging scenarios | CARLA |
| [60] | | Continual learning, graph neural network | INTERACTION dataset scenarios | INTERACTION dataset visualization tool |
| [61] | | Graph neural network, ego- and allocentric approach | INTERACTION and TrajNet++ dataset scenarios | INTERACTION dataset visualization tool |
| [62] | | Graph attention network, parameter sharing | INTERACTION and NGSIM dataset scenarios | INTERACTION dataset visualization tool |
| [63] | | Spatiotemporal graph autoencoder, kernel density estimation | MAAD dataset | Multi-Car Racing Gym Environment |
| [64] | Intelligent traffic management | Curriculum learning, LSTM | Unsignalized intersection | SUMO |
| [65] | | Fastest crossing time point algorithm, MA-DQN | Unsignalized intersection | Custom (Python-based simulation) |
| [66] | | Delay-aware Markov game, MA-DDPG £ | Unsignalized intersection | Highway-env simulator |
| [67] | | Outflow congestion (new metric), transfer RL | Traffic congestion | SUMO |
| [68] | | Game-theoretic auction mechanism | Unsignalized intersection, roundabout, merging scenarios | OpenAI traffic simulator |
| [69] | | Transfer planning, max-plus coordination, DQN | Unsignalized intersection | SUMO |
| [70] | | Independent Q-learning | Cooperative traffic light control | VISSIM |
| [71] | | Multi-agent advantage actor–critic, multiple local learning agents, spatial discount factor | Cooperative traffic light control | SUMO |
| [72] | | Graph neural network, recurrent neural network | Cooperative traffic light control | CityFlow |
| [73] | | Nearest-neighbor-based state representation, pheromone-based regional green-wave | Cooperative traffic light control | SUMO |
| [74] | | Contextual DQN, contextual actor–critic | Cooperative fleet management | Custom |
| [75] | Sim-to-real | LSTM, epistemic and aleatoric uncertainty estimation, MPC | Optimal parking assignment, NGSIM, HighD, and INTERACTION dataset | INTERACTION dataset visualization tool |
| [76] | | QMIX | Optimal parking assignment | Custom |
| [77] | Sim-to-real/Safety | Lyapunov function, soft actor–critic | Obstacle avoidance while driving | Gym-Gazebo |
| [78] | Safety | DDPG | Distracted pedestrian avoidance | PVI framework |
| [79] | | TD3 §, MPC € | Crash avoidance, lane keeping | Custom |
| [80] | Safety | Shielding, CQL ¤, MADDPG | Lane-free traffic control | SUMO |
| [81] | Benchmarking platform | MCTS ¥, RL-based models | Ego-vehicle velocity control | BARK |
| [82] | | Unified approach | Multiple scenarios (ring, highway ramp, etc.) | SUMO |
| [83] | | Partially observable Markov game | Unsignalized intersection | MACAD-Gym |
| [84] | | Model-based offline RL for trajectory planning | NGSIM dataset scenarios, unprotected left turn at the intersection | NGSIM, CARLA |

Footnote symbols: # LSTM: long short-term memory; $ PPO: proximal policy optimization; % DQN: deep Q-network; £ MA-DDPG: multi-agent deep deterministic policy gradient; § TD3: twin delayed deep deterministic policy gradient; € MPC: model predictive control; ¤ CQL: conservative Q-learning; ¥ MCTS: Monte Carlo tree search.
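Several entries in the table above rely on the Highway-env simulator. The snippet below is a minimal usage sketch, not taken from any of the cited works, showing how such an environment is typically driven through the standard Gym/Gymnasium interface; the "highway-v0" environment id and the five-value step signature assume a recent gymnasium-based release of highway-env, and some releases require an explicit registration call.

```python
# A minimal usage sketch of highway-env through the Gymnasium API.
# Assumption: a recent gymnasium-based highway-env release; older gym-based
# versions return (obs, reward, done, info) from step() instead.
import gymnasium as gym
import highway_env  # registers the "highway-v0" family of environments
                    # (some releases need highway_env.register_highway_envs())

env = gym.make("highway-v0")
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()   # placeholder for a learned MARL policy
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```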
3.2. Application Areas
3.2.1. Cooperative Motion Planning
3.2.2. Trajectory Prediction
3.2.3. Intelligent Traffic Management (ITM)
- (i) Allowing each local agent to learn some features of the nearby agents to gain deeper localized traffic information;
- (ii) Introducing a spatial discount factor in the reward function that weakens information transmission from the farther agents (see the sketch after this list). The proposed method was tested on a simulated multi-intersection traffic scenario developed using SUMO and has been demonstrated to outperform SOTA algorithms.
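The spatial discount idea in item (ii) can be pictured with the following sketch; the hop-distance metric and the value of the discount factor are illustrative assumptions rather than the exact formulation used in the cited work.

```python
# A minimal sketch of a spatially discounted reward for a traffic-signal agent.
# The hop distance |i - j| and alpha = 0.8 are illustrative choices only.
def spatially_discounted_reward(local_rewards, agent_idx, alpha=0.8):
    """Weight each intersection's reward by alpha**hop_distance so that
    information from farther agents contributes less to the local objective."""
    return sum((alpha ** abs(agent_idx - j)) * r
               for j, r in enumerate(local_rewards))

queue_penalties = [-1.0, -0.2, -3.0, -0.5]   # per-intersection rewards
print(spatially_discounted_reward(queue_penalties, agent_idx=1))
```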
3.3. Sim-to-Real Approaches
3.4. Safety
- (i) Uncertainty quantification (a minimal sketch appears at the end of this subsection);
- (ii) Lyapunov functions.
- Cybersecurity: CAVs are connected to the internet, which means they are vulnerable to cyberattacks. A cyberattack on a CAV could compromise the safety of passengers and other road users.
- Malfunctions: CAVs operate with complex sensors, software, and hardware. A malfunction in any of these components could lead to an accident.
- Personal data privacy: CAVs collect a vast amount of data about their passengers and surroundings. There is a risk that these personal data could be misused or hacked, compromising the passengers’ privacy.
- Legal liability: In the event of an accident involving a CAV, it may be challenging to determine who is legally liable. Is it the manufacturer of the vehicle, the software developer, or the vehicle’s owner? This also leads to a moral dilemma.
- Infrastructure compatibility: CAVs require a new, more sophisticated infrastructure to operate safely. The infrastructure must be compatible with the vehicles’ sensors and communication systems, which could be a significant challenge.
- Safety of humans and animals on the road: Ensuring the safety of pedestrians, cyclists, and other vulnerable road users such as animals is critical, and AVs must be designed to detect and respond to them. However, this can be particularly challenging in crowded urban environments.
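As an illustration of the first ingredient listed above, uncertainty quantification, the sketch below estimates epistemic uncertainty with a small ensemble of Q-estimators and falls back to a conservative maneuver when the ensemble disagrees; the ensemble size, disagreement threshold, and braking fallback are assumptions made for illustration, not the formulation of any surveyed method.

```python
# A minimal sketch of ensemble-based uncertainty quantification for action
# selection. Ensemble members, threshold, and fallback are illustrative only.
import numpy as np

rng = np.random.default_rng(1)

class QEnsemble:
    """K independently initialized Q-estimators; their spread over an
    observation is used as an epistemic-uncertainty proxy."""
    def __init__(self, k, obs_dim, n_actions):
        self.models = [rng.normal(size=(n_actions, obs_dim)) * 0.1 for _ in range(k)]

    def q_values(self, obs):
        return np.stack([W @ obs for W in self.models])   # shape (k, n_actions)

def act_safely(ensemble, obs, brake_action=0, max_std=0.05):
    q = ensemble.q_values(obs)
    mean_q, std_q = q.mean(axis=0), q.std(axis=0)
    greedy = int(mean_q.argmax())
    # If the estimators disagree too much about the greedy action,
    # fall back to a conservative maneuver (e.g., braking).
    return greedy if std_q[greedy] < max_std else brake_action

ensemble = QEnsemble(k=5, obs_dim=6, n_actions=3)
obs = rng.normal(size=6)
print("chosen action:", act_safely(ensemble, obs))
```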
3.5. Benchmarking Platforms
3.6. Datasets Widely Used in CAV Applications
- INTERACTION [42]: This dataset comprises realistic movements of different traffic participants in diverse, highly interactive driving scenarios across multiple countries. Further details and data formats are available on the dataset’s website. This dataset can facilitate research in several areas related to behavior, such as predicting intentions/motions/behaviors, cloning behaviors through imitation and inverse reinforcement learning, modeling and analyzing behavior, reinforcement learning, developing and verifying decision-making and planning algorithms, extracting and categorizing interactive behaviors, and generating driving scenarios/cases/behaviors.
- TrajNet++ [94]: A large-scale trajectory prediction benchmark dataset designed to evaluate the performance of trajectory prediction models. It contains over 78,000 pedestrian trajectories in various real-world scenarios. The dataset also includes high-resolution top-view images of the scenes and additional information such as pedestrian attributes (e.g., age, gender, clothing) and social groups. The TrajNet++ dataset provides evaluation metrics for trajectory prediction models, including average displacement error (ADE), final displacement error (FDE), and trajectory intersection over union (IOU); a minimal sketch of the ADE/FDE computation is given after this list. It also includes a baseline model and a leaderboard to facilitate fair comparison and benchmarking of different models.
- Next Generation Simulation (NGSIM) [95]: This is a collection of detailed traffic trajectory datasets from real-world traffic observations. It was collected by the US Federal Highway Administration (FHWA) at six different locations in the United States between 2005 and 2007. The NGSIM dataset includes vehicle trajectory data from video cameras and roadside sensors. The dataset contains individual vehicles’ position, speed, and acceleration as they move through the traffic network, as well as other characteristics such as vehicle type, length, and width.
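For reference, the average and final displacement errors mentioned in the TrajNet++ entry above reduce to per-step Euclidean distances between predicted and ground-truth positions; the sketch below follows these standard definitions and is not code distributed with the benchmark.

```python
# A minimal sketch of ADE/FDE for a single trajectory (standard definitions).
import numpy as np

def ade_fde(pred, gt):
    """Average and final displacement errors between a predicted and a
    ground-truth trajectory, both given as (T, 2) arrays of x-y positions."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    dists = np.linalg.norm(pred - gt, axis=1)   # per-step Euclidean error
    return dists.mean(), dists[-1]

pred = [[0.0, 0.0], [1.0, 0.1], [2.0, 0.3]]
gt   = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
ade, fde = ade_fde(pred, gt)
print(f"ADE = {ade:.3f} m, FDE = {fde:.3f} m")
```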
4. Discussion
- Spatiotemporal Data Analysis: Sequential data such as spatiotemporal traffic data, containing information such as vehicles’ locations, speeds, and inter-vehicle distances, play a key role in solving CAV domain problems. To represent the relations and extract spatiotemporal features in traffic data, graph-based networks such as graph convolution networks (GCNs) [103] and graph attention networks (GATs) [104] are used, as they generate node embeddings that store each object’s own features as well as its relational features [48,58,92] (a minimal sketch of such graph-based neighbor aggregation follows this list). They have been applied to various CAV problems such as vehicle platooning, lane merging, highway on-ramp merging, trajectory prediction, unsignalized intersection management, relieving traffic congestion, and controlling traffic lights. Despite these developments, there is a need to incorporate multi-modal spatiotemporal data to extract richer information that can improve the performance of GCNs. Examples include meteorological data on air quality, weather conditions, and temperature to allow for more accurate traffic prediction, and vision-based data of the surroundings to provide contextual information during motion planning. Similarly, researchers can explore other CAV-related data to develop models for better decision making in CAVs.
- Domain Adaptation: Transferring the knowledge learned in a simulated environment to a real-world environment is called domain adaptation. This goal is often difficult to achieve in the CAV domain because the traffic environment data received by the sensors in the real world could belong to vastly different distributions compared to the training data seen in simulated environments. Another important related issue is inference time in the real-world environment. Various works have considered the sim-to-real transfer approach to tackle this challenge [75,77]. However, they only focus on localized CAV tasks such as optimal parking assignment to find the optimal parking spots for the autonomous vehicles in real time or performing navigation in a controlled environment such as indoor areas. In recent years, tremendous success has been achieved by meta-reinforcement learning (meta-RL) [105] techniques to perform zero-shot sim-to-real domain adaptation in robotics [105,106,107]. Hence, meta-RL-based approaches could be explored for their application to the CAV domain for efficient sim-to-real transfer. Nevertheless, another potential research topic in this area can be active learning [108], where the agents can consult human experts to improve the model’s decision-making ability in real time.
- Safety and Interpretability: Safety is one of the most critical factors for the real-time implementation of CAVs. CAVs are deployed in the real world, where various actors, such as other vehicles (both CAVs and HDVs), pedestrians, and other entities, may co-exist and move independently. In such a scenario, CAVs must navigate safely and comfortably by avoiding possible collisions. Several works have utilized constraint satisfaction methods by defining a Lyapunov function for the vehicle’s behavior policy, defining safety specifications, and regulating the vehicle’s actions through shielding to avoid “unsafe” actions (a minimal sketch of such a shield follows this list). Nonetheless, model interpretability [109] is still needed to provide safety guarantees for the real-world deployment of CAV systems. Hence, researchers can also explore the possibility of developing interpretable MARL models for CAVs.
- Benchmarking Platforms and Datasets: Developing benchmarking platforms and curating datasets play a crucial role in validating the usability of MARL algorithms. The CAV domain spans a variety of tasks, such as cooperative motion planning, trajectory prediction of CAVs and pedestrians, automated traffic light control, and automated traffic management [79]. Some researchers have developed and proposed benchmarks for behavior models for a limited number of tasks such as controlling ego-vehicle velocity in a multi-agent environment. Hence, there is a need to develop more advanced benchmarking platforms to cover a more extensive range of CAV tasks.
- Communication Issues: Beyond the algorithmic and data-based research, communication among the CAV agents in an environment heavily influences the performance of CAV systems. Information transmission delay in a CAV network can become catastrophic and life-threatening in a real-world scenario. It is another open challenge for researchers to consider information transmission delay in the CAV network while developing MARL algorithms [19].
- Technical Challenges: Developing full AVs is a complex and cumbersome process that requires advanced hardware and software systems, such as sensors, ML algorithms, and AI. These technologies are still evolving, and technical challenges need to be overcome, such as ensuring the safety and reliability of the system.
- Consumer Acceptance: Consumers are reluctant to adopt CAVs due to concerns about safety, reliability, and loss of control. Education and awareness campaigns are necessary to help consumers understand the benefits of CAVs and overcome their concerns.
- Infrastructure: The deployment of CAVs requires a supportive infrastructure, such as ITSs, advanced communication networks, and charging stations for electric vehicles. The lack of infrastructure also hinders the deployment and adoption of CAVs.
- Cost: The development and deployment of CAVs require significant investment, and the cost of the technology is still relatively high. Therefore, many consumers and businesses cannot afford the high cost of CAVs, which can limit their adoption.
- Regulatory and Legal Issues: The deployment of CAVs raises a number of legal and regulatory issues, including liability, data privacy, and cybersecurity. Governments and regulators need to establish clear rules and regulations to ensure the safety and security of the public when using these vehicles.
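Returning to the Spatiotemporal Data Analysis item above, the sketch below illustrates the neighbor aggregation that a single graph-convolution layer performs over a vehicle-interaction graph: each vehicle’s embedding is updated from its own features and the degree-normalized features of the vehicles it interacts with. It is a generic one-layer GCN illustration under assumed feature and adjacency shapes, not the architecture of any surveyed paper.

```python
# A minimal one-layer GCN sketch over a vehicle-interaction graph.
import numpy as np

rng = np.random.default_rng(2)

def gcn_layer(X, A, W):
    """One graph-convolution step: symmetrically normalized adjacency
    (with self-loops) times node features, then a linear map and ReLU.
    X: (n_vehicles, d_in) features (e.g., position, speed, heading);
    A: (n_vehicles, n_vehicles) 0/1 interaction graph; W: (d_in, d_out)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

X = rng.normal(size=(4, 3))                        # 4 vehicles, 3 raw features
A = np.array([[0, 1, 1, 0],                        # which vehicles interact
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = rng.normal(size=(3, 8)) * 0.1
print(gcn_layer(X, A, W).shape)                    # (4, 8) node embeddings
```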
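Similarly, the shielding mechanism discussed in the Safety and Interpretability item can be pictured as a thin filter between the learned policy and the actuators: a candidate action is checked against a safety specification and overridden when the specification would be violated. The time-headway rule used below is an illustrative assumption, not the specification of any cited work.

```python
# A minimal action-shield sketch with an assumed minimum time-headway rule.
def shielded_action(policy_action, gap_m, ego_speed_mps, min_headway_s=2.0):
    """Override an 'accelerate' command whenever the time headway to the
    leading vehicle would fall below min_headway_s; otherwise pass the
    learned policy's action through unchanged."""
    headway = gap_m / max(ego_speed_mps, 0.1)      # seconds to the leader
    if policy_action == "accelerate" and headway < min_headway_s:
        return "brake"                             # shield overrides unsafe action
    return policy_action

# The learned policy wants to accelerate, but the 1.1 s headway is unsafe,
# so the shield substitutes a braking action.
print(shielded_action("accelerate", gap_m=22.0, ego_speed_mps=20.0))  # -> "brake"
```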
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; p. 30. [Google Scholar]
- Haarnoja, T.; Ha, S.; Zhou, A.; Tan, J.; Tucker, G.; Levine, S. Learning to Walk via Deep Reinforcement Learning. arXiv 2019, arXiv:1812.11103. [Google Scholar]
- Mocanu, E.; Mocanu, D.C.; Nguyen, P.H.; Liotta, A.; Webber, M.E.; Gibescu, M.; Slootweg, J.G. On-line building energy optimization using deep reinforcement learning. IEEE Trans. Smart Grid. 2018, 10, 3698–3708. [Google Scholar] [CrossRef]
- Perez-Liebana, D.; Hofmann, K.; Mohanty, S.P.; Kuno, N.; Kramer, A.; Devlin, S.; Gaina, R.D.; Ionita, D. The Multi-Agent Reinforcement Learning in Malmö (MARLÖ) Competition. arXiv 2019, arXiv:1901.08129. [Google Scholar]
- Arulkumaran, K.; Cully, A.; Togelius, J. AlphaStar: An Evolutionary Computation Perspective. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Prague, Czech Republic, 13–17 July 2019; Association for Computing Machinery: New York, NY, USA, 2019. [Google Scholar]
- Park, K.H.; Kim, Y.J.; Kim, J.H. Modular Q-Learning Based Multi-Agent Cooperation for Robot Soccer. Robot. Auton. Syst. 2001, 35, 109–122. [Google Scholar] [CrossRef]
- Cui, J.; Liu, Y.; Nallanathan, A. Multi-Agent Reinforcement Learning-Based Resource Allocation for UAV Networks. IEEE Trans. Wirel. Commun. 2020, 19, 729–743. [Google Scholar] [CrossRef]
- Arvind, C.S.; Senthilnath, J. Autonomous RL: Autonomous Vehicle Obstacle Avoidance in a Dynamic Environment Using MLP-SARSA Reinforcement Learning. In Proceedings of the 2019 15th International Conference on Mechatronics System and Robots (ICMSR), Singapore, 3–5 May 2019; IEEE: New York, NY, USA, 2019. [Google Scholar]
- Petrillo, A.; Salvi, A.; Santini, S.; Valente, A.S. Adaptive Multi-Agents Synchronization for Collaborative Driving of Autonomous Vehicles with Multiple Communication Delays. Transp. Res. Part C Emerg. Technol. 2018, 86, 372–392. [Google Scholar] [CrossRef]
- Pomerleau, D.A. ALVINN: An Autonomous Land Vehicle in a Neural Network. In Advances in Neural Information Processing Systems; Morgan-Kaufmann: Burlington, MA, USA, 1988; Volume 1. [Google Scholar]
- The DARPA Grand Challenge: Ten Years Later. Available online: https://www.darpa.mil/news-events/2014-03-13 (accessed on 31 January 2023).
- Singh, S.; Saini, B.S. Autonomous Cars: Recent Developments, Challenges, and Possible Solutions. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2021; p. 1022. [Google Scholar]
- Zhang, J.; Wang, F.Y.; Wang, K.; Lin, W.H.; Xu, X.; Chen, C. Data-Driven Intelligent Transportation Systems: A Survey. IEEE Trans. Intell. Transp. Syst. 2011, 12, 1624–1639. [Google Scholar] [CrossRef]
- Hernandez-Leal, P.; Kaisers, M.; Baarslag, T.; Munoz de Cote, E. A Survey of Learning in Multi-agent Environments: Dealing with Non-Stationarity. arXiv 2019, arXiv:1707.09183. [Google Scholar]
- Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multi-agent Systems: A Review of Challenges, Solutions, and Applications. IEEE Trans. Cybern. 2020, 50, 3826–3839. [Google Scholar] [CrossRef]
- Du, W.; Ding, S. A Survey on Multi-Agent Deep Reinforcement Learning: From the Perspective of Challenges and Applications. Artif. Intell. Rev. 2021, 54, 3215–3238. [Google Scholar] [CrossRef]
- Gronauer, S.; Diepold, K. Multi-Agent Deep Reinforcement Learning: A Survey. Artif. Intell. Rev. 2022, 55, 895–943. [Google Scholar] [CrossRef]
- Wong, A.; Bäck, T.; Kononova, A.V.; Plaat, A. Deep Multi-Agent Reinforcement Learning: Challenges and Directions. Artif. Intell. Rev. 2023, 56, 5023–5056. [Google Scholar] [CrossRef]
- Althamary, I.; Huang, C.W.; Lin, P. A Survey on Multi-Agent Reinforcement Learning Methods for Vehicular Networks. In Proceedings of the 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), Tangier, Morocco, 24–28 June 2019; pp. 1154–1159. [Google Scholar]
- Li, T.; Zhu, K.; Luong, N.C.; Niyato, D.; Wu, Q.; Zhang, Y.; Chen, B. Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2022, 24, 1240–1279. [Google Scholar] [CrossRef]
- Schmidt, L.M.; Brosig, J.; Plinge, A.; Eskofier, B.M.; Mutschler, C. An Introduction to Multi-Agent Reinforcement Learning and Review of Its Application to Autonomous Mobility. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; pp. 1342–1349. [Google Scholar]
- Dinneweth, J.; Boubezoul, A.; Mandiau, R.; Espié, S. Multi-Agent Reinforcement Learning for Autonomous Vehicles: A Survey. Auton. Intell. Syst. 2022, 2, 27. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Littman, M.L. Markov Games as a Framework for Multi-Agent Reinforcement Learning. In Machine Learning Proceedings 1994; Cohen, W.W., Hirsh, H., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 1994; pp. 157–163. [Google Scholar]
- Gupta, J.K.; Egorov, M.; Kochenderfer, M. Cooperative Multi-Agent Control Using Deep Reinforcement Learning. In Autonomous Agents and Multi-Agent Systems; Sukthankar, G., Rodriguez-Aguilar, J.A., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2017; pp. 66–83. [Google Scholar]
- Strouse, D.J.; Kleiman-Weiner, M.; Tenenbaum, J.; Botvinick, M.; Schwab, D. Learning to Share and Hide Intentions Using Information Regularization. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, New York, NY, USA, 3–8 December 2018; Curran Associates Inc.: New York, NY, USA, 2018; pp. 10270–10281. [Google Scholar]
- Omidshafiei, S.; Pazis, J.; Amato, C.; How, J.P.; Vian, J. Deep Decentralized Multi-Task Multi-Agent Reinforcement Learning under Partial Observability. In Proceedings of the 34th International Conference on Machine Learning, PMLR, 2017, Sydney, NSW, Australia, 6–11 August 2017; pp. 2681–2690. [Google Scholar]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. arXiv 2019, arXiv:1509.02971. [Google Scholar]
- Konda, V.; Tsitsiklis, J. Actor-Critic Algorithms. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1999; Volume 12. [Google Scholar]
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.F.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-Decomposition Networks for Cooperative Multi-Agent Learning. arXiv 2017, arXiv:1706.05296. [Google Scholar]
- Iqbal, S.; Sha, F. Actor-Attention-Critic for Multi-Agent Reinforcement Learning. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 2961–2970. [Google Scholar]
- Li, J.; Kuang, K.; Wang, B.; Liu, F.; Chen, L.; Wu, F.; Xiao, J. Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Online, 14–18 August 2021; ACM: New York, NY, USA, 2021. [Google Scholar]
- Huang, S.; Zhang, H.; Huang, Z. Multi-UAV Collision Avoidance Using Multi-Agent Reinforcement Learning with Counterfactual Credit Assignment. arXiv 2022, arXiv:2204.08594. [Google Scholar]
- Feng, L.; Xie, Y.; Liu, B.; Wang, S. Multi-Level Credit Assignment for Cooperative Multi-Agent Reinforcement Learning. Appl. Sci. 2022, 12, 6938. [Google Scholar] [CrossRef]
- Azzam, R.; Boiko, I.; Zweiri, Y. Swarm Cooperative Navigation Using Centralized Training and Decentralized Execution. Drones 2023, 7, 193. [Google Scholar] [CrossRef]
- Zhang, H.; Feng, S.; Liu, C.; Ding, Y.; Zhu, Y.; Zhou, Z.; Zhang, W.; Yu, Y.; Jin, H.; Li, Z. CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario. In Proceedings of The World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 3620–3624. [Google Scholar]
- Krajzewicz, D. Traffic Simulation with SUMO—Simulation of Urban Mobility. In Fundamentals of Traffic Simulation; Barceló, J., Ed.; International Series in Operations Research & Management Science; Springer: New York, NY, USA, 2010; pp. 269–293. [Google Scholar]
- Wu, C.; Kreidieh, A.; Parvate, K.; Vinitsky, E.; Bayen, A.M. Flow: A Modular Learning Framework for Mixed Autonomy Traffic. IEEE Trans. Robot. 2022, 38, 1270–1286. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An Open Urban Driving Simulator. arXiv 2017, arXiv:1711.03938. [Google Scholar]
- Eleurent/Highway-Env: A Minimalist Environment for Decision-Making in Autonomous Driving. Available online: https://github.com/eleurent/highway-env (accessed on 31 January 2023).
- Igilitschenski/Multi_car_racing: An OpenAI Gym Environment for Multi-Agent Car Racing Based on Gym’s Original Car Racing Environment. Available online: https://github.com/igilitschenski/multi_car_racing (accessed on 31 January 2023).
- Zhan, W.; Sun, L.; Wang, D.; Shi, H.; Clausse, A.; Naumann, M.; Kummerle, J.; Konigshof, H.; Stiller, C.; de La Fortelle, A.; et al. INTERACTION Dataset: An INTERnational, Adversarial and Cooperative MoTION Dataset in Interactive Driving Scenarios with Semantic Maps. arXiv 2019, arXiv:1910.03088. [Google Scholar]
- Gym-Graph-Traffic. Available online: https://github.com/rltraffic/gym-graph-traffic (accessed on 9 January 2023).
- Lopez, N.G.; Erro Nuin, Y.L.; Barba Moral, E.; Usategui San Juan, L.; Solano Rueda, A.; Mayoral Vilches, V.; Kojcev, R. Gym-Gazebo2, a Toolkit for Reinforcement Learning Using ROS 2 and Gazebo. arXiv 2019, arXiv:1903.06278. [Google Scholar]
- Fellendorf, M.; Vortisch, P. Microscopic traffic flow simulator VISSIM. Fundam. Traffic Simul. 2010, 145, 63–93. [Google Scholar]
- Gietelink, O.J.; Verburg, D.J.; Labibes, K.; Oostendorp, A.F. Pre-crash system validation with PRESCAN and VEHIL. In IEEE Intelligent Vehicles Symposium; IEEE: New York, NY, USA, 2004; pp. 913–918. [Google Scholar]
- Shalev-Shwartz, S.; Shammah, S.; Shashua, A. Safe, multi-agent, reinforcement learning for autonomous driving. arXiv 2016, arXiv:1610.03295. [Google Scholar]
- Peake, A.; McCalmon, J.; Raiford, B.; Liu, T.; Alqahtani, S. Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control. In Proceedings of the 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore, MD, USA, 9–11 November 2020; pp. 15–22. [Google Scholar]
- Chen, Y.F.; Liu, M.; Everett, M.; How, J.P. Decentralized Non-Communicating Multiagent Collision Avoidance with Deep Reinforcement Learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; IEEE: New York, NY, USA, 2017; pp. 285–292. [Google Scholar]
- Chen, S.; Dong, J.; Ha, P.Y.J.; Li, Y.; Labi, S. Graph Neural Network and Reinforcement Learning for Multi-Agent Cooperative Control of Connected Autonomous Vehicles. Comput. Aided Civ. Infrastruct. Eng. 2021, 36, 838–857. [Google Scholar] [CrossRef]
- Troullinos, D.; Chalkiadakis, G.; Papamichail, I.; Papageorgiou, M. Collaborative Multi-Agent Decision Making for Lane-Free Autonomous Driving. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ‘21, London, UK, 3–7 May 2021; International Foundation for Autonomous Agents and Multi-Agent Systems: Richland, SC, USA, 2021; pp. 1335–1343. [Google Scholar]
- Li, M.; Cao, Z.; Li, Z. A Reinforcement Learning-Based Vehicle Platoon Control Strategy for Reducing Energy Consumption in Traffic Oscillations. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 5309–5322. [Google Scholar] [CrossRef]
- Thakkar, R.S.; Samyal, A.S.; Fridovich-Keil, D.; Xu, Z.; Topcu, U. Hierarchical Control for Head-to-Head Autonomous Racing. arXiv 2022, arXiv:2202.12861. [Google Scholar]
- Zhou, W.; Chen, D.; Yan, J.; Li, Z.; Yin, H.; Ge, W. Multi-Agent Reinforcement Learning for Cooperative Lane Changing of Connected and Autonomous Vehicles in Mixed Traffic. Auton. Intell. Syst. 2022, 2, 5. [Google Scholar] [CrossRef]
- Shou, Z.; Chen, X.; Fu, Y.; Di, X. Multi-Agent Reinforcement Learning for Markov Routing Games: A New Modeling Paradigm for Dynamic Traffic Assignment. Transp. Res. Part C Emerg. Technol. 2022, 137, 103560. [Google Scholar] [CrossRef]
- Şehriyaroğlu, M.; Genç, Y. Cooperative Multi-Agent Reinforcement Learning for Autonomous Cars Passing on Narrow Road. In Smart Applications with Advanced Machine Learning and Human-Centred Problem Design; Hemanth, J.D., Kose, U., Watada, J., Patrut, B., Eds.; Engineering Cyber-Physical Systems and Critical Infrastructures; Springer International Publishing: Cham, Switzerland, 2023; pp. 533–540. [Google Scholar]
- Han, S.; Wang, H.; Su, S.; Shi, Y.; Miao, F. Stable and efficient Shapley value-based reward reallocation for multi-agent reinforcement learning of autonomous vehicles. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; IEEE: New York, NY, USA, 2022; pp. 8765–8771. [Google Scholar]
- Toghi, B.; Valiente, R.; Sadigh, D.; Pedarsani, R.; Fallah, Y.P. Altruistic Maneuver Planning for Cooperative Autonomous Vehicles Using Multi-Agent Advantage Actor-Critic. arXiv 2021, arXiv:2107.05664. [Google Scholar]
- Xie, A.; Losey, D.; Tolsma, R.; Finn, C.; Sadigh, D. Learning Latent Representations to Influence Multi-Agent Interaction. In Proceedings of the 2020 Conference on Robot Learning, PMLR, 2021, Auckland, New Zealand, 14–18 December 2020; pp. 575–588. [Google Scholar]
- Ma, H.; Sun, Y.; Li, J.; Tomizuka, M.; Choi, C. Continual Multi-Agent Interaction Behavior Prediction With Conditional Generative Memory. IEEE Robot. Autom. Lett. 2021, 6, 8410–8417. [Google Scholar] [CrossRef]
- Jia, X.; Sun, L.; Zhao, H.; Tomizuka, M.; Zhan, W. Multi-Agent Trajectory Prediction by Combining Egocentric and Allocentric Views. In Proceedings of the 5th Conference on Robot Learning, PMLR 2022, London, UK, 8–11 November 2022; pp. 1434–1443. [Google Scholar]
- Mo, X.; Huang, Z.; Xing, Y.; Lv, C. Multi-Agent Trajectory Prediction with Heterogeneous Edge-Enhanced Graph Attention Network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 9554–9567. [Google Scholar] [CrossRef]
- Wiederer, J.; Bouazizi, A.; Troina, M.; Kressel, U.; Belagiannis, V. Anomaly Detection in Multi-Agent Trajectories for Automated Driving. In Proceedings of the 5th Conference on Robot Learning, PMLR 2022, London, UK, 8–11 November 2022; pp. 1223–1233. [Google Scholar]
- Guillen-Perez, A.; Cano, M.-D. Multi-Agent Deep Reinforcement Learning to Manage Connected Autonomous Vehicles at Tomorrow’s Intersections. IEEE Trans. Veh. Technol. 2022, 71, 7033–7043. [Google Scholar]
- Xu, Y.; Zhou, H.; Ma, T.; Zhao, J.; Qian, B.; Shen, X. Leveraging Multi-Agent Learning for Automated Vehicles Scheduling at Nonsignalized Intersections. IEEE Internet Things J. 2021, 8, 11427–11439. [Google Scholar] [CrossRef]
- Chen, B.; Xu, M.; Liu, Z.; Li, L.; Zhao, D. Delay-Aware Multi-Agent Reinforcement Learning for Cooperative and Competitive Environments. arXiv 2020, arXiv:2005.05441. [Google Scholar]
- Cui, J.; Macke, W.; Yedidsion, H.; Urieli, D.; Stone, P. Scalable Multi-Agent Driving Policies for Reducing Traffic Congestion. arXiv 2022, arXiv:2103.00058. [Google Scholar]
- Chandra, R.; Manocha, D. GamePlan: Game-Theoretic Multi-Agent Planning with Human Drivers at Intersections, Roundabouts, and Merging. IEEE Robot. Autom. Lett. 2022, 7, 2676–2683. [Google Scholar] [CrossRef]
- Van der Pol, E.; Oliehoek, F.A. Coordinated Deep Reinforcement Learners for Traffic Light Control. In Proceedings of the Learning, Inference and Control of Multi-Agent Systems (at NIPS 2016) 8, Barcelona, Spain, 5–10 December 2016; pp. 21–38. [Google Scholar]
- Prabuchandran, K.J.; Kumar, H.A.N.; Bhatnagar, S. Multi-Agent Reinforcement Learning for Traffic Signal Control. In Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), Qingdao, China, 8–11 October 2014; IEEE: New York, NY, USA, 2014; pp. 2529–2534. [Google Scholar]
- Chu, T.; Wang, J.; Codecà, L.; Li, Z. Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1086–1095. [Google Scholar] [CrossRef]
- Wang, Y.; Cai, P.; Lu, G. Cooperative Autonomous Traffic Organization Method for Connected Automated Vehicles in Multi-Intersection Road Networks. Transp. Res. Part C Emerg. Technol. 2020, 111, 458–476. [Google Scholar] [CrossRef]
- Wang, T.; Cao, J.; Hussain, A. Adaptive Traffic Signal Control for Large-Scale Scenario with Cooperative Group-Based Multi-Agent Reinforcement Learning. Transp. Res. Part C Emerg. Technol. 2021, 125, 103046. [Google Scholar] [CrossRef]
- Lin, K.; Zhao, R.; Xu, Z.; Zhou, J. Efficient large-scale fleet management via multi-agent deep reinforcement learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1774–1783. [Google Scholar]
- Tang, X.; Yang, K.; Wang, H.; Wu, J.; Qin, Y.; Yu, W.; Cao, D. Prediction-Uncertainty-Aware Decision-Making for Autonomous Vehicles. IEEE Trans. Intell. Veh. 2022, 7, 849–862. [Google Scholar] [CrossRef]
- Zhang, X.; Zhao, C.; Liao, F.; Li, X.; Du, Y. Online Parking Assignment in an Environment of Partially Connected Vehicles: A Multi-Agent Deep Reinforcement Learning Approach. Transp. Res. Part C Emerg. Technol. 2022, 138, 103624. [Google Scholar] [CrossRef]
- Zhang, L.; Zhang, R.; Wu, T.; Weng, R.; Han, M.; Zhao, Y. Safe Reinforcement Learning with Stability Guarantee for Motion Planning of Autonomous Vehicles. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 5435–5444. [Google Scholar] [CrossRef]
- Zhu, H.; Han, T.; Alhajyaseen, W.K.M.; Iryo-Asano, M.; Nakamura, H. Can Automated Driving Prevent Crashes with Distracted Pedestrians? An Exploration of Motion Planning at Unsignalized Mid-Block Crosswalks. Accid. Anal. Prev. 2022, 173, 106711. [Google Scholar] [CrossRef]
- Bautista-Montesano, R.; Galluzzi, R.; Ruan, K.; Fu, Y.; Di, X. Autonomous Navigation at Unsignalized Intersections: A Coupled Reinforcement Learning and Model Predictive Control Approach. Transp. Res. Part C Emerg. Technol. 2022, 139, 103662. [Google Scholar] [CrossRef]
- Elsayed-Aly, I.; Bharadwaj, S.; Amato, C.; Ehlers, R.; Topcu, U.; Feng, L. Safe Multi-Agent Reinforcement Learning via Shielding. arXiv 2021, arXiv:2101.11196. [Google Scholar]
- Bernhard, J.; Esterle, K.; Hart, P.; Kessler, T. BARK: Open Behavior Benchmarking in Multi-Agent Environments. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 6201–6208. [Google Scholar]
- Yan, Z.; Kreidieh, A.R.; Vinitsky, E.; Bayen, A.M.; Wu, C. Unified Automatic Control of Vehicular Systems With Reinforcement Learning. IEEE Trans. Autom. Sci. Eng. 2022, 4, 1–16. [Google Scholar] [CrossRef]
- Palanisamy, P. Multi-Agent Connected Autonomous Driving Using Deep Reinforcement Learning. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–7. [Google Scholar]
- Diehl, C.; Sievernich, T.; Krüger, M.; Hoffmann, F.; Bertram, T. UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning. arXiv 2021, arXiv:2111.11097. [Google Scholar]
- Bhalla, S.; Subramanian, S.G.; Crowley, M. Deep Multi Agent Reinforcement Learning for Autonomous Driving. In Advances in Artificial Intelligence; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; pp. 67–78. [Google Scholar]
- Boehmer, W.; Kurin, V.; Whiteson, S. Deep Coordination Graphs. In Proceedings of the 37th International Conference on Machine Learning, PMLR, 2020, Online, 13–18 July 2020; pp. 980–991. [Google Scholar]
- Kok, J.R.; Vlassis, N. Using the Max-Plus Algorithm for Multi-agent Decision Making in Coordination Graphs. In RoboCup 2005: Robot Soccer World Cup IX; Bredenfeld, A., Jacoff, A., Noda, I., Takahashi, Y., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–12. [Google Scholar]
- Park, M.G.; Jeon, J.H.; Lee, M.C. Obstacle Avoidance for Mobile Robots Using Artificial Potential Field Approach with Simulated Annealing. In Proceedings of the ISIE 2001. 2001 IEEE International Symposium on Industrial Electronics Proceedings (Cat. No.01TH8570), Pusan, Republic of Korea, 12–16 June 2001; Volume 3, pp. 1530–1535. [Google Scholar]
- Hart, S. Shapley Value. In Game Theory; The New Palgrave; Eatwell, J., Milgate, M., Newman, P., Eds.; Palgrave Macmillan: London, UK, 1989. [Google Scholar]
- Lowe, R.; Wu, Y.I.; Tamar, A.; Harb, J.; Abbeel, P.; Mordatch, I. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Li, S.; Wu, Y.; Cui, X.; Dong, H.; Fang, F.; Russell, S. Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 29–31 January 2019; Volume 33, pp. 4213–4220. [Google Scholar]
- Foerster, J.; Farquhar, G.; Afouras, T.; Nardelli, N.; Whiteson, S. Counterfactual Multi-Agent Policy Gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–3 February 2018; Volume 32. [Google Scholar]
- Deo, N.; Trivedi, M.M. Multi-Modal Trajectory Prediction of Surrounding Vehicles with Maneuver Based LSTMs. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Suzhou, China, 26–30 June 2018; pp. 1179–1184. [Google Scholar]
- Kothari, P.; Kreiss, S.; Alahi, A. Human Trajectory Forecasting in Crowds: A Deep Learning Perspective. arXiv 2021, arXiv:2007.03639. [Google Scholar] [CrossRef]
- U.S. Department of Transportation Federal Highway Administration. Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data; Federal Highway Administration: Washington, DC, USA, 2016. [CrossRef]
- Węglarczyk, S. Kernel Density Estimation and Its Application; Zielinski, W., Kuchar, L., Michalski, A., Kazmierczak, B., Eds.; ITM Web of Conferences: Princeton, NJ, USA, 2018; Volume 23. [Google Scholar]
- Oliehoek, F.A.; Whiteson, S.; Spaan, M.T.J. Approximate solutions for factored Dec-POMDPs with many agents. In Proceedings of the AAMAS, Saint Paul, MN, USA, 6–10 May 2013; pp. 563–570. [Google Scholar]
- Richter, S. Learning Traffic Control-Towards Practical Traffic Control Using Policy Gradients; Albert-Ludwigs-Universitat Freiburg: Breisgau, Germany, 2006. [Google Scholar]
- Camacho, E.F.; Alba, C.B. Model Predictive Control; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870. [Google Scholar]
- Ogren, P.; Egerstedt, M.; Hu, X. A control Lyapunov function approach to multi-agent coordination. In Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No. 01CH37228), Orlando, FL, USA, 4–7 December 2001; IEEE: New York, NY, USA, 2001; Volume 2, pp. 1150–1155. [Google Scholar]
- Browne, C.B.; Powley, E.; Whitehouse, D.; Lucas, S.M.; Cowling, P.I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; Colton, S. A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 2012, 4, 1–43. [Google Scholar] [CrossRef]
- Zhang, S.; Tong, H.; Xu, J.; Maciejewski, R. Graph Convolutional Networks: A Comprehensive Review. Comput. Soc. Netw. 2019, 6, 11. [Google Scholar] [CrossRef]
- Zhang, C.; Yu, J.J.Q.; Liu, Y. Spatial-Temporal Graph Attention Networks: A Deep Learning Approach for Traffic Forecasting. IEEE Access 2019, 7, 166246–166256. [Google Scholar] [CrossRef]
- Nagabandi, A.; Clavera, I.; Liu, S.; Fearing, R.S.; Abbeel, P.; Levine, S.; Finn, C. Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning. arXiv 2019, arXiv:1803.11347. [Google Scholar]
- Arndt, K.; Hazara, M.; Ghadirzadeh, A.; Kyrki, V. Meta Reinforcement Learning for Sim-to-Real Domain Adaptation. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 2725–2731. [Google Scholar]
- Schoettler, G.; Nair, A.; Ojea, J.A.; Levine, S.; Solowjow, E. Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 9728–9735. [Google Scholar]
- Aggarwal, C.C.; Kong, X.; Gu, Q.; Han, J.; Yu, P.S. Active learning: A survey. In Data Classification; Chapman and Hall/CRC: Boca Raton, FL, USA, 2014; pp. 599–634. [Google Scholar]
- Verma, A.; Murali, V.; Singh, R.; Kohli, P.; Chaudhuri, S. Programmatically Interpretable Reinforcement Learning. In Proceedings of the 35th International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 5045–5054. [Google Scholar]
Reference No. | Key Features | Scope | Flexibility | Accessibility |
---|---|---|---|---|
Bernhard et al. [81] | Systematic evaluation and improvement of vehicle behavior models. | Vehicle behavior models. | Capable of extending to future behavior models beyond the original reference implementations. | Open source. |
Yan et al. [82] | Provides unified multi-agent, multi-task reinforcement learning methodologies to simulate vehicular systems in mixed autonomy traffic. | Various deep reinforcement learning algorithm implementations for decision-making tasks in AVs. | Capable of extending to more complex traffic scenarios. | Open source. |
Palanisamy [83] | Provides a multi-agent autonomous driving platform for simulating various kinds of driving environments and diverse types of agents. | Various driving environments with diverse driving agents. | Capable of introducing more complex types of driving environments depending on user’s need. | Open source. |
Diehl et al. [84] | Provides an uncertainty-aware model-based offline planning framework for tackling uncertainties in real-world driving scenarios. | Uncertainty-aware autonomous driving framework. | Difficult to extend the existing capabilities due to the complex implementation. | Open source. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).