Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference
<p>Illustration of textual narrative generation.</p> "> Figure 2
<p>Zero-shot (ZS).</p> "> Figure 3
<p>Zero-shot with CoT (ZS_CoT).</p> "> Figure 4
<p>Zero-shot with prompt engineering (ZS_PE).</p> "> Figure 5
<p>Zero-shot with prompt engineering & CoT (ZS_PE_CoT).</p> "> Figure 6
<p>Few shot (FS).</p> "> Figure 7
<p>Exemplar responses of LLMs in different settings.</p> "> Figure 8
<p>Effect of PE or CoT separately.</p> "> Figure 9
<p>Performance comparison of models in ZS, ZS_PE, and ZS_PE_CoT.</p> "> Figure 10
<p>Word cloud for correctly inferred “Minor or non-injury accident” in the ZS_CoT setting.</p> "> Figure 11
<p>Word cloud for correctly inferred “Serious injury accident” in the ZS_CoT setting.</p> "> Figure 12
<p>Word cloud for correctly inferred “Fatal accident” in the ZS_CoT setting.</p> "> Figure 13
<p>Output examples for fatal accidents from LLaMA3-70B in ZS_CoT setting.</p> ">
Abstract
:1. Introduction
2. Methods
2.1. Model Descriptions
2.2. In-Context Learning
2.3. Chain-of-Thoughts (CoT)
2.4. Prompt Engineering (PE)
3. Data
3.1. Dataset
3.2. Textual Narrative Generation
4. Experiments
4.1. Experiments Design
4.2. Prompts for LLMs
4.2.1. Zero-Shot
4.2.2. Few-Shot
4.3. Evaluation Metrics
- : The number of correctly classified instances in the test dataset.
- : The total number of instances in the test dataset.
- : The number of correctly predicted instances of the class.
- : The number of instances wrongly classified into the class.
- : The number of instances of the class wrongly classified as something else.
5. Findings
5.1. Exemplar Responses of LLMs to Crash Severity Inference Queries
5.2. Severity Inference Performance of the LLMs with Different Strategies
5.3. Effectiveness of Prompt Engineering (PE) and Chain-of-Thought (CoT)
5.4. Zero-Shot vs. Few-Shot Learning
6. Discussions
6.1. Can LLMs with CoT Yield Logical Reasoning for Their Inference Outcomes?
- Crash-related factors (e.g., “rear-end collision”, “pedestrian”, “opposite directions”, “corner”)
- Environmental conditions (e.g., “wet road surface”, “rain”, “dark”, “stop-go”)
- Driver behavior (e.g., “failing to yield”, “misjudgment”, “turning”, “give way”, “excessive speed”)
- Driver characteristics (e.g., “male” “older driver”, “age”)
- Vehicle factors (e.g., “bus”, “headlights”, “seatbelt”)
- Road design elements (e.g., “traffic lights”,“intersection”, “curved road”, “t intersection”)
6.2. Limitations and Future Research
- Expanding the dataset to include a larger and more diverse set of samples will allow for a more comprehensive evaluation of the models’ capabilities and improve the robustness of the results.
- Fine-tuning LLMs with more extensive and domain-specific data (e.g., crash reports and databases) can significantly enhance their domain knowledge to better understand the nuances and specificities of traffic accidents, leading to more accurate and reliable reasoning and inference.
- Investigating explanation methods in conjunction with LLMs can yield more interpretable and trustworthy results.
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Mannering, F.; Bhat, C.R.; Shankar, V.; Abdel-Aty, M. Big data, traditional data and the tradeoffs between prediction and causality in highway-safety analysis. Anal. Methods Accid. Res. 2020, 25, 100113. [Google Scholar] [CrossRef]
- Golob, T.F.; Recker, W.W. Relationships among urban freeway accidents, traffic flow, weather, and lighting conditions. J. Transp. Eng. 2003, 129, 342–353. [Google Scholar] [CrossRef]
- Eluru, N.; Bhat, C.R. A joint econometric analysis of seat belt use and crash-related injury severity. Accid. Anal. Prev. 2007, 39, 1037–1049. [Google Scholar] [CrossRef] [PubMed]
- Lord, D.; Mannering, F. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transp. Res. Part Policy Pract. 2010, 44, 291–305. [Google Scholar] [CrossRef]
- Savolainen, P.T.; Mannering, F.L.; Lord, D.; Quddus, M.A. The statistical analysis of highway crash-injury severities: A review and assessment of methodological alternatives. Accid. Anal. Prev. 2011, 43, 1666–1676. [Google Scholar] [CrossRef] [PubMed]
- Karlaftis, M.G.; Golias, I. Effects of road geometry and traffic volumes on rural roadway accident rates. Accid. Anal. Prev. 2002, 34, 357–365. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Z.; Akinci, B.; Qian, S. Inferring heterogeneous treatment effects of work zones on crashes. Accid. Anal. Prev. 2022, 177, 106811. [Google Scholar] [CrossRef] [PubMed]
- Pervez, A.; Lee, J.; Huang, H. Exploring factors affecting the injury severity of freeway tunnel crashes: A random parameters approach with heterogeneity in means and variances. Accid. Anal. Prev. 2022, 178, 106835. [Google Scholar] [CrossRef] [PubMed]
- Goh, Y.M.; Ubeynarayana, C. Construction accident narrative classification: An evaluation of text mining techniques. Accid. Anal. Prev. 2017, 108, 122–130. [Google Scholar] [CrossRef] [PubMed]
- Zhang, X.; Green, E.; Chen, M.; Souleyrette, R.R. Identifying secondary crashes using text mining techniques. J. Transp. Saf. Secur. 2020, 12, 1338–1358. [Google Scholar] [CrossRef]
- Das, S.; Datta, S.; Zubaidi, H.A.; Obaid, I.A. Applying interpretable machine learning to classify tree and utility pole related crash injury types. IATSS Res. 2021, 45, 310–316. [Google Scholar] [CrossRef]
- OpenAI. GPT-3.5 Turbo Updates. 2023. Available online: https://platform.openai.com/docs/models/gpt-3-5-turbo (accessed on 14 June 2024).
- AI@Meta. Llama 3 Model Card. 2024. Available online: https://www.meta.ai (accessed on 14 June 2024).
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837. [Google Scholar]
- Wu, X.; Yao, W.; Chen, J.; Pan, X.; Wang, X.; Liu, N.; Yu, D. From language modeling to instruction following: Understanding the behavior shift in llms after instruction tuning. arXiv 2023, arXiv:2310.00492. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Yin, W. Meta-learning for few-shot natural language processing: A survey. arXiv 2020, arXiv:2007.09604. [Google Scholar]
- Kojima, T.; Gu, S.S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 2022, 35, 22199–22213. [Google Scholar]
- Qin, C.; Zhang, A.; Zhang, Z.; Chen, J.; Yasunaga, M.; Yang, D. Is ChatGPT a General-Purpose Natural Language Processing Task Solver? In Proceedings of the The 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023. [Google Scholar]
- Zhang, Z.; Zhang, A.; Li, M.; Zhao, H.; Karypis, G.; Smola, A. Multimodal Chain-of-Thought reasoning in language models. arXiv 2023, arXiv:2302.00923. [Google Scholar]
- Lyu, Q.; Havaldar, S.; Stein, A.; Zhang, L.; Rao, D.; Wong, E.; Apidianaki, M.; Callison-Burch, C. Faithful Chain-of-Thought Reasoning. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL 2023), Bali, Indonesia, 1–4 November 2023; Association for Computational Linguistics: Bali, Indonesia, 2023. [Google Scholar]
- Wang, Y.; Zhong, W.; Li, L.; Mi, F.; Zeng, X.; Huang, W.; Shang, L.; Jiang, X.; Liu, Q. Aligning large language models with human: A survey. arXiv 2023, arXiv:2307.12966. [Google Scholar]
- Shen, T.; Jin, R.; Huang, Y.; Liu, C.; Dong, W.; Guo, Z.; Wu, X.; Liu, Y.; Xiong, D. Large language model alignment: A survey. arXiv 2023, arXiv:2309.15025. [Google Scholar]
Variables | Description |
---|---|
Crash characteristics | |
ACCIDENT_TYPE | The type of accident. |
EVENT_TYPE | Type of incident event. |
VEHICLE_1_COLL_PT | Collision point on the first vehicle involved in the event. |
VEHICLE_2_COLL_PT | Collision point on the second vehicle involved in the event. |
OBJECT_TYPE | Object involved in the specific accident event. |
DCA | The definitions for classifying accidents. |
ACCIDENT_MONTH | The month in which the accident occurred, derived from “ACCIDENT_DATE”. |
TIME_PERIOD | The period the accident occurred, derived from “ACCIDENT_TIME”. |
DAY_OF_WEEK | The day of the week the accident occurred. |
LGA_NAME | The name of local government areas. |
REGION_NAME | The region where the accident occurred. |
DEG_URBAN_NAME | The type of urbanized area for the crash site. |
Driver characteristics | |
DRIVER_SEX | The sex of the driver. |
AGE_GROUP | The age group of the driver, derived from “DRIVER_AGE”. |
ROAD_USER_TYPE | The role of the person was at the time of the accident. |
Vehicle characteristics | |
VEHICLE_TYPE | The type or category of vehicle. |
VEHICLE_WEIGHT | The weight or mass of the vehicle. The unit of measurement is kilograms. |
NO_OF_WHEELS | The number of wheels that the vehicle has. |
SEATING_CAPACITY | The number of seats in the vehicle. |
FUEL_TYPE | The type of fuel used by the vehicle. |
VEHICLE_AGE | The age of the vehicle when the accident occurred. |
VEHICLE_BODY_STYLE | The body type of the vehicle. |
TRAILER_TYPE | The type of trailer towed by the vehicle involved in the accident. |
Roadway attributes | |
ROAD_TYPE | Type of the highest priority road at the intersection or the road the accident occurred. |
ROAD_GEOMETRY | The layout of the road where the accident occurred. |
SPEED_ZONE | The speed zone at the location of the accident. |
ROAD_SURFACE_TYPE | The type of road surface: 1: Paved 2: Unpaved 3: Gravel 9: Not known. |
ROAD_TYPE_INT | The type or suffix of the intersecting road. |
COMPLEX_INT_NO | Whether or not the segment is part of a complex intersection. |
Environmental factors | |
LIGHT_CONDITION | The light condition or level of brightness at the time of the accident. |
SURFACE_COND | Road surface condition: dry, wet, muddy, snowy, icy, unknown. |
SURFACE_COND_SEQ | Starts with 1 and incremented by 1 if more than one road surface condition. |
ATMOSPH_COND | Atmospheric condition. |
ATMOSPH_COND_SEQ | 1 and incremented by 1 if more than one atmospheric condition is entered. |
Situational factors | |
HELMET_BELT_WORN | Whether or not the person was wearing a helmet or seatbelt at the time of the accident. |
NO_OF_VEHICLES | The number of vehicles involved in the accident. |
LAMPS | Whether the lamps or headlights for the vehicle were alight (on). |
VEHICLE_MOVEMENT | The actual movement of the vehicle before the accident. |
TRAFFIC_CONTROL | The type of traffic control measure in the location where the accident occurred. |
NO_PERSONS | The number of people involved in the accident. |
NO_OCCUPANTS | The number of occupants or people in the vehicle at the time of the accident. |
SUB_DCA | SUB_DCA code and description of the accident. |
SUB_DCA_SEQ | Starts with 1 and is incremented by 1 if more than one sub_dca is entered. |
DRIVER_INTENT | The intent of the driver initially. |
Experiments Setting | ||||
Plain | Chain-of-Thought | Prompt Engineering | Prompt Engineering with Chain-of-Thought | |
Zero-shot | ZS | ZS_CoT | ZS_PE | ZS_PE_CoT |
Few-shot | FS | / | FS_PE | / |
Models | ||||
GPT-3.5-turbo | LLaMA3-8B | LLaMA3-70B | ||
Sampling Strategy | Greedy | Greedy | Greedy | |
(temperature, top_p) | (0, 0.01) | - | - |
Model | Macro F1-Score | Macro-Accuracy | Fatal Accident | Serious Injury Accident | Minor or Non-Injury Accident | |
---|---|---|---|---|---|---|
ZS | ||||||
GPT-3.5 | 0.1812 | 0.3400 | 0.00 | 1.00 | 0.02 | |
LLaMA3-8B | 0.1818 | 0.3400 | 0.00 | 1.00 | 0.02 | |
LLaMA3-70B | 0.4541 | 0.4533 | 0.44 | 0.34 | 0.58 | |
ZS_CoT | ||||||
GPT-3.5 | 0.2073 | 0.3533 | 0.00 | 1.00 | 0.06 | |
LLaMA3-8B | 0.2496 | 0.3533 | 0.00 | 0.88 | 0.18 | |
LLaMA3-70B | 0.4747 | 0.4733 | 0.40 | 0.64 | 0.38 | |
ZS_PE | ||||||
GPT-3.5 | 0.3798 | 0.4533 | 0.62 | 0.72 | 0.02 | |
LLaMA3-8B | 0.3120 | 0.4000 | 0.34 | 0.86 | 0.00 | |
LLaMA3-70B | 0.4755 | 0.4933 | 0.60 | 0.66 | 0.22 | |
ZS_PE_CoT | ||||||
GPT-3.5 | 0.3509 | 0.4200 | 0.68 | 0.56 | 0.02 | |
LLaMA3-8B | 0.4033 | 0.4533 | 0.60 | 0.68 | 0.08 | |
LLaMA3-70B | 0.3581 | 0.4267 | 0.62 | 0.64 | 0.02 | |
FS | ||||||
GPT-3.5 | 0.2514 | 0.3667 | 0.04 | 0.96 | 0.10 | |
LLaMA3-8B | 0.4068 | 0.4267 | 0.22 | 0.72 | 0.34 | |
LLaMA3-70B | 0.4131 | 0.4200 | 0.26 | 0.64 | 0.36 | |
FS_PE | ||||||
GPT-3.5 | 0.2576 | 0.3667 | 0.18 | 0.92 | 0.00 | |
LLaMA3-8B | 0.2928 | 0.3933 | 0.08 | 0.98 | 0.12 | |
LLaMA3-70B | 0.3856 | 0.4600 | 0.56 | 0.80 | 0.02 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhen, H.; Shi, Y.; Huang, Y.; Yang, J.J.; Liu, N. Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference. Computers 2024, 13, 232. https://doi.org/10.3390/computers13090232
Zhen H, Shi Y, Huang Y, Yang JJ, Liu N. Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference. Computers. 2024; 13(9):232. https://doi.org/10.3390/computers13090232
Chicago/Turabian StyleZhen, Hao, Yucheng Shi, Yongcan Huang, Jidong J. Yang, and Ninghao Liu. 2024. "Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference" Computers 13, no. 9: 232. https://doi.org/10.3390/computers13090232