Search Results (209)

Search Parameters:
Keywords = zero-shot

20 pages, 5650 KiB  
Article
Unleashing the Power of Contrastive Learning for Zero-Shot Video Summarization
by Zongshang Pang, Yuta Nakashima, Mayu Otani and Hajime Nagahara
J. Imaging 2024, 10(9), 229; https://doi.org/10.3390/jimaging10090229 (registering DOI) - 14 Sep 2024
Viewed by 160
Abstract
Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing. Past efforts have invariably involved training summarization models with annotated summaries or heuristic objectives. In this work, we reveal that features pre-trained on image-level tasks contain rich semantic information that can be readily leveraged to quantify frame-level importance for zero-shot video summarization. Leveraging pre-trained features and contrastive learning, we propose three metrics characterizing a desirable keyframe: local dissimilarity, global consistency, and uniqueness. We show that these metrics capture well the diversity and representativeness of frames commonly used for the unsupervised generation of video summaries, demonstrating competitive or better performance than past methods while requiring no training. We further propose a contrastive learning-based pre-training strategy on unlabeled videos to enhance the quality of the proposed metrics and thus improve performance on the public benchmarks TVSum and SumMe.
(This article belongs to the Special Issue Deep Learning in Computer Vision)
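
The three metrics lend themselves to a compact implementation on top of any pre-trained frame encoder. Below is a minimal sketch, assuming cosine similarity over L2-normalized frame embeddings; the function name, window size, and exact metric definitions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def keyframe_scores(features, window=5):
    """Three illustrative zero-shot keyframe metrics over (T, D) embeddings."""
    f = l2_normalize(features)
    sim = f @ f.T                                     # (T, T) cosine similarities
    T = sim.shape[0]
    local_dissim = np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        neighbors = np.delete(sim[t, lo:hi], t - lo)  # drop self-similarity
        local_dissim[t] = 1.0 - neighbors.mean()      # stands out locally
    global_consistency = sim.mean(axis=1)             # close to the video "gist"
    uniqueness = 1.0 - np.sort(sim, axis=1)[:, -2]    # far from its nearest frame
    return local_dissim, global_consistency, uniqueness

scores = keyframe_scores(np.random.randn(120, 512))   # 120 frames, 512-d features
```
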
20 pages, 2961 KiB  
Article
Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference
by Hao Zhen, Yucheng Shi, Yongcan Huang, Jidong J. Yang and Ninghao Liu
Computers 2024, 13(9), 232; https://doi.org/10.3390/computers13090232 (registering DOI) - 14 Sep 2024
Viewed by 218
Abstract
Harnessing the power of Large Language Models (LLMs), this study explores the use of three state-of-the-art LLMs, specifically GPT-3.5-turbo, LLaMA3-8B, and LLaMA3-70B, for crash severity analysis and inference, framing it as a classification task. We generate textual narratives from original traffic crash tabular data using a pre-built template infused with domain knowledge. Additionally, we incorporate Chain-of-Thought (CoT) reasoning to guide the LLMs in analyzing crash causes and then inferring severity. This study also examines the impact of prompt engineering specifically designed for crash severity inference. The LLMs were tasked with crash severity inference to (1) evaluate their capabilities in crash severity analysis, (2) assess the effectiveness of CoT and domain-informed prompt engineering, and (3) examine their reasoning abilities within the CoT framework. Our results show that LLaMA3-70B consistently outperformed the other models, particularly in zero-shot settings. The CoT and prompt engineering techniques significantly enhanced performance, improving logical reasoning and addressing alignment issues. Notably, CoT offers valuable insights into the LLMs' reasoning process, unleashing their capacity to consider diverse factors such as environmental conditions, driver behavior, and vehicle characteristics in severity analysis and inference.
(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)
Figures:

Figure 1. Illustration of textual narrative generation.
Figure 2. Zero-shot (ZS).
Figure 3. Zero-shot with CoT (ZS_CoT).
Figure 4. Zero-shot with prompt engineering (ZS_PE).
Figure 5. Zero-shot with prompt engineering & CoT (ZS_PE_CoT).
Figure 6. Few-shot (FS).
Figure 7. Exemplar responses of LLMs in different settings.
Figure 8. Effect of PE or CoT separately.
Figure 9. Performance comparison of models in ZS, ZS_PE, and ZS_PE_CoT.
Figure 10. Word cloud for correctly inferred "Minor or non-injury accident" in the ZS_CoT setting.
Figure 11. Word cloud for correctly inferred "Serious injury accident" in the ZS_CoT setting.
Figure 12. Word cloud for correctly inferred "Fatal accident" in the ZS_CoT setting.
Figure 13. Output examples for fatal accidents from LLaMA3-70B in the ZS_CoT setting.
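
The described pipeline (tabular record, templated narrative, zero-shot CoT prompt) is easy to picture in code. A sketch follows; the field names and template wording are hypothetical stand-ins for the paper's domain-informed template.

```python
# Hypothetical field names; the paper's template is domain-informed and richer.
CRASH_TEMPLATE = (
    "A crash occurred on a {road_type} under {weather} conditions at {time_of_day}. "
    "The driver, aged {driver_age}, was driving a {vehicle_type} at {speed} mph."
)

COT_INSTRUCTION = (
    "First analyze the likely causes of the crash step by step, considering "
    "environment, driver behavior, and vehicle factors. Then infer the severity "
    "as one of: minor/non-injury, serious injury, fatal. State the label last."
)

def build_zs_cot_prompt(record: dict) -> str:
    """Turn one tabular crash record into a zero-shot CoT prompt (ZS_CoT)."""
    narrative = CRASH_TEMPLATE.format(**record)
    return f"{narrative}\n\n{COT_INSTRUCTION}"

prompt = build_zs_cot_prompt({
    "road_type": "rural two-lane highway", "weather": "rainy",
    "time_of_day": "night", "driver_age": 23,
    "vehicle_type": "pickup truck", "speed": 65,
})
print(prompt)
```
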
4 pages, 797 KiB  
Proceeding Paper
Accelerating Urban Drainage Simulations: A Data-Efficient GNN Metamodel for SWMM Flowrates
by Alexander Garzón, Zoran Kapelan, Jeroen Langeveld and Riccardo Taormina
Eng. Proc. 2024, 69(1), 137; https://doi.org/10.3390/engproc2024069137 (registering DOI) - 13 Sep 2024
Viewed by 46
Abstract
Computational models for water resources often suffer from slow execution times, limiting their application. Metamodels, especially those based on machine learning, offer a promising alternative. Our research extends a prior Graph Neural Network (GNN) metamodel for the Storm Water Management Model (SWMM), which learns efficiently from less data and generalizes to new urban drainage system (UDS) sections via transfer learning. We extend the metamodel by adding flowrate prediction, which is crucial for assessing water quality and flooding risks. Using an Encoder–Processor–Decoder architecture, the metamodel achieves high accuracy on the simulated time series. Future work will incorporate more physical principles and test further transferability.
Figures:

Figure 1. Summary of the process to generate a prediction for one future time step of depths and flow rates; subsequent predictions are obtained by iteratively repeating this process. (a) The inputs: partial time series of runoff and water depths, and system information (topology, node elevation, pipe diameters, and lengths). These data are organized in windows and normalized before entering the artificial neural network. (b) The metamodel structure in three stages: Encoder, Processor, and Decoder. The Encoder is a set of two multilayer perceptrons, φ, that separately compute the embeddings of nodes (pink) and pipes (green). These embeddings are fed to the graph layer, which computes new node embeddings (gray). The output of this phase is then decoded by the Decoder, a set of two MLPs that transform the processed embeddings into raw predictions of the physical variables, i.e., depth (d*) and flow rate (q*); the asterisk indicates that these have not yet been post-processed. (c) The new predictions of depths and flow rates after post-processing. Given these values, the process repeats to determine the entire time series. Diagram adapted from [2] to illustrate the modification of the method.
Figure 2. Performance of the model for emulating flow rates during a validation rainfall event. (a) The distribution of Root Mean Square Error (RMSE) on the map of the storm water system; each point represents a pipe. (b) The original and emulated time series of flow rates for a pipe with one of the highest RMSEs (0.1 m³/s), marked in (a) with a cross.
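
The Encoder–Processor–Decoder pattern the abstract describes can be sketched in a few lines of PyTorch. The layer sizes, the single message-passing step, and the sum aggregation below are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class EncodeProcessDecode(nn.Module):
    """Minimal Encoder-Processor-Decoder over a pipe network (illustrative).
    Nodes carry runoff/depth windows; edges (pipes) carry diameter/length;
    outputs are raw depth d* and flowrate q* for one future time step."""
    def __init__(self, node_in, edge_in, hidden=64):
        super().__init__()
        self.node_enc = nn.Sequential(nn.Linear(node_in, hidden), nn.ReLU())
        self.edge_enc = nn.Sequential(nn.Linear(edge_in, hidden), nn.ReLU())
        self.processor = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.ReLU())
        self.node_dec = nn.Linear(hidden, 1)   # depth d* per node
        self.edge_dec = nn.Linear(hidden, 1)   # flowrate q* per pipe

    def forward(self, x_nodes, x_edges, edge_index):
        src, dst = edge_index                        # (E,), (E,)
        h_n, h_e = self.node_enc(x_nodes), self.edge_enc(x_edges)
        msg = self.processor(torch.cat([h_n[src], h_n[dst], h_e], dim=-1))
        agg = torch.zeros_like(h_n).index_add_(0, dst, msg)  # one MP step
        return self.node_dec(h_n + agg), self.edge_dec(msg)

model = EncodeProcessDecode(node_in=8, edge_in=2)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])    # 4 nodes, 3 pipes
d_star, q_star = model(torch.randn(4, 8), torch.randn(3, 2), edge_index)
```
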
23 pages, 13322 KiB  
Article
Entity Extraction of Key Elements in 110 Police Reports Based on Large Language Models
by Xintao Xing and Peng Chen
Appl. Sci. 2024, 14(17), 7819; https://doi.org/10.3390/app14177819 - 3 Sep 2024
Viewed by 400
Abstract
With the rapid advancement of Internet technology and the increasing volume of police reports, relying solely on extensive human labor and traditional natural language processing methods for key element extraction has become impractical. Applying advanced technologies such as large language models to improve the effectiveness of police report extraction has become an inevitable trend in the field of police data analysis. This study addresses the characteristics of Chinese police reports and the need to extract key elements by employing large language models specific to the public security domain for entity extraction. Several lightweight (6B/7B) open-source large language models were tested as base models. To enhance model performance, LoRA fine-tuning was employed, combined with data engineering approaches. A zero-shot data augmentation method based on ChatGPT and prompt engineering techniques tailored for police reports were proposed to further improve model performance. Key police report data from a certain city in 2019 were used as a test sample. Compared to the base models, prompt engineering improved the F1 score by approximately 3%, while fine-tuning led to an increase of 10–50%. After fine-tuning and comparing different base models, the Baichuan model demonstrated the best overall performance in extracting key elements from police reports. Using the data augmentation method to double the data size resulted in an additional 4% increase in the F1 score, achieving optimal model performance. Compared to the fine-tuned universal information extraction (UIE) large language model, the police report entity extraction model constructed in this study improved the F1 score for each element by approximately 5%, with a 42% improvement for the "organization" element. Finally, ChatGPT was employed to align the extracted entities, yielding a high-quality entity extraction outcome.
Figures:

Figure 1. Overall process diagram.
Figure 2. Overall process of zero-shot data augmentation based on ChatGPT.
Figure 3. ChatGPT-based entity alignment.
Figure 4. t-SNE dimensionality reduction distribution.
Figure 5. Comprehensive effectiveness of augmentation multipliers.
Figure A1. Original training loss per step by training session.
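
A LoRA fine-tuning setup in the spirit described above can be written with the Hugging Face peft library. The checkpoint id, rank, and target_modules below are assumptions (typical values for a Baichuan-style 7B model), not the paper's exact configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base = "baichuan-inc/Baichuan-7B"             # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)

lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8, lora_alpha=16, lora_dropout=0.05,    # typical values, not the paper's
    target_modules=["W_pack"],                # assumed attention projection name
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()            # only the adapter weights train
```
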
20 pages, 6718 KiB  
Article
Using Multimodal Large Language Models (MLLMs) for Automated Detection of Traffic Safety-Critical Events
by Mohammad Abu Tami, Huthaifa I. Ashqar, Mohammed Elhenawy, Sebastien Glaser and Andry Rakotonirainy
Vehicles 2024, 6(3), 1571-1590; https://doi.org/10.3390/vehicles6030074 - 2 Sep 2024
Viewed by 452
Abstract
Traditional approaches to safety event analysis in autonomous systems have relied on complex machine and deep learning models and extensive datasets for high accuracy and reliability. However, the emergence of multimodal large language models (MLLMs) offers a novel approach by integrating textual, visual, and audio modalities. Our framework leverages the logical and visual reasoning power of MLLMs, directing their output through object-level question–answer (QA) prompts to ensure accurate, reliable, and actionable insights for safety-critical event detection and analysis. By incorporating models like Gemini-Pro-Vision 1.5, we aim to automate safety-critical event detection and analysis while mitigating common issues such as hallucinations in MLLM outputs. The results demonstrate the framework's potential in different in-context learning (ICL) settings such as zero-shot and few-shot learning. We also investigate other settings, such as self-ensemble learning and a varying number of frames. The results show that the few-shot learning model consistently outperformed the other learning models, achieving the highest overall accuracy of about 79%. A comparative analysis with previous studies on visual reasoning revealed that earlier models showed moderate performance on driving safety tasks, while our proposed model significantly outperformed them. To the best of our knowledge, the proposed MLLM model is the first of its kind capable of handling multiple tasks for each safety-critical event: it can identify risky scenarios, classify diverse scenes, determine car directions, categorize agents, and recommend appropriate actions, setting a new standard in safety-critical event management. This study shows the significance of MLLMs in advancing the analysis of naturalistic driving videos to improve safety-critical event detection and understand interactions in complex environments.
(This article belongs to the Special Issue Vehicle Design Processes, 2nd Edition)
Figures:

Figure 1. Distribution of QA categories in the DRAMA dataset for traffic safety-critical event detection: (a) is risk, (b) suggested action, (c) direction of ego car, (d) scene description, and (e) agent type.
Figure 2. Automated multi-stage hazard detection framework for safety-critical events using MLLMs.
Figure 3. Conceptual 2-D diagram of augmented image prompting. The key idea of using different augmentations of the same scene under investigation is to direct the model to different places in the language distribution, which could help it generate a richer textual representation of the scene through local sampling. The colored areas show an example of how image augmentation can be done.
Figure 4. Example of a textual prompt with a two-frame scene and the corresponding response from Gemini.
Figure 5. Output from Gemini-Pro-Vision 1.5 analysis with a sliding window (n = 2). Gemini predicted (a), (b), and (d) as safety-critical events, while (c) is not.
Figure 6. Zero-shot learning performance across different numbers of frames.
Figure 7. Few-shot learning performance across different numbers of examples.
Figure 8. Comparison of zero-shot and few-shot methods across various metrics (top 3 highlighted).
Figure 9. Self-ensemble learning across different numbers of candidates with top-k voting.
Figure 10. Comparison of zero-shot (1-frame) and self-ensemble methods across various metrics (top bar highlighted).
Figure 11. Image-augmented learning performance with top-k voting.
Figure 12. Comparison of zero-shot (1-frame) and image-augmented methods across various metrics.
Figure 13. Overall performance comparison across different learning methods. The highlighted bars show the highest accuracy from each category.
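
The object-level QA prompting idea can be sketched as plain prompt assembly; the task names and questions below are hypothetical, and the actual framework feeds frames plus this text to an MLLM such as Gemini-Pro-Vision 1.5.

```python
# Illustrative object-level QA prompt assembly for one two-frame scene.
QA_TASKS = {
    "is_risk": "Is this a safety-critical event? Answer yes or no.",
    "scene": "Describe the scene type (e.g., intersection, highway).",
    "ego_direction": "Which direction is the ego car moving?",
    "agent_type": "What type of agent poses the risk, if any?",
    "suggested_action": "What action should the ego vehicle take?",
}

def build_prompt(few_shot_examples: list) -> str:
    shots = "\n\n".join(few_shot_examples)    # worked examples, if any
    questions = "\n".join(f"- {q}" for q in QA_TASKS.values())
    return (f"{shots}\n\nYou are given 2 consecutive driving frames. "
            f"Answer each question about the scene:\n{questions}")

print(build_prompt(["Frames: [car cutting in]. is_risk: yes. "
                    "suggested_action: brake and keep distance."]))
```
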
44 pages, 4286 KiB  
Article
Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data
by Shadi Jaradat, Richi Nayak, Alexander Paz, Huthaifa I. Ashqar and Mohammad Elhenawy
Smart Cities 2024, 7(5), 2422-2465; https://doi.org/10.3390/smartcities7050095 - 1 Sep 2024
Viewed by 709
Abstract
Road traffic crashes (RTCs) are a global public health issue, and traditional analysis methods are often hindered by delays and incomplete data. Leveraging social media for real-time traffic safety analysis offers a promising alternative, yet effective frameworks for this integration are scarce. This study introduces a novel multitask learning (MTL) framework utilizing large language models (LLMs) to analyze RTC-related tweets from Australia. We collected 26,226 traffic-related tweets from May 2022 to May 2023. Using GPT-3.5, we extracted fifteen distinct features categorized into six classification tasks and nine information retrieval tasks. These features were then used to fine-tune GPT-2 for language modeling, which outperformed baseline models, including GPT-4o mini in zero-shot mode and XGBoost, across most tasks. Unlike traditional single-task classifiers that may miss critical details, our MTL approach simultaneously classifies RTC-related tweets and extracts detailed information in natural language. Our fine-tuned GPT-2 model achieved an average accuracy of 85% across the six classification tasks, surpassing the baseline GPT-4o mini model's 64% and XGBoost's 83.5%. In the information retrieval tasks, our fine-tuned GPT-2 model achieved a BLEU-4 score of 0.22, a ROUGE-I score of 0.78, and a WER of 0.30, significantly outperforming the baseline GPT-4o mini model's BLEU-4 score of 0.0674, ROUGE-I score of 0.2992, and WER of 2.0715. These results demonstrate the efficacy of our fine-tuned GPT-2 model in enhancing both classification and information retrieval, offering valuable insights for data-driven decision-making to improve road safety. This study is the first to explicitly apply social media data and LLMs within an MTL framework to enhance traffic safety.
(This article belongs to the Section Smart Transportation)
Figures:

Figure 1. Proposed methodology flowchart.
Figure 2. Workflow for processing and analyzing tweets using GPT-3.5.
Figure 3. Tokenization and input preparation process.
Figure 4. Prompting GPT-4o mini and handling responses.
Figure 5. Confusion matrices for the six classification tasks.
Figure 6. Accuracy for the three models across the six classification tasks.
Figure 7. F1-score for the three models across the six classification tasks.
Figure 8. Average BLEU-4, ROUGE-I/ROUGE-L, and WER scores across information retrieval tasks.
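
One plausible way to set up such multitask language-model fine-tuning is to linearize each tweet with its task labels into a single training string. The sketch below is illustrative; the tasks and separator are assumptions, not the paper's fifteen features.

```python
def to_mtl_example(tweet: str, labels: dict) -> str:
    """Linearize one tweet plus multitask labels into a GPT-2 training string."""
    parts = [f"tweet: {tweet}"]
    for task, value in labels.items():
        parts.append(f"{task}: {value}")
    return " | ".join(parts)

example = to_mtl_example(
    "Multi-car pileup on the M1 near exit 4, expect long delays.",
    {"is_crash": "yes", "severity": "unknown",
     "location": "M1 near exit 4", "impact": "long delays"},
)
print(example)  # fine-tune GPT-2 on many such strings with a plain LM objective
```
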
22 pages, 31331 KiB  
Article
A Zero-Shot Learning Approach for Blockage Detection and Identification Based on the Stacking Ensemble Model
by Chaoqun Li, Zao Feng, Mingkai Jiang and Zhenglang Wang
Sensors 2024, 24(17), 5596; https://doi.org/10.3390/s24175596 - 29 Aug 2024
Viewed by 445
Abstract
A data-driven approach to defect identification requires many labeled samples for model training, yet new defects tend to appear during data acquisition cycles, leading to a lack of labeled samples for these new defects. To solve this problem, we propose a zero-shot pipeline blockage detection and identification method based on stacking ensemble learning. The experimental signals were first decomposed using variational mode decomposition (VMD), and the information entropy of each intrinsic mode function (IMF) component was calculated to construct the feature sets. Second, an attribute matrix was established according to the attribute descriptions of the defect categories, and a stacking ensemble attribute learner was used to learn the attributes of the defect features. Finally, defect identification was accomplished by comparing similarity within the attribute matrices. The experimental results show that target defects can be identified even without targeted training samples. The model showed better classification performance on the six sets of experimental data, and its average recognition accuracy for unknown defect categories reached 72.5%.
(This article belongs to the Special Issue Structural Health Monitoring Using Sensors and Machine Learning)
Figures:

Graphical abstract.
Figure 1. Zero-shot troubleshooting schematic.
Figure 2. Framework of the proposed method.
Figure 3. Flowchart of the VMD algorithm.
Figure 4. Time-frequency domain signals for the first four operating states of the pipeline: (a) 20 mm, (b) 40 mm, (c) 55 mm, (d) clean.
Figure 5. VMD of a normally empty pipe.
Figure 6. VMD of a pipe with a 20 mm blockage.
Figure 7. Correlation coefficients for each component.
Figure 8. Attribute matrix of pipeline operational states.
Figure 9. Flowchart of base classifier selection.
Figure 10. Model correlation analysis.
Figure 11. Attribute learning accuracy of the models.
Figure 12. Comparison with single machine learning models.
Figure 13. Confusion matrices of different models on dataset A: (a) RF, (b) XGBoost, (c) SVM, (d) KNN, (e) LightGBM, (f) Stacking.
Figure 14. Comparison with zero-shot learning methods.
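
The final zero-shot matching step, comparing a predicted attribute vector against predefined class signatures, can be sketched as follows. The attribute names and signatures are invented for illustration; in the paper they come from defect-category descriptions, and the attribute vector is predicted by the stacking ensemble.

```python
import numpy as np

# Illustrative class attribute signatures; the unseen class has no training data.
CLASS_SIGNATURES = {
    "blockage_20mm": np.array([1, 0, 1, 0]),
    "blockage_55mm": np.array([0, 1, 1, 0]),
    "clean_pipe":    np.array([0, 0, 0, 1]),   # unseen during training
}

def zero_shot_classify(pred_attributes: np.ndarray) -> str:
    """Pick the class whose attribute signature is most similar (cosine)."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return max(CLASS_SIGNATURES,
               key=lambda c: cos(pred_attributes, CLASS_SIGNATURES[c]))

print(zero_shot_classify(np.array([0.1, 0.2, 0.1, 0.9])))  # -> clean_pipe
```
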
24 pages, 3548 KiB  
Article
Adapting CLIP for Action Recognition via Dual Semantic Supervision and Temporal Prompt Reparameterization
by Lujuan Deng, Jieqing Tan and Fangmei Liu
Electronics 2024, 13(16), 3348; https://doi.org/10.3390/electronics13163348 - 22 Aug 2024
Viewed by 355
Abstract
The contrastive vision–language pre-trained model CLIP, driven by large-scale open-vocabulary image–text pairs, has recently demonstrated remarkable zero-shot generalization capabilities in diverse downstream image tasks, leading numerous models built on the "image pre-training followed by fine-tuning" paradigm to exhibit promising results on standard video benchmarks. However, as models scale up, fully fine-tuning them for specific tasks becomes difficult in terms of training and storage. In this work, we propose a novel method that adapts CLIP to the video domain for efficient recognition without destroying the original pre-trained parameters. Specifically, we introduce temporal prompts to enable reasoning about the dynamic content of videos for pre-trained models that lack temporal cues. Then, by replacing the direct learning of prompt vectors with a lightweight reparameterization encoder, the model can make domain-specific adjustments and learn more generalizable representations. Furthermore, we predefine a Chinese label dictionary to enhance video representation through the co-supervision of Chinese and English semantics. Extensive experiments on video action recognition benchmarks show that our method achieves competitive or even better performance than most existing methods with fewer trainable parameters, in both general and few-shot recognition scenarios.
Figures:

Figure 1. General workflow diagram for VLMs based on contrastive learning.
Figure 2. Overview of the overall framework. The proposed method contains three branches: video encoder, Chinese text encoder, and CLIP text encoder.
Figure 3. An illustration of temporal prompts. The final temporal prompts consist of global prompts, interchange prompts, and CLS tokens mapped with the attention module.
Figure 4. (a) Structural details of the reparameterization encoder β(·); (b) details of the ST-Block in β(·).
Figure 5. Ablation research using ViT-B/16 as the backbone on the HMDB-51 and UCF-101 datasets (single-view testing throughout). (a) The effectiveness of the proposed key components, proved step by step; (b) the impact of the number of sampled frames; (c) the performance of learnable global prompts of different lengths. Note: TP abbreviates Temporal Prompts; CN.Branch abbreviates Chinese Label Text Encoder Branch.
Figure 6. Ablation study of training efficiency. (a) Parameter efficiency: comparison of the number of trainable parameters between this method and Vita, which also freezes the visual backbone, as well as BIKE and XCLIP, which fine-tune the visual backbone; (b) training-time efficiency: the model still performs well with fewer training epochs. Note: the asterisk in (a) marks this method; circles represent the other methods.
Figure 7. Behavioral actions such as "Clap", "Cartwheel", and "Fencing" shown on the original video frames and on the attention maps without and with the proposed method. The method focuses on distinguishable motion information and key regions. Note: red indicates areas the model focuses on; green mainly represents the background or less important areas.
Figure 8. Performance comparison when data are extremely scarce.
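
The reparameterization idea, learning prompts through a lightweight encoder rather than directly, can be sketched in PyTorch. The dimensions, prompt count, and encoder shape below are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class TemporalPromptGenerator(nn.Module):
    """Sketch of reparameterized temporal prompts: a lightweight encoder
    (the beta(.) module above) maps learnable global prompts and per-frame
    CLS tokens to the prompts injected into a frozen CLIP backbone."""
    def __init__(self, dim=512, n_prompts=4):
        super().__init__()
        self.global_prompts = nn.Parameter(torch.zeros(n_prompts, dim))
        self.reparam = nn.Sequential(       # lightweight encoder, illustrative
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim),
        )

    def forward(self, cls_tokens):          # (T, dim) frame CLS tokens
        frame_prompts = self.reparam(cls_tokens)            # temporal cues
        return torch.cat([self.reparam(self.global_prompts),
                          frame_prompts], dim=0)            # (n_prompts+T, dim)

prompts = TemporalPromptGenerator()(torch.randn(8, 512))
print(prompts.shape)  # torch.Size([12, 512]) -> fed to the frozen backbone
```
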
19 pages, 674 KiB  
Article
Zero-Shot Proxy with Incorporated-Score for Lightweight Deep Neural Architecture Search
by Thi-Trang Nguyen and Ji-Hyeong Han
Electronics 2024, 13(16), 3325; https://doi.org/10.3390/electronics13163325 - 21 Aug 2024
Viewed by 461
Abstract
Designing a high-performance neural network is a difficult task. Neural architecture search (NAS) methods aim to automate this process. However, constructing a high-quality accuracy predictor, a key component of NAS, usually requires significant computation. Therefore, zero-shot proxy-based NAS methods have been actively and extensively investigated. In this work, we propose a new, efficient zero-shot proxy, Incorporated-Score, to rank deep neural network architectures instead of using an accuracy predictor. The proposed Incorporated-Score proxy is generated by incorporating the zen-score and entropy information of the network, and it does not need to train any network. We then introduce an optimal NAS algorithm called Incorporated-NAS that maximizes the Incorporated-Score of the neural network within specified inference budgets. The experiments show that the network designed by Incorporated-NAS with Incorporated-Score outperforms the previously proposed Zen-NAS and achieves a new SOTA accuracy on the CIFAR-10, CIFAR-100, and ImageNet datasets at a lightweight scale.
(This article belongs to the Special Issue Towards Efficient and Reliable AI at the Edge)
Figures:

Figure 1. Overview of the proposed Incorporated-NAS, which searches for the best network architecture in the search space A based on the architecture generator and the accuracy predictor without any training.
Figure 2. Value of the zen-score and the effectiveness entropy value (E_score) through T = 96,000 iterations of the NAS process when the model size constraint is less than 1 M in search space A (defined in Section 5.1.1). Black lines are trend lines of the two scores in (a,b).
Figure 3. Values of Incorporated-Score-l, Incorporated-Score-s, and zen-score through T = 96,000 iterations of the NAS process in search space I. Black lines are trend lines in (b,d).
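
Since the proxy is training-free, the search loop reduces to scoring and ranking candidates under a budget. A sketch follows, assuming a simple additive combination of the two scores; the paper's exact formula may differ.

```python
def incorporated_score(zen_score: float, entropy_score: float,
                       alpha: float = 1.0) -> float:
    """Assumed additive combination of zen-score and entropy information."""
    return zen_score + alpha * entropy_score

def search(candidates, budget_params=1e6):
    """Rank candidate nets by proxy score under a size budget, no training."""
    feasible = [c for c in candidates if c["n_params"] <= budget_params]
    return max(feasible,
               key=lambda c: incorporated_score(c["zen"], c["entropy"]))

best = search([
    {"name": "net_a", "n_params": 9.1e5, "zen": 112.0, "entropy": 3.2},
    {"name": "net_b", "n_params": 8.7e5, "zen": 108.5, "entropy": 4.9},
    {"name": "net_c", "n_params": 1.4e6, "zen": 130.0, "entropy": 5.0},
])
print(best["name"])   # net_a: highest combined score under the 1M budget
```
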
19 pages, 1692 KiB  
Article
An Efficient Cross-Modal Privacy-Preserving Image–Text Retrieval Scheme
by Kejun Zhang, Shaofei Xu, Yutuo Song, Yuwei Xu, Pengcheng Li, Xiang Yang, Bing Zou and Wenbin Wang
Symmetry 2024, 16(8), 1084; https://doi.org/10.3390/sym16081084 - 21 Aug 2024
Viewed by 741
Abstract
Preserving the privacy of the ever-increasing multimedia data on the cloud while providing accurate and fast retrieval services has become a hot topic in information security. However, existing schemes still have significant room for improvement in accuracy and speed. This paper therefore proposes a privacy-preserving image–text retrieval scheme called PITR. To enhance model performance with minimal parameter training, we freeze all parameters of a multimodal pre-trained model and incorporate trainable modules along with either a general adapter or a specialized adapter, which enhance the model's ability to perform zero-shot image classification and cross-modal retrieval on general or specialized datasets, respectively. To preserve the privacy of outsourced data on the cloud and of the user's retrieval process, we employ asymmetric scalar-product-preserving encryption, which is suited to inner product calculation, together with distributed index storage, and we construct a two-level security model. We also build a hierarchical index structure to speed up query matching among massive high-dimensional index vectors. Experimental results demonstrate that our scheme provides users with secure, accurate, fast cross-modal retrieval while preserving data privacy.
(This article belongs to the Section Computer)
Figures:

Figure 1. Contrastive learning training process of the CLIP model.
Figure 2. Zero-shot image classification process of the CLIP model.
Figure 3. SVD matrix decomposition method.
Figure 4. The system framework of PITR.
Figure 5. The zero-shot image classification task of PITR. Step 1: n pseudo-prompts ([P_0], [P_1], ..., [P_n]) are sent into the prompt encoder, which maps them to template embeddings (prompts: [V_0], [V_1], ..., [V_n]). Step 2: the image labels are sent to the token mapping module, which tokenizes the labels, adds special tokens, performs token embedding mapping, and adds segment and position embeddings to obtain the word embeddings [W_0], [W_1], ..., [W_m]. Step 3: the prompts and word embeddings are concatenated; the position of concatenation is flexible, so prompts can be inserted at the beginning, middle, or end of the word embeddings. Step 4: the concatenated result is sent to the text encoder to obtain the global semantic feature embeddings of the entire sentence (sentence embeddings), which are compared with the image encoder's global semantic feature embeddings using cosine similarity. The pseudo-prompts and the weights of the prompt encoder are optimized using the multi-label learning method.
Figure 6. As in the zero-shot classification training above, during training of PITR's image–text matching task, the pseudo-prompts and prompt encoder generate task-specific prompts that help determine more accurately whether an image and a text match.
Figure 7. The learnable rescaling vectors l_k and l_v are injected into each head of the multi-head attention layer in every transformer block of both the text encoder and the image encoder, at the intermediate output positions of the key and value sub-layers. Another learnable vector l_ff is injected after the output of the nonlinear function in the feed-forward network of each transformer layer. These learnable vectors can flexibly suppress or amplify the outputs of the key and value sub-layers and of the feed-forward network, depending on the task.
Figure 8. Time consumption for feature extraction and index construction as the number of files varies.
Figure 9. Time consumption for index construction as the number of files varies.
Figure 10. Comparison of time consumption during the search process as the number of files increases.
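
The inner-product-preserving encryption the scheme relies on can be demonstrated in a few lines. This is a toy version of ASPE; real deployments add vector splitting and randomization, so treat the sketch as illustrative only.

```python
import numpy as np

# Toy ASPE: index vectors p are encrypted with a secret invertible matrix M
# as M^T p, queries q as M^{-1} q, and the cloud can still rank by inner
# product, since (M^T p) . (M^{-1} q) = p . q.
rng = np.random.default_rng(0)
d = 8
M = rng.normal(size=(d, d))                 # secret key (must be invertible)
M_inv = np.linalg.inv(M)

p = rng.normal(size=d)                      # feature vector in the index
q = rng.normal(size=d)                      # query feature vector
enc_p, trap_q = M.T @ p, M_inv @ q

print(np.allclose(enc_p @ trap_q, p @ q))   # True: similarity is preserved
```
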
20 pages, 2982 KiB  
Article
Exploring Tourist Experience through Online Reviews Using Aspect-Based Sentiment Analysis with Zero-Shot Learning for Hospitality Service Enhancement
by Ibrahim Nawawi, Kurnia Fahmy Ilmawan, Muhammad Rifqi Maarif and Muhammad Syafrudin
Information 2024, 15(8), 499; https://doi.org/10.3390/info15080499 - 20 Aug 2024
Viewed by 494
Abstract
Hospitality services play a crucial role in shaping tourist satisfaction and the intention to revisit destinations. Traditional feedback methods like surveys often fail to capture the nuanced, real-time experiences of tourists. Digital platforms such as TripAdvisor, Yelp, and Google Reviews provide a rich source of user-generated content, but the sheer volume of reviews makes manual analysis impractical. This study proposes integrating aspect-based sentiment analysis with zero-shot learning to analyze online tourist reviews effectively without requiring extensive annotated datasets. Using pretrained models like RoBERTa, the research framework involves keyword extraction, sentence segment detection, aspect construction, and sentiment polarity measurement. The dataset, sourced from TripAdvisor reviews of attractions, hotels, and restaurants in Central Java, Indonesia, underwent preprocessing to ensure suitability for analysis. The results highlight the importance of aspects such as food, accommodation, and cultural experiences in tourist satisfaction. The findings indicate a need for continuous service improvement to meet evolving tourist expectations, demonstrating the potential of advanced natural language processing techniques for enhancing hospitality services and customer satisfaction.
Figures:

Figure 1. Step-by-step research framework (originally compiled by the authors).
Figure 2. General BERT architecture (adapted from Vaswani et al. [30]).
Figure 3. Word clouds of identified keywords: (a) all initially identified keywords; (b) keywords retained after filtering for relevance and significance.
Figure 4. t-SNE plot of keywords' semantic similarities based on BERT embeddings.
Figure 5. Sentiment analysis results for each aspect based on frequency.
Figure 6. Sentiment analysis results for each aspect based on proportion.
Figure 7. Distribution of sentiment polarity scores across all proposed aspects.
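
The zero-shot sentiment step can be reproduced with an off-the-shelf NLI checkpoint via the Hugging Face zero-shot-classification pipeline. The model id, aspects, and hypothesis template below are assumptions, not the paper's exact setup.

```python
from transformers import pipeline

clf = pipeline("zero-shot-classification", model="roberta-large-mnli")

review = "The temple grounds were stunning, but the food stalls were overpriced."
for aspect in ["food", "cultural experience", "price"]:
    result = clf(review,
                 candidate_labels=["positive", "negative", "neutral"],
                 hypothesis_template=f"The sentiment about {aspect} is {{}}.")
    # labels come back sorted by score; take the top one per aspect
    print(aspect, result["labels"][0], round(result["scores"][0], 2))
```
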
23 pages, 5374 KiB  
Article
Leveraging Visual Language Model and Generative Diffusion Model for Zero-Shot SAR Target Recognition
by Junyu Wang, Hao Sun, Tao Tang, Yuli Sun, Qishan He, Lin Lei and Kefeng Ji
Remote Sens. 2024, 16(16), 2927; https://doi.org/10.3390/rs16162927 - 9 Aug 2024
Viewed by 601
Abstract
Simulated data play an important role in SAR target recognition, particularly under zero-shot learning (ZSL) conditions caused by the lack of training samples. Traditional SAR simulation is based on manually constructing target 3D models for electromagnetic simulation, which is costly and limited by prior knowledge of the target; moreover, the unavoidable discrepancy between simulated and measured SAR further limits traditional simulation for target recognition. To address the challenge of SAR target recognition under ZSL conditions, this paper proposes an innovative SAR simulation method based on a visual language model and a generative diffusion model, which extracts target semantic information from optical remote sensing images and transforms it into a 3D model for SAR simulation. Additionally, to reduce the domain shift between the simulated and measured domains, we propose a domain adaptation method based on dynamically weighted domain and classification losses. The effectiveness of semantic-information-based 3D models has been validated on the MSTAR dataset, and the feasibility of the proposed framework has been validated on a self-built civilian vehicle dataset. The experimental results demonstrate that this first SAR simulation method based on a visual language model and a generative diffusion model can effectively improve target recognition performance under ZSL conditions.
Figures:

Figure 1. Framework of the method.
Figure 2. Extracting target semantic information from an optical remote sensing image.
Figure 3. Target semantic information diffusion to a 3D model (example image of a T-72 tank from Wikimedia Commons [47]).
Figure 4. The influence of target key features on SAR simulation.
Figure 5. Simulated SAR with domain adaptation for target recognition.
Figure 6. Datasets used in this paper: (a) MSTAR, (b) civilian vehicle dataset.
Figure 7. Datasets for fine-tuning: (a) military vehicle fine-tuning set, (b) civilian vehicle semantic extraction set, (c) civilian vehicle fine-tuning set.
Figure 8. Generating 3D models based on target semantic information.
Figure 9. Comparison between simulated and measured SAR; the target and shadow are circled in red.
Figure 10. Cosine similarity between simulated and measured images: (a) SAMPLE, (b) our simulated SAR.
Figure 11. Confusion matrices for five types of military targets: (a) SAMPLE simulated SAR, direct training; (b) our simulated SAR, direct training; (c) SAMPLE simulated SAR with DANN-W_n; (d) our simulated SAR with DANN-W_n.
Figure 12. t-SNE of five types of military targets: (a) SAMPLE simulated SAR, direct training; (b) SAMPLE simulated SAR with DANN-W_n.
Figure 13. Generating 3D models of civilian vehicles using target semantic information.
Figure 14. Comparison of measured and simulated SAR of civilian vehicles (top row: measured; bottom row: simulated).
Figure 15. Confusion matrices of SAR civilian vehicle target recognition: (a) DANN-W_n, (b) ConvNeXt-T, (c) Vgg19, (d) AlexNet, (e) ResNet50.
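
The dynamically weighted adaptation loss can be sketched as a DANN-style objective whose domain-loss weight grows over training. The schedule below is one common choice and an assumption; the paper's exact weighting may differ.

```python
import torch
import torch.nn.functional as F

def dann_step(class_logits, labels, domain_logits, domain_labels,
              epoch, n_epochs):
    """Illustrative dynamically weighted loss for simulated-to-measured SAR
    adaptation: classification on labeled simulated data plus a domain
    confusion term whose weight w_n grows over training (common DANN
    schedule, assumed here rather than taken from the paper)."""
    progress = torch.tensor(epoch / n_epochs)
    w_n = 2.0 / (1.0 + torch.exp(-10.0 * progress)) - 1.0   # 0 -> ~1
    cls_loss = F.cross_entropy(class_logits, labels)
    dom_loss = F.cross_entropy(domain_logits, domain_labels)
    return cls_loss + w_n * dom_loss

loss = dann_step(torch.randn(4, 5), torch.tensor([0, 1, 2, 3]),
                 torch.randn(4, 2), torch.tensor([0, 0, 1, 1]),
                 epoch=10, n_epochs=50)
```
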
16 pages, 1963 KiB  
Article
Cross-Domain Fake News Detection Using a Prompt-Based Approach
by Jawaher Alghamdi, Yuqing Lin and Suhuai Luo
Future Internet 2024, 16(8), 286; https://doi.org/10.3390/fi16080286 - 8 Aug 2024
Viewed by 696
Abstract
The proliferation of fake news poses a significant challenge in today's information landscape, spanning diverse domains and topics and undermining traditional detection methods confined to specific domains. In response, there is growing interest in strategies for detecting cross-domain misinformation. However, traditional machine learning (ML) approaches often struggle with the nuanced contextual understanding required for accurate news classification. To address these challenges, we propose a novel contextualized cross-domain prompt-based zero-shot approach utilizing a Generative Pre-trained Transformer (GPT) model for fake news detection (FND). In contrast to conventional fine-tuning methods reliant on extensive labeled datasets, our approach places particular emphasis on refining prompt integration and classification logic within the model's framework. This refinement enhances the model's ability to accurately classify fake news across diverse domains. Additionally, the adaptability of our approach allows customization across diverse tasks by modifying prompt placeholders. Our research advances zero-shot learning by demonstrating the efficacy of prompt-based methodologies for text classification, particularly in scenarios with limited training data. Through extensive experimentation, we illustrate that our method effectively captures domain-specific features and generalizes well to other domains, surpassing existing models in performance. These findings contribute significantly to ongoing efforts to combat fake news dissemination, particularly in environments with severely limited training data, such as online platforms.
(This article belongs to the Special Issue Embracing Artificial Intelligence (AI) for Network and Service)
Figures:

Figure 1. (a) The prompt-based GPT-2 baseline; (b) the proposed context-aware prompt-based GPT2-D architecture. First, the input text x_i plus the prompt is fed into a natural language template. The pre-trained language model (PLM) then makes a prediction by filling in the blank and mapping the result to the default class (fake/real) using the verbalizer. The key distinction is that (b) employs a contextualized prompt by injecting domain-based information ([DOMAIN]) to enhance performance.
Figure 2. Word clouds for different domains.
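
The prompt-plus-verbalizer mechanism can be sketched with an off-the-shelf GPT-2: append a cloze-style template (optionally carrying the domain placeholder) and compare next-token scores for the verbalizer words. The template wording below is assumed, not the paper's.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def classify(text: str, domain: str = "politics") -> str:
    # Contextualized prompt: the domain is injected into the template.
    prompt = f"{text} This piece of {domain} news is"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]             # next-token distribution
    # Verbalizer: compare the scores of the first subtoken of each label word.
    scores = {w: logits[tok.encode(" " + w)[0]].item() for w in ("fake", "real")}
    return max(scores, key=scores.get)

print(classify("Scientists confirm the moon is made of cheese."))
```
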
16 pages, 698 KiB  
Article
Leveraging Medical Knowledge Graphs and Large Language Models for Enhanced Mental Disorder Information Extraction
by Chaelim Park, Hayoung Lee and Ok-ran Jeong
Future Internet 2024, 16(8), 260; https://doi.org/10.3390/fi16080260 - 24 Jul 2024
Viewed by 772
Abstract
The accurate diagnosis and effective treatment of mental health disorders such as depression remain challenging owing to complex underlying causes and varied symptomatology. Traditional information extraction methods struggle to adapt to evolving diagnostic criteria such as the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5), and to contextualize rich patient data effectively. This study proposes a novel approach for enhancing information extraction from mental health data by integrating medical knowledge graphs and large language models (LLMs). Our method leverages the structured organization of knowledge graphs specifically designed for the rich domain of mental health, combined with the powerful predictive capabilities and zero-shot learning abilities of LLMs. This research enhances the quality of knowledge graphs through entity linking and demonstrates superiority over traditional information extraction techniques, making a significant contribution to the field of mental health. It enables a more fine-grained analysis of the data and the development of new applications. Our approach redefines how mental health data are extracted and utilized. By integrating these insights with existing healthcare applications, the groundwork is laid for the development of real-time patient monitoring systems. The performance evaluation of this knowledge graph highlights its effectiveness and reliability, indicating significant advancements in automating medical data processing and depression management.
(This article belongs to the Special Issue Distributed Storage of Large Knowledge Graphs with Mobility Data)
Figures:

Figure 1. Guideline-Based Model Pipeline for Zero-Shot Information Extraction and Entity Linking.
Figure 2. Impact of entity linking on information extraction for major depressive disorder.
Figure 3. Illustrative representation of the constructed knowledge graph comparing major depressive disorder and disruptive mood dysregulation disorder.
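
A single pipeline step, linking LLM-extracted entities to canonical knowledge-graph nodes, can be sketched with a simple token-overlap score. The node set, threshold, and matcher are illustrative; the paper's entity linking is more sophisticated.

```python
# Toy knowledge-graph node set; real nodes would come from the mental health KG.
KG_NODES = {"major depressive disorder", "insomnia", "anhedonia",
            "disruptive mood dysregulation disorder"}

def link(entity: str, threshold: float = 0.5):
    """Link an extracted entity to its best-matching KG node (Jaccard overlap)."""
    tokens = set(entity.lower().split())
    def overlap(node):
        nt = set(node.split())
        return len(tokens & nt) / len(tokens | nt)
    best = max(KG_NODES, key=overlap)
    return best if overlap(best) >= threshold else None

for e in ["depressive disorder", "loss of pleasure"]:
    print(e, "->", link(e))   # first links to the MDD node; second stays unlinked
```
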
22 pages, 3224 KiB  
Article
Large-Scale Urban Traffic Management Using Zero-Shot Knowledge Transfer in Multi-Agent Reinforcement Learning for Intersection Patterns
by Theodore Tranos, Christos Spatharis, Konstantinos Blekas and Andreas-Giorgios Stafylopatis
Robotics 2024, 13(7), 109; https://doi.org/10.3390/robotics13070109 - 19 Jul 2024
Viewed by 938
Abstract
The automatic control of vehicle traffic in large urban networks constitutes one of the most serious challenges facing modern societies, with an impact on improving the quality of human life and saving energy and time. Intersections are a traffic structure of pivotal importance, as they accumulate a large number of vehicles that should be served optimally. Constructing intelligent models that automatically coordinate and steer vehicles through intersections is key to the fragmentation of traffic control, offering active solutions through the flexibility of automatically adapting to a variety of traffic conditions. In response, this work proposes an integrated active solution for automatic traffic management. We introduce a multi-agent reinforcement learning framework that effectively models traffic flow at individual unsignalized intersections. It relies on a compact agent definition, a rich information state space, and a learning process characterized not only by depth and quality but also by substantial degrees of freedom and variability. The resulting driving profiles are then transferred to larger road networks, integrating their individual elements to compose an effective automatic traffic control platform. Experiments conducted on simulated road networks of variable complexity demonstrate the potential of the proposed method.
(This article belongs to the Section AI in Robotics)
Figures:

Figure 1. Examples of different configurations covered by the 3-way and 4-way intersection patterns.
Figure 2. Overview of the proposed method for traffic control in road networks, consisting of three major modules.
Figure 3. Traffic control in a 4-way intersection pattern using the proposed MARL scheme. Every road-agent (RA_i) is responsible for safely guiding vehicles through its designated road segment while cooperatively coordinating with the other agents (here, three).
Figure 4. Matching the network's intersections with the two default intersection patterns. The road network contains four and two copies of the default "4-way" and "3-way" intersection patterns, respectively. The light-colored thin strip corresponds to a route that a vehicle may follow within this network.
Figure 5. Learning curves of the 3-way and 4-way intersection patterns in terms of average velocity, duration, and collisions per epoch, computed with a rolling window of fifty (50) episodes.
Figure 6. Dynamic evolution of both the average velocity and the frequency of vehicles served per second, obtained by applying the learned multi-agent policies to the intersection patterns in a designated test scenario. Traffic-state colored zones are also shown.
Figure 7. Four artificial road networks of increasing complexity, generated for evaluating the knowledge transfer process. Every intersection is a noisy copy of either the default 3-way or 4-way intersection pattern.
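
The zero-shot transfer step, reusing road-agent policies trained on the two default patterns across a new network, can be sketched as follows; the class and function names are illustrative.

```python
class RoadAgent:
    """One agent per road segment, reusing a policy trained on a default pattern."""
    def __init__(self, policy):
        self.policy = policy

    def act(self, state):
        return self.policy(state)

def deploy(network_intersections, trained_policies):
    """Match each intersection to a default pattern and instantiate its agents
    without any further training (zero-shot knowledge transfer)."""
    agents = {}
    for node, n_roads in network_intersections.items():
        pattern = "4-way" if n_roads == 4 else "3-way"
        agents[node] = [RoadAgent(trained_policies[pattern])
                        for _ in range(n_roads)]
    return agents

# Stand-in policies; in practice these are the learned MARL policies.
policies = {"3-way": lambda s: "yield", "4-way": lambda s: "proceed"}
agents = deploy({"A": 4, "B": 3}, policies)
print(agents["A"][0].act(state={"gap": 3.2}))   # proceed
```
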