1. Introduction
The need for reliable and autonomous vehicles continues to grow. A vehicular ad hoc network (VANET) is crucial because it provides seamless and convenient communication between the infrastructure and the vehicle [1]. However, to ensure safety and efficiency between vehicles, real-time decision-making is essential [2]. In addition, road events are unpredictable and dynamic, requiring rapid, accurate, and resource-efficient analysis of complex vehicular data [3]. Such analysis supports optimal decision-making and operations in dynamic network environments with high-speed vehicles. This paper addresses high computational cost, latency, and inconsistent accuracy by enabling scalable, fast, and accurate real-time decision-making in these environments.
In VANET, the challenges relate to vehicles with limited resources, prediction, data management, and vehicle condition monitoring [4]. With the evolution of VANET, digital twins (DTs), virtual representations of physical entities [5], are crucial for resource management, prediction, and real-time monitoring. Digital twin-based VANET (DT-VANET) predicts vehicle behavior, manages resources, and monitors vehicles in real time. Another significant benefit of using digital twins in VANET is that large amounts of data can be processed effectively to extract meaningful insights. However, processing these large amounts of data presents challenges such as accuracy and latency [6], which are critical in real-time decision-making. Keeping the response time as low as possible is a further challenge [7]. Reliable CNN models such as EfficientNet are therefore needed to deliver accurate real-time decisions with minimal latency, computational resources, and convergence time in VANETs, for example, in traffic sign classification and the detection of pedestrians and objects in rain, fog, and varying light conditions.
The growth of deep learning techniques shows a vast capability to protect vehicular networks and increases the accuracy of decision-making [8,9]. Convolutional Neural Networks (CNNs) are one type of deep learning algorithm. CNNs, especially EfficientNet, improve the ability to make fast and accurate decisions at a lower computational cost [10]. When DT-VANET and EfficientNet are combined, it becomes convenient to evaluate and forecast data acquired in real time, such as road condition information, driving information, and weather conditions. The predicted results are transmitted to the moving vehicle through base stations. For example, they remind the driver to pay attention to a road sign or to change the vehicle speed accordingly, reducing the likelihood of traffic accidents.
This paper aims to better understand how EfficientNet, an advanced convolutional neural network architecture, enhances real-time decision-making in VANET based on digital twins. EfficientNet is known for its efficiency and accuracy in real-time applications [10]. Using EfficientNet in DT-VANET, we can optimize decision-making, data and image processing, traffic condition prediction, driving strategies, risk assessment, and accurate, timely responses to dynamic traffic conditions.
To the best of our knowledge, our work is the first to use EfficientNet for optimal real-time decision-making and object detection in DT-VANET under real-world conditions. In addition, we focus on an optimized architecture that can be used for real-time decision-making. We also discuss the related work and the application of EfficientNet in twin-based vehicular networks. The following is a summary of our contributions:
We build a thorough, general architecture for vehicular networks based on digital twins using EfficientNet. A scenario for a digital twin-based vehicular network is shown. In addition, we offer diagrams for twin-based vehicle networks.
We outline the application of EfficientNet in digital twin-based vehicular networks.
Lastly, we conduct experiments and simulations on real-world datasets to evaluate the performance metrics. Our experimental results show an improvement in the reliability and performance of DT-VANET.
The remainder of the paper is structured as follows. Related works are reviewed in Section 2. We propose an EfficientNet architecture in Section 3. Subsequently, in Section 4, extensive experiments are conducted and evaluated to determine the efficiency and effectiveness of our proposed architecture. We finally arrive at our conclusions in Section 5.
3. The Proposed EfficientNet Model for DT-VANET
The proposed model combines a highly efficient convolutional neural network (CNN) called EfficientNet with DT-VANET to achieve real-time decision-making. The main objective of the model is to improve reactive and predictive capabilities by using EfficientNet’s efficiency and accuracy [10] and the DT’s computation to process the massive amounts of data produced by VANET. Our model also uses transfer learning to minimize communication overhead and reduce latency. Real-time decision-making allows vehicles to avoid accidents, optimize routes, and improve overall road safety. Thus, by integrating EfficientNet, DT technology, and transfer learning, the model architecture can make timely decisions that improve resource management, network performance, and traffic safety.
This section will discuss the proposed CNN-based EfficientNet model, its components, system architecture, and novel lightweight EfficientNet-based DT-VANET algorithms. The model has the following essential parts.
3.1. Digital Twin
Digital twins provide a virtual illustration of the physical vehicular environment. They constantly reflect sensor data, vehicle positions, communication patterns, and speeds. Predictive modeling features are also integrated with the digital twin, allowing it to estimate future network states using historical and present data.
3.1.1. Data Synchronization and Acquisition
Data (traffic patterns, weather conditions, road infrastructure information) from various sensors installed on the vehicle are transmitted through communication devices to the digital twin layer. The acquisition and organization of these data ensure an accurate and current virtual model. Our primary sensor is a camera that supplies real-time images, for example, for traffic recognition and object detection. Supporting IoT sensors, for example, GPS, accelerometers, and environmental sensors, provide additional vehicle state data, and V2X (Vehicle-to-Everything) communication shares critical information across the VANET. LiDAR and radar sensors are also an option, but LiDAR is expensive and generates large amounts of data, increasing processing time and cost. For this reason, our architecture does not recommend LiDAR and radar sensors.
3.1.2. Predictive Modeling
The digital twin constructs a detailed model of the selected environment, road infrastructure, surrounding vehicles, and traffic direction. After that, DT produces predictive models that simulate possible incidents such as traffic jams, collisions, or network interruptions using current and historical data. These models are necessary to make proactive decisions.
3.2. Convolutional Neural Network
Convolutional Neural Networks (CNNs) are a type of deep learning algorithm designed to interpret grid-structured data, e.g., images, and to classify objects in an image by applying learnable weights and biases to different parts of it. From the perspective of VANET and DTs, CNNs are necessary to manage and understand the enormous amount of visual and sensor data produced by vehicles and their surroundings. A CNN’s architecture reduces an image to a form that is easier to process while retaining the features needed for a precise prediction. CNNs are well suited to real-time, dynamic vehicular decisions because of their ability to learn spatial hierarchies of features adaptively and automatically. The following are the three essential building blocks of a CNN:
A convolutional layer, which extracts low- and high-level properties. The initial convolutional layers are responsible for extracting low-level attributes; as more convolutional layers are added, the deeper layers extract high-level attributes.
A pooling layer, which applies dimensionality reduction to the spatial size of the extracted features, decreasing the processing power required to process the data.
A fully connected (FC) layer. After the convolutional and pooling layers have been applied, the model has learned the relevant features; the resulting output is flattened and supplied to the FC layer, which classifies the image into one of several groups.
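The three building blocks above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this work: the 5 × 5 toy image, the hand-written edge kernel, and all function names are assumptions chosen for illustration, and a trained CNN would learn its kernels from data.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2 x 2 window."""
    rows = len(feature_map) - len(feature_map) % 2
    cols = len(feature_map[0]) - len(feature_map[0]) % 2
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


def flatten(feature_map):
    """Turn a 2-D feature map into the 1-D input of a fully connected layer."""
    return [v for row in feature_map for v in row]


# Toy 5 x 5 image with a vertical edge between columns 1 and 2 (assumed data).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written vertical-edge kernel (assumption; real kernels are learned).
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = relu(conv2d_valid(image, kernel))  # 3 x 3 feature map
pooled = max_pool_2x2(features)               # spatial size reduced by pooling
fc_input = flatten(pooled)                    # vector passed to the FC layer
```

The convolution responds strongly where the edge lies, pooling shrinks the feature map, and flattening produces the vector that a fully connected layer would classify.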
3.2.1. CNN’s Function in VANET
Vehicles in VANET are equipped with a collection of cameras and sensors that continuously generate data. When managed effectively, these data can support several advanced applications, including autonomous driving, traffic management, and collision avoidance. CNNs are skilled in image and video analysis and extract relevant features from raw sensor data, for example, for object detection (e.g., other vehicles, pedestrians, road signs) and environmental understanding. Higher-level decision-making processes, which are essential to assure the effectiveness and safety of VANET, then use these features as input.
3.2.2. CNN Application in DT-VANET
Every vehicle in DT-VANET has a virtual replica that mirrors its physical actions and state in real time. CNNs can be added to these digital twin layers to handle the spatial and visual data collected from the vehicles. Real-time analysis of the vehicle environment can then be performed using CNNs within the digital twin framework, allowing optimal decision-making.
3.2.3. EfficientNet Integration in DT-VANET
The suggested system’s most essential element is integrating EfficientNet into DT-VANET. The enormous amount of data the digital twin produces is processed using EfficientNet’s design, which helps the system predict in a precise and timely manner.
For our research, we used EfficientNet, an advanced CNN architecture recognized for its accuracy and computational efficiency. Through compound scaling, EfficientNet scales network width, depth, and resolution in a balanced way, which greatly improves efficiency while keeping computational cost to a minimum. This feature benefits VANETs, where real-time processing speed and accuracy are equally essential.
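Compound scaling can be made concrete with a short sketch. In the original EfficientNet publication, depth, width, and resolution are scaled as α^φ, β^φ, and γ^φ under the constraint α · β² · γ² ≈ 2, with α = 1.2, β = 1.1, and γ = 1.15 reported for the B0 baseline. The function name and the returned tuple layout below are assumptions for illustration, and released model variants may use hand-adjusted coefficients.

```python
# Base coefficients reported for the EfficientNet-B0 grid search.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi):
    """Return (depth, width, resolution, flops) multipliers for coefficient phi.

    FLOPs grow roughly with depth * width^2 * resolution^2, so each unit
    increase of phi multiplies the cost by about ALPHA * BETA^2 * GAMMA^2.
    """
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPs x{f:.2f}")
```

A single coefficient φ therefore trades accuracy against cost, which is convenient when a latency budget is fixed in advance, as in real-time vehicular decision-making.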
Data Preprocessing: To ensure feasible input into the EfficientNet model, the raw data from the digital twin are pre-processed and organized. This step requires feature extraction, dimensionality reduction, and normalization to prepare the data for neural network processing.
EfficientNet Architecture: EfficientNet, known for its effective network scalability, is used to manage complicated data streams. The architecture is fine-tuned so that the model can function in real time with minimal latency while balancing computational efficiency against accuracy. An EfficientNet-based model capable of specific VANET tasks such as anomaly detection, route optimization, or traffic prediction should be designed and implemented.
Model Training: To maximize the performance of the EfficientNet model, it is trained on historical data before being applied to real-time data.
Prediction and Decision Making: After processing the input data, EfficientNet predicts possible dangers, vehicle behaviors, and traffic conditions. Outcomes such as vehicle rerouting, traffic signal adjustments, or driver notifications are based on these predictions.
3.3. Transfer Learning
CNNs generally perform better when trained on a large dataset than on a small one. Since training a model with extensive data is not always feasible, transfer learning is used in such cases. In transfer learning, a model is trained on a large standard dataset and then used to extract features for relatively small datasets. This increases productivity and offers a general strategy for solving novel problems. The benefits of transfer learning are reduced training time, improved performance, and better generalization. By applying transfer learning, we can fine-tune models already trained on big datasets for tasks such as object recognition or image classification to VANET-specific jobs, for instance, real-time traffic prediction and the identification and prevention of accidents and road hazards.
3.4. System Architecture
This section first presents a high-level architecture using EfficientNet in DT-VANET, shown in Figure 1. The two primary layers of the architecture are the digital twin layer and the physical interaction layer. The physical layer comprises all components (for instance, autonomous vehicles and Roadside Units (RSUs)) needed to implement DT-VANET. Cloud servers implement the digital twin layer using the concept of twin objects [33]. A twin object is a virtual representation of the physical system, and these virtual representations of the actual vehicular network can simulate a variety of its applications and functions. Mathematical and experimental methods can be used to model such network functions or applications [34]. The raw image data from the vehicle DT are first pre-processed. The EfficientNet model, pre-trained on a larger dataset, is then fine-tuned on the image data to provide decisions. These recommended decisions are transmitted back to the vehicles for implementation. The functionality of EfficientNet on the digital twin layer is shown in Figure 2. Figure 3 illustrates the basic idea behind transfer learning.
Figure 2 shows the functionality of the proposed framework. The first step is to pre-process the image to remove unnecessary parts and then augment it so that the model learns different situations and patterns. The second step is to train the EfficientNet model on the pre-processed dataset. The third step involves decision-making or recommendations.
3.5. Vehicle Digital Twin Communication
The real-time decision-making using EfficientNet, as described in Algorithm 1, consists of the following events:
Algorithm 1 Vehicle digital twin communication model update
Step 1: Data Upload Initialization
(a) Each vehicle v and its neighboring vehicles collect real-time image data associated with the vehicle.
(b) The vehicle utilizes existing base stations, networks, and wireless communications (5G, Wi-Fi) at the physical layer for data transfer.
(c) These images describe the surrounding traffic conditions and environment.
Initialize vehicle v, neighbor n, and time t
while vehicles remain to be processed do
    while neighbors of vehicle v remain (up to N) do
        while time intervals remain (up to T) do
            Step 2: Pre-process the images at the digital twin layer (resize, mean subtraction, etc.) to prepare them for EfficientNet input
            Update the digital twin with the real-time image data
            Normalize the sensor data for vehicle v
            Update time t
        end while
        Update neighbor n
    end while
    Update vehicle v
end while
for each update do Eff-Net()
Step 3: Transmit the final updated optimal decision recommendation from the base station to the corresponding vehicle.
Step 4: Transmit feedback on whether the suggested decision was implemented and, if so, how effective or accurate it was. The true decision feedback is transmitted from the vehicle in the physical layer to the EfficientNet model in the digital twin layer via the base station for potential improvement.
3.5.1. Initialization
Every vehicle carries confidential and private data. Base stations, networks, and wireless communications (5G, Wi-Fi) at the physical layer help transfer data to the digital twin layer.
First, vehicle v in DT-VANET transmits its data to the digital twin layer so that decision-making can be optimized for this vehicle. The closest neighboring vehicle to vehicle v is n, which is used for data sharing and communication. The information from the adjacent vehicle is required to validate the data and events and to support better decisions. The time variable begins from one. Its primary purpose is to index the collection of real-time data so that predictions can be made accordingly.
3.5.2. Selection of Vehicles
The first loop traverses each vehicle v in the DT-VANET individually. As long as the loop condition is met, the loop iterates over every vehicle in the network, so that all vehicles are considered once, one at a time. The loop is bounded by the total number of vehicles in the network and continues until a decision has been made for every vehicle.
The second loop iterates through all the neighboring vehicles for each vehicle v. N indicates the number of neighboring vehicles within the communication range. The goal is to communicate with and collect data from neighboring vehicles to help in the decision-making process later for vehicle v. Moving through N neighbors, n is incremented for each neighboring vehicle.
The third loop iterates over time intervals. The total time intervals are represented as T. Digital twins are updated, and real-time data from vehicle sensors are gathered each time. This loop constantly collects and updates data to ensure decision-making in real-time.
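The three nested loops described above can be sketched as follows. This is a hedged illustration, not the authors' code: the function name, the `read_image` callback, the dictionary used as a stand-in for the digital twin store, and the choice to restart the neighbor and time counters for every vehicle are all assumptions.

```python
def update_digital_twins(num_vehicles, num_neighbors, num_steps, read_image):
    """Visit every (vehicle, neighbor, time) triple and keep the latest image.

    `read_image(v, n, t)` is a hypothetical callback that returns the image
    data observed for vehicle v and neighbor n at time step t.
    """
    twins = {}
    v = 1
    while v <= num_vehicles:            # first loop: every vehicle
        n = 1
        while n <= num_neighbors:       # second loop: neighbors within range
            t = 1                       # the time variable begins from one
            while t <= num_steps:       # third loop: time intervals
                twins[(v, n)] = read_image(v, n, t)  # overwrite with newest data
                t += 1
            n += 1
        v += 1
    return twins


# Usage with a dummy sensor that simply reports the time step.
latest = update_digital_twins(2, 3, 4, lambda v, n, t: t)
print(latest[(1, 1)])
```

Each twin entry ends up holding the most recent observation, which mirrors the goal of keeping the digital twin current before EfficientNet is invoked.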
3.5.3. Updates to the Digital Twin
Real-time image data from the vehicle are gathered at each time t. The digital twin is a virtual representation of the vehicle’s current state that is updated with the most recent image data. The mean value of the sensor’s previous data is subtracted to normalize the sensor readings and exclude biases resulting from the prior data.
After subtracting the mean, the image data are normalized by dividing by the standard deviation or by 255 (the maximum value of an 8-bit pixel). Scaling the data ensures it falls into a range that the EfficientNet model can handle.
Normalization makes the data suitable for neural network input and helps avoid problems such as outliers that affect decision-making. This updating continues until all the vehicles and their neighbors have been processed.
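A minimal sketch of this normalization step is given below, assuming that pixel values are held in a flat list and that the mean and standard deviation are estimated from previously received data. The function name, the `use_std` switch, and the guard against a zero standard deviation are illustrative assumptions.

```python
from statistics import fmean, pstdev


def normalize(pixels, previous_pixels, use_std=True):
    """Subtract the mean of earlier data, then divide by a scale factor.

    With use_std=True the scale is the standard deviation of the earlier
    data; otherwise it is 255, the maximum value of an 8-bit pixel.
    """
    mean = fmean(previous_pixels)
    if use_std:
        scale = pstdev(previous_pixels) or 1.0  # avoid division by zero
    else:
        scale = 255.0
    return [(p - mean) / scale for p in pixels]


history = [0, 255]                       # assumed earlier pixel values
print(normalize([255, 0], history))      # standard-deviation scaling
print(normalize([255, 0], history, use_std=False))  # division by 255
```

Both variants center the data; dividing by the standard deviation yields unit spread, whereas dividing by 255 bounds the values by the 8-bit range.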
In Step 3, after EfficientNet produces the optimal decision recommendation, it is communicated to the vehicle through the base station.
In the final step, Step 4, feedback from the vehicle is communicated to the digital twin layer to improve the future decisions of EfficientNet.
3.6. Server-Side Decision Making Using EfficientNet
Algorithm 2 describes the steps related to the EfficientNet model and its output decision. The algorithm takes as input the image data that the vehicle’s digital twin has already pre-processed and outputs the optimal decision the vehicle could take in that situation.
Algorithm 2 Cloud server-side weighted decision making using EfficientNet update
3.6.1. Training Dataset
The Image Training Dataset DS, which consists of previous pre-recorded images from the vehicle’s camera and related decisions, is used to train the EfficientNet model. The primary purpose is to find patterns in the image data that can assist in making real-time decisions. By training EfficientNet, we can ensure that it can be used afterward to make real-time predictions.
3.6.2. Predicted Optimal Decision
After training the model, real-time pre-processed image data from the digital twin are used as input for a feed-forward pass. The predicted decision for the vehicle at time t is produced as an output. The EfficientNet model predicts the vehicle’s decision based on the features learned during the training phase.
3.6.3. Local and Neighbor Vehicle Decision Prediction
The vehicle’s own predicted decision is merged with the predicted decision of its neighboring vehicle using a weighted average to increase decision accuracy.
In this weighted average, a confidence factor is applied to the local decision, and a separate weight is assigned to the neighboring vehicle’s decision. By combining these two sources of information, the vehicle can make better-informed decisions that consider both its own prediction and the neighboring vehicle’s prediction.
After integrating decisions, the final decision is forwarded to the concerned vehicle for execution. This could include communicating with neighboring vehicles, rerouting, or changing speed. The algorithm, which depends on local and neighboring information, confirms that every vehicle makes optimal real-time decisions.
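The weighted combination of local and neighbor decisions can be sketched as below. Several details are assumptions made only for illustration: decisions are represented as class-probability vectors, the local confidence factor is called `alpha`, the neighbor weight is taken as `1 - alpha`, and the default value 0.7 is arbitrary.

```python
def fuse_decisions(local_probs, neighbor_probs, alpha=0.7):
    """Blend two probability vectors and return (best_class, fused_vector).

    `alpha` is the hypothetical confidence factor of the local decision;
    the neighbor decision receives the remaining weight 1 - alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(local_probs) != len(neighbor_probs):
        raise ValueError("both decisions must cover the same classes")
    fused = [alpha * p + (1.0 - alpha) * q
             for p, q in zip(local_probs, neighbor_probs)]
    best_class = max(range(len(fused)), key=fused.__getitem__)
    return best_class, fused


# The local model slightly prefers class 0, the neighbor strongly prefers class 1.
best, fused = fuse_decisions([0.6, 0.4], [0.1, 0.9], alpha=0.5)
print(best, fused)
```

With equal weights the neighbor's stronger evidence changes the outcome, whereas `alpha=1.0` reduces the scheme to the local decision alone.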
3.6.4. Feedback Comparison
A comparison is made between the actual (true) decision and the predicted decision. An anomaly is identified if the difference exceeds the threshold determined by the three-sigma rule (three standard deviations).
The three-sigma rule helps determine whether the decision deviates abnormally from the predicted behavior, suggesting potential anomalies in the image data or the decision-making process.
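The three-standard-deviation check described above can be sketched as follows. The sketch assumes that decisions are encoded as numbers and that the standard deviation is estimated from a history of earlier differences between actual and predicted decisions; the function and parameter names are hypothetical.

```python
from statistics import pstdev


def is_anomalous(actual, predicted, past_differences):
    """Return True if |actual - predicted| exceeds three standard deviations.

    `past_differences` holds earlier (actual - predicted) values and is
    used only to estimate the standard deviation (an assumption).
    """
    threshold = 3.0 * pstdev(past_differences)
    return abs(actual - predicted) > threshold


history = [0.1, -0.1, 0.2, -0.2]          # assumed earlier differences
print(is_anomalous(1.0, 0.9, history))    # small deviation
print(is_anomalous(1.0, 0.0, history))    # large deviation
```

A flagged decision would then be reported back to the digital twin layer for inspection of the image data or the model.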
4. Performance Evaluation
This section thoroughly evaluates our suggested framework using EfficientNet through different performance evaluations.
4.1. Dataset
We used the German Traffic Sign Recognition Benchmark (GTSRB) (https://benchmark.ini.rub.de/index.html, accessed on 4 November 2024) [35] dataset for our simulations. GTSRB is one of the most commonly used datasets for traffic sign recognition. It includes more than 50,000 images representing 43 distinct types of traffic signs. These images are well suited to training robust models because they vary in scale, weather, and lighting. Because of its large size, high-quality annotations, and realistic scenarios, the GTSRB dataset is still widely used by researchers for traffic sign recognition studies.
The DT continuously receives real-time traffic images from vehicle-mounted cameras and roadside sensors. The trained EfficientNet model, fine-tuned on GTSRB, is deployed to recognize and classify traffic signs. The DT uses the classified traffic signs to dynamically update the virtual environment, allowing real-time decisions about lane shifts, speed adjustments, and traffic light control. The classified traffic signs are a crucial input for the DT simulation engine, which predicts the vehicle’s paths and actions in real-world scenarios. This optimizes vehicle control strategies by giving early warnings of traffic rules, stop signs, or speed limits.
High accuracy prevents incorrect decisions in DT-VANET. Latency determines how rapidly the DT can detect and respond to real-world traffic. Low resource consumption ensures the smooth functioning of DT-VANET.
Some of the traffic sign sample images from this dataset are shown in Figure 4. Figure 5 shows the workflow of how we applied EfficientNet to the dataset.
4.2. Classification Type
GTSRB is a multi-class dataset of traffic signs, and EfficientNet is used to classify each image into one of its predefined categories.
The dataset covers 43 classes of traffic signs. These classes correspond to the categories of signs used on the roads, including speed limits, warnings, prohibitions, and mandatory signs. Some of the major categories include:
speed limit signs (e.g., 30 km/h, 50 km/h, 70 km/h, etc.);
no entry/no passing signs (e.g., no entry, no passing for vehicles over 3.5 t);
priority signs (e.g., right-of-way, give way, stop);
danger warning signs (e.g., dangerous curve, slippery road, pedestrian crossing);
mandatory signs (e.g., turn left, turn right, go straight, roundabout);
prohibitory signs (e.g., no entry, no overtaking);
warning signs (e.g., pedestrian crossing, road work, bumpy road);
priority and yield signs (e.g., stop sign, yield sign).
4.3. Classification Process
4.3.1. Image Pre-Processing
Preprocessing is crucial to bring the raw GTSRB images into a form that can be fed into the EfficientNet model. The GTSRB dataset contains original images of various sizes, whereas EfficientNet expects a fixed input size; for example, EfficientNet-B0 works best with images of 224 × 224 pixels. Resizing is carried out so that all input images are the same size during training and inference. Many GTSRB images are also noisy, poorly illuminated, or unclear, and therefore carry a degree of uncertainty. Thus, before further processing, the traffic sign images have to be normalized. The proposed method first enhances low-pixel-value images through data normalization for stable training, using the min-max normalization function along with Gaussian and Laplacian filters. Road sign images in their raw form have pixel values between 0 and 255; min-max normalization scales these values to between 0 and 1 by dividing by 255. We applied a Gaussian blur to the original images to reduce noise. We then subtracted the blurred image from the original image to obtain a mask and added a weighted portion of this mask back to the image to counteract the blur. Finally, we used a Laplacian filter with a kernel size of 3 × 3 for edge enhancement. This preprocessing facilitates the training of the neural network by stabilizing the weight updates and improving convergence.
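The filtering steps above can be sketched on a small grayscale image stored as nested lists. This is a simplified illustration under stated assumptions: the 3 × 3 Gaussian kernel, the border replication, the `amount` and `strength` weights, the subtraction of the Laplacian response, and the final clipping to [0, 1] are choices made here and are not specified in the text; resizing is omitted.

```python
GAUSSIAN_3X3 = [[1 / 16, 2 / 16, 1 / 16],
                [2 / 16, 4 / 16, 2 / 16],
                [1 / 16, 2 / 16, 1 / 16]]
LAPLACIAN_3X3 = [[0, 1, 0],
                 [1, -4, 1],
                 [0, 1, 0]]


def min_max_scale(image):
    """Map 8-bit pixel values from [0, 255] to [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]


def filter_3x3(image, kernel):
    """Apply a 3 x 3 kernel, replicating border pixels at the image edges."""
    h, w = len(image), len(image[0])

    def pixel(r, c):
        return image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [[sum(kernel[i][j] * pixel(r + i - 1, c + j - 1)
                 for i in range(3) for j in range(3))
             for c in range(w)]
            for r in range(h)]


def unsharp_mask(image, amount=1.0):
    """Add a weighted (original - blurred) mask back to the original."""
    blurred = filter_3x3(image, GAUSSIAN_3X3)
    return [[p + amount * (p - b) for p, b in zip(row, blurred_row)]
            for row, blurred_row in zip(image, blurred)]


def laplacian_sharpen(image, strength=1.0):
    """Subtract the Laplacian response to emphasize edges (one common choice)."""
    response = filter_3x3(image, LAPLACIAN_3X3)
    return [[p - strength * l for p, l in zip(row, response_row)]
            for row, response_row in zip(image, response)]


def clip01(image):
    """Clamp every value to the [0, 1] range expected after scaling."""
    return [[min(1.0, max(0.0, p)) for p in row] for row in image]


def preprocess(image_8bit, amount=1.0, strength=1.0):
    """Scale to [0, 1], apply unsharp masking, then Laplacian edge enhancement."""
    scaled = min_max_scale(image_8bit)
    sharpened = unsharp_mask(scaled, amount)
    return clip01(laplacian_sharpen(sharpened, strength))


step_image = [[0, 0, 255, 255] for _ in range(3)]  # toy image with one edge
print(preprocess(step_image)[1])
```

A uniform image passes through unchanged apart from the scaling, while pixels next to an edge are pushed further apart, which is the intended enhancement.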
4.3.2. Data Division and Augmentation
Deep neural networks require large datasets for good results. Our dataset is large: it contains over 50,000 traffic sign images, divided into a 70% training set, a 10% validation set, and a 20% test set. Even so, data augmentation is needed to improve model generalization, since traffic signs can appear under different lighting conditions, angles, and distortions in real-world scenarios. We applied rotation, flipping, and brightness adjustment. Augmentation ensures that the model learns patterns from different situations and perspectives.
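The 70/10/20 division can be sketched with the standard library. The shuffling with a fixed seed and the function name are assumptions for illustration; the image count of 56,930 is the figure reported in Section 4.6.

```python
import random


def split_indices(num_images, seed=0):
    """Shuffle image indices and split them 70% / 10% / 20%."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)  # fixed seed for a repeatable split
    n_train = num_images * 70 // 100
    n_val = num_images * 10 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # remaining 20%
    return train, val, test


train, val, test = split_indices(56_930)
print(len(train), len(val), len(test))
```

Integer arithmetic is used for the split sizes so that the three parts always add up to the full dataset.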
4.3.3. Model Training
The chosen CNN model is EfficientNet because it is very efficient in extracting features from images while maintaining high accuracy. It consists of convolutional layers that detect the edges, textures, and shapes in traffic signs.
Instead of training from scratch, we employ EfficientNet weights pre-trained on a large dataset (ImageNet). The model is then fine-tuned on the GTSRB dataset to focus on traffic sign recognition. This approach reduces the training time and performs better with limited data.
The last layer of EfficientNet is modified to classify images into 43 categories of traffic signs. A Softmax function is used in the output layer to compute the probability of each class, and the sign with the highest probability is selected as the final prediction.
Since GTSRB is a multi-class classification problem (with 43 classes), Categorical Cross-Entropy Loss is employed. It measures how well the predicted probability distribution matches the actual class labels.
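The Softmax output and the categorical cross-entropy loss can be written out in a few lines. The sketch below uses three classes and made-up logits purely for illustration; the actual model has 43 outputs and computes these quantities inside the training framework.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    shift = max(logits)                       # subtract the max for stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_class):
    """Loss for one sample with a one-hot label at index `true_class`."""
    return -math.log(probs[true_class])


logits = [2.0, 0.5, -1.0]                     # hypothetical network outputs
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
loss = categorical_cross_entropy(probs, true_class=0)
print(predicted, round(loss, 4))
```

The loss is small when the probability assigned to the true class is close to one and grows as that probability shrinks, which is what drives the fine-tuning.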
We used Adam (Adaptive Moment Estimation), a widely used optimizer that adapts learning rates automatically and works effectively for complex models such as EfficientNet.
The model learns from the labeled images in the training set, is tuned on the validation set, and is then evaluated on a separate test set to check performance.
4.3.4. Classification Decision Making
After the model has been trained, it is used for real-time traffic sign recognition. The trained model is evaluated on the test set of traffic sign images from the GTSRB dataset. These images are first preprocessed (i.e., resized and normalized) before being given as input to the model. The model provides a probability score for each of the 43 traffic sign categories. The class with the highest probability is chosen as the prediction.
4.3.5. Evaluation Metrics
Accuracy, precision, recall, F1-score, specificity, AUC, and kappa are used to measure performance. The confusion matrix is used to assess misclassification patterns. Latency, convergence time, and resource consumption are also measured and compared with those of other approaches.
4.4. Baseline Studies
We compare the VGG16, ResNet50, and EfficientNet CNNs. We chose ResNet50 because it has a modern architecture and is well known for its performance and efficiency, while VGG16 is known for its simplicity and effectiveness. All models are implemented and trained using TensorFlow. Because our research in the vehicular network context is primarily aimed at real-time decision-making once the images have been obtained from the sensors and cameras of vehicles, it does not consider wireless transmission effects (e.g., network delays and bandwidth constraints). Wireless transmission introduces additional variables (latency, packet loss, bandwidth constraints, etc.), making it hard to isolate the impact of the CNN model on the decision-making itself, which is the purpose of this paper.
4.5. Performance Metrics
The confusion matrix is the most crucial metric for assessing a model’s performance since other evaluation metrics can be derived from this metric.
The confusion matrix is built from four counts: True Negative (TN), False Positive (FP), False Negative (FN), and True Positive (TP). TN is when the model correctly ignores irrelevant objects or recognizes that there is no traffic sign. FP is when the model misclassifies a traffic sign into the incorrect category or recognizes a sign when none exists. TP is when the model correctly identifies the traffic sign and classifies it in the correct category. FN is when the model fails to identify an existing traffic sign.
Accuracy: The proportion of data correctly predicted by our model, i.e., the ratio of correctly predicted values to the total number of values predicted: Accuracy = (TP + TN) / (TP + TN + FP + FN).
Recall: It measures the proportion of actual positives that the model correctly identifies: Recall = TP / (TP + FN).
Precision: It measures the proportion of correctly identified positives out of all predicted positives: Precision = TP / (TP + FP).
F1 score: The harmonic mean of the model’s precision and recall, which takes both into account: F1 = 2 × Precision × Recall / (Precision + Recall).
Specificity: It is the fraction of correctly identified negatives out of all negatives: Specificity = TN / (TN + FP).
Latency is the time needed for the model to process an input and produce an output. Resource consumption (computational) is the use of computational resources throughout the training process.
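The confusion-matrix metrics defined above can be computed with a small helper. The function name and the example counts are assumptions for illustration; for the 43-class problem, the counts would typically be taken per class in a one-vs-rest fashion and then averaged.

```python
def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Derive the standard metrics from the four confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "specificity": _ratio(tn, tn + fp),
    }


# Hypothetical counts for a single traffic sign class.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```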
4.6. Experimental Results and Performance Analysis
Many experiments were performed to test the feasibility of EfficientNet for decision-making. All experimental evaluations were conducted using Python 3.11.8, the libraries listed in Appendix A, and a GPU-based system. The EfficientNet model provides better accuracy than the other two models: 98.75% accuracy on the training data and 96.67% accuracy on the testing dataset, as shown in Figure 6. The model performance shows a noticeable improvement over training, demonstrating the potential for improved accuracy and lower loss. Initially, the validation accuracy is relatively low, but it increases steadily to approximately 96%.
The experiments were run for a total of 150 epochs, with a batch size of 32, an image size of 224, and a verbose setting of 1. For ResNet50, the initial validation accuracy is relatively low at approximately 0.86; after a few epochs, it increases sharply to 0.95. Similarly, the initial validation loss is approximately 0.45, but after a few epochs, the loss is less than 0.25.
Figure 7 and Figure 8 display the accuracy and loss for the test and validation set of the ResNet50 and VGG16 models, respectively.
From the model accuracy and loss graphs in Figure 8, we deduced that there is a considerable gap between the initial values and the stable accuracy and loss values. VGG16 also has a lower accuracy than the other models. The accuracy and loss curves of EfficientNet and ResNet50 differ from each other.
In contrast to VGG16, which has a testing accuracy and loss of 91.00% and 0.44, respectively, EfficientNet has a testing accuracy of 96.67% and a loss of 0.13. ResNet50’s accuracy and loss values lie between those of EfficientNet and VGG16, with a testing accuracy of 95.20% and a testing loss of 0.20. Table 1 compares testing accuracy and loss across all three models. EfficientNet outperforms ResNet50 and VGG16 in testing loss and accuracy due to features such as a better scaling strategy, improved parameter efficiency, advanced convolutional blocks, better regularization techniques, and higher computational efficiency. Figure 9 shows the corresponding performance analysis.
The performance of EfficientNet was also compared using various metrics that are important for decision-making, including accuracy, precision, recall, F1 score, AUC, kappa, latency, and resource consumption. The confusion matrix is used to evaluate several of these metrics and is detailed in Figure 10. The confusion matrix is computed on 20% of the 56,930-image dataset, i.e., 11,386 images; the misclassifications it displays are attributed to overfitting. According to the confusion matrix, out of 11,386 images, EfficientNet misclassifies 379 traffic signs, ResNet50 misclassifies 546, and VGG16 misclassifies 1025. EfficientNet has higher accuracy than the others because it misclassifies fewer images. VGG16 has the most misclassifications among the CNN models. Overall, all CNN models do a relatively good job of classifying traffic signs. Four critical measures were considered when comparing techniques: precision, recall, F1-score, and specificity. All assessment metrics for the CNN models discussed above are presented in Table 2 and plotted in Figure 11.
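As a quick consistency check, the reported testing accuracies can be recovered from the misclassification counts quoted above. The short script below only restates that arithmetic; the dictionary and function names are chosen for illustration.

```python
TEST_IMAGES = 11_386  # 20% of the 56,930 images
MISCLASSIFIED = {"EfficientNet": 379, "ResNet50": 546, "VGG16": 1025}


def accuracy_percent(misclassified, total=TEST_IMAGES):
    """Share of correctly classified images, in percent."""
    return 100.0 * (total - misclassified) / total


for model, errors in MISCLASSIFIED.items():
    print(f"{model}: {accuracy_percent(errors):.2f}%")
```

The resulting values, 96.67%, 95.20%, and 91.00%, match the testing accuracies reported for the three models.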
Compared with the other two CNN models, EfficientNet has the highest precision, recall, F1-score, and specificity, as shown in Table 2 and the analysis graph in Figure 11. EfficientNet’s latency and computational resource use are also the lowest among the models, supporting our claim. EfficientNet produces the best results of the three models, making decision-making in DT-VANET faster and more accurate, with fewer computational resources and less convergence time.
4.7. Comparison of EfficientNet with VGG16 and ResNet50
Owing to compound scaling, EfficientNet outperforms both VGG16 and ResNet50 in terms of accuracy. EfficientNet generally has a low validation loss because of its efficient utilization of parameters, whereas the ResNet50 and VGG16 models might experience higher validation loss due to their large size and potential inefficiencies in capturing fine details. VGG16 has the highest latency because of its larger model size, while EfficientNet has a smaller model size and lower latency, making it a recommended choice for decision-making. VGG16 is the heaviest model and requires the most memory and computational resources. EfficientNet is the lightest: it has fewer parameters, requires less time for convergence, and is suitable for real-time decision-making in DT-VANET. EfficientNet is the fastest of the three models because it is designed to be efficient in terms of latency, computational cost, accuracy, and convergence time. The slowest model is VGG16, which is attributed to its deep and wide architecture and high parameter count.