Article
Comparison of Supervised Learning Algorithms on a 5G
Dataset Reduced via Principal Component Analysis (PCA)
Joan D. Gonzalez-Franco 1 , Jorge E. Preciado-Velasco 1, * , Jose E. Lozano-Rizk 2 , Raul Rivera-Rodriguez 2 ,
Jorge Torres-Rodriguez 3 and Miguel A. Alonso-Arevalo 1
Abstract: Improving the quality of service (QoS) and meeting service level agreements (SLAs) are critical objectives in next-generation networks. This article presents a study on applying supervised learning (SL) algorithms to a 5G/B5G service dataset after it has been subjected to a principal component analysis (PCA). The objective of the study is to evaluate whether reducing the dimensionality of the dataset via PCA affects the predictive capacity of the SL algorithms. A machine learning (ML) scheme proposed in a previous article used the same algorithms and parameters, which allows for a fair comparison with the results obtained in this work. We searched for the best hyperparameters for each SL algorithm, and the simulation results indicate that the support vector machine (SVM) algorithm obtained a precision of 98% and an F1 score of 98.1%. We conclude that the findings of this study hold significance for research in the field of next-generation networks, which involve a wide range of input parameters and can benefit from the application of PCA regarding the performance of QoS and the maintenance of SLAs.
Keywords: 5G/B5G service classification; dimension reduction; ML; PCA; QoS; SLA
1. Introduction
Nowadays, Internet speed is an eminent concern for both users and service providers; hence, cellular companies are in a battle for fifth generation (5G) Internet implementation to provide a super-fast and reliable connection to their users [1]. So, enhancing the reliability, capacity, and speed of channels is mandatory. There are actions to accomplish those tasks, such as massive multiple-input and multiple-output (MIMO) or low power and low latency for the tactile internet, among others [2].
The primary focus of 5G networks is to enhance the transmission speed, capacity, and reliability of wireless channels. The fulfillment of these tasks is possible with the use of low power and low latency for the tactile internet, the Internet of Things (IoT), massive multiple-input and multiple-output (MIMO), robotics, autonomous vehicles, and industry [2].
It is essential for real-time applications and seamless connectivity with low latency requirements to have a reliable network, and 5G offers a latency of 1 ms to satisfy this requirement [3]. In the future, these traits of 5G will assist upcoming big data applications such as diagnosing critical-life situations in hospitals, fast money transfer in a stealthy way, handling inventory in warehouses, and much more [4].
LTE, 4G, and their earlier generations cannot support the latest traffic-intensive applications with quality of service (QoS). Here is where 5G offers an average data rate of more than 100 megabits per second (Mbps) and data rates of up to 20 Gbps (faster than 4G) to meet these requirements [5].
Applying supervised learning algorithms to high-dimensional datasets is a common
challenge in various areas of research and practical application. One of the most widely
used methods that allow for the extraction of the most relevant characteristics of the data to
reduce its dimensionality and represent them in a lower dimensional space [6,7] is principal
component analysis (PCA). However, although PCA can improve the performance of
supervised learning algorithms, it can also hide important information and affect the
performance of these algorithms.
This article presents a study on applying supervised learning algorithms to a dataset
after having undergone a PCA procedure. The main objective is to analyze how PCA
affects the performance of supervised learning algorithms for classifying 5G/B5G services
in terms of processing time and generalization capacity. We also analyze if PCA is suitable
based on the following decision metrics: accuracy, precision, recall, F1 score, and the Matthews correlation coefficient (MCC).
The methodology used in this article considers the use of a public dataset and the application of the five supervised learning algorithms used by the authors in [1], namely decision tree (DT), random forest (RF), support vector machine (SVM), K-nearest neighbors (KNN), and multi-layer perceptron classifier (MLPC). In addition, we apply a K-Folds cross-validation (K = 10) procedure to ensure the validity of the results.
The main contributions of this study are as follows:
•  The reduction of the dimensionality of the QoS parameters of the 5G dataset, maintaining 95% of the data variance;
•  The evaluation of the impact of PCA on the performance of supervised learning algorithms and the comparison of the results obtained with a previous study that did not use PCA;
•  The identification of the best hyperparameters of five SL algorithms to obtain the best possible results for the PCA 5G dataset;
•  The improvement of the performance metrics of SL models with respect to previous studies, with an accuracy of 98.1% and an F1 score of 98%.
These results can benefit various application areas of network management and
telecommunications service providers.
The organization of the rest of this paper is as follows: Section 2 discusses the PCA
involved in service classification and the methodology applied. Section 3 describes the
methodology used in the present work and how we used PCA to reduce the characteristics
of the database. In Section 4, we present the simulation results of this study and the
comparison with previous work. Finally, Section 5 concludes the article with some final
remarks, the limitations of this paper, and suggestions for future work.
2. Related Works
PCA has proven to be an effective technique for reducing the dimensionality of
datasets in various research areas. In the context of 5G networks, several studies explored
using the PCA strategy to reduce the complexity of datasets used in service classification.
These related works focus on applying PCA to data related to QoS measurements in
5G/B5G networks.
The authors in [8] analyze the throughput obtained with the variations observed in the identified parameters on which it depends. They treated the problem as a regression problem and applied regressor models, ranging from statistical and probabilistic models to machine learning and deep recurrent networks, with 10-fold cross-validation. Moreover, they applied dimensionality reduction to the dataset and observed the resulting performance.
Reference [9] introduces a weighted principal component analysis (WPCA)-based
service recommendation method. WPCA is a modification of PCA that allows for the
weighting of different variables. This weighting can reflect the importance of different
variables in the recommendation process. The authors use a real-world dataset of user
ratings of movies to evaluate the WPCA-based service recommendation method.
Reference [10] compares the performance of three different text classifiers when using principal component analysis (PCA) as a feature extraction method. The results showed that PCA can improve the performance of all three text classifiers. The authors also found that the number of principal components used can affect the performance of the text classifiers. In general, using more principal components leads to better performance. However, using too many principal components can lead to overfitting.
Reference [11] discusses the importance of service level agreements (SLAs) in 5G
networks and beyond, especially for 5G-enabled healthcare systems. SLAs are a set of
promises a service provider makes to its customers to meet and help ensure service quality.
Reference [12] proposes a method for segmenting medical images using principal
component analysis (PCA). The method first uses PCA to reduce the dimensionality of
the image data. The authors reduced the dimensionality by projecting the data onto a
lower-dimensional subspace that captures the most critical information in the data. The
authors further evaluated the proposed method on a dataset of medical images. The results showed that the proposed method permits highly accurate segmentation of the images.
Reference [13] introduces principal component analysis (PCA) as a statistical method
for reducing the dimensionality of data while retaining as much information as possible.
PCA is a widely used technique in many fields, including machine learning, data mining,
and image processing. The authors used PCA to reduce the dimensionality of the data
from 12 to 3 components. These components correlated with the three main tumor types in
the study.
The related works presented consistently demonstrated the effectiveness of principal
component analysis (PCA) in various critical tasks within 5G networks. These works
highlight how PCA can improve performance in critical areas such as channel estimation,
user pooling, resource allocation, and interference management. In all cases, the PCA-based
approach outperformed traditional algorithms by reducing dimensionality and exploiting
relevant features in datasets from 5G networks.
In Table 1, we compare previous research works and identify the gap, which effectively
highlights the milestones achieved in the current literature and the area of opportunity
at the time. Moreover, it highlights the importance of our study by focusing on the area
of opportunity that still requires attention and how this research can contribute to the
advancement of service classification in 5G/B5G networks.
Table 1. Comparison of previous research works and the identified gaps.

Related Work   Principal Topic                                                        Gap
Article [8]    Throughput variations; regression models; PCA; performance analysis.  Further investigation of optimal hyperparameter settings.
Article [9]    Weighted PCA (WPCA); service recommendation; variable weighting.      Exploration of the impact of different weighting strategies in WPCA.
Article [10]   Text classification; PCA as a feature extraction method.              Examination of the trade-off between the number of principal components and classifier performance.
Article [11]   Service-level agreements (SLAs) in 5G networks; healthcare systems.   Investigation of SLA adaptation mechanisms in dynamic network environments.
Article [12]   Medical image segmentation; PCA for dimensionality reduction.         Evaluation of the proposed method on a larger and more diverse medical image dataset.
Article [13]   PCA as a dimensionality reduction method.                             Exploration of applications of PCA in other fields within 5G networks.
These studies support the idea that PCA is a powerful tool with which to improve the
performance and efficiency of 5G networks by providing a more compact representation
of data and enabling more accurate decision-making. These investigations establish the
groundwork for forthcoming advancements and enhancements in the domain of 5G net-
works, fostering the examination of PCA-based methodologies in additional domains and
applications inside this framework.
In conclusion, PCA is a promising technique in 5G networks, offering opportunities
to optimize and improve various critical aspects of these networks. Knowledge and
understanding of PCA applications in this context are essential to continued advancement
in the search for more efficient and effective solutions in designing and deploying next-
generation networks.
Our proposal takes advantage of some of the areas of opportunity shown in Table 1
(Articles [8], [11], and [13]). We use PCA’s advantages to reduce dimensions in a 5G dataset,
apply different SL algorithms that learn to classify services, and finish with the search for
hyperparameters to obtain better results. We claim that a better service classification leads
to a better QoS for the operator and, in turn, compliance with the SLAs.
3. Methodology
In this section, we present the fundamental process covered by this study to evaluate
the effectiveness of service classification using SL algorithms in the context of 5G/B5G
networks. Our approach considers three crucial phases that converge in the search for
an optimal solution: the first involves the reduction of the dimensionality of our data
using PCA, followed by the application of SL algorithms, and culminating in the search for
hyperparameters that allow us to obtain more solid and precise results.
Figure 1 shows the block diagram of the processes followed in this work. This diagram provides an overview of the interconnected stages that make up our methodological approach. This section thoroughly explores each phase, highlighting the techniques used in each step.
This comprehensive approach allows for a deeper understanding of our case study; it demonstrates the importance of each phase in obtaining accurate and relevant results in the context of next-generation networks.
This work evaluates the SL scheme's performance in classifying services after passing the training data through a PCA procedure. We used the same dataset and SL algorithms (DT, RF, SVM, KNN, and MLPC) for this analysis as in [1]. This research focuses on precisely evaluating the PCA's impact on the data.
The dataset used in this study was created manually following communication stan-
dards. The dataset includes 165 samples and 13 variables, which are key performance
indicators (KPIs) and key quality indicators (KQIs) of 5G services. Variables represent
essential performance metrics such as latency, throughput, and packet loss, among others.
We performed the PCA analysis to reduce the dimension of the dataset, as shown in Figure 1. We had to normalize the features of the 5G dataset because we needed each feature to contribute equally. We calculated the accumulative variance and observed how many principal components retained 95% of the original variance. Derived from this process, we created a PCA 5G dataset and a group of supervised learning models to classify the new data (PCA dataset). We performed a cross-validation evaluation and compared the performance metrics of the models obtained in this study with the previous research results.
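A minimal sketch of this normalization and variance analysis, assuming the 165 samples are loaded in a pandas DataFrame df whose 13 KPI/KQI columns are the features and whose Service column holds the labels (df and the column names are illustrative, not taken from the published dataset):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# df is assumed to hold the 165 samples: 13 KPI/KQI feature columns plus a "Service" label column.
X = df.drop(columns=["Service"]).values
y = df["Service"].values

# Normalize each feature so that all of them contribute equally to the PCA.
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA on all 13 components and inspect the cumulative explained variance.
pca = PCA(n_components=X_scaled.shape[1])
pca.fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Number of components needed to retain at least 95% of the original variance.
n_components = int(np.argmax(cumulative >= 0.95) + 1)
print(cumulative, n_components)
```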
Since we want to measure the impact of PCA on the dataset, we used the same parameters to fit the machine learning models in both studies to guarantee the comparability of the results. In addition, we performed a detailed comparison of the results obtained from the confusion matrices and performance metrics. Finally, we searched for the best hyperparameters for each algorithm to obtain the best possible results in the performance metrics.
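As an illustration of this comparison step, the sketch below runs the five SL algorithms under a 10-fold cross-validation on the reduced data. The constructor arguments are placeholders for the parameters reused from [1], which are not reproduced in this excerpt, and X_pca and y are assumed to come from the PCA step above:

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# X_pca, y: the nine-component PCA features and the 5G service labels (assumed available).
models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "MLPC": MLPClassifier(max_iter=3000, random_state=0),
}

# Stratified folds keep the class proportions; the article only states K-Folds with K = 10.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_val_score(model, X_pca, y, cv=cv, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```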
To evaluate the performance metrics in an ML problem, we must observe the confusion matrix. A confusion matrix visualizes the predictive model's performance, presents the confusion between two labels, and permits us to obtain the equations of the performance metrics. Each row represents the current label of the test values (Ytest), and the matrix columns represent the number of predictions for each label (Y) made by the predictive model. Table 2 shows an example of a confusion matrix.
Table 2. Example of a confusion matrix.

                              Prediction (Y)
                              Positive                 Negative
Current (Ytest)   Positive    True Positives (TP)      False Negatives (FN)
                  Negative    False Positives (FP)     True Negatives (TN)
From the confusion matrix, we obtain the following performance metrics:
1. Accuracy:
   Accuracy = \frac{TP + TN}{TP + FP + TN + FN}. (1)
2. Precision:
   Precision = \frac{TP}{TP + FP}. (2)
3. Recall:
   Recall = \frac{TP}{TP + FN}. (3)
4. F1 score:
   F1\ score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}. (4)
5. Matthews correlation coefficient (MCC): This is the only binary classification rate that generates a high score only if the binary predictor can correctly predict most positive and negative data instances. It ranges in the interval [−1, +1], with the extreme values −1 and +1 reached in cases of perfect misclassification and perfect classification, respectively. At the same time, MCC = 0 is the expected value for the coin-tossing classifier [15]. It is an alternative measure unaffected by the issue of unbalanced datasets.
   MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}. (5)
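For reference, the same metrics can be obtained with scikit-learn instead of applying Equations (1)–(5) by hand; a short sketch, assuming y_test and y_pred come from one of the fitted predictive models (macro averaging matches the "Macro" columns reported later in Table 5):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             matthews_corrcoef, precision_score, recall_score)

# y_test and y_pred are assumed to come from one of the fitted predictive models.
cm = confusion_matrix(y_test, y_pred)          # rows: current labels, columns: predictions
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")
mcc = matthews_corrcoef(y_test, y_pred)        # multi-class generalization of Equation (5)

print(cm)
print(accuracy, precision, recall, f1, mcc)
```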
In summary, the methodological approach employed in this article combines the PCA
technique with well-established SL algorithms for accurate data classification. In addition,
our proposal searches for the hyperparameters of each algorithm to find the best results.
Furthermore, we evaluated the effectiveness of this approach in terms of performance
metrics. We compared it with previous work to demonstrate the utility of PCA analysis in
data classification in next-generation networks.
4. Simulation Results
This section explains the steps necessary to meet our research approach (see Figure 1).
Section 4.1 focuses on applying the PCA to the dataset and understanding the possible
results obtained from this analysis. Section 4.2 focuses on applying SL algorithms to the
PCA dataset, and Section 4.3 focuses on searching for the best hyperparameters of each
SL algorithm.
Therefore, this graph indicates that we must consider reducing the dimensions from 13 to approximately 9. Likewise, we know the cumulative sum of the importance of the variances of each component individually, since this permits the visualization of which main components to consider. Figure 2b shows the importance of each component until reaching 100% of the variance.
Figure 2b shows that from the ninth principal component onwards, the graph maintains an almost linear shape and does not represent more than 5% of the main variance of the data. We calculated the cumulative sum to determine the exact percentage of variance in each component. Table 3 shows the results of this analysis.

Table 3. Accumulative variance of the principal components.

Principal Component   Variance (%)   Accumulative Variance (%)
1                     27.81          27.81
2                     18.3           46.11
3                     14.35          60.46
4                     11.73          72.19
5                     7.39           79.58
6                     5              84.58
7                     4.4            88.98
8                     3.1            92.08
9                     2.76           94.84
10                    2.29           97.13
11                    1.43           98.56
12                    1.02           99.58
13                    0.42           100
Observing the results in Table 3, we note that nine principal components guarantee approximately 95% of the original variance of the data. With this, it is possible to reduce the 13 training variables to 9 dimensions, which is considered sufficient to fulfill the purpose of this article.
Therefore, we created a new dataset (PCA dataset) of dimension (165, 10); the 165 rows
of data are maintained, the nine columns correspond to the nine main components, and the
last column corresponds to the labels of the 5G services for classification. In Appendix A,
Table A1 shows a fragment of the PCA dataset containing the first ten rows.
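A minimal sketch of how such a PCA dataset can be assembled, assuming X_scaled and y come from the normalization step described in Section 3 (the column names PC1–PC9 follow Table A1; the output file name is hypothetical):

```python
import pandas as pd
from sklearn.decomposition import PCA

# X_scaled and y are assumed from the normalization step; keep the nine components
# that retain roughly 95% of the original variance (see Table 3).
pca_9 = PCA(n_components=9)
X_pca = pca_9.fit_transform(X_scaled)                      # shape (165, 9)

pca_dataset = pd.DataFrame(X_pca, columns=[f"PC{i}" for i in range(1, 10)])
pca_dataset["Service"] = y                                 # last column: 5G service labels
print(pca_dataset.shape)                                   # (165, 10)
pca_dataset.to_csv("pca_5g_dataset.csv", index=False)      # hypothetical file name
```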
Figure 3 shows the confusion matrices obtained during the testing process for each model in the simulation. The main diagonal shows the number of correct predictions the predictive model makes. Values outside the main diagonal represent wrong predictions.
Figure 3. Confusion matrices of each model (DT, RF, SVM, KNN, and MLPC).
Then, we applied Equations (1)–(5) of the metrics obtained from the confusion matrix to evaluate the performance of the predictive models. Table 5 shows the results.

Table 5. Model metric results for the simulation.

SL Algorithms   Accuracy (%)   Precision Macro (%)   Recall Macro (%)   F1 Score Macro (%)   MCC (%)
DT              81.8           81.7                  86.8               82.4                 79.6
RF              78.8           80.4                  81.4               79.5                 75.9
SVM             78.8           75.9                  79.6               73.7                 77.5
KNN             90.9           93.7                  92.2               91.9                 89.8
MLPC            90.9           93.1                  92.2               92.1                 89.5

Table 5 shows that for the DT and RF models, the values of the performance metrics decreased considerably after applying PCA. This decrement could be because the data now have fewer features, many even with negative values, and a slight change in the dataset can make the tree structure unstable, which can cause variance [10].
KNN and MLPC are the models that provide the best results. However, when analyzing the cross-validation metrics, the KNN model might not have learned well or might present underfitting. To make it easier to see whether a model is overfitting or underfitting, Figure 4 shows the learning curves of each model, where each curve represents how the model learned by comparing the training data with the test data.
In Figure 4, we observed that the model that learns best is the SVM model; however, since the validation metric values are low, we still need to obtain better results. When analyzing the graph corresponding to KNN, the validation curve is not close to the learning curve, so we inferred that the model is not correctly adjusted.
In the case of the MLPC model, both curves approach each other at the point where we reach 120 input data points. Furthermore, the cross-validation metric for this model is 90.16%, and the accuracy obtained in the testing phase was 90.9%. The best fit of the model to the dataset occurs when this difference is less than 1%.
After applying PCA to the original dataset, the model that obtained the best results was the MLPC. Figure 5 compares the MLPC of this work and the RF of the previous paper. However, these results are not optimal, so we searched for the best hyperparameters.
Figure 5. Comparison of the results obtained in both papers.
Table 6. Best hyperparameters found for each SL algorithm and the K-Folds (K = 10) cross-validation results.

SL Algorithms   Hyperparameters        Best Value       K-Folds (K = 10) Cross-Validation Results (%)
DT              criterion              gini             82.58
                max_depth              6
                min_samples_split      2
RF              criterion              gini             87.25
                max_depth              5
                min_samples_split      2
                n_estimators           100
SVM             kernel                 linear           96.92
                C parameter            10^3
KNN             n_neighbors            3                85.55
                weights                distance
                algorithm              auto
                p                      2
MLPC            hidden_layer_sizes     (50, 100, 50)    93.90
                activation             tanh
                alpha                  0.0001
                solver                 adam
                max_iter               3000
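The article does not state which search strategy was used to find these hyperparameters; the following sketch shows one common option, a grid search with 10-fold cross-validation for the SVM row of the table (the grid values are illustrative, and X_pca and y are assumed from the previous steps):

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Illustrative grid for the SVM row; the best combination reported above is
# kernel="linear" with C = 10**3 (i.e., C = 1000).
param_grid = {
    "kernel": ["linear", "rbf", "poly"],
    "C": [1, 10, 100, 1000],
}
search = GridSearchCV(SVC(), param_grid,
                      cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
                      scoring="accuracy")
search.fit(X_pca, y)
print(search.best_params_, search.best_score_)
```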
5. Results Discussion
In this section, we delve deeper into the outcomes of our study regarding service classification in 5G/B5G networks after applying PCA and hyperparameters. We present the confusion matrices and the performance metrics. Moreover, we conduct a direct comparison with the results of a previous study. This analysis phase is pivotal, as it allows us to comprehend our approach's effectiveness and relevance in the context of next-generation networks.
A distinctive aspect of our research is the direct comparison with a prior study that addressed service classification in 5G/B5G networks without applying PCA. This comparison offers a clear insight into how dimensionality reduction through PCA influences the performance of SL algorithms. The outcomes of this comparison are pivotal to determining the effectiveness of our approach and its impact on the current state of research in this field.
Figure 6 shows the confusion matrices obtained from these models.
Figure 6. Confusion matrices of each model with hyperparameters (DT, RF, SVM, KNN, and MLPC).
Now, we applied Equations (1)–(5) of the metrics obtained from the confusion matrix to evaluate the performance of the predictive models with the hyperparameters. Table 7 shows the results.
Table 7. Model metric results for the simulation with hyperparameters.
MLPC 90.9 95.4 90.7 92.0
Figure 7 shows the learning curve of the SVM algorithm with hyperparameters. So, we can see now that the SL algorithm was learning well with the validation set of data (Xtest and Ytest).
Figure 7. Learning curve of the linear SVM model with C = 1000.
We obtained better results than those shown in previous research work. Figure 8 shows a comparative table between the SVM model, which obtained the best results in this work, and the RF from previous studies.
Figure 8. Comparison of the results obtained in both papers.
Research Applicability
Our research results can benefit various application areas of network management and telecommunications service providers. The areas of applicability of this work are as follows:
Our approach in both
directly papers.
impacts the end-user experience
by ensuring reliability and efficiency in the delivery of network services. Users of
5G/B5G services will experience more robust connectivity and greater satisfaction
due to improved QoS.
6. Conclusions
In summary, in this work, a service classification analysis was conducted in 5G/B5G
networks using a dataset containing KPI and KQI variables. Principal component analysis
(PCA) was applied to effectively reduce the dimensionality of the dataset while employing
the same ML classification algorithms (DT, RF, SVM, KNN, MLPC) as in [1]. The results
indicate that the MLPC algorithm achieved an accuracy of 90.9% and a Matthews correlation
coefficient of 89.5%. While these results are relevant and satisfactory, they did not surpass
the performance metrics obtained in prior research.
As demonstrated in Figure 4, the initial application of these algorithms revealed sub-
optimal learning outcomes. In response, we undertook an exploration of hyperparameters
for the ML algorithms. Following that, the hyperparameters were fine-tuned and subse-
quently implemented, resulting in notable enhancements in performance. Notably, the
SVM algorithm exhibited a precision of 98.1% and an F1 score of 98%. Figure 8 directly
compares the SVM result and the earlier findings.
In conclusion, this study underscores the critical role of PCA analysis and hyperpa-
rameter optimization in service classification within 5G/B5G networks. It is important to
note that a more precise service classifier directly translates into improved quality of service
(QoS), ensuring that users experience higher performance and reliability standards, and
thereby meeting and surpassing the expectations outlined in service-level agreements (SLA).
These insights are poised to significantly enhance the QoS and ensure SLA compliance in
next-generation network environments.
The main limitation of this research is the lack of a public 5G dataset that contains real measurements of QoS parameters from service providers. Service providers rarely share their networks' operational data, making such data challenging to obtain, which is why synthetic datasets are used. In future work, an area of interest will be the ability to execute ML algorithms to classify services according to performance and quality parameters extracted from operators.
In future work, we will consider the implementation of dynamic PCA, an adaptive
approach that can dynamically adjust PCA parameters to suit evolving network conditions.
Data Availability Statement: The data that support the findings of this study are available from the
corresponding author, upon reasonable request.
Acknowledgments: The authors would like to thank CICESE and CONAHCYT for their support.
Conflicts of Interest: The authors declare no conflict of interest.
Appendix A
Table A1. Fragment of the PCA dataset (first ten rows).
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 Service
2.5756558 0.24621559 −1.35927118 3.23078901 1.09972575 −0.69121917 −0.73070834 −1.16510564 −0.00452683 UHD_Video_Streaming
3.19269459 0.46629112 −1.93486147 1.82095405 1.18999224 −0.45830314 −0.1137988 −1.13037469 0.87825037 UHD_Video_Streaming
3.68179527 −0.11216349 1.42957963 −1.12968081 −0.34314677 0.06361543 −0.18534234 0.44228557 −0.1214715 Immerse_Experience
−0.2956968 −0.3660506 −1.53749501 −1.18555701 0.84736679 0.01813633 −0.74932499 0.61125113 0.27002474 Smart_Grid
−2.1886923 1.654586 0.90803809 0.96660038 −0.8333005 0.66169658 −0.92042026 −0.12399808 −1.37094177 ITS
−2.3979213 1.74833131 2.18157625 1.22849353 2.49116385 −0.34653657 −0.58041048 0.71969267 0.25472965 Vo5G
3.6678642 0.19525714 1.72733445 −0.77410883 −0.62850646 0.11037757 −0.08152386 −0.06107831 0.08985644 Immerse_Experience
−0.4308387 −2.43457613 0.51735842 0.53574906 −0.52068866 −0.77749575 0.29709111 0.02079445 0.0129068 e_Health
−0.7316077 −2.54502195 0.36201518 0.21264354 −0.46036545 −0.62669515 0.29703591 0.42173884 0.05331082 Connected_Vehicles
−0.2116078 −1.18875613 −1.44643445 −0.92424323 0.25153326 −0.56259378 −0.27871799 0.59457551 −0.10843045 Industry_Automation
References
1. Preciado-Velasco, J.E.; Gonzalez-Franco, J.D.; Anias-Calderon, C.E.; Nieto-Hipolito, J.I.; Rivera-Rodriguez, R. 5G/B5G service
classification using supervised learning. Appl. Sci. 2021, 11, 4942. [CrossRef]
2. Sufyan, A.; Khan, K.B.; Khashan, O.A.; Mir, T.; Mir, U. From 5G to beyond 5G: A Comprehensive Survey of Wireless Network
Evolution, Challenges, and Promising Technologies. Electronics 2023, 12, 2200. [CrossRef]
3. Gökarslan, K.; Sandal, Y.S.; Tugcu, T. Towards a URLLC-Aware Programmable Data Path with P4 for Industrial 5G Networks. In
Proceedings of the 2021 IEEE International Conference on Communications Workshops (ICC Workshops), Montreal, QC, Canada,
14–23 June 2021; pp. 1–6. [CrossRef]
4. Valanarasu, R.; Christy, A. Comprehensive Survey of Wireless Cognitive and 5G Networks. J. Ubiquitous Comput. Commun.
Technol. 2019, 1, 23–32. [CrossRef]
5. Amjad, M.; Musavian, L.; Rehmani, M.H. Effective Capacity in Wireless Networks: A Comprehensive Survey. IEEE Commun.
Surv. Tutor. 2019, 21, 3007–3038. [CrossRef]
6. Shlens, J. A Tutorial on Principal Component Analysis. Educational 2014, 51. [CrossRef]
7. Xia, Z.; Chen, Y.; Xu, C. Multiview PCA: A Methodology of Feature Extraction and Dimension Reduction for High-Order Data.
IEEE Trans. Cybern. 2022, 52, 11068–11080. [CrossRef] [PubMed]
8. Mithillesh Kumar, P.; Supriya, M. Throughput Analysis with Effect of Dimensionality Reduction on 5G Dataset using Machine
Learning and Deep Learning Models. In Proceedings of the 2022 International Conference on Industry 4.0 Technology (I4Tech),
Pune, India, 23–24 September 2022; pp. 1–7. [CrossRef]
9. Qi, L.; Dou, W.; Chen, J. Weighted principal component analysis-based service selection method for multimedia services in cloud.
Computing 2016, 98, 195–214. [CrossRef]
10. Taloba, A.I.; Eisa, D.A.; Ismail, S.S. A Comparative Study on using Principle Component Analysis with different Text Classifiers.
Int. J. Comput. Appl. 2018, 180, 1–6. [CrossRef]
11. Qureshi, H.N.; Manalastas, M.; Zaidi, S.M.A.; Imran, A.; Al Kalaa, M.O. Service Level Agreements for 5G and Beyond: Overview,
Challenges and Enablers of 5G-Healthcare Systems. IEEE Access 2021, 9, 1044–1061. [CrossRef]
12. Maneno, K.M.; Rimiru, R.; Otieno, C. Segmentation via principal component analysis for perceptron classification. In Proceedings
of the 2nd International Conference on Intelligent and Innovative Computing Applications, ACM, New York, NY, USA, 24–25
September 2020; pp. 1–8. [CrossRef]
13. Beattie, J.R.; Esmonde-White, F.W.L. Exploration of Principal Component Analysis: Deriving Principal Component Analysis
Visually Using Spectra. Appl. Spectrosc. 2021, 75, 361–375. [CrossRef] [PubMed]
14. Hossin, M.; Sulaiman, M.N. A Review on Evaluation Metrics for Data Classification Evaluations. Int. J. Data Min. Knowl. Manag.
Process 2015, 5, 1–11. [CrossRef]
15. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary
classification evaluation. BMC Genom. 2020, 21, 6. [CrossRef]
16. Liyanapathirana, L. Classification Model Evaluation. 2018. Available online: https://heartbeat.fritz.ai/classification-model-
evaluation-90d743883106 (accessed on 8 October 2023).
17. Anonymous. Cross-Validation: Evaluating Estimator Performance. Scikit-Learn. 2020, pp. 1–10. Available online: https://scikit-learn.org/stable/modules/cross_validation.html (accessed on 8 October 2023).
18. Probst, P.; Bischl, B.; Boulesteix, A.-L. Tunability: Importance of Hyperparameters of Machine Learning Algorithms. Mach. Learn. 2018.
[CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.