Ordinal classification for interval-valued data and interval-valued functional data

arXiv:2310.19433v1 [stat.ME] 30 Oct 2023

Aleix Alcacer^a, Marina Martinez-Garcia^a, Irene Epifanio^a,∗

^a Dep. Matemàtiques, Universitat Jaume I, Castelló 12071, Spain
∗ Corresponding author
Email addresses: aalcacer@uji.es (Aleix Alcacer), martigar@uji.es (Marina Martinez-Garcia), epifanio@uji.es (Irene Epifanio)

Preprint submitted to Expert Systems with Applications, October 31, 2023
Abstract
The aim of ordinal classification is to predict the ordered labels of the output
from a set of observed inputs. Interval-valued data refers to data in the form of
intervals. For the first time, interval-valued data and interval-valued functional
data are considered as inputs in an ordinal classification problem. Six ordinal
classifiers for interval data and interval-valued functional data are proposed.
Three of them are parametric, one of them is based on ordinal binary decompositions and the other two are based on ordered logistic regression. The other
three methods are based on the use of distances between interval data and kernels on interval data. One of the methods uses the weighted k-nearest-neighbor
technique for ordinal classification. Another method considers kernel principal
component analysis plus an ordinal classifier. And the sixth method, which is
the method that performs best, uses a kernel-induced ordinal random forest.
They are compared with naïve approaches in an extensive experimental study with synthetic data sets and original real data sets about human global development and weather. The results show that considering ordering and interval-valued
information improves the accuracy. The source code and data sets are available
at https://github.com/aleixalcacer/OCFIVD.
Keywords: Ordinal regression, Interval data, Symbolic data, Functional data
analysis, random forest
2020 MSC: 62H30, 62R10
1. Introduction
Symbolic data analysis (SDA) is gaining popularity since this kind of data
can arise as the result of aggregation of very large data sets, which are very common nowadays (Billard & Diday, 2003). However, there are also data that are
naturally symbolic (see Billard (2008) for some illustrative examples of symbolic data). In classical multivariate analysis, data points consist of a single
(numerical or categorical) value for each feature, i.e. they are point-valued
data. However, symbolic data can be lists, intervals, histograms, etc.
In this work, we focus on interval-valued data (IVD), i.e. each data point
is expressed in interval format. Using classical techniques with IVD can cause
distorted results due to the loss of information, as explained in Billard (2008).
Unlike a single observation of point-valued data, which has no internal variation, each single symbolic data value has its own internal variation (Billard, 2006).
For example, for an interval-valued observation [4, 10] and assuming a uniform
distribution across the interval, the variance (s2 ) is 3 (see Bertrand & Goupil
(2000) on how to calculate the symbolic sample variance). However, if the
mid-point of that interval (7) is considered as a point-valued data, s2 = 0.
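Under the uniform assumption, the within-interval variance is (b − a)²/12, which can be checked directly. A minimal sketch in Python (the paper's own code is in R, so this is only illustrative):

```python
# Variance of a uniform distribution over the interval [a, b] is (b - a)^2 / 12.
def interval_variance(a, b):
    return (b - a) ** 2 / 12

# The interval [4, 10] has internal variance 3, ...
print(interval_variance(4, 10))  # 3.0
# ... while its midpoint (7), taken as a degenerate interval, has variance 0.
print(interval_variance(7, 7))   # 0.0
```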
As a consequence, the sample variance of the entire interval-valued data set comprises both the within- and between-observation variations. As an illustrative example, Billard (2006) compared the results obtained by using symbolic
analysis versus classical analysis with principal component analysis (PCA). The
classical results were less informative than the richer knowledge gained from
the symbolic analysis. However, this is not exclusive to PCA, rather the use of
naive ways to deal with IVD may introduce errors in analysis (Li et al., 2019).
This is why methodologies for IVD should be used with this kind of data.
1.1. Interval-valued data
IVD have been used in different statistical problems, such as archetypal analysis (D'Esposito et al., 2012), classification (Duarte Silva & Brito, 2006, 2015; Ramos-Guajardo & Grzegorzewski, 2016; Appice et al., 2006; de Souza et al., 2008; Jahanshahloo et al., 2007; Qi et al., 2020; Angulo et al., 2008; Rossi & Conan-Guez, 2002), clustering (Shimizu, 2011; Sun et al., 2022b; Chen & Billard, 2019; D'Urso et al., 2023), hypothesis testing (Grzegorzewski & Śpiewak, 2019; Maharaj et al., 2021), outlier analysis (Duarte Silva et al., 2018), PCA (Lauro & Palumbo, 2000; Le-Rademacher & Billard, 2012; Sun et al., 2022a), and regression analysis (Blanco-Fernández et al., 2011; Sinova et al., 2012; Xu & Qin, 2022). An excellent survey is found in Brito (2014).
Recent advances in the analysis of IVD and interesting applications span a variety of fields, such as carbon price forecasting (Liu et al., 2022), PM2.5 concentration forecasting (Wang et al., 2022), the spatial behavior of the number of COVID-19 cases and rent price analysis (Freitas et al., 2022), clustering of Fungi species (Rizo Rodríguez & Tenório de Carvalho, 2022), and oil price forecasting (Sun et al., 2022c).
Focusing on classification methodologies, Duarte Silva & Brito (2006), Duarte Silva & Brito
(2015) and Ramos-Guajardo & Grzegorzewski (2016) developed discriminant
analysis methods for interval data. Appice et al. (2006) considered different
distances for several types of symbolic data with the k nearest neighbor method.
de Souza et al. (2008) introduced multi-class logistic regression models. Jahanshahloo et al.
(2007) extended data envelopment analysis–discriminant analysis methodology for interval data. Qi et al. (2020) applied traditional classification methods with a different representation of interval data. Support vector machines
(Angulo et al., 2008) and artificial neural networks (Rossi & Conan-Guez, 2002)
were used in interval data classification.
Another field that is gaining popularity is functional data analysis (FDA),
since technological advances have permitted the acquisition of functional data.
In FDA, data points are functions. Ramsay & Silverman (2005) provide an excellent overview of FDA. Usually, the observed functions are point-valued, but
they can also be interval-valued functions (IVF). Some examples of these are
the maximum and minimum temperatures for each day in meteorological stations, daily interval stock prices, and a person’s daily blood pressure records.
Working with intervals provides a more realistic view of the variations in weather conditions than simple average values. Analogously, intervals offer more relevant information for experts evaluating the tendency and intraday volatility of stocks (Lauro & Palumbo, 2000). Similarly, being aware of blood pressure fluctuations is critical from a health point of view (Sanidas et al., 2019). Likewise, more meaningful results are obtained
when interval-valued functions are used rather than single-point functional data,
as shown by (D’Urso et al., 2023) in clustering.
1.2. Ordinal classification
In classification, a label in a set has to be predicted based on the observation
of several inputs. Most of the time, labels are considered unordered, even if they
are not. Therefore, the vast majority of classification algorithms are conceived to
solve nominal classification problems. Nevertheless, ordered classification problems arise in many fields, such as collaborative filtering (Alcacer et al., 2021), environmental management (Balugani et al., 2021), finance (Hirk et al., 2019), information retrieval, medicine (Barbero-Gómez et al., 2021; Singer et al., 2021),
psychology, and social sciences, among others (Gutiérrez et al., 2016). In ordered
classification problems, labels are ordered: for example, ratings (very low, low,
indifferent, high, very high). In those problems, misclassifying an instance in
a neighboring class is generally less relevant than misclassifying it in distant
classes. Furthermore, Vargas et al. (2022) explained that taking order into account in ordinal classifiers usually accelerates the learning process and reduces
the amount of data needed for training.
A taxonomy of ordinal classification methods was proposed by Gutiérrez et al.
(2016), where methodologies were grouped into three approaches. Naïve approaches use simpler paradigms and comprise regression, nominal classification
(the order is neglected (Ferrando et al., 2020)), and cost-sensitive classification.
The second approach divides the ordinal categories into several binary labels,
i.e. it uses binary decompositions, as made by Frank & Hall (2001). Finally,
the third approach comprises the threshold models, which are the most successful methods in ordinal classification. They assume that there is a continuous
feature that can explain the behavior of the ordinal factor. These models include: cumulative link models (Agresti, 2002, Ch. 7), support vector machines,
discriminant learning, perceptron learning, augmented binary classification, ensembles (Hechenbichler & Schliep, 2004; Hornung, 2020; Vega-Márquez et al.,
2021), and Gaussian processes.
The number of methods for ordered classification for multivariate data is much lower than for nominal classification (Hornung, 2020; Pierola et al., 2016), and even lower for functional data (Ferrando et al., 2021) or other
kinds of data, such as those on Riemannian manifolds (Simó et al., 2020). For
the case of interval data, literature about ordinal classification is very scarce.
In fact, to the best of our knowledge, only monotonic classification, which is a
particular case of ordinal classification, has been studied for IVD (Chen et al.,
2022). In monotonic classification, there are monotonicity constraints between
inputs and outputs.
1.3. Our contributions
Due to this scarcity, the objectives of this work are to introduce and compare
different ordinal methods for IVD and IVF. We also aim to compare them with
the naïve approach, which consists of a) not taking the order into account,
i.e. applying nominal classification methods to ordinal classification problems
as if classes were unordered, or b) taking order into account, but discarding
the interval-valued nature of the data. Gutiérrez et al. (2016) carried out a
comparative evaluation of ordinal methods for multivariate data and showed
that, even if the naı̈ve approach can be very competitive, taking into account
order improves the performance. To the best of our knowledge, no previous
work has had these objectives. Therefore, the novelty of this work lies in:
• Proposing several methods for ordinal classification of IVD and IVF.
• Comparing those methods using simulated and real data.
• Comparing the proposed ordinal methods with nominal classification methods for IVD and IVF to see whether not discarding the information about order is important for the case of IVD.
• Comparing the performance when using ordinal classifiers, but without
taking into account the interval-valued information of the data, i.e. considering the mid-point of the intervals as inputs.
• Providing the R (R Core Team, 2023) code with the algorithms. The data
and code are available at https://github.com/aleixalcacer/OCFIVD
for reproducibility.
The paper is organized as follows: Section 2 reviews previous works that will
be used in our proposals. Section 3 introduces the proposed ordinal classification
methods for both IVD and IVF. The results are presented and discussed in
Section 4. Finally, Section 5 contains some conclusions.
2. Current methodologies
2.1. Multivariate ordinal classification methodologies
We review ordinal classifiers for point-valued data. Let X be an N × K
matrix with N observations (xi ) of K features and y the response vector, which
is an ordered factor with Q levels, the ordered classes C1 , ..., CQ .
The Frank and Hall (FH) method (Frank & Hall, 2001): the ordinal classification problem is broken into a series of Q − 1 binary classification problems, where one class is made up of C1, ..., Cq and the other class of Cq+1, ..., CQ, for q = 1, ..., Q − 1. For a new observation with features x, let pq be the estimate of P(y > Cq |x). Therefore, the predicted probabilities of each of the Q classes are: P(y = C1 |x) = 1 − p1; P(y = Cq |x) = pq−1 − pq, q = 2, ..., Q − 1; P(y = CQ |x) = pQ−1. In order to apply this method, we need the binary classifier to return class probability estimates.
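The recombination of the Q − 1 binary estimates into Q class probabilities can be sketched as follows (illustrative Python; the paper's implementation is in R):

```python
def fh_class_probs(binary_probs):
    """Recombine Q-1 binary estimates p_q = P(y > C_q | x) into Q class
    probabilities, following the Frank & Hall (2001) scheme."""
    p = list(binary_probs)
    Q = len(p) + 1
    probs = [1 - p[0]]                                   # P(y = C_1) = 1 - p_1
    probs += [p[q - 2] - p[q - 1] for q in range(2, Q)]  # P(y = C_q) = p_{q-1} - p_q
    probs.append(p[-1])                                  # P(y = C_Q) = p_{Q-1}
    return probs

# e.g. Q = 4 classes with decreasing binary estimates:
print(fh_class_probs([0.9, 0.6, 0.2]))  # approximately [0.1, 0.3, 0.4, 0.2]
```

Note that the binary estimates must be decreasing in q for all recombined probabilities to be non-negative.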
Weighted k-Nearest-Neighbor Techniques for Ordinal Classification (wkNN):
Hechenbichler & Schliep (2004) describe the use of weighted nearest neighbors for ordinal classification. For a new observation x, the k + 1 nearest
neighbors to x are found according to a certain distance function d(x, xi ).
The k smallest distances are normalized by dividing them by the distance
to the (k + 1)th neighbor and transformed into weights by any kernel function. In the case of nominal classification, the predicted class is usually
chosen by the weighted majority vote of the k nearest neighbors, i.e. by
the mode. But for ordinal classification, the weighted median is used for
predicting the class. The implementation is based on the function kknn
from the R package kknn (Schliep & Hechenbichler, 2016).
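The prediction step can be sketched as follows, with a triangular kernel as one possible weighting choice (illustrative Python on point-valued inputs; the paper relies on the R package kknn):

```python
def weighted_median_class(classes, weights):
    """Ordinal prediction: weighted median of the neighbors' ordered labels."""
    pairs = sorted(zip(classes, weights))  # sort neighbors by class label
    half, cum = sum(weights) / 2, 0.0
    for c, w in pairs:
        cum += w
        if cum >= half:
            return c

def wknn_predict(x, X, y, k=7):
    """wkNN: k+1 nearest neighbors -> normalized distances -> kernel weights
    -> weighted median of the neighbor classes."""
    d = sorted((sum((a - b) ** 2 for a, b in zip(xi, x)) ** 0.5, yi)
               for xi, yi in zip(X, y))
    dk1 = d[k][0] or 1.0  # distance to the (k+1)-th neighbor (guard against 0)
    neigh = [(yi, max(1 - di / dk1, 0.0)) for di, yi in d[:k]]  # triangular kernel
    return weighted_median_class([c for c, _ in neigh], [w for _, w in neigh])

X = [[0], [1], [2], [10], [11], [12], [20], [21]]
y = [1, 1, 1, 2, 2, 2, 3, 3]
print(wknn_predict([1.5], X, y, k=3))  # 1
```

For the interval-valued proposals of Sect. 3, the Euclidean distance above would simply be replaced by DEH or DF EH.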
Ordered logistic regression (POLR): The cumulative link model is described in detail in Agresti (2002, Ch. 7). The model is logit P(y ≤ Cq |x) = ζq − η, where the logit link function (logit(p) = log(p/(1 − p))) is the inverse of the standard logistic cumulative distribution function, the ζq parameters give each cumulative logit, and η represents the linear predictor β1 x1 + ... + βK xK . After estimating the parameters, we can predict the class
probabilities for a new instance, which is assigned to the class with the
highest probability. We have used the polr function from the R package
MASS (Venables & Ripley, 2002) in the implementation.
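Given estimated thresholds ζq and coefficients β, the predicted class probabilities can be sketched as follows (illustrative Python with hypothetical parameter values; the paper uses polr from the R package MASS):

```python
import math

def polr_class_probs(x, zeta, beta):
    """Cumulative link model: P(y <= C_q | x) = sigmoid(zeta_q - eta), with the
    linear predictor eta = beta_1 x_1 + ... + beta_K x_K.

    zeta holds the Q-1 increasing thresholds; returns the Q class probabilities."""
    eta = sum(b * v for b, v in zip(beta, x))
    cum = [1 / (1 + math.exp(-(z - eta))) for z in zeta] + [1.0]  # P(y <= C_Q) = 1
    return [cum[0]] + [cum[q] - cum[q - 1] for q in range(1, len(cum))]

# hypothetical fitted parameters for Q = 3 classes and one feature:
probs = polr_class_probs([0.0], zeta=[-1.0, 1.0], beta=[0.5])
print(probs)  # sums to 1; here the middle class is the most probable
```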
Ordinal forest (OF): In OF, ordinal classes are predicted by a random forest
methodology introduced by Hornung (2020) for multivariate features. Optimized score values are used in place of the category values of the ordinal
response and the results are treated as a metric output. The method is
implemented in the function ordfor of the R package ordinalForest.
2.2. Point-valued functional ordinal classification methodologies
Kernel-Induced Random Forests (KIRF): KIRF was introduced by Fan et al.
(2010) for nominal classification of functional data. Rather than using the
raw observations, the idea is to use kernel functions of each two different
observations of the training set as candidate splitting rules in the kernel-induced classification trees. Those trees are employed in KIRF.
Functional principal component analysis (FPCA) + ordinal classifier:
Aguilera & Escabias (2008) considered FPCA followed by POLR. In Ferrando et al.
(2021) more methodologies for functional ordinal classification based on a
FPCA decomposition of the functions followed by an ordinal classifier are
considered.
2.3. Interval-valued classification
We focus on the methodologies related to our proposals. de Souza et al. (2008) proposed applying a multi-class logistic regression model in two ways.
First, they fit that model jointly to the lower and upper extreme values of
the intervals. Second, they fit the model to the lower and upper extreme values
of the intervals separately.
The methodology proposed by Duarte Silva & Brito (2015) returns class probabilities, which are needed for the FH method. In that method, intervals are represented by a bivariate normal distribution (one feature for the midpoints and a second feature for the logarithm of the ranges of the intervals), and classical linear (or quadratic) discriminant analysis is applied. We refer to this method as LDA-ID (Linear Discriminant Analysis of Interval Data).
Let X be an N × K matrix with N observations (Xi) of K interval variables (X.j), i.e. Xij = [xlij , xuij ]. Assuming all intervals are non-degenerate (xlij < xuij , i = 1, ..., N ; j = 1, ..., K), Xij is represented by the midpoint cij = (xlij + xuij )/2 and the log-range r∗ij = ln(xuij − xlij ). In the Gaussian model, a joint multivariate Normal distribution N(µ, Σ) is assumed for the midpoints C and the logs of the ranges R∗, i.e. µ′ = [µ′C µ′R∗ ] and

Σ = [ ΣCC   ΣCR∗ ]
    [ ΣR∗C  ΣR∗R∗ ],

where ′ denotes the transpose, µC and µR∗ are K-dimensional column vectors of the mean values of C and R∗, respectively, and ΣCC , ΣCR∗ , ΣR∗C and ΣR∗R∗ are their variance-covariance matrices. Four different configurations for the variance-covariance matrix are considered: case 1) non-zero correlations among all C and R∗, therefore, there is no restriction on Σ; case 2) variables X.j are not correlated, therefore, ΣCC , ΣCR∗ , ΣR∗C and ΣR∗R∗ are diagonal; case 3) there is no correlation between C and R∗, therefore, ΣCR∗ = ΣR∗C = 0; case 4) all C and R∗ are not correlated, therefore, Σ is diagonal. See Brito & Duarte Silva (2012) for details about the estimators for each covariance configuration. For each configuration, the classical linear discriminant classification rule can be obtained as explained by Duarte Silva & Brito (2015).
The methodology is implemented in the function lda of the R package MAINT.Data
(Duarte Silva et al., 2021). Results for all four configurations are compared by
the Bayesian Information Criterion (BIC), and the one with the lowest BIC
value is selected.
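The midpoint/log-range representation underlying LDA-ID can be sketched as follows (illustrative Python with a hypothetical helper name; the paper uses the R package MAINT.Data):

```python
import math

def midpoints_logranges(lower, upper):
    """Represent non-degenerate intervals [l, u] (l < u) by the midpoints
    c = (l + u) / 2 and the log-ranges r* = ln(u - l), as in LDA-ID."""
    c = [(l + u) / 2 for l, u in zip(lower, upper)]
    r = [math.log(u - l) for l, u in zip(lower, upper)]
    return c, r

# two interval variables for one observation: [4, 10] and [0, 1]
c, r = midpoints_logranges([4, 0], [10, 1])
print(c)  # [7.0, 0.5]
print(r)  # [ln(6), ln(1)] = approximately [1.79, 0.0]
```

A Gaussian classifier is then fit on the 2K features (c, r*) under one of the four covariance configurations above.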
2.4. Distances between interval-valued data and interval-valued functions
Several distances have been defined for IVD and IVF (Shimizu, 2011; Sun et al.,
2022b,a). The Hausdorff distance is commonly used. Let us review its definition.
Let Xi = ([xli1 , xui1 ], ..., [xliK , xuiK ]) be an observation of K-dimensional IVD, where i = 1, ..., N and xlik ≤ xuik , xlik , xuik ∈ R. The Hausdorff distance between Xi and Xj is defined by DH (Xi , Xj ) = Σ_{k=1}^{K} Dk (Xi , Xj ), where Dk (Xi , Xj ) = max(|xlik − xljk |, |xuik − xujk |). The Euclidean Hausdorff distance is defined by DEH (Xi , Xj ) = (Σ_{k=1}^{K} [Dk (Xi , Xj )]²)^{1/2}.

Analogously, the functional Hausdorff distance is defined for a set of IVF Xi (t) = [xli (t), xui (t)], with i = 1, ..., N , xli (t) ≤ xui (t) and t ∈ [a, b], i.e. xli (t) (xui (t)) is the lower (upper) function of Xi (t). The functional Hausdorff distance between Xi (t) and Xj (t) is defined by DF H (Xi (t), Xj (t)) = ∫_a^b max(|xli (t) − xlj (t)|, |xui (t) − xuj (t)|) dt. The Functional Euclidean Hausdorff distance is defined by DF EH (Xi (t), Xj (t)) = (∫_a^b max{|xli (t) − xlj (t)|², |xui (t) − xuj (t)|²} dt)^{1/2}. In practice, the integral can be estimated by numerical integration such as the trapezoidal rule. If we have multivariate IVF, for example, bivariate IVF Fi (t) = (Xi (t), Yi (t)), we can define DF EH (Fi (t), Fj (t)) = (DF EH (Xi (t), Xj (t))² + DF EH (Yi (t), Yj (t))²)^{1/2}.
If the scale of the variables is very different, some standardization should be
carried out. See De Carvalho et al. (2006) for different alternatives for IVD.
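These distances are straightforward to compute. A minimal Python sketch (the paper's code is in R), with the trapezoidal rule for the functional case:

```python
import math

def d_eh(Xi, Xj):
    """Euclidean Hausdorff distance between two K-dimensional interval
    observations; Xi, Xj are lists of (lower, upper) pairs."""
    return math.sqrt(sum(max(abs(li - lj), abs(ui - uj)) ** 2
                         for (li, ui), (lj, uj) in zip(Xi, Xj)))

def d_feh(Xi, Xj, t):
    """Functional Euclidean Hausdorff distance between two IVF sampled on a
    common grid t, with the integral estimated by the trapezoidal rule."""
    g = [max(abs(li - lj), abs(ui - uj)) ** 2
         for (li, ui), (lj, uj) in zip(Xi, Xj)]
    integral = sum((g[m] + g[m + 1]) / 2 * (t[m + 1] - t[m])
                   for m in range(len(t) - 1))
    return math.sqrt(integral)

# one-dimensional check: D_EH = max(|0 - 1|, |2 - 4|) = 2
print(d_eh([(0, 2)], [(1, 4)]))  # 2.0
```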
2.5. Kernel on interval data
Do & Poulet (2005) defined a Radial Basis Function (RBF) kernel for dealing with interval data as follows: KI ⟨Xi , Xj ⟩ = exp(−DEH (Xi , Xj )²/γ), where the parameter γ is the spread of the kernel.
Once the kernel is defined, we can use kernel techniques, such as Kernel Principal Component Analysis (KPCA) (Schölkopf et al., 1998). The implementation is based on the function kpca from the R package kernlab (Karatzoglou et al.,
2004).
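The kernel itself is a one-liner on top of DEH. A minimal Python sketch (the paper's implementation relies on kernlab in R):

```python
import math

def d_eh(Xi, Xj):
    # Euclidean Hausdorff distance of Sect. 2.4; Xi, Xj are (lower, upper) pairs
    return math.sqrt(sum(max(abs(li - lj), abs(ui - uj)) ** 2
                         for (li, ui), (lj, uj) in zip(Xi, Xj)))

def k_i(Xi, Xj, gamma=1.0):
    """RBF kernel on interval data: K_I<Xi, Xj> = exp(-D_EH(Xi, Xj)^2 / gamma)."""
    return math.exp(-d_eh(Xi, Xj) ** 2 / gamma)

A = [(0, 2), (1, 3)]
B = [(1, 4), (1, 3)]
print(k_i(A, A))  # 1.0, since the distance from an observation to itself is 0
print(k_i(A, B))  # exp(-4), since D_EH(A, B) = 2
```

The N × N matrix of such kernel values is what kernel techniques such as KPCA operate on.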
3. Proposed methods for ordinal classification of interval-valued data
and interval-valued functions
Besides naïve approaches, such as 1) working with midpoints of intervals and applying ordinal classification methods or 2) using interval classification methods without considering order, we propose the following methods, which consider both the order in the response and the interval nature of the input data.
FH+ LDA-ID: we propose to use the FH method with LDA-ID as the binary classifier. Although LDA-ID is intended for IVD, we can also use it with IVF. The idea is to discretize the observed IVF on a (fine) grid of equally spaced values, so we can work with them as IVD. Therefore, FH + LDA-ID can be used with both IVD and IVF. Note that FH takes the order of the output into account and LDA-ID obtains the class probabilities for each interval-valued binary classification problem. If the grid on which the functions are discretized is too fine, the number of observed features can greatly exceed the number of observations and the methodology will fail; in that case, a coarser grid should be used.
DI+wkNN: our proposal is to use the distances introduced in Sect. 2.4 together with wkNN for ordinal classification. Depending on whether we
work with IVD or IVF, the method will be DEH +wkNN or DF EH +wkNN,
respectively, although they will be referred to as DI+wkNN indistinctly.
In the implementation, k = 7 is used; the parameter has not been tuned.
KPCA + ordinal classifiers: our idea is to carry out a feature extraction
stage followed by an ordinal classifier for multivariate data. Preprocessing
is a powerful methodology in multivariate data (Hastie et al., 2009, pp. 150-151) for improving the performance of a learning procedure. Previously, it
has also been employed successfully with functional data (Epifanio, 2008;
Epifanio & Ventura, 2011). The idea is similar to carrying out FPCA +
ordinal classifier, which is explained in Sect. 2.2 for point-valued functional
data, but in our proposal we consider KPCA with a kernel for interval
data.
Note that Do & Poulet (2005) only defined an RBF kernel for IVD. However, we can extend it to IVF. Therefore, we introduce a new definition, an RBF kernel for IVF: KF I ⟨Xi (t), Xj (t)⟩ = exp(−DF EH (Xi (t), Xj (t))²/γ). In the implementation, γ = 1 is used; the parameter has not been tuned.
Therefore, depending on whether we work with IVD or IVF, the method
will be KI PCA+ ordinal classifier or KF I PCA+ ordinal classifier, respectively. The ordinal classifier can be POLR, OF, or another ordinal classifier for multivariate data. We have considered POLR in the experimental
section, so we will refer to KPCA +POLR for IVD or IVF indistinctly.
KIOF: our proposal is to use KIRF with IVD and IVF using KI or KF I , respectively, but rather than using nominal random forest (RF) classification as in KIRF, we consider OF. We refer to this method as KIOF, for both the KI -based and KF I -based versions, indistinctly.
POLR-I and POLR-I2: our proposal is to extend the ideas in de Souza et al. (2008) to ordinal classification of IVD and IVF. Rather than using a multi-class logistic regression model, POLR is considered. For IVF, the functions can be discretized as explained in FH + LDA-ID.
Therefore, in POLR-I, the inputs of POLR are the lower and upper bounds
of the intervals. However, in POLR-I2, we build two models. One model
is built by applying POLR to the lower bounds of the intervals, while
the other model is built by applying POLR to the upper bounds of the
intervals. The posterior probabilities of the classes for both models are
averaged to obtain the final posterior probabilities of the classes.
FH+ LDA-ID belongs to the second approach of ordinal binary decompositions, while the rest of the methods belong to the third approach of threshold methods. DI+wkNN and KIOF are included in the ensembles subgroup, while KPCA +POLR, POLR-I and POLR-I2 are part of the cumulative link models. In the multivariate case, methods of the third approach are the most successful (Vega-Márquez et al., 2021). Although POLR is fast to train and one of the most popular ordinal classification methods, it does not have the best performance in the multivariate setting (Gutiérrez et al., 2016). Therefore, a priori, we expect the last five proposed methods to perform best, especially DI+wkNN and KIOF, if the IVD (or IVF) case follows the same pattern as the multivariate case (Gutiérrez et al., 2016).
Table 1 presents an overview of the methods used in the experiments for IVD (analogously for IVF). On the one hand, there are the proposed methodologies (FH+LDA-ID, DI+ wkNN, KPCA + POLR, KIOF, POLR-I, and POLR-I2). On the other hand, there are the established techniques, which are naïve methods in this case, since no previous methodology has been considered for ordinal classification of IVD until now. Three naïve methodologies are contemplated: LDA-ID, which is an interval-valued classifier that does not take order into account; and POLR and OF considering the midpoints of intervals as inputs, i.e. these two classifiers take order into account, but not the interval-valued nature of the data. In the implementation, default parameters (which can be seen in the code file) have been considered.
Table 1: Summary of some characteristics of the methods.

Method | Approach (Sect. 1.2) | Parametric | Input
POLR (Ordered logistic regression) | 1st | Yes | Midpoints of IVD
OF (Ordinal forest) | 1st | No | Midpoints of IVD
LDA-ID (Linear Discriminant Analysis of Interval Data) | 1st | Yes | Interval bounds transformed into midpoints and log-ranges
FH+LDA-ID (Frank and Hall method + LDA-ID) | 2nd | Yes | Midpoints and log-ranges for each binary classification problem of the FH method
DI+ wkNN (Weighted k-Nearest-Neighbor with DEH or DF EH for IVD and IVF resp.) | 3rd | No | Interval bounds for computing the distances (input of wkNN)
KPCA + POLR (Kernel Principal Component Analysis + POLR) | 3rd | Mixed | Interval bounds for computing the distances and the kernel. The projections on the principal components (input of POLR)
KIOF (Kernel-Induced Random Forests with OF) | 3rd | No | Interval bounds for computing the distances and the kernel (input of OF)
POLR-I (1st extension of de Souza et al. (2008) with POLR) | 3rd | Yes | Lower and upper bounds separately, as two distinct features
POLR-I2 (2nd extension of de Souza et al. (2008) with POLR) | 3rd | Yes | Lower bounds to POLR-1 and upper bounds to POLR-2; posterior probabilities of POLR-1 and POLR-2 are combined
4. Results and discussion
We compare the methodologies in three different scenarios. Artificial data
are generated in Sect. 4.1, while real data sets are considered in Sect. 4.2 and
4.3, which deal with IVD and IVF, respectively.
Although the details will be provided for each scenario, the experimental
setup is common to all three scenarios. In each scenario, we consider a variety
of ordered levels for the output. A total of 50 full data sets are built in each
case. Each of these data sets is divided into a training set and a test set. For
assessing performance, we compute accuracy (success rate) in the corresponding
test set. As the same data sets are used to compare all the methods, a completely
randomized block design is applied to test the differences between methods,
together with Tukey’s test for comparing all pairs of means (Montgomery, 2019).
We have used the R package multcomp (Hothorn et al., 2008).
4.1. Simulated data
Two ordinal classification problems with IVD are simulated, with three and
four ordered classes, respectively. The simulation design of synthetic interval
data sets resembles that made by de Souza et al. (2008). Nevertheless, we consider more than 3 classes and the distribution parameters are different bearing
in mind that the output is ordinal.
Each class is composed of 100 samples generated according to bivariate Normal distributions with non-correlated components and the following parameters for the mean µ and the covariance matrix Σ: µ = (µ1 , µ2 )′ and

Σ = [ σ1²     ρσ1 σ2 ]
    [ ρσ1 σ2  σ2²    ],

with

1. Class 1: µ1 = 25, µ2 = 50, ρ = 0, σ1 = 6, σ2 = 3
2. Class 2: µ1 = 38, µ2 = 40, ρ = 0, σ1 = 3, σ2 = 3
3. Class 3: µ1 = 45, µ2 = 35, ρ = 0, σ1 = 5, σ2 = 5

for the problem with three ordered classes, and

1. Class 1: µ1 = 25, µ2 = 50, ρ = 0, σ1 = 6, σ2 = 3
2. Class 2: µ1 = 30, µ2 = 45, ρ = 0, σ1 = 5, σ2 = 5
3. Class 3: µ1 = 38, µ2 = 40, ρ = 0, σ1 = 3, σ2 = 3
4. Class 4: µ1 = 45, µ2 = 35, ρ = 0, σ1 = 2, σ2 = 3

for the problem with four ordered classes.
To build the interval-valued data set, each generated data point (z1 , z2 ) is
used as a seed of a vector of intervals (rectangle) defined as ([z1 −γ1 /2, z1 +γ1 /2],
[z2 −γ2 /2, z2 +γ2 /2]), where the parameters γ1 and γ2 are randomly drawn from
a continuous uniform distribution on the interval [1, 5]. Classes are balanced and
several samples are shown in Fig. 1.
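The generation of the synthetic IVD can be sketched as follows (illustrative Python with a hypothetical helper name; the paper's code is in R):

```python
import random

def make_interval_samples(mu1, mu2, s1, s2, n=100, seed=0):
    """Generate n interval-valued samples: a bivariate Normal seed (z1, z2) with
    independent components is widened by gamma ~ Uniform[1, 5] per dimension."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1, z2 = rng.gauss(mu1, s1), rng.gauss(mu2, s2)
        g1, g2 = rng.uniform(1, 5), rng.uniform(1, 5)
        data.append(((z1 - g1 / 2, z1 + g1 / 2), (z2 - g2 / 2, z2 + g2 / 2)))
    return data

# three-class design: (mu1, mu2, sigma1, sigma2) per ordered class
params = [(25, 50, 6, 3), (38, 40, 3, 3), (45, 35, 5, 5)]
dataset = [(x, label) for label, p in enumerate(params, start=1)
           for x in make_interval_samples(*p, seed=label)]
print(len(dataset))  # 300
```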
For both simulation designs, we generate 50 data sets, where 80% of the
samples are used for training and 20% are used for testing, i.e. 240 (320)
samples for training and 60 (80) samples for testing for the ordinal classification
problem with 3 (4) classes.
Table 2 contains a summary of the performance of each method for both
simulation designs separately (second and third column) and jointly (last column). The maximum value in each column appears in bold. The p-values of the
multiple comparison of means by Tukey contrasts are shown in Table 3 for both
simulation designs jointly. We include the following significance codes for the ranges of the p-values: '***' means that the p-value is in the interval [0, 0.001], '**' means that the p-value is in the interval (0.001, 0.01], '*' means that the p-value is in the interval (0.01, 0.05], '.' means that the p-value is in the interval (0.05, 0.1], while no significance code indicates that the p-value is in the interval (0.1, 1]. Depending on the α level considered, we would say that
there is a statistically significant difference between the mean accuracy values
of both methods.
With 3 ordered classes, the best method is KIOF, with 90.7% mean accuracy, followed by DI+wkNN (with DEH ), with 89.8% mean accuracy; their differences with respect to the rest of the methods are statistically significant. For the sake of brevity, we do not show the table of p-values for the simulation
Figure 1: Plot of 10 samples per group from the synthetic data with 3 classes. The rectangles denote the IVD, and the dots are the corresponding midpoints.
design with 3 classes, but the p-values of the Tukey contrasts for DI+ wkNN
and KIOF with respect to the rest of the methods are below 3e-05. There is also
a statistically significant difference between DI+ wkNN and KIOF (p-value =
0.0279). Therefore, two of our proposals show better performance than the naïve approaches. Table 4 shows more performance statistics for illustrative purposes, since our problems have balanced sample sizes and these statistics are more critical in imbalanced situations. The highest or second highest value in the majority (6) of the columns is reached by DI+ wkNN and KIOF. For the F1-score
columns, which combine precision and recall, the highest values correspond to
FH+LDA-ID and DI+ wkNN for class 1 (KIOF is the third best performing
method); KIOF and DI+ wkNN for class 2; and KIOF and DI+ wkNN for class
3.
With 4 ordered classes, four methods obtained the highest mean accuracy, 77.8%. These four methods are OF, DI+ wkNN, KPCA + POLR, and POLR-I2. The method with the next highest mean accuracy is
KIOF, whose mean accuracy is 77.1%. As before, for the sake of brevity, we do
not show the table of p-values for the simulation design with 4 classes, but the
p-values of the Tukey contrasts for KIOF with respect to the four best methods
are 0.0695. The p-values of the Tukey contrasts for the four best methods and all
other methods except KIOF are below 0.001, so they are statistically significant.
Therefore, three of our proposals together with the naïve method OF provide
the best performance.
When we consider both simulation designs jointly, the best performance is
achieved by KIOF, with 83.9% mean accuracy. The second best performance
12
Table 2: Mean and standard deviation, in brackets, of accuracy (percentage) over 50 simulations for synthetic data.

Method        3 classes    4 classes    Global
POLR          88.2 (3.7)   76.5 (4.5)   82.4 (7.2)
OF            88.1 (3.2)   77.8 (4.5)   83 (6.5)
LDA-ID        88.2 (3.7)   76.5 (4.5)   82.4 (7.2)
FH+LDA-ID     88.1 (3.2)   76.1 (4.5)   82.1 (7.2)
DI+ wkNN      89.8 (3.5)   77.8 (4.2)   83.8 (7.2)
KPCA + POLR   88.1 (3.2)   77.8 (4.5)   83 (6.5)
KIOF          90.7 (3.2)   77.1 (4.3)   83.9 (7.8)
POLR-I        88.2 (3.7)   76.5 (4.5)   82.4 (7.2)
POLR-I2       88.1 (3.2)   77.8 (4.5)   83 (6.5)
Table 3: P-values of Tukey simultaneous comparison for synthetic data.

Method        POLR       OF         LDA-ID     FH+LDA-ID   DI+ wkNN   KPCA + POLR   KIOF       POLR-I
OF            .2235
LDA-ID        1          .2235
FH+LDA-ID     .5670      .0737 .    .5670
DI+ wkNN      .0021**    .0627 .    .0021**    .0003***
KPCA + POLR   .2235      1          .2235      .0737 .     .0627 .
KIOF          .0011**    .0394*     .0011**    .0001***    .8415      .0394*
POLR-I        1          .2235      1          .5670       .0021**    .2235         .0011**
POLR-I2       .2235      1          .2235      .0737 .     .0627 .    1             .0394*     .2235
is achieved by DI+ wkNN, with 83.8% mean accuracy. The third best performance
is achieved by three methods, OF, KPCA + POLR, and POLR-I2, with
83% mean accuracy. The p-values of the Tukey contrasts for KIOF with respect
to these three methods are 0.0394, while for all other methods (LDA-ID,
FH+LDA-ID, POLR-I) they are below 0.0011, as can be seen in Table 3. Therefore,
they are statistically significant. Analogously, the p-values of the Tukey
contrasts for DI+ wkNN with respect to the three methods OF, KPCA + POLR,
and POLR-I2 are 0.0627, while for all other methods (LDA-ID, FH+LDA-ID,
POLR-I) they are below 0.0021. The method that performs least well is
FH+LDA-ID, with 82.1% mean accuracy, which is not statistically significantly
different from POLR, LDA-ID, and POLR-I, with 82.4% mean accuracy.
In summary, our proposals KIOF and DI+ wkNN are the methods that perform
best; in fact, they are statistically significantly better than the rest of the
methods at the α = 0.1 level.
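As an illustration of the distance-based idea behind DI+ wkNN, the following sketch combines the Hausdorff distance between intervals with kernel-weighted k-nearest neighbors and a weighted-median prediction, which respects the class ordering. This is a simplified stand-in, not the paper's implementation (which relies on the R package kknn); the triangular kernel, the choice of distance, and k are illustrative assumptions.

```python
import numpy as np

def hausdorff_dist(u, v):
    # Hausdorff distance between intervals [a, b] and [c, d] is
    # max(|a - c|, |b - d|); here it is summed over the p variables
    return np.sum(np.maximum(np.abs(u[..., 0] - v[..., 0]),
                             np.abs(u[..., 1] - v[..., 1])), axis=-1)

def wknn_ordinal(X_train, y_train, x_new, k=5):
    """Weighted kNN for ordinal classes on interval-valued inputs.

    X_train: (n, p, 2) array of intervals, y_train: (n,) ordered labels.
    """
    d = hausdorff_dist(X_train, x_new)
    idx = np.argsort(d)[:k]                      # k nearest training intervals
    # triangular kernel: closer neighbors receive larger weights
    w = np.maximum(1.0 - d[idx] / (d[idx].max() + 1e-12), 1e-12)
    order = np.argsort(y_train[idx])             # sort neighbor labels
    cum = np.cumsum(w[order]) / w.sum()
    # weighted median of the ordered labels: the ordinal analogue of voting
    return y_train[idx][order][np.searchsorted(cum, 0.5)]

# toy data: one interval-valued variable, two ordered classes
X = np.array([[[0.0, 1.0]], [[0.2, 1.2]], [[5.0, 6.0]], [[5.1, 6.1]]])
y = np.array([0, 0, 1, 1])
print(wknn_ordinal(X, y, np.array([[0.1, 1.1]]), k=3))  # -> 0
```

Using the weighted median rather than a weighted mode penalizes predictions that are far from the true class on the ordinal scale.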
4.2. Global Development data
We consider development data from 183 countries. The input features
are two interval-valued variables based on the following two gender inequality
indicators from The World Bank (2022): a) Women Business and the Law Index Score (LAW) (scale 1-100), which measures how laws and regulations affect
women’s economic opportunities. Overall scores are calculated by taking the
average score of each index (Mobility, Workplace, Pay, Marriage, Parenthood,
Table 4: Mean of precision, recall and F1 for each class, over 50 simulations for synthetic data with three classes.

              Precision           Recall              F1
Method        1     2     3       1     2     3       1     2     3
POLR          98.2  80.0  86.8    96.8  85.6  82.2    97.5  82.3  84.0
OF            94.0  82.0  88.5    99.6  84.5  79.9    96.6  82.6  83.4
LDA-ID        100   78.5  87.5    97.2  88.8  79.3    98.6  83.0  82.8
FH+LDA-ID     98.6  84.2  81.4    99.8  79.0  85.9    99.2  81.1  83.2
DI+ wkNN      99.4  81.7  89.4    98.6  89.7  81.8    99.0  85.1  85.0
KPCA + POLR   97.6  80.1  85.9    96.5  83.2  83.7    97.0  81.1  84.4
KIOF          99.1  82.8  90.9    98.0  90.7  83.8    98.5  86.2  86.9
POLR-I        98.4  80.2  86.0    96.9  84.3  83.3    97.6  81.8  84.1
POLR-I2       98.2  79.7  86.9    96.8  85.8  81.8    97.5  82.3  83.9
Entrepreneurship, Assets, and Pension), with 100 representing the highest possible score; b) The percentage of seats held by women in national parliaments
(GenPar), which represents the percentage of parliamentary seats in a single or
lower chamber held by women.
The extremes of the interval-valued variables are the minimum and the maximum for both gender indicators between 2000 and 2021.
As regards the output feature, the ordered factor is built by dividing the
Human-Development Index (HDI) for 2021 into certain percentiles (United Nations,
2022).
The HDI is the geometric mean of three normalized indices: HDI = (LEI · EI · II)^(1/3),
where LEI stands for the Life Expectancy Index, EI for the Education Index,
and II for the Income Index. HDI assesses having a long and healthy
life, being knowledgeable, and having a decent standard of living. For the
ordered categorical variable, we consider 5 possible levels from 3 to 7. For example, for the case of 3 ordered classes, the classes are defined by the labels
[0, L33[, [L33, L66[, [L66, L100], where Li denotes the value of the i-th percentile
of HDI. Therefore, classes are balanced. Several samples are displayed in Fig. 2.
In summary, we have 5 data sets with the same inputs, but 5 different outputs,
which have a different number of ordered categories ranging from 3 to 7. For
each of these 5 data sets, we use a Monte Carlo cross-validation, where 50 random splits of each data set are created. In each split, the data set is divided into
training data (80% of 183 countries) and validation data (20% of 183 countries).
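The construction of the ordered response and the Monte Carlo cross-validation described above can be sketched as follows (the HDI values are simulated placeholders, and the variable names are illustrative; the paper's actual pipeline is implemented in R):

```python
import numpy as np

rng = np.random.default_rng(0)
hdi = rng.uniform(0.4, 0.95, size=183)   # placeholder for the 2021 HDI values

def ordinal_labels(y, q):
    # cut points L_{100 i / q} at equally spaced percentiles, giving
    # (approximately) balanced ordered classes 0, ..., q - 1
    cuts = np.percentile(y, 100 * np.arange(1, q) / q)
    return np.searchsorted(cuts, y, side="right")

labels = ordinal_labels(hdi, 3)
print(np.bincount(labels))               # roughly 61 countries per class

# Monte Carlo cross-validation: 50 random 80/20 train/validation splits
splits = []
for _ in range(50):
    perm = rng.permutation(len(hdi))
    n_train = int(0.8 * len(hdi))
    splits.append((perm[:n_train], perm[n_train:]))
```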
Table 5 shows a summary of the performance of each method for the experimental designs jointly, while Fig. 3 displays the performance separately. The
p-values of the multiple comparison of means by Tukey contrasts are shown in
Table 6 for the experimental designs jointly. As before, we include the significance codes. Table 7 shows performance statistics for three classes.
According to the results in Table 5, the best methods are DI+ wkNN (with
DEH) and KPCA + POLR, with 40.0% and 39.98% mean accuracy, respectively.
The third best performance is achieved by KIOF, with 39.21% mean accuracy.
They are not statistically significantly different according to the p-values of the
Figure 2: Plot of 10 samples per group from the global development data with 3 classes. The
rectangles denote the IVD, and the dots are the corresponding midpoints.
Tukey contrasts in Table 6. However, DI+ wkNN and KPCA + POLR are
statistically significantly different with respect to the rest of the methods. OF
is the fourth best method in terms of performance, with 38.84% mean accuracy. This naı̈ve method achieves results similar to FH+LDA-ID and POLR-I,
with 38.54% and 38.51% mean accuracy. The methods that perform worst are
LDA-ID, POLR, and POLR-I2, with 37.24%, 37.31%, and 37.43% mean accuracy,
respectively. These three methods form a homogeneous group that is
statistically significantly different from the rest of the methods at level 0.1.
According to the results of Fig. 3, the best accuracy with 3 ordered classes
is provided by POLR-I, with 56.4% mean accuracy. For the sake of brevity, we
do not show the table of p-values for each experimental design, but in this case,
no statistically significant difference is found between POLR-I and DI+ wkNN,
KPCA + POLR, KIOF, and POLR-I2 at level 0.1, i.e. we find a statistically
significant difference between POLR-I and all the naı̈ve approaches (LDA-ID,
POLR, OF), together with FH+LDA-ID. If F1-score rankings in Table 7 are
considered, the methods with jointly lowest rankings (best performing methods)
in the three classes coincide with those methods with the highest accuracies
(POLR-I and KPCA + POLR).
Following the results of Fig. 3, the best accuracy with 4 ordered classes is
provided by DI+ wkNN, with 46.8% mean accuracy. DI+ wkNN is statistically
significantly different from LDA-ID, OF, KPCA + POLR, KIOF, and POLR-I
at level 0.1. The best accuracy with 5 ordered classes is provided by KPCA +
POLR, with 37.6% mean accuracy. KPCA + POLR is statistically significantly
different from the rest of the methods at level 0.1. The best accuracy with
Table 5: Mean and standard deviation, in brackets, of accuracy (percentage) over 50 splits of
the 5 data sets from the global development data.

Method        Global
POLR          37.31 (13.47)
OF            38.84 (10.48)
LDA-ID        37.24 (12.42)
FH+LDA-ID     38.54 (11.46)
DI+ wkNN      40.0 (11.77)
KPCA + POLR   39.98 (12.14)
KIOF          39.21 (11.75)
POLR-I        38.51 (12.31)
POLR-I2       37.43 (13.5)
Figure 3: Mean and standard deviation of accuracy over 50 simulations for each experimental
design of the global development data.
6 ordered classes is provided by DI+ wkNN, with 34.6% mean accuracy. DI+
wkNN is statistically significantly different from POLR, FH+LDA-ID, POLR-I,
and POLR-I2 at level 0.1. The best accuracy with 7 ordered classes is provided
by OF, with 31.4% mean accuracy. OF is statistically significantly different from
LDA-ID, POLR, FH+LDA-ID, KPCA + POLR, POLR-I, and POLR-I2 at level
0.1. In summary, except for one simulation design, DI+ wkNN is among the
best methods in all situations.
4.3. Catalan meteorological data
We consider data from 160 Catalan weather stations in 2015 (see Fig. 4).
These data were provided by the Servei Meteorològic de Catalunya (Catalan
Meteorological Service). The input features are two functional interval-valued
variables observed at 365 points: a) the minimum and maximum daily temperatures measured in degrees Celsius for each of the 365 days; and b) the
Table 6: P-values of Tukey simultaneous comparisons for the global development data.

Method        POLR          OF         LDA-ID        FH+LDA-ID   DI+ wkNN      KPCA + POLR   KIOF       POLR-I
OF            .01734*
LDA-ID        .91987        .01315*
FH+LDA-ID     .05605 .      .63874     .04432*
DI+ wkNN      3.09e-05***   .07291 .   1.98e-05***   .02369*
KPCA + POLR   3.58e-05***   .07843 .   2.30e-05***   .02583*     .97325
KIOF          .00320**      .56864     .00230**      .29861      .22104        .23395
POLR-I        .06283 .      .60324     .04990*       .95988      .02076*       .02267*       .27585
POLR-I2       .85367        .02815*    .77562        .08428 .    6.80e-05***   7.82e-05***   .00571**   .09371 .
Table 7: Mean of accuracy (Acc.), precision, recall and F1 for each class, over 50 simulations
for the global development data with three classes.

                       Precision           Recall              F1
Method        Acc.     1     2     3       1     2     3       1     2     3
POLR          54.4     56.8  45.1  61.1    61.0  38.2  66.6    58.1  39.7  62.0
OF            52.1     52.2  40.9  57.6    78.0  16.3  65.9    61.6  23.7  60.2
LDA-ID        53.6     51.6  49.9  61.6    50.1  49.2  64.5    49.5  48.1  61.8
FH+LDA-ID     51.8     52.1  45.4  69.0    40.8  65.3  51.2    44.2  52.8  57.6
DI+ wkNN      55.2     57.9  44.5  71.3    58.3  54.6  53.9    56.7  48.0  60.5
KPCA + POLR   56.2     54.2  48.7  69.9    72.8  36.6  62.2    61.0  39.8  65.0
KIOF          54.9     55.7  45.9  63.6    76.5  30.3  61.6    63.5  35.7  61.3
POLR-I        56.4     59.2  50.3  62.9    67.8  41.8  63.8    62.4  44.0  61.5
POLR-I2       55.0     56.6  45.9  63.8    62.2  37.9  68.8    58.2  39.2  64.5
minimum and maximum daily relative humidity values, measured as a percentage,
for each of the 365 days. Fig. 5 displays a sample of these functions. As the two
variables are measured in non-comparable units, each functional variable is
standardized so that both variables have equal weight in the methods that use
distances. We consider the average daily temperatures of each day and weather
station, which were also provided by the Servei Meteorològic de Catalunya.
Then, functional means and variances are defined daily across weather stations
(Ramsay & Silverman, 2005). The same procedure is carried out for relative
humidity. The functional averages are subtracted from the respective functions,
then the functions are divided by the respective standard deviation functions.
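This pointwise standardization can be sketched as follows, assuming the curves are stored as (stations x days) arrays; the data and variable names here are simulated placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
n_stations, n_days = 160, 365
t = np.arange(n_days)
# placeholder average daily temperature curves, one per weather station
avg_temp = (15 + 10 * np.sin(2 * np.pi * t / n_days)
            + rng.normal(0, 2, (n_stations, n_days)))
# the interval-valued functional variable: daily minima and maxima
temp_min = avg_temp - rng.uniform(2, 6, (n_stations, n_days))
temp_max = avg_temp + rng.uniform(2, 6, (n_stations, n_days))

# daily (pointwise) mean and standard deviation functions across stations,
# computed from the average-temperature curves
mu = avg_temp.mean(axis=0)
sd = avg_temp.std(axis=0)

# both interval extremes are standardized with the same mean and standard
# deviation functions, so the interval structure is preserved and each
# variable receives equal weight in the distance-based methods
z_min = (temp_min - mu) / sd
z_max = (temp_max - mu) / sd
```

The same two functions mu and sd are applied to both extremes of an interval, which keeps z_min <= z_max at every day.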
Note that our two interval-valued functional variables are equivalent to 730
(365 × 2) interval-valued variables, which are also highly correlated between
neighboring days. This means that some methods fail and cannot be used
with these data. In particular, the methods that fail are LDA-ID, POLR, OF,
FH+LDA-ID, POLR-I, and POLR-I2, i.e. all the methods except DI+ wkNN
(with DFEH for bivariate IVF), KPCA + POLR, and KIOF. We have tried
to solve this problem by sampling the days, and rather than using 365 days,
considering only one in every 30 days for the methods that fail. Therefore, we
work with two interval-valued functional variables observed on 12 days, rather
than 365 days, for methods LDA-ID, OF, and FH+LDA-ID, which is equivalent
to 24 (12 × 2) interval-valued variables. However, methods POLR, POLR-I,
Figure 4: Map of Catalonia (Spain) with the locations of the Catalan weather stations and
their altitudes.
and POLR-I2 continue to fail, and, therefore, they are not considered in this
problem. In summary, for this problem, we work with the full data set with
DI+ wkNN, KPCA + POLR, and KIOF, and with a time sampled data set for
LDA-ID, OF, and FH+LDA-ID.
As regards the output feature, the ordered factor is built from the division
into certain percentiles of altitude of each weather station. As before, for the
ordered categorical variable, we consider 5 possible levels from 3 to 7. For
example, for the case of 3 ordered classes, the balanced classes are defined by
the labels [0, L33[, [L33, L66[, [L66, L100], where Li denotes the value of the i-th
percentile of altitude. Therefore, we have 5 data sets with the same inputs, but
5 different outputs, which have a different number of ordered categories ranging
from 3 to 7. For each of these 5 data sets, we use a Monte Carlo cross-validation,
where 50 random splits of each data set are created. In each split, the data set
is divided into training data (80% of the weather stations) and validation data
(20% of the weather stations).
Table 8 shows a summary of the performance of each method for the experimental designs jointly, while Fig. 6 displays the performance separately. The
p-values of the multiple comparison of means by Tukey contrasts are shown in
Table 9 for the experimental designs jointly. As before, we include the significance codes. Table 10 shows performance statistics for three classes.
According to the results in Table 8, the best method is KIOF, with 75.62%
mean accuracy. KIOF is statistically significantly different from the rest of the
methods according to the p-values of the Tukey contrasts in Table 9. The next best methods in terms of performance are KPCA + POLR, OF, and DI+
wkNN, with 73.72%, 73.48%, and 72.18% mean accuracy, respectively. These
Figure 5: a) Minimum (blue) and maximum (red) daily temperatures on the Celsius scale
and b) minimum (blue) and maximum (red) daily relative humidity values of a sample
of six stations. Weather stations of high, medium and low altitude appear in the top, middle
and bottom rows, respectively.
Table 8: Mean and standard deviation, in brackets, of accuracy (percentage) over 50 splits of
the 5 data sets from the meteorological data.

Method        Global
OF            73.48 (13.55)
LDA-ID        68.15 (14.89)
FH+LDA-ID     69.58 (14.48)
DI+ wkNN      72.18 (13.8)
KPCA + POLR   73.72 (13.35)
KIOF          75.62 (11.77)
three methods form a homogeneous group that is statistically significantly different from the rest of the methods. The methods that perform the worst
are LDA-ID and FH+LDA-ID, with 68.15% and 69.58% mean accuracy, respectively. These two methods form another homogeneous group that is statistically
significantly different from the rest of the methods.
According to the results in Fig. 6, the best accuracy with 3 ordered classes
is provided by KIOF, with 86.5% mean accuracy. For the sake of brevity, we
do not show the table of p-values for each experimental design, but in this case,
no statistically significant difference is found between KIOF and OF at level
0.05. The highest or second highest value in the majority (6) of columns (not
including accuracy) in Table 10 is reached by KIOF.
Following the results of Fig. 6, the best accuracy with 4 ordered classes
is provided by KPCA + POLR, with 82.5% mean accuracy. No statistically
significant difference is found between KPCA + POLR and DI+ wkNN at level
Figure 6: Mean and standard deviation of accuracy over 50 simulations for each experimental
design of the meteorological data.
Table 9: P-values of Tukey simultaneous comparisons for the meteorological data.

Method        LDA-ID        FH+LDA-ID     DI+ wkNN      KPCA + POLR   KIOF
OF            2.98e-08***   4.72e-05***   .17387        .79363        .02459*
LDA-ID                      .13608        2.68e-05***   6.65e-09***   9.77e-15***
FH+LDA-ID                                 .00659**      1.50e-05***   3.23e-10***
DI+ wkNN                                                .10498        .00032***
KPCA + POLR                                                           .046947*
0.05. The best accuracy with 5 ordered classes is provided again by KPCA
+ POLR, with 75.2% mean accuracy. No statistically significant difference is
found between KPCA + POLR and KIOF at level 0.05. The best accuracy
with 6 ordered classes is provided by KIOF, with 71% mean accuracy. No
statistically significant difference is found between KIOF and OF at level 0.05.
The best accuracy with 7 ordered classes is provided again by KIOF, with 69.5%
mean accuracy. KIOF is statistically significantly different from the rest of the
methods at level 0.05. In summary, except for one simulation design, KIOF is
among the best methods in all situations.
4.4. Discussion
Let us consider the results of the simulated and real data sets together.
KIOF and DI + wkNN are the best methods (Table 2) for simulated data. DI +
wkNN, KPCA + POLR, and KIOF are the best methods (Table 5) for the global
development data, while KIOF is the best method for the meteorological data.
Therefore, KIOF appears to be the best methodology in all three situations.
KIOF is a nonparametric and highly nonlinear method. Apart from KIOF, DI
+ wkNN and KPCA + POLR also seem to be better alternatives than FH + LDA-ID,
POLR-I, and POLR-I2. These three methods are parametric, unlike KIOF and
Table 10: Mean of accuracy (Acc.), precision, recall and F1 for each class, over 50 simulations
for the meteorological data with three classes.

                       Precision           Recall              F1
Method        Acc.     1     2     3       1     2     3       1     2     3
OF            86.0     82.0  80.5  92.2    95.9  71.1  88.8    87.5  74.5  89.8
LDA-ID        81.3     82.5  71.3  93.4    85.5  75.8  83.2    82.6  72.1  86.8
FH+LDA-ID     81.8     83.9  71.8  93.5    88.4  76.2  82.4    84.6  72.6  86.5
DI+ wkNN      83.9     89.2  71.6  97.0    85.5  86.7  81.6    86.2  78.2  87.6
KPCA + POLR   83.8     88.0  72.3  90.1    89.2  77.4  84.6    87.6  75.5  85.8
KIOF          86.5     90.5  77.8  93.4    93.5  84.9  82.8    91.1  80.1  86.1
DI + wkNN, which are nonparametric methods. KPCA + POLR combines a
nonparametric part with a parametric method.
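The nonparametric part of KPCA + POLR can be sketched as a standard kernel PCA on a Gram matrix (Schölkopf et al., 1998). In this sketch, a Gaussian kernel on midpoint/half-range encodings of the intervals is an illustrative assumption (the paper defines kernels directly on interval data), and the resulting scores would then be passed to POLR:

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    # Gaussian kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0, None)
    return np.exp(-gamma * d2)

def kernel_pca_scores(K, n_components):
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one      # center in feature space
    vals, vecs = np.linalg.eigh(Kc)                 # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]
    # scores of the training points on the leading principal components
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))

# intervals encoded as (midpoint, half-range) pairs -- an illustrative choice
rng = np.random.default_rng(2)
mids = rng.normal(size=(50, 2))
half = rng.uniform(0.1, 1.0, size=(50, 2))
scores = kernel_pca_scores(rbf_gram(np.hstack([mids, half])), n_components=3)
# scores (50 x 3) would be the inputs of the proportional-odds model (POLR)
```

Feeding the low-dimensional scores into a proportional-odds model is what combines the nonparametric (kernel) and parametric (cumulative-logit) parts.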
Taking order and interval-valued information into account is beneficial. Although
OF is quite competitive despite being a naı̈ve approach, KIOF, which
combines OF with the use of interval-valued information, improves the performance.
The other two naı̈ve methods, POLR and LDA-ID, do not perform
so well, being among the worst methods in all the data sets. FH + LDA-ID is
only statistically significantly better than LDA-ID for the global development
data. Note that it depends on the class probabilities returned by LDA-ID.
For the functional case, it is clear that KIOF, DI + wkNN, and KPCA +
POLR are the best options, since for the other methods we have to discard some
information by sampling.
5. Conclusions
We have proposed six methodologies (FH + LDA-ID, DI + wkNN, KPCA
+ POLR, KIOF, POLR-I, and POLR-I2) for ordinal classification in
two different cases, with interval-valued data and functional interval-valued data
as inputs. To the best of our knowledge, this is the first time these issues have
been addressed. We have made an extensive comparative study with simulated
and real data sets and different experimental setups regarding the number of
levels of output.
Although there is no single method that always performs the best in all
possible data sets, there are some methods that are more recommendable. KIOF
has returned excellent results with both interval-valued data and functional
interval-valued data. DI + wkNN and KPCA + POLR are also recommendable
in both cases.
As future work, more methods could be explored. For example, we use
KPCA + POLR, but another ordinal classification method could be used after
KPCA rather than POLR. POLR with variable selection could also be used.
Another line of future work would be to work with interval-valued mixed data,
with function and vector parts, or to extend the work to other symbolic data.
However, applications are also one of the main directions for future work since
interval-valued information is often not exploited (Pérez-Navarro et al.,
2023). Other ways to explore are dealing with incomplete interval-valued data
(Qi et al., 2021) and imbalanced interval-valued data (Qi et al., 2023). Note
that we obtain ordinal classification problems by discretizing the response into
Q different classes with equal frequency. We prefer to assess the performance
in this more controlled environment since this is the first time that ordinal
classification is addressed with IVD and IVF.
CRediT authorship contribution statement
A. A.: Data curation, Formal analysis, Investigation, Software, Visualization, Writing - review & editing. M. M-G.: Data curation, Formal analysis,
Investigation, Software, Visualization, Writing - review & editing. I.E.: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing - original draft, Writing - review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests
or personal relationships that could have influenced the work reported in this
paper.
Acknowledgments
The authors would like to thank the Servei Meteorològic de Catalunya for
providing them with the meteorological data.
This research was partially supported by the Spanish Ministry of Universities
(FPU grant FPU20/0182), the Spanish Ministry of Science and Innovation
(PID2022-141699NB-I00, PID2020-118763GA-I00 and PID2020-118071GB-I00),
grant CIGE/2022/066 from the Generalitat Valenciana, and grants UJI-B2020-22 and UJI-A2022-12 from Universitat Jaume I, Spain.
References
Agresti, A. (2002). Categorical Data Analysis. Wiley.
Aguilera, A., & Escabias, M. (2008). Solving multicollinearity in functional
multinomial logit models for nominal and ordinal responses. In Functional
and Operatorial Statistics (pp. 7–13). Springer.
Alcacer, A., Epifanio, I., Valero, J., & Ballester, A. (2021). Combining classification and user-based collaborative filtering for matching footwear size.
Mathematics, 9 , 771.
Angulo, C., Anguita, D., Gonzalez-Abril, L., & Ortega, J. (2008). Support
vector machines for interval discriminant analysis. Neurocomputing, 71 , 1220–
1229.
Appice, A., d’Amato, C., Esposito, F., & Malerba, D. (2006). Classification of
symbolic objects: A lazy learning approach. Intelligent Data Analysis, 10 ,
301–324.
Balugani, E., Lolli, F., Pini, M., Ferrari, A. M., Neri, P., Gamberini, R., &
Rimini, B. (2021). Dimensionality reduced robust ordinal regression applied
to life cycle assessment. Expert Systems with Applications, 178 , 115021.
Barbero-Gómez, J., Gutiérrez, P.-A., Vargas, V.-M., Vallejo-Casas, J.-A., &
Hervás-Martı́nez, C. (2021). An ordinal cnn approach for the assessment of
neurological damage in Parkinson’s disease patients. Expert Systems with
Applications, 182 , 115271.
Bertrand, P., & Goupil, F. (2000). Descriptive statistics for symbolic data.
In Analysis of symbolic data: exploratory methods for extracting statistical
information from complex data (pp. 106–124). Springer.
Billard, L. (2006). Symbolic data analysis: what is it? In Compstat 2006 –
Proceedings in Computational Statistics: 17th Symposium Held in Rome,
Italy, 2006 (pp. 261–269). Springer.
Billard, L. (2008). Some analyses of interval data. Journal of Computing and
Information Technology, 16 , 225–233.
Billard, L., & Diday, E. (2003). From the statistics of data to the statistics of
knowledge: symbolic data analysis. Journal of the American Statistical Association,
98, 470–487.
Blanco-Fernández, A., Corral, N., & González-Rodrı́guez, G. (2011). Estimation
of a flexible simple linear model for interval data based on set arithmetic.
Computational Statistics & Data Analysis, 55 , 2568–2578.
Brito, P. (2014). Symbolic data analysis: another look at the interaction of
data mining and statistics. Wiley Interdisciplinary Reviews: Data Mining
and Knowledge Discovery, 4 , 281–295.
Brito, P., & Duarte Silva, A. P. (2012). Modelling interval data with normal
and skew-normal distributions. Journal of Applied Statistics, 39 , 3–20.
Chen, J., Li, Z., Wang, X., & Zhai, J. (2022). A hybrid monotone decision tree
model for interval-valued attributes. Advances in Computational Intelligence,
2 , 1–11.
Chen, Y., & Billard, L. (2019). A study of divisive clustering with Hausdorff
distances for interval data. Pattern Recognition, 96 , 106969.
De Carvalho, F. d. A., Brito, P., & Bock, H.-H. (2006). Dynamic clustering for
interval data based on L2 distance. Computational Statistics, 21 , 231–250.
D’Esposito, M. R., Palumbo, F., & Ragozini, G. (2012). Interval archetypes:
a new tool for interval data analysis. Statistical Analysis and Data Mining:
The ASA Data Science Journal , 5 , 322–335.
Do, T.-N., & Poulet, F. (2005). Kernel methods and visualization for interval
data mining. In Proceedings of the Conference on Applied Stochastic Models
and Data Analysis, ASMDA (pp. 345–354).
Duarte Silva, A., & Brito, P. (2015). Discriminant analysis of interval data: An
assessment of parametric and distance-based approaches. Journal of Classification, 32 , 516–541.
Duarte Silva, A. P., & Brito, P. (2006). Linear discriminant analysis for interval
data. Computational Statistics, 21 , 289–308.
Duarte Silva, A. P., Brito, P., Filzmoser, P., & Dias, J. G. (2021). MAINT.Data:
Modelling and Analysing Interval Data in R. The R Journal , 13 , 336–364.
Duarte Silva, A. P., Filzmoser, P., & Brito, P. (2018). Outlier detection in
interval data. Advances in Data Analysis and Classification, 12 , 785–822.
D’Urso, P., De Giovanni, L., Maharaj, E. A., Brito, P., & Teles, P. (2023).
Wavelet-based fuzzy clustering of interval time series. International Journal
of Approximate Reasoning, 152 , 136–159.
Epifanio, I. (2008). Shape descriptors for classification of functional data. Technometrics, 50 , 284–294.
Epifanio, I., & Ventura, N. (2011). Functional data analysis in shape analysis.
Computational Statistics & Data Analysis, 55 , 2758–2773.
Fan, G., Cao, J., & Wang, J. (2010). Functional data classification for temporal
gene expression data with kernel-induced random forests. In IEEE Symp. on
Comput. Intell. in Bioinformatics and Computational Biology (pp. 1–5).
Ferrando, L., Epifanio, I., & Ventura-Campos, N. (2021). Ordinal classification
of 3D brain structures by functional data analysis. Statistics & Probability
Letters, 179 , 109227.
Ferrando, L., Ventura-Campos, N., & Epifanio, I. (2020). Detecting and visualizing
differences in brain structures with SPHARM and functional data analysis.
NeuroImage, 222, 117209.
Frank, E., & Hall, M. (2001). A simple approach to ordinal classification. Lecture
Notes in Computer Science, 2167 , 145–156.
Freitas, W. W., de Souza, R. M., Amaral, G. J., & De Bastiani, F. (2022). Exploratory spatial analysis for interval data: A new autocorrelation index with
COVID-19 and rent price applications. Expert Systems with Applications,
195 , 116561.
Grzegorzewski, P., & Śpiewak, M. (2019). The sign test and the signed-rank
test for interval-valued data. International Journal of Intelligent Systems,
34 , 2122–2150.
Gutiérrez, P., Pérez-Ortiz, M., et al. (2016). Ordinal regression methods:
Survey and experimental study. IEEE Transactions on Knowledge and Data
Engineering, 28, 127–146.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical
learning: data mining, inference, and prediction (2nd ed.). Springer.
Hechenbichler, K., & Schliep, K. (2004). Weighted k-nearest-neighbor techniques
and ordinal classification. Technical Report Ludwig-Maximilians-Universität
München.
Hirk, R., Hornik, K., & Vana, L. (2019). Multivariate ordinal regression models:
an analysis of corporate credit ratings. Statistical Methods & Applications,
28 , 507–539.
Hornung, R. (2020). Ordinal forests. Journal of Classification, 37 , 4–17.
Hothorn, T., Bretz, F., & Westfall, P. (2008). Simultaneous inference in general
parametric models. Biometrical Journal , 50 , 346–363.
Jahanshahloo, G., Lotfi, F. H., Balf, F. R., & Rezai, H. Z. (2007). Discriminant
analysis of interval data using Monte Carlo method in assessment of overlap.
Applied Mathematics and Computation, 191 , 521–532.
Karatzoglou, A., Smola, A., Hornik, K., & Zeileis, A. (2004). kernlab-an S4
package for kernel methods in R. Journal of Statistical Software, 11 , 1–20.
Lauro, C. N., & Palumbo, F. (2000). Principal component analysis of interval
data: a symbolic data analysis approach. Computational statistics, 15 , 73–87.
Le-Rademacher, J., & Billard, L. (2012). Symbolic covariance principal component analysis and visualization for interval-valued data. Journal of Computational and Graphical Statistics, 21 , 413–432.
Li, M.-L., Di Mauro, F., Candan, K. S., & Sapino, M. L. (2019). Matrix
factorization with interval-valued data. IEEE Transactions on Knowledge
and Data Engineering, 33 , 1644–1658.
Liu, J., Wang, P., Chen, H., & Zhu, J. (2022). A combination forecasting
model based on hybrid interval multi-scale decomposition: Application to
interval-valued carbon price forecasting. Expert Systems with Applications,
191 , 116267.
Maharaj, E. A., Brito, P., & Teles, P. (2021). A test to compare interval time
series. International Journal of Approximate Reasoning, 133 , 17–29.
Montgomery, D. C. (2019). Design and analysis of experiments. John Wiley &
Sons.
Pierola, A., Epifanio, I., & Alemany, S. (2016). An ensemble of ordered logistic
regression and random forest for child garment size matching. Computers &
Industrial Engineering, 101 , 455–465.
Pérez-Navarro, A., Montoliu, R., Sansano-Sansano, E., Martı́nez-Garcia, M.,
Femenı́a, R., & Torres-Sospedra, J. (2023). Accuracy of a single position estimate for kNN-based fingerprinting indoor positioning applying error propagation theory. IEEE Sensors Journal , 23 , 18765–18775.
Qi, X., Guo, H., Artem, Z., & Wang, W. (2020). An interval-valued data
classification method based on the unified representation frame. IEEE Access,
8 , 17002–17012.
Qi, X., Guo, H., & Wang, W. (2021). A reliable KNN filling approach for incomplete interval-valued data. Engineering Applications of Artificial Intelligence,
100 , 104175.
Qi, X., Wang, W., Shi, Y., Qi, H., & Mu, X. (2023). AGURF: An adaptive
general unified representation frame for imbalanced interval-valued data. Information Sciences, 641 , 119089.
R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Ramos-Guajardo, A. B., & Grzegorzewski, P. (2016). Distance-based linear
discriminant analysis for interval-valued data. Information Sciences, 372 ,
591–607.
Ramsay, J. O., & Silverman, B. W. (2005). Functional Data Analysis. (2nd
ed.). Springer.
Rizo Rodrı́guez, S. I., & Tenório de Carvalho, F. A. (2022). Clustering interval-valued data with adaptive Euclidean and city-block distances. Expert Systems
with Applications, 198, 116774.
Rossi, F., & Conan-Guez, B. (2002). Multi-layer perceptron on interval data.
In Classification, Clustering, and Data Analysis: Recent Advances and Applications (pp. 427–434). Springer.
Sanidas, E., Grassos, C., Papadopoulos, D. P., Velliou, M., Tsioufis, K., Mantzourani, M., Perrea, D., Iliopoulos, D., Barbetseas, J., & Papademetriou,
V. (2019). Labile hypertension: a new disease or a variability phenomenon?
Journal of Human Hypertension, 33 , 436–443.
Schliep, K., & Hechenbichler, K. (2016). kknn: Weighted k-Nearest Neighbors.
URL: https://CRAN.R-project.org/package=kknn. R package version 1.3.1.
Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear component analysis
as a kernel eigenvalue problem. Neural computation, 10 , 1299–1319.
Shimizu, N. (2011). Hierarchical clustering for interval-valued functional data.
In J. Watada, G. Phillips-Wren, L. C. Jain, & R. J. Howlett (Eds.), Intelligent
Decision Technologies (pp. 769–778). Berlin, Heidelberg: Springer Berlin
Heidelberg.
Simó, A., Ibáñez, M. V., Epifanio, I., et al. (2020). Generalized partially linear
models on Riemannian manifolds. Journal of the Royal Statistical Society
Series C: Applied Statistics, 69, 641–661.
Singer, G., Ratnovsky, A., & Naftali, S. (2021). Classification of severity of
trachea stenosis from EEG signals using ordinal decision-tree based algorithms
and ensemble-based ordinal and non-ordinal algorithms. Expert Systems with
Applications, 173, 114707.
Sinova, B., Colubi, A., Gil, M., & González-Rodríguez, G. (2012). Interval
arithmetic-based simple linear regression between interval data: Discussion
and sensitivity analysis on the choice of the metric. Information Sciences,
199, 109–124.
de Souza, R. M., Cysneiros, F. J. A., Queiroz, D. C., & Roberta, A. d. A.
(2008). A multi-class logistic regression model for interval data. In 2008
IEEE International Conference on Systems, Man and Cybernetics (pp. 1253–
1258). IEEE.
Sun, L., Wang, K., Xu, L., Zhang, C., & Balezentis, T. (2022a). A time-varying distance based interval-valued functional principal component analysis method – a case study of consumer price index. Information Sciences, 589, 94–116.
Sun, L., Zhu, L., Li, W., Zhang, C., & Balezentis, T. (2022b). Interval-valued
functional clustering based on the Wasserstein distance with application to
stock data. Information Sciences, 606, 910–926.
Sun, Y., Zhang, X., Wan, A. T., & Wang, S. (2022c). Model averaging for
interval-valued data. European Journal of Operational Research, 301, 772–784.
The World Bank (2022). Data from database: World development indicators.
http://data.worldbank.org/.
United Nations (2022). Human development index. https://hdr.undp.org/data-center/human-development-index.
Vargas, V. M., Gutiérrez, P. A., & Hervás-Martínez, C. (2022). Unimodal
regularisation based on beta distribution for deep ordinal regression. Pattern
Recognition, 122, 108310.
Vega-Márquez, B., Nepomuceno-Chamorro, I. A., Rubio-Escudero, C., &
Riquelme, J. C. (2021). OCEAn: Ordinal classification with an ensemble
approach. Information Sciences, 580, 221–242.
Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S.
(4th ed.). New York: Springer. ISBN 0-387-95457-0.
Wang, Z., Li, H., Chen, H., Ding, Z., & Zhu, J. (2022). Linear and nonlinear framework for interval-valued PM2.5 concentration forecasting based on
multi-factor interval division strategy and bivariate empirical mode decomposition. Expert Systems with Applications, 205, 117707.
Xu, M., & Qin, Z. (2022). A bivariate Bayesian method for interval-valued
regression models. Knowledge-Based Systems, 235, 107396.