
Ordinal classification for interval-valued data and interval-valued functional data

arXiv:2310.19433v1 [stat.ME] 30 Oct 2023

Aleix Alcacer (a), Marina Martinez-Garcia (a), Irene Epifanio (a, ∗)
(a) Dep. Matemàtiques, Universitat Jaume I, Castelló 12071, Spain
∗ Corresponding author
Email addresses: aalcacer@uji.es (Aleix Alcacer), martigar@uji.es (Marina Martinez-Garcia), epifanio@uji.es (Irene Epifanio)

Preprint submitted to Expert Systems with Applications, October 31, 2023

Abstract

The aim of ordinal classification is to predict the ordered labels of the output from a set of observed inputs. Interval-valued data refers to data in the form of intervals. For the first time, interval-valued data and interval-valued functional data are considered as inputs in an ordinal classification problem. Six ordinal classifiers for interval data and interval-valued functional data are proposed. Three of them are parametric: one is based on ordinal binary decompositions and the other two on ordered logistic regression. The other three methods are based on distances between interval data and kernels on interval data. One of them uses the weighted k-nearest-neighbor technique for ordinal classification. Another considers kernel principal component analysis plus an ordinal classifier. The sixth method, which performs best, uses a kernel-induced ordinal random forest. The methods are compared with naïve approaches in an extensive experimental study with synthetic data and original real data sets on human global development and weather. The results show that taking into account both the ordering and the interval-valued information improves accuracy. The source code and data sets are available at https://github.com/aleixalcacer/OCFIVD.

Keywords: Ordinal regression, Interval data, Symbolic data, Functional data analysis, Random forest
2020 MSC: 62H30, 62R10

1. Introduction

Symbolic data analysis (SDA) is gaining popularity because this kind of data can arise from the aggregation of very large data sets, which are very common nowadays (Billard & Diday, 2003). However, some data are naturally symbolic (see Billard (2008) for illustrative examples). In classical multivariate analysis, data points consist of a single (numerical or categorical) value for each feature, i.e. they are point-valued data. Symbolic data, by contrast, can be lists, intervals, histograms, etc. In this work, we focus on interval-valued data (IVD), i.e. each data point is expressed in interval format.

Using classical techniques with IVD can produce distorted results due to the loss of information, as explained in Billard (2008). A single observation of point-valued data has no internal variation, unlike a single symbolic data value, which has its own internal variation (Billard, 2006). For example, for an interval-valued observation [4, 10] and assuming a uniform distribution across the interval, the variance ($s^2$) is 3 (see Bertrand & Goupil (2000) on how to calculate the symbolic sample variance). However, if the midpoint of that interval (7) is considered as point-valued data, $s^2 = 0$. As a consequence, the sample variance of an entire interval-valued data set comprises both the within- and between-observation variations. As an illustrative example, Billard (2006) compared the results obtained by symbolic analysis versus classical analysis with principal component analysis (PCA). The classical results were less informative than the richer knowledge gained from the symbolic analysis. This is not exclusive to PCA: naïve ways of dealing with IVD may introduce errors in the analysis (Li et al., 2019). This is why methodologies designed for IVD should be used with this kind of data.

1.1. Interval-valued data

IVD have been used in different statistical problems, such as archetypal analysis (D'Esposito et al., 2012), classification (Duarte Silva & Brito, 2006, 2015; Ramos-Guajardo & Grzegorzewski, 2016; Appice et al., 2006; de Souza et al., 2008; Jahanshahloo et al., 2007; Qi et al., 2020; Angulo et al., 2008; Rossi & Conan-Guez, 2002), clustering (Shimizu, 2011; Sun et al., 2022b; Chen & Billard, 2019; D'Urso et al., 2023), hypothesis testing (Grzegorzewski & Śpiewak, 2019; Maharaj et al., 2021), outlier analysis (Duarte Silva et al., 2018), PCA (Lauro & Palumbo, 2000; Le-Rademacher & Billard, 2012; Sun et al., 2022a), and regression analysis (Blanco-Fernández et al., 2011; Sinova et al., 2012; Xu & Qin, 2022). An excellent survey is found in Brito (2014). Recent advances in the analysis of IVD and interesting applications cover a variety of fields, such as carbon price forecasting (Liu et al., 2022), PM2.5 concentration forecasting (Wang et al., 2022), the spatial behavior of the number of COVID-19 cases and rent price analysis (Freitas et al., 2022), clustering of fungi species (Rizo Rodríguez & Tenório de Carvalho, 2022) and oil price forecasting (Sun et al., 2022c).

Focusing on classification methodologies, Duarte Silva & Brito (2006), Duarte Silva & Brito (2015) and Ramos-Guajardo & Grzegorzewski (2016) developed discriminant analysis methods for interval data. Appice et al. (2006) considered different distances for several types of symbolic data with the k-nearest-neighbor method. de Souza et al. (2008) introduced multi-class logistic regression models. Jahanshahloo et al. (2007) extended the data envelopment analysis–discriminant analysis methodology to interval data. Qi et al. (2020) applied traditional classification methods with a different representation of interval data. Support vector machines (Angulo et al., 2008) and artificial neural networks (Rossi & Conan-Guez, 2002) have also been used for interval data classification.
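The variance figures in the [4, 10] example can be checked with a short sketch. Like that example, it assumes that each interval is uniformly distributed. It uses the symbolic sample variance with divisor n associated with Bertrand & Goupil (2000). The function names are illustrative and are not taken from the authors' code.

```python
def symbolic_variance(intervals):
    """Symbolic sample variance of interval data (Bertrand & Goupil, 2000),
    assuming each observation is uniformly distributed over its interval."""
    n = len(intervals)
    second_moment = sum(a * a + a * b + b * b for a, b in intervals) / (3 * n)
    mean = sum(a + b for a, b in intervals) / (2 * n)
    return second_moment - mean ** 2


def midpoint_variance(intervals):
    """Classical variance (divisor n) of the midpoints; it ignores the
    internal variation of each interval."""
    mids = [(a + b) / 2 for a, b in intervals]
    m = sum(mids) / len(mids)
    return sum((x - m) ** 2 for x in mids) / len(mids)


if __name__ == "__main__":
    print(symbolic_variance([(4, 10)]))   # 3.0
    print(midpoint_variance([(4, 10)]))   # 0.0
```

For several intervals, the symbolic variance equals the mean within-interval variance, (b − a)²/12, plus the variance of the midpoints. Replacing each interval by its midpoint therefore discards exactly the within-observation term.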
Another field that is gaining popularity is functional data analysis (FDA), since technological advances have enabled the acquisition of functional data. In FDA, data points are functions; Ramsay & Silverman (2005) provide an excellent overview. Usually, the observed functions are point-valued, but they can also be interval-valued functions (IVF). Examples include the daily maximum and minimum temperatures at meteorological stations, daily interval stock prices, and a person's daily blood pressure records. Working with the intervals gives a more realistic view of variations in weather conditions than simple average values. Analogously, intervals offer more relevant information for experts to evaluate the tendency and volatility of stocks within the same day (Lauro & Palumbo, 2000). Similarly, being aware of blood pressure fluctuations is critical from a health point of view (Sanidas et al., 2019). Likewise, D'Urso et al. (2023) showed in clustering that interval-valued functions give more meaningful results than single-point functional data.

1.2. Ordinal classification

In classification, a label from a set has to be predicted based on the observation of several inputs. Most of the time, labels are treated as unordered, even if they are not. Therefore, the vast majority of classification algorithms are conceived to solve nominal classification problems. Nevertheless, ordered classification problems arise in many fields, such as collaborative filtering (Alcacer et al., 2021), environmental management (Balugani et al., 2021), finance (Hirk et al., 2019), information retrieval, medicine (Barbero-Gómez et al., 2021; Singer et al., 2021), psychology, and social sciences, among others (Gutiérrez et al., 2016). In ordered classification problems, labels are ordered: for example, ratings (very low, low, indifferent, high, very high).
In those problems, misclassifying an instance into a neighboring class is generally less serious than misclassifying it into a distant class. Furthermore, Vargas et al. (2022) explained that taking the order into account in ordinal classifiers usually accelerates the learning process and reduces the amount of training data needed.

A taxonomy of ordinal classification methods was proposed by Gutiérrez et al. (2016), where methodologies were grouped into three approaches. Naïve approaches use simpler paradigms and comprise regression, nominal classification (where the order is neglected (Ferrando et al., 2020)), and cost-sensitive classification. The second approach divides the ordinal categories into several binary labels, i.e. it uses binary decompositions, as done by Frank & Hall (2001). Finally, the third approach comprises the threshold models, which are the most successful methods in ordinal classification. They assume that there is an underlying continuous feature that can explain the behavior of the ordinal factor. These models include cumulative link models (Agresti, 2002, Ch. 7), support vector machines, discriminant learning, perceptron learning, augmented binary classification, ensembles (Hechenbichler & Schliep, 2004; Hornung, 2020; Vega-Márquez et al., 2021), and Gaussian processes. The number of methods for ordered classification of multivariate data is much lower than for nominal classification (Hornung, 2020; Pierola et al., 2016), and even lower for functional data (Ferrando et al., 2021) or other kinds of data, such as data on Riemannian manifolds (Simó et al., 2020). For interval data, the literature on ordinal classification is very scarce. In fact, to the best of our knowledge, only monotonic classification has been studied for IVD (Chen et al., 2022). Monotonic classification is a particular case of ordinal classification in which there are monotonicity constraints between inputs and outputs.

1.3. Our contributions

Due to this scarcity, the objectives of this work are to introduce and compare different ordinal methods for IVD and IVF. We also aim to compare them with the naïve approach, which consists of either a) not taking the order into account, or b) taking the order into account but discarding the interval-valued nature of the data. Option a) means applying nominal classification methods to ordinal classification problems as if the classes were unordered. Gutiérrez et al. (2016) carried out a comparative evaluation of ordinal methods for multivariate data. They showed that, even if the naïve approach can be very competitive, taking the order into account improves performance. To the best of our knowledge, no previous work has pursued these objectives. Therefore, the novelty of this work lies in:

• Proposing several methods for ordinal classification of IVD and IVF.
• Comparing those methods using simulated and real data.
• Comparing the proposed ordinal methods with nominal classification methods for IVD and IVF, to see whether retaining the information about order is important for IVD.
• Comparing the performance obtained when using ordinal classifiers that ignore the interval-valued information of the data, i.e. that take the midpoints of the intervals as inputs.
• Providing the R (R Core Team, 2023) code with the algorithms. The data and code are available at https://github.com/aleixalcacer/OCFIVD for reproducibility.

The paper is organized as follows. Section 2 reviews previous works that will be used in our proposals. Section 3 introduces the proposed ordinal classification methods for both IVD and IVF. The results are presented and discussed in Section 4. Finally, Section 5 contains some conclusions.

2. Current methodologies

2.1. Multivariate ordinal classification methodologies

We review ordinal classifiers for point-valued data.
Let X be an N × K matrix with N observations ($x_i$) of K features, and let y be the response vector, which is an ordered factor with Q levels, the ordered classes $C_1, \ldots, C_Q$.

The Frank and Hall (FH) method (Frank & Hall, 2001): the ordinal classification problem is broken into a series of Q − 1 binary classification problems. In the q-th problem, one class is made up of $C_1, \ldots, C_q$ and the other of $C_{q+1}, \ldots, C_Q$, for $q = 1, \ldots, Q-1$. For a new observation with features x, let $p_q$ be the estimate of $P(y > C_q \mid x)$, $q = 1, \ldots, Q-1$. The predicted probabilities of the Q classes are then:

$$P(y = C_1 \mid x) = 1 - p_1; \qquad P(y = C_q \mid x) = p_{q-1} - p_q, \quad q = 2, \ldots, Q-1; \qquad P(y = C_Q \mid x) = p_{Q-1}.$$

To apply this method, the binary classifier must return class probability estimates.

Weighted k-Nearest-Neighbor Techniques for Ordinal Classification (wkNN): Hechenbichler & Schliep (2004) describe the use of weighted nearest neighbors for ordinal classification. For a new observation x, the k + 1 nearest neighbors of x are found according to a distance function $d(x, x_i)$. The k smallest distances are normalized by dividing them by the distance to the (k + 1)th neighbor, and then transformed into weights by any kernel function. In nominal classification, the predicted class is usually chosen by the weighted majority vote of the k nearest neighbors, i.e. by the weighted mode. For ordinal classification, the weighted median is used instead. The implementation is based on the function kknn from the R package kknn (Schliep & Hechenbichler, 2016).

Ordered logistic regression (POLR): the cumulative link model is described in detail in Agresti (2002, Ch. 7). The model is $\mathrm{logit}\, P(y \le C_q \mid x) = \zeta_q - \eta$. Here the logit link function, $\mathrm{logit}(p) = \log(p/(1-p))$, is the inverse of the standard logistic cumulative distribution function. The $\zeta_q$ parameters give each cumulative logit, and η is the linear predictor $\beta_1 x_1 + \ldots + \beta_K x_K$.
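The three prediction rules above can be sketched in Python. This is a minimal illustration, not the cited R implementations (kknn, MASS::polr). The binary estimates $p_q$, the POLR cut points $\zeta_q$ and the linear predictor η are taken as given rather than estimated. Class labels are assumed to be integer-coded 1, ..., Q. The triangular kernel in the wkNN function is an assumption, since the text allows any kernel.

```python
import math


def fh_class_probs(p):
    """Frank & Hall: p[q-1] estimates P(y > C_q | x) for q = 1, ..., Q-1.

    Returns [P(y = C_1 | x), ..., P(y = C_Q | x)]. No clipping is done, so a
    middle probability can become negative if the estimates are not monotone.
    """
    probs = [1.0 - p[0]]                                   # class C_1
    probs += [p[q - 1] - p[q] for q in range(1, len(p))]   # classes C_2..C_{Q-1}
    probs.append(p[-1])                                    # class C_Q
    return probs


def wknn_ordinal_predict(dists, labels, k=7):
    """Ordinal wkNN: weighted median of the integer-coded labels (1..Q) of the
    k nearest neighbours; dists[i] = d(x, x_i) for training observation i."""
    if len(dists) < k + 1:
        raise ValueError("need at least k + 1 training observations")
    order = sorted(range(len(dists)), key=lambda i: dists[i])
    d_next = max(dists[order[k]], 1e-12)    # distance to the (k+1)th neighbour
    nn = order[:k]
    # Normalised distances lie in [0, 1]; the triangular kernel 1 - d is an
    # illustrative choice, not necessarily the kernel used in the paper.
    w = [max(0.0, 1.0 - dists[i] / d_next) for i in nn]
    if sum(w) == 0.0:                       # every neighbour tied with the (k+1)th
        w = [1.0] * k
    half, cum = sum(w) / 2.0, 0.0
    for label, weight in sorted(zip((labels[i] for i in nn), w)):
        cum += weight
        if cum >= half:                     # first label reaching half the weight
            return label


def polr_class_probs(zeta, eta):
    """POLR: class probabilities from logit P(y <= C_q | x) = zeta_q - eta,
    with increasing cut points zeta = [zeta_1, ..., zeta_{Q-1}]."""
    cum = [1.0 / (1.0 + math.exp(-(z - eta))) for z in zeta] + [1.0]
    return [cum[0]] + [cum[q] - cum[q - 1] for q in range(1, len(cum))]


if __name__ == "__main__":
    print(fh_class_probs([0.9, 0.6, 0.2]))          # approx. [0.1, 0.3, 0.4, 0.2]
    # Weighted mode would be class 1 (weight 0.4); the weighted median is class 2.
    print(wknn_ordinal_predict([0.6, 0.65, 0.75, 1.0], [1, 3, 2, 3], k=3))  # 2
    probs = polr_class_probs([-1.0, 1.0], eta=0.0)
    print(max(range(len(probs)), key=probs.__getitem__) + 1)  # predicted class: 2
```

In the wkNN example, the weighted mode would pick class 1, while the weighted median returns class 2. This shows how the ordinal rule accounts for the position of the weight along the ordered scale.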
After estimating the parameters, we can predict the class probabilities for a new instance, which is assigned to the class with the highest probability. We have used the polr function from the R package MASS (Venables & Ripley, 2002) in the implementation.

Ordinal forest (OF): in OF, ordinal classes are predicted by a random forest methodology introduced by Hornung (2020) for multivariate features. Optimized score values are used in place of the category values of the ordinal response, and the results are treated as a metric output. The method is implemented in the function ordfor of the R package ordinalForest.

2.2. Point-valued functional ordinal classification methodologies

Kernel-Induced Random Forests (KIRF): KIRF was introduced by Fan et al. (2010) for nominal classification of functional data. Rather than using the raw observations, the idea is to use kernel functions of pairs of different observations of the training set as candidate splitting rules in kernel-induced classification trees. These trees are then employed in KIRF.

Functional principal component analysis (FPCA) + ordinal classifier: Aguilera & Escabias (2008) considered FPCA followed by POLR. Ferrando et al. (2021) considered further methodologies for functional ordinal classification based on an FPCA decomposition of the functions followed by an ordinal classifier.

2.3. Interval-valued classification

We focus on the methodologies related to our proposals. de Souza et al. (2008) proposed applying a multi-class logistic regression model in two ways. First, they fit the model jointly to the lower and upper extreme values of the intervals. Second, they fit the model to the lower and upper extreme values of the intervals separately.

The methodology proposed by Duarte Silva & Brito (2015) returns class probabilities, which are needed for the FH method.
In that method, intervals are represented by a bivariate normal distribution, with one feature for the midpoints and another for the logarithm of the ranges of the intervals. Classical linear (or quadratic) discriminant analysis is then applied. We refer to this method as LDA-ID (Linear Discriminant Analysis of Interval Data).

Let X be an N × K matrix with N observations ($X_i$) of K interval variables ($X_{\cdot j}$), i.e. $X_{ij} = [x^l_{ij}, x^u_{ij}]$. Assume all intervals are non-degenerate ($x^l_{ij} < x^u_{ij}$, $i = 1, \ldots, N$; $j = 1, \ldots, K$). Then $X_{ij}$ is represented by the midpoint $c_{ij} = (x^l_{ij} + x^u_{ij})/2$ and the log-range $r^*_{ij} = \ln(x^u_{ij} - x^l_{ij})$. In the Gaussian model, a joint multivariate Normal distribution $N(\mu, \Sigma)$ is assumed for the midpoints C and the log-ranges R*, i.e.

$$\mu = [\mu_C' \;\; \mu_{R^*}'], \qquad \Sigma = \begin{bmatrix} \Sigma_{CC} & \Sigma_{CR^*} \\ \Sigma_{R^*C} & \Sigma_{R^*R^*} \end{bmatrix},$$

where ′ denotes the transpose, and $\mu_C$ and $\mu_{R^*}$ are K-dimensional column vectors of the mean values of C and R*, respectively. $\Sigma_{CC}$, $\Sigma_{CR^*}$, $\Sigma_{R^*C}$ and $\Sigma_{R^*R^*}$ are their variance-covariance matrices. Four different configurations of the variance-covariance matrix are considered:

• case 1) non-zero correlations among all C and R*, so there is no restriction on Σ;
• case 2) the variables $X_{\cdot j}$ are not correlated, so $\Sigma_{CC}$, $\Sigma_{CR^*}$, $\Sigma_{R^*C}$ and $\Sigma_{R^*R^*}$ are diagonal;
• case 3) there is no correlation between C and R*, so $\Sigma_{CR^*} = \Sigma_{R^*C} = 0$;
• case 4) all C and R* are uncorrelated, so Σ is diagonal.

See Brito & Duarte Silva (2012) for details about the estimators for each covariance configuration. For each configuration, the classical linear discriminant classification rule can be obtained as explained by Duarte Silva & Brito (2015). The methodology is implemented in the function lda of the R package MAINT.Data (Duarte Silva et al., 2021). The four configurations are compared by the Bayesian Information Criterion (BIC), and the one with the lowest BIC value is selected.

2.4. Distances between interval-valued data and interval-valued functions

Several distances have been defined for IVD and IVF (Shimizu, 2011; Sun et al., 2022b,a). The Hausdorff distance is commonly used, so we review its definition. Let $X_i = ([x^l_{i1}, x^u_{i1}], \ldots, [x^l_{iK}, x^u_{iK}])$ be an observation of K-dimensional IVD, where $i = 1, \ldots, N$, $x^l_{ik} \le x^u_{ik}$ and $x^l_{ik}, x^u_{ik} \in \mathbb{R}$. The Hausdorff distance between $X_i$ and $X_j$ is defined by

$$D_H(X_i, X_j) = \sum_{k=1}^{K} D_k(X_i, X_j), \qquad D_k(X_i, X_j) = \max\left(|x^l_{ik} - x^l_{jk}|, |x^u_{ik} - x^u_{jk}|\right).$$

The Euclidean Hausdorff distance is defined by

$$D_{EH}(X_i, X_j) = \sqrt{\sum_{k=1}^{K} [D_k(X_i, X_j)]^2}.$$

Analogously, the functional Hausdorff distance is defined for a set of IVF $X_i(t) = [x^l_i(t), x^u_i(t)]$, with $i = 1, \ldots, N$, $x^l_i(t) \le x^u_i(t)$ and $t \in [a, b]$. Here $x^l_i(t)$ ($x^u_i(t)$) is the lower (upper) function of $X_i(t)$. The functional Hausdorff distance between $X_i(t)$ and $X_j(t)$ is defined by

$$D_{FH}(X_i(t), X_j(t)) = \int_a^b \max\left(|x^l_i(t) - x^l_j(t)|, |x^u_i(t) - x^u_j(t)|\right) dt.$$

The functional Euclidean Hausdorff distance is defined by

$$D_{FEH}(X_i(t), X_j(t)) = \int_a^b \sqrt{\max\left\{|x^l_i(t) - x^l_j(t)|^2, |x^u_i(t) - x^u_j(t)|^2\right\}}\, dt.$$

In practice, the integral can be estimated by numerical integration, such as the trapezoidal rule. For multivariate IVF, for example bivariate IVF $F_i(t) = (X_i(t), Y_i(t))$, we can define $D_{FEH}(F_i(t), F_j(t)) = \sqrt{D_{FEH}(X_i(t), X_j(t))^2 + D_{FEH}(Y_i(t), Y_j(t))^2}$. If the scales of the variables are very different, some standardization should be carried out. See De Carvalho et al. (2006) for different alternatives for IVD.

2.5. Kernel on interval data

Do & Poulet (2005) defined a Radial Basis Function (RBF) kernel for interval data as follows: $K_I\langle X_i, X_j\rangle = \exp(-D_{EH}(X_i, X_j)^2/\gamma)$, where the parameter γ is the spread of the kernel. Once the kernel is defined, kernel techniques such as Kernel Principal Component Analysis (KPCA) (Schölkopf et al., 1998) can be used.
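The distances of Sect. 2.4 and the RBF kernel of Sect. 2.5 can be sketched as follows. The sketch assumes that each IVD observation is a list of (lower, upper) pairs. It also assumes that the IVF are observed on a common, ascending grid, so the trapezoidal rule applies. The function names are hypothetical and γ = 1 is only a default. The Gram matrix helper simply shows the kind of kernel matrix that a kernel method such as KPCA would take as input.

```python
import math


def d_eh(Xi, Xj):
    """Euclidean Hausdorff distance D_EH between two K-dimensional IVD
    observations, each a list of K (lower, upper) pairs."""
    return math.sqrt(sum(max(abs(li - lj), abs(ui - uj)) ** 2
                         for (li, ui), (lj, uj) in zip(Xi, Xj)))


def d_feh(t, Xi, Xj):
    """Functional Euclidean Hausdorff distance D_FEH for two IVF observed on the
    same ascending grid t; Xi[m] = (lower, upper) value at t[m]. The integral
    over [t[0], t[-1]] is approximated with the trapezoidal rule."""
    g = [math.sqrt(max((li - lj) ** 2, (ui - uj) ** 2))
         for (li, ui), (lj, uj) in zip(Xi, Xj)]
    return sum((t[m + 1] - t[m]) * (g[m] + g[m + 1]) / 2.0
               for m in range(len(t) - 1))


def d_feh_multi(per_variable_distances):
    """Combine the D_FEH values of several interval-valued functions, e.g. (X_i, Y_i)."""
    return math.sqrt(sum(d ** 2 for d in per_variable_distances))


def rbf_kernel(dist, gamma=1.0):
    """RBF kernel on interval data, exp(-D^2 / gamma); pass D_EH for IVD or
    D_FEH for IVF."""
    return math.exp(-dist ** 2 / gamma)


def gram_matrix(observations, dist, gamma=1.0):
    """Kernel matrix K[i][j] = rbf_kernel(dist(X_i, X_j), gamma)."""
    return [[rbf_kernel(dist(a, b), gamma) for b in observations]
            for a in observations]


if __name__ == "__main__":
    Xi = [(4.0, 10.0), (1.0, 3.0)]
    Xj = [(5.0, 9.0), (0.0, 5.0)]
    print(d_eh(Xi, Xj))                                   # sqrt(1**2 + 2**2) = sqrt(5)
    t = [0.0, 0.5, 1.0]
    print(d_feh(t, [(0.0, 1.0)] * 3, [(0.0, 3.0)] * 3))   # 2.0
    print(rbf_kernel(d_eh(Xi, Xj)))                       # approx. exp(-5)
```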
The implementation is based on the function kpca from the R package kernlab (Karatzoglou et al., 2004).

3. Proposed methods for ordinal classification of interval-valued data and interval-valued functions

Besides naïve approaches, we propose methods that consider both the order in the response and the interval nature of the input data. The naïve approaches include 1) working with the midpoints of the intervals and applying ordinal classification methods, and 2) using interval classification methods without considering the order. The proposed methods are the following.

FH+LDA-ID: we propose to use the FH method with LDA-ID as the binary classifier. Although LDA-ID is intended for IVD, it can also be used with IVF. The idea is to discretize the observed IVF on a (fine) grid of equally spaced values, so that they can be treated as IVD. Therefore, FH+LDA-ID can be used with both IVD and IVF. Note that FH takes the order of the output into account, and LDA-ID obtains the class probabilities for each interval-valued binary classification problem. If the discretization grid is too fine, the number of features can become large with respect to the number of observations. In that case this methodology will not work with functions, and a coarser grid should be used.

DI+wkNN: our proposal is to use the distances introduced in Sect. 2.4 together with wkNN for ordinal classification. Depending on whether we work with IVD or IVF, the method is $D_{EH}$+wkNN or $D_{FEH}$+wkNN, respectively, although both are referred to as DI+wkNN. In the implementation, k = 7 is used; this parameter has not been tuned.

KPCA + ordinal classifiers: our idea is to carry out a feature extraction stage followed by an ordinal classifier for multivariate data. Preprocessing is a powerful methodology for improving the performance of a learning procedure with multivariate data (Hastie et al., 2009, pp. 150-151). It has also been employed successfully with functional data (Epifanio, 2008; Epifanio & Ventura, 2011).
The idea is similar to FPCA + ordinal classifier, explained in Sect. 2.2 for point-valued functional data, but our proposal uses KPCA with a kernel for interval data. Note that Do & Poulet (2005) only defined an RBF kernel for IVD. However, it can be extended to IVF. We therefore introduce a new definition, an RBF kernel for IVF: $K_{FI}\langle X_i(t), X_j(t)\rangle = \exp(-D_{FEH}(X_i(t), X_j(t))^2/\gamma)$. In the implementation, γ = 1 is used; this parameter has not been tuned. Depending on whether we work with IVD or IVF, the method is $K_I$PCA + ordinal classifier or $K_{FI}$PCA + ordinal classifier, respectively. The ordinal classifier can be POLR, OF, or another ordinal classifier for multivariate data. We have considered POLR in the experimental section, so we refer to KPCA+POLR for both IVD and IVF.

KIOF: our proposal is to use KIRF with IVD and IVF through $K_I$ or $K_{FI}$, respectively. However, rather than the nominal random forest (RF) classification used in KIRF, we consider OF. We refer to this method as KIOF for both $K_I$IOF and $K_{FI}$IOF.

POLR-I and POLR-I2: our proposal is to extend the ideas of de Souza et al. (2008) to ordinal classification of IVD and IVF. Rather than a multi-class logistic regression model, POLR is considered. IVF can be discretized as explained for FH+LDA-ID. In POLR-I, the inputs of POLR are the lower and upper bounds of the intervals. In POLR-I2, instead, we build two models: one applies POLR to the lower bounds of the intervals, and the other applies POLR to the upper bounds. The posterior class probabilities of both models are averaged to obtain the final posterior class probabilities.

FH+LDA-ID belongs to the second approach of ordinal binary decompositions, while the rest of the methods belong to the third approach of threshold methods.
DI+wkNN and KIOF are included in the ensembles subgroup, while KPCA+POLR, POLR-I and POLR-I2 are part of the cumulative link models. In the multivariate case, methods of the third approach are the most successful (Vega-Márquez et al., 2021). POLR is fast to train and one of the most popular ordinal classification methods. However, it does not have the best performance in the multivariate setting (Gutiérrez et al., 2016). Therefore, a priori, we expect the last five proposed methods to perform best, especially DI+wkNN and KIOF. This holds if the IVD (or IVF) case follows the same pattern as the multivariate case (Gutiérrez et al., 2016).

Table 1 presents an overview of the methods used in the experiments for IVD (analogously for IVF). On the one hand, there are the proposed methodologies (FH+LDA-ID, DI+wkNN, KPCA+POLR, KIOF, POLR-I, and POLR-I2). On the other hand, there are the established techniques. These are naïve methods in this case, since no previous methodology has been proposed for ordinal classification of IVD. Three naïve methodologies are considered. The first is LDA-ID, an interval-valued classifier that does not take the order into account. The other two are POLR and OF with the midpoints of the intervals as inputs; they take the order into account, but not the interval-valued nature of the data. In the implementation, default parameters have been used (they can be seen in the code file).

Table 1: Summary of some characteristics of the methods.

| Method | Approach (Sect. 1.2) | Parametric | Input |
|---|---|---|---|
| POLR (Ordered logistic regression) | 1st | Yes | Midpoints of IVD |
| OF (Ordinal forest) | 1st | No | Midpoints of IVD |
| LDA-ID (Linear Discriminant Analysis of Interval Data) | 1st | Yes | Interval bounds transformed into midpoints and log-ranges |
| FH+LDA-ID (Frank and Hall method + LDA-ID) | 2nd | Yes | Midpoints and log-ranges for each binary classification problem of the FH method |
| DI+wkNN (Weighted k-Nearest-Neighbor with $D_{EH}$ or $D_{FEH}$ for IVD and IVF, resp.) | 3rd | No | Interval bounds for computing the distances (input of wkNN) |
| KPCA+POLR (Kernel Principal Component Analysis + POLR) | 3rd | Mixed | Interval bounds for computing the distances and the kernel; the projections on the principal components are the input of POLR |
| KIOF (Kernel-Induced Random Forests with OF) | 3rd | No | Interval bounds for computing the distances and the kernel (input of OF) |
| POLR-I (1st extension of de Souza et al. (2008) with POLR) | 3rd | Yes | Lower and upper bounds, as two distinct features |
| POLR-I2 (2nd extension of de Souza et al. (2008) with POLR) | 3rd | Yes | Lower bounds to POLR-1 and upper bounds to POLR-2; the posterior probabilities of POLR-1 and POLR-2 are averaged |

4. Results and discussion

We compare the methodologies in three different scenarios. Artificial data are generated in Sect. 4.1. Real data sets are considered in Sects. 4.2 and 4.3, which deal with IVD and IVF, respectively. The details are given for each scenario, but the experimental setup is common to all three. In each scenario, we consider several numbers of ordered levels for the output. A total of 50 full data sets are built in each case. Each of these data sets is divided into a training set and a test set. To assess performance, we compute the accuracy (success rate) on the corresponding test set.
As the same data sets are used to compare all the methods, a randomized complete block design is applied to test the differences between methods. It is combined with Tukey's test for comparing all pairs of means (Montgomery, 2019). We have used the R package multcomp (Hothorn et al., 2008).

4.1. Simulated data

Two ordinal classification problems with IVD are simulated, with three and four ordered classes, respectively. The simulation design of the synthetic interval data sets resembles that of de Souza et al. (2008). Nevertheless, we consider more than 3 classes, and the distribution parameters are different, bearing in mind that the output is ordinal. Each class is composed of 100 samples generated from bivariate Normal distributions with uncorrelated components. The distributions have the following parameters for the mean µ and the covariance matrix Σ:

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$

with

1. Class 1: µ1 = 25, µ2 = 50, ρ = 0, σ1 = 6, σ2 = 3
2. Class 2: µ1 = 38, µ2 = 40, ρ = 0, σ1 = 3, σ2 = 3
3. Class 3: µ1 = 45, µ2 = 35, ρ = 0, σ1 = 5, σ2 = 5

for the problem with three ordered classes, and

1. Class 1: µ1 = 25, µ2 = 50, ρ = 0, σ1 = 6, σ2 = 3
2. Class 2: µ1 = 30, µ2 = 45, ρ = 0, σ1 = 5, σ2 = 5
3. Class 3: µ1 = 38, µ2 = 40, ρ = 0, σ1 = 3, σ2 = 3
4. Class 4: µ1 = 45, µ2 = 35, ρ = 0, σ1 = 2, σ2 = 3

for the problem with four ordered classes.

To build the interval-valued data set, each generated data point $(z_1, z_2)$ is used as the seed of a vector of intervals (a rectangle). The rectangle is defined as $([z_1 - \gamma_1/2, z_1 + \gamma_1/2], [z_2 - \gamma_2/2, z_2 + \gamma_2/2])$, where $\gamma_1$ and $\gamma_2$ are randomly drawn from a continuous uniform distribution on [1, 5]. Classes are balanced, and several samples are shown in Fig. 1. For both simulation designs, we generate 50 data sets, where 80% of the samples are used for training and 20% for testing. This gives 240 (320) training samples and 60 (80) test samples for the ordinal classification problem with 3 (4) classes.
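A minimal sketch of one synthetic data set from this design follows. It uses Python's standard random module, so it will not reproduce the authors' R draws. Because ρ = 0 in every class, the two coordinates are drawn independently. The 80/20 split is a simple random split; the text does not state whether the original split was stratified, so that choice is an assumption.

```python
import random

# (mu1, mu2, sigma1, sigma2) for each ordered class, as listed above; rho = 0 in
# every class, so the two coordinates can be drawn independently.
THREE_CLASSES = [(25, 50, 6, 3), (38, 40, 3, 3), (45, 35, 5, 5)]
FOUR_CLASSES = [(25, 50, 6, 3), (30, 45, 5, 5), (38, 40, 3, 3), (45, 35, 2, 3)]


def simulate_ivd(params, n_per_class=100, seed=None):
    """Return a list of (rectangle, label) pairs, rectangle = [(l1, u1), (l2, u2)]."""
    rng = random.Random(seed)
    data = []
    for label, (m1, m2, s1, s2) in enumerate(params, start=1):
        for _ in range(n_per_class):
            z1, z2 = rng.gauss(m1, s1), rng.gauss(m2, s2)    # seed point (z1, z2)
            g1, g2 = rng.uniform(1, 5), rng.uniform(1, 5)    # widths gamma_1, gamma_2
            rect = [(z1 - g1 / 2, z1 + g1 / 2), (z2 - g2 / 2, z2 + g2 / 2)]
            data.append((rect, label))
    return data


def random_split(data, train_frac=0.8, seed=None):
    """Simple (non-stratified) random train/test split."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    train, test = random_split(simulate_ivd(THREE_CLASSES, seed=1), seed=2)
    print(len(train), len(test))   # 240 60
```

Repeating the simulation and split 50 times with different seeds gives the 50 data sets per design described above.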
Table 2 summarizes the performance of each method for both simulation designs separately (second and third columns) and jointly (last column). The maximum value in each column appears in bold. The p-values of the multiple comparison of means by Tukey contrasts are shown in Table 3 for both simulation designs jointly. We use the following significance codes for the p-values:

• '***' means that the p-value is in [0, 0.001];
• '**' means that it is in (0.001, 0.01];
• '*' means that it is in (0.01, 0.05];
• '.' means that it is in (0.05, 0.1];
• no code means that it is in (0.1, 1].

Depending on the α level considered, we would say that there is a statistically significant difference between the mean accuracy values of both methods.

With 3 ordered classes, the best method is KIOF, with 90.7% mean accuracy, followed by DI+wkNN (with $D_{EH}$), with 89.8% mean accuracy. The differences of both methods with respect to the rest are statistically significant. For the sake of brevity, we do not show the table of p-values for the simulation design with 3 classes. However, the p-values of the Tukey contrasts for DI+wkNN and KIOF with respect to the rest of the methods are below 3e-05. There is also a statistically significant difference between DI+wkNN and KIOF (p-value = 0.0279). Therefore, two of our proposals perform better than the naïve approaches.

[Figure 1: axes X and Y; legend Group 1, 2, 3]
Figure 1: Plot of 10 samples per group from the synthetic data with 3 classes. The rectangles denote the IVD, and the dots are the corresponding midpoints.

Table 4 shows more performance statistics for illustrative purposes. Our problems have balanced sample sizes, and this kind of statistic is more critical in imbalanced situations.
DI+ wkNN and KIOF each reach the highest or second-highest value in the majority (6) of the columns of Table 4. For the F1-score columns, which combine precision and recall, the two highest values correspond to FH+LDA-ID and DI+ wkNN for class 1 (KIOF, with 98.5, is fourth, just behind LDA-ID with 98.6); to KIOF and DI+ wkNN for class 2; and to KIOF and DI+ wkNN for class 3.

With 4 ordered classes, four methods obtained the highest mean accuracy, 77.8%: OF, DI+ wkNN, KPCA + POLR, and POLR-I2. The method with the next highest mean accuracy is KIOF, with 77.1%. As before, for the sake of brevity, we do not show the table of p-values for the simulation design with 4 classes, but the p-values of the Tukey contrasts for KIOF with respect to each of the four best methods are 0.0695. The p-values of the Tukey contrasts between the four best methods and all other methods except KIOF are below 0.001, so these differences are statistically significant. Therefore, three of our proposals, together with the naïve method OF, provide the best performance.

When we consider both simulation designs jointly, the best performance is achieved by KIOF, with 83.9% mean accuracy. The second best performance is achieved by DI+ wkNN, with 83.8% mean accuracy. The third best performance is shared by three methods, OF, KPCA + POLR, and POLR-I2, with 83% mean accuracy. The p-values of the Tukey contrasts for KIOF with respect to these three methods are 0.0394, while with respect to POLR, LDA-ID, FH+LDA-ID, and POLR-I they do not exceed 0.0011, as can be seen in Table 3. Therefore, all these differences are statistically significant at level 0.05. Analogously, the p-values of the Tukey contrasts for DI+ wkNN with respect to OF, KPCA + POLR, and POLR-I2 are 0.0627, while with respect to POLR, LDA-ID, FH+LDA-ID, and POLR-I they do not exceed 0.0021. The method that performs least well is FH+LDA-ID, with 82.1% mean accuracy, which is not statistically significantly different from POLR, LDA-ID, and POLR-I (82.4% mean accuracy each). In summary, our proposals KIOF and DI+ wkNN are the methods that perform best; in fact, they are statistically significantly better than the rest of the methods at an α level of 0.1.

Table 2: Mean and standard deviation, in brackets, of accuracy (percentage) over 50 simulations for synthetic data.

Method        3 classes    4 classes    Global
POLR          88.2 (3.7)   76.5 (4.5)   82.4 (7.2)
OF            88.1 (3.2)   77.8 (4.5)   83 (6.5)
LDA-ID        88.2 (3.7)   76.5 (4.5)   82.4 (7.2)
FH+LDA-ID     88.1 (3.2)   76.1 (4.5)   82.1 (7.2)
DI+ wkNN      89.8 (3.5)   77.8 (4.2)   83.8 (7.2)
KPCA + POLR   88.1 (3.2)   77.8 (4.5)   83 (6.5)
KIOF          90.7 (3.2)   77.1 (4.3)   83.9 (7.8)
POLR-I        88.2 (3.7)   76.5 (4.5)   82.4 (7.2)
POLR-I2       88.1 (3.2)   77.8 (4.5)   83 (6.5)

Table 3: P-values of Tukey simultaneous comparison for synthetic data.

Method        OF          LDA-ID      FH+LDA-ID   DI+ wkNN    KPCA + POLR KIOF        POLR-I      POLR-I2
POLR          .2235       1           .5670       .0021 **    .2235       .0011 **    1           .2235
OF                        .2235       .0737 .     .0627 .     1           .0394 *     .2235       1
LDA-ID                                .5670       .0021 **    .2235       .0011 **    1           .2235
FH+LDA-ID                                         .0003 ***   .0737 .     .0001 ***   .5670       .0737 .
DI+ wkNN                                                      .0627 .     .8415       .0021 **    .0627 .
KPCA + POLR                                                               .0394 *     .2235       1
KIOF                                                                                  .0011 **    .0394 *
POLR-I                                                                                            .2235

Table 4: Mean of precision, recall and F1 for each class, over 50 simulations for synthetic data with three classes. The maximum value in each column appears in bold.

              Precision           Recall              F1
Method        1     2     3       1     2     3       1     2     3
POLR          98.2  80.0  86.8    96.8  85.6  82.2    97.5  82.3  84.0
OF            94.0  82.0  88.5    99.6  84.5  79.9    96.6  82.6  83.4
LDA-ID        100   78.5  87.5    97.2  88.8  79.3    98.6  83.0  82.8
FH+LDA-ID     98.6  84.2  81.4    99.8  79.0  85.9    99.2  81.1  83.2
DI+ wkNN      99.4  81.7  89.4    98.6  89.7  81.8    99.0  85.1  85.0
KPCA + POLR   97.6  80.1  85.9    96.5  83.2  83.7    97.0  81.1  84.4
KIOF          99.1  82.8  90.9    98.0  90.7  83.8    98.5  86.2  86.9
POLR-I        98.4  80.2  86.0    96.9  84.3  83.3    97.6  81.8  84.1
POLR-I2       98.2  79.7  86.9    96.8  85.8  81.8    97.5  82.3  83.9

4.2. Global Development data

We consider development data from 183 countries. The input features are two interval-valued variables based on the following two gender inequality indicators from The World Bank (2022): a) the Women Business and the Law Index Score (LAW) (scale 1-100), which measures how laws and regulations affect women's economic opportunities. Overall scores are calculated by taking the average score of each index (Mobility, Workplace, Pay, Marriage, Parenthood, Entrepreneurship, Assets, and Pension), with 100 representing the highest possible score; b) the percentage of seats held by women in national parliaments (GenPar), which represents the percentage of parliamentary seats in a single or lower chamber held by women. The extremes of the interval-valued variables are the minimum and the maximum of each gender indicator between 2000 and 2021.

As regards the output feature, the ordered factor is built by dividing the Human Development Index (HDI) for 2021 into certain percentiles (United Nations, 2022). The HDI is the geometric mean of three normalized indices, $\mathrm{HDI} = \sqrt[3]{\mathrm{LEI} \cdot \mathrm{EI} \cdot \mathrm{II}}$, where LEI stands for the Life Expectancy Index, EI for the Education Index, and II for the Income Index. The HDI assesses having a long and healthy life, being knowledgeable, and having a decent standard of living. For the ordered categorical variable, we consider 5 possible numbers of levels, from 3 to 7. For example, for the case of 3 ordered classes, the classes are defined by the intervals [0, L33[, [L33, L66[, [L66, L100], where Li denotes the value of the i-th percentile of HDI. Therefore, the classes are balanced. Several samples are displayed in Fig. 2. In summary, we have 5 data sets with the same inputs but 5 different outputs, whose number of ordered categories ranges from 3 to 7. For each of these 5 data sets, we use Monte Carlo cross-validation, where 50 random splits of each data set are created.
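As a rough illustration of this Monte Carlo cross-validation (not the authors' R code), the Python sketch below draws 50 random training/validation partitions and summarizes validation accuracy by its mean and standard deviation, the summary reported in the accuracy tables. It assumes the 80%/20% proportions stated for each split, with the training size rounded to the nearest integer (146 of the 183 countries); `fit_predict` and `majority_class` are hypothetical placeholders, not any of the compared classifiers.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def monte_carlo_splits(n: int, n_splits: int = 50, train_frac: float = 0.8,
                       seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return n_splits random (train, validation) index partitions of range(n)."""
    rng = random.Random(seed)
    n_train = round(train_frac * n)  # assumption: round to the nearest integer
    splits = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        splits.append((sorted(idx[:n_train]), sorted(idx[n_train:])))
    return splits


def mccv_accuracy(X: Sequence, y: Sequence[int],
                  fit_predict: Callable[[list, list, list], list],
                  n_splits: int = 50, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation (in %) of validation accuracy over the splits."""
    accs = []
    for train, valid in monte_carlo_splits(len(y), n_splits, seed=seed):
        pred = fit_predict([X[i] for i in train], [y[i] for i in train],
                           [X[i] for i in valid])
        hits = sum(p == y[i] for p, i in zip(pred, valid))
        accs.append(100.0 * hits / len(valid))
    return statistics.mean(accs), statistics.stdev(accs)


def majority_class(X_train: list, y_train: list, X_valid: list) -> list:
    """Hypothetical baseline classifier: predict the most frequent training label."""
    label = max(set(y_train), key=y_train.count)
    return [label] * len(X_valid)
```

For instance, `monte_carlo_splits(183)` yields 50 partitions with 146 training and 37 validation countries each, and `mccv_accuracy(X, y, majority_class)` scores a trivial majority-class baseline under the same protocol.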
In each split, the data set is divided into training data (80% of the 183 countries) and validation data (20% of the 183 countries). Table 5 summarizes the performance of each method over all experimental designs jointly, while Fig. 3 displays the performance for each design separately. The p-values of the multiple comparison of means by Tukey contrasts are shown in Table 6 for the experimental designs jointly. As before, we include the significance codes. Table 7 shows performance statistics for three classes.

Figure 2: Plot of 10 samples per group from the global development data with 3 classes (horizontal axis: Law; vertical axis: Gen; groups 1, 2 and 3). The rectangles denote the IVD, and the dots are the corresponding midpoints.

According to the results in Table 5, the best methods are DI+ wkNN (with $D_{EH}$) and KPCA + POLR, with 40% and 39.98% mean accuracy, respectively. The third best performance is achieved by KIOF, with 39.21% mean accuracy. These three methods are not statistically significantly different from one another according to the p-values of the Tukey contrasts in Table 6. However, DI+ wkNN and KPCA + POLR are statistically significantly different from the other six methods at level 0.1. OF is the fourth best method in terms of performance, with 38.84% mean accuracy. This naïve method achieves results similar to those of FH+LDA-ID and POLR-I, with 38.54% and 38.51% mean accuracy, respectively. The methods that perform worst are LDA-ID, POLR, and POLR-I2, with 37.24%, 37.31%, and 37.43% mean accuracy, respectively. These three methods form a homogeneous group that is statistically significantly different from the rest of the methods at level 0.1.

According to the results in Fig. 3, the best accuracy with 3 ordered classes is provided by POLR-I, with 56.4% mean accuracy. For the sake of brevity, we do not show the table of p-values for each experimental design, but in this case no statistically significant difference is found at level 0.1 between POLR-I and DI+ wkNN, KPCA + POLR, KIOF, or POLR-I2, whereas POLR-I is statistically significantly different from all the naïve approaches (LDA-ID, POLR, OF) and from FH+LDA-ID. If the F1-score rankings in Table 7 are considered, the methods with the lowest joint rankings over the three classes (i.e. the best performing methods) coincide with the methods with the highest accuracies (POLR-I and KPCA + POLR).

Following the results of Fig. 3, the best accuracy with 4 ordered classes is provided by DI+ wkNN, with 46.8% mean accuracy. DI+ wkNN is statistically significantly different from LDA-ID, OF, KPCA + POLR, KIOF, and POLR-I at level 0.1. The best accuracy with 5 ordered classes is provided by KPCA + POLR, with 37.6% mean accuracy. KPCA + POLR is statistically significantly different from the rest of the methods at level 0.1. The best accuracy with 6 ordered classes is provided by DI+ wkNN, with 34.6% mean accuracy. DI+ wkNN is statistically significantly different from POLR, FH+LDA-ID, POLR-I, and POLR-I2 at level 0.1. The best accuracy with 7 ordered classes is provided by OF, with 31.4% mean accuracy. OF is statistically significantly different from LDA-ID, POLR, FH+LDA-ID, KPCA + POLR, POLR-I, and POLR-I2 at level 0.1. In summary, except for one experimental design, DI+ wkNN is among the best methods in all situations.

Table 5: Mean and standard deviation, in brackets, of accuracy (percentage) over 50 splits of the 5 data sets from the global development data.

Method        Global
POLR          37.31 (13.47)
OF            38.84 (10.48)
LDA-ID        37.24 (12.42)
FH+LDA-ID     38.54 (11.46)
DI+ wkNN      40.0 (11.77)
KPCA + POLR   39.98 (12.14)
KIOF          39.21 (11.75)
POLR-I        38.51 (12.31)
POLR-I2       37.43 (13.5)

Figure 3: Mean and standard deviation of accuracy over 50 splits for each experimental design of the global development data (horizontal axis: number of groups, from 3 to 7; vertical axis: model accuracy; one series per model).

Table 6: P-values of Tukey simultaneous comparisons for the global development data.

Method        OF            LDA-ID        FH+LDA-ID     DI+ wkNN      KPCA + POLR   KIOF          POLR-I        POLR-I2
POLR          .01734 *      .91987        .05605 .      3.09e-05 ***  3.58e-05 ***  .00320 **     .06283 .      .85367
OF                          .01315 *      .63874        .07291 .      .07843 .      .56864        .60324        .02815 *
LDA-ID                                    .04432 *      1.98e-05 ***  2.30e-05 ***  .00230 **     .04990 *      .77562
FH+LDA-ID                                               .02369 *      .02583 *      .29861        .95988        .08428 .
DI+ wkNN                                                              .97325        .22104        .02076 *      6.80e-05 ***
KPCA + POLR                                                                         .23395        .02267 *      7.82e-05 ***
KIOF                                                                                              .27585        .00571 **
POLR-I                                                                                                          .09371 .

Table 7: Mean of accuracy (Acc.), precision, recall and F1 for each class, over 50 splits for the global development data with three classes. The maximum value in each column appears in bold.

                      Precision           Recall              F1
Method        Acc.    1     2     3       1     2     3       1     2     3
POLR          54.4    56.8  45.1  61.1    61.0  38.2  66.6    58.1  39.7  62.0
OF            52.1    52.2  40.9  57.6    78.0  16.3  65.9    61.6  23.7  60.2
LDA-ID        53.6    51.6  49.9  61.6    50.1  49.2  64.5    49.5  48.1  61.8
FH+LDA-ID     51.8    52.1  45.4  69.0    40.8  65.3  51.2    44.2  52.8  57.6
DI+ wkNN      55.2    57.9  44.5  71.3    58.3  54.6  53.9    56.7  48.0  60.5
KPCA + POLR   56.2    54.2  48.7  69.9    72.8  36.6  62.2    61.0  39.8  65.0
KIOF          54.9    55.7  45.9  63.6    76.5  30.3  61.6    63.5  35.7  61.3
POLR-I        56.4    59.2  50.3  62.9    67.8  41.8  63.8    62.4  44.0  61.5
POLR-I2       55.0    56.6  45.9  63.8    62.2  37.9  68.8    58.2  39.2  64.5

4.3. Catalan meteorological data

We consider data from 160 Catalan weather stations in 2015 (see Fig. 4). These data were provided by the Servei Meteorològic de Catalunya (Catalan Meteorological Service). The input features are two functional interval-valued variables observed at 365 points: a) the minimum and maximum daily temperatures, measured in degrees Celsius, for each of the 365 days; and b) the minimum and maximum daily relative humidity values, measured in percentage, for each of the 365 days. Fig. 5 displays a sample of these functions.
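The distance-based proposals compare weather stations through distances between interval-valued functions. The precise $D_{FEH}$ used for bivariate IVF is defined elsewhere in the paper; purely as an illustrative assumption, the Python sketch below takes, for each day and variable, the Hausdorff distance between the intervals $[a, b]$ and $[c, d]$, namely $\max(|a - c|, |b - d|)$, and combines the squared values by a Euclidean sum. The nested-list layout (variable, then day, then a `(min, max)` pair) is also hypothetical.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper), e.g. (daily minimum, daily maximum)


def hausdorff(u: Interval, v: Interval) -> float:
    """Hausdorff distance between the closed intervals [u0, u1] and [v0, v1]."""
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))


def euclidean_hausdorff_ivf(x: Sequence[Sequence[Interval]],
                            z: Sequence[Sequence[Interval]]) -> float:
    """Assumed distance between two stations' interval-valued functions.

    x[j][t] is the interval of variable j (e.g. 0 = temperature, 1 = humidity)
    on day t. Squared daily Hausdorff distances are summed over days and
    variables, and the square root of the total is returned.
    """
    if len(x) != len(z):
        raise ValueError("both stations must have the same variables")
    total = 0.0
    for xj, zj in zip(x, z):
        if len(xj) != len(zj):
            raise ValueError("both functions must be observed on the same days")
        total += sum(hausdorff(u, v) ** 2 for u, v in zip(xj, zj))
    return math.sqrt(total)
```

Because temperature and humidity enter the same sum, a variable measured on a larger numeric scale would dominate such a distance, which is the motivation for the standardization described in the next paragraph.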
Figure 4: Map of Catalonia (Spain) showing the location of the Catalan weather stations and their altitude.

As both variables are measured in non-compatible units, each functional variable is standardized, so that both variables have an equal weight in the methods that use distances. We consider the average daily temperature for each day and weather station, which was also provided by the Servei Meteorològic de Catalunya. Then, functional means and variances are defined daily across weather stations (Ramsay & Silverman, 2005). The same procedure is carried out for relative humidity. The functional means are subtracted from the respective functions, and the results are divided by the respective standard deviation functions.

Note that our two interval-valued functional variables are equivalent to 730 (365 × 2) interval-valued variables, which are also highly correlated between neighboring days. As a result, some methods fail and cannot be used with these data: LDA-ID, POLR, OF, FH+LDA-ID, POLR-I, and POLR-I2, i.e. all the methods except DI+ wkNN (with $D_{FEH}$ for bivariate IVF), KPCA + POLR, and KIOF. We have tried to solve this problem for the methods that fail by sampling the days: rather than using all 365 days, we consider only one in every 30 days. Therefore, for methods LDA-ID, OF, and FH+LDA-ID we work with two interval-valued functional variables observed on 12 days rather than 365, which is equivalent to 24 (12 × 2) interval-valued variables. However, methods POLR, POLR-I, and POLR-I2 continue to fail and, therefore, they are not considered in this problem. In summary, for this problem we work with the full data set for DI+ wkNN, KPCA + POLR, and KIOF, and with a time-sampled data set for LDA-ID, OF, and FH+LDA-ID.

As regards the output feature, the ordered factor is built by dividing the altitude of each weather station into certain percentiles.
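The preprocessing in this subsection can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the standard deviation function uses the $n-1$ denominator; the kept days are every 30th day starting from day 0, truncated to the 12 days reported (which days were actually kept is not stated); and the percentile cut points come from `statistics.quantiles(..., method="inclusive")`, which interpolates like R's default (type 7) rule, although the paper does not say which rule was used. The same labelling function would apply to the HDI in the global development data.

```python
import bisect
import statistics
from typing import List, Sequence


def standardize_functional(avg: Sequence[Sequence[float]],
                           curves: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standardize curves day by day with mean and SD functions computed from avg.

    avg[s][t]    -- average daily value of station s on day t (defines the mean/SD functions)
    curves[s][t] -- curve to standardize, e.g. the daily minimum or the daily maximum
    """
    n_days = len(avg[0])
    mean_t = [statistics.mean([st[t] for st in avg]) for t in range(n_days)]
    # Sample SD across stations (n - 1 denominator: an assumption).
    sd_t = [statistics.stdev([st[t] for st in avg]) for t in range(n_days)]
    return [[(c[t] - mean_t[t]) / sd_t[t] for t in range(n_days)] for c in curves]


def subsample_days(n_days: int = 365, step: int = 30, keep: int = 12) -> List[int]:
    """Indices of the retained days: every `step`-th day from day 0 (assumed offset)."""
    return list(range(0, n_days, step))[:keep]


def equal_frequency_labels(values: Sequence[float], n_classes: int) -> List[int]:
    """Ordered labels 1..n_classes from percentile cut points.

    With n_classes = 3 this gives [min, L33[ -> 1, [L33, L66[ -> 2, [L66, max] -> 3.
    """
    cuts = statistics.quantiles(values, n=n_classes, method="inclusive")
    # bisect_right counts the cut points <= v, so a value equal to a cut moves up a class.
    return [bisect.bisect_right(cuts, v) + 1 for v in values]
```

With these helpers, the daily minimum and maximum curves of each variable would both be passed through `standardize_functional` with the same average curves, and the five ordinal outputs would correspond to `equal_frequency_labels(altitude, q)` for `q` from 3 to 7, where `altitude` is a hypothetical list with one value per station.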
As before, for the ordered categorical variable, we consider 5 possible numbers of levels, from 3 to 7. For example, for the case of 3 ordered classes, the balanced classes are defined by the intervals [0, L33[, [L33, L66[, [L66, L100], where Li denotes the value of the i-th percentile of altitude. Therefore, we have 5 data sets with the same inputs but 5 different outputs, whose number of ordered categories ranges from 3 to 7. For each of these 5 data sets, we use Monte Carlo cross-validation, where 50 random splits of each data set are created. In each split, the data set is divided into training data (80% of the weather stations) and validation data (20% of the weather stations).

Table 8 summarizes the performance of each method over all experimental designs jointly, while Fig. 6 displays the performance for each design separately. The p-values of the multiple comparison of means by Tukey contrasts are shown in Table 9 for the experimental designs jointly. As before, we include the significance codes. Table 10 shows performance statistics for three classes.

According to the results in Table 8, the best method is KIOF, with 75.62% mean accuracy. KIOF is statistically significantly different from the rest of the methods at level 0.05, according to the p-values of the Tukey contrasts in Table 9. The next best methods in terms of performance are KPCA + POLR, OF, and DI+ wkNN, with 73.72%, 73.48%, and 72.18% mean accuracy, respectively.
These three methods form a homogeneous group that is statistically significantly different from the rest of the methods. The methods that perform worst are LDA-ID and FH+LDA-ID, with 68.15% and 69.58% mean accuracy, respectively. These two methods form another homogeneous group that is statistically significantly different from the rest of the methods.

Figure 5: a) Minimum (blue) and maximum (red) daily temperatures on the Celsius scale and b) minimum (blue) and maximum (red) daily relative humidity values (right) of a sample of six stations. Weather stations of high, medium and low altitude appear in the top, middle and bottom rows, respectively: Lac Redon (2,247 m) and Tosa d'Alp (2,500 m); Prades (916 m) and SantPau (852 m); Torredembarra (4 m) and Torroella (7 m).

Table 8: Mean and standard deviation, in brackets, of accuracy (percentage) over 50 splits of the 5 data sets from the meteorological data.

Method        Global
OF            73.48 (13.55)
LDA-ID        68.15 (14.89)
FH+LDA-ID     69.58 (14.48)
DI+ wkNN      72.18 (13.8)
KPCA + POLR   73.72 (13.35)
KIOF          75.62 (11.77)

According to the results in Fig. 6, the best accuracy with 3 ordered classes is provided by KIOF, with 86.5% mean accuracy. For the sake of brevity, we do not show the table of p-values for each experimental design, but in this case no statistically significant difference is found between KIOF and OF at level 0.05. KIOF reaches the highest or second-highest value in the majority (6) of the columns of Table 10 (excluding accuracy). Following the results of Fig. 6, the best accuracy with 4 ordered classes is provided by KPCA + POLR, with 82.5% mean accuracy. No statistically significant difference is found between KPCA + POLR and DI+ wkNN at level 0.05. The best accuracy with 5 ordered classes is again provided by KPCA + POLR, with 75.2% mean accuracy. No statistically significant difference is found between KPCA + POLR and KIOF at level 0.05. The best accuracy with 6 ordered classes is provided by KIOF, with 71% mean accuracy. No statistically significant difference is found between KIOF and OF at level 0.05. The best accuracy with 7 ordered classes is again provided by KIOF, with 69.5% mean accuracy. KIOF is statistically significantly different from the rest of the methods at level 0.05. In summary, except for one experimental design, KIOF is among the best methods in all situations.

Figure 6: Mean and standard deviation of accuracy over 50 splits for each experimental design of the meteorological data (horizontal axis: number of groups, from 3 to 7; vertical axis: model accuracy; one series per model, with the legend indicating whether the model needs time sampling).

Table 9: P-values of Tukey simultaneous comparisons for the meteorological data.

Method        LDA-ID        FH+LDA-ID     DI+ wkNN      KPCA + POLR   KIOF
OF            2.98e-08 ***  4.72e-05 ***  .17387        .79363        .02459 *
LDA-ID                      .13608        2.68e-05 ***  6.65e-09 ***  9.77e-15 ***
FH+LDA-ID                                 .00659 **     1.50e-05 ***  3.23e-10 ***
DI+ wkNN                                                .10498        .00032 ***
KPCA + POLR                                                           .046947 *

4.4. Discussion

Let us consider the results of the simulated and real data sets together. KIOF and DI+ wkNN are the best methods for the simulated data (Table 2). DI+ wkNN, KPCA + POLR, and KIOF are the best methods for the global development data (Table 5), while KIOF is the best method for the meteorological data (Table 8). Therefore, KIOF appears to be the best methodology overall, being among the best methods in all three situations. KIOF is a nonparametric and highly nonlinear method. Apart from KIOF, DI+ wkNN and KPCA + POLR also seem better alternatives than FH+LDA-ID, POLR-I, and POLR-I2.
These three methods are parametric, unlike KIOF and DI+ wkNN, which are nonparametric methods. KPCA + POLR combines a nonparametric part with a parametric method. Taking order and interval-valued information into account is beneficial. Although OF is quite competitive despite being a naïve approach, KIOF, which combines OF with the use of the interval-valued information, improves the performance. The other two naïve methods, POLR and LDA-ID, do not perform as well, being among the worst methods in all the data sets. FH+LDA-ID is statistically significantly better than LDA-ID only for the global development data; note that its performance depends on the class probabilities returned by LDA-ID. For the functional case, KIOF, DI+ wkNN, and KPCA + POLR are clearly the best options, since for the other methods some information has to be discarded by sampling.

Table 10: Mean of accuracy (Acc.), precision, recall and F1 for each class, over 50 splits for the meteorological data with three classes. The maximum value in each column appears in bold.

                      Precision           Recall              F1
Method        Acc.    1     2     3       1     2     3       1     2     3
OF            86.0    82.0  80.5  92.2    95.9  71.1  88.8    87.5  74.5  89.8
LDA-ID        81.3    82.5  71.3  93.4    85.5  75.8  83.2    82.6  72.1  86.8
FH+LDA-ID     81.8    83.9  71.8  93.5    88.4  76.2  82.4    84.6  72.6  86.5
DI+ wkNN      83.9    89.2  71.6  97.0    85.5  86.7  81.6    86.2  78.2  87.6
KPCA + POLR   83.8    88.0  72.3  90.1    89.2  77.4  84.6    87.6  75.5  85.8
KIOF          86.5    90.5  77.8  93.4    93.5  84.9  82.8    91.1  80.1  86.1

5. Conclusions

We have proposed six methodologies (FH+LDA-ID, DI+ wkNN, KPCA + POLR, KIOF, POLR-I, and POLR-I2) for ordinal classification in two different cases, with interval-valued data and with functional interval-valued data as inputs. To the best of our knowledge, this is the first time these problems have been addressed. We have carried out an extensive comparative study with simulated and real data sets and different experimental setups regarding the number of levels of the output.
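As a compact, purely descriptive complement to the Discussion (not an analysis from the paper), the global mean accuracies in Tables 2, 5 and 8 can be converted into per-data-set ranks for the six methods that were run on all three data sets. The Python sketch below uses competition ranking (tied methods share the best rank) and ignores the Tukey tests, so it says nothing about statistical significance.

```python
from typing import Dict

# Global mean accuracies (%) reported in Tables 2, 5 and 8, restricted to the
# six methods that were applied to all three data sets.
MEAN_ACCURACY: Dict[str, Dict[str, float]] = {
    "synthetic": {"OF": 83.0, "LDA-ID": 82.4, "FH+LDA-ID": 82.1,
                  "DI+ wkNN": 83.8, "KPCA + POLR": 83.0, "KIOF": 83.9},
    "global development": {"OF": 38.84, "LDA-ID": 37.24, "FH+LDA-ID": 38.54,
                           "DI+ wkNN": 40.0, "KPCA + POLR": 39.98, "KIOF": 39.21},
    "meteorological": {"OF": 73.48, "LDA-ID": 68.15, "FH+LDA-ID": 69.58,
                       "DI+ wkNN": 72.18, "KPCA + POLR": 73.72, "KIOF": 75.62},
}


def competition_ranks(acc: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 = highest mean accuracy; tied methods share the best rank."""
    return {m: 1 + sum(other > a for other in acc.values()) for m, a in acc.items()}


def rank_sums(table: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Sum each method's rank over the data sets (lower is better)."""
    totals: Dict[str, int] = {}
    for acc in table.values():
        for method, r in competition_ranks(acc).items():
            totals[method] = totals.get(method, 0) + r
    return totals


if __name__ == "__main__":
    sums = rank_sums(MEAN_ACCURACY)
    for method in sorted(sums, key=sums.get):
        print(f"{method:12s} {sums[method]}")
```

With the published means, KIOF obtains the lowest rank sum (5), followed by DI+ wkNN and KPCA + POLR (7 each), while KIOF ranks only third on the global development data, in line with the observation that no single method is always the best.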
Although no single method performs best on all possible data sets, some methods are more advisable than others. KIOF has returned excellent results with both interval-valued data and functional interval-valued data. DI+ wkNN and KPCA + POLR are also advisable in both cases. As future work, more methods could be explored. For example, we use KPCA + POLR, but another ordinal classification method could be used after KPCA instead of POLR. POLR with variable selection could also be used. Another line of future work would be to handle interval-valued mixed data, with functional and vector parts, or to extend the work to other symbolic data. However, applications are also one of the main directions for future work, since interval-valued information is often not exploited (Pérez-Navarro et al., 2023). Other directions to explore are dealing with incomplete interval-valued data (Qi et al., 2021) and imbalanced interval-valued data (Qi et al., 2023). Note that we obtain ordinal classification problems by discretizing the response into Q different classes with equal frequency. We prefer to assess the performance in this more controlled environment, since this is the first time that ordinal classification is addressed with IVD and IVF.

CRediT authorship contribution statement

A. A.: Data curation, Formal analysis, Investigation, Software, Visualization, Writing - review & editing. M. M-G.: Data curation, Formal analysis, Investigation, Software, Visualization, Writing - review & editing. I. E.: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing - original draft, Writing - review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.
Acknowledgments

The authors would like to thank the Servei Meteorològic de Catalunya for providing them with the meteorological data. This research was partially supported by the Spanish Ministry of Universities (FPU grant FPU20/0182), the Spanish Ministry of Science and Innovation (PID2022-141699NB-I00, PID2020-118763GA-I00 and PID2020-118071GB-I00), CIGE/2022/066 from Generalitat Valenciana, and UJI-B2020-22 and UJI-A202212 from Universitat Jaume I, Spain.

References

Agresti, A. (2002). Categorical Data Analysis. Wiley.
Aguilera, A., & Escabias, M. (2008). Solving multicollinearity in functional multinomial logit models for nominal and ordinal responses. In Functional and Operatorial Statistics (pp. 7–13). Springer.
Alcacer, A., Epifanio, I., Valero, J., & Ballester, A. (2021). Combining classification and user-based collaborative filtering for matching footwear size. Mathematics, 9, 771.
Angulo, C., Anguita, D., Gonzalez-Abril, L., & Ortega, J. (2008). Support vector machines for interval discriminant analysis. Neurocomputing, 71, 1220–1229.
Appice, A., d'Amato, C., Esposito, F., & Malerba, D. (2006). Classification of symbolic objects: A lazy learning approach. Intelligent Data Analysis, 10, 301–324.
Balugani, E., Lolli, F., Pini, M., Ferrari, A. M., Neri, P., Gamberini, R., & Rimini, B. (2021). Dimensionality reduced robust ordinal regression applied to life cycle assessment. Expert Systems with Applications, 178, 115021.
Barbero-Gómez, J., Gutiérrez, P.-A., Vargas, V.-M., Vallejo-Casas, J.-A., & Hervás-Martínez, C. (2021). An ordinal CNN approach for the assessment of neurological damage in Parkinson's disease patients. Expert Systems with Applications, 182, 115271.
Bertrand, P., & Goupil, F. (2000). Descriptive statistics for symbolic data. In Analysis of symbolic data: exploratory methods for extracting statistical information from complex data (pp. 106–124). Springer.
Billard, L. (2006). Symbolic data analysis: what is it? In Compstat 2006 – Proceedings in Computational Statistics: 17th Symposium Held in Rome, Italy, 2006 (pp. 261–269). Springer.
Billard, L. (2008). Some analyses of interval data. Journal of Computing and Information Technology, 16, 225–233.
Billard, L., & Diday, E. (2003). From the statistics of data to the statistics of knowledge: symbolic data analysis. Journal of the American Statistical Association, 98, 470–487.
Blanco-Fernández, A., Corral, N., & González-Rodríguez, G. (2011). Estimation of a flexible simple linear model for interval data based on set arithmetic. Computational Statistics & Data Analysis, 55, 2568–2578.
Brito, P. (2014). Symbolic data analysis: another look at the interaction of data mining and statistics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4, 281–295.
Brito, P., & Duarte Silva, A. P. (2012). Modelling interval data with normal and skew-normal distributions. Journal of Applied Statistics, 39, 3–20.
Chen, J., Li, Z., Wang, X., & Zhai, J. (2022). A hybrid monotone decision tree model for interval-valued attributes. Advances in Computational Intelligence, 2, 1–11.
Chen, Y., & Billard, L. (2019). A study of divisive clustering with Hausdorff distances for interval data. Pattern Recognition, 96, 106969.
De Carvalho, F. d. A., Brito, P., & Bock, H.-H. (2006). Dynamic clustering for interval data based on L2 distance. Computational Statistics, 21, 231–250.
D'Esposito, M. R., Palumbo, F., & Ragozini, G. (2012). Interval archetypes: a new tool for interval data analysis. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5, 322–335.
Do, T.-N., & Poulet, F. (2005). Kernel methods and visualization for interval data mining. In Proceedings of the Conference on Applied Stochastic Models and Data Analysis, ASMDA (pp. 345–354).
Duarte Silva, A., & Brito, P. (2015). Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification, 32, 516–541.
Duarte Silva, A. P., & Brito, P. (2006). Linear discriminant analysis for interval data. Computational Statistics, 21, 289–308.
Duarte Silva, A. P., Brito, P., Filzmoser, P., & Dias, J. G. (2021). MAINT.Data: Modelling and Analysing Interval Data in R. The R Journal, 13, 336–364.
Duarte Silva, A. P., Filzmoser, P., & Brito, P. (2018). Outlier detection in interval data. Advances in Data Analysis and Classification, 12, 785–822.
D'Urso, P., De Giovanni, L., Maharaj, E. A., Brito, P., & Teles, P. (2023). Wavelet-based fuzzy clustering of interval time series. International Journal of Approximate Reasoning, 152, 136–159.
Epifanio, I. (2008). Shape descriptors for classification of functional data. Technometrics, 50, 284–294.
Epifanio, I., & Ventura, N. (2011). Functional data analysis in shape analysis. Computational Statistics & Data Analysis, 55, 2758–2773.
Fan, G., Cao, J., & Wang, J. (2010). Functional data classification for temporal gene expression data with kernel-induced random forests. In IEEE Symp. on Comput. Intell. in Bioinformatics and Computational Biology (pp. 1–5).
Ferrando, L., Epifanio, I., & Ventura-Campos, N. (2021). Ordinal classification of 3D brain structures by functional data analysis. Statistics & Probability Letters, 179, 109227.
Ferrando, L., Ventura-Campos, N., & Epifanio, I. (2020). Detecting and visualizing differences in brain structures with SPHARM and functional data analysis. NeuroImage, 222, 117209.
Frank, E., & Hall, M. (2001). A simple approach to ordinal classification. Lecture Notes in Computer Science, 2167, 145–156.
Freitas, W. W., de Souza, R. M., Amaral, G. J., & De Bastiani, F. (2022). Exploratory spatial analysis for interval data: A new autocorrelation index with COVID-19 and rent price applications. Expert Systems with Applications, 195, 116561.
Grzegorzewski, P., & Śpiewak, M. (2019). The sign test and the signed-rank test for interval-valued data. International Journal of Intelligent Systems, 34, 2122–2150.
Gutiérrez, P., Pérez-Ortiz, M., et al. (2016). Ordinal regression methods: Survey and experimental study. IEEE T Know. Data En., 28, 127–146.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction (Vol. 2). Springer.
Hechenbichler, K., & Schliep, K. (2004). Weighted k-nearest-neighbor techniques and ordinal classification. Technical Report, Ludwig-Maximilians-Universität München.
Hirk, R., Hornik, K., & Vana, L. (2019). Multivariate ordinal regression models: an analysis of corporate credit ratings. Statistical Methods & Applications, 28, 507–539.
Hornung, R. (2020). Ordinal forests. Journal of Classification, 37, 4–17.
Hothorn, T., Bretz, F., & Westfall, P. (2008). Simultaneous inference in general parametric models. Biometrical Journal, 50, 346–363.
Jahanshahloo, G., Lotfi, F. H., Balf, F. R., & Rezai, H. Z. (2007). Discriminant analysis of interval data using Monte Carlo method in assessment of overlap. Applied Mathematics and Computation, 191, 521–532.
Karatzoglou, A., Smola, A., Hornik, K., & Zeileis, A. (2004). kernlab – an S4 package for kernel methods in R. Journal of Statistical Software, 11, 1–20.
Lauro, C. N., & Palumbo, F. (2000). Principal component analysis of interval data: a symbolic data analysis approach. Computational Statistics, 15, 73–87.
Le-Rademacher, J., & Billard, L. (2012). Symbolic covariance principal component analysis and visualization for interval-valued data. Journal of Computational and Graphical Statistics, 21, 413–432.
Li, M.-L., Di Mauro, F., Candan, K. S., & Sapino, M. L. (2019). Matrix factorization with interval-valued data. IEEE Transactions on Knowledge and Data Engineering, 33, 1644–1658.
Liu, J., Wang, P., Chen, H., & Zhu, J. (2022). A combination forecasting model based on hybrid interval multi-scale decomposition: Application to interval-valued carbon price forecasting. Expert Systems with Applications, 191, 116267.
Maharaj, E. A., Brito, P., & Teles, P. (2021). A test to compare interval time series. International Journal of Approximate Reasoning, 133, 17–29.
Montgomery, D. C. (2019). Design and analysis of experiments. John Wiley & Sons.
Pierola, A., Epifanio, I., & Alemany, S. (2016). An ensemble of ordered logistic regression and random forest for child garment size matching. Computers & Industrial Engineering, 101, 455–465.
Pérez-Navarro, A., Montoliu, R., Sansano-Sansano, E., Martínez-Garcia, M., Femenía, R., & Torres-Sospedra, J. (2023). Accuracy of a single position estimate for kNN-based fingerprinting indoor positioning applying error propagation theory. IEEE Sensors Journal, 23, 18765–18775.
Qi, X., Guo, H., Artem, Z., & Wang, W. (2020). An interval-valued data classification method based on the unified representation frame. IEEE Access, 8, 17002–17012.
Qi, X., Guo, H., & Wang, W. (2021). A reliable KNN filling approach for incomplete interval-valued data. Engineering Applications of Artificial Intelligence, 100, 104175.
Qi, X., Wang, W., Shi, Y., Qi, H., & Mu, X. (2023). AGURF: An adaptive general unified representation frame for imbalanced interval-valued data. Information Sciences, 641, 119089.
R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Ramos-Guajardo, A. B., & Grzegorzewski, P. (2016). Distance-based linear discriminant analysis for interval-valued data. Information Sciences, 372, 591–607.
Ramsay, J. O., & Silverman, B. W. (2005). Functional Data Analysis (2nd ed.). Springer.
Rizo Rodríguez, S. I., & Tenório de Carvalho, F. A. (2022). Clustering interval-valued data with adaptive Euclidean and city-block distances. Expert Systems with Applications, 198, 116774.
Rossi, F., & Conan-Guez, B. (2002). Multi-layer perceptron on interval data. In Classification, Clustering, and Data Analysis: Recent Advances and Applications (pp. 427–434). Springer.
Sanidas, E., Grassos, C., Papadopoulos, D. P., Velliou, M., Tsioufis, K., Mantzourani, M., Perrea, D., Iliopoulos, D., Barbetseas, J., & Papademetriou, V. (2019). Labile hypertension: a new disease or a variability phenomenon? Journal of Human Hypertension, 33, 436–443.
Schliep, K., & Hechenbichler, K. (2016). kknn: Weighted k-Nearest Neighbors. R package version 1.3.1. URL: https://CRAN.R-project.org/package=kknn
Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299–1319.
Shimizu, N. (2011). Hierarchical clustering for interval-valued functional data. In J. Watada, G. Phillips-Wren, L. C. Jain, & R. J. Howlett (Eds.), Intelligent Decision Technologies (pp. 769–778). Berlin, Heidelberg: Springer Berlin Heidelberg.
Simó, A., Ibáñez, M. V., Epifanio, I., et al. (2020). Generalized partially linear models on Riemannian manifolds. J. R. Stat. Soc. C-Appl., 69, 641–661.
Singer, G., Ratnovsky, A., & Naftali, S. (2021). Classification of severity of trachea stenosis from EEG signals using ordinal decision-tree based algorithms and ensemble-based ordinal and non-ordinal algorithms. Expert Systems with Applications, 173, 114707.
Sinova, B., Colubi, A., Gil, M., & González-Rodríguez, G. (2012). Interval arithmetic-based simple linear regression between interval data: Discussion and sensitivity analysis on the choice of the metric. Information Sciences, 199, 109–124.
de Souza, R. M., Cysneiros, F. J. A., Queiroz, D. C., & Roberta, A. d. A. (2008). A multi-class logistic regression model for interval data. In 2008 IEEE International Conference on Systems, Man and Cybernetics (pp. 1253–1258). IEEE.
Sun, L., Wang, K., Xu, L., Zhang, C., & Balezentis, T. (2022a). A time-varying distance based interval-valued functional principal component analysis method – a case study of consumer price index. Information Sciences, 589, 94–116.
Sun, L., Zhu, L., Li, W., Zhang, C., & Balezentis, T. (2022b). Interval-valued functional clustering based on the Wasserstein distance with application to stock data. Information Sciences, 606, 910–926.
Sun, Y., Zhang, X., Wan, A. T., & Wang, S. (2022c). Model averaging for interval-valued data. European Journal of Operational Research, 301, 772–784.
The World Bank (2022). Data from database: World development indicators. http://data.worldbank.org/.
United Nations (2022). Human development index. https://hdr.undp.org/datacenter/human-development-index.
Vargas, V. M., Gutiérrez, P. A., & Hervás-Martínez, C. (2022). Unimodal regularisation based on beta distribution for deep ordinal regression. Pattern Recognition, 122, 108310.
Vega-Márquez, B., Nepomuceno-Chamorro, I. A., Rubio-Escudero, C., & Riquelme, J. C. (2021). OCEAn: Ordinal classification with an ensemble approach. Information Sciences, 580, 221–242.
Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S (4th ed.). New York: Springer. ISBN 0-387-95457-0.
Wang, Z., Li, H., Chen, H., Ding, Z., & Zhu, J. (2022). Linear and nonlinear framework for interval-valued PM2.5 concentration forecasting based on multi-factor interval division strategy and bivariate empirical mode decomposition. Expert Systems with Applications, 205, 117707.
Xu, M., & Qin, Z. (2022). A bivariate Bayesian method for interval-valued regression models. Knowledge-Based Systems, 235, 107396.