1. The Center Method
Symbolic data were introduced by Diday in [1]. In contrast to classical data analysis, in which a variable takes a single value, a variable in symbolic data can take a finite or infinite set of values: for example, an interval variable can take an infinite set of numerical values that range from low to high. As Principal Component Analysis (PCA) is one of the most popular multivariate methods for dimension reduction, its extension to symbolic data is important. Many generalizations of PCA have been developed and several studies have contributed to its extension to interval-valued data; among the methods in the literature, two are the vertex method and the center method [2,3,4]. In [5], the authors introduced new PCA techniques in order to visualize and compare the structures of interval data. Then, the authors of [6] proposed an approach that extended the classical PCA method to interval-valued data by using symbolic covariance to determine the principal component space and reflect the total variation in the interval-valued data. PCA has also been extended to histogram data in a number of studies (see [7,8,9,10,11]).
Most of these methods were developed for interval matrices, where an $n \times m$ interval matrix $X$ is defined as

$$X = \left( x_{ij} \right)_{n \times m}, \qquad x_{ij} = \left[ \underline{x}_{ij}, \overline{x}_{ij} \right], \qquad (1)$$

where $\underline{x}_{ij} \leq \overline{x}_{ij}$ for all $i = 1, \ldots, n$ and $j = 1, \ldots, m$ (other authors denote the interval $[\underline{x}_{ij}, \overline{x}_{ij}]$ by $[x_{ij}^{L}, x_{ij}^{U}]$ or $[a_{ij}, b_{ij}]$ for all $i$ and $j$). An interval matrix can be considered a subset of the classical matrices in $\mathbb{R}^{n \times m}$; we write $Z \in X$ for a matrix $Z \in \mathbb{R}^{n \times m}$ such that $\underline{x}_{ij} \leq z_{ij} \leq \overline{x}_{ij}$ for all $i$ and $j$. In this case, $Z$ is a classical (single-valued) matrix contained in the hyper-rectangle defined by $X$.
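As a concrete illustration, an interval matrix can be stored in R (the language of the RSDA package mentioned in the Conclusions) as a pair of numeric matrices holding the lower and upper bounds. The names `X.low`, `X.up`, and `in_interval_matrix` below are illustrative choices of ours, not functions of any package:

```r
# An interval matrix X stored as two n x m numeric matrices with X.low <= X.up.
X.low <- matrix(c(1, 2, 3, 4), nrow = 2)   # lower bounds
X.up  <- matrix(c(2, 3, 5, 6), nrow = 2)   # upper bounds

# A classical matrix Z belongs to X when every entry lies inside its interval.
in_interval_matrix <- function(Z, X.low, X.up) all(Z >= X.low & Z <= X.up)

Z <- (X.low + X.up) / 2                    # e.g., the center matrix
in_interval_matrix(Z, X.low, X.up)         # TRUE
```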
The center matrix of $X$ is defined as

$$X^{c} = \left( x^{c}_{ij} \right)_{n \times m}, \qquad \text{where} \qquad x^{c}_{ij} = \frac{\underline{x}_{ij} + \overline{x}_{ij}}{2}. \qquad (2)$$

Here, $x^{c}_{ij}$ for $i = 1, \ldots, n$ and $j = 1, \ldots, m$ is a real number and not an interval. Thus, the center matrix (2) is a classical matrix. The center principal component method starts from the center matrix; in other words, classical PCA is applied to the center matrix $X^{c}$. Then, the $k$th principal components of the centers are

$$y^{c}_{ik} = \sum_{j=1}^{m} \left( x^{c}_{ij} - \bar{x}^{c}_{j} \right) u_{jk}, \qquad (3)$$

where $u_{k} = (u_{1k}, \ldots, u_{mk})^{\top}$ is the $k$th eigenvector of the variance–covariance matrix of $X^{c}$, defined in Equation (2), and $\bar{x}^{c}_{j}$ is the mean of the $j$th column of $X^{c}$. For cases (rows) $i = 1, \ldots, n$, the $k$th principal component for an interval variable is constructed as follows. Let $y_{ik} = [\underline{y}_{ik}, \overline{y}_{ik}]$ be the interval principal component for an interval variable. Then,

$$\underline{y}_{ik} = \sum_{j \,:\, u_{jk} > 0} \left( \underline{x}_{ij} - \bar{x}^{c}_{j} \right) u_{jk} + \sum_{j \,:\, u_{jk} < 0} \left( \overline{x}_{ij} - \bar{x}^{c}_{j} \right) u_{jk} \qquad (4)$$

and

$$\overline{y}_{ik} = \sum_{j \,:\, u_{jk} > 0} \left( \overline{x}_{ij} - \bar{x}^{c}_{j} \right) u_{jk} + \sum_{j \,:\, u_{jk} < 0} \left( \underline{x}_{ij} - \bar{x}^{c}_{j} \right) u_{jk}, \qquad (5)$$

where $\underline{y}_{ik}$ and $\overline{y}_{ik}$ are the lower and upper bounds of the interval principal component. More details can be found in [2].
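A minimal sketch of the center method in base R, using the `X.low`/`X.up` representation from the sketch above, may help fix ideas: classical PCA is applied to the center matrix, and the interval principal components are then obtained by splitting the sums in Equations (4) and (5) according to the sign of the eigenvector entries. The function name `center_pca` is ours, and the code is only a sketch, not the RSDA implementation:

```r
# Sketch of the center PCA method for an interval matrix (X.low, X.up).
center_pca <- function(X.low, X.up, s = 2) {
  Xc  <- (X.low + X.up) / 2                      # center matrix (Equation (2))
  pca <- prcomp(Xc, center = TRUE, scale. = FALSE)
  U   <- pca$rotation[, 1:s, drop = FALSE]       # first s eigenvectors of cov(Xc)
  mu  <- colMeans(Xc)
  low <- up <- matrix(0, nrow(Xc), s)
  for (k in 1:s) {
    pos <- U[, k] > 0
    # Minimum: lower bounds where the loading is positive, upper bounds otherwise.
    low[, k] <- sweep(X.low, 2, mu, "-")[, pos, drop = FALSE] %*% U[pos, k] +
                sweep(X.up,  2, mu, "-")[, !pos, drop = FALSE] %*% U[!pos, k]
    # Maximum: the opposite choice of bounds.
    up[, k]  <- sweep(X.up,  2, mu, "-")[, pos, drop = FALSE] %*% U[pos, k] +
                sweep(X.low, 2, mu, "-")[, !pos, drop = FALSE] %*% U[!pos, k]
  }
  list(centers = pca$x[, 1:s, drop = FALSE], low = low, up = up)
}
```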
The dual problem in the center PCA method was introduced by Rodriguez in [12]. To generalize the duality relations, we let $D$ be an interval matrix, defined as

$$D = \left( d_{ij} \right)_{n \times m}, \qquad d_{ij} = \left[ \frac{\underline{x}_{ij} - \bar{x}^{c}_{j}}{s_{j}}, \; \frac{\overline{x}_{ij} - \bar{x}^{c}_{j}}{s_{j}} \right], \qquad (6)$$

for $i = 1, \ldots, n$ and $j = 1, \ldots, m$, with $\bar{x}^{c}_{j}$ and $s_{j}$ denoting the average and standard deviation of column $j$, respectively. The formulas shown in Theorem 1 are thus obtained and can be used to calculate the projections of interval variables.
Theorem 1. If the hyper-rectangle defined by the jth column of D in the ith principal component is projected in the direction of the ith dual eigenvector, then the minimum and maximum values can be computed by Equations (7) and (8), respectively.

The proof of this theorem can be found in [12,13].
2. The Best Point Method
Let $X$ be an $n \times m$ matrix of interval variables and let $Z \in X$. If we apply PCA to a matrix $Z$, then the $k$th principal component of $Z$ for an observation $i$, with $i = 1, \ldots, n$, is

$$y_{ik} = \sum_{j=1}^{m} \left( z_{ij} - \bar{z}_{j} \right) u_{jk},$$

where $\bar{z}_{j}$ is the average of the variable $j$ (i.e., $\bar{z}_{j} = \frac{1}{n} \sum_{i=1}^{n} z_{ij}$), and $u_{k} = (u_{1k}, \ldots, u_{mk})^{\top}$ is the $k$th eigenvector associated with the variance–covariance matrix of $Z$. It is clear that $\{u_{1}, \ldots, u_{m}\}$ is an orthonormal basis of $\mathbb{R}^{m}$.
For the matrix $X$ defined in (1), we let $m_{i}$ denote the number of nontrivial intervals in row $i$ (i.e., intervals with $\underline{x}_{ij} < \overline{x}_{ij}$), for $i = 1, \ldots, n$. Then, we define the vertex matrix for an observation $i$ as $M_{i}$, the $2^{m_{i}} \times m$ matrix whose rows are the vertices of the hyper-rectangle $x_{i1} \times \cdots \times x_{im}$; that is, every row is obtained by choosing, for each nontrivial interval, either its lower bound $\underline{x}_{ij}$ or its upper bound $\overline{x}_{ij}$. Thus, the vertex matrix of $X$ is

$$M = \begin{pmatrix} M_{1} \\ \vdots \\ M_{n} \end{pmatrix}.$$

Next, the rows of the vertex matrix of $X$ are projected as supplementary elements in the PCA of $Z$. We define the supplementary vertex matrix $\widetilde{M}$ as the matrix obtained by centering and standardizing $M$ with respect to the columns of $Z$, that is,

$$\widetilde{M}_{kj} = \frac{M_{kj} - \bar{z}_{j}}{s_{j}},$$

where $s_{j}$ is the standard deviation of the $j$th column of $Z$. To simplify this approach, we denote the rows of the matrix $\widetilde{M}$ that correspond to observation $i$ by $\mu_{i}^{(v)}$, with $v = 1, \ldots, 2^{m_{i}}$, in which $m_{i}$ is the number of nontrivial intervals, and $\mu_{i}^{(v)} \in \mathbb{R}^{m}$. Then, the co-ordinates of the vertices on the $k$th principal axis are obtained:

$$c_{ik}^{(v)} = \left\langle \mu_{i}^{(v)}, u_{k} \right\rangle, \qquad \text{with } v = 1, \ldots, 2^{m_{i}},$$

in which $m_{i}$ is the number of nontrivial intervals. Then, the minimum and maximum of the interval can be calculated:

$$\underline{y}_{ik} = \min_{1 \leq v \leq 2^{m_{i}}} c_{ik}^{(v)} \qquad (15)$$

and

$$\overline{y}_{ik} = \max_{1 \leq v \leq 2^{m_{i}}} c_{ik}^{(v)}. \qquad (16)$$

The formulas in the following theorem allow us to compute Equations (15) and (16) much more quickly.
Theorem 2. The co-ordinates of $y_{ik} = [\underline{y}_{ik}, \overline{y}_{ik}]$ can be found as follows:

$$\underline{y}_{ik} = \sum_{j \,:\, u_{jk} > 0} \frac{\underline{x}_{ij} - \bar{z}_{j}}{s_{j}}\, u_{jk} + \sum_{j \,:\, u_{jk} < 0} \frac{\overline{x}_{ij} - \bar{z}_{j}}{s_{j}}\, u_{jk} \qquad (17)$$

$$\overline{y}_{ik} = \sum_{j \,:\, u_{jk} > 0} \frac{\overline{x}_{ij} - \bar{z}_{j}}{s_{j}}\, u_{jk} + \sum_{j \,:\, u_{jk} < 0} \frac{\underline{x}_{ij} - \bar{z}_{j}}{s_{j}}\, u_{jk}, \qquad (18)$$

where $u_{k}$ is the $k$th eigenvector of the variance–covariance matrix of $Z$, $s_{j}$ is the standard deviation of the $j$th column of $Z$, and $\bar{z}_{j}$ is the mean of the $j$th column.

Proof. As the vertices $\mu_{i}^{(v)}$ are supplementary elements in the PCA of $Z$, they must first be centered and standardized with respect to the columns (variables) of $Z$; in each variable $j$, a vertex therefore has co-ordinate $\frac{\underline{x}_{ij} - \bar{z}_{j}}{s_{j}}$ or $\frac{\overline{x}_{ij} - \bar{z}_{j}}{s_{j}}$. Since the projection onto $u_{k}$ is linear in these co-ordinates, its minimum (respectively, maximum) over the vertices is attained by choosing, for each $j$, the bound whose contribution is smallest (respectively, largest) according to the sign of $u_{jk}$. Then, from Equations (15) and (16), Equations (17) and (18) follow. □
The following theorem provides the co-ordinates in the variable space; this is a dual relationship. We need to center and standardize the matrix $Z$. Next, we focus on the matrix $\widetilde{Z} = \left( \widetilde{z}_{ij} \right)_{n \times m}$, $\widetilde{z}_{ij} = \frac{z_{ij} - \bar{z}_{j}}{s_{j}}$. Let $\widetilde{x}^{j}$ be the $j$th column of the interval matrix centered and standardized with respect to $Z$, with $j = 1, \ldots, m$; that is, the interval matrix is centered and standardized with respect to $Z$:

$$\widetilde{x}_{ij} = \left[ \frac{\underline{x}_{ij} - \bar{z}_{j}}{s_{j}}, \; \frac{\overline{x}_{ij} - \bar{z}_{j}}{s_{j}} \right].$$

To facilitate the analysis, we define

$$V = \frac{1}{n}\, \widetilde{Z}\, \widetilde{Z}^{\top}.$$

The inertia matrix $V$ is symmetric and positive semidefinite, so all its eigenvectors are orthogonal and its eigenvalues are real and nonnegative. We let $w_{1}, \ldots, w_{s}$ denote the $s$ eigenvectors of $V$ associated with the eigenvalues $\lambda_{1} \geq \cdots \geq \lambda_{s} > 0$. Then, $W$ is defined as a matrix of size $n \times s$ whose columns are the eigenvectors of $V$. We can compute the co-ordinates of the variables in the correlation circle as the projections of the columns of $\widetilde{Z}$ onto $w_{1}, \ldots, w_{s}$, and we can then compute the $i$th column of $Z$ in the $j$th principal component (in the $w_{j}$ direction) using Equation (23).
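For reference, in classical PCA the co-ordinates of the variables in the correlation circle can be computed directly as the correlations between each column of $Z$ and the principal component scores. The short base R sketch below shows this classical computation; it is only an illustrative aid, not the interval extension derived in Theorem 3:

```r
# Correlation-circle co-ordinates of the variables of a classical matrix Z:
# the correlation of each variable with each of the first s principal components.
correlation_circle <- function(Z, s = 2) {
  pca <- prcomp(Z, center = TRUE, scale. = TRUE)
  cor(Z, pca$x[, 1:s, drop = FALSE])   # m x s matrix of variable co-ordinates
}
```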
The next theorem proves the duality relation of any matrix that belongs to an interval matrix.
Theorem 3. If the hyper-rectangle defined by the jth column of the centered and standardized interval matrix in the ith principal component is projected in the direction of $w_{i}$, then the maximum and minimum values can be obtained by Equations (24) and (25), respectively.

Proof. The proof of this theorem is similar to the proof of Theorem 2. □
In the above, we prove that the projection of each column of the interval matrix onto a dual axis is again an interval and that its bounds are a combination of the projections of the vertices of the corresponding hyper-rectangle. We can form duality relations between the eigenvectors of the variance–covariance matrix of $Z$ and those of $V$: both matrices have the same $s$ positive eigenvalues $\lambda_{1}, \ldots, \lambda_{s}$, and if $w_{1}, \ldots, w_{s}$ are the first $s$ eigenvectors of $V$, then the relations between the eigenvectors of the variance–covariance matrix of $Z$ and those of $V$ can be computed by Equations (26) and (27):

$$u_{i} = \frac{1}{\sqrt{n\, \lambda_{i}}}\; \widetilde{Z}^{\top} w_{i} \qquad (26)$$

$$w_{i} = \frac{1}{\sqrt{n\, \lambda_{i}}}\; \widetilde{Z}\, u_{i}. \qquad (27)$$
Above, we provide the theory needed to apply PCA to any matrix $Z \in X$. Now, we aim to find a matrix $Z \in X$ that is optimal for one of two criteria: (1) the minimization of the square of the distance from the vertices of the hypercubes to the principal axes of $Z$, or (2) the maximization of the variance of the first components of $Z$. We develop these concepts in the following two sections.
2.1. Minimizing the Square of the Distance from the Hypercube Vertices to the Principal Axes of Z
Let $X$ be an $n \times m$ interval matrix, $Z \in X$, and let $u_{1}, \ldots, u_{s}$ be the first $s$ eigenvectors of the variance–covariance matrix of $Z$. We let $M$ denote the vertex matrix of $X$ and $N_{v} = \sum_{i=1}^{n} 2^{m_{i}}$ its number of rows, in which $m_{i}$ is the number of nontrivial intervals for case $i$. Then, the centered and standardized vertex matrix with respect to $Z$ has the following form:

$$\widetilde{M}_{kj} = \frac{M_{kj} - \bar{z}_{j}}{s_{j}}, \qquad k = 1, \ldots, N_{v}, \quad j = 1, \ldots, m.$$

Let $\varphi : X \to \mathbb{R}$ be the function defined by

$$\varphi(Z) = \sum_{k=1}^{N_{v}} \left\| \widetilde{M}_{k \cdot} - \sum_{\ell = 1}^{s} \left\langle \widetilde{M}_{k \cdot}, u_{\ell} \right\rangle u_{\ell} \right\|^{2},$$

where $\|\cdot\|$ is the Euclidean norm and $\widetilde{M}_{k \cdot}$ denotes the $k$th row of $\widetilde{M}$; that is, $\varphi(Z)$ is the sum of the squared distances from the standardized vertices to their orthogonal projections onto the space spanned by the first $s$ principal axes of $Z$. To compute $\varphi(Z)$, we propose Algorithm 1:
Algorithm 1 The computation of $\varphi(Z)$.

Require: $X$, an $n \times m$ matrix of intervals; $Z \in X$; $s$, the number of principal components.
Ensure: $\varphi(Z)$.
1: Apply PCA to $Z$.
2: Compute $u_{1}, \ldots, u_{s}$, the first $s$ eigenvectors of the variance–covariance matrix of $Z$.
3: Compute the vertex matrix of $X$.
4: Compute the vertex matrix of $X$ centered and standardized with respect to $Z$.
5: Compute $\varphi(Z) = \sum_{k=1}^{N_{v}} \left\| \widetilde{M}_{k \cdot} - \sum_{\ell=1}^{s} \langle \widetilde{M}_{k \cdot}, u_{\ell} \rangle u_{\ell} \right\|^{2}$.
6: return $\varphi(Z)$.
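The following is a minimal base R sketch of Algorithm 1, under the interpretation given above of $\varphi(Z)$ as the sum of the squared distances from the standardized vertices to their projections onto the first $s$ principal axes of $Z$; the `X.low`/`X.up` representation follows the earlier sketches and is not part of any package:

```r
# Sketch of Algorithm 1: phi(Z) as the sum of squared distances from the
# standardized vertices of X to their projections onto the first s axes of Z.
phi <- function(X.low, X.up, Z, s) {
  n <- nrow(Z); m <- ncol(Z)
  pca  <- prcomp(Z, center = TRUE, scale. = TRUE)
  U    <- pca$rotation[, 1:s, drop = FALSE]        # first s eigenvectors
  mu   <- colMeans(Z)
  sdev <- apply(Z, 2, sd)
  total <- 0
  for (i in 1:n) {
    # All 2^m vertices of the hyper-rectangle of case i.
    V  <- as.matrix(expand.grid(lapply(1:m, function(j) c(X.low[i, j], X.up[i, j]))))
    Vs <- sweep(sweep(V, 2, mu, "-"), 2, sdev, "/")  # center and standardize w.r.t. Z
    P  <- Vs %*% U %*% t(U)                           # projections onto the first s axes
    total <- total + sum((Vs - P)^2)                  # squared distances
  }
  total
}
```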
As $X$ is a nonempty finite union of compact sets and $\varphi$ is a continuous function, $\varphi$ always reaches its minimum and maximum on $X$. In this case, the aim is to obtain the matrix $Z$ that minimizes the distance to the vertex matrix $\widetilde{M}$. The problem that we aim to solve is

$$\min_{Z \in X} \varphi(Z). \qquad (30)$$
Definition 1. The matrix $Z \in X$ that solves Problem (30) is the optimal matrix with respect to distance, which we call the Best Point Distance matrix. To perform the optimization that computes this matrix, we propose Algorithm 2:
Algorithm 2 Computation of the Best Matrix with respect to the distances of the vertices.

Require: $X$, a symbolic matrix of intervals of dimension $n \times m$; $s$, the number of principal components; $\varepsilon$, the variation tolerance between iterations; and $N$, the maximum number of iterations.
Ensure: The Best Point Distance matrix and its interval principal components.
1: Consider $Z_{0} = X^{c}$, the center matrix (2), to be the initial value.
2: Get the matrix that minimizes $\varphi$ by means of an optimization algorithm, evaluating $\varphi$ with Algorithm 1 at each iteration.
3: Get the interval principal components using Theorem 2.
4: return the Best Point Distance matrix and its interval principal components.
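A minimal sketch of the optimization loop behind Algorithm 2: each entry $z_{ij}$ is treated as a free parameter constrained to its interval, the center matrix is the initial value, and a box-constrained optimizer decreases $\varphi$. The paper does not prescribe a specific optimizer, so the use of `optim` with L-BFGS-B below is only an illustrative assumption; `phi` is the function from the previous sketch:

```r
# Illustrative sketch of Algorithm 2 (Best Point Distance), reusing phi().
best_point_distance <- function(X.low, X.up, s) {
  n <- nrow(X.low); m <- ncol(X.low)
  Z0  <- (X.low + X.up) / 2                      # center matrix as the initial value
  obj <- function(z) phi(X.low, X.up, matrix(z, n, m), s)
  res <- optim(par = as.vector(Z0), fn = obj, method = "L-BFGS-B",
               lower = as.vector(X.low), upper = as.vector(X.up))
  matrix(res$par, n, m)                          # the optimized matrix Z in X
}
```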
2.2. Maximizing the Variance of the First Components
Let $X$ be an interval matrix of dimension $n \times m$, $Z \in X$, and let $u_{1}, \ldots, u_{s}$ be eigenvectors of the variance–covariance matrix of $Z$, with $\lambda_{1} \geq \cdots \geq \lambda_{m}$ denoting the set of associated eigenvalues of the variance–covariance matrix of $Z$. We define the function $\psi : X \to \mathbb{R}$ as

$$\psi(Z) = \frac{\sum_{k=1}^{s} \lambda_{k}}{\sum_{k=1}^{m} \lambda_{k}},$$

the proportion of the total inertia accumulated in the first $s$ principal components of $Z$. To compute $\psi(Z)$, we propose Algorithm 3:
Algorithm 3 The computation of $\psi(Z)$.

Require: $X$, an $n \times m$ symbolic matrix of intervals; $Z \in X$; $s$, the number of principal components.
Ensure: $\psi(Z)$.
1: Apply PCA to $Z$.
2: Compute $\lambda_{1} \geq \cdots \geq \lambda_{m}$, the set of associated eigenvalues of the variance–covariance matrix of $Z$.
3: Compute $\psi(Z) = \sum_{k=1}^{s} \lambda_{k} \big/ \sum_{k=1}^{m} \lambda_{k}$.
4: return $\psi(Z)$.
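A corresponding minimal sketch of Algorithm 3 in base R; as with the other sketches, it is only illustrative (computing $\psi$ as the accumulated proportion of variance described above) and not the RSDA implementation:

```r
# Sketch of Algorithm 3: psi(Z) as the proportion of the total variance
# accumulated in the first s principal components of Z.
psi <- function(Z, s) {
  ev <- prcomp(Z, center = TRUE, scale. = TRUE)$sdev^2  # eigenvalues
  sum(ev[1:s]) / sum(ev)
}
```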
As above, since $X$ is a nonempty finite union of compact sets and $\psi$ (with $s$ the number of principal components) is a continuous function, $\psi$ always reaches its minimum and maximum on $X$. In this case, the aim is to obtain the matrix $Z$ that maximizes the accumulated inertia in the first $s$ principal components. The problem that we want to solve is

$$\max_{Z \in X} \psi(Z). \qquad (31)$$
Definition 2. The matrix $Z \in X$ that solves Problem (31) is the optimal matrix with respect to inertia, which we call the Best Point Variance matrix. To perform the optimization that computes this matrix, we propose Algorithm 4:
Algorithm 4 The computation of the Best Matrix with respect to inertia.

Require: $X$, an $n \times m$ symbolic matrix of intervals; $s$, the number of principal components.
Ensure: The Best Point Variance matrix and its interval principal components.
1: Consider $Z_{0} = X^{c}$, the center matrix (2), as the initial value.
2: Get the matrix that maximizes $\psi$ by means of an optimization algorithm, evaluating $\psi$ with Algorithm 3 at each iteration.
3: Get the interval principal components using Theorem 2.
4: return the Best Point Variance matrix and its interval principal components.
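The optimization behind Algorithm 4 mirrors the sketch given for Algorithm 2, with $\psi$ maximized by minimizing $-\psi$; again, the optimizer choice is only an illustrative assumption, and `psi` is the function from the previous sketch:

```r
# Illustrative sketch of Algorithm 4 (Best Point Variance), reusing psi().
best_point_variance <- function(X.low, X.up, s) {
  n <- nrow(X.low); m <- ncol(X.low)
  Z0  <- (X.low + X.up) / 2                      # center matrix as the initial value
  obj <- function(z) -psi(matrix(z, n, m), s)    # maximize psi <=> minimize -psi
  res <- optim(par = as.vector(Z0), fn = obj, method = "L-BFGS-B",
               lower = as.vector(X.low), upper = as.vector(X.up))
  matrix(res$par, n, m)                          # the optimized matrix Z in X
}
```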
3. Experimental Evaluation: The Application to Facial Recognition
Automatic facial recognition has recently gained momentum, especially in the context of security issues, such as access to buildings, and in the context of monitoring and continued surveillance. A well-known application of facial recognition is its incorporation in the iPhone X. According to Apple's support website, the technology that enables Face ID is some of the most advanced hardware and software ever created. The TrueDepth camera captures accurate facial data by projecting and analyzing over 30,000 invisible dots to create a facial depth map, while also capturing an infrared image. These images are transformed into a mathematical representation, which is compared with the registered facial data. In both R and Python, a significant number of libraries, such as the videoplayR package, have also been developed for facial recognition. The following link contains more details: http://www.stoltzmaniac.com/facial-recognition-in-r/.
As described in this section, we applied all the proposed methods using six interval-valued variables in a data set of 27 faces, for a total of 27,000 photos. The data set was taken from [14], in which the authors investigated facial characteristics for detection purposes in a surveillance study. In this study, the center PCA method in [4] was applied, as shown in Figure 1. The data are provided in Table 1.
The data set contains measurements of six random variables designed to identify each face: the length spanned by the eyes (distance AD in Figure 1), the length between the eyes (distance BC), the length from the outer right eye to the upper-middle lip at point H between the nose and mouth (distance AH), the corresponding length for the left eye (distance DH), the length from point H to the outside of the mouth on the right side (distance EH), and the corresponding distance to the left side of the mouth (distance GH). For each facial image in this facial recognition process, salient features, such as the nose, mouth, and eyes, are located using morphological operators. The boundaries of the located elements are extracted by using a specific active contour method based on Fourier descriptors, which incorporates information about the global shape of each object. Finally, the specific points delimiting each extracted boundary are located, and the distance is measured between each specific pair of points, as represented by the random variables in Figure 1. This distance measure is expressed as the number of pixels in a facial image. As there is a sequence of such images, the actual measured distances are interval-valued variables; thus, for example, the eye span distance for case HUS1 is the interval spanning the minimum and maximum distances observed over this series of images. Notably, different conditions of alignment, illumination, pose, and occlusion cause variation in the distances extracted from different images of the same person. The study that generated the data set involved nine men and three sequences for each subject, for a total of 27 cases. The complete data set is provided in Table 1.
It is important to note that the data in
Table 1 are aggregated. There are 27 interval-valued cases; if each case is drawn from a sequence of 1000 images, then there are 27,000 classical point observations in $\mathbb{R}^{6}$. An underlying assumption of the standard classical analysis is that all 27,000 observations are independent. However, this is not the case in this data set: the data values for each face form a set of 1000 dependent observations. Therefore, if we were to use each image as a statistical unit by performing a classical analysis, then we would lose information about the dependence contained in the 27,000 observations. The resulting principal component analysis would look for axes that maximize the variability across all 27,000 images, regardless of whether some images belong to the same sequence. In contrast, as interval-valued observations are obtained from each sequence, the Best Point method extracts the principal component axes that maximize the variability in each interval (i.e., those that maximize the internal variability), thereby retaining information on the dependency among the 1000 images in each sequence.
Comparison between the Center, Vertex, and Best Point Methods
We applied the vertex, center, and Best Point principal component methods to the data in Table 1. The Best Point principal component method was run with two different goals: (1) to minimize the squared distance and (2) to maximize the variance. From this point, the Best Point principal component method that minimizes the squared distance was designated as Best Point Distance, and the Best Point principal component method that maximizes the variance was designated as Best Point Variance.
Table 2 shows the first two principal components generated by the four methods.
Figure 2 compares the data in
Table 1 with the principal planes generated by the vertex, center, Best Point Distance, and Best Point Variance principal component methods. The plots of these, along with the first principal component (PC1) and second principal component (PC2) axes, are shown in
Figure 2. The proximity of the three sequences of faces for each individual can be readily observed, which validates their within-subject coherence. Furthermore, with all four methods, the same four classes of faces were distinguished: faces {INC, FRA} can be regarded as one class, faces {HUS, PHI, JPL} and {ISA, ROM} constitute two other classes, and faces {LOT, KHA} form the fourth class. Of the four principal planes in the graph, those corresponding to the Best Point Distance and Best Point Variance methods show much better separation of the face classes, which results in superior facial classification. In other words, the proposed methods more accurately predicted the individual in the photo.
Numerical analysis results confirm that the separation of classes from the Best Point Distance and Best Point Variance methods was much better than that from the other methods.
Table 3 compares the accumulated variance of the vertex, center, Best Point Distance, and Best Point Variance principal component methods. The better methods are clearly Best Point Distance and Best Point Variance, which, by the third principal component, reached 91.25% and 99.72% of the accumulated variance, respectively, both of which were far superior to the results of the center and vertex methods.
As shown in Table 4, for the criterion of the minimum distance between the corners of the original hyper-rectangles in $\mathbb{R}^{6}$ and the principal components, the Best Point Distance method outperformed the other methods, with a minimum distance of 6676.43. This distance was significantly less than the distances obtained by the other methods.
In Table 5, we show the correlation of the original variables with the first principal component generated by the vertex, center, Best Point Distance, and Best Point Variance methods. For all but one of the variables, the correlation was stronger with Best Point Distance or Best Point Variance. It can be inferred that the original variables were better represented by the PC1 component of the Best Point Distance and Best Point Variance methods. Interestingly, the correlation of the PC1 component from the Best Point Distance and Best Point Variance methods was stronger with the variables for the upper part of the face. The same result was obtained when the correlations of the original variables were analyzed with respect to the other principal components of Best Point Distance and Best Point Variance.
4. Conclusions
This work focused on improving the center and vertex principal component methodologies for interval-valued data. Compared with classical methods, symbolic methods based on interval-valued variables have important advantages, such as reduced execution times, because much smaller data tables are used; in the facial recognition example, we passed from a table of 27,000 cases to one of only 27 cases. In addition, symbolic methods allow for much better handling and interpretation of data variability. In the facial recognition scenario, the variation in the distances that measure different variables from one photo to another of the same person (such as the variation in the eye span distance from one photo to another) was due to the variation in the angle at which the photo was taken.
The Best Point methods proposed in this paper considerably improved both the center method and the vertex method. This is because Best Point Variance maximized the variance explained by the components and Best Point Distance minimized the squared distance between the vertices of the hyper-rectangles and their respective projections. As shown in the tables above, this led to a substantial improvement in all the quality indices used in principal component analysis. The result is better data clustering and, therefore, better prediction.
In future work, a consensus between the Best Point Variance and Best Point Distance methods could be constructed by applying a multiobjective optimization method to the functions $\varphi$ and $\psi$. Finally, all the proposed algorithms for executing symbolic analyses of interval-valued data are available in the RSDA package in R (see [15]).