
A Support Vector Clustering Method

Asa Ben-Hur
Faculty of Industrial Engineering and Management
Technion, Haifa 32000, Israel

David Horn
School of Physics and Astronomy
Raymond and Beverly Sackler Faculty of Exact Sciences
Tel Aviv University, Tel Aviv 69978, Israel

Hava T. Siegelmann
Faculty of Industrial Engineering and Management
Technion, Haifa 32000, Israel

Vladimir Vapnik
AT&T Labs Research
100 Schultz Dr., Red Bank, NJ 07701, USA

Abstract

We present a novel kernel method for data clustering using a description of the data by support vectors. The kernel reflects a projection of the data points from data space to a high dimensional feature space. Cluster boundaries are defined as spheres in feature space, which represent complex geometric shapes in data space. We utilize this geometric representation of the data to construct a simple clustering algorithm.

1. Introduction

Clustering of data sets is a problem that has been studied extensively [3, 4]. Since different types of data sets are often more amenable to one clustering method than another, there exists continuing interest in developing yet better algorithms that will have general appeal and success.

A recent algorithm by Lipson and Siegelmann [5] represents data points in a higher dimensional space built out of direct products of the original data space. Their algorithm relies on tensorial manipulations in the higher dimensional space, and allows for cluster boundaries with rich geometric shapes. When cluster boundaries overlap, the corresponding data points are interpreted as belonging to both clusters.

The algorithm developed here shares with [5] the property that a cluster is associated with a specific geometric form that outlines its boundaries. It also uses an extension to a higher dimensional space. However, it employs a kernel method, and hence it can be implemented by algebraic manipulations in the original (input) space. Kernel methods are the basis of the support vector machine (SVM) approach that was developed for classification problems [2]; the decision boundary in the input space is represented by a hyperplane in the higher dimensional feature space. We use a similar idea, representing an arbitrary closed curve that encloses a cluster of data points by a sphere in high dimensional feature space. Our formalism is similar to the analysis of [10, 11], in which the authors characterize the support of a distribution represented by a finite data set and detect its outliers. This paper presents the general ideas and an application in clustering. Further developments will be found in [1].

2. Describing Cluster Boundaries with Support Vectors

In this section we develop a method for representing the boundary of a cluster of data points using the formalism of support vectors. Let {x_i} ⊂ X be the data set, with X ⊆ R^d the input space. The boundary of this cluster may be rather complicated.


Using a nonlinear transformation Φ from X to some high dimensional space, the cluster of points may take on a much simpler form. In fact, it turns out that with an appropriate choice of the mapping Φ, a sphere in the high dimensional space which encloses the transformed data points corresponds to a complex geometrical shape which encloses the points in input space (see figures). We want the shape in input space to be a tight fit around the data points, so we look for the smallest enclosing sphere. An enclosing sphere is represented by the constraints:

\| \Phi(x_i) - a \|^2 \le R^2 \quad \forall i \qquad (1)

where ‖·‖ is the Euclidean norm and a is the center of the sphere. Our goal is to minimize R² over all choices of a that satisfy these constraints. To solve this problem we introduce the Lagrangian

L = R^2 - \sum_j \beta_j \left( R^2 - \| \Phi(x_j) - a \|^2 \right) \qquad (2)

where β_j ≥ 0 are Lagrange multipliers. Minimizing L with respect to the radius R leads to the normalization condition

\sum_j \beta_j = 1 \qquad (3)

while minimization with respect to a leads to

a = \sum_j \beta_j \, \Phi(x_j). \qquad (4)

Using these relations we may eliminate the variables R and a, turning the Lagrangian into the Wolfe dual, which is a function of the variables β_j:

W = \sum_j \Phi(x_j)^2 \, \beta_j - \sum_{i,j} \beta_i \beta_j \, \Phi(x_i) \cdot \Phi(x_j). \qquad (5)

Maximizing W with respect to β_j leads to the emergence of two types of data points: those for which β_j = 0, which include all points that lie inside the sphere and some that may lie on it, and those for which β_j > 0, which lie on the boundary of the sphere. The latter are the support vectors, which define the center of the sphere as seen above.

We follow the standard method of SV and represent the dot products Φ(x_i)·Φ(x_j) by an appropriate Mercer kernel [7]. Common choices are polynomial kernels

K(x_i, x_j) = (x_i \cdot x_j + 1)^d \qquad (6)

or the Gaussian kernel

K(x_i, x_j) = e^{-q \| x_i - x_j \|^2}, \qquad (7)

which is used here. It was noted in [11] that the polynomial kernel is not appropriate, since it gives a weight to points with large coordinate values. A comparison with the Laplacian kernel is given elsewhere [1]. The Lagrangian W is now written as:

W = \sum_j K(x_j, x_j) \, \beta_j - \sum_{i,j} \beta_i \beta_j \, K(x_i, x_j). \qquad (8)

This formalism can be generalized to allow for outliers [12, 8, 9] obeying

\| \Phi(x_j) - a \|^2 \le R^2 + \xi_j \qquad (9)

with ξ_j ≥ 0, by extending the Lagrangian into

L_\xi = L - C \sum_j \xi_j. \qquad (10)

This is implemented with the same dual Lagrangian W as above, with the added constraints

0 \le \beta_j \le C. \qquad (11)

The interpretation is that for all points lying inside or on the sphere ξ_j = 0 and β_j < C, while for the outliers β_j = C.

For each point x we define the distance of its image in feature space from the center of the sphere:

R^2(x) = \| \Phi(x) - a \|^2. \qquad (12)

In view of (4) and the definition of the kernel we have:

R^2(x) = K(x, x) - 2 \sum_j \beta_j K(x_j, x) + \sum_{i,j} \beta_i \beta_j K(x_i, x_j). \qquad (13)

The radius of the cluster is defined as:

R = \{ R(x_i) \mid x_i \text{ is a support vector} \}. \qquad (14)

In practice, one takes the average over all support vectors. The contour that encloses the cluster in data space is the set

\{ x \mid R(x) = R \}. \qquad (15)

A point x is an outlier if R(x) > R.

The shape of the contour is governed by two parameters, q and C. The figures below demonstrate that as q is increased, this shape represents a tighter fit to the cluster of points. For C = 1 no outliers are allowed due to the constraint of Eq. (3). The size of C determines the number of outliers. As C is decreased, the number of outliers increases. Moreover, as can be seen from the three q = 50 examples, the influence of outliers on the shape of the cluster boundary decreases too. The number of support vectors depends on both q and C. We observed that for fixed q, as C is decreased, their number decreases, since the increased number of outliers makes the shape smoother and thus easier to describe with fewer support vectors. Quantitative results on the parameter dependence of the number of outliers and support vectors can be found in [1].
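To make the construction concrete, the following is a minimal sketch (not the authors' implementation) of the single-cluster description: it maximizes the Wolfe dual of Eq. (8) under the constraints of Eqs. (3) and (11) with the Gaussian kernel of Eq. (7), and evaluates R²(x) via Eq. (13). The helper names (gaussian_kernel, fit_sphere) and the use of SciPy's general-purpose SLSQP solver in place of a dedicated SMO-type optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def gaussian_kernel(X, Y, q):
    """K(x, y) = exp(-q ||x - y||^2), the Gaussian kernel of Eq. (7)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-q * d2)


def fit_sphere(X, q=1.0, C=1.0, tol=1e-6):
    """Return the multipliers beta, the cluster radius R (Eq. 14), and R2(.)."""
    n = len(X)
    assert n * C >= 1.0, "Eq. (3) needs C >= 1/n to be feasible"
    K = gaussian_kernel(X, X, q)

    # Wolfe dual of Eq. (8): W = sum_j K_jj beta_j - sum_ij beta_i beta_j K_ij.
    def neg_W(beta):
        return -(np.diag(K) @ beta - beta @ K @ beta)

    res = minimize(neg_W, np.full(n, 1.0 / n), method="SLSQP",
                   bounds=[(0.0, C)] * n,                           # Eq. (11)
                   constraints=[{"type": "eq",
                                 "fun": lambda b: b.sum() - 1.0}])  # Eq. (3)
    beta = res.x

    def R2(Y):
        """Squared feature-space distance from the sphere's center, Eq. (13)."""
        Kxy = gaussian_kernel(np.atleast_2d(Y), X, q)
        # K(y, y) = 1 for the Gaussian kernel.
        return 1.0 - 2.0 * Kxy @ beta + beta @ K @ beta

    # Support vectors have 0 < beta_j < C; R is their average distance.
    sv = (beta > tol) & (beta < C - tol)
    R = np.sqrt(R2(X[sv])).mean() if sv.any() else np.sqrt(R2(X)).max()
    return beta, R, R2
```

For example, beta, R, R2 = fit_sphere(X, q=20.0, C=1.0) describes a single cluster; the contour of Eq. (15) is then the level set R2(x) = R**2, and a point is flagged as an outlier when R2(x) > R**2.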

3. The Clustering Algorithm

We extend the single cluster case to a multiple cluster problem in a straightforward fashion. We choose the number of clusters and pick data points as initial centers of those clusters. Then we test the data points one by one, and assign each point x_i to the cluster which minimizes R(x_i). This assigns each point according to the nearest center of a feature space sphere. After each data point is added, the parameters β_j are re-evaluated with the algorithm presented above. There are efficient algorithms tailored especially for the SVM quadratic programming problem. The Sequential Minimal Optimization (SMO) algorithm [6] can be adapted for the Lagrangian (10) [10]. Another advantage is that a smart initialization that uses the result of the previous iteration gives fast convergence.
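The loop below sketches this multiple-cluster procedure under the same assumptions. The fit_sphere argument is assumed to behave like the hypothetical helper in the previous sketch, returning (beta, R, R2); re-fitting every candidate sphere for each new point stands in for the incremental SMO-style update mentioned above.

```python
import numpy as np


def sv_clustering(X, n_clusters, fit_sphere, q=20.0, C=0.07, seed=0):
    """Assign each point to the cluster whose feature-space sphere is nearest."""
    rng = np.random.default_rng(seed)
    # Pick data points as the initial members (centers) of the clusters.
    init = rng.choice(len(X), size=n_clusters, replace=False)
    members = [[int(i)] for i in init]

    for i in range(len(X)):
        if i in init:
            continue
        dists = []
        for m in members:
            # Eq. (3) needs C >= 1/n, so raise C for small, freshly seeded clusters.
            _, _, R2 = fit_sphere(X[m], q=q, C=max(C, 1.0 / len(m)))
            dists.append(float(R2(X[i])[0]))
        # Assign x_i to the cluster which minimizes R(x_i).
        members[int(np.argmin(dists))].append(i)

    labels = np.empty(len(X), dtype=int)
    for k, m in enumerate(members):
        labels[m] = k
    return labels
```

For the two-cluster data set of the figures one would call, for example, labels = sv_clustering(X, n_clusters=2, fit_sphere=fit_sphere, q=20.0, C=0.07).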

Figure 1. q = 6, C = 1.
Figure 2. q = 9, C = 1.
Figure 3. q = 50, C = 1.
Figure 4. q = 50, C = 0.09.

Examples of the results of this procedure are shown in Figures 1-6. The data set on which we tested the algorithm is composed of 144 points in R^2, with 66 points in the upper cluster and 78 points in the lower one. The data points are denoted by dots; those data points that are support vectors or outliers are surrounded by diamonds or squares, respectively. The curves are the projections of the spheres in the high dimensional feature space onto the data space in which the problem is defined. We see that for low values of q, 6 and 9, the shapes of the cluster boundaries do not form tight fits to the data. For such values of q the result also depends on the choice of initial points and the order in which the data is presented. Higher values of q, 20 and 50, do much better. Here the treatment of outliers becomes important. Thus, the three figures with q = 50 show that only as C is decreased to 0.07 is the effect of the outliers reduced and smooth curves obtained for the cluster boundaries. The optimal value of q required to represent the contour of a cluster is data dependent, and increases as the desired curvature increases. In the problem at hand q = 50 seems to be too high, since the lower cluster splits into two pieces. Hence we think that q = 20, C = 0.07 gives a better solution.
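The dependence on q and C discussed above can be examined with a short sketch in the same spirit (again assuming the hypothetical fit_sphere helper from the earlier sketches): for each setting, count the support vectors (0 < β_j < C) and the outliers (β_j = C).

```python
import numpy as np


def count_sv_and_outliers(X, fit_sphere, settings, tol=1e-6):
    """Count support vectors and outliers for each (q, C) setting."""
    rows = []
    for q, C in settings:
        beta, _, _ = fit_sphere(X, q=float(q), C=float(C))
        n_sv = int(np.sum((beta > tol) & (beta < C - tol)))   # boundary points
        n_out = int(np.sum(beta >= C - tol))                  # outliers
        rows.append((q, C, n_sv, n_out))
    return rows


# The (q, C) pairs of Figures 1-6.
settings = [(6, 1.0), (9, 1.0), (50, 1.0), (50, 0.09), (50, 0.07), (20, 0.07)]
```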

The algorithm presented here utilizes the support vector approach for representing cluster boundaries in a very simple way. Work on a more elaborate clustering scheme is now in progress.

Our algorithm utilizes the support vector approach, hence it puts special emphasis on cluster boundaries. This can be contrasted with a standard clustering algorithm, like K-means [4], which separates data into clusters by minimizing data distances from cluster centers, a criterion that yields very simple cluster boundaries. Applying K-means to our data set with K = 2 we obtain the results shown in Figure 7. Obviously this clustering algorithm is less suitable to this problem than ours, because it does not reflect the intuitive notion that proximity between points within a cluster should be a key criterion. This highlights the flexibility of the SV-algorithm, which takes proximity into account and can represent clusters with arbitrary shapes.

Figure 5. q = 50, C = 0.07.
Figure 6. q = 20, C = 0.07.
Figure 7. Clustering produced by K-means.

Acknowledgments

This work was partially supported by the Israel Ministry of Science.

References

[1] A. Ben-Hur, D. Horn, H. Siegelmann, and V. Vapnik. A support vector method for hierarchical clustering. Preprint.
[2] B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on COLT, pages 144-152. ACM Press, 1992.
[3] R. Duda and P. Hart. Pattern Classification and Scene Analysis. Wiley-Interscience, 1973.
[4] A. Jain and R. Dubes. Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, NJ, 1988.
[5] H. Lipson and H. Siegelmann. Clustering irregular shapes using high-order neurons. Neural Computation, 2000, to be published.
[6] J. Platt. Fast training of SVMs using sequential minimal optimization. In B. Scholkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 185-208. MIT Press, Cambridge, MA, 1999.
[7] S. Saitoh. Theory of Reproducing Kernels and its Applications. Longman Scientific & Technical, 1988.
[8] B. Scholkopf. Support Vector Learning. R. Oldenbourg Verlag, 1997.
[9] B. Scholkopf, C. Burges, and A. Smola, editors. Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.
[10] B. Scholkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson. Estimating the support of a high dimensional distribution. In Proceedings of the Annual Conference on Neural Information Processing Systems 1999 (NIPS'99). MIT Press, 2000.
[11] D. Tax and R. Duin. Support vector domain description. Pattern Recognition Letters, 20:1991-1999, 1999.
[12] V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, 1995.

