
Probabilistic Analysis of the RNN-CLINK Clustering Algorithm

S.D. Lang, L.-J. Mao, W.-L. Hsu


School of Computer Science
University of Central Florida
Orlando, FL 32816

ABSTRACT

Clustering is among the oldest techniques used in data mining applications. Typical implementations of the hierarchical
agglomerative clustering methods (HACM) require O(N²) space when there are N data objects, making such
algorithms impractical for problems involving large datasets. The well-known clustering algorithm RNN-CLINK requires
only O(N) space but O(N³) time in the worst case, although the average time appears to be O(N² log N). We provide a
probabilistic interpretation of the average-time complexity of the algorithm. We also report experimental results, using
randomly generated bit vectors and NETNEWS articles as the input, to support our theoretical analysis.

Keywords: Clustering, nearest neighbor, reciprocal nearest neighbor, complete link, probabilistic analysis.

1. INTRODUCTION

Cluster analysis, or clustering, is a multivariate statistical technique which identifies groupings of the data objects based on
the inter-object similarities computed by a chosen distance metric [3, 10, 13]. Clustering methods can be divided into two types:
partitional and hierarchical. The partitional clustering methods start with a single group that contains all the objects, then
proceed to divide the objects into a fixed number of clusters. Many early clustering applications used partitional clustering
due to its computational efficiency when large datasets need to be processed. However, partitional clustering methods
require prior specification of the number of clusters as an input parameter. Also, the partitions obtained are often strongly
dependent on the order in which the objects are processed [7]. The hierarchical agglomerative clustering methods (HACM)
attempt to cluster the objects into a hierarchy of nested clusters. Starting with the individual objects as separate clusters, the
HACMs successively merge the most similar pair of clusters into a higher-level cluster, until a single cluster is left as the
root of the hierarchy. HACMs are also known as unsupervised classification, because of their ability to automatically
discover the inter-object similarity patterns. Clustering techniques have been applied in a variety of engineering and
scientific fields such as biology, psychology, computer vision, data compression, information retrieval, and more recently,
data mining [3, 4, 8, 10].

In order to cluster the objects in a data set, a distance metric is used to quantify the degree of association (i.e., similarity)
between objects and clusters. Once a distance metric is defined between the objects in the data set, variations of the
clustering algorithms exist depending on how the inter-cluster distance is computed. The most commonly used HACMs are
the single link, complete link, group average link, and Ward's methods, where the single link and complete link methods
represent the two extremes [7]. In the single link method, the distance between two clusters is the shortest distance between
two objects in different clusters; in the complete link method, the distance between two clusters is the longest distance
between two objects in different clusters. Several studies reported in the literature [7, 18] indicate superior clustering accuracy
of the complete link method, which is the focus of the present paper.

Typical implementations of the complete link clustering method require O(N²) space with N data objects, making the
algorithm infeasible for large data mining applications. One exception is the RNN-CLINK algorithm, which uses only O(N)
space but O(N³) time in the worst case, although it has been noticed that the algorithm's average time-complexity appears to
be O(N² log N) [1]. In this paper, we present a probabilistic interpretation of this behavior, supported by experimental results.
The remainder of the paper is organized as follows: Section 2 describes the most relevant work. In Section 3, we briefly
describe the implementation of the RNN-CLINK algorithm, and present a probabilistic analysis of the algorithm's average-
time complexity. Section 4 reports the experimental results, using randomly generated bit vectors and NETNEWS articles as
the input, to support our theoretical analysis. Section 5 offers the conclusion and points out directions for further research.
2. RELATED WORK

The first complete link algorithm reported in the literature is the CLINK algorithm by Defays [6]. The algorithm requires O(N²)
time and O(N) space, so it is quite efficient. However, its output depends on a certain node insertion order to generate the
exact complete link hierarchy. Also, according to the experiments performed by El-Hamdouchi and Willett [7], Defays's
CLINK algorithm gave unsatisfactory results in some information retrieval applications. Voorhees proposed an O(N²)-space,
O(N³)-time algorithm [18] in which she first computed the distances between the N given objects, then sorted them by applying
the bucket-sort algorithm. The result is an ordered list of triplets (i, j, dist) in which i and j are object ids, and dist is the distance
between objects i and j. The algorithm then proceeds to construct the clusters using these sorted distance triplets. Since the
distances between all pairs of the clusters are processed in ascending order, two clusters of sizes mi and mj can be merged
as soon as the (mi·mj)-th distance triplet between the objects in the respective clusters is reached.
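
For illustration, the following is a minimal Python sketch of this counting idea (our own reconstruction from the description above, not Voorhees's original implementation; all names are hypothetical). It assumes the triplets contain every unordered pair of objects exactly once, sorted by ascending distance.

from collections import defaultdict

def complete_link_by_sorted_triplets(n, triplets):
    """Sketch of complete-link clustering driven by sorted distance triplets.

    n        -- number of objects, labeled 0..n-1
    triplets -- list of (i, j, dist), one per unordered pair, ascending by dist
    Returns the list of merges as (cluster_a, cluster_b, dist).
    """
    label = list(range(n))            # label[x] = current cluster id of object x
    size = {c: 1 for c in range(n)}   # cluster sizes
    seen = defaultdict(int)           # cluster pair -> triplets processed so far
    merges = []
    for i, j, dist in triplets:
        a, b = label[i], label[j]
        if a == b:
            continue
        key = (min(a, b), max(a, b))
        seen[key] += 1
        # All size[a]*size[b] inter-object distances between the two clusters
        # have now been reached, so 'dist' is their complete-link distance.
        if seen[key] == size[a] * size[b]:
            merges.append((a, b, dist))
            new, old = (a, b) if size[a] >= size[b] else (b, a)
            for x in range(n):
                if label[x] == old:
                    label[x] = new
            size[new] += size.pop(old)
            # Combine the pair counters of the two merged clusters.
            for k in list(size):
                if k == new:
                    continue
                combined = seen.pop((min(old, k), max(old, k)), 0) \
                         + seen.pop((min(new, k), max(new, k)), 0)
                if combined:
                    seen[(min(new, k), max(new, k))] = combined
    return merges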

Using priority queues with an O(N²) work space, Day and Edelsbrunner [5], and Aichholzer [2], showed that the complete link
hierarchy can be obtained in O(N² log N) time. By applying the idea of the reciprocal nearest neighbors (RNN) with the
Lance-Williams updating formula [12], an O(N²)-space and O(N²)-time algorithm was described by Murtagh [14, 15]. In this
algorithm, the initial distance computation between all clusters takes O(N²) time, and the distance update after each merge
operation takes O(N) time. Since there are a total of N − 1 such cluster-merge operations, the total time complexity is O(N²).
On the other hand, for the special case when the objects are points in the 2-D plane and the distance is the Euclidean metric,
Krznaric and Levcopoulos developed an O(N log² N)-time clustering algorithm [11]. There is still no general algorithm for the
complete link clustering method that takes less than O(N³) time while using only O(N) space.
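
For reference (a standard fact about the complete link method, not stated explicitly above), the Lance-Williams update used in Murtagh's algorithm specializes, for complete link, to δ(ci ∪ cj, ck) = ½·δ(ci, ck) + ½·δ(cj, ck) + ½·|δ(ci, ck) − δ(cj, ck)| = MAX{δ(ci, ck), δ(cj, ck)}, which is why the distances from a newly merged cluster to all other clusters can be updated in O(N) total time after each merge.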

3. THE RNN-CLINK ALGORITHM

In this section, we first explain the definition and properties of the reciprocal nearest neighbors (RNN); we then present an
implementation of the RNN-CLINK algorithm for the complete link method. Several ideas similar to nearest neighbors have
been studied in the literature. These include the reciprocal nearest neighbor (RNN) technique used in agglomerative
clustering [14, 15], and the mutual nearest neighbor (MNN) [9] and pairwise nearest neighbor (PNN) [16] techniques in clustering and
other types of applications.

For any given connected graph, a nearest neighbor chain (NN-chain, see Figure 1) can be obtained by starting from an
arbitrary node c1, finding the nearest neighbor of c1, call it c2 = NN(c1), then finding the nearest neighbor of c2, call it c3 = NN(c2),
and so on. It is easy to see that any NN-chain is a subset of a minimum spanning tree (MST). Since the inter-node distances are
monotonically decreasing along the chain, an NN-chain must end in a pair of nodes that are nearest neighbors of each other.

c1 → c2 → c3 → c4 → … → ck−2 → ck−1 ↔ ck

Figure 1. A Nearest Neighbor Chain and Reciprocal Nearest Neighbors (RNN);
the symbol → represents NN; the symbol ↔ represents RNN

The following property of the nearest neighbor chains (NN-chain) in a connected graph with a distance metric is known [14, 15]:

• Let d(cx, cy) denote the distance between node cx and node cy. The distances along an NN-chain consisting of nodes c1,
c2, …, cp and cq are monotonically decreasing, i.e., d(c1, c2) > d(c2, c3) > d(c3, c4) > … > d(cp, cq) = d(cq, cp). An NN-chain
must end at some point, and the last pair of nodes on this chain are nearest neighbors of each other, called
Reciprocal Nearest Neighbors (RNN).
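
To make the chain-growing procedure concrete, the following is a minimal Python sketch (our own illustration, not one of the algorithm figures of the paper). It assumes a symmetric distance function dist and grows an NN-chain from an arbitrary starting point until the chain ends in an RNN pair.

def grow_nn_chain(points, dist, start=0):
    """Grow a nearest-neighbor chain from points[start] until it ends in an
    RNN pair, i.e., the last two chain elements are nearest neighbors of each
    other.  Returns the chain as a list of point indices."""
    chain = [start]
    while True:
        current = chain[-1]
        best, best_d = None, float("inf")
        for j in range(len(points)):
            if j == current:
                continue
            d = dist(points[current], points[j])
            # On equal distances, prefer the previous chain node so that the
            # chain is guaranteed to terminate (the tie-breaking rule of
            # Section 3.2).
            if d < best_d or (d == best_d and len(chain) >= 2 and j == chain[-2]):
                best, best_d = j, d
        if len(chain) >= 2 and best == chain[-2]:
            return chain          # chain[-2] and chain[-1] form an RNN pair
        chain.append(best)

# Example with 1-D points and absolute difference as the metric:
# grow_nn_chain([0.0, 1.1, 1.9, 5.0], lambda a, b: abs(a - b)) returns [0, 1, 2]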
3.1. Implementation of the RNN-CLINK Algorithm

In the following, we briefly describe our implementation of the complete link method based on the RNN technique (Figures 2
and 3). We also explain two speed-up measures of the implementation. The RNN-CLINK algorithm (Figure 2) first
initializes each node as a single-node cluster. In each iteration of the while-loop, every cluster ci computes its nearest
neighbor NN(ci ) (Steps 3-6), and, based on these nearest neighbor results, every cluster creates an NN-chain which terminates
with an RNN pair. The algorithm then merges the two clusters of each RNN pair and updates the cluster structure (Steps 7-
12). The while-loop terminates when there is only one cluster left.

Algorithm RNN-CLINK
Input: G = (V, E), a distance metric d(v, w) defined on V × V ; Output: Cluster structure C

1. Initialize: C = { ci | 1 ≤ i ≤ N } = { vi | 1 ≤ i ≤ N } = V
2. While ( |C| > 1) {
3. For all ci ∈ C {
4. COMPUTE-INTER-CLUSTER-DISTANCE(ci )
5. FIND-NN(ci )
6. }
7. For all ci ∈ C {
8. if clusters ci and NN(ci ) form an RNN pair {
9. MERGE-CLUSTER(ci , NN(ci ))
10. SORT-CLUSTER(C, ci , NN(ci ))
11. }
12. }
13. }

Figure 2. Algorithm RNN-CLINK

Algorithm COMPUTE-INTER-CLUSTER-DISTANCE (ci)

Input: G = (V, E), a distance metric d(v, w) defined on V × V, cluster structure C
Output: The distances between cluster ci and each of the other clusters

1. For all cj ∈ C − { ci } {
2.   δ(ci, cj) = 0
3.   For all vx ∈ ci, vy ∈ cj {
4.     if d(vx, vy) > δ(ci, cj) then δ(ci, cj) = d(vx, vy)
5.   }
6. }

Figure 3. Algorithm COMPUTE-INTER-CLUSTER-DISTANCE


We now analyze the time complexity of the RNN-CLINK algorithm. First, we consider Step 4 of the algorithm, which
computes the inter-cluster distances. Assuming the distance between two objects can be computed in O(1) time, the
initial computation of the distances between N objects takes O(N²) time, i.e., N(N − 1)/2 distances. When the objects form
clusters, the distance between two clusters cq and cr is defined as δ(cq, cr) = MAX{d(xi, xj) | xi ∈ cq and xj ∈ cr}. Since RNN-
CLINK computes the minimum δ(cq, cr) for each cluster cq, i.e., finding cq's nearest neighbor, and RNN-CLINK operates
under an O(N) space constraint, the distance computation between clusters must be done in a cluster-by-cluster fashion. An
easy way to accomplish this is to maintain an array of node indexes to keep track of the node-cluster relationship. After each
cluster-merge operation, we move the node indexes in this array so that the nodes of the same cluster have their indexes next
to each other. As a result, the distance computation between clusters can be done in O(N²) time in each iteration of the loop.
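
A minimal Python sketch of this bookkeeping (our own illustration; the function and variable names are hypothetical): the complete-link distance between two clusters is computed pairwise with only constant extra space, and the single O(N) index array is re-sorted on the cluster labels after the merges so that the members of each cluster stay contiguous.

def complete_link_distance(nodes_a, nodes_b, dist):
    """delta(ca, cb): the largest inter-object distance between two clusters,
    given as lists of node ids, using only O(1) extra space."""
    return max(dist(x, y) for x in nodes_a for y in nodes_b)

def regroup_by_cluster(node_index, cluster_of):
    """Re-sort the node index array on the current cluster labels so that the
    nodes belonging to the same cluster occupy consecutive positions."""
    node_index.sort(key=lambda v: cluster_of[v])
    return node_index

# Example: after cluster 2 has been merged into cluster 0,
#   cluster_of = {0: 0, 1: 1, 2: 0, 3: 1}
#   regroup_by_cluster([0, 1, 2, 3], cluster_of) returns [0, 2, 1, 3]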

The purpose of the MERGE-CLUSTER function (Step 9 of Figure 2) is to re-label and update the cluster hierarchy, while the
purpose of SORT-CLUSTER (Step 10) is to sort the node indexes based on their cluster number. Thus, the computation time
of each iteration of the while-loop is O(N²). Since the algorithm finds at least one RNN pair in each iteration, the number of
iterations is O(N), proving that the overall time complexity of RNN-CLINK is O(N³). The space complexity of RNN-CLINK
is O(N), because there are only a constant number of O(N) arrays used in the algorithm.

Two speed-up measures can be incorporated in the implementation of the RNN-CLINK algorithm. First, if cluster ci and
NN(ci) are not merged in the current iteration of the while-loop, NN(ci) would still be the nearest neighbor of ci for the next
iteration, thus saving the time to recompute NN(ci). Second, the computation of NN(ci) involves keeping track of the value
of a variable min-clust-dist (the minimum cluster-distance found so far) and computing the distance between all pairs of the
clusters, i.e., δ(ci, ck) for 1 ≤ k ≠ i ≤ n. However, while computing an inter-cluster distance δ(ci, ck) can take up to |ci| × |ck|
inter-object distance computations, this process can safely stop once a distance value between two objects, x ∈ ci and y ∈ ck, is
found to be greater than min-clust-dist. Although the savings in inter-object computations are not known in general, our
experimental results show that this speed-up measure is a key factor in reducing the execution time of the RNN-CLINK
algorithm from the worst-case O(N³) time to close to O(N² log N) on average.
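
A minimal Python sketch of the second speed-up (our own illustration; names are hypothetical): the complete-link distance computation for a candidate cluster is abandoned as soon as it exceeds the smallest inter-cluster distance found so far, since that candidate can no longer be the nearest neighbor.

def nearest_cluster(ci_nodes, other_clusters, dist):
    """Find the nearest neighbor of cluster ci under the complete link metric.
    other_clusters maps a cluster label to its list of node ids.  A candidate
    is pruned as soon as one of its inter-object distances already exceeds the
    best complete-link distance (min-clust-dist) found so far."""
    min_clust_dist = float("inf")
    nearest = None
    for label, nodes in other_clusters.items():
        delta = 0.0
        pruned = False
        for x in ci_nodes:
            for y in nodes:
                d = dist(x, y)
                if d > delta:
                    delta = d
                if delta > min_clust_dist:
                    pruned = True      # cannot be the nearest cluster any more
                    break
            if pruned:
                break
        if not pruned and delta < min_clust_dist:
            min_clust_dist, nearest = delta, label
    return nearest, min_clust_dist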

3.2. Probabilistic Analysis of the RNN Structure

We assume there is a (finite or infinite) universe of points, from which we draw a finite sample S = {x1, x2, ..., xn}. These
sample points are used to model the objects and the clusters in the RNN-CLINK algorithm. To make the problem more
tractable, we assume the sample points are chosen with uniform probability from the universe, and the points are chosen
with replacement, i.e., after point xi is selected, it is returned to the universe before the next point xi+1 is selected. Thus, it is
possible to have xi = xj for i ≠ j in our sample, although the probability of this occurrence is negligible for the ranges of
sample sizes we will be considering. The main reason this replacement model is used is its simplicity for performing
probabilistic analysis. We assume a distance metric is defined between the points of the universe.

Definition 1. A distance metric d(xi, xj) is defined for any points xi and xj, satisfying the following properties:
1. d(xi, xj) ≥ 0, and d(xi, xj) = 0 iff xi = xj.
2. d(xi, xj) = d(xj, xi).

Definition 2. A point xj is called a nearest neighbor of point xi, denoted xj = NN(xi), in a set of n points x1, x2, ..., xn, if d(xi, xj)
≤ d(xi, xk) for all k ≠ i and 1 ≤ k ≤ n.

Note that a point xi may have multiple nearest neighbors when equal distances occur. In that case, we would still use NN(xi)
to denote any of these nearest neighbors of xi, by abuse of notation.

Definition 3. Two points xi and xj are called reciprocal nearest neighbors (RNN's) if xi = NN(xj) and xj = NN(xi). That is, if
d(xi, xj) ≤ d(xi, xk) and d(xi, xj) ≤ d(xj, xk), for all k, 1 ≤ k ≤ n, k ≠ i and k ≠ j. Such a pair of points xi and xj is called an RNN
pair.

Definition 4. A nearest-neighbor chain (NN chain) is a sequence of points xi1, xi2, ..., xim such that xi(k+1) = NN(xik) for 1 ≤ k ≤
m − 1, and xi(m−1) = NN(xim). Thus, an NN chain consists of a sequence of points that ends with an RNN pair.
We now state the following fact precisely in terms of our notation and terminology [14, 15].

Lemma 1. Let S = {x1, x2, ..., xn} be a set of n points chosen from an arbitrary space such that a distance function is defined
between any pair of the points. An NN chain exists starting with any point xi, provided that, when forming an NN chain xi1,
xi2, …, if xi(j−1) is one of the nearest neighbors of xij due to equal distances, the NN chain chooses xi(j−1) as NN(xij) and
terminates the chain.

Our key result is the following theorem which relates the RNN structure to certain conditional probability measures.

Theorem 1. For 1 ≤ k ≤ n, let Sk = {x1, x2, ..., xk} denote a set of k random points chosen from a universe with replacement.
Let Pk denote the probability that an arbitrary point xi is involved in an RNN pair in the set Sk; that is, Pk = Pr[xi = NN(xj) | xj
= NN(xi)]. Then the following properties hold:
(1) The expected number of RNN pairs in Sn = {x1, x2, ..., xn} is n·Pn / 2.
(2) If Pk ≥ P for 1 ≤ k ≤ n, where P, 0 < P < 1, is a constant independent of n, then the expected length of an NN chain in
Sn = {x1, x2, ..., xn} is ≤ 1/P².

Proof: (1) Let Vi denote the following random variable:

    Vi = 1 if point xi is involved in an RNN pair in Sn, and Vi = 0 otherwise.

Then the expected number of RNN pairs in Sn is (1/2) · Σ_{i=1..n} E(Vi), because each RNN pair is counted twice in the
summation Σ_{i=1..n} E(Vi). Notice that

    E(Vi) = 1 · Pr[Vi = 1] + 0 · Pr[Vi = 0], by definition
          = Pr[xi is involved in an RNN pair]
          = Pn, by definition of Pn.

Substituting E(Vi) = Pn into the preceding summation yields the formula of (1).

(2) Let L denote the length of an NN chain. By Lemma 1, L ≤ n − 1. Thus,

    E[L] = Σ_{k=1..n−1} k · Pr[L = k], by definition of E[L].

Let xi be the starting node of an NN chain. Notice that Pr[L = 1] = Pr[xi is involved in an RNN pair in Sn] = Pn.
Similarly, Pr[L = 2] = (1 − Pn) · Pn−1, because L = 2 means xi is not involved in an RNN pair, which has probability 1 − Pn.
Also, L = 2 means the node NN(xi), call it xj, is involved in an RNN pair in the node set Sn − {xi}. The probability
for xj to be involved in an RNN pair is Pn−1, because the node set Sn − {xi} contains n − 1 random points chosen with
replacement. Thus, Pr[L = 2] = (1 − Pn) · Pn−1, because the two events {xi is not involved in an RNN pair} and {xj =
NN(xi) is involved in an RNN pair} are independent.

More generally, Pr[L = k] = (1 − Pn)(1 − Pn−1)(1 − Pn−2) ... (1 − Pn−k+2) · Pn−k+1, for 2 ≤ k ≤ n − 1, by the same argument.
Notice that Pr[L = k] ≤ (1 − P)^(k−1), because each factor 1 − Pj ≤ 1 − P, and Pn−k+1 ≤ 1. Therefore,

    E[L] = Σ_{k=1..n−1} k · Pr[L = k] ≤ Σ_{k=1..n−1} k · (1 − P)^(k−1) ≤ Σ_{k=1..∞} k · (1 − P)^(k−1) = 1/P²,

using the formula Σ_{k=1..∞} k · x^(k−1) = 1/(1 − x)² for 0 < |x| < 1.
Notice that the proof of the theorem strongly depends on the randomness property of the points in the set Sn = {x1, x2, ..., xn}.
That is, starting with an arbitrary point xi, the random sample assumption implies that its nearest neighbor xj can be any of the
remaining points with equal probability; similarly, the nearest neighbor of xj is a random point among the points in Sn −
{xi, xj}. In general, after forming a sequence of k nearest neighbors starting from point xi, the random sample assumption
implies that the remaining set of n − k points still forms a random set of n − k points.
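
As an illustration of this randomness assumption (our own addition, not part of the original analysis), the following Python sketch estimates by simulation the fraction of points involved in RNN pairs among n points drawn uniformly at random; the estimate can be compared against the constant lower bound postulated below and the ratios reported in Section 4.

import random

def rnn_pair_fraction(n, dim=10, trials=20):
    """Estimate the expected fraction of points that belong to an RNN pair,
    for n points drawn uniformly at random from the unit hypercube."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    total = 0.0
    for _ in range(trials):
        pts = [[random.random() for _ in range(dim)] for _ in range(n)]
        # nn[i] = index of the nearest neighbor of point i
        nn = [min((j for j in range(n) if j != i),
                  key=lambda j: sqdist(pts[i], pts[j])) for i in range(n)]
        in_rnn_pair = sum(1 for i in range(n) if nn[nn[i]] == i)
        total += in_rnn_pair / n
    return total / trials

# Example: rnn_pair_fraction(200) returns the estimated fraction for n = 200.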

We want to apply this random sample model to the analysis of the RNN-CLINK algorithm. Specifically, we want to prove
that the average number of while-loop iterations is O(log N), which would imply the O(N² log N) average-time complexity of
the algorithm. To this end, it suffices to prove that the average number of RNN pairs in a set of m clusters is ≥ Cm, for some
constant C, throughout the entire looping process of the algorithm. This result follows from Theorem 1, if the conditional
probability of an RNN is ≥ C, and if the randomness property holds among the clusters, throughout the entire looping process
of the algorithm. Our empirical results reported in the next section support this claim, using the randomly generated bit
vectors, and using the news articles available from NETNEWS, as the input objects for clustering. However, we are unable
to prove either the validity of the randomness assumption or a constant lower bound for the conditional probabilities at the
present time.
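
For a hypothetical numerical illustration (our own, not a result of the paper): if each iteration of the while-loop finds at least 0.25·m RNN pairs among m clusters, which is the ratio observed for the first iteration in Section 4, and if this ratio persisted throughout the looping process, then each merge removes one cluster and an iteration would reduce m to about 0.75·m in expectation. Starting from N clusters, the expected number of iterations would then be roughly log N / log(4/3) ≈ 2.4 log₂ N, which is O(log N) and is in line with the logarithmic growth of the iteration counts reported in Tables 1 and 2.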

4. EXPERIMENTAL RESULTS

In this section, we present the experimental results reporting the time complexity of the RNN-CLINK algorithm. We use the
number of inter-object distance computations (NODC) as a measure of the execution time, mainly because this measure is
independent of the computing platform and it provides a good indication of the actual execution time. In the experiments, we
implemented both RNN-CLINK and a well-known single-link clustering algorithm, SLINK [17]. The NODC of the single-link
algorithm for clustering N objects is simply N(N − 1)/2, because the algorithm computes all inter-object distances once and
uses O(N²) space to save them.

Our first experiment is based on randomly selected news articles available on NETNEWS. We created 6 sets of test data,
with the number of news articles ranging from 100 to 600 and the number of news groups ranging from 2 to 10. The news
articles are first filtered through a stoplist and a stemmer, in order to eliminate the insignificant words (e.g., the, on, up) and
to reduce the words to their stems. The distance formula used for measuring the similarity between two articles is the Dice
coefficient with binary term weights [8]. For articles Dq and Dr with Tq and Tr terms, respectively, and ϑqr common terms, the
similarity between the two articles is given by Sim(Dq, Dr) = 2 * ϑqr / (Tq + Tr); a small sketch of this computation is given
after this paragraph. Table 1 presents the statistics comparing the NODC values of RNN-CLINK and SLINK. The column
MOE (measure of effect) reports Hubert's Γ statistic, which measures the clustering accuracy using the news group
classification as the basis. We notice that for RNN-CLINK, the ratio of the number of RNN pairs to the number of objects
(articles) seems to have a lower bound of 0.25, at least for the first iteration of the clustering loop. Also, the total number of
iterations appears to grow logarithmically.
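
The following is a minimal Python sketch of this similarity computation (our own illustration; the stoplist and stemming steps are omitted), in which each article is represented by the set of its distinct term stems. The paper does not state how the similarity is converted into a distance; a common choice, shown here as an assumption, is to use 1 − Sim.

def dice_similarity(terms_q, terms_r):
    """Dice coefficient with binary term weights:
    Sim(Dq, Dr) = 2 * |common terms| / (Tq + Tr)."""
    common = len(terms_q & terms_r)
    return 2.0 * common / (len(terms_q) + len(terms_r))

def dice_distance(terms_q, terms_r):
    """Assumed dissimilarity for clustering: 1 - Sim(Dq, Dr)."""
    return 1.0 - dice_similarity(terms_q, terms_r)

# Example: dice_similarity({"cluster", "link", "data"}, {"cluster", "data", "mine"})
#          = 2 * 2 / (3 + 3) = 2/3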

                                         RNN-CLINK                          SLINK
No. of    No. of News  No. of   While-loop   RNN pairs
Articles  Groups       Terms    Iterations   in 1st round  NODC's    MOE (Γ)   NODC's   MOE (Γ)
100       2            2975     20           32            43516     0.34      4950     0.07
200       3            4879     21           65            157926    0.65      19900    0.49
300       5            6541     23           86            396724    0.51      44850    0.35
400       7            7587     25           110           737937    0.41      79800    0.14
500       8            8809     27           125           1214330   0.44      124750   0.12
600       10           9706     31           151           1712507   0.41      179700   0.10
* NODC: Number of Distance Computations.
* MOE (Γ): larger absolute values of Γ suggest better clustering performance.

Table 1. Experimental results of clustering news articles


Our second experiment uses randomly generated bit-vectors as the input data, where the Hamming distance is used to
measure the dissimilarity between vectors. The dimension of the vectors is fixed at 200, and the number of objects (vectors)
varies from 100 to 1000. In each case, we ran the RNN-CLINK clustering 100 times; Table 2 reports the average
NODC's and the average number of clustering loop iterations for each of the cases. For the same reason given previously,
the NODC value for SLINK is set to N(N − 1)/2. Similar to the results reported in Table 1, the average number of clustering
loop iterations seems to grow logarithmically. Figure 4 plots the NODC values of the RNN-CLINK and SLINK algorithms,
along with the curves of N² and N² log N. The figure clearly shows a growth pattern of O(N² log N) for the average-time
complexity of RNN-CLINK. Therefore, our experimental results using two types of input data suggest that the conclusion of
Theorem 1 concerning the average number of RNN pairs seems valid, even when the assumption on the random sample
points is not necessarily satisfied.
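
A minimal Python sketch of this experimental setup (our own illustration; the names are hypothetical), generating random bit-vectors, computing Hamming distances, and counting the number of inter-object distance computations (NODC) through a small wrapper:

import random

def random_bit_vector(dim=200):
    """One random bit-vector of the given dimension."""
    return [random.randint(0, 1) for _ in range(dim)]

def hamming_distance(u, v):
    """Number of positions in which the two bit-vectors differ."""
    return sum(a != b for a, b in zip(u, v))

class CountingMetric:
    """Wraps a distance function and counts the number of inter-object
    distance computations (NODC), the platform-independent cost measure
    used in the experiments."""
    def __init__(self, dist):
        self.dist = dist
        self.nodc = 0
    def __call__(self, u, v):
        self.nodc += 1
        return self.dist(u, v)

# Example: generate 100 vectors and count the N(N-1)/2 = 4950 distance
# computations that a single full pass over all pairs performs.
vectors = [random_bit_vector() for _ in range(100)]
metric = CountingMetric(hamming_distance)
for i in range(len(vectors)):
    for j in range(i + 1, len(vectors)):
        metric(vectors[i], vectors[j])
assert metric.nodc == 100 * 99 // 2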

                  RNN-CLINK                        SLINK
No. of Nodes      Avg. No. of     Avg. NODC's      Avg. NODC's
                  Iterations
100               18              40021            4950
200               23              180678           19900
300               26              431559           44850
400               28              798095           79800
500               30              1281078          124750
600               31              1889095          179700
700               33              2622060          244650
800               34              3478650          319600
900               35              4423560          404550
1000              36              5537472          499500
* NODC: Number of Distance Computations.

Table 2. Experimental results of clustering random bit-vectors

[Figure 4: comparison of the number of distance computations, plotting the NODC values of RNN-CLINK and SLINK against the N² and N² log N reference curves for 100 to 1000 bit-vectors.]

Figure 4. Experimental results of clustering random bit-vectors


5. CONCLUSION AND FURTHER RESEARCH

In this paper, we presented an implementation of the O(N)-space clustering algorithm RNN-CLINK using the reciprocal
nearest neighbor technique. We also described two speed-up measures that improve the algorithm’s execution time. We
gave a probabilistic model using random sample points and assumptions about the RNN conditional probabilities to explain
the algorithm's O(N² log N) average-time complexity. Our experimental results suggested the validity of these probabilistic
assumptions. For further research, we would like to provide a mathematical proof of these assumptions.

REFERENCES

1. O. Aichholzer, “Clustering the hypercubes,” SFB Report Series 93, TU-Graz, 1996.
2. O. Aichholzer and F. Aurenhammer, “Classifying hyperplanes in hypercubes,” SIAM J. Discrete Math, 9 (2), pp. 225-
232, May 1996.
3. M. R. Anderberg, Cluster Analysis for Applications, Academic Press, New York, 1973.
4. A. Berson and S. J. Smith, Data Warehousing, Data Mining, and OLAP, McGraw-Hill, New York, 1997.
5. W. H. E. Day and H. Edelsbrunner, “Efficient algorithms for agglomerative hierarchical clustering methods,” Journal of
Classification, 1(1), pp. 7-24, 1984.
6. D. Defays, “An efficient algorithm for a complete link method,” The Computer Journal, 20 (4), pp. 364-366, 1977.
7. A. El-Hamdouchi and P. Willett, “Comparison of hierarchical agglomerative clustering methods for document retrieval,”
The Computer Journal, 32 (3), 1989.
8. W. B. Frakes and R. Baeza-Yates, Information Retrieval: Data Structures and Algorithms, Prentice Hall, 1992.
9. K. C. Gowda and G. Krishna, “Agglomerative clustering using the concept of mutual nearest neighborhood,” Pattern
Recognition, 10 (2), pp. 105-112, 1978.
10. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.
11. D. Krznaric and C. Levcopoulos, “Fast algorithms for complete linkage clustering,” Discrete Comput. Geom. 19, pp.
131-145, 1998.
12. G. N. Lance and W. T. Williams, “A general theory of classificatory sorting strategies: I. Hierarchical systems,” The
Computer Journal, 9, pp. 373-380, 1967.
13. L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley and Sons,
Inc., New York, 1990.
14. F. Murtagh, “A survey of recent advances in hierarchical clustering algorithms,” The Computer Journal, 26 (4), pp. 354-
359, 1983.
15. F. Murtagh, “Complexities of hierarchic clustering algorithms: State of the art,” Computational Statistics Quarterly, 1,
pp. 101-113, 1984.
16. J. Shanbehzadeh and P. O. Ogunbona, “On the computational complexity of the LBG and PNN algorithms,” IEEE
Trans. on Image Proc., 6 (4), pp. 614-617, 1997.
17. R. Sibson, “SLINK: An optimally efficient algorithm for the single-link cluster method,” The Computer Journal, 16 (1),
pp. 30-45, 1973.
18. E. M. Voorhees, “The effectiveness and efficiency of agglomerative hierarchic clustering in document retrieval,” Ph.D.
thesis, Cornell University, 1985.
