Spectral concentration and greedy k-clustering

Tamal K. Dey∗   Alfred Rossi†   Anastasios Sidiropoulos‡

arXiv:1404.1008v3 [cs.DS] 26 Nov 2014

Abstract

A popular graph clustering method is to consider the embedding of an input graph into R^k induced by the first k eigenvectors of its Laplacian, and to partition the graph via geometric manipulations on the resulting metric space. Despite the practical success of this methodology, there is limited understanding of several heuristics that follow this framework. We provide theoretical justification for a natural such heuristic, some form of which has been previously proposed [BXKS11, NJW01]. Our result can be summarized as follows. We say that a partition of a graph is strong if each cluster has small external conductance, but large internal conductance. A recent result on graph partitioning shows that strong partitions exist for graphs with sufficiently large spectral gap [OT14]. We prove that the eigenvectors cluster around their mean on each such strong partition. Combining our result with a simple greedy algorithm for k-centers gives us the desired spectral partitioning algorithm. We show that for bounded-degree graphs with a sufficiently large gap between the k-th and (k+1)-th eigenvalues of the Laplacian, this algorithm computes a partition that is close to a strong one. We also show how this greedy algorithm for k-center can be implemented in time O(nk² log n) using randomization. Finally, we evaluate our algorithm on some real-world and synthetic inputs.

∗ Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210. tamaldey@cse.ohio-state.edu
† Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210. rossi.49@osu.edu
‡ Dept. of Computer Science and Engineering, and Dept. of Mathematics, The Ohio State University, Columbus, OH 43210.
sidiropoulos.1@osu.edu

1 Introduction

Spectral partitioning is a fundamental algorithmic primitive that has found applications in numerous domains [HK92, NJW01, BS93, PSWB92, CSZ94, BXKS11]. Let G be an undirected n-vertex graph, and let L_G = I − D^{−1/2} A D^{−1/2} be its normalized Laplacian, where A is the adjacency matrix of G and D is the diagonal matrix with d_ii equal to the degree of the i-th vertex. Let 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_n be the eigenvalues of L_G, and ξ_1, ξ_2, …, ξ_n ∈ R^n a corresponding collection of orthonormal eigenvectors. For a subset S ⊂ V, the external conductance and internal conductance are defined to be

  ϕ_out(S) := ϕ_out(S; G) := |E(S, V(G) \ S)| / vol(S)

and

  ϕ_in(S) := min_{S′ ⊆ S, vol(S′) ≤ vol(S)/2} ϕ_out(S′; G[S]),

respectively, where vol(S) = Σ_{v∈S} deg(v), E(X, Y) denotes the set of edges between X and Y, and G[S] denotes the subgraph of G induced on S.

The discrete version of Cheeger's inequality asserts that a graph admits a bipartition into two sets of small external conductance if and only if λ_2 is small [Che70, AM85, Alo86, SJ89, Mih89]. In fact, such a bipartition can be efficiently computed via a simple algorithm that examines ξ_2. Generalizations of Cheeger's inequality have been obtained by Lee, Oveis Gharan, and Trevisan [LOT12], and by Louis et al. [LRTV12]. They showed that spectral algorithms can be used to find k disjoint subsets, each with small external conductance, provided that λ_k is small.

Even though the clusters given by the above spectral partitioning methods have small external conductance, they are not guaranteed to have large internal conductance. In other words, for a resulting cluster C, the induced graph G[C] might admit further partitioning into sub-clusters of small conductance. Kannan, Vempala and Vetta proposed quantifying the quality of a partition by measuring the internal conductance of its clusters [KVV04]. We define a k-partition to be a partition A = {A_1, …, A_k} of V(G) into k disjoint subsets.
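As a concrete illustration of the definitions above, the external conductance of a set can be computed directly from an adjacency structure. The following is a minimal Python sketch (the dictionary-of-sets graph representation and the example graph are our own choices, not from the paper); ϕ_in is omitted since computing it exactly requires minimizing over subsets.

```python
def vol(adj, S):
    # vol(S) = sum of the degrees of the vertices in S
    return sum(len(adj[v]) for v in S)

def phi_out(adj, S):
    # external conductance: |E(S, V \ S)| / vol(S)
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    return cut / vol(adj, S)

# example: two triangles joined by a single edge {2, 3};
# the triangle {0, 1, 2} has one crossing edge and vol = 2 + 2 + 3 = 7
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
```

Here `phi_out(adj, {0, 1, 2})` evaluates to 1/7: exactly one edge leaves the triangle, and the set's volume is 7.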
We say that A is (α_in, α_out)-strong, for some α_in, α_out ≥ 0, if for all i ∈ {1, …, k} we have ϕ_in(A_i) ≥ α_in and ϕ_out(A_i) ≤ α_out. Oveis Gharan and Trevisan [OT14] (see also [Tan11]) showed that, if the gap between λ_k and λ_{k+1} is large enough, then there exists a partitioning into k clusters, each having small external conductance and large internal conductance.

Theorem 1.1 (Oveis Gharan & Trevisan [OT14]). There exists a universal constant c > 0 such that for any graph G with λ_{k+1}(L_G) > ck²·√(λ_k(L_G)), there exists a k-partition of G that is (Ω(λ_{k+1}(L_G)/k), O(k³·√(λ_k(L_G))))-strong.

The same paper [OT14] also shows how to compute a partitioning with slightly worse quantitative guarantees, using an iterative combinatorial algorithm.

1.1 Our contribution

Spectral-based k-clustering is widely used in practice because of its effectiveness and simplicity. Despite this practical success, its theoretical understanding is limited. For example, k-center clustering in the eigenspace has been considered by Balakrishnan et al. [BXKS11]. They show that for a class of random graphs sampled from a certain hierarchical distribution, computing an approximate solution to the k-center clustering problem recovers a provably correct partition with high probability. However, their result holds only for that particular random graph model.

Algorithm: Greedy Spectral k-Clustering
Input: Graph G
Output: Partition C = {C_1, …, C_k} of V(G)

  Let ξ_1, …, ξ_k be the first k eigenvectors of L_G.
  Let f : V(G) → R^k, where for any u ∈ V(G), f(u) = (ξ_1(u)(deg_G(u))^{−1/2}, …, ξ_k(u)(deg_G(u))^{−1/2}).
  Let R = (d_max^{−1/2} − 2k√δ)/(13k√n)  (δ from Lemma 3.1; d_max denotes the maximum degree).
  V_0 = V(G)
  for i = 1, …, k − 1:
      u_i = argmax_{u∈V_{i−1}} |ball(f(u), 2R) ∩ f(V_{i−1})| = argmax_{u∈V_{i−1}} |{w ∈ V_{i−1} : ||f(u) − f(w)||_2 ≤ 2R}|
      C_i = ball(f(u_i), 2R) ∩ V_{i−1}
      V_i = V_{i−1} \ C_i
  C_k = V_k

Figure 1: The greedy spectral k-clustering algorithm.
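The greedy procedure of Figure 1 translates almost directly into Python. The sketch below is our own illustration, with two simplifications: it uses a dense eigensolver, and it takes the radius R as a parameter rather than deriving it from the spectral gap as the paper does via Lemma 3.1.

```python
import numpy as np

def greedy_spectral_k_clustering(A, k, R):
    # A: symmetric 0/1 adjacency matrix (numpy array), R: radius parameter.
    # In the paper, R = (d_max^{-1/2} - 2k*sqrt(delta)) / (13k*sqrt(n)).
    deg = A.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(A)) - Dinv @ A @ Dinv            # normalized Laplacian L_G
    vals, vecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    f = vecs[:, :k] / np.sqrt(deg)[:, None]         # spectral embedding f(u)
    remaining = set(range(len(A)))
    clusters = []
    for _ in range(k - 1):
        # greedily pick the vertex whose 2R-ball captures the most remaining vertices
        _, best_ball = max(
            ((u, {w for w in remaining
                  if np.linalg.norm(f[u] - f[w]) <= 2 * R})
             for u in remaining),
            key=lambda t: len(t[1]))
        clusters.append(best_ball)
        remaining -= best_ball
    clusters.append(remaining)                      # leftover vertices form C_k
    return clusters

# demo: two disjoint triangles; with k = 2 the algorithm recovers the components
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1
clusters = greedy_spectral_k_clustering(A, k=2, R=0.2)
```

On the demo graph each component collapses to a single point in R², so any small R separates the two clusters.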
A somewhat similar approach has also been considered by Ng, Jordan and Weiss [NJW01], who used k-means clustering in the eigenspace.

1.2 Greedy k-clusters

We present a simple spectral algorithm in Figure 1 which computes a partition provably close to the one guaranteed to exist by Theorem 1.1. Our algorithm consists of a simple greedy clustering procedure performed on the embedding into R^k induced by the first k eigenvectors. Let G be the input graph, with ξ_1, …, ξ_k being the first k eigenvectors of its normalized Laplacian L_G, which we assume to be orthonormal. Define the embedding f : V(G) → R^k, where for any u ∈ V(G) we have f(u) = (ξ_1(u)(deg_G(u))^{−1/2}, …, ξ_k(u)(deg_G(u))^{−1/2}), and deg_G(u) denotes the degree of u in G. The algorithm iteratively chooses a vertex that has the maximum number of vertices within distance R in R^k, where R is computed from n, k, and the gap between λ_k and λ_{k+1}. We treat every such chosen vertex as the "center" of a cluster. In successive iterations, all vertices in previously chosen clusters are discarded. We formally describe the process below.

Inductively define a partition C = {C_1, …, C_k} of V(G) that uses an auxiliary sequence V(G) = V_0 ⊇ V_1 ⊇ … ⊇ V_k. For any i ∈ {1, …, k − 1} and a chosen R > 0, we proceed as follows. For any u ∈ V_{i−1}, let

  N_i(u) = ball(f(u), 2R) ∩ f(V_{i−1}) = {w ∈ V_{i−1} : ||f(u) − f(w)||_2 ≤ 2R},

and let u_i ∈ V_{i−1} be a vertex maximizing |N_i(u)|. We set C_i = N_i(u_i) and V_i = V_{i−1} \ C_i. Finally, we set C_k = V_k. This completes the definition of the partition C = {C_1, …, C_k}.

A distance on k-partitions. For two sets Y, Z, their symmetric difference is given by Y △ Z = (Y \ Z) ∪ (Z \ Y). Let X be a finite set, let k ≥ 1, and let A = {A_1, …, A_k}, A′ = {A′_1, …, A′_k} be collections of disjoint subsets of X. Then we define a distance function between A and A′ by

  |A △ A′| = min_σ Σ_{i=1}^{k} |A_i △ A′_σ(i)|,

where σ ranges over all bijections σ : {1, …, k} → {1, …, k}. We are now ready to state our main result.

Theorem 1.2 (Spectral partitioning via greedy k-clustering). Let G be a graph with maximum degree d_max, let k ≥ 1, and let τ > 0 be such that λ_{k+1}³(L_G) > τ·λ_k(L_G). Let A be the k-partition of V(G) guaranteed by Theorem 1.1. Then, on input G, the algorithm in Figure 1 outputs a partition C such that

  |A △ C| = O(d_max·k⁴ + n·d_max⁴·k⁷·log³ n / τ).

In Section 4 we show how the algorithm in Figure 1 can be implemented in time O(nk² log n) via random sampling. Also, observe that one can take advantage of Voronoi partitions to compute the clusters after the k centers have been computed by our greedy algorithm. This eliminates the ad-hoc assignment of the k-th cluster to the left-over vertices not included in any of the previous k − 1 clusters: the algorithm chooses the k centers greedily, and then clusters the vertices by a Voronoi partition with the k centers as sites. The proof that this Voronoi partition enjoys a guarantee similar to that of our greedy clustering is essentially the same as the proof of Theorem 1.2. Our results can be viewed as providing further theoretical justification for the popular clustering algorithms proposed in [BXKS11] and [NJW01].

1.3 Overview of our theoretical justification

We briefly outline the main ingredients of our approach. It is known that a graph is connected if and only if λ_2 > 0. Cheeger's inequality can thus be viewed as a robust variant of this property: a graph has large internal conductance if and only if λ_2 is bounded away from 0. For a graph G with k connected components, it is easy to show that, for any ξ_i among the first k eigenvectors, D^{−1/2} ξ_i is constant on every connected component of G.
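This constancy on components is easy to verify numerically; the following small self-contained check (our own example, not from the paper) builds two disjoint triangles and confirms that each component collapses to a single point in the spectral embedding.

```python
import numpy as np

def spectral_embedding(A, k):
    # f(u) = (xi_1(u), ..., xi_k(u)) scaled by deg(u)^{-1/2}
    deg = A.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(A)) - Dinv @ A @ Dinv
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k] / np.sqrt(deg)[:, None]

# two disjoint triangles: G has 2 components, so lambda_1 = lambda_2 = 0
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1
f = spectral_embedding(A, 2)

# vertices within a component map to (numerically) the same point ...
assert np.allclose(f[0], f[1]) and np.allclose(f[3], f[5])
# ... while distinct components map to well-separated points
assert np.linalg.norm(f[0] - f[3]) > 0.5
```

Any orthonormal basis of the 0-eigenspace works here, since that eigenspace is spanned by (degree-scaled) component indicators.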
In particular, this implies that in the embedding f : V(G) → R^k induced by the first k eigenvectors, for every connected component C, all vertices in C are mapped to the same point f(C), and f(C) ≠ f(C′) for distinct components C, C′. Therefore, we can recover the components of G by performing k-center clustering in the eigenspace.

Our result can be viewed as a robust variant of the above clustering property. More precisely, we show that if the gap between λ_k and λ_{k+1} is sufficiently large, then our simple greedy algorithm computes a k-center clustering in the eigenspace that recovers a partition close to the one guaranteed to exist by Theorem 1.1. A main step in our proof is to show that for each of the first k eigenvectors ξ_i, there exists a vector ξ̃_i that is close to D^{−1/2} ξ_i and is constant on each cluster of the desired partition. Using this property, we show that the image of each cluster is concentrated around a center point, and that distinct cluster centers are sufficiently far apart from each other. Combining these two properties, we obtain the desired guarantee on k-center clustering in the eigenspace.

A qualitatively similar concentration result was proven by Kwok et al. [KLL+13]. They obtain a vector ξ̃_i that is close to D^{−1/2} ξ_i and is constant on 2k + 1 clusters. However, we require that ξ̃_i be constant on precisely k clusters. As such, their result does not seem directly applicable to our setting.

A caveat for the spectral approach. A crucial aspect of our result is that the partition computed by our algorithm is only guaranteed to be close to the strong partition implied by Theorem 1.1. We now elaborate on why such an approximate guarantee might be unavoidable for "natural" spectral clustering algorithms. Essentially all known partitioning algorithms that are based on spectral embeddings exploit only the fact that the first few eigenvectors of the input graph have small Rayleigh quotient.
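For reference, the Rayleigh quotient of a vector x with respect to a matrix L is x⊤Lx / x⊤x; eigenvectors attain the corresponding eigenvalues. A minimal computation (our own illustration):

```python
import numpy as np

def rayleigh_quotient(L, x):
    # R_L(x) = (x^T L x) / (x^T x); minimized over x by the first eigenvector of L
    return float(x @ L @ x) / float(x @ x)

# normalized Laplacian of a triangle (all degrees 2): L = I - A/2
A = np.ones((3, 3)) - np.eye(3)
L = np.eye(3) - A / 2

rayleigh_quotient(L, np.ones(3))                   # constant vector -> 0.0 (lambda_1)
rayleigh_quotient(L, np.array([1.0, -1.0, 0.0]))   # -> 1.5 (lambda_2 = lambda_3)
```

A small perturbation of x changes the quotient only slightly, which is the robustness property discussed above.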
Indeed, all of these algorithms give the same guarantee if one uses vectors of small Rayleigh quotient instead of true eigenvectors. This is often a desirable property, since it implies that these methods are robust under small perturbations of the graph and of the embedding. In this setting, it is easy to construct examples of graphs where introducing perturbations on the spectral embedding of a small fraction of the vertices does not significantly change the value of any Rayleigh quotient. Consequently, any known analysis that is based on bounds on the Rayleigh quotient seems insufficient to correctly cluster all vertices. It is also easy to construct examples of graphs where the incorrect classification of even a single vertex violates the requirement of a strong clustering. Proving whether a purely spectral method can recover a strong partition exactly remains an interesting open problem.

Further related work. There has been a lot of work that seeks to provide theoretical justification for the practical success of spectral clustering methods. It has been shown that for several important classes of graphs of maximum degree d_max, such as planar [ST96b, ST96a], surface-embedded [Kel06], and more generally minor-free graphs [BLR10], λ_2 = O(d_max/n). This implies in particular that a simple spectral partitioning algorithm can be used to compute balanced separators of size O(√n) in such graphs of bounded degree. Bounds on λ_k for minor-free graphs have also been obtained by Kelner et al. [KLPT11]. We also remark that an improved version of Cheeger's inequality has been obtained by Kwok et al. [KLL+13] for graphs with large λ_k. We note that Lee et al. [LOT12] have shown that, assuming there is a gap between λ_{(1−δ)k} and λ_k for some δ > 0, one can obtain a k-partitioning into sets of small external conductance via geometric considerations on the eigenspace.
Their partitioning procedure is different from our k-center algorithm, and their result is incomparable to the one given here. It is an interesting open problem whether their techniques can be used to obtain better quantitative bounds for analyzing k-center clustering in the eigenspace.

Organization. The remainder of the paper is organized as follows. In Section 2 we prove that any of the first k eigenvectors, multiplied by the matrix D^{−1/2}, can be approximated by a vector that is constant on each cluster. Using this concentration result, in Section 3 we show that the greedy k-center partition in the eigenspace gives a partition close to the one guaranteed by Theorem 1.1. Section 4.1 presents a simple randomized version of the greedy algorithm that runs in near-linear time. Finally, Section 4.2 contains an experimental evaluation of our algorithm.

2 Spectral concentration

In this section we prove that D^{−1/2} ξ_i, for any of the first k eigenvectors ξ_i, is close (with respect to the ℓ_2 norm) to some vector ξ̃_i that is constant on each cluster. It will be convenient to prove this property for an arbitrary vector in the span of the k vectors D^{−1/2} ξ_1, …, D^{−1/2} ξ_k.

Theorem 2.1. Let G be a graph of maximum degree d_max, and let k ≥ 1 satisfy the condition of Theorem 1.1. Suppose further that λ_{k+1}³(L_G) > τ·λ_k(L_G) for some τ > 0. Let A = {A_1, …, A_k} be the k-partitioning of G given by Theorem 1.1. Let ξ_1, …, ξ_k ∈ R^n be the first k eigenvectors of L_G, and let x ∈ span(D^{−1/2} ξ_1, …, D^{−1/2} ξ_k) be a unit vector. Then there exists x̃ ∈ R^n such that

  (i) ||x − x̃||²_2 ≤ 1/n + c′·d_max³·k³·log³ n / τ, for some universal constant c′ > 0, and

  (ii) for any i ∈ {1, …, k}, x̃ is constant on A_i, i.e. for any u, v ∈ A_i we have x̃(u) = x̃(v).

Before laying out the proof, we provide some explanation of the statement of the theorem.
First, note that one can take x = D^{−1/2} ξ_i for any i ∈ {1, …, k}, and thus the result holds for each eigenvector multiplied by D^{−1/2}. Second, the partition-wise uniform vector x̃ is constructed by taking the mean of the values of x on each cluster. According to (i), this means that on each cluster the values of x are close to their mean. The ℓ_2-distance between x and its uniform approximation x̃ has two terms: the first is small for large n, whereas the second involves several factors, of which the inverse dependence on τ, representing the spectral gap, is noteworthy. In summary, if there is a sufficiently large gap between λ_k and λ_{k+1}, the values taken by the vector x have k prominent modes over the k clusters.

Let us briefly give some high-level intuition behind our proof. Consider some vector x in the span of the k vectors D^{−1/2} ξ_1, …, D^{−1/2} ξ_k, and let x̃ be obtained by setting the value of x on each cluster to be equal to its mean there. Suppose, for the sake of contradiction, that ||x − x̃||_2 is large. Roughly speaking, this means that there must exist a cluster A_i on which the values of x are not concentrated around their mean. Using this property, we can find two large disjoint subsets X, X′ ⊂ A_i such that x assigns values much smaller than the mean to vertices in X, and values much larger than the mean to vertices in X′. Since the cluster A_i has high internal conductance, we can find many edge-disjoint paths between X and X′. This implies that in the embedding into R¹ induced by x, each such path P must be "stretched" by a large factor; that is, the endpoints of P are far apart in R¹. By choosing X and X′ carefully, we can conclude that the Rayleigh quotient of x must be large, which contradicts the fact that λ_k is small.

Proof. We have ||x||_2 = 1 by assumption.
Recall that the k-partition A given by Theorem 1.1 is (ϕ_in, ϕ_out)-strong, where ϕ_in ≥ c_OT·λ_{k+1}(L_G)/k for some universal constant c_OT > 0, and ϕ_out = O(k³·√(λ_k(L_G))). For any i ∈ {1, …, k}, let

  α_i = (1/|A_i|) Σ_{u∈A_i} x(u).

Define the vector x̃ ∈ R^n such that for any u ∈ A_i we have x̃(u) = α_i. It suffices to show that x̃ satisfies the assertion.

Figure 2: Bucketing with α_i shifted to 0.

Let β = n^{−4}. Consider partitioning A_i into buckets as shown in Figure 2. Formally, for any i ∈ {1, …, k} and any j ∈ Z, let

  A_{i,j} = {u ∈ A_i : x(u) − α_i ∈ β·[−2^{−j}, −2^{−j−1})}   if j < 0,
  A_{i,j} = {u ∈ A_i : x(u) − α_i ∈ β·[−1, 1)}                if j = 0,
  A_{i,j} = {u ∈ A_i : x(u) − α_i ∈ β·[2^{j−1}, 2^j)}         if j > 0.

We first argue that for any i ∈ {1, …, k} and any j ∈ Z with |j| > 10 log n, we have

  A_{i,j} = ∅.   (1)

To see this, suppose for the sake of contradiction that there exists a non-empty A_{i,j*} for some j* ∈ Z with |j*| > 10 log n. Then

  ||x||²_2 ≥ Σ_{u∈A_{i,j*}} x²(u) ≥ n^{−4}·2^{10 log n} > n > 1,

which contradicts the assumption ||x||_2 = 1, thus establishing (1). Let

  A_1 = {A_i ∈ A : for all j ≠ 0, Σ_{u∈A_{i,j}} (x(u) − α_i)² < (1/(40 log n)) Σ_{u∈A_i} (x(u) − α_i)²},

and A_2 = A \ A_1. Consider first some A_i ∈ A_1. By (1) we have

  Σ_{u∈A_i} (x(u) − α_i)² < 2 Σ_{u∈A_{i,0}} (x(u) − α_i)² < 2nβ² < 1/n².   (2)

Next, consider some A_i ∈ A_2. By the definition of A_2, there exists some j* ≠ 0 such that

  Σ_{u∈A_{i,j*}} (x(u) − α_i)² ≥ (1/(40 log n)) Σ_{u∈A_i} (x(u) − α_i)².   (3)

Pick some j* ≠ 0 satisfying (3) and maximizing |j*|. Assume w.l.o.g. that j* > 0 (the case j* < 0 is symmetric). Let Z = {u ∈ A_i : x(u) ≤ α_i}. We first establish a lower bound on |Z|. By the choice of j*, we have that for any j < −j*,

  |A_{i,j}| ≤ 4·|A_{i,j*}| / 4^{|j+j*|}.   (4)

By the definition of α_i, we have

  Σ_{j≤0} |Z ∩ A_{i,j}|·2^{−j} ≥ |A_{i,j*}|·2^{j*}.   (5)

By (5) & (4) we have Σ_{j∈{−j*−2,…,0}} |A_{i,j} ∩ Z|·2^{−j} ≥ |A_{i,j*}|·2^{j*−1}, and thus Σ_{j∈{−j*−2,…,0}} |A_{i,j} ∩ Z| ≥ |A_{i,j*}|/4, which implies

  |Z| ≥ |A_{i,j*}|/4.   (6)

Let (S, A_i \ S) be a minimum cut in G[A_i] separating A_{i,j*} from Z, i.e. with A_{i,j*} ⊆ S and Z ⊆ A_i \ S. We have

  |E(S, A_i \ S)| ≥ ϕ_in · min{|A_{i,j*}|, |Z|}.   (7)

By (6) & (7) we obtain

  |E(S, A_i \ S)| ≥ ϕ_in · |A_{i,j*}|/4.   (8)

By (8) and the max-flow/min-cut theorem, there exists a collection P of edge-disjoint paths in G[A_i], each with one endpoint in A_{i,j*} and one endpoint in Z, satisfying

  |P| ≥ ϕ_in · |A_{i,j*}|/4.   (9)

By (3), we have

  |A_{i,j*−1}| ≤ 160 · log n · |A_{i,j*}|.   (10)

Since the paths in P are edge-disjoint, if we pick a path P ∈ P uniformly at random, the expected number of vertices of A_{i,j*−1} visited by P is at most |A_{i,j*−1}|·d_max/|P|. By averaging, there exists a sub-collection P′ ⊆ P with |P′| ≥ |P|/2 such that any path P ∈ P′ visits at most 2|A_{i,j*−1}|·d_max/|P| vertices of A_{i,j*−1}. Consider some path P ∈ P′, and let P = p_1, …, p_t be the sequence of vertices visited by P. Observe that x(p_1) ≥ α_i + β·2^{j*−1}, which lies to the right of the bucket A_{i,j*−1}, while x(p_t) ≤ α_i, which lies to its left. It follows that there exists an edge {u, u′} ∈ E(P) such that

  |x(u) − x(u′)| ≥ β·2^{j*−2} · |P| / (2·|A_{i,j*−1}|·d_max) ≥ β·2^{j*−3} · (ϕ_in·|A_{i,j*}|/4) / (32·d_max·10·log n·|A_{i,j*}|) ≥ β·2^{j*}·ϕ_in / (10240·d_max·log n).

Therefore,

  Σ_{{u,u′}∈E(G[A_i])} (x(u) − x(u′))² ≥ |P′| · (β·2^{j*}·ϕ_in / (10240·d_max·log n))²
    ≥ ϕ_in³ / (2^{22}·10²·d_max²·log² n) · |A_{i,j*}| · (β·2^{j*})²
    ≥ ϕ_in³ / (2^{24}·10²·d_max²·log² n) · Σ_{u∈A_{i,j*}} (x(u) − α_i)²
    ≥ ϕ_in³ / (2^{26}·10³·d_max²·log³ n) · Σ_{u∈A_i} (x(u) − α_i)².   (11)

Since x ∈ span(D^{−1/2} ξ_1, …, D^{−1/2} ξ_k), by the Courant–Fischer theorem we obtain

  λ_k(L_G) ≥ (1/d_max) Σ_{{u,u′}∈E(G)} (x(u) − x(u′))²
           ≥ (1/d_max) Σ_{A∈A_2} Σ_{{u,u′}∈E(G[A])} (x(u) − x(u′))²
           ≥ Σ_{A_i∈A_2} ϕ_in³ / (2^{26}·10³·d_max³·log³ n) · Σ_{u∈A_i} (x(u) − α_i)².

Therefore,

  Σ_{A_i∈A_2} Σ_{u∈A_i} (x(u) − α_i)² ≤ λ_k(L_G) · 2^{26}·10³·d_max³·log³ n / ϕ_in³
    ≤ λ_k(L_G) · 2^{26}·10³·d_max³·log³ n · k³ / (c_OT³·λ_{k+1}³(L_G))
    ≤ 2^{26}·10³·d_max³·log³ n · k³ / (c_OT³·τ)
    < c′ · d_max³·k³·log³ n / τ,   (12)

for some universal constant c′ > 0. Combining (2) & (12) we obtain

  ||x − x̃||²_2 = Σ_{A_i∈A} Σ_{u∈A_i} (x(u) − α_i)²
    ≤ (Σ_{A_i∈A_1} Σ_{u∈A_i} (x(u) − α_i)²) + (Σ_{A_i∈A_2} Σ_{u∈A_i} (x(u) − α_i)²)
    ≤ k/n² + c′·d_max³·k³·log³ n / τ
    ≤ 1/n + c′·d_max³·k³·log³ n / τ,

as required.

3 From spectral concentration to spectral clustering

In this section we prove Theorem 1.2. We begin by showing that in the embedding induced by the k vectors D^{−1/2} ξ_1, …, D^{−1/2} ξ_k, most of the clusters of the Oveis Gharan–Trevisan partition are concentrated around center points in R^k which are sufficiently far apart from each other.

Lemma 3.1. Let G be a graph of maximum degree d_max, and let k ≥ 1. Suppose that λ_{k+1}³(L_G) > τ·λ_k(L_G), where τ > 32c′·d_max⁴·k⁵·log³ n and c′ > 0 is the universal constant given by Theorem 2.1. Let δ = 1/n + c′·d_max³·k³·log³ n / τ, and let A = {A_1, …, A_k} be the k-partition of G given by Theorem 1.1. Let ξ_1, …, ξ_n be the eigenvectors of L_G, and let f : V(G) → R^k be the spectral embedding of G induced by the first k eigenvectors; that is, for any u ∈ V(G), f(u) = (ξ_1(u)(deg_G(u))^{−1/2}, …, ξ_k(u)(deg_G(u))^{−1/2}). Let R = (d_max^{−1/2} − 2k√δ)/(13k√n). Then R > 0. Moreover, there exists a k-partitioning A′ = {A′_1, …, A′_k} of G, and points p_1, …, p_k ∈ R^k, such that the following conditions are satisfied:

  (i) |A △ A′| = O(d_max·k³ + n·d_max⁴·k⁶·log³ n / τ).
  (ii) For any i ∈ {1, …, k}, A′_i ⊂ ball(p_i, R).
  (iii) For any i ≠ j ∈ {1, …, k}, ||p_i − p_j||_2 ≥ 6R.
Proof. Let ξ i , i = 1, . . . , k be the eigenvectors of LG . For any i ∈ {1, . . . , k}, let ζ i = D−1/2 ξi , and e be its approximation with the average value in each of the k clusters, that is let ζ i e = (αi1 , . . . , αi1 , αi2 , . . . , αi2 , , . . . , αik , . . . , αik ), ζ i where for any i, j ∈ {1, . . . , k}, αij = 1 X ζ i (u). |Aj | u∈Aj e as illustrated below. e be k × n matrices where Φrow(i) = ζ i and Φ e row(i) = ζ Let Φ and Φ i ζ 1: ζ 2: ζ 1 (u1 ) ζ 2 (u1 ) ζ 1 (u2 ) ζ 2 (u2 ) ··· ··· .. . ζ 1 (un ) ζ 2 (un ) e : ζ 1 e : ζ ζk: ζ k (u1 ) ζ k (u2 ) ··· ζ k (un ) e : ζ k 2 p1 α11 · · · α11 α21 · · · α21 ··· ··· ··· .. . pk α1k · · · α1k α2k · · · α2k αk1 · · · αk1 ··· αkk · · · αkk For any i ∈ {1, . . . , k}, let pi = (α1i , α2i , . . . , αki ), that is, pi is any of the columns in the block e of Φ that corresponds to the ith cluster Ai . 3 3 n·k3 , where Our goal is to show that kpi − pj k2 is large for i 6= j. Writing δ = 1/n + c′ · dmax ·log τ ′ c > 0 is the universal constant given by Theorem 2.1, we have (by Theorem 2.1), k X X i=1 u∈V (G) e (u))2 = (ζ i (u) − ζ i 9 k X i=1 e k2 ≤ k · δ. kζ i − ζ i 2 (13) p Let R = γkδ/n, for some γ > 0 to be determined later. In particular, R > 0. Consider the embedding f (u) = (ζ 1 (u), . . . , ζ k (u)) of a vertex u. We define Xoutliers = {u ∈ V (G) : u ∈ Ai for some i ∈ {1, . . . , k}, and kf (u) − pi k2 > R} By (13) and definition of R, we have |Xoutliers | < n/γ (14) Now we show that for any i 6= j ∈ {1, . . . , k}, we have kpi − pj k2 ≥ 6R. (15) Suppose that, to the contrary, there exist i 6= j ∈ {1, . . . , k} so that kpi − pj k22 ≤ 36R2 . Define a e except all columns corresponding to Ai have been replaced with matrix Φ̂ which is identical to Φ pj . Observe that the column rank of Φ̂ is at most k − 1 because at most k − 1 columns remain e which already independent after we replace the columns corresponding to Ai with that of Aj in Φ had a column rank at most k. Therefore, rank(Φ̂) ≤ k − 1. 
(16) We will next show that the matrix Φ̂ cannot have rank less than k, reaching a contradiction with the above conclusion (16). e and its modified version ζ̂ in the new matrix Φ̂. Observe Let us now look at any row ζ i i e by at most that each element in a row vector ζ̂ i may differ from the corresponding element in ζ i 2 6R because the square of the column vector norm changed at most by 36R . Therefore, for any i ∈ {1, . . . , k}, we have √ √ e k2 + kζ e − ζ̂ k2 ≤ δ + 6 nR. kζ i − ζ̂ i k2 ≤ kζ i − ζ (17) i i i Let Ψ be an n × n matrix of rank n obtained by adding n − k orthogonal unit row vectors to the matrix Φ. Such a matrix Ψ always exists since Φ has rank k. Let also Ψ̂ be the n × n matrix obtained by adding this same set of row vectors to Φ̂. We show that this modified Ψ̂ has rank n, which implies that Φ̂ has rank k, contradicting (16). Let P be the n-dimensional rectangular parallelepiped spanned by the row vectors of Ψ. Let P̂ be the parallelepiped spanned by the row vectors of Ψ̂. Let V (P ), and V (P̂ ) be the sets of vertices of P , and P̂ , respectively. The vertices of P√ and P̂ are in a correspondence. By (17), each row √ vector of Ψ is at distance at most 6 nR + δ from the corresponding row of Ψ̂. Since Ψ and Ψ̂ differ in at most k row vectors, and every vertex of P (resp. P̂ ) is the sum of a subset of row vectors of Ψ (resp. Ψ̂), it follows that the distance between every vertex q of P , and the corresponding vertex q̂ of P̂ , is at most √ √ kq − q̂k2 ≤ 6k nR + k δ. −1/2 Every side of P has length at least dmax . Therefore, there exists an n-dimensional rectangular √ √ −1/2 −1/2 parallelepiped C ⊆ P̂ , of side length dmax − 12k nR − 2k δ. The volume of C is (dmax − √ √ √ √ −1/2 12k nR − 2k δ)n , which is positive provided that R < |dmax − 2k δ|/(12 nk). Therefore, if √ √ −1/2 R = |dmax − 2k δ|/(13 nk), the parallelepiped P̂ has positive volume, and hence the matrix Φ̂ is non-singular. 
10 √ −1/2 By setting γ = (dmax − 2k δ)2 /(169k 3 δ), we get √ √ −1/2 R = |dmax − 2k δ|/13k n. Thus, for this choice of R, we obtain that Φ̂ has rank k, which yields a contradiction. We have thus established (15). We next define a collection A′ = {A′1 , . . . , A′k } of subsets of V (G). For any i ∈ {1, . . . , k}, let A′i = {u ∈ V (G) : kf (u) − pi k2 ≤ R}. By (15) it follows that the clusters A′1 , . . . , A′k are pairwise disjoint. Thus, by (14) we obtain |A △ A′ | < n/γ. (18) √ −1/2 −1/2 Since τ > 32c′ d4max k 5 log3 n, it follows that dmax − 2k δ > dmax /2. In particular, R = √ √ −1/2 (dmax − 2k δ)/13k n. By (18), we therefore have |A △ A′ | < n/γ = n 676c′ d4max k 6 log3 n 169k 3 δ 3 , ≤ 676d k + n √ max −1/2 τ (dmax − 2k δ)2 concluding the proof. We are now ready to prove Theorem 1.2. Proof of Theorem 1.2. If τ = O(d4max k 5 log3 n), then the assertion is vacuously true. We may therefore assume that τ > c′ d4max k 5 log3 n, for some universal constant c′ > 0, which satisfies the condition of Lemma 3.1. Let A, A′ , f , R, and p1 , . . . , pk be as in Lemma 3.1. Let ε′ = |A △ A′ |/n = 3 4 6 k3 + dmax k τ log n ). Let C = {C1 , . . . , Ck } be the ordered collection of pairwise disjoint subsets O( dmax n of V (G) output by the greedy spectral k-clustering algorithm in Figure 1. The set of vertices ′ not covered by Sany of the  clusters in A plays a special role in our argument which we denote as ′ ′ ′ W = V (G) \ A′ ∈A′ Ai . Clearly, |W | ≤ |A △ A | ≤ ε n. i We say that a cluster A′i is touched if the algorithm outputs a cluster Cj ∈ C with Cj ∩ A′i 6= ∅. Since the clusters in C cover all vertices in V (G), every cluster in A′ is touched by some cluster in C. For a cluster A′i , let Cρ(i) be the cluster in C that touches A′i for the first time in the algorithm. Let I be a maximal subset of {1, . . . , k} such that the restriction of ρ on I is a bijection. Let i∗ = |I|. By permuting the indices of the clusters in A′ , we may assume w.l.o.g. that I = {1, . . . 
, i∗ }. In particular, if i∗ = k, then ρ is a bijection between {1, . . . , k} and {1, . . . , k}. If i∗ < k, we claim that the clusters {A′i | i = i∗ + 1, . . . , k} are all mapped to Ck , that is, ∗ ρ(i + 1) = . . . = ρ(k) = k. This is because the cluster Cρ(i) 6= Ck can intersect at most one cluster in A′ because every cluster in A′ is contained inside some ball of radius R, the distance between any two centers of such balls is at least 6R, and each Cρ(i) 6= Ck is contained inside some ball of radius 2R. Also, observe that |A′i \ Cρ(i) | ≤ ε′ n for every cluster A′i . This is certainly true if ρ(i) = k because then Ck contains A′i completely. When ρ(i) 6= k, Cρ(i) cannot intersect any other cluster in A′ and it can get at most ε′ n vertices from W . If A′i \ Cρ(i) had more than ε′ n vertices, the algorithm could have made a better choice by taking the entire A′i while computing Cρ(i) . Such a choice can be made by taking Cρ(i) to be all the yet unclustered points that are inside a ball of 11 radius 2R centered at any point in A′i ; since A′i is contained inside a ball of radius R, it follows by the triangle inequality that A′i will be contained inside Cρ(i) . Next, we observe that when ρ is not a complete bijection, that is, i∗ < k, a cluster A′i with ρ(i) = k can have at most 2ε′ n vertices. Suppose not. As ρ is not a complete bijection, we have ρ(i∗ + 1) = . . . = ρ(k) = k as we argued above. This means that there is a cluster Cj with j < k, which does not intersect any cluster in A′ for the first time. Then, it has the only option of intersecting a cluster in A′ beyond the first time and/or intersect W . Since |A′i \ Cρ(i) | ≤ ε′ n for all i, Cj can have at most ε′ n + |W | ≤ 2ε′ n vertices. But, the algorithm could have made a better choice by selecting A′i while computing Cj because |A′i | > 2ε′ n by our assumption. We reach a contradiction. 
in A′ other than Since for every i < k, |A′i \ Cρ(i) | ≤ ε′ n and Cρ(i) cannot intersect any cluster P A′i , we have |A′i △ Cρ(i) | ≤ ε′ n + |W | ≤ 2ε′ n. We obtain that, if i∗ < k, we have i≤i∗ |A′i △ Cρ(i) | ≤ 2kε′ n. If i∗ = k, Ck contains A′i∗ entirely and may intersect other clusters in A′ for the second time and beyond. Then, we have |A′i∗ △ Ck | = kε′ n + |W | ≤ (k + 1)ε′ n. Using the bijectivity of ρ on the set {1, . . . , i∗ }, we have X |A| ≤ 3kε′ n + 2kε′ n + 2ε′ n ≤ 7kε′ n. |A′ △ C| ≤ 2kε′ n + (k + 1)ε′ n + |W | + A∈A′ \{A1 ,...,A′i∗ } We conclude that |A △ C| ≤ |A △ A′ | + |A′ △ C| < ε′ n + |A′ △ C| ≤ 8kε′ n = O(dmax k 4 + n d4max k 7 log3 n ) τ as required. 4 Implementation in practice The greedy algorithm in Figure 1 iterates over all n vertices to determine the best center. In practice, n can be much larger than k. So, instead of iterating over all n vertices, it is better to iterate over a number of vertices that primarily depends on k. 4.1 A faster algorithm In the algorithm from the previous section, in every iteration i ∈ {1, . . . , k}, we compute the value |Ni (u)| for all u ∈ Vi . We can speed up the algorithm by computing |Ni (u)| only for a randomly chosen subset of Vi , of size about Θ(k log n). This results in a faster randomized algorithm that runs in O(k 2 log n) time. It is summarized in Figure 3. A statement similar to Theorem 1.2 is proved in Theorem 4.1. Theorem 4.1. Let G be a graph with maximum degree dmax , let k ≥ 1, and τ > 0 such that λ3k+1 (LG ) > τ · λk (LG ). Let A be the k-partition of V (G) guaranteed by Theorem 1.1. Then, on input G, the algorithm in Figure 3 outputs a partition C such that, with high probability, |A △ C| = O(dmax k 4 + n 12 d4max k 7 log3 n ). τ Algorithm: Fast Spectral k-Clustering Input: Graph G Output: Partition C = {C1 , . . . , Ck } of V (G) Let ξ 1 , . . . , ξ k be the k first eigenvectors of G.  Let f : V (G) → Rk , where for any u ∈ V (G), f (u) = ξ 1 (u)(degG (u))−1/2 , . . . 
, ξk(u)(degG(u))^{-1/2}).
Let R = (d_max^{-1/2} − 2k√δ)/(12k√n) (δ from Lemma 3.1; d_max denotes the maximum degree).
V0 = V(G)
for i = 1, . . . , k − 1
    Sample uniformly at random, with repetition, a subset U_{i−1} ⊆ V_{i−1} with |U_{i−1}| = Θ(k log n).
    u_i = argmax_{u ∈ U_{i−1}} |ball(f(u), 2R) ∩ f(V_{i−1})| = argmax_{u ∈ U_{i−1}} |{w ∈ V_{i−1} : ‖f(u) − f(w)‖_2 ≤ 2R}|
    C_i = ball(f(u_i), 2R) ∩ V_{i−1}
    V_i = V_{i−1} \ C_i
C_k = V_{k−1}

Figure 3: A faster spectral k-clustering algorithm.

Proof. Let A′ = {A′1, . . . , A′k} be the collection of pairwise disjoint subsets of V(G), ρ : {1, . . . , k} → {1, . . . , k}, and ε′ as in the proof of Theorem 1.2. Since each set Ui is sampled uniformly at random, it follows that with high probability the following holds: for each i ∈ {1, . . . , k}, if |A′i| ≥ 2ε′n, then ρ(i) ≠ k. The rest of the argument is identical to the proof of Theorem 1.2.

4.2 Experimental evaluation

Results from our spectral k-clustering implementation are shown in Figure 4. Cluster assignments for graphs are shown as colorized nodes.¹ In the case where the graph comes from a triangulated surface, we extend the coloring to a small surface patch in the vicinity of the node. Each experiment includes a plot of the eigenvalues of the normalized Laplacian. A small rectangle on each plot highlights the corresponding spectral gap between k and k + 1.

The first row shows a partitioning of a graph whose vertices lie on five subsets, depicted as circles. Each subset is a random graph constructed by adding a large number of edges to a cycle. Additional edges are added randomly between cycles. By varying the relative edge densities we are able to produce graphs that have several large jumps in the spectrum. Here we obtain clusterings for k = 2 (left) and k = 5 (right), which coincide with the two prominent spectral gaps.

In the second row, we show examples where the input graph consists of the 1-skeleton of a 3D model.
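For illustration, the procedure of Figure 3 can be sketched in a few lines of Python. This is a minimal dense-matrix sketch, not our implementation: the function names are ours, the ball radius and the number of sampled candidates are passed in as parameters (whereas Figure 3 derives R from Lemma 3.1 and samples Θ(k log n) candidates per iteration), and a dense eigensolver stands in for whatever sparse solver one would use on large graphs.

```python
import numpy as np

def spectral_embedding(adj, k):
    """Map vertex u to f(u) = (xi_1(u), ..., xi_k(u)) / sqrt(deg(u)), where
    xi_1, ..., xi_k are the first k eigenvectors of the normalized Laplacian
    L = I - D^{-1/2} A D^{-1/2}. Assumes no isolated vertices."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues returned in ascending order
    return vecs[:, :k] * d_inv_sqrt[:, None]

def fast_greedy_k_cluster(f, k, radius, n_samples, rng):
    """Greedy ball-carving on the embedded points, as in Figure 3: in each of
    the first k-1 rounds, sample candidate centers among the unclustered
    vertices and keep the ball of radius 2*radius capturing the most of them;
    the last cluster collects whatever remains."""
    n = f.shape[0]
    labels = -np.ones(n, dtype=int)  # -1 marks unclustered vertices
    for i in range(k - 1):
        remaining = np.flatnonzero(labels < 0)
        cands = rng.choice(remaining, size=n_samples, replace=True)
        best_ball, best_size = None, -1
        for u in cands:
            dists = np.linalg.norm(f[remaining] - f[u], axis=1)
            ball = remaining[dists <= 2 * radius]
            if len(ball) > best_size:
                best_ball, best_size = ball, len(ball)
        labels[best_ball] = i
    labels[labels < 0] = k - 1  # C_k: all yet-unclustered vertices
    return labels
```

The sampling step only needs to hit each sufficiently large cluster once, which is what drawing Θ(k log n) candidates guarantees with high probability; this is exactly where the speedup over scanning all n vertices comes from.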
This graph has three components: two small ball-like surfaces and a larger component which resembles a union of three intersecting balls. The model surface is constricted at the interfaces between the balls, forming necks of varying sizes. Here, clusterings for k = 4 and k = 5 split the larger component along these interfaces, consistent with what is expected from spectral geometry. We demonstrate this effect once more in the third row with a clustering of a symmetric model for k = 8. The noisy, nested rings in the third row do not have a clear spectral gap. They partition well only when k is chosen appropriately, which we took to be 2.

We remark that the spectral gap in the above examples is generally smaller than the requirement in our theorems. However, our spectral clustering algorithm seems to produce meaningful results even in such examples. This suggests that stronger theoretical guarantees might be obtainable. We believe this is an interesting open problem.

¹ It may be beneficial to view the results in color on a high-resolution display.

Figure 4: Experimental results.

References

[Alo86] Noga Alon. Eigenvalues and expanders. Combinatorica, 6(2):83–96, 1986.
[AM85] Noga Alon and V. D. Milman. λ1, isoperimetric inequalities for graphs, and superconcentrators. J. Comb. Theory, Ser. B, 38(1):73–88, 1985.
[BLR10] Punyashloka Biswal, James R. Lee, and Satish Rao. Eigenvalue bounds, spectral partitioning, and metrical deformations via flows. J. ACM, 57(3), 2010.
[BS93] Stephen T. Barnard and Horst D. Simon. A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. In PPSC, pages 711–718, 1993.
[BXKS11] Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, and Aarti Singh. Noise thresholds for spectral clustering. In NIPS, pages 954–962, 2011.
[Che70] Jeff Cheeger. A lower bound for the smallest eigenvalue of the Laplacian. In Problems in Analysis (Papers dedicated to Salomon Bochner, 1969), pages 195–199.
Princeton Univ. Press, Princeton, NJ, 1970.
[CSZ94] Pak K. Chan, Martine D. F. Schlag, and Jason Y. Zien. Spectral k-way ratio-cut partitioning and clustering. IEEE Trans. on CAD of Integrated Circuits and Systems, 13(9):1088–1096, 1994.
[HK92] Lars W. Hagen and Andrew B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Trans. on CAD of Integrated Circuits and Systems, 11(9):1074–1085, 1992.
[Kel06] Jonathan A. Kelner. Spectral partitioning, eigenvalue bounds, and circle packings for graphs of bounded genus. SIAM J. Comput., 35(4):882–902, 2006.
[KLL+13] Tsz Chiu Kwok, Lap Chi Lau, Yin Tat Lee, Shayan Oveis Gharan, and Luca Trevisan. Improved Cheeger's inequality: Analysis of spectral partitioning algorithms through higher order spectral gap. In STOC, pages 11–20, 2013.
[KLPT11] Jonathan A. Kelner, James R. Lee, Gregory N. Price, and Shang-Hua Teng. Metric uniformization and spectral bounds for graphs. Geometric and Functional Analysis, 21(5):1117–1143, August 2011.
[KVV04] Ravi Kannan, Santosh Vempala, and Adrian Vetta. On clusterings: Good, bad and spectral. J. ACM, 51(3):497–515, 2004.
[LOT12] James R. Lee, Shayan Oveis Gharan, and Luca Trevisan. Multi-way spectral partitioning and higher-order Cheeger inequalities. In STOC, pages 1117–1130, 2012.
[LRTV12] Anand Louis, Prasad Raghavendra, Prasad Tetali, and Santosh Vempala. Many sparse cuts via higher eigenvalues. In STOC, pages 1131–1140, 2012.
[Mih89] Milena Mihail. Conductance and convergence of Markov chains: A combinatorial treatment of expanders. In FOCS, pages 526–531, 1989.
[NJW01] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, pages 849–856, 2001.
[OT14] Shayan Oveis Gharan and Luca Trevisan. Partitioning into expanders. In SODA, pages 1256–1266, 2014.
[PSWB92] Alex Pothen, Horst D. Simon, Lie Wang, and Stephen T. Barnard. Towards a fast implementation of spectral nested dissection. In SC, pages 42–51, 1992.
[SJ89] Alistair Sinclair and Mark Jerrum. Approximate counting, uniform generation and rapidly mixing Markov chains. Inf. Comput., 82(1):93–133, 1989.
[ST96a] Daniel A. Spielman and Shang-Hua Teng. Disk packings and planar separators. In Symposium on Computational Geometry, pages 349–358, 1996.
[ST96b] Daniel A. Spielman and Shang-Hua Teng. Spectral partitioning works: Planar graphs and finite element meshes. In FOCS, pages 96–105, 1996.
[Tan11] Mamoru Tanaka. Multi-way expansion constants and partitions of a graph. arXiv.org, December 2011.

Acknowledgment

The authors wish to thank James R. Lee for bringing to their attention a result from the latest version of [LOT12]. This work was partially supported by NSF grants CCF 1318595 and CCF 1423230.