The LCS of two rooted, ordered, and labeled trees F and G is the largest forest that can be obtained from both trees by deleting nodes. We present algorithms for computing tree LCS which exploit the sparsity inherent to the tree LCS problem. Assuming G is smaller than F, our first algorithm runs in time $O(r\cdot {\rm height}(F) \cdot {\rm height}(G)\cdot \lg\lg |G|)$, where r is the number of pairs (v ∈ F, w ∈ G) such that v and w have the same label. Our second algorithm runs in time $O(L r \lg r \cdot \lg\lg|G|)$, where L is the size of the LCS of F and G. For this algorithm we present a novel three-dimensional alignment graph. Our third algorithm is intended for the constrained variant of the problem in which only nodes with zero or one children can be deleted. For this case we obtain an $O(r h \lg \lg|G|)$ time algorithm, where h = height(F) + height(G).
The following probabilistic process models the generation of noisy clustering data: Clusters correspond to disjoint sets of vertices in a graph. Every two vertices from the same set are connected by an edge with probability p, and every two vertices from different sets are connected by an edge with probability r < p. The goal of the clustering problem is to reconstruct the clusters from the graph. We give algorithms that solve this problem with high probability. Compared to previous studies, our algorithms have lower time complexity and apply to a wider range of parameters. In particular, our algorithms can handle O(√n / log n) clusters in an n-vertex graph, while all previous algorithms require that the number of clusters is constant.
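The generative process described in this abstract can be sketched as follows. This is only an illustration of the random model, not of the reconstruction algorithms; the function name `planted_partition`, its parameters, and the use of integer vertex ids are assumptions made for the example.

```python
import random
from itertools import combinations


def planted_partition(clusters, p, r, seed=None):
    """Sample an undirected graph from the noisy clustering model.

    clusters: list of disjoint lists of integer vertex ids (hypothetical input format).
    p: edge probability for two vertices in the same cluster.
    r: edge probability for two vertices in different clusters (r < p).
    Returns a set of edges (u, v) with u < v.
    """
    rng = random.Random(seed)
    # Map each vertex to the index of its cluster.
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    edges = set()
    for u, v in combinations(sorted(label), 2):
        prob = p if label[u] == label[v] else r
        if rng.random() < prob:
            edges.add((u, v))
    return edges


if __name__ == "__main__":
    # Two clusters; with p = 1 and r = 0 the sample is noise-free.
    print(sorted(planted_partition([[0, 1, 2], [3, 4]], p=1.0, r=0.0)))
```

With p = 1 and r = 0 the sample is exactly a disjoint union of cliques; lowering p or raising r adds the noise that a reconstruction algorithm has to tolerate.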
We study the subtree isomorphism problem: Given trees H and G, find a subtree of G which is isomorphic to H or decide that there is no such subtree. We give an $O([k^{1.8}/\log k]\, n)$ time algorithm for this problem, where k and n are the number of vertices in H and G respectively. This improves over the $O(k^{1.5} n)$ algorithms of Chung (1987) and Matula (1978). We also give a randomized (Las Vegas) $O(\min(k^{1.45} n, k n^{1.43}))$-time algorithm for the decision problem.
In a clustering problem one has to partition a set of elements into homogeneous and well-separated subsets. From a graph theoretic point of view, a cluster graph is a vertex-disjoint union of cliques. The clustering problem is the task of making the fewest changes to the edge set of an input graph so that it becomes a cluster graph. We study the complexity of three variants of the problem. In the Cluster Completion variant, edges can only be added. In Cluster Deletion, edges can only be deleted. In Cluster Editing, both edge additions and edge deletions are allowed. We also study these variants when the desired solution must contain a prespecified number of clusters. We show that Cluster Editing is NP-complete, Cluster Deletion is NP-hard to approximate to within some constant factor, and Cluster Completion is polynomial. When the desired solution must contain exactly p clusters, we show that Cluster Editing is NP-complete for every p ≥ 2; Cluster Deletion is polynomial for p = 2 but NP-complete for p > 2; and Cluster Completion is polynomial for any p. We also give a constant factor approximation algorithm for Cluster Editing when p = 2.
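As a small illustration of the definition used in this abstract, the sketch below tests whether a graph is a cluster graph, that is, whether every connected component is a clique. It does not implement any of the editing algorithms. The name `is_cluster_graph` and the adjacency format (a dict mapping every vertex to the set of its neighbours, symmetric for undirected graphs) are assumptions made for the example.

```python
def is_cluster_graph(adj):
    """Return True if the undirected graph is a vertex-disjoint union of cliques.

    adj: dict mapping each vertex to the set of its neighbours
         (assumed symmetric, with every vertex present as a key).
    """
    seen = set()
    for start in adj:
        if start in seen:
            continue
        # Collect the connected component of `start` by depth-first search.
        component = {start}
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        # In a clique, every vertex is adjacent to all other component members.
        if any(len(adj[v] - {v}) != len(component) - 1 for v in component):
            return False
        seen |= component
    return True


if __name__ == "__main__":
    triangle_plus_edge = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
    path = {0: {1}, 1: {0, 2}, 2: {1}}
    print(is_cluster_graph(triangle_plus_edge), is_cluster_graph(path))
```

The three-vertex path is the smallest obstruction: it can be repaired by adding one edge (Completion) or by deleting one edge (Deletion).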
The Longest Common Subsequence (LCS) is a well-studied problem, having a wide range of implementations. Its motivation is in comparing strings. It has long been of interest to devise a similar measure for comparing higher dimensional objects and more complex structures. In this paper we give what is, to our knowledge, the first inherently multi-dimensional definition of LCS. We discuss the Longest Common Substructure of two matrices and the Longest Common Subtree problem for multiple trees, including a constrained version. Neither problem can be solved by a natural extension of the original LCS solution. We investigate the tractability of the above problems. For the first we prove $\mathcal{NP}$-completeness. For the latter, $\mathcal{NP}$-hardness holds for two general unordered trees and for k trees in the constrained LCS.
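For reference, the classical string LCS that this abstract takes as its starting point is usually computed with the textbook dynamic program sketched below. This is the standard one-dimensional method, not the paper's multi-dimensional definition; the function name `lcs_length` is chosen for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Textbook dynamic program, O(len(a) * len(b)) time, keeping one row at a time.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best solution for the two shorter prefixes.
                cur.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of one prefix or the other.
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))
```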
Real Scaled Matching refers to the problem of finding all locations in the text where the pattern, proportionally enlarged according to an arbitrary real-sized scale, appears. Real scaled matching is an important problem that was originally inspired by Computer Vision. In this paper, we present a new, more precise and realistic definition for one dimensional real scaled matching, and an efficient algorithm for solving this problem. For a text of length n and a pattern of length m, the algorithm runs in time $O(n\log m + \sqrt{n} m^{3/2}\sqrt{\log m})$.
Two equal length strings, or two equal sized two dimensional texts, parameterize match (p-match) if there is a one-to-one mapping (relative to the alphabet) of their characters. Two dimensional parameterized matching is the task of finding all m × m substrings of an n × n text that p-match an m × m pattern. This models, for example, searching for color images under changes of the color map. We present an algorithm that solves the two dimensional parameterized matching problem in $O(n^2 + m^{2.5}\cdot \mathrm{polylog}(m))$ time.
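The p-match relation defined in this abstract can be checked directly for two strings, as in the sketch below. It only illustrates the definition in one dimension and is not the two dimensional matching algorithm. The name `p_match` is chosen for the example, and the sketch assumes every symbol may be renamed (no fixed, non-parameter symbols).

```python
def p_match(s, t):
    """Return True if equal-length strings s and t parameterize match,
    i.e. some one-to-one renaming of characters turns s into t.

    Assumption: every symbol is a parameter and may be renamed.
    """
    if len(s) != len(t):
        return False
    forward = {}   # character of s -> character of t
    backward = {}  # character of t -> character of s
    for a, b in zip(s, t):
        # setdefault records the first pairing and returns the stored one after that.
        if forward.setdefault(a, b) != b:
            return False
        if backward.setdefault(b, a) != a:
            return False
    return True


if __name__ == "__main__":
    print(p_match("abab", "cdcd"), p_match("ab", "cc"))
```

Keeping both the forward and the backward map is what enforces that the renaming is one-to-one: "ab" and "cc" fail because two distinct characters would map to the same one.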
Papers by Dekel Tsur