Zeev Volkovich

    We discuss a new approach to proving the Lévy-Khintchine formula for V-infinitely divisible laws. Our proof is based on a description of conditionally positive definite functions as positive functionals on semi-normed algebras of suitable test functions. Within this framework we obtain integral representations of the common continuous positive definite functions and of the logarithms of the characteristic functions of ordinary infinitely divisible and V-infinitely divisible distributions.
    Estimating the number of clusters in a dataset is an ill-posed problem of essential relevance in cluster analysis. A group (cluster) is characterized by a relatively high similarity among its elements together with a relatively low similarity to elements of other groups. High stability of partitions obtained from the same data source is naturally interpreted as high consistency of the clustering process. Thus, the number of clusters that maximizes cluster stability can serve as an estimator for the "true" number of clusters. In the current paper we consider a probabilistic approach to this problem resting upon the Gaussian clusters model. We claim that sequences of clustered samples can be interpreted as Gaussian distributed i.i.d. samples drawn from the same source, if the number of clusters is chosen correctly. The closeness of the samples within the clusters can be measured by means of the p-values calculated for the appropriate Hotelling's T-square s...
    Cluster validation is the task of estimating the quality of a given partition of a data set into clusters of similar objects. Normally, a clustering algorithm requires the desired number of clusters as a parameter. We consider the cluster validation problem of determining the optimal "true" number of clusters. We adopt the stability testing approach, according to which repeated applications of a given clustering algorithm provide similar results when the specified number of clusters is correct. To implement this idea, we draw pairs of independent equal-sized samples, where one sample in each pair is drawn from the data source and the other is drawn from a noised version thereof. We then run the same clustering method on both samples in each pair and test the similarity between the obtained partitions using a general k-Nearest Neighbor Binomial model. These similarity measurements enable us to estimate the correct number of clusters. A series of numerical experiments on both synthet...
    In this paper we introduce a new heuristic approach for local clustering of protein-protein interaction networks (PPIN), which can be applied to very large graphs. The method is based on the idea of repeated bisections (rbr) proposed earlier for global clustering of PPIN. Each round of bisection is carried out by a multilevel graph clustering method implemented in the "Graculus" tool.
    Let N_n = {(T_nk, X_nk), k ≥ 1} be a Bernoulli point process (p.p.) on Z = (0,∞) × [0,∞). We discuss weak limit theorems for N_n as well as for the associated sum and extremal processes S_n(t) = {Σ X_nk : T_nk ≤ t} and Y_n(t) = {∨ X_nk : T_nk ≤ t} on an open subset of Z. Key Words and Phrases: point processes; increasing processes; weak
    We discuss weak limit theorems for a uniformly negligible triangular array (u.n.t.a.) in Z = [0,∞) × [0,∞)^d as well as for the sum and extremal processes associated with it on an open subset S. The complement of S turns out to be the explosion area of the limit Poisson point process. In order to prove our criterion for the weak convergence of the sum processes, we introduce and study sum processes over the explosion area. Finally, we generalize the model of u.n.t.a. to random sample size processes.
    The article presents the theoretical foundations of the algorithm for calculating the number of different genomes in the medium under study and of two algorithms for determining the presence of a particular (known) genome in this medium. The approach is based on the analysis of the compositional spectra of subsequently sequenced samples of the medium. The theoretical estimations required for the implementation of the algorithms are obtained.
    This paper contributes to the area of inverse problems with applications in medicine and other areas of science, engineering and the social sciences. Most expressive gene selection is a form of inverse problem in which we start from given gene expression levels for DNA microarrays and aim to find the causes that can lead to such occurrences. In this paper, an effective gene selection algorithm is proposed for evaluating the gene expression differences in the compared datasets based on a two-sample distribution-free test statistic. We focus on the analysis of gene activity in a certain disease using intrinsic information about the corresponding dataset structure. Further, the inverse problem of defining parameters for 'active' gene set selection is formulated as follows: given a set of 'active' genes, the quality of sub-diagnosis differentiation determines the parameters for gene selection. The algorithm was evaluated on the Acute Lymphoblastic Leukemia (ALL) Dataset.
    In this paper, an efficient gene selection algorithm is proposed, which employs a two-sample distribution-free test statistic for evaluating the gene expression differences in the compared datasets. The experimental results obtained for the Acute Lymphoblastic Leukemia (ALL) Dataset confirm the efficiency of the algorithm.
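The idea of scoring genes with a two-sample distribution-free statistic can be illustrated with a minimal sketch. The abstract does not name the statistic, so the two-sample Kolmogorov-Smirnov statistic is used here purely as a stand-in; the function names `ks_statistic` and `rank_genes` and the dictionary-based input layout are hypothetical and not taken from the paper.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs.

    Stand-in for the (unnamed) distribution-free statistic of the abstract.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max


def rank_genes(expr_a, expr_b):
    """Rank gene names by decreasing statistic between two groups of samples.

    expr_a, expr_b: dict mapping gene name -> list of expression values
    (hypothetical layout chosen for this sketch).
    """
    scores = {g: ks_statistic(expr_a[g], expr_b[g]) for g in expr_a if g in expr_b}
    return sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, genes whose expression distributions differ most between the two compared groups appear first in the returned list; any other two-sample distribution-free statistic could be substituted for `ks_statistic` without changing `rank_genes`.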
    In this paper, a method for the study of cluster stability is proposed. We draw pairs of samples from the data according to two sampling distributions. The first distribution corresponds to the high-density zones of the data-element distribution and is thus associated with the cluster cores. The second one, associated with the cluster margins, is related to the low-density zones. The samples are clustered and the two obtained partitions are compared. The partitions are considered to be consistent if the obtained clusters are similar. The resemblance is measured by the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples. We use the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis, this statistic is normally distributed. Thus, it can be expected that the true number of clusters corresponds to the statistic's empirical distribution which is closest to normal. Numerical experiments demonstrate the abi...
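The edge count underlying the Friedman and Rafsky statistic can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distances, builds the minimal spanning tree of one cluster with a simple O(n^2) Prim's algorithm, and counts the edges joining points from different samples. The names `mst_edges`, `cross_sample_edges` and `expected_cross_edges` are hypothetical.

```python
import math


def mst_edges(points):
    """Edges (i, j) of a Euclidean minimal spanning tree, via Prim's algorithm."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known distance from the tree to each point
    best_from = [-1] * n        # tree vertex realising that distance
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cross_sample_edges(points, sample_labels):
    """Number of MST edges whose endpoints come from different samples."""
    return sum(1 for i, j in mst_edges(points) if sample_labels[i] != sample_labels[j])


def expected_cross_edges(m, n):
    """Mean cross-sample edge count when sample labels are randomly mixed (sizes m, n)."""
    return 2.0 * m * n / (m + n)
```

In a stability study of the kind described, `cross_sample_edges` would be evaluated inside each cluster of a partition, with `sample_labels` marking which of the two drawn samples each point came from; well-mingled samples give counts near `expected_cross_edges`, while poorly mixed samples give much smaller counts.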
    The exon-intron structures of fungal genes are quite different from each other, and the evolution of such structures raises many questions. We tried to address some of these questions with an emphasis on methods of revealing evolutionary factors based on statistical analysis of gene exon-intron structures. Taking whole genomes of fungi, we went through all the protein-coding genes in each chromosome separately and calculated the portion of intron-containing genes and the average values of the net length of all the exons in a gene, the number of exons, and the average length of an exon. We found striking similarities between all of these average properties of chromosomes of the same species and significant differences between properties of chromosomes belonging to species of different divisions (Phyla) of the kingdom of Fungi. Comparing those chromosomal and genomic averages, we have developed a technique of clustering based on characteristics of the exon-intr...
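The per-chromosome averages listed in this abstract can be computed along the following lines. This is a hedged sketch under stated assumptions: each gene is given as a non-empty list of (start, end) exon intervals with the end coordinate exclusive, a gene counts as intron-containing when it has more than one exon, and the "average length of an exon" is taken as the mean over genes of each gene's own mean exon length. The function name `chromosome_summary` and the input layout are hypothetical.

```python
def chromosome_summary(genes):
    """Average exon-intron properties of the protein-coding genes of one chromosome.

    genes: list of genes; each gene is a non-empty list of (start, end) exon
    intervals with end exclusive (assumed layout for this sketch).
    """
    if not genes:
        raise ValueError("need at least one gene")
    n = len(genes)
    # Net exon length of each gene: sum of the lengths of all its exons.
    net_lengths = [sum(end - start for start, end in exons) for exons in genes]
    exon_counts = [len(exons) for exons in genes]
    return {
        "intron_containing_fraction": sum(c > 1 for c in exon_counts) / n,
        "mean_net_exon_length": sum(net_lengths) / n,
        "mean_exon_count": sum(exon_counts) / n,
        "mean_exon_length": sum(l / c for l, c in zip(net_lengths, exon_counts)) / n,
    }
```

The resulting dictionaries, one per chromosome, could then serve as feature vectors for comparing chromosomes within and between species.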
    A new representation for the characteristic function of the multivariate strictly geo-stable distribution is presented. The representation is appealing from a parametric viewpoint: its parameters have an intuitive probabilistic interpretation; and it is particularly useful for estimating the parameters of the geo-stable distribution.
    Massive determination of complete genome sequences has led to the development of different tools for genome comparison. Our approach is to compare genomes according to typical genomic distributions of a mathematical function that reflects a certain biological function. In this study we used comprehensive genome analysis of DNA curvature distributions before the starts and after the ends of prokaryotic genes to evaluate the
    Among the areas of data and text mining employed today in science, economy and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; e.g., the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters we estimate the stability of partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples. Specifically, we use the Friedman and Rafsky two-sample test statistic. The homogeneity hypothesis, of well-mingled samples within the clusters, leads to an asymptotically normal distribution of the considered statistic. Resti...