This paper presents the derivation of an unsupervised learning algorithm which enables the identification and visualization of latent structure within ensembles of high-dimensional data. This provides a linear projection of the data onto a lower-dimensional subspace to identify the characteristic structure of the observations' independent latent causes. The algorithm is shown to be a promising tool for unsupervised exploratory data analysis and data visualization. Experimental results confirm the attractiveness of this technique for exploratory data analysis, and an empirical comparison is made with the recently proposed generative topographic mapping (GTM) and standard principal component analysis (PCA). Based on standard probability density models, a generic nonlinearity is developed which allows both (1) identification and visualization of dichotomised clusters inherent in the observed data and (2) separation of sources with arbitrary distributions from mixtures whose dimensionality may be greater than the number of sources. The resulting algorithm is therefore also a generalized neural approach to independent component analysis (ICA), and it is considered a promising method for the analysis of real-world data that may consist of sub- and super-Gaussian components, such as biomedical signals.
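As a point of reference for the empirical comparison with PCA mentioned above, a minimal sketch of the linear-projection baseline on synthetic two-cluster data (the data, dimensions, and cluster offsets are illustrative assumptions, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(3)
# hypothetical high-dimensional data drawn from two well-separated latent clusters
X = np.vstack([rng.normal(loc=3.0, scale=1.0, size=(100, 10)),
               rng.normal(loc=-3.0, scale=1.0, size=(100, 10))])

# centre the data and project onto the top two principal components
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T        # 2-D coordinates for visualization
```

The leading component captures the direction separating the two clusters; nonlinear latent-variable methods such as the one described are evaluated against exactly this kind of linear baseline.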
A generic probabilistic framework for the unsupervised hierarchical clustering of large-scale, sparse, high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis, which have been called symmetric and asymmetric models. For text data specifically, both asymmetric and symmetric models based on multinomial and binomial distributions are most appropriate. An expectation-maximisation parameter estimation method is provided for all of these models. An experimental comparison of the models is presented for two extensive online document collections.
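The multinomial variant can be sketched as a standard EM iteration for a mixture of multinomials over term counts; the toy corpus, component count, smoothing constant, and seeded initialisation below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# toy term-count matrix: six documents over a four-word vocabulary,
# with two clearly topical groups (illustrative data, not from the paper)
X = np.array([[5, 4, 0, 1],
              [6, 3, 1, 0],
              [4, 5, 0, 0],
              [0, 1, 5, 4],
              [1, 0, 6, 3],
              [0, 0, 4, 5]], dtype=float)
K = 2
pi = np.full(K, 1.0 / K)                 # mixing proportions
theta = X[[0, 3]] + 1.0                  # seed each component from one document
theta /= theta.sum(axis=1, keepdims=True)

for _ in range(50):
    # E-step: posterior responsibilities from multinomial log-likelihoods
    log_r = np.log(pi) + X @ np.log(theta).T
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update mixing proportions and smoothed term distributions
    pi = r.mean(axis=0)
    theta = r.T @ X + 1e-3
    theta /= theta.sum(axis=1, keepdims=True)

labels = r.argmax(axis=1)
```

The hierarchical scheme in the paper stacks such mixtures level by level; this flat two-component version only illustrates the E- and M-steps on count data.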
An expectation-maximization algorithm for learning sparse and overcomplete data representations is presented. The proposed algorithm exploits a variational approximation to a range of heavy-tailed distributions whose limit is the Laplacian. A rigorous lower bound on the sparse prior distribution is derived, which enables the analytic marginalization of a lower bound on the data likelihood. This lower bound enables the development of an expectation-maximization algorithm for learning the overcomplete basis vectors and inferring the most probable basis coefficients.
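The flavour of the variational bound can be sketched for the inference step alone: the Laplacian prior exp(-|s|) admits a Gaussian lower bound, exp(-|s|) >= exp(-s^2/(2*xi) - xi/2), giving a tractable Gaussian posterior whose variational parameters xi are tightened iteratively. The basis, noise level, and dimensions below are illustrative assumptions, and the basis-learning M-step is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 8, 16                       # observed dimension < number of basis vectors
A = rng.normal(size=(m, k))
A /= np.linalg.norm(A, axis=0)     # unit-norm overcomplete basis (held fixed here)
s_true = np.zeros(k)
s_true[[2, 9]] = [1.5, -2.0]       # a sparse set of true coefficients
sigma2 = 1e-3                      # observation noise variance
x = A @ s_true + rng.normal(scale=np.sqrt(sigma2), size=m)

xi = np.ones(k)                    # variational parameters of the Gaussian bound
for _ in range(200):
    # under the bound the posterior over coefficients is Gaussian
    Sigma = np.linalg.inv(A.T @ A / sigma2 + np.diag(1.0 / xi))
    mu = Sigma @ A.T @ x / sigma2
    xi = np.sqrt(mu**2 + np.diag(Sigma))   # tighten the bound around the posterior
```

Coefficients the data does not support are driven toward zero by the shrinking variational variances, which is the sparsifying mechanism the bound provides.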
This article develops an extended independent component analysis algorithm for mixtures of arbitrary subgaussian and supergaussian sources. The gaussian mixture model of Pearson is employed in deriving a closed-form generic score function for strictly subgaussian sources. This is combined with the score function for a unimodal supergaussian density to provide a computationally simple yet powerful algorithm for performing independent component analysis on arbitrary mixtures of nongaussian sources.
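In the spirit of this algorithm (though not its exact Pearson-model derivation), a natural-gradient ICA update with a kurtosis-switched score function can be sketched; the mixing matrix, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# one sub-Gaussian (uniform) and one super-Gaussian (Laplacian) source
s = np.vstack([rng.uniform(-1.0, 1.0, n), rng.laplace(0.0, 1.0, n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])     # hypothetical mixing matrix
x = A @ s

# whiten the observed mixtures
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
x = (E / np.sqrt(d)) @ E.T @ x

W = np.eye(2)
lr = 0.01
for _ in range(3000):
    u = W @ x
    # the sign of the excess kurtosis switches the score between the
    # super-Gaussian (+tanh) and sub-Gaussian (-tanh) regimes
    kurt = np.mean(u**4, axis=1) - 3.0 * np.mean(u**2, axis=1) ** 2
    K = np.sign(kurt)
    W += lr * (np.eye(2) - (K[:, None] * np.tanh(u)) @ u.T / n - (u @ u.T) / n) @ W
y = W @ x                                  # recovered sources, up to scale and order
```

Estimating the kurtosis sign online is what lets a single update rule handle arbitrary mixtures of sub- and super-Gaussian sources.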
Papers by Mark Girolami