This paper presents the derivation of an unsupervised learning algorithm which enables the identification and visualization of latent structure within ensembles of high-dimensional data. This provides a linear projection of the data onto a lower-dimensional subspace to identify the characteristic structure of the observations' independent latent causes. The algorithm is shown to be a promising tool for unsupervised exploratory data analysis and data visualization. Experimental results confirm the attractiveness of this technique for exploratory data analysis, and an empirical comparison is made with the recently proposed generative topographic mapping (GTM) and standard principal component analysis (PCA). Based on standard probability density models, a generic nonlinearity is developed which allows both (1) identification and visualization of dichotomised clusters inherent in the observed data and (2) separation of sources with arbitrary distributions from mixtures whose dimensionality may be greater than the number of sources. The resulting algorithm is therefore also a generalized neural approach to independent component analysis (ICA), and it is considered a promising method for the analysis of real-world data that may consist of sub- and super-Gaussian components, such as biomedical signals.
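As a point of reference for the empirical comparison with PCA mentioned above, a minimal sketch of the linear-projection baseline on synthetic two-cluster data (the data, dimensions, and cluster offsets are illustrative assumptions, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(3)
# hypothetical high-dimensional data drawn from two well-separated latent clusters
X = np.vstack([rng.normal(loc=3.0, scale=1.0, size=(100, 10)),
               rng.normal(loc=-3.0, scale=1.0, size=(100, 10))])

# centre the data and project onto the top two principal components
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T        # 2-D coordinates for visualization
```

The leading component captures the direction separating the two clusters; nonlinear latent-variable methods such as the one described are evaluated against exactly this kind of linear baseline.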
A generic probabilistic framework for the unsupervised hierarchical clustering of large-scale, sparse, high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis, which have been called symmetric and asymmetric models. For text data specifically, both asymmetric and symmetric models based on multinomial and binomial distributions are most appropriate. An expectation-maximisation parameter estimation method is provided for all of these models. An experimental comparison of the models is presented for two extensive online document collections.
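The multinomial variant can be sketched as a standard EM iteration for a mixture of multinomials over term counts; the toy corpus, component count, smoothing constant, and seeded initialisation below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# toy term-count matrix: six documents over a four-word vocabulary,
# with two clearly topical groups (illustrative data, not from the paper)
X = np.array([[5, 4, 0, 1],
              [6, 3, 1, 0],
              [4, 5, 0, 0],
              [0, 1, 5, 4],
              [1, 0, 6, 3],
              [0, 0, 4, 5]], dtype=float)
K = 2
pi = np.full(K, 1.0 / K)                 # mixing proportions
theta = X[[0, 3]] + 1.0                  # seed each component from one document
theta /= theta.sum(axis=1, keepdims=True)

for _ in range(50):
    # E-step: posterior responsibilities from multinomial log-likelihoods
    log_r = np.log(pi) + X @ np.log(theta).T
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update mixing proportions and smoothed term distributions
    pi = r.mean(axis=0)
    theta = r.T @ X + 1e-3
    theta /= theta.sum(axis=1, keepdims=True)

labels = r.argmax(axis=1)
```

The hierarchical scheme in the paper stacks such mixtures level by level; this flat two-component version only illustrates the E- and M-steps on count data.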
An expectation-maximization algorithm for learning sparse and overcomplete data representations is presented. The proposed algorithm exploits a variational approximation to a range of heavy-tailed distributions whose limit is the Laplacian. A rigorous lower bound on the sparse prior distribution is derived, which enables the analytic marginalization of a lower bound on the data likelihood. This lower bound enables the development of an expectation-maximization algorithm for learning the overcomplete basis vectors and inferring the most probable basis coefficients.
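The flavour of the variational bound can be sketched for the inference step alone: the Laplacian prior exp(-|s|) admits a Gaussian lower bound, exp(-|s|) >= exp(-s^2/(2*xi) - xi/2), giving a tractable Gaussian posterior whose variational parameters xi are tightened iteratively. The basis, noise level, and dimensions below are illustrative assumptions, and the basis-learning M-step is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 8, 16                       # observed dimension < number of basis vectors
A = rng.normal(size=(m, k))
A /= np.linalg.norm(A, axis=0)     # unit-norm overcomplete basis (held fixed here)
s_true = np.zeros(k)
s_true[[2, 9]] = [1.5, -2.0]       # a sparse set of true coefficients
sigma2 = 1e-3                      # observation noise variance
x = A @ s_true + rng.normal(scale=np.sqrt(sigma2), size=m)

xi = np.ones(k)                    # variational parameters of the Gaussian bound
for _ in range(200):
    # under the bound the posterior over coefficients is Gaussian
    Sigma = np.linalg.inv(A.T @ A / sigma2 + np.diag(1.0 / xi))
    mu = Sigma @ A.T @ x / sigma2
    xi = np.sqrt(mu**2 + np.diag(Sigma))   # tighten the bound around the posterior
```

Coefficients the data does not support are driven toward zero by the shrinking variational variances, which is the sparsifying mechanism the bound provides.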
This article develops an extended independent component analysis algorithm for mixtures of arbitrary subgaussian and supergaussian sources. The gaussian mixture model of Pearson is employed in deriving a closed-form generic score function for strictly subgaussian sources. This is combined with the score function for a unimodal supergaussian density to provide a computationally simple yet powerful algorithm for performing independent component analysis on arbitrary mixtures of nongaussian sources.
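In the spirit of this algorithm (though not its exact Pearson-model derivation), a natural-gradient ICA update with a kurtosis-switched score function can be sketched; the mixing matrix, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# one sub-Gaussian (uniform) and one super-Gaussian (Laplacian) source
s = np.vstack([rng.uniform(-1.0, 1.0, n), rng.laplace(0.0, 1.0, n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])     # hypothetical mixing matrix
x = A @ s

# whiten the observed mixtures
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
x = (E / np.sqrt(d)) @ E.T @ x

W = np.eye(2)
lr = 0.01
for _ in range(3000):
    u = W @ x
    # the sign of the excess kurtosis switches the score between the
    # super-Gaussian (+tanh) and sub-Gaussian (-tanh) regimes
    kurt = np.mean(u**4, axis=1) - 3.0 * np.mean(u**2, axis=1) ** 2
    K = np.sign(kurt)
    W += lr * (np.eye(2) - (K[:, None] * np.tanh(u)) @ u.T / n - (u @ u.T) / n) @ W
y = W @ x                                  # recovered sources, up to scale and order
```

Estimating the kurtosis sign online is what lets a single update rule handle arbitrary mixtures of sub- and super-Gaussian sources.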
Papers by Mark Girolami