Andrew Morris

Papers

On and off units detect information bottle-necks for speech recognition

2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 1991

We show how the objective measure of mutual information (MI) can be used to confirm that information for identifying Place of Articulation (PoA) for plosives in Vowel-Plosive-Vowel (VPV) context is concentrated at both ON (burst onset) and OFF (voicing termination) events in the acoustic spectrogram and is predominantly dynamic rather than static. We then run recognition tests to show that single-speaker plosive PoA in VPV context can be reliably identified from just one pair of short-term spectra centred at either ON or OFF position.
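As a rough illustration of the kind of measure this abstract refers to, the sketch below estimates mutual information between a discretised acoustic feature and a class label from co-occurrence counts. It is not the paper's method: the function name `mutual_information`, the plug-in count-based estimator, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from (x, y) observations.

    x could be a quantised spectral feature and y a PoA label; both
    are assumptions here, any hashable values will do.
    """
    n = len(pairs)
    joint = Counter(pairs)                # counts of each (x, y) pair
    px = Counter(x for x, _ in pairs)     # marginal counts of x
    py = Counter(y for _, y in pairs)     # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (px[x] * py[y]))
    return mi

# Toy data: the feature value fully determines the label.
dependent = [(0, "b"), (1, "d")] * 50
# Toy data: feature value and label are unrelated.
independent = [(0, "b"), (0, "d"), (1, "b"), (1, "d")] * 25

print(mutual_information(dependent))
print(mutual_information(independent))
```

With two equally likely labels, the fully dependent toy data should give about 1 bit and the independent data about 0 bits. Estimates from small real samples are biased upward, so a practical study would need far more data or a bias correction.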

Phoneme transition detection and broad classification using a simple model based on the function of onset detector cells found in the cochlear nucleus

4th European Conference on Speech Communication and Technology (Eurospeech 1995), 1995

We present a simple model for onset and offset detection which is based on the broad functionality of onset cells in the cochlear nucleus, the first auditory brain centre. We show that the clusters of transition events detected by this model in the spectrogram can be used to both locate and broad-classify phoneme transitions. A preliminary Isolated Word Recognition (IWR) system is described which bases recognition solely on evidence from detected transition clusters together with short spectral samples taken from each cluster centre. Recognition performance is compared with that for two other IWR systems of a similar complexity which process the whole signal uniformly.

An information theoretical investigation into the distribution of phonetic information across the auditory spectrogram

Computer Speech & Language, 1993

From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition

Interspeech 2004, 2004

Adaptive ML-weighting in multi-band recombination of Gaussian mixture ASR

2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001

SecurePhone: A mobile phone with biometric authentication and e-signature support for dealing secure transactions on the fly

Frame Based Features

Lecture Notes in Computer Science

Abstract. In this chapter we will discuss feature extraction methods for speaker classification. ...

Enhancing Speaker Discrimination at the Feature Level

Lecture Notes in Computer Science

C. Müller (Ed.): Speaker Classification I, LNAI 4343, pp. 260-277, 2007. © Springer-Verlag Berlin...

Comparison of HMM experts with MLP experts in the full combination multi-band …

Sixth International Conference on Spoken Language …

In this paper we apply the Full Combination (FC) multi-band approach, which has originally been i...

Automatic phoneme segmentation with relaxed textual constraints

Proc. of ELREC, Marrakech, Morocco, May 1, 2008

Speech synthesis by unit selection requires the segmentation of a large, single-speaker, high-quality recording. Automatic speech recognition techniques, e.g. Hidden Markov Models (HMM), can be optimised for maximum segmentation accuracy. This paper presents the results of tuning such a phoneme segmentation system. Firstly, using no text transcription, the design of an HMM phoneme recogniser is optimised subject to a phoneme bigram language model. Optimal performance is obtained with triphone models, 7 states ...

Nonintrusive multibiometrics on a mobile device: a comparison of fusion techniques

SPIE Proceedings, 2006

Morris 2000 icslp

GMM based clustering and speaker separability in the Timit speech database

by Andrew Morris, Dalei Wu, and Jacques Koreman

IEICE TRANS. FUNDAMENTALS/COMMUN./ELECTRON./INF. & SYST., VOL. E85-A/B/C/D, No.1, 2005

Speaker recognition on the 630-speaker Timit speech database, using maximum probability selection with a simple Gaussian Mixture Model (GMM) for the data distribution for each speaker, gives above 99% correct recognition. In contrast, a powerful classifier such as a Multi-Layer Perceptron (MLP), trained to estimate speaker probabilities, even on a small subset of speakers, often performs no better than random selection. We hypothesise two effects which could combine to produce this situation. MLPs do badly because the acoustic feature data is primarily clustered around phonemes, so that speaker classes are highly fragmented and interspersed. In contrast, GMMs model speaker data distributions well because variation within the phonetic cluster identified by each Gaussian is primarily due to speaker variation, with the result that when speaker models are trained by adapting only the means from a multi-speaker world model, the resulting GMMs are highly discriminative between speakers. In this article we analyse the distribution of speech and speaker information, both overall and within the cluster identified by each Gaussian in a GMM tuned for speaker recognition on Timit. We show that the results of this analysis support the above hypotheses, and then discuss ways in which the enhanced speaker separability within each Gaussian cluster could be used to harness the discriminative power of MLPs to provide feature data enhancement and improved speaker identification.
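The "maximum probability selection" mentioned in this abstract can be sketched as follows: score the test frames against each speaker's GMM and pick the speaker with the highest total log-likelihood. This is a minimal sketch, not the authors' system. It assumes diagonal-covariance GMMs stored as lists of `(weight, mean, variance)` tuples, and the names `log_gauss_diag`, `gmm_log_likelihood`, `identify_speaker` and the toy models are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log sum_k w_k N(x | mean_k, var_k).

    gmm is a list of (weight, mean, var) tuples (assumed layout).
    """
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total

def identify_speaker(frames, speaker_gmms):
    """Return the speaker id whose GMM gives the frames the highest likelihood."""
    return max(speaker_gmms,
               key=lambda s: gmm_log_likelihood(frames, speaker_gmms[s]))

# Toy two-speaker example with 2-D features (illustrative values only).
speaker_gmms = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])],
    "spk_b": [(0.5, [1.0, -1.0], [1.0, 1.0]), (0.5, [4.0, 2.0], [1.0, 1.0])],
}
test_frames = [[0.1, -0.1], [2.9, 3.2], [0.0, 0.2]]
print(identify_speaker(test_frames, speaker_gmms))
```

In the toy models the two speakers share weights and variances and differ only in their component means, loosely mirroring the abstract's description of speaker models obtained by adapting only the means of a world model. The adaptation step itself is not shown.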

Submitted for ICASSP’98 SOME SOLUTIONS TO THE MISSING FEATURE PROBLEM IN DATA CLASSIFICATION, WITH APPLICATION TO NOISE ROBUST ASR

We address the theoretical and practical issues involved in ASR when some of the observation data for the target signal is masked by other signals. Techniques discussed range from simple missing data imputation to Bayesian optimal classification. We have developed the Bayesian approach because this allows prior knowledge to be incorporated naturally into the recognition process, thereby permitting us to go beyond the simple “integrate over missing data” or “marginals” approach reported elsewhere, which we show to be inadequate for dealing with realistic patterns of missing data. After deriving general techniques for recognition with missing data, these techniques are formulated in the context of an HMM-based CSR system. This scheme is evaluated under both random and more realistic patterns of missing data, with speech from the DARPA RM corpus and noise from NOISEX. We find that a key problem in real world recognition with missing data is that efficient ASR requires data vector com...

Global features for rapid identity verification with dynamic biometric data

Interspeech 2007, 2007

... andrew.morris@spinvox.com, jacques.koreman@hf.ntnu.no, bao.ly_van@int-evry.fr, {harin.sellahe...

An information theoretic measure of sequence recognition performance

PAPER Special Section/Issue on Corpus-Based Speech Technologies GMM based clustering and speaker separability in the Timit speech database

ABSTRACT

MLP Internal Representation as Discriminative Features for Improved Speaker Recognition

Lecture Notes in Computer Science, 2006

Multimodal person authentication on a smartphone under realistic conditions

SPIE Proceedings, 2006

Recent advances in the multi-stream HMM/ANN hybrid approach to noise robust ASR

Computer Speech & Language, 2005
