The SpeechDat project aims to produce telephone speech databases for training and testing speech recognition and speaker verification devices. The main features are: coverage of applications (application-oriented words, phonetically rich sentences, spontaneous utterances, speaker verification), coverage of the 11 official European languages and variants, coverage of speaking styles (commands, carefully pronounced and spontaneous speech), coverage of
An artificial neural network has been trained with the error back-propagation technique to recognise phonemes and words. The speech material was recorded by a male Swedish talker and labelled by a phonetician. There were 38 output nodes corresponding to Swedish phonemes. Introducing coarticulation information by adding simple recurrency to the net is shown to be more effective than expanding the
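"Simple recurrency" most likely refers to an Elman-style connection, in which the hidden activations from the previous frame are fed back as extra inputs, so each frame's decision can draw on preceding context. The sketch below assumes that reading and shows only the forward pass; the class name, the 16-dimensional input frames, the hidden size and the random initialisation are hypothetical, biases are omitted, and back-propagation training is left out. Only the 38 output nodes come from the abstract.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class SimpleRecurrentNet:
    """Hypothetical Elman-style net: the previous hidden state is fed back as input."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a real system would learn these by back-propagation.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_in = init(n_hidden, n_in)        # input frame -> hidden
        self.w_rec = init(n_hidden, n_hidden)   # previous hidden -> hidden (the recurrency)
        self.w_out = init(n_out, n_hidden)      # hidden -> phoneme output nodes
        self.n_hidden = n_hidden

    def run(self, frames):
        """Return one vector of output activations per input frame."""
        h = [0.0] * self.n_hidden  # context starts empty before the first frame
        outputs = []
        for x in frames:
            # The right-hand side is evaluated with the previous h before reassignment.
            h = [
                sigmoid(
                    sum(w * xi for w, xi in zip(self.w_in[k], x))
                    + sum(w * hj for w, hj in zip(self.w_rec[k], h))
                )
                for k in range(self.n_hidden)
            ]
            outputs.append([sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in self.w_out])
        return outputs
```

Because the hidden state carries over, feeding the same frame twice yields different outputs the second time; that context sensitivity is what lets such a net model coarticulation without widening the input window.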
Two artificial neural networks have been trained to recognise phonemes in continuous speech: multi-layer perceptron (MLP) nets and probabilistic neural networks (PNN). The speech material was recorded by one male Swedish speaker, and the sentences were phonetically labelled. Fifty sentences were used for training and another fifty for testing. Both networks had a single hidden layer and 38
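The abstract does not describe the PNN further, but the textbook form (Specht's probabilistic neural network) is a Parzen-window classifier. Each training frame becomes a Gaussian kernel in the pattern (hidden) layer, kernel responses are averaged per class, and the class with the largest average wins. The following is a minimal sketch under that assumption; the smoothing parameter `sigma`, the function name `pnn_classify` and the toy 2-D features are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict


def pnn_classify(x, train, sigma=1.0):
    """Classify feature vector x with a Gaussian-kernel PNN.

    train: list of (feature_vector, label) pairs; each pair is one pattern unit.
    sigma: kernel width (hypothetical smoothing parameter).
    """
    scores = defaultdict(float)
    counts = defaultdict(int)
    for vec, label in train:
        # Pattern layer: Gaussian response to the squared Euclidean distance.
        d2 = sum((xi - vi) ** 2 for xi, vi in zip(x, vec))
        scores[label] += math.exp(-d2 / (2.0 * sigma ** 2))
        counts[label] += 1
    # Summation layer: average response per class; output layer: pick the maximum.
    return max(scores, key=lambda lab: scores[lab] / counts[lab])
```

Unlike the MLP, this classifier has no weights to optimise: "training" amounts to storing the labelled frames, and the computational cost moves to classification time.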
International Conference on Acoustics, Speech, and Signal Processing, 1990
The orthographic structure of Swedish words was used to predict word class with a connectionist approach. This technique can be used to aid syntactic processing within a text-to-speech system. The error back-propagation technique was used as the connectionist learning procedure. A corpus of the 10,000 most frequent Swedish words was used for training and testing the system. The results indicate
An artificial neural network has been trained to recognize phonemes using the error back-propagation technique. First, a coarse feature network was trained to extract seven quasi-phonetic features from the spectral frames of a Bark-scaled filter bank. The outputs of this net and the spectral outputs of the filter bank were input to a phoneme recognition net. The coarse
The EU-funded SpeechDat project was initiated to create large-scale speech databases for the development of voice-operated telecommunication services. This paper deals with the design of two such Swedish resources: 5000 speakers recorded over the fixed telephone network and 1000 speakers over the mobile network. Speakers were balanced according to gender, age and dialect. We also report on experiences from speaker recruitment. A "snowball" method, in which people gave addresses of friends according to a chain-letter principle, was shown to be effective. Females were, in general, more cooperative than males. However, using the Internet for recruitment favored young males. Statistics on speaker distribution are presented. Results regarding orthographic labeling of pronunciation, pronunciation errors and non-speech events are also included. The length of the longest word in a read sentence is shown to be directly correlated with mispronunciations and word repetitions.
ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1986
Corpora of approximately 10,000 words have been examined in five languages: Swedish, English, German, Italian, and French. A 2-class and a 6-class "cohort" classification have been defined, and the number of cohorts, the number of unique cohorts, and their maximum and expected sizes have been calculated. The discriminatory ability of stress is also considered.
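To illustrate the statistics listed, the sketch below groups words into cohorts by their broad-class pattern and reports the number of cohorts, the number of unique (single-word) cohorts, the largest cohort, and an expected cohort size. Several details are assumptions rather than the paper's definitions: orthographic letters stand in for phonemes, the 2-class split is taken to be vowel versus consonant, and the expected size is computed as the sum of n_i^2 divided by N, i.e. the average size of the cohort that a randomly chosen word falls into. The names `two_class_pattern` and `cohort_stats` are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for the phonemic vowel class (letters, not phonemes).
VOWELS = set("aeiouyåäö")


def two_class_pattern(word):
    """Map a word to its consonant/vowel pattern, e.g. 'katt' -> 'CVCC'."""
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower())


def cohort_stats(words, classify=two_class_pattern):
    """Group distinct words by broad-class pattern and summarise the cohorts."""
    sizes = Counter(classify(w) for w in set(words))
    counts = list(sizes.values())
    n_words = sum(counts)
    return {
        "cohorts": len(counts),
        "unique": sum(1 for c in counts if c == 1),  # cohorts that identify a single word
        "max_size": max(counts),
        # Expected size of the cohort containing a randomly drawn word.
        "expected_size": sum(c * c for c in counts) / n_words,
    }
```

For the words "katt", "hund", "mus", "ko", "get" and "fisk", the patterns are CVCC (three words), CVC (two) and CV (one). This gives 3 cohorts, 1 unique cohort, a maximum size of 3 and an expected size of (9 + 4 + 1) / 6, about 2.33. A 6-class scheme would simply use a finer `classify` function.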
The object of the Olga project is to develop an interactive 3D animated talking agent. A futuristic application scenario is interactive digital TV, where the Olga agent would guide naive users through the various services available on the network. The current application is a consumer information service for microwave ovens. Olga required the
An artificial neural network has been trained to recognize phonemes using the error back-propagation technique. First, a coarse feature network was trained to extract seven quasi-phonetic features from the spectral frames of a Bark-scaled filter bank. The outputs of this net and the spectral outputs of the filter bank were input to a phoneme recognition net. The coarse features were
ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1986
A technique of nonlinear frequency warping has been investigated for recognition of Swedish vowels. A frequency warp between two spectra is computed using a standard dynamic programming algorithm. The frequency distance, defined as the area between the obtained warping function and the diagonal, contributes to the spectral distance. The distance between two spectra is a weighted sum of the warped amplitude distance and the frequency distance. By changing two weights, we get a gradual shift between non-warped amplitude distance, warped amplitude distance, and frequency distance. In recognition experiments on natural and synthetic vowel spectra, a metric combining the frequency and amplitude distances gave better results than using only amplitude or frequency deviation. Analysis of the results for the synthetic vowels shows a reduced sensitivity to voice source and pitch variation. For the natural vowels, the recognition improvement is larger for the male and female speakers separately than for the combined groups.
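The distance computation described above can be sketched in Python under several assumptions. Spectra are taken to be equal-length lists of channel amplitudes. The warp is a DTW-style alignment with steps (1,0), (0,1) and (1,1) and absolute amplitude difference as the local cost. The frequency distance approximates the area between the warping path and the diagonal by the mean |i - j| along the path. `w_amp` and `w_freq` are hypothetical names for the two weights, and all function names are illustrative only.

```python
def warp_path(a, b):
    """Align two spectra along the frequency axis by dynamic programming.

    Returns the optimal list of (i, j) channel pairs from (0, 0) to the last channels.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            local = abs(a[i] - b[j])
            if i == 0 and j == 0:
                cost[i][j] = local
                continue
            best = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = local + best
    # Backtrack; ties favour the predecessor with the smallest indices (the diagonal).
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


def unwarped_distance(a, b):
    """Mean absolute amplitude difference channel by channel (no warping)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def spectral_distance(a, b, w_amp=1.0, w_freq=0.0):
    """Weighted sum of warped amplitude distance and frequency distance."""
    path = warp_path(a, b)
    amp = sum(abs(a[i] - b[j]) for i, j in path) / len(path)
    freq = sum(abs(i - j) for i, j in path) / len(path)  # area-to-diagonal proxy
    return w_amp * amp + w_freq * freq
```

Consider a = [0, 1, 0, 0] and b = [0, 0, 1, 0], the same peak shifted by one channel. The non-warped amplitude distance is 0.5, while the warp absorbs the shift completely: the warped amplitude distance is 0 and the frequency distance is 0.6, because three of the five path points lie one channel off the diagonal. Varying the two weights trades off how much such a shift is penalised.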
Includes comments by Stefanie Seneff and Nelson Kiang.
Studies of expressive speech have shown that discrete emotions such as anger, fear, joy, and sadness can be accurately communicated, even across cultures, and that each emotion is associated with reasonably specific acoustic characteristics [8]. However, most previous research has been conducted on acted emotions. These certainly have something in common with naturally occurring emotions but may also be more intense