Chen et al., 1998 - Google Patents

IBM's lvcsr system for transcription of broadcast news used in the 1997 hub4 english evaluation

Chen et al., 1998

Document ID: 9784258388485805477
Author: Chen S; Gales M; Gopalakrishnan P; Gopinath R; Printz H; Kanevsky D; Olsen P; Polymenakos L
Publication year: 1998
Publication venue: Proceedings of the Speech Recognition Workshop

External Links

Cited by

Snippet

This paper describes IBM's large vocabulary continuous speech recognition (LVCSR) system used in the 1997 Hub4 English evaluation. It focusses on extensions and improvements to the system used in the 1996 evaluation. The recognizer uses an additional …

Continue reading at citeseerx.ist.psu.edu (PDF) (other versions)

238000011156 evaluation 0 title abstract description 46

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/085—Methods for reducing search complexity, pruning
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation

Similar Documents

Publication	Publication Date	Title
US11164566B2 (en)	2021-11-02	Dialect-specific acoustic language modeling and speech recognition
Chen et al.	2006	Advances in speech transcription at IBM under the DARPA EARS program
US7457745B2 (en)	2008-11-25	Method and apparatus for fast on-line automatic speaker/environment adaptation for speech/speaker recognition in the presence of changing environments
Gupta et al.	2014	I-vector-based speaker adaptation of deep neural networks for french broadcast audio transcription
JP4141495B2 (en)	2008-08-27	Method and apparatus for speech recognition using optimized partial probability mixture sharing
Rybach et al.	2009	Audio segmentation for speech recognition using segment features
Ney et al.	1998	The RWTH large vocabulary continuous speech recognition system
Reichl et al.	1998	Decision tree state tying based on segmental clustering for acoustic modeling
US20120245919A1 (en)	2012-09-27	Probabilistic Representation of Acoustic Segments
Lu et al.	2013	Cross-lingual subspace gaussian mixture models for low-resource speech recognition
Chen et al.	1998	IBM's lvcsr system for transcription of broadcast news used in the 1997 hub4 english evaluation
Chen et al.	2002	Automatic transcription of broadcast news
Abdou et al.	2019	Arabic speech recognition: Challenges and state of the art
Soltau et al.	2009	Advances in Arabic speech transcription at IBM under the DARPA GALE program
Polymenakos et al.	1998	Transcription of broadcast news-some recent improvements to IBM's LVCSR system
Gollan et al.	2008	Confidence scores for acoustic model adaptation
Zhang et al.	2005	Minimum phoneme error based heteroscedastic linear discriminant analysis for speech recognition
Bocchieri et al.	2011	Speech recognition modeling advances for mobile voice search
Kubala et al.	1998	The 1997 BBN Byblos system applied to broadcast news transcription
Gauvain et al.	2000	Fast decoding for indexation of broadcast data.
Kim et al.	2003	Recent advances in broadcast news transcription
Cuendet et al.	2006	Model adaptation for sentence segmentation from speech
AT&T
Nagarajan et al.	2006	Language identification using acoustic log-likelihoods of syllable-like units
Gauvain et al.	1996	The LIMSI 1995 Hub3 System