Schalkwyk et al., 2003 - Google Patents

Speech recognition with dynamic grammars using finite-state transducers.

Schalkwyk et al., 2003

Document ID: 3918036325450160206
Author: Schalkwyk J; Hetherington I; Story E
Publication year: 2003
Publication venue: INTERSPEECH

External Links

Cited by

Snippet

Spoken language systems, ranging from interactive voice response (IVR) to mixed-initiative conversational systems, make use of a wide range of recognition grammars and vocabularies. The recognition grammars are either static (created at design time) or dynamic …

Continue reading at sls.csail.mit.edu (PDF) (other versions)

230000001419 dependent 0 abstract description 24

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/193—Formal grammars, e.g. finite state automata, context free grammars or word networks
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/085—Methods for reducing search complexity, pruning
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services, time announcement
- H04M3/493—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
- H04M3/4936—Speech interaction details
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services, time announcement
- H04M3/493—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
- H04M3/4938—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility

Similar Documents

Publication	Publication Date	Title
Valtchev et al.	1997	MMIE training of large vocabulary recognition systems
US5983177A (en)	1999-11-09	Method and apparatus for obtaining transcriptions from multiple training utterances
US7072837B2 (en)	2006-07-04	Method for processing initially recognized speech in a speech recognition session
US6208964B1 (en)	2001-03-27	Method and apparatus for providing unsupervised adaptation of transcriptions
US6243680B1 (en)	2001-06-05	Method and apparatus for obtaining a transcription of phrases through text and spoken utterances
US7043431B2 (en)	2006-05-09	Multilingual speech recognition system using text derived recognition models
Aleksic et al.	2015	Improved recognition of contact names in voice commands
Schalkwyk et al.	2003	Speech recognition with dynamic grammars using finite-state transducers.
WO2004042697A2 (en)	2004-05-21	Multi-lingual speech recognition with cross-language context modeling
EP1470548A2 (en)	2004-10-27	System and method for speech recognition by multi-pass recognition using context specific grammars
Moore et al.	2006	Juicer: A weighted finite-state transducer speech decoder
Sharp et al.	1997	The Watson speech recognition engine
Gauvain et al.	1995	Developments in continuous speech dictation using the ARPA WSJ task
US20030009331A1 (en)	2003-01-09	Grammars for speech recognition
KR100726875B1 (en)	2007-06-14	Speech recognition device with complementary language model for typical mistakes in verbal conversation
KR20140082711A (en)	2014-07-02	Efficient incremental modification of optimized finite-state transducers(fsts) for use in speech applications
AU2004256561A1 (en)	2005-01-20	Voice recognition for large dynamic vocabularies
Lee et al.	2002	Continuous speech recognition consortium: an open repository for CSR tools and models
Alumäe et al.	2012	Open and extendable speech recognition application architecture for mobile environments.
Liu et al.	2019	Efficient dynamic wfst decoding for personalized language models
WO2001026092A2 (en)	2001-04-12	Attribute-based word modeling
JP2004252121A (en)	2004-09-09	Language processing apparatus and language processing method, and program and recording medium
Zhou et al.	1997	Speech technology integration and research platform: a system study.
Hetherington	2005	A multi-pass, dynamic-vocabulary approach to real-time, large-vocabulary speech recognition.
Chen et al.	1994	Large vocabulary word recognition based on tree-trellis search