Adams CEO et al., 2017 - Google Patents

Automated Speech Recognition for Captioned Telephone Conversations

Document ID: 2867807617360583404
Author: Adams CEO J; Basye PhD K; Parlikar PhD A; Fletcher PhD A; Kim PhD J
Publication year: 2017

Snippet

Abstract: Internet Protocol Captioned Telephone Service is a service for people with hearing loss, allowing them to communicate effectively by having a human Communications Assistant transcribe the call and equipment that displays the transcription in near real time …

Continue reading at commons.clarku.edu (PDF) (other versions)

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00

Similar Documents

Publication	Publication Date	Title
JP5327054B2 (en)	2013-10-30	Pronunciation variation rule extraction device, pronunciation variation rule extraction method, and pronunciation variation rule extraction program
US8818801B2 (en)	2014-08-26	Dialogue speech recognition system, dialogue speech recognition method, and recording medium for storing dialogue speech recognition program
US9117450B2 (en)	2015-08-25	Combining re-speaking, partial agent transcription and ASR for improved accuracy / human guided ASR
Fujie et al.	2004	A conversation robot with back-channel feedback function based on linguistic and nonlinguistic information
CN117043856A (en)	2023-11-10	End-to-end model on high-efficiency streaming non-recursive devices
US11361780B2 (en)	2022-06-14	Real-time speech-to-speech generation (RSSG) apparatus, method and a system therefore
JP2024502946A (en)	2024-01-24	Punctuation and capitalization of speech recognition transcripts
JP2024502946A6 (en)	2024-01-24	Punctuation and capitalization of speech recognition transcripts
Kons et al.	2018	Neural TTS voice conversion
Walker et al.	2017	Semi-supervised model training for unbounded conversational speech recognition
Ihori et al.	2020	Parallel corpus for Japanese spoken-to-written style conversion
Kumar et al.	2016	Automatic spontaneous speech recognition for Punjabi language interview speech corpus
Mirishkar et al.	2021	CSTD-Telugu corpus: Crowd-sourced approach for large-scale speech data collection
JP5184467B2 (en)	2013-04-17	Adaptive acoustic model generation apparatus and program
Koo et al.	2023	KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing
Adhikary et al.	2019	Investigating speech recognition for improving predictive AAC
Amarasingha et al.	2012	Speaker independent Sinhala speech recognition for voice dialling
US20240096236A1 (en)	2024-03-21	System for reply generation
Savchenko	2014	Phonetic encoding method in the isolated words recognition problem
Tsunematsu et al.	2020	Neural Speech Completion.
Mizera et al.	2014	Impact of irregular pronunciation on phonetic segmentation of Nijmegen corpus of casual Czech
Tarján et al.	2013	Improved recognition of Hungarian call center conversations
Kessens et al.	2004	On automatic phonetic transcription quality: lower word error rates do not guarantee better transcriptions
Koriyama et al.	2010	Conversational spontaneous speech synthesis using average voice model.