CN101014997B - Method and system for generating training data for an automatic speech recogniser - Google Patents
- Publication number
- CN101014997B (application CN200580005136.0A)
- Authority
- CN
- China
- Prior art keywords
- spectral
- sampling frequency
- audio data
- codebook
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Electrically Operated Instructional Devices (AREA)
- Telephonic Communication Services (AREA)
- Synchronisation In Digital Transmission Systems (AREA)
Abstract
Description
Claims (13)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP04100646 | 2004-02-18 | ||
EP04100646.1 | 2004-02-18 | ||
PCT/IB2005/050518 WO2005083677A2 (en) | 2004-02-18 | 2005-02-10 | Method and system for generating training data for an automatic speech recogniser |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101014997A (zh) | 2007-08-08 |
CN101014997B (zh) | 2012-04-04 |
Family
ID=34896083
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200580005136.0A (Expired - Lifetime; published as CN101014997B (zh)) | 2004-02-18 | 2005-02-10 | Method and system for generating training data for an automatic speech recogniser |
Country Status (5)
Country | Link |
---|---|
US (1) | US8438026B2 (zh) |
EP (1) | EP1719114A2 (zh) |
JP (1) | JP5230103B2 (zh) |
CN (1) | CN101014997B (zh) |
WO (1) | WO2005083677A2 (zh) |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE60214027T2 (de) * | 2001-11-14 | 2007-02-15 | Matsushita Electric Industrial Co., Ltd., Kadoma | Encoding device and decoding device |
US7983916B2 (en) * | 2007-07-03 | 2011-07-19 | General Motors Llc | Sampling rate independent speech recognition |
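JP5326892B2 (ja) * | 2008-12-26 | 2013-10-30 | 富士通株式会社 | Information processing apparatus, program, and method for generating an acoustic model |
JP5326714B2 (ja) * | 2009-03-23 | 2013-10-30 | 沖電気工業株式会社 | Bandwidth extension device, method, and program, and quantization noise learning device, method, and program |
CN102483916B (zh) | 2009-08-28 | 2014-08-06 | 国际商业机器公司 | Speech feature extraction device and speech feature extraction method |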
US8538035B2 (en) | 2010-04-29 | 2013-09-17 | Audience, Inc. | Multi-microphone robust noise suppression |
US8473287B2 (en) | 2010-04-19 | 2013-06-25 | Audience, Inc. | Method for jointly optimizing noise reduction and voice quality in a mono or multi-microphone system |
US8798290B1 (en) | 2010-04-21 | 2014-08-05 | Audience, Inc. | Systems and methods for adaptive signal equalization |
US8781137B1 (en) | 2010-04-27 | 2014-07-15 | Audience, Inc. | Wind noise detection and suppression |
US9245538B1 (en) * | 2010-05-20 | 2016-01-26 | Audience, Inc. | Bandwidth enhancement of speech signals assisted by noise reduction |
US8447596B2 (en) | 2010-07-12 | 2013-05-21 | Audience, Inc. | Monaural noise suppression based on computational auditory scene analysis |
WO2012025579A1 (en) * | 2010-08-24 | 2012-03-01 | Veovox Sa | System and method for recognizing a user voice command in noisy environment |
US9076446B2 (en) * | 2012-03-22 | 2015-07-07 | Qiguang Lin | Method and apparatus for robust speaker and speech recognition |
US9319510B2 (en) * | 2013-02-15 | 2016-04-19 | Qualcomm Incorporated | Personalized bandwidth extension |
CA2995530C (en) | 2013-09-12 | 2018-07-24 | Saudi Arabian Oil Company | Dynamic threshold methods, systems, computer readable media, and program code for filtering noise and restoring attenuated high-frequency components of acoustic signals |
US9953646B2 (en) | 2014-09-02 | 2018-04-24 | Belleau Technologies | Method and system for dynamic speech recognition and tracking of prewritten script |
US9842608B2 (en) * | 2014-10-03 | 2017-12-12 | Google Inc. | Automatic selective gain control of audio data for speech recognition |
CN104468001B (zh) * | 2014-11-26 | 2017-04-19 | 北京邮电大学 | Signal recognition method and system based on radio signal spectral feature templates |
EP3265919B1 (en) * | 2015-03-06 | 2021-09-29 | Georgia Tech Research Corporation | Device fingerprinting for cyber-physical systems |
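CN105989849B (zh) * | 2015-06-03 | 2019-12-03 | 乐融致新电子科技(天津)有限公司 | Speech enhancement method, speech recognition method, clustering method, and device |
CN105513590A (zh) * | 2015-11-23 | 2016-04-20 | 百度在线网络技术(北京)有限公司 | Method and device for speech recognition |
CN108510979B (zh) | 2017-02-27 | 2020-12-15 | 芋头科技(杭州)有限公司 | Training method for a mixed-frequency acoustic recognition model and speech recognition method |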
US10984795B2 (en) | 2018-04-12 | 2021-04-20 | Samsung Electronics Co., Ltd. | Electronic apparatus and operation method thereof |
CN113870872B (zh) * | 2018-06-05 | 2024-11-22 | 安克创新科技股份有限公司 | Deep-learning-based method, device, and system for speech quality enhancement |
US11392794B2 (en) | 2018-09-10 | 2022-07-19 | Ca, Inc. | Amplification of initial training data |
US11295726B2 (en) | 2019-04-08 | 2022-04-05 | International Business Machines Corporation | Synthetic narrowband data generation for narrowband automatic speech recognition systems |
US10997967B2 (en) * | 2019-04-18 | 2021-05-04 | Honeywell International Inc. | Methods and systems for cockpit speech recognition acoustic model training with multi-level corpus data augmentation |
US11335329B2 (en) * | 2019-08-28 | 2022-05-17 | Tata Consultancy Services Limited | Method and system for generating synthetic multi-conditioned data sets for robust automatic speech recognition |
CN110459205B (zh) | 2019-09-24 | 2022-04-12 | 京东科技控股股份有限公司 | Speech recognition method and device, and computer-storable medium |
US11749281B2 (en) | 2019-12-04 | 2023-09-05 | Soundhound Ai Ip, Llc | Neural speech-to-meaning |
US11308938B2 (en) | 2019-12-05 | 2022-04-19 | Soundhound, Inc. | Synthesizing speech recognition training data |
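CN111916103B (zh) * | 2020-08-11 | 2024-02-20 | 南京拓灵智能科技有限公司 | Audio noise reduction method and device |
CN112116903B (zh) * | 2020-08-17 | 2024-09-13 | 北京大米科技有限公司 | Method and device for generating a speech synthesis model, storage medium, and electronic equipment |
CN112131865B (zh) * | 2020-09-11 | 2023-12-08 | 成都运达科技股份有限公司 | Digital compression processing method, device, and storage medium for rail transit messages |
CN113708863B (zh) * | 2021-09-10 | 2023-08-01 | 中国人民解放军63891部队 | Method and device for constructing a spectrum sensing training dataset |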
US12148437B2 (en) * | 2021-12-10 | 2024-11-19 | Microsoft Technology Licensing, Llc | Feature domain bandwidth extension and spectral rebalance for ASR data augmentation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5455888A (en) * | 1992-12-04 | 1995-10-03 | Northern Telecom Limited | Speech bandwidth extension method and apparatus |
US5581652A (en) * | 1992-10-05 | 1996-12-03 | Nippon Telegraph And Telephone Corporation | Reconstruction of wideband speech from narrowband speech using codebooks |
CN1175171A (zh) * | 1997-07-24 | 1998-03-04 | 北京信威通信技术有限公司 | Method and apparatus for carrier recovery and compensation in a spread-spectrum communication system |
WO2000049601A1 (en) * | 1999-02-19 | 2000-08-24 | Custom Speech Usa, Inc. | Automated transcription system and method using two speech converting instances and computer-assisted correction |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2524472B2 (ja) | 1992-09-21 | 1996-08-14 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Method for training a speech recognition system using telephone lines |
SE512719C2 (sv) * | 1997-06-10 | 2000-05-02 | Lars Gustaf Liljeryd | A method and device for data-rate reduction based on harmonic bandwidth expansion |
US6381571B1 (en) * | 1998-05-01 | 2002-04-30 | Texas Instruments Incorporated | Sequential determination of utterance log-spectral mean by maximum a posteriori probability estimation |
US6199041B1 (en) * | 1998-11-20 | 2001-03-06 | International Business Machines Corporation | System and method for sampling rate transformation in speech recognition |
JP4244514B2 (ja) * | 2000-10-23 | 2009-03-25 | セイコーエプソン株式会社 | Speech recognition method and speech recognition device |
JP4577543B2 (ja) | 2000-11-21 | 2010-11-10 | ソニー株式会社 | Model adaptation device and model adaptation method, recording medium, and speech recognition device |
JP2002189487A (ja) * | 2000-12-20 | 2002-07-05 | Mitsubishi Electric Corp | Speech recognition device and speech recognition method |
JP2002268698A (ja) | 2001-03-08 | 2002-09-20 | Nec Corp | Speech recognition device, and standard pattern creation device, method, and program |
US6990447B2 (en) * | 2001-11-15 | 2006-01-24 | Microsoft Corporation | Method and apparatus for denoising and deverberation using variational inference and strong speech models |
CA2454296A1 (en) * | 2003-12-29 | 2005-06-29 | Nokia Corporation | Method and device for speech enhancement in the presence of background noise |
US7454338B2 (en) * | 2005-02-08 | 2008-11-18 | Microsoft Corporation | Training wideband acoustic models in the cepstral domain using mixed-bandwidth training data and extended vectors for speech recognition |
2005
- 2005-02-10: US application US10/597,983, publication US8438026B2, status: Active
- 2005-02-10: CN application CN200580005136.0A, publication CN101014997B, status: Expired - Lifetime
- 2005-02-10: JP application JP2006553731A, publication JP5230103B2, status: Expired - Fee Related
- 2005-02-10: WO application PCT/IB2005/050518, publication WO2005083677A2, status: Application Filing
- 2005-02-10: EP application EP05702937A, publication EP1719114A2, status: Withdrawn
Non-Patent Citations (1)
Title |
---|
Enbom, N.; Kleijn, W.B. Bandwidth expansion of speech based on vector quantization of the mel frequency cepstral coefficients. Speech Coding Proceedings, 1999 IEEE Workshop on. 2002, 171-173. * |
Also Published As
Publication number | Publication date |
---|---|
EP1719114A2 (en) | 2006-11-08 |
JP5230103B2 (ja) | 2013-07-10 |
US8438026B2 (en) | 2013-05-07 |
CN101014997A (zh) | 2007-08-08 |
WO2005083677A2 (en) | 2005-09-09 |
JP2007523374A (ja) | 2007-08-16 |
US20080215322A1 (en) | 2008-09-04 |
WO2005083677A3 (en) | 2006-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101014997B (zh) | Method and system for generating training data for an automatic speech recogniser | |
Shrawankar et al. | Techniques for feature extraction in speech recognition system: A comparative study | |
Sarikaya et al. | High resolution speech feature parametrization for monophone-based stressed speech recognition | |
Hirsch et al. | A new approach for the adaptation of HMMs to reverberation and background noise | |
WO2004049304A1 (ja) | Speech synthesis method and speech synthesis device | |
Ismail et al. | MFCC-VQ approach for qalqalah tajweed rule checking | |
WO2013030134A1 (en) | Method and apparatus for acoustic source separation | |
CN113744715A (zh) | Vocoder speech synthesis method, device, computer equipment, and storage medium | |
Chavan et al. | Speech recognition in noisy environment, issues and challenges: A review | |
JP2002268698A (ja) | Speech recognition device, and standard pattern creation device, method, and program | |
JP3587966B2 (ja) | Speech recognition method, device, and storage medium therefor | |
Eichner et al. | Voice characteristics conversion for TTS using reverse VTLN | |
CN107919115B (zh) | Feature compensation method based on nonlinear spectral transformation | |
Makhijani et al. | Speech enhancement using pitch detection approach for noisy environment | |
JP3250604B2 (ja) | Speech recognition method and device | |
JP4464797B2 (ja) | Speech recognition method, device for implementing the method, program, and recording medium therefor | |
Maged et al. | Improving speaker identification system using discrete wavelet transform and AWGN | |
Rynjah et al. | Khasi speech recognition using hidden Markov model with different spectral features: A comparison | |
Jain et al. | Comparative study of speaker recognition techniques in IoT devices for text independent negative recognition | |
Sehr et al. | Hands-free speech recognition using a reverberation model in the feature domain | |
Lyubimov et al. | Exploiting non-negative matrix factorization with linear constraints in noise-robust speaker identification | |
Singh et al. | A comparative study of recognition of speech using improved MFCC algorithms and Rasta filters | |
Harshita et al. | Speech Recognition with Frequency Domain Linear Prediction | |
Gupta et al. | Speech Recognition using MFCC & VQ | |
Tan et al. | Speech feature extraction and reconstruction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
ASS | Succession or assignment of patent right | Owner name: NUANCE COMMUNICATION INC.; Free format text: FORMER OWNER: KONINKLIJKE PHILIPS ELECTRONICS N.V.; Effective date: 20121227 |
C41 | Transfer of patent application or patent right or utility model | ||
TR01 | Transfer of patent right | Effective date of registration: 20121227; Address after: Massachusetts; Patentee after: Nuance Communications, Inc.; Address before: Eindhoven, Netherlands; Patentee before: Koninklijke Philips Electronics N.V. |
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right | Effective date of registration: 20231027; Address after: Washington State; Patentee after: MICROSOFT TECHNOLOGY LICENSING, LLC; Address before: Massachusetts; Patentee before: Nuance Communications, Inc. |
CX01 | Expiry of patent term | Granted publication date: 20120404 |
CX01 | Expiry of patent term |