SG11201808360SA - Acoustic model training method, speech recognition method, apparatus, device and medium - Google Patents
Acoustic model training method, speech recognition method, apparatus, device and medium
- Publication number
- SG11201808360SA
- Authority
- SG
- Singapore
- Prior art keywords
- training
- acoustic model
- model
- medium
- model training
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/16—Hidden Markov models [HMM]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/148—Duration modelling in HMMs, e.g. semi HMM, segmental models or transition probabilities
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/022—Demisyllables, biphones or triphones being the recognition units
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Electrically Operated Instructional Devices (AREA)
- Character Discrimination (AREA)
Abstract
Disclosed are an acoustic model training method, a speech recognition method, an apparatus, a device and a medium. The acoustic model training method comprises: performing feature extraction on a training speech signal to obtain an audio feature sequence; training on the audio feature sequence with a phoneme Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) to obtain a phoneme feature sequence; and training on the phoneme feature sequence with a Deep Neural Network-Hidden Markov Model (DNN-HMM) sequence training model to obtain a target acoustic model. The method reduces the time required for acoustic model training and improves training efficiency while ensuring recognition efficiency.
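The first step of the abstract, turning a training speech signal into an audio feature sequence, can be illustrated with a minimal sketch. The abstract does not say which features are used, so the per-frame log energy below is only a stand-in, and the function names, the 400-sample frame length, the 160-sample hop and the 16 kHz test tone are all assumptions made for illustration, not details taken from the patent.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames (the ragged tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def log_energy(frame):
    """Log of the Hamming-windowed frame energy; the floor avoids log(0) on silence."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)))
                for k, x in enumerate(frame)]
    return math.log(max(sum(v * v for v in windowed), 1e-10))


def extract_features(samples, frame_len=400, hop=160):
    """Return one value per frame: a toy 'audio feature sequence'."""
    return [log_energy(f) for f in frame_signal(samples, frame_len, hop)]


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
features = extract_features(signal)
```

In a real system each frame would usually yield a feature vector rather than a single number, and the resulting sequence would then be passed to the GMM-HMM and DNN-HMM training stages named in the abstract; those stages are not sketched here.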
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710627480.8A CN107680582B (en) | 2017-07-28 | 2017-07-28 | Acoustic model training method, voice recognition method, device, equipment and medium |
PCT/CN2017/099825 WO2019019252A1 (en) | 2017-07-28 | 2017-08-31 | Acoustic model training method, speech recognition method and apparatus, device and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
SG11201808360SA (en) | 2019-02-27 |
Family
ID=61133210
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG11201808360SA (en) | 2017-07-28 | 2017-08-31 | Acoustic model training method, speech recognition method, apparatus, device and medium |
Country Status (4)
Country | Link |
---|---|
US (1) | US11030998B2 (en) |
CN (1) | CN107680582B (en) |
SG (1) | SG11201808360SA (en) |
WO (1) | WO2019019252A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110634476A (en) * | 2019-10-09 | 2019-12-31 | 深圳大学 | A method and system for quickly building a robust acoustic model |
Families Citing this family (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102535411B1 (en) * | 2017-11-16 | 2023-05-23 | 삼성전자주식회사 | Apparatus and method related to metric learning based data classification |
CN108447475A (en) * | 2018-03-02 | 2018-08-24 | 国家电网公司华中分部 | A Method of Establishing a Speech Recognition Model Based on Power Dispatch System |
CN108564940B (en) * | 2018-03-20 | 2020-04-28 | 平安科技(深圳)有限公司 | Speech recognition method, server and computer-readable storage medium |
CN108806696B (en) * | 2018-05-08 | 2020-06-05 | 平安科技(深圳)有限公司 | Method and device for establishing voiceprint model, computer equipment and storage medium |
CN108831463B (en) * | 2018-06-28 | 2021-11-12 | 广州方硅信息技术有限公司 | Lip language synthesis method and device, electronic equipment and storage medium |
CN108989341B (en) * | 2018-08-21 | 2023-01-13 | 平安科技(深圳)有限公司 | Voice autonomous registration method and device, computer equipment and storage medium |
CN108986835B (en) * | 2018-08-28 | 2019-11-26 | 百度在线网络技术(北京)有限公司 | Speech denoising method, apparatus, device and medium based on an improved GAN network |
CN109167880B (en) * | 2018-08-30 | 2021-05-21 | 努比亚技术有限公司 | Double-sided screen terminal control method, double-sided screen terminal and computer readable storage medium |
CN109036379B (en) * | 2018-09-06 | 2021-06-11 | 百度时代网络技术(北京)有限公司 | Speech recognition method, apparatus and storage medium |
CN110164452B (en) * | 2018-10-10 | 2023-03-10 | 腾讯科技(深圳)有限公司 | Voiceprint recognition method, model training method and server |
CN111048062B (en) | 2018-10-10 | 2022-10-04 | 华为技术有限公司 | Speech synthesis method and apparatus |
CN109559735B (en) * | 2018-10-11 | 2023-10-27 | 平安科技(深圳)有限公司 | Voice recognition method, terminal equipment and medium based on neural network |
CN109524011A (en) * | 2018-10-22 | 2019-03-26 | 四川虹美智能科技有限公司 | Refrigerator wake-up method and device based on voiceprint recognition |
CN109243429B (en) * | 2018-11-21 | 2021-12-10 | 苏州奇梦者网络科技有限公司 | Voice modeling method and device |
US11170761B2 (en) * | 2018-12-04 | 2021-11-09 | Sorenson Ip Holdings, Llc | Training of speech recognition systems |
CN109326277B (en) * | 2018-12-05 | 2022-02-08 | 四川长虹电器股份有限公司 | Semi-supervised phoneme forced alignment model establishing method and system |
CN109243465A (en) * | 2018-12-06 | 2019-01-18 | 平安科技(深圳)有限公司 | Voiceprint authentication method, device, computer equipment and storage medium |
CN109830277B (en) * | 2018-12-12 | 2024-03-15 | 平安科技(深圳)有限公司 | Rope skipping monitoring method, electronic device and storage medium |
CN109817191B (en) * | 2019-01-04 | 2023-06-06 | 平安科技(深圳)有限公司 | Tremolo modeling method, device, computer equipment and storage medium |
CN109616103B (en) * | 2019-01-09 | 2022-03-22 | 百度在线网络技术(北京)有限公司 | Acoustic model training method and device and storage medium |
CN109887484B (en) * | 2019-02-22 | 2023-08-04 | 平安科技(深圳)有限公司 | Dual learning-based voice recognition and voice synthesis method and device |
CN111798857A (en) * | 2019-04-08 | 2020-10-20 | 北京嘀嘀无限科技发展有限公司 | Information identification method and device, electronic equipment and storage medium |
CN111833847B (en) * | 2019-04-15 | 2023-07-25 | 北京百度网讯科技有限公司 | Voice processing model training method and device |
CN110415685A (en) * | 2019-08-20 | 2019-11-05 | 河海大学 | A Speech Recognition Method |
WO2021126444A1 (en) * | 2019-12-20 | 2021-06-24 | Eduworks Corporation | Real-time voice phishing detection |
US11586964B2 (en) * | 2020-01-30 | 2023-02-21 | Dell Products L.P. | Device component management using deep learning techniques |
CN111489739B (en) * | 2020-04-17 | 2023-06-16 | 嘉楠明芯(北京)科技有限公司 | Phoneme recognition method, apparatus and computer readable storage medium |
CN111696525A (en) * | 2020-05-08 | 2020-09-22 | 天津大学 | Kaldi-based Chinese speech recognition acoustic model construction method |
CN111798841B (en) * | 2020-05-13 | 2023-01-03 | 厦门快商通科技股份有限公司 | Acoustic model training method and system, mobile terminal and storage medium |
CN111666469B (en) * | 2020-05-13 | 2023-06-16 | 广州国音智能科技有限公司 | Statement library construction method, device, equipment and storage medium |
CN111833852B (en) * | 2020-06-30 | 2022-04-15 | 思必驰科技股份有限公司 | Acoustic model training method and device and computer readable storage medium |
CN111816171B (en) * | 2020-08-31 | 2020-12-11 | 北京世纪好未来教育科技有限公司 | Speech recognition model training method, speech recognition method and device |
CN111933121B (en) * | 2020-08-31 | 2024-03-12 | 广州市百果园信息技术有限公司 | Acoustic model training method and device |
CN112331219B (en) * | 2020-11-05 | 2024-05-03 | 北京晴数智慧科技有限公司 | Voice processing method and device |
CN112489662B (en) * | 2020-11-13 | 2024-06-18 | 北京汇钧科技有限公司 | Method and apparatus for training speech processing model |
CN113393828A (en) * | 2020-11-24 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Training method of voice synthesis model, and voice synthesis method and device |
CN113035247B (en) * | 2021-03-17 | 2022-12-23 | 广州虎牙科技有限公司 | Audio text alignment method and device, electronic equipment and storage medium |
CN113223504B (en) * | 2021-04-30 | 2023-12-26 | 平安科技(深圳)有限公司 | Training method, device, equipment and storage medium of acoustic model |
TWI780738B (en) * | 2021-05-28 | 2022-10-11 | 宇康生科股份有限公司 | Abnormal articulation corpus amplification method and system, speech recognition platform, and abnormal articulation auxiliary device |
CN113345418B (en) * | 2021-06-09 | 2024-08-09 | 中国科学技术大学 | Multilingual model training method based on cross-language self-training |
CN113450803B (en) * | 2021-06-09 | 2024-03-19 | 上海明略人工智能(集团)有限公司 | Conference recording transfer method, system, computer device and readable storage medium |
CN113449626B (en) * | 2021-06-23 | 2023-11-07 | 中国科学院上海高等研究院 | Method and device for analyzing vibration signal of hidden Markov model, storage medium and terminal |
CN113689867B (en) * | 2021-08-18 | 2022-06-28 | 北京百度网讯科技有限公司 | Training method and device of voice conversion model, electronic equipment and medium |
CN113723546B (en) * | 2021-09-03 | 2023-12-22 | 江苏理工学院 | Bearing fault detection method and system based on discrete hidden Markov model |
CN114360517B (en) * | 2021-12-17 | 2023-04-18 | 天翼爱音乐文化科技有限公司 | Audio processing method and device in complex environment and storage medium |
CN114446283A (en) * | 2022-02-17 | 2022-05-06 | 平安普惠企业管理有限公司 | Voice processing method, device, electronic device and storage medium |
CN114783415A (en) * | 2022-03-11 | 2022-07-22 | 科大讯飞股份有限公司 | Voiceprint extraction method, identity recognition method and related equipment |
CN116364063B (en) * | 2023-06-01 | 2023-09-05 | 蔚来汽车科技(安徽)有限公司 | Phoneme alignment method, apparatus, driving apparatus, and medium |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7089178B2 (en) * | 2002-04-30 | 2006-08-08 | Qualcomm Inc. | Multistream network feature processing for a distributed speech recognition system |
US8972253B2 (en) | 2010-09-15 | 2015-03-03 | Microsoft Technology Licensing, Llc | Deep belief network for large vocabulary continuous speech recognition |
US8442821B1 (en) | 2012-07-27 | 2013-05-14 | Google Inc. | Multi-frame prediction for hybrid neural network/hidden Markov models |
US9972306B2 (en) * | 2012-08-07 | 2018-05-15 | Interactive Intelligence Group, Inc. | Method and system for acoustic data selection for training the parameters of an acoustic model |
AU2013305615B2 (en) * | 2012-08-24 | 2018-07-05 | Genesys Cloud Services, Inc. | Method and system for selectively biased linear discriminant analysis in automatic speech recognition systems |
CN103117060B (en) * | 2013-01-18 | 2015-10-28 | 中国科学院声学研究所 | Modeling method and modeling system for an acoustic model used in speech recognition |
CN103971678B (en) | 2013-01-29 | 2015-08-12 | 腾讯科技(深圳)有限公司 | Keyword spotting method and apparatus |
CN103971685B (en) * | 2013-01-30 | 2015-06-10 | 腾讯科技(深圳)有限公司 | Method and system for recognizing voice commands |
CN104575504A (en) * | 2014-12-24 | 2015-04-29 | 上海师范大学 | Method for personalized television voice wake-up by voiceprint and voice identification |
KR101988222B1 (en) | 2015-02-12 | 2019-06-13 | 한국전자통신연구원 | Apparatus and method for large vocabulary continuous speech recognition |
CN107112005A (en) * | 2015-04-17 | 2017-08-29 | 微软技术许可有限责任公司 | Deep neural support vector machines |
KR102494139B1 (en) * | 2015-11-06 | 2023-01-31 | 삼성전자주식회사 | Apparatus and method for training neural network, apparatus and method for speech recognition |
JP6679898B2 (en) * | 2015-11-24 | 2020-04-15 | 富士通株式会社 | KEYWORD DETECTION DEVICE, KEYWORD DETECTION METHOD, AND KEYWORD DETECTION COMPUTER PROGRAM |
CN105702250B (en) * | 2016-01-06 | 2020-05-19 | 福建天晴数码有限公司 | Speech recognition method and device |
CN105869624B (en) * | 2016-03-29 | 2019-05-10 | 腾讯科技(深圳)有限公司 | The construction method and device of tone decoding network in spoken digit recognition |
CN105976812B (en) * | 2016-04-28 | 2019-04-26 | 腾讯科技(深圳)有限公司 | Speech recognition method and device |
CN105957518B (en) * | 2016-06-16 | 2019-05-31 | 内蒙古大学 | Method for Mongolian large-vocabulary continuous speech recognition |
CN106409289B (en) * | 2016-09-23 | 2019-06-28 | 合肥美的智能科技有限公司 | Environment adaptation method for speech recognition, speech recognition device and household appliance |
-
2017
- 2017-07-28 CN CN201710627480.8A patent/CN107680582B/en active Active
- 2017-08-31 US US16/097,850 patent/US11030998B2/en active Active
- 2017-08-31 WO PCT/CN2017/099825 patent/WO2019019252A1/en active Application Filing
- 2017-08-31 SG SG11201808360SA patent/SG11201808360SA/en unknown
Also Published As
Publication number | Publication date |
---|---|
US20210125603A1 (en) | 2021-04-29 |
WO2019019252A1 (en) | 2019-01-31 |
CN107680582A (en) | 2018-02-09 |
US11030998B2 (en) | 2021-06-08 |
CN107680582B (en) | 2021-03-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
SG11201808360SA (en) | Acoustic model training method, speech recognition method, apparatus, device and medium | |
EP3154054A3 (en) | Method and apparatus for training language model and recognizing speech | |
EP4053835A4 (en) | VOICE RECOGNITION METHOD AND APPARATUS, AND DEVICE, AND STORAGE MEDIA | |
PH12019501674A1 (en) | Speech wakeup method, apparatus, and electronic device | |
WO2014025682A3 (en) | Acoustic data selection for training the parameters of an acoustic model | |
EP3001662A3 (en) | Conference proceed apparatus and method for advancing conference | |
EP4235369A3 (en) | Modality learning on mobile devices | |
EP3648099A4 (en) | Voice recognition method, device, apparatus, and storage medium | |
EP3968179A4 (en) | Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device | |
MY179900A (en) | Speech recognition method and speech recognition apparatus | |
EP4414977A3 (en) | Speech endpointing | |
GB2551917A (en) | Privacy-preserving training corpus selection | |
WO2018038385A3 (en) | Method for voice recognition and electronic device for performing same | |
EP4235646A3 (en) | Adaptive audio enhancement for multichannel speech recognition | |
EP3353766A4 (en) | Methods for the automated generation of speech sample asset production scores for users of a distributed language learning system, automated accent recognition and quantification and improved speech recognition | |
EP3479376A4 (en) | Speech recognition method and apparatus based on speaker recognition | |
EP3193328A4 (en) | Method and device for performing voice recognition using grammar model | |
WO2016044027A8 (en) | Method and apparatus for performing speaker recognition | |
EP3046053A3 (en) | Method and apparatus for training language model, and method and apparatus for recognizing language | |
EP3584790A4 (en) | Voiceprint recognition method, device, storage medium, and background server | |
EP4280210A3 (en) | Hotword detection on multiple devices | |
EP4235649A3 (en) | Language model biasing | |
EP4113507A4 (en) | VOICE RECOGNITION METHOD AND APPARATUS, APPARATUS AND STORAGE MEDIUM | |
GB2566215A (en) | Voice user interface | |
EP2963643A3 (en) | Entity name recognition |