SG11201808360SA - Acoustic model training method, speech recognition method, apparatus, device and medium - Google Patents

Acoustic model training method, speech recognition method, apparatus, device and medium

Info

Publication number
SG11201808360SA
Authority
SG
Singapore
Prior art keywords
training
acoustic model
model
medium
model training
Prior art date
Application number
SG11201808360SA
Inventor
Hao Liang
Jianzong Wang
Ning Cheng
Jing Xiao
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2017-08-31
Publication date
2019-02-27
Application filed by Ping An Technology Shenzhen Co Ltd
Publication of SG11201808360SA

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/16 Hidden Markov models [HMM]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/148 Duration modelling in HMMs, e.g. semi HMM, segmental models or transition probabilities
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/022 Demisyllables, biphones or triphones being the recognition units
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Character Discrimination (AREA)

Abstract

Disclosed are an acoustic model training method, a speech recognition method, an apparatus, a device and a medium. The acoustic model training method comprises: performing feature extraction on a training speech signal to obtain an audio feature sequence; training on the audio feature sequence with a phoneme Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) to obtain a phoneme feature sequence; and training on the phoneme feature sequence with a Deep Neural Network-Hidden Markov Model (DNN-HMM) sequence training model to obtain a target acoustic model. The method can effectively reduce the time required for acoustic model training, improve training efficiency, and ensure recognition efficiency.
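
The first step of the method, turning a training speech signal into an audio feature sequence, can be illustrated with a minimal sketch. The abstract does not say which features are extracted, so everything below is an assumption made for illustration: the function names, the pre-emphasis coefficient 0.97, the 25 ms frame with a 10 ms hop at 16 kHz, the Hamming window, and the choice of per-frame log energy as a toy feature. It is not the patented procedure, and the GMM-HMM and DNN-HMM training stages are not shown.

```python
import math


def pre_emphasize(samples, coeff=0.97):
    """Apply a first-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[n] - coeff * samples[n - 1] for n in range(1, len(samples))
    ]


def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]


def hamming(n_points):
    """Return Hamming window coefficients for a frame of n_points samples."""
    if n_points == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


def log_energy_features(samples, frame_len=400, hop_len=160):
    """Return one log-energy value per frame (a toy audio feature sequence)."""
    window = hamming(frame_len)
    features = []
    for frame in frame_signal(pre_emphasize(samples), frame_len, hop_len):
        energy = sum((s * w) ** 2 for s, w in zip(frame, window))
        # A small floor avoids log(0) on silent frames.
        features.append(math.log(energy + 1e-10))
    return features


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
features = log_energy_features(tone)
```

In a real system each frame would typically yield a vector of features rather than a single number, and that sequence of vectors would be the input to the GMM-HMM stage.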

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710627480.8A CN107680582B (en) 2017-07-28 2017-07-28 Acoustic model training method, voice recognition method, device, equipment and medium
PCT/CN2017/099825 WO2019019252A1 (en) 2017-07-28 2017-08-31 Acoustic model training method, speech recognition method and apparatus, device and medium

Publications (1)

Publication Number Publication Date
SG11201808360SA (en) 2019-02-27

Family

ID=61133210

Family Applications (1)

Application Number Priority Date Filing Date Title
SG11201808360SA SG11201808360SA (en) 2017-07-28 2017-08-31 Acoustic model training method, speech recognition method, apparatus, device and medium

Country Status (4)

Country Link
US (1) US11030998B2 (en)
CN (1) CN107680582B (en)
SG (1) SG11201808360SA (en)
WO (1) WO2019019252A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110634476A (en) * 2019-10-09 2019-12-31 深圳大学 A method and system for quickly building a robust acoustic model

Families Citing this family (48)

Publication number Priority date Publication date Assignee Title
KR102535411B1 (en) * 2017-11-16 2023-05-23 삼성전자주식회사 Apparatus and method related to metric learning based data classification
CN108447475A (en) * 2018-03-02 2018-08-24 国家电网公司华中分部 A Method of Establishing a Speech Recognition Model Based on Power Dispatch System
CN108564940B (en) * 2018-03-20 2020-04-28 平安科技(深圳)有限公司 Speech recognition method, server and computer-readable storage medium
CN108806696B (en) * 2018-05-08 2020-06-05 平安科技(深圳)有限公司 Method and device for establishing voiceprint model, computer equipment and storage medium
CN108831463B (en) * 2018-06-28 2021-11-12 广州方硅信息技术有限公司 Lip language synthesis method and device, electronic equipment and storage medium
CN108989341B (en) * 2018-08-21 2023-01-13 平安科技(深圳)有限公司 Voice autonomous registration method and device, computer equipment and storage medium
CN108986835B (en) * 2018-08-28 2019-11-26 百度在线网络技术(北京)有限公司 Speech denoising method, apparatus, device and medium based on an improved GAN
CN109167880B (en) * 2018-08-30 2021-05-21 努比亚技术有限公司 Double-sided screen terminal control method, double-sided screen terminal and computer readable storage medium
CN109036379B (en) * 2018-09-06 2021-06-11 百度时代网络技术(北京)有限公司 Speech recognition method, apparatus and storage medium
CN110164452B (en) * 2018-10-10 2023-03-10 腾讯科技(深圳)有限公司 Voiceprint recognition method, model training method and server
CN111048062B (en) 2018-10-10 2022-10-04 华为技术有限公司 Speech synthesis method and apparatus
CN109559735B (en) * 2018-10-11 2023-10-27 平安科技(深圳)有限公司 Voice recognition method, terminal equipment and medium based on neural network
CN109524011A (en) * 2018-10-22 2019-03-26 四川虹美智能科技有限公司 Refrigerator wake-up method and device based on voiceprint recognition
CN109243429B (en) * 2018-11-21 2021-12-10 苏州奇梦者网络科技有限公司 Voice modeling method and device
US11170761B2 (en) * 2018-12-04 2021-11-09 Sorenson Ip Holdings, Llc Training of speech recognition systems
CN109326277B (en) * 2018-12-05 2022-02-08 四川长虹电器股份有限公司 Semi-supervised phoneme forced alignment model establishing method and system
CN109243465A (en) * 2018-12-06 2019-01-18 平安科技(深圳)有限公司 Voiceprint authentication method, device, computer equipment and storage medium
CN109830277B (en) * 2018-12-12 2024-03-15 平安科技(深圳)有限公司 Rope skipping monitoring method, electronic device and storage medium
CN109817191B (en) * 2019-01-04 2023-06-06 平安科技(深圳)有限公司 Tremolo modeling method, device, computer equipment and storage medium
CN109616103B (en) * 2019-01-09 2022-03-22 百度在线网络技术(北京)有限公司 Acoustic model training method and device and storage medium
CN109887484B (en) * 2019-02-22 2023-08-04 平安科技(深圳)有限公司 Dual learning-based voice recognition and voice synthesis method and device
CN111798857A (en) * 2019-04-08 2020-10-20 北京嘀嘀无限科技发展有限公司 Information identification method and device, electronic equipment and storage medium
CN111833847B (en) * 2019-04-15 2023-07-25 北京百度网讯科技有限公司 Voice processing model training method and device
CN110415685A (en) * 2019-08-20 2019-11-05 河海大学 A Speech Recognition Method
WO2021126444A1 (en) * 2019-12-20 2021-06-24 Eduworks Corporation Real-time voice phishing detection
US11586964B2 (en) * 2020-01-30 2023-02-21 Dell Products L.P. Device component management using deep learning techniques
CN111489739B (en) * 2020-04-17 2023-06-16 嘉楠明芯(北京)科技有限公司 Phoneme recognition method, apparatus and computer readable storage medium
CN111696525A (en) * 2020-05-08 2020-09-22 天津大学 Kaldi-based Chinese speech recognition acoustic model construction method
CN111798841B (en) * 2020-05-13 2023-01-03 厦门快商通科技股份有限公司 Acoustic model training method and system, mobile terminal and storage medium
CN111666469B (en) * 2020-05-13 2023-06-16 广州国音智能科技有限公司 Statement library construction method, device, equipment and storage medium
CN111833852B (en) * 2020-06-30 2022-04-15 思必驰科技股份有限公司 Acoustic model training method and device and computer readable storage medium
CN111816171B (en) * 2020-08-31 2020-12-11 北京世纪好未来教育科技有限公司 Speech recognition model training method, speech recognition method and device
CN111933121B (en) * 2020-08-31 2024-03-12 广州市百果园信息技术有限公司 Acoustic model training method and device
CN112331219B (en) * 2020-11-05 2024-05-03 北京晴数智慧科技有限公司 Voice processing method and device
CN112489662B (en) * 2020-11-13 2024-06-18 北京汇钧科技有限公司 Method and apparatus for training speech processing model
CN113393828A (en) * 2020-11-24 2021-09-14 腾讯科技(深圳)有限公司 Training method of voice synthesis model, and voice synthesis method and device
CN113035247B (en) * 2021-03-17 2022-12-23 广州虎牙科技有限公司 Audio text alignment method and device, electronic equipment and storage medium
CN113223504B (en) * 2021-04-30 2023-12-26 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of acoustic model
TWI780738B (en) * 2021-05-28 2022-10-11 宇康生科股份有限公司 Abnormal articulation corpus amplification method and system, speech recognition platform, and abnormal articulation auxiliary device
CN113345418B (en) * 2021-06-09 2024-08-09 中国科学技术大学 Multilingual model training method based on cross-language self-training
CN113450803B (en) * 2021-06-09 2024-03-19 上海明略人工智能(集团)有限公司 Conference recording transfer method, system, computer device and readable storage medium
CN113449626B (en) * 2021-06-23 2023-11-07 中国科学院上海高等研究院 Method and device for analyzing vibration signal of hidden Markov model, storage medium and terminal
CN113689867B (en) * 2021-08-18 2022-06-28 北京百度网讯科技有限公司 Training method and device of voice conversion model, electronic equipment and medium
CN113723546B (en) * 2021-09-03 2023-12-22 江苏理工学院 Bearing fault detection method and system based on discrete hidden Markov model
CN114360517B (en) * 2021-12-17 2023-04-18 天翼爱音乐文化科技有限公司 Audio processing method and device in complex environment and storage medium
CN114446283A (en) * 2022-02-17 2022-05-06 平安普惠企业管理有限公司 Voice processing method, device, electronic device and storage medium
CN114783415A (en) * 2022-03-11 2022-07-22 科大讯飞股份有限公司 Voiceprint extraction method, identity recognition method and related equipment
CN116364063B (en) * 2023-06-01 2023-09-05 蔚来汽车科技(安徽)有限公司 Phoneme alignment method, apparatus, driving apparatus, and medium

Family Cites Families (18)

Publication number Priority date Publication date Assignee Title
US7089178B2 (en) * 2002-04-30 2006-08-08 Qualcomm Inc. Multistream network feature processing for a distributed speech recognition system
US8972253B2 (en) 2010-09-15 2015-03-03 Microsoft Technology Licensing, Llc Deep belief network for large vocabulary continuous speech recognition
US8442821B1 (en) 2012-07-27 2013-05-14 Google Inc. Multi-frame prediction for hybrid neural network/hidden Markov models
US9972306B2 (en) * 2012-08-07 2018-05-15 Interactive Intelligence Group, Inc. Method and system for acoustic data selection for training the parameters of an acoustic model
AU2013305615B2 (en) * 2012-08-24 2018-07-05 Genesys Cloud Services, Inc. Method and system for selectively biased linear discriminant analysis in automatic speech recognition systems
CN103117060B (en) * 2013-01-18 2015-10-28 中国科学院声学研究所 Modeling method and modeling system for an acoustic model for speech recognition
CN103971678B (en) 2013-01-29 2015-08-12 腾讯科技(深圳)有限公司 Keyword spotting method and apparatus
CN103971685B (en) * 2013-01-30 2015-06-10 腾讯科技(深圳)有限公司 Method and system for recognizing voice commands
CN104575504A (en) * 2014-12-24 2015-04-29 上海师范大学 Method for personalized television voice wake-up by voiceprint and voice identification
KR101988222B1 (en) 2015-02-12 2019-06-13 한국전자통신연구원 Apparatus and method for large vocabulary continuous speech recognition
CN107112005A (en) * 2015-04-17 2017-08-29 微软技术许可有限责任公司 Deep neural support vector machines
KR102494139B1 (en) * 2015-11-06 2023-01-31 삼성전자주식회사 Apparatus and method for training neural network, apparatus and method for speech recognition
JP6679898B2 (en) * 2015-11-24 2020-04-15 富士通株式会社 KEYWORD DETECTION DEVICE, KEYWORD DETECTION METHOD, AND KEYWORD DETECTION COMPUTER PROGRAM
CN105702250B (en) * 2016-01-06 2020-05-19 福建天晴数码有限公司 Speech recognition method and device
CN105869624B (en) * 2016-03-29 2019-05-10 腾讯科技(深圳)有限公司 Method and device for constructing a speech decoding network in spoken digit recognition
CN105976812B (en) * 2016-04-28 2019-04-26 腾讯科技(深圳)有限公司 Speech recognition method and device
CN105957518B (en) * 2016-06-16 2019-05-31 内蒙古大学 Method for Mongolian large-vocabulary continuous speech recognition
CN106409289B (en) * 2016-09-23 2019-06-28 合肥美的智能科技有限公司 Environment self-adaptation method for speech recognition, speech recognition device and household appliance

Also Published As

Publication number Publication date
US20210125603A1 (en) 2021-04-29
WO2019019252A1 (en) 2019-01-31
CN107680582A (en) 2018-02-09
US11030998B2 (en) 2021-06-08
CN107680582B (en) 2021-03-26

Similar Documents

Publication Publication Date Title
SG11201808360SA (en) Acoustic model training method, speech recognition method, apparatus, device and medium
EP3154054A3 (en) Method and apparatus for training language model and recognizing speech
EP4053835A4 (en) VOICE RECOGNITION METHOD AND APPARATUS, AND DEVICE, AND STORAGE MEDIA
PH12019501674A1 (en) Speech wakeup method, apparatus, and electronic device
WO2014025682A3 (en) Acoustic data selection for training the parameters of an acoustic model
EP3001662A3 (en) Conference proceed apparatus and method for advancing conference
EP4235369A3 (en) Modality learning on mobile devices
EP3648099A4 (en) Voice recognition method, device, apparatus, and storage medium
EP3968179A4 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
MY179900A (en) Speech recognition method and speech recognition apparatus
EP4414977A3 (en) Speech endpointing
GB2551917A (en) Privacy-preserving training corpus selection
WO2018038385A3 (en) Method for voice recognition and electronic device for performing same
EP4235646A3 (en) Adaptive audio enhancement for multichannel speech recognition
EP3353766A4 (en) Methods for the automated generation of speech sample asset production scores for users of a distributed language learning system, automated accent recognition and quantification and improved speech recognition
EP3479376A4 (en) Speech recognition method and apparatus based on speaker recognition
EP3193328A4 (en) Method and device for performing voice recognition using grammar model
WO2016044027A8 (en) Method and apparatus for performing speaker recognition
EP3046053A3 (en) Method and apparatus for training language model, and method and apparatus for recognizing language
EP3584790A4 (en) Voiceprint recognition method, device, storage medium, and background server
EP4280210A3 (en) Hotword detection on multiple devices
EP4235649A3 (en) Language model biasing
EP4113507A4 (en) VOICE RECOGNITION METHOD AND APPARATUS, APPARATUS AND STORAGE MEDIUM
GB2566215A (en) Voice user interface
EP2963643A3 (en) Entity name recognition