CN111583939A - Method and device for specific target wake-up by voice recognition - Google Patents
Method and device for specific target wake-up by voice recognition
- Publication number
- CN111583939A (application CN201910124945.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- module
- specific target
- voice
- tested
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
Description
Technical Field
The present invention relates to the field of speech recognition, and in particular to a speech recognition method and device.
Background
In recent years, smart speakers have gradually changed the way people live. Acting as voice assistants, they help users with everyday tasks such as hailing a ride, shopping, setting reminders, and recording information. Although smart speakers make life more convenient, they still carry many security risks: a smart speaker sometimes cannot reliably determine whether the person speaking is the originally configured user, which makes it possible for someone else to order goods with the user's credit card. To prevent such misuse, many smart speakers now on the market use speech recognition as a safeguard.
A typical smart speaker is woken by voice before it carries out subsequent tasks. Voice wake-up generally means automatically picking out voice commands that the user registered in advance (wake words) from a stretch of continuous speech. Traditionally, the Hidden Markov Model (HMM) technique compares the feature vectors of individual phonemes and syllables to find the most probable word; it was later combined with the Gaussian Mixture Model (GMM) to form the classic GMM-HMM model. Existing GMM-HMM models are usually trained with the maximum likelihood (ML) method. Under some conditions, however, this method lets the probability of a competing answer exceed that of the correct answer, which lowers accuracy, so there is still room for improvement.
Summary of the Invention
In view of the defects and deficiencies of the prior art described above, the object of the present invention is to provide a method for waking up a specific target by speech recognition. The method combines the wake word of the specific target with a discriminatively trained Hidden Vector State model (HVS model) to verify the identity of the specific target, thereby achieving voice wake-up for the specific target.
To achieve the above object, one aspect of the embodiments of the present invention provides a method for waking up a specific target by speech recognition, comprising the following steps:
S1: receiving a voice message of a specific target, preprocessing the voice message of the specific target, and extracting a speech feature of the specific target;
S2: using the speech feature of the specific target as input data to a discriminatively trained Hidden Vector State model (HVS model) and performing training to obtain an acoustic model of the specific target, and storing the acoustic model of the specific target;
S3: receiving a voice message of a target under test, preprocessing the voice message of the target under test, and extracting a speech feature of the target under test;
S4: using the speech feature of the target under test as input data to the discriminatively trained Hidden Vector State model and performing training to obtain an acoustic model of the target under test;
S5: comparing the acoustic model of the target under test with the acoustic model of the specific target for correlation; if the two are correlated, performing language decoding on the speech feature of the target under test using at least one language model, and determining whether to wake up according to the language decoding result.
Specifically, the voice message of the specific target and the voice message of the target under test each include at least one wake word.
Specifically, the preprocessing includes performing noise suppression and echo cancellation on the voice message.
Specifically, the speech feature is obtained as Mel-frequency cepstral coefficients (MFCC).
Specifically, the discriminative training uses the maximum mutual information (MMI) method.
Specifically, the language model includes a lexicon model, a grammar model, or a combination thereof.
Specifically, determining whether to wake up according to the language decoding result includes: performing language decoding on the speech feature of the target under test; determining whether the voice message of the target under test contains the wake word; if it contains the wake word, speech recognition wake-up is activated, and if it does not, speech recognition wake-up is not activated.
Another aspect of the embodiments of the present invention provides a device for waking up a specific target by speech recognition, comprising:
an acquisition module, including a plurality of microphone arrays, for receiving voice messages of a specific target and of a target under test, wherein the voice messages contain a wake word;
an extraction module, connected to the acquisition module, for extracting MFCC speech features from the voice messages of the specific target and the target under test;
a training module, connected to the extraction module, for using the MFCC speech features from the voice messages of the specific target and the target under test as input data to a Hidden Vector State model trained with the maximum mutual information method, and for obtaining the trained acoustic model of the specific target and the trained acoustic model of the target under test;
a storage module, connected to the training module, for storing the trained acoustic model of the specific target;
a decoding module, connected to the extraction module, for performing language decoding on the voice message of the target under test; and
a processor module, connected to the training module, the storage module, and the decoding module, for comparing the acoustic model of the specific target in the storage module with the acoustic model of the target under test, determining from the comparison result whether to activate the decoding module to perform language decoding on the voice message of the target under test, and confirming whether the decoded voice message of the target under test contains the wake word in order to wake up the device.
Specifically, the device further includes a registration module connected to the acquisition module and the storage module; the registration module is used to initiate saving the acoustic model of the specific target to the storage module.
Specifically, the device further includes a wireless communication module, which is used to establish an external communication connection.
Compared with the prior art, the method and device of the present invention for waking up a specific target by speech recognition use a discriminatively trained Hidden Vector State model as the acoustic model. Besides maximizing the probability of the correct answer, discriminative training also lowers the probability of competitors, which improves discrimination between the correct answer and its competitors. The device can therefore quickly and accurately determine whether the target under test is the specific target, and thus perform the wake-up function.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for waking up a specific target by speech recognition according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a device for waking up a specific target by speech recognition according to an embodiment of the present invention.
The reference numerals in the figures are as follows:
100 Speech recognition device
11 Acquisition module
12 Extraction module
13 Training module
14 Storage module
15 Decoding module
16 Processor module
17 Registration module
18 Wireless communication module
S101~S105 Process steps
Detailed Description
To explain the technical content, structural features, objects, and effects of the present invention in detail, embodiments are described below with reference to the drawings.
Please refer to FIG. 1, a schematic flowchart of a method for waking up a specific target by speech recognition disclosed in an embodiment of the present invention. The method includes the following steps:
Step S101: receive a voice message of a specific target, preprocess the voice message of the specific target, and extract a speech feature of the specific target.
Specifically, the specific target in this step is the registered user who satisfies the wake-up condition during speech recognition, and the voice message is based on a text prepared in advance that contains a preset wake word. The specific target reads the text aloud, and the acquisition module 11 of the speech recognition device 100 of this embodiment collects the voice message of the specific target.
Specifically, the voice message collected in this step is an analog speech signal, which must be converted into a digital speech signal before any subsequent speech recognition processing. The voice message may also contain environmental noise, so it must be preprocessed to filter out useless noise and obtain a usable speech signal. The preprocessing includes noise suppression and echo cancellation on the digital speech signal and may follow existing noise reduction techniques.
Specifically, the speech feature of the specific target is extracted from the preprocessed speech signal. This embodiment uses Mel-frequency cepstral coefficients (MFCC) to capture the speech feature: the preprocessed speech signal is split into multiple frames (frame blocking), the parts of the signal that need boosting are pre-emphasized (pre-emphasis), and windowing is applied, among other operations, yielding a clearer and more distinct speech feature.
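The front-end operations named above (pre-emphasis, frame blocking, windowing) can be illustrated with a minimal Python sketch. It is not taken from the patent: the pre-emphasis coefficient 0.97, the 25 ms frame length, the 10 ms hop, the 16 kHz sampling rate, the Hamming window, and all function names are assumptions chosen for illustration. Pre-emphasis is applied before framing here, which is the common ordering; the remaining MFCC stages (FFT, Mel filterbank, logarithm, DCT) are omitted.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[0] = x[0], y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(n):
    """Hamming window of length n (assumed window type)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_and_window(signal, frame_len=400, hop=160):
    """Cut the signal into overlapping frames and taper each frame."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

# One second of a synthetic 440 Hz tone at 16 kHz stands in for recorded speech.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
frames = frame_and_window(pre_emphasis(tone))
print(len(frames), len(frames[0]))
```

Each row of `frames` would then go through the spectral stages of MFCC extraction; the overlap between consecutive frames keeps short speech events from falling on a frame boundary.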
Step S102: use the speech feature of the specific target as input data to a discriminatively trained Hidden Vector State model (HVS model) and perform training to obtain an acoustic model of the specific target, and store the acoustic model of the specific target.
Specifically, in this step the speech feature of the specific target serves as input data for training the acoustic model. This embodiment uses a Hidden Vector State model trained discriminatively. Discriminative training does not aim to maximize the likelihood of the acoustic training corpus; instead it aims to minimize classification (or recognition) errors, which improves the recognition rate.
The discriminative training uses the maximum mutual information (MMI) criterion, which raises the probability of the correct answer, effectively lowers the probability of competitors, and increases the discrimination between the correct answer and its competitors.
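The contrast between maximum likelihood and MMI can be made concrete with a small numeric sketch. It assumes the textbook form of the MMI objective for a single utterance, log [ p(X|correct) P(correct) / Σ_w p(X|w) P(w) ]; the patent does not give a formula, and the hypothesis names, priors, and log-likelihood values below are invented for illustration.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def ml_objective(log_likelihoods, correct):
    """Maximum likelihood looks only at the correct hypothesis."""
    return log_likelihoods[correct]

def mmi_objective(log_likelihoods, log_priors, correct):
    """MMI: log posterior of the correct hypothesis against all competitors."""
    joint = {w: log_likelihoods[w] + log_priors[w] for w in log_likelihoods}
    return joint[correct] - log_sum_exp(list(joint.values()))

priors = {"correct": math.log(0.5), "competitor": math.log(0.5)}

# Case A: the competitor scores as well as the correct answer.
strong_rival = {"correct": -10.0, "competitor": -10.0}
# Case B: the competitor has been pushed down; the correct score is unchanged.
weak_rival = {"correct": -10.0, "competitor": -20.0}

print(ml_objective(strong_rival, "correct"), ml_objective(weak_rival, "correct"))
print(mmi_objective(strong_rival, priors, "correct"),
      mmi_objective(weak_rival, priors, "correct"))
```

The maximum likelihood value is identical in both cases, while the MMI value rises when the competitor's score drops. This is the sense in which MMI training rewards lowering competitor probability as well as raising the correct one.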
Specifically, storing the acoustic model of the specific target in this step means storing it in the storage module 14 of the speech recognition device 100 of this embodiment.
Step S103: receive a voice message of a target under test, preprocess the voice message of the target under test, and extract a speech feature of the target under test.
Specifically, the target under test in this step is the user who is to be checked by speech recognition. The target under test utters a voice message, which is collected by the acquisition module 11 of the speech recognition device 100 of this embodiment.
Specifically, in this step the voice message of the target under test is preprocessed and its speech feature is extracted in the same way as described above for the voice message of the specific target.
Step S104: use the speech feature of the target under test as input data to the discriminatively trained Hidden Vector State model and perform training to obtain an acoustic model of the target under test.
Specifically, in this step the speech feature of the target under test serves as input data for training the acoustic model. This embodiment uses a Hidden Vector State model trained discriminatively under the maximum mutual information (MMI) criterion.
Step S105: compare the acoustic model of the target under test with the acoustic model of the specific target for correlation; if the two are correlated, perform language decoding on the speech feature of the target under test using at least one language model, and determine whether to wake up according to the language decoding result.
Specifically, in this step language decoding is performed when the acoustic model of the target under test matches the acoustic model of the specific target; if it does not match, no action is taken. Language decoding uses the speech feature of the target under test as input data for training the language model, which in this embodiment includes a lexicon model and a grammar model.
When the acoustic model of the target under test is judged to be the acoustic model of the specific target, the target under test is the specific target, so language decoding is performed to check whether the voice message of the target under test contains the wake word. The speech feature of the target under test is run through training of the lexicon model and the grammar model and parsed to obtain the content of the voice message; the content is then checked for the wake word. If it contains the wake word, speech recognition wake-up is activated; if not, wake-up is not activated.
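The two-stage decision of step S105 can be sketched as follows. The patent does not say how the "correlation" between two acoustic models is measured, so this sketch makes several assumptions: each model is summarized as a fixed-length vector, cosine similarity with a threshold of 0.8 stands in for the comparison, and `decode` is a hypothetical callable that returns the decoded text. None of these names or values come from the source.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def should_wake(enrolled_model, test_model, decode, wake_word, threshold=0.8):
    """Return True only if the speaker matches AND the wake word was spoken."""
    # Stage 1: speaker check. On a mismatch nothing else happens,
    # so the (more expensive) language decoding never runs.
    if cosine_similarity(enrolled_model, test_model) < threshold:
        return False
    # Stage 2: language decoding, then a wake-word check on the transcript.
    transcript = decode()
    return wake_word in transcript

calls = []

def fake_decode():
    """Stand-in decoder that records how often it is invoked."""
    calls.append(1)
    return "hello speaker turn on the light"

woke = should_wake([1.0, 0.0], [0.9, 0.1], fake_decode, "hello speaker")
rejected = should_wake([1.0, 0.0], [0.0, 1.0], fake_decode, "hello speaker")
print(woke, rejected, len(calls))
```

Passing the decoder as a callable makes the gating explicit: an unenrolled speaker is rejected before any decoding work is done, which mirrors the "no action is taken" branch described above.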
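Please refer to FIG. 2, which shows a device for waking up a specific target by speech recognition according to an embodiment of the present invention. A speech recognition device 100 includes an acquisition module 11, an extraction module 12, a training module 13, a storage module 14, a decoding module 15, a processor module 16, a registration module 17, and a wireless communication module 18.
The acquisition module 11 is connected to the extraction module 12 and the registration module 17. The acquisition module 11 has a plurality of microphone arrays for receiving the voice messages of the specific target and the target under test. The collected voice message is an analog speech signal that must be converted into a digital speech signal; noise suppression and echo cancellation are applied to the digital speech signal, and the processed digital voice message is then sent to the extraction module 12.
The specific target is defined as the person for whom the speech recognition of the present invention performs specific-target wake-up; the target under test is defined as the person on whom the speech recognition device 100 performs speech recognition.
The voice message of the specific target contains a preset wake word.
The extraction module 12 is connected to the acquisition module 11, the training module 13, and the decoding module 15. The extraction module 12 receives the voice messages processed by the acquisition module 11, extracts the speech features of the specific target and the target under test, and sends them either to the training module 13 for acoustic model training or to the decoding module 15 for decoding.
The speech features of the specific target and the target under test are extracted from their voice messages as Mel-frequency cepstral coefficients (MFCC).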
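The training module 13 is connected to the extraction module 12, the storage module 14, and the processor module 16. The training module 13 receives the speech features of the specific target and the target under test extracted by the extraction module 12, uses them as input data to the Hidden Vector State model trained with the maximum mutual information method, and obtains the trained acoustic models. What happens next depends on whose model it is: the acoustic model of the specific target is sent to the storage module 14, and the acoustic model of the target under test is sent to the processor module 16.
The storage module 14 is connected to the training module 13, the processor module 16, and the registration module 17. The storage module 14 stores the acoustic model of the specific target trained by the training module 13. In this embodiment, when the specific target operates the registration module 17, the acoustic model of the specific target trained by the training module 13 is sent to the storage module 14 and saved. When the processor module 16 compares the acoustic models of the target under test and the specific target, the storage module 14 sends the stored acoustic model of the specific target to the processor module 16.
The decoding module 15 is connected to the extraction module 12 and the processor module 16. The decoding module 15 performs language decoding on the voice message of the target under test. More specifically, the extraction module 12 supplies the speech feature of the target under test as input data for training with the lexicon model and the grammar model, and the result is sent to the processor module 16.
The processor module 16 is connected to the training module 13, the storage module 14, the decoding module 15, and the wireless communication module 18. The processor module 16 compares the acoustic model of the specific target with the acoustic model of the target under test and decides from the comparison result whether to activate the decoding module 15 for language decoding. More specifically, when the training module 13 sends the acoustic model of the target under test, the processor module 16 also retrieves the acoustic model of the specific target from the storage module 14 and compares the two acoustic models.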
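When the acoustic model of the specific target is confirmed to be correlated with the acoustic model of the target under test, the target under test is the specific target, and the voice message of the target under test must be decoded to determine whether it contains the wake word. The processor module 16 therefore activates the decoding module 15, which performs the language decoding.
The decoding module 15 obtains the speech feature of the target under test from the extraction module 12 and returns the language decoding result to the processor module 16. The processor module 16 determines from the acoustic model of the target under test and the language decoding result whether the voice message of the target under test contains the wake word.
When the processor module 16 finds that the voice message of the target under test contains the wake word, it wakes up the speech recognition device 100; otherwise it does not.
The registration module 17 is connected to the acquisition module 11 and the storage module 14. The registration module 17 lets the specific target register with the speech recognition device 100 and includes an activation element and a display element. When the specific target touches the activation element, the storage module 14 is activated at the same time, indicating that the acoustic model trained by the training module 13 from the voice message currently collected by the acquisition module 11 is to be saved to the storage module 14. Touching the activation element also turns on the display element, so that the specific target can confirm that the device is in the registration phase.
In this embodiment, the activation element is a button and the display element is a light-emitting diode.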
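The registration behaviour described above (button press, LED indication, routing of the trained model to storage or to the processor) can be sketched as a small state holder. The class and method names are hypothetical, and the choice to leave registration mode automatically after one model has been stored is an assumption; the patent does not say when the registration phase ends.

```python
class EnrollmentSwitch:
    """Hypothetical model of registration module 17 plus storage module 14."""

    def __init__(self):
        self.enrolling = False    # True while the device is in the registration phase
        self.stored_model = None  # acoustic model of the specific target, once saved

    def press_button(self):
        """Activation element touched: enter the registration phase."""
        self.enrolling = True

    @property
    def led_on(self):
        """Display element: lit while the registration phase is active."""
        return self.enrolling

    def route_model(self, model):
        """Send a freshly trained model to storage or on to the processor."""
        if self.enrolling:
            self.stored_model = model
            self.enrolling = False  # assumption: one utterance completes registration
            return "storage"
        return "processor"

switch = EnrollmentSwitch()
switch.press_button()
first = switch.route_model("model-of-registered-user")
second = switch.route_model("model-of-unknown-speaker")
print(first, second, switch.led_on)
```

Keeping the routing decision in one place means the training step does not need to know whether it is handling an enrollment or a test utterance; only the switch state differs.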
The wireless communication module 18 is connected to the processor module 16. The wireless communication module 18 establishes an external communication connection after the processor module 16 confirms that the speech recognition device 100 has been woken up successfully.
In this embodiment, the wireless communication module 18 includes a Wi-Fi module or a Bluetooth module.
In summary, the method and device of the present invention for waking up a specific target by speech recognition use a discriminatively trained Hidden Vector State model as the acoustic model. Besides maximizing the probability of the correct answer, discriminative training with the maximum mutual information method also lowers the probability of competitors, which improves discrimination between the correct answer and its competitors. The device can therefore quickly and accurately determine whether the target under test is the specific target, and thus perform the wake-up function.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910124945.7A CN111583939A (en) | 2019-02-19 | 2019-02-19 | Method and device for specific target wake-up by voice recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111583939A true CN111583939A (en) | 2020-08-25 |
Family
ID=72122523
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910124945.7A Withdrawn CN111583939A (en) | 2019-02-19 | 2019-02-19 | Method and device for specific target wake-up by voice recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111583939A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103971678A (en) * | 2013-01-29 | 2014-08-06 | 腾讯科技(深圳)有限公司 | Method and device for detecting keywords |
CN106611597A (en) * | 2016-12-02 | 2017-05-03 | 百度在线网络技术(北京)有限公司 | Voice wakeup method and voice wakeup device based on artificial intelligence |
CN107123417A (en) * | 2017-05-16 | 2017-09-01 | 上海交通大学 | Optimization method and system are waken up based on the customized voice that distinctive is trained |
CN108281137A (en) * | 2017-01-03 | 2018-07-13 | 中国科学院声学研究所 | A kind of universal phonetic under whole tone element frame wakes up recognition methods and system |
CN109155132A (en) * | 2016-03-21 | 2019-01-04 | 亚马逊技术公司 | Speaker verification method and system |
CN109243446A (en) * | 2018-10-01 | 2019-01-18 | 厦门快商通信息技术有限公司 | A kind of voice awakening method based on RNN network |
Non-Patent Citations (1)
Title |
---|
Deyu Zhou and Yulan He: "Discriminative Training of the Hidden Vector State Model for Semantic Parsing", IEEE Transactions on Knowledge and Data Engineering * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110428810B (en) | Voice wake-up recognition method and device and electronic equipment | |
US9646610B2 (en) | Method and apparatus for activating a particular wireless communication device to accept speech and/or voice commands using identification data consisting of speech, voice, image recognition | |
WO2017071182A1 (en) | Voice wakeup method, apparatus and system | |
CN112102850B (en) | Emotion recognition processing method and device, medium and electronic equipment | |
US9354687B2 (en) | Methods and apparatus for unsupervised wakeup with time-correlated acoustic events | |
WO2016150001A1 (en) | Speech recognition method, device and computer storage medium | |
CN107767861B (en) | Voice awakening method and system and intelligent terminal | |
US6618702B1 (en) | Method of and device for phone-based speaker recognition | |
EP3210205B1 (en) | Sound sample verification for generating sound detection model | |
CN111210829B (en) | Speech recognition method, apparatus, system, device and computer readable storage medium | |
CN110570873B (en) | Voiceprint wake-up method and device, computer equipment and storage medium | |
CN109272991B (en) | Voice interaction method, device, equipment and computer-readable storage medium | |
CN109979438A (en) | Voice wake-up method and electronic equipment | |
CN109192224B (en) | Voice evaluation method, device and equipment and readable storage medium | |
US20140337024A1 (en) | Method and system for speech command detection, and information processing system | |
US9335966B2 (en) | Methods and apparatus for unsupervised wakeup | |
CN101772015A (en) | Method for starting up mobile terminal through voice password | |
CN110610707A (en) | Voice keyword recognition method and device, electronic equipment and storage medium | |
CN103943105A (en) | Voice interaction method and system | |
JPH0962293A (en) | Speech recognition dialogue device and speech recognition dialogue processing method | |
CN103021409A (en) | Voice activating photographing system | |
TW202029181A (en) | Method and apparatus for specific user to wake up by speech recognition | |
CN110689887B (en) | Audio verification method and device, storage medium and electronic equipment | |
CN108052195A (en) | Control method of microphone equipment and terminal equipment | |
JP2019124952A (en) | Information processing device, information processing method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20200825 |