JPH08314490A - Word spotting type speech recognition method and device - Google Patents

Word spotting type speech recognition method and device

Info

Publication number
JPH08314490A
Authority
JP
Japan
Prior art keywords
keyword
likelihood
section
input voice
vowel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP7123469A
Other languages
Japanese (ja)
Inventor
Toru Imai
亨 今井
Akio Ando
彰男 安藤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Japan Broadcasting Corp
Original Assignee
Nippon Hoso Kyokai NHK
Japan Broadcasting Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Hoso Kyokai NHK, Japan Broadcasting Corp
Priority to JP7123469A
Publication of JPH08314490A
Legal status: Pending

Abstract

(57)【要約】 【目的】 湧き出し誤りが少なく、認識精度が高く、高
速認識が可能なワードスポッティング型音声認識方法お
よび装置を提供する。 【構成】 母音の標準パターンを用いて入力音声の各時
刻の母音候補を求め、これとキーワード発音辞書中の母
音列との対応をとり、入力音声中のキーワードの存在区
間を検出し(2)、該検出された存在区間におけるキー
ワードの尤度を隠れマルコフモデルにより求める(3)
とともに、検出された前記キーワードの存在区間と求め
られた前記尤度から音声認識結果を出力する(4)。
(57) [Abstract] [Purpose] To provide a word spotting type speech recognition method and device with few false alarms (spurious keyword detections), high recognition accuracy, and high-speed recognition. [Structure] Vowel candidates are obtained at each time point of the input speech using standard vowel patterns and are matched against the vowel sequences in a keyword pronunciation dictionary to detect the sections of the input speech in which keywords are present (2). The likelihood of each keyword in its detected existence section is computed with a hidden Markov model (3), and a speech recognition result is output from the detected keyword existence sections and the computed likelihoods (4).

Description

【発明の詳細な説明】Detailed Description of the Invention

【0001】[0001]

【産業上の利用分野】この発明は、音声認識方法と装置
に係り、特に入力音声中のキーワードを検出して音声認
識を行うワードスポッティング型音声認識方法と装置に
関するものである。
[Industrial Field of Application] The present invention relates to a speech recognition method and device, and in particular to a word spotting type speech recognition method and device that performs speech recognition by detecting keywords in input speech.

【0002】[0002]

【従来の技術】従来のワードスポッティング型音声認識
装置を使用する方法には、例えば以下の (a), (b) の
ような方法が提案されている。 (a)隠れマルコフモデルだけを使用する方法(今村明
弘:“HMM(隠れマルコフモデル)による電話音声の
スポッティング”,信学技報 SP90-18, 1990参照)。 (b)音素認識ネットワーク素子による単語予備選択
と、DPマッチング(例えば“確率モデルによる音声認
識”,電子情報通信学会刊,中川聖著,18頁,1988参
照)による単語決定を併用した方法(小倉広実:“音素
事後確率時系列を用いた単語予備選択と大語彙単語音声
認識”, 信学技報 SP94-87, 1995)。
[Prior Art] Methods proposed for conventional word spotting type speech recognition include, for example, the following (a) and (b). (a) A method that uses only hidden Markov models (see Akihiro Imamura, "Spotting of Telephone Speech by HMM (Hidden Markov Model)," IEICE Technical Report SP90-18, 1990). (b) A method that combines word preselection by phoneme recognition network elements with word determination by DP matching (see, for example, "Speech Recognition by Probabilistic Models," Sei Nakagawa, published by The Institute of Electronics, Information and Communication Engineers, p. 18, 1988) (Hiromi Ogura, "Word Preselection and Large-Vocabulary Word Speech Recognition Using Phoneme Posterior Probability Time Series," IEICE Technical Report SP94-87, 1995).

【0003】[0003]

【発明が解決しようとする課題】上述の従来の方法
(a)では、入力音声の任意のフレームから任意のフレ
ームまでをキーワード存在区間と仮定し、隠れマルコフ
モデルのみにより尤度を求めていた。そのため計算量が
多く、実際には発生していない場所でキーワードを検出
してしまうこと(湧き出し誤り)が多いという問題点が
あった。
[Problems to be Solved by the Invention] In the conventional method (a) above, every span from an arbitrary frame to an arbitrary frame of the input speech is assumed to be a possible keyword existence section, and the likelihood is computed with hidden Markov models alone. As a result, the amount of computation is large, and keywords are often detected at places where they were not actually uttered (false alarms).

【0004】また従来の方法(b)では、子音も含めた
全ての音素の認識を高い精度で実現することが困難であ
り、DPマッチングによるキーワードの尤度計算の精度
も、特に不特定話者を想定した場合には隠れマルコフモ
デルより劣るという問題点があった。そこで本発明の目
的は、従来のこの種技術の欠点を排除し、湧き出し誤り
が少なく、認識精度が高く、高速認識が可能なワードス
ポッティング型音声認識方法および装置を提供せんとす
るものである。
In the conventional method (b), it is difficult to recognize all phonemes, including consonants, with high accuracy, and the accuracy of the keyword likelihood computed by DP matching is inferior to that of hidden Markov models, particularly when speaker-independent recognition is assumed. An object of the present invention is therefore to eliminate these drawbacks of the conventional techniques and to provide a word spotting type speech recognition method and device with few false alarms, high recognition accuracy, and high-speed recognition.

【0005】[0005]

【課題を解決するための手段】この目的を達成するため
本発明ワードスポッティング型音声認識方法は、母音の
標準パターンを用いて入力音声の各時刻の母音候補を求
め、これとキーワード発音辞書中の母音列との対応をと
り、入力音声中のキーワードの存在区間を検出し、該検
出された存在区間におけるキーワードの尤度を隠れマル
コフモデルにより求めるとともに、検出された前記キー
ワードの存在区間と求められた前記尤度から音声認識結
果を出力することを特徴とするものである。
[Means for Solving the Problems] To achieve this object, the word spotting type speech recognition method of the present invention is characterized by obtaining vowel candidates at each time point of input speech using standard vowel patterns, matching them against the vowel sequences in a keyword pronunciation dictionary to detect the existence sections of keywords in the input speech, computing the likelihood of each keyword in its detected existence section with a hidden Markov model, and outputting a speech recognition result from the detected keyword existence sections and the computed likelihoods.

【0006】また、本発明ワードスポッティング型音声
認識装置は、入力音声の音響分析を行う音響分析部と、
該分析部の分析結果およびあらかじめ設けられたキーワ
ード発音辞書に基づいて入力音声の母音の認識を行い、
入力音声における各キーワードの存在区間を検出する存
在区間検出部と、前記分析結果、キーワード発音辞書お
よび検出されたキーワード存在区間に基づいて入力音声
に含まれる各キーワードの尤度を求める尤度算出部と、
各キーワードの存在区間および求められた尤度に基づい
てキーワード認識結果を出力する認識結果出力部とを具
備したことを特徴とするものである。
The word spotting type speech recognition device of the present invention is characterized by comprising: an acoustic analysis unit that performs acoustic analysis of input speech; an existence section detection unit that recognizes the vowels of the input speech based on the analysis result of the analysis unit and a keyword pronunciation dictionary provided in advance, and detects the existence section of each keyword in the input speech; a likelihood calculation unit that computes the likelihood of each keyword contained in the input speech based on the analysis result, the keyword pronunciation dictionary, and the detected keyword existence sections; and a recognition result output unit that outputs a keyword recognition result based on the existence section of each keyword and the computed likelihood.

【0007】[0007]

【実施例】以下添附図面を参照し実施例により本発明を
詳細に説明する。本発明によるワードスポッティング型
音声認識方法を実施するために使用される装置の一実施
例を示す図1を参照するに、本発明音声認識装置は、入
力音声の音響分析を行う音響分析部1と、分析結果およ
びキーワード音声辞書に基づいて母音の認識を行い、入
力音声における各キーワードの存在区間を検出する存在
区間検出部2と、分析結果とキーワード音声辞書とキー
ワード存在区間に基づいて各キーワードの尤度を求める
尤度算出部と、各キーワードの存在区間と尤度に基づい
てキーワード認識結果を出力する認識結果出力部で構成
されている。
[Embodiment] The present invention is described in detail below by way of an embodiment, with reference to the accompanying drawings. Referring to FIG. 1, which shows an embodiment of a device used to carry out the word spotting type speech recognition method of the present invention, the speech recognition device comprises an acoustic analysis unit 1 that performs acoustic analysis of input speech; an existence section detection unit 2 that recognizes vowels based on the analysis result and the keyword pronunciation dictionary and detects the existence section of each keyword in the input speech; a likelihood calculation unit 3 that computes the likelihood of each keyword based on the analysis result, the keyword pronunciation dictionary, and the keyword existence sections; and a recognition result output unit 4 that outputs a keyword recognition result based on the existence section and likelihood of each keyword.

【0008】音響分析部1は、図2に示すように、入力
音声をA/D変換器11によりA/D変換し、フレーム
化器12により20ms程度の幅を1フレームとして、
各フレームの平均パワー、LPC(線形予測)ケプスト
ラム係数(周波数分布を表す係数)などの値を算出部1
3により計算する。
As shown in FIG. 2, the acoustic analysis unit 1 converts the input speech to digital form with the A/D converter 11, the framer 12 divides it into frames of about 20 ms each, and the calculation unit 13 computes values such as the average power and LPC (linear prediction) cepstrum coefficients (coefficients representing the frequency distribution) of each frame.
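The framing and average-power steps of the acoustic analysis can be sketched as follows. This is a minimal illustration, not the patented implementation: the 8 kHz sample rate, the non-overlapping frames, and the function names are assumptions, and the LPC cepstrum computation is omitted.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=20.0):
    """Split samples into consecutive frames of about frame_ms milliseconds.

    Non-overlapping frames are an assumption; the text only gives the width.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def average_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def analyze(samples, sample_rate):
    """Return the average power of every frame of the signal."""
    return [average_power(f) for f in frame_signal(samples, sample_rate)]

# Hypothetical input: 100 ms of a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate // 10)]
powers = analyze(tone, sample_rate)
```

A real front end would also compute the LPC cepstrum coefficients for each frame and pass them, together with the power, to the later stages.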

【0009】存在区間検出部2には図3に示すようにあ
らかじめ、母音(あ、い、う、え、お、ん)のケプスト
ラム係数などの標準的な値(母音標準パターン)を母音
標準パターンメモリ部21に保存しておく。この母音標
準パターンと入力音声の分析結果を比較器22で比較
し、入力音声の各フレームがどの母音に相当するかを母
音候補決定部23で求める。ただし各フレームに対して
一意に1つの母音を決定するのではなく、各母音の標準
パターンと入力音声の近さを尤度に変換して、各フレー
ムに対して複数の母音候補を求める。
As shown in FIG. 3, the existence section detection unit 2 stores in advance, in the standard vowel pattern memory unit 21, standard values such as cepstrum coefficients (standard vowel patterns) for the vowels (a, i, u, e, o, and the moraic nasal n). The comparator 22 compares these standard patterns with the analysis result of the input speech, and the vowel candidate determination unit 23 determines which vowel each frame of the input speech corresponds to. Rather than assigning a single vowel to each frame, however, the closeness between each standard vowel pattern and the input speech is converted into a likelihood, and multiple vowel candidates are obtained for each frame.
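The idea of keeping several scored vowel candidates per frame can be sketched as below. The two-dimensional pattern values, the use of negative squared Euclidean distance as a likelihood-like score, and the `top_n` parameter are all assumptions made for illustration; the text does not specify how closeness is converted to a likelihood.

```python
# Hypothetical 2-dimensional standard patterns; real ones would be
# LPC cepstrum vectors learned from speech data.
VOWEL_PATTERNS = {
    "a": (1.0, 0.0),
    "i": (0.0, 1.0),
    "u": (-1.0, 0.0),
    "e": (0.5, 0.8),
    "o": (0.0, -1.0),
    "N": (-0.5, -0.5),
}

def vowel_candidates(feature, patterns=VOWEL_PATTERNS, top_n=2):
    """Return the top_n (vowel, score) pairs for one frame.

    The score is the negative squared distance to the standard pattern,
    so a closer pattern gets a higher score (an assumed conversion).
    """
    scored = []
    for vowel, ref in patterns.items():
        dist2 = sum((f - r) ** 2 for f, r in zip(feature, ref))
        scored.append((-dist2, vowel))
    scored.sort(reverse=True)
    return [(vowel, score) for score, vowel in scored[:top_n]]

# A frame close to the "a" pattern keeps "a" first and "e" as a second candidate.
candidates = vowel_candidates((0.9, 0.1))
```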

【0010】次に、認識対象であるキーワードの母音列
が入力音声に含まれているかどうかを、キーワード発音
辞書に基づいて調べる。これも母音候補決定部23で行
われる。キーワード発音辞書は、認識対象であるキーワ
ードの漢字仮名混じり表記と、その発音をローマ字で記
述したものである。例えば「スポーツ」というキーワー
ドに対しては「s,u,p,oo,ts,u 」という発音を、「相
撲」というキーワードに対しては「s,u,m,oo」という発
音をあらかじめ記述しておく。キーワードの母音列「u,
oo,u」や「u,oo」などが入力音声に含まれている場合に
は、入力音声の何フレーム目から何フレーム目までがど
のキーワードなのか、各キーワードの存在区間情報を出
力する。
Next, whether the vowel sequence of each keyword to be recognized is contained in the input speech is checked against the keyword pronunciation dictionary. This is also done by the vowel candidate determination unit 23. The keyword pronunciation dictionary lists each keyword to be recognized in its ordinary kanji-kana orthography together with its pronunciation written in Roman letters. For example, the pronunciation "s,u,p,oo,ts,u" is registered in advance for the keyword "sports" (スポーツ), and "s,u,m,oo" for the keyword "sumo" (相撲). When a keyword vowel sequence such as "u,oo,u" or "u,oo" is contained in the input speech, existence section information is output for each keyword, indicating from which frame to which frame of the input speech that keyword lies.

【0011】キーワードの母音列が入力音声に含まれて
いるかどうかを調べる際、例えば「スポーツ」の母音列
「u,oo,u」が入力音声に含まれているかどうかを調べる
際、「u,oo,a,u」など母音が挿入した場合もキーワード
の存在区間とする。もちろん、1つのキーワードに対し
て複数の存在区間を出力可能とする。以上の操作をキー
ワード発音辞書中のすべてのキーワードに対して行う。
When checking whether a keyword's vowel sequence is contained in the input speech, for example whether the vowel sequence "u,oo,u" of "sports" is contained, a sequence with an inserted vowel, such as "u,oo,a,u", is also treated as a keyword existence section. Naturally, multiple existence sections can be output for a single keyword. These operations are performed for every keyword in the keyword pronunciation dictionary.
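The vowel-sequence matching with tolerated insertions can be sketched as follows. For simplicity this assumes one recognized vowel label per segment rather than several scored candidates per frame, uses a greedy left-to-right match, and introduces a hypothetical `max_insertions` limit; none of these details are given in the text.

```python
def find_sections(recognized, keyword_vowels, max_insertions=1):
    """Return (start, end) index pairs where keyword_vowels occurs in
    recognized, allowing up to max_insertions extra vowels inside a match."""
    sections = []
    for start, vowel in enumerate(recognized):
        if vowel != keyword_vowels[0]:
            continue
        pos, matched, inserted = start, 1, 0
        while matched < len(keyword_vowels) and pos + 1 < len(recognized):
            pos += 1
            if recognized[pos] == keyword_vowels[matched]:
                matched += 1
            else:
                inserted += 1
                if inserted > max_insertions:
                    break
        if matched == len(keyword_vowels):
            sections.append((start, pos))
    return sections

# Vowel sequence from the recognition example in the description.
recognized = "a,oo,u,oo,u,a,N,u,i,a,i,a,u,a".split(",")
sports_sections = find_sections(recognized, ["u", "oo", "u"])
sumo_sections = find_sections(recognized, ["u", "oo"])
# An inserted vowel, as in "u,oo,a,u", is still accepted.
inserted_sections = find_sections(["u", "oo", "a", "u"], ["u", "oo", "u"])
```

Because both "sports" and "sumo" match overlapping sections here, vowel matching alone cannot separate them; that is left to the likelihood computation in the next stage.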

【0012】尤度算出部3には図4に示すように、あら
かじめ母音(a,i,u,e,o) と子音(p,t,k,ts など) の音素
単位の隠れマルコフモデルをメモリ(ROM)に保存し
ておく。そしてキーワード発音辞書中の発音に従ってこ
れらを直列に連結32し、「スポーツ」や「相撲」など
のキーワードに相当する隠れマルコフモデルをブロック
33(RAM)で作成する。存在区間検出部2から各キ
ーワードの存在区間を受け取り、入力音声のその区間に
対する尤度を、尤度算出器34で各キーワードに相当す
る隠れマルコフモデルにより求める。
As shown in FIG. 4, the likelihood calculation unit 3 stores in advance, in a memory (ROM), phoneme-unit hidden Markov models for vowels (a, i, u, e, o) and consonants (p, t, k, ts, etc.). These are concatenated in series (32) according to the pronunciations in the keyword pronunciation dictionary to create, in block 33 (RAM), hidden Markov models corresponding to keywords such as "sports" and "sumo". The existence section of each keyword is received from the existence section detection unit 2, and the likelihood calculator 34 computes the likelihood of that section of the input speech using the hidden Markov model corresponding to each keyword.
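Concatenating phoneme models into a keyword model and scoring a section can be sketched as below. Everything numeric here is hypothetical: each phoneme is reduced to one state with a one-dimensional Gaussian emission, the transition probabilities are fixed at 0.5, and the score is a Viterbi (best-path) log-likelihood rather than whatever likelihood computation the device actually uses.

```python
import math

# Hypothetical phoneme models: one state each, emission = (mean, variance).
PHONEME_MODELS = {"s": (3.0, 1.0), "u": (-1.0, 1.0),
                  "m": (0.0, 1.0), "oo": (1.5, 1.0)}
LOG_STAY = math.log(0.5)  # assumed self-loop probability
LOG_MOVE = math.log(0.5)  # assumed probability of moving to the next state

def build_keyword_model(pronunciation, models=PHONEME_MODELS):
    """Concatenate phoneme models in series according to the pronunciation."""
    return [models[p] for p in pronunciation]

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_log_likelihood(states, observations):
    """Best-path log-likelihood of a left-to-right model that starts in the
    first state and ends in the last state."""
    neg_inf = float("-inf")
    score = [neg_inf] * len(states)
    score[0] = log_gauss(observations[0], *states[0])
    for x in observations[1:]:
        new = [neg_inf] * len(states)
        for j, (mean, var) in enumerate(states):
            stay = score[j] + LOG_STAY
            move = score[j - 1] + LOG_MOVE if j > 0 else neg_inf
            best = max(stay, move)
            if best > neg_inf:
                new[j] = best + log_gauss(x, mean, var)
        score = new
    return score[-1]

sumo_model = build_keyword_model(["s", "u", "m", "oo"])
good = viterbi_log_likelihood(sumo_model, [3.0, -1.0, 0.0, 1.5])
bad = viterbi_log_likelihood(sumo_model, [1.5, 0.0, -1.0, 3.0])
```

Scoring only the sections handed over by the existence section detection unit, instead of every possible start and end frame, is what keeps the amount of hidden Markov model computation small.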

【0013】認識結果出力部4は図5に示すように各キ
ーワードの存在区間と尤度を尤度メモリ部41で受け取
り、あらかじめ設定した閾値メモリ部42からの尤度の
閾値と比較器43で比較し、その閾値を超えるキーワー
ドがあった場合には、そのキーワードが発声されたもの
として、そのキーワードを認識結果として出力する。
As shown in FIG. 5, the recognition result output unit 4 receives the existence section and likelihood of each keyword in the likelihood memory unit 41, and the comparator 43 compares each likelihood with a preset likelihood threshold from the threshold memory unit 42. If a keyword exceeds the threshold, it is regarded as having been uttered and is output as the recognition result.

【0014】認識例を図6により説明する。例えば「あ
のー、スポーツ番組ありますか」というような音声が入
力された時、存在区間検出部2においてまず母音の認識
が行われ、「a,oo,u,oo,u,a,N,u,i,a,i,a,u,a 」という
ような母音列が得られる。次に「スポーツ(s,u,p,oo,t
s,u)」、「スキー(s,u,k,ii)」、「相撲(s,u,m,oo)」な
どのキーワードの母音列が入力音声のある部分と一致す
ることが検出され、この区間がキーワード存在区間とし
て尤度算出部3に渡される。尤度算出部3は、「スポー
ツ(s,u,p,oo,ts,u)」、「スキー(s,u,k,ii)」、「相撲
(s,u,m,oo)」に対して隠れマルコフモデルにより尤度を
求める。認識結果出力部4は、最も大きな尤度を示し、
あらかじめ定めた−13.0などの閾値を越える尤度−
12.0を示した「スポーツ」をキーワード認識結果と
して出力する。
A recognition example is described with reference to FIG. 6. For example, when speech such as "Anoo, supootsu bangumi arimasu ka" ("Um, are there any sports programs?") is input, the existence section detection unit 2 first recognizes the vowels and obtains a vowel sequence such as "a,oo,u,oo,u,a,N,u,i,a,i,a,u,a". Next, the vowel sequences of keywords such as "sports (s,u,p,oo,ts,u)", "ski (s,u,k,ii)", and "sumo (s,u,m,oo)" are detected as matching a portion of the input speech, and that section is passed to the likelihood calculation unit 3 as a keyword existence section. The likelihood calculation unit 3 computes the likelihood for "sports (s,u,p,oo,ts,u)", "ski (s,u,k,ii)", and "sumo (s,u,m,oo)" with hidden Markov models. The recognition result output unit 4 outputs "sports" as the keyword recognition result, because it shows the largest likelihood, -12.0, which exceeds the predetermined threshold (for example, -13.0).
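The final decision in this example can be sketched as follows. The likelihood of -12.0 for "sports" and the threshold of -13.0 come from the example above; the scores for "ski" and "sumo", the section indices, and the function name `decide` are hypothetical values introduced only for illustration.

```python
def decide(scored_sections, threshold):
    """Pick the highest-likelihood keyword and accept it only if its
    likelihood exceeds the threshold.

    scored_sections: list of (keyword, (start, end), log_likelihood).
    Returns the winning tuple, or None if nothing exceeds the threshold.
    """
    if not scored_sections:
        return None
    best = max(scored_sections, key=lambda item: item[2])
    return best if best[2] > threshold else None

scored = [
    ("sports", (2, 4), -12.0),  # likelihood taken from the example
    ("ski", (2, 3), -15.5),     # hypothetical
    ("sumo", (2, 3), -14.2),    # hypothetical
]
result = decide(scored, threshold=-13.0)
rejected = decide(scored, threshold=-11.0)
```

With a stricter threshold such as -11.0 no keyword is accepted, which is how the threshold suppresses detections in speech that contains no keyword.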

【0015】認識実験により、本発明の有効性の検証を
行う。まず母音標準パターンと隠れマルコフモデルを市
販の女性音声データベースにより学習し、2人の女性話
者の50文でさらに話者適応化を行った。母音標準パタ
ーンと隠れマルコフモデルにはLPCケプストラム係数
を特徴量として用いた。キーワード発音辞書には20の
単語キーワードを登録した。前述の2人の女性話者が、
文中に1つのキーワードを含む48文を発声し、本発明
の認識装置により認識を行った。その結果、94%の認
識率が得られ(従来の方法(a)では30%であっ
た)、湧き出し誤りはゼロであり、ほぼ実時間で処理を
終えた。
A recognition experiment was conducted to verify the effectiveness of the present invention. First, the standard vowel patterns and the hidden Markov models were trained on a commercially available female speech database and then further adapted to two female speakers using 50 sentences. LPC cepstrum coefficients were used as features for both the standard vowel patterns and the hidden Markov models. Twenty keywords were registered in the keyword pronunciation dictionary. The same two female speakers uttered 48 sentences, each containing one keyword, and the recognition device of the present invention recognized them. As a result, a recognition rate of 94% was obtained (versus 30% for the conventional method (a)), there were no false alarms, and processing finished in nearly real time.

【0016】以上一実施例により本発明を説明してきた
が、本発明はこの実施例に限定されることはなく、本発
明の要旨内で各種の変形、変更の可能なことは当業者に
自明であろう。例えば図3図示の尤度24は、分析結果
の入力音声の母音パターンを母音標準パターンと比較し
て両者の距離差を求めるものであってもよい。また本発
明方法を実施する装置は、上述のハード構成のものの
他、プログラムされたコンピュータソフトで処理される
ものであってもよい。
Although the present invention has been described above with reference to one embodiment, it is not limited to this embodiment, and it will be obvious to those skilled in the art that various modifications and changes are possible within the gist of the invention. For example, the likelihood 24 shown in FIG. 3 may instead be a distance obtained by comparing the vowel pattern of the analyzed input speech with the standard vowel pattern. The device that carries out the method of the present invention may be implemented not only with the hardware configuration described above but also as software running on a programmed computer.

【0017】[0017]

【発明の効果】本発明方法ならびに装置によれば、母音
の標準パターンを用いて入力音声中のキーワードの存在
区間を求めた後に、隠れマルコフモデルを用いてキーワ
ードの尤度を求めるので従来のワードスポッティング型
音声認識装置に比し、湧き出し誤りが少なく、認識精度
が高く、高速認識の可能な音声認識方法ならびに装置が
提供される。
[Effects of the Invention] According to the method and device of the present invention, the existence sections of keywords in the input speech are first found using standard vowel patterns, and the keyword likelihoods are then computed using hidden Markov models. Compared with conventional word spotting type speech recognition devices, this provides a speech recognition method and device with fewer false alarms, higher recognition accuracy, and high-speed recognition.

【図面の簡単な説明】[Brief description of drawings]

【図1】本発明方法を実施するための装置の実施例構成
を示すブロック線図。
FIG. 1 is a block diagram showing the configuration of an embodiment of an apparatus for carrying out the method of the present invention.

【図2】図1図示音響分析部の構成を示す図。FIG. 2 is a diagram showing the configuration of the acoustic analysis unit shown in FIG. 1.

【図3】図1図示存在区間検出部の構成を示す図。FIG. 3 is a diagram showing the configuration of the existence section detection unit shown in FIG. 1.

【図4】図1図示尤度算出部の構成を示す図。FIG. 4 is a diagram showing the configuration of the likelihood calculation unit shown in FIG. 1.

【図5】図1図示認識結果出力部の構成を示す図。FIG. 5 is a diagram showing the configuration of the recognition result output unit shown in FIG. 1.

【図6】認識例を説明するための図。FIG. 6 is a diagram for explaining a recognition example.

【符号の説明】[Explanation of symbols]

1 音響分析部 2 存在区間検出部 3 尤度算出部 4 認識結果出力部 11 A/D変換器 12 フレーム化器 13 パワーLPCケプストラム係数算出部 21 母音標準パターンメモリ部 22 比較器 23 母音候補決定部 24 尤度 31 音素単位隠れマルコフモデルメモリ部 32 連結 33 キーワード単位隠れマルコフモデル 34 尤度算出器 41 尤度メモリ部 42 閾値メモリ部 43 比較器
1: acoustic analysis unit; 2: existence section detection unit; 3: likelihood calculation unit; 4: recognition result output unit; 11: A/D converter; 12: framer; 13: power and LPC cepstrum coefficient calculation unit; 21: standard vowel pattern memory unit; 22: comparator; 23: vowel candidate determination unit; 24: likelihood; 31: phoneme-unit hidden Markov model memory unit; 32: concatenation; 33: keyword-unit hidden Markov model; 34: likelihood calculator; 41: likelihood memory unit; 42: threshold memory unit; 43: comparator

Claims (2)

【特許請求の範囲】[Claims] 【請求項1】 母音の標準パターンを用いて入力音声の
各時刻の母音候補を求め、これとキーワード発音辞書中
の母音列との対応をとり、入力音声中のキーワードの存
在区間を検出し、該検出された存在区間におけるキーワ
ードの尤度を隠れマルコフモデルにより求めるととも
に、検出された前記キーワードの存在区間と求められた
前記尤度から音声認識結果を出力することを特徴とする
ワードスポッティング型音声認識方法。
1. A word spotting type speech recognition method characterized by obtaining vowel candidates at each time point of input speech using standard vowel patterns, matching them against the vowel sequences in a keyword pronunciation dictionary to detect the existence sections of keywords in the input speech, computing the likelihood of each keyword in its detected existence section with a hidden Markov model, and outputting a speech recognition result from the detected keyword existence sections and the computed likelihoods.
【請求項2】 入力音声の音響分析を行う音響分析部
と、該分析部の分析結果およびあらかじめ設けられたキ
ーワード発音辞書に基づいて入力音声の母音の認識を行
い、入力音声における各キーワードの存在区間を検出す
る存在区間検出部と、前記分析結果、キーワード発音辞
書および検出されたキーワード存在区間に基づいて入力
音声に含まれる各キーワードの尤度を求める尤度算出部
と、各キーワードの存在区間および求められた尤度に基
づいてキーワード認識結果を出力する認識結果出力部と
を具備したことを特徴とするワードスポッティング型音
声認識装置。
2. A word spotting type speech recognition device characterized by comprising: an acoustic analysis unit that performs acoustic analysis of input speech; an existence section detection unit that recognizes the vowels of the input speech based on the analysis result of the analysis unit and a keyword pronunciation dictionary provided in advance, and detects the existence section of each keyword in the input speech; a likelihood calculation unit that computes the likelihood of each keyword contained in the input speech based on the analysis result, the keyword pronunciation dictionary, and the detected keyword existence sections; and a recognition result output unit that outputs a keyword recognition result based on the existence section of each keyword and the computed likelihood.
JP7123469A 1995-05-23 1995-05-23 Word spotting type speech recognition method and device Pending JPH08314490A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP7123469A JPH08314490A (en) 1995-05-23 1995-05-23 Word spotting type speech recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP7123469A JPH08314490A (en) 1995-05-23 1995-05-23 Word spotting type speech recognition method and device

Publications (1)

Publication Number Publication Date
JPH08314490A true JPH08314490A (en) 1996-11-29

Family

ID=14861407

Family Applications (1)

Application Number Title Priority Date Filing Date
JP7123469A Pending JPH08314490A (en) 1995-05-23 1995-05-23 Word spotting type speech recognition method and device

Country Status (1)

Country Link
JP (1) JPH08314490A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1058446A2 (en) * 1999-06-03 2000-12-06 Lucent Technologies Inc. Key segment spotting in voice messages
EP1058446A3 (en) * 1999-06-03 2003-07-09 Lucent Technologies Inc. Key segment spotting in voice messages
US10311874B2 (en) 2017-09-01 2019-06-04 4Q Catalyst, LLC Methods and systems for voice-based programming of a voice-controlled device
CN110781270A (en) * 2018-07-13 2020-02-11 北京搜狗科技发展有限公司 A method and device for constructing a non-keyword model in a decoding network
KR20230006055A (en) * 2018-07-13 2023-01-10 구글 엘엘씨 End-to-end streaming keyword spotting
WO2020073839A1 (en) * 2018-10-11 2020-04-16 阿里巴巴集团控股有限公司 Voice wake-up method, apparatus and system, and electronic device

Similar Documents

Publication Publication Date Title
CN110675855B (en) Voice recognition method, electronic equipment and computer readable storage medium
US8478591B2 (en) Phonetic variation model building apparatus and method and phonetic recognition system and method thereof
US20180137109A1 (en) Methodology for automatic multilingual speech recognition
Witt et al. Language learning based on non-native speech recognition.
US20110196678A1 (en) Speech recognition apparatus and speech recognition method
US6553342B1 (en) Tone based speech recognition
KR102199246B1 (en) Method And Apparatus for Learning Acoustic Model Considering Reliability Score
US20110218802A1 (en) Continuous Speech Recognition
Iwano et al. Prosodic word boundary detection using statistical modeling of moraic fundamental frequency contours and its use for continuous speech recognition
KR20130126570A (en) Apparatus for discriminative training acoustic model considering error of phonemes in keyword and computer recordable medium storing the method thereof
CN114627896A (en) Voice evaluation method, device, equipment and storage medium
JP3444108B2 (en) Voice recognition device
Hirose et al. Detection of prosodic word boundaries by statistical modeling of mora transitions of fundamental frequency contours and its use for continuous speech recognition
CN108806691B (en) Voice recognition method and system
JP3403838B2 (en) Phrase boundary probability calculator and phrase boundary probability continuous speech recognizer
JPH08314490A (en) Word spotting type speech recognition method and device
JP2003044078A (en) Voice recognizing device using uttering speed normalization analysis
JP2938865B1 (en) Voice recognition device
Qian et al. A Multi-Space Distribution (MSD) and two-stream tone modeling approach to Mandarin speech recognition
JP2008242059A (en) Device for creating speech recognition dictionary, and speech recognition apparatus
JP3061292B2 (en) Accent phrase boundary detection device
JPH10232693A (en) Voice recognition device
JP2001109491A (en) Continuous voice recognition device and continuous voice recognition method
CN114255758B (en) Oral evaluation method, device, equipment and storage medium
Jelinek et al. 25 Continuous speech recognition: Statistical methods