
JP2003177779A - Speaker learning method for speech recognition - Google Patents

Speaker learning method for speech recognition

Info

Publication number
JP2003177779A
Authority
JP
Japan
Prior art keywords
speaker
learning
recognition
learning method
utterance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2001378341A
Other languages
Japanese (ja)
Other versions
JP3876703B2 (en)
JP2003177779A5 (en)
Inventor
Yumi Wakita
由実 脇田
Kenji Mizutani
研治 水谷
Shinichi Yoshizawa
伸一 芳澤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd filed Critical Matsushita Electric Industrial Co Ltd
Priority to JP2001378341A priority Critical patent/JP3876703B2/en
Publication of JP2003177779A publication Critical patent/JP2003177779A/en
Publication of JP2003177779A5 publication Critical patent/JP2003177779A5/ja
Application granted granted Critical
Publication of JP3876703B2 publication Critical patent/JP3876703B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current


Abstract

(57) [Abstract]

[Problem] Conventional speaker adaptation and speaker registration learning for improving the performance of speakers with poor recognition accuracy have the following problem: either the amount of training speech becomes large and burdens the speaker, or, if the amount of speech is limited to lighten the burden, recognition performance does not necessarily improve for all utterances, and words whose recognition rate actually drops may appear.

[Solution] Using only a few utterances, it is estimated whether the recognition errors depend on the utterance content; speaker adaptation learning is performed when they do not, and speaker registration learning when they do. This provides a speaker learning method that can reliably improve the recognition rate with an amount of training speech that does not burden the speaker.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speaker learning method for speech recognition.

[0002]

[Prior Art] A conventional speaker learning method is described below. Conventional speaker-independent speech recognition systems build and use a standard acoustic model intended to cover as many unspecified speakers as possible. In practice, however, speakers' utterance characteristics vary widely, and it is difficult to train an acoustic model that guarantees high performance for every user. Conventionally, therefore, performance for all speakers is ensured by speaker adaptation: for a speaker who is recognized poorly, the acoustic model parameters are retrained using that speaker's own utterances, and an acoustic model adapted to the speaker is rebuilt. Speaker adaptation requires a large amount of training speech to capture the speaker's characteristics, which burdens the speaker, so various techniques have been devised to keep the number of training utterances to a minimum (for example, Japanese Patent No. 2037877). As another learning method, there is also a speaker registration method in which the acoustic model sequence corresponding to the recognition result of a misrecognized word is added to the pronunciation dictionary as a correct sequence, so that what was recognized as an incorrect sequence can thereafter be recognized as the correct one (Japanese Patent Laid-Open No. 8-171396).

[0003]

[Problems to Be Solved by the Invention] The conventional speaker adaptation method can, in principle, reliably improve recognition performance if sufficient training data is available. However, when the number of training utterances is limited out of consideration for the speaker's burden, as is done in almost all practical systems, the recognition rate may actually drop for some utterances not represented in the training data. The conventional speaker registration method, on the other hand, reliably improves the recognition rate for the registered utterances, but for a speaker whose speech is hard to recognize across many kinds of utterances, every hard-to-recognize utterance must be spoken during training, which makes the training burdensome.

[0004] An object of the present invention is to solve the problems of conventional speaker adaptation learning and speaker registration learning, and to provide a speaker learning method that reliably improves the recognition rate after training with an amount of training speech that does not burden the speaker.

[0005]

[Means for Solving the Problems] To solve the problems described above, the speaker learning method according to claims 1 to 5 comprises: means for retraining acoustic model parameters using a speaker's training speech to create an acoustic model adapted to the speaker; means for adding the acoustic model sequence corresponding to the recognition result of a misrecognized word to the pronunciation dictionary as a correct sequence; and means for determining whether ease of recognition depends on the utterance content.

[0006]

[Embodiments of the Invention] The speaker learning method according to claims 1 to 5 of the present invention is described below with reference to the drawings.

[0007] FIG. 1 is a block diagram of the speaker learning method according to claims 1 to 5 of the present invention.

[0008] The speaker learning function is set up so that each speaker invokes it when he or she feels the need to improve recognition performance for his or her own voice. First, the system prompts the user to utter a specific word, and the speaker's utterance of that word is input. The content of this utterance is the minimum needed to judge, for each speaker, how well the standard speech prepared in advance fits that speaker. For Japanese recognition, for example, a word that contains all five vowels, such as "maiku tesuto" (microphone test), is suitable. If the system performs word recognition, several words may be selected from the target vocabulary so that all five vowels are covered.

[0009] This utterance undergoes normal recognition in speech recognition process 1, and a recognition result and a recognition confidence score are computed in recognition score calculation process 2. For the recognition result, the phoneme or syllable sequence of the result is compared with the correct phoneme sequence; differing portions are marked as errors and matching portions as correct, and correctness is recorded for each phoneme of the correct sequence. The confidence score is, for example, an acoustic distance score between the correct phoneme or syllable sequence and the uttered result, computed for each phoneme or syllable. When the weighted cepstrum distance is used as the distance measure, the confidence of each phoneme may be calculated by Equation 1.
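
The bookkeeping described in this paragraph might be sketched as follows. The alignment is simplified to a position-wise comparison, and because Equation 1 itself is not reproduced in this text, a generic function of a per-phoneme acoustic distance stands in for the actual confidence formula; both simplifications are assumptions for illustration only.

```python
# Sketch of recognition score calculation (process 2): mark each phoneme of
# the correct sequence as right/wrong and attach a confidence score.
# The weighted-cepstrum-distance confidence of Equation 1 is not reproduced
# here; an exponential of a negative distance is used as a stand-in.
import math

def score_utterance(correct_phonemes, recognized_phonemes, distances):
    """Return a list of (phoneme, is_correct, confidence) tuples.

    distances: per-phoneme acoustic distance between the uttered segment and
    the standard model of the correct phoneme (assumed to be precomputed).
    """
    results = []
    for i, ph in enumerate(correct_phonemes):
        rec = recognized_phonemes[i] if i < len(recognized_phonemes) else None
        is_correct = (rec == ph)
        confidence = math.exp(-distances[i])   # stand-in for Equation 1
        results.append((ph, is_correct, confidence))
    return results

# Example with the "menu" utterance from the description below.
print(score_utterance(["m", "e", "ny", "u", "u"],
                      ["d", "e", "ny", "u", "u"],
                      [1.8, 0.3, 0.2, 0.1, 0.1]))
```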

[0010]

[Equation 1: per-phoneme confidence score based on the weighted cepstrum distance (not reproduced here)]

[0011] In learning method decision process 3, the phonemes or syllables whose confidence score is at or below a threshold, or which are misrecognized even though the score is above the threshold (called adaptation-candidate phonemes or syllables), are identified, and their proportion among all phonemes or syllables in the utterance is computed. If this proportion is large, it is inferred that the speaker's utterance characteristics do not fit the standard speech regardless of utterance content, and that the entire standard model needs to be trained to fit the speaker. If the proportion is small, the misrecognition depends on the utterance content: the speaker's characteristics fit the standard speech overall, and training is needed only for particular utterances. Therefore, speaker adaptation learning is selected when the proportion is at or above a fixed value, and speaker registration learning is selected when it is below that value.

[0012] When speaker adaptation learning is selected, speaker adaptation process 4 prompts the user for the minimum additional utterances needed for adaptation. As the speaker adaptation method, if for example the VFS method described in Japanese Patent Laid-Open No. 5-53599 is used, the standard acoustic model is matched against the training input speech parameters, a fuzzy membership function is obtained from the relationship between the corresponding parameters, and, using the obtained function as a weight, the parameters of the standard acoustic model are updated so that the standard speech moves closer to the training input speech.
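
In essence, the adaptation update described here pulls each standard-model parameter toward the matched training-speech parameter under a membership-style weight. The sketch below uses a plain per-state scalar weight in place of the fuzzy membership function of the cited VFS method, so it is an assumed simplification rather than that method itself.

```python
# Sketch of the speaker adaptation update (process 4): move standard acoustic
# model mean vectors toward the matched training-speech parameters, weighted
# by a membership-like weight w in [0, 1]. The actual fuzzy membership
# function of JP-A-5-53599 is not reproduced; w is a placeholder.
import numpy as np

def adapt_means(standard_means, matched_speech_params, weights):
    """standard_means, matched_speech_params: (n_states, dim) arrays."""
    w = np.asarray(weights)[:, None]           # one weight per model state
    return (1.0 - w) * standard_means + w * matched_speech_params

standard = np.array([[0.0, 1.0], [2.0, 2.0]])
observed = np.array([[0.4, 1.2], [2.5, 1.5]])
print(adapt_means(standard, observed, [0.5, 0.2]))
```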

[0013] When speaker registration learning is selected, speaker registration process 5 prompts the user to utter only the words containing the adaptation-candidate phonemes or syllables computed in the learning decision process, and the phoneme or syllable recognition result sequence for that utterance is added to pronunciation dictionary 7 for the phoneme sequences of the words containing the phoneme sequence corresponding to the adaptation candidates. For example, suppose the word "menu" (メニュー) is misrecognized: the user is prompted to utter only this word, and its recognition result is "denu" (デニュー). If phoneme models are used as the acoustic model, the correct phoneme model sequence for "menu" is /m e ny u u/, and the recognized phoneme sequence is /d e ny u u/. For this speaker, the phoneme /m/ at the beginning of a word and followed by /e/ tends to be misrecognized as /d/. Therefore, for the recognition target words, a phoneme sequence is added to the pronunciation dictionary so that a word-initial /m/ followed by /e/ is still recognized as /m/ even when it is misrecognized as /d/. In this example, /d e ny u u/ is added to the entry that was originally "menu /m e ny u u/", and the dictionary entry becomes "menu /m e ny u u/ or /d e ny u u/". As a result, even if this speaker's "menu" is recognized as /d e ny u u/, the word "menu" is ultimately recognized.
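
Dictionary-side registration as in this example reduces to attaching an alternative phoneme sequence to an existing entry. Below is a minimal sketch in which the pronunciation dictionary is represented as a word-to-sequences mapping; that representation and the helper name are assumptions for illustration.

```python
# Sketch of speaker registration (process 5): add the recognized phoneme
# sequence of a misrecognized word to the pronunciation dictionary as an
# additional valid pronunciation of that word.
def register_pronunciation(dictionary, word, recognized_sequence):
    """dictionary: dict mapping word -> list of phoneme sequences."""
    variants = dictionary.setdefault(word, [])
    if recognized_sequence not in variants:
        variants.append(recognized_sequence)
    return dictionary

# "menu" example from the description: /m e ny u u/ misrecognized as /d e ny u u/.
lexicon = {"menu": [["m", "e", "ny", "u", "u"]]}
register_pronunciation(lexicon, "menu", ["d", "e", "ny", "u", "u"])
print(lexicon["menu"])   # both sequences are now accepted for "menu"
```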

[0014] As described above, whether the speaker's utterances are misrecognized independently of the utterance content is estimated; speaker adaptation learning is performed when the errors do not depend on the utterance content, and speaker registration learning when they do. Performing speaker registration learning instead of speaker adaptation learning solves the problem of conventional speaker adaptation learning in which the recognition rate drops despite many training utterances made for adaptation. Performing speaker adaptation learning instead of speaker registration learning solves the problem of conventional speaker registration learning in which training was impossible without uttering many words.

[0015]

[Effects of the Invention] As described in detail above, the speaker learning method according to claim 1 of the present invention selects between speaker adaptation learning and speaker registration learning according to each speaker's ease of recognition and the degree to which it depends on the utterance content, and prompts the speaker to perform one of the two. By automatically selecting speaker registration learning instead of speaker adaptation learning, it solves the problem of conventional speaker adaptation learning in which the recognition rate drops despite many training utterances made for adaptation. By automatically selecting speaker adaptation learning instead of speaker registration learning, it solves the problem of conventional speaker registration learning in which training was impossible without uttering many words. It therefore provides a speaker learning method that can reliably improve the recognition rate with an amount of training that does not burden the speaker.

[0016] As described in detail above, in the speaker learning method according to claim 2 of the present invention, the means for determining whether ease of recognition depends on the utterance content computes a recognition score for the minimum training utterance from which the dependence can be judged, and decides whether the dependence exists from the magnitude of that score. By automatically selecting speaker registration learning instead of speaker adaptation learning, it solves the problem of conventional speaker adaptation learning in which the recognition rate drops despite many training utterances made for adaptation, and by automatically selecting speaker adaptation learning instead of speaker registration learning, it solves the problem of conventional speaker registration learning in which training was impossible without uttering many words. It therefore provides a speaker learning method that can reliably improve the recognition rate with an amount of training that does not burden the speaker.

[0017] As described in detail above, the speaker learning method according to claim 3 of the present invention performs speaker registration learning when ease of recognition is judged to depend on the utterance content, and speaker adaptation learning when it is judged not to depend on it. By automatically selecting speaker registration learning instead of speaker adaptation learning, it solves the problem of conventional speaker adaptation learning in which the recognition rate drops despite many training utterances made for adaptation, and by automatically selecting speaker adaptation learning instead of speaker registration learning, it solves the problem of conventional speaker registration learning in which training was impossible without uttering many words. It therefore provides a speaker learning method that can reliably improve the recognition rate with an amount of training that does not burden the speaker.

[0018] As described in detail above, in the speaker learning method according to claim 4 of the present invention, the recognition score is calculated from the correctness of the recognition result, the distance value to the standard speech, or the reliability of that distance value, each used alone or in combination. By automatically selecting speaker registration learning instead of speaker adaptation learning, it solves the problem of conventional speaker adaptation learning in which the recognition rate drops despite many training utterances made for adaptation, and by automatically selecting speaker adaptation learning instead of speaker registration learning, it solves the problem of conventional speaker registration learning in which training was impossible without uttering many words. It therefore provides a speaker learning method that can reliably improve the recognition rate with an amount of training that does not burden the speaker.

[0019] As described in detail above, in the speaker learning method according to claim 5 of the present invention, the recognition score is likewise calculated from the correctness of the recognition result, the distance value to the standard speech, or the reliability of that distance value, each used alone or in combination. By automatically selecting speaker registration learning instead of speaker adaptation learning, it solves the problem of conventional speaker adaptation learning in which the recognition rate drops despite many training utterances made for adaptation, and by automatically selecting speaker adaptation learning instead of speaker registration learning, it solves the problem of conventional speaker registration learning in which training was impossible without uttering many words. It therefore provides a speaker learning method that can reliably improve the recognition rate with an amount of training that does not burden the speaker.

[Brief Description of the Drawings]

[FIG. 1] A block diagram of the speaker learning method according to one embodiment of the present invention.

[Explanation of Symbols]

1 Speech recognition, 2 Recognition score calculation, 3 Learning method decision, 4 Speaker adaptation, 5 Speaker registration, 6 Acoustic model, 7 Pronunciation dictionary, 8 Recognition score buffer

Continuation of front page: (72) Inventor: Shinichi Yoshizawa, 1006 Oaza Kadoma, Kadoma-shi, Osaka, within Matsushita Electric Industrial Co., Ltd. F-terms (reference): 5D015 AA02 AA03 GG01 GG04 GG05 GG06

Claims (5)

[Claims]

[Claim 1] A speaker learning method comprising: means for retraining acoustic model parameters using a speaker's training speech to create an acoustic model adapted to the speaker (hereinafter referred to as speaker adaptation learning); means for adding the acoustic model sequence corresponding to the recognition result of a misrecognized word to a pronunciation dictionary as a correct sequence (hereinafter referred to as speaker registration learning); and means for determining whether ease of recognition depends on the utterance content; wherein the method selects between speaker adaptation learning and speaker registration learning according to each speaker's ease of recognition and the degree to which it depends on the utterance content, and prompts the speaker to perform the selected learning.

[Claim 2] The speaker learning method according to claim 1, wherein the means for determining whether ease of recognition depends on the utterance content computes a recognition score for the minimum training utterance from which the dependence can be judged, and decides whether the dependence exists from the magnitude of the score.

[Claim 3] The speaker learning method according to claim 1, wherein speaker registration learning is performed when ease of recognition is judged to depend on the utterance content, and speaker adaptation learning is performed when it is judged not to depend on it.

[Claim 4] The speaker learning method according to claim 2, wherein the recognition score is calculated from the correctness of the recognition result, the distance value to the standard speech, or the reliability of that distance value, each used alone or in combination.

[Claim 5] The speaker learning method according to claim 2, wherein the recognition score is calculated from the correctness of the recognition result, the distance value to the standard speech, or the reliability of that distance value, each used alone or in combination.
JP2001378341A 2001-12-12 2001-12-12 Speaker learning apparatus and method for speech recognition Expired - Fee Related JP3876703B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2001378341A JP3876703B2 (en) 2001-12-12 2001-12-12 Speaker learning apparatus and method for speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2001378341A JP3876703B2 (en) 2001-12-12 2001-12-12 Speaker learning apparatus and method for speech recognition

Publications (3)

Publication Number Publication Date
JP2003177779A true JP2003177779A (en) 2003-06-27
JP2003177779A5 JP2003177779A5 (en) 2005-07-14
JP3876703B2 JP3876703B2 (en) 2007-02-07

Family

ID=19186094

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2001378341A Expired - Fee Related JP3876703B2 (en) 2001-12-12 2001-12-12 Speaker learning apparatus and method for speech recognition

Country Status (1)

Country Link
JP (1) JP3876703B2 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009532744A (en) * 2006-04-03 2009-09-10 ヴォコレクト・インコーポレーテッド Method and system for fitting a model to a speech recognition system
US8290773B2 (en) 2008-12-26 2012-10-16 Fujitsu Limited Information processing apparatus, method and recording medium for generating acoustic model
US8374870B2 (en) 2005-02-04 2013-02-12 Vocollect, Inc. Methods and systems for assessing and improving the performance of a speech recognition system
JP2013083798A (en) * 2011-10-11 2013-05-09 Nippon Telegr & Teleph Corp <Ntt> Sound model adaptation device, sound model adaptation method, and program
US8612235B2 (en) 2005-02-04 2013-12-17 Vocollect, Inc. Method and system for considering information about an expected response when performing speech recognition
US8756059B2 (en) 2005-02-04 2014-06-17 Vocollect, Inc. Method and system for considering information about an expected response when performing speech recognition
US8868421B2 (en) 2005-02-04 2014-10-21 Vocollect, Inc. Methods and systems for identifying errors in a speech recognition system
US8914290B2 (en) 2011-05-20 2014-12-16 Vocollect, Inc. Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment
US9978395B2 (en) 2013-03-15 2018-05-22 Vocollect, Inc. Method and system for mitigating delay in receiving audio stream during production of sound from audio stream
US11837253B2 (en) 2016-07-27 2023-12-05 Vocollect, Inc. Distinguishing user speech from background speech in speech-dense environments

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9202458B2 (en) 2005-02-04 2015-12-01 Vocollect, Inc. Methods and systems for adapting a model for a speech recognition system
US9928829B2 (en) 2005-02-04 2018-03-27 Vocollect, Inc. Methods and systems for identifying errors in a speech recognition system
US8374870B2 (en) 2005-02-04 2013-02-12 Vocollect, Inc. Methods and systems for assessing and improving the performance of a speech recognition system
US10068566B2 (en) 2005-02-04 2018-09-04 Vocollect, Inc. Method and system for considering information about an expected response when performing speech recognition
US8612235B2 (en) 2005-02-04 2013-12-17 Vocollect, Inc. Method and system for considering information about an expected response when performing speech recognition
US8756059B2 (en) 2005-02-04 2014-06-17 Vocollect, Inc. Method and system for considering information about an expected response when performing speech recognition
US8868421B2 (en) 2005-02-04 2014-10-21 Vocollect, Inc. Methods and systems for identifying errors in a speech recognition system
JP2009532744A (en) * 2006-04-03 2009-09-10 ヴォコレクト・インコーポレーテッド Method and system for fitting a model to a speech recognition system
US8290773B2 (en) 2008-12-26 2012-10-16 Fujitsu Limited Information processing apparatus, method and recording medium for generating acoustic model
US9697818B2 (en) 2011-05-20 2017-07-04 Vocollect, Inc. Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment
US8914290B2 (en) 2011-05-20 2014-12-16 Vocollect, Inc. Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment
US10685643B2 (en) 2011-05-20 2020-06-16 Vocollect, Inc. Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment
US11810545B2 (en) 2011-05-20 2023-11-07 Vocollect, Inc. Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment
US11817078B2 (en) 2011-05-20 2023-11-14 Vocollect, Inc. Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment
JP2013083798A (en) * 2011-10-11 2013-05-09 Nippon Telegr & Teleph Corp <Ntt> Sound model adaptation device, sound model adaptation method, and program
US9978395B2 (en) 2013-03-15 2018-05-22 Vocollect, Inc. Method and system for mitigating delay in receiving audio stream during production of sound from audio stream
US11837253B2 (en) 2016-07-27 2023-12-05 Vocollect, Inc. Distinguishing user speech from background speech in speech-dense environments

Also Published As

Publication number Publication date
JP3876703B2 (en) 2007-02-07

Similar Documents

Publication Publication Date Title
EP1557822B1 (en) Automatic speech recognition adaptation using user corrections
US7013276B2 (en) Method of assessing degree of acoustic confusability, and system therefor
US7401017B2 (en) Adaptive multi-pass speech recognition system
EP0907949B1 (en) Method and system for dynamically adjusted training for speech recognition
KR100826875B1 (en) On-line speaker recognition method and apparatus therefor
KR100305455B1 (en) Apparatus and method for automatically generating punctuation marks in continuous speech recognition
JP6654611B2 (en) Growth type dialogue device
US8886532B2 (en) Leveraging interaction context to improve recognition confidence scores
EP2048655A1 (en) Context sensitive multi-stage speech recognition
JPH0968994A (en) Method of recognizing words by pattern matching and apparatus for implementing the method
JP3876703B2 (en) Speaker learning apparatus and method for speech recognition
JP2004333543A (en) Voice interaction system and voice interaction method
JP2004325635A (en) Apparatus, method, and program for speech processing, and program recording medium
JP4293340B2 (en) Dialogue understanding device
EP1734509A1 (en) Method and system for speech recognition
JPH0667698A (en) Voice recognizer
JP4749990B2 (en) Voice recognition device
JP2001175276A (en) Speech recognizing device and recording medium
JP2001013988A (en) Method and device for voice recognition
JP4604424B2 (en) Speech recognition apparatus and method, and program
JPH05323990A (en) Speaker recognition method
JP3357752B2 (en) Pattern matching device
JPH0772899A (en) Voice recognizer
JPH11259086A (en) Voice recognition method and voice recognition device
JPH11338492A (en) Speaker recognition device

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20041116

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20041116

RD01 Notification of change of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7421

Effective date: 20050704

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20060801

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20060828

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20061010

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20061023

R151 Written notification of patent or utility model registration

Ref document number: 3876703

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R151

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091110

Year of fee payment: 3

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101110

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111110

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121110

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20131110

Year of fee payment: 7

S111 Request for change of ownership or part of ownership

Free format text: JAPANESE INTERMEDIATE CODE: R313113

S533 Written request for registration of change of name

Free format text: JAPANESE INTERMEDIATE CODE: R313533

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

LAPS Cancellation because of no payment of annual fees