JP2001265385A - Speaker recognizing device

Info

Publication number: JP2001265385A
Application number: JP2000072911A
Authority: JP
Inventors: Yuji Hirayama; 裕司平山; Hirohide Ushida; 牛田　　博英; Hiroshi Nakajima; 宏中嶋
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 2000-03-15
Filing date: 2000-03-15
Publication date: 2001-09-28
Anticipated expiration: 2020-03-15
Also published as: JP3849841B2

Abstract

PROBLEM TO BE SOLVED: To provide a speaker recognition device that can update the registered voice at an appropriate time and can ensure security when the registered voice is updated. SOLUTION: Voice data of a speaker supplied through an input unit 1 is compared, in a voice collating unit 4, with the reference registered voice data stored in a voice data storage unit 2, and a collation score indicating the degree of similarity is obtained. When the collation score is larger than a threshold, the device judges that the voice is that of the registered person and proceeds to a prescribed service. If, although the voice has been judged to be that of the registered person, the collation score is low and the difference between the current data and the previous data is large, an update necessity determination unit 7 judges that an update is needed and passes the result to a voice registration/management unit 5. The unit 5 replaces the registered voice data with the voice data input this time, on condition that the update has been judged necessary and the speaker has agreed to the update.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a speaker recognition device.

[0002]

2. Description of the Related Art As one form of identity verification performed by computers, it has long been common practice to have the user enter specific designated information registered in advance, such as a personal identification number or a keyword, and to judge the user to be the registered person when the entered information matches the registered information. In such a system, however, if the designated information becomes known to or is guessed by another person, it may be stolen and misused.

[0003] Techniques therefore exist that verify identity using physical characteristics (fingerprint, voiceprint, voice, pupil, and so on), and one of them is speaker recognition, which includes speaker verification and speaker identification. One challenge in speaker recognition, however, is preventing the "decline in verification accuracy caused by changes in the voice over time"; solving it requires updating (re-registering) the registered voice at an appropriate time. Conventional techniques aimed at such updating include the following.

[0004] One technique automatically updates (re-registers) the registered voice whenever the input voice is judged to belong to the registered person (Japanese Patent Laid-Open No. 57-013493). According to the invention disclosed in that publication, the registered voice can be updated to follow the person's voice as it changes slightly from moment to moment, which makes the system robust against changes over time.

[0005] In speaker recognition, however, it is difficult in practice to achieve 100% accuracy, so another person may be mistakenly recognized as the registered person. In that case the registered voice is updated based on the other person's voice. From then on, not only is the legitimate user no longer authenticated as himself or herself, but the speaker at the time of the update (the other person) takes over the legitimate user's position.

[0006] Even when the registered person is the one using the system, if the registered voice is unavoidably updated with an unusual voice, such as a voice affected by a cold, the person may no longer be authenticated correctly after recovering and returning to his or her normal voice.

[0007] To address these problems, there is, for example, an invention that performs the actual update only after confirming directly with the legitimate user when the registered voice is to be updated (for example, by telephoning the legitimate user's contact number) (Japanese Patent Laid-Open No. 9-127975). In the invention disclosed in that publication, however, the confirmation means must contact the legitimate user every time, which makes the processing cumbersome.

[0008] As for the timing of updates to the registered voice, conventional systems mostly update periodically. However, the more often updates occur, the greater the chance of fraudulent registration by another person. In addition, the time over which a voice changes varies from person to person, so a fixed update period is hard to match to the time the voice actually takes to change. If the update period is longer than that time, the legitimate user's voice may change and no longer be recognized; conversely, if the update period is too short, the larger number of updates gives others more opportunities to fraudulently register a voice. Neither case is desirable.

[0009] An object of the present invention is to provide a speaker recognition device that can update the registered voice at an appropriate time and can ensure security when the update is performed.

[0010]

[Means for Solving the Problems] The speaker recognition device according to the present invention is premised on a speaker recognition device comprising voice input means, voice information storage means that holds registered voice information serving as the reference for voice recognition, and voice collating means that judges, based on the registered voice information stored in the voice information storage means, whether voice information input from the voice input means was uttered by the legitimate speaker. The device further comprises determination means that determines whether the registered voice information held in the voice information storage means needs to be updated, based on collation score information that is obtained during the collation process in the voice collating means and indicates how likely the speaker is to be the legitimate speaker, and updating means that updates the registered voice information based on the determination result of the determination means.

[0011] Here, as long as the "collation score information" indicates "likeness to the legitimate speaker", it may be specified by a concrete numerical value, as in the embodiment, or it may be a graded value such as "high", "medium", or "low" likeness.

[0012] When the registered voice information and the voice information uttered this time are very close, the voice has not changed over time and there is little need for an update. Conversely, when the voice information obtained from the registered person's utterance no longer closely resembles the registered voice information, a change over time has occurred, and if nothing is done even the registered person may cease to be recognized, so the registered voice information needs to be updated. Determining whether an update is needed based on the collation score information therefore allows the registered voice information to be updated at an appropriate time, making the device robust against changes over time. It also prevents needless periodic updates from giving rise to an update based on another person's utterance.

[0013] As for the correspondence between the claimed elements and the embodiment, the voice input means corresponds to the input unit 1, the voice information storage means to the voice data storage unit 2, the determination means to the update necessity determination unit 7, and the updating means to the voice registration/management unit 5.

[0014] In a preferred embodiment of the present invention, the updating means updates the registered voice information on condition that the determination result based on the collation score information in the determination means indicates that an update is needed and that the speaker has given consent to the update. In this way, the update can be avoided when the user's voice is unsuitable for updating, for example when the user has a cold.
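As an illustration only, the two-part condition of paragraph [0014] can be sketched in Python. The class `VerificationResult`, the function names, and the margin value `0.3` are assumptions made for this sketch; the publication does not specify how the determination means evaluates the collation score.

```python
from dataclasses import dataclass


@dataclass
class VerificationResult:
    score: float     # collation score; higher means more like the registered speaker
    accepted: bool   # True when the speaker was judged to be the registered person


def needs_update(result: VerificationResult, update_margin: float = 0.3) -> bool:
    """Hypothetical rule: the speaker was accepted, but only with a low score."""
    return result.accepted and result.score < update_margin


def should_update(result: VerificationResult, user_consented: bool,
                  update_margin: float = 0.3) -> bool:
    """Update only when it is judged necessary AND the speaker has agreed."""
    return needs_update(result, update_margin) and user_consented
```

With this sketch, a user who has a cold can decline the update, so a barely accepted utterance does not overwrite the registered voice.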

[0015] This function is not essential, however; when the determination means determines that an update is needed, the update may simply be executed. Even if the voice is registered while the user has a cold, subsequent recognition poses no problem provided that the user is recognized as the registered person the next time, after the voice has returned to normal, and the registered voice information is then updated again based on the normal voice. Conditions other than those above may also be added to the update criteria. For example, the risk that accompanies updating the registered voice can be avoided by performing verification more strictly than usual. Ways of performing stricter verification include: (a) combining with a secret-information check: in addition to the voice match, confirm knowledge that only the registered person has, such as a password or personal identification number; (b) changing the collation score threshold: set a stricter threshold than usual; (c) increasing the number of verifications: repeat the verification two or three times and permit the update only when every verification judges the speaker to be the registered person.
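The three stricter checks (a) to (c) can be combined as in the following minimal Python sketch. The function name `strict_verify`, the threshold `0.5`, and the trial count `3` are illustrative assumptions; the publication only states that the threshold is stricter than usual and that verification is repeated two or three times.

```python
def strict_verify(scores, secret_ok, strict_threshold=0.5, required_trials=3):
    """Return True only if all stricter update-time checks pass.

    scores           -- collation scores from repeated verification attempts (c)
    secret_ok        -- result of the password / PIN check (a)
    strict_threshold -- hypothetical threshold, stricter than the usual one (b)
    required_trials  -- hypothetical minimum number of verification attempts (c)
    """
    if not secret_ok:
        return False
    if len(scores) < required_trials:
        return False
    # Every single attempt must exceed the stricter threshold.
    return all(score > strict_threshold for score in scores)
```

A single attempt below the stricter threshold is enough to block the update in this sketch, which mirrors condition (c).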

[0016] The update weight for the registered voice may also be set according to the variation in the feature values of the utterances at update time, and the new registered voice information may be generated taking the pre-update registered voice information into account. Then, even if the registered voice is updated with another person's voice, retaining the registered person's pre-update voice information (feature values) leaves open the possibility that the legitimate user will still be verified.
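One way to read paragraph [0016] is as a weighted blend of the old and new feature vectors. The sketch below is an assumption-laden illustration: the publication gives no formula, so both the linear blend and the mapping `1 / (1 + variance)` from variation to weight are hypothetical choices.

```python
def weight_from_variance(variance):
    """Hypothetical mapping: more variation in the new utterances gives less weight."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    return 1.0 / (1.0 + variance)


def blend_features(old, new, weight):
    """Mix pre-update and new feature vectors; weight is the share of the new data."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(old) != len(new):
        raise ValueError("feature vectors must have the same length")
    return [(1.0 - weight) * o + weight * n for o, n in zip(old, new)]
```

Because part of the old vector always survives when the weight is below 1, an update made with an impostor's voice does not fully erase the legitimate user's features.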

[0017] Speaker recognition as referred to in the present invention includes not only the speaker verification shown in the embodiment but also speaker identification. Speaker verification compares the registered voice information of one speaker specified in advance with the input voice information and judges whether the two voices come from the same speaker (one-to-one recognition), so speaker identification information is input to specify that one speaker. Speaker identification, on the other hand, compares the registered voice information of all registered speakers with the input voice information and identifies which registered speaker the speaker of the input voice matches (one-to-many recognition).
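The difference between one-to-one verification and one-to-many identification can be illustrated with a short Python sketch. Cosine similarity is used here purely as a stand-in score; the publication does not name a scoring method, and the function names and the dictionary-based registry are assumptions. Feature vectors are assumed to be non-zero.

```python
import math


def cosine(a, b):
    """Stand-in similarity score in [-1, 1]; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(registry, claimed_id, features, threshold=0.0):
    """One-to-one: compare only against the speaker named by claimed_id."""
    return cosine(registry[claimed_id], features) > threshold


def identify(registry, features):
    """One-to-many: return the registered speaker with the highest score."""
    return max(registry, key=lambda speaker_id: cosine(registry[speaker_id], features))
```

Verification needs the claimed identity as an extra input, whereas identification scans every registered speaker.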

[0018] Each means constituting the speaker recognition device according to the present invention can be implemented as a dedicated hardware circuit or by a programmed computer.

[0019]

[Embodiments of the Invention] FIG. 1 shows a preferred embodiment of the present invention. The system applies to a configuration in which information such as the voice uttered by a speaker is taken into the device through the input unit 1, speaker verification is performed to judge whether the current speaker is the registered person, based on dictionary information such as the registered voice data that is stored in the voice data storage unit 2 and indicated by the speaker identification information, and a prescribed service is executed once the identity has been confirmed. Based on the input voice data, the system also judges whether the registered voice data in the voice data storage unit 2, which serves as the reference (dictionary) data for recognition and verification, needs to be updated, and performs the update when necessary. As a result, a high recognition rate can be maintained even if the user's voice changes over time. The specific configuration is as follows.

[0020] First, the voice data storage unit 2 is a database that stores speaker identification information and registered voice data in association with each other. In addition to the registered voice data (feature values), it may also hold the result of recognizing the collation keyword utterance by the input understanding unit 3, described later; this makes it possible to prevent an update from being made with a keyword utterance that differs from the collation keyword. For example, if only the registered person knows the keyword, a user who tries to update the registered voice with a different keyword may not be the registered person, and in that case the update can be cancelled.
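A minimal in-memory sketch of such a store, written in Python, is shown below. The class names `Registration` and `VoiceDataStore` and their methods are hypothetical; the publication describes only a database that associates speaker identification information with registered voice data and, optionally, the recognized keyword.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    features: list   # registered voice data (feature values)
    keyword: str     # recognized text of the collation keyword


class VoiceDataStore:
    """Maps speaker identification information to a registration record."""

    def __init__(self):
        self._records = {}

    def register(self, speaker_id, features, keyword):
        self._records[speaker_id] = Registration(list(features), keyword)

    def get(self, speaker_id):
        return self._records.get(speaker_id)

    def update(self, speaker_id, features, recognized_keyword):
        """Refuse the update when the spoken keyword differs from the stored one."""
        record = self._records[speaker_id]
        if recognized_keyword != record.keyword:
            return False
        record.features = list(features)
        return True
```

An update attempt with a different keyword leaves the stored features untouched, which corresponds to cancelling the update.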

[0021] As shown in FIG. 1, the input unit 1 takes the voice uttered by the speaker into the device. The input unit 1 captures voice acquired with, for example, a microphone or a telephone. The voice taken into the device by the input unit 1 includes the collation keyword, speaker identification information (such as an account number), and answers to questions from the system. In this embodiment, the speaker identification information is also input by voice, so the input unit 1 serves as both the voice input means and the speaker identification information input means of the speaker verification device.

[0022] The utterances other than the collation keyword, namely the speaker identification information and the answers to questions from the system, need not be input by voice; they may be entered, for example, with the push buttons of a telephone. The input unit 1 must have at least a function for voice input, but it may additionally have functions for non-voice input. For example, in an information terminal where the speaker identification information is entered from a console such as a numeric keypad, the input unit 1 has the corresponding input function.

[0023] Information input from the input unit 1 is supplied to the input understanding unit 3, the voice collating unit 4, and the voice registration/management unit 5. The input understanding unit 3 recognizes the text of the speech represented by the voice waveform data received from the input unit 1 and outputs the resulting character string to the relevant processing unit. Specifically, recognition is performed by pattern matching between the feature sequence obtained by frequency analysis of the voice waveform and the feature sequences of recognition target words prepared in advance. Here, a feature sequence is prepared for each recognition target word: the digits "0" to "9", which are needed to input a personal identification number and other numbers, and words such as "yes" and "no", which are expected as answers from the user (speaker).
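The pattern matching between feature sequences can be sketched with dynamic time warping (DTW), a classical template-matching technique. DTW is chosen here as an assumption; the publication does not say which matching method is used. For brevity, each frame is a single number rather than a spectral feature vector, and the templates are made up.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(sequence, templates):
    """Return the target word whose template is closest to the input sequence."""
    return min(templates, key=lambda word: dtw_distance(sequence, templates[word]))
```

In a real system the `templates` dictionary would hold one feature sequence per recognition target word, for example the digits and "yes" / "no".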

[0024] When input other than voice is used, for example text, the same result as speech recognition is obtained by taking the normalized input text as the recognition result. The result of input understanding (the speech recognition result) also makes it possible to judge whether the input collation keyword is a predetermined one, or whether it matches the currently registered keyword. The matching process using the feature values can use conventional techniques as they are, so a detailed description with specific examples is omitted.

[0025] The voice collating unit 4 compares the registered voice data that is registered in the voice data storage unit 2 and indicated by the speaker identification information with the speaker's voice data input from the input unit 1 or the input understanding unit 3, and judges whether the speaker of the input voice and the speaker of the registered voice are the same.

[0026] As one example, a collation score is calculated for making this judgment. The collation score takes a value from -1.0 to +1.0. A positive sign indicates that the speaker of the input voice is more likely the registered speaker, and a negative sign indicates that the speaker is more likely another person. The absolute value indicates the certainty of what the sign indicates (registered person or another person). Thus, the larger the positive value (the closer to 1.0), the more likely the speaker is the registered person. The obtained collation score is compared with a predetermined threshold; if it is greater than the threshold, the speaker is judged to be the registered person, and if it is equal to or less than the threshold, another person. The collation result is passed to the response generation unit 6 and the update necessity determination unit 7.
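The interpretation of the collation score can be expressed as a small Python function. The name `judge` and the tuple return value are assumptions for illustration; the score range and the greater-than comparison follow the paragraph above.

```python
def judge(score, threshold=0.0):
    """Return (accepted, certainty) for a collation score in [-1.0, +1.0].

    accepted  -- True when the score is strictly greater than the threshold
    certainty -- absolute value of the score, i.e. how sure the sign is
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("collation score must lie in [-1.0, +1.0]")
    return score > threshold, abs(score)
```

A score exactly equal to the threshold is treated as another person, matching the "equal to or less than" wording.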

[0027] The response generation unit 6 decides what the system should say based on the input understanding result from the input understanding unit 3, the voice collation result from the voice collating unit 4, and the result of the determination by the update necessity determination unit 7, described later, of whether the registered voice needs to be updated. Specifically, the contents of the system's voice guidance and everything to be said in response to collation results are stored in advance, and the appropriate utterance is selected according to the current input understanding result, collation result, and update necessity determination result. The selected utterance (utterance text) is then passed to the output unit 8.
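Selecting a pre-stored utterance from the collation and update-necessity results amounts to a table lookup, as in the sketch below. The message texts, the keys, and the function name `select_response` are hypothetical; the publication states only that the utterances are stored in advance and selected according to the results.

```python
# Hypothetical pre-stored utterances; the wording is illustrative only.
RESPONSES = {
    "rejected": "We could not confirm your identity.",
    "accepted": "Your identity has been confirmed.",
    "confirm_update": "Your identity has been confirmed. "
                      "May we update your registered voice?",
}


def select_response(accepted, update_needed):
    """Pick the stored utterance that fits the collation and update results."""
    if not accepted:
        key = "rejected"
    elif update_needed:
        key = "confirm_update"
    else:
        key = "accepted"
    return RESPONSES[key]
```

The update question is never asked of a rejected speaker in this sketch, because the rejection branch is checked first.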

[0028] The output unit 8 converts the generated utterance text into speech and outputs it through a loudspeaker, headphones, or a telephone receiver. The following methods can be used to convert the utterance text into speech.
* Recording and editing method: Speech reading out each utterance that may be generated is recorded in advance, either as a whole or part by part, and at output time the recording file corresponding to the utterance is played back to output the speech.

[0029] * Text-to-speech synthesis method: An existing speech synthesis engine is used. When given a character string representing the utterance as input, the engine outputs speech reading out that string.

[0030] * Combined recording/editing and text-to-speech method: Parts of the utterance for which recorded segments are available are output by the recording and editing method, and parts with no corresponding recorded segment are output by text-to-speech synthesis.
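The combined method can be sketched as a per-part fallback: use a recording when one exists, otherwise synthesize. In the Python sketch below, `render_utterance`, the file name, and the `synthesize` callable are placeholders; no particular speech synthesis engine or API is implied.

```python
def render_utterance(parts, recordings, synthesize):
    """Choose an output source for each part of an utterance.

    parts      -- utterance split into text parts
    recordings -- dict mapping a text part to its pre-recorded audio file
    synthesize -- callable that turns text into synthesized audio (placeholder)
    """
    output = []
    for part in parts:
        if part in recordings:
            output.append(("recorded", recordings[part]))
        else:
            output.append(("synthesized", synthesize(part)))
    return output
```

Fixed guidance phrases would typically come from recordings, while variable content such as an account number falls through to synthesis.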

[0031] The output unit 8 is not limited to voice output as described above. Text, for example, can be used as non-voice output; in that case the output unit 8 can display the utterance on a PC display screen.

[0032] The components that confirm identity according to the input from the input unit 1 and output a prescribed message from the output unit 8 can basically be implemented with conventional techniques, so a detailed description of them is omitted.

[0033] In the present invention, an update necessity determination unit 7 is provided, which judges, based on the collation result in the voice collating unit 4, whether to reconstitute the voice data that serves as the reference for recognition. That is, the update necessity determination unit 7 detects changes in the user's voice over time and, based on the result, determines whether the registered voice needs to be updated. The change over time is detected by referring to the collation score and the feature values of the input voice and the registered voice (details are described later).

[0034] When the update necessity determination unit 7 determines that an update is needed, it passes the result to the voice registration/management unit 5 and the response generation unit 6. The response generation unit 6 outputs a message asking the user to confirm that the update may proceed; when the user agrees, the consent information is supplied to the voice registration/management unit 5 (either directly through the input unit 1 or by way of the input understanding unit 3).

[0035] When the user (speaker) has agreed to the update, the voice registration/management unit 5 registers the voice data of the speaker uttering the collation keyword in the voice data storage unit 2 in association with the speaker identification information. The data stored in the voice data storage unit 2 is thereby updated, so that the device follows changes in the voice over time.

[0036] Next, the overall operation of the system is described with reference to the flowcharts in FIGS. 2 and 3, and the detailed functions of each processing unit are explained as needed. Specific examples are cited where appropriate to make each function easier to understand. The example considered is the following speaker verification device: it performs voice input and output by telephone, uses a 9-digit account number as the speaker identification number, and uses a 4-digit personal identification number as the collation keyword. Speaker verification is performed based on the collation keyword input by voice (after the identity has been confirmed, the prescribed service and other processing are carried out). Based on the result of this speaker verification, the device also judges whether the registered voice data serving as the verification reference should be updated, and performs the update as needed.

[0037] First, speaker identification information input processing is executed (ST1). That is, the user (speaker) is prompted to input the speaker identification information. Specifically, the response generation unit 6 retrieves the utterance "Thank you for calling. First, please say your account number." from the utterances stored in advance, and the output unit 8 outputs it. In response, the user either says the account number (for example, "596384107") or enters it with the push buttons of the telephone.

[0038] For voice input, the input unit 1 takes the user's utterance into the device as a voice waveform and passes it to the input understanding unit 3, which recognizes the spoken digit string by comparing the voice waveform data with the phoneme feature sequences of the digits prepared in advance. For push-button input, the input unit 1 takes the tone signals corresponding to the digit string into the device and passes them to the input understanding unit 3, which recognizes the entered digit string by comparing the input tone signals with the signal waveforms of the digits prepared in advance.

[0039] Next, collation keyword input processing is executed (ST2). The response generation unit 6 retrieves the utterance "Please say your collation keyword." from the utterances prepared in advance, and the output unit 8 outputs it as speech. In response, the user says the collation keyword (personal identification number), and the voice data of the spoken collation keyword (for example, "4107") is taken into the device by the input unit 1 and passed to the voice collating unit 4.

[0040] Next, the process moves on to speaker verification. First, a collation score is calculated (ST3). The voice collating unit 4 accesses the voice data storage unit 2, using as a key the recognition result of the account number (speaker-specifying information) supplied by the input understanding unit 3, and acquires the corresponding registered voice data. Because the voice data storage unit 2 has the data structure shown in FIG. 4, it acquires "yon ichi zero nana" (the voice data recorded when the registered person uttered 4107) corresponding to "596384107". The registered voice data may be the voice data itself or feature quantities extracted from it. The feature quantities of the acquired registered voice data are then compared with those of the input voice data, and a collation score (a value from -1.0 to +1.0) indicating their degree of similarity is calculated. Any of various conventional recognition and collation algorithms can be used to calculate this score.
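The lookup and scoring of step ST3 can be illustrated with a minimal sketch. The text leaves the scoring algorithm open, so cosine similarity is used here purely as an assumed stand-in that happens to produce values in the stated range of -1.0 to +1.0. The dictionary `registered_voices`, the function names, and the feature values are hypothetical.

```python
import math

# Hypothetical store mirroring FIG. 4: account number -> registered feature vector.
# The feature values are invented for illustration.
registered_voices = {
    "596384107": [0.8, 0.1, 0.3],
}


def collation_score(registered, observed):
    """Cosine similarity, an assumed stand-in for the unspecified scoring algorithm."""
    dot = sum(r * o for r, o in zip(registered, observed))
    norm = math.sqrt(sum(r * r for r in registered)) * math.sqrt(sum(o * o for o in observed))
    if norm == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / norm


def score_for(account_number, observed):
    """Fetch the registered features by account number and score the new input."""
    registered = registered_voices[account_number]  # raises KeyError for an unknown account
    return collation_score(registered, observed)
```

A real system would extract the feature vectors from the voice waveform; that step is outside the scope of this sketch.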

[0041] The voice collating unit 4 then determines, based on the calculated collation score, whether the speaker of the registered voice and the speaker of the input voice are the same person, that is, whether the user is the registered person. Specifically, it determines whether the collation score is larger than a threshold (ST4). Here the threshold is set to 0, so the determination depends simply on the sign of the collation score: a positive score is judged to be the registered person, and a negative score someone else. Steps 3 and 4 thus constitute the function of the voice collating unit 4.

[0042] The determination result is passed to the response generation unit 6. If the branch at step 4 is No, that is, the speaker is judged to be someone else, the response generation unit 6 retrieves from the utterances stored in advance a message such as "As a result of verification, we could not confirm your identity. We are very sorry, but you cannot use this service," and the output unit 8 outputs it as speech. The telephone line is then disconnected to end the process (ST5).

[0043] If the branch at step 4 is Yes, that is, the speaker is judged to be the registered person, the user is notified of the successful authentication before the necessity of updating the registered voice is determined (ST6). The response generation unit 6 retrieves from the utterances stored in advance a message such as "As a result of verification, we have confirmed your identity," and the output unit 8 outputs it as speech. The process then proceeds to step 7, where the necessity of a voice update is determined.

[0044] Next, the process of determining the necessity of updating the registered voice is executed (ST7). The update necessity determination unit 7 determines, based on the calculated collation score, whether the registered voice needs to be updated. In the simplest case, an update is judged necessary when the collation score is below a certain threshold. In other words, when the score is positive, so the speaker was judged to be the registered person, but its value is small, the person's voice is presumed to have drifted from the registered voice data over time, and an update is judged necessary.
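The acceptance test of ST4 and the simple update rule of ST7 can be combined in a short sketch. The acceptance threshold of 0 comes from the text. The value of `UPDATE_THRESHOLD` is an assumption, since the text only speaks of "a certain threshold", and the function names are hypothetical.

```python
ACCEPT_THRESHOLD = 0.0  # from the text: the sign of the score decides acceptance
UPDATE_THRESHOLD = 0.3  # assumed value; the text does not give a number


def is_registered_person(score):
    """ST4: accept the speaker only when the score exceeds the acceptance threshold."""
    return score > ACCEPT_THRESHOLD


def needs_update(score):
    """ST7 (simple rule): accepted, but the score is low enough to suggest voice drift."""
    return is_registered_person(score) and score < UPDATE_THRESHOLD
```

With these assumed values, a score of 0.1 is accepted and flagged for update, a score of 0.8 is accepted without an update, and a score of -0.2 is rejected.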

[0045] As a more elaborate method, the history of collation scores from past verifications, or the voice data of past collation keywords themselves, can be stored, and the necessity of updating the registered voice can be determined by referring to this history. For example, as in the flowchart of FIG. 5, the previous and current collation scores are acquired (ST21), and the current score is subtracted from the previous score to give a value D (ST22). If the difference D is larger than a certain threshold, a change over time is judged to have occurred and an update is judged necessary; if D is small, an update is judged unnecessary (ST23 to ST25).

[0046] Thus, as shown in FIG. 6, if the collation scores obtained on the first, second, and third uses are S1, S2, and S3, an update is judged unnecessary on the second use because S1 - S2 is small, and judged necessary on the third use because the difference between S2 and S3 has become large.
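The history-based rule of FIG. 5 reduces to one subtraction and one comparison. The sketch below assumes a threshold of 0.2 and invented scores for S1, S2, and S3; neither comes from the source.

```python
def needs_update_by_history(previous_score, current_score, drop_threshold=0.2):
    """ST21 to ST25: flag an update when the score has dropped by more than the threshold.

    drop_threshold is an assumed value; the text only says "a certain threshold".
    """
    d = previous_score - current_score  # ST22: D = previous score - current score
    return d > drop_threshold           # ST23: a large D means an update is needed


# Invented scores for three successive uses, in the spirit of FIG. 6.
s1, s2, s3 = 0.90, 0.85, 0.40
second_use = needs_update_by_history(s1, s2)  # small drop, no update
third_use = needs_update_by_history(s2, s3)   # large drop, update needed
```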

[0047] The previous collation score can be stored in the voice data storage unit 2 in association with the speaker-specifying information and the registered voice data, for example in the data structure shown in FIG. 7. It can be recorded, for example, by having the update necessity determination unit 7 write the current score to the corresponding storage area for the next determination whenever it judges the necessity of the current update.

[0048] Furthermore, instead of simply using the difference from the previous score as above, the average of the collation scores can be maintained, and an update can be judged necessary when the following condition is satisfied.

|current collation score - average collation score| > threshold

It is then determined whether the update necessity determination result obtained as described above indicates that an update is needed (ST8). If no update is judged necessary, the current speaker verification process ends, and processing proceeds to whatever the normal application or task requires.
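The mean-based condition in this paragraph can be sketched as follows. The text does not say which scores enter the average, so this sketch assumes it is taken over past scores only, excluding the current one. The threshold value is likewise an assumption.

```python
from statistics import mean


def needs_update_by_mean(current_score, past_scores, threshold=0.2):
    """Flag an update when |current score - mean of past scores| exceeds the threshold.

    Assumptions: past_scores excludes the current score, and threshold=0.2 is invented.
    """
    if not past_scores:
        return False  # no history yet, so nothing to compare against
    return abs(current_score - mean(past_scores)) > threshold
```

Because the condition uses an absolute value, it also fires when the current score is unusually high relative to the average, not only when it is low.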

[0049] If an update of the registered voice is judged necessary, the process proceeds to step 9, where the update is recommended and the user's confirmation is obtained (ST9). The current collation score may be low, or may have changed greatly, merely because the user happens to have a cold or the like, which lowers the similarity to the registered voice data. If the registered voice data were updated automatically in such a case, the user might not be recognized as the registered person the next time, once the cold has cleared and the voice has returned to normal. The user knows best whether the uttered voice was his or her normal voice. Obtaining the user's consent therefore suppresses updates based on unrepresentative data. Specifically, the response generation unit 6 retrieves the following message from the utterances stored in advance, and the output unit 8 outputs it as speech.

[0050] "We recommend updating your registered voice. If you do not update it, correct verification may become impossible in the future. However, if your voice is currently not in its normal condition, for example because you have a cold, please do not update. ...... Do you want to update your registered voice?"

The user responds with an utterance indicating affirmation or negation, such as "yes" or "no". The input unit 1 receives the utterance, captures it into the device as voice waveform data, and passes it to the input understanding unit 3. The input understanding unit 3 determines from the voice waveform data whether the input was affirmative or negative and passes the result to the voice registration/management unit 5. The user's answer need not be given by voice; push-button input or the like may also be used.

[0051] In that case, the system can prompt the user to answer by push button by adding to its output a message such as "To update your voice, press button 1; otherwise, press button 2." The input tone signal is recognized by the input understanding unit 3 in the same way as the push-button input of the account number in step 1.
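Once the input understanding unit has turned the spoken answer or the tone signal into a symbol, mapping it to a consent decision is straightforward. The sketch below assumes the recognizer outputs a short string; the function name and the accepted spellings are hypothetical.

```python
def parse_consent(recognized_answer):
    """Map a recognized answer to True (update), False (do not update), or None (unclear).

    Assumes recognition has already produced text: "1"/"2" for push-button input
    as in the prompt above, or a yes/no word for voice input.
    """
    normalized = recognized_answer.strip().lower()
    if normalized in {"1", "yes", "はい"}:
        return True
    if normalized in {"2", "no", "いいえ"}:
        return False
    return None  # the caller may re-prompt the user
```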

[0052] The voice registration/management unit 5 determines whether the user has agreed to update the registered voice (ST10). If the user has agreed (the recognized answer is "yes"), the process proceeds to step 11 and the voice data is updated. That is, the voice registration/management unit 5 registers the voice data input in step 2 as the new registered voice data in the corresponding storage area of the voice data storage unit 2. This registration may overwrite the currently registered voice data, or the registered data may be updated with the average of the feature quantities of the previously registered voice data and the input voice data. If the user does not agree to the update, the speaker verification process ends there.
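The two update options in step 11, overwriting or averaging, can be sketched as below, gated on both the update decision and the user's consent. Feature quantities are assumed to be equal-length lists of floats, and the function names are hypothetical.

```python
def update_registered(registered, new, mode="overwrite"):
    """ST11: return the new registered features, by overwrite or element-wise average."""
    if mode == "overwrite":
        return list(new)
    if mode == "average":
        if len(registered) != len(new):
            raise ValueError("feature vectors must have the same length")
        return [(r + n) / 2 for r, n in zip(registered, new)]
    raise ValueError(f"unknown mode: {mode}")


def maybe_update(registered, new, update_needed, user_consented, mode="overwrite"):
    """ST8 and ST10: update only when an update is needed and the user has agreed."""
    if update_needed and user_consented:
        return update_registered(registered, new, mode)
    return list(registered)  # unchanged copy
```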

[0053] FIG. 8 shows a second embodiment of the present invention. The block configuration is the same as in FIG. 1, so the functions of the processing units are explained while describing the operation with reference to the flowchart.

[0054] The second embodiment is basically the same as the first embodiment shown in FIGS. 3 and 4, but with improved security at update time. Specifically, the processing of the flowchart shown in FIG. 3 is the same in this embodiment (and the functions of the processing units are the same in that respect). When the user agrees to the update at the branch in step 10 of the flowchart shown in FIG. 4, the first embodiment performs the update immediately, whereas this embodiment performs an additional verification to confirm reliably that the user is the registered person. In addition, a weight is applied when updating, so that more accurate registered voice data is created and the recognition rate in subsequent speaker verification is raised. In other words, the functions for executing this processing are added to the voice registration/management unit 5.

[0055] Specifically, as shown in FIG. 8, when the user consents to the update (Yes in step 10), the processing from step 31 onward is performed. First, the number of utterances and the threshold for additional verification are determined (ST31). This processing is also performed by the voice registration/management unit 5. The details of this step are shown in FIG. 9: the current collation score is referenced, and it is determined whether it is smaller than the additional-verification count determination threshold (ST41 to ST43).

[0056] If the score is smaller than the threshold, that is, the speaker was judged to be the registered person but the similarity to the registered voice data is low, the process proceeds to step 44, where the number of additional verifications is set to 2 and the collation threshold is set to a strict value. If the current collation score is equal to or greater than the threshold, the process proceeds to step 45, where the number of additional verifications is set to 1 and the collation threshold is set to its normal value.
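Steps 44 and 45 amount to choosing one of two settings from the current score. In this sketch the counts of 2 and 1 come from the text, while the three numeric thresholds are assumed values and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdditionalCheck:
    utterances: int             # how many additional verifications to require
    collation_threshold: float  # score each additional verification must exceed


def plan_additional_check(current_score, count_threshold=0.3,
                          normal_threshold=0.0, strict_threshold=0.2):
    """ST41 to ST45: pick the additional-verification settings from the current score.

    count_threshold, normal_threshold, and strict_threshold are assumed values.
    """
    if current_score < count_threshold:
        return AdditionalCheck(utterances=2, collation_threshold=strict_threshold)  # ST44
    return AdditionalCheck(utterances=1, collation_threshold=normal_threshold)      # ST45
```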

[0057] In this embodiment, then, the number of utterances and the collation threshold for additional verification are set according to the collation score obtained when the speaker was judged to be the registered person (the current score). When that score is low, the number of utterances and the threshold are set higher for a stricter check. Thus, when the speaker was accepted but might not actually be the registered person (the likelihood of being that person is low), the check is strict, and security is improved by performing the update only if the speaker still passes verification.

[0058] The additional verification of the user is performed according to these settings (ST32). That is, the user is asked to speak again and speaker verification is performed. It is then determined whether the additional verification confirmed the user's identity (ST33). If the conditions are not met, no update is performed this time. This ensures that the voice data is updated only on the basis of the registered person's own utterances, which improves security.

[0059] If the additional verification confirms the user's identity, the process proceeds to step 34, where the update weight is set. Here a weight is set that determines the mixing ratio, in the updated registered voice, between the feature quantities of the existing registered voice and those of the newly input utterance. Specifically, the flowchart shown in FIG. 10 is carried out.

[0060] First, the current collation score obtained when the speaker was judged to be the registered person and the score from the additional verification are acquired, and their difference D2 is calculated (ST51, ST52). It is then determined whether D2 is smaller than the update utterance count determination threshold (ST53). If D2 is small (Yes at the branch in step 53), the process proceeds to step 54 and the number of update utterances is set to 2. If D2 is large (No at the branch in step 53), the process proceeds to step 55 and the number of update utterances is set to 1.

[0061] Once the number of update utterances has been decided, weighting is performed (ST56). As is clear from the figure, the larger the difference D2, the smaller the weight. That is, when D2 is large, the speaker's voice is judged to vary widely, and the weight of the new input is set low. Because the variation is large, the weighting preserves as much of the past voice feature quantities as possible.
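The procedure of FIG. 10 can be sketched as follows. The text states only that the weight decreases as D2 grows, so the linear mapping here is an assumption, as are the use of the absolute difference, the numeric defaults, and the function name.

```python
def plan_update(current_score, additional_score,
                count_threshold=0.1, max_weight=0.5, d2_cap=1.0):
    """ST51 to ST56: derive the number of update utterances and the new-input weight.

    Assumptions: D2 is the absolute score difference, and the weight falls
    linearly from max_weight (at D2 = 0) to 0 (at D2 >= d2_cap).
    """
    d2 = abs(current_score - additional_score)         # ST52
    utterances = 2 if d2 < count_threshold else 1      # ST53 to ST55
    weight = max_weight * max(0.0, 1.0 - d2 / d2_cap)  # ST56: larger D2, smaller weight
    return utterances, weight
```

Any monotonically decreasing mapping from D2 to the weight would satisfy the description; the linear form is chosen only for brevity.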

[0062] When the voice feature quantities are expressed as vectors, the updated feature vector can also be obtained by substituting each item of feature data (the feature vector of each element) into the formula shown in FIG. 11.
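The formula of FIG. 11 is not reproduced in the text. As an illustration only, a common way to mix an existing feature vector with a new one under a weight is a convex combination. The sketch below shows that form and should not be read as the formula of FIG. 11; the function name and parameter names are hypothetical.

```python
def blend_features(registered, new, new_weight):
    """Return (1 - w) * registered + w * new, element by element.

    This convex combination is an assumed illustration, not the FIG. 11 formula.
    """
    if not 0.0 <= new_weight <= 1.0:
        raise ValueError("new_weight must be between 0 and 1")
    if len(registered) != len(new):
        raise ValueError("feature vectors must have the same length")
    return [(1.0 - new_weight) * r + new_weight * n for r, n in zip(registered, new)]
```

A weight of 0 leaves the registered features unchanged, and a weight of 1 is equivalent to overwriting them with the new input.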

[0063] Once the number of registration utterances and the update weight have been determined as described above, the process proceeds to step 35, where the supplied registration voice is acquired and a score is calculated from it (ST35, ST36). If the resulting collation score exceeds the threshold (ST37), the registered voice is updated (ST11). If the score is below the threshold, the process returns to step 35 and the registration voice is input again.
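The retry loop of steps 35 to 37 can be sketched with two callables standing in for voice capture and scoring. The `max_attempts` limit is an added safeguard that is not in the source, which simply returns to step 35. All names are hypothetical.

```python
def collect_registration_voice(get_utterance, score_fn, threshold, max_attempts=3):
    """ST35 to ST37: re-prompt until an utterance scores above the threshold.

    get_utterance() returns the features of a newly captured utterance, and
    score_fn(features) returns its collation score. max_attempts is an assumed
    safeguard; on exhaustion the function returns None and no update happens.
    """
    for _ in range(max_attempts):
        features = get_utterance()          # ST35: acquire the registration voice
        if score_fn(features) > threshold:  # ST36, ST37: score it and compare
            return features                 # the caller proceeds to the update (ST11)
    return None
```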

[0064]

[Effects of the Invention] As described above, the present invention uses the collation score information to determine whether the registered voice information needs updating, so the update can be performed at an appropriate time.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a preferred embodiment of the present invention.

FIG. 2 is a part of a flowchart illustrating the operation.

FIG. 3 is a part of a flowchart illustrating the operation.

FIG. 4 is a diagram showing the data structure of the voice data storage unit.

FIG. 5 is a flowchart illustrating the function of the update necessity determination unit.

FIG. 6 is a diagram showing an example of determining whether an update is necessary.

FIG. 7 is a diagram showing another data structure of the voice data storage unit.

FIG. 8 is a flowchart illustrating the functions of the voice registration/management unit, which is the main part of the second embodiment of the present invention.

FIG. 9 is a flowchart showing the detailed processing procedure of step 31 in FIG. 8.

FIG. 10 is a flowchart showing the detailed processing procedure of step 34 in FIG. 8.

FIG. 11 is a diagram illustrating an example of calculating the new updated registered voice data (feature quantities).

[Description of Reference Signs]
1 input unit
2 voice data storage unit
3 input understanding unit
4 voice collating unit
5 voice registration/management unit
6 response generation unit
7 update necessity determination unit
8 output unit

Continuation of front page: (72) Inventor: Hiroshi Nakajima, c/o OMRON Corporation, 10 Hanazono Tsuchido-cho, Ukyo-ku, Kyoto-shi, Kyoto. F-terms (reference): 5D015 AA03 GG01 HH04; 9A001 BB03 HH17

Claims

1. A speaker recognition device comprising: voice input means; voice information storage means for holding registered voice information serving as a reference for voice recognition; and voice collating means for determining, based on the registered voice information stored in the voice information storage means, whether voice information input from the voice input means was uttered by a legitimate speaker, the device further comprising: determining means for determining the necessity of updating the registered voice information held in the voice information storage means, based on collation score information indicating the likelihood that the speaker is the registered person; and updating means for updating the registered voice information based on the determination result of the determining means.

2. The speaker recognition device according to claim 1, wherein the updating means updates the registered voice information on condition that the determination result of the determining means based on the collation score information indicates that an update is necessary and that consent to the update process has been obtained from the speaker.