JPS59111699A

JPS59111699A - Speaker recognition method

Info

Publication number: JPS59111699A
Application number: JP57221652A
Authority: JP
Inventors: 奈良　泰弘; 小林　敦仁
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-12-17
Filing date: 1982-12-17
Publication date: 1984-06-27

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

Technical Field of the Invention

The present invention relates to a speaker recognition method, and in particular to one that, when speech uttered by a plurality of people has been registered, can recognize which of those registered voices an input voice most closely resembles.

Background of the Technology

With current speech recognition devices, a high recognition rate is obtained when a speaker uses a dictionary registered with his or her own voice. Such devices cannot, however, recognize just anyone's voice, and the recognition rate drops considerably when a dictionary registered with another person's voice is used. Therefore, when speech recognition is performed on a voice transmitted, for example, over the telephone, speaker recognition is needed as a first step to recognize who the speaker on the telephone is, or which of a plurality of registered speakers the speaker resembles.

Prior Art and Its Problems

Conventional speaker recognition methods include those that focus on "speaking style" and those that focus on "voice quality".

The former focus, for example, on patterns of change in speaking rate and intonation. Although this is a simple technique, registered dictionaries mostly contain data related to voice quality, and "speaking style" data provides insufficient analysis of voice quality, so the method is unsuitable for selecting a dictionary before a speech recognition device is used.

The latter focus on voice quality, which is determined by the shape of the speaker's vocal cords and of resonant cavities such as the oral cavity. This approach is ill-suited, in terms of accuracy, to determining which of several already registered speakers has spoken, but it is well suited to determining whose voice the voice of a new, unregistered speaker resembles.

One conventional speaker recognition technique that focuses on voice quality analyzes the voice quality of the uttered speech at every frame interval to extract feature parameters, sums and averages these feature parameters along the time axis, and compares the averaged result for each speaker. With this method, however, the averaged pattern contains, in addition to voice quality characteristics, a considerable amount of speaking-style characteristics, that is, the influence of the manner of utterance such as how long or short sounds are held, so it was insufficient for accurate recognition.

Object of the Invention

To remedy these problems, an object of the present invention is to provide a speaker recognition method that analyzes the registered speakers' voices at every frame interval and stores the resulting parameters in memory, judges for each frame of the input speech which registered speaker's voice it resembles, and, after the input speech has been uttered, makes an overall judgment based on whose voice the largest number of frames resembled. The method can thereby determine with high accuracy whose voice the speaker's voice resembles, without being affected by the manner of utterance.

Configuration of the Invention

To achieve this object, the speaker recognition method of the present invention, in a speaker recognition device that processes speech uttered by a human and determines which registrant's voice the speaker's voice resembles, is characterized in that:

a subclassified sound type pattern memory is provided that holds, for each speaker, a parameter time series obtained by analyzing speech uttered by a plurality of people at a frame period;

a selection means is provided that computes the correlation between the parameter time series obtained by analyzing the speaker's voice at the frame period and the parameter time series in the subclassified sound type pattern memory, and selects, for each frame period, the name of the registered speaker with the highest correlation; and

after the utterance, the registered speaker name selected the greatest number of times is determined and output as the speaker recognition result.

Summary of the Invention

In the present invention, the group of patterns obtained by analyzing, at every frame interval, speech uttered in advance by a plurality of speakers is organized by speaker and stored in a memory (called the subclassified sound type memory). The correlation (similarity) between the analysis result for one frame of the speaker to be recognized and every entry in the subclassified sound type memory is then calculated, and the name of the speaker of the most similar pattern is recorded.

This processing is performed on all frames of the speech of the speaker to be recognized, and the speaker name selected most frequently is taken as the speaker recognition result.
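The two-stage decision described above can be written compactly. The notation here is introduced only for illustration and does not appear in the source: $x_t$ is the analysis result for frame $t$ of the input speech ($t = 1, \dots, T$), $D_{s,j}$ is the $j$-th stored pattern of registered speaker $s$, and $c(\cdot,\cdot)$ is the similarity measure. The first line picks the best-matching speaker for each frame; the second picks the speaker chosen for the most frames.

```latex
s_t = \arg\max_{s} \, \max_{j} \; c\left(x_t, D_{s,j}\right), \qquad t = 1, \dots, T

\hat{s} = \arg\max_{s} \sum_{t=1}^{T} \mathbf{1}\left[\, s_t = s \,\right]
```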

Embodiment of the Invention

An embodiment of the present invention will be described in detail with reference to the accompanying drawing.

In the figure, 1 is a microphone, 2 is a 16-channel band-pass filter bank (hereinafter, band-pass filter), 3 is a multiplexer, 4 is an analog-to-digital converter (hereinafter, A/D converter), 5 is a subclassified sound type memory, 6 is a Chebyshev norm calculation circuit, 7 is a minimum value calculation unit, 8 is a decoder, 9 is a registered speaker frequency recording unit, 10 is a maximum value calculation unit, and S1 and S2 are switch units.

The band-pass filter 2 analyzes the speech signal input from the microphone 1 into 16 frequency bands f1 to f16, and outputs 16 channels of analog signals representing the rough shape of the spectrum.

The multiplexer 3 performs time-division sampling by scanning the analog signals of channels 1 to 16 once per sampling period of, for example, 10 ms. The time-divided analog signal output is converted to digital values by the A/D converter 4 and output digitally at, for example, 16 words per frame. Thus, if the input utterance is, for example, 1 second long, 100 frames × 16 words = 1600 words are output per utterance.
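The word count above follows directly from the frame period and the channel count. A small sketch of that arithmetic is given below; the function name `words_per_utterance` and the constant names are illustrative and not from the source.

```python
FRAME_PERIOD_MS = 10  # frame (sampling) period given in the text
CHANNELS = 16         # band-pass filter channels, one word each per frame


def words_per_utterance(duration_ms: int) -> tuple[int, int]:
    """Return (frames, words) produced for an utterance of the given length."""
    frames = duration_ms // FRAME_PERIOD_MS
    return frames, frames * CHANNELS


frames, words = words_per_utterance(1000)
print(frames, words)  # 100 1600
```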

The subclassified sound type memory 5 holds the characteristics of the registrants; one memory is provided for each registrant so that each registrant's characteristics are held separately. Thus, when there are 10 registrants as in this example, subclassified sound type first memory 5-0 through subclassified sound type tenth memory 5-9 are provided.

The Chebyshev norm calculation circuit 6 calculates the similarity between frames by computing

$$\sum_{i=1}^{16} \left| I_i - D_i \right|$$

Here $I_i$ denotes the $i$-th channel output from the A/D converter 4, and $D_i$ denotes the $i$-th word of one dictionary entry held in the subclassified sound type memory 5 and delivered via the switch unit S2. The result represents the distance between one frame of data (16 words) of the speech to be recognized, sent from the A/D converter 4, and one dictionary entry (16 words) sent from the subclassified sound type memory 5 via the switch unit S2.

Once every 10 ms, when one frame of data arrives from the A/D converter 4, the Chebyshev norm calculation circuit 6 switches the switch unit S2 in turn across subclassified sound type first memory 5-0 through tenth memory 5-9 and performs distance calculations for 100 entries × 10 sets = 1000 entries. The minimum value calculation unit 7 computes the minimum of these 1000 results and outputs, for each frame, a 4-bit identification code indicating which of the memories 5-0 to 5-9 supplied the data giving that minimum. That is, the minimum value calculation unit 7 outputs a 4-bit identification code once every 10 ms.
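The distance calculation and minimum search described above can be sketched in software as follows. This is a minimal illustration, not the circuit itself: the function names, the list-of-lists layout, and the tie-breaking rule (first minimum wins) are assumptions, since the text does not say how ties are resolved. Note also that although the text calls circuit 6 a Chebyshev norm calculator, a sum of absolute differences is more commonly called the L1 (city-block) distance.

```python
def frame_distance(frame, entry):
    """Sum of absolute per-channel differences between two frames."""
    return sum(abs(i - d) for i, d in zip(frame, entry))


def nearest_speaker(frame, memories):
    """Return the index of the speaker memory holding the closest entry.

    `memories` has one element per registered speaker, each a list of
    stored frames. The returned index plays the role of the 4-bit
    identification code. Ties go to the first minimum found (assumption).
    """
    best_code, best_dist = None, None
    for code, entries in enumerate(memories):
        for entry in entries:
            dist = frame_distance(frame, entry)
            if best_dist is None or dist < best_dist:
                best_code, best_dist = code, dist
    return best_code


# Toy data: speaker 0 has two stored frames, speaker 1 has one.
memories = [[[0] * 16, [1] * 16], [[10] * 16]]
print(nearest_speaker([9] * 16, memories))  # 1
```

With 10 speakers and 100 stored frames each, the inner loop runs 1000 times per input frame, matching the count given in the text.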

The decoder 8 decodes this 4-bit identification code. When it determines, for example, that the minimum was obtained in comparison with data output from subclassified sound type first memory 5-0, it sends an output to the first counter 9-0 of the registered speaker frequency recording unit 9 and increments it by 1; when it determines, for example, that the data came from subclassified sound type second memory 5-1, it sends an output to the second counter 9-1. In this way, the first counter 9-0 through the tenth counter 9-9 count, frame by frame, the most similar registered speaker. By detecting the counter with the largest value in the maximum value calculation unit 10, it can be determined which registered speaker the speech to be recognized most closely resembles.
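The counting and maximum detection performed by units 8, 9, and 10 amount to a majority vote over per-frame decisions. A minimal sketch under stated assumptions: `frame_codes` is a hypothetical list holding one identification code (0 to 9) per frame, and ties are resolved in favor of the lowest counter index, which the text does not specify.

```python
def most_frequent_speaker(frame_codes, num_speakers=10):
    """Tally one vote per frame and return (winner index, counters)."""
    counters = [0] * num_speakers          # counters 9-0 .. 9-9
    for code in frame_codes:
        counters[code] += 1                # decoder increments one counter
    # Maximum value calculation: index of the largest counter.
    winner = max(range(num_speakers), key=lambda s: counters[s])
    return winner, counters


winner, counters = most_frequent_speaker([0, 1, 1, 3, 1, 0], num_speakers=4)
print(winner, counters)  # 1 [2, 3, 0, 1]
```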

Next, the present invention will be described in detail with reference to the accompanying drawing.

(1) Registration

At registration, the switch unit S1 is first connected to subclassified sound type first memory 5-0, and the first registered speaker is asked to utter, for example, a predetermined phrase. This speech is input from the microphone 1 and frequency-analyzed into 16 channels by the band-pass filter 2, which outputs 16 channels of analog signals. The multiplexer 3 performs time-division sampling by scanning the analog signals of channels 1 to 16 once per 10 ms sampling period, and this output is converted to digital values by the A/D converter 4.

The A/D converter 4 thus produces, every 10 ms, one word per channel for a total of 16 words of digital output, which is registered in subclassified sound type first memory 5-0. Consequently, if the input utterance is 1 second long, 100 frames × 16 words = 1600 words are registered per utterance. Next, when the second registered speaker registers, the switch unit S1 is connected to subclassified sound type second memory 5-1 and the same input processing is performed, so that subclassified sound type second memory 5-1 holds the characteristics of the second registered speaker.

This is done for each registered speaker, so when there are 10 registered speakers, the characteristics of each registered speaker are held in the memories up to subclassified sound type tenth memory 5-9.
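The registration procedure can be pictured as filling one list of frames per speaker. The sketch below is only an illustration under assumed names: `enroll` and the list-based `memories` layout are hypothetical stand-ins for switch unit S1 and memories 5-0 to 5-9, and the analyzed frames are taken as already-digitized 16-word lists.

```python
NUM_SPEAKERS = 10  # registered speakers in the example from the text


def enroll(memories, speaker_index, frames):
    """Store a copy of every analyzed frame in the chosen speaker's memory.

    Selecting `speaker_index` corresponds to connecting switch S1 to one
    of the memories 5-0 .. 5-9.
    """
    memories[speaker_index].extend(list(frame) for frame in frames)


memories = [[] for _ in range(NUM_SPEAKERS)]

# A 1-second utterance yields 100 frames of 16 words each (dummy values here).
utterance = [[1] * 16 for _ in range(100)]
enroll(memories, 0, utterance)
print(len(memories[0]), len(memories[0][0]))  # 100 16
```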

(2) Recognition

To recognize which registered speaker the input speech most closely resembles, the switch unit S1 is set to the open state. The speech input at this time is preferably the same phrase that the registered speakers uttered when registering their characteristics in the subclassified sound type memory 5.

The speech to be recognized, input from the microphone 1, is frequency-analyzed into 16 channels as in (1) above. These channels are scanned once per 10 ms sampling period and converted to digital values, and a digital output of 16 words per frame is delivered to the Chebyshev norm calculation circuit 6. At this time the switch unit S2 is connected to subclassified sound type first memory 5-0, which is read out one word at a time, and the Chebyshev norm calculation circuit 6 performs the calculation $\sum_{i=1}^{16} |I_i - D_i|$ described above. That is, the sum of the absolute values of the differences between corresponding terms of the 16-word data of the speech to be recognized and of one 16-word entry from the subclassified sound type memory is calculated. This result represents the distance between one frame of data (16 words) sent from the A/D converter 4 and one dictionary entry (16 words) sent from the subclassified sound type memory 5.

Once every 10 ms, when one frame of data arrives from the A/D converter 4, the Chebyshev norm calculation circuit 6 switches the switch unit S2 in turn across subclassified sound type first memory 5-0 through tenth memory 5-9 and performs distance calculations for 100 entries × 10 sets. The minimum value calculation unit 7 computes the minimum of these 1000 results and outputs, for example, a 4-bit identification code indicating which of the memories 5-0 to 5-9 supplied the data giving that minimum.

That is, the minimum value calculation unit 7 outputs this identification code once every 10 ms. The code is decoded by the decoder 8, which outputs a signal that selectively increments by 1 the corresponding one of the first counter 9-0 through the tenth counter 9-9. When one utterance of the speech to be recognized has ended, the maximum value calculation unit 10 compares the values of the first counter 9-0 through the tenth counter 9-9 that make up the registered speaker frequency recording unit 9, outputs the number of the counter showing the largest value as the speaker recognition result, and resets the first counter 9-0 through the tenth counter 9-9.

The above description used a 16-channel band-pass filter bank as an example, but the number of channels is of course not limited to this. Any suitable number n of channels may be used, a digital filter bank may be used instead, and the frame period may be changed from 10 ms to any other suitable duration. Likewise, the number of registered speakers is not limited to ten and may be any plural number.

The speech uttered for speaker recognition may be a specific phrase, or the speech used at registration and at recognition may differ.

Effects of the Invention

According to the present invention, speaker recognition can be performed on the basis of voice quality without being influenced by the manner of utterance, for example whether word endings are drawn out or cut short, so highly accurate speaker recognition is possible. This makes it possible to raise the speech recognition rate for unspecified speakers by using the dictionary of the most similar registered speaker. For a speaker whose speech is input by telephone as well, the present invention can be applied as preprocessing to select the most similar registered speaker, after which that speaker's registered dictionary is used to perform highly accurate speech recognition.

[Brief explanation of drawings]

The accompanying drawing is a block diagram of an embodiment of the present invention.

In the figure, 1 is a microphone, 2 is a band-pass filter bank, 3 is a multiplexer, 4 is an analog-to-digital converter, 5 is a subclassified sound type memory, 6 is a Chebyshev norm calculation circuit, 7 is a minimum value calculation unit, 8 is a decoder, 9 is a registered speaker frequency recording unit, 10 is a maximum value calculation unit, and S1 and S2 are switch units.

Patent applicant: Fujitsu Limited
Agent: Patent attorney 山谷 晧榮

Claims

[Claims] A speaker recognition method for use in a speaker recognition device that processes speech uttered by a human and determines which registrant's voice the speaker's voice resembles, characterized in that: a subclassified sound type pattern memory is provided that holds, for each speaker, a parameter time series obtained by analyzing speech uttered by a plurality of people at a frame period; a selection means is provided that computes the correlation between the parameter time series obtained by analyzing the speaker's voice at the frame period and the parameter time series in the subclassified sound type pattern memory, and selects, for each frame period, the name of the registered speaker with the highest correlation; and, after the utterance, the registered speaker name selected the greatest number of times is determined and output as the speaker recognition result.