JPH03150953A - Automatic transfer telephone set

Info

Publication number: JPH03150953A
Application number: JP1289413A
Authority: JP
Inventors: Kunio Hirata; 平田　国男; Shingo Nishimura; 新吾西村; Masashi Miyagawa; 宮川　正志
Original assignee: Sekisui Chemical Co Ltd
Current assignee: Sekisui Chemical Co Ltd
Priority date: 1989-11-07
Filing date: 1989-11-07
Publication date: 1991-06-27

Abstract

PURPOSE: To reliably and easily handle calls from a caller registered in advance as a transfer requester, using a simple hardware configuration and regardless of whether the call arrives from an outside line or an extension, by deciding whether the current incoming call is to be transferred on the basis of a voice signal specific to that transfer requester. CONSTITUTION: A plurality of extension telephone sets 3A, 3B, ... are connected to an exchange 2, and a storage section 5, an automatic reply section 6, a speaker recognition section 8 and a reply control section 9 are provided. Because the decision whether to transfer the current incoming call is based on a voice signal specific to a transfer requester registered in advance, the call need not be made from a specific telephone set. Thus, whether the call arrives from an outside line or an extension, the automatic transfer telephone set handles calls from a registered transfer requester reliably and easily with a simple hardware configuration.

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to an automatic transfer telephone device.

[Prior Art]

A conventional automatic transfer telephone device is described in Japanese Unexamined Patent Publication No. S63-203047.

In this device, a special operation from the telephone of an extension subscriber who wants to use the absence transfer service registers transfer source extension information, transfer destination extension information, and transferred-party extension information (the extensions whose calls may be transferred) in a storage unit connected to the exchange. When a call arrives at the transfer source extension telephone, the extension information of that call is compared with the transferred-party extension information, and the call is transferred according to the transfer destination extension information on the condition that the calling extension is registered as a transferred party.

[Problems to Be Solved by the Invention]

However, the prior art has the following problems [1] and [2].

[1] Extension information is used to decide whether the current incoming call may be transferred, so calls arriving from outside lines cannot be transferred.

[2] For the same reason as in [1], a caller who wants the transfer service must call from a specific telephone whose extension information has been registered. Conversely, anyone who uses a registered telephone can receive the transfer service.

An object of the present invention is to provide an automatic transfer telephone device that, with a simple hardware configuration, reliably and easily handles calls from a transfer requester, whether the current call arrives from an outside line or an extension.

[Means for Solving the Problems]

The invention according to claim 1 is an automatic transfer telephone device that stores transfer source extension telephone information and transfer destination extension telephone information, has specific transfer requesters registered in advance, and, when a call from a registered transfer requester arrives at the transfer source extension telephone, automatically transfers the call to the transfer destination extension telephone. The device comprises: an automatic answering unit that automatically answers a call arriving at the transfer source extension telephone; a speaker recognition unit in which a specific speaker who is a transfer requester is registered in advance and which recognizes whether the voice of the caller who replied to the automatic answering unit on the current call is the voice of the registered speaker; and a response control unit that, based on the recognition result of the speaker recognition unit, connects the call to the transfer destination extension telephone on the condition that the caller of the current call is the registered speaker. The speaker recognition unit recognizes the speaker from the caller's voice using a neural network.

The invention according to claim 2 uses, as input to the neural network, one or more of the following: [1] the temporal change of the frequency characteristics of the voice; [2] the average linear prediction coefficients of the voice; [3] the average PARCOR coefficients of the voice; [4] the average frequency characteristics and pitch frequency of the voice; [5] the average frequency characteristics of the high-frequency-emphasized voice waveform; and [6] the average frequency characteristics of the voice.

In the invention according to claim 3, the neural network is a hierarchical neural network.

[Operation]

The invention according to claim 1 provides the following effects (1) to (3).

(1) Whether the current incoming call may be transferred is decided from the voice unique to a transfer requester specified in advance, so the call does not have to be made from a specific telephone.

(2) As a result of (1), calls from outside lines as well as from extensions can be transferred. In addition, the callers who can receive the transfer service are limited to the speakers registered in advance as transfer requesters.

(3) Because a neural network is used for speaker recognition, the following advantages [1] to [4] are obtained.

[1] The normal operation rate degrades very little over time. This is confirmed by the experimental results described later, and is presumed to be because a neural network can take a structure that is not easily affected by variation in the voice due to differences in the time of utterance.

[2] In principle, the computation of a neural network as a whole is simple and fast.

[3] In principle, the units that make up a neural network operate independently, so they can be computed in parallel. The computation is therefore fast.

[4] Owing to [1] to [3], the automatic transfer telephone device can easily operate in real time without a complicated processing device.

The invention according to claim 2 provides the following effect (4) in addition to effects (1) to (3).

(4) Because one or more of the elements [1] to [6] recited in claim 2 are used as input to the neural network, the preprocessing needed to obtain the input is simpler than conventional complex feature extraction and takes less time.

The invention according to claim 3 provides the following effect (5) in addition to effects (1) to (4).

(5) For a hierarchical neural network, a simple learning algorithm (backpropagation), described later, is already established, so a neural network that achieves a high recognition rate can be formed easily.

[Embodiments]

FIG. 1 is a schematic diagram showing an example of an automatic transfer telephone device to which the present invention is applied; FIG. 2 is a schematic diagram showing an example of the speaker recognition unit; FIG. 3 is a schematic diagram showing the input voice; FIG. 4 is a schematic diagram showing the output of the band-pass filters; FIG. 5 is a schematic diagram showing neural networks; FIG. 6 is a schematic diagram showing a hierarchical neural network; and FIG. 7 is a schematic diagram showing the structure of a unit.

Before the specific embodiment is described, the structure of a neural network and its learning algorithm are explained.

(1) By structure, neural networks fall broadly into two types: the hierarchical network shown in FIG. 5(A) and the interconnected network shown in FIG. 5(B). The present invention may be built with either type, but the hierarchical network is more useful because a simple learning algorithm, described later, is established for it.

(2) Network structure

As shown in FIG. 6, a hierarchical network has a layered structure consisting of an input layer, an intermediate layer, and an output layer. Each layer consists of one or more units. The only connections are forward connections (input layer → intermediate layer → output layer); there are no connections within a layer.

(3) Unit structure

As shown in FIG. 7, a unit is a model of a neuron in the brain and has a simple structure. It receives inputs from other units, sums them, transforms the sum according to a fixed rule (a transform function), and outputs the result. Each connection to another unit carries a variable weight that represents the strength of the connection.

(4) Learning (backpropagation)

Learning in a network means bringing the actual output closer to the target value (the desired output). In general, learning is performed by changing the transform function and the weights of each unit shown in FIG. 7.

As the learning algorithm, it is possible to use, for example, the backpropagation described in Rumelhart, D. E., McClelland, J. L. and the PDP Research Group, PARALLEL DISTRIBUTED PROCESSING, the MIT Press, 1986.
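As an illustration of the unit structure and learning described in (3) and (4), the following minimal Python sketch trains a small three-layer hierarchical network by backpropagation. The sigmoid transform function, the squared-error measure, the bias weights, the learning rate, and the name ThreeLayerNet are all assumptions made for this sketch and are not taken from this document. The sketch adjusts only the weights, whereas the text also allows the transform function to be changed.

```python
import math
import random


def sigmoid(z):
    """Assumed transform function of a unit."""
    return 1.0 / (1.0 + math.exp(-z))


class ThreeLayerNet:
    """Hypothetical input -> intermediate -> output network, forward connections only."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One extra weight per unit acts as a bias term (an assumption).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        # Each unit sums its weighted inputs and applies the transform function.
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One backpropagation update; returns the squared error before the update."""
        h, y = self.forward(x)
        # Error terms of the output units (squared error, sigmoid units).
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # Error terms of the intermediate units, propagated back through w2.
        d_hid = [hj * (1.0 - hj) * sum(d_out[k] * self.w2[k][j]
                                       for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v
        return sum((yk - tk) ** 2 for yk, tk in zip(y, target))
```

Calling train_step repeatedly on pairs of an input list and a target such as [1.0, 0.0] or [0.0, 1.0] moves the outputs toward the targets, which corresponds to the repeated correction described above.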

A specific embodiment of the present invention is described below.

As shown in FIG. 1, the automatic transfer telephone device 1 includes an exchange 2 to which a plurality of extension telephones 3A, 3B, 3C, ... are connected, together with a storage unit 5, an automatic answering unit 6, a speaker recognition unit 8, and a response control unit 9.

(1) The storage unit 5 stores the transfer source extension telephone information and the transfer destination extension telephone information.

(2) The automatic answering unit 6 automatically answers a call arriving at the transfer source extension telephone.

(3) The speaker recognition unit 8 uses a neural network. As described later, a specific speaker who is a transfer requester is registered in it in advance, and it recognizes whether the voice of the caller who replied to the automatic answering unit 6 on the current call is the voice of the registered speaker.

The speaker recognition unit 8 is made up of a preprocessing unit 10, a neural network 20, and a decision circuit 30.

The preprocessing unit 10 applies simple preprocessing, described later, to the input voice.

The neural network 20 performs the learning operation [1] and the evaluation operation [2] below.

[1] Learning

The training word is, for example, the word "transfer", common to all registered speakers.

The target value (the target output of each output unit in the output layer) is (1, 0) for registered speakers and (0, 1) for other speakers.

The registered speaker's input voice "transfer" is preprocessed by the preprocessing unit 10, and the result is input to the neural network 20. The transform function and weights of each unit of the neural network 20 are then corrected so that the output value of the neural network 20 (the output of each output unit in the output layer) approaches the target value.

This learning operation is repeated.

[2] Evaluation

The word uttered by the caller on the current call is preprocessed, the preprocessed voice is input to the neural network 20, and the output value of the neural network is obtained.

The decision circuit 30 recognizes the caller of the current call as a registered speaker if this output value of the neural network 20 exceeds a certain threshold and is close to (1, 0).

(4) Based on the recognition result of the speaker recognition unit 8, the response control unit 9 connects the call to the transfer destination extension telephone on the condition that the caller of the current call is the registered speaker.

The following describes a specific embodiment in which, as shown in FIG. 2, the speaker recognition unit 8 uses a hierarchical neural network 20 and the input to the neural network 20 is the temporal change of the average frequency characteristics of the voice within fixed time intervals.

As shown in FIG. 2, the preprocessing unit 10 of the speaker recognition unit 8 is made up of a voice input unit 12, a low-pass filter 13, band-pass filters 14, an averaging circuit 15, and a memory 16.

(A) Training of the neural network 20

[1] Five registered speakers and 25 other speakers each input the spoken word "transfer" to the voice input unit 12 ten times.

[2] The high-frequency components of the input voice signal are cut by the low-pass filter 13. The input voice is then divided in time into four equal blocks, as shown in FIG. 3.

[3] As shown in FIG. 2, the voice waveform is passed through band-pass filters 14 of a plurality of (n) channels, and the frequency characteristics shown in FIGS. 4(A) to 4(D) are obtained for each block, that is, for each fixed time interval.

The output signals of the band-pass filters 14 are averaged by the averaging circuit 15 over each block, that is, over each fixed time interval.

This preprocessing yields the "temporal change of the average frequency characteristics of the voice within fixed time intervals".

The output of the averaging circuit 15 is transferred to the neural network 20 either directly or indirectly via the memory 16.

[4] The neural network 20 is a three-layer hierarchical neural network. The input layer 21 consists of 4 × n units, corresponding to the four blocks and n channels of the preprocessing. The output layer 22 consists of two units, one for the group of registered speakers and one for the group of other speakers.

The target value of the output layer 22 is (1, 0) for registered speakers and (0, 1) for other speakers.

The weights and transform functions of the neural network 20 are corrected so that the output value of the output layer 22 approaches the target value. This operation is repeated, and training continues until the output error for the inputs converges to a certain level, giving a network that can guarantee a certain recognition rate.
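The block averaging that produces the 4 × n inputs of the input layer can be sketched as follows. This is only an illustrative sketch: it assumes that the outputs of the n band-pass channels are already available as one list of n magnitudes per short frame, and the function name block_average and that frame representation are hypothetical, not part of this document.

```python
def block_average(frames, n_blocks=4):
    """Average each band-pass channel over n_blocks equal time blocks.

    frames: list of frames, each a list of n channel magnitudes
            (assumed representation; needs at least n_blocks frames).
    Returns a flat list of n_blocks * n values, block by block.
    """
    n_channels = len(frames[0])
    size = len(frames) // n_blocks
    features = []
    for b in range(n_blocks):
        start = b * size
        # The last block absorbs any leftover frames.
        end = len(frames) if b == n_blocks - 1 else start + size
        block = frames[start:end]
        for ch in range(n_channels):
            features.append(sum(frame[ch] for frame in block) / len(block))
    return features
```

With eight frames and two channels, the function returns eight values, which matches the 4 × n layout of the input layer for n = 2.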

(B) Control of the automatic transfer telephone device 1

[1] To designate the first extension telephone 3A as the transfer source, the user first uses telephone 3A to enter the transfer destination telephone number and the spoken word "transfer" of each transfer requester (there may be more than one).

As a result, the extension information of the first extension telephone 3A as the transfer source and the extension information of the specified transfer destination telephone are stored in the storage unit 5.

The transfer requester's spoken word "transfer" entered at the first extension telephone 3A as described above is input to the preprocessing unit 10 of the speaker recognition unit 8 and serves as the voice input for training the neural network 20 described in (A) [1].

[2] After that, when a call from an extension or an outside line arrives at the first extension telephone 3A designated as the transfer source, the automatic answering unit 6 operates and automatically answers with the guidance message "Hello, this is Sahara. Please go ahead."

[3] If the caller is a registered speaker and says "transfer", this voice undergoes the same preprocessing as in (A) [2] and [3] in the preprocessing unit 10 of the speaker recognition unit 8, and the preprocessing result is input to the neural network 20 of the speaker recognition unit 8.

[4] The decision circuit 30 of the speaker recognition unit 8 takes the output of the neural network 20. If the output is close to (1, 0), it judges the caller of the current call to be a registered speaker; if the output is close to (0, 1), it judges the caller to be another speaker.

[5] The response control unit 9 operates as follows. (a) If the recognition result of the speaker recognition unit 8 is "registered speaker", it controls the exchange 2 to connect the call of [2] to the transfer destination extension telephone stored in the storage unit 5. (b) If the recognition result is "other speaker", it operates the automatic answering unit 6 to reply "I am out at the moment." and ends the call.
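The decision and routing performed by the decision circuit 30 and the response control unit 9 can be sketched as below. This is a hedged illustration only: the threshold value 0.8, the names TransferEntry and handle_call, the dictionary used for the storage unit, and the "ring" result for an extension with no transfer setting are assumptions introduced for the sketch and do not appear in this document.

```python
from dataclasses import dataclass


@dataclass
class TransferEntry:
    """Hypothetical record kept in the storage unit."""
    source_extension: str
    destination_extension: str


def handle_call(called_extension, nn_output, storage, threshold=0.8):
    """Decide what to do with a call, given the two network outputs.

    nn_output: (registered_score, other_score), near (1, 0) for a
               registered speaker and near (0, 1) for any other speaker.
    storage:   dict mapping a transfer source extension to a TransferEntry.
    """
    entry = storage.get(called_extension)
    if entry is None:
        # Assumed behaviour: no transfer is set, so the call rings normally.
        return ("ring", called_extension)
    registered_score, other_score = nn_output
    if registered_score > threshold and registered_score > other_score:
        # Registered speaker: connect to the stored transfer destination.
        return ("connect", entry.destination_extension)
    # Any other speaker: play the absence message and end the call.
    return ("announce_absent", None)
```

For example, with a transfer set from extension 3A to 3B, an output of (0.97, 0.02) yields a connection to 3B, while (0.1, 0.9) yields the absence announcement.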

(C) Experiment

When the automatic transfer telephone device 1 described above was used, the normal operation rate was 100% immediately after training and 99% after three months, confirming that the normal operation rate degrades very little over time.

The processing speed of the automatic transfer telephone device 1 (the time required to recognize one uttered word) was within one second.

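Next, the operation of the above embodiment is described.

(1) Whether the current incoming call may be transferred is decided from the voice unique to a transfer requester specified in advance, so the call does not have to be made from a specific telephone.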

(2) As a result of (1), calls from outside lines as well as from extensions can be transferred. In addition, the callers who can receive the transfer service are limited to the speakers registered in advance as transfer requesters.

(3) Because the neural network 20 is used for speaker recognition, the following advantages [1] to [4] are obtained.

[1] The normal operation rate degrades very little over time. This is presumed to be because the neural network 20 can take a structure that is not easily affected by variation in the voice due to differences in the time of utterance.

[2] In principle, the computation of the neural network 20 as a whole is simple and fast.

[3] In principle, the units that make up the neural network 20 operate independently, so they can be computed in parallel. The computation is therefore fast.

[4] Owing to [1] to [3], the automatic transfer telephone device 1 can easily operate in real time without a complicated processing device.

(4) Because the "temporal change of the frequency characteristics of the voice" is used as input to the neural network 20, the preprocessing needed to obtain the input is simpler than conventional complex feature extraction and takes less time.

Furthermore, because the "temporal change of the average frequency characteristics of the voice within fixed time intervals" is used as that input, the processing in the neural network 20 is simple and takes even less time.

(5) Because the hierarchical neural network 20 is used, a high recognition rate can be achieved with the simple, already established learning algorithm (backpropagation).

In practicing the present invention, one or more of the following can be used as input to the neural network: [1] the temporal change of the frequency characteristics of the voice; [2] the average linear prediction coefficients of the voice; [3] the average PARCOR coefficients of the voice; [4] the average frequency characteristics and pitch frequency of the voice; [5] the average frequency characteristics of the high-frequency-emphasized voice waveform; and [6] the average frequency characteristics of the voice.

Just as element [1] was further used in the form of the "temporal change of the average frequency characteristics of the voice within fixed time intervals", element [2] can be used as the "temporal change of the average linear prediction coefficients of the voice within fixed time intervals", element [3] as the "temporal change of the average PARCOR coefficients of the voice within fixed time intervals", element [4] as the "temporal change of the average frequency characteristics and pitch frequency of the voice within fixed time intervals", and element [5] as the "temporal change of the average frequency characteristics, within fixed time intervals, of the high-frequency-emphasized voice waveform".

The linear prediction coefficients of [2] are defined as follows.

It is known that the sample values {x_t} of a voice waveform generally have a high correlation between neighbouring samples. It is therefore assumed that the following linear prediction is possible.

Linear prediction value:

\[ \hat{x}_t = -\sum_{i=1}^{p} \alpha_i x_{t-i} \tag{1} \]

Linear prediction error:

\[ \varepsilon_t = x_t - \hat{x}_t \tag{2} \]

Here x_t is the sample value of the voice waveform at time t, and {\alpha_i} (i = 1, ..., p) are the (p-th order) linear prediction coefficients.

In practicing the present invention, the linear prediction coefficients {\alpha_i} are determined so that the mean square of the linear prediction error \varepsilon_t is minimized.

Specifically, (\varepsilon_t)^2 is computed, its time average is written \overline{(\varepsilon_t)^2}, and by setting \partial \overline{(\varepsilon_t)^2} / \partial \alpha_i = 0 for i = 1, 2, ..., p, {\alpha_i} is obtained from the following equation, in which V_\tau denotes the time-averaged autocorrelation \overline{x_t x_{t-\tau}} of the samples at lag \tau.

\[ \sum_{i=1}^{p} \alpha_i V_{|i-j|} = -V_j, \qquad j = 1, 2, \ldots, p \tag{3} \]

The PARCOR coefficients of [3] are defined as follows.

Let {k_n} (n = 1, ..., p) be the (p-th order) PARCOR coefficients (partial autocorrelation coefficients). The PARCOR coefficient k_{n+1} is defined by the following equation as the normalized correlation coefficient between the forward residual \varepsilon_t^{(f)} and the backward residual \varepsilon_{t-(n+1)}^{(b)} of linear prediction.

\[ k_{n+1} = \frac{\overline{\varepsilon_t^{(f)} \, \varepsilon_{t-(n+1)}^{(b)}}}{\sqrt{\overline{\left(\varepsilon_t^{(f)}\right)^2}} \, \sqrt{\overline{\left(\varepsilon_{t-(n+1)}^{(b)}\right)^2}}} \tag{4} \]

Here

\[ \varepsilon_t^{(f)} = x_t - \sum_{i=1}^{n} \alpha_i x_{t-i}, \qquad \{\alpha_i\}: \text{forward prediction coefficients}, \]

\[ \varepsilon_{t-(n+1)}^{(b)} = x_{t-(n+1)} - \sum_{j=1}^{n} \beta_j x_{t-j}, \qquad \{\beta_j\}: \text{backward prediction coefficients}. \]

The pitch frequency of the voice in [4] is the reciprocal of the repetition period (pitch period) of the vocal cord wave.
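The normal equations for the linear prediction coefficients, sum over i of alpha_i V_|i-j| = -V_j, form a Toeplitz system. One common way to solve such a system is the Levinson-Durbin recursion, sketched below in Python. This document does not say which solver is used, so the recursion is a technique substituted here for illustration, and the function names are hypothetical. Under the sign convention in which the predicted value is minus the weighted sum of past samples, the partial autocorrelation at each order is the negative of the reflection coefficient computed by the recursion, and the sketch returns it in that form.

```python
def autocorrelation(samples, max_lag):
    """Time-averaged autocorrelation V_0 .. V_max_lag of a sample list."""
    n = len(samples)
    return [sum(samples[t] * samples[t - lag] for t in range(lag, n)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(v, p):
    """Solve sum_i alpha_i * V_|i-j| = -V_j (j = 1..p) by recursion on the order.

    v: autocorrelation values V_0 .. V_p.
    Returns (alphas, parcor): the p prediction coefficients and the
    p partial autocorrelation coefficients.
    """
    a = [0.0] * (p + 1)          # a[1..m] holds the order-m solution
    err = v[0]                   # prediction error power at the current order
    parcor = []
    for m in range(1, p + 1):
        acc = v[m] + sum(a[i] * v[m - i] for i in range(1, m))
        k = -acc / err           # reflection coefficient of order m
        parcor.append(-k)        # partial autocorrelation = -k in this convention
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], parcor
```

For autocorrelation values 1, 0.5, 0.25, which correspond to a first-order process, the recursion gives coefficients of -0.5 and 0 and partial autocorrelations of 0.5 and 0.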

Because the pitch frequency, a basic parameter of the vocal cords that differs between individuals, is added as input to the neural network, the speaker recognition rate can be improved, particularly between adults and children and between men and women.
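Since the pitch frequency is the reciprocal of the pitch period, it can be estimated by finding the period first. The sketch below takes the period to be the lag at which the autocorrelation of the samples peaks. That estimation method, the 50 to 400 Hz search range, and the function name are assumptions chosen for illustration; this document does not state how the pitch is measured.

```python
import math


def estimate_pitch_hz(samples, sample_rate, min_hz=50.0, max_hz=400.0):
    """Return sample_rate / lag for the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[t] * samples[t - lag]
                    for t in range(lag, len(samples)))
        if value > best_value:
            best_lag, best_value = lag, value
    # Pitch frequency is the reciprocal of the pitch period (lag / sample_rate).
    return sample_rate / best_lag


# A 100 Hz sine sampled at 8000 Hz has a period of 80 samples.
tone = [math.sin(2.0 * math.pi * 100.0 * t / 8000.0) for t in range(800)]
pitch = estimate_pitch_hz(tone, 8000)
```

Real voiced speech is less regular than a sine tone, so a practical estimator would need additional safeguards that are outside the scope of this sketch.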

The high-frequency emphasis of [5] compensates for the average slope of the spectrum of the voice waveform and thereby prevents energy from concentrating in the low-frequency range. The average slope of the spectrum of a voice waveform is common to all speakers and is irrelevant to speaker recognition.

However, if a voice waveform whose average spectral slope has not been compensated is input to the neural network as it is, the network extracts the average spectral slope as a feature during training, and it takes time to extract the spectral peaks and valleys needed for speaker recognition. When the input to the neural network is high-frequency emphasized, in contrast, the average spectral slope, which is common to all speakers and irrelevant to recognition yet affects training, is compensated, so training becomes faster.
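High-frequency emphasis is often implemented as a first-order difference filter, y[t] = x[t] - c * x[t-1]. The sketch below assumes that form and a coefficient of 0.97. Neither the filter form nor the coefficient is given in this document, so both should be read as illustrative assumptions, as should the function name.

```python
def pre_emphasis(samples, coeff=0.97):
    """Apply y[t] = x[t] - coeff * x[t-1]; the first sample is passed through."""
    if not samples:
        return []
    return [samples[0]] + [samples[t] - coeff * samples[t - 1]
                           for t in range(1, len(samples))]
```

A constant (zero-frequency) input is strongly attenuated after the first sample, while rapid sample-to-sample changes pass through, which flattens the downward slope of the spectrum.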

[Effects of the Invention]

As described above, the present invention provides an automatic transfer telephone device that, with a simple hardware configuration, reliably and easily handles calls from a transfer requester, whether the current call arrives from an outside line or an extension.

[Brief Description of the Drawings]

FIG. 1 is a schematic diagram showing an example of an automatic transfer telephone device to which the present invention is applied; FIG. 2 is a schematic diagram showing an example of the speaker recognition unit; FIG. 3 is a schematic diagram showing the input voice; FIG. 4 is a schematic diagram showing the output of the band-pass filters; FIG. 5 is a schematic diagram showing neural networks; FIG. 6 is a schematic diagram showing a hierarchical neural network; and FIG. 7 is a schematic diagram showing the structure of a unit.

1: automatic transfer telephone device; 5: storage unit; 6: automatic answering unit; 8: speaker recognition unit; 9: response control unit; 10: preprocessing unit; 20: neural network; 30: decision circuit.

Claims

[Claims]

(1) An automatic transfer telephone device that stores transfer source extension telephone information and transfer destination extension telephone information, has specific transfer requesters registered in advance, and, when a call from a registered transfer requester arrives at the transfer source extension telephone, automatically transfers the call to the transfer destination extension telephone, the device being characterized by comprising: an automatic answering unit that automatically answers a call arriving at the transfer source extension telephone; a speaker recognition unit in which a specific speaker who is a transfer requester is registered in advance and which recognizes whether the voice of the caller who replied to the automatic answering unit on the current call is the voice of the registered speaker; and a response control unit that, based on the recognition result of the speaker recognition unit, connects the call to the transfer destination extension telephone on the condition that the caller of the current call is the registered speaker, wherein the speaker recognition unit recognizes the speaker from the caller's voice using a neural network.

(2) The automatic transfer telephone device according to claim 1, wherein one or more of the following are used as input to the neural network: [1] the temporal change of the frequency characteristics of the voice; [2] the average linear prediction coefficients of the voice; [3] the average PARCOR coefficients of the voice; [4] the average frequency characteristics and pitch frequency of the voice; [5] the average frequency characteristics of the high-frequency-emphasized voice waveform; and [6] the average frequency characteristics of the voice.

(3) The automatic transfer telephone device according to claim 1 or 2, wherein the neural network is a hierarchical neural network.