JP2006343644A

JP2006343644A - Speech recognition method, speech recognition apparatus, program, and recording medium

Info

Publication number: JP2006343644A
Application number: JP2005170836A
Authority: JP
Inventors: Satoru Kobashigawa (小橋川哲); Atsunori Ogawa (小川厚徳)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2005-06-10
Filing date: 2005-06-10
Publication date: 2006-12-21

Abstract

[Problem] In a speech recognition method that recognizes the transmission (outgoing) and reception (incoming) signals of a telephone separately, to mitigate the drop in recognition rate caused by the sidetone component that leaks into the reception signal.
[Solution] A speech recognition apparatus extracts a transmission signal and a reception signal from a telephone and recognizes them separately. It comprises a sidetone gain estimation unit that estimates the gain with which the transmission signal leaks into the reception signal; a reception signal maximum amplitude estimation unit that estimates the maximum amplitude of the reception signal; a sidetone signal estimation unit that estimates the sidetone signal for the transmission signal using the estimated sidetone gain and maximum amplitude; a sidetone suppression processing unit that suppresses the sidetone in the reception signal using the estimated sidetone signal; and a speech recognition processing unit that separately recognizes the received speech after sidetone suppression and the transmitted speech.
[Selected figure] Figure 1

Description

The present invention relates to a speech recognition method, a speech recognition apparatus, a speech recognition program, and a recording medium on which the program is recorded, for extracting a transmission signal and a reception signal from a telephone and recognizing the two signals separately.

As shown in FIG. 3, a branching device 20 is known (see, for example, Patent Document 1) that taps the line between the telephone set 19 and the handset 18 and separates the line carrying the transmission signal, connected to the microphone unit 11 of the handset 18, from the line carrying the reception signal, connected to the speaker unit 12 of the handset 18. A speech recognition apparatus 110 that takes the transmit and receive terminals of this branching device 20 separately and recognizes the transmission signal and the reception signal independently is readily conceivable. With this conventional branching device 20, the transmission signal alone can be extracted.
Patent Document 1: JP 60-223370 A

In an ordinary telephone set 19, however, the sidetone circuit 15 inside the telephone couples part of the transmission signal back into the reception signal. This sidetone is superimposed on the reception signal, so extracting the reception signal alone is difficult. Moreover, when the transmission signal level is relatively high, or the gain of the sidetone circuit 15 is high, the sidetone leaking into the reception signal is strong. If the sidetone reaches the maximum amplitude of the reception signal and saturates, the adaptive filter used in an echo canceller cannot adapt properly during the saturated intervals, and the sidetone gain therefore cannot be estimated. As a result, suppressing the sidetone with an echo canceller is difficult, and the recognition rate for received speech is lower than that for transmitted speech.
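The saturation problem can be illustrated with a toy model, which is an assumption for illustration rather than anything specified in the source: the reception line carries r[n] = clip(g·s[n] + x[n], -A, A), where s is the transmission signal, g the sidetone gain, x the far-end speech and A the maximum amplitude of the line. A single gain fitted by least squares over every sample (roughly the solution an adaptive filter converges toward) comes out below the true g, because the clipped samples understate the sidetone. The sketch uses a synthetic 200 Hz tone as the transmission signal, no far-end speech, and hypothetical values g = 0.8 and A = 0.5.

```python
import math


def clip(v: float, a_max: float) -> float:
    """Saturate v to the range [-a_max, a_max], as a clipping line would."""
    return max(-a_max, min(a_max, v))


def ls_gain(tx: list[float], rx: list[float]) -> float:
    """Least-squares gain g minimizing sum((rx - g * tx)^2) over all samples."""
    num = sum(t * r for t, r in zip(tx, rx))
    den = sum(t * t for t in tx)
    return num / den if den > 0 else 0.0


true_gain = 0.8  # hypothetical sidetone gain
a_max = 0.5      # hypothetical maximum amplitude of the reception line

# Transmission signal: a loud 200 Hz tone sampled at 8 kHz (synthetic test data).
tx = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
# Reception signal containing only the clipped sidetone (no far-end speech).
rx = [clip(true_gain * t, a_max) for t in tx]

naive = ls_gain(tx, rx)
print(f"true gain {true_gain}, naive whole-signal estimate {naive:.3f}")
```

With these values the whole-signal estimate lands at roughly 0.6 instead of 0.8, so a sidetone estimate built from it would leave a large residual in the unclipped intervals.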

An object of the present invention is to provide a speech recognition method and apparatus that recognize transmitted speech and received speech separately and that improve the recognition rate for the received speech.

A speech recognition apparatus according to the present invention extracts a transmission signal and a reception signal from a telephone and recognizes them separately. It comprises: a sidetone gain estimation unit that estimates the gain with which the transmission signal leaks into the reception signal; a reception signal maximum amplitude estimation unit that estimates the maximum amplitude of the reception signal; a sidetone signal estimation unit that estimates the sidetone signal corresponding to the transmission signal from the estimated sidetone gain and maximum amplitude; a sidetone suppression unit that uses the estimated sidetone signal to suppress the sidetone in the reception signal; and a speech recognition unit that separately recognizes the sidetone-suppressed received speech and the transmitted speech.

Because the speech recognition apparatus of the present invention estimates the sidetone gain continuously, it can suppress the transmission-signal component known as sidetone in the reception signal, separating the transmission and reception signals as far as possible for recognition. Even for the reception signal, recognition performed on a signal from which the sidetone has been removed yields higher accuracy than recognition performed on a signal that still contains it. Moreover, because the sidetone gain is estimated even when the sidetone saturates, the sidetone can still be suppressed in that case, which limits the loss of recognition accuracy for the reception signal.

The speech recognition method and apparatus of the present invention can be implemented in hardware, but the simplest and best mode is to install the speech recognition program of the present invention on a computer and have the computer execute the speech recognition method.
To have a computer execute the method, the computer is operated according to the following procedure.
The program causes the computer to execute a sidetone gain estimation step that estimates the gain with which the transmission signal leaks into the reception signal; a reception signal maximum amplitude estimation step that estimates the maximum amplitude of the reception signal; a sidetone signal estimation step that estimates the sidetone signal for the transmission signal from the estimated sidetone gain and maximum amplitude; a sidetone suppression step that uses the estimated sidetone signal to suppress the sidetone in the reception signal; and a speech recognition step that separately recognizes the sidetone-suppressed received speech and the transmitted speech, so that the computer functions as a speech recognition apparatus.

FIG. 1 shows an embodiment of the speech recognition apparatus according to the present invention. Parts corresponding to those in FIG. 3 are denoted by the same reference numerals. As in FIG. 3, the transmission signal and the reception signal are extracted separately by the branching device 20 and input to the speech recognition apparatus 110.
In the present invention, the speech recognition apparatus 110 additionally includes a reception signal maximum amplitude estimation unit 21 that estimates the maximum amplitude of the reception signal taken from the branching device 20, a sidetone gain estimation unit 22, and a sidetone suppression processing unit 23.

The reception signal maximum amplitude estimation unit 21 estimates the maximum amplitude level from a data sequence in which the reception signal has been buffered for some time (for example, about 1 second or more). Using this maximum amplitude level as a reference, the sidetone gain estimation unit 22 detects non-maximum-amplitude intervals, in which the reception signal stays below the maximum level, and estimates the sidetone gain from the relationship between the transmission and reception signals within those intervals.
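Steps S1 to S4 (units 21 and 22) might be sketched as follows. The specifics are assumptions, not taken from the source: the maximum level is the largest per-frame peak in a buffer of about 1 s (in the spirit of claims 3 and 6), a sample counts as non-saturated if it stays below a hypothetical `margin` fraction of that level, and the reception/transmission ratio of claims 2 and 5 is computed as a least-squares fit, sum(s·r)/sum(s²), over the non-saturated samples only. The function names are illustrative.

```python
import math


def estimate_max_amplitude(rx: list[float], frame_len: int) -> float:
    """Largest per-frame peak |rx| over the buffered reception signal (steps S1-S2)."""
    peaks = [max(abs(v) for v in rx[i:i + frame_len])
             for i in range(0, len(rx), frame_len)]
    return max(peaks, default=0.0)


def non_saturated_mask(rx: list[float], a_max: float, margin: float = 0.99) -> list[bool]:
    """True where the reception sample stays below the estimated maximum level."""
    return [abs(v) < margin * a_max for v in rx]


def estimate_sidetone_gain(tx: list[float], rx: list[float], mask: list[bool]) -> float:
    """Least-squares reception/transmission ratio over non-saturated samples (S3-S4)."""
    num = sum(t * r for t, r, ok in zip(tx, rx, mask) if ok)
    den = sum(t * t for t, ok in zip(tx, mask) if ok)
    return num / den if den > 0 else 0.0


# Synthetic 1 s buffer at 8 kHz: sidetone with hypothetical gain 0.8, clipped at 0.5.
tx = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
rx = [max(-0.5, min(0.5, 0.8 * t)) for t in tx]

a_max = estimate_max_amplitude(rx, frame_len=800)  # 100 ms frames
gain = estimate_sidetone_gain(tx, rx, non_saturated_mask(rx, a_max))
print(a_max, round(gain, 6))
```

On this clipped input the sketch recovers the maximum level 0.5 and the gain 0.8, because the clipped samples are excluded from the fit. With real far-end speech present, the ratio is only approximate, since far-end speech is not exactly uncorrelated with the transmission signal over a finite buffer.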

The sidetone suppression processing unit 23 estimates the sidetone signal corresponding to the transmission signal from the sidetone gain estimated by the sidetone gain estimation unit 22 and the maximum amplitude level, and suppresses that sidetone in the reception signal, yielding a reception signal free of sidetone.
The delay between the sidetone signal and the transmission signal is relatively small, so a useful degree of suppression is obtained even if the delay is ignored. Because the maximum amplitude level and the sidetone gain are updated frame by frame, their accuracy improves as the signal grows longer.
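The frame-by-frame updating and the zero-delay subtraction described above could look like the following online sketch. `FrameSidetoneSuppressor` and its `margin` parameter are hypothetical: the class keeps a running maximum of the reception level and running least-squares sums over non-saturated samples, then subtracts the sidetone estimate (gain times transmission, capped at the maximum level) from each frame. As the text allows, the delay between the transmission signal and the sidetone is ignored.

```python
import math


class FrameSidetoneSuppressor:
    """Hypothetical online sketch: update the maximum level and gain every frame,
    then subtract the clipped sidetone estimate, assuming zero delay."""

    def __init__(self, margin: float = 0.99) -> None:
        self.margin = margin
        self.a_max = 0.0
        self.num = 0.0  # running sum of tx * rx over non-saturated samples
        self.den = 0.0  # running sum of tx * tx over non-saturated samples

    @property
    def gain(self) -> float:
        return self.num / self.den if self.den > 0 else 0.0

    def process(self, tx_frame: list[float], rx_frame: list[float]) -> list[float]:
        # Update the maximum-amplitude estimate with this frame's peak.
        self.a_max = max(self.a_max, max((abs(v) for v in rx_frame), default=0.0))
        # Accumulate gain statistics from samples below the maximum level only.
        for t, r in zip(tx_frame, rx_frame):
            if abs(r) < self.margin * self.a_max:
                self.num += t * r
                self.den += t * t
        g, a = self.gain, self.a_max
        # Sidetone estimate: gain times transmission, capped at the maximum amplitude.
        sidetone = [max(-a, min(a, g * t)) for t in tx_frame]
        return [r - s for r, s in zip(rx_frame, sidetone)]


# Usage on synthetic data: hypothetical gain 0.8, clipping at 0.5, 100 ms frames at 8 kHz.
tx = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
rx = [max(-0.5, min(0.5, 0.8 * t)) for t in tx]
sup = FrameSidetoneSuppressor()
residual: list[float] = []
for i in range(0, len(tx), 800):
    residual.extend(sup.process(tx[i:i + 800], rx[i:i + 800]))
print(round(sup.gain, 6), sup.a_max, max(abs(v) for v in residual))
```

Because a sample is used for the gain only if it lies below the largest level seen so far, every sample that enters the fit is genuinely unclipped, even in early frames where the running maximum may still be below the true one.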

When received far-end speech overlaps an interval in which the sidetone is saturated, the suppression may introduce large distortion. The signal in a saturated interval is unreliable to begin with, however: restoring it to a clean signal is difficult, and good recognition performance cannot be expected there. Because the present invention identifies the intervals in which the sidetone from the transmission signal causes saturation, it can instead apply suppression that is unlikely to harm recognition, for example by filling those intervals with zeros.
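The zero-filling option can be sketched as follows. The sketch assumes (the source does not specify this) that a sample counts as saturated by sidetone wherever the predicted sidetone g·s[n] alone reaches a hypothetical `margin` fraction of the maximum level A. Those samples are set to zero; elsewhere the predicted sidetone is subtracted as usual. The function name is illustrative.

```python
def suppress_with_zero_fill(tx: list[float], rx: list[float], gain: float,
                            a_max: float, margin: float = 0.99) -> list[float]:
    """Subtract the predicted sidetone; zero-fill samples the sidetone itself saturates."""
    out = []
    for t, r in zip(tx, rx):
        predicted = gain * t
        if abs(predicted) >= margin * a_max:
            # Saturated by sidetone: the far-end content here is unreliable, so output silence.
            out.append(0.0)
        else:
            out.append(r - predicted)
    return out


# Sidetone 0.8 * tx plus far-end speech (0.05 and -0.02 where it survives), clipped at 0.5.
tx = [0.1, 1.0, -1.0, 0.2]
rx = [0.13, 0.5, -0.5, 0.14]
cleaned = suppress_with_zero_fill(tx, rx, gain=0.8, a_max=0.5)
print([round(v, 6) for v in cleaned])
```

Zero-filling trades the far-end speech lost in the saturated samples for the absence of large subtraction artifacts, which tend to be more harmful to a recognizer than short gaps.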

FIG. 2 is a flowchart outlining the speech recognition program according to the present invention.
The reception signal maximum amplitude estimation unit 21 estimates the maximum amplitude level from the reception signal (steps S1 and S2).
The sidetone gain estimation unit 22 calculates the sidetone gain (reception/transmission signal ratio) from the transmission and reception signals in the intervals where the reception signal is below its maximum amplitude level (steps S3 and S4).
The sidetone suppression processing unit 23 multiplies the transmission signal by the estimated sidetone gain and limits the result to the maximum amplitude level of the reception signal, giving the estimated sidetone signal (step S5).
The sidetone is suppressed by subtracting the estimated sidetone signal from the reception signal (step S6). Finally, recognition is performed on the suppressed signal (step S7).
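Putting steps S1 to S7 of FIG. 2 together, a self-contained batch sketch might read as follows. The recognizer is a caller-supplied function `recognize`, a hypothetical stand-in for whatever speech recognition engine is used; the least-squares ratio, the `margin` parameter and the frame length are likewise assumptions made for illustration.

```python
import math
from typing import Callable, Sequence


def sidetone_pipeline(tx: Sequence[float], rx: Sequence[float], frame_len: int,
                      recognize: Callable[[list[float]], str],
                      margin: float = 0.99) -> tuple[str, str]:
    """Hypothetical batch version of steps S1-S7; returns (received, transmitted) results."""
    # S1-S2: maximum amplitude = largest per-frame peak of the buffered reception signal.
    a_max = max((max(abs(v) for v in rx[i:i + frame_len])
                 for i in range(0, len(rx), frame_len)), default=0.0)
    # S3-S4: least-squares reception/transmission ratio over non-saturated samples.
    pairs = [(t, r) for t, r in zip(tx, rx) if abs(r) < margin * a_max]
    den = sum(t * t for t, _ in pairs)
    gain = sum(t * r for t, r in pairs) / den if den > 0 else 0.0
    # S5: estimated sidetone = gain * transmission, limited to the maximum amplitude.
    sidetone = [max(-a_max, min(a_max, gain * t)) for t in tx]
    # S6: subtract the estimated sidetone from the reception signal.
    received = [r - s for r, s in zip(rx, sidetone)]
    # S7: recognize the received and the transmitted speech separately.
    return recognize(received), recognize(list(tx))


def toy_recognizer(signal: list[float]) -> str:
    """Stand-in recognizer: only reports whether any energy is left in the signal."""
    return "speech" if max(abs(v) for v in signal) > 1e-6 else "silence"


tx = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
rx = [max(-0.5, min(0.5, 0.8 * t)) for t in tx]  # sidetone only, clipped at 0.5
result = sidetone_pipeline(tx, rx, frame_len=800, recognize=toy_recognizer)
print(result)
```

Here the reception line carries nothing but clipped sidetone, so after suppression the toy recognizer reports silence on the received channel and speech on the transmitted channel.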
As described above, the present invention suppresses the sidetone in the reception signal, so the transmission and reception signals can be separated as far as possible for recognition. Recognition performed on a reception signal from which the sidetone has been removed therefore yields higher accuracy than recognition on one that still contains it. And because the sidetone gain is estimated even when the sidetone saturates, suppression remains possible in that case as well.

The speech recognition method described above, and a speech recognition apparatus operating according to its procedure, can be built in hardware, but the simplest and best mode is to install the speech recognition program of the present invention on a computer so that the computer functions as the speech recognition apparatus.
The speech recognition program is written in a computer-interpretable programming language, recorded on a computer-readable recording medium such as a magnetic disk or CD-ROM, and installed on a computer from such a medium or over a communication line. When the CPU of the computer interprets the installed program, the computer performs speech recognition according to the procedure shown in FIG. 2.

The speech recognition method and apparatus of the present invention can be used, for example, in automatic guidance systems and automatic reservation systems.

FIG. 1 is a block diagram illustrating an embodiment of the speech recognition apparatus according to the present invention.
FIG. 2 is a flowchart illustrating the procedure of the speech recognition method according to the present invention.
FIG. 3 is a block diagram illustrating the prior art.

Explanation of symbols

11 Microphone unit
12 Speaker unit
13 Transmitter
14 Receiver
15 Sidetone circuit
16 Transmission signal recording unit
17 Reception signal recording unit
18 Handset
19 Telephone set
20 Branching device
21 Reception signal maximum amplitude estimation unit
22 Sidetone gain estimation unit
23 Sidetone suppression processing unit
110 Speech recognition apparatus
111 Speech recognition processing unit

Claims

1. A speech recognition method for extracting a transmission signal and a reception signal from a telephone and recognizing the transmission signal and the reception signal separately, comprising:
a sidetone gain estimation step of estimating the gain with which the transmission signal leaks into the reception signal;
a reception signal maximum amplitude estimation step of estimating the maximum amplitude of the reception signal;
a sidetone signal estimation step of estimating a sidetone signal for the transmission signal using the estimated sidetone gain and the estimated maximum amplitude of the reception signal;
a sidetone suppression step of suppressing the sidetone signal in the reception signal based on the estimated sidetone signal; and
a speech recognition step of separately recognizing the received speech after sidetone suppression and the transmitted speech.

2. The speech recognition method according to claim 1, wherein, in the sidetone gain estimation step, the sidetone gain is calculated from the ratio of the reception signal to the transmission signal.

3. The speech recognition method according to claim 1, wherein, in the reception signal maximum amplitude estimation step, the maximum amplitude of the reception signal is estimated from the maximum amplitude within the reception signal segmented into unit-time intervals.

4. A speech recognition apparatus for extracting a transmission signal and a reception signal from a telephone and recognizing the transmission signal and the reception signal separately, comprising:
a sidetone gain estimation unit that estimates the gain with which the transmission signal leaks into the reception signal;
a reception signal maximum amplitude estimation unit that estimates the maximum amplitude of the reception signal;
a sidetone signal estimation unit that estimates a sidetone signal for the transmission signal using the estimated sidetone gain and the estimated maximum amplitude of the reception signal;
a sidetone suppression processing unit that suppresses the sidetone signal in the reception signal based on the estimated sidetone signal; and
a speech recognition processing unit that separately recognizes the received speech after sidetone suppression and the transmitted speech.

5. The speech recognition apparatus according to claim 4, wherein the sidetone gain estimation unit comprises calculation means for calculating the sidetone gain from the ratio of the reception signal to the transmission signal.

6. The speech recognition apparatus according to claim 4, wherein the reception signal maximum amplitude estimation unit comprises recording means for recording the reception signal segmented into unit-time intervals, and estimates the maximum amplitude of the reception signal from the maximum amplitude contained in the reception signal recorded in the recording means.

7. A speech recognition program written in a computer-interpretable programming language, for causing a computer to function as the speech recognition apparatus according to claim 4.

8. A computer-readable recording medium on which the speech recognition program according to claim 7 is recorded.