JP4709714B2

JP4709714B2 - Echo canceling apparatus, method thereof, program thereof, and recording medium thereof

Info

Publication number: JP4709714B2
Application number: JP2006232519A
Authority: JP
Inventors: 和則小林; 賢一古家; 陽一羽田; 章俊片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-08-29
Filing date: 2006-08-29
Publication date: 2011-06-22
Anticipated expiration: 2026-08-29
Also published as: JP2008060715A

Description

The present invention relates to an echo canceling apparatus for hands-free communication such as video conferencing and audio conferencing, and to a method, a program, and a recording medium therefor.

FIG. 1 shows an example of the functional configuration of a conventional echo canceling apparatus, and FIG. 2 shows a block diagram in which the speaker is modeled as a parallel connection of a linear characteristic and a nonlinear characteristic.
In the following description, the reproducing means is a speaker and the sound collecting means is a microphone. The voice signal arriving from a far-end speaker (not shown) via the communication network 2 is called the received signal x(t). The voice uttered by the near-end speaker 7 is called the speaker voice s(t), and the speaker voice s(t) as picked up by the microphone is called the speaker received signal a(t). The reproduced sound emitted from the speaker that leaks into the microphone and is picked up there is called the echo signal d(t). The total signal picked up by the microphone is called the received sound signal y(t), and the transmission signal (or error signal) sent to the far-end speaker is called e(t). Here t denotes discrete time.
The conventional echo canceling apparatus 14 comprises a variable filter 8, a subtracting unit 10, and a filter coefficient updating unit 12. In a hands-free call using the speaker 4 and the microphone 6, it cancels the echo signal d(t) mixed into the received sound signal y(t). The input signal of the echo canceling apparatus 14 (the signal picked up by the microphone 6), that is, the received sound signal y(t), consists of the echo signal d(t) and the speaker received signal a(t). The apparatus estimates the echo signal d(t) contained in y(t) and subtracts the estimate from y(t), thereby canceling the echo.

First, the voice signal from the far-end speaker (not shown) is input to the speaker 4 via the communication network 2 as the received signal x(t). If the speaker characteristic is assumed to be linear with impulse response g(t), and the impulse response from the speaker 4 to the microphone 6 is r_1(t), the echo signal d(t) that travels from the speaker 4 to the microphone 6 and is picked up there can be expressed by the following equation (1).
d(t) = g(t) * r_1(t) * x(t)    (1)
Here * denotes convolution. Next, let c_1(t) be the impulse response from the near-end speaker 7 to the microphone 6. Since the voice of the near-end speaker 7 is s(t), as described above, the received sound signal y(t) can be expressed by the following equation (2).
y(t) = g(t) * r_1(t) * x(t) + c_1(t) * s(t)    (2)
What is required of the echo canceling apparatus 14 is to cancel the echo signal d(t) contained in the received sound signal y(t), that is, to remove the first term g(t) * r_1(t) * x(t) on the right side of equation (2).
The variable filter 8 convolves the estimated impulse response (hereinafter, the pseudo impulse response h(t)) with the received signal x(t) and outputs the pseudo echo signal h(t) * x(t). The subtracting unit 10 subtracts the pseudo echo signal h(t) * x(t) from the received sound signal y(t) and outputs the transmission signal e(t), in which the echo signal d(t) has been canceled. The transmission signal e(t) is given by the following equation (3).
e(t) = {g(t) * r_1(t) - h(t)} * x(t) + c_1(t) * s(t)    (3)

The filter coefficients of the variable filter (the pseudo impulse response h(t)) are updated by the filter coefficient updating unit 12 using the received signal x(t), the transmission signal e(t), and so on. This update uses, for example, the learning identification (NLMS: Normalized Least-Mean-Squares) algorithm, the projection algorithm, the recursive least squares (RLS) algorithm, or the LMS (Least Mean Squares) algorithm. With the NLMS algorithm, for example, the update is given by the following equation (4).
H(t+1) = H(t) + a · X(t) · e(t) / {X(t)^T X(t)}    (4)
Here a is a preset step size satisfying 0 < a < 2, and A^T denotes the transpose of a vector A. H(t) = (h(0), h(1), ..., h(L-1))^T is the filter coefficient vector at time t, L is the tap length of the variable filter 8, and X(t) = (x(t-0), x(t-1), ..., x(t-L+1))^T is the vector of the L most recent samples of the received signal x(t) at time t.
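The conventional canceller of equations (3) and (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the echo path values, the signal length, and the small regularizer `eps` (added only to avoid division by zero) are all hypothetical choices.

```python
import random

def fir(h, s):
    """Causal FIR filtering of signal s with impulse response h."""
    return [sum(h[k] * s[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(s))]

def nlms_echo_canceller(x, y, taps, step=0.5, eps=1e-8):
    """Return (e, h): transmission signal and final filter coefficients.

    x is the received (far-end) signal, y the microphone signal.
    eps is an assumption of this sketch and is not part of equation (4).
    """
    h = [0.0] * taps        # H(t): pseudo impulse response
    buf = [0.0] * taps      # X(t) = (x(t), x(t-1), ..., x(t-L+1))
    e = []
    for xt, yt in zip(x, y):
        buf = [xt] + buf[:-1]
        pseudo_echo = sum(hk * xk for hk, xk in zip(h, buf))
        et = yt - pseudo_echo                         # equation (3)
        power = sum(xk * xk for xk in buf) + eps      # X(t)^T X(t)
        h = [hk + step * xk * et / power for hk, xk in zip(h, buf)]  # equation (4)
        e.append(et)
    return e, h

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, -0.3, 0.1]   # hypothetical g(t) * r_1(t)
y = fir(echo_path, x)          # no near-end speech, so y(t) = d(t)
e, h = nlms_echo_canceller(x, y, taps=3)
```

With a purely linear echo path and no near-end speech, the coefficients in `h` approach `echo_path` and the residual in `e` decays toward zero.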

As described above, the filter coefficient updating unit 12 updates the filter coefficients of the variable filter 8 from the received signal x(t) and the transmission signal e(t), using an update formula such as equation (4). The variable filter 8 filters the received signal x(t) sample by sample with the updated coefficients. This processing cancels the echo signal d(t) contained in the received sound signal y(t), and the resulting transmission signal e(t) is output to the far-end speaker (not shown) through the communication network 2. Details of the conventional echo canceling apparatus are described in Patent Document 1.
Patent Document 1: Japanese Patent No. 2602750

The conventional echo canceling apparatus 14 sometimes cannot cancel the echo signal d(t) sufficiently. The reason is considered to be as follows.
In general, a speaker has a nonlinear characteristic such that its output saturates for a large-amplitude input signal (received signal x(t)), so the speaker characteristic can be divided into a linear part and a nonlinear part. As shown in FIG. 2, it can be represented as the parallel connection of a linear part with impulse response g(t) and a nonlinear part described by a function f(·). Taking this speaker characteristic into account, the echo signal d(t) can be expressed by the following equation (5).
d(t) = g(t) * r_1(t) * x(t) + r_1(t) * f(x(t))    (5)
The received sound signal y(t) can be expressed by the following equation (6).
y(t) = d(t) + c_1(t) * s(t)
     = g(t) * r_1(t) * x(t) + r_1(t) * f(x(t)) + c_1(t) * s(t)    (6)
Accordingly, the transmission signal e(t) is given by the following equation (7).
e(t) = {g(t) * r_1(t) - h(t)} * x(t) + r_1(t) * f(x(t)) + c_1(t) * s(t)    (7)
However, the conventional echo canceling apparatus 14 can cancel only the echo that reaches the microphone 6 through the linear echo path; it cannot cancel the nonlinear echo. That is, only the first term on the right side of equation (6), g(t) * r_1(t) * x(t), is canceled, while the second term, r_1(t) * f(x(t)), remains, so a nonlinear echo signal is mixed into the transmission signal e(t). Consequently, when a strongly nonlinear speaker or the like is used, the echo signal cannot be canceled sufficiently.
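The residual described by equation (7) can be illustrated numerically. The sketch below assumes a hypothetical nonlinearity (the part of the signal removed by hard clipping at ±0.5) and hypothetical impulse responses; the patent does not specify f(·), g(t), or r_1(t). Even with an ideal linear filter h(t) = g(t) * r_1(t), the term r_1(t) * f(x(t)) remains whenever the input is loud enough to saturate.

```python
import math

def fir(h, s):
    """Causal FIR filtering of signal s with impulse response h."""
    return [sum(h[k] * s[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(s))]

def f(v, limit=0.5):
    """Hypothetical nonlinear part: what hard clipping at +/-limit changes."""
    return max(-limit, min(limit, v)) - v

g = [1.0]             # hypothetical linear speaker response g(t)
r1 = [0.0, 0.8, 0.2]  # hypothetical speaker-to-microphone response r_1(t)

def residual_echo(x):
    """e(t) of equation (7) with s(t) = 0 and an ideal h(t) = g(t) * r_1(t)."""
    linear = fir(r1, fir(g, x))
    nonlinear = fir(r1, [f(v) for v in x])
    d = [a + b for a, b in zip(linear, nonlinear)]   # equation (5)
    pseudo_echo = linear                             # ideal linear canceller
    return [dt - pt for dt, pt in zip(d, pseudo_echo)]

tone = [math.sin(2 * math.pi * n / 50) for n in range(200)]
quiet = residual_echo([0.4 * v for v in tone])   # below the clipping level
loud = residual_echo([0.9 * v for v in tone])    # drives the model into saturation
```

In this model the quiet input leaves no residual, while the loud input leaves a residual that no choice of h(t) can remove.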

According to the present invention, an echo canceling apparatus converts a received signal into reproduced sound by a reproducing means and emits it, removes the echo signal from a received sound signal consisting of the speaker received signal from a speaker sound collecting means and the signal produced by the reproduced sound leaking back into the sound collecting means (hereinafter, the echo signal), and outputs the result as a transmission signal. The apparatus generates a first pseudo echo signal from the received signal with a first variable filter; generates a second pseudo echo signal from the received sound signals of a plurality of sound collecting means with a second variable filter; subtracts the first pseudo echo signal and the second pseudo echo signal from the received sound signal of the speaker sound collecting means to output the transmission signal; updates the filter coefficients of the first variable filter from at least the received signal and the transmission signal; updates the filter coefficients of the second variable filter from at least the received sound signals of the plurality of sound collecting means and the transmission signal; and detects the level of the received signal and, when the detected level is below a predetermined threshold, does not operate the second variable filter and sets its output to 0.
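The level-dependent gating of the second variable filter might look like the following sketch. The level measure (mean power over a short frame), the frame contents, and the threshold value are assumptions made for illustration; the text above only states that a level is detected and compared with a predetermined threshold.

```python
def mean_power(frame):
    """Hypothetical level measure: mean squared amplitude of a frame."""
    return sum(v * v for v in frame) / len(frame)

def second_pseudo_echo(x_frame, y_buf, h_m, threshold):
    """Output sample of one second variable filter, gated by the level of x(t).

    x_frame: recent samples of the received signal used for level detection.
    y_buf:   most recent samples of the sub microphone signal, newest first.
    h_m:     coefficients of the second variable filter.
    """
    if mean_power(x_frame) < threshold:
        return 0.0                      # filter not operated, output forced to 0
    return sum(hk * yk for hk, yk in zip(h_m, y_buf))

quiet_out = second_pseudo_echo([0.01, -0.02, 0.01], [0.3, 0.1], [0.5, 0.2], threshold=0.01)
loud_out = second_pseudo_echo([0.8, -0.9, 0.7], [0.3, 0.1], [0.5, 0.2], threshold=0.01)
```

One plausible reading is that the second filter is only needed when the received signal is loud enough to drive the speaker into its nonlinear region, so bypassing it at low levels avoids filtering the sub microphone signals unnecessarily.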

With the above configuration, echo cancellation performance can be improved even when a strongly nonlinear speaker or the like is used, which improves the quality of the transmission signal.

The best mode for carrying out the present invention is described below.

FIG. 3 shows an example of the functional configuration of the present invention, and FIG. 4 shows its main processing flow. Components having the same function as in FIG. 1 are given the same reference numerals, and duplicate description is omitted; the same applies below. The main microphone 61 is the speaker sound collecting means, which picks up the voice of the speaker, while "sound collecting means" refers to all microphones including the main microphone.
The echo canceling apparatus 40 comprises a first variable filter 24, second variable filters 26m (m = 2, ..., M), an adding unit 36, a subtracting unit 38, a first filter coefficient updating unit 34, and a second filter coefficient updating unit 32. There are M microphones (M ≥ 2), denoted 6m (m = 1, ..., M).
The input signals of the echo canceling apparatus 40 are the received sound signals y_m(t) picked up by the microphones 6m, each consisting of the echo signal produced by the reproduced sound from the speaker 4 and the voice signal from the near-end speaker 7. The output signal is the transmission signal e(t) to the far-end speaker (not shown). The echo canceling apparatus 40 cancels the echo signal d(t) to make conversation easier. Each input signal is converted from an analog signal to a discrete-time (digital) signal by an AD converter (not shown), and each output signal is converted from a discrete-time (digital) signal to an analog signal by a DA converter (not shown).

The received signal x(t) is reproduced as sound by the speaker 4. The signal flow inside the speaker 4 is as shown in FIG. 2 and described above. Because the speaker output saturates for large-amplitude input, the speaker characteristic is again divided into a linear part with impulse response g(t) and a nonlinear part described by a function f(·). Let r_m(t) be the impulse response from the speaker to each microphone 6m (m = 1, ..., M), and c_m(t) the impulse response from the near-end speaker 7 to each microphone 6m. The received sound signal y_m(t) at each microphone 6m can then be expressed by the following equation (8).
y_m(t) = g(t) * r_m(t) * x(t) + r_m(t) * f(x(t)) + c_m(t) * s(t)    (8)
In Embodiment 1, a speaker received signal selecting means (not shown) selects, from the plurality of microphones 6m, the microphone whose time-averaged ratio of echo signal level to speaker received signal level is smaller than that of the other microphones, and uses it as the main microphone 61 (step S2).
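The selection rule of step S2 can be sketched as below. The sketch assumes that per-frame echo levels and speaker received signal levels are available separately for each microphone (for example from far-end-only and near-end-only periods); how those levels are measured is not stated in the text, and the function name and data layout are hypothetical.

```python
def select_microphones(echo_levels, speech_levels):
    """Pick the main microphone and rank the rest by echo-to-speech ratio.

    echo_levels[m][k] and speech_levels[m][k] are the echo level and the
    speaker received signal level of microphone m in frame k (assumed > 0).
    Returns (main_index, other_indices), where other_indices is sorted from
    the largest to the smallest time-averaged ratio.
    """
    ratios = []
    for echo, speech in zip(echo_levels, speech_levels):
        per_frame = [e / s for e, s in zip(echo, speech)]
        ratios.append(sum(per_frame) / len(per_frame))   # time average
    main = min(range(len(ratios)), key=lambda m: ratios[m])
    others = sorted((m for m in range(len(ratios)) if m != main),
                    key=lambda m: ratios[m], reverse=True)
    return main, others

echo_levels = [[0.1, 0.2], [0.8, 0.9], [0.5, 0.4]]
speech_levels = [[1.0, 1.0], [0.2, 0.3], [0.5, 0.5]]
main, subs = select_microphones(echo_levels, speech_levels)
```

The microphone with the smallest ratio becomes the main microphone; the others, ordered by decreasing ratio, are the candidates for picking up mostly echo.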

An echo received signal selecting means (not shown) selects, from the plurality of microphones 6m, one or more microphones whose time-averaged ratio of echo signal level to speaker received signal level is large. The main microphone 61 is excluded from this selection (step S2).
As one example of such a selection, as shown in FIG. 3, one microphone is used as the main microphone 61 with its high-sensitivity direction pointed toward the near-end speaker 7, and all other microphones are used as sub microphones 6m (m = 2, ..., M) with their high-sensitivity directions pointed toward the speaker 4. The sensitivity directions of the main microphone 61 and the sub microphones 6m are described in detail later. In this case the near-end speaker 7 speaks toward the main microphone 61.
The received sound signals y_m(t) (m = 1, ..., M) picked up by the main microphone 61 and the sub microphones 6m can be expressed as in equation (8). The received sound signal y_m(t) of each sub microphone 6m is input to the corresponding second variable filter 26m (m = 2, ..., M). Each second variable filter 26m convolves y_m(t) with the second filter coefficients h_m(t), that is, computes the following equation (9), to generate the second pseudo echo signal β_m(t) (m = 2, ..., M) (step S4).

β_m(t) = y_m(t) * h_m(t)    (9)
Each second pseudo echo signal β_m(t) is input to the adding unit 36.

Meanwhile, the received signal x(t) is input to the first variable filter 24. The first variable filter 24 convolves x(t) with the first filter coefficients h_1(t), that is, computes the following equation (10), to generate the first pseudo echo signal α(t) (step S4).
α(t) = x(t) * h_1(t)    (10)
The first pseudo echo signal α(t) and the second pseudo echo signals β_m(t) are input to the adding unit 36, which computes α(t) + Σ_{i=2}^{M} β_i(t). This sum is input to the subtracting unit 38.

The received sound signal y_1(t) picked up by the main microphone 61 is also input to the subtracting unit 38. The subtracting unit 38 subtracts the adder output α(t) + Σ_{m=2}^{M} β_m(t) from y_1(t), that is, computes the following equation (11), and outputs the transmission signal e(t) (step S6).
e(t) = y_1(t) - α(t) - Σ_{i=2}^{M} β_i(t)
     = y_1(t) - h_1(t) * x(t) - Σ_{i=2}^{M} y_i(t) * h_i(t)    (11)
From equation (8),
y_1(t) = g(t) * r_1(t) * x(t) + r_1(t) * f(x(t)) + c_1(t) * s(t)    (12)
y_i(t) = g(t) * r_i(t) * x(t) + r_i(t) * f(x(t)) + c_i(t) * s(t)    (13)
so substituting equations (12) and (13) into equation (11) gives the following equation (14).

e(t) = g(t) * r_1(t) * x(t) + r_1(t) * f(x(t)) + c_1(t) * s(t)
       - h_1(t) * x(t) - Σ_{i=2}^{M} h_i(t) * {g(t) * r_i(t) * x(t) + r_i(t) * f(x(t)) + c_i(t) * s(t)}
     = {g(t) * r_1(t) - h_1(t) - Σ_{i=2}^{M} h_i(t) * g(t) * r_i(t)} * x(t)
       + {r_1(t) - Σ_{i=2}^{M} h_i(t) * r_i(t)} * f(x(t))
       + {c_1(t) - Σ_{i=2}^{M} h_i(t) * c_i(t)} * s(t)    (14)

The first term of equation (14), {g(t) * r_1(t) - h_1(t) - Σ_{i=2}^{M} h_i(t) * g(t) * r_i(t)} * x(t), is the linear acoustic echo component. If h_1(t) and h_m(t) are set so that g(t) * r_1(t) - h_1(t) - Σ_{i=2}^{M} h_i(t) * g(t) * r_i(t) = 0, the linear acoustic echo is canceled.
The second term of equation (14), {r_1(t) - Σ_{i=2}^{M} h_i(t) * r_i(t)} * f(x(t)), is the nonlinear acoustic echo component. If h_m(t) is set so that r_1(t) - Σ_{i=2}^{M} h_i(t) * r_i(t) = 0, the nonlinear acoustic echo is canceled.
The speaker received signal selecting means and the echo received signal selecting means need not necessarily be provided. In that case, the received sound signal from the main microphone 61, which picks up the speaker voice, is input to the subtracting unit 38 without passing through a speaker received signal selecting means, and the received sound signals from the sub microphones 6m are input to the second variable filters 26m without passing through an echo received signal selecting means.
The third term of equation (14), {c_1(t) - Σ_{i=2}^{M} h_i(t) * c_i(t)} * s(t), is the voice component of the near-end speaker 7. In addition to the voice component c_1(t) * s(t) picked up by the main microphone 61, it contains the degradation component Σ_{i=2}^{M} h_i(t) * c_i(t) * s(t).
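The signal flow of equations (9) to (11) and the cancellation condition on the second term of equation (14) can be checked with a small simulation. Everything numeric below is a hypothetical, idealized setup for M = 2: the sub microphone is assumed to observe the speaker output directly (r_2 is a unit impulse), there is no near-end speech, and f(·) is modeled as the change caused by hard clipping. Under these assumptions h_2 = r_1 satisfies r_1 - h_2 * r_2 = 0, and the linear condition then gives h_1 = 0.

```python
import math

def fir(h, s):
    """Causal FIR filtering of signal s with impulse response h."""
    return [sum(h[k] * s[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(s))]

def transmission_signal(x, y, h1, h_sub):
    """Equation (11): y[0] is the main microphone, y[1:] the sub microphones."""
    alpha = fir(h1, x)                                    # equation (10)
    betas = [fir(h, ym) for h, ym in zip(h_sub, y[1:])]   # equation (9)
    return [y[0][n] - alpha[n] - sum(b[n] for b in betas)
            for n in range(len(x))]

def f(v, limit=0.5):
    """Hypothetical nonlinear part of the speaker."""
    return max(-limit, min(limit, v)) - v

g = [1.0, 0.3]        # hypothetical linear speaker response
r1 = [0.0, 0.5, 0.2]  # speaker to main microphone (hypothetical)
r2 = [1.0]            # speaker to sub microphone (idealized)

x = [0.9 * math.sin(2 * math.pi * n / 50) for n in range(200)]
speaker_out = [a + f(v) for a, v in zip(fir(g, x), x)]   # linear + nonlinear part
y1 = fir(r1, speaker_out)   # equation (8) with s(t) = 0
y2 = fir(r2, speaker_out)

e = transmission_signal(x, [y1, y2], h1=[0.0], h_sub=[r1])
```

Here `y1` contains both linear and nonlinear echo, yet `e` is zero, because the sub microphone signal already carries the distorted speaker output.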

If the degradation component Σ_{i=2}^{M} h_i(t) * c_i(t) * s(t) of the near-end speaker's voice, contained in the third term of equation (14), becomes large, the quality of the transmitted voice deteriorates. The echo received signal selecting means described above is used to prevent this. An example follows. For simplicity, the case of one main microphone 61 and one sub microphone 62 is described.
To prevent degradation of the near-end speaker's voice due to {c_1(t) - Σ_{i=2}^{M} h_i(t) * c_i(t)} * s(t), h_2(t) and c_2(t) must be made small. The speaker received signal selecting means and the echo received signal selecting means can be provided for this purpose; for example, the main microphone 61 and the sub microphone 62 can be arranged as follows. As shown in FIG. 5, unidirectional microphones are used as the main microphone 61 and the sub microphone 62. The high-sensitivity side 61a of the main microphone 61 is pointed toward the near-end speaker 7, and its low-sensitivity side 61b toward the speaker 4. The high-sensitivity side 62a of the sub microphone 62 is pointed toward the speaker 4, and its low-sensitivity side 62b toward the near-end speaker 7. With this arrangement, the amplitude of c_2(t) becomes smaller than that of c_1(t). Furthermore, because the amplitude of r_1(t) becomes smaller than that of r_2(t), the amplitude of the echo canceling filter h_2(t) also becomes small, since h_2(t) is set so that r_1(t) - h_2(t) * r_2(t) = 0 in the second term of equation (14). This arrangement therefore reduces the degradation component h_2(t) * c_2(t) * s(t) of the near-end speaker's voice.
In Embodiment 1, as shown in FIG. 6, pointing the high-sensitivity side 61a of the main microphone 61 toward the near-end speaker 7 and the high-sensitivity sides 6ma of the microphones 6m toward the speaker 4 likewise reduces the degradation component of the near-end speaker's voice.

Next, the method of obtaining the variable filter coefficients h_1(t) and h_m(t) (m = 2, ..., M) for suppressing the echo signal d(t) is described. The filter coefficients h_1(t) are updated by the first filter coefficient updating unit 34, and h_m(t) by the second filter coefficient updating unit 32 (step S8). They are obtained by successively updating h_1(t) and h_m(t) (m = 2, ..., M) so that the mean square of the echo component contained in the transmission signal e(t) becomes small. The initial values of the filter coefficients are arbitrary values given in advance.
Representative algorithms for this successive update include the learning identification (NLMS: Normalized Least-Mean-Squares) algorithm, the projection algorithm, the recursive least squares (RLS) algorithm, and the LMS (Least Mean Squares) algorithm. Each is briefly described below.

NLMS algorithm
The NLMS algorithm updates the filter coefficients using only the most recently observed sample of the transmission signal e(t), and is characterized by a small amount of computation. The update by the first filter coefficient updating unit 34 is given by the following equation (15), and the update by the second filter coefficient updating unit 32 by the following equation (16).
H_1(t+1) = H_1(t) + a_1 · X(t) · e(t) / {X(t)^T X(t) + Σ_{i=2}^{M} Y_i(t)^T Y_i(t)}    (15)
H_m(t+1) = H_m(t) + a_m · Y_m(t) · e(t) / {X(t)^T X(t) + Σ_{i=2}^{M} Y_i(t)^T Y_i(t)}    (16)
Here H_1(t) and H_m(t) (m = 2, ..., M) are the filter coefficient vectors at time t, H_m(t) = (h_m(0), h_m(1), ..., h_m(L-1))^T (m = 1, ..., M), and L is the number of taps. a_1 and a_m are preset NLMS step sizes satisfying 0 < a_1, a_m < 2. Y_m(t) is the vector of the L most recent samples of the received sound signal y_m(t) at time t, Y_m(t) = (y_m(t-0), y_m(t-1), ..., y_m(t-L+1))^T.
Comparing the denominator on the right side of equation (4) with those of equations (15) and (16), the latter contain the additional term Σ_{i=2}^{M} Y_i(t)^T Y_i(t). This term is the sum of the powers of the received sound signals y_m(t) picked up by the microphones 6m (m = 2, ..., M). Adding this sum to the denominator prevents the filter coefficients from diverging.
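One update step of equations (15) and (16) might be written as follows. This is a sketch, not the patent's code: the function name, the argument layout, and the optional regularizer `eps` are assumptions, and all buffers are taken to hold the L most recent samples, newest first.

```python
def nlms_step(h1, h_sub, x_buf, y_bufs, e_t, a1, a_sub, eps=1e-8):
    """Apply equations (15) and (16) once.

    h1:     coefficients of the first variable filter, H_1(t).
    h_sub:  list of coefficient lists H_m(t) for m = 2..M.
    x_buf:  X(t), recent samples of the received signal.
    y_bufs: list of Y_m(t), recent samples of each sub microphone signal.
    e_t:    current sample of the transmission signal e(t).
    eps:    small constant added by this sketch to avoid division by zero.
    """
    # Shared denominator: X^T X plus the summed power of the sub microphone signals.
    denom = sum(v * v for v in x_buf)
    denom += sum(sum(v * v for v in yb) for yb in y_bufs)
    denom += eps
    new_h1 = [h + a1 * xk * e_t / denom for h, xk in zip(h1, x_buf)]      # (15)
    new_h_sub = [[h + a * yk * e_t / denom for h, yk in zip(hm, yb)]      # (16)
                 for hm, yb, a in zip(h_sub, y_bufs, a_sub)]
    return new_h1, new_h_sub

h1, h_sub = nlms_step(h1=[0.0, 0.0], h_sub=[[0.0, 0.0]],
                      x_buf=[1.0, 0.0], y_bufs=[[0.0, 1.0]],
                      e_t=2.0, a1=1.0, a_sub=[0.5], eps=0.0)
```

Because both filters share one denominator, a loud sub microphone signal shrinks the step taken by the first filter as well, which matches the stated purpose of preventing divergence.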

LMS algorithm
Like the NLMS algorithm, the LMS algorithm updates the filter coefficients using only the most recently observed sample of the transmission signal e(t), and is characterized by a small amount of computation. Its update formulas are the following equations (17) and (18).
H_1(t+1) = H_1(t) + b · X(t) · e(t)    (17)
H_m(t+1) = H_m(t) + b_m · Y_m(t) · e(t)    (18)
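For comparison with the NLMS step, equations (17) and (18) reduce to an unnormalized update. In this sketch `b` and `b_sub` stand for the step sizes b and b_m, which the text does not define further; treating them as small positive constants is an assumption.

```python
def lms_step(h1, h_sub, x_buf, y_bufs, e_t, b, b_sub):
    """Apply equations (17) and (18) once (no power normalization)."""
    new_h1 = [h + b * xk * e_t for h, xk in zip(h1, x_buf)]          # (17)
    new_h_sub = [[h + bm * yk * e_t for h, yk in zip(hm, yb)]        # (18)
                 for hm, yb, bm in zip(h_sub, y_bufs, b_sub)]
    return new_h1, new_h_sub

h1, h_sub = lms_step(h1=[0.0, 0.0], h_sub=[[0.0, 0.0]],
                     x_buf=[1.0, 2.0], y_bufs=[[2.0, 0.0]],
                     e_t=0.5, b=0.1, b_sub=[0.2])
```

Without the normalizing denominator, the effective step scales with the signal power, so the step sizes generally have to be chosen with the expected signal level in mind.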

射影アルゴリズム
射影アルゴリズムは、過去ｕサンプル分の送話信号ｅ（ｔ）を用いて、フィルタ係数を更新するアルゴリズムである。射影アルゴリズムは隣り合う入力信号ベクトルの間の相関を取り除くことを基本的な考え方とするアルゴリズムである。射影アルゴリズムは上述したＮＬＭＳアルゴリズムに比べ、演算量が多くなるが送話信号ｅ（ｔ）の収束速度が速いという特徴がある。第１のフィルタ係数ｈ_１（ｔ）および第２のフィルタ係数ｈ_ｍ（ｔ）（ｍ＝２、．．．、Ｍ）は以下の式（１９）で更新される。なお、以下の式（１９）は２次のフィルタ係数の更新式である。

Projection algorithm The projection algorithm is an algorithm for updating the filter coefficient using the transmission signal e (t) for the past u samples. The projection algorithm is an algorithm whose basic idea is to remove the correlation between adjacent input signal vectors. The projection algorithm has a feature that the amount of calculation is larger than the above-described NLMS algorithm, but the convergence speed of the transmission signal e (t) is high. The first filter coefficient h ₁ (t) and the second filter coefficient h _m (t) (m = 2,..., M) are updated by the following equation (19). In addition, the following formula | equation (19) is an update formula of a secondary filter coefficient.

Here, the vector u(t) can be expressed by the following equation (20).

RLS algorithm
The RLS algorithm updates the filter coefficients using all past transmission signals e(t). The transmission signal e(t) converges faster than with the projection algorithm described above, but the amount of computation is larger. The RLS algorithm obtains the filter coefficient vector H^(t) that approximates the relationship between all past inputs and outputs in the least-squares sense. The first filter coefficient h_1(t) and the second filter coefficients h_m(t) (m = 2, ..., M) are updated by the following equation (21).

Here, the vector K(t) can be expressed by the following equation (22).

Here, P(t) is defined as the inverse of the covariance matrix E[x(t) x(t)^T] of the input signal and is an L × L square matrix, where L is the number of taps. P(t) satisfies the following equation (23).

Here, λ is the forgetting factor, a coefficient satisfying 0 ≤ λ ≤ 1.
Details of these algorithms are described in Shoji Makino, "Research on Adaptive Signal Processing for Acoustic Echo Cancellers", doctoral dissertation, Tohoku University, 1993.
The first filter coefficient update unit 34 and the second filter coefficient update unit 32 may each use a different update algorithm.
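For orientation, the roles of K(t), P(t), and λ can be sketched with the standard textbook RLS recursion. Equations (21) to (23) are not reproduced in this text, so the Python code below is an assumption about their general shape and may differ from them in detail. The single-filter setup, the initial value of `P`, and the toy echo path are likewise hypothetical.

```python
import random


def rls_step(h, P, x, d, lam=0.99):
    """One textbook RLS update for a single filter.

    h : coefficient vector, P : inverse correlation matrix (L x L),
    x : input vector, d : observed sample, lam : forgetting factor (0 < lam <= 1).
    """
    n = len(h)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    K = [v / denom for v in Px]                          # gain vector K(t)
    e = d - sum(h[i] * x[i] for i in range(n))           # a priori error e(t)
    h_new = [h[i] + K[i] * e for i in range(n)]
    # P stays symmetric, so x^T P equals (P x)^T.
    P_new = [[(P[i][j] - K[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return h_new, P_new, e


# Toy run on a hypothetical 2-tap echo path.
true_path = [0.5, -0.3]
rng = random.Random(1)
h, x = [0.0, 0.0], [0.0, 0.0]
P = [[1e4, 0.0], [0.0, 1e4]]                             # large initial P (assumed)
for _ in range(200):
    x = [rng.uniform(-1.0, 1.0), x[0]]
    d = sum(c * v for c, v in zip(true_path, x))
    h, P, _ = rls_step(h, P, x, d)
```

The sketch requires λ > 0 because P(t) is divided by λ at every step. The L × L matrix update is what makes RLS more expensive than the LMS-type updates.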

By applying these update algorithms, the first filter coefficient update unit 34 and the second filter coefficient update unit 32 update the first filter coefficient h_1(t) and the second filter coefficients h_m(t) (m = 2, ..., M) until the transmission signal e(t) converges (step S8). The filter coefficients can likewise be updated with these algorithms in the second to fourth embodiments described below.
With the functional configuration of the first embodiment, both the nonlinear acoustic echo that arises when the speaker 4 has nonlinearity or the like and the linear acoustic echo can be canceled, so that high cancellation performance can be achieved.

[Experimental results]
Experimental results comparing the effects of the conventional echo canceling device 14 and the echo canceling device 40 of the second embodiment, each applied to an actual hands-free device, are described below. FIG. 7 shows the arrangement of the echo canceling device 40 described in the first embodiment, the speaker 4, the near-end speaker 7, and so on used in this experiment. One main microphone and one sub microphone were used. The main microphone 61, the sub microphone 62, the speaker 4, and the near-end speaker 7 were arranged so that the straight line connecting the main microphone 61 and the sub microphone 62 is orthogonal to the straight line connecting the speaker 4 and the near-end speaker 7. The distance between the main microphone 61 and the sub microphone 62 was 2 cm, the distance between the speaker 4 and the near-end speaker 7 was 50 cm, the distance between the speaker 4 and the main microphone 61 was 5 cm, and the diameter of the speaker 4 was 4 cm. The speaker characteristic was a sigmoid function simulating clipping of excessive input, and the spatial responses c_1(t), c_2(t), r_1(t), and r_2(t) were impulse responses measured in the arrangement of FIG. 7.
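The kind of speaker nonlinearity used in the experiment can be pictured with a soft clipper. The exact sigmoid function is not given in the text, so the Python sketch below uses a hyperbolic tangent scaled by a hypothetical `limit` parameter as one possible sigmoid-shaped characteristic.

```python
import math


def clipped_speaker(x, limit=1.0):
    """Sigmoid-shaped soft clipper: nearly linear for small x, saturating at +/-limit."""
    return limit * math.tanh(x / limit)
```

For small inputs the output follows the input almost exactly, so a linear adaptive filter can model it. For large inputs the output saturates, which produces the nonlinear echo component that the conventional configuration cannot cancel.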

The high-sensitivity portion 61a of the main microphone 61 was directed toward the near-end speaker, and the high-sensitivity portion 62a of the sub microphone 62 was directed toward the speaker 4. The sampling frequency was 16 kHz, the reverberation time was 200 ms, the first variable filter 24 and the second variable filter 26 each had 512 taps, and the NLMS algorithm was used as the adaptive algorithm.
FIGS. 8 and 9 show the experimental results. FIG. 8 is a graph of the amount of echo cancellation when white noise is input as the received signal x(t); a larger value indicates that more echo has been canceled. The horizontal axis is time (s), and the vertical axis is the amount of echo cancellation at that time. The thin line shows the amount of echo cancellation by the conventional echo canceling device 14, and the thick line shows that by the echo canceling device 40 described in the first embodiment.
With the conventional echo canceling device 14, the amount of echo cancellation reaches a maximum of about 20 dB. This is because the conventional echo canceling device 14 cannot cancel the nonlinear acoustic echo caused by the nonlinearity of the speaker 4.
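To put the decibel figures in perspective, the amount of echo cancellation can be computed as a power ratio. The text does not define the measure, so the Python sketch below assumes the common definition: the ratio of echo power before cancellation to residual echo power after cancellation, in dB. The function name is hypothetical.

```python
import math


def echo_cancellation_db(echo, residual):
    """Ratio of echo power before cancellation to residual power after, in dB."""
    p_echo = sum(v * v for v in echo) / len(echo)
    p_res = sum(v * v for v in residual) / len(residual)
    return 10.0 * math.log10(p_echo / p_res)
```

Under this definition, 20 dB means the residual echo power is 1/100 of the original, and 40 dB means it is 1/10,000.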

In contrast, when the echo canceling device 40 described in the first embodiment is used, the amount of echo cancellation is about 40 dB. This is because the echo canceling device 40 described in the second embodiment can also cancel the nonlinear acoustic echo.
FIG. 9 is a graph showing the impulse response for the near-end speaker 7. The horizontal axis is frequency (Hz), and the vertical axis is the impulse response c_1(t) from the near-end speaker 7 to the main microphone 61. The thin line shows the response before processing by the echo canceling device 40, and the thick line shows it after processing. The graph of FIG. 9 shows that there is almost no difference between the impulse response c_1(t) before and after processing. It follows that in the first embodiment the degradation of the voice of the near-end speaker 7 is small.
From the above, according to the first embodiment, both the linear acoustic echo and the nonlinear acoustic echo caused by the nonlinearity of the speaker 4 are canceled, and high echo cancellation performance can be achieved. Furthermore, high-quality sound collection with little degradation of the voice of the near-end speaker 7 can be achieved.

In the first embodiment, it was explained that the degradation of the voice of the near-end speaker 7 can be reduced by directing the high-sensitivity portion 61a of the main microphone 61 toward the near-end speaker 7 and its low-sensitivity portion toward the speaker 4, and by directing the high-sensitivity portion 62a of the sub microphone 62 toward the speaker 4 and its low-sensitivity portion toward the near-end speaker 7. The second embodiment is a functional configuration example in which the speaker received signal selection means is a main beamformer and the echo received signal selection means is a sub-beamformer. In the second embodiment, the main beamformer and the sub-beamformer are used to reduce the degradation of the voice of the near-end speaker 7.
FIG. 10 shows a functional configuration example of the second embodiment. Compared with the echo canceling device 40 described in the first embodiment, the echo canceling device 40 of the second embodiment additionally includes the sub-beamformer 50 and the main beamformer 52. In this embodiment, the microphones 6m are not divided into a main microphone (speaker sound collecting means) and sub microphones; all the microphones 6m operate in parallel.

The sub-beamformer 50 consists of sub fixed filters 50m (m = 1, ..., M) and a sub adder 500, and the main beamformer 52 consists of main fixed filters 52m (m = 1, ..., M) and a main adder 520.
The main beamformer 52 has high sensitivity toward the near-end speaker 7 and low sensitivity toward the speaker 4. The sub-beamformer 50 has high sensitivity toward the speaker 4 and low sensitivity toward the near-end speaker 7. Using the main beamformer 52 and the sub-beamformer 50 makes it possible to form regions of high and low directivity in arbitrary directions, so the configuration can be applied to various arrangements of speakers and microphones.
For simplicity, the second embodiment is described in the frequency domain ω. Let C_m(ω) be the transfer function from the near-end speaker to the microphone 6m, R_m(ω) the transfer function from the speaker 4 to the microphone 6m, P_m(ω) the fixed filter coefficient of the m-th channel main fixed filter 52m in the main beamformer 52, and Q_m(ω) the fixed filter coefficient of the sub fixed filter 50m in the sub-beamformer 50.
The fixed filter coefficients of the sub fixed filters 50m and the main fixed filters 52m are fixed at values given in advance. The design of the fixed filter coefficients is described below.

The main beamformer 52 is required to pick up the voice s(ω) of the near-end speaker 7 and to suppress the reproduced sound from the speaker 4. Expressed as equations, these conditions are the following equations (24) and (25).
Σ_{i=1}^{M} C_i'(ω)·P_i(ω) = D(ω) (24)
Σ_{i=1}^{M} R_i'(ω)·P_i(ω) = 0 (25)
Here, D(ω) is the target impulse response, for example one whose amplitude is a fixed value and whose phase is linear (a fixed delay in the time domain). C_m'(ω) and R_m'(ω) are set in advance to the theoretical direct-wave responses calculated from the arrangement of the microphones 6m, the speaker 4, and the near-end speaker 7. The same applies to equations (26) and (27) below. The fixed filter coefficients P_m(ω) are set so as to satisfy equations (24) and (25). Next, the sub-beamformer 50 is required to suppress the voice s(ω) of the near-end speaker 7 and to pick up the reproduced sound from the speaker 4. Expressed as equations, these conditions are the following equations (26) and (27).
Σ_{i=1}^{M} C_i'(ω)·Q_i(ω) = 0 (26)
Σ_{i=1}^{M} R_i'(ω)·Q_i(ω) = K(ω) (27)
Here, K(ω) is the target impulse response, for example one whose amplitude is a fixed value and whose phase is linear (a fixed delay in the time domain).
The other processing is the same as in the first embodiment.
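For M = 2, equations (24) to (27) form two 2 × 2 linear systems per frequency, which can be solved in closed form. The Python sketch below does this at a single frequency with Cramer's rule. The free-field direct-wave model (1/r attenuation and r/c delay), the distances, the sound speed, and the targets D = K = 1 are assumptions chosen for illustration; for M > 2 the systems are underdetermined and an additional design criterion would be needed.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s, assumed


def direct_wave(distance_m, omega):
    """Assumed free-field direct-wave response: 1/r attenuation and r/c delay."""
    return cmath.exp(-1j * omega * distance_m / SOUND_SPEED) / distance_m


def design_two_mic(C, R, D, K):
    """Solve (24)-(27) for M = 2 at one frequency by Cramer's rule.

    C, R : pairs (C_1', C_2') and (R_1', R_2'); D, K : target responses.
    Returns (P_1, P_2), (Q_1, Q_2).
    """
    det = C[0] * R[1] - C[1] * R[0]
    if abs(det) < 1e-12:
        raise ValueError("talker and loudspeaker responses are not separable")
    P = (D * R[1] / det, -D * R[0] / det)   # passes the talker, nulls the loudspeaker
    Q = (-K * C[1] / det, K * C[0] / det)   # nulls the talker, passes the loudspeaker
    return P, Q


omega = 2.0 * math.pi * 1000.0                             # 1 kHz
C = (direct_wave(0.50, omega), direct_wave(0.52, omega))   # talker -> microphones
R = (direct_wave(0.05, omega), direct_wave(0.07, omega))   # loudspeaker -> microphones
P, Q = design_two_mic(C, R, D=1.0, K=1.0)
```

Repeating the calculation over a grid of frequencies would give the frequency responses of the fixed filters 52m and 50m under these assumptions.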

By setting up the main beamformer 52 and the sub-beamformer 50 as described above, for any arrangement of the microphones 6m and the speaker 4, the main beamformer 52 has high sensitivity toward the near-end speaker 7 and low sensitivity toward the speaker 4, while the sub-beamformer 50 has high sensitivity toward the speaker 4 and low sensitivity toward the near-end speaker 7, so that degradation of the near-end speaker's voice can be prevented.

The functional configuration example of this embodiment is obtained by adding a reception detection unit 60 and switches 62m (m = 2, ..., M) to the echo canceling device 40 described in the first embodiment. Each switch 62m is connected to the corresponding second variable filter 26m.
FIG. 11 shows a functional configuration example of the third embodiment. The audio signal j(t) from the communication network 2 is input to the reception detection unit 60 as well as to the first variable filter 24. The reception detection unit 60 observes the level of the audio signal j(t) and detects the sections in which the received signal x(t) is present. For example, the level (power) of the audio signal j(t) is compared with a threshold given in advance, and when the level of the audio signal j(t) is larger, the section is judged to contain the received signal x(t).
In sections that contain the received signal x(t), the reception detection unit 60 turns the switches 62m on and connects each second variable filter 26m to the adder 36. In sections that do not contain the received signal x(t), the reception detection unit 60 turns the switches 62m off and disconnects each second variable filter 26m from the adder 36.
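The detection and switching described above can be sketched per frame. The Python code below is a minimal illustration, assuming frame-wise mean power as the level measure; the frame length, the threshold value, and the function names are hypothetical.

```python
def frame_power(frame):
    """Mean power of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def gate_second_pseudo_echo(j_frame, q_frame, threshold):
    """Model of the reception detection unit 60 driving the switches 62m.

    j_frame   : frame of the audio signal j(t) from the network
    q_frame   : frame of the second pseudo echo signal q(t)
    threshold : power threshold given in advance (hypothetical value)
    """
    if frame_power(j_frame) > threshold:      # received speech present: switch on
        return list(q_frame)
    return [0.0] * len(q_frame)               # no received speech: switch off
```

For example, `gate_second_pseudo_echo([0.5, -0.5], [0.1, 0.2], 0.01)` passes the pseudo echo through, while a near-silent `j_frame` yields a frame of zeros so that nothing is subtracted via the second variable filters.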

With this processing, in sections in which the near-end speaker 7 is speaking and the audio signal j(t) does not contain the received signal x(t), the switches 62m are off, so the second pseudo echo signal q(t) from the second variable filters 26 is not output to the adder 36. In other words, the voice of the near-end speaker that passes through the sub microphones 6m is blocked, and the near-end speaker's voice in the transmission signal e(t) is no longer degraded.
In sections in which the audio signal j(t) contains the received signal x(t), the switches 62m are on, so the echo signal d(t) is canceled as described in the second embodiment.
In so-called double-talk sections, in which the received signal x(t) is present and the near-end speaker 7 is also speaking, the switches 62m are on and the echo signal d(t) is canceled. In these double-talk sections the voice of the near-end speaker 7 is degraded, but because both the voice signal s(t) of the near-end speaker 7 and the received signal x(t) are heard at the same time, the auditory masking effect makes the degradation hard to hear, so it is perceived less.

The functional configuration example of the fourth embodiment, shown in FIG. 12, is obtained by adding the reception detection unit 60 described in the third embodiment to the echo canceling device 40 described in the second embodiment. As in the third embodiment, the received signal x(ω) is input to the reception detection unit 60; in sections in which the audio signal j(ω) from the communication network 2 contains the received signal x(ω), the switch 62 is on and the echo signal d(ω) is canceled. In sections in which the audio signal j(ω) from the communication network 2 does not contain the received signal x(ω), the switch 62 is off and the voice of the near-end speaker 7 is not degraded.
The echo canceling device of the present invention is not limited to the embodiments described above and can be modified as appropriate without departing from the spirit of the present invention. The processing described for the echo canceling device need not be executed in time series in the order described; it may be executed in parallel or individually according to the processing capability of the device that executes it, or as needed.
When the processing of the echo canceling device of the present invention is implemented by a computer, the processing content of the functions that the echo canceling device should have is described by a program. By executing this program on a computer, the processing functions of the echo canceling device are realized on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.
The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or transferred from a server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or, each time the program is transferred from the server computer to the computer, the computer may execute processing according to the received program in sequence. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, which does not transfer the program from the server computer to the computer and realizes the processing functions only through execution instructions and result acquisition. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).
In this embodiment, the echo canceling device is configured by executing a predetermined program on a computer, but at least part of the processing content may be realized in hardware.

FIG. 1 is a block diagram showing a functional configuration example of a conventional system.
FIG. 2 is a diagram showing the flow of the received signal x(t) in the speaker 4 in the prior art and in the present invention.
FIG. 3 is a block diagram showing a functional configuration example of the system of the first embodiment of the present invention.
FIG. 4 is a flowchart showing the main processing flow of the first embodiment of the present invention.
FIG. 5 is a diagram showing the arrangement of the main microphone 61 and the sub microphone 62 for suppressing the degradation of the voice of the near-end speaker 7 in the first or third embodiment of the present invention.
FIG. 6 is a block diagram showing a functional configuration example when the main microphone 61 and the sub microphone 62 used in the present invention are arranged as shown in FIG. 5.
FIG. 7 is an experimental layout diagram for showing the difference in effect between the conventional echo canceling device 14 and the echo canceling device 40 of the present invention.
FIG. 8 is a graph showing the echo cancellation performance of the conventional echo canceling device 14 and the echo canceling device 40 of the present invention.
FIG. 9 is a diagram showing the impulse response c_m(t) for the near-end speaker 7 before and after processing by the echo canceling device 40 of the present invention.
FIG. 10 is a block diagram showing a functional configuration example of the system of the second embodiment of the present invention.
FIG. 11 is a block diagram showing a functional configuration example of the system of the third embodiment of the present invention.
FIG. 12 is a block diagram showing a functional configuration example of the system of the fourth embodiment of the present invention.

Claims

In an echo canceller in which reproducing means converts a received signal into a reproduced sound and emits it, and which removes the echo signal from a received sound signal and outputs the result as a transmission signal, the received sound signal consisting of a signal obtained when the near-end speaker's voice is received by speaker sound collecting means (hereinafter referred to as a speaker received signal) and a signal obtained when the reproduced sound is received by the speaker sound collecting means (hereinafter referred to as an echo signal), the echo canceller comprising:
A first variable filter that receives the received signal and generates a first pseudo echo signal;
A second variable filter that receives the received sound signals received by one or more echo sound collecting means and generates a second pseudo echo signal;
A subtracting unit that subtracts the first pseudo echo signal and the second pseudo echo signal from the received sound signal of the speaker sound collecting means and outputs the transmission signal;
A first filter coefficient updating unit that receives at least the received signal and the transmission signal and updates a filter coefficient of the first variable filter;
A second filter coefficient updating unit that receives at least the received sound signal from the echo sound collecting means and the transmission signal, and updates a filter coefficient of the second variable filter; and
A reception detection unit that detects the level of the received signal and, when the detected level is smaller than a predetermined threshold, does not operate the second variable filter and sets the output of the second variable filter to 0.

The echo canceller according to claim 1, further comprising:
Speaker received signal selection means that selects, from among a plurality of sound collecting means, a received sound signal whose time average of the ratio of the echo signal level to the speaker received signal level is smaller than that of the other received sound signals, and inputs it to the subtracting unit; and
Echo received signal selection means that selects, from among the plurality of sound collecting means, a received sound signal whose time average of the ratio of the echo signal level to the speaker received signal level is larger than that of the other received sound signals, and inputs it to the second variable filter.

The echo canceller according to claim 2, wherein
The speaker received signal selection means is means for obtaining a collected sound signal from one sound collecting means (hereinafter referred to as main sound collecting means) that, among the plurality of sound collecting means, has high sensitivity in the direction of the near-end speaker and serves as the speaker sound collecting means, and
The echo received signal selection means is means for obtaining a collected sound signal from a sound collecting means other than the main sound collecting means that has high sensitivity in the direction of the reproducing means.

The echo canceller according to claim 2, wherein
The speaker received signal selection means is a main beamformer that receives the received sound signals from all of the plurality of sound collecting means and suppresses the echo signal component in the received sound signals, and
The echo received signal selection means is a sub-beamformer that receives the received sound signals from all of the plurality of sound collecting means and suppresses the speaker received signal component in the received sound signals.

The echo canceller according to any one of claims 1 to 4, wherein
The first filter coefficient updating unit and the second filter coefficient updating unit update the filter coefficients sequentially using a learning identification (NLMS: Normalized Least Mean Squares) algorithm, a projection algorithm, a recursive least squares (RLS: Recursive Least Squares) algorithm, or an LMS (Least Mean Squares) algorithm.

In an echo canceling method in which reproducing means converts a received signal into a reproduced sound and emits it, and in which the echo signal is removed from a received sound signal and the result is output as a transmission signal, the received sound signal consisting of a signal obtained when the near-end speaker's voice is received by speaker sound collecting means (hereinafter referred to as a speaker received signal) and a signal obtained when the reproduced sound is received by the speaker sound collecting means (hereinafter referred to as an echo signal), the echo canceling method comprising:
A first pseudo echo signal generation step in which a first variable filter generates a first pseudo echo signal from the received signal;
A second pseudo echo signal generation step in which a second variable filter generates a second pseudo echo signal from the received sound signals received by one or more echo sound collecting means;
A subtracting step of subtracting the first pseudo echo signal and the second pseudo echo signal from the received sound signal from the speaker sound collecting means and outputting the transmission signal;
A first filter coefficient updating step in which first filter coefficient updating means updates a filter coefficient of the first variable filter from at least the received signal and the transmission signal;
A second filter coefficient updating step in which second filter coefficient updating means updates a filter coefficient of the second variable filter from at least the received sound signal from the echo sound collecting means and the transmission signal; and
A reception detection step in which reception detection means detects the level of the received signal and, when the detected level is smaller than a predetermined threshold, does not operate the second variable filter and sets the output of the second variable filter to 0.

The echo cancellation method according to claim 6,
Further comprising a speaker received signal selection process in which speaker received signal selection means selects, from among the sound reception signals from a plurality of sound collecting means, a sound reception signal for which the time average of the ratio of the echo signal level to the speaker received signal level is smaller than for the other sound reception signals, and
An echo received signal selection process in which echo received signal selection means selects, from among the sound reception signals from the plurality of sound collecting means, a sound reception signal for which the time average of the ratio of the echo signal level to the speaker received signal level is larger than for the other sound reception signals. An echo canceling method characterized by comprising the above processes.
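The selection rule in this claim can be sketched as follows. This is a hedged illustration only: the claim does not say how the echo signal level and the speaker received signal level are measured for each sound collecting means, so the sketch assumes they are already available as per-frame level sequences. The function name `select_sound_collecting_means` and the guard constant `eps` are hypothetical.

```python
def select_sound_collecting_means(echo_levels, speaker_levels, eps=1e-12):
    """echo_levels[i][k] and speaker_levels[i][k] are the assumed echo level
    and speaker received signal level of sound collecting means i in frame k.
    Returns (index for the speaker signal, index for the echo signal,
    list of time-averaged ratios)."""
    averages = []
    for e_seq, s_seq in zip(echo_levels, speaker_levels):
        # Ratio of echo level to speaker received signal level, per frame.
        ratios = [e / (s + eps) for e, s in zip(e_seq, s_seq)]
        averages.append(sum(ratios) / len(ratios))  # time average
    indices = range(len(averages))
    speaker_idx = min(indices, key=lambda i: averages[i])  # smallest ratio
    echo_idx = max(indices, key=lambda i: averages[i])     # largest ratio
    return speaker_idx, echo_idx, averages
```

For three sound collecting means with echo levels `[[0.1, 0.2], [0.8, 0.9], [0.4, 0.4]]` and speaker levels `[[1.0, 1.0], [0.2, 0.3], [0.5, 0.5]]`, the time-averaged ratios are about 0.15, 3.5, and 0.8, so the first would serve as the speaker sound collecting means and the second as the echo sound collecting means.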

The echo cancellation method according to claim 7, wherein
The speaker received signal selection process is a process in which the speaker received signal selection means obtains a sound reception signal from one sound collecting means, among the plurality of sound collecting means, that has high sensitivity in the direction of the near-end speaker and serves as the speaker sound collecting means (hereinafter referred to as the main sound collecting means), and
The echo received signal selection process is a process in which the echo received signal selection means obtains a sound reception signal from a sound collecting means, other than the main sound collecting means, that has high sensitivity in the direction of the reproducing means. An echo canceling method characterized by the above.

The echo cancellation method according to claim 7, wherein
The speaker received signal selection process is a process of suppressing, with a main beamformer, the echo signal component in the sound reception signals from all of the plurality of sound collecting means, and
The echo received signal selection process is a process of suppressing, with a sub-beamformer, the speaker received signal component in the sound reception signals from all of the plurality of sound collecting means. An echo canceling method characterized by the above.

In the echo cancellation method according to any one of claims 6 to 9,
The first filter coefficient update process and the second filter coefficient update process are processes that update the filter coefficients sequentially using a learning identification (NLMS: Normalized Least Mean Squares) algorithm, a projection algorithm, a recursive least squares (RLS: Recursive Least Squares) algorithm, or an LMS (Least Mean Squares) algorithm. An echo cancellation method characterized by the above.
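Two of the named update rules, LMS and NLMS, differ only in how the step is scaled. A minimal sketch of a single coefficient update for each is shown below. The function names, the step size `mu`, and the regularization constant `eps` are assumptions, and the projection and RLS algorithms are not shown.

```python
def lms_step(h, xbuf, err, mu):
    """One LMS update: h <- h + mu * err * x."""
    return [hk + mu * err * xk for hk, xk in zip(h, xbuf)]

def nlms_step(h, xbuf, err, mu, eps=1e-8):
    """One NLMS update: the LMS step divided by the input power.
    eps is an assumed small constant that avoids division by zero."""
    power = sum(v * v for v in xbuf)
    return [hk + mu * err * xk / (power + eps) for hk, xk in zip(h, xbuf)]

def filter_output(h, xbuf):
    """Output of the variable filter (pseudo echo) for the current input buffer."""
    return sum(hk * xk for hk, xk in zip(h, xbuf))
```

With `mu = 1`, one NLMS step makes the filter reproduce the current target sample almost exactly for the current input buffer, which is why its convergence speed does not depend on the input level. Plain LMS needs `mu` chosen small enough relative to the input power.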

An echo cancellation program for causing a computer to execute each step of the echo cancellation method according to any one of claims 6 to 10.

A computer-readable recording medium on which the echo cancellation program according to claim 11 is recorded.