JP3727800B2

JP3727800B2 - Echo canceller and voice communication apparatus provided with the echo canceller

Info

Publication number: JP3727800B2
Application number: JP5266999A
Authority: JP
Inventors: 隆小原
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-03-01
Filing date: 1999-03-01
Publication date: 2005-12-14
Anticipated expiration: 2019-03-01
Also published as: JP2000252885A

Description

【０００１】
【発明の属する技術分野】
本発明は、回線数を多く確保するために音声信号を圧縮して送るディジタル自動車電話等の携帯端末であって、特に電話機を持たなくても通話が可能である所謂ハンズフリーホンタイプの携帯端末で発生するエコーによる通話の劣化を抑えるためのエコーキャンセラ及びこのエコーキャンセラを備えた音声通信装置に関する。
【０００２】
【従来の技術】
近年、コンピュータや通信の分野では、ディジタル信号処理（ＤＳＰ）が注目され、多種類の分野に応用されている。このディジタル信号処理は、アナログ処理では困難であった特性定数の任意変更や適応処理等の複雑な処理を容易に実現することができるため、特に、音声処理や画像処理の分野において汎用的な技術として用いられている。
【０００３】
例えば、自動車電話等で使用されるハンズフリータイプの電話機では、ハンズフリー通話中には、スピーカからの受話音声がマイクロホンに回り込み、これが相手方に送られて音響エコーが発生することがある。このような受話音声の回り込みによる音響エコーを打ち消して通話品質を保つために、一般にエコーキャンセラと呼ばれるエコー消去装置が使用される。
【０００４】
このエコーキャンセラでは、上述したディジタル信号処理によって擬似エコー信号を生成し、マイクロホンから入力された通話者の送話信号から上記擬似エコー信号を差し引くことによって、送話信号に含まれるエコー成分を除去している。
【０００５】
また、無線携帯電話装置では、通信チャネル数を増加させるために、音声信号のパラメータを検出して、そのパラメータだけを送る音声符号化装置により低ビットレートの伝送が行われている。
【０００６】
図３は従来のエコーキャンセラを備えた音声通信装置の構成を示すブロック図である。
【０００７】
無線携帯電話装置等で用いられている音声通信装置は、図３に示すように、スピーカ１０１、マイクロホン１０２、Ｄ／Ａ変換器１０３、Ａ／Ｄ変換器１０４、音声復号化器（ＤＥＣＯＤＥＲ）１０９、音声符号化器（ＣＯＤＥＲ）１１０およびエコーキャンセラ１００を備えている。
【０００８】
スピーカ１０１は遠端話者（通話相手）の音声を出力し、マイクロホン１０２は近端話者の音声を入力するためのものである。Ｄ／Ａ変換器１０３は、音声復号化器１０９からの遠端話者信号Ｘｓをディジタル／アナログ変換してスピーカ１０１に出力する。Ａ／Ｄ変換器１０４は、マイクロホン１０２から入力された音声をアナログ／ディジタル変換して、ディジタル化された近端話者信号Ｙｓを生成する。音声復号化器１０９は、図示せぬ受信回路にて受信されたパラメータだけの信号から音声信号へと復号処理を行って、遠端話者信号Ｘｓを生成する。音声符号化器１１０は、エコーキャンセラ１００にてエコー成分が消去された送信音声信号のパラメータを検出してそのパラメータだけを符号化して図示せぬ送信回路に送る。
【０００９】
このような構成において、図示せぬ受信回路にて受信されたパラメータだけの受話信号は音声復号化器１０９で復号化処理が施され、さらにＤ／Ａ変換器１０３でアナログ通話信号に戻された後、スピーカ１０１から話者に向けて音声出力される。
【００１０】
一方、話者の送話音声は、マイクロホン１０２により集音されて送話信号に変換された後、Ａ／Ｄ変換器１０４に入力され、ここで先ず所定のサンプリング周期でデジタル化される。そして、このデジタル化された送話信号つまり近端話者信号Ｙｓは、エコーキャンセラ１００にてエコー成分が除去された後、音声符号化器１１０に入力され、ここで符号化処理されて図示せぬ送信回路に供給される。
【００１１】
ここで、エコーキャンセラ１００は、ディジタルフィルタ（ＦＩＲ）１０５、減算器１０６、フィルタ係数更新部（ＡＤＰ）１０７、ダブルトーク検出器（ＤＴＤ）１０８から構成される。
【００１２】
ディジタルフィルタ１０５は、遠端話者信号Ｘｓにフィルタ係数を掛け合わせて疑似エコー信号Ｙｓｓを生成するものである。この疑似エコー信号Ｙｓｓは演算器１０６に供給される。減算器１０６は、Ａ／Ｄ変換器１０４の出力である近端話者信号Ｙｓから疑似エコー信号Ｙｓｓを減算してエコー成分を打ち消した残差エコー信号Ｅｓを生成する。
【００１３】
また、フィルタ係数更新部１０７は、ディジタルフィルタ１０５のフィルタ係数を更新する。ここでは、上記減算器１０６で消去されずに通過する残差信号を最小にするために、フィルタ係数（タップ係数とも呼ばれる）を更新するための演算処理が行われる。このフィルタ係数更新（学習）により、ディジタルフィルタ１０５の伝達関数はエコーパスの伝達関数に次第に近づき、両伝達関数が等しくなると残差信号レベルは略零になる。
【００１４】
このフィルタ係数更新部１０７によるフィルタ係数の更新は、双方向で同時に会話する状態の時、所謂ダブルトークの最中に行うことができない。これは、ダブルトーク中にフィルタ係数の更新を行うと、フィルタ係数が収束せずに、送話信号に含まれるエコー成分を正確に消去できなくなるからである。そこで、一般にはダブルトーク検出器１０８が備えられ、通信時におけるダブルトーク期間を検出して、エコーキャンセラ１００の発散を防止している。
【００１５】
ところで、上述した従来のエコーキャンセラ１００では、ＡＤ変換器１０４の出力信号毎に処理を行うような構成、つまり、サンプル形式で構成されており、サンプル毎に上記ダブルトーク期間の判定処理を行っている。
【００１６】
また、ＤＳＰ(Digital Signal Processor)でエコー消去処理と音声符号化／復号化処理の両方を行う場合に、音声符復号化処理は一般的にフレーム処理であるのに対し、エコー消去処理はサンプル処理であるため、音声符号化／復号化処理の最中に割り込みとしてエコー消去処理が入って来る。このため、割り込みのためのレジスタ待避などの処理が余計にかかってしまう。
【００１７】
【発明が解決しようとする課題】
上述したように、従来、音声符復号化器とエコーキャンセラは独立の構成を採っていたために（音声符復号化器はフレーム処理、エコーキャンセラはサンプル処理）、１つのＤＳＰで処理しようとした場合には、割り込み処理などが必要であった。この場合、ダブルトーク結果にハングオーバ処理が加わっていた場合には、サンプル処理では、ハングオーバのカウンタ値の変更やカウンタ値の判定をするためだけに割り込み処理を行わなくてはいけないために、処理量が非常に多くなる問題があった。
【００１８】
また、ダブルトークの判定は、通常、音声の立ち上がりを検出することで行われるが、サンプル処理では、サンプル毎の判定となるため、音声の立ち上がりなど、信号の状態を検出しづらいといった問題があった。
【００１９】
本発明は上記のような点に鑑みなされたもので、割り込み処理等の複雑な演算処理を必要とせずに効率的にエコー成分を消去することのできるエコーキャンセラ及びこのエコーキャンセラを備えた音声通信装置を提供することを目的とする。
【００２０】
【課題を解決するための手段】
本発明のエコーキャンセラは、遠端話者信号をフレーム単位で蓄積する第１のバッファ手段と、近端話者信号をフレーム単位で蓄積する第２のバッファ手段と、上記第１のバッファ手段に蓄積された遠端話者信号にフィルタ係数を乗算して擬似エコー信号を生成するフィルタ手段と、上記第２のバッファ手段に蓄積された近端話者信号から上記フィルタ手段によって生成された擬似エコー信号を差し引くことで上記近端話者信号に含まれるエコー成分を除去する減算手段と、上記フィルタ係数を所定のアルゴリズムに基づき更新するフィルタ係数更新手段と、上記減算手段によって得られる残差エコー信号をフレーム単位で蓄積する第３のバッファ手段と、上記第３のバッファ手段に蓄積された残差エコー信号を読み込み、この残差信号を複数のブロックに分けてブロック毎のパワーの変化から近端話者信号の立ち上りを検出する立ち上り検出手段と、この第３のバッファ手段に１フレーム分の残差エコー信号が蓄積されたことに応じ、この残差エコー信号と前記第１のバッファ手段に蓄積された遠端話者信号との関係が第１のダブルトーク判定基準または前記立ち上り検出手段によって立ち上がりが検出されたことに応じて設定される前記第１のダブルトーク判定基準よりもダブルトークを検出しやすい第２のダブルトーク判定基準を満たすことでダブルトークを検出し、上記フィルタ係数更新手段による上記フィルタ係数の更新を中断させるダブルトーク検出手段とを具備したことを特徴とする。
【００２１】
このような構成によれば、ダブルトークの検出がしづらい音声の立ち上り時でも、立ち上り検出手段が音声の立ち上りを検出した場合は、ダブルトーク検出手段が、よりダブルトークと判定しやすい判定基準を用いてダブルトークの検出を行うため、シングルトークと誤判定されにくくすることができる。
【００２３】
【発明の実施の形態】
以下、図面を参照して本発明の一実施形態を説明する。
【００２４】
図１は本発明のエコーキャンセラを備えた音声通信装置の構成を示すブロック図である。
【００２５】
本実施形態における音声通信装置は、例えば自動車電話等の無線携帯電話装置に適用されるものであり、図１に示すように、本装置は、スピーカ２０１、マイクロホン２０２、Ｄ／Ａ変換器２０３、Ａ／Ｄ変換器２０４、音声復号化器（ＤＥＣＯＤＥＲ）２０９、音声符号化器（ＣＯＤＥＲ）２１０およびエコーキャンセラ２００を備えている。
【００２６】
スピーカ２０１は遠端話者（通話相手）の音声を出力し、マイクロホン２０２は近端話者の音声を入力するためのものである。Ｄ／Ａ変換器２０３は、音声復号化器２０９からの遠端話者信号Ｘｓをディジタル／アナログ変換してスピーカ２０１に出力する。Ａ／Ｄ変換器２０４は、マイクロホン２０２から入力された音声をアナログ／ディジタル変換して、ディジタル化された近端話者信号Ｙｓを生成する。音声復号化器２０９は、図示せぬ受信回路にて受信されたパラメータだけの信号から音声信号へと復号処理を行って、時系列の遠端話者信号Ｘｓを生成する。音声符号化器２１０は、エコーキャンセラ２００にてエコー成分が消去された送信音声信号のパラメータを検出して、そのパラメータだけを符号化する。
【００２７】
ここで、エコーキャンセラ２００は、ディジタルフィルタ（ＦＩＲ）２０５、減算器２０６、フィルタ係数更新部（ＡＤＰ）２０７、ダブルトーク検出器（ＤＴＤ）２０８から構成される。さらに、本実施形態では、このエコーキャンセラ２００におけるエコー消去処理をフレーム形式で行うために、フレームバッファ（ＢＵＦＦ）２１１〜２１４を備えると共に、アドレス発生器２１５および立ち上がり検出器２１６を備えている。
【００２８】
ディジタルフィルタ２０５は、遠端話者信号Ｘｓの信号系列にそれぞれフィルタ係数を掛け合わせて疑似エコー信号Ｙｓｓを生成するものである。この疑似エコー信号Ｙｓｓは演算器２０６に供給される。減算器２０６は、Ａ／Ｄ変換器２０４の出力である近端話者信号Ｙｓから疑似エコー信号Ｙｓｓを減算してエコー成分を打ち消し、その残り成分である残差エコー信号Ｅｓを生成する。
【００２９】
フィルタ係数更新部２０７は、所定のアルゴリズムに基づきディジタルフィルタ２０５のフィルタ係数を更新する。このフィルタ係数更新部２０７にて、上記減算器２０６で消去されずに通過する残差信号を最小にするためのフィルタ係数（タップ係数とも呼ばれる）の更新処理が行われる。
【００３０】
なお、フィルタ係数の更新アルゴリズムとしては、例えば最小二乗平均法（ＬＭＳ）を正規化した学習同定法（ＮＬＭＳ）が用いられる。この学習同定法によるアルゴリズムは、演算量が比較的少なくて済み、しかも良好な特性を示すという利点を有する。このフィルタ係数更新（学習）により、ディジタルフィルタ２０５の伝達関数はエコーパスの伝達関数に次第に近づき、両伝達関数が等しくなると残差信号レベルは略零になる。
【００３１】
ダブルトーク検出器２０８は、ダブルトーク期間を検出して、エコーキャンセラ２００の発散を防止する。
【００３２】
また、各フレームバッファ（ＢＵＦＦ）２１１〜２１４は、フレーム単位で各信号をそれぞれ所定時間蓄積するためのものである。なお、フレームの長さは、音声復号化器２０９および音声符号化器２１０における信号の処理単位に対応している。
【００３３】
フレームバッファ２１１は、近端話者信号用のフレームバッファである。このフレームバッファ２１１は減算器２０６の前段に設けられ、Ａ／Ｄ変換器２０４の出力である近端話者信号Ｙｓを１フレーム分一時的に蓄積する。フレームバッファ２１２は、疑似エコー信号用のフレームバッファである。このフレームバッファ２１２は減算器２０６の前段に設けられ、ディジタルフィルタ２０５の出力である疑似エコー信号Ｙｓｓを１フレーム分一時的に蓄積する。フレームバッファ２１３は、残差エコー信号用のフレームバッファである。このフレームバッファ２１３は減算器２０６の後段に設けられ、減算器２０６の出力である残差エコー信号Ｅｓを１フレーム分一時的に蓄積する。フレームバッファ２１４は、遠端話者信号用のフレームバッファである。このフレームバッファ２１４は音声復号化器２０９の後段に設けられ、音声復号化器２０９の出力である遠端話者信号Ｘｓを１フレーム分一時的に蓄積する。
【００３４】
アドレス発生器２１５は、減算器２０６での減算処理に用いられる近端話者信号用フレームバッファ２１１、疑似エコー信号用フレームバッファ２１２、残差エコー信号用フレームバッファ２１３に対する各アドレスポインタを自動的にインクリメントするためのものである。このアドレス発生器２１５によってインクリメントされるアドレスポインタに従って近端話者信号Ｙｓおよび疑似エコー信号Ｙｓｓが１フレーム分ずつフレームバッファ２１１およびフレームバッファ２１２の所定のエリアにそれぞれ取り込まれ、その減算結果として得られる残差エコー信号Ｅｓが減算器２０６の出力バッファつまりフレームバッファ２１３の所定のエリアに取り込まれる。
【００３５】
また、立ち上がり検出器２１６は、近端話者信号Ｙｓの立ち上がりを検出する。この立ち上がり検出器２１６は、後述するように立ち上がり検出に用いるバッファエリアを複数に分割し、その分割した各エリア毎にダブルトーク検出に使用する信号のパラメータを算出し、各パラメータが小さい順に並ぶ場合に音声信号の立ち上がりであると判定する。この立ち上がり検出の結果はダブルトーク検出器２０８に反映される。
【００３６】
次に、本装置の処理動作について説明する。
【００３７】
図示せぬ受信回路にて受信されたパラメータだけの受話信号は音声復号化器２０９で復号化処理され、時系列の遠端話者信号Ｘｓに変換される。この遠端話者信号ＸｓはＤ／Ａ変換器２０３にてアナログ通話信号に戻された後、スピーカ２０１に与えられる。これにより、スピーカ２０１から近端話者に向けて遠端話者（通話相手）の音声が出力される。
【００３８】
一方、近端話者の音声は、マイクロホン２０２により集音されて送話音声信号に変換された後、Ａ／Ｄ変換器２０４に入力され、ここで先ず所定のサンプリング周期でデジタル化される。そして、このデジタル化された送話信号つまり近端話者信号Ｙｓは、エコーキャンセラ２００にてエコー成分が除去された後、音声符号化器２１０に入力され、ここで符号化処理されて図示せぬ送信回路に供給される。
【００３９】
ここで、エコーキャンセラ２００は、フレーム型のエコーキャンセラとして構成されており、エコーキャンセル処理に必要な各信号を音声復号化器２０９および音声符号化器２１０の処理単位であるフレーム単位で各フレームバッファ２１４〜２１１に蓄えながら処理を行う。
【００４０】
すなわち、Ａ／Ｄ変換器２０４の出力である近端話者信号Ｙｓとディジタルフィルタ２０５の出力である疑似エコー信号Ｙｓｓは、それぞれフレームバッファ２１１とフレームバッファ２１２に１フレーム分蓄えられた後、減算器２０６にて減算処理される。この減算器２０６の減算処理により、近端話者信号Ｙｓに含まれるエコー成分が除去される。また、減算結果として得られた残差エコー信号Ｅｓはフレームバッファ２１３に蓄えられる。この残差エコー信号Ｅｓはフィルタ係数更新部２０７に与えられ、ここでフィルタ係数の更新処理（学習処理）が所定のアルゴリズムに基づいて行われる。
【００４１】
このようなエコーキャンセル処理が繰り返し行われる。この場合、従来のようなサンプル処理と違って、時系列で得られる音声信号をフレーム単位でまとめて処理することができるため、ＤＳＰの演算量を大幅に削減できるといったメリットがある。
【００４２】
また、既に説明したように、通話時には双方向で同時に会話することがあるため、所謂ダブルトークが発生する。ダブルトークの期間にフィルタ係数の更新を行うと、フィルタ係数が収束せずに、エコーキャンセル処理を正確に行えなくなる。そこで、ダブルトーク検出器２０８によりダブルトークが検出されたときには、フィルタ係数の更新を中断する必要がある。
【００４３】
ここで、ダブルトーク検出器２０８では、上述した各バッファに蓄積された信号の特性を利用してダブルトーク検出に使用する。ここでは、ダブルトーク検出に使用する信号のバッファエリアを２つ以上に分割し、その分割したエリア毎にダブルトーク検出に使用するパラメータを算出し、音声の立ち上がりかどうかを判定し、音声の立ち上がりの場合にはダブルトークの判定に上記の立ち上がり検出結果を反映させている。
【００４４】
例えば、遠端話者信号用フレームバッファ２１４に格納された遠端話者信号Ｘｓのフレームパワー特性（パワー値またはピーク値）と、残差エコー信号用フレームバッファ２１３に格納された残差エコー信号Ｅｓのフレームパワー特性（パワー値またはピーク値）に基づいてダブルトーク判定を行う場合には、以下のような式で判定を行う。
【００４５】
ｌｏｇＸ＋ｔｈ＜ｌｏｇＥ …（１）
ここで、ｌｏｇＸは遠端話者信号Ｘｓのフレームパワーの対数値、ｌｏｇＥは残差エコー信号Ｅｓのフレームパワーの対数値である。また、ｔｈは可変閾値であり、これは残差エコー信号Ｅｓがフィルタ係数の収束に伴い小さくなっていくため、その変化を補うためのものである。
【００４６】
上記（１）式において、残差エコー信号Ｅｓのエリアつまり残差エコー信号用フレームバッファ２１３を４つに分割し、その４つの分割されたエリア内の信号のパワーをそれぞれＥｐ１、Ｅｐ２、Ｅｐ３、Ｅｐ４とした場合に、Ｅｐ１からＥｐ４の値が、
Ｅｐ１＜Ｅｐ２＜Ｅｐ３＜Ｅｐ４ …（２）
となっている場合には、近端話者信号Ｙｓの音声信号が立ち上がりになっていると判断して、上記（１）式を以下のようにする。
【００４７】
ｌｏｇＸ＋ｔｈ＜ｌｏｇＥ＋ｔｈ２・・・（３）
ｔｈ２は正の可変閾値であり、この可変閾値ｔｈ２を加えることでＥの値で音声の立ち上りのパワーが比較的小さい時にシングルトークと誤判定することを防ぐことができる。つまり、音声の立ち上り時には比較的パワーが低いため、ダブルトークの検出がしづらい。そこで、残差エコー信号Ｅｓの方に可変閾値ｔｈ２を加えることで、ダブルトークを検出しやすくしている。これは、フレーム処理独特の方法であり、従来のようなサンプリング処理ではできない。
【００４８】
なお、上記（３）式では、右辺の式に閾値ｔｈ２を加えるようにしているが、以下の（４）式に示すように、左辺の式から閾値ｔｈ２を引くことでも同じである。要は、音声立ち上がり時に、音声信号のパワーが低いことを考慮した演算を行えば良い。
【００４９】
ｌｏｇＸ＋ｔｈ−ｔｈ２＜ｌｏｇＥ・・・（４）
図１に示す立ち上り検出器２１６は、残差エコー信号用フレームバッファ２１３を上述したように複数のエリアに分割し、各エリア内の信号のパワーを比較することで、上記（２）式の条件を満たす場合に近端話者信号Ｙｓの音声信号が立ち上がりになっているものと判定し、その結果をダブルトーク検出器２０８に知らせている。ダブルトーク検出器２０８では、遠端話者信号用フレームバッファ２１４に格納された遠端話者信号Ｘｓと、残差エコー信号用フレームバッファ２１３に格納された残差エコー信号Ｅｓの各信号のパワー特性を比較することで、上記（１）式の条件を満たす場合にダブルトーク中であると判定する。その際、立ち上り検出器２１６から音声の立ち上りであることが検出されると、ダブルトーク検出器２０８は上記（１）式を上記（３）または（４）式のように置き換えてダブルトークの判定を行う。
【００５０】
このように、ダブルトーク検出処理に音声立ち上がりの検出結果を反映させることで、音声立ち上がり時（信号のパワーが低いとき）であってもダブルトークの期間を正確に検出できるようになる。これにより、フィルタ係数更新部２０７でのフィルタ係数の更新動作を制御して、エコーキャンセル処理を正しく行うことができる。
【００５１】
なお、上記実施形態では、所定時間蓄積された遠端話者信号Ｘｓと近端話者信号Ｙｓから疑似エコー信号Ｙｓｓを差し引いた残差エコー信号Ｅｓのパワー特性を検出することでダブルトーク期間を検出するようにしたが、別の方法として、所定時間蓄積された近端話者信号Ｙｓと遠端話者信号Ｘｓを対象として、それらの音声信号のパワー特性を検出することでも上記同様にダブルトーク期間を検出することが可能である。
【００５２】
また、上記ディジタルフィルタ２０５はフレーム型になっており、音声復号化器２０９内の出力バッファに格納された１フレーム分の遠端話者信号Ｘｓとフィルタ係数更新部２０７によって更新されるフィルタ係数とを用いて、フレームの長さ分一度に処理をしてフレーム長さ分の疑似エコー信号Ｙｓｓを発生させている。このため、ディジタルフィルタ２０５の入力及び出力バッファのアドレスは一定の値を足し引きを繰り返すだけで済み、それに伴い演算量、特にアドレス設定の処理を削減することができる。
【００５３】
また、ディジタルフィルタ２０５におけるフィルタ係数のバッファエリアの並び順をアドレスの低い方から係数の次数の高いものを並べるようにする。
【００５４】
例えば、係数エリアの先頭アドレスを１０ｈ番地とした場合には、係数エリアの１０ｈ番地に係数の一番次数の高い値、つまり、フィルタ次数が２０次であるとすると、１０ｈ番地に２０次番目のデータを配置する。そして、１１ｈ番地には１９次番目のデータ、１２ｈ番地には１８次番目のデータ、１３ｈ番地には１７次番目のデータ…といったように、アドレスの低い方から係数の次数の高い順に配置する。
【００５５】
このように配置することで、フレーム処理する場合に、フィルタ係数も遠端話者信号Ｘｓのためのバッファエリアも１ステップのインクリメント処理をフィルタ次数分行って、その後に上記２つのバッファをフィルタ次数分だけもとに戻してから、遠端話者信号のバッファエリアのみ１ステップインクリメントさせれば良い。これにより、ディジタルフィルタ２０５の２入力のバッファエリアのアドレスポインタがほぼ同様の動きになるために、そのためのアドレス発生器を簡略化することができる。
【００５６】
なお、上記実施形態では、エコーキャンセラ２００内に各フレームバッファ２１１〜２１４を設けたが、これらのフレームバッファ２１１〜２１４を新たに設けなくとも、以下に説明するように音声復号化器２０９および音声符号化器２１０ではフレーム処理のためのバッファが既にあるため、このバッファを利用することもできる。
【００５７】
図２に音声符号化器２１０の一般的な構成を示す。
【００５８】
図２は主にＣＥＬＰ(code excited linear prediction)の符号化部を示したものである。ｃｏｄｅｒへの入力信号はハイパスフィルタ（ＨＰＦ）３０１を通して直流成分が取り除かれる。ハイパスフィルタ３０１の出力はそれぞれ短期相関検出器３０２（例えばＬＰＣまたはＬＳＰパラメータ検出器で構成される）により周波数の包絡特性のパラメータを検出し、一般にその検出したパラメータを量子化したものを多重化部３０６へと渡す。
【００５９】
また、長期相関検出器３０３（例えばピッチ検出器で構成される）によりピッチ成分を検出して、その量子化した値を多重化部３０６へと渡す。さらに、励起信号検出器（例えば雑音コードブック検出器で構成される）３０４により上記長期相関器３０３および短期相関器３０２で求められたフィルタ係数の入力となる励起信号を求め、上記雑音コードブックの番号を多重化部３０６に渡し、多重化部３０６で束ねられた信号を送信する。
【００６０】
上述した音声符号化器２１０の処理、特に短期相関器の処理はある時間分の音声データが必要なためにバッファ３０７が必要となる。このバッファ３０７を上述したエコーキャンセラ２００のためのフレームバッファ２１１〜２１４として利用すれば、必要以上のスタティックなメモリ領域を新たに追加することなく、従来よりもステップ数の少ない音声通信装置を実現することができる。
【００６１】
なお、音声復号化器２０９についても同様であり、その復号化処理に用いられるバッファを利用することが可能である。
【００６２】
このように、音声符号化／復号化装置とエコーキャンセラとを組み合わせた音声通信装置において、エコーキャンセラでのエコー消去処理を音声符号化／復号化装置に合わせたフレーム形式とすることで、各信号に対する処理をまとめて行うことができるため、割り込み処理等の複雑な演算処理を必要とせずに効率的にエコーを消去することができる。
【００６３】
さらに、音声符号化／復号化装置が有するバッファを利用することで、エコーキャンセラでのエコー消去処理に必要なメモリの容量を削減でき、従来よりもステップ数の少ない演算を行うことができる。
【００６４】
【発明の効果】
以上のように本発明によれば、シングルトークと誤判定されやすい音声の立ち上り時でも、音声の立ち上りが検出されたときはよりダブルトークと判定しやすい判定基準を用いてダブルトーク検出を行うため、シングルトークと誤検出されにくくすることができる。
【図面の簡単な説明】
【図１】本発明のエコーキャンセラを備えた音声通信装置の構成を示すブロック図。
【図２】上記音声通信装置に設けられた音声符号化器の構成を示すブロック図。
【図３】従来のエコーキャンセラを備えた音声通信装置の構成を示すブロック図。
【符号の説明】
２０１…スピーカ
２０２…マイクロホン
２０３…Ｄ／Ａ変換器
２０４…Ａ／Ｄ変換器
２０５…ディジタルフィルタ
２０６…減算器
２０７…フィルタ係数更新部
２０８…ダブルトーク検出器
２０９…音声復号化器
２１０…音声符号化器
２１１…近端話者信号用フレームバッファ
２１２…疑似エコー信号用フレームバッファ
２１３…残差エコー信号用フレームバッファ
２１４…遠端話者信号用フレームバッファ
２１５…アドレス発生器
２１６…立ち上がり検出器
３０１…ハイパスフィルタ
３０２…短期相関検出器
３０３…長期相関検出器
３０４…励起信号検出器
３０５…ゲイン検出器
３０６…多重化部
３０７…バッファ
[0001]
[Technical Field of the Invention]
The present invention relates to an echo canceller for suppressing the degradation of call quality caused by echo in portable terminals, such as digital car phones, that compress voice signals for transmission in order to secure a large number of channels, and in particular in so-called hands-free portable terminals that allow a call to be made without holding the handset. The invention also relates to a voice communication apparatus including the echo canceller.
[0002]
[Prior art]
In recent years, digital signal processing (DSP) has attracted attention in the fields of computing and communications and has been applied in a wide range of areas. Because digital signal processing makes it easy to implement complex operations that were difficult with analog processing, such as arbitrary changes to characteristic constants and adaptive processing, it is used as a general-purpose technique, particularly in speech processing and image processing.
[0003]
For example, with a hands-free telephone of the kind used in car phones, the received voice output from the speaker may leak into the microphone during a hands-free call and be sent back to the other party, producing an acoustic echo. To cancel this acoustic echo and maintain call quality, an echo cancellation device, generally called an echo canceller, is used.
[0004]
The echo canceller generates a pseudo echo signal by the digital signal processing described above and subtracts it from the talker's transmission signal input from the microphone, thereby removing the echo component contained in the transmission signal.
[0005]
Furthermore, in wireless mobile telephones, low-bit-rate transmission is performed by a speech encoder that detects the parameters of the voice signal and sends only those parameters, in order to increase the number of communication channels.
[0006]
FIG. 3 is a block diagram showing a configuration of a voice communication apparatus provided with a conventional echo canceller.
[0007]
As shown in FIG. 3, a voice communication apparatus used in a wireless mobile telephone or the like includes a speaker 101, a microphone 102, a D/A converter 103, an A/D converter 104, a speech decoder (DECODER) 109, a speech encoder (CODER) 110, and an echo canceller 100.
[0008]
The speaker 101 outputs the voice of the far-end speaker (the other party), and the microphone 102 picks up the voice of the near-end speaker. The D/A converter 103 converts the far-end speaker signal Xs from the speech decoder 109 from digital to analog and outputs it to the speaker 101. The A/D converter 104 converts the voice input from the microphone 102 from analog to digital to generate a digitized near-end speaker signal Ys. The speech decoder 109 decodes the parameter-only signal received by a receiving circuit (not shown) into a voice signal, generating the far-end speaker signal Xs. The speech encoder 110 detects the parameters of the transmission voice signal from which the echo canceller 100 has removed the echo component, encodes only those parameters, and sends them to a transmission circuit (not shown).
[0009]
In this configuration, the parameter-only received signal picked up by the receiving circuit (not shown) is decoded by the speech decoder 109, converted back to an analog voice signal by the D/A converter 103, and then output as sound from the speaker 101 toward the talker.
[0010]
Meanwhile, the talker's voice is picked up by the microphone 102 and converted into a transmission signal, which is input to the A/D converter 104 and first digitized at a predetermined sampling period. The digitized transmission signal, that is, the near-end speaker signal Ys, has its echo component removed by the echo canceller 100 and is then input to the speech encoder 110, where it is encoded and supplied to a transmission circuit (not shown).
[0011]
Here, the echo canceller 100 includes a digital filter (FIR) 105, a subtracter 106, a filter coefficient update unit (ADP) 107, and a double talk detector (DTD) 108.
[0012]
The digital filter 105 multiplies the far-end speaker signal Xs by filter coefficients to generate a pseudo echo signal Yss, which is supplied to the subtractor 106. The subtractor 106 subtracts the pseudo echo signal Yss from the near-end speaker signal Ys output by the A/D converter 104, cancelling the echo component and producing a residual echo signal Es.
[0013]
Further, the filter coefficient update unit 107 updates the filter coefficient of the digital filter 105. Here, in order to minimize the residual signal that passes through without being erased by the subtractor 106, an arithmetic process for updating a filter coefficient (also referred to as a tap coefficient) is performed. By this filter coefficient update (learning), the transfer function of the digital filter 105 gradually approaches the transfer function of the echo path, and the residual signal level becomes substantially zero when both transfer functions become equal.
[0014]
The filter coefficient update unit 107 cannot update the filter coefficients during so-called double talk, that is, while both parties are talking at the same time. This is because, if the filter coefficients are updated during double talk, they do not converge and the echo component contained in the transmission signal can no longer be cancelled accurately. Therefore, a double-talk detector 108 is generally provided to detect double-talk periods during a call and prevent the echo canceller 100 from diverging.
[0015]
Incidentally, the conventional echo canceller 100 described above is configured to process each output sample of the A/D converter 104, that is, it is sample-based, and it performs the double-talk determination for every sample.
[0016]
Also, when a single DSP (digital signal processor) performs both echo cancellation and speech encoding/decoding, the speech encoding/decoding is generally frame-based whereas the echo cancellation is sample-based, so the echo cancellation interrupts the speech encoding/decoding while it is running. This adds overhead such as saving registers for each interrupt.
[0017]
[Problems to be solved by the invention]
As described above, the speech codec and the echo canceller have conventionally been configured independently (frame processing for the speech codec, sample processing for the echo canceller), so interrupt processing was needed when both were run on a single DSP. If hangover processing was also applied to the double-talk result, sample processing required an interrupt just to change or test the hangover counter, which made the processing load very large.
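As an illustration of the hangover processing mentioned above, the following minimal sketch holds the double-talk decision for a few frames after detection stops, touching the counter once per frame instead of once per sample. The class name `Hangover`, the parameter `hold_frames`, and the per-frame update are assumptions made for illustration; the source does not specify how the hangover counter is implemented.

```python
class Hangover:
    """Keep a detection flag asserted for `hold_frames` frames after it drops.

    Hypothetical helper: the hold length and the per-frame update are
    assumptions, not details taken from the patent text.
    """

    def __init__(self, hold_frames):
        self.hold_frames = hold_frames
        self.counter = 0

    def update(self, detected):
        """Call once per frame with the raw double-talk decision."""
        if detected:
            self.counter = self.hold_frames   # reload the counter
            return True
        if self.counter > 0:
            self.counter -= 1                 # count down during the hold
            return True
        return False


hangover = Hangover(hold_frames=2)
raw = [True, False, False, False]
held = [hangover.update(flag) for flag in raw]
```

In this toy run the raw decision is true for one frame only, and the held decision stays true for two further frames.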
[0018]
In addition, double talk is usually determined by detecting the onset (rise) of speech, but because sample processing makes a decision for each sample, it is difficult to detect signal states such as the rise of speech.
[0019]
The present invention has been made in view of the above, and its object is to provide an echo canceller that can cancel echo components efficiently without requiring complicated processing such as interrupt handling, and a voice communication apparatus including the echo canceller.
[0020]
[Means for Solving the Problems]
The echo canceller of the present invention comprises: first buffer means for accumulating a far-end speaker signal in units of frames; second buffer means for accumulating a near-end speaker signal in units of frames; filter means for generating a pseudo echo signal by multiplying the far-end speaker signal accumulated in the first buffer means by filter coefficients; subtracting means for removing the echo component contained in the near-end speaker signal by subtracting the pseudo echo signal generated by the filter means from the near-end speaker signal accumulated in the second buffer means; filter coefficient updating means for updating the filter coefficients based on a predetermined algorithm; third buffer means for accumulating, in units of frames, the residual echo signal obtained by the subtracting means; rise detection means for reading the residual echo signal accumulated in the third buffer means, dividing it into a plurality of blocks, and detecting a rise of the near-end speaker signal from the change in power from block to block; and double-talk detection means that, once one frame of the residual echo signal has been accumulated in the third buffer means, detects double talk when the relationship between that residual echo signal and the far-end speaker signal accumulated in the first buffer means satisfies either a first double-talk criterion or a second double-talk criterion, the second criterion being set when the rise detection means detects a rise and making double talk easier to detect than the first, and that then suspends the updating of the filter coefficients by the filter coefficient updating means.
[0021]
With this configuration, even at the rise of speech, where double talk is hard to detect, the double-talk detection means uses a criterion that more readily indicates double talk whenever the rise detection means detects a rise, so the state is less likely to be misjudged as single talk.
[0023]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
[0024]
FIG. 1 is a block diagram showing a configuration of a voice communication apparatus provided with an echo canceller of the present invention.
[0025]
The voice communication apparatus according to this embodiment is applied to a wireless mobile telephone such as a car phone. As shown in FIG. 1, the apparatus includes a speaker 201, a microphone 202, a D/A converter 203, an A/D converter 204, a speech decoder (DECODER) 209, a speech encoder (CODER) 210, and an echo canceller 200.
[0026]
The speaker 201 outputs the voice of the far-end speaker (the other party), and the microphone 202 picks up the voice of the near-end speaker. The D/A converter 203 converts the far-end speaker signal Xs from the speech decoder 209 from digital to analog and outputs it to the speaker 201. The A/D converter 204 converts the voice input from the microphone 202 from analog to digital to generate a digitized near-end speaker signal Ys. The speech decoder 209 decodes the parameter-only signal received by a receiving circuit (not shown) into a voice signal, generating a time-series far-end speaker signal Xs. The speech encoder 210 detects the parameters of the transmission voice signal from which the echo canceller 200 has removed the echo component and encodes only those parameters.
[0027]
Here, the echo canceller 200 includes a digital filter (FIR) 205, a subtractor 206, a filter coefficient update unit (ADP) 207, and a double-talk detector (DTD) 208. In addition, in this embodiment, frame buffers (BUFF) 211 to 214, an address generator 215, and a rise detector 216 are provided so that the echo canceller 200 can perform echo cancellation on a frame basis.
[0028]
The digital filter 205 multiplies the sample sequence of the far-end speaker signal Xs by the respective filter coefficients to generate a pseudo echo signal Yss, which is supplied to the subtractor 206. The subtractor 206 subtracts the pseudo echo signal Yss from the near-end speaker signal Ys output by the A/D converter 204 to cancel the echo component, and generates the residual echo signal Es as the remaining component.
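The relationship between Xs, Yss, Ys, and Es can be sketched for a single sample as follows. This is a minimal illustration that assumes a plain FIR convolution; the function names, the list-based history, and the toy echo path are hypothetical and are not taken from the source.

```python
def pseudo_echo(x_history, coeffs):
    """Pseudo echo Yss[n] = sum over k of h[k] * Xs[n-k].

    x_history[0] is the newest far-end sample Xs[n], x_history[1] is Xs[n-1],
    and so on (an assumed layout for this sketch).
    """
    return sum(h * x for h, x in zip(coeffs, x_history))


def residual(y_sample, x_history, coeffs):
    """Residual echo Es[n] = Ys[n] - Yss[n]."""
    return y_sample - pseudo_echo(x_history, coeffs)


# Toy case: the filter matches the echo path and the near-end speaker is silent.
echo_path = [0.5, 0.25]            # assumed two-tap echo path
x_hist = [2.0, 4.0]                # Xs[n], Xs[n-1]
y = 0.5 * 2.0 + 0.25 * 4.0         # the microphone picks up only the echo
es = residual(y, x_hist, echo_path)
```

When the filter coefficients equal the echo path, the residual is zero, which matches the statement that the residual level vanishes once the two transfer functions agree.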
[0029]
The filter coefficient update unit 207 updates the filter coefficient of the digital filter 205 based on a predetermined algorithm. The filter coefficient updating unit 207 performs update processing of a filter coefficient (also referred to as a tap coefficient) for minimizing a residual signal that passes through without being erased by the subtractor 206.
[0030]
As a filter coefficient update algorithm, for example, a learning identification method (NLMS) obtained by normalizing the least mean square method (LMS) is used. This algorithm based on the learning identification method has the advantage that it requires a relatively small amount of calculation and exhibits good characteristics. By this filter coefficient update (learning), the transfer function of the digital filter 205 gradually approaches the transfer function of the echo path, and the residual signal level becomes substantially zero when both transfer functions become equal.
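For readers unfamiliar with NLMS, the sketch below shows one common textbook form of the update, in which the LMS step is divided by the energy of the filter input. The step size `mu`, the regularization term `eps`, the two-tap echo path, and the random excitation are all assumptions chosen for illustration; the source only names the algorithm.

```python
import random


def nlms_update(coeffs, x_history, error, mu=0.5, eps=1e-8):
    """One NLMS step: h <- h + mu * e * x / (x.x + eps).

    mu and eps are illustrative values, not values from the patent.
    """
    energy = sum(x * x for x in x_history) + eps
    step = mu * error / energy
    return [h + step * x for h, x in zip(coeffs, x_history)]


rng = random.Random(0)
echo_path = [0.5, -0.3]     # "unknown" echo path of the toy example
h = [0.0, 0.0]              # adaptive filter coefficients
x_hist = [0.0, 0.0]         # newest far-end sample first

for _ in range(500):
    x_hist = [rng.uniform(-1.0, 1.0)] + x_hist[:-1]
    y = sum(g * x for g, x in zip(echo_path, x_hist))    # echo only, no near-end speech
    e = y - sum(c * x for c, x in zip(h, x_hist))        # residual echo
    h = nlms_update(h, x_hist, e)
```

With no near-end speech the coefficients approach the echo path and the residual shrinks toward zero, which is the single-talk behavior the text describes.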
[0031]
The double talk detector 208 detects the double talk period and prevents the echo canceller 200 from diverging.
[0032]
Each frame buffer (BUFF) 211 to 214 is for accumulating each signal for a predetermined time in units of frames. The length of the frame corresponds to the signal processing unit in speech decoder 209 and speech encoder 210.
[0033]
The frame buffer 211 is a frame buffer for near-end speaker signals. It is provided before the subtractor 206 and temporarily stores one frame of the near-end speaker signal Ys, which is the output of the A/D converter 204. The frame buffer 212 is a frame buffer for pseudo echo signals. It is provided before the subtractor 206 and temporarily stores one frame of the pseudo echo signal Yss, which is the output of the digital filter 205. The frame buffer 213 is a frame buffer for residual echo signals. It is provided after the subtractor 206 and temporarily stores one frame of the residual echo signal Es, which is the output of the subtractor 206. The frame buffer 214 is a frame buffer for far-end speaker signals. It is provided after the speech decoder 209 and temporarily stores one frame of the far-end speaker signal Xs, which is the output of the speech decoder 209.
[0034]
The address generator 215 automatically increments the address pointers for the near-end speaker signal frame buffer 211, the pseudo echo signal frame buffer 212, and the residual echo signal frame buffer 213, which are used in the subtraction performed by the subtractor 206. Following the address pointers incremented by the address generator 215, the near-end speaker signal Ys and the pseudo echo signal Yss are loaded one frame at a time into predetermined areas of the frame buffers 211 and 212, respectively, and the residual echo signal Es obtained by the subtraction is loaded into a predetermined area of the output buffer of the subtractor 206, that is, the frame buffer 213.
[0035]
The rise detector 216 detects the rise of the near-end speaker signal Ys. As described later, the rise detector 216 divides the buffer area used for rise detection into a plurality of areas, calculates for each area the signal parameter used for double-talk detection, and determines that the voice signal is rising when the parameters appear in ascending order. The result of this rise detection is reflected in the double-talk detector 208.
[0036]
Next, the processing operation of this apparatus will be described.
[0037]
The parameter-only received signal picked up by a receiving circuit (not shown) is decoded by the speech decoder 209 and converted into a time-series far-end speaker signal Xs. The far-end speaker signal Xs is converted back to an analog voice signal by the D/A converter 203 and supplied to the speaker 201. As a result, the voice of the far-end speaker (the other party) is output from the speaker 201 toward the near-end speaker.
[0038]
Meanwhile, the voice of the near-end speaker is picked up by the microphone 202 and converted into a transmission voice signal, which is input to the A/D converter 204 and first digitized at a predetermined sampling period. The digitized transmission signal, that is, the near-end speaker signal Ys, has its echo component removed by the echo canceller 200 and is then input to the speech encoder 210, where it is encoded and supplied to a transmission circuit (not shown).
[0039]
Here, the echo canceller 200 is configured as a frame-based echo canceller: it performs its processing while storing each signal needed for echo cancellation in the frame buffers 211 to 214 in units of frames, the frame being the processing unit of the speech decoder 209 and the speech encoder 210.
[0040]
That is, the near-end speaker signal Ys output by the A/D converter 204 and the pseudo echo signal Yss output by the digital filter 205 are each stored for one frame in the frame buffer 211 and the frame buffer 212, respectively, and are then subtracted by the subtractor 206. This subtraction removes the echo component contained in the near-end speaker signal Ys. The residual echo signal Es obtained as the result is stored in the frame buffer 213 and supplied to the filter coefficient update unit 207, where the filter coefficients are updated (learned) according to a predetermined algorithm.
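The frame-based flow can be sketched as below: one frame of the far-end signal is filtered into a pseudo-echo frame, which is then subtracted from the near-end frame in a single pass. The function name `cancel_frame`, the `far_tail` argument that carries filter history across frames, and the toy values are assumptions for illustration; the source does not describe how history is carried between frames.

```python
def cancel_frame(near_frame, far_frame, far_tail, coeffs):
    """Process one frame and return (pseudo-echo frame, residual frame).

    far_tail holds the last len(coeffs) - 1 far-end samples of the previous
    frame so that the filter has history for the first samples of this frame
    (an assumed convention for this sketch).
    """
    taps = len(coeffs)
    history = list(far_tail) + list(far_frame)
    yss = []                                  # plays the role of frame buffer 212
    for n in range(len(far_frame)):
        newest = n + taps - 1                 # index of Xs[n] inside history
        yss.append(sum(coeffs[k] * history[newest - k] for k in range(taps)))
    es = [y - ye for y, ye in zip(near_frame, yss)]   # frame buffer 213
    return yss, es


coeffs = [0.5, 0.25]                  # assumed converged filter
far = [1.0, 2.0, 3.0, 4.0]            # one frame of Xs (frame buffer 214)
near = [0.5, 1.25, 2.5, 3.75]         # echo plus near-end speech (frame buffer 211)
yss, es = cancel_frame(near, far, far_tail=[0.0], coeffs=coeffs)
```

Here the residual frame contains only the near-end contribution (0.5 and 1.0 in the last two samples), because the assumed filter matches the echo path exactly.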
[0041]
Such echo cancel processing is repeatedly performed. In this case, unlike the conventional sample processing, since the audio signals obtained in time series can be processed in units of frames, there is an advantage that the DSP calculation amount can be greatly reduced.
[0042]
Further, as already described, both parties may talk at the same time during a call, which is so-called double talk. If the filter coefficients are updated during double talk, they do not converge and echo cancellation can no longer be performed accurately. Therefore, when the double-talk detector 208 detects double talk, the update of the filter coefficients must be suspended.
[0043]
Here, the double-talk detector 208 detects double talk using the characteristics of the signals accumulated in the buffers described above. Specifically, the buffer area of the signal used for double-talk detection is divided into two or more areas, the parameter used for double-talk detection is calculated for each area to determine whether the speech is rising, and, if it is, the rise detection result is reflected in the double-talk determination.
[0044]
For example, when the double-talk determination is based on the frame power characteristic (power value or peak value) of the far-end speaker signal Xs stored in the far-end speaker signal frame buffer 214 and the frame power characteristic (power value or peak value) of the residual echo signal Es stored in the residual echo signal frame buffer 213, the determination uses the following expression.
[0045]
logX + th < logE ... (1)
Here, logX is the logarithm of the frame power of the far-end speaker signal Xs, and logE is the logarithm of the frame power of the residual echo signal Es. The term th is a variable threshold that compensates for the fact that the residual echo signal Es becomes smaller as the filter coefficients converge.
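Expression (1) can be sketched as a frame-level test as follows. The use of mean-square power, a base-10 logarithm scaled to decibels, the small `floor` that avoids log(0), and the value of `th` are all assumptions for illustration; the source only states that logX and logE are logarithms of frame power.

```python
import math


def log_power(frame, floor=1e-12):
    """Frame power in dB (mean square); the dB scaling is an assumption."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + floor)


def is_double_talk(far_frame, residual_frame, th):
    """Expression (1): logX + th < logE."""
    return log_power(far_frame) + th < log_power(residual_frame)


far = [1.0] * 8                  # far-end frame, about 0 dB
echo_only = [0.01] * 8           # small residual, about -40 dB
with_near_speech = [0.5] * 8     # residual dominated by near-end speech, about -6 dB
th = -20.0                       # hypothetical threshold in dB

single = is_double_talk(far, echo_only, th)
double = is_double_talk(far, with_near_speech, th)
```

With these toy values the echo-only frame is classified as single talk and the frame containing near-end speech as double talk.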
[0046]
With respect to expression (1), suppose that the area of the residual echo signal Es, that is, the residual echo signal frame buffer 213, is divided into four, and that the powers of the signals in the four divided areas are Ep1, Ep2, Ep3, and Ep4, respectively. If the values Ep1 to Ep4 satisfy
Ep1 < Ep2 < Ep3 < Ep4   (2)
it is determined that the voice of the near-end speaker signal Ys is rising, and expression (1) is changed as follows.
[0047]
logX + th < logE + th2   (3)
Here, th2 is a positive variable threshold value. Adding th2 prevents double talk from being erroneously determined as single talk when the power at the voice rise, reflected in the value of logE, is relatively small. That is, because the power is relatively low when the voice rises, double talk is difficult to detect; adding the variable threshold th2 on the residual echo signal Es side makes double talk easier to detect. This method is peculiar to frame processing and cannot be carried out with conventional sample-by-sample processing.
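The rise test of expression (2), which triggers the switch to expression (3), might be sketched as follows. The helper names, the use of mean-square power per block, and the handling of frame lengths that are not a multiple of the block count (trailing samples are ignored) are assumptions of this example, not details given in the specification.

```python
def block_powers(frame, num_blocks=4):
    """Split a frame into equal blocks and return the mean-square power of each."""
    size = len(frame) // num_blocks  # trailing samples, if any, are ignored
    return [
        sum(s * s for s in frame[i * size:(i + 1) * size]) / size
        for i in range(num_blocks)
    ]

def is_rising(es_frame, num_blocks=4):
    """Expression (2): True when Ep1 < Ep2 < ... < EpN holds strictly."""
    ep = block_powers(es_frame, num_blocks)
    return all(a < b for a, b in zip(ep, ep[1:]))

# Hypothetical residual echo frame whose amplitude grows block by block.
rising_frame = [0.1] * 4 + [0.2] * 4 + [0.4] * 4 + [0.8] * 4
print(is_rising(rising_frame))   # block powers increase
print(is_rising([0.5] * 16))     # block powers are flat
```

Because the test needs several blocks of the same frame, it relies on a whole frame being available at once, which is consistent with the remark that the method is peculiar to frame processing.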
[0048]
In expression (3), the threshold value th2 is added to the right side, but subtracting th2 from the left side, as in the following expression (4), is equivalent. In short, it suffices to perform a calculation that takes into account the low power of the speech signal when the voice rises.
[0049]
logX + th - th2 < logE   (4)
The rising detector 216 shown in FIG. 1 divides the residual echo signal frame buffer 213 into a plurality of areas as described above and compares the signal power of each area. When the condition of expression (2) is satisfied, it determines that the voice of the near-end speaker signal Ys is rising and notifies the double talk detector 208 of the result. The double talk detector 208 compares the power characteristics of the far-end speaker signal Xs stored in the far-end speaker signal frame buffer 214 and of the residual echo signal Es stored in the residual echo signal frame buffer 213, and determines that double talk is occurring when the condition of expression (1) is satisfied. When the rising detector 216 reports that the voice is rising, the double talk detector 208 uses expression (3) or (4) in place of expression (1) to determine double talk.
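The cooperation between the rising detector 216 and the double talk detector 208 can be illustrated with the following minimal sketch. It assumes mean-square frame power, a base-10 logarithm, four blocks, and hypothetical function names and threshold values; none of these are prescribed by the specification.

```python
import math

def log_power(samples, eps=1e-12):
    """Logarithm of the mean-square power of a frame (eps avoids log(0))."""
    return math.log10(sum(s * s for s in samples) / len(samples) + eps)

def rising_detected(es_frame, num_blocks=4):
    """Role of rising detector 216: check Ep1 < Ep2 < ... as in expression (2)."""
    size = len(es_frame) // num_blocks
    ep = [sum(s * s for s in es_frame[i * size:(i + 1) * size]) / size
          for i in range(num_blocks)]
    return all(a < b for a, b in zip(ep, ep[1:]))

def detect_double_talk(xs_frame, es_frame, th, th2):
    """Role of double talk detector 208: expression (1), or (3) when the voice rises."""
    margin = th2 if rising_detected(es_frame) else 0.0
    return log_power(xs_frame) + th < log_power(es_frame) + margin

xs = [1.0] * 16
es_rise = [0.05] * 4 + [0.1] * 4 + [0.2] * 4 + [0.4] * 4
print(detect_double_talk(xs, es_rise, th=-1.0, th2=0.5))  # expression (3) applies
print(detect_double_talk(xs, es_rise, th=-1.0, th2=0.0))  # same frame without the margin
```

With these example values, the rising frame is judged as double talk only when the margin th2 is applied, which is the situation the relaxed criterion is meant to cover.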
[0050]
Thus, by reflecting the result of voice rise detection in the double talk detection processing, the double talk period can be detected accurately even at a voice rise, when the signal power is low. As a result, the filter coefficient update operation of the filter coefficient update unit 207 can be controlled so that echo cancellation is performed correctly.
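How the double talk decision might gate the coefficient update can be sketched as below. The specification only refers to a predetermined update algorithm; the normalized LMS (NLMS) rule used here, the step size `mu`, and the function name are assumptions chosen for illustration.

```python
def update_coefficients(coeffs, xs_history, es_sample, double_talk,
                        mu=0.5, eps=1e-9):
    """Return updated FIR coefficients, or an unchanged copy during double talk.

    coeffs[k] weights xs_history[k], the far-end sample k steps in the past.
    NLMS is an assumed stand-in for the unspecified update algorithm.
    """
    if double_talk:
        return list(coeffs)  # update interrupted while double talk is detected
    energy = sum(x * x for x in xs_history) + eps
    step = mu * es_sample / energy
    return [h + step * x for h, x in zip(coeffs, xs_history)]

coeffs = [0.0, 0.0]
print(update_coefficients(coeffs, [1.0, 0.0], 1.0, double_talk=False))
print(update_coefficients(coeffs, [1.0, 0.0], 1.0, double_talk=True))
```

Freezing the coefficients rather than resetting them keeps the echo path estimate obtained so far, so cancellation can continue through the double talk period.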
[0051]
In the above embodiment, the double talk period is detected from the power characteristics of the far-end speaker signal Xs and of the residual echo signal Es, which is obtained by subtracting the pseudo echo signal Yss from the near-end speaker signal Ys, each accumulated for a predetermined time. As another method, however, the double talk period can also be detected in the same manner from the power characteristics of the near-end speaker signal Ys and the far-end speaker signal Xs accumulated for a predetermined time.
[0052]
The digital filter 205 is of a frame type. Using the far-end speaker signal Xs for one frame stored in the output buffer of the speech decoder 209 and the filter coefficient updated by the filter coefficient update unit 207, it generates the pseudo echo signal Yss for one frame length by processing the whole frame at once. For this reason, the input and output buffer addresses of the digital filter 205 need only be repeatedly added to and subtracted from, and the amount of calculation, particularly for address setting, is reduced accordingly.
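A frame-type FIR filter of this kind, together with the frame-wise subtraction that yields the residual echo, might look like the following sketch. The function names, the zero padding of missing history, and the example values are assumptions for illustration only.

```python
def pseudo_echo_frame(xs_history, xs_frame, coeffs):
    """Generate one frame of pseudo echo Yss from far-end samples.

    xs_history: far-end samples preceding the frame (most recent last)
    xs_frame:   far-end samples of the current frame
    coeffs:     coeffs[k] weights the sample k steps in the past
    """
    taps = len(coeffs)
    pad = taps - 1
    combined = [0.0] * pad + list(xs_history)   # zero-pad if history is short
    history = combined[len(combined) - pad:]    # keep the last taps - 1 samples
    line = history + list(xs_frame)
    yss = []
    for n in range(len(xs_frame)):
        newest = n + pad                        # position of sample n in line
        yss.append(sum(coeffs[k] * line[newest - k] for k in range(taps)))
    return yss

def residual_frame(ys_frame, yss_frame):
    """Residual echo Es = Ys - Yss, computed for the whole frame."""
    return [y - e for y, e in zip(ys_frame, yss_frame)]

yss = pseudo_echo_frame([0.0], [1.0, 0.0, 0.0], [0.5, 0.25])
print(yss)
print(residual_frame([1.0, 0.25, 0.0], yss))
```

Processing the whole frame in one call means the loop bounds and buffer positions are set up once per frame instead of once per sample.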
[0053]
Further, the filter coefficients in the coefficient buffer area of the digital filter 205 are arranged so that the highest-order coefficient is at the lowest address, followed by the lower-order coefficients in descending order.
[0054]
For example, if the top address of the coefficient area is 10h and the highest order of the coefficients, that is, the filter order, is 20, the 20th-order data is placed at address 10h. Then the 19th-order data is placed at address 11h, the 18th-order data at address 12h, the 17th-order data at address 13h, and so on.
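The address assignment in this example can be expressed as a simple mapping. The function name and the assumption that orders are numbered from 1 up to the filter order are introduced here for illustration and are not stated in the specification.

```python
def coefficient_address(order, top_address=0x10, filter_order=20):
    """Address of the coefficient of a given order when the highest order
    is stored at the top (lowest) address of the coefficient area."""
    if not 1 <= order <= filter_order:   # assumed numbering: 1..filter_order
        raise ValueError("order out of range")
    return top_address + (filter_order - order)

for order in (20, 19, 18, 17):
    print(order, hex(coefficient_address(order)))
```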
[0055]
With this arrangement, when frame processing is performed, the pointers to both the filter coefficient buffer area and the far-end speaker signal Xs buffer area are incremented one step at a time for as many steps as the filter order; both pointers are then returned by the filter order, after which only the pointer to the far-end speaker signal buffer area needs to be incremented by one step. As a result, the address pointers of the two input buffer areas of the digital filter 205 operate in substantially the same manner, and the address generator for them can be simplified.
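One possible reading of this pointer movement is simulated below with list indices standing in for address pointers. The function name, the time-ordered layout of the far-end buffer, and the example values are assumptions; the sketch is meant only to show that the two pointers step in lockstep.

```python
def frame_filter_with_pointers(coeff_area, xs_area, frame_len):
    """Filter one frame using two pointers that move together.

    coeff_area: coefficients stored highest order first (lowest address)
    xs_area:    far-end samples in time order, oldest first; its length is
                assumed to be len(coeff_area) - 1 + frame_len
    """
    taps = len(coeff_area)
    coeff_ptr = 0
    xs_ptr = 0
    yss = []
    for _ in range(frame_len):
        acc = 0.0
        for _ in range(taps):        # both pointers advance one step at a time
            acc += coeff_area[coeff_ptr] * xs_area[xs_ptr]
            coeff_ptr += 1
            xs_ptr += 1
        coeff_ptr -= taps            # both pointers return by the filter order
        xs_ptr -= taps
        xs_ptr += 1                  # only the far-end pointer moves on by one
        yss.append(acc)
    return yss

# Coefficients 0.5 (newest sample) and 0.25 (older sample), stored highest order first.
print(frame_filter_with_pointers([0.25, 0.5], [0.0, 1.0, 0.0, 0.0], 3))
```

Storing the highest-order coefficient at the lowest address is what lets both pointers count upward; with the opposite layout one pointer would have to count down while the other counts up.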
[0056]
In the above-described embodiment, the frame buffers 211 to 214 are provided in the echo canceller 200. However, these frame buffers 211 to 214 need not be newly provided: as described below, the speech decoder 209 and the speech encoder 210 already have buffers for frame processing, and these buffers can be shared.
[0057]
FIG. 2 shows a general configuration of speech encoder 210.
[0058]
FIG. 2 mainly shows the encoding section of CELP (code excited linear prediction). The DC component is removed from the input signal to the coder by a high-pass filter (HPF) 301. From the output of the high-pass filter 301, a short-term correlation detector 302 (for example, constituted by an LPC or LSP parameter detector) detects a parameter representing the frequency envelope characteristic, and in general the quantized value of the detected parameter is passed to the multiplexing unit 306.
[0059]
Further, the pitch component is detected by the long-term correlation detector 303 (for example, constituted by a pitch detector), and its quantized value is passed to the multiplexing unit 306. In addition, the excitation signal to be input to the filters whose coefficients are obtained by the long-term correlation detector 303 and the short-term correlation detector 302 is determined by an excitation signal detector 304 (for example, constituted by a noise codebook detector), and the noise codebook number is passed to the multiplexing unit 306. The signals bundled by the multiplexing unit 306 are then transmitted.
[0060]
The above-described processing of the speech encoder 210, particularly that of the short-term correlation detector 302, requires a certain amount of speech data, so the buffer 307 is necessary. If this buffer 307 is also used as the frame buffers 211 to 214 of the echo canceller 200 described above, a voice communication apparatus with fewer processing steps than the prior art can be realized without adding more static memory area than necessary.
[0061]
The same applies to the speech decoder 209: the buffer used for its decoding process can be shared in the same way.
[0062]
In this way, in a voice communication apparatus that combines a speech encoding/decoding device and an echo canceller, making the echo cancellation processing in the echo canceller frame-based, to match the speech encoding/decoding device, allows each signal to be handled in units of frames. Therefore, the echo can be cancelled efficiently without requiring complicated processing such as interrupt processing.
[0063]
Furthermore, by using a buffer included in the speech encoding / decoding device, it is possible to reduce the memory capacity required for echo cancellation processing in the echo canceller, and to perform operations with fewer steps than in the past.
[0064]
[Effect of the Invention]
As described above, according to the present invention, even at a voice rise, which is easily misjudged as single talk, double talk detection can be performed using a criterion that makes double talk easier to determine once the voice rise is detected.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a voice communication apparatus including an echo canceller according to the present invention.
FIG. 2 is a block diagram showing a configuration of a speech encoder provided in the speech communication apparatus.
FIG. 3 is a block diagram showing a configuration of a voice communication apparatus provided with a conventional echo canceller.
[Explanation of symbols]
201 ... speaker
202 ... Microphone
203 ... D / A converter
204 ... A / D converter
205 ... Digital filter
206 ... subtractor
207 ... Filter coefficient update unit
208 ... Double talk detector
209 ... Voice decoder
210 ... Speech encoder
211 ... Near-end speaker signal frame buffer
212 ... Pseudo echo signal frame buffer
213 ... Frame buffer for residual echo signal
214 ... Frame buffer for far-end speaker signal
215 ... Address generator
216 ... Rising detector
301 ... High-pass filter
302 ... Short-term correlation detector
303 ... Long-term correlation detector
304 ... Excitation signal detector
305 ... Gain detector
306 ... Multiplexer
307 ... Buffer

Claims

An echo canceller comprising:
first buffer means for storing a far-end speaker signal in units of frames;
second buffer means for storing a near-end speaker signal in units of frames;
filter means for generating a pseudo echo signal by multiplying the far-end speaker signal accumulated in the first buffer means by a filter coefficient;
subtracting means for removing an echo component contained in the near-end speaker signal by subtracting the pseudo echo signal generated by the filter means from the near-end speaker signal stored in the second buffer means;
filter coefficient updating means for updating the filter coefficient based on a predetermined algorithm;
third buffer means for storing the residual echo signal obtained by the subtracting means in units of frames;
rising edge detecting means for reading the residual echo signal accumulated in the third buffer means, dividing the residual echo signal into a plurality of blocks, and detecting a rising edge of the near-end speaker signal from a change in power for each block; and
double talk detecting means for, in response to the accumulation of one frame of the residual echo signal in the third buffer means, detecting double talk when the relationship between the residual echo signal and the far-end speaker signal accumulated in the first buffer means satisfies a first double talk determination criterion or a second double talk determination criterion, the second criterion being set in response to detection of a rising edge by the rising edge detecting means and making double talk easier to determine than the first criterion, and for interrupting the updating of the filter coefficient by the filter coefficient updating means.

The echo canceller according to claim 1, wherein the subtracting means performs the subtraction while increasing the address pointers of the buffer areas that store the near-end speaker signal, the pseudo echo signal, and the residual echo signal by a fixed amount for each operation.

The echo canceller according to claim 1, wherein the filter means has a coefficient buffer area, and in the coefficient buffer area the filter coefficients are arranged from the lowest address in order starting with the highest-order coefficient.

A voice communication apparatus comprising a speech encoder/decoder that receives and decodes a far-end speaker signal and encodes and transmits a near-end speaker signal, and an echo canceller that removes an echo component included in the near-end speaker signal,
wherein the echo canceller comprises:
first buffer means for storing the far-end speaker signal in units of frames;
second buffer means for storing the near-end speaker signal in units of frames;
filter means for generating a pseudo echo signal by multiplying the far-end speaker signal accumulated in the first buffer means by a filter coefficient;
subtracting means for removing the echo component contained in the near-end speaker signal by subtracting the pseudo echo signal generated by the filter means from the near-end speaker signal stored in the second buffer means;
filter coefficient updating means for updating the filter coefficient based on a predetermined algorithm;
third buffer means for storing the residual echo signal obtained by the subtracting means in units of frames;
rising edge detecting means for reading the residual echo signal accumulated in the third buffer means, dividing the residual echo signal into a plurality of blocks, and detecting a rising edge of the near-end speaker signal from a change in power for each block; and
double talk detecting means for, in response to the accumulation of one frame of the residual echo signal in the third buffer means, detecting double talk when the relationship between the residual echo signal and the far-end speaker signal accumulated in the first buffer means satisfies a first double talk determination criterion or a second double talk determination criterion, the second criterion being set in response to detection of a rising edge by the rising edge detecting means and making double talk easier to determine than the first criterion, and for interrupting the updating of the filter coefficient by the filter coefficient updating means.

A voice communication apparatus comprising: buffer means for storing a far-end speaker signal, a near-end speaker signal, and a residual echo signal in units of frames; a speech decoder for receiving and decoding the far-end speaker signal stored in the buffer means; a speech encoder for encoding and transmitting the near-end speaker signal stored in the buffer means; and an echo canceller for removing an echo component included in the near-end speaker signal,
wherein the echo canceller comprises:
filter means for generating a pseudo echo signal by multiplying the far-end speaker signal accumulated in the buffer means by a filter coefficient;
subtracting means for removing the echo component contained in the near-end speaker signal by subtracting the pseudo echo signal generated by the filter means from the near-end speaker signal stored in the buffer means;
filter coefficient updating means for updating the filter coefficient based on a predetermined algorithm;
rising edge detecting means for reading the residual echo signal accumulated in the buffer means, dividing the residual echo signal into a plurality of blocks, and detecting a rising edge of the near-end speaker signal from a change in power for each block; and
double talk detecting means for, in response to the accumulation of one frame of the residual echo signal in the buffer means, detecting double talk when the relationship between the residual echo signal and the far-end speaker signal satisfies a first double talk determination criterion or a second double talk determination criterion, the second criterion being set in response to detection of a rising edge by the rising edge detecting means and making double talk easier to determine than the first criterion, and for interrupting the updating of the filter coefficient by the filter coefficient updating means.