JP3847989B2

JP3847989B2 - Signal extraction device

Info

Publication number: JP3847989B2
Application number: JP34806498A
Authority: JP
Inventors: 龍池沢; 章中村; 哲夫梅田
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 1998-12-08
Filing date: 1998-12-08
Publication date: 2006-11-22
Anticipated expiration: 2018-12-08
Also published as: JP2000172297A

Abstract

PROBLEM TO BE SOLVED: To extract, in real time, only the speech signal component from a one-channel signal in which a speech signal and a non-speech signal are mixed, by using the input signal shifted by a fixed time and a high-frequency-emphasized signal. SOLUTION: A delayed signal xj is supplied to an adaptive filter 1, whose filter coefficient Wj is successively updated by the output of an adaptive filter coefficient successive update unit 2. The output signal zj of the adaptive filter 1 is applied to the subtrahend terminal of a subtracter 3 and subtracted from the undelayed signal yj applied to the minuend terminal of the subtracter 3; the output of the subtracter 3 is the output signal ej of the adaptive processing circuit. The output signal ej and the tap output signals xj of the adaptive filter 1 are supplied to the update unit 2, which successively updates and outputs the filter coefficient using an algorithm based on the learning identification method. The filter coefficient is updated so that the difference ej between the output signal zj and the reference signal yj becomes small.

Description

【０００１】
【発明の属する技術分野】
本発明は、音声信号を収音する場合に、なんらかの事情で、不必要な音響信号が混入して収音された場合、収音された信号から、所望の音声信号のみをリアルタイムで抽出する信号抽出方法および装置ならびに信号抽出プログラムを記録した媒体に関する。
【０００２】
【従来の技術】
従来の雑音除去手法としては、以下に説明する、いわゆる１入力法と２入力法の手法が知られている。
まず、１入力法の手法としては、例えば、文献 S.F.Boll,“Suppression of Acoustic Noise in Speech Using Spectral Subtraction”IEEE trans., Vol. ASSP-27, No.2, pp.113-120, April(1979)に記載のスペクトルサブトラクション法を挙げることができる。この手法は、既知のあるいは無音区間から推定される雑音の短時間パワースペクトルを入力信号のスペクトルから差し引くことにより、所望の音声信号の短時間振幅スペクトルを得る手法である。
【０００３】
次に、２入力法の手法としては、例えば、文献 B. Widrow et al.,“ Adaptive Noise Cancelling : Principles and Applications”,Proc.IEEE, Vol. 63, No.12, pp.1692-1716(1975)に記載の適応信号処理手法が挙げられる。この手法は、音声と雑音が混在する入力信号のほかに、参照信号として雑音成分のみが別に利用できる場合に、雑音の除去が可能な手法である。参照信号の収音系は、入力信号の収音系と異なるのが一般的であるため、参照信号は入力信号に含まれる雑音成分と一致せず、入力信号から音声信号を抽出するには、参照信号と入力信号に含まれる雑音成分とをできる限り一致させるためのフィルタ処理を施して差分をとることが必要である。このフィルタには一般に適応フィルタが用いられている。
【０００４】
すなわち、この適応信号処理手法は適応フィルタを用いる信号処理手法であり、適応フィルタの制御アルゴリズムとして、上記文献 B. Widrow et al.,“ Adaptive Noise Cancelling : Principles and Applications”,Proc.IEEE, Vol. 63, No.12, pp.1692-1716(1975)に記載のＬＭＳ法や、文献 J. Nagumo et al.,“A Learning Method for System Identification ”,IEEE Trans., Vol.AC-12, No.3, pp. 282-287 JUNE (1967)に記載の学習同定法などがある。ここで適応フィルタとは、入力信号と出力すべき信号を与えると、適応フィルタの係数を自動的に逐次修正し、次第に希望出力に近い信号を出力するようになっていく、いわば一種の学習機能を具えたフィルタである。いずれのアルゴリズムもフィルタ出力と入力との差分をフィルタにフィードバックしてフィルタ係数の値を収束させていくアルゴリズムであり、学習同定法は差分値を正規化している点でＬＭＳ法と異なっている。ＬＭＳ法は単純で計算量が少なく、そのため、装置化も簡単であるという利点がある一方、差分を正規化していないため、入力信号振幅への依存性が大きく、収束が緩やかで速やかな適応を必要とする用途には不向きである。一方、学習同定法はＬＭＳ法の改良版であり、正規化するための計算量は多少増加するが、収束はＬＭＳ法よりはるかに速い。
【０００５】
【発明が解決しようとする課題】
放送、特に報道関連で現場から持ち込まれる素材は、誘拐事件の犯人の声、飛行機のコックピット内の会話音声などＳＮ比が悪く、しかもほとんどの場合、参照信号のない１入力信号の形態である。
緊急を要する場合には、このような信号を対象にリアルタイムで雑音を除去するニーズが生まれるが、次に述べるように、従来技術ではこのような要求に応えることはできなかった。
【０００６】
従来技術としてのスペクトルサブトラクション法では、あらかじめ識別した無音区間を切り出してこれを用いて雑音を除去している。従って、雑音成分が時間的に変動する場合については、その都度、無音区間の検出をやりなおすなど、リアルタイム処理に対する適応性が乏しい。すなわち、従来の１入力法では、パラメータの適応的な調整ができず、時々刻々と変化する１チャンネルの入力信号に対し、所望の信号成分のみを時間的に追従しながらフィルタリングすることは困難である。
【０００７】
また、上述の２入力法の適応信号処理手法では、リアルタイム処理することはできても、このケースでは音声と雑音の混合された１チャンネルの信号しか与えられていないから用いることができない。すなわち、適応信号処理により、時間的な追随をしようとすると、参照信号として、雑音成分のみの信号が必要である。
【０００８】
このように、１入力信号形態の場合で雑音成分が時間的に変動するときにも対応可能な雑音除去装置、さらには聴感上最も聴きとりやすい状態になるよう、効果を確認しながらリアルタイムにパラメータの調整ができる装置の開発が望まれる。
【０００９】
そこで、本発明の目的は、音声信号と非音声信号が混合された１チャンネルの信号から、リアルタイムにパラメータを調整して音声信号成分のみを抽出することができる方法および装置ならびに信号抽出プログラムを記録した媒体を提供することにある。
【００１０】
【課題を解決するための手段】
上記目的を達成するために、本発明では、従来の２入力法で用いられている適応信号処理手法をそのまま用いることを基本とし、音声信号と非音声信号とでは、時間的な相関の性質が異なることに着目して、聴感により判断できる非音声信号の内容に応じて、必要な参照信号として入力信号を一定時間ずらした信号を用いるようにし、さらにそれに加えて、高域強調した信号を入力信号として用いるようにしている。
【００１１】
前者（必要な参照信号として入力信号を一定時間ずらした信号を用いる方法）は、楽音と音声とでは楽音の方が時間的な相関が高いことを利用して、入力信号が楽音と音声の場合に適応フィルタ処理により、時間的相関が高い楽音成分を抽出してこれを入力信号から減じて、より相関の低い音声を抽出しようとする考えに基づくものである。
【００１２】
後者（前者に加えて、高域強調した信号を入力信号として用いる方法）は、雑音と音声では音声の方が時間的な相関が高いことを利用して、より時間的相関の高い音声成分を適応フィルタ処理により抽出するとともに、予め高域強調した信号を入力信号として用いることにより、高域成分が少ない音声の高域強調をも同時に達成しようとするものである。
【００１３】
また、いずれの場合においても、適応フィルタの係数を収束速度を制御しながら逐次更新することにより、時々刻々変化する１チャンネルの入力信号に対し、所望の信号成分のみを時間的に追随しながらフィルタリングすることを可能としている。
【００１４】
さらに言えば、前者では、入力信号とそれを所定時間遅延させた信号とから適応フィルタ処理により、時間的相関がより高い成分を抽出してこれを入力信号から減算するようにし、また、後者では、入力信号を高域強調処理して得られた出力信号と入力信号を所定時間遅延させた信号とから適応フィルタ処理によりマッチドフィルタの係数をリアルタイムで決定する系統と、決定したフィルタ係数に基づき、入力信号をフィルタ処理して出力する系統とを設け、入力信号を所定時間遅延させる遅延回路の遅延時間、一次元適応フィルタのタップ数、収束係数、および後者の場合にあっては高域強調信号ゲインを聴感上最も聴き取りやすいように、それぞれリアルタイムに調整できるようにしたことに特徴がある。
【００１７】
すなわち、本発明は、時間的相関の低い所望信号と時間的相関の高い非所望信号とが混合された１チャンネルのディジタル入力信号から、前記時間的相関の低い所望信号を抽出する信号抽出装置であって、２分岐された前記ディジタル入力信号の一方の信号であってサンプル数Ｎの遅延が施された信号が入力され、１サンプル間隔で指定されたタップ数ｋからなり、フィルタ係数ベクトルが逐次更新される適応フィルタと、前記適応フィルタの出力信号を、２分岐されたディジタル入力信号の他方の信号から減算し、減算結果を前記時間的相関の低い所望信号として出力する減算手段と、前記減算手段からの出力信号と前記適応フィルタの各タップからの出力信号を用いて、
【数５】
$$W_{j+1} = W_j + \mu\, e_j\, \frac{X_j}{|X_j|^2}$$
Ｗ _ｊ：適応フィルタのフィルタ係数ベクトル
μ：収束係数
ｅ _ｊ：減算手段の出力信号（所望信号）
Ｘ _ｊ：適応フィルタへの入力信号ベクトル
｜Ｘ _ｊ｜：入力信号ベクトルのノルム
を演算し、前記適応フィルタのフィルタ係数ベクトルＷ _ｊを逐次更新する適応フィルタ係数逐次更新手段とを少なくとも具え、遅延された前記ディジタル入力信号の一方の信号を前記適応フィルタに入力して、
【数６】
$$z_j = \langle W_j, X_j \rangle$$
ｚ _ｊ：適応フィルタの出力信号
〈Ｗ _ｊ，Ｘ _ｊ〉：フィルタ係数ベクトルと入力信号ベクトルの内積
を演算し、演算結果である適応フィルタの出力信号ｚ _ｊを前記減算手段に入力して、
ｅ _ｊ＝ｙ _ｊ −ｚ _ｊ
ｙ _ｊ：２分岐されたディジタル入力信号の他方の信号
を演算し、前記時間的に相関の低い所望信号ｅ _ｊを出力することを特徴とするものである。
【００１８】
また、本発明の信号抽出装置は、前記ディジタル入力信号をＦサンプル毎のブロックに分割するブロック分割手段をさらに当該装置の入力端に具えるとともに、前記２分岐されたディジタル入力信号の他方の信号が入力される前記減算手段の入力端に具える少なくともＦサンプルの記憶容量を有する第１のバッファ手段であって、前記ブロック分割手段の出力信号を前記ブロック単位で順次に記憶するとともに、該記憶後に該第１のバッファ手段に記憶された該第１のバッファ手段の記憶容量に応じて予め定められた第１の所定のサンプルから数えて時間的に（Ｆ−１）サンプル前の記憶サンプルを前記２分岐されたディジタル入力信号の他方の信号として前記減算手段に入力し、前記ブロック単位の記憶に続く次のブロック単位の記憶までの間に前記第１の所定のサンプルから数えて時間的に（Ｆ−１）サンプル前の記憶サンプルから１サンプルずつ前記第１の所定のサンプル方向にずらしながら合計Ｆ個のサンプルを繰り返し前記減算手段に入力する第１のバッファ手段と、サンプル数Ｎの遅延を兼ねる少なくとも（Ｎ＋ｋ＋Ｆ）サンプルの記憶容量を有する第２のバッファ手段であって、前記ブロック分割手段の出力信号を前記ブロック単位に順次に記憶するとともに、該記憶後に前記第１の所定のサンプルと同一タイミングで該第２のバッファ手段に記憶された該第２のバッファ手段の記憶容量に応じて予め定められた第２の所定のサンプルから数えて時間的に（Ｎ＋ｋ＋Ｆ−１）サンプル前の記憶サンプルから前記第２の所定のサンプルに向かってｋ個連続したサンプルの組を前記適応フィルタの各タップ出力信号として前記適応フィルタに入力し、前記ブロック単位の記憶に続く次のブロック単位の記憶までの間に前記第２の所定のサンプルから数えて時間的に（Ｎ＋ｋ＋Ｆ−１）サンプル前の記憶サンプルから１サンプルずつ前記第２の所定のサンプル方向にずらしながら合計Ｆ組の前記ｋ個連続したサンプルの組を前記第１のバッファ手段と同期して繰り返し前記適応フィルタの各タップ出力信号として前記適応フィルタに入力する第２のバッファ手段とを具えてなることを特徴とするものである。
【００１９】
また、本発明の信号抽出装置は、前記遅延量のサンプル数Ｎ、前記タップ数ｋおよび前記収束係数μをそれぞれ制御することを特徴とするものである。
【００２０】
また、本発明は、時間的相関の高い所望信号と時間的相関の低い非所望信号とが混合された１チャンネルのディジタル入力信号から、前記時間的相関の高い所望信号を抽出する信号抽出装置であって、３分岐された前記ディジタル入力信号の第１の信号が入力され、該信号に高域強調処理と指定されたゲイン倍とを施して高域強調信号を出力する高域強調手段と、３分岐された前記ディジタル入力信号の第２の信号であってサンプル数Ｍだけ遅延された信号が入力され、１サンプル間隔で指定されたタップ数ｋからなり、フィルタ係数ベクトルが逐次更新される適応フィルタと、前記適応フィルタの出力信号を、前記高域強調信号から減算する減算手段と、該減算手段からの出力信号と前記適応フィルタの各タップからの出力信号を用いて、
【数７】
$$W_{j+1} = W_j + \mu\, e_j\, \frac{X_j}{|X_j|^2}$$
Ｗ _ｊ：適応フィルタのフィルタ係数ベクトル
μ：収束係数
ｅ _ｊ：減算手段の出力信号
Ｘ _ｊ：適応フィルタへの入力信号ベクトル
｜Ｘ _ｊ｜：入力信号ベクトルのノルム
を演算し、前記適応フィルタのフィルタ係数ベクトルＷ _ｊを逐次更新する適応フィルタ係数逐次更新手段と、前記適応フィルタと同一構成であり、前記適応フィルタのフィルタ係数ベクトルがコピーされるマッチドフィルタとを少なくとも具え、３分岐された前記ディジタル入力信号の第３の信号であってサンプル数Ｌだけ遅延された信号を前記マッチドフィルタに入力して、
【数８】
$$q_j = \langle W_j, P_j \rangle$$
ｑ _ｊ：マッチドフィルタの出力信号
Ｐ _ｊ：マッチドフィルタへの入力信号ベクトル
〈Ｗ _ｊ，Ｐ _ｊ〉：フィルタ係数ベクトルと入力信号ベクトルの内積
を演算し、前記時間的に相関の高い所望信号ｑ _ｊを出力することを特徴とするものである。
【００２１】
また、本発明の信号抽出装置は、前記高域強調手段が、入力信号を１サンプル遅延させる遅延手段と、該遅延手段の入出力信号間の差分信号を生成して出力する差分手段と、前記遅延手段の前段または前記差分手段の後段に接続された入力信号を前記ゲイン倍する乗算手段とを具えてなることを特徴とするものである。
【００２２】
また、本発明の信号抽出装置は、前記ディジタル入力信号をＦサンプル毎のブロックに分割するブロック分割手段をさらに当該装置の入力端に具えるとともに、前記遅延手段に代わる少なくとも（Ｆ＋１）サンプルの記憶容量を有する第１のバッファ手段であって、前記ブロック分割手段の出力信号または前記ゲイン倍された前記ブロック分割手段の出力信号を前記ブロック単位に順次に記憶するとともに、該記憶後に該第１のバッファ手段に記憶された該第１のバッファ手段の記憶容量に応じて予め定められた第１の所定のサンプルから数えて時間的にＦサンプル前の記憶サンプルから前記第１の所定のサンプルに向かって２個連続したサンプルの組を前記差分手段の入力信号として出力し、前記ブロック単位の記憶に続く次のブロック単位の記憶までの間に前記第１の所定のサンプルから数えて時間的にＦサンプル前の記憶サンプルから１サンプルずつ前記第１の所定のサンプル方向にずらしながら合計Ｆ組の前記２個連続したサンプルの組を繰り返し前記差分手段に出力する第１のバッファ手段と、サンプル数Ｍの遅延を兼ねる少なくとも（Ｍ＋ｋ＋Ｆ）サンプルの記憶容量を有する第２のバッファ手段であって、前記ブロック分割手段の出力信号を前記ブロック単位に順次に記憶するとともに、該記憶後に前記第１の所定のサンプルと同一タイミングで該第２のバッファ手段に記憶された該第２のバッファ手段の記憶容量に応じて予め定められた第２の所定のサンプルから数えて時間的に（Ｍ＋ｋ＋Ｆ−１）サンプル前の記憶サンプルから前記第２の所定のサンプルに向かってｋ個連続したサンプルの組を前記適応フィルタの各タップ出力信号として前記適応フィルタに入力し、前記ブロック単位の記憶に続く次のブロック単位の記憶までの間に前記第２の所定のサンプルから数えて時間的に（Ｍ＋ｋ＋Ｆ−１）サンプル前の記憶サンプルから１サンプルずつ前記第２の所定のサンプル方向にずらしながら合計Ｆ組の前記ｋ個連続したサンプルの組を前記第１のバッファ手段と同期して繰り返し前記適応フィルタの各タップ出力信号として前記適応フィルタに入力する第２のバッファ手段と、サンプル数Ｌの遅延を兼ねる少なくとも（Ｌ＋ｋ＋Ｆ）サンプルの記憶容量を有する第３のバッファ手段であって、前記ブロック分割手段の出力信号を前記ブロック単位に順次に記憶するとともに該記憶後に前記第１の所定のサンプルと同一タイミングで該第３のバッファ手段に記憶された該第３のバッファ手段の記憶容量に応じて予め定められた第３の所定のサンプルから数えて時間的に（Ｌ＋ｋ＋Ｆ−１）サンプル前の記憶サンプルから前記第３の所定のサンプルに向かってｋ個連続したサンプルの組を前記マッチドフィルタの各タップ出力信号として前記マッチドフィルタに入力し、前記ブロック単位の記憶に続く次のブロック単位の記憶までの間に前記第３の所定のサンプルから数えて時間的に（Ｌ＋ｋ＋Ｆ−１）サンプル前の記憶サンプルから１サンプルずつ前記第３の所定のサンプル方向にずらしながら合計Ｆ組の前記ｋ個連続したサンプルの組を前記第１のバッファ手段と同期して繰り返し前記マッチドフィルタの各タップ出力信号として前記マッチドフィルタに入力する第３のバッファ手段とを具えてなることを特徴とするものである。
【００２３】
また、本発明の信号抽出装置は、前記遅延量のサンプル数ＭおよびＬ、前記タップ数ｋ、前記収束係数μおよび前記ゲインをそれぞれ制御できることを特徴とするものである。
【００２８】
【発明の実施の形態】
以下に添付図面を参照し、発明の実施の形態に基づいて本発明を詳細に説明する。
〔第１の発明〕
図１は、時間的相関の低い所望信号（例えば、音声）と時間的相関の高い非所望信号（例えば、楽音）とが混合された１チャンネルのディジタル入力信号から、時間的相関の低い所望信号を抽出する本発明信号抽出方法に係る信号処理の手順をフローチャートで示している。
【００２９】
図１において、時間的相関の低い信号（音声）と時間的相関の高い信号（楽音）とが混合された１チャンネルの信号（アナログ信号）がまず入力される（ステップＳ１）。この信号は、例えば、マイクロホンによって収音された楽音を含んだ音声信号である。この信号は、まず、以下に説明する適応処理のためにディジタル化される（ステップＳ２）。ディジタル化された信号は、図示のように２分岐され、一方はそのまま、他方のディジタル信号は指定されたサンプル数（遅延量）Ｎだけ遅延される（ステップＳ３）。これらそのままおよび遅延された信号は、図２にその一実施形態を示す適応処理回路に供給され、適応処理回路においては、以下に説明するように適応処理される（ステップＳ４）。
【００３０】
適応処理回路は、等価回路にて図２に示すように、１サンプル間隔で指定されたタップ数ｋの適応フィルタ（図示の場合、トランスバーサルフィルタで構成）１、適応フィルタの係数を逐次更新する適応フィルタ係数逐次更新部２、および減算器３からなっている。
【００３１】
上記の構成において、適応フィルタ１にはステップＳ３によって遅延された信号ｘ_j（図１参照）が供給され、そのフィルタ係数（係数ベクトル
【外１】$W_j$
）は、適応フィルタ係数逐次更新部２からの出力によって逐次更新される。なお、添字ｊは、それが時点ｊのものであることを示している。適応フィルタ１の出力信号ｚ_jは、減算器３の減数端子に印加され、同減算器３の被減数端子に印加された前述のそのままの信号（以下、参照信号と言う）ｙ_j（図１参照）から減算され、同減算器３の出力が図２に示す適応処理回路の出力信号ｅ_j（図１参照）となる。
【００３２】
適応フィルタ係数逐次更新部２には、適応処理回路の出力信号ｅ_jと適応フィルタ１の各タップ出力信号（入力ベクトル
【外２】$X_j$
）が供給され、以下に説明する学習同定法によるアルゴリズムに基づきフィルタ係数（係数ベクトル〔外１〕）が逐次更新され出力される。
【００３３】
以下では、数式等において、英字の大文字はベクトル、小文字はスカラー量をそれぞれ表すものとする。
学習同定法によるフィルタ係数の逐次更新アルゴリズムは、次の（１）式によって表される。
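【数９】
$$W_{j+1} = W_j + \mu\, e_j\, \frac{X_j}{|X_j|^2}, \qquad e_j = y_j - z_j, \qquad z_j = \langle W_j, X_j \rangle \tag{1}$$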
ここに、
〔外２〕：適応フィルタへの入力信号ベクトル
〔外１〕：適応フィルタのフィルタ係数ベクトル
μ ：収束係数
ｚ_j：適応フィルタの出力信号
ｙ_j：参照信号
ｅ_j：適応処理部の出力信号
【外３】$\langle W_j, X_j \rangle$
：係数ベクトルと入力信号ベクトルの内積
【外４】$|X_j|$
：入力信号ベクトルのノルム
【００３４】
（１）式の意味するところは、次のように説明される。
減算器３において、同減算器３の被減数端子への入力信号（参照信号）ｙ_jから適応フィルタ１の出力信号ｚ_jを減算し、その減算結果ｅ_jに、収束係数μおよび
【外５】$X_j / |X_j|^2$
を乗算し、さらにフィルタ係数ベクトル〔外１〕を加算して得られたものを次の時点（ｊ＋１）におけるフィルタ係数ベクトル
【外６】$W_{j+1}$
とする。なお、
【外７】$|X_j|^2$
の値が０のときは〔外５〕を強制的に０とする。またフィルタ係数ベクトルの初期値
【外８】$W_0$
は任意の値でよいが、一般にはゼロベクトルとすることが多い。
【００３５】
また、このフィルタ係数の逐次更新は、適応フィルタの出力信号ｚ_jと参照信号ｙ_jとの差分ｅ_jが小さくなるように更新が行われるが、適応フィルタの入力信号ｘ_jと参照信号ｙ_jとの間にはサンプル数Ｎだけの時間のずれがあることから、上記差分ｅ_jが小さくなるようにフィルタ係数を決めることは、時間的相関の高い信号成分を強調するフィルタリング処理をしていることに相当する。なお、時間的相関の程度はパラメータとしての遅延量Ｎとタップ数ｋとにより制御することができる。
【００３６】
さらに、逐次更新のリアルタイム処理を可能とするためにはフィルタ係数の収束が速いことが必要であり、過度のオーバーシュートが生じないように収束速度の制御をパラメータとしての収束係数μ（０≦μ≦１）により制御する。上記遅延量Ｎとタップ数ｋを含め、これらのパラメータは、図１のパラメータ制御（ステップＳ５）において設定することができる。
【００３７】
上記パラメータ（Ｎ，ｋ，μ）の設定は、本発明装置から得られる音声を聞きながら、聴感上、それが最も聴き取りやすくなるように人手で調整する。このとき、適応フィルタ１（図２参照）の出力信号ｚ_jは楽音のスペクトルに近い形状を有するようになる。
【００３８】
図１において、Ｄ／Ａ変換（ステップＳ６）および音声出力（ステップＳ７）の各ステップは、上述した適応処理（ステップＳ４）によって時間的相関の低い信号成分（音声）が強調されたデジタル信号をアナログ信号化して聴取できるようにするための手順である。
【００３９】
以上、本発明中、第１の発明を図１を参照して方法の発明として説明したが、適応処理（ステップＳ４）を行う適応処理回路が図２に示され、そのパラメータ制御も詳細に説明されたことから、第１の発明を装置の発明として構成し得ることは明らかである。
【００４０】
〔第２の発明〕
図３は、時間的相関の高い所望信号（例えば、音声）と時間的相関の低い非所望信号（例えば、雑音）とが混合された１チャンネルのディジタル入力信号から、時間的相関の高い所望信号を抽出する本発明信号抽出方法に係る信号処理の手順をフローチャートで示している。
【００４１】
図３において、時間的相関の高い信号（音声）と時間的相関の低い信号（雑音）とが混合された１チャンネルの信号（アナログ信号）がまず入力される（ステップＳ８）。この信号は、例えば、マイクロホンによって収音された雑音を含んだ音声信号である。この信号は、図１に示される場合と同様、以下に説明する適応処理のためにディジタル化される（ステップＳ９）。ディジタル化された信号は、図示のように分岐され、それぞれ高域強調（ステップＳ１０）、サンプル数（遅延量）Ｍだけ遅延１（ステップＳ１１）、およびサンプル数（遅延量）Ｌだけ遅延２（ステップＳ１２）される。
【００４２】
図４は、ステップＳ１０で行われる高域強調のための高域強調回路の等価回路の例を示し、入力信号の１サンプル間の差分信号を生成し、それを所定のゲインＡでゲイン倍して出力する構成のものである。入力信号をゲインＡ倍してから差分信号を生成しても同じである。
【００４３】
ステップＳ１０で高域強調された信号ｙ_jとステップＳ１１でサンプル数（遅延量）Ｍだけ遅延された信号ｘ_jは、上述の、図２と同一回路構成（ただし、出力信号が目的とする信号にならないので、出力端子は必要でない）の適応処理回路の参照信号が供給される端子と適応フィルタ１にそれぞれ供給され、ステップＳ１３によって示される適応処理が実行される。
【００４４】
この場合においても、高域強調された信号ｙ_jと適応フィルタ１の出力信号、すなわち減算器３の被減算端子に供給される信号ｚ_jとの差分ｅ_j、および適応フィルタ１の各タップ出力信号に基づいて、図２について説明したのと同様、上記差分ｅ_jが小さくなるように適応フィルタ１のフィルタ係数を学習同定法により逐次更新する。このとき、適応フィルタ１の入力信号ｘ_jと参照信号ｙ_jとの間には、サンプル数Ｍだけの時間的ずれがあることから、差分ｅ_jが小さくなるようにフィルタ係数を決めることは、時間的相関の高い信号成分を強調するフィルタリング処理をしていることに相当する。
【００４５】
なお、図３のステップＳ１４で示すパラメータ（ゲインＡ、遅延量Ｍ，Ｌ、タップ数ｋ、収束係数μ）の制御は、図１においてステップＳ５で説明したのと同じであるから、ここでは、その説明は省略する。ただ、図１においてなかったものとして、ゲインＡは、図４に示す高域強調回路の振幅レベルを制御している。
【００４６】
図３において、ステップＳ１２においてサンプル数（遅延量）Ｌだけ遅延されたディジタル信号は、ステップＳ１５において、図５に示すマッチドフィルタを用いてマッチドフィルタ処理される。マッチドフィルタの出力は、そのフィルタ係数がステップＳ１３の適応処理によって得られた適応フィルタのフィルタ係数であるとき、時間的相関の高い信号成分（音声）ｑ_jが強調されて得られることになる。図３中、白ぬきの矢印は、ステップＳ１３で求めた適応フィルタのフィルタ係数〔外１〕を、ステップＳ１５のマッチドフィルタのフィルタ係数にコピーすることを示している。
【００４７】
このマッチドフィルタにディジタル入力信号を供給するのに、ステップＳ１２によってサンプル数Ｌだけ遅延させているのは、次の理由からである。すなわち、音声は、ある時間をかけて表現された音韻が時間的に連続して成立するので、後述するようにマッチドフィルタで所望の音声を抽出するには、その音韻終了時点において適応フィルタで決まるフィルタ係数を用いてその音韻開始時点からマッチドフィルタを制御する必要があるからで、その時間差を遅延時間Ｌにより制御している。これらの制御パラメータである２つの遅延回路の遅延時間Ｍ，Ｌ，高域強調信号ゲインＡ、適応フィルタのタップ数ｋ、および適応信号処理アルゴリズムで用いる収束係数μは、パラメータ制御（ステップＳ１４）で設定できるようになっている。これらパラメータ（Ｍ，Ｌ，Ａ，ｋ，μ）の設定は、本発明装置から得られる音声を聞きながら、その音声が、聴感上、最も聴き取りやすくなるように各制御パラメータを設定するものとする。
【００４８】
ステップＳ１５で用いられるマッチドフィルタ（図５参照）は適応処理回路中の適応フィルタと同一の構成であり、適応フィルタと同一のタップ数ｋのフィルタ係数を有している。そして、ステップＳ１３で使用される適応フィルタのフィルタ係数がコピーされたフィルタ係数でフィルタリングされ、そのフィルタリングされたディジタル音声出力信号ｑ_ｊを出力する。すなわち、マッチドフィルタのタップ数をｋ、マッチドフィルタの入力信号ベクトルを
【外９】$P_j$
、コピーされたマッチドフィルタの係数ベクトルを〔外１〕とするマッチドフィルタの出力信号ｑ_ｊは次の（２）式のように表現できる。
【数１０】
$$q_j = \langle W_j, P_j \rangle \tag{2}$$
【００４９】
なお、この場合サンプル数Ｌの遅延は、前述したように、適応処理回路の出力信号とマッチドフィルタに入力されるディジタル音声信号の時間軸を調整するためのものである。
マッチドフィルタからのディジタル音声出力信号ｑ_jは、ステップＳ１６によってアナログ信号に変換され聴取（ステップＳ１７）される。
【００５０】
以上、本発明中第２の発明も、高域強調（ステップＳ１０）を行う高域強調回路、適応処理（ステップＳ１３）を行う適応処理回路、およびマッチドフィルタ処理（Ｓ１５）を行うマッチドフィルタが、それぞれ図面等（適応処理回路は第１の発明で使用したものと同一であり、図面は省略した）により説明され、それらのパラメータ制御についても説明されたことから、第２の発明を装置の発明として構成し得ることは明らかである。
【００５１】
〔第１の発明の別の実施態様〕
第１の発明を、信号抽出プログラムを記録媒体からコンピュータに読み込むことによってソフトウェア処理で実現しようとする場合、１チャンネルのディジタル入力信号を１サンプルづつ処理していたのでは、使用するＣＰＵによっては計算処理に時間がかかり、リアルタイム処理が困難となる可能性もある。これを解決するために、ディジタル入力信号をブロック毎に処理する方法を、以下に説明する。
【００５２】
図６は、この場合のデータバッファリングの構成を示している。
本実施態様を装置として構成する場合には、ブロック分割手段、第１のバッファおよび第２のバッファが必要である。
【００５３】
以下に、それら第１および第２のバッファ（メモリ）の大きさやディジタル入力信号の記憶の仕方（書き込み、読み出し）について、箇条書きで説明する。これにより、信号抽出の全過程をコンピュータ処理化することができる。
▲１▼ ディジタル入力信号は、取り込み開始からＦサンプル毎のブロック（ブロックデータ）に切り出される。本実施形態では、Ｆ＝２５６サンプルである。ブロックはどこで切ってもよい。
▲２▼ 適応フィルタのタップ数をｋとする。本実施形態では、初期値としてｋ＝１２８を採用する。
▲３▼ 切り出されたブロックデータを、適応処理回路の入力側に設けたバッファ１と、適応処理回路に入力するＮサンプルの遅延を兼ねるバッファ２に蓄える（図１参照）。
▲４▼ このとき、バッファ１は、（ｋ＋Ｆ）個のバッファ長（サンプル個数）、また、バッファ２は、サンプル数Ｎの遅延を含む（Ｎ＋２×ｋ＋Ｆ）のバッファ長とする。
【００５４】
▲５▼ 各バッファ（バッファ１、バッファ２）の最後尾からＦサンプル分の場所（図６に＊で示す）にブロック内の時間的に最も後の（最も新しい）サンプルが各バッファの最後尾に格納されるように、切り出したブロックデータを格納する。
▲６▼ 各バッファに格納されたデータのアドレスを制御して各バッファの先頭の２つ目のサンプルから１サンプルずつずらして読み出して上述した第１の発明の処理を行う。このように、本実施形態によれば、メモリ上でアドレス制御を行うことにより、物理的なメモリ格納場所をシフトすることなしに、上記処理が行えるので計算時間を短縮することができ、きわめて好都合である。
【００５５】
ディジタル信号の書き込み、読み出しは以下のように行う。
まず、バッファ１の先頭から２つ目のサンプルを適応処理回路の参照信号ｙ_ｊ（図１，図２参照）とする。この参照信号に対応する適応処理回路の入力信号（ステップＳ３の出力信号）ｘ_ｊは、バッファ１の先頭の２つ目のサンプルからサンプル数Ｎだけ遅延したサンプルであり、適応フィルタの各タップ出力信号は、そのサンプルを先頭に、さらにｋサンプル遅延したサンプルに至るｋ個のサンプルであるから、バッファ２の先頭の２つ目のサンプルからｋ個のサンプルに相当する。そこで、バッファ２の先頭の２つ目のサンプルからｋ個のサンプルを読み出して、適応フィルタ１（図２参照）のフィルタ係数を用いてフィルタリングを行う。このフィルタリング処理、フィルタ係数の逐次更新処理および適応フィルタの出力信号を参照信号から減算して出力する処理を各バッファの先頭の２つ目のサンプルから１サンプルずつ時間方向にずらしながら繰り返し行う。すなわち、バッファ１の先頭の２つ目のサンプルからＦサンプル、バッファ２の先頭の２つ目のサンプルから（Ｆ＋ｋ−１）サンプルのデータを使い、時刻ｊに対応するベクトルデータから時刻ｊ＋Ｆ−１に対応するベクトルデータに至るまでＦ回、適応フィルタのフィルタ係数を更新しながら、次の（３）式により
【数１１】
$$z_j = \langle W_j, X_j \rangle \tag{3}$$
適応フィルタの出力ｚ_ｊを得るとともに、これを参照信号ｙ_ｊから減算して出力信号ｅ_ｊを得る処理を繰り返す。
【００５６】
▲８▼ 続いてバッファ１は、最後尾からのｋサンプルをＦサンプル分前方へシフトする。バッファ２は、最後尾からの（ｋ×２＋Ｎ）サンプルを前方へＦサンプル分シフトする。次に、各バッファの最後尾のＦサンプル分に次のブロックデータを格納し、▲１▼からの処理を繰り返す。
【００５７】
なお、本実施形態では、便宜上、バッファ長をバッファ１については（ｋ＋Ｆ）、バッファ２については（Ｎ＋２×ｋ＋Ｆ）としたが、図６から分かるように、各バッファを同一タイミングで制御するための各バッファのバッファ長はバッファ１については少なくともＦ、バッファ２については少なくとも（Ｎ＋ｋ＋Ｆ）だけあればよい。
【００５８】
〔第２の発明の別の実施態様〕
第１の発明の別の実施態様と同様、ディジタル入力信号をブロック毎に処理することにより、信号抽出プログラムを記録媒体からコンピュータに読み込むことによるソフトウェア処理の際の処理速度を高め、リアルタイム処理が可能になるようにしたものである。
【００５９】
図７は、この場合のデータバッファリングの構成を示している。
本実施態様を装置として構成する場合には、ブロック分割手段、第１のバッファおよび第２のバッファに加えて、第３のバッファが必要である。
【００６０】
以下に、それら第１，第２および第３のバッファ（メモリ）の大きさやディジタル入力信号の記憶の仕方（書き込み、読み出し）について、箇条書きで説明する。これにより、信号抽出の全過程をコンピュータ処理化することができる。
▲１▼ ディジタル入力信号は、取り込み開始からＦサンプル毎のブロック（ブロックデータ）に切り出される。本実施形態では、Ｆ＝２５６サンプルである。ブロックはどこで切ってもよい。
▲２▼ 適応フィルタのタップ数をｋとする。本実施形態では、初期値としてｋ＝１２８を採用する。
▲３▼ 切り出されたブロックデータを、高域強調回路の入力側に設けたバッファ１、適応フィルタへ入力する遅延１を兼ねるバッファ２、およびマッチドフィルタへ入力する遅延２を兼ねるバッファ３に蓄える（図３参照）。
▲４▼ このとき、バッファ１は、（ｋ＋Ｆ）個のバッファ長（サンプル個数）、バッファ２は、サンプル数Ｍの遅延を含む（Ｍ＋２×ｋ＋Ｆ）のバッファ長、また、バッファ３にはサンプル数Ｌの遅延を含む（Ｌ＋２×ｋ＋Ｆ）のバッファ長とする。
【００６１】
▲５▼ 各バッファ（バッファ１、バッファ２およびバッファ３）の最後尾からＦサンプル分の場所（図７に＊で示す）にブロック内の時間的に最も後の（最も新しい）サンプルが各バッファの最後尾に格納されるように、切り出したブロックデータを格納する。
▲６▼ 各バッファに格納されたデータのアドレスを制御して各バッファの先頭の２つ目のサンプル（バッファ１については、１つ目のサンプル）から１サンプルずつずらして読み出して上述した第２の発明の処理を行う。このように、本実施形態によれば、メモリ上でアドレス制御を行うことにより、物理的なメモリ格納場所をシフトすることなしに、上記処理が行えるので計算時間を短縮することができ、きわめて好都合である。
【００６２】
ディジタル信号の書き込み、読み出しは以下のように行う。
▲７▼ まず、バッファ１の先頭から１つ目のサンプルとその１つ後のサンプルの差分をとって、これを適応処理回路の参照信号とする。この参照信号に対応する適応処理回路の入力信号（ステップＳ１１の出力信号）ｘ_jは、バッファ１の先頭の２つ目のサンプルからサンプル数Ｍだけ遅延したサンプルであり、適応フィルタの各タップ出力信号は、そのサンプルを先頭にさらにｋサンプル遅延したサンプルに至るｋ個のサンプルであるから、バッファ２の先頭の２つ目のサンプルからｋ個のサンプルに相当する。また、そのタイミングのマッチドフィルタ入力信号（ステップＳ１２の出力信号）は、バッファ１の先頭の２つ目のサンプルからサンプル数Ｌだけ遅延したサンプルであり、マッチドフィルタの各タップ出力信号は、そのサンプルを先頭にさらにｋサンプル遅延したサンプルに至るｋ個のサンプルであるから、バッファ３の先頭の２つ目のサンプルからｋ個のサンプルに相当する。
【００６３】
そこで、バッファ２とバッファ３の各先頭の２つ目のサンプルからｋ個のサンプルを読み出して、適応フィルタ１（図２参照）のフィルタ係数を用いてフィルタリングを行いながら、適応フィルタのフィルタ係数をマッチドフィルタのフィルタ係数にコピーしてフィルタリングを行って出力信号ｑ_ｊを得る。これらのフィルタリング処理および適応フィルタ係数の逐次更新処理を各バッファの先頭の２つ目のサンプル（バッファ１については、１つ目のサンプル）から１サンプルずつ時間方向にずらしながら繰り返し行う。すなわち、バッファ１の先頭の１つ目のサンプルからＦ＋１サンプル、バッファ２とバッファ３の先頭の２つ目のサンプルから（Ｆ＋ｋ−１）サンプルのデータを用い、時刻ｊに対応するベクトルデータから時刻ｊ＋Ｆ−１に対応するベクトルデータに至るまでＦ回、上記の高域強調信号ｙ_ｊを用いて適応フィルタのフィルタ係数を更新し、それをマッチドフィルタのフィルタ係数にコピーしながら、次の（４）式により
【数１２】
$$q_j = \langle W_j, P_j \rangle \tag{4}$$
マッチドフィルタの出力ｑ_ｊを得る処理を繰り返す。
【００６４】
▲８▼ 続いてバッファ１は、最後尾からのｋサンプルをＦサンプル分前方へシフトする。バッファ２は、最後尾からの（ｋ×２＋Ｍ）サンプルを前方へＦサンプル分シフトする。バッファ３は、最後尾からの（ｋ×２＋Ｌ）サンプルを前方へＦサンプル分シフトする。次に、各バッファの最後尾のＦサンプル分に次のブロックデータを格納し、▲１▼からの処理を繰り返す。
【００６５】
なお、本実施形態では、便宜上、バッファ長をバッファ１については（ｋ＋Ｆ）、バッファ２については（Ｍ＋２×ｋ＋Ｆ）、バッファ３については（Ｌ＋２×ｋ＋Ｆ）としたが、図７から分かるように各バッファを同一タイミングで制御するための各バッファのバッファ長はバッファ１については少なくとも（Ｆ＋１）、バッファ２については少なくとも（Ｍ＋ｋ＋Ｆ）、バッファ３については少なくとも（Ｌ＋ｋ＋Ｆ）だけあればよい。
【００６６】
以上説明した本発明信号抽出方法および装置中、第１の発明およびその別の実施形態（音声と楽音が混合された１チャンネルの信号から音声を抽出することができる発明）によって、楽音成分が軽減され、音声成分が強調されて得られる（本発明の効果）ことが、本発明装置の入力信号と出力信号とをそれぞれ示す図８の信号波形（ａ）と（ｂ）によって示されている。
【００６７】
また、同じく本発明信号抽出方法および装置中、第２の発明およびその別の実施形態（音声と雑音が混合された１チャンネルの信号から音声を抽出することができる発明）によって、雑音成分が軽減され、音声成分が強調されて得られる（本発明の効果）ことが、本発明装置の入力信号と、出力信号とをそれぞれ示す図９の信号波形（ａ）と（ｂ）、および図１０のランニングスペクトルの波形（ａ）と（ｂ）によって示されている。
【００６８】
【発明の効果】
本発明によれば、楽音や雑音が混合された１チャンネルの信号から、所望の音声信号のみを聴感上最も聴き取りやすい状態で、リアルタイムに抽出することが可能となる。
また、本発明は、音声認識の前処理や、高齢者、聴覚障害者などが使用する補聴器など、さまざまな分野での応用が考えられる。
【図面の簡単な説明】
【図１】本発明信号抽出方法（とくに、第１の発明）をフローチャートにて示している。
【図２】適応処理回路の構成を等価回路にて示している。
【図３】本発明信号抽出方法（とくに、第２の発明）をフローチャートにて示している。
【図４】高域強調回路を等価回路にて示している。
【図５】マッチドフィルタの構成を等価回路にて示している。
【図６】本発明信号抽出装置（とくに、第１の発明の別の実施形態）におけるデータバッファリングの構成を示している。
【図７】本発明信号抽出装置（とくに、第２の発明の別の実施形態）におけるデータバッファリングの構成を示している。
【図８】本発明の効果（とくに、第１の発明および第１の発明の別の実施形態）を信号波形の対比で示している。
【図９】本発明の効果（とくに、第２の発明および第２の発明の別の実施形態）を信号波形の対比で示している。
【図１０】本発明の効果（とくに、第２の発明および第２の発明の別の実施形態）をランニングスペクトルの波形の対比で示している。
【符号の説明】
１適応フィルタ
２適応フィルタ係数逐次更新部
３減算器
Ｄ１サンプルディレー
Σ 総和器
Ａゲイン
[0001]
[Technical Field of the Invention]
The present invention relates to a signal extraction method and apparatus that extract, in real time, only the desired speech signal from a picked-up signal into which unwanted acoustic signals were mixed for some reason during sound pickup, and to a medium on which a signal extraction program is recorded.
[0002]
[Prior art]
The known conventional noise removal techniques are the so-called one-input and two-input methods described below.
An example of the one-input method is the spectral subtraction method described in S. F. Boll, “Suppression of Acoustic Noise in Speech Using Spectral Subtraction,” IEEE Trans., Vol. ASSP-27, No. 2, pp. 113-120, April 1979. This method obtains the short-time amplitude spectrum of the desired speech signal by subtracting the short-time power spectrum of the noise, either known or estimated from silent intervals, from the spectrum of the input signal.
[0003]
An example of the two-input method is the adaptive signal processing technique described in B. Widrow et al., “Adaptive Noise Cancelling: Principles and Applications,” Proc. IEEE, Vol. 63, No. 12, pp. 1692-1716 (1975). This technique can remove noise when, in addition to the input signal in which speech and noise are mixed, the noise component alone is separately available as a reference signal. Because the pickup path of the reference signal generally differs from that of the input signal, the reference signal does not coincide with the noise component contained in the input signal. To extract the speech signal from the input signal, the reference signal must therefore be filtered so that it matches the noise component in the input signal as closely as possible, and the difference is then taken. An adaptive filter is generally used for this purpose.
[0004]
In other words, this adaptive signal processing technique is a signal processing technique that uses an adaptive filter. Control algorithms for the adaptive filter include the LMS method described in the above-cited B. Widrow et al., “Adaptive Noise Cancelling: Principles and Applications,” Proc. IEEE, Vol. 63, No. 12, pp. 1692-1716 (1975), and the learning identification method described in J. Nagumo et al., “A Learning Method for System Identification,” IEEE Trans., Vol. AC-12, No. 3, pp. 282-287, June 1967. Here, an adaptive filter is a filter with a kind of learning function: given an input signal and the signal that should be output, it automatically and successively corrects its coefficients so that its output gradually approaches the desired output. Both algorithms feed the difference between the filter output and the input back to the filter so that the filter coefficients converge; the learning identification method differs from the LMS method in that it normalizes the difference value. The LMS method is simple and computationally light, and is therefore easy to implement in hardware. However, because the difference is not normalized, it depends strongly on the input signal amplitude, converges slowly, and is unsuited to applications that require fast adaptation. The learning identification method, by contrast, is an improved version of the LMS method: the normalization adds some computation, but convergence is much faster than with the LMS method.
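The amplitude dependence mentioned above can be illustrated with a small sketch. It is not taken from the cited papers: it assumes the common textbook forms W ← W + μ·e·X for LMS and W ← W + μ·e·X/|X|² for the learning identification (normalized) update, and the function names, the three-tap system `h`, and the step sizes are hypothetical choices made only for this illustration.

```python
import math
import random


def adapt(x, d, taps, mu, normalized):
    """Identify an FIR system from input x and desired output d.

    normalized=False: LMS update                        W <- W + mu * e * X
    normalized=True : learning identification update    W <- W + mu * e * X / |X|^2
    Returns the final coefficient list.
    """
    w = [0.0] * taps
    for j in range(len(x)):
        # Tap vector X_j = (x[j], x[j-1], ...), zero before the start of the data.
        xv = [x[j - i] if j - i >= 0 else 0.0 for i in range(taps)]
        e = d[j] - sum(wi * xi for wi, xi in zip(w, xv))
        step = mu
        if normalized:
            power = sum(xi * xi for xi in xv)
            step = mu / power if power > 0.0 else 0.0
        w = [wi + step * e * xi for wi, xi in zip(w, xv)]
    return w


def coefficient_error(w, h):
    """Euclidean distance between estimated and true coefficients."""
    return math.sqrt(sum((wi - hi) ** 2 for wi, hi in zip(w, h)))


random.seed(0)
h = [0.5, -0.3, 0.2]          # hypothetical unknown system to identify
errors = {}
for amplitude in (0.1, 1.0):
    x = [amplitude * random.gauss(0.0, 1.0) for _ in range(2000)]
    d = [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
         for n in range(len(x))]
    err_lms = coefficient_error(adapt(x, d, 3, 0.05, False), h)
    err_nlms = coefficient_error(adapt(x, d, 3, 0.5, True), h)
    errors[amplitude] = (err_lms, err_nlms)
    print(f"amplitude={amplitude}: LMS error={err_lms:.4f}, NLMS error={err_nlms:.2e}")
```

In this sketch the same LMS step size that converges at unit amplitude leaves a clearly visible coefficient error when the input is ten times smaller, while the normalized update behaves the same at both amplitudes.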
[0005]
[Problems to be solved by the invention]
Material brought in from the field for broadcasting, particularly for news reporting, such as the voice of a kidnapper or conversation in an aircraft cockpit, has a poor signal-to-noise ratio, and in most cases takes the form of a single input signal with no reference signal.
In urgent cases there is a need to remove noise from such signals in real time, but, as described next, the prior art could not meet this demand.
[0006]
The conventional spectral subtraction method cuts out a silent interval identified in advance and uses it to remove noise. Consequently, when the noise component varies with time, silent-interval detection must be redone each time, so the method adapts poorly to real-time processing. That is, the conventional one-input method cannot adjust its parameters adaptively, and it is difficult to filter a one-channel input signal that changes from moment to moment while tracking only the desired signal component over time.
[0007]
The two-input adaptive signal processing technique described above can run in real time, but it cannot be used in this case because only a one-channel signal in which speech and noise are mixed is available. That is, tracking over time by adaptive signal processing requires a signal containing only the noise component as the reference signal.
[0008]
There is thus a demand for a noise removal apparatus that can cope with a time-varying noise component even when only a single input signal is available, and moreover for an apparatus whose parameters can be adjusted in real time, while checking the effect, so that the output becomes as easy to hear as possible.
[0009]
Accordingly, an object of the present invention is to provide a method and apparatus that can extract only the speech signal component from a one-channel signal in which a speech signal and a non-speech signal are mixed, with parameters adjusted in real time, and a medium on which a signal extraction program is recorded.
[0010]
[Means for Solving the Problems]
To achieve the above object, the present invention is based on using the adaptive signal processing technique of the conventional two-input method as it is. Focusing on the fact that speech signals and non-speech signals differ in their temporal correlation properties, and depending on the content of the non-speech signal as judged by ear, it uses the input signal shifted by a fixed time as the required reference signal, and in addition uses a high-frequency-emphasized signal as the input signal.
[0011]
The former (using the input signal shifted by a fixed time as the required reference signal) exploits the fact that, between musical sound and speech, musical sound has the higher temporal correlation. When the input signal consists of musical sound and speech, adaptive filtering extracts the highly correlated musical sound component, which is then subtracted from the input signal to extract the less correlated speech.
[0012]
The latter (additionally using a high-frequency-emphasized signal as the input signal) exploits the fact that, between noise and speech, speech has the higher temporal correlation. Adaptive filtering extracts the more highly correlated speech component, and using a signal that has been high-frequency-emphasized in advance as the input signal simultaneously boosts the high frequencies of speech, which has few high-frequency components.
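The notion of "temporal correlation" used in both approaches can be made concrete with a normalized correlation between a signal and a delayed copy of itself. The following is only an illustrative sketch: the sinusoid and the Gaussian noise are stand-ins for a sustained musical tone and for broadband noise, and the function name `lag_correlation`, the lag, and the signal lengths are assumptions, not values from the source.

```python
import math
import random


def lag_correlation(signal, lag):
    """Normalized correlation between signal[n] and signal[n - lag]."""
    a = signal[lag:]
    b = signal[:len(signal) - lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


random.seed(1)
n = 4000
# Stand-in for a sustained musical tone: period of 100 samples.
tone = [math.sin(2.0 * math.pi * 0.01 * i) for i in range(n)]
# Stand-in for broadband noise.
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

lag = 100  # hypothetical delay, equal to one period of the tone
tone_corr = lag_correlation(tone, lag)
noise_corr = lag_correlation(noise, lag)
print(f"tone:  {tone_corr:.3f}")
print(f"noise: {noise_corr:.3f}")
```

A signal that still resembles itself after the delay can be predicted from its delayed copy by an adaptive filter; a signal that does not cannot. According to the text above, speech lies between these two extremes, which is why it is the unpredictable part when mixed with music and the predictable part when mixed with noise.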
[0013]
In either case, successively updating the adaptive filter coefficients while controlling the convergence speed makes it possible to filter a one-channel input signal that changes from moment to moment while tracking only the desired signal component over time.
[0014]
More specifically, in the former, adaptive filtering extracts the component with the higher temporal correlation from the input signal and a copy of it delayed by a predetermined time, and this component is subtracted from the input signal. In the latter, two paths are provided: one that determines the coefficients of a matched filter in real time by adaptive filtering from the high-frequency-emphasized input signal and the input signal delayed by a predetermined time, and one that filters the input signal with the determined coefficients and outputs the result. The invention is characterized in that the delay time of the delay circuit that delays the input signal by a predetermined time, the number of taps of the one-dimensional adaptive filter, the convergence coefficient, and, in the latter case, the gain of the high-frequency-emphasized signal can each be adjusted in real time so that the output is as easy to hear as possible.
[0017]
That is, the present invention is a signal extraction apparatus that extracts a desired signal having low temporal correlation from a one-channel digital input signal in which the desired signal having low temporal correlation and an undesired signal having high temporal correlation are mixed, the apparatus comprising at least: an adaptive filter that receives one of the two branches of the digital input signal, delayed by N samples, that has a specified number of taps k at one-sample intervals, and whose filter coefficient vector is successively updated; subtraction means that subtracts the output signal of the adaptive filter from the other branch of the digital input signal and outputs the result as the desired signal having low temporal correlation; and adaptive filter coefficient successive update means that, using the output signal of the subtraction means and the output signals of the taps of the adaptive filter, computes
[Equation 5]
$$W_{j+1} = W_j + \mu\, e_j\, \frac{X_j}{|X_j|^2}$$
    W_j: filter coefficient vector of the adaptive filter
    μ: convergence coefficient
    e_j: output signal of the subtraction means (desired signal)
    X_j: input signal vector to the adaptive filter
    |X_j|: norm of the input signal vector
and thereby successively updates the filter coefficient vector W_j of the adaptive filter, wherein the delayed branch of the digital input signal is input to the adaptive filter, which computes
[Equation 6]
$$z_j = \langle W_j, X_j \rangle$$
    z_j: output signal of the adaptive filter
    <W_j, X_j>: inner product of the filter coefficient vector and the input signal vector
the adaptive filter output signal z_j so obtained is input to the subtraction means, which computes
      e_j = y_j - z_j
    y_j: the other branch of the digital input signal
and the desired signal e_j having low temporal correlation is output.
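The structure described in this paragraph (delay by N samples, k-tap adaptive filter, subtraction, coefficient update) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the normalized update W_{j+1} = W_j + μ·e_j·X_j/|X_j|², treats samples before the start of the data as zero, and uses a sinusoid plus white noise as hypothetical stand-ins for the highly correlated undesired signal and the weakly correlated desired signal. The function names and parameter values are illustrative.

```python
import math
import random


def extract_low_correlation(y, delay_n, taps_k, mu):
    """Return e_j = y_j - <W_j, X_j>, where X_j is built from y delayed by delay_n samples."""
    w = [0.0] * taps_k
    out = []
    for j in range(len(y)):
        # X_j = (y[j-N], y[j-N-1], ..., y[j-N-k+1]); zero before the start of the data.
        xv = [y[j - delay_n - i] if j - delay_n - i >= 0 else 0.0
              for i in range(taps_k)]
        z = sum(wi * xi for wi, xi in zip(w, xv))        # adaptive filter output z_j
        e = y[j] - z                                      # subtraction means: e_j
        power = sum(xi * xi for xi in xv)                 # |X_j|^2
        if power > 0.0:                                   # skip the update when |X_j| is zero
            g = mu * e / power
            w = [wi + g * xi for wi, xi in zip(w, xv)]    # coefficient update
        out.append(e)
    return out


def mean_power(sig):
    return sum(s * s for s in sig) / len(sig)


random.seed(2)
n = 6000
tone = [math.sin(2.0 * math.pi * 0.05 * i) for i in range(n)]   # highly correlated, undesired
burst = [0.3 * random.gauss(0.0, 1.0) for _ in range(n)]         # weakly correlated, desired
mixed = [t + b for t, b in zip(tone, burst)]

e = extract_low_correlation(mixed, delay_n=20, taps_k=16, mu=0.05)

# Compare what is left besides the desired signal, after the filter has settled.
tail = slice(n // 2, n)
undesired_before = mean_power([m - b for m, b in zip(mixed[tail], burst[tail])])
undesired_after = mean_power([o - b for o, b in zip(e[tail], burst[tail])])
print(f"non-desired power before: {undesired_before:.4f}")
print(f"non-desired power after:  {undesired_after:.4f}")
```

Because only the correlated component can be predicted from the delayed branch, the filter output z_j approximates the tone and the difference e_j retains mostly the unpredictable component.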
[0018]
The signal extraction apparatus of the present invention is further characterized by comprising block dividing means, provided at the input end of the apparatus, that divides the digital input signal into blocks of F samples each, together with first buffer means and second buffer means. The first buffer means is provided at the input of the subtraction means that receives the other branch of the digital input signal and has a storage capacity of at least F samples. It stores the output signal of the block dividing means sequentially, block by block; after each store, it inputs to the subtraction means, as the other branch of the digital input signal, the stored sample that is (F-1) samples earlier in time than a first predetermined sample, which is fixed in advance according to the storage capacity of the first buffer means; and, before the next block is stored, it repeatedly inputs a total of F samples to the subtraction means, stepping one sample at a time from that sample toward the first predetermined sample. The second buffer means also provides the delay of N samples and has a storage capacity of at least (N+k+F) samples. It stores the output signal of the block dividing means sequentially, block by block; after each store, at the same timing as the first predetermined sample, it inputs to the adaptive filter, as the tap output signals of the adaptive filter, a set of k consecutive samples that starts at the stored sample (N+k+F-1) samples earlier in time than a second predetermined sample, which is fixed in advance according to the storage capacity of the second buffer means, and runs toward the second predetermined sample; and, before the next block is stored, it repeatedly inputs a total of F such sets of k consecutive samples to the adaptive filter, in synchronism with the first buffer means, stepping one sample at a time toward the second predetermined sample.
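One way to read the two-buffer arrangement is that a block of F samples is stored once and the per-sample processing then only moves a read index through the buffers. The sketch below is an interpretation under that assumption, not the claimed apparatus: the function names, the buffer layout (N + k - 1 history samples followed by the newest block), and the zero padding of a short final block are all choices made for this illustration, and the coefficient update is assumed to be the normalized form used elsewhere in the description. The sketch checks that block-wise processing reproduces sample-by-sample processing.

```python
import random


def nlms_step(w, xv, y, mu):
    """One filtering/update step; returns (e, new_w)."""
    z = sum(wi * xi for wi, xi in zip(w, xv))
    e = y - z
    power = sum(xi * xi for xi in xv)
    if power > 0.0:
        g = mu * e / power
        w = [wi + g * xi for wi, xi in zip(w, xv)]
    return e, w


def process_samplewise(y, n_delay, k, mu):
    """Reference version: build each tap vector directly from the whole signal."""
    w = [0.0] * k
    out = []
    for j in range(len(y)):
        xv = [y[j - n_delay - i] if j - n_delay - i >= 0 else 0.0 for i in range(k)]
        e, w = nlms_step(w, xv, y[j], mu)
        out.append(e)
    return out


def process_blockwise(y, n_delay, k, mu, f):
    """Same computation, fed F samples at a time through two buffers."""
    hist = n_delay + k - 1               # past samples the tap vectors still need
    buf1 = [0.0] * f                     # reference branch: F samples
    buf2 = [0.0] * (hist + f)            # delayed branch: history plus newest block
    w = [0.0] * k
    out = []
    for start in range(0, len(y), f):
        block = y[start:start + f]
        block = block + [0.0] * (f - len(block))   # zero-pad a short final block
        buf1[:] = block
        buf2[:hist] = buf2[f:]           # keep only the history that is still needed
        buf2[hist:] = block              # newest block goes at the tail
        for t in range(f):               # move a read index; stored data stays in place
            newest = hist + t - n_delay  # position of the sample delayed by N
            xv = [buf2[newest - i] for i in range(k)]
            e, w = nlms_step(w, xv, buf1[t], mu)
            out.append(e)
    return out[:len(y)]


random.seed(3)
signal = [random.gauss(0.0, 1.0) for _ in range(1000)]
a = process_samplewise(signal, n_delay=5, k=8, mu=0.1)
b = process_blockwise(signal, n_delay=5, k=8, mu=0.1, f=256)
max_diff = max(abs(p - q) for p, q in zip(a, b))
print(max_diff)
```

The history kept between blocks is what lets the first tap vectors of a new block reach back N + k - 1 samples into the previous one.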
[0019]
The signal extraction apparatus of the present invention is also characterized in that the delay amount of N samples, the number of taps k, and the convergence coefficient μ are each controlled.
[0020]
  Also, the present invention is a signal extraction apparatus for extracting a desired signal having a high temporal correlation from a one-channel digital input signal in which a desired signal having a high temporal correlation and an undesired signal having a low temporal correlation are mixed, comprising: high frequency emphasis means that receives a first signal of the three-branched digital input signal, applies high frequency emphasis processing and multiplication by a designated gain to it, and outputs a high frequency emphasis signal; an adaptive filter that receives a second signal of the three-branched digital input signal delayed by M samples, has a designated number of taps k at one-sample intervals, and has its filter coefficient vector sequentially updated; subtracting means for subtracting the output signal of the adaptive filter from the high frequency emphasis signal; adaptive filter coefficient successive update means that, using the output signal from the subtracting means and the output signal from each tap of the adaptive filter, sequentially updates the filter coefficient vector W_j of the adaptive filter according to
[Equation 7]
$$W_{j+1} = W_j + \mu \, e_j \, \frac{X_j}{|X_j|^2}$$
    W_j: Filter coefficient vector of adaptive filter
    μ: Convergence coefficient
    e_j: Subtracting means output signal
    X_j: Input signal vector to adaptive filter
    |X_j|: Norm of input signal vector
and a matched filter to which the filter coefficient vector of the adaptive filter is copied, wherein the signal of the three-branched digital input signal delayed by L samples is input to the matched filter,
[Equation 8]
$$q_j = \langle W_j, P_j \rangle$$
    q_j: Matched filter output signal
    P_j: Input signal vector to matched filter
    <W_j, P_j>: Inner product of filter coefficient vector and input signal vector
is calculated, and the desired signal q_j having a high temporal correlation is output.
[0021]
  Also, in the signal extraction device of the present invention, the high frequency emphasis means comprises delay means for delaying the input signal by one sample, difference means for generating and outputting a difference signal between the input and output signals of the delay means, and multiplying means, connected to the stage following the difference means, for multiplying its input signal by the gain.
[0022]
  Also, the signal extraction device of the present invention further comprises: block dividing means, provided at the input end of the apparatus, for dividing the digital input signal into blocks of F samples each; first buffer means, which also serves as the delay means and has a storage capacity of at least (F + 1) samples, which sequentially stores, in units of the blocks, the output signal of the block dividing means or that output signal multiplied by the gain, then outputs, as the input signal of the difference means, a set of two consecutive samples running toward a first predetermined sample, determined in advance according to the storage capacity of the first buffer means, starting from the stored sample that lies F samples earlier in time than the first predetermined sample, and, until the next block is stored, repeatedly outputs a total of F such sets of two consecutive samples to the difference means while shifting one sample at a time from that stored sample toward the first predetermined sample; second buffer means, which also serves as the delay of M samples and has a storage capacity of at least (M + k + F) samples, which sequentially stores the output signal of the block dividing means in units of the blocks, then, at the same timing as the first predetermined sample, inputs to the adaptive filter, as the tap output signals of the adaptive filter, a set of k consecutive samples running toward a second predetermined sample, determined in advance according to the storage capacity of the second buffer means, starting from the stored sample that lies (M + k + F-1) samples earlier in time than the second predetermined sample, and, until the next block is stored, repeatedly inputs a total of F such sets of k consecutive samples to the adaptive filter, in synchronization with the first buffer means, while shifting one sample at a time from that stored sample toward the second predetermined sample; and third buffer means, which also serves as the delay of L samples and has a storage capacity of at least (L + k + F) samples, which sequentially stores the output signal of the block dividing means in units of the blocks, then, at the same timing as the first predetermined sample, inputs to the matched filter, as the tap output signals of the matched filter, a set of k consecutive samples running toward a third predetermined sample, determined in advance according to the storage capacity of the third buffer means, starting from the stored sample that lies (L + k + F-1) samples earlier in time than the third predetermined sample, and, until the next block is stored, repeatedly inputs a total of F such sets of k consecutive samples to the matched filter, in synchronization with the first buffer means, while shifting one sample at a time from that stored sample toward the third predetermined sample.
[0023]
  Also, the signal extraction device of the present invention is characterized in that the delay amounts M and L (in samples), the number of taps k, the convergence coefficient μ, and the gain can each be controlled.
[0028]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the present invention will be described in detail based on an embodiment of the invention with reference to the accompanying drawings.
[First invention]
FIG. 1 shows a flowchart of a signal processing procedure according to the signal extraction method of the present invention, which extracts a desired signal having a low temporal correlation from a one-channel digital input signal in which a desired signal having a low temporal correlation (for example, speech) and an undesired signal having a high temporal correlation (for example, a musical tone) are mixed.
[0029]
In FIG. 1, a one-channel signal (analog signal) in which a signal with low temporal correlation (speech) and a signal with high temporal correlation (musical sound) are mixed is first input (step S1). This signal is, for example, a speech signal containing musical sound, picked up by a microphone. The signal is first digitized for the adaptive processing described below (step S2). The digitized signal is branched into two as shown in the figure: one branch is left as it is, and the other is delayed by the designated number of samples (delay amount) N (step S3). The undelayed and delayed signals are supplied to the adaptive processing circuit, one embodiment of which is shown in FIG. 2, and the circuit performs the adaptive processing described below (step S4).
[0030]
As shown as an equivalent circuit in FIG. 2, the adaptive processing circuit consists of an adaptive filter 1 (a transversal filter in this case) having the designated number of taps k at one-sample intervals, an adaptive filter coefficient successive update unit 2 that sequentially updates the coefficients of the adaptive filter, and a subtractor 3.
[0031]
In the above configuration, the adaptive filter 1 is supplied with the signal x_j delayed in step S3 (see FIG. 1), and its filter coefficients (coefficient vector W_j) are sequentially updated by the output from the adaptive filter coefficient successive update unit 2. The subscript j denotes time j. The output signal z_j of the adaptive filter 1 is applied to the subtrahend terminal of the subtractor 3 and is subtracted from the undelayed signal y_j (hereinafter referred to as the reference signal; see FIG. 1) applied to the minuend terminal of the subtractor 3. The output of the subtractor 3 is the output signal e_j of the adaptive processing circuit (see FIG. 1).
[0032]
The adaptive filter coefficient successive update unit 2 receives the output signal e_j of the adaptive processing circuit and the tap output signals of the adaptive filter 1 (input vector X_j), and sequentially updates and outputs the filter coefficients (coefficient vector W_j) according to the algorithm based on the learning identification method described below.
[0033]
In the mathematical formulas below, uppercase letters denote vectors and lowercase letters denote scalar quantities.
The filter coefficient sequential update algorithm based on the learning identification method is expressed by the following equation (1).
[Equation 9]
$$z_j = \langle W_j, X_j \rangle, \qquad e_j = y_j - z_j, \qquad W_{j+1} = W_j + \mu \, e_j \, \frac{X_j}{|X_j|^2} \tag{1}$$
Here,
  X_j ([Outside 2]): Input signal vector to the adaptive filter
  W_j ([Outside 1]): Filter coefficient vector of the adaptive filter
  μ: Convergence coefficient
  z_j: Output signal of adaptive filter
  y_j: Reference signal
  e_j: Output signal of adaptive processing circuit
  <W_j, X_j>: Inner product of coefficient vector and input signal vector
  |X_j|: Norm of input signal vector
[0034]
The meaning of equation (1) is as follows.
In the subtractor 3, the output signal z_j of the adaptive filter 1 is subtracted from the input signal (reference signal) y_j at the minuend terminal. The subtraction result e_j is multiplied by the convergence coefficient μ and by X_j / |X_j|^2, and the product is added to the filter coefficient vector W_j to give the filter coefficient vector W_{j+1} at the next time point (j + 1). When |X_j|^2 is 0, X_j / |X_j|^2 is forced to 0. The initial value W_0 of the filter coefficient vector may be any value, but a zero vector is commonly used.
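As an illustration of paragraph [0034], the following is a minimal Python sketch of a single update step. It assumes that the learning identification update has the normalized-LMS form W_{j+1} = W_j + μ e_j X_j / |X_j|^2; the function name `nlms_update` and the list-based vector representation are hypothetical choices made only for this example.

```python
def nlms_update(w, x, y, mu):
    """One learning-identification (normalized LMS) step, as assumed above.

    w  : current coefficient vector W_j (list of floats)
    x  : tap vector X_j (list of floats, same length as w)
    y  : reference sample y_j
    mu : convergence coefficient
    Returns (z_j, e_j, W_{j+1}).
    """
    z = sum(wi * xi for wi, xi in zip(w, x))   # z_j = <W_j, X_j>
    e = y - z                                  # e_j = y_j - z_j
    norm_sq = sum(xi * xi for xi in x)         # |X_j|^2
    if norm_sq == 0.0:
        # X_j / |X_j|^2 is forced to 0, so the coefficients stay unchanged.
        return z, e, list(w)
    step = mu * e / norm_sq
    return z, e, [wi + step * xi for wi, xi in zip(w, x)]
```

With μ = 1 and the same tap vector presented twice, the second call returns an error of zero, which is the behaviour expected of a normalized update.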
[0035]
The filter coefficients are sequentially updated so that the difference e_j between the output signal z_j of the adaptive filter and the reference signal y_j becomes small. Because the input signal x_j of the adaptive filter lags the reference signal y_j by N samples, determining the filter coefficients so that e_j becomes small corresponds to a filtering process that emphasizes, in the adaptive filter output z_j, the signal component having a high temporal correlation; the residual e_j therefore retains the component having a low temporal correlation. The degree of temporal correlation can be controlled through the delay amount N and the tap number k as parameters.
[0036]
Furthermore, to enable real-time processing with the sequential update, the filter coefficients must converge quickly. The convergence coefficient μ (0 ≦ μ ≦ 1) serves as the parameter that controls the convergence speed so that excessive overshoot does not occur. These parameters, together with the delay amount N and the tap number k, can be set in the parameter control step (step S5).
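The sample-by-sample adaptive processing of step S4 can be sketched as follows. This is an illustrative Python sketch, not the patented implementation: it assumes the normalized-LMS form of the update, a zero initial coefficient vector, and zeros for samples before the start of the recording. The name `extract_low_correlation` and the test tone are hypothetical.

```python
import math

def extract_low_correlation(signal, N, k, mu):
    """Return the residual e_j for every sample of `signal`.

    N: delay in samples, k: number of taps, mu: convergence coefficient.
    """
    w = [0.0] * k                        # W_0: zero vector
    residual = []
    for j, y in enumerate(signal):       # y_j: undelayed reference sample
        # X_j: k taps at one-sample spacing, newest tap delayed by N samples;
        # samples before the start of the recording are taken as 0.
        x = [signal[j - N - i] if j - N - i >= 0 else 0.0 for i in range(k)]
        z = sum(wi * xi for wi, xi in zip(w, x))    # z_j = <W_j, X_j>
        e = y - z                                   # e_j = y_j - z_j
        norm_sq = sum(xi * xi for xi in x)
        if norm_sq > 0.0:                           # skip the update when |X_j|^2 = 0
            g = mu * e / norm_sq
            w = [wi + g * xi for wi, xi in zip(w, x)]
        residual.append(e)
    return residual

# A pure tone is strongly correlated in time, so the filter should learn to
# predict it and the residual should decay towards zero.
tone = [math.sin(2.0 * math.pi * n / 16.0) for n in range(2000)]
tone_residual = extract_low_correlation(tone, N=1, k=8, mu=0.5)
```

In this sketch, larger N and k change which time scales count as "correlated", and μ trades convergence speed against overshoot, which is the role the text assigns to the three parameters.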
[0037]
The parameters (N, k, μ) are set manually, while listening to the sound obtained from the device of the present invention, so that the sound is as easy to hear as possible. At this time, the spectrum of the output signal z_j of the adaptive filter 1 (see FIG. 2) has a shape close to that of the musical sound.
[0038]
In FIG. 1, the D/A conversion (step S6) and audio output (step S7) steps convert the digital signal, in which the signal component having a low temporal correlation (speech) has been emphasized by the above-described adaptive processing (step S4), into an analog signal that can be listened to.
[0039]
Although the first invention has been described above as a method invention with reference to FIG. 1, the adaptive processing circuit that performs the adaptive processing (step S4) is shown in FIG. 2 and its parameter control has also been described in detail. It is therefore apparent that the first invention can also be configured as an apparatus invention.
[0040]
[Second invention]
FIG. 3 shows a flowchart of a signal processing procedure according to the signal extraction method of the present invention, which extracts a desired signal having a high temporal correlation from a one-channel digital input signal in which a desired signal having a high temporal correlation (for example, speech) and an undesired signal having a low temporal correlation (for example, noise) are mixed.
[0041]
In FIG. 3, a one-channel signal (analog signal) in which a signal with high temporal correlation (speech) and a signal with low temporal correlation (noise) are mixed is first input (step S8). This signal is, for example, a speech signal containing noise, picked up by a microphone. As in the case shown in FIG. 1, the signal is digitized for the adaptive processing described below (step S9). The digitized signal is branched into three as shown in the figure, and the branches are respectively high-frequency emphasized (step S10), delayed by M samples (delay 1, step S11), and delayed by L samples (delay 2, step S12).
[0042]
FIG. 4 shows an example of an equivalent circuit of the high frequency emphasis circuit used in step S10. The circuit generates the difference signal between the input signal and the input signal delayed by one sample, multiplies it by a predetermined gain A, and outputs the result. The result is the same if the difference signal is generated after the input signal has been multiplied by the gain A.
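The high frequency emphasis of step S10 can be sketched in a few lines of Python. The sign convention (current sample minus delayed sample) and the zero initial state of the delay are assumptions made for this example, and both function names are hypothetical.

```python
def high_frequency_emphasis(samples, gain):
    """First difference of the input followed by multiplication by gain A."""
    out = []
    previous = 0.0                          # output of the one-sample delay D
    for s in samples:
        out.append(gain * (s - previous))   # difference signal, then gain A
        previous = s
    return out

def gain_then_difference(samples, gain):
    """Same circuit with the gain applied before the difference is taken."""
    scaled = [gain * s for s in samples]
    return high_frequency_emphasis(scaled, 1.0)
```

Both orderings give the same output because the difference is a linear operation, which is the equivalence stated in paragraph [0042].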
[0043]
The signal y_j that was high-frequency emphasized in step S10 and the signal x_j that was delayed by M samples in step S11 are supplied, respectively, to the reference signal terminal and to the adaptive filter 1 of an adaptive processing circuit having the same circuit configuration as that of FIG. 2, and the adaptive processing indicated by step S13 is executed.
[0044]
Here too, on the basis of the difference e_j between the high-frequency-emphasized signal y_j and the output signal of the adaptive filter 1, that is, the signal z_j supplied to the subtrahend terminal of the subtractor 3, and of the tap output signals of the adaptive filter 1, the filter coefficients of the adaptive filter 1 are sequentially updated by the learning identification method so that the difference e_j becomes small, in the same way as described for the first invention. Because the input signal x_j of the adaptive filter 1 lags the reference signal y_j by M samples, determining the filter coefficients so that e_j becomes small corresponds to a filtering process that emphasizes the signal component having a high temporal correlation.
[0045]
The control of the parameters (gain A, delay amounts M and L, tap number k, convergence coefficient μ) shown as step S14 in FIG. 3 is the same as that described for step S5, so its description is omitted. Note, however, that the gain A controls the output amplitude level of the high frequency emphasis circuit.
[0046]
In FIG. 3, the digital signal delayed by L samples in step S12 is subjected, in step S15, to matched filter processing using the matched filter shown in FIG. 5. When the filter coefficients of the matched filter are those of the adaptive filter obtained by the adaptive processing in step S13, the output q_j of the matched filter is a signal in which the component having a high temporal correlation (speech) is emphasized. The white arrow in FIG. 3 indicates that the filter coefficients W_j of the adaptive filter obtained in step S13 are copied to the filter coefficients of the matched filter in step S15.
[0047]
The reason why the digital input signal is supplied to the matched filter after being delayed by L samples in step S12 is as follows. A phoneme is formed continuously over a certain span of time. To extract the desired speech with the matched filter, the matched filter must therefore be driven from the start of the phoneme with the filter coefficients that the adaptive filter has determined by the end of the phoneme, and this time difference is controlled by the delay L. The control parameters, namely the delay amounts M and L of the two delay circuits, the gain A of the high frequency emphasis signal, the tap number k of the adaptive filter, and the convergence coefficient μ used in the adaptive signal processing algorithm, can be set in the parameter control (step S14). These parameters (M, L, A, k, μ) are set, while listening to the sound obtained from the device of the present invention, so that the sound is as easy to hear as possible.
[0048]
The matched filter (see FIG. 5) used in step S15 has the same configuration as the adaptive filter in the adaptive processing circuit and the same number of taps k. It filters its input with the filter coefficients copied from the adaptive filter used in step S13 and outputs the filtered digital audio output signal q_j. That is, with the number of taps of the matched filter being k, the input signal vector of the matched filter being P_j, and the copied coefficient vector of the matched filter being W_j, the output signal q_j of the matched filter is expressed by the following equation (2).
[Equation 10]
$$q_j = \langle W_j, P_j \rangle \tag{2}$$
[0049]
In this case, as described above, the delay of L samples aligns the time axes of the output signal of the adaptive processing circuit and the digital audio signal input to the matched filter.
The digital audio output signal q_j from the matched filter is converted into an analog signal in step S16 and listened to (step S17).
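The second invention as a whole (steps S10 to S15) can be sketched sample by sample as below. This is a hedged illustration only: it assumes the normalized-LMS form of the update, assumes that q_j is computed with the coefficient vector held before the update at time j, and pads with zeros before the start of the recording. The names `delayed_taps` and `extract_high_correlation` are hypothetical.

```python
import math

def delayed_taps(signal, j, delay, k):
    """k taps at one-sample spacing; the newest tap is `delay` samples behind time j."""
    return [signal[j - delay - i] if j - delay - i >= 0 else 0.0 for i in range(k)]

def extract_high_correlation(signal, M, L, k, mu, gain):
    """Return the matched filter output q_j for every sample of `signal`."""
    w = [0.0] * k                                   # W_0: zero vector
    q_out = []
    for j in range(len(signal)):
        prev = signal[j - 1] if j > 0 else 0.0
        y = gain * (signal[j] - prev)               # y_j: high-frequency-emphasized reference
        x = delayed_taps(signal, j, M, k)           # X_j: adaptive filter input (delay M)
        p = delayed_taps(signal, j, L, k)           # P_j: matched filter input (delay L)
        q_out.append(sum(wi * pv for wi, pv in zip(w, p)))   # q_j = <W_j, P_j>
        e = y - sum(wi * xi for wi, xi in zip(w, x))         # e_j = y_j - z_j
        norm_sq = sum(xi * xi for xi in x)
        if norm_sq > 0.0:
            g = mu * e / norm_sq
            w = [wi + g * xi for wi, xi in zip(w, x)]        # copied to the matched filter
    return q_out

# With L equal to M the matched filter sees the same taps as the adaptive
# filter, so for a pure tone q_j should approach the emphasized reference y_j.
tone = [math.sin(2.0 * math.pi * n / 16.0) for n in range(2000)]
q_tone = extract_high_correlation(tone, M=2, L=2, k=8, mu=0.5, gain=1.0)
```

Choosing L different from M shifts the segment of the input to which the learned coefficients are applied, which is the time-alignment role the text gives to the delay L.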
[0050]
As described above, for the second invention, the high frequency emphasis circuit that performs the high frequency emphasis (step S10), the adaptive processing circuit that performs the adaptive processing (step S13), and the matched filter that performs the matched filter processing (step S15) have each been shown in the drawings (the adaptive processing circuit is the same as that used in the first invention, so its drawing is omitted), and their parameter control has also been explained. It is therefore apparent that the second invention can also be configured as an apparatus invention.
[0051]
[Another embodiment of the first invention]
When the first invention is implemented in software, by loading a signal extraction program from a recording medium into a computer, processing the one-channel digital input signal sample by sample takes time, and real-time processing may be difficult. To solve this problem, a method of processing the digital input signal block by block is described below.
[0052]
FIG. 6 shows the configuration of data buffering in this case.
When this embodiment is configured as an apparatus, block dividing means, a first buffer, and a second buffer are required.
[0053]
In the following, the size of the first and second buffers (memory) and the manner of storing (writing and reading) the digital input signal will be described in itemized form. Thereby, the whole process of signal extraction can be computerized.
(1) The digital input signal is cut out into blocks (block data) every F samples from the start of acquisition. In the present embodiment, F = 256 samples. Blocks can be cut anywhere.
(2) Let k be the number of taps of the adaptive filter. In this embodiment, k = 128 is adopted as the initial value.
(3) The cut-out block data is stored in buffer 1, provided on the input side of the adaptive processing circuit, and in buffer 2, which also serves as the delay of N samples on the input to the adaptive processing circuit (see FIG. 1).
(4) At this time, the buffer 1 has a buffer length (number of samples) of (k + F), and the buffer 2 has a buffer length of (N + 2 × k + F) including a delay of the number of samples N.
[0054]
(5) The cut-out block data is stored in the F sample positions at the end of each buffer (buffer 1 and buffer 2; indicated by * in FIG. 6), so that the latest (newest) sample in the block is stored at the end of the buffer.
(6) The read address of the data stored in each buffer is controlled so that the data is read out while shifting one sample at a time from the second sample at the head of each buffer, and the above-described processing of the first invention is performed. Because the processing is carried out by address control on the memory, without physically moving the stored data for every sample, the calculation time can be shortened.
[0055]
Digital signal writing and reading are performed as follows.
(7) First, the second sample from the head of buffer 1 is used as the reference signal y_j of the adaptive processing circuit (see FIGS. 1 and 2). The input signal x_j of the adaptive processing circuit corresponding to this reference signal (the output signal of step S3) is the sample delayed by N samples from the second sample at the head of buffer 1, and the tap output signals of the adaptive filter are the k samples from that sample up to the sample delayed by a further k samples; they therefore correspond to k samples starting at the second sample from the head of buffer 2. Accordingly, k samples are read starting at the second sample from the head of buffer 2 and are filtered with the filter coefficients of the adaptive filter 1 (see FIG. 2). The filtering, the sequential update of the filter coefficients, and the subtraction of the adaptive filter output from the reference signal are repeated while shifting one sample at a time in the time direction from the second sample at the head of each buffer. That is, using F samples from the second sample at the head of buffer 1 and (F + k−1) samples from the second sample at the head of buffer 2, the filter coefficients of the adaptive filter are updated F times, from the vector data corresponding to time j to the vector data corresponding to time j + F−1, while the output z_j of the adaptive filter is computed by the following equation (3)
[Equation 11]
$$z_j = \langle W_j, X_j \rangle \tag{3}$$
and subtracted from the reference signal y_j to obtain the output signal e_j.
[0056]
(8) Subsequently, the buffer 1 shifts the k samples from the tail by F samples forward. Buffer 2 shifts (k × 2 + N) samples from the tail by F samples forward. Next, the next block data is stored in the last F samples of each buffer, and the processing from (1) is repeated.
[0057]
In this embodiment, for convenience, the buffer length is set to (k + F) for buffer 1 and (N + 2 × k + F) for buffer 2. However, as can be seen from the buffering configuration, it is sufficient for the buffer length to be at least F for buffer 1 and at least (N + k + F) for buffer 2.
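The block-wise procedure of items (1) to (8) can be checked against plain sample-by-sample processing with the following Python sketch. It uses the minimum buffer lengths named in paragraph [0057] and a simplified indexing scheme of its own, so the index arithmetic, the function names, and the test signal are assumptions for illustration rather than the layout of FIG. 6. The update again assumes the normalized-LMS form.

```python
import math

def nlms_step(w, x, y, mu):
    """Filter, subtract, and update once; returns (e_j, W_{j+1})."""
    e = y - sum(wi * xi for wi, xi in zip(w, x))
    norm_sq = sum(xi * xi for xi in x)
    if norm_sq > 0.0:
        g = mu * e / norm_sq
        w = [wi + g * xi for wi, xi in zip(w, x)]
    return e, w

def sample_by_sample(signal, N, k, mu):
    w, out = [0.0] * k, []
    for j, y in enumerate(signal):
        x = [signal[j - N - i] if j - N - i >= 0 else 0.0 for i in range(k)]
        e, w = nlms_step(w, x, y, mu)
        out.append(e)
    return out

def block_wise(signal, F, N, k, mu):
    w, out = [0.0] * k, []
    buf2 = [0.0] * (N + k + F)            # delayed side: at least N + k + F samples
    for start in range(0, len(signal) - F + 1, F):
        buf1 = list(signal[start:start + F])      # reference side: at least F samples
        buf2 = buf2[F:] + buf1                    # shift by F, store the block at the tail
        first = len(buf2) - F                     # address of the block's oldest sample
        for i in range(F):                        # address control: only indices move
            newest_tap = first + i - N            # sample delayed by N from buf1[i]
            x = [buf2[newest_tap - t] for t in range(k)]
            e, w = nlms_step(w, x, buf1[i], mu)
            out.append(e)
    return out

demo = [math.sin(0.3 * n) + 0.25 * ((n * 7) % 5) for n in range(64)]
by_sample = sample_by_sample(demo, N=3, k=4, mu=0.5)
by_block = block_wise(demo, F=8, N=3, k=4, mu=0.5)
```

With a buffer 2 length of N + k + F, the lowest address read in this sketch is index 1, that is, the second sample from the head of the buffer, which matches the reading position described in item (6). Any trailing samples that do not fill a whole block are left unprocessed in this simplified version.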
[0058]
[Another embodiment of the second invention]
As in the other embodiment of the first invention, the digital input signal is processed block by block in order to increase the processing speed when the invention is implemented in software, by loading the signal extraction program from a recording medium into a computer, and thereby to enable real-time processing.
[0059]
FIG. 7 shows the configuration of data buffering in this case.
When this embodiment is configured as an apparatus, a third buffer is required in addition to the block dividing means, the first buffer, and the second buffer.
[0060]
In the following, the sizes of the first, second and third buffers (memory) and how to store the digital input signals (writing and reading) will be described in bulleted form. Thereby, the whole process of signal extraction can be computerized.
(1) The digital input signal is cut out into blocks (block data) every F samples from the start of acquisition. In the present embodiment, F = 256 samples. Blocks can be cut anywhere.
(2) Let k be the number of taps of the adaptive filter. In this embodiment, k = 128 is adopted as the initial value.
(3) The cut-out block data is stored in buffer 1, provided on the input side of the high frequency emphasis circuit, in buffer 2, which also serves as delay 1 on the input to the adaptive filter, and in buffer 3, which also serves as delay 2 on the input to the matched filter (see FIG. 3).
(4) At this time, buffer 1 has a buffer length (number of samples) of (k + F), buffer 2 has a buffer length of (M + 2 × k + F) including the delay of M samples, and buffer 3 has a buffer length of (L + 2 × k + F) including the delay of L samples.
[0061]
(5) The cut-out block data is stored in the F sample positions at the end of each buffer (buffer 1, buffer 2 and buffer 3; indicated by * in FIG. 7), so that the latest (newest) sample in the block is stored at the end of the buffer.
(6) The read address of the data stored in each buffer is controlled so that the data is read out while shifting one sample at a time from the second sample at the head of each buffer (the first sample for buffer 1), and the above-described processing of the second invention is performed. Because the processing is carried out by address control on the memory, without physically moving the stored data for every sample, the calculation time can be shortened.
[0062]
Digital signal writing and reading are performed as follows.
(7) First, the difference between the first sample from the head of buffer 1 and the next sample is taken and used as the reference signal of the adaptive processing circuit. The input signal x_j of the adaptive processing circuit corresponding to this reference signal (the output signal of step S11) is the sample delayed by M samples from the second sample at the head of buffer 1, and the tap output signals of the adaptive filter are the k samples from that sample up to the sample delayed by a further k samples; they therefore correspond to k samples starting at the second sample from the head of buffer 2. Likewise, the matched filter input signal at that timing (the output signal of step S12) is the sample delayed by L samples from the second sample at the head of buffer 1, and the tap output signals of the matched filter are the k samples from that sample up to the sample delayed by a further k samples; they therefore correspond to k samples starting at the second sample from the head of buffer 3.
[0063]
Accordingly, k samples are read starting at the second sample from the head of each of buffer 2 and buffer 3. Filtering is performed with the filter coefficients of the adaptive filter 1 (see FIG. 2) while those coefficients are updated, the coefficients are copied to the filter coefficients of the matched filter, and filtering with the matched filter yields the output signal q_j. The filtering and the sequential update of the adaptive filter coefficients are repeated while shifting one sample at a time from the second sample at the head of each buffer (the first sample for buffer 1). That is, using (F + 1) samples from the first sample at the head of buffer 1 and (F + k−1) samples from the second sample at the head of each of buffer 2 and buffer 3, the filter coefficients of the adaptive filter are updated F times with the high frequency emphasis signal y_j, from the vector data corresponding to time j to the vector data corresponding to time j + F−1, and are copied to the filter coefficients of the matched filter, while the matched filter output q_j is computed each time by the following equation (4).
[Equation 12]
$$q_j = \langle W_j, P_j \rangle \tag{4}$$
[0064]
(8) Subsequently, the buffer 1 shifts the k samples from the tail by F samples forward. Buffer 2 shifts (k × 2 + M) samples from the tail to F samples forward. The buffer 3 shifts (k × 2 + L) samples from the tail by F samples forward. Next, the next block data is stored in the last F samples of each buffer, and the processing from (1) is repeated.
[0065]
In this embodiment, for convenience, the buffer length is set to (k + F) for buffer 1, (M + 2 × k + F) for buffer 2, and (L + 2 × k + F) for buffer 3. However, as can be seen from the buffering configuration, when the buffers are controlled at the same timing it is sufficient for the buffer length to be at least (F + 1) for buffer 1, at least (M + k + F) for buffer 2, and at least (L + k + F) for buffer 3.
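As a worked example of the buffer sizes in paragraph [0065], the short Python sketch below evaluates both the lengths used in the embodiment and the minimum lengths. F = 256 and k = 128 are the values given in the text; the delays M = 10 and L = 20 are hypothetical values chosen only for the arithmetic, and the function name `buffer_lengths` is likewise an assumption.

```python
def buffer_lengths(F, k, M, L):
    """Return (embodiment, minimum) buffer lengths in samples for buffers 1 to 3."""
    embodiment = {
        "buffer1": k + F,
        "buffer2": M + 2 * k + F,
        "buffer3": L + 2 * k + F,
    }
    minimum = {
        "buffer1": F + 1,        # F differences need F + 1 consecutive samples
        "buffer2": M + k + F,
        "buffer3": L + k + F,
    }
    return embodiment, minimum

# F and k as in the embodiment; M and L are illustrative values only.
embodiment, minimum = buffer_lengths(F=256, k=128, M=10, L=20)
```

For these values the embodiment uses 384, 522 and 532 samples, while the minimum is 257, 394 and 404 samples.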
[0066]
Of the signal extraction method and apparatus of the present invention described above, the effect of the first invention and its other embodiment (an invention that can extract speech from a one-channel signal in which speech and music are mixed), in which the musical tone component is reduced and the speech component is emphasized, is shown by the signal waveforms (a) and (b) of FIG. 8, which show the input signal and the output signal of the device of the present invention.
[0067]
Similarly, the effect of the second invention and its other embodiment (an invention that can extract speech from a one-channel signal in which speech and noise are mixed), in which the noise component is reduced, is shown by the signal waveforms (a) and (b) of FIG. 9, which show the input signal and the output signal of the device of the present invention, and by the corresponding running spectra (a) and (b).
[0068]
[Effect of the invention]
According to the present invention, only the desired speech signal can be extracted in real time, in the form that is easiest to hear, from a one-channel signal in which musical sound or noise is mixed.
In addition, the present invention can be applied in various fields such as speech recognition preprocessing and hearing aids used by elderly people, hearing-impaired persons, and the like.
[Brief description of the drawings]
FIG. 1 is a flowchart showing a signal extraction method (particularly, the first invention) according to the present invention.
FIG. 2 shows the configuration of an adaptive processing circuit using an equivalent circuit.
FIG. 3 is a flowchart showing the signal extraction method of the present invention (particularly, the second invention).
FIG. 4 shows a high frequency emphasis circuit as an equivalent circuit.
FIG. 5 shows a matched filter configuration with an equivalent circuit.
FIG. 6 shows a configuration of data buffering in the signal extraction device of the present invention (particularly, another embodiment of the first invention).
FIG. 7 shows a configuration of data buffering in the signal extraction device of the present invention (particularly, another embodiment of the second invention).
FIG. 8 shows the effects of the present invention (particularly, the first invention and another embodiment of the first invention) in comparison with signal waveforms.
FIG. 9 shows the effect of the present invention (particularly, the second invention and another embodiment of the second invention) in comparison with signal waveforms.
FIG. 10 shows the effects of the present invention (particularly, the second invention and another embodiment of the second invention) by comparing the waveforms of running spectra.
[Explanation of symbols]
1 Adaptive filter
2 Adaptive filter coefficient successive update unit
3 Subtractor
D 1 sample delay
Σ summer
A Gain

Claims

1. A signal extraction apparatus for extracting a desired signal having low temporal correlation from a one-channel digital input signal in which the desired signal having low temporal correlation and an undesired signal having high temporal correlation are mixed, the apparatus comprising at least:
an adaptive filter which receives one of the two branched digital input signals after it has been delayed by N samples, which has a specified number of taps k at one-sample intervals, and whose filter coefficient vector is sequentially updated;
subtracting means for subtracting the output signal of the adaptive filter from the other of the two branched digital input signals and outputting the subtraction result as the desired signal having low temporal correlation; and
adaptive filter coefficient sequential updating means for sequentially updating the filter coefficient vector W_j of the adaptive filter, using the output signal of the subtracting means and the output signal of each tap of the adaptive filter, according to
W_{j+1} = W_j + μ e_j X_j / |X_j|^2
where
W_j: filter coefficient vector of the adaptive filter
μ: convergence coefficient
e_j: output signal of the subtracting means (desired signal)
X_j: input signal vector to the adaptive filter
|X_j|: norm of the input signal vector,
wherein the delayed one of the digital input signals is input to the adaptive filter, which calculates
z_j = <W_j, X_j>
where
z_j: output signal of the adaptive filter
<W_j, X_j>: inner product of the filter coefficient vector and the input signal vector,
the adaptive filter output signal z_j resulting from this calculation is input to the subtracting means, which calculates
e_j = y_j - z_j
where
y_j: the other of the two branched digital input signals,
and the signal extraction apparatus outputs the desired signal e_j having low temporal correlation.
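
The signal flow recited in claim 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation. It assumes that the coefficient update is the normalized form of the learning identification method, W_{j+1} = W_j + μ e_j X_j / |X_j|^2. The function name `extract_low_correlation`, the zero initial coefficients, the zero padding before the first sample, the guard `eps` against a zero norm, and the tap alignment (delays N to N+k-1) are all assumptions made for the example.

```python
import math

def extract_low_correlation(y, N, k, mu, eps=1e-12):
    """Return e_j = y_j - z_j for every sample of the one-channel input y."""
    w = [0.0] * k                       # filter coefficient vector W_j (assumed zero start)
    e = []
    for j in range(len(y)):
        # Tap signals X_j: the input delayed by N, N+1, ..., N+k-1 samples
        # (zero before the start of the signal; alignment is an assumption).
        x = [y[j - N - i] if j - N - i >= 0 else 0.0 for i in range(k)]
        z = sum(wi * xi for wi, xi in zip(w, x))      # z_j = <W_j, X_j>
        err = y[j] - z                                # e_j = y_j - z_j
        norm2 = sum(xi * xi for xi in x)              # |X_j|^2
        if norm2 > eps:                               # skip the update for an all-zero tap vector
            g = mu * err / norm2
            w = [wi + g * xi for wi, xi in zip(w, x)] # W_{j+1} = W_j + mu e_j X_j / |X_j|^2
        e.append(err)
    return e

# Usage: a pure tone is highly correlated in time, so the filter learns to
# predict it and the residual shrinks as the coefficients converge.
tone = [math.sin(2 * math.pi * 0.05 * j) for j in range(2000)]
residual = extract_low_correlation(tone, N=1, k=8, mu=0.5)
```

In this sketch an isolated impulse passes through unchanged, because the delayed taps cannot predict it; that is the sense in which a component with low temporal correlation survives the subtraction.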

2. The signal extraction apparatus according to claim 1, further comprising:
block dividing means, at the input end of the apparatus, for dividing the digital input signal into blocks of F samples each;
first buffer means, provided at the input terminal of the subtracting means to which the other of the two branched digital input signals is input and having a storage capacity of at least F samples, which sequentially stores the output signal of the block dividing means in units of the blocks; which, after each storage, inputs to the subtracting means, as the other of the two branched digital input signals, the stored sample located (F-1) samples earlier in time than a first predetermined sample, the first predetermined sample being determined in advance according to the storage capacity of the first buffer means; and which, until the next block is stored, repeats this input for a total of F samples while shifting one sample at a time from that stored sample toward the first predetermined sample; and
second buffer means, having a storage capacity of at least (N + k + F) samples and also providing the delay of N samples, which sequentially stores the output signal of the block dividing means in units of the blocks; which, after each storage and at the same timing as the first predetermined sample, inputs to the adaptive filter, as the tap output signals of the adaptive filter, a set of k consecutive samples starting from the stored sample located (N + k + F-1) samples earlier in time than a second predetermined sample and running toward the second predetermined sample, the second predetermined sample being determined in advance according to the storage capacity of the second buffer means; and which, until the next block is stored, repeats this input in synchronization with the first buffer means for a total of F sets of k consecutive samples while shifting one sample at a time toward the second predetermined sample.
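
The two-buffer read-out pattern of claim 2 can be illustrated as follows. This is a hedged sketch, not the patented circuit: the name `block_tap_vectors` is hypothetical, the "predetermined sample" of each buffer is assumed to be the newest stored sample, both buffers are assumed to start filled with zeros, and the k taps are returned oldest first.

```python
def block_tap_vectors(blocks, N, k):
    """Yield (y_j, taps) for every sample, reading from two block-fed buffers."""
    F = len(blocks[0])                  # block length (all blocks assumed equal)
    buf1 = [0.0] * F                    # first buffer: F samples
    buf2 = [0.0] * (N + k + F)          # second buffer: doubles as the N-sample delay
    for block in blocks:
        buf1 = list(block)              # store the newest block
        buf2 = buf2[F:] + list(block)   # drop the F oldest samples, append the block
        for m in range(F):              # F reads per stored block, one sample apart
            y_j = buf1[m]               # first read is (F-1) samples before the newest
            taps = buf2[m:m + k]        # first window starts (N+k+F-1) samples before the newest
            yield y_j, taps

# Usage: three blocks of F = 4 samples, delay N = 2, k = 3 taps.
samples = [float(v) for v in range(1, 13)]
pairs = list(block_tap_vectors([samples[0:4], samples[4:8], samples[8:12]], N=2, k=3))
```

With this alignment the taps paired with y_j are the samples from j-N-k to j-N-1, so one block write feeds F consecutive filter steps without any per-sample shifting of the stored data.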

3. The signal extraction apparatus according to claim 1, wherein the delay amount of N samples, the number of taps k, and the convergence coefficient μ are each controlled.

4. A signal extraction apparatus for extracting a desired signal having high temporal correlation from a one-channel digital input signal in which the desired signal having high temporal correlation and an undesired signal having low temporal correlation are mixed, the apparatus comprising at least:
high-frequency emphasizing means which receives a first of the three branched digital input signals, applies high-frequency emphasis processing and multiplication by a specified gain to that signal, and outputs a high-frequency emphasized signal;
an adaptive filter which receives a second of the three branched digital input signals after it has been delayed by M samples, which has a specified number of taps k at one-sample intervals, and whose filter coefficient vector is sequentially updated;
subtracting means for subtracting the output signal of the adaptive filter from the high-frequency emphasized signal;
adaptive filter coefficient sequential updating means for sequentially updating the filter coefficient vector W_j of the adaptive filter, using the output signal of the subtracting means and the output signal of each tap of the adaptive filter, according to
W_{j+1} = W_j + μ e_j X_j / |X_j|^2
where
W_j: filter coefficient vector of the adaptive filter
μ: convergence coefficient
e_j: output signal of the subtracting means
X_j: input signal vector to the adaptive filter
|X_j|: norm of the input signal vector; and
a matched filter having the same configuration as the adaptive filter, to which the filter coefficient vector of the adaptive filter is copied,
wherein a third of the three branched digital input signals, delayed by L samples, is input to the matched filter, which calculates
q_j = <W_j, P_j>
where
q_j: output signal of the matched filter
P_j: input signal vector to the matched filter
<W_j, P_j>: inner product of the filter coefficient vector and the input signal vector,
and outputs the desired signal q_j having high temporal correlation.
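
The three-branch arrangement of claim 4 can be sketched as below. This is an illustration under stated assumptions rather than the patented implementation: the coefficient update is assumed to be the normalized learning identification form W_{j+1} = W_j + μ e_j X_j / |X_j|^2, the high-frequency emphasis is assumed to be a gain-scaled first difference, and the function name `extract_high_correlation`, the zero initial state, the zero-norm guard `eps`, and the tap alignment are hypothetical choices.

```python
import math

def extract_high_correlation(y, M, L, k, mu, gain, eps=1e-12):
    """Return the matched filter output q_j = <W_j, P_j> for every sample of y."""
    w = [0.0] * k                                   # W_j, shared with the matched filter

    def taps(j, delay):
        # k consecutive samples of y starting `delay` samples back (zero-padded).
        return [y[j - delay - i] if j - delay - i >= 0 else 0.0 for i in range(k)]

    q = []
    for j in range(len(y)):
        prev = y[j - 1] if j > 0 else 0.0
        h = gain * (y[j] - prev)                    # branch 1: high-frequency emphasized signal
        x = taps(j, M)                              # branch 2: X_j, input delayed by M
        p = taps(j, L)                              # branch 3: P_j, input delayed by L
        q.append(sum(wi * pi for wi, pi in zip(w, p)))  # q_j = <W_j, P_j> with copied W_j
        e = h - sum(wi * xi for wi, xi in zip(w, x))    # e_j = emphasized signal - z_j
        norm2 = sum(xi * xi for xi in x)
        if norm2 > eps:                             # skip the update for an all-zero tap vector
            g = mu * e / norm2
            w = [wi + g * xi for wi, xi in zip(w, x)]
    return q

# Usage: with L equal to M the matched filter sees the same taps as the
# adaptive filter, so q_j equals the adaptive filter output z_j.
tone = [math.sin(2 * math.pi * 0.05 * j) for j in range(2000)]
q = extract_high_correlation(tone, M=2, L=2, k=8, mu=0.5, gain=2.0)
```

The output is taken from the matched filter rather than from the subtractor, which is the structural difference from the arrangement that extracts the low-correlation component.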

5. The signal extraction apparatus according to claim 4, wherein the high-frequency emphasizing means comprises:
delay means for delaying an input signal by one sample;
difference means for generating and outputting a difference signal between the input and output signals of the delay means; and
multiplying means, connected before the delay means or after the difference means, for multiplying its input signal by the gain.
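
A minimal sketch of the emphasis circuit of claim 5, assuming the delay element starts at zero; the name `high_frequency_emphasis` and the flag `gain_first` are hypothetical. The flag selects whether the multiplier sits before the delay element or after the difference element.

```python
def high_frequency_emphasis(x, gain, gain_first=True):
    """First difference of x scaled by gain, with the multiplier at either position."""
    out = []
    prev = 0.0                               # content of the one-sample delay (assumed zero start)
    for s in x:
        cur = gain * s if gain_first else s  # multiplier before the delay element, if selected
        d = cur - prev                       # difference between delay input and delay output
        out.append(d if gain_first else gain * d)  # otherwise multiply after the difference
        prev = cur                           # the delay element stores the current sample
    return out

# Usage: a constant stretch is suppressed and the step at the end is kept.
emphasized = high_frequency_emphasis([1.0, 1.0, 1.0, 2.0], gain=2.0)
```

Because the circuit is linear, both multiplier positions give the same output, which is consistent with the claim allowing either connection.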

6. The signal extraction apparatus according to claim 5, further comprising:
block dividing means, at the input end of the apparatus, for dividing the digital input signal into blocks of F samples each;
first buffer means, provided in place of the delay means and having a storage capacity of at least (F + 1) samples, which sequentially stores, in units of the blocks, the output signal of the block dividing means or that output signal multiplied by the gain; which, after each storage, outputs to the difference means, as its input signals, a set of two consecutive samples starting from the stored sample located F samples earlier in time than a first predetermined sample and running toward the first predetermined sample, the first predetermined sample being determined in advance according to the storage capacity of the first buffer means; and which, until the next block is stored, repeats this output for a total of F sets of two consecutive samples while shifting one sample at a time toward the first predetermined sample;
second buffer means, having a storage capacity of at least (M + k + F) samples and also providing the delay of M samples, which sequentially stores the output signal of the block dividing means in units of the blocks; which, after each storage and at the same timing as the first predetermined sample, inputs to the adaptive filter, as the tap output signals of the adaptive filter, a set of k consecutive samples starting from the stored sample located (M + k + F-1) samples earlier in time than a second predetermined sample and running toward the second predetermined sample, the second predetermined sample being determined in advance according to the storage capacity of the second buffer means; and which, until the next block is stored, repeats this input in synchronization with the first buffer means for a total of F sets of k consecutive samples while shifting one sample at a time toward the second predetermined sample; and
third buffer means, having a storage capacity of at least (L + k + F) samples and also providing the delay of L samples, which sequentially stores the output signal of the block dividing means in units of the blocks; which, after each storage and at the same timing as the first predetermined sample, inputs to the matched filter, as the tap output signals of the matched filter, a set of k consecutive samples starting from the stored sample located (L + k + F-1) samples earlier in time than a third predetermined sample and running toward the third predetermined sample, the third predetermined sample being determined in advance according to the storage capacity of the third buffer means; and which, until the next block is stored, repeats this input in synchronization with the first buffer means for a total of F sets of k consecutive samples while shifting one sample at a time toward the third predetermined sample.

7. The signal extraction apparatus according to claim 4, wherein the delay amounts of M samples and L samples, the number of taps k, the convergence coefficient μ, and the gain are each controlled.