JP5405130B2

JP5405130B2 - Sound reproducing apparatus and sound reproducing method

Info

Publication number: JP5405130B2
Application number: JP2009003880A
Authority: JP
Inventors: 真人戸上; 浩明小窪
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 2009-01-09
Filing date: 2009-01-09
Publication date: 2014-02-05
Anticipated expiration: 2029-01-09
Also published as: JP2010161735A

Description

The present invention relates to a sound reproducing device and a sound reproducing method that control sound reproduction from a plurality of speakers in a noisy environment such as a vehicle passenger compartment.

Techniques for reproducing sound with a high sense of presence using a plurality of speakers have been widely studied; 5.1-channel surround reproduction is a typical example. In addition, automatic volume control techniques have been studied that monitor the noise level and adjust the playback volume accordingly, so that sound can be reproduced at a sufficient volume even in a noisy environment such as the passenger compartment of an automobile (see, for example, Patent Document 1).

Furthermore, techniques have been studied that suppress sounds other than a desired target sound with high accuracy by multi-channel digital filtering using a plurality of microphones (see, for example, Non-Patent Document 1). The aim of this technique is to extract only the sound arriving from a desired direction, without distortion, by multi-channel digital filtering. Applying this sound source separation technique makes it possible to monitor the noise level with high accuracy.

Patent Document 1: JP-A-4-235600

Non-Patent Document 1: Masato Togami, Akio Amano, "Speech Recognition Technology Under Noise with Human Symbiotic Robot EMIEW", Measurement and Control (計測と制御), Vol. 46, No. 6, June 2007

However, the conventional automatic volume control technique (described in Patent Document 1) has the problem that the reproduced sound becomes hard to hear when the direction of a noise source as seen from the user listening position coincides with the sound image localization direction of the sound reproduced by the speaker array. The human auditory process is thought to include a function that distinguishes sounds by direction of arrival, based on interaural arrival time differences and amplitude differences. When the directions of the sounds coincide, this function can presumably no longer distinguish the speaker output from the noise.

The present invention has been made in view of this problem, and its object is to provide a sound reproducing device and a sound reproducing method that allow a desired sound to be heard clearly even in an environment where noise is present.

To solve the above problem, the sound reproducing device according to the present invention estimates, for example by sound source separation processing using a plurality of microphone arrays, the relative sound source direction as seen from the microphone array, and has a sound source direction conversion processing unit that converts the estimated sound source direction into the sound source direction as seen from the user position. That is, the sound source direction conversion processing unit converts the sound source direction estimated at the position of the microphone array into the sound source direction at the user listening position. The sound reproducing device further has an output coefficient setting unit that, based on the sound source direction converted by the sound source direction conversion processing unit, calculates the direction at the user listening position of a noise source other than the reproduced sound source, and controls the sound image localization direction so that the direction of the noise source at the user listening position differs from the sound image localization direction of the speaker array serving as the reproduced sound source.

According to the present invention, a desired sound can be heard clearly even in an environment where noise is present.

An explanatory diagram showing an application example of the sound reproducing device according to the first embodiment of the present invention.
A hardware configuration diagram showing the sound reproducing device according to the first embodiment of the present invention.
A block diagram showing the program configuration of the first embodiment of the present invention.
An explanatory diagram showing a geometric image of the sound source position conversion process.
A block diagram showing a configuration in which the output coefficients set in this embodiment are applied to the output source and output from the speakers.
A block diagram showing the detailed configuration of the sound source position conversion unit.
A block diagram showing in detail a first example of the output coefficient determination unit of FIG. 3.
An example of the histogram generated in the first embodiment of the present invention.
An explanatory diagram showing an example of the estimated noise source position, the user position, and the localization direction of the combined speaker wavefront as seen from the user position.
A block diagram showing in detail a second example of the output coefficient determination unit of FIG. 3.
A block diagram showing the direction matrix calculation unit of a modification.
A block diagram showing in detail the sound source separation unit of FIG. 3.
A flowchart showing the adaptive processing of the sound source separation filter.
A block diagram showing in detail the sound source position estimation unit of FIG. 3.
A block diagram showing in detail the acoustic echo canceller of FIG. 3.
An explanatory diagram showing the relationship between the software blocks and the hardware of the first embodiment of the present invention.
A block diagram showing a configuration that controls how audio output sound such as music is output in this embodiment.
A flowchart showing the process that decides the output coefficient determination timing.
A timing chart showing an example of the output coefficient setting timing and the audio source reproduction timing.
A hardware configuration diagram showing the sound reproducing device according to the second embodiment of the present invention.
A block diagram showing the software configuration of a sound field reproduction system that reproduces the sound field at a virtual sound source position using the sound source position conversion process at the user listening position according to the present invention.

The best mode for carrying out the present invention (hereinafter referred to as the "embodiment") is described in detail below with reference to the accompanying drawings.

FIG. 1 is an explanatory diagram showing an application example of the sound reproducing device 1 according to the first embodiment of the present invention.
An overview of the sound reproducing device 1 is given with reference to FIG. 1. A microphone array 101 having a plurality of microphones 102 is installed in the passenger compartment 11 of a vehicle such as an automobile 10. The arrival direction of noise is estimated from the sound recorded by the microphone array 101. A speaker output coefficient is then set for each speaker 112 so that, at the user listening position, the sound reproduced by the speakers 112 is localized in a direction different from the noise arrival direction. With this configuration, the user can hear the speaker output clearly.

FIG. 2 is a hardware configuration diagram showing the sound reproducing device 1 according to the first embodiment of the present invention.
The microphone array 101 records sound in the passenger compartment 11 and outputs an analog signal representing the recorded sound.
The multi-channel A/D converter 202 converts this analog signal into a digital signal for each microphone 102.
The speaker array 111 radiates the desired reproduced sound into the passenger compartment 11.

The central processing unit 203 applies digital signal processing to the converted digital signal. Specifically, it extracts the noise component contained in the digital signal and estimates the noise arrival direction. It then controls the speaker output coefficients according to that noise arrival direction. The signal processing program is stored in the nonvolatile memory 205 and is loaded into the volatile memory 204 and expanded at execution time. Memory areas needed for program execution, such as work memory, are allocated in the volatile memory 204. Information such as the arrangement of the microphones 102 is stored in the nonvolatile memory 205.
The central processing unit 203 controls the speaker output coefficients and outputs the generated digital signal (speaker output signal).

The multi-channel D/A converter 206 converts the speaker output signal into analog signals and outputs one to each of the speakers 112 of the speaker array 111.
Each speaker 112 is driven by this analog signal and radiates sound into the air.

The seat sensor 208 may also be used to detect the presence (seating) of an occupant or passenger, and the speaker output coefficients may be controlled by treating the occupant/passenger position as a sound source position or as a user listening position, regardless of whether that person is speaking. Specifically, the occupant/passenger may be regarded as a noise source, with the speaker output sound localized in a direction different from the occupant/passenger direction. Alternatively, the occupant/passenger may be regarded as a listener, with the speaker output coefficients controlled so that, at the occupant/passenger position, the localization direction of the speaker output sound differs from the noise arrival direction. The latter configuration makes it possible to form a sound field in which the desired sound is easy to hear not only for the driver but also for the other occupants and passengers.

FIG. 3 is a block diagram showing the program configuration of the first embodiment of the present invention.
The waveform capturing unit 301 controls the multi-channel A/D converter 202 (see FIG. 2) and acquires the digital signal.
The acoustic echo canceller 307 removes from the acquired digital signal the component caused by the speaker output (the acoustic echo component). The specific configuration of the acoustic echo canceller 307 is described later. The acoustic echo canceller 307 operates separately for each microphone element. The multi-channel signal after acoustic echo cancellation is sent to the sound source separation unit 302.

A large number of sound sources normally exist in the passenger compartment 11. The sound source separation unit 302 separates them into one signal per sound source. Separation is performed each time a fixed duration of the output signal of the acoustic echo canceller 307 has been acquired. Each separated signal is sent to the sound source position estimation unit 303, which estimates the position of the corresponding sound source. The estimated sound source position is relative to the position of the microphone array 101. Because this embodiment needs the position of the sound source relative to the user listening position, the sound source position conversion unit 304 uses user listening position information obtained in advance to calculate the sound source position as seen from the user listening position (user position).

FIG. 4 is an explanatory diagram showing a geometric image of the sound source position conversion process.
Specifically, as shown in FIG. 4, the user position vector V1, which is obtained from the user position and the position of the microphone array 101 (see FIG. 1), is added to the estimated sound source position vector V2 of the sound source as seen from the microphone array 101. The sum is the converted sound source position vector V3 as seen from the user position. The microphone array 101 is installed at a fixed position, so the user position vector V1 is determined once the user position is known. The user position may be preset to the driver's seat 12, or may be determined from the occupant/passenger position detected by the seat sensor 208 (see FIG. 2).

Returning to FIG. 3, the histogram update unit 305 generates a histogram P(θ) of noise arrival directions from the converted sound source positions, where θ is the azimuth of the sound source. A two-dimensional histogram P(θ, φ) over azimuth θ and elevation φ may be generated instead. Let the direction of the i-th separated signal be azimuth θ_i and elevation φ_i. Each time noise arrives, the value 1 is added to the histogram bin P(θ_i, φ_i) corresponding to its direction. Alternatively, the average power of the i-th separated signal, or a function of that power, may be added to P(θ_i, φ_i). The histogram may be reset each time the sound source separation unit 302 runs, that is, P(θ, φ) = 0 for all θ and φ. Alternatively, it may be multiplied by a forgetting factor α each time sound source separation runs, P(θ, φ) ← αP(θ, φ), where α is a constant between 0 and 1 inclusive, so that past information is forgotten gradually.
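The histogram update described above can be sketched as follows. This is a minimal illustration for the one-dimensional case P(θ), not the patented implementation: the class name `DirectionHistogram`, the bin count, and the choice to apply the forgetting factor once at the start of each separation pass are assumptions made for the example.

```python
import math


class DirectionHistogram:
    """Histogram P(theta) of noise arrival directions (hypothetical helper)."""

    def __init__(self, n_bins=36, forgetting=0.9):
        if not 0.0 <= forgetting <= 1.0:
            raise ValueError("forgetting factor must lie in [0, 1]")
        self.n_bins = n_bins
        self.forgetting = forgetting  # alpha; 0.0 resets the histogram every pass
        self.counts = [0.0] * n_bins

    def bin_index(self, azimuth_rad):
        # Map any angle onto [0, 2*pi) and then onto a bin index.
        frac = (azimuth_rad % (2.0 * math.pi)) / (2.0 * math.pi)
        return min(int(frac * self.n_bins), self.n_bins - 1)

    def update(self, azimuths, weights=None):
        """Run once per sound source separation pass."""
        # P(theta) <- alpha * P(theta): gradually forget older observations.
        self.counts = [self.forgetting * c for c in self.counts]
        if weights is None:
            weights = [1.0] * len(azimuths)  # add the value 1 per arrival
        # A weight may instead carry the average power of the separated signal.
        for az, w in zip(azimuths, weights):
            self.counts[self.bin_index(az)] += w


hist = DirectionHistogram(n_bins=4, forgetting=0.5)
hist.update([0.1])
hist.update([0.1, math.pi])
```

Setting `forgetting=0.0` corresponds to re-initializing the histogram on every pass, and values close to 1 keep a long memory of past noise directions.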

The output coefficient determination unit 306 determines the speaker output coefficients from the obtained histogram P(θ, φ). A direction with a larger histogram value is regarded as a direction with more noise. The speaker output coefficients are controlled so that the speaker output sound is localized in a direction that differs greatly from the directions in which the histogram P(θ, φ) is large. In other words, a direction in which P(θ, φ) is large is one from which noise is heard often and can be regarded as strongly noisy. This embodiment therefore avoids such directions and localizes the speaker output sound so that the desired sound is heard from a direction different from the noise direction (typically the opposite direction).

FIG. 5 is a block diagram showing a configuration in which the output coefficients set in this embodiment are applied to the output source and output from the speakers 112.
The output coefficient storage unit 401, which holds the output coefficients set in this embodiment, is allocated in the nonvolatile memory 205 or the volatile memory 204. The output source acquisition unit 403 acquires the original signal, such as audio or the output sound of a hands-free call. The speaker output unit 402 then applies, for each speaker 112, the output coefficient held in the output coefficient storage unit 401 and outputs the result. The output coefficient may be a simple volume value or an FIR (Finite Impulse Response) filter. Alternatively, the signal may be converted into the time-frequency domain by a short-time Fourier transform, an output coefficient applied for each frequency, and the result converted back to the time domain for output.
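As an illustration of how per-speaker output coefficients could be applied to one source signal, the sketch below treats each coefficient as a list of FIR taps and convolves it with the source. The function names `fir_filter` and `render_speaker_feeds` are hypothetical, and a single-tap list reduces to the simple volume value mentioned above.

```python
def fir_filter(signal, taps):
    """Direct-form FIR convolution, truncated to the input length."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out[n] = acc
    return out


def render_speaker_feeds(source, coefficients):
    """Return one filtered copy of the source per speaker.

    coefficients[i] is the tap list for speaker i (an assumed layout).
    """
    return [fir_filter(source, taps) for taps in coefficients]


# Speaker 0: gain 0.5 only. Speaker 1: unit gain delayed by one sample.
feeds = render_speaker_feeds([1.0, 0.0, 0.0], [[0.5], [0.0, 1.0]])
```

A relative delay and gain between speakers is what shifts the perceived localization direction, which is why even this simple structure is enough to steer the sound image.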

FIG. 6 is a block diagram showing the detailed configuration of the sound source position conversion unit 304 of FIG. 3.
The input to the sound source position conversion unit 304 is the sound source position P = (x, y, z)^T of each sound source, relative to the position of the microphone array 101, as estimated by the sound source position estimation unit 303 (the superscript T denotes the transpose of a vector or matrix).
The microphone position database 504 holds the spatial position p₂ = (x₂, y₂, z₂)^T of the microphone array 101 within the passenger compartment 11. The user position extraction unit 502 acquires the spatial position p_u = (x_u, y_u, z_u)^T of the user listening position within the passenger compartment 11. The user listening position may be determined from the occupant/passenger position detected by the seat sensor 208 or the like, or may be preset, for example by fixing it to the driver's seat 12 (see FIG. 1). The conversion vector generation unit 503 calculates the difference b = p₂ − p_u between the spatial position p₂ of the microphone array 101 and the user listening position p_u. The conversion vector addition unit 505 adds b to the sound source position P estimated at the microphone array position to obtain P′ = P + b. P′ is the position of the sound source relative to the user listening position.

In this way (see FIG. 4), the sound source position at the user listening position can be found by a simple vector calculation. The sound source position conversion process may be performed once for each sound source separated and output by the sound source separation unit 302 each time the sound source separation process is executed. When the sound source separation unit 302 separates sound for each sound source or frequency, the conversion may instead be performed once for each sound source or frequency each time the sound source separation process is executed.
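The conversion P′ = P + b with b = p₂ − p_u can be written in a few lines. The sketch below is illustrative only; the function name `to_listener_frame` and the cabin coordinates are invented for the example.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def to_listener_frame(source_from_array: Vec3,
                      array_pos: Vec3,
                      listener_pos: Vec3) -> Vec3:
    """Convert a source position relative to the microphone array (P)
    into a position relative to the listener (P' = P + b, b = p2 - p_u)."""
    b = tuple(a - u for a, u in zip(array_pos, listener_pos))
    return tuple(p + d for p, d in zip(source_from_array, b))


# Hypothetical cabin coordinates in metres.
array_pos = (0.0, 1.0, 1.2)           # p2: microphone array
listener_pos = (-0.4, 0.2, 1.1)       # p_u: e.g. a preset driver's seat
source_from_array = (0.8, -0.5, 0.0)  # P: estimated by the array

p_prime = to_listener_frame(source_from_array, array_pos, listener_pos)
```

Because the array position is fixed, only `listener_pos` changes when a different seat is selected, so the same estimated source positions can be re-expressed for several listeners.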

FIG. 7 is a block diagram showing in detail a first example of the output coefficient determination unit 306 of FIG. 3.
The output coefficient determination unit 306 extracts the relative azimuth θ or elevation φ of the sound source from the sound source position, for each sound source and each frequency, converted by the sound source position conversion unit 304 (see FIG. 3). These angles can be obtained by treating the sound source position (x, y, z) as polar coordinates (r cosθ cosφ, r sinθ cosφ, r sinφ). In the passenger compartment 11, assuming that all sound sources lie in the same horizontal plane normally causes no practical problem, so φ = 0 may be used.
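Under the polar convention (r cosθ cosφ, r sinθ cosφ, r sinφ), the angles can be recovered with `atan2` and `asin`. The following sketch assumes that convention; the function name `direction_angles` is hypothetical.

```python
import math


def direction_angles(x, y, z):
    """Return (azimuth, elevation, range) for a listener-relative position,
    assuming (x, y, z) = (r cos(az) cos(el), r sin(az) cos(el), r sin(el))."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("direction is undefined at the origin")
    azimuth = math.atan2(y, x)    # in (-pi, pi]
    elevation = math.asin(z / r)  # in [-pi/2, pi/2]
    return azimuth, elevation, r


az, el, r = direction_angles(1.0, 1.0, 0.0)
```

For sources assumed to lie in the horizontal plane, only the azimuth is needed and the elevation can simply be ignored.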

The direction matrix calculation unit 602 generates, by the method described above, the histogram P(θ) or P(θ, φ), which indicates the occurrence count of each sound source direction, from the sound source directions (θ) or (θ, φ) extracted for each sound source and each frequency.

FIG. 8 is an example of the histogram P(θ) generated in the first embodiment of the present invention.
For each sound source direction θ, the occurrence count P of sound sources is obtained in the histogram P(θ).

The steering vector used in the following description is now defined. The vector a_p(f), whose elements are the phase delays incurred by sound of frequency f at the sound source position p before it reaches each microphone 102, is defined by the following equation (1).

Here, j is the imaginary unit and M is the number of microphones 102.

If a person's two ears are regarded as two microphones 102, the phase delay from the sound source position p to each ear can also be expressed by a_p(f). In this embodiment, a delay common to all microphones 102 has no particular significance, so a_p(f) need not be defined as the delay from the sound source position; it may instead be defined as the delay relative to a reference microphone 102. In this embodiment, the first microphone 102 is taken as the reference, and the delay T_{p,m}(f) is defined by the following equation (2), where t_m(p) is the time taken for the sound at the sound source position p to reach the m-th microphone 102.

Assuming that the microphones 102 are arranged on a straight line, like a person's two ears, and that the sound source position p is sufficiently far away compared with the microphone spacing, T_{p,m}(f) can be approximated by the following equation (3).

Here, d_m is the distance between the m-th microphone 102 and the first microphone 102. c is the speed of sound, which is about 340 m/s at room temperature and is normally set to this value. θ is the angle that the straight line connecting the microphone array 101 and the sound source position p makes with the plane orthogonal to the line of the microphone array 101. This is taken as the relative azimuth as seen from the position of the microphone array 101. When the microphone array 101 is not a linear arrangement, T_{p,m}(f) takes a more complicated form, but in any case it can be obtained by simple geometric calculation as long as the geometry of the microphone array 101 is known. In this embodiment, the geometry of the microphone array 101 is stored in advance in the nonvolatile memory 205 (see FIG. 2), and the steering vector is generated from that information.
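Equations (1) to (3) are not reproduced in this text, so the sketch below is only one plausible reading of the definitions above. It assumes a far-field phase delay of 2πf·d_m·sinθ/c relative to the first microphone and steering vector elements of the form exp(−j·T); the sign convention and the function names are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, the room-temperature value quoted in the text


def far_field_phase_delays(freq_hz, mic_offsets_m, azimuth_rad, c=SPEED_OF_SOUND):
    """Assumed far-field phase delay of each microphone relative to the
    first one: T_m = 2*pi*f * d_m * sin(theta) / c."""
    return [2.0 * math.pi * freq_hz * d * math.sin(azimuth_rad) / c
            for d in mic_offsets_m]


def steering_vector(freq_hz, mic_offsets_m, azimuth_rad, c=SPEED_OF_SOUND):
    """Assumed steering vector with elements exp(-j * T_m)."""
    delays = far_field_phase_delays(freq_hz, mic_offsets_m, azimuth_rad, c)
    return [cmath.exp(-1j * t) for t in delays]


# Two microphones 10 cm apart; d_1 = 0 for the reference microphone.
broadside = steering_vector(1000.0, [0.0, 0.1], 0.0)
endfire = steering_vector(1700.0, [0.0, 0.1], math.pi / 2)
```

With θ measured from the plane orthogonal to the array line, a broadside source (θ = 0) reaches all microphones in phase, while an end-fire source sees the full spacing d_m as path difference.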

Returning to FIG. 7, the direction matrix calculation unit 602 uses the histogram P(θ) or P(θ, φ) to calculate the estimated noise covariance matrix R_n(f) at the position of the microphone array 101, defined by the following equation (4).

Here J is the number of histogram bins, j denotes each bin (grid cell), θ_j is the azimuth of cell j, and φ_j is the elevation of cell j. n_j(f) is the steering vector, as seen from the user listening position, under the assumption that a sound source exists at the position of cell j. R_n(f) is thus a matrix constructed so that directions with high histogram counts have a large influence.
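Equation (4) is not reproduced in this text. A common construction that matches the description, and is assumed here, is the histogram-weighted sum of outer products R_n(f) = Σ_j P(θ_j, φ_j) n_j(f) n_j(f)^H. The sketch below implements that assumed form for one frequency; the function names are hypothetical.

```python
def outer_hermitian(v):
    """Return the outer product v v^H as a nested list."""
    return [[a * b.conjugate() for b in v] for a in v]


def noise_covariance(hist_values, steering_vectors):
    """Assumed form of R_n(f): sum_j P_j * n_j n_j^H for one frequency.

    hist_values[j] is the histogram value of cell j and steering_vectors[j]
    is the steering vector n_j(f) of that cell.
    """
    m = len(steering_vectors[0])
    cov = [[0j] * m for _ in range(m)]
    for p, n in zip(hist_values, steering_vectors):
        if p == 0:
            continue  # empty cells contribute nothing
        o = outer_hermitian(n)
        for r in range(m):
            for c in range(m):
                cov[r][c] += p * o[r][c]
    return cov


# Two cells; only the first has observed noise (weight 2).
R = noise_covariance([2.0, 0.0], [[1 + 0j, 1j], [1 + 0j, -1 + 0j]])
```

Because each term is weighted by its histogram value, directions from which noise arrives often dominate the matrix, which is the property the text attributes to R_n(f).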

Furthermore, H(f) is defined by the following equation (5).

Here, L is the number of speaker elements. h_i(f) is the steering vector, as seen from the user listening position, under the assumption that a sound source exists at the position of the i-th speaker.

Using H(f) and R_n(f), the matrix A(f) is obtained by the following equation (6).

When there are a plurality of user listening positions, A(f) may be constructed, as in the following equation (7), from a noise covariance matrix R_{i,n}(f) and a matrix H_i(f) of the steering vectors of the speakers 112 for each user listening position.

Constructing A(f) in this way yields speaker output sound that is easy to hear at a plurality of listening positions.

The eigenvalue/eigenvector calculation unit 603 obtains the eigenvector S(f) corresponding to the minimum eigenvalue of A(f). S(f) has as many elements as there are speaker elements. When each element of S(f) is applied to the transfer function of the corresponding speaker element and sound is radiated from all the speakers 112 simultaneously, the steering vector of the combined wavefront of the radiated sound is H(f)S(f). H(f)S(f) is the steering vector that differs most from the noise steering vectors.
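Why the minimum-eigenvalue eigenvector is a natural choice can be seen from a short derivation. Equations (4) and (6) are not reproduced in this text, so the forms R_n(f) = Σ_j P(θ_j, φ_j) n_j(f) n_j(f)^H and A(f) = H(f)^H R_n(f) H(f) used below are assumptions consistent with the surrounding definitions, not a quotation of the patent's formulas.

```latex
% Assumed: R_n(f) = \sum_j P(\theta_j,\phi_j)\, n_j(f)\, n_j(f)^{H},
%          A(f)   = H(f)^{H} R_n(f)\, H(f)
\begin{aligned}
S(f)^{H} A(f)\, S(f)
  &= \bigl(H(f)S(f)\bigr)^{H} R_n(f)\, \bigl(H(f)S(f)\bigr) \\
  &= \sum_{j=1}^{J} P(\theta_j,\phi_j)\,
     \bigl|\, n_j(f)^{H} H(f)\, S(f) \bigr|^{2}, \\[4pt]
S(f) &= \arg\min_{\lVert S \rVert = 1} S^{H} A(f)\, S
  \quad\Longrightarrow\quad
  A(f)\, S(f) = \lambda_{\min}\, S(f).
\end{aligned}
```

Under these assumptions the quadratic form is the histogram-weighted overlap between the combined wavefront H(f)S(f) and each noise steering vector, so minimizing it over unit-norm S(f) steers the combined wavefront away from the frequently observed noise directions. Since A(f) is Hermitian, the minimizer is the eigenvector of its smallest eigenvalue.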

The minimum cost coefficient calculation unit 604 applies an inverse Fourier transform, for each speaker (element) 112, to the per-frequency speaker output coefficients S(f) to obtain the time-domain speaker output coefficients S(t). Because S(t) can be regarded as an FIR filter, convolving S(t) with the time-domain sound output from each speaker 112 yields a combined wavefront whose steering vector at each frequency is H(f)S(f). Alternatively, S(f) may be obtained by first finding the sound source position p_min, defined by the following equation (8), at which the difference from the noise steering vectors is greatest, and then applying the subsequent equation (9).

The steering vector H(f)S(f) of the combined wavefront obtained in this way coincides exactly with the steering vector of the sound source position p_min, while the output coefficient S(f) is minimized.

FIG. 9 is an explanatory diagram showing an example of the estimated noise source position, the user position, and the localization direction of the combined speaker wavefront as seen from the user position.
In the present invention, the direction of the combined wavefront can thus be set to a direction that differs greatly from the noise source position.

FIG. 10 is a block diagram showing in detail a second example of the output coefficient determination unit 306.
This output coefficient determination unit 306 is configured to select, from among the plurality of speakers 112, the speaker (element) 112 whose steering vector differs most from that of the noise.

Like the direction matrix calculation unit 602 of the first example shown in FIG. 7, the direction matrix calculation unit 702 of the second example calculates R_n(f).
The speaker inner product calculation unit 703 calculates the inner product, defined by equation (10), between the steering vector of each speaker (element) 112 and R_n(f).

The minimum cost coefficient calculation unit 704 selects, according to equation (11), the speaker (element) 112 with the smallest inner product value.
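Equations (10) and (11) are not reproduced in this text, so the following is only a sketch of the selection step under an assumption: the inner product is taken to be the quadratic form h^H R_n h at a single frequency, where h is the steering vector of one speaker. The function names are hypothetical.

```python
def quadratic_form(h, R):
    """Return the real part of h^H R h for a complex vector h and a square matrix R."""
    total = 0j
    for i in range(len(h)):
        for j in range(len(h)):
            total += complex(h[i]).conjugate() * R[i][j] * h[j]
    return total.real


def select_speaker(steering_vectors, R_n):
    """Pick the speaker whose steering vector has the smallest cost against R_n.

    Assumption: cost = h^H R_n h, evaluated at one frequency only.
    Returns (index of selected speaker, list of all costs).
    """
    costs = [quadratic_form(h, R_n) for h in steering_vectors]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs
```

With R_n built from a single noise steering vector [1, 1], a speaker whose steering vector is [1, -1] has zero cost and is selected over one whose steering vector is [1, 1].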

FIG. 11 is a block diagram showing a direction matrix calculation unit 801 according to a modification.
The direction matrix calculation unit 801 adds a passenger position estimation unit 802 and a known noise position 803 to the direction matrix calculation unit 702 shown in FIG. 10. Accordingly, rather than generating the noise covariance matrix R_n(f) solely from the sound sources detected by sound source separation, the direction matrix calculation unit 801 also uses the occupant/passenger information from the passenger position estimation unit 802 and the known noise position 803, which holds information on sound sources known in advance such as wiper noise and engine noise, to generate R_n(f).

The passenger position estimation unit 802 detects, from the information of the seat sensor 208 (see FIG. 2), the positions where occupants/passengers are seated, regards each such position as a virtual noise source position, and adds it to the noise direction histogram P(θ). The frequency value to be added is a predetermined value. The known noise position 803 holds preset positions of known sound sources such as wiper noise and engine noise; these are read out and added to the noise direction histogram P(θ). The direction matrix calculation unit 801 generates the noise covariance matrix R_n(f) from the noise direction histogram P(θ), which is built from the noise source positions converted to the user listening position, the occupant/passenger positions, and the known noise positions, and outputs R_n(f).
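The histogram augmentation described above can be sketched as follows. This is an illustration only: the dictionary representation of P(θ) keyed by direction, the function name, and the default count values are assumptions, not taken from the patent.

```python
def augment_noise_histogram(histogram, occupied_seat_angles, known_noise_angles,
                            seat_count=5, known_count=5):
    """Return a copy of P(theta) with virtual noise sources added.

    histogram: dict mapping a direction (e.g. degrees) to a frequency count.
    seat_count / known_count: predetermined counts added per occupied seat
    and per preset noise source (hypothetical default values).
    """
    augmented = dict(histogram)
    for angle in occupied_seat_angles:
        augmented[angle] = augmented.get(angle, 0) + seat_count
    for angle in known_noise_angles:
        augmented[angle] = augmented.get(angle, 0) + known_count
    return augmented
```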

FIG. 12 is a block diagram showing in detail the sound source separation unit 302 of FIG. 3.
The digital sound pressure waveform received by each microphone (element) 102 is sent to the buffering unit 901.
The buffering unit 901 accumulates data for each microphone (element) 102, for example for several seconds, and outputs the data to the subsequent stage each time that amount has accumulated.
The short-time frequency conversion unit 902 processes the output signal of the buffering unit 901 in units of, for example, several tens of milliseconds. This unit of processing is called a frame, and the number of points processed per microphone (element) 102 in one frame is called the frame size Lframe. The position at which processing starts is shifted by the frame shift Lshift for each frame. That is, the data processed in the τ-th frame runs from point τ*Lshift to point τ*Lshift+Lframe. For each frame, the data is converted into the frequency domain by a short-time Fourier transform. For the m-th microphone element, the frequency-f component in frame τ is written x_m(f,τ). Before the short-time Fourier transform, processing such as DC component removal and windowing may be applied to the waveform (signal). A Hamming window, a Hanning window, a Blackman window, or the like can be used as the window function.
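The framing described above can be sketched for one microphone channel as follows. The function names are hypothetical, the Hamming window is one of the options the text lists, and removing the per-frame mean is only one possible way of cutting the DC component. The Fourier transform of each frame is omitted.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def split_frames(signal, l_frame, l_shift):
    """Cut a signal into windowed frames.

    Frame tau covers samples tau*l_shift .. tau*l_shift + l_frame - 1.
    The mean of each frame is removed (DC cut) before windowing.
    """
    window = hamming(l_frame)
    frames = []
    tau = 0
    while tau * l_shift + l_frame <= len(signal):
        start = tau * l_shift
        segment = signal[start:start + l_frame]
        mean = sum(segment) / l_frame
        frames.append([(s - mean) * w for s, w in zip(segment, window)])
        tau += 1
    return frames
```

With a 10-sample signal, Lframe = 4 and Lshift = 2, this produces four frames starting at samples 0, 2, 4 and 6.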

The filter adaptation unit 903 adapts the filter required for sound source separation for each frequency f.
The filtering unit 904 uses the sound source separation filter adapted by the filter adaptation unit 903 to separate the sound into individual sound sources for each frame and each frequency. Here, the vector X(f,τ) is defined as [x_1(f,τ), x_2(f,τ), ..., x_M(f,τ)]^T. That is, X(f,τ) is the vector whose elements are the signals of all the microphones (elements) 102 at frame τ and frequency f. The separated signals are obtained from X(f,τ) using the sound source separation filter W according to equation (12).

Here, each element of the vector y(f,τ) corresponds to the time-τ, frequency-f component of one separated signal. The separated signals output by the filtering unit 904 are normalized by the power normalization unit 905, for each time τ and frequency f, as
y_norm(f,τ) ← y(f,τ)/|y(f,τ)|
so that the power of y_norm(f,τ) takes a value from 0 to 1. For any sound source/frequency component whose frame-averaged normalized power is smaller than a threshold, the rejection determination unit 906 regards the component as background noise and excludes it from the sound source separation result; it outputs only the sound source/frequency components at or above the threshold. At output, each sound source may be converted back to a time-domain waveform by an inverse short-time Fourier transform before being output.
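The normalization and rejection steps can be sketched at a single frequency as follows. The sketch assumes that |y(f,τ)| denotes the Euclidean norm of the vector of separated signals, so that the normalized powers of the sources sum to 1 in each frame. The function names and the threshold value are hypothetical.

```python
import math


def normalize(y):
    """Scale the separated-signal vector y(f, tau) to unit Euclidean norm."""
    norm = math.sqrt(sum(abs(v) ** 2 for v in y))
    if norm == 0.0:
        return [0j for _ in y]
    return [v / norm for v in y]


def keep_mask(frames, threshold):
    """Decide, per source, whether its component at this frequency is kept.

    frames: list over tau of separated vectors y(f, tau) at one frequency.
    Returns one bool per source: True if the frame-averaged normalized
    power is at or above the threshold, False if it is rejected as
    background noise.
    """
    n_sources = len(frames[0])
    average = [0.0] * n_sources
    for y in frames:
        for i, v in enumerate(normalize(y)):
            average[i] += abs(v) ** 2 / len(frames)
    return [p >= threshold for p in average]
```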

FIG. 13 is a flowchart showing the adaptation process of the sound source separation filter W.
First, it is determined whether the value of the sound source separation filter W has sufficiently converged (convergence determination; step S1001). Convergence may be declared when the number of filter updates reaches a predetermined count, or when the power of the off-diagonal terms of the nonlinear covariance matrix described later falls to or below a predetermined fraction of the power of the diagonal terms.

If it is determined that the filter has converged (Yes in step S1001), the process ends and the sound source separation filter W is output.
If it is not determined that the filter has converged (No in step S1001), the process proceeds to the next step.

The processing start position is set to the head of the waveform captured by the buffering unit 901, and R(f), described later, is cleared to zero (initialization; step S1002).

It is determined whether the processing start position is at or before the end position of the waveform captured by the buffering unit 901 (i ≤ length? determination; step S1003).

If the processing start position has not reached the end position of the waveform (Yes in step S1003), X(f,τ) is filtered for each frame and each frequency to obtain the separated sound y(f,τ) (filtering; step S1004).

Because the separated sound obtained here was separated by a sound source separation filter that is still being adapted, the separation is considered to be incomplete. R(f) is therefore updated by equation (13) (covariance update; step S1005).

Here, φ(x) is a function corresponding to the derivative of the probability distribution of the sound source, and is defined by equation (14).

R(f) is called the nonlinear covariance matrix. The closer its off-diagonal terms are to 0, the more independent the separated sound sources are. The diagonal terms correspond to the magnitude of each sound source. The ratio of the off-diagonal terms to the diagonal terms is therefore what matters, and the convergence determination for the separation filter may check this ratio.
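The convergence test of step S1001 can be sketched as follows. The text does not specify how the power of the off-diagonal and diagonal terms is measured; the sum of squared magnitudes used here is an assumption, as are the function names and parameters.

```python
def offdiagonal_ratio(R):
    """Ratio of off-diagonal power to diagonal power of a square matrix R.

    Power is taken as the sum of squared magnitudes (an assumption).
    """
    n = len(R)
    diagonal = sum(abs(R[i][i]) ** 2 for i in range(n))
    off = sum(abs(R[i][j]) ** 2 for i in range(n) for j in range(n) if i != j)
    return off / diagonal if diagonal > 0 else float("inf")


def has_converged(R, update_count, max_updates, ratio_threshold):
    """Converged if the update budget is used up or the sources look independent."""
    return update_count >= max_updates or offdiagonal_ratio(R) <= ratio_threshold
```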

Next, the frame shift Lshift is added to the processing start position of the waveform (variable update; step S1007).
The processing from step S1003 onward is then repeated.

If the processing start position has reached the end position of the waveform captured by the buffering unit 901 (No in step S1003), the process moves to step S1006.

The separation filter is updated by equation (15) (filter update; step S1006).

η is a variable that controls the filter update speed. A larger η makes the filter converge faster but increases the likelihood that the filter diverges; a smaller η makes the filter converge more slowly but reduces the likelihood of divergence.
The processing from step S1001 onward is then repeated.

FIG. 14 is a block diagram showing in detail the sound source position estimation unit 303 of FIG. 3.
It is known that the inverse matrix of the separation filter obtained by the sound source separation unit 302 (see FIG. 3) is a matrix composed of the steering vectors of the individual sound sources.
The inverse matrix calculation unit 1102 extracts the i-th column w(f,τ)^-1_i of the inverse matrix of the separation filter. The subsequent blocks are executed for each frame and each frequency. The microphone array 101 is assumed to be arranged in a straight line. The sound source position estimation unit 303 of this embodiment divides the microphones (elements) 102 constituting the microphone array 101 into two groups, each of which is called a subarray. After the sound source direction is estimated in each subarray, the direction and distance of the sound source are found by triangulation, taking the intersection of the two estimated directions.

Because the sound source direction is estimated in each of the two subarrays, two direction estimation units 1104 are provided for each subarray division unit 1103, and each intersection estimation unit 1105 estimates one intersection from the results of the two direction estimation units 1104.

The i-th column of the inverse matrix of the separation filter is divided by subarray as in equation (16).

Likewise, the steering vector under the assumption that a sound source is located at sound source position p is divided into two by subarray as in equation (17).

For each subarray, the direction estimation unit 1104 estimates the sound source directions ^θ_i,1(f,τ) and ^θ_i,2(f,τ) based on equations (18) and (19).

The intersection estimation unit 1105 assumes that the sound source lies in the estimated direction as seen from the center position of each subarray, and estimates the sound source direction and distance by triangulation. Since the distance between the center positions of the subarrays can be assumed to be known in advance, the sound source direction and distance are readily estimated by triangulation.
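The triangulation step can be sketched in two dimensions as follows. The coordinate convention is an assumption made for illustration: the linear array lies on the x-axis, the subarray centers are at x1 and x2, and each direction is an angle in radians measured counterclockwise from the positive x-axis. The function name is hypothetical.

```python
import math


def triangulate(x1, theta1, x2, theta2):
    """Intersect two direction rays starting at (x1, 0) and (x2, 0).

    Returns (x, y, distance, direction), where distance and direction are
    measured from the midpoint between the two subarray centers, or None
    if the two directions are parallel and no intersection exists.
    """
    denom = math.sin(theta2 - theta1)
    if abs(denom) < 1e-12:
        return None
    # Length along ray 1 at which it meets ray 2.
    t1 = (x2 - x1) * math.sin(theta2) / denom
    x = x1 + t1 * math.cos(theta1)
    y = t1 * math.sin(theta1)
    mid = 0.5 * (x1 + x2)
    return x, y, math.hypot(x - mid, y), math.atan2(y, x - mid)
```

For subarray centers at x = -1 and x = 1 and estimated directions of 45 and 135 degrees, the rays meet at (0, 1), a distance of 1 from the array midpoint in the broadside direction.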

The histogram estimation unit 1106 builds a histogram of the sound source directions and distances obtained for each frequency, judges the direction and distance with the highest histogram frequency to be the direction and distance of that sound source, and outputs them.
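The histogram step can be sketched as follows: the per-frequency (direction, distance) estimates are binned and the center of the most frequent bin is returned. The bin widths, units, and function name are assumptions chosen for illustration.

```python
from collections import Counter


def dominant_position(estimates, angle_bin_deg=10.0, dist_bin_m=0.25):
    """Return the center of the most frequent (direction, distance) bin.

    estimates: iterable of (direction in degrees, distance in meters),
    one entry per frequency. Bin widths are hypothetical values.
    """
    counts = Counter()
    for angle, dist in estimates:
        key = (int(angle // angle_bin_deg), int(dist // dist_bin_m))
        counts[key] += 1
    (a_bin, d_bin), _ = counts.most_common(1)[0]
    return (a_bin + 0.5) * angle_bin_deg, (d_bin + 0.5) * dist_bin_m
```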

FIG. 15 is a block diagram showing in detail the acoustic echo canceller 307 of FIG. 3.
The speaker output sound propagates through space and is received by the microphone array 101. In this embodiment, all input sound is treated as noise, so without the acoustic echo canceller 307 the speaker output sound received by the microphone array 101 would itself be judged to be noise. In that case, each time the speaker output coefficients are set, they would be chosen to differ greatly from the previous speaker output coefficients, so the coefficients would never settle and the localization direction of the sound output would change unstably from moment to moment. To avoid this problem, the speaker output component contained in the sound received by the microphone array 101 must be removed in advance.

The reference signal capturing unit 1501 acquires the output source signal u(t) to be output from the speakers 112. Each speaker output signal has a different output coefficient S_m(t) applied for each speaker 112. The output coefficient superimposing unit 1503 convolves S_m(t) with u(t) according to equation (20).

The convolved signal is denoted u_m(t). Here, u_m(t) is a vector of the same length as the echo amount estimation filter of the subsequent stage, with the convolved samples arranged from newest to oldest. u_m(t) is used as the reference signal of the acoustic echo canceller 307 for microphone m.

The input signal buffering unit 1502 buffers the input signal for a predetermined time and outputs it to the subsequent stage.

The filtering unit 1504 convolves the echo amount estimation filter g_m with the reference signal.
The echo cancellation unit 1506 subtracts the estimated echo amount from the microphone input signal x_m(t) to obtain the echo-cancelled signal e_m(t), as in equation (21).

The filter update unit 1505 updates the echo amount estimation filter g_m as defined by equation (22) so that the echo-cancelled signal approaches 0.

Here, μ is the filter update coefficient and takes a value from 0 to 1. The echo-cancelled signal output by the echo cancellation unit 1506 is output as the processed output signal of the acoustic echo canceller 307.
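Equations (21) and (22) are not reproduced in this text. The following sketch therefore substitutes a normalized LMS (NLMS) update, a common adaptive filter whose step size μ also lies between 0 and 1; whether equation (22) has exactly this form is an assumption. The sketch handles one microphone, and the function name, `taps`, and `eps` are hypothetical.

```python
def nlms_echo_cancel(reference, mic, taps, mu=0.5, eps=1e-8):
    """Cancel the echo of `reference` in `mic` with an NLMS adaptive filter.

    reference: samples of the convolved speaker signal (u_m in the text).
    mic: microphone samples x_m(t), same length as reference.
    Returns (echo-cancelled samples e_m(t), final filter g_m).
    """
    g = [0.0] * taps   # echo amount estimation filter g_m
    u = [0.0] * taps   # reference vector, newest sample first
    out = []
    for r, x in zip(reference, mic):
        u = [r] + u[:-1]
        echo = sum(gi * ui for gi, ui in zip(g, u))   # estimated echo
        e = x - echo                                  # echo-cancelled sample
        power = sum(ui * ui for ui in u) + eps
        # NLMS update (assumed form): move g so that e is driven toward 0.
        g = [gi + mu * e * ui / power for gi, ui in zip(g, u)]
        out.append(e)
    return out, g
```

When the microphone signal is an exact two-tap echo of the reference, the filter converges toward that echo path and the residual approaches zero.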

FIG. 16 is an explanatory diagram showing the relationship between the software blocks and the hardware of the first embodiment of the present invention.
The analog sound pressure values captured by the microphone array 101, which consists of a plurality of microphones 102, are converted into digital sound pressure values by the A/D conversion processing unit 1602a in the A/D conversion device 1602.
The converted digital sound pressure values are sent to the central processing unit 203, where various kinds of digital signal processing are applied. The waveform capturing unit 1603a (corresponding to the waveform capturing unit 301 in FIG. 3) captures and buffers the digital sound pressure waveform.
The acoustic echo canceller 1603b (corresponding to the acoustic echo canceller 307 in FIG. 3) removes the speaker output signal component from the captured digital sound pressure waveform.
The echo-cancelled signal is sent to the sound source separation unit 1603d (corresponding to the sound source separation unit 302 in FIG. 3) and separated into individual sound sources.
The sound source position estimation unit 1603e (corresponding to the sound source position estimation unit 303 in FIG. 3) estimates the position of each sound source. The estimated position is the sound source position as seen from the microphone array position.
The sound source position conversion unit 1603f (corresponding to the sound source position conversion unit 304 in FIG. 3) converts the sound source position as seen from the microphone array position into the sound source direction as seen from the user listening position.
The output coefficient determination unit 1603g determines the speaker output coefficients so as to maximize the difference, at the user listening position, between the noise source direction and the source direction of the combined wavefront of the speaker output sound.
The audio reproduction unit 1603c convolves the determined output coefficient of each speaker 112 with the output sound. The work memory required for the digital signal processing up to this point, and prior information such as the microphone arrangement, are held in the nonvolatile memory 205 and the volatile memory 204 (see FIG. 2).
The D/A conversion processing unit 1604a in the D/A conversion device 1604 converts the digital signal output by the audio reproduction unit 1603c into an analog signal.
This analog signal is sent to the speaker array 111, which consists of a plurality of speakers 112, and is output from each speaker 112 as an acoustic signal and radiated into the air.

FIG. 17 is a block diagram showing a configuration for controlling how audio output sound such as music is output in this embodiment.
As described above, the speaker output coefficient determination unit 1701 determines the speaker output coefficients so as to maximize the difference, at the user listening position, between the source direction of the combined speaker wavefront and the noise direction.

The audio source acquisition unit 1702 acquires the reproduced sound from a playback device such as a compact disc player. The audio reproduction 1703 applies the output coefficient of each speaker 112 to the acquired sound, outputs it from each speaker 112, and radiates it into the air. If the output coefficients were changed every time the noise direction changed, the sound could actually become harder to listen to. It is therefore desirable not to change the output coefficients at least while the same source, for example the same piece of music, is playing.

FIG. 18 is a flowchart showing the process that decides when the output coefficients are determined.
First, it is determined whether the source of the output sound has changed (source change determination; step S2001). For music, this can be done by querying the audio device as to whether the track being played has ended.

If the source has changed (Yes in step S2001), the speaker output coefficients are changed (output coefficient change; step S2002), and the process proceeds to the next step (step S2003).
In the output coefficient change (step S2002), the speaker output coefficients are determined from the updated histogram.

If the source has not changed (No in step S2001), or after the output coefficients have been changed, the waveform for the next time interval is captured (waveform capture; step S2003).

The captured waveform is sent to the acoustic echo canceller 307, where the acoustic echo component is removed (step S2004).
Next, the waveform is separated into individual sound sources (sound source separation; step S2005).
The position of each sound source as seen from the microphone position is then estimated (sound source position estimation; step S2006).
The sound source direction at the user listening position is then calculated (sound source position conversion; step S2007).
The histogram of sound source directions at the user listening position is then updated (histogram update; step S2008).
It is then determined whether reproduction has ended (step S2009). If reproduction has ended (Yes in step S2009), the process ends.
If reproduction has not ended (No in step S2009), the processing from step S2001 onward is repeated.
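The timing logic of FIG. 18 can be sketched as the loop below, in which the histogram is updated for every captured block but the output coefficients are recomputed only when the source changes. Steps S2003 to S2007 are abstracted away: each block is assumed to arrive with its noise directions already estimated. `run_playback` and the caller-supplied `choose_coefficients` are hypothetical names.

```python
from collections import Counter


def run_playback(blocks, choose_coefficients):
    """Simulate the coefficient update timing of the flowchart.

    blocks: iterable of (source_id, list of noise directions for that block).
    choose_coefficients: callable mapping the histogram to output coefficients.
    Returns the coefficients in effect for each block.
    """
    histogram = Counter()
    current_source = None
    coefficients = None
    in_effect = []
    for source_id, noise_directions in blocks:
        if source_id != current_source:                    # S2001
            coefficients = choose_coefficients(histogram)  # S2002
            current_source = source_id
        histogram.update(noise_directions)                 # S2008
        in_effect.append(coefficients)
    return in_effect
```

In this sketch a change in noise direction in the middle of a source only affects the histogram; the coefficients in effect stay fixed until the next source begins.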

FIG. 19 is a timing chart showing an example of the output coefficient setting timing and the audio source reproduction timing.
Suppose the noise direction changes from θ1 to θ2. If the speaker output coefficients were updated continuously, they would change at the moment the noise direction changes; in this example, the output coefficients would change during reproduction of source (2), which would make the sound hard for the user to listen to. By changing the output coefficients at the point where the source switches from source (2) to source (3), as shown in this example, the discomfort given to the user can be reduced. When a single source contains multiple sound sources, as in 5.1ch surround music, as many sets of speaker output coefficients as there are sound sources may be selected, in descending order of the difference between the source direction of the combined wavefront of the speaker output sound and the noise source direction, and each set applied to one of the sound sources.

FIG. 20 is a hardware configuration diagram showing a sound reproduction device 1b according to a second embodiment of the present invention.
The sound reproduction device 1b shown in FIG. 20 is a hardware configuration for hands-free calling in the passenger compartment 11, in which a mobile phone 1801 is added as hardware to the configuration shown in FIG. 2.

The digital sound pressure data of the passenger compartment 11 acquired by the central processing unit 203 is sent to the mobile phone 1801.
The mobile phone 1801 transmits the digital sound pressure data to the other party over the telephone network. The other party's voice received over the telephone network has the per-speaker output coefficients calculated in the central processing unit 203 applied to it, and is then sent to the multi-channel D/A converter 206 and converted into analog signals.
The analog signals are sent to the speaker array 111, output from each speaker 112, and radiated into the air.
In the hands-free calling configuration, sound may be emitted from the user listening position. The speaker output coefficient determination unit 1701 of this embodiment may therefore be configured to reject, from among the separated sound source signals, any sound source located near the user listening position, and not treat it as noise. A configuration may also be adopted in which the separated sound source near the user listening position is transmitted to the mobile phone 1801. This makes it possible to send clear sound with little noise to the other party even in a noisy passenger compartment 11.

FIG. 21 is a block diagram showing the software configuration of a sound field reproduction system that reproduces the sound field at a virtual sound source position using the sound source position conversion processing at the user listening position according to the present invention.
The multi-channel digital waveform captured by the waveform capturing unit 301 is sent to the acoustic echo canceller 307, where the speaker output sound component is removed.
The sound source separation unit 302 separates the echo-cancelled waveform (signal) into individual sound sources.
The sound source position estimation unit 303 estimates, for each separated sound source, the sound source position as seen from the position of the microphone array 101.
The sound source position conversion unit 304 converts this into the sound source position as seen from a virtual user listening position.
The talker volume determination unit 1906 applies, to the output signal separated by the sound source separation unit 302, the steering vector of the sound source as seen from the virtual user listening position.
After the same processing has been repeated for all sound sources, the waveform recombination unit 1907 combines the per-source waveforms for each microphone (element) 102 and outputs the result.

DESCRIPTION OF SYMBOLS
1 Sound reproduction device (first embodiment)
1b Sound reproduction device (second embodiment)
10 Automobile
11 Passenger compartment
12 Driver's seat
101 Microphone array
102 Microphone
111 Speaker array
112 Speaker
202 Multi-channel A/D converter
203 Central processing unit
204 Volatile memory
205 Non-volatile memory
206 Multi-channel D/A converter
208 Seat sensor
301 Waveform capturing unit
302 Sound source separation unit
303 Sound source position estimation unit
304 Sound source position conversion unit
305 Histogram update unit
306 Output coefficient determination unit
307 Acoustic echo canceller
401 Output coefficient storage unit
402 Speaker output unit
403 Output source acquisition unit
502 User position extraction unit
503 Conversion vector generation unit
504 Microphone position database
505 Conversion vector addition unit
602 Direction matrix calculation unit
603 Eigenvalue/eigenvector calculation unit
604 Minimum cost coefficient calculation unit
702 Direction matrix calculation unit
703 Speaker inner product calculation unit
704 Minimum cost coefficient calculation unit
801 Direction matrix calculation unit
802 Passenger position estimation unit
803 Known noise position
901 Buffering unit
902 Short-time frequency conversion unit
903 Filter adaptation unit
904 Filtering unit
905 Power normalization unit
906 Rejection determination unit
1102 Inverse matrix calculation unit
1103 Subarray division unit
1104 Direction estimation unit
1105 Intersection estimation unit
1106 Histogram estimation unit
1501 Reference signal capturing unit
1503 Output coefficient superposition unit
1504 Filtering unit
1505 Filter update unit
1506 Echo cancellation unit
1602 A/D conversion device
1604 D/A conversion device
1701 Speaker output coefficient determination unit
1702 Audio source acquisition unit
1703 Audio playback
1801 Mobile phone
1906 Speaker (talker) volume determination unit
1907 Waveform recombination unit

Claims

A sound reproduction device including a plurality of speakers as a reproduction sound source, the device comprising:
a sound source direction estimation unit that estimates a sound source direction at the position of a microphone array including a plurality of microphones;
a sound source direction conversion unit that converts the estimated sound source direction at the position of the microphone array into a sound source direction at a user listening position; and
an output coefficient determination unit that calculates, based on the converted sound source direction, the sound source direction at the user listening position of a noise source other than the reproduction sound source, and determines the output coefficient of each of the plurality of speakers so that the sound source direction of the noise source at the user listening position differs from the sound image localization direction obtained when sound is emitted from the plurality of speakers.

The sound reproduction device according to claim 1, further comprising a seat sensor that detects occupants,
wherein the output coefficient determination unit operates on the assumption that the noise source is present at an occupant position detected by the seat sensor.

The sound reproduction device according to claim 1, further comprising a seat sensor that detects occupants,
wherein the output coefficient determination unit operates with an occupant position detected by the seat sensor regarded as the user listening position.

The sound reproduction device according to claim 1, wherein a third position vector, which is the position vector of the sound source position with respect to the user listening position, is calculated by adding a second position vector, which is the position vector of the position of a sound source other than the reproduction sound source with respect to the microphone array position, to a first position vector, which is the position vector of the microphone array position with respect to the user listening position, and the sound source direction at the position of the microphone array is converted into the sound source direction at the user listening position based on the third position vector.

A sound reproduction method for reproducing sound from a sound reproduction device that uses a plurality of speakers as a reproduction sound source, in an environment in which a plurality of microphones and the plurality of speakers are arranged at predetermined positions, the method comprising:
a sound source direction estimating step of estimating the sound source direction, at the position of a microphone array composed of the plurality of microphones, of a sound source other than the reproduction sound source from the plurality of speakers, using information on the sound from the microphone array and information on the positional relationship between the microphone array and each of the speakers;
a sound source direction conversion step of converting the estimated sound source direction at the position of the microphone array into a sound source direction at a user listening position;
an output coefficient determining step of calculating, based on the converted sound source direction, the sound source direction at the user listening position of a noise source other than the reproduction sound source, and determining the output coefficient of each speaker so that the sound source direction of the noise source at the user listening position differs from the sound image localization direction obtained when sound is emitted from the plurality of speakers; and
a sound emission step of emitting sound from the speakers according to the determined output coefficients.

The sound reproduction method according to claim 5, wherein, in the output coefficient determining step, a speaker located in a direction different from the sound source direction at the user listening position is selected from the plurality of speakers.
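For illustration only, and not as part of the claims: the position vector addition recited in the device claims and the speaker selection recited in the final method claim can be sketched as below. The 2-D geometry, the function names, and the rule of picking the speaker with the largest angular separation from the noise direction are assumptions made here; the claims only require that the directions differ.

```python
import math

def to_listener_frame(mic_from_listener, source_from_mic):
    """Third position vector = first (array w.r.t. listener) + second (source w.r.t. array)."""
    return tuple(a + b for a, b in zip(mic_from_listener, source_from_mic))

def direction(vec):
    """Azimuth in radians of a 2-D position vector."""
    return math.atan2(vec[1], vec[0])

def angular_gap(a, b):
    """Smallest absolute angle between two azimuths, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def pick_speaker(speakers_from_listener, noise_dir):
    """Index of the speaker whose direction is farthest from the noise direction."""
    return max(
        range(len(speakers_from_listener)),
        key=lambda i: angular_gap(direction(speakers_from_listener[i]), noise_dir),
    )
```

For example, with the array 1 m in front of the listener and a noise source seen from the array at (1.0, -1.0), the source lies at (1.0, 0.0) in the listener's frame, so a speaker on the opposite side of the listener would be chosen.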