JP5997007B2

JP5997007B2 - Sound source position estimation device

Info

Publication number: JP5997007B2
Application number: JP2012239919A
Authority: JP
Inventors: Kenta Niwa (丹羽 健太); Kazunori Kobayashi (小林 和則); Yoichi Haneda (羽田 陽一)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-10-31
Filing date: 2012-10-31
Publication date: 2016-09-21
Anticipated expiration: 2032-10-31
Also published as: JP2014090353A

Description

The present invention relates to a technique for estimating the direction of arrival and the position of an arbitrary sound source from observation signals picked up by a plurality of microphones (hereinafter, "sound source position estimation technique"). Sound source direction estimation is defined here as being included in sound source position estimation.

Various techniques have been studied for estimating the position of a sound source from the phase and amplitude differences that arise between observation signals picked up by a plurality of microphones (for example, a microphone array). Research on sound source position estimation has so far followed two main directions: i) refinements to the signal processing applied to the picked-up observation signals, and ii) refinements to the arrangement of the microphones.

[Conventional method i): Refinements to signal processing]
Well-known signal processing methods include a) the GCC-PHAT method (Generalized Cross Correlation with PHAse Transform; see Non-Patent Document 1), b) the MUSIC method (MUltiple SIgnal Classification; see Non-Patent Document 2), and c) the beamformer method (see Non-Patent Document 3).

[Conventional method ii): Refinements to array arrangement]
Conventional method i) attempts to estimate the sound source position through signal processing, but performance depends heavily on the observation system (e.g., the microphone arrangement). Proposed approaches include a) enlarging the microphone array so that the difference between spherical and plane waves can be detected, which improves the accuracy of sound source position estimation, and b) placing several microphone arrays at a distance from one another so that position estimation is solved as a combination of direction estimates (see Non-Patent Document 4). Both approaches essentially increase the array size. For the signal processing applied after observation, any method may be used, such as those listed under conventional method i).

[Non-Patent Document 1] C. H. Knapp et al., "The generalized correlation method for estimation of time delay", IEEE Trans. ASSP, 1976, vol. 24, no. 4, pp. 320-327
[Non-Patent Document 2] R. O. Schmidt, "Multiple emitter location and signal parameter estimation", IEEE Transactions on Antennas and Propagation, 1986, vol. 34, no. 3, pp. 276-280
[Non-Patent Document 3] D. H. Johnson et al., "Array Signal Processing: Concepts and Techniques", USA, Prentice-Hall, 1993
[Non-Patent Document 4] Yusuke Hioka et al., "Enhanced sound collection in a specific 2D region using estimated sound source position information", IEICE Technical Report (Applied Acoustics), 2008, vol. 108, no. 115, pp. 7-12

It would be sufficient if the microphones could be placed so as to surround the sound source and the array size could be increased, but in practice the number of microphones M and the area available for installation are limited. In that case, however the signal processing of conventional method i) is refined, the observed signals may carry too little information for the sound source position to be estimated. For example, suppose that a sound pickup device for a video conference is installed near the display and that the speakers sit side by side about 5 to 10 m away. The angular separation between the sound sources is then small, which makes it difficult to distinguish them.

An object of the present invention is to provide a technique capable of estimating the positions of sound sources even when the sound sources are separated by only a small angle or distance.

To solve the above problem, according to a first aspect of the present invention, a sound source position estimation device includes: a plurality of microphones; reflecting means made of a material capable of reflecting sound and arranged in the vicinity of the plurality of microphones so that, for sound emitted from each of a plurality of assumed positions of a sound source to be estimated, one or more reflected sounds can be picked up at each microphone; a transfer characteristic storage unit that stores transfer characteristics, including the influence of the reflected sound produced by the reflecting means, from the plurality of assumed positions of the sound source to be estimated to the plurality of microphones; a diffused sensing unit that uses frequency-domain observation signals obtained from the plurality of microphones and the transfer characteristics corresponding to the plurality of assumed positions to obtain an index representing how likely it is that the sound source to be estimated is present at each assumed position; and a sound source position estimation unit that estimates, as the position of the sound source to be estimated, the position corresponding to an index indicating a high likelihood that the sound source is present.

Through diffused sensing, the present invention has the effect that sound source positions can be estimated even when the sound sources are separated by only a small angle or distance.

FIG. 1 is a functional block diagram of the sound source position estimation device according to the first embodiment.
FIG. 2 shows the processing flow of the sound source position estimation device according to the first embodiment.
FIG. 3A shows a state in which sound emitted from a sound source to be estimated is reflected by the reflecting means and arrives at the microphones isotropically, and FIG. 3B shows the mirror images produced by the reflecting means of FIG. 3A.
FIG. 4 shows example shapes of the reflecting means.
FIG. 5 is a functional block diagram of the sound source position estimation device according to the second embodiment.
FIG. 6 shows the processing flow of the sound source position estimation device according to the second embodiment.

Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and redundant description is omitted. In the following description, the symbol "^-" used in the text should properly be written directly above the immediately preceding character, but because of the limitations of text notation it is written immediately after that character. In equations, these symbols are written in their proper positions. Unless otherwise noted, processing performed on each element of a vector or matrix applies to all elements of that vector or matrix.

To distinguish the positions of sound sources and estimate them even when the sources are arranged at narrow intervals, control based on diffused sensing (see Reference 1) is adopted.
[Reference 1] K. Niwa et al., "Diffused sensing for sharp directivity microphone array", ICASSP, 2012, pp. 225-228.

Reference 1 shows that, in an environment where strong reflected sound arrives randomly from all directions, such as inside a tunnel or a cave, picking up sound with an array whose microphone spacing is as wide as possible yields the maximum information for spatially distinguishing the target sound from other sound sources, and that this makes narrow-directivity sound pickup possible. Diffused sensing can be implemented, for example, by attaching reflectors around a microphone array.

Any method may be used for the signal processing after sound pickup. For example, the methods listed as conventional methods, such as b) the MUSIC method and c) the beamformer method, can be used. In each method, the time differences and transfer characteristics arising between microphones are prepared for each position r^-_n at which a sound source to be estimated can be assumed to exist (hereinafter simply "assumed position"). Conventional methods model only the direct sound, so these quantities can easily be obtained by calculation. In a method based on diffused sensing, however, transfer characteristics that include the influence of reflected sound must be prepared, for example by actually measuring the transfer characteristics or by simulating the acoustic characteristics of the reflectors.

<Sound source position estimation device 2 according to the first embodiment>
In the first embodiment, the sound source position is estimated by b) the MUSIC method. The MUSIC method uses more microphones than the number K of sound sources present in the sound field and estimates the positions of the K sound sources k from cues contained in the observation signals. The number of sound sources K is either given in advance or estimated from the observed signals. FIG. 1 is a functional block diagram of the sound source position estimation device 2, and FIG. 2 shows its processing flow.

The sound source position estimation device 2 includes M microphones 110-m, reflecting means 200, an AD conversion unit 120, a frequency domain conversion unit 130, a transfer characteristic storage unit 210, a diffused sensing unit 220, and a sound source position estimation unit 160. The diffused sensing unit 220 includes a noise spatial correlation matrix calculation unit 140 and a MUSIC spectrum calculation unit 150. Here, M > K and m = 1, 2, …, M.

The sound source position estimation device 2 estimates the positions of the K sound sources k using the analog observation signals x_m(i) picked up by the M microphones 110-m, and outputs the estimated positions r^-(τ) = [r^-_1(τ), …, r^-_K(τ)]. The processing in each unit is described in detail below.

<Microphones 110-m and reflecting means 200>
The reflecting means 200 is made of a material capable of reflecting sound and is arranged in the vicinity of the M microphones 110-m so that, for sound emitted from each assumed position r^-_n, one or more reflected sounds can be picked up at each microphone 110-m. Here, n = 1, 2, …, N, where N (≥ 1) is the number of assumed positions. The M microphones 110-m pick up the sound emitted by the sound sources whose positions are to be estimated (s3) and output the analog observation signals x_m(i) to the AD conversion unit 120.

This embodiment implements diffused sensing (see Reference 1) and thereby realizes a technique for estimating sound source positions even when the sound sources to be estimated are arranged at narrow intervals. Diffused sensing means "enabling spatial control that makes effective use of multichannel sensors by observing signals in a diffuse state". Hereinafter, an observation signal that arrives at a microphone after being reflected one or more times is called a diffused signal. A larger number of reflections is preferable.

In this embodiment, the M microphones 110-m and the reflecting means 200 are arranged so that diffused signals can be picked up. For example, as described in Reference 1, the reflecting means 200 may be made of 8 mm thick ABS resin (acrylonitrile butadiene styrene copolymer resin) in the shape of a truncated octahedron, with 24 microphones placed at its interior vertices. FIG. 3A shows a state in which sound emitted from a sound source 100-k to be estimated is reflected by the reflecting means 200 and arrives at the microphones 110-m isotropically, and FIG. 3B shows the mirror images produced by the reflecting means 200 of FIG. 3A. The observation signals picked up by the microphones 110-m in this way are diffused signals. A diffused signal can be regarded as a signal close to a diffuse state. For example, reverberant sound emitted inside a tunnel or a cave is close to a diffused signal.

The more times the sound radiated from a sound source 100-k to be estimated is reflected before it is observed at a microphone 110-m, the more diffuse the observation signal becomes and the higher the accuracy of sound source position estimation (see Reference 1). The material of the reflecting means 200 should therefore reflect sound while absorbing little of it, and its shape should be one that produces many reflections. One example is the truncated octahedron mentioned above. Other possible shapes are a dodecahedron or an icosahedron with one face left open, as shown in FIGS. 4(A) and 4(B) respectively, and a rhombic dodecahedron or a sphere provided with an opening, as shown in FIGS. 4(C) and 4(D) respectively. A horn or the like may also be attached to the open face or opening.

<Transfer characteristic storage unit 210>
The transfer characteristic storage unit 210 stores in advance the transfer characteristics a^-(ω, r^-_n) from each assumed position r^-_n to the M microphones 110-m, including the influence of the reflected sound produced by the reflecting means 200 (s1). With p^-_m denoting the position of microphone 110-m, the transfer characteristic is defined as a^-(ω, r^-_n) = [a_1(ω, p^-_1, r^-_n), …, a_M(ω, p^-_M, r^-_n)]^T.

The transfer characteristic a^-(ω, r^-_n) is expressed as the sum of the transfer characteristic of the direct sound, which travels from the assumed position r^-_n straight to the M microphones 110-m, and the transfer characteristics of one or more reflected sounds, which reach the M microphones 110-m after being reflected by a reflecting object.

The transfer characteristic a^-(ω, r^-_n) is, for example, the sum of the steering vector of the direct sound and the transfer characteristics of one or more reflected sounds, each corrected for the attenuation caused by reflection and for the difference in arrival time relative to the direct sound. In Reference 1, the transfer characteristic a^-(ω, r^-_n) is obtained by the following equation.
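A form of this equation consistent with the symbol definitions that follow can be sketched as below. It is a reconstruction from those definitions, not the verbatim equation of Reference 1. The overbar renders the "^-" mark, ω is treated as an angular frequency, the direct sound corresponds to d = 0 with p^-_m^(0) = p^-_m, and the sign of the exponent and the 1/distance amplitude factor are assumptions.

```latex
\bar{a}(\omega, \bar{r}_n)
  = \bar{h}^{(0)}(\omega, \bar{r}_n)
  + \sum_{d=1}^{D} \kappa^{(d)}(\omega)\, \bar{h}^{(d)}(\omega, \bar{r}_n),
\qquad
\bar{h}^{(d)}(\omega, \bar{r}_n)
  = \left[
      \frac{e^{-j\omega \|\bar{p}_1^{(d)} - \bar{r}_n\| / v}}{\|\bar{p}_1^{(d)} - \bar{r}_n\|},
      \ldots,
      \frac{e^{-j\omega \|\bar{p}_M^{(d)} - \bar{r}_n\| / v}}{\|\bar{p}_M^{(d)} - \bar{r}_n\|}
    \right]^{T}
```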

Here, h^-^(0)(ω, r^-_n) is the steering vector of the direct sound, h^-^(d)(ω, r^-_n) (1 ≤ d ≤ D) is the steering vector of the d-th reflected sound, κ^(d)(ω) is the reflection coefficient for the d-th reflected sound, p^-_m^(d) is the position of the d-th virtual microphone (mirror image) of microphone 110-m, v is the speed of sound, and ||p^-_m^(d) - r^-_n|| is the distance from sound source n to the d-th mirror image of microphone 110-m. The transfer characteristic a^-(ω, r^-_n) may be obtained by actual measurement in a real environment or by simulation using the acoustic characteristics of the reflectors.
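As a minimal numerical sketch of the simulation route, the functions below sum a direct-sound steering vector and D mirror-image steering vectors weighted by reflection coefficients. The function names, the 1/distance amplitude term, the sign of the phase, and the default speed of sound of 343 m/s are illustrative assumptions and are not taken from Reference 1.

```python
import numpy as np


def steering_vector(omega, mic_positions, source_position, v=343.0):
    """Free-field steering vector from one assumed position to M (real or mirrored) microphones.

    omega is an angular frequency in rad/s (assumption); positions are in metres.
    """
    mics = np.asarray(mic_positions, dtype=float)    # shape (M, 3)
    src = np.asarray(source_position, dtype=float)   # shape (3,)
    dist = np.linalg.norm(mics - src, axis=1)        # ||p_m - r_n|| for each microphone
    return np.exp(-1j * omega * dist / v) / dist


def transfer_characteristic(omega, mic_positions, image_mic_positions, kappas,
                            source_position, v=343.0):
    """Direct sound plus D reflected sounds, each modelled with mirror-image microphones.

    image_mic_positions: sequence of D arrays of shape (M, 3), one per reflection.
    kappas: sequence of D reflection coefficients at this frequency.
    """
    a = steering_vector(omega, mic_positions, source_position, v)
    for kappa_d, images_d in zip(kappas, image_mic_positions):
        a = a + kappa_d * steering_vector(omega, images_d, source_position, v)
    return a
```

In use, one such vector would be computed for every frequency and every assumed position and stored ahead of time, which is the role the transfer characteristic storage unit 210 plays.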

<AD conversion unit 120 and frequency domain conversion unit 130>
The AD conversion unit 120 receives the M analog observation signals x_m(i), converts each of them into a digital observation signal x_m(t) (hereinafter also simply "observation signal x_m(t)") (s5), and outputs the result to the frequency domain conversion unit 130. Here, i and t are the continuous-time and discrete-time indices, respectively.

The frequency domain conversion unit 130 then receives the M observation signals x_m(t), converts each of them into a frequency-domain observation signal X_m(ω, τ) (hereinafter also simply "observation signal X_m(ω, τ)") (s7), and outputs the result to the noise spatial correlation matrix calculation unit 140 in the diffused sensing unit 220. Here, ω and τ are the discrete frequency index and the frame time index, respectively, with ω = 1, 2, …, Ω. The frequency-domain representation of the observation signal picked up by the m-th microphone is X_m(ω, τ), and X^-(ω, τ) = [X_1(ω, τ), …, X_M(ω, τ)]^T, where ^T denotes transposition.
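The text does not specify how the frequency domain conversion is carried out. One common choice is a short-time Fourier transform, sketched below. The frame length, hop size, and Hann window are assumptions chosen for illustration.

```python
import numpy as np


def stft_frames(x, frame_len=512, hop=256):
    """Convert M digital observation signals into frequency-domain frames.

    x: array of shape (M, T) holding x_m(t), with T >= frame_len.
    Returns X of shape (n_frames, n_bins, M), so X[tau, omega] is the
    M-element observation vector for frame tau and frequency bin omega.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    # Cut each channel into overlapping windowed frames: (n_frames, M, frame_len).
    frames = np.stack([x[:, t * hop: t * hop + frame_len] * window
                       for t in range(n_frames)])
    # Real FFT along time, then move the channel axis last.
    return np.fft.rfft(frames, axis=-1).transpose(0, 2, 1)
```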

<Diffused sensing unit 220>
The diffused sensing unit 220 receives the Ω observation signals X^-(ω, τ). It also retrieves the N×Ω transfer characteristics a^-(ω, r^-_n) from the transfer characteristic storage unit 210 in advance. Using the Ω observation signals X^-(ω, τ) and the N×Ω transfer characteristics a^-(ω, r^-_n), the diffused sensing unit 220 obtains an index representing how likely it is that a sound source to be estimated is present at each assumed position r^-_n (s8) and outputs it to the sound source position estimation unit 160. The processing is described in more detail below.

(Noise spatial correlation matrix calculation unit 140)
The noise spatial correlation matrix calculation unit 140 receives the Ω observation signals X^-(ω, τ), uses them to calculate the noise spatial correlation matrix R^-_N(ω, τ) for each frequency ω (s9), and outputs it to the MUSIC spectrum calculation unit 150.

The noise spatial correlation matrix calculation unit 140 first calculates the spatial correlation matrix R^-(ω, τ) from the Ω observation signals X^-(ω, τ).
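The standard definition of a spatial correlation matrix, which matches the conjugate transpose and expectation operator explained next, is written out here as a reconstruction. The overbar renders the "^-" mark.

```latex
\bar{R}(\omega, \tau) = E\left[ \bar{X}(\omega, \tau)\, \bar{X}^{H}(\omega, \tau) \right]
```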

Here, ^H denotes the conjugate transpose. E[·] is the expectation operator, which may be replaced by, for example, averaging over time. Next, to generate the spatial correlation matrix of the noise space, the spatial correlation matrix R^-(ω, τ) is eigendecomposed.
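For a Hermitian matrix, the eigendecomposition takes the usual form below. It is given as a reconstruction that agrees with the eigenvector and eigenvalue matrices defined next.

```latex
\bar{R}(\omega, \tau) = \bar{V}(\omega, \tau)\, \bar{\Lambda}(\omega, \tau)\, \bar{V}^{H}(\omega, \tau)
```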

Here, V^-(ω, τ) = [v^-_1(ω, τ), …, v^-_M(ω, τ)] is the eigenvector matrix composed of the M eigenvectors v^-_m(ω, τ), and Λ^-(ω, τ) = diag([Λ_1(ω, τ), …, Λ_M(ω, τ)]) is the eigenvalue matrix composed of the M eigenvalues Λ_m(ω, τ). The M eigenvalues are ordered so that Λ_1(ω, τ) ≥ … ≥ Λ_M(ω, τ) (see Reference 1). The first to K-th eigenvectors v^-_1(ω, τ), …, v^-_K(ω, τ) contain components originating from the sound sources to be estimated, so the space spanned by the (K+1)-th to M-th eigenvectors v^-_{K+1}(ω, τ), …, v^-_M(ω, τ) contains only stationary noise. This property is used to generate the noise spatial correlation matrix R^-_N(ω, τ).
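One way to write this construction, assuming the noise matrix is rebuilt from the (K+1)-th to M-th eigenpairs only, is the following reconstruction.

```latex
\bar{R}_{N}(\omega, \tau)
  = \sum_{m=K+1}^{M} \Lambda_{m}(\omega, \tau)\, \bar{v}_{m}(\omega, \tau)\, \bar{v}_{m}^{H}(\omega, \tau)
```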

That is, the noise spatial correlation matrix R^-_N(ω, τ) is obtained from the eigenvectors v^-_{K+1}(ω, τ), …, v^-_M(ω, τ), which contain no components originating from the sound sources to be estimated, and the corresponding eigenvalues Λ_{K+1}(ω, τ), …, Λ_M(ω, τ).
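The steps of step s9 can be sketched for a single frequency bin as follows. The function name is hypothetical, the expectation is replaced by an average over frames as the text permits, and rebuilding the noise matrix from the M-K smallest eigenpairs is an assumption about the exact form.

```python
import numpy as np


def noise_spatial_correlation(X_frames, num_sources):
    """Estimate R and the noise spatial correlation matrix R_N at one frequency bin.

    X_frames: complex array of shape (n_frames, M); row t is the observation vector of frame t.
    num_sources: K, the number of sound sources (must satisfy K < M).
    """
    X = np.asarray(X_frames)
    # Time average in place of E[.]: R[i, j] = mean_t X[t, i] * conj(X[t, j]).
    R = X.T @ X.conj() / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder so that Lambda_1 >= ... >= Lambda_M
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    V_n = eigvecs[:, num_sources:]              # eigenvectors K+1 .. M (noise space)
    L_n = eigvals[num_sources:]                 # eigenvalues  K+1 .. M
    R_noise = (V_n * L_n) @ V_n.conj().T        # V_N diag(L_N) V_N^H
    return R, R_noise
```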

(MUSIC spectrum calculation unit 150)
The MUSIC spectrum calculation unit 150 receives the Ω noise spatial correlation matrices R^-_N(ω, τ) and, from these and the N×Ω transfer characteristics a^-(ω, r^-_n) retrieved from the transfer characteristic storage unit 210, calculates the MUSIC spectrum P_MUSIC(ω, τ, r^-_n) for each frequency ω and each assumed position r^-_n by the following equation (s11), and outputs it to the sound source position estimation unit 160. Here, n = 1, 2, …, N.
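A commonly used form of the MUSIC spectrum that fits the quantities named here is given below as a reconstruction. The normalization by a^H a in the numerator is an assumption, and the original equation may differ by a scale factor.

```latex
P_{\mathrm{MUSIC}}(\omega, \tau, \bar{r}_n)
  = \frac{\bar{a}^{H}(\omega, \bar{r}_n)\, \bar{a}(\omega, \bar{r}_n)}
         {\bar{a}^{H}(\omega, \bar{r}_n)\, \bar{R}_{N}(\omega, \tau)\, \bar{a}(\omega, \bar{r}_n)}
```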

In the prior art, the transfer characteristic a^-(ω, r^-_n) is calculated by modeling only the direct sound, whereas in this embodiment it is calculated by modeling both the direct sound and the reflected sound, as described above. In this embodiment, the MUSIC spectrum P_MUSIC(ω, τ, r^-_n) is used as the index representing how likely it is that a sound source to be estimated is present at the assumed position.
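A minimal sketch of step s11 at one frequency bin is shown below. It evaluates the ratio a^H a / (a^H R_N a) for every assumed position at once. The function name, the numerator normalization, and the small floor on the denominator are assumptions added for illustration.

```python
import numpy as np


def music_spectrum(R_noise, transfer_vectors):
    """MUSIC spectrum over N assumed positions at one frequency bin and frame.

    R_noise: (M, M) noise spatial correlation matrix.
    transfer_vectors: (N, M) array whose row n is the transfer characteristic for position n.
    Returns a real array of shape (N,); larger values suggest a source is more likely present.
    """
    A = np.asarray(transfer_vectors)
    num = np.einsum("nm,nm->n", A.conj(), A).real                 # a^H a
    den = np.einsum("nm,mk,nk->n", A.conj(), R_noise, A).real     # a^H R_N a
    return num / np.maximum(den, 1e-12)                           # floor avoids division by zero
```

The same function serves both the prior-art case and this embodiment. Only the rows of `transfer_vectors` change, from direct-sound steering vectors to vectors that also include the reflected sound.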

<Sound source position estimation unit 160>
The sound source position estimation unit 160 receives the N×Ω MUSIC spectra P_MUSIC(ω, τ, r^-_n). The larger the value of P_MUSIC(ω, τ, r^-_n), the more likely it is that a sound source is present at the corresponding assumed position r^-_n. The sound source position estimation unit 160 therefore extracts the K positions corresponding to the largest MUSIC spectra P_MUSIC(ω, τ, r^-_n), estimates them to be the positions of the sound sources (s13), and outputs the estimated positions r^-(τ) = [r^-_1(τ), …, r^-_K(τ)].

For example, the K assumed positions with the largest values of the following cost C_MUSIC(τ, r^-_n) are extracted, and the K estimated positions r^-_k(τ) corresponding to those costs C_MUSIC(τ, r^-_k) are output.
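Because the cost depends on τ and r^-_n but not on ω, a natural reading is that the spectrum is aggregated over frequency. The sketch below assumes a plain sum over ω, which is a guess at the exact form, and then picks the K positions with the largest cost. The function name is hypothetical.

```python
import numpy as np


def estimate_positions(spectrum, assumed_positions, num_sources):
    """Pick the K assumed positions with the largest frequency-summed spectrum.

    spectrum: (n_bins, N) array of index values for one frame tau.
    assumed_positions: sequence of N positions.
    Returns (list of K estimated positions, cost array of shape (N,)).
    """
    cost = np.asarray(spectrum).sum(axis=0)           # cost per position, summed over omega (assumed)
    top = np.argsort(cost)[::-1][:num_sources]        # indices of the K largest costs
    return [assumed_positions[n] for n in top], cost
```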

<Effect>
Observing sound after producing diffuse reflections means that the observation signals contain cues for distinguishing the target sound from the other sound sources even when the sound sources are arranged at narrow intervals. Performing the signal processing with the diffuse reflections taken into account (specifically, by using the transfer characteristics described above) makes it possible to estimate the positions of the sound sources even when they are arranged at narrow intervals.

<Modification>
The AD conversion (s5) and the frequency domain conversion (s7) may be performed inside the microphones 110-m. In that case, the AD conversion unit 120 and the frequency domain conversion unit 130 are provided inside the microphones 110-m.

<Sound source position estimation device 3 according to the second embodiment>
Only the parts that differ from the first embodiment are described.

In the second embodiment, the sound source position is estimated by c) the beamformer method. The beamformer method estimates the sound source position by preparing a large number of beamformers and scanning the space.

FIG. 5 is a functional block diagram of the sound source position estimation device 3, and FIG. 6 shows its processing flow.

The sound source position estimation device 3 includes M microphones 110-m, reflecting means 200, an AD conversion unit 120, a frequency domain conversion unit 130, a transfer characteristic storage unit 210, a diffused sensing unit 330, and a sound source position estimation unit 160. The diffused sensing unit 330 includes a filter calculation unit 340 and a spatial spectrum calculation unit 350. The processing in the diffused sensing unit 330 (s32) is the same in outline as the processing in the diffused sensing unit 220 (s8) and differs in its details. The filter calculation unit 340 and the spatial spectrum calculation unit 350, which differ from the first embodiment, are described in detail below.

<Filter calculation unit 340>
From the N×Ω transfer characteristics a^-(ω, r^-_n) retrieved from the transfer characteristic storage unit 210, the filter calculation unit 340 calculates the filters w^-(ω, r^-_n) = [W_1(ω, r^-_n), …, W_M(ω, r^-_n)]^T for scanning the space, one for each frequency ω and each scanned position (in other words, each assumed position r^-_n) (s33), and outputs them to the spatial spectrum calculation unit 350. There are various filter design methods. This embodiment describes a) the delay-and-sum method and b) the minimum variance method.

a) In the delay-and-sum method, the filter w^-(ω, r^-_n) is designed as follows, with a cost that emphasizes the sound at the assumed position r^-_n.
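A usual delay-and-sum design matched to the transfer characteristic is given below as a reconstruction. The normalization, chosen so that the filter passes sound from r^-_n with unit gain, is an assumption.

```latex
\bar{w}(\omega, \bar{r}_n)
  = \frac{\bar{a}(\omega, \bar{r}_n)}{\bar{a}^{H}(\omega, \bar{r}_n)\, \bar{a}(\omega, \bar{r}_n)}
```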

b) In the minimum variance distortionless response (MVDR) method, the filter is designed as follows, with a cost that minimizes the noise energy while emphasizing the sound at the assumed position r^-_n.
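The textbook MVDR solution is shown below as a reconstruction. Here R^-(ω) stands for the spatial correlation matrix of the noise to be suppressed. How that matrix is built is not stated in this text, so its definition is left as an assumption.

```latex
\bar{w}(\omega, \bar{r}_n)
  = \frac{\bar{R}^{-1}(\omega)\, \bar{a}(\omega, \bar{r}_n)}
         {\bar{a}^{H}(\omega, \bar{r}_n)\, \bar{R}^{-1}(\omega)\, \bar{a}(\omega, \bar{r}_n)}
```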

Various other filter design methods exist, and the filter may be designed using any of them.

Note that the filter w^-(ω,r^- _n) may be calculated at any time after the transfer characteristics a^-(ω,r^- _n) have been measured and before the processing in the spatial spectrum calculation unit 350 is performed.

<Spatial spectrum calculation unit 350>
The spatial spectrum calculation unit 350 receives the N×Ω filters w^-(ω,r^- _n) and the Ω observation signals X^-(ω,τ), convolves each filter w^-(ω,r^- _n) with the observation signal X^-(ω,τ) to calculate the spatial spectrum P_BF(ω,τ,r^- _n) (s35), and outputs it to the sound source position estimation unit 160.
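The scan over assumed positions can be sketched as follows for a single frequency bin and frame. This sketch assumes that, in the frequency domain, applying the filter reduces to the inner product w^H X and that the spatial spectrum is the squared magnitude of that output; the patent's own equation is not reproduced in this text. The function name and the sample data are hypothetical.

```python
def spatial_spectrum(filters, x):
    """Return P_BF for every assumed position at one frequency bin and frame.

    filters: list of N filters, each a list of M complex weights.
    x: list of M complex observation values X_m(omega, tau).
    """
    spectrum = []
    for w in filters:
        # Beamformer output y = w^H x for this assumed position.
        y = sum(w_m.conjugate() * x_m for w_m, x_m in zip(w, x))
        spectrum.append(abs(y) ** 2)  # output power
    return spectrum


# Hypothetical data: M = 2 microphones, N = 2 assumed positions.
a_list = [[1.0 + 0.0j, 0.0 + 1.0j], [1.0 + 0.0j, 0.0 - 1.0j]]
# Delay-and-sum filters w = a / (a^H a), one per assumed position.
filters = [[a_m / sum(abs(v) ** 2 for v in a) for a_m in a] for a in a_list]
# Observation produced by a source of amplitude 2 at assumed position 0.
x = [2.0 + 0.0j, 0.0 + 2.0j]
p_bf = spatial_spectrum(filters, x)
```

In this toy case the two transfer characteristic vectors are orthogonal, so the spectrum is large at position 0 and vanishes at position 1.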

In this embodiment, the spatial spectrum P_BF(ω,τ,r^- _n) is used as the index representing how likely it is that a sound source to be estimated exists at the assumed position. The sound source position estimation unit 160 therefore performs the same processing as before, using the spatial spectrum P_BF(ω,τ,r^- _n) in place of the MUSIC spectrum P_MUSIC(ω,τ,r^- _n).
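One way such an index could be turned into a position estimate is sketched below. The procedure of the sound source position estimation unit 160 is not restated in this section, so this sketch simply assumes that the index is summed over frequency bins and that the assumed position with the largest total is selected. The function name, the coordinates, and the spectrum values are hypothetical.

```python
def estimate_position(spectra_by_freq, positions):
    """Pick the assumed position whose index, summed over frequency, is largest.

    spectra_by_freq: list over frequency bins; each entry is a list of N index
        values (for example P_BF), one per assumed position.
    positions: list of N assumed positions (any hashable description).
    """
    totals = [0.0] * len(positions)
    for spectrum in spectra_by_freq:
        for n, value in enumerate(spectrum):
            totals[n] += value
    best = max(range(len(positions)), key=lambda n: totals[n])
    return positions[best], totals


# Hypothetical data: 3 assumed positions, 2 frequency bins.
positions = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
spectra = [[0.1, 0.9, 0.2], [0.3, 0.7, 0.1]]
estimated, totals = estimate_position(spectra, positions)
```

Because the selection only reads the index values, the same routine works unchanged whether they come from the spatial spectrum or from the MUSIC spectrum.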

With this configuration, the same effects as in the first embodiment can be obtained.

<Other modifications>
The present invention is not limited to the embodiments and modifications described above. For example, the various processes described above need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and recording medium>
The sound source position estimation device described above can also be implemented by a computer. In this case, a program for causing the computer to function as the target device (a device having the functional configuration shown in the figures for the embodiments), or a program for causing the computer to execute each step of the processing procedure (as shown in each embodiment), is loaded into the computer from a recording medium such as a CD-ROM, a magnetic disk, or a semiconductor storage device, or downloaded via a communication line, and the program is then executed.

Claims

A sound source position estimation device for a case in which there are a plurality of sound sources to be estimated and the sound sources to be estimated are arranged with a narrow angle difference and distance, the device comprising:
a plurality of microphones;
reflection means arranged in the vicinity of the plurality of microphones so that, for the sound emitted from each of a plurality of assumed positions of the sound sources to be estimated, a plurality of reflected sounds can be collected at each microphone;
a transfer characteristic storage unit that stores transfer characteristics, including the influence of the reflected sound generated by the reflection means, from the plurality of assumed positions of the sound sources to be estimated to the plurality of microphones;
a diffusion sensing unit that obtains, using the frequency-domain observation signals obtained from the plurality of microphones and the transfer characteristics corresponding to the plurality of assumed positions, an index representing how likely it is that a sound source to be estimated exists at each assumed position; and
a sound source position estimation unit that estimates, as the position of a sound source to be estimated, the position corresponding to an index indicating a high possibility that the sound source to be estimated exists,
wherein the plurality of assumed positions are arranged with a narrow angle difference and distance.

  The sound source position estimation device according to claim 1,
  wherein the reflection means has a shape that increases the number of reflections.

  The sound source position estimation device according to claim 2,
  wherein the reflection means has any one of the following shapes:
  (i) a shape having one face of a dodecahedron as an opening surface;
  (ii) a shape having one face of an icosahedron as an opening surface;
  (iii) a shape in which an opening is provided in a rhombic dodecahedron;
  (iv) a shape in which an opening is provided in a sphere;
  (v) a shape in which one of the vertices of an octahedron is cut off to provide an opening.

The sound source position estimation device according to any one of claims 1 to 3,
wherein the diffusion sensing unit includes:
a noise spatial correlation matrix calculation unit that calculates a spatial correlation matrix using the observation signals, eigendecomposes the spatial correlation matrix, and obtains a noise spatial correlation matrix from the eigenvectors and eigenvalues that do not include components derived from the sound sources to be estimated; and
a MUSIC spectrum calculation unit that calculates a MUSIC spectrum as the index from the transfer characteristics corresponding to the plurality of assumed positions and the noise spatial correlation matrix.

The sound source position estimation device according to any one of claims 1 to 3,
wherein the diffusion sensing unit includes:
a filter calculation unit that calculates a filter for scanning the space using the transfer characteristics; and
a spatial spectrum calculation unit that convolves the filter with the observation signals and calculates a spatial spectrum as the index.