JP6540730B2

JP6540730B2 - Sound collection device, program and method, determination device, program and method

Info

Publication number: JP6540730B2
Application number: JP2017028268A
Authority: JP
Inventors: 一浩片桐
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2017-02-17
Filing date: 2017-02-17
Publication date: 2019-07-10
Anticipated expiration: 2037-02-17
Also published as: JP2018132737A

Description

The present invention relates to a sound collection device, program, and method, and to a determination device, program, and method. It can be applied, for example, to processing that emphasizes sound from a target area and suppresses sound from all other areas.

A beamformer (hereinafter "BF") using a microphone array is a known technique for separating and collecting only the sound arriving from a specific direction in an environment where multiple sound sources are present. A BF forms directivity by exploiting the differences in the arrival time of a signal at each microphone (see Non-Patent Document 1). BFs fall broadly into two types: additive and subtractive.

In particular, a subtractive BF has the advantage that it can form directivity with fewer microphones than an additive BF.

FIG. 7 is a block diagram showing the configuration of a conventional subtractive BF.

The conventional subtractive BF shown in FIG. 7 uses two microphones.

In a conventional subtractive BF, a delay unit first calculates the difference in the time at which sound from the target direction (hereinafter also called the "target sound") arrives at each microphone, and then applies a delay so that the target sound is phase-aligned across the microphones. The delay unit calculates this time difference using equation (1) below.

In equation (1), d is the distance between the microphones, c is the speed of sound, and τ_L is the delay. θ_L is the angle from the direction perpendicular to the straight line connecting the microphones to the target direction.

$$\tau_L = \frac{d \sin\theta_L}{c} \tag{1}$$
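Equation (1) can be evaluated directly. The following is a minimal Python sketch, not part of the patent: the function name `steering_delay`, the 3 cm spacing, and the default speed of sound of 343 m/s (roughly air at 20 °C) are assumptions chosen for illustration.

```python
import math

def steering_delay(d, theta_l, c=343.0):
    """Equation (1): delay in seconds for microphone spacing d (metres)
    and target angle theta_l (radians from the array's perpendicular).
    c is the speed of sound in m/s (assumed default: 343)."""
    return d * math.sin(theta_l) / c

# Hypothetical 3 cm spacing.
tau_endfire = steering_delay(0.03, math.pi / 2)  # target along the array axis
tau_broadside = steering_delay(0.03, 0.0)        # target perpendicular to the array
```

With θ_L = π/2 the delay equals the full travel time d/c between the microphones; with θ_L = 0 the sound reaches both microphones simultaneously and no delay is needed.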

Here, when the null (blind spot) lies on the first-microphone side of the midpoint between the first and second microphones, the delay unit of the conventional subtractive BF delays the input signal x_1(t) of the first microphone. The delayed signal is then subtracted from the input signal of the second microphone according to equation (2).

$$a(t) = x_2(t) - x_1(t - \tau_L) \tag{2}$$

The subtraction in a conventional subtractive BF can be performed in the frequency domain in the same way, in which case equation (2) becomes equation (3).

$$A(\omega) = X_2(\omega) - e^{-j\omega\tau_L} X_1(\omega) \tag{3}$$
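A delay of τ_L in the time domain corresponds to multiplication by e^{-jωτ_L} in the frequency domain, so the subtraction of equation (2) can be applied independently to each frequency bin. The following minimal Python sketch illustrates this for a single bin; the function name `subtractive_bf_bin` and all numeric values are assumptions for illustration.

```python
import cmath
import math

def subtractive_bf_bin(x1, x2, omega, tau):
    """Frequency-domain form of equation (2) for one bin:
    A = X2 - exp(-j*omega*tau) * X1."""
    return x2 - cmath.exp(-1j * omega * tau) * x1

omega = 2 * math.pi * 1000.0   # 1 kHz, hypothetical
tau = 0.03 / 343.0             # delay for 3 cm spacing with theta_L = pi/2
x1 = 1.0 + 0.5j                # arbitrary spectrum value at microphone 1

# Source in the null direction: microphone 2 receives microphone 1's
# signal exactly tau later, so the subtraction cancels it.
x2_null = x1 * cmath.exp(-1j * omega * tau)
a_null = subtractive_bf_bin(x1, x2_null, omega, tau)

# Source perpendicular to the array: both microphones receive the same
# signal, which is not cancelled.
a_side = subtractive_bf_bin(x1, x1, omega, tau)
```

The cancelled case corresponds to a source on the first-microphone side, which matches the null placement described for equation (2).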

When θ_L = ±π/2, the resulting directivity is a cardioid unidirectional pattern, as shown in FIG. 8(A); when θ_L = 0 or π, it is a figure-eight bidirectional pattern, as shown in FIG. 8(B). In the following, a filter that forms unidirectional directivity from the input signal is called a unidirectional filter, and a filter that forms bidirectional directivity is called a bidirectional filter.

Spectral subtraction (hereinafter also "SS") can also be used to form strong directivity toward the null of the bidirectional pattern. Directivity formation by SS follows equation (4). Equation (4) uses the input signal X_1 of the first microphone, but the same effect is obtained with the input signal X_2 of the second microphone. Here β is a coefficient that adjusts the strength of the SS. If the subtraction yields a negative value, flooring is applied: the value is replaced with 0 or with a scaled-down version of the original value. In this method, the bidirectional filter extracts the sound arriving from directions other than the target direction (hereinafter also "non-target sound"), and the power spectrum of the extracted non-target sound is subtracted from the power spectrum of the input signal, which emphasizes the target sound.

$$|Y(\omega)| = |X_1(\omega)| - \beta\,|A(\omega)| \tag{4}$$
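The spectral subtraction of equation (4), including the flooring step, can be sketched as follows. This is an illustrative Python sketch only: the function name `spectral_subtract`, the `floor` parameter (the scale factor applied to the original value when the result goes negative), and the sample spectra are assumptions, not values from the patent.

```python
def spectral_subtract(x_mag, a_mag, beta=1.0, floor=0.0):
    """Equation (4) per frequency bin: |Y| = |X1| - beta * |A|.
    A negative result is replaced by floor * |X1|; floor=0.0 gives 0,
    and a small positive floor gives a scaled-down original value."""
    out = []
    for x, a in zip(x_mag, a_mag):
        y = x - beta * a
        if y < 0.0:
            y = floor * x
        out.append(y)
    return out

# Hypothetical three-bin spectra: the second bin would go negative and is floored.
y_mag = spectral_subtract([1.0, 0.5, 0.2], [0.2, 0.6, 0.1], beta=1.0, floor=0.1)
```

A larger β removes more of the non-target sound at the cost of more bins being floored, which is one source of the musical noise discussed later in the text.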

When only the sound present within a specific area (hereinafter "target area sound") is to be collected, a subtractive BF alone may also pick up sound from sources located around that area (hereinafter "non-target area sound"). Patent Document 1 therefore proposes a method that uses multiple microphone arrays, directs their directivity toward the target area from different directions, and makes the directivity patterns intersect at the target area in order to collect the target area sound.

Next, an example of the target area sound collection processing described in Patent Document 1 is explained.

FIG. 9 is an explanatory diagram showing an example arrangement of the microphone arrays when two microphone arrays MA1 and MA2 are used to collect target area sound from a sound source in the target area.

FIG. 10 is an explanatory diagram (graphs) showing the BF output of each of the microphone arrays MA1 and MA2 of FIG. 9 in the frequency domain. FIGS. 10(a) and 10(b) are conceptual graphs of the BF outputs of the microphone arrays MA1 and MA2, respectively.

In the method described in Patent Document 1, the ratio of the target area sound power contained in the BF outputs of the microphone arrays MA1 and MA2 is first estimated and used as a correction coefficient. Specifically, when two microphone arrays MA1 and MA2 are used, the correction coefficient for the target area sound power can be calculated, for example, by equations (5) and (6) or by equations (7) and (8).

Here, Y_1k(n) and Y_2k(n) are the power spectra of the BF outputs of the microphone arrays MA1 and MA2, N is the total number of frequency bins, k is the frequency, and α(n) is the power correction coefficient applied to the BF output. "mode" denotes the most frequent value and "median" denotes the median. Each BF output is then corrected by the correction coefficient and SS is applied, which extracts the non-target area sound present in the direction of the target area. Applying SS again, this time subtracting the extracted non-target area sound from the output of each BF, yields the target area sound.

FIG. 11 is an explanatory (conceptual) diagram showing how the power spectrum of each component changes when area sound collection is performed on the BF outputs obtained with the microphone arrays MA1 and MA2 of FIG. 9.

First, a BF output Y_1 in which the non-target area sound N_2 is suppressed is obtained from the input signal X_1 of the microphone array MA1 (see FIG. 11(a)).

To extract the non-target area sound N_1(n) present in the target area direction as seen from the microphone array MA1, the BF output Y_2(n) of the microphone array MA2 multiplied by the power correction coefficient α(n) is spectrally subtracted from the BF output Y_1(n) of the microphone array MA1, as shown in equation (7) (see FIG. 11(b)). Then, according to equation (8), the non-target area sound is spectrally subtracted from each BF output to extract the target area sound (see FIG. 11(c)). γ(n) is a coefficient for changing the strength of the SS.

$$N_1(n) = Y_1(n) - \alpha(n)\,Y_2(n) \tag{7}$$
$$Z_1(n) = Y_1(n) - \gamma(n)\,N_1(n) \tag{8}$$
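The two subtractions of equations (7) and (8) can be sketched in Python as below. This is an illustration under stated assumptions, not the method of Patent Document 1: the helper `power_correction` assumes that the median-based coefficient is the median over bins of Y_1k / Y_2k (the correction coefficient equations themselves are not reproduced in this text), negative results are floored at 0, and all spectra are made-up five-bin examples.

```python
from statistics import median

def power_correction(y1, y2, eps=1e-12):
    """Assumed form of the median-based coefficient: median over bins of
    Y1k / Y2k, so that alpha * Y2 matches the target level in Y1."""
    return median(a / max(b, eps) for a, b in zip(y1, y2))

def extract_area_sound(y1, y2, alpha, gamma=1.0):
    """Equations (7) and (8) with negative values floored at 0."""
    n1 = [max(a - alpha * b, 0.0) for a, b in zip(y1, y2)]   # eq. (7)
    z1 = [max(a - gamma * n, 0.0) for a, n in zip(y1, n1)]   # eq. (8)
    return n1, z1

# Hypothetical power spectra: target area sound in bins 0, 2 and 4.
y1 = [4.0, 1.0, 2.0, 0.0, 3.0]   # MA1: target plus noise behind the area (bin 1)
y2 = [2.0, 0.0, 1.0, 2.0, 1.5]   # MA2: target at half power plus other noise (bin 3)

alpha = power_correction(y1, y2)
n1, z1 = extract_area_sound(y1, y2, alpha)
```

In this toy case the target is the only component shared by both BF outputs, so the median ratio recovers its relative level and the second subtraction leaves only the target bins in `z1`.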

Because SS, a nonlinear process, is performed in equations (4) and (8) to extract the target area sound, an unpleasant artifact known as musical noise may occur in high-noise environments.

Patent Document 2 therefore distinguishes sections in which target area sound is present from sections in which it is absent, and suppresses artifacts such as musical noise by not outputting the area-collected sound in sections where target area sound is absent. To determine whether target area sound is present, the power spectrum ratio (area sound output / input signal) between the input signal and the output from which the target area sound has been extracted (hereinafter "area sound output") is first calculated according to equation (9). When a sound source is present in the target area, the target area sound is contained in both the input signal X_1 and the area sound output Z_1, so the power spectrum ratio of the target area sound component is close to 1. The non-target area sound component, by contrast, is suppressed in the area sound output, so its power spectrum ratio is small. Other background noise components are also suppressed to some extent, even without dedicated noise suppression beforehand, because area sound collection applies SS several times; their power spectrum ratio is likewise small. When no target area sound is present, the area sound output contains only weak residual noise compared with the input signal, so the power spectrum ratio is small across the entire band. Because of this property, the average of the power spectrum ratios obtained at each frequency according to equation (10) (hereinafter also "average power spectrum ratio") differs greatly depending on whether target area sound is present. Here, m and n are the upper and lower limits of the processing band, respectively; the band may be set, for example, to 100 Hz to 6 kHz, which contains sufficient speech information.

The device described in Patent Document 2 then compares the average power spectrum ratio with a preset threshold and, when it determines that no target area sound is present, outputs either silence or the input sound with reduced gain instead of the area sound output data.
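The threshold decision on the average power spectrum ratio can be sketched as follows. This Python sketch is an assumption-laden illustration: it takes the per-bin ratio to be Z_1 / X_1 on power spectra, averages it over bin indices `lo` to `hi` (hypothetical names for the band limits), uses an arbitrary threshold of 0.3, and outputs silence in the absent case (the text also allows a gain-reduced input sound instead).

```python
def average_power_ratio(x_pow, z_pow, lo, hi, eps=1e-12):
    """Mean of (area sound output / input signal) over bins lo..hi inclusive."""
    ratios = [z_pow[k] / max(x_pow[k], eps) for k in range(lo, hi + 1)]
    return sum(ratios) / len(ratios)

def gate_output(x_pow, z_pow, lo, hi, threshold=0.3):
    """Return the area sound output when target area sound is judged
    present, otherwise silence. The threshold value is hypothetical."""
    if average_power_ratio(x_pow, z_pow, lo, hi) >= threshold:
        return list(z_pow)
    return [0.0] * len(z_pow)

x = [1.0, 1.0, 1.0, 1.0]                                   # input power spectrum
speech = gate_output(x, [0.9, 0.8, 0.9, 1.0], 0, 3)        # ratio near 1: kept
silence = gate_output(x, [0.05, 0.1, 0.0, 0.05], 0, 3)     # ratio near 0: muted
```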

Patent Document 1: JP 2014-072708 A
Patent Document 2: JP 2016-127457 A

Non-Patent Document 1: Futoshi Asano, "Acoustic Technology Series 16: Array Signal Processing of Sound: Localization, Tracking and Separation of Sound Sources", edited by the Acoustical Society of Japan, Corona Publishing, published February 25, 2011

With the method described in Patent Document 1, target area sound can be collected even when non-target area sound is present around the target area. With the method described in Patent Document 2, the influence of musical noise generated by area sound collection can be reduced. However, in high-noise environments, such as event venues with many people or places where music is playing nearby, the signal-to-noise ratio deteriorates and the power spectrum of the sound output by area sound collection may become small. In such situations, the average power spectrum ratio between the area sound output and the input signal also becomes small. In particular, for components whose power spectrum is small to begin with, such as unvoiced consonants, the difference from the average power spectrum ratio of non-target area sound sections becomes small, so the accuracy of target area sound determination deteriorates and part of the target area sound may be lost.

In view of these problems, there is a need for a sound collection device, program, and method, and a determination device, program, and method, that can improve the accuracy of target area sound determination in environments with strong background noise.

The sound collection device according to the first aspect of the present invention comprises:
(1) directivity forming means for forming directivity toward a target area direction from an input signal by means of a beamformer;
(2) non-target area sound extraction means for extracting non-target area sound present in the target area direction defined by the directivity formed by the directivity forming means;
(3) target area sound extraction means for outputting an extracted sound obtained by extracting target area sound from the output of the beamformer using the non-target area sound, present in the target area direction, extracted by the non-target area sound extraction means;
(4) band division means for dividing each of the input signal and the extracted sound into a plurality of bands;
(5) power spectrum ratio calculation means for calculating, for each divided band produced by the band division means, the power spectrum ratio between the input signal and the extracted sound;
(6) determination means for determining whether target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculation means;
(7) output means for outputting the extracted sound as the sound collection result when the determination means determines that target area sound is present; and
(8) all-band average power spectrum ratio calculation means for calculating the average power spectrum ratio between the input signal and the extracted sound over the entire band;
and is characterized in that:
(9) the determination means first performs a first determination process that determines whether target area sound is present in the input signal based on the all-band average power spectrum ratio calculated by the all-band average power spectrum ratio calculation means;
(10) the band division means divides each of the input signal and the extracted sound into a plurality of bands when the first determination process could not determine whether target area sound is present in the input signal;
(11) the power spectrum ratio calculation means, when the first determination process could not make that determination, calculates the power spectrum ratio between the input signal and the extracted sound for each band produced by the band division means; and
(12) the determination means, when the first determination process could not make that determination, performs a second determination process that determines whether target area sound is present in the input signal from the power spectrum ratios calculated by the power spectrum ratio calculation means.

The sound collection program according to the second aspect of the present invention causes a computer to function as:
(1) directivity forming means for forming directivity toward a target area direction from an input signal by means of a beamformer;
(2) non-target area sound extraction means for extracting non-target area sound present in the target area direction defined by the directivity formed by the directivity forming means;
(3) target area sound extraction means for outputting an extracted sound obtained by extracting target area sound from the output of the beamformer using the non-target area sound, present in the target area direction, extracted by the non-target area sound extraction means;
(4) band division means for dividing each of the input signal and the extracted sound into a plurality of bands;
(5) power spectrum ratio calculation means for calculating, for each divided band produced by the band division means, the power spectrum ratio between the input signal and the extracted sound;
(6) determination means for determining whether target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculation means;
(7) output means for outputting the extracted sound as the sound collection result when the determination means determines that target area sound is present; and
(8) all-band average power spectrum ratio calculation means for calculating the average power spectrum ratio between the input signal and the extracted sound over the entire band;
and is characterized in that:
(9) the determination means first performs a first determination process that determines whether target area sound is present in the input signal based on the all-band average power spectrum ratio calculated by the all-band average power spectrum ratio calculation means;
(10) the band division means divides each of the input signal and the extracted sound into a plurality of bands when the first determination process could not determine whether target area sound is present in the input signal;
(11) the power spectrum ratio calculation means, when the first determination process could not make that determination, calculates the power spectrum ratio between the input signal and the extracted sound for each band produced by the band division means; and
(12) the determination means, when the first determination process could not make that determination, performs a second determination process that determines whether target area sound is present in the input signal from the power spectrum ratios calculated by the power spectrum ratio calculation means.

The sound collection method according to the third aspect of the present invention is characterized in that:
(1) the method uses directivity forming means, non-target area sound extraction means, target area sound extraction means, band division means, power spectrum ratio calculation means, determination means, output means, and all-band average power spectrum ratio calculation means;
(2) the directivity forming means forms directivity toward a target area direction from an input signal by means of a beamformer;
(3) the non-target area sound extraction means extracts non-target area sound present in the target area direction defined by the directivity formed by the directivity forming means;
(4) the target area sound extraction means outputs an extracted sound obtained by extracting target area sound from the output of the beamformer using the non-target area sound, present in the target area direction, extracted by the non-target area sound extraction means;
(5) the band division means divides each of the input signal and the extracted sound into a plurality of bands;
(6) the power spectrum ratio calculation means calculates, for each divided band produced by the band division means, the power spectrum ratio between the input signal and the extracted sound;
(7) the determination means determines whether target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculation means;
(8) the output means outputs the extracted sound as the sound collection result when the determination means determines that target area sound is present;
(9) the all-band average power spectrum ratio calculation means calculates the average power spectrum ratio between the input signal and the extracted sound over the entire band;
(10) the determination means first performs a first determination process that determines whether target area sound is present in the input signal based on the all-band average power spectrum ratio calculated by the all-band average power spectrum ratio calculation means;
(11) the band division means divides each of the input signal and the extracted sound into a plurality of bands when the first determination process could not determine whether target area sound is present in the input signal;
(12) the power spectrum ratio calculation means, when the first determination process could not make that determination, calculates the power spectrum ratio between the input signal and the extracted sound for each band produced by the band division means; and
(13) the determination means, when the first determination process could not make that determination, performs a second determination process that determines whether target area sound is present in the input signal from the power spectrum ratios calculated by the power spectrum ratio calculation means.
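The two-stage determination shared by the three aspects above can be sketched in Python as follows. This passage does not specify how the first determination decides that it "cannot determine", nor how the per-band ratios are combined in the second determination, so everything concrete here is an assumption for illustration: the two thresholds `upper` and `lower`, the equal-width contiguous bands, and the rule that one band exceeding `band_thr` is enough.

```python
def band_averages(x_pow, z_pow, n_bands, eps=1e-12):
    """Split the bins into n_bands contiguous bands (assumed equal width,
    last band takes the remainder) and average Z/X within each band."""
    size = len(x_pow) // n_bands
    averages = []
    for b in range(n_bands):
        start = b * size
        stop = len(x_pow) if b == n_bands - 1 else start + size
        ratios = [z_pow[k] / max(x_pow[k], eps) for k in range(start, stop)]
        averages.append(sum(ratios) / len(ratios))
    return averages

def target_present(x_pow, z_pow, upper=0.5, lower=0.1, n_bands=4, band_thr=0.5):
    """First determination on the all-band average; second determination
    on per-band averages only when the first is inconclusive."""
    full = band_averages(x_pow, z_pow, 1)[0]
    if full >= upper:
        return True            # clearly present
    if full <= lower:
        return False           # clearly absent
    # Inconclusive: divide into bands and look for a band dominated by target sound.
    return max(band_averages(x_pow, z_pow, n_bands)) >= band_thr

x = [1.0] * 8
# Energy concentrated in the top band, as might happen for an unvoiced consonant.
consonant = target_present(x, [0.05] * 6 + [0.9, 0.9])
# Same all-band average region, but spread evenly: no band stands out.
residual = target_present(x, [0.26] * 8)
```

Both example frames have an all-band average between the two thresholds, so only the per-band view separates them, which is the situation the per-band determination is meant to address.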

According to the present invention, the accuracy of target area sound determination can be improved in environments with strong background noise.

FIG. 1 is a block diagram showing the functional configuration of the sound collection device (determination device) according to the first embodiment.
FIG. 2 is a graph showing an example in which the frequency band division unit according to the first embodiment divides the power spectrum of a signal to be processed into divided bands.
FIG. 3 is a graph showing the average power spectrum ratio for each divided band calculated by the band-specific average power spectrum ratio calculation unit according to the first embodiment.
FIG. 4 is a block diagram showing the functional configuration of the sound collection device (determination device) according to the second embodiment.
FIG. 5 is a flowchart showing the operation of the target area sound determination processing of the sound collection device (determination device) according to the second embodiment.
FIG. 6 is a block diagram showing the functional configuration of the sound collection device (determination device) according to the third embodiment.
FIG. 7 is a block diagram showing the configuration of a conventional subtractive BF with two microphones.
FIG. 8 is a diagram showing the directivity patterns formed by a conventional subtractive BF using two microphones.
FIG. 9 is an explanatory diagram showing an example arrangement of the microphone arrays when two conventional microphone arrays are used to collect target area sound from a sound source in the target area.
FIG. 10 is an explanatory diagram showing the BF output of each of two conventional microphone arrays in the frequency domain.
FIG. 11 is an explanatory diagram showing how the power spectrum of each component changes when area sound collection is performed on the BF outputs obtained with two conventional microphone arrays.

(A) First Embodiment
A first embodiment of the sound collection device, program, and method, and of the determination device, program, and method according to the present invention is described in detail below with reference to the drawings.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the functional configuration of the sound collection device 100 of this embodiment.

The sound collection device 100 uses two microphone arrays MA (MA1 and MA2) to perform target area sound collection processing, which collects target area sound from a sound source in the target area.

The microphone arrays MA1 and MA2 are placed at arbitrary positions in the space in which the target area exists. Their positions relative to the target area may be anywhere as long as their directivity patterns overlap only in the target area, as shown for example in FIG. 9; they may, for instance, be placed facing each other across the target area. Each microphone array MA consists of two or more microphones M, each of which picks up an acoustic signal. This embodiment is described on the assumption that each microphone array MA contains two microphones M (M1 and M2); that is, each microphone array MA is a two-channel microphone array. The number of microphone arrays MA is not limited to two: when there are multiple target areas, enough microphone arrays MA must be placed to cover all of them.

The sound collection device 100 includes a data input unit 1, a directivity forming unit 2, a delay correction unit 3, spatial coordinate data 4, a target area sound power correction coefficient calculation unit 5, a target area sound extraction unit 6, a frequency band division unit 7, a band-specific average power spectrum ratio calculation unit 8, and an area sound determination unit 9. The detailed processing of each functional block of the sound collection device 100 is described later.

This embodiment describes the sound collection device 100, which outputs the target area sound collection result based on the result of determining whether the target area sound is present in the input signal. Alternatively, the output means that outputs the sound collection result (part of the processing of the area sound determination unit 9) may be omitted, and the device may be configured as a determination device (determination program, determination method) that outputs only the determination result for the target area sound.

The sound collection device 100 may be implemented entirely in hardware (for example, a dedicated chip), or partly or wholly in software (a program). For example, it may be implemented by installing programs (including the determination program and the sound collection program of the embodiment) on a computer having a processor and memory.

(A-2) Operation of the First Embodiment
Next, the operation of the sound collection device 100 of the first embodiment configured as above (the determination method and sound collection method according to the embodiment) is described.

The data input unit 1 converts the acoustic signals picked up by the microphone arrays MA1 and MA2 from analog to digital. It then transforms the digital signals from the time domain to the frequency domain, for example with a fast Fourier transform.

For each microphone array MA, the directivity forming unit 2 extracts the non-target area sound arriving from directions other than the target direction (for example, with a bidirectional filter) and subtracts its amplitude spectrum from the amplitude spectrum of the input signal, yielding a sound with directivity formed toward the target area (the BF output). Specifically, for each microphone array MA, the directivity forming unit 2 obtains the BF output according to equation (4). If the input comes from a directional microphone rather than a microphone array MA, the processing of the directivity forming unit 2 may be omitted and the input signal passed unchanged to the subsequent stage.

The delay correction unit 3 calculates and corrects the delay caused by the difference in distance between the target area and each microphone array MA (MA1, MA2). It obtains the positions of the target area and of the microphone arrays from the spatial coordinate data 4 and calculates the difference in the arrival time of the target area sound at each microphone array. It then adds delay, taking the microphone array farthest from the target area as the reference, so that the target area sound arrives at all microphone arrays MA simultaneously.
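The delay calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `array_delays`, the use of 2-D coordinates in meters, and the speed of sound of 343 m/s are all introduced here for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def array_delays(target_pos, array_positions, c=SPEED_OF_SOUND):
    """Return the delay (seconds) to add to each array's signal.

    The array farthest from the target area is the reference and gets
    zero delay; nearer arrays are delayed so that the target area sound
    appears to arrive at every array at the same time.
    """
    arrival_times = [math.dist(target_pos, pos) / c for pos in array_positions]
    latest = max(arrival_times)
    return [latest - t for t in arrival_times]


# Hypothetical layout: MA1 is 3.43 m and MA2 is 6.86 m from the target area.
delays = array_delays((0.0, 0.0), [(3.43, 0.0), (0.0, 6.86)])
print(delays)  # MA1 is delayed by about 10 ms, MA2 (the farthest) by 0
```

In a frequency-domain implementation the delay would typically be applied as a phase rotation per frequency bin; that step is omitted here.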

The spatial coordinate data 4 holds the position information of all target areas, of each microphone array MA (MA1, MA2), and of the microphones M (M1, M2) that make up each microphone array.

The target area sound power correction coefficient calculation unit 5 calculates, according to equation (5) or (6), a correction coefficient that equalizes the power of the target area sound component contained in each BF output.

The target area sound extraction unit 6 performs spectral subtraction (SS) according to equation (7) on the BF outputs corrected with the coefficient calculated by the target area sound power correction coefficient calculation unit 5, extracting the noise present in the target area direction. It then extracts the target area sound by spectrally subtracting the extracted noise from each BF output according to equation (8).

The frequency band division unit 7 receives the input signal from the data input unit 1 and the area sound output Z_1 from the target area sound extraction unit 6, and divides each into a plurality of bands. The input signal and the area sound output are assumed to have the same bandwidth.

In the following, the input signal X_1 of the microphone array MA1 is used as the representative input signal processed by the frequency band division unit 7 and the band-wise average power spectrum ratio calculation unit 8, but it may be replaced with the input signal of another microphone (which may belong to another microphone array MA).

The frequency band division unit 7 divides each signal to be processed (the input signal X_1 and the area sound output Z_1) into bands of predetermined width, at regular or irregular intervals. Below, each frequency band obtained by this division is called a "divided band", and the signal in each divided band (the signal split off from the signal to be processed) is called a "divided band signal".

The frequency band division unit 7 may make the divided bands equal in width or may vary their width with frequency. For example, it may make the divided bands wider at higher frequencies (narrower at lower frequencies), such as 100 Hz wide in the low-frequency range (for example, below 1 kHz) and 1 kHz wide elsewhere (for example, at 1 kHz and above).

The frequency band division unit 7 may also set divided bands only within a predetermined range that contains sufficient voice information (for example, 100 Hz to 6 kHz) and discard the signal in all other frequency bands, excluding it from the division.
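The non-uniform division mentioned above can be sketched as follows. The function name `band_edges` and its parameters are hypothetical, and combining the 100 Hz / 1 kHz spacing with the 100 Hz to 6 kHz range in a single layout is an assumption made for illustration; the text presents them as separate examples.

```python
def band_edges(low_hz=100, high_hz=6000, split_hz=1000,
               fine_step_hz=100, coarse_step_hz=1000):
    """Return a list of (lower, upper) divided-band limits in Hz.

    Bands below split_hz are fine_step_hz wide; bands at or above it are
    coarse_step_hz wide. Anything outside [low_hz, high_hz] is discarded,
    and a final band that would not fit entirely below high_hz is dropped.
    """
    edges = list(range(low_hz, split_hz, fine_step_hz))
    edges += list(range(split_hz, high_hz + 1, coarse_step_hz))
    return list(zip(edges[:-1], edges[1:]))


bands = band_edges()
print(len(bands))   # 14 bands: nine 100 Hz bands and five 1 kHz bands
print(bands[0])     # (100, 200)
print(bands[-1])    # (5000, 6000)
```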

To keep the explanation simple, the following description assumes that the frequency band division unit 7 divides the signals to be processed into divided bands at 1 kHz intervals.

FIG. 2 shows an example of a signal to be processed by the frequency band division unit 7, as a graph of the power spectrum in each band.

FIG. 2 shows an example in which the frequency band division unit 7 divides a signal to be processed spanning 100 Hz to 6 kHz into six divided bands B_1 to B_6 at intervals of approximately 1 kHz.

For each signal to be processed (the input signal X_1 and the area sound output Z_1), the band-wise average power spectrum ratio calculation unit 8 obtains the power spectrum of each divided band (divided band signal) produced by the frequency band division unit 7. It then calculates, for each divided band, the average power spectrum ratio (the average of the power spectrum ratio within the divided band) according to equation (11).

In equation (11), R_j is the average power spectrum ratio in the j-th divided band (j is an integer from 1 to M, where M is the total number of divided bands). X_1j is the average power spectrum (the mean of the power spectrum) within the j-th divided band of the input signal X_1 of the microphone array MA1, and Z_1j is the average power spectrum (the mean of the power spectrum) within the j-th divided band of the area sound output Z_1.
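A form of equation (11) consistent with these symbol definitions would be the following. The orientation of the ratio (area sound output over input signal) is an assumption, chosen so that a larger value indicates the target area sound, in line with the threshold comparison performed by the area sound determination unit 9.

```latex
R_j = \frac{Z_{1j}}{X_{1j}}, \qquad j = 1, \dots, M
```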

For example, suppose that the frequency band division unit 7 divides each signal to be processed (the input signal X_1 and the area sound output Z_1) into six divided bands B_1 to B_6, as shown in FIG. 2. In this case, the band-wise average power spectrum ratio calculation unit 8 obtains the average power spectra X_11 to X_16 of the input signal from the divided bands B_1 to B_6 of the input signal X_1, and the average power spectra Z_11 to Z_16 of the area sound output from the divided bands B_1 to B_6 of the area sound output Z_1.

The band-wise average power spectrum ratio calculation unit 8 then applies X_11 to X_16 and Z_11 to Z_16 to equation (11) to calculate the average power spectrum ratios R_1 to R_6 of the divided bands.
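The per-band calculation can be sketched as below. This is an illustrative sketch only: the function name `band_average_ratios`, the list-based spectra, and the small floor `eps` are assumptions, and the ratio is taken as the band-averaged area sound output power divided by the band-averaged input power, which is one reading of equation (11).

```python
def band_average_ratios(freqs_hz, input_power, area_power, bands, eps=1e-12):
    """Return one average power spectrum ratio R_j per divided band.

    freqs_hz    : center frequency of each spectral bin
    input_power : power spectrum of the input signal X_1, one value per bin
    area_power  : power spectrum of the area sound output Z_1, one value per bin
    bands       : list of (lower_hz, upper_hz) divided-band limits
    """
    ratios = []
    for low, high in bands:
        idx = [i for i, f in enumerate(freqs_hz) if low <= f < high]
        if not idx:
            raise ValueError(f"no frequency bins in band {low}-{high} Hz")
        x_avg = sum(input_power[i] for i in idx) / len(idx)  # X_1j
        z_avg = sum(area_power[i] for i in idx) / len(idx)   # Z_1j
        ratios.append(z_avg / max(x_avg, eps))               # R_j (assumed Z/X)
    return ratios


# Made-up four-bin spectra split into two bands.
freqs = [500, 1500, 2500, 3500]
x_pow = [4.0, 4.0, 2.0, 2.0]
z_pow = [1.0, 1.0, 1.0, 1.0]
print(band_average_ratios(freqs, x_pow, z_pow, [(0, 2000), (2000, 4000)]))
# prints [0.25, 0.5]
```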

FIG. 3 is a graph of the average power spectrum ratios R_1 to R_6 of the divided bands calculated by the band-wise average power spectrum ratio calculation unit 8.

FIG. 3 shows the average power spectrum ratios R_1 to R_6 of the divided bands together with the average power spectrum ratio over the entire band (the rightmost value).

The band-wise average power spectrum ratio calculation unit 8 then takes, according to equation (12), the largest of the average power spectrum ratios R_1 to R_6 of the divided bands as the maximum average power spectrum ratio U_max.

For example, when the average power spectrum ratios R_1 to R_6 of the divided bands are as shown in FIG. 3, the maximum average power spectrum ratio U_max is the value of the divided band B_6, which is larger than the average over the entire band.

The area sound determination unit 9 compares the maximum average power spectrum ratio U_max calculated by the band-wise average power spectrum ratio calculation unit 8 with a preset threshold T1 to determine whether the target area sound is present (whether the input signal contains the target area sound). For example, it may determine that the target area sound is present when U_max exceeds T1 and absent when U_max is less than or equal to T1.
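The comparison against the threshold T1 can be sketched in a few lines. The function name `target_area_sound_present` and the numeric values are hypothetical; only the rule itself (present when U_max exceeds T1, absent otherwise) comes from the description.

```python
def target_area_sound_present(band_ratios, t1):
    """Return True when the largest per-band ratio exceeds threshold T1."""
    u_max = max(band_ratios)  # maximum average power spectrum ratio U_max
    return u_max > t1         # "exceeds": equality counts as absent


# Made-up ratios for six divided bands and a made-up threshold.
print(target_area_sound_present([0.1, 0.2, 0.1, 0.3, 0.2, 0.6], t1=0.5))  # True
print(target_area_sound_present([0.1, 0.2, 0.1, 0.3, 0.2, 0.5], t1=0.5))  # False
```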

When it determines that the target area sound is present, the area sound determination unit 9 may output the area sound collection data (the area sound output Z_1, that is, the extracted sound) as it is. When it determines that the target area sound is absent, it may output silent audio data instead of the area sound collection data. In place of silence, it may also output the input signal (for example, the input signal X_1 of the microphone array MA1) with reduced gain.
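The output selection described above might look like the following sketch. The function name `select_output` and the `fallback_gain` parameter are hypothetical; a gain of zero stands for the silent output, and a small positive gain stands for the attenuated input signal.

```python
def select_output(present, area_output, input_signal, fallback_gain=0.0):
    """Choose what to output for one frame of samples.

    present       : result of the target area sound determination
    area_output   : area sound output Z_1 (extracted sound)
    input_signal  : input signal X_1 of the same frame
    fallback_gain : 0.0 outputs silence; a value in (0, 1) outputs the
                    input signal with reduced gain instead
    """
    if present:
        return list(area_output)
    if fallback_gain == 0.0:
        return [0.0] * len(area_output)
    return [fallback_gain * x for x in input_signal]


print(select_output(True, [1.0, 2.0], [3.0, 4.0]))                       # [1.0, 2.0]
print(select_output(False, [1.0, 2.0], [3.0, 4.0]))                      # [0.0, 0.0]
print(select_output(False, [1.0, 2.0], [3.0, 4.0], fallback_gain=0.5))   # [1.5, 2.0]
```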

(A-3) Effects of the First Embodiment
The first embodiment provides the following effects.

The sound collection device 100 of the first embodiment divides the input signal (X_1 in the example above) and the area sound output Z_1 into a plurality of divided bands, obtains the average power spectrum ratio of each divided band, and determines whether the target area sound is present based on the largest of these, the maximum average power spectrum ratio U_max.

In other words, the sound collection device 100 of the first embodiment determines that the target area sound is present if even one divided band has an average power spectrum ratio exceeding the threshold (T1 in the example above). When the target area sound is human speech, unvoiced consonants have low power overall, but their power spectrum has a peak, so once the band is divided, the divided band containing the peak has high power. Because of this property, the maximum of the per-band average power spectrum ratios (the maximum average power spectrum ratio U_max) separates sections containing the target area sound from sections containing only non-target area sound (for example, sections where noise such as musical noise occurs) more clearly than the all-band average power spectrum ratio does. The sound collection device 100 of the first embodiment can therefore determine the target area sound more accurately in environments with strong background noise than conventional determination based on, for example, the all-band average power spectrum ratio.

Furthermore, because the first embodiment determines the target area sound using the maximum of the per-band average power spectrum ratios (the maximum average power spectrum ratio U_max), the band used for the determination is localized around the peak, yet the determination does not rest on a single sample. The determination is therefore stable and not easily affected by bursty noise.

(B) Second Embodiment
A second embodiment of the sound collection device, program, and method, and of the determination device, program, and method according to the present invention is described in detail below with reference to the drawings.

(B-1) Configuration of the Second Embodiment
FIG. 4 is a block diagram showing the functional configuration of the sound collection device 100A of this embodiment. In FIG. 4, parts identical or corresponding to those in FIG. 1 are given the same or corresponding reference numerals.

The following describes how the sound collection device 100A of the second embodiment differs from the first embodiment.

The sound collection device 100A differs from the first embodiment in that the area sound determination unit 9 is replaced with an area sound determination unit 9A and an all-band average power spectrum ratio calculation unit 10 is added.

The all-band average power spectrum ratio calculation unit 10 calculates the average power spectrum ratio over the entire band.

The area sound determination unit 9A controls the frequency band division unit 7, the band-wise average power spectrum ratio calculation unit 8, and the all-band average power spectrum ratio calculation unit 10 to determine whether the target area sound is present.

(B-2) Operation of the Second Embodiment
Next, the operation of the sound collection device 100A of the second embodiment configured as above (the determination method and sound collection method according to the embodiment) is described, focusing on the differences from the first embodiment.

The sound collection device 100A differs from the first embodiment in the target area sound determination processing performed by the area sound determination unit 9A. The determination processing centered on the area sound determination unit 9A is described below.

FIG. 5 is a flowchart of the target area sound determination processing performed by the sound collection device 100A (area sound determination unit 9A).

In the flowchart of FIG. 5, T1, T2, and T3 are thresholds used in the area sound determination processing. The threshold T1 may be the same as in the first embodiment. The threshold T2 is larger than the threshold T3 (T2 > T3). The magnitude relationship between T1 and T2 or T3 is not restricted, and suitable values confirmed by experiment or the like can be used.

The area sound determination unit 9A first controls the all-band average power spectrum ratio calculation unit 10 to calculate the all-band average power spectrum ratio U (S101).

The all-band average power spectrum ratio calculation unit 10 calculates the all-band average power spectrum ratio according to equations (9) and (10).

Next, the area sound determination unit 9A determines whether the all-band average power spectrum ratio calculated by the all-band average power spectrum ratio calculation unit 10 exceeds the threshold T2 (whether U > T2) (S102). If it does, the area sound determination unit 9A proceeds to step S104, described below; otherwise it proceeds to step S103.

If the all-band average power spectrum ratio exceeds the threshold T2 (U > T2), the area sound determination unit 9A determines that the target area sound is present (S104) and ends the determination processing.

If the all-band average power spectrum ratio is less than or equal to the threshold T2 (U ≤ T2), the area sound determination unit 9A determines whether it exceeds the threshold T3 (whether U > T3) (S103). If it does, the area sound determination unit 9A proceeds to step S105, described below; otherwise it proceeds to step S108.

If the all-band average power spectrum ratio exceeds the threshold T3 (U > T3), the area sound determination unit 9A controls the frequency band division unit 7 and the band-wise average power spectrum ratio calculation unit 8 to calculate the average power spectrum ratio of each divided band by the same processing as in the first embodiment (S105).

Next, as in the first embodiment, the area sound determination unit 9A controls the band-wise average power spectrum ratio calculation unit 8 to calculate the maximum average power spectrum ratio U_max from the per-band average power spectrum ratios, and determines whether U_max exceeds the threshold T1 (S106). In other words, the area sound determination unit 9A and the band-wise average power spectrum ratio calculation unit 8 determine whether any divided band has an average power spectrum ratio exceeding the threshold T1.

If the maximum average power spectrum ratio U_max exceeds the threshold T1 (that is, if any divided band has an average power spectrum ratio exceeding T1), the area sound determination unit 9A proceeds to step S107, described below; otherwise it proceeds to step S108.

If the maximum average power spectrum ratio U_max exceeds the threshold T1 (U_max > T1), the area sound determination unit 9A determines that the target area sound is present (S107) and ends the determination processing.

If the all-band average power spectrum ratio is less than or equal to the threshold T3 in step S103 (U ≤ T3), or if the maximum average power spectrum ratio U_max is less than or equal to the threshold T1 in step S106 (U_max ≤ T1), the area sound determination unit 9A determines that the target area sound is absent (S108) and ends the determination processing.

The area sound determination unit 9A first has the all-band average power spectrum ratio calculation unit 10 calculate the all-band average power spectrum ratio and performs target area sound determination based on it (hereinafter the "first determination processing"). Specifically, as described above, it determines that the target area sound is present when the all-band average power spectrum ratio is larger than the threshold T2 and absent when it is less than or equal to the threshold T3.

When the all-band average power spectrum ratio is less than or equal to the threshold T2 but exceeds the threshold T3 (T3 < U ≤ T2), the area sound determination unit 9A judges that the first determination processing cannot decide whether the target area sound is present. It then controls the frequency band division unit 7 and the band-wise average power spectrum ratio calculation unit 8 to calculate the maximum average power spectrum ratio U_max by the same processing as in the first embodiment, and determines the presence or absence of the target area sound based on U_max (hereinafter the "second determination processing").
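The two-stage flow of steps S101 to S108 can be sketched as follows. The function name `judge_target_area_sound` and the callable `band_ratios_fn` are hypothetical; passing the band division as a callable is a design choice made here so that the more expensive per-band calculation runs only when the first determination processing is inconclusive.

```python
def judge_target_area_sound(u_all, band_ratios_fn, t1, t2, t3):
    """Two-stage determination of the target area sound.

    u_all          : all-band average power spectrum ratio U (S101)
    band_ratios_fn : callable returning the per-band ratios R_1..R_M;
                     invoked only when T3 < U <= T2
    t1, t2, t3     : thresholds, with t2 > t3
    """
    if t2 <= t3:
        raise ValueError("threshold T2 must be larger than T3")
    if u_all > t2:                  # S102 -> S104: clearly present
        return True
    if u_all <= t3:                 # S103 -> S108: clearly absent
        return False
    u_max = max(band_ratios_fn())   # S105: band division, then U_max
    return u_max > t1               # S106 -> S107 (present) or S108 (absent)


# Made-up values: U falls between T3 and T2, so the per-band check decides.
print(judge_target_area_sound(0.5, lambda: [0.1, 0.7], t1=0.6, t2=0.8, t3=0.2))  # True
```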

(B-3) Effects of the Second Embodiment
The second embodiment provides the following effects.

The sound collection device 100A (area sound determination unit 9A) of the second embodiment first performs target area sound determination based on the all-band average power spectrum ratio (the first determination processing). When the all-band average power spectrum ratio is less than or equal to the threshold T2 but exceeds the threshold T3 (T3 < U ≤ T2), it performs target area sound determination based on the maximum average power spectrum ratio U_max (the second determination processing).

As a result, when the first determination processing (based on the all-band average power spectrum ratio) alone can determine the target area sound with sufficient accuracy (for example, when the all-band average power spectrum ratio is sufficiently large), the sound collection device 100A (area sound determination unit 9A) does not perform the second determination processing (band division and so on). It performs the second determination processing (calculating the maximum average power spectrum ratio U_max by band division and determining the target area sound from it) only when the first determination processing cannot determine the target area sound with sufficient accuracy (for example, when the average power spectrum ratio U is small, as with unvoiced consonants).

That is, the sound collection device 100A (area sound determination unit 9A) performs the second determination processing, whose band division requires more computation, only when the first determination processing cannot determine the target area sound with sufficient accuracy, so the target area sound can be determined efficiently.

(C) Third Embodiment
A third embodiment of the sound collection device, program, and method, and of the determination device, program, and method according to the present invention is described in detail below with reference to the drawings.

(C-1) Configuration of the Third Embodiment
FIG. 6 is a block diagram showing the functional configuration of the sound collection device 100B of this embodiment. In FIG. 6, parts identical or corresponding to those in FIG. 1 are given the same or corresponding reference numerals.

The following describes how the sound collection device 100B of the third embodiment differs from the first embodiment.

The sound collection device 100B differs from the first embodiment in that the area sound determination unit 9 is replaced with an area sound determination unit 9B and an inter-band power spectrum ratio calculation unit 11 is added.

The inter-band power spectrum ratio calculation unit 11 obtains the minimum of the per-band average power spectrum ratios calculated by the band-wise average power spectrum ratio calculation unit 8 (hereinafter the "minimum average power spectrum ratio U_min"). It then obtains the ratio between the maximum average power spectrum ratio U_max (the maximum of the per-band average power spectrum ratios R) calculated by the band-wise average power spectrum ratio calculation unit 8 and the minimum average power spectrum ratio U_min (hereinafter the "inter-band power spectrum ratio V").

The area sound determination unit 9B differs from the area sound determination unit of the first embodiment in that it determines the presence of the target area sound based on the inter-band power spectrum ratio V.

Note that, in the second embodiment, the second determination process may be replaced with a determination process using the inter-band power spectrum ratio V.
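The variation noted above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the first determination process compares the all-band average power spectrum ratio against a second threshold T2 and a smaller third threshold T3 (as recited in the claims), and that V is computed as U_max / U_min. The function name, parameter names, and the zero-division guard are hypothetical.

```python
from typing import Sequence


def two_stage_decision(all_band_ratio: float,
                       band_ratios: Sequence[float],
                       t2: float, t3: float, t4: float) -> bool:
    """Return True if target area sound is judged present.

    all_band_ratio: average power spectrum ratio over all bands (hypothetical name).
    band_ratios:    average power spectrum ratios R_i of the divided bands.
    t2, t3:         second and third thresholds, with t3 < t2.
    t4:             threshold for the inter-band power spectrum ratio V.
    """
    if t3 >= t2:
        raise ValueError("the third threshold must be smaller than the second")
    # First determination process: decide directly when the ratio is clear-cut.
    if all_band_ratio > t2:
        return True
    if all_band_ratio <= t3:
        return False
    # Undetermined region (t3 < ratio <= t2): fall back to the per-band test,
    # here replaced by the test on V as the text permits.
    u_max, u_min = max(band_ratios), min(band_ratios)
    if u_min <= 0.0:
        # Guard added for this sketch only (not in the source).
        return u_max > 0.0
    return u_max / u_min > t4
```

Only frames that fall between T3 and T2 incur the cost of band division in this arrangement, which matches the conditional wording of the claims.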

(C-2) Operation of Third Embodiment
Next, the operation of the sound collection device 100B of the third embodiment configured as described above (the determination method and the sound collection method according to the embodiment) will be described, focusing on the differences from the first embodiment.

The sound collection device 100B differs from the first embodiment in the target area sound determination process performed by the area sound determination unit 9B. The following describes this determination process, centering on the area sound determination unit 9B.

The inter-band power spectrum ratio calculation unit 11 obtains, according to equation (13), the minimum average power spectrum ratio U_min from the average power spectrum ratios for the respective divided bands obtained by the band average power spectrum ratio calculation unit 8.

Then, the inter-band power spectrum ratio calculation unit 11 calculates, according to equation (14), the inter-band power spectrum ratio V from the maximum average power spectrum ratio U_max and the minimum average power spectrum ratio U_min.
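Equations (13) and (14) are not reproduced in this text. From the definitions given in the description, they can plausibly be written as below. The symbol N for the number of divided bands and the direction of the ratio in (14) are assumptions; the direction is chosen so that a larger V corresponds to the "target area sound present" decision (V > T4).

```latex
U_{\min} = \min_{1 \le i \le N} R_i \tag{13}
V = \frac{U_{\max}}{U_{\min}} \tag{14}
```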

For example, when the average power spectrum ratios (R_1 to R_6) for the respective divided bands are as shown in FIG. 3, the maximum average power spectrum ratio U_max is the value of divided band B_6, and the minimum average power spectrum ratio U_min is the value of divided band B_3.

The area sound determination unit 9B compares the inter-band power spectrum ratio V with a threshold T4. When V is larger than T4 (V > T4), it determines that the target area sound is present; when V is equal to or less than T4 (V ≤ T4), it determines that the target area sound is not present.
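The determination just described can be sketched as follows. This is an illustrative sketch only: it assumes V = U_max / U_min, the function names are hypothetical, the sample values are invented (chosen so that the maximum falls in B_6 and the minimum in B_3, as in the FIG. 3 example), and the guard for a non-positive U_min is an addition not found in the source.

```python
from typing import Sequence


def inter_band_ratio(band_ratios: Sequence[float]) -> float:
    """Compute V from the per-band average power spectrum ratios R_i."""
    if not band_ratios:
        raise ValueError("at least one divided band is required")
    u_max = max(band_ratios)  # maximum average power spectrum ratio U_max
    u_min = min(band_ratios)  # minimum average power spectrum ratio U_min
    if u_min <= 0.0:
        # Guard added for this sketch only: avoid division by zero.
        return float("inf") if u_max > 0.0 else 0.0
    return u_max / u_min


def target_area_sound_present(band_ratios: Sequence[float], t4: float) -> bool:
    """Present when V > T4, absent when V <= T4."""
    return inter_band_ratio(band_ratios) > t4


# Hypothetical values for R_1..R_6: maximum in B_6, minimum in B_3.
example = [0.25, 0.25, 0.125, 0.25, 0.375, 0.5]
```

With these values V is 0.5 / 0.125 = 4.0, so the sound is judged present for T4 = 3.0 and absent for T4 = 4.0.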

(C-3) Effects of Third Embodiment
According to the third embodiment, the following effect can be obtained in addition to those of the first embodiment.

In the sound collection device 100B of the third embodiment, the target area sound is detected based on the inter-band power spectrum ratio V, so target area sound components with smaller power spectra can also be detected.

(D) Other Embodiments
The present invention is not limited to the above embodiments, and modified embodiments such as those exemplified below are also possible.

(D-1) In each of the above embodiments, the area sound determination unit 9 (9A, 9B) may support a hangover function: when the maximum average power spectrum ratio U_max exceeds the threshold T1 by a certain amount or more, the unit determines that the target area sound is present for the following several seconds, regardless of the value of U_max during that period.
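A hangover function of this kind could be sketched as below. The class name, the `margin` parameter (the "certain amount" above T1), and the hold length counted in frames rather than seconds are all assumptions made for illustration. The sketch also assumes that the ordinary decision is U_max > T1.

```python
class HangoverDetector:
    """Frame-by-frame target area sound decision with hangover (sketch)."""

    def __init__(self, t1: float, margin: float, hold_frames: int) -> None:
        self.t1 = t1                    # threshold T1
        self.margin = margin            # hypothetical "certain amount" above T1
        self.hold_frames = hold_frames  # hold time expressed in frames (assumed)
        self.remaining = 0              # frames left in the current hangover

    def update(self, u_max: float) -> bool:
        """Return True if target area sound is judged present for this frame."""
        if u_max - self.t1 >= self.margin:
            # Strong detection: (re)start the hangover period.
            self.remaining = self.hold_frames
            return True
        if self.remaining > 0:
            # Inside the hangover period: judge present regardless of U_max.
            self.remaining -= 1
            return True
        # Ordinary decision, assumed here to be U_max > T1.
        return u_max > self.t1
```

Holding the decision for a short period keeps weak trailing portions of an utterance from being cut off when U_max briefly drops below the threshold.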

(D-2) The sound collection device (determination device) of each of the above embodiments calculates the average power spectrum ratio for each divided band and uses the maximum of these values, the maximum average power spectrum ratio U_max, for the target area sound determination. Instead of the average of the power spectrum ratios in each divided band (the average power spectrum ratio), one representative value of the power spectrum ratios in each divided band may be obtained, and the maximum of these representative values (hereinafter, a representative value is referred to as a "representative power spectrum ratio" and their maximum as the "maximum representative power spectrum ratio") may be used in place of the maximum average power spectrum ratio U_max.

That is, in each of the above embodiments, the band average power spectrum ratio calculation unit 8 may obtain a representative power spectrum ratio from each divided band, take the maximum of the representative power spectrum ratios of the divided bands as the maximum representative power spectrum ratio, and use it in place of the maximum average power spectrum ratio U_max for the target area sound determination. How the representative power spectrum ratio (representative value) is chosen within each divided band is not limited; for example, the median may be used.

As described above, in each of the above embodiments, the target area sound determination uses the maximum value (for example, the maximum average power spectrum ratio U_max or the maximum representative power spectrum ratio) of the power spectrum ratios for the respective divided bands (for example, the average power spectrum ratios or the representative power spectrum ratios).
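The representative-value variant in (D-2) can be illustrated with the median, which the text names as one possible choice. This is a sketch under assumptions: each divided band is represented as a list of per-frequency-bin power spectrum ratios, and the function names and sample values are hypothetical.

```python
from statistics import mean, median
from typing import Sequence


def max_average_ratio(bands: Sequence[Sequence[float]]) -> float:
    """U_max: maximum over divided bands of the mean power spectrum ratio."""
    return max(mean(ratios) for ratios in bands)


def max_representative_ratio(bands: Sequence[Sequence[float]]) -> float:
    """Maximum representative power spectrum ratio, using the median per band."""
    return max(median(ratios) for ratios in bands)


# Hypothetical per-bin ratios for two divided bands.
bands = [
    [0.1, 0.1, 1.0],  # one outlying bin raises the mean but not the median
    [0.3, 0.3, 0.3],
]
```

For these values the median-based maximum is 0.3 (taken from the second band), whereas the mean-based maximum is dominated by the single outlying bin in the first band.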

100 ... sound collection device (determination device), 1 ... signal input unit, 2 ... directivity forming unit, 3 ... delay correction unit, 4 ... spatial coordinate data, 5 ... target area sound power correction coefficient calculation unit, 6 ... target area sound extraction unit, 7 ... frequency band division unit, 8 ... band average power spectrum ratio calculation unit, 9 ... area sound determination unit, MA, MA1, MA2 ... microphone array, M, M1, M2 ... microphone.

Claims

A sound collection device comprising:
directivity forming means for forming, by a beamformer, directivity toward a target area direction with respect to an input signal;
non-target area sound extraction means for extracting non-target area sound present in the target area direction, based on the directivity formed by the directivity forming means;
target area sound extraction means for obtaining an extracted sound by extracting a target area sound from the output of the beamformer, using the non-target area sound present in the target area direction extracted by the non-target area sound extraction means;
band dividing means for dividing each of the input signal and the extracted sound into a plurality of bands;
power spectrum ratio calculating means for calculating a power spectrum ratio of the input signal and the extracted sound for each divided band divided by the band dividing means;
determination means for determining whether a target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculating means;
output means for outputting the extracted sound as a sound collection result when the determination means determines that the target area sound is present; and
all-band average power spectrum ratio calculating means for calculating an average power spectrum ratio over all bands of the input signal and the extracted sound, wherein
the determination means performs a first determination process of determining whether the target area sound is present in the input signal, based on the average power spectrum ratio over all bands calculated by the all-band average power spectrum ratio calculating means,
the band dividing means divides each of the input signal and the extracted sound into a plurality of bands when it cannot be determined in the first determination process whether the target area sound is present in the input signal,
the power spectrum ratio calculating means calculates the power spectrum ratio of the input signal and the extracted sound for each of the bands divided by the band dividing means when it cannot be determined in the first determination process whether the target area sound is present in the input signal, and
the determination means performs, when it cannot be determined in the first determination process whether the target area sound is present in the input signal, a second determination process of determining whether the target area sound is present in the input signal, using the power spectrum ratio calculated by the power spectrum ratio calculating means.

The sound collection device according to claim 1, wherein, in the first determination process, the determination means determines that the target area sound is present in the input signal when the average power spectrum ratio over all bands calculated by the all-band average power spectrum ratio calculating means exceeds a second threshold, determines that the target area sound is not present when that average power spectrum ratio is equal to or less than a third threshold smaller than the second threshold, and obtains a result that it cannot be determined whether the target area sound is present in the input signal when that average power spectrum ratio exceeds the third threshold and is equal to or less than the second threshold.

The sound collection device according to claim 1 or 2, wherein, in the second determination process, the determination means determines whether the target area sound is present in the input signal according to the result of comparing the maximum value of the power spectrum ratios for the respective divided bands calculated by the power spectrum ratio calculating means with a first threshold.

A sound collection program causing a computer to function as:
directivity forming means for forming, by a beamformer, directivity toward a target area direction with respect to an input signal;
non-target area sound extraction means for extracting non-target area sound present in the target area direction, based on the directivity formed by the directivity forming means;
target area sound extraction means for obtaining an extracted sound by extracting a target area sound from the output of the beamformer, using the non-target area sound present in the target area direction extracted by the non-target area sound extraction means;
band dividing means for dividing each of the input signal and the extracted sound into a plurality of bands;
power spectrum ratio calculating means for calculating a power spectrum ratio of the input signal and the extracted sound for each divided band divided by the band dividing means;
determination means for determining whether a target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculating means;
output means for outputting the extracted sound as a sound collection result when the determination means determines that the target area sound is present; and
all-band average power spectrum ratio calculating means for calculating an average power spectrum ratio over all bands of the input signal and the extracted sound, wherein
the determination means performs a first determination process of determining whether the target area sound is present in the input signal, based on the average power spectrum ratio over all bands calculated by the all-band average power spectrum ratio calculating means,
the band dividing means divides each of the input signal and the extracted sound into a plurality of bands when it cannot be determined in the first determination process whether the target area sound is present in the input signal,
the power spectrum ratio calculating means calculates the power spectrum ratio of the input signal and the extracted sound for each of the bands divided by the band dividing means when it cannot be determined in the first determination process whether the target area sound is present in the input signal, and
the determination means performs, when it cannot be determined in the first determination process whether the target area sound is present in the input signal, a second determination process of determining whether the target area sound is present in the input signal, using the power spectrum ratio calculated by the power spectrum ratio calculating means.

A sound collection method using directivity forming means, non-target area sound extraction means, target area sound extraction means, band dividing means, power spectrum ratio calculating means, determination means, output means, and all-band average power spectrum ratio calculating means, wherein
the directivity forming means forms, by a beamformer, directivity toward a target area direction with respect to an input signal,
the non-target area sound extraction means extracts non-target area sound present in the target area direction, based on the directivity formed by the directivity forming means,
the target area sound extraction means obtains an extracted sound by extracting a target area sound from the output of the beamformer, using the non-target area sound present in the target area direction extracted by the non-target area sound extraction means,
the band dividing means divides each of the input signal and the extracted sound into a plurality of bands,
the power spectrum ratio calculating means calculates a power spectrum ratio of the input signal and the extracted sound for each divided band divided by the band dividing means,
the determination means determines whether a target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculating means,
the output means outputs the extracted sound as a sound collection result when the determination means determines that the target area sound is present,
the all-band average power spectrum ratio calculating means calculates an average power spectrum ratio over all bands of the input signal and the extracted sound,
the determination means performs a first determination process of determining whether the target area sound is present in the input signal, based on the average power spectrum ratio over all bands calculated by the all-band average power spectrum ratio calculating means,
the band dividing means divides each of the input signal and the extracted sound into a plurality of bands when it cannot be determined in the first determination process whether the target area sound is present in the input signal,
the power spectrum ratio calculating means calculates the power spectrum ratio of the input signal and the extracted sound for each of the bands divided by the band dividing means when it cannot be determined in the first determination process whether the target area sound is present in the input signal, and
the determination means performs, when it cannot be determined in the first determination process whether the target area sound is present in the input signal, a second determination process of determining whether the target area sound is present in the input signal, using the power spectrum ratio calculated by the power spectrum ratio calculating means.