CN105403860B

CN105403860B - A kind of how sparse sound localization method related based on domination

Info

Publication number: CN105403860B
Application number: CN201410451825.5A
Authority: CN
Inventors: 应冬文; 李军锋; 冯永强; 潘接林; 颜永红
Original assignee: Institute of Acoustics CAS
Current assignee: Institute of Acoustics CAS
Priority date: 2014-08-19
Filing date: 2014-08-19
Publication date: 2017-10-31
Anticipated expiration: 2034-08-19
Also published as: CN105403860A

Abstract

The present invention relates to a multi-sparse sound source localization method based on dominance correlation, comprising: converting the sound source signal received through a microphone array into a digital sound signal; extracting the frequency spectrum of the digital sound signal of each microphone; Calculate the spatial correlation matrix on each frequency point from the spectrum of the digital sound signals of all microphones on the point; extract the main eigenvector of the spatial correlation matrix; determine the time delay set of all microphone pairs on each frequency point; use an iterative method to calculate The azimuth angle of the incident direction of the dominant sound source at each frequency point; the azimuth angle of the incident direction of the dominant sound source at all frequency points is statistically analyzed to determine the final incident direction of the sound source and the number of sound sources. The method considers acoustic robustness and is suitable for real-time localization of multiple sparse sound sources.

Description

A multi-sparse sound source localization method based on dominance correlation

技术领域technical field

本发明涉及声源定位领域，尤其涉及一种基于支配相关的多稀疏声源定位方法。The invention relates to the field of sound source localization, in particular to a multi-sparse sound source localization method based on dominance correlation.

背景技术Background technique

声源定位包括单声源定位和多声源定位，声源定位技术可以指示声源目标所在的空间方位，为后续的信息采集与处理提供重要的空间信息。Sound source localization includes single sound source localization and multiple sound source localization. The sound source localization technology can indicate the spatial orientation of the sound source target and provide important spatial information for subsequent information collection and processing.

多声源定位技术在麦克风阵列的应用中占据重要位置，它可用于远程会议，为麦克风阵列指示波束聚焦的方向，为会议摄像头提供指向信息。Multi-sound source localization technology occupies an important position in the application of microphone arrays. It can be used in remote conferences to indicate the direction of beam focus for microphone arrays and provide pointing information for conference cameras.

在声源定位领域中，基于多源信号分类和空间波束扫描的声源定位算法是应用最为广泛的算法，它以噪声和混响条件下的鲁棒性而著称。然而，这些算法存在一个共同的特点：它们的代价函数非凸非凹，导致存在多个极值对应于声源位置的最优估计。因此，这类方法普遍采用格点遍历的方式寻求声源位置的全局最优解，导致计算量急剧上升，很难满足实时定位的需求。In the field of sound source localization, the sound source localization algorithm based on multi-source signal classification and spatial beam scanning is the most widely used algorithm, and it is famous for its robustness under noise and reverberation conditions. However, these algorithms have a common feature: their cost functions are non-convex and non-concave, resulting in multiple extrema corresponding to the optimal estimation of the sound source position. Therefore, such methods generally use grid traversal to find the global optimal solution of the sound source location, resulting in a sharp increase in the amount of calculations, and it is difficult to meet the needs of real-time positioning.

基于时频域稀疏性的多声源定位方法克服了上述难题，此方法假定每个频点上只有一个声源占据支配地位，其它声源的干扰可以忽略不计。这一假设将一个多声源定位的问题简化为单个频点上的单声源定位问题，虽然大幅度降低了计算量，但是缺乏声学鲁棒性导致这类方法在复杂声学环境下难以表现出优良的性能。The multi-sound source localization method based on sparsity in the time-frequency domain overcomes the above-mentioned difficulties. This method assumes that only one sound source occupies a dominant position at each frequency point, and the interference of other sound sources can be ignored. This assumption simplifies a multi-sound source localization problem to a single sound source localization problem at a single frequency point. Although the amount of calculation is greatly reduced, the lack of acoustic robustness makes it difficult for such methods to perform well in complex acoustic environments. excellent performance.

发明内容Contents of the invention

本发明的目的在于克服现有多声源定位方法中存在的计算量大，缺乏声学鲁棒性等缺陷，利用稀疏声源在时频域的支配相关性，从而提供一种鲁棒而高效的多声源定位方法。The purpose of the present invention is to overcome the defects of large amount of calculation and lack of acoustic robustness existing in the existing multi-sound source localization method, and utilize the dominant correlation of sparse sound sources in the time-frequency domain to provide a robust and efficient Multi-source localization method.

为了实现上述目的，本发明提出一种基于支配相关的多稀疏声源定位方法，包括：In order to achieve the above object, the present invention proposes a multi-sparse sound source localization method based on dominance correlation, including:

步骤101)，将通过麦克风阵列接收的声源信号转换成数字声音信号，所述麦克风阵列包括K个麦克风；Step 101), converting the sound source signal received by the microphone array into a digital sound signal, and the microphone array includes K microphones;

步骤102)，提取每个麦克风的数字声音信号的频谱；Step 102), extract the frequency spectrum of the digital sound signal of each microphone;

步骤103)，利用t时刻相邻时间相同频点上所有麦克风的数字声音信号的频谱计算每个频点上的空间相关矩阵；Step 103), utilize the frequency spectrum of the digital sound signal of all microphones to calculate the spatial correlation matrix on each frequency point on the same frequency point of adjacent time at t moment;

步骤104)，对t时刻每个频点上的空间相关矩阵进行分解，得到t时刻每个频点上的主特征向量，所述主特征向量的每一个分量对应一个麦克风的采集信号；Step 104), decompose the spatial correlation matrix on each frequency point at time t to obtain the main eigenvector at each frequency point at time t, and each component of the main eigenvector corresponds to the acquisition signal of a microphone;

步骤105)，利用t时刻每个频点上的主特征向量求取t时刻每个频点上M对麦克风的时间延迟集合，所述M等于K(K-1)2；Step 105), using the principal eigenvector on each frequency point at t time to obtain the time delay set of M pairs of microphones at each frequency point at t time, said M equals K(K-1) 2;

步骤106)，根据t时刻每个频点上M对麦克风的时间延迟集合，采用迭代的方法，计算t时刻每个频点上处于支配地位的声源入射方向的方位角；Step 106), according to the time delay set of M pairs of microphones on each frequency point at t time, adopt the iterative method to calculate the azimuth angle of the sound source incident direction that is in a dominant position at each frequency point at t time;

步骤107)，对t时刻所有频点的处于支配地位的声源入射方向的方位角进行统计分析；Step 107), statistically analyzing the azimuth angles of the dominant sound source incident directions of all frequency points at time t;

步骤108)，输出最终确定的t时刻声源入射方向和声源个数。Step 108), outputting the finally determined incident direction of the sound source and the number of sound sources at time t.

上述技术方案中，在步骤105)中，所述利用t时刻每个频点上的主特征向量求取t时刻每个频点上M对麦克风的时间延迟集合包括：In the above-mentioned technical solution, in step 105), the time delay collection of M pairs of microphones on each frequency point at the time t is obtained by using the main eigenvector at each frequency point at the time t time includes:

在t时刻第f频点上，主特征向量表示为：[u_1,t,f,u_2,t,f,…u_K,t,f]，由第p个和第q个麦克风组成的第m(m＝1,2,...,M)对麦克风的时间相位差为：At the f-th frequency point at time t, the main eigenvector is expressed as: [u _1,t,f ,u _2,t,f ,…u _K,t,f ], which is composed of the p-th and q-th microphones The time phase difference of the mth (m=1,2,...,M) pair of microphones for:

其中∠(.)表示求取复数相位的操作，表示角频率；Among them, ∠(.) represents the operation of obtaining the complex phase, Indicates the angular frequency;

在t时刻第f频点上，根据第m对麦克风的距离r_m约束，得到时间相位混叠集合L_m,t,f：At the f-th frequency point at time t, according to the distance r _m constraint of the m-th pair of microphones, the time-phase aliasing set L _{m,t,f is} obtained:

其中，c为声速；Among them, c is the speed of sound;

在t时刻第f频点上，第m对麦克风的时间延迟集合为：At the fth frequency point at time t, the time delay set of the mth pair of microphones is:

其中，，表示频率上的时间周期。in, ,express time period in frequency.

上述技术方案中，所述步骤106)进一步包括：In the above technical solution, the step 106) further includes:

步骤106-1)，选取初始的声源入射方向 Step 106-1), select the initial sound source incident direction

步骤106-2)，从每个时间延迟集合中选取一个时间延迟值；Step 106-2), selecting a time delay value from each time delay set;

令从每个时间延迟集合B_m,t,f中选取一个时间延迟值τ_m,t,f，满足：make Select a time delay value τ _m,t,f from each time delay set B _m,t ,f, satisfying:

其中，g_m＝(g_mx,g_my,g_mz)表示第m对麦克风连线的方向单位矢量；Among them, g _m = (g _mx , g _my , g _mz ) represents the directional unit vector of the m-th pair of microphones;

步骤106-3)，求取新的权重系数w_m,t,f；Step 106-3), finding new weight coefficients w _m,t,f ;

其中：in:

δ_m,t,f＝arccos(p^Tg_m)-arccos(cτ_m,t,f/r_m)，m＝1,2,...,Mδ _m,t,f = arccos(p ^T g _m )-arccos(cτ _m,t,f /r _m ), m=1,2,...,M

步骤106-4)，计算新的声源入射方向 Step 106-4), calculate the new sound source incident direction

其中： in:

步骤106-5)，判断是否收敛；如果判断结果是肯定的，转入步骤106-6)；否则，转入步骤106-2)继续执行；Step 106-5), judging Convergence; if the judgment result is affirmative, proceed to step 106-6); otherwise, proceed to step 106-2) to continue execution;

步骤106-6)，计算处于支配地位的声源入射方向的方位角。Step 106-6), calculate the dominant sound source incident direction azimuth angle.

上述技术方案中，在步骤106-1)中，所述选取初始的声源入射方向包括：In the above technical solution, in step 106-1), the selection of the initial sound source incidence direction include:

在方位角360°×仰角90°的格点空间上，均匀选取若干个声源入射方向作为候选值，标记为向量集合{ψ₁,ψ₂,…ψ_H,H＞8且为整数}；对于每个候选的声源入射方向ψ_h,h＝1,2,…H，计算它到所有时间延迟集合的距离和，表示为：On the grid point space with an azimuth angle of 360°×elevation angle of 90°, uniformly select several incident directions of sound sources as candidate values, and mark them as a vector set {ψ ₁ ,ψ ₂ ,…ψ _H , H>8 and an integer}; For each candidate sound source incidence direction ψ _h , h=1, 2,...H, calculate the sum of its distances to all time delay sets, expressed as:

其中，％表示浮点余数操作；Among them, % represents the floating-point remainder operation;

在t时刻第f频点上，初始的声源入射方向满足：At the fth frequency point at time t, the initial sound source incident direction Satisfy:

上述技术方案中，在步骤107)中，所述统计分析包括：直方图分析，聚类分析。In the above technical solution, in step 107), the statistical analysis includes: histogram analysis and cluster analysis.

本发明的优点在于：The advantages of the present invention are:

1、通过相邻时间频点间的支配相似性，充分挖掘相邻时间频点上的信息，实现可靠的空间位置估计；1. Through the dominant similarity between adjacent time and frequency points, fully mine the information on adjacent time and frequency points to achieve reliable spatial position estimation;

2、提出了一种基于时频块的空间相关矩阵的配对时间延迟抽取方法，该方法利用了信号增强和时间延迟的权重系数衡量可靠性，从而实现了鲁棒的多稀疏声源定位方法。2. A paired time-delay extraction method based on the spatial correlation matrix of time-frequency blocks is proposed. This method uses the weight coefficients of signal enhancement and time delay to measure reliability, thereby realizing a robust multi-sparse sound source localization method.

附图说明Description of drawings

图1是本发明的一种基于支配地位的多稀疏声源定位方法的流程图；Fig. 1 is a flow chart of a kind of multi-sparse sound source localization method based on dominance of the present invention;

图2是本发明的计算每个频点上处于支配地位的声源入射方向方位角的方法流程图；Fig. 2 is the flow chart of the method for calculating the dominant sound source incident direction azimuth on each frequency point of the present invention;

图3是本发明的对所有频点上处于支配地位的声源入射方向方位角进行直方图分析的流程图；Fig. 3 is the flow chart of the present invention carrying out histogram analysis to the dominant sound source incident direction azimuth angle on all frequency points;

图4是本发明的方位角直方图分析的示意图。Fig. 4 is a schematic diagram of the azimuth histogram analysis of the present invention.

具体实施方式detailed description

下面通过附图和实施例，对本发明的技术方案做进一步的详细描述。The technical solutions of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments.

参考图1，本发明的方法包括以下步骤：With reference to Fig. 1, method of the present invention comprises the following steps:

步骤101)，将通过麦克风阵列接收的声源信号转换成数字声音信号；Step 101), converting the sound source signal received by the microphone array into a digital sound signal;

所述麦克风阵列包括K个麦克风；The microphone array includes K microphones;

步骤102)，对数字声音信号进行预处理，通过快速傅里叶变换(FFT)提取每个麦克风的数字声音信号的频谱；Step 102), the digital sound signal is preprocessed, and the frequency spectrum of the digital sound signal of each microphone is extracted by Fast Fourier Transform (FFT);

所述对数字声音信号进行预处理包括：对每一帧的数字声音信号先补零到N点，N＝2ⁱ，i为整数，且i≥8；然后，对每一帧的数字声音信号进行加窗或预加重处理，加窗函数采用汉明窗(hamming)或哈宁窗(hanning)；The preprocessing of the digital sound signal includes: first padding the digital sound signal of each frame to N points, N=2 ⁱ , i is an integer, and i≥8; then, for the digital sound signal of each frame Perform windowing or pre-emphasis processing, and the windowing function uses Hamming window (hamming) or Hanning window (hanning);

对t时刻的数字声音信号进行快速傅里叶变换，得到t时刻的数字声音信号的离散频谱为Fast Fourier transform is performed on the digital sound signal at time t, and the discrete spectrum of the digital sound signal at time t is obtained as

其中，y_k,t,n表示t时刻第k个麦克风采集信号的第n个采样点，Y_k,t,f(k＝1,2…K,f＝0,1,…N-1)表示t时刻第k个麦克风采集信号的第f个频点的傅里叶变换系数。Among them, y _{k, t, n} represent the nth sampling point of the signal collected by the kth microphone at time t, Y _{k, t, f} (k=1,2...K, f=0,1,...N-1) Indicates the Fourier transform coefficient of the fth frequency point of the signal collected by the kth microphone at time t.

步骤103)，利用相邻时间相同频点上所有麦克风的数字声音信号的频谱计算t时刻每个频点的空间相关矩阵；Step 103), utilize the frequency spectrum of the digital sound signal of all microphones on the same frequency point in adjacent time to calculate the spatial correlation matrix of each frequency point at t moment;

设x_t,f为t时刻，第f个频点上生成的一个复数向量：x_t,f＝{Y_1,t,f,Y_2,t,f…Y_K,t,f},其自相关矩阵为：其中：()^H表示共轭转置；Let x _t,f be a complex vector generated at the fth frequency point at time t: x _t,f ={Y _1,t,f ,Y _2,t,f ...Y _K,t,f }, where The autocorrelation matrix is: Where: () ^H represents the conjugate transpose;

复数自相关矩阵R_t,f表示为以f频点为中心的相邻时间频点上自相关矩阵的均值：The complex autocorrelation matrix R _{t, f} is expressed as the mean value of the autocorrelation matrix on the adjacent time frequency points centered on the f frequency point:

其中，A表示与t时刻相邻时间的帧数；Among them, A represents the number of frames adjacent to time t;

x_t,f的空间相关矩阵为R_t,f。The spatial correlation matrix of x _t,f is R _t,f .

步骤104)，对t时刻每个频点上的空间相关矩阵进行分解，得到t时刻每个频点上的主特征向量；向量的每一个分量对应一个麦克风的采集信号；Step 104), decompose the spatial correlation matrix on each frequency point at time t to obtain the main eigenvector at each frequency point at time t; each component of the vector corresponds to the acquisition signal of a microphone;

步骤105)，利用t时刻每个频点上的主特征向量求取t时刻每个频点上M对麦克风的时间延迟集合，所述M等于K(K-1)2；具体过程为：Step 105), use the principal eigenvector on each frequency point at t time to obtain the time delay set of M pairs of microphones at each frequency point at t time, and the M is equal to K(K-1) 2; the specific process is:

其中，c为声速；Among them, c is the speed of sound;

参考图2，具体实施步骤如下：Referring to Figure 2, the specific implementation steps are as follows:

其中，g_m＝(g_mx,g_my,g_mz)表示第m对麦克风连线的方向单位矢量，％表示浮点余数操作。Wherein, g _m =(g _mx , g _my , g _mz ) represents the directional unit vector of the m-th microphone pair, and % represents the floating-point remainder operation.

其中：in:

其中： in:

在本实施例中，判断是否收敛的方法为：In this example, judging The method of convergence is:

判断是否小于门限值ε，其中取ε＝0.01。judge Whether it is smaller than the threshold ε, where ε=0.01.

步骤106-6)，计算处于支配地位的声源入射方向的方位角；计算方法为：Step 106-6), calculate the dominant sound source incident direction The azimuth angle of ; the calculation method is:

步骤107)对t时刻所有频点的处于支配地位的声源入射方向的方位角进行统计分析；Step 107) Statistically analyze the azimuth angles of the dominant sound source incident directions of all frequency points at time t;

所述统计分析包括：直方图分析和聚类分析。The statistical analysis includes: histogram analysis and cluster analysis.

在本实施例中，对t时刻所有频点的处于支配地位的声源入射方向的方位角进行直方图分析，参考图3，具体步骤如下：In this embodiment, the histogram analysis is performed on the azimuth angles of the dominant sound source incident directions of all frequency points at time t, referring to Fig. 3, the specific steps are as follows:

步骤107-1)，构建并平滑方位角直方图；Step 107-1), constructing and smoothing the azimuth histogram;

将360度的方位角划分为360等份，在这些等份上构建直方图，并采用窗函数平滑直方图。Divide the azimuth angle of 360 degrees into 360 equal parts, construct a histogram on these equal parts, and use a window function to smooth the histogram.

步骤107-2)，计算直方图门限值；Step 107-2), calculating the histogram threshold value;

直方图门限值＝平均直方值+(最大直方值-平均直方值)×经验系数；Histogram threshold value = average histogram value + (maximum histogram value - average histogram value) × empirical coefficient;

其中经验系数取值范围在0到1之间；在本实施例中，经验系数取值为0.3。The value range of the empirical coefficient is between 0 and 1; in this embodiment, the value of the empirical coefficient is 0.3.

步骤107-3)，确定最终的方位角和方位角个数；Step 107-3), determine the final azimuth and the number of azimuths;

参考图4，选取直方图中峰值大于直方图门限值的峰作为最终的方位角，峰值的个数为方位角的个数。Referring to FIG. 4 , the peak in the histogram whose peak value is greater than the histogram threshold is selected as the final azimuth, and the number of peaks is the number of azimuths.

步骤107-4)，确定最终的声源入射方向和声源个数。Step 107-4), determining the final sound source incident direction and the number of sound sources.

根据选取的方位角获取此方位角对应的声源入射方向，方位角的个数为声源入射方向的个数。According to the selected azimuth, the incident direction of the sound source corresponding to the azimuth is obtained, and the number of azimuths is the number of incident directions of the sound source.

在其它实施例中，可以对t时刻所有频点的处于支配地位的声源入射方向的方位角进行聚类分析，具体的处理方法属于公知常识，在此不做赘述。In other embodiments, cluster analysis may be performed on the azimuth angles of the dominant sound source incidence directions at all frequency points at time t. The specific processing method belongs to common knowledge and will not be repeated here.

Claims

1. A multi-sparse sound source localization method based on dominance correlation, comprising:

Step 101), converting the sound source signal received by the microphone array into a digital sound signal, and the microphone array includes K microphones;

Step 102), extract the frequency spectrum of the digital sound signal of each microphone;

Step 103), utilize the frequency spectrum of the digital sound signal of all microphones to calculate the spatial correlation matrix on each frequency point on the same frequency point of adjacent time at t moment;

Step 104), decompose the spatial correlation matrix on each frequency point at time t to obtain the main eigenvector at each frequency point at time t, and each component of the main eigenvector corresponds to the acquisition signal of a microphone;

Step 105), using the principal eigenvector on each frequency point at t time to obtain the time delay set of M pairs of microphones at each frequency point at t time, said M equals K(K-1)/2;

Step 106), according to the time delay set of M pairs of microphones at each frequency point at t time, adopt an iterative method to calculate the azimuth angle of the sound source incident direction that is dominant at each frequency point at t time;

Step 107), statistically analyzing the azimuth angles of the dominant sound source incident directions of all frequency points at time t;

Step 108), outputting the finally determined incident direction of the sound source and the number of sound sources at time t.

2. The multi-sparse sound source localization method based on dominance correlation according to claim 1, characterized in that, in step 105), the use of the principal eigenvector at each frequency point at time t to obtain each The time delay set of M pairs of microphones at the frequency point includes:

At the f-th frequency point at time t, the main eigenvector is expressed as: [u _1,t,f ,u _2,t,f ,…u _K,t,f ], which is composed of the p-th and q-th microphones The time phase difference of the mth (m=1,2,...,M) pair of microphones for:

Among them, ∠(.) represents the operation of obtaining the complex phase, Indicates the angular frequency;

At the f-th frequency point at time t, according to the distance r _m constraint of the m-th pair of microphones, the time-phase aliasing set L _{m,t,f is} obtained:

Among them, c is the speed of sound;

At the fth frequency point at time t, the time delay set of the mth pair of microphones is:

<mrow><msub><mi>B</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><mo>=</mo><mo>{</mo><msub><mover><mi>&tau;</mi><mo>^</mo></mover><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><mo>=</mo><msub><mover><mi>&tau;</mi><mo>&OverBar;</mo></mover><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><mo>+</mo><msub><mi>l</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><msub><mi>T</mi><mi>f</mi></msub><mo>|</mo><msub><mi>l</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><mo>&Element;</mo><msub><mi>L</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><mo>}</mo><mo>,</mo><mi>m</mi><mo>=</mo><mn>1</mn><mo>,</mo><mn>2</mn><mo>,</mo>mo><mo>...</mo><mo>,</mo><mi>M</mi></mrow>

in, express time period in frequency.

3. The multi-sparse sound source localization method based on dominant correlation according to claim 2, characterized in that, step 106) further comprises:

Step 106-1), select the initial sound source incident direction make

Step 106-2), selecting a time delay value from each time delay set;

make Select a time delay value τ _m,t,f from each time delay set B _m,t ,f, satisfying:

<mrow><msub><mi>&tau;</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><mo>=</mo><mi>arg</mi><munder><mi>min</mi><mrow><mi>&tau;</mi><mo>&Element;</mo><msub><mi>B</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub></mrow></munder><mrow><mo>|</mo><mrow><msubsup><mi>pg</mi><mi>m</mi><mi>T</mi></msubsup><msub><mi>r</mi><mi>m</mi></msub><mo>-</mo><mi>c</mi><mi>&tau;</mi></mrow><mo>|</mo></mrow><mo>,</mo><mi>m</mi><mo>=</mo><mn>1</mn><mo>,</mo><mn>2</mn><mo>,</mo><mo>...</mo><mo>,</mo><mi>M</mi><mo>;</mo></mrow> 1

Among them, g _m = (g _mx , g _my , g _mz ) represents the directional unit vector of the m-th pair of microphones;

Step 106-3), finding new weight coefficients w _m,t,f ;

<mrow><msub><mi>w</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><mo>=</mo><mi>exp</mi><mrow><mo>(</mo><msubsup><mi>&delta;</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow><mn>2</mn></msubsup><mo>/</mo><msup><mi>&sigma;</mi><mn>2</mn></msup><mo>)</mo></mrow><mo>/</mo><munderover><mo>&Sigma;</mo><mrow><mi>m</mi><mo>=</mo><mn>1</mn></mrow><mi>M</mi></munderover><mi>exp</mi><mrow><mo>(</mo><msubsup><mi>&delta;</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow><mn>2</mn></msubsup><mo>/</mo><msup><mi>&sigma;</mi><mn>2</mn></msup><mo>)</mo></mrow><mo>,</mo><mi>m</mi><mo>=</mo><mn>1</mn><mo>,</mo><mn>2</mn><mo>,</mo><mo>...</mo><mo>,</mo><mi>M</mi></mrow>

in:

δ _m,t,f = arccos(p ^T g _m )-arccos(cτ _m,t,f /r _m ), m=1,2,...,M

<mrow><msup><mi>&sigma;</mi><mn>2</mn></msup><mo>=</mo><munderover><mo>&Sigma;</mo><mrow><mi>m</mi><mo>=</mo><mn>1</mn></mrow><mi>M</mi></munderover><msubsup><mi>&delta;</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow><mn>2</mn></msubsup><mo>/</mo><mi>M</mi></mrow>

Step 106-4), calculate the new sound source incident direction

<mrow><mfenced open = "(" close = ")"><mtable><mtr><mtd><msub><mover><mi>&gamma;</mi><mo>^</mo></mover><mrow><mn>1</mn><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub></mtd></mtr><mtr><mtd><msub><mover><mi>&gamma;</mi><mo>^</mo></mover><mrow><mn>2</mn><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub></mtd></mtr></mtable></mfenced><mo>=</mo><msup><mrow><mo>&lsqb;</mo><munderover><mo>&Sigma;</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>M</mi></munderover><msub><mi>w</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><msubsup><mi>g</mi><mi>m</mi><mo>&prime;</mo></msubsup><msubsup><mi>g</mi><mi>m</mi><mrow><mo>&prime;</mo><mi>T</mi></mrow></msubsup><mo>&rsqb;</mo></mrow><mrow><mo>-</mo><mn>1</mn></mrow></msup><munderover><mo>&Sigma;</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>M</mi></munderover><msub><mi>cw</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><msub><mi>&tau;</mi><mrow><mi>m</mi><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><msubsup><mi>g</mi><mi>m</mi><mrow><mo>&prime;</mo><mi>T</mi></mrow></msubsup><mo>/</mo><msub><mi>r</mi><mi>m</mi></msub></mrow>

<mrow><msub><mover><mi>&gamma;</mi><mo>^</mo></mover><mrow><mn>3</mn><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow></msub><mo>=</mo><msqrt><mrow><mn>1</mn><mo>-</mo><msubsup><mover><mi>&gamma;</mi><mo>^</mo></mover><mrow><mn>1</mn><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow><mn>2</mn></msubsup><mo>-</mo><msubsup><mover><mi>&gamma;</mi><mo>^</mo></mover><mrow><mn>2</mn><mo>,</mo><mi>t</mi><mo>,</mo><mi>f</mi></mrow><mn>2</mn></msubsup></mrow></msqrt></mrow>

Where: g′ _m = (g _mx , g _my );

Step 106-5), judging convergence; if the judgment result is affirmative, go to step 106-6); otherwise, let Go to step 106-2) to continue execution;

Step 106-6), calculate the dominant sound source incident direction azimuth angle.

4. The multi-sparse sound source localization method based on dominant correlation according to claim 3, characterized in that, in step 106-1), the selection of the initial sound source incident direction include:

On the grid point space with an azimuth angle of 360°×elevation angle of 90°, uniformly select several incident directions of sound sources as candidate values, and mark them as a vector set {ψ ₁ ,ψ ₂ ,…ψ _H , H>8 and an integer}; For each candidate sound source incidence direction ψ _h , h=1, 2,...H, calculate the sum of its distances to all time delay sets, expressed as:

<mrow><msub><mi>b</mi><mi>f</mi></msub><mrow><mo>(</mo><msub><mi>&psi;</mi><mi>h</mi></msub><mo>)</mo></mrow><mo>=</mo><munderover><mo>&Sigma;</mo><mrow><mi>m</mi><mo>=</mo><mn>1</mn></mrow><mi>M</mi></munderover><mo>&lsqb;</mo><msubsup><mi>&psi;</mi><mi>H</mi><mi>T</mi></msubsup><msub><mi>g</mi><mi>m</mi></msub><msub><mi>r</mi><mi>m</mi></msub><mo>-</mo><mi>c</mi><mi>&tau;</mi><mo>)</mo><mi>%</mi><mrow><mo>(</mo><msub><mi>cT</mi><mi>f</mi></msub><mo>)</mo></mrow><msup><mo>&rsqb;</mo><mn>2</mn></msup></mrow>

Among them, % represents the floating-point remainder operation;

At the fth frequency point at time t, the initial sound source incident direction Satisfy:

<mrow><msub><mover><mi>&gamma;</mi><mo>^</mo></mover><mrow><mi>t</mi><mo>,</mo><mi>f</mi><mo>,</mo><mn>0</mn></mrow></msub><mo>=</mo><munder><mrow><mi>arg</mi><mi>min</mi></mrow><mrow><mn>1</mn><mo>&le;</mo><mi>h</mi><mo>&le;</mo><mi>H</mi></mrow></munder><msub><mi>b</mi><mi>f</mi></msub><mrow><mo>(</mo><msub><mi>&psi;</mi><mi>h</mi></msub><mo>)</mo></mrow><mo>.</mo></mrow>

5. The multi-sparse sound source localization method based on dominance correlation according to claim 1, characterized in that, in step 107), the statistical analysis comprises: histogram analysis and cluster analysis.