CN109313909A - Method, Device, Apparatus and System for Evaluating Microphone Array Consistency

Info

Publication number: CN109313909A
Application number: CN201880001199.6A
Authority: CN
Inventors: 李国梁; 罗朝洪; 程树青
Original assignee: Shenzhen Goodix Technology Co Ltd
Current assignee: Xi'an Xinyida Communication Technology Co ltd
Priority date: 2018-08-22
Filing date: 2018-08-22
Publication date: 2019-02-05
Anticipated expiration: 2038-08-22
Also published as: WO2020037555A1; CN109313909B; CN116437280A

Abstract

Embodiments of the present application provide a method, device, apparatus and system for evaluating the consistency of a microphone array, which can evaluate the consistency among different microphones in the microphone array, so that the consistency evaluation result can guide calibration of the microphone array and evaluation of the robustness of a multi-channel enhancement algorithm, thereby improving the user experience. The method comprises: acquiring N audio signals respectively collected by N microphones, wherein the N microphones form a microphone array and N ≥ 2; determining, according to the N audio signals, a phase spectrum difference and/or a power spectrum difference between each of the N microphones other than a reference microphone and the reference microphone, wherein the reference microphone is any one of the N microphones; and performing consistency evaluation on the N microphones according to the phase spectrum difference and/or the power spectrum difference between each microphone other than the reference microphone and the reference microphone.

Description

Method, Device, Apparatus and System for Evaluating Microphone Array Consistency

Technical Field

The present application relates to the fields of voice communication and intelligent voice interaction, and more particularly, to a method, device, apparatus and system for evaluating the consistency of a microphone array.

Background

In voice communication applications, speech enhancement technology can improve the listening experience and the intelligibility of voice communication. In intelligent voice interaction applications, speech enhancement technology can improve the accuracy of speech recognition and the user experience. Speech enhancement technology is therefore crucial in both traditional voice communication and voice interaction. Speech enhancement technology is divided into single-channel speech enhancement and multi-channel speech enhancement. Single-channel speech enhancement can eliminate stationary noise but cannot eliminate non-stationary noise, and it improves the signal-to-noise ratio at the cost of speech distortion: the greater the improvement in signal-to-noise ratio, the greater the speech distortion. Multi-channel speech enhancement uses a microphone array to collect multiple signals and uses the phase information and coherence information among the microphone signals to eliminate noise; it can eliminate non-stationary noise and causes little speech distortion.

In multi-channel speech enhancement, the consistency among different microphones in the microphone array directly affects algorithm performance. Existing schemes propose improved multi-channel enhancement algorithms that increase robustness and relax the consistency requirement on the microphones. However, when the consistency among the microphones is very low, algorithm performance is still affected, which degrades the user experience.

Summary of the Invention

The present application provides a method, device, apparatus and system for evaluating the consistency of a microphone array, which can evaluate the consistency among different microphones in the microphone array, so that the consistency evaluation result can guide calibration of the microphone array and evaluation of the robustness of a multi-channel enhancement algorithm, thereby improving the user experience.

In a first aspect, a method for evaluating the consistency of a microphone array is provided, including:

acquiring N audio signals respectively collected by N microphones, where the N microphones form a microphone array and N ≥ 2;

determining, according to the N audio signals, a phase spectrum difference and/or a power spectrum difference between each of the N microphones other than a reference microphone and the reference microphone, where the reference microphone is any one of the N microphones; and

performing consistency evaluation on the N microphones according to the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.

It should be noted that the consistency evaluation of the N microphones can be used to guide the layout of the microphones in the microphone array, to guide a redesign of that layout, to guide a redesign of the microphone array, or to evaluate the robustness of a multi-channel enhancement algorithm.

For example, when the evaluation result shows poor consistency between microphone 1 and microphone 2, it can guide an adjustment of the position of microphone 1 or microphone 2 in the microphone array, or a redesign of microphone 1 or microphone 2.

For another example, when the evaluation result shows poor consistency between microphone 1 and several other microphones, it can guide an adjustment of the position of microphone 1 in the microphone array, a redesign of microphone 1, or a redesign of the microphone array.

In the embodiments of the present application, the phase spectrum difference and/or the power spectrum difference between each microphone and the reference microphone is determined according to the N audio signals respectively collected by the N microphones, so that consistency evaluation is performed on the N microphones, the influence of inter-microphone consistency on the multi-channel speech enhancement algorithm is eliminated, and the user experience is improved.

In some possible implementations, performing consistency evaluation on the N microphones according to the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone includes:

evaluating, according to the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone, the phase consistency between the corresponding microphone and the reference microphone.

It should be noted that the smaller the phase spectrum difference between two microphones, the better the phase consistency between them.

For example, if the phase spectrum difference between microphone 1 and the reference microphone is A, then the smaller A is, the better the phase consistency between microphone 1 and the reference microphone.

Optionally, a threshold can be set. If the phase spectrum difference between two microphones is less than this threshold, the phase consistency between the two microphones meets the design requirement, and the effect of the consistency between the two microphones on the multi-channel speech enhancement algorithm is negligible or absent.

It should be noted that the above threshold can be configured flexibly according to the multi-channel speech enhancement algorithm in use.
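The threshold test described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function names `is_consistent` and `evaluate_array` are hypothetical, and comparing the largest absolute difference over all frequency bins against the threshold is an assumption, since the text only states that the difference must be less than the threshold.

```python
import numpy as np


def is_consistent(diff_spectrum, threshold):
    """Return True if the difference spectrum stays below the threshold.

    Assumption: the comparison uses the largest absolute value over all
    frequency bins; the source only says "less than the threshold".
    """
    return bool(np.max(np.abs(np.asarray(diff_spectrum, dtype=float))) < threshold)


def evaluate_array(diffs_by_mic, threshold):
    """Map each non-reference microphone index to a pass/fail flag."""
    return {mic: is_consistent(d, threshold) for mic, d in diffs_by_mic.items()}
```

A call such as `evaluate_array({2: [0.01, -0.02], 3: [0.30, 0.05]}, threshold=0.1)` would flag microphone 3 as inconsistent with the reference microphone while microphone 2 passes.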

In some possible implementations, the method further includes:

measuring, for each of the N microphones other than the reference microphone, the difference between the distance from that microphone to the sound source and the distance from the reference microphone to the sound source;

calculating, according to the measured distance difference, the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone; and

calibrating the corresponding phase spectrum difference according to the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone.

For example, if the fixed phase difference between microphone 1 and the reference microphone is A and the phase spectrum difference between microphone 1 and the reference microphone is B, then after calibration the phase spectrum difference between microphone 1 and the reference microphone is C, where C = B - A.

In some possible implementations, calculating, according to the measured distance difference, the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone includes:

calculating the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone according to the formula [formula not reproduced in the source],

where Y_i(ω) denotes the spectrum of the i-th microphone, Y₁(ω) denotes the spectrum of the reference microphone, ω denotes the frequency, d_i denotes the difference between the distances from the i-th microphone and from the reference microphone to the sound source, c denotes the speed of sound, and 2πωd_i/c denotes the fixed phase difference between the i-th microphone and the reference microphone.
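The calibration step can be illustrated with a short sketch. It uses only the two relations stated in the text, namely the fixed phase difference 2πωd_i/c and the calibration C = B - A. The function names and the default speed of sound of 343 m/s are assumptions made for the example, and ω is treated as a frequency in hertz because the text multiplies it by 2π.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees Celsius


def fixed_phase_difference(freq_hz, distance_diff_m, c=SPEED_OF_SOUND):
    """Fixed phase difference 2*pi*f*d_i/c in radians for each frequency."""
    return 2.0 * np.pi * np.asarray(freq_hz, dtype=float) * distance_diff_m / c


def calibrate_phase_difference(measured, freq_hz, distance_diff_m, c=SPEED_OF_SOUND):
    """Return C = B - A: measured phase difference minus the fixed part.

    No phase wrapping is applied here; that is left out of this sketch.
    """
    fixed = fixed_phase_difference(freq_hz, distance_diff_m, c)
    return np.asarray(measured, dtype=float) - fixed
```

With a distance difference of 0.5 m and a frequency of 343 Hz, the fixed phase difference under these assumptions is π radians, so a measured difference of π + 0.1 calibrates to 0.1.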

evaluating, according to the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone, the amplitude consistency between the corresponding microphone and the reference microphone.

It should be noted that the smaller the power spectrum difference between two microphones, the better the amplitude consistency between them.

For example, if the power spectrum difference between microphone 1 and the reference microphone is A, then the smaller A is, the better the amplitude consistency between microphone 1 and the reference microphone.

Optionally, a threshold can be set. If the power spectrum difference between two microphones is less than this threshold, the amplitude consistency between the two microphones meets the design requirement, and the effect of the consistency between the two microphones on the multi-channel speech enhancement algorithm is negligible or absent.

In some possible implementations, when phase consistency is evaluated, the N audio signals are signals collected in an environment in which frequency sweep signal data is played.

In some possible implementations, when amplitude consistency is evaluated, the N audio signals are signals collected in an environment in which Gaussian white noise data or frequency sweep signal data is played.

In some possible implementations, the frequency sweep signal is any one of a linear frequency sweep signal, a logarithmic frequency sweep signal, a linear stepped frequency sweep signal, and a logarithmic stepped frequency sweep signal.

In some possible implementations, determining, according to the N audio signals, the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone includes:

framing each of the N audio signals to obtain K signal frames of equal length, K ≥ 2;

windowing each of the K signal frames to obtain K windowed signal frames;

performing a fast Fourier transform (FFT) on each of the K windowed signal frames to obtain K target signal frames; and

determining, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.

Optionally, K denotes the total number of frames of the signal collected by each microphone.

It should be noted that windowing is used to suppress the truncation effect introduced by framing. Optionally, a Hamming window may be applied to each of the K signal frames.

In some possible implementations, any two adjacent signal frames among the K signal frames overlap by R%, R > 0. For example, R is 25 or 50.

Optionally, the signal amplitude remains unchanged after overlapped windowing.

It should be understood that, with overlap, each frame contains part of the previous frame, which prevents discontinuities between adjacent frames.
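The framing, windowing and FFT steps above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: the function name `frames_to_spectra` is hypothetical, trailing samples that do not fill a whole frame are dropped, and `numpy.fft.rfft` is used so that only the non-negative frequency bins of each real-valued frame are kept.

```python
import numpy as np


def frames_to_spectra(x, frame_len, overlap_percent=50):
    """Split x into K overlapping frames, apply a Hamming window, and FFT.

    Returns a complex array of shape (K, frame_len // 2 + 1); row j is the
    "target signal frame" j. Trailing samples that do not fill a frame are
    dropped (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    hop = frame_len - frame_len * overlap_percent // 100  # samples between frame starts
    if hop <= 0 or len(x) < frame_len:
        raise ValueError("need 0 <= overlap < 100 and at least one full frame")
    k = 1 + (len(x) - frame_len) // hop  # number of complete frames
    frames = np.stack([x[j * hop: j * hop + frame_len] for j in range(k)])
    window = np.hamming(frame_len)  # Hamming window, as suggested in the text
    return np.fft.rfft(frames * window, axis=1)
```

For a 1024-sample signal with `frame_len=256` and a 50% overlap, the hop is 128 samples and the sketch yields K = 7 frames of 129 frequency bins each.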

In some possible implementations, framing the i-th audio signal yields K signal frames of equal length, which are written in the following vector form:

x_i(t) = [x_{i,1}(t), x_{i,2}(t), …, x_{i,K}(t)]^T

where x_i(t) denotes the i-th audio signal, K denotes the total number of frames of the signal collected by each microphone, and [ ]^T denotes the transpose of a vector or matrix.

In some possible implementations, determining, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone includes:

determining the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone according to the formula [formula not reproduced in the source],

where imag() denotes taking the imaginary part, ln() denotes taking the natural logarithm, and the remaining symbols of the formula (not reproduced in the source) denote, respectively, the phase spectrum difference between the i-th microphone and the reference microphone, the j-th target signal frame of the reference microphone, the j-th target signal frame of the i-th microphone, and the dominant frequency.
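Because the formula itself is not reproduced in the source, the following is only a sketch of one plausible reading. It relies on the identity that imag(ln(z)) equals the phase angle of a complex number z. Three points are assumptions: the argument of ln() is taken to be the ratio of the reference frame to the i-th microphone's frame, the dominant frequency is taken to be the bin where the reference frame has the largest magnitude, and one value is returned per frame. The function name is hypothetical.

```python
import numpy as np


def phase_spectrum_difference(ref_frames, mic_frames):
    """Per-frame phase difference at the dominant frequency bin.

    ref_frames, mic_frames: complex arrays of shape (K, bins) holding the
    target signal frames of the reference and the i-th microphone.
    Returns (dominant_bins, phase_diff), both of length K.
    """
    ref_frames = np.asarray(ref_frames, dtype=complex)
    mic_frames = np.asarray(mic_frames, dtype=complex)
    rows = np.arange(ref_frames.shape[0])
    # Assumption: the dominant frequency is the strongest bin of the reference.
    dominant = np.argmax(np.abs(ref_frames), axis=1)
    # Assumption: ratio is reference over microphone i.
    ratio = ref_frames[rows, dominant] / mic_frames[rows, dominant]
    # imag(ln(z)) is the phase angle of z, in the range (-pi, pi].
    return dominant, np.imag(np.log(ratio))
```

As a check of the idea, a tone that reaches microphone i with a 0.3 rad phase lag relative to the reference yields a difference of 0.3 rad at the tone's frequency bin under these assumptions.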

In some possible implementations, determining, according to the K target signal frames corresponding to each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone includes:

determining the power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal; and

determining, according to the power spectrum of each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.

In some possible implementations, determining the power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal includes:

calculating the power spectrum of each audio signal according to the formula [formula not reproduced in the source],

where P_i(ω) denotes the power spectrum of the i-th audio signal, Y_{i,j}(ω) denotes the j-th target signal frame of the i-th audio signal, K denotes the total number of frames of the signal received by each microphone, and ω denotes the frequency.

In some possible implementations, determining, according to the power spectrum of each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone includes:

calculating the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone according to the formula PD_i(ω) = P₁(ω) - P_i(ω),

where PD_i(ω) denotes the power spectrum difference between the i-th microphone and the reference microphone, P₁(ω) denotes the power spectrum of the reference microphone, and P_i(ω) denotes the power spectrum of the i-th microphone.
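The two power-spectrum steps can be sketched as follows. The difference PD_i(ω) = P₁(ω) - P_i(ω) is taken from the text. The power spectrum formula is not reproduced in the source, so averaging the squared magnitude of the K target signal frames is an assumption of this sketch (the original may, for example, use a logarithmic scale). The function names are hypothetical.

```python
import numpy as np


def power_spectrum(frames):
    """Assumed estimate: mean of |Y_{i,j}(w)|^2 over the K frames (axis 0)."""
    return np.mean(np.abs(np.asarray(frames, dtype=complex)) ** 2, axis=0)


def power_spectrum_difference(ref_frames, mic_frames):
    """PD_i(w) = P_1(w) - P_i(w), as stated in the text."""
    return power_spectrum(ref_frames) - power_spectrum(mic_frames)
```

Under this assumption, a microphone whose spectrum is exactly half the amplitude of the reference has one quarter of the reference power in every bin, so PD_i(ω) equals three quarters of P₁(ω).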

In some possible implementations, acquiring the N audio signals respectively collected by the N microphones includes:

determining the sampling frequency F_s and the number of FFT points N_fft used by the N microphones for audio signal collection, playing Gaussian white noise data or frequency sweep signal data through a speaker, and collecting the N audio signals with the N microphones, where, if the data played by the speaker is frequency sweep signal data, the frequency sweep signal data consists of M+1 signal segments of equal length and different frequencies, [expression not reproduced in the source].

It should be noted that the number of FFT points N_fft is an even number, typically 32, 64, 128, ..., 1024, and so on; the more points, the greater the savings in computation (relative to a direct discrete Fourier transform).

In some possible implementations, the frequency of each of the M+1 signal segments is calculated according to the formula [formula not reproduced in the source], and

each of the M+1 signal segments is calculated according to the formula S_i(t) = sin(2πf_i t),

where f_i denotes the frequency of the i-th signal segment, F_s denotes the sampling frequency, N_fft denotes the number of FFT points, S_i(t) denotes the i-th signal segment, and the length of S₁(t) is an integer multiple of the period T, T = 1/f₁.
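A stepped sweep of this kind can be sketched as below. Only S_i(t) = sin(2πf_i t) and the requirement that the segment length be an integer multiple of T = 1/f₁ come from the text. The frequency formula and the value of M are not reproduced in the source, so the choices f_i = i·F_s/N_fft and M = N_fft/2 (one segment per FFT bin centre) are assumptions, as are the function name and the `periods_per_segment` parameter.

```python
import numpy as np


def stepped_sweep(fs, n_fft, periods_per_segment=4):
    """Build M+1 equal-length sine segments S_i(t) = sin(2*pi*f_i*t).

    Assumptions: M = n_fft // 2 and f_i = i * fs / n_fft. Under these
    assumptions the segments at 0 Hz and at fs/2 are numerically silent.
    Returns (freqs, segments) with segments of shape (M + 1, segment_len).
    """
    m = n_fft // 2
    freqs = np.arange(m + 1) * fs / n_fft
    # T = 1/f_1 = n_fft / fs seconds, i.e. n_fft samples per period of f_1.
    segment_len = periods_per_segment * n_fft
    t = np.arange(segment_len) / fs
    segments = np.sin(2.0 * np.pi * freqs[:, None] * t[None, :])
    return freqs, segments
```

Concatenating the rows, for instance with `segments.reshape(-1)`, gives one playable sample sequence corresponding to the vector S(t) of segments.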

In some possible implementations, the frequency sweep signal data played by the speaker can be written in the following vector form:

S(t) = [S₀(t), S₁(t), …, S_M(t)]^T

where S(t) denotes the frequency sweep signal data played by the speaker, S_i(t) denotes the i-th signal segment, and [ ]^T denotes the transpose of a vector or matrix.

In some possible implementations, the N microphones respectively collect N audio signals, where the audio signal collected by the i-th microphone is denoted x_i(t), and x_i(t) can be written in the following vector form [formula not reproduced in the source],

where x_i(t) denotes the audio signal collected by the i-th microphone, K denotes the total number of frames of the signal collected by each microphone, and [ ]^T denotes the transpose of a vector or matrix.

placing the N microphones in a test room in which a speaker is arranged, the N microphones being located directly in front of the speaker; and

controlling the speaker to play Gaussian white noise data or frequency sweep signal data, and controlling the N microphones to respectively collect the N audio signals.

In some possible implementations, the test room provides an anechoic chamber environment, the speaker is an artificial mouth dedicated to audio testing, and the artificial mouth is calibrated with a standard microphone before use.

In some possible implementations, before the speaker is controlled to play the Gaussian white noise data or the frequency sweep signal data, the method further includes:

acquiring, in a quiet environment, first audio data X₁(n) collected by the N microphones within a first duration T₁;

acquiring, in an environment in which Gaussian white noise data or frequency sweep signal data is played, second audio data X₂(n) collected by the N microphones within a second duration T₂; and

calculating the signal-to-noise ratio (SNR) according to the formula [formula not reproduced in the source], and ensuring that the SNR is greater than a first threshold.
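The SNR check can be sketched as follows. The SNR formula is not reproduced in the source, so this sketch assumes a common definition: the ratio, in decibels, of the mean power of the playback recording X₂(n) to the mean power of the quiet recording X₁(n). Using mean power rather than total energy keeps the result independent of the two durations T₁ and T₂. The function names are hypothetical.

```python
import numpy as np


def snr_db(quiet, playback):
    """Assumed SNR: 10*log10(mean(X2^2) / mean(X1^2)) in decibels."""
    noise_power = np.mean(np.asarray(quiet, dtype=float) ** 2)
    signal_power = np.mean(np.asarray(playback, dtype=float) ** 2)
    return 10.0 * np.log10(signal_power / noise_power)


def snr_ok(quiet, playback, threshold_db):
    """Return True if the assumed SNR exceeds the first threshold."""
    return bool(snr_db(quiet, playback) > threshold_db)
```

For instance, a quiet recording with amplitude 0.01 and a playback recording with amplitude 1 give 40 dB under this definition, which would pass a 30 dB threshold.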

In a second aspect, a device for evaluating the consistency of a microphone array is provided, including:

an acquisition unit, configured to acquire N audio signals respectively collected by N microphones, where the N microphones form a microphone array and N ≥ 2; and

a processing unit, configured to determine, according to the N audio signals, a phase spectrum difference and/or a power spectrum difference between each of the N microphones other than a reference microphone and the reference microphone, where the reference microphone is any one of the N microphones;

the processing unit being further configured to perform consistency evaluation on the N microphones according to the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.

In some possible implementations, the processing unit is specifically configured to:

evaluate, according to the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone, the phase consistency between the corresponding microphone and the reference microphone.

In some possible implementations, the processing unit is further configured to:

measure, for each of the N microphones other than the reference microphone, the difference between the distance from that microphone to the sound source and the distance from the reference microphone to the sound source;

calculate, according to the measured distance difference, the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone; and

calibrate the corresponding phase spectrum difference according to the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone.

calculate the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone according to the formula [formula not reproduced in the source].

evaluate, according to the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone, the amplitude consistency between the corresponding microphone and the reference microphone.

In some possible implementations, the N audio signals are signals collected in an environment in which frequency sweep signal data is played.

In some possible implementations, the N audio signals are signals collected in an environment in which Gaussian white noise data or frequency sweep signal data is played.

In some possible implementations, the frequency sweep signal is any one of a linear frequency sweep signal, a logarithmic frequency sweep signal, a linear stepped frequency sweep signal, and a logarithmic stepped frequency sweep signal.

frame each of the N audio signals to obtain K signal frames of equal length, K ≥ 2;

window each of the K signal frames to obtain K windowed signal frames;

perform an FFT on each of the K windowed signal frames to obtain K target signal frames; and

determine, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.

In some possible implementations, any two adjacent signal frames among the K signal frames overlap by R%, R > 0.

In some possible implementations, R is 25 or 50.

determine the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone according to the formula [formula not reproduced in the source].

determine the power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal; and

determine, according to the power spectrum of each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.

calculate the power spectrum of each audio signal according to the formula [formula not reproduced in the source],

where P_i(ω) denotes the power spectrum of the i-th audio signal, Y_{i,j}(ω) denotes the j-th target signal frame of the i-th audio signal, K denotes the total number of frames of the signal collected by each microphone, and ω denotes the frequency.

calculate the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone according to the formula PD_i(ω) = P₁(ω) - P_i(ω).

determine the sampling frequency F_s and the number of FFT points N_fft used by the N microphones for audio signal collection, play Gaussian white noise data or frequency sweep signal data through a speaker, and control the N microphones to collect the N audio signals, where, if the data played by the speaker is frequency sweep signal data, the frequency sweep signal data consists of M+1 signal segments of equal length and different frequencies, [expression not reproduced in the source].

calculate the frequency of each of the M+1 signal segments according to the formula [formula not reproduced in the source], and

calculate each of the M+1 signal segments according to the formula S_i(t) = sin(2πf_i t).

In some possible implementations, the frequency sweep signal data played by the speaker is written in the following vector form [formula not reproduced in the source].

In some possible implementations, the N microphones respectively collect N audio signals, where the audio signal collected by the i-th microphone is denoted x_i(t), and x_i(t) can be written in the following vector form [formula not reproduced in the source].

在一些可能的实现方式中，所述获取单元具体用于：In some possible implementations, the obtaining unit is specifically used for:

将所述N个麦克风放置于测试房间内，所述测试房间内配置有扬声器，所述N个麦克风位于所述扬声器的正前方；The N microphones are placed in a test room, where a speaker is configured in the test room, and the N microphones are located directly in front of the speaker;

控制所述扬声器播放高斯白噪声数据或者扫频信号数据，以及控制所述N个麦克风分别采集所述N个音频信号。The speaker is controlled to play Gaussian white noise data or frequency sweep signal data, and the N microphones are controlled to collect the N audio signals respectively.

在一些可能的实现方式中，所述测试房间内具有消音室环境，所述扬声器为音频测试专用人工嘴，且所述人工嘴在使用之前用标准麦克风校准。In some possible implementations, the test room has an anechoic chamber environment, the speaker is an artificial mouth dedicated to audio testing, and the artificial mouth is calibrated with a standard microphone before use.

在一些可能的实现方式中，在所述处理单元控制所述扬声器播放高斯白噪声数据或者扫频信号数据之前，所述获取单元还用于：In some possible implementations, before the processing unit controls the speaker to play Gaussian white noise data or frequency sweep signal data, the acquiring unit is further configured to:

在安静的环境下，获取所述N个麦克风在第一时长T₁内采集的第一音频数据X₁(n)；In a quiet environment, obtain the first audio data X ₁ (n) collected by the N microphones within the first duration T ₁ ;

在播放高斯白噪声数据或者扫频信号数据的环境下，获取所述N个麦克风在第二时长T₂内采集的第二音频数据X₂(n)；Under the environment of playing Gaussian white noise data or frequency sweep signal data, obtain the second audio data X ₂ (n) collected by the N microphones within the second time period T ₂ ;

trigger the processing unit to calculate the signal-to-noise ratio (SNR) according to the formula, and ensure that the SNR is greater than a first threshold.

第三方面，提供了一种评估麦克风阵列一致性的装置，包括：In a third aspect, an apparatus for evaluating the consistency of a microphone array is provided, including:

存储器，用于存储程序和数据；以及memory, for storing programs and data; and

处理器，用于调用并运行所述存储器中存储的程序和数据；a processor for calling and running the programs and data stored in the memory;

该装置被配置为执行上述第一方面或其任意可能的实现方式中的方法。The apparatus is configured to perform the method of the first aspect above or any possible implementation thereof.

第四方面，提供了评估麦克风阵列一致性的系统，包括：A fourth aspect provides a system for evaluating the consistency of microphone arrays, including:

构成麦克风阵列的N个麦克风，N≥2；N microphones forming a microphone array, N≥2;

至少一个音频源；at least one audio source;

an apparatus comprising a memory for storing programs and data and a processor for calling and running the programs and data stored in the memory, the apparatus being configured to perform the method of the first aspect above or any possible implementation thereof.

第五方面，提供了一种计算机存储介质，该计算机存储介质中存储有程序代码，该程序代码可以用于指示执行上述第一方面或其任意可能的实现方式中的方法。In a fifth aspect, a computer storage medium is provided, and program codes are stored in the computer storage medium, and the program codes can be used to instruct to execute the method in the first aspect or any possible implementation manners thereof.

第六方面，提供了一种包含指令的计算机程序产品，其在计算机上运行时，使得计算机执行上述第一方面或其任意可能的实现方式中的方法。In a sixth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or any possible implementations thereof.

Description of drawings

图1是本申请实施例的评估麦克风阵列一致性的方法的示意性流程图。FIG. 1 is a schematic flowchart of a method for evaluating the consistency of a microphone array according to an embodiment of the present application.

图2是根据本申请实施例的测试环境示意图。FIG. 2 is a schematic diagram of a test environment according to an embodiment of the present application.

图3是根据本申请实施例的计算相位谱差值的示意图。FIG. 3 is a schematic diagram of calculating a phase spectrum difference value according to an embodiment of the present application.

图4是根据本申请实施例的计算功率谱差值的示意图。FIG. 4 is a schematic diagram of calculating a power spectrum difference according to an embodiment of the present application.

图5是根据本申请实施例的两麦克风之间的相位谱差值的示意图。FIG. 5 is a schematic diagram of a phase spectrum difference between two microphones according to an embodiment of the present application.

图6是根据本申请实施例的两麦克风之间校准之后的相位谱差值的示意图。FIG. 6 is a schematic diagram of a phase spectrum difference value after calibration between two microphones according to an embodiment of the present application.

图7a是根据本申请实施例的两麦克风的功率谱的示意图。FIG. 7a is a schematic diagram of power spectra of two microphones according to an embodiment of the present application.

图7b是根据本申请实施例的两麦克风之间的功率谱差值的示意图。FIG. 7b is a schematic diagram of a power spectrum difference between two microphones according to an embodiment of the present application.

图8是根据本申请实施例的一种评估麦克风阵列一致性的设备的示意性结构图。FIG. 8 is a schematic structural diagram of a device for evaluating the consistency of a microphone array according to an embodiment of the present application.

图9是根据本申请实施例的一种评估麦克风阵列一致性的装置的示意性结构图。FIG. 9 is a schematic structural diagram of an apparatus for evaluating the consistency of a microphone array according to an embodiment of the present application.

图10是根据本申请实施例的一种评估麦克风阵列一致性的系统的示意性结构图。FIG. 10 is a schematic structural diagram of a system for evaluating the consistency of a microphone array according to an embodiment of the present application.

Detailed description

下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚地描述。The technical solutions in the embodiments of the present application will be clearly described below with reference to the accompanying drawings in the embodiments of the present application.

麦克风阵列(Microphone Array)是指由一定数目的麦克风(声学传感器)组成，用来对声场的空间特性进行采样并处理的系统。利用两个麦克风接收到声波的相位之间的差异对声波进行过滤，能最大限度将环境背景声音清除掉，只剩下需要的声波。Microphone Array refers to a system composed of a certain number of microphones (acoustic sensors) used to sample and process the spatial characteristics of the sound field. Using the difference between the phases of the sound waves received by the two microphones to filter the sound waves, the ambient background sound can be removed to the maximum extent, and only the required sound waves are left.

多通道语音增强技术算法假设条件是麦克风阵列中的多个麦克风的目标语音成分高相关性，目标语音与非目标干扰不相关，因此麦克风阵列中不同麦克风之间的一致性直接影响算法性能。The assumption of the multi-channel speech enhancement technology algorithm is that the target speech components of multiple microphones in the microphone array are highly correlated, and the target speech is not correlated with non-target interference, so the consistency between different microphones in the microphone array directly affects the performance of the algorithm.

Quantitative evaluation of microphone consistency can be used to guide the design of microphones and microphone arrays. The circuits, electronic components, and acoustic structure of a microphone array all affect microphone consistency. When designing a microphone array, the influence of each factor on consistency can be tested item by item, so that the consistency of the design meets the system requirements.

麦克风一致性的定量评估，可用于比较不同算法的鲁棒性，在达到相同语音增强性能的前提性，对一致性指标要求越低，算法鲁棒性越好。Quantitative evaluation of microphone consistency can be used to compare the robustness of different algorithms. On the premise of achieving the same speech enhancement performance, the lower the requirements for consistency indicators, the better the algorithm robustness.

In the embodiments of the present application, consistency is measured in terms of both the amplitude spectrum difference and the phase spectrum difference, which is objective and accurate. The quantitative consistency evaluation method can objectively guide the design of the microphone array and can also objectively compare the robustness of multi-channel speech enhancement algorithms.

以下，结合图1至图7，详细介绍本申请实施例的评估麦克风阵列一致性的方法。Hereinafter, with reference to FIGS. 1 to 7 , the method for evaluating the consistency of the microphone array according to the embodiment of the present application will be described in detail.

图1是本申请一个实施例的评估麦克风阵列一致性的方法的示意性流程图。应理解，图1示出了该方法的步骤或操作，但这些步骤或操作仅是示例，本申请实施例还可以执行其他操作或者图1中的各个操作的变形。该方法可以由评估麦克风阵列一致性的装置执行，其中，该评估麦克风阵列一致性的装置可以是手机、平板电脑、便携式电脑、个人数字助理(Personal Digital Assistant，PDA)等等。FIG. 1 is a schematic flowchart of a method for evaluating the consistency of a microphone array according to an embodiment of the present application. It should be understood that FIG. 1 shows steps or operations of the method, but these steps or operations are only examples, and the embodiments of the present application may also perform other operations or variations of the respective operations in FIG. 1 . The method may be performed by a device for evaluating the consistency of a microphone array, where the device for evaluating the consistency of the microphone array may be a mobile phone, a tablet computer, a portable computer, a Personal Digital Assistant (PDA), and the like.

S110，获取N个麦克风分别采集的N个音频信号，该N个麦克风构成麦克风阵列，N≥2。S110: Acquire N audio signals respectively collected by N microphones, where the N microphones form a microphone array, and N≥2.

在对N个麦克风进行一致性评估时，需要限制N个麦克风所处的环境，即该N个音频信号是在特殊的测试环境下采集的。When evaluating the consistency of N microphones, it is necessary to limit the environment where the N microphones are located, that is, the N audio signals are collected in a special test environment.

Specifically, as shown in FIG. 2, a microphone array 201 composed of the N microphones is placed in a test room 202, and a speaker 203 is arranged in the test room 202. The microphone array 201 is located directly in front of the speaker 203, and both the microphone array 201 and the speaker 203 are connected to a control device 204 such as a computer. The control device 204 can control the speaker 203 to play specific audio data, for example Gaussian white noise data or frequency sweep signal data, and at the same time the control device 204 can obtain from the microphone array 201 the N audio signals respectively collected by the N microphones.

It should be noted that microphone consistency evaluation requires the collected audio signals to have a sufficiently high signal-to-noise ratio and sufficiently weak background noise, so the test must be carried out in a quiet environment. In particular, the test room 202 is required to provide an anechoic chamber environment. The speaker 203 is required to have a high signal-to-noise ratio and a flat frequency response curve; in particular, the speaker is an artificial mouth dedicated to audio testing and is calibrated with a standard microphone before use. The microphone array 201 is placed directly in front of the speaker 203, in particular at the position used for the standard microphone calibration.

可选地，在进行正式的音频信号采集之前，还需要对上述测试环境进行信噪比(signal-to-noise ratio，SNR)检测。Optionally, before the formal audio signal collection is performed, a signal-to-noise ratio (signal-to-noise ratio, SNR) detection needs to be performed on the above-mentioned test environment.

Specifically, in the test environment shown in FIG. 2, first, in a quiet environment (that is, with the speaker 203 switched off), the first audio data X₁(n) collected by the N microphones within a first duration T₁ is acquired; then, while Gaussian white noise data or frequency sweep signal data is being played (that is, the control device 204 controls the speaker 203 to play Gaussian white noise data or frequency sweep signal data), the second audio data X₂(n) collected by the N microphones within a second duration T₂ is acquired; next, the SNR is calculated according to formula 1 below; finally, if the SNR is greater than the set threshold, the check passes, otherwise it fails.

其中，T₁表示第一时长，T₂表示第二时长，X₁(n)表示第一音频数据，X₂(n)表示第二音频数据。Wherein, T ₁ represents the first duration, T ₂ represents the second duration, X ₁ (n) represents the first audio data, and X ₂ (n) represents the second audio data.

需要说明的是，若检测不通过，需要对上述测试环境进行调整或者校准，消除一些可能对性噪比造成影响的因素，直至根据上述公式1所计算的SNR大于设定阈值。It should be noted that if the test fails, the above test environment needs to be adjusted or calibrated to eliminate some factors that may affect the SNR, until the SNR calculated according to the above formula 1 is greater than the set threshold.
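As an illustration of the SNR check described above, the following Python sketch computes an SNR from a recording made in quiet and a recording made during playback. Formula 1 is not reproduced in this text, so the ratio used here (mean power during playback over mean power in quiet, expressed in dB) is an assumption, as are the function names `snr_db` and `environment_ok` and the 30 dB default threshold.

```python
import numpy as np

def snr_db(quiet, playing):
    """Assumed SNR: mean power with the speaker on over mean power with it off, in dB."""
    quiet = np.asarray(quiet, dtype=float)      # X1(n), recorded during T1
    playing = np.asarray(playing, dtype=float)  # X2(n), recorded during T2
    noise_power = np.mean(quiet ** 2)           # normalised by the number of T1 samples
    total_power = np.mean(playing ** 2)         # normalised by the number of T2 samples
    return 10.0 * np.log10(total_power / noise_power)

def environment_ok(quiet, playing, threshold_db=30.0):
    """Pass the check only if the SNR exceeds the (hypothetical) threshold."""
    return snr_db(quiet, playing) > threshold_db
```

Averaging rather than summing the squared samples keeps the result independent of the two recording durations T₁ and T₂, which need not be equal.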

可选地，在本申请实施例中，使用上述图2所示的测试环境采集音频信号具体可以包括：Optionally, in this embodiment of the present application, using the test environment shown in FIG. 2 to collect audio signals may specifically include:

确定该N个麦克风在进行音频信号采集时的采样频率F_s和FFT点数N_fft，使用扬声器播放高斯白噪声数据或者扫频信号数据，该N个麦克风采集该N个音频信号。Determine the sampling frequency F _s and the number of FFT points N _fft of the N microphones when collecting audio signals, use the speaker to play Gaussian white noise data or frequency sweep signal data, and collect the N audio signals by the N microphones.

Optionally, the number of FFT points N_fft is an even number, generally 32, 64, 128, ..., 1024, and so on; the more points there are, the greater the saving in computation that the FFT offers over a direct DFT.

需要说明的是，若该扬声器所播放的数据为扫频信号数据，该扫频信号数据由M+1段长度相等且频率不等的信号构成， It should be noted that, if the data played by the speaker is frequency sweep signal data, the frequency sweep signal data is composed of M+1 segments of signals with equal lengths and unequal frequencies.

Optionally, the frequency of each of the M+1 signal segments may be calculated according to formula 2 below, and each of the M+1 signal segments may be generated according to formula 3 below.

其中，f_i是第i段信号的频率，F_s是采样频率，N_fft表示FFT点数。Among them, f _i is the frequency of the i-th signal, F _s is the sampling frequency, and N _fft is the number of FFT points.

$$S_i(t) = \sin(2\pi f_i t) \tag{3}$$

其中，S_i(t)表示第i段信号，f_i是第i段信号的频率。Among them, S _i (t) represents the i-th segment signal, and f _i is the frequency of the i-th segment signal.

需要说明的是，第一段信号S₁(t)的长度为周期T的整数倍，T＝1/f₁。It should be noted that the length of the first segment signal S ₁ (t) is an integer multiple of the period T, T=1/f ₁ .
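The stepped sweep described above can be sketched in Python as follows. Formula 2 is not reproduced in this text, so the sketch assumes that the segment frequencies sit on FFT bin centres, f_i = i·F_s/N_fft for i = 1, ..., M+1; that indexing, the function name `stepped_sweep`, and the `cycles_of_lowest` parameter are all illustrative assumptions. Each segment follows formula 3.

```python
import numpy as np

def stepped_sweep(fs, n_fft, m, cycles_of_lowest=4):
    """Build M+1 equal-length sine segments with unequal frequencies.

    Assumes f_i = i * fs / n_fft for i = 1..M+1 (formula 2 is not given here).
    """
    if m + 1 >= n_fft // 2:
        raise ValueError("highest segment frequency must stay below fs / 2")
    indices = np.arange(1, m + 2)
    freqs = indices * fs / n_fft
    # One period of f_1 is n_fft samples, so every segment spans a whole
    # number of periods of f_1 (and therefore of every other f_i as well).
    seg_len = cycles_of_lowest * n_fft
    t = np.arange(seg_len) / fs
    segments = [np.sin(2.0 * np.pi * f * t) for f in freqs]  # formula 3
    return freqs, np.concatenate(segments)
```

Under this assumption each segment holds complete cycles only, so an FFT of length N_fft taken inside a segment concentrates the energy in a single bin.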

可选地，扬声器所播放的扫频信号数据可以写成以下向量形式：Optionally, the frequency sweep signal data played by the speaker can be written in the following vector form:

Optionally, the N microphones respectively collect N audio signals, wherein the audio signal collected by the i-th microphone is denoted x_i(t), and x_i(t) can be written in the following vector form:

S120: Determine, according to the N audio signals, the phase spectrum difference and/or power spectrum difference between each of the N microphones other than a reference microphone and the reference microphone, where the reference microphone is any one of the N microphones.

Optionally, in the embodiments of the present application, after the N audio signals are collected, the phase spectrum difference between different microphones can be obtained by dividing each audio signal into frames, windowing each frame, and applying an FFT to each windowed frame.

Specifically, as shown in FIG. 3, assume the N audio signals are x₁(t), x₂(t), …, x_N(t). Each of the N audio signals is divided into K signal frames of equal length, K≥2; for example, the i-th audio signal is divided into K signal frames of equal length, written in the following vector form:

其中，x_i(t)表示第i个音频信号，K表示每个麦克风采集到信号的总帧数，[ ]^T表示向量或者矩阵的转置；Among them, x _i (t) represents the ith audio signal, K represents the total number of frames of the signal collected by each microphone, [ ] ^T represents the transpose of the vector or matrix;

Each of the K signal frames is windowed to obtain K windowed signal frames; for example, windowing the j-th frame x_i,j of the i-th audio signal gives the j-th windowed signal frame of the i-th audio signal, y_i,j = x_i,j × Win, where Win denotes the window function;

An FFT is applied to each of the K windowed signal frames to obtain K target signal frames; for example, applying an FFT to the j-th windowed signal frame y_i,j(t) of the i-th audio signal gives the j-th target signal frame Y_i,j(ω) of the i-th audio signal;

According to the K target signal frames corresponding to each audio signal, the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone is determined; for example, given the dominant frequency of the j-th target signal frame, the phase spectrum difference between the i-th microphone and the reference microphone at that dominant frequency can be calculated according to formula 4 below.

需要说明的是，在上述图3中，是以第一个麦克风为参考麦克风的，即分别计算除该第一麦克风之外的每个麦克风与该第一麦克风之间的相位谱差值，且第一麦克风对应音频信号x₁(t)，第二麦克风对应音频信号x₂(t)，…，第N麦克风对应音频信号x_N(t)。It should be noted that in FIG. 3 above, the first microphone is used as the reference microphone, that is, the phase spectrum difference values between each microphone except the first microphone and the first microphone are calculated separately, and The first microphone corresponds to the audio signal x ₁ (t), the second microphone corresponds to the audio signal x ₂ (t), ..., and the Nth microphone corresponds to the audio signal x _N (t).

可选地，K表示每个麦克风接收到信号的总帧数。Optionally, K represents the total number of frames of signals received by each microphone.

在一些可能的实现方式中，该K个信号帧中任意两个相邻信号帧重叠R％，R＞0。例如，该R为25或者50。换句话说，该K个信号帧中任意两个相邻信号帧重叠25％或者50％。In some possible implementations, any two adjacent signal frames in the K signal frames overlap by R%, and R>0. For example, the R is 25 or 50. In other words, any two adjacent signal frames among the K signal frames overlap by 25% or 50%.
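The framing, windowing, and FFT steps described above can be sketched in Python as follows. The Hann window, the function name `frame_spectra`, and the use of a one-sided FFT are assumptions made for illustration; the text only specifies equal-length frames, a window Win, and an overlap of R% between adjacent frames.

```python
import numpy as np

def frame_spectra(x, n_fft, overlap=0.5):
    """Split x into K overlapping frames, window each frame, and FFT it.

    Returns a (K, n_fft // 2 + 1) complex array whose row j plays the role
    of the target signal frame Y_{i,j}(w) for one microphone.
    """
    x = np.asarray(x, dtype=float)
    hop = max(1, int(round(n_fft * (1.0 - overlap))))  # frame advance in samples
    n_frames = 1 + (len(x) - n_fft) // hop             # K; needs len(x) >= n_fft
    window = np.hanning(n_fft)                         # Win (assumed Hann)
    frames = np.stack([x[j * hop: j * hop + n_fft] for j in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)
```

With `overlap=0.25` or `overlap=0.5` this corresponds to the R = 25 and R = 50 cases mentioned above.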

可选地，在本申请实施例中，在进行相位一致性评估时，该N个音频信号是在播放扫频信号数据的环境下采集的信号。换句话说，在计算上述相位谱差值时，该N个音频信号是在播放扫频信号数据的环境下采集的信号。Optionally, in this embodiment of the present application, when the phase consistency evaluation is performed, the N audio signals are signals collected in an environment where frequency sweep signal data is played. In other words, when calculating the above-mentioned phase spectrum difference, the N audio signals are signals collected in an environment of playing frequency sweep signal data.

Therefore, the phase difference at any frequency ω can be calculated, which gives the phase spectrum difference PDiff_i(ω) between the i-th microphone and the reference microphone, that is, the quantity calculated by formula 4 above.
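A minimal Python sketch of the per-frame phase comparison follows. Formula 4 is not reproduced in this text, so the sign convention used here (reference phase minus microphone phase, mirroring the order of formula 6) is an assumption, as is the function name `phase_diff_at_dominant`. The inputs are assumed to be (K, bins) arrays of one-sided FFT frames for the reference microphone and the i-th microphone.

```python
import numpy as np

def phase_diff_at_dominant(Y_ref, Y_mic, fs, n_fft):
    """Phase difference at the dominant frequency of each frame.

    Returns (frequencies in Hz, phase differences in radians), one value
    per frame, with the phase wrapped to (-pi, pi].
    """
    Y_ref = np.asarray(Y_ref)
    Y_mic = np.asarray(Y_mic)
    k = np.argmax(np.abs(Y_ref), axis=1)   # dominant bin of each frame
    rows = np.arange(Y_ref.shape[0])
    # angle(a * conj(b)) equals phase(a) - phase(b), already wrapped
    diff = np.angle(Y_ref[rows, k] * np.conj(Y_mic[rows, k]))
    return k * fs / n_fft, diff
```

Because a stepped sweep visits one frequency per segment, collecting these per-frame values over the whole recording yields the phase difference as a function of frequency.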

Optionally, in the embodiments of the present application, after the N audio signals are collected, the power spectrum difference between different microphones can be obtained by dividing each audio signal into frames, windowing each frame, applying an FFT to each windowed frame, and computing the power spectrum of each transformed frame.

Specifically, as shown in FIG. 4, assume the N audio signals are x₁(t), x₂(t), …, x_N(t). Each of the N audio signals is divided into K signal frames of equal length, K≥2; for example, the i-th audio signal is divided into K signal frames of equal length, written in the following vector form:

其中，x_i(t)表示第i个音频信号，K表示每个麦克风接收到信号的总帧数，[ ]^T表示向量或者矩阵的转置；Among them, x _i (t) represents the ith audio signal, K represents the total number of frames of the signal received by each microphone, [ ] ^T represents the transpose of the vector or matrix;

根据该每个音频信号对应的该K个目标信号帧，确定该每个音频信号的功率谱，例如，根据以下公式5计算第i个音频信号的功率谱；Determine the power spectrum of each audio signal according to the K target signal frames corresponding to each audio signal, for example, calculate the power spectrum of the ith audio signal according to the following formula 5;

According to the power spectrum of each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone is determined; for example, the power spectrum difference between the i-th microphone and the reference microphone is calculated according to formula 6 below.

其中，P_i(ω)表示第i个音频信号的功率谱，Y_i,j(ω)表示第i个音频信号中的第j个目标信号帧，ω表示频率，K表示每个麦克风采集到信号的总帧数。Among them, P _i (ω) represents the power spectrum of the ith audio signal, Y _i,j (ω) represents the jth target signal frame in the ith audio signal, ω represents the frequency, and K represents the frequency collected by each microphone. The total number of frames of the signal.

$$PD_i(\omega) = P_1(\omega) - P_i(\omega) \tag{6}$$

需要说明的是，在上述图4中，是以第一个麦克风为参考麦克风的，即分别计算除该第一麦克风之外的每个麦克风与该第一麦克风之间的功率谱差值，且第一麦克风对应音频信号x₁(t)，第二麦克风对应音频信号x₂(t)，…，第N麦克风对应音频信号x_N(t)。It should be noted that in the above-mentioned FIG. 4, the first microphone is used as the reference microphone, that is, the power spectrum difference between each microphone except the first microphone and the first microphone is calculated separately, and The first microphone corresponds to the audio signal x ₁ (t), the second microphone corresponds to the audio signal x ₂ (t), ..., and the Nth microphone corresponds to the audio signal x _N (t).

可选地，在本申请实施例中，在进行幅度一致性评估时，该N个音频信号是在播放高斯白噪声数据或者扫频信号数据的环境下采集的信号。换句话说，在计算上述功率谱差值时，该N个音频信号是在播放高斯白噪声数据或者扫频信号数据的环境下采集的信号。Optionally, in this embodiment of the present application, when the amplitude consistency evaluation is performed, the N audio signals are signals collected in an environment where Gaussian white noise data or frequency sweep signal data is played. In other words, when calculating the above power spectrum difference, the N audio signals are signals collected in an environment where Gaussian white noise data or frequency sweep signal data is played.
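The power spectrum comparison can be sketched in Python as follows. Formula 5 is not reproduced in this text, so the sketch assumes that P_i(ω) is the squared magnitude of Y_i,j(ω) averaged over the K frames and expressed in dB; the function names and the small constant guarding the logarithm are also assumptions. The difference itself follows formula 6.

```python
import numpy as np

def power_spectrum_db(Y):
    """Assumed P_i(w): mean of |Y_{i,j}(w)|^2 over the K frames, in dB."""
    Y = np.asarray(Y)
    mean_power = np.mean(np.abs(Y) ** 2, axis=0)
    return 10.0 * np.log10(mean_power + 1e-20)  # tiny offset avoids log(0)

def power_spectrum_diff(Y_ref, Y_mic):
    """PD_i(w) = P_1(w) - P_i(w), with microphone 1 as the reference (formula 6)."""
    return power_spectrum_db(Y_ref) - power_spectrum_db(Y_mic)
```

For example, a microphone whose output amplitude is half that of the reference at every frequency gives a difference of about 6 dB in every bin.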

S130，根据该N个麦克风中除该参考麦克风之外的每个麦克风与该参考麦克风之间的相位谱差值和/或功率谱差值，对该N个麦克风进行一致性评估。S130, according to the phase spectrum difference and/or the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone, perform consistency evaluation on the N microphones.

具体地，相位谱差值用于进行相位一致性评估，以及功率谱差值用于进行幅度一致性评估。Specifically, the phase spectrum difference value is used for phase consistency evaluation, and the power spectrum difference value is used for amplitude consistency evaluation.

Optionally, in the embodiments of the present application, the phase consistency between each of the N microphones other than the reference microphone and the reference microphone is evaluated according to the phase spectrum difference between that microphone and the reference microphone.

需要说明的是，因在采集数据时，不同麦克风到声源的距离难于完全一致，所以不同麦克风之间存在一个固定相位差。It should be noted that since the distances from different microphones to the sound source are difficult to be completely consistent when collecting data, there is a fixed phase difference between different microphones.

可选地，在本申请实施例中，可以通过固定相位差校准上述相位谱差值。Optionally, in this embodiment of the present application, the above-mentioned phase spectrum difference value may be calibrated by using a fixed phase difference.

Specifically, the difference between the distance from each of the N microphones other than the reference microphone to the sound source and the distance from the reference microphone to the sound source is measured; for example, d_i denotes the difference between the distances from the i-th microphone and from the reference microphone to the sound source;

According to the measured distance differences, the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone is calculated; for example, the fixed phase difference between the i-th microphone and the reference microphone can be calculated according to formula 7 below;

需要说明的是，固定相位差与信号频率满足线性关系，因此，可以使用线性拟合的方式确定固定相位差。It should be noted that the fixed phase difference and the signal frequency satisfy a linear relationship, therefore, the fixed phase difference can be determined by means of linear fitting.

For example, let the fixed phase difference between microphone 1 and the reference microphone be A, and the phase spectrum difference between microphone 1 and the reference microphone be B. In FIG. 5, the straight line is the fitted fixed phase difference between microphone 1 and the reference microphone, and the curve is the phase spectrum difference between microphone 1 and the reference microphone; overall, as the frequency increases from 0 Hz to 8000 Hz, the phase spectrum difference between microphone 1 and the reference microphone decreases from 0 radians to -2 radians. After calibration, the phase spectrum difference between microphone 1 and the reference microphone is C, shown as the curve in FIG. 6, where C = B - A; overall, as the frequency increases from 0 Hz to 8000 Hz, the phase spectrum difference between microphone 1 and the reference microphone fluctuates around 0 radians within ±0.5 radians.

Comparing FIG. 5 with FIG. 6 shows that the fixed phase difference has a large influence on the phase spectrum difference between two microphones. Therefore, when evaluating the phase consistency of two microphones, the influence of the fixed phase difference between them needs to be eliminated.
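The linear-fit calibration described above (C = B - A) can be sketched in Python as follows. The function name `remove_fixed_phase`, the use of `numpy.polyfit` for the straight-line fit, and the phase unwrapping step are illustrative assumptions; the text states only that the fixed phase difference is linear in frequency and can be determined by linear fitting.

```python
import numpy as np

def remove_fixed_phase(freqs_hz, phase_diff_rad):
    """Fit a straight line A(f) to the phase spectrum difference B(f) and
    return (C, A), where C = B - A is the calibrated phase difference."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    b = np.unwrap(np.asarray(phase_diff_rad, dtype=float))  # remove 2*pi jumps
    slope, intercept = np.polyfit(freqs_hz, b, 1)           # least-squares line
    fixed = slope * freqs_hz + intercept                     # A: fixed phase difference
    return b - fixed, fixed                                  # C = B - A
```

A fit with a free intercept is used here for simplicity; a pure propagation delay would give a line through the origin, so constraining the intercept to zero is an alternative design choice.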

Optionally, in the embodiments of the present application, the amplitude consistency between each of the N microphones other than the reference microphone and the reference microphone is evaluated according to the power spectrum difference between that microphone and the reference microphone.

For example, as shown in FIG. 7, FIG. 7a shows the power spectrum of microphone 1 and the power spectrum of the reference microphone, and FIG. 7b shows the power spectrum difference between microphone 1 and the reference microphone. The two power spectra differ only slightly, and the power spectrum difference stays within ±1 decibel (dB).

Optionally, in the embodiments of the present application, the influence of factors such as the circuits, electronic components, and acoustic structure of the microphone array on microphone consistency can be tested item by item, thereby guiding the calibration of the microphone array; specifically, this may guide the design of microphones and of the microphone array, and support evaluating the robustness of multi-channel enhancement algorithms.

Therefore, in the embodiments of the present application, the phase spectrum difference and/or power spectrum difference between each microphone and the reference microphone can be determined from the N audio signals respectively collected by the N microphones, so that consistency evaluation can be performed on the N microphones, the influence of inter-microphone consistency on multi-channel speech enhancement algorithms can be eliminated, and the user experience can be improved.

可选地，如图8所示，本申请实施例提供了一种评估麦克风阵列一致性的设备800，包括：Optionally, as shown in FIG. 8 , an embodiment of the present application provides a device 800 for evaluating the consistency of a microphone array, including:

获取单元810，用于获取N个麦克风分别采集的N个音频信号，所述N个麦克风构成麦克风阵列，N≥2；an acquisition unit 810, configured to acquire N audio signals collected by N microphones respectively, where the N microphones constitute a microphone array, and N≥2;

处理单元820，用于根据所述N个音频信号，确定所述N个麦克风中除参考麦克风之外的每个麦克风与所述参考麦克风之间的相位谱差值和/或功率谱差值，所述参考麦克风为所述N个麦克风中的任意一个麦克风；a processing unit 820, configured to determine, according to the N audio signals, a phase spectrum difference value and/or a power spectrum difference value between each of the N microphones except the reference microphone and the reference microphone, The reference microphone is any one of the N microphones;

所述处理单元820，还用于根据所述N个麦克风中除所述参考麦克风之外的每个麦克风与所述参考麦克风之间的相位谱差值和/或功率谱差值，对所述N个麦克风进行一致性评估。The processing unit 820 is further configured to, according to the phase spectrum difference value and/or the power spectrum difference value between each of the N microphones except the reference microphone and the reference microphone, perform a N microphones for consistency evaluation.

可选地，所述处理单元820具体用于：Optionally, the processing unit 820 is specifically configured to:

可选地，所述处理单元820还用于：Optionally, the processing unit 820 is further configured to:

可选地，所述N个音频信号是在播放扫频信号数据的环境下采集的信号。Optionally, the N audio signals are signals collected in an environment of playing frequency sweep signal data.

可选地，所述N个音频信号是在播放高斯白噪声数据或者扫频信号数据的环境下采集的信号。Optionally, the N audio signals are signals collected in an environment of playing Gaussian white noise data or frequency sweep signal data.

可选地，所述扫频信号为线性扫频信号、对数扫频信号、线性步进扫频信号、对数步进扫频信号中的任意一种。Optionally, the frequency sweep signal is any one of a linear frequency sweep signal, a logarithmic frequency sweep signal, a linear step frequency sweep signal, and a logarithmic step frequency sweep signal.

可选地，所述K个信号帧中任意两个相邻信号帧重叠R％，R＞0。Optionally, any two adjacent signal frames in the K signal frames overlap by R%, and R>0.

可选地，所述R为25或者50。Optionally, the R is 25 or 50.

可选地，将第i个音频信号进行分帧，得到长度相等的K个信号帧写成以下向量形式：Optionally, the ith audio signal is divided into frames to obtain K signal frames of equal length and written in the following vector form:

可选地，所述扬声器所播放的扫频信号数据写成以下向量形式：Optionally, the frequency sweep signal data played by the speaker is written in the following vector form:

可选地，所述N个麦克风分别采集到N个音频信号，其中第i个麦克风采集到的音频信号表示为x_i(t)，且x_i(t)可以写成以下向量形式：Optionally, the N microphones collect N audio signals respectively, wherein the audio signal collected by the i-th microphone is represented as x _i (t), and x _i (t) can be written in the following vector form:

可选地，所述获取单元810具体用于：Optionally, the obtaining unit 810 is specifically configured to:

可选地，所述测试房间内具有消音室环境，所述扬声器为音频测试专用人工嘴，且所述人工嘴在使用之前用标准麦克风校准。Optionally, the test room has an anechoic room environment, the speaker is an artificial mouth dedicated to audio testing, and the artificial mouth is calibrated with a standard microphone before use.

可选地，在所述处理单元820控制所述扬声器播放高斯白噪声数据或者扫频信号数据之前，所述获取单元810还用于：Optionally, before the processing unit 820 controls the speaker to play Gaussian white noise data or frequency sweep signal data, the acquiring unit 810 is further configured to:

trigger the processing unit 820 to calculate the signal-to-noise ratio (SNR) according to the formula, and ensure that the SNR is greater than a first threshold.

Optionally, as shown in FIG. 9, an embodiment of the present application provides an apparatus 900 for evaluating the consistency of a microphone array, including:

a memory 910, configured to store programs and data; and

a processor 920, configured to call and run the programs and data stored in the memory;

wherein the apparatus 900 is configured to perform the methods shown in FIGS. 1 to 7 above.

Optionally, as shown in FIG. 10, an embodiment of the present application provides a system 1000 for evaluating the consistency of a microphone array, including:

N microphones constituting a microphone array 1010, where N ≥ 2;

at least one audio source 1020; and

an apparatus 1030, including a memory 1031 configured to store programs and data and a processor 1032 configured to call and run the programs and data stored in the memory, the apparatus 1030 being configured to perform the methods shown in FIGS. 1 to 7 above.

It should be understood that, in the various embodiments of the present application, the magnitude of the sequence numbers of the above processes does not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for evaluating microphone array consistency, comprising:

acquiring N audio signals respectively collected by N microphones, wherein the N microphones constitute a microphone array and N ≥ 2;

determining, according to the N audio signals, a phase spectrum difference and/or a power spectrum difference between each of the N microphones other than a reference microphone and the reference microphone, wherein the reference microphone is any one of the N microphones; and

performing consistency evaluation on the N microphones according to the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.

2. The method according to claim 1, wherein performing consistency evaluation on the N microphones according to the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:

evaluating, according to the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone, the phase consistency between the corresponding microphone and the reference microphone.

3. The method according to claim 2, further comprising:

measuring, for each of the N microphones other than the reference microphone, the difference between the distance from that microphone to the sound source and the distance from the reference microphone to the sound source;

calculating, according to the measured distance differences, a fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone; and

calibrating the phase spectrum difference corresponding to each of the N microphones other than the reference microphone according to the fixed phase difference between that microphone and the reference microphone.

4. The method according to claim 3, wherein calculating, according to the measured distance differences, the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone comprises:

calculating, according to the formula, the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone,

wherein Y_i(ω) denotes the frequency spectrum of the i-th microphone, Y₁(ω) denotes the frequency spectrum of the reference microphone, ω denotes frequency, d_i denotes the difference between the distances from the i-th microphone and from the reference microphone to the sound source, c denotes the speed of sound, and 2πωd_i/c denotes the fixed phase difference between the i-th microphone and the reference microphone.
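As an illustrative, non-limiting sketch, the fixed phase difference 2πωd_i/c named in claim 4 and its use for calibration in claim 3 could be computed as follows. The function names, the default speed of sound of 343 m/s, and the choice to calibrate by subtracting the fixed term and wrapping the result into (-π, π] are assumptions made for this example; the claims do not state the sign convention.

```python
import numpy as np

def fixed_phase_difference(freqs_hz, distance_diff_m, speed_of_sound=343.0):
    """Fixed phase difference 2*pi*f*d_i/c in radians for each frequency f in Hz."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    return 2.0 * np.pi * freqs_hz * distance_diff_m / speed_of_sound

def calibrate_phase_difference(phase_diff, freqs_hz, distance_diff_m, speed_of_sound=343.0):
    """Remove the geometry-induced phase term from a measured phase spectrum difference.

    Assumption: the fixed term is subtracted; the opposite sign may apply depending
    on how the measured phase difference is defined.
    """
    fixed = fixed_phase_difference(freqs_hz, distance_diff_m, speed_of_sound)
    corrected = np.asarray(phase_diff, dtype=float) - fixed
    # Wrap the corrected phase into the interval (-pi, pi].
    return np.angle(np.exp(1j * corrected))

freqs = np.array([100.0, 343.0, 1000.0])
fixed = fixed_phase_difference(freqs, distance_diff_m=0.01)
residual = calibrate_phase_difference(fixed, freqs, distance_diff_m=0.01)
```

If the measured phase difference consists only of the geometric term, the residual after calibration is zero at every frequency, so any remaining phase difference can be attributed to the microphones themselves.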

5. The method according to any one of claims 1 to 4, wherein performing consistency evaluation on the N microphones according to the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:

evaluating, according to the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone, the amplitude consistency between the corresponding microphone and the reference microphone.

6. The method according to any one of claims 2 to 4, wherein the N audio signals are signals collected in an environment in which frequency sweep signal data is played.

7. The method according to claim 5, wherein the N audio signals are signals collected in an environment in which Gaussian white noise data or frequency sweep signal data is played.

8. The method according to claim 6 or 7, wherein the frequency sweep signal is any one of a linear frequency sweep signal, a logarithmic frequency sweep signal, a linear stepped frequency sweep signal, and a logarithmic stepped frequency sweep signal.

9. The method according to any one of claims 1 to 8, wherein determining, according to the N audio signals, the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:

dividing each of the N audio signals into frames to obtain K signal frames of equal length, where K ≥ 2;

windowing each of the K signal frames to obtain K windowed signal frames;

applying an FFT to each of the K windowed signal frames to obtain K target signal frames; and

determining, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.
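As an illustrative, non-limiting sketch, the framing, windowing, and FFT steps of claim 9 could be chained as follows. The Hann window, the use of a real-input FFT, and the names `target_signal_frames`, `frame_len`, and `overlap_pct` are assumptions made for this example; the claim does not name a particular window function.

```python
import numpy as np

def target_signal_frames(x, frame_len, overlap_pct=50, n_fft=None):
    """Frame a signal, window each frame, and return the FFT of every windowed frame.

    Returns a complex array of shape (K, n_fft // 2 + 1).
    """
    x = np.asarray(x, dtype=float)
    n_fft = frame_len if n_fft is None else n_fft
    hop = max(1, int(round(frame_len * (1.0 - overlap_pct / 100.0))))
    k = 1 + (len(x) - frame_len) // hop
    if k < 2:
        raise ValueError("the signal must yield at least K = 2 frames")
    # Step 1: K signal frames of equal length.
    frames = np.stack([x[j * hop: j * hop + frame_len] for j in range(k)])
    # Step 2: windowing (Hann window chosen here as an assumption).
    windowed = frames * np.hanning(frame_len)
    # Step 3: FFT of each windowed frame (one-sided spectrum of a real signal).
    return np.fft.rfft(windowed, n=n_fft, axis=1)

fs = 8000                                   # hypothetical sampling frequency in Hz
t = np.arange(1024) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)       # 1 kHz test tone
Y = target_signal_frames(tone, frame_len=256, overlap_pct=50)
peak_bin = int(np.argmax(np.abs(Y[0])))     # bin spacing is fs / 256 = 31.25 Hz
```

Running the same function on every microphone channel with identical parameters yields target signal frames that can be compared bin by bin against those of the reference microphone.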

10. The method according to claim 9, wherein any two adjacent signal frames among the K signal frames overlap by R%, where R > 0.

11. The method according to claim 10, wherein R is 25 or 50.

12. The method according to any one of claims 9 to 11, wherein the i-th audio signal is divided into frames, and the resulting K signal frames of equal length are written in the following vector form:

x_i(t)=[x_i,1(t),x_i,2(t),…,x_i,K(t)]^T

wherein x_i(t) denotes the i-th audio signal, K denotes the total number of frames of the signal collected by each microphone, and [·]^T denotes the transpose of a vector or matrix.

13. The method according to any one of claims 9 to 12, wherein determining, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:

determining, according to the formula, the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone,

wherein imag(·) denotes taking the imaginary part and ln(·) denotes taking the natural logarithm, and the remaining symbols of the formula denote, respectively, the phase spectrum difference between the i-th microphone and the reference microphone, the j-th target signal frame of the reference microphone, the j-th target signal frame of the i-th microphone, and the fundamental frequency.
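As an illustrative, non-limiting sketch, a phase spectrum difference built from the imag(ln(·)) operation named in claim 13 could look as follows. Because the formula itself is not reproduced in this text, the specific combination used here (the frame-averaged cross-spectrum of the reference frames and the conjugated frames of the i-th microphone) is an assumption, as are the function and argument names.

```python
import numpy as np

def phase_spectrum_difference(Y_ref, Y_i):
    """Phase difference per frequency bin between a reference channel and channel i.

    Y_ref, Y_i: complex arrays of shape (K, n_bins) holding the target signal frames.
    Assumption: frames are combined by averaging the cross-spectrum over the K frames.
    """
    Y_ref = np.asarray(Y_ref, dtype=complex)
    Y_i = np.asarray(Y_i, dtype=complex)
    cross = np.mean(Y_ref * np.conj(Y_i), axis=0)
    # imag(ln(z)) is the principal argument of z, i.e. its phase in (-pi, pi].
    return np.imag(np.log(cross))

# Two synthetic channels whose phases differ by 0.2 rad in every bin.
Y_ref = np.exp(1j * 0.3) * np.ones((4, 5))
Y_i = np.exp(1j * 0.1) * np.ones((4, 5))
pd = phase_spectrum_difference(Y_ref, Y_i)
```

Bins in which the averaged cross-spectrum is exactly zero carry no phase information and would need to be excluded before the logarithm is taken.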

14. The method according to any one of claims 9 to 13, wherein determining, according to the K target signal frames corresponding to each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:

determining the power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal; and

determining, according to the power spectrum of each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.

15. The method according to claim 14, wherein determining the power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal comprises:

calculating the power spectrum of each audio signal according to the formula,

wherein P_i(ω) denotes the power spectrum of the i-th audio signal, Y_i,j(ω) denotes the j-th target signal frame of the i-th audio signal, K denotes the total number of frames of the signal collected by each microphone, and ω denotes frequency.

16. The method according to claim 14 or 15, wherein determining, according to the power spectrum of each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:

calculating, according to the formula PD_i(ω)=P₁(ω)-P_i(ω), the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone,

wherein PD_i(ω) denotes the power spectrum difference between the i-th microphone and the reference microphone, P₁(ω) denotes the power spectrum of the reference microphone, and P_i(ω) denotes the power spectrum of the i-th microphone.
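As an illustrative, non-limiting sketch, the power spectrum difference PD_i(ω)=P₁(ω)-P_i(ω) of claim 16 could be computed as follows. The power spectrum estimate used here, the mean of |Y_i,j(ω)|² over the K target signal frames, is an assumption, since the formula of claim 15 is not reproduced in this text; the function names are likewise hypothetical.

```python
import numpy as np

def power_spectrum(Y):
    """Power spectrum of one channel from its K target signal frames.

    Y: complex array of shape (K, n_bins).
    Assumption: the squared magnitudes are averaged over the K frames.
    """
    return np.mean(np.abs(np.asarray(Y)) ** 2, axis=0)

def power_spectrum_difference(P_ref, P_i):
    """PD_i = P_1 - P_i, evaluated per frequency bin."""
    return np.asarray(P_ref) - np.asarray(P_i)

# Synthetic frames: the reference channel has twice the amplitude of channel i.
Y_ref = 2.0 * np.ones((3, 4), dtype=complex)
Y_i = np.ones((3, 4), dtype=complex)
PD = power_spectrum_difference(power_spectrum(Y_ref), power_spectrum(Y_i))
```

A difference close to zero across the band of interest indicates good amplitude consistency between the i-th microphone and the reference microphone.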

17. The method according to any one of claims 1 to 16, wherein acquiring the N audio signals respectively collected by the N microphones comprises:

determining a sampling frequency F_s used by the N microphones for audio signal sampling and a number of FFT points N_fft, playing Gaussian white noise data or frequency sweep signal data using a speaker, and collecting the N audio signals with the N microphones, wherein, if the data played by the speaker is frequency sweep signal data, the frequency sweep signal data consists of M+1 segments of signals of equal length and different frequencies,

18. The method according to claim 17, wherein:

the frequency of each of the M+1 segments of signals is calculated according to the formula, and

each of the M+1 segments of signals is calculated according to the formula S_i(t)=sin(2πf_i t),

wherein f_i denotes the frequency of the i-th segment of signal, F_s denotes the sampling frequency, N_fft denotes the number of FFT points, S_i(t) denotes the i-th segment of signal, and the length of S₁(t) is an integer multiple of the period T, where T=1/f₁.

19. The method according to claim 18, wherein the frequency sweep signal data played by the speaker is written in the following vector form:

S(t)=[S₀(t),S₁(t),…,S_M(t)]^T

wherein S(t) denotes the frequency sweep signal data played by the speaker, S_i(t) denotes the i-th segment of signal, and [·]^T denotes the transpose of a vector or matrix.
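As an illustrative, non-limiting sketch, a stepped frequency sweep assembled from equal-length segments S_i(t)=sin(2πf_i t), as in claims 18 and 19, could be generated as follows. The segment frequencies are supplied by the caller because the formula for f_i is not reproduced in this text; the function name `stepped_sweep` and the example values are assumptions.

```python
import numpy as np

def stepped_sweep(freqs_hz, fs, seg_len):
    """Concatenate equal-length sine segments, one per frequency in freqs_hz.

    Each segment is S_i(t) = sin(2*pi*f_i*t) with t restarting at zero,
    sampled at fs Hz for seg_len samples.
    """
    t = np.arange(seg_len) / float(fs)
    segments = [np.sin(2.0 * np.pi * f * t) for f in freqs_hz]
    return np.concatenate(segments)

# Hypothetical example: two segments of 8 samples each at an 8 kHz sampling rate.
sweep = stepped_sweep([1000.0, 2000.0], fs=8000, seg_len=8)
```

Choosing each segment length as a whole number of periods of its frequency, as claim 18 requires for the period T, avoids a discontinuity at the segment boundaries.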

20. The method according to any one of claims 1 to 19, wherein the N microphones respectively collect the N audio signals, the audio signal collected by the i-th microphone is denoted x_i(t), and x_i(t) can be written in the following vector form:

x_i(t)=[x_i,1(t),x_i,2(t),…,x_i,K(t)]^T

wherein x_i(t) denotes the audio signal collected by the i-th microphone, K denotes the total number of frames of the signal collected by each microphone, and [·]^T denotes the transpose of a vector or matrix.

21. The method according to any one of claims 1 to 20, wherein acquiring the N audio signals respectively collected by the N microphones comprises:

placing the N microphones in a test room in which a speaker is arranged, the N microphones being located directly in front of the speaker; and

controlling the speaker to play Gaussian white noise data or frequency sweep signal data, and controlling the N microphones to respectively collect the N audio signals.

22. The method according to claim 21, wherein the test room provides an anechoic chamber environment, the speaker is an artificial mouth dedicated to audio testing, and the artificial mouth is calibrated with a standard microphone before use.

23. The method according to claim 21 or 22, wherein before controlling the speaker to play Gaussian white noise data or frequency sweep signal data, the method further comprises:

acquiring, in a quiet environment, first audio data X₁(n) collected by the N microphones within a first duration T₁;

acquiring, in an environment in which Gaussian white noise data or frequency sweep signal data is played, second audio data X₂(n) collected by the N microphones within a second duration T₂; and

calculating a signal-to-noise ratio (SNR) according to the formula, and ensuring that the SNR is greater than a first threshold.
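As an illustrative, non-limiting sketch, the SNR check of claim 23 could be carried out as follows. The SNR formula is not reproduced in this text, so the conventional definition used here, ten times the base-10 logarithm of the ratio of the mean power of X₂(n) to the mean power of X₁(n), is an assumption, as are the function names and the 15 dB threshold in the example.

```python
import numpy as np

def snr_db(quiet, playback):
    """Estimate the SNR in dB from a quiet recording and a recording made during playback.

    Assumption: SNR = 10 * log10(mean(playback**2) / mean(quiet**2)).
    """
    noise_power = np.mean(np.square(np.asarray(quiet, dtype=float)))
    total_power = np.mean(np.square(np.asarray(playback, dtype=float)))
    if noise_power == 0.0:
        raise ValueError("quiet recording has zero power; SNR is undefined")
    return 10.0 * np.log10(total_power / noise_power)

def snr_exceeds(quiet, playback, threshold_db):
    """Return True when the estimated SNR is greater than the threshold."""
    return bool(snr_db(quiet, playback) > threshold_db)

x1 = 0.1 * np.ones(100)   # hypothetical first audio data (quiet environment)
x2 = np.ones(200)         # hypothetical second audio data (during playback)
value = snr_db(x1, x2)
ok = snr_exceeds(x1, x2, threshold_db=15.0)
```

Using mean power rather than total energy keeps the estimate independent of the two durations T₁ and T₂, which need not be equal.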

24. A device for evaluating microphone array consistency, comprising:

an acquiring unit, configured to acquire N audio signals respectively collected by N microphones, wherein the N microphones constitute a microphone array and N ≥ 2; and

a processing unit, configured to determine, according to the N audio signals, a phase spectrum difference and/or a power spectrum difference between each of the N microphones other than a reference microphone and the reference microphone, wherein the reference microphone is any one of the N microphones;

wherein the processing unit is further configured to perform consistency evaluation on the N microphones according to the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.

25. The device according to claim 24, wherein the processing unit is specifically configured to:

26. The device according to claim 25, wherein the processing unit is further configured to:

27. The device according to claim 26, wherein the processing unit is specifically configured to:

28. The device according to any one of claims 24 to 27, wherein the processing unit is specifically configured to:

29. The device according to any one of claims 25 to 27, wherein the N audio signals are signals collected in an environment in which frequency sweep signal data is played.

30. The device according to claim 28, wherein the N audio signals are signals collected in an environment in which Gaussian white noise data or frequency sweep signal data is played.

31. The device according to claim 29 or 30, wherein the frequency sweep signal is any one of a linear frequency sweep signal, a logarithmic frequency sweep signal, a linear stepped frequency sweep signal, and a logarithmic stepped frequency sweep signal.

32. The device according to any one of claims 24 to 31, wherein the processing unit is specifically configured to:

33. The device according to claim 32, wherein any two adjacent signal frames among the K signal frames overlap by R%, where R > 0.

34. The device according to claim 33, wherein R is 25 or 50.

35. The device according to any one of claims 32 to 34, wherein the i-th audio signal is divided into frames, and the resulting K signal frames of equal length are written in the following vector form:

x_i(t)=[x_i,1(t),x_i,2(t),…,x_i,K(t)]^T

36. The device according to any one of claims 32 to 35, wherein the processing unit is specifically configured to:

37. The device according to any one of claims 32 to 36, wherein the processing unit is specifically configured to:

38. The device according to claim 37, wherein the processing unit is specifically configured to:

calculate the power spectrum of each audio signal according to the formula,

39. The device according to claim 37 or 38, wherein the processing unit is specifically configured to:

40. The device according to any one of claims 24 to 39, wherein the processing unit is specifically configured to:

determine a sampling frequency F_s used by the N microphones for audio signal sampling and a number of FFT points N_fft, play Gaussian white noise data or frequency sweep signal data using a speaker, and control the N microphones to collect the N audio signals, wherein, if the data played by the speaker is frequency sweep signal data, the frequency sweep signal data consists of M+1 segments of signals of equal length and different frequencies,

41. The device according to claim 40, wherein the processing unit is further configured to:

42. The device according to claim 41, wherein the frequency sweep signal data played by the speaker is written in the following vector form:

S(t)=[S₀(t),S₁(t),…,S_M(t)]^T

43. The device according to any one of claims 24 to 42, wherein the N microphones respectively collect the N audio signals, the audio signal collected by the i-th microphone is denoted x_i(t), and x_i(t) can be written in the following vector form:

x_i(t)=[x_i,1(t),x_i,2(t),…,x_i,K(t)]^T

44. The device according to any one of claims 24 to 43, wherein the acquiring unit is specifically configured to:

45. The device according to claim 44, wherein the test room provides an anechoic chamber environment, the speaker is an artificial mouth dedicated to audio testing, and the artificial mouth is calibrated with a standard microphone before use.

46. The device according to claim 44 or 45, wherein before the processing unit controls the speaker to play Gaussian white noise data or frequency sweep signal data, the acquiring unit is further configured to:

trigger the processing unit to calculate a signal-to-noise ratio (SNR) according to the formula, and ensure that the SNR is greater than a first threshold.

47. An apparatus for evaluating microphone array consistency, comprising:

a memory, configured to store programs and data; and

a processor, configured to call and run the programs and data stored in the memory;

wherein the apparatus is configured to perform the method according to any one of claims 1 to 23.

48. A system for evaluating microphone array consistency, comprising:

N microphones constituting a microphone array, where N ≥ 2;

at least one audio source; and

an apparatus, including a memory configured to store programs and data and a processor configured to call and run the programs and data stored in the memory, wherein the apparatus is configured to:

perform the method according to any one of claims 1 to 23.