CN102411138A

CN102411138A - A method for robot sound source localization

Info

Publication number: CN102411138A
Application number: CN2011101958620A
Authority: CN
Inventors: 刘宏; 沈苗; 李晓飞
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2011-07-13
Filing date: 2011-07-13
Publication date: 2012-04-11

Abstract

The invention discloses a robot sound source positioning method, which comprises the steps that 1) a robot adopts at least one pair of microphones to collect sound source signals, and the signals are converted into digital signals and then stored; 2) the stored digital signals are subjected to a sound source localization calculation by a modified weighted cross-correlation calculation. The method reduces the influence of interference factors such as background noise, reverberation and the like by improving the weighting function in the cross-correlation operation, and improves the positioning accuracy. And the microphone is simple to set, and a small number of microphone pairs can be set to obtain corresponding positioning effect.

Description

A method for robot sound source localization

技术领域 technical field

本发明涉及一种应用在机器人领域的音频处理技术，特别是涉及机器人的声源定位方法。属于信息技术领域。The invention relates to an audio processing technology applied in the field of robots, in particular to a sound source localization method for robots. It belongs to the field of information technology.

背景技术 Background technique

在机器人对周围环境的感知系统中，视觉感知和听觉感知是两种最重要的信息来源方式。而听觉感知对于视觉感知系统最大的优势是数据量小，因此对机器人计算机的运算能力的要求更低，更便于应用于移动机器人领域。随着科技的发展，音频技术在机器人领域的应用除了大家所熟知的基于语音识别系统的人机交互，声源定位技术也成为热门研究方向，特别是对于自动机器人而言。其核心问题是让机器人准确、迅速的对声源发出的声音信息做出响应。而利用较少的麦克风对、以简单的定位方法、较小的数据计算量，获取高实时性、准确定位结果就是移动机器人的声源定位技术要解决的最紧迫的问题。In the robot's perception system of the surrounding environment, visual perception and auditory perception are the two most important sources of information. The biggest advantage of auditory perception over visual perception system is that the amount of data is small, so the requirements for the computing power of the robot computer are lower, and it is easier to apply to the field of mobile robots. With the development of science and technology, the application of audio technology in the field of robots, in addition to the well-known human-computer interaction based on speech recognition systems, sound source localization technology has also become a popular research direction, especially for automatic robots. Its core problem is to make the robot respond accurately and quickly to the sound information from the sound source. The most urgent problem to be solved by the sound source localization technology of mobile robots is to obtain high real-time and accurate localization results by using fewer microphone pairs, simple localization methods, and a small amount of data calculation.

声源方位的识别是人和动物对环境感知的一种基本技能，人类所拥有声源定位能力的机制主要是利用声音传到双耳的时间、相位和强度的差异来判断声源的位置。声源定位技术就是根据麦克风接收到的声音数据确定声源的位置，现有的声源定位技术也基本都是围绕这三种参数进行，最常用的方法是计算声音到达不同的麦克风的时间差来确定声源的位置。但因为日常的环境中通常存在噪声、声源的反射与折射、多径效应、混响等一些环境参数的影响，以及移动机器人步进电机等本身发出的噪声及震动的影响，使得对声源的准确定位变得十分困难。The recognition of the direction of the sound source is a basic skill for humans and animals to perceive the environment. The mechanism of the sound source localization ability of humans is mainly to use the difference in the time, phase and intensity of the sound reaching the ears to judge the position of the sound source. Sound source localization technology is to determine the position of the sound source based on the sound data received by the microphone. The existing sound source localization technology is basically based on these three parameters. The most common method is to calculate the time difference between the sound arriving at different microphones. Determine the location of the sound source. However, due to the influence of some environmental parameters such as noise, reflection and refraction of the sound source, multipath effect, reverberation, etc. in the daily environment, as well as the influence of the noise and vibration emitted by the stepping motor of the mobile robot itself, the impact on the sound source Accurate positioning becomes very difficult.

通常的声源定位技术都是基于时间差的方法，即计算声源发出的声音到达麦克风阵列中(麦克风数量大于两个)不同的麦克风之间的时间差。如图1所示。Common sound source localization techniques are all based on time difference methods, that is, calculating the time difference between the arrival of sound from a sound source at different microphones in a microphone array (the number of microphones is greater than two). As shown in Figure 1.

从声源发出的声音到达两个麦克风的信号可以分别表示为：The signals from the sound from the sound source to the two microphones can be expressed as:

x₁(t)＝s₁(t)+n₁(t) (1)x ₁ (t)=s ₁ (t)+n ₁ (t) (1)

x₂(t)＝αs₁(t+D)+n₂(t) (2)x ₂ (t)=αs ₁ (t+D)+n ₂ (t) (2)

其中x₁(t)、x₂(t)分别表示在时刻t到达两个麦克风的声音信号，如果将一只麦克风接收到声源发出的声音表示为s₁(t)，则我们可以将另外一只麦克风接收到声源发出的声音为αs₂(t+D)。n₁(t)、n₂(t)分别为两只麦克风接收到的噪声，D为声源发出的声音到达两只麦克风的时间差。知道了到达的时间差D，同时两个麦克风之间的距离D1为已知的值，计算两个麦克风之间的时间差将在三维空间上得到一个符合条件的双曲面，两个麦克风还不能完全准确定位声源的具体位置，因此在实际应用中采用多对麦克风阵列的形式来收集声音信号，计算声源相对于每对麦克风在三维空间上的多个双曲面，多对麦克风分别对应的双曲面的交叉点就是声源相对于机器人的准确位置。Among them, x ₁ (t) and x ₂ (t) respectively denote the sound signals arriving at the two microphones at time t. If the sound from the sound source received by one microphone is denoted as s ₁ (t), then we can express the other A microphone receives the sound from the sound source as αs ₂ (t+D). n ₁ (t) and n ₂ (t) are the noises received by the two microphones respectively, and D is the time difference between the sound from the sound source arriving at the two microphones. Knowing the arrival time difference D, and the distance D1 between the two microphones is a known value, calculating the time difference between the two microphones will get a qualified hyperboloid in three-dimensional space, and the two microphones are not yet completely accurate Locate the specific position of the sound source, so in practical applications, multiple pairs of microphone arrays are used to collect sound signals, and the multiple hyperboloids of the sound source relative to each pair of microphones in three-dimensional space, and the hyperboloids corresponding to each pair of microphones The intersection point of is the exact location of the sound source relative to the robot.

麦克风阵列技术是语音信号处理的一个新领域，麦克风阵列由多个麦克风按照一定的拓扑结构组成，典型的如图2所示十字形麦克风阵列，分别计算信号到达两组麦克风的时间差，可以计算得到声源相对于麦克风阵列的位置。Microphone array technology is a new field of speech signal processing. The microphone array is composed of multiple microphones according to a certain topology. A typical cross-shaped microphone array is shown in Figure 2. The time difference between the signals arriving at the two groups of microphones can be calculated separately, which can be calculated as The position of the sound source relative to the microphone array.

通常计算时间差D的方法为对两只麦克风信号进行互相关运算，首先将麦克风采集的信号进行模数转换，转换为数字信号，再对数字信号进行互相关的运算。对数字信号进行互相关的运算通常有时域与频域两种方法。时域上的方法如下：Usually, the method of calculating the time difference D is to perform a cross-correlation operation on two microphone signals. Firstly, the signal collected by the microphone is converted into a digital signal by analog-to-digital conversion, and then the cross-correlation operation is performed on the digital signal. There are usually two methods for performing cross-correlation operations on digital signals: time domain and frequency domain. The method in the time domain is as follows:

${R R}_{{x x}_{11} {x x}_{22}} [[m m]] = = {Σ Σ}_{n no = = 00}^{N N - - m m - - 11} x x [[n no + + m m]] y the y [[n no]] - - - - - - ((33))$

其中

为互相关运算的结果，N为两信号长度之和。通过两路的麦克风信号进行互相关运算，并求得互相关运算的最大值。根据最大值的位置确定两路信号的时间差D。in

is the result of the cross-correlation operation, and N is the sum of the lengths of the two signals. The cross-correlation operation is performed through the two-way microphone signals, and the maximum value of the cross-correlation operation is obtained. Determine the time difference D of the two signals according to the position of the maximum value.

在时间域上对信号进行互相关运算的运算复杂度较高，可以将信号进行傅里叶变换，将信号变换到频率域上进行运算。The computational complexity of performing cross-correlation calculations on signals in the time domain is relatively high, and the signals can be Fourier transformed and transformed into the frequency domain for calculations.

首先将两路信号分别进行傅里叶变换，然后对两路信号傅里叶变换的结果进行互相关运算：First, the two signals are subjected to Fourier transform respectively, and then the cross-correlation operation is performed on the results of the Fourier transform of the two signals:

$S S ((ω ω)) = = {Σ Σ}_{m m = = 11}^{N N} x x [[n no]] {exp exp}^{((- - jwm jwm))} - - - - - - ((44))$

${R R}_{{x x}_{11} {x x}_{22}}^{' '} = = ((ω ω)) = = S S (({ω ω}_{11})) S S {(({ω ω}_{22}))}^{* *} - - - - - - ((55))$

其中x[n]为时间域的信号，ω为频率变量，j表示复数虚部的一个单位，m是一个变量，表示从1到n的一个数，S(ω)为经过傅里叶变换后得到的频率域的信号。

为两路信号在频率域进行互相关运算的结果。Where x[n] is a signal in the time domain, ω is a frequency variable, j represents a unit of the imaginary part of a complex number, m is a variable representing a number from 1 to n, and S(ω) is the Fourier transformed The resulting signal in the frequency domain.

It is the result of the cross-correlation operation on the two signals in the frequency domain.

然后将计算结果进行傅里叶逆变换再变换到时域，对逆变换的结果搜索最大值的位置，即可确定两路信号的时间差D。Then perform inverse Fourier transform on the calculation result and then transform it into the time domain, search for the position of the maximum value on the result of the inverse transform, and then determine the time difference D of the two signals.

在语音定位系统中，麦克风接收到的信号容易受到背景噪声、声音的反射、折射、混响等一系列环境因素的影响，使定位的准确度大大降低，为了减小这些外在因素的影响，现有技术在对信号进行相关性运算之前进行整体的加权处理，即In the voice positioning system, the signal received by the microphone is easily affected by a series of environmental factors such as background noise, sound reflection, refraction, reverberation, etc., which greatly reduces the accuracy of positioning. In order to reduce the influence of these external factors, In the prior art, the overall weighting process is carried out before the correlation operation is performed on the signal, that is,

${R R}_{{x x}_{11} {x x}_{22}}^{' '} ((ω ω)) = = {W W}_{n no} ((ω ω)) S S (({ω ω}_{11})) S S {(({ω ω}_{22}))}^{* *} - - - - - - ((66))$

现有技术中的加权函数只考虑了背景噪声、混响等一系列环境因素的影响，并未考虑声源本身的特性。同时现有方法对所有频率点都进行了相关性运算，复杂度高，并且浪费了大量的计算成本与时间。The weighting function in the prior art only considers the influence of a series of environmental factors such as background noise and reverberation, and does not consider the characteristics of the sound source itself. At the same time, the existing method performs a correlation operation on all frequency points, which has high complexity and wastes a lot of calculation cost and time.

发明内容 Contents of the invention

传统的互相关运算中时域转化到频域的相位转换加权函数为：The weighting function of the phase transformation from the time domain to the frequency domain in the traditional cross-correlation operation is:

${W W}_{n no} ((ω ω)) = = \frac{11}{| | {φ φ}_{1212} ((ω ω)) | |} - - - - - - ((77))$

这个公式展示了普通的相位转换加权函数，φ₁₂(ω)为信号的互功率谱密度函数。实际应用中这种加权函数无法抵御较大的噪音和混响的影响。下面详细描述本发明所涉及的算法改进：This formula shows the general phase transformation weighting function, φ ₁₂ (ω) is the cross power spectral density function of the signal. In practical applications, this weighting function cannot resist the influence of large noise and reverberation. The algorithm improvement involved in the present invention is described in detail below:

1.两路麦克风接收到的信号的互功率谱密度函数为：1. The cross power spectral density function of the signals received by the two microphones is:

${φ φ}_{1212} ((ω ω)) = = {α α}_{11} {α α}_{22} S S ((ω ω)) S S {((ω ω))}^{* *} {e e}^{- - jω jω (({τ τ}_{11} - - {τ τ}_{22}))}$

$+ + {α α}_{11} S S ((ω ω)) {e e}^{- - jω jω {τ τ}_{11}} {N N}_{22} {((ω ω))}^{* *} + + {α α}_{22} S S {((ω ω))}^{* *} {e e}^{jω jω {τ τ}_{22}} {N N}_{11} ((ω ω)) + + {N N}_{11} ((ω ω)) {N N}_{22} {((ω ω))}^{* *} - - - - - - ((88))$

在实际环境中，这个结果里面不但包含了声源有效信号的互功率谱密度，还包含了噪音的互功率谱密度，其中S(ω)为声源有效信号的功率谱，N₁(ω)、N₂(ω)为噪声的功率谱。当信噪比降低时，噪声的比例就会增大，信号中声源有效信号部分所占的比例就会减小。此外，当信号能量较小，即信噪比较低时，加权函数的分母部分会变小使得整个加权函数会增大数倍，与互功率谱密度函数相乘时会导致误差扩大，针对这些问题可以看出信噪比的高低是影响整个加权函数准确性的一个很重要的参数。预先测定机器人所处环境的信噪比SNR情况，设计了一个参数ρ，加权函数被改进为：In the actual environment, this result not only includes the cross-power spectral density of the effective signal of the sound source, but also includes the cross-power spectral density of the noise, where S(ω) is the power spectrum of the effective signal of the sound source, N ₁ (ω) , N ₂ (ω) is the power spectrum of the noise. When the signal-to-noise ratio decreases, the proportion of noise increases, and the proportion of the effective signal part of the sound source in the signal decreases. In addition, when the signal energy is small, that is, the signal-to-noise ratio is low, the denominator part of the weighting function will become smaller so that the entire weighting function will increase several times, and when multiplied with the cross-power spectral density function, the error will expand. For these It can be seen that the signal-to-noise ratio is a very important parameter that affects the accuracy of the entire weighting function. The signal-to-noise ratio (SNR) of the robot's environment is determined in advance, and a parameter ρ is designed, and the weighting function is improved as:

${W W}_{n no}^{* *} ((ω ω)) = = \frac{11}{{| | {φ φ}_{1212} ((ω ω)) | |}^{ρ ρ}} - - - - - - ((99))$

ρ的值是通过在房间中多次进行实验测试估计出的，这个值依赖于空间环境的信噪比SNR，通过多次实验发现，随着从0到1不断的增大，信号的信噪比也越强，声音中有效信号部分所占的比例也越大。因此，本发明设计了信噪比SNR与ρ的对应关系，不同信噪比取不同的ρ值，信噪比越高ρ值越大。The value of ρ is estimated through multiple experimental tests in the room. This value depends on the signal-to-noise ratio SNR of the space environment. Through multiple experiments, it is found that as the value increases from 0 to 1, the signal-to-noise ratio of the signal decreases. The stronger the ratio, the greater the proportion of the effective signal portion of the sound. Therefore, the present invention designs the corresponding relationship between SNR and ρ, different values of ρ are used for different SNRs, and the higher the SNR, the larger the value of ρ.

2.为了进一步抵御信号相干带来的干扰，对加权函数进一步改进。2. In order to further resist the interference caused by signal coherence, the weighting function is further improved.

如果加权函数用基本的相位转换，那么互功率谱密度函数，即加权函数的分母在信号能量很小时会趋近于为0，误差会被无限增大。这里提出相干函数的概念，相干函数值的大小与信号值的大小无关，因而易于比较不同的相关程度。它不仅可以表示同相相干性，而且可以表示移相(或延时)相干性。相干函数的公式为：If the weighting function uses basic phase conversion, then the cross-power spectral density function, that is, the denominator of the weighting function will tend to be 0 when the signal energy is small, and the error will be infinitely increased. The concept of coherence function is proposed here. The value of the coherence function has nothing to do with the value of the signal, so it is easy to compare different degrees of correlation. It can represent not only in-phase coherence, but also phase-shifted (or time-delayed) coherence. The formula for the coherence function is:

${γ γ}_{1212}^{22} ((ω ω)) = = \frac{{| | {φ φ}_{1212} | |}^{22}}{{φ φ}_{11} ((ω ω)) {φ φ}_{22} ((ω ω))} - - - - - - ((1010))$

如果相干函数为零，表示输出信号与输入信号不相干，当相干函数为1时，表示输出信号与输入信号完全相干。若相干函数在0-1之间，则表明有如下三种可能：If the coherence function is zero, it means that the output signal is not coherent with the input signal, and when the coherence function is 1, it means that the output signal is completely coherent with the input signal. If the coherence function is between 0-1, it indicates that there are three possibilities as follows:

(1)测试中有外界噪声干扰；(1) There is external noise interference during the test;

(2)输出y(t)是输入x(t)和其它输入的综合输出；(2) The output y(t) is the comprehensive output of the input x(t) and other inputs;

(3)联系x(t)和y(t)的系统是非线性的。(3) The system connecting x(t) and y(t) is nonlinear.

信号相干性差，则给Φ加的分量也小，这样不但减小了由于分母趋近于零造成的误差，而且不会对Φ所表征的互功率谱产生干扰。将加权函数修改为：If the signal coherence is poor, the component added to Φ is also small, which not only reduces the error caused by the denominator approaching zero, but also does not interfere with the cross power spectrum represented by Φ. Modify the weighting function to:

${W W}_{n no}^{* *} ((ω ω)) = = \frac{11}{{| | {φ φ}_{1212} ((ω ω)) | |}^{ρ ρ} + + | | {γ γ}_{1212}^{22} ((ω ω)) | |} 00 \leq \leq ρ ρ \leq \leq 11 - - - - - - ((1111))$

基于上述分析，我们针对现有技术提出了一种新的声源定位的方法，在互相关运算的基础上对相位转换加权函数进行改进。应用于移动机器人听觉领域实时进行声源定位。克服了现有技术中的运算量太大，容易受噪声影响等缺点，取得了良好的效果。Based on the above analysis, we propose a new sound source localization method based on the existing technology, and improve the phase conversion weighting function on the basis of the cross-correlation operation. It is applied to the hearing field of mobile robots for real-time sound source localization. The invention overcomes the disadvantages of the prior art, such as too much calculation amount, being easily affected by noise, etc., and achieves good results.

本发明提出的机器人声源定位方法，通过以下技术方案实现：The robot sound source localization method proposed by the present invention is realized through the following technical solutions:

1)机器人采用至少一对麦克风采集声源信号，并将信号转换为数字信号后存储；1) The robot uses at least one pair of microphones to collect sound source signals, convert the signals into digital signals and store them;

2)对1中存储的数字信号进行下面的声源定位计算：2) Carry out the following sound source localization calculation to the digital signal stored in 1:

a)对每个麦克风采集到的数据信号进行傅里叶变换，将信号从时域变换到频域：a) Fourier transform is performed on the data signal collected by each microphone, and the signal is transformed from the time domain to the frequency domain:

$S S ((ω ω)) = = {Σ Σ}_{n no = = 11}^{N N} x x [[n no]] exp exp ((- - jwn jwn)) - - - - - - ((1212))$

其中x[n]为麦克风采集到的时域信号，S(ω)为傅里叶变化得到的频域信号。Among them, x[n] is the time-domain signal collected by the microphone, and S(ω) is the frequency-domain signal obtained by Fourier transformation.

b)计算麦克风所在空间环境信噪比SNR，根据信噪比的情况确定信噪比参数ρ的值，0≤ρ≤1；b) Calculate the signal-to-noise ratio SNR of the space environment where the microphone is located, and determine the value of the signal-to-noise ratio parameter ρ according to the situation of the signal-to-noise ratio, 0≤ρ≤1;

c)对麦克风对的频率域的信号进行加权互相关运算：c) carry out weighted cross-correlation operation to the signal of the frequency domain of microphone pair:

${R R}_{{x x}_{11} {x x}_{22}}^{' '} ((ω ω)) = = {W W}_{n no}^{* *} ((ω ω)) S S (({ω ω}_{11})) S S {(({ω ω}_{22}))}^{* *} - - - - - - ((1313))$

其中，

为加权函数：in,

is the weighting function:

其中，φ₁₂(ω)是互功率谱密度函数。Among them, φ ₁₂ (ω) is the cross power spectral density function.

d)对互相关运算的结果进行傅里叶逆变换，从频域变换到时域：d) Inverse Fourier transform is performed on the result of the cross-correlation operation, from the frequency domain to the time domain:

${R R}^{' '} [[n no]] = = \frac{11}{22 π π} {&Integral; &Integral;}_{- - π π}^{π π} {R R}^{' '} ((ω ω)) {e e}^{jωn jωn} dω dω$

其中R′(ω)为频域信号，R′[n]为傅里叶逆变换得到的时域信号。Among them, R'(ω) is the frequency domain signal, and R'[n] is the time domain signal obtained by Fourier inverse transform.

e)对d)步骤获得的结果R′[n]搜索最大值，找出最大值的位置，即可获得声源到麦克风的时间差D；e) Search for the maximum value of the result R'[n] obtained in step d), find out the position of the maximum value, and then obtain the time difference D from the sound source to the microphone;

f)根据e中计算得到的时间差D与麦克风对中麦克风之间的距离D1，计算声源相对于麦克风对的位置。f) According to the time difference D calculated in e and the distance D1 between the microphones in the microphone pair, calculate the position of the sound source relative to the microphone pair.

所述对任意一对麦克风采集到的经过傅里叶变化的信号进行加权互相关运算的方法为：The method for carrying out weighted cross-correlation operation on the Fourier-transformed signals collected by any pair of microphones is as follows:

${W W}_{n no}^{* *} ((ω ω)) = = \frac{11}{{| | {φ φ}_{1212} ((ω ω)) | |}^{ρ ρ} + + | | {γ γ}_{1212}^{22} ((ω ω)) | |} 00 \leq \leq ρ ρ \leq \leq 11$

其中，γ₁₂(ω)为相干函数：Among them, γ ₁₂ (ω) is the coherence function:

其中，φ₁(ω)和φ₂(ω)分别为麦克风对中各麦克风接收到的信号的功率谱密度函数。Wherein, φ ₁ (ω) and φ ₂ (ω) are power spectral density functions of signals received by each microphone in the microphone pair, respectively.

在对数字信号进行声源定位计算前，对数字信号进行有效音频信号的端点检测，对包含有效音频的信号进行声源定位计算。Before the sound source localization calculation is performed on the digital signal, the endpoint detection of the effective audio signal is performed on the digital signal, and the sound source localization calculation is performed on the signal containing the effective audio.

所述有效音频信号的端点检测为语音端点检测。The endpoint detection of the effective audio signal is voice endpoint detection.

对包含有效音频的信号每采集到设定数据量的数据就进行一次声源定位的运算。A calculation of sound source localization is performed every time a set amount of data is collected for a signal containing effective audio.

所述设定数据量的数据是帧长大小的数据。The data of the set data amount is data of a frame size.

也就是说，对于包含有效数据的信号通过在时间域进行分帧处理方式进行声源定位计算。That is to say, the sound source localization calculation is performed by performing frame-by-frame processing in the time domain for signals containing valid data.

所述麦克风阵列为两对或两对以上的麦克风。The microphone array is two or more pairs of microphones.

所述声源相对于一对麦克风对的位置为一双曲面。The position of the sound source relative to the pair of microphones is a hyperboloid.

所述机器人可以为移动机器人，也可以为固定机器人。The robot can be a mobile robot or a fixed robot.

所述声源可以为固定声源，也可以为移动声源。The sound source may be a fixed sound source or a mobile sound source.

所属傅里叶变换可以为快速傅里叶变换。The associated Fourier transform may be a fast Fourier transform.

所述麦克风为两对或两对以上时，获得的两个或两个以上的双曲面，所述两个或两个以上双曲面的交点为声源位置。声源相对于所述两个麦克风的位置为一个双曲面，利用至少两对麦克风可以计算出声源分别相对于不同麦克风对位置的双曲面，至少两组双曲面的交点即为声源相对于麦克风对的位置。When there are two or more pairs of microphones, two or more hyperboloids are obtained, and the intersection of the two or more hyperboloids is the position of the sound source. The position of the sound source relative to the two microphones is a hyperboloid, and at least two pairs of microphones can be used to calculate the hyperboloid of the sound source relative to the positions of different microphone pairs, and the intersection of at least two sets of hyperboloids is the relative position of the sound source The location of the microphone pair.

所述互功率谱密度函数φ₁₂(ω)为：The cross power spectral density function φ ₁₂ (ω) is:

$+ + {α α}_{11} S S ((ω ω)) {e e}^{- - jω jω {τ τ}_{11}} {N N}_{22} {((ω ω))}^{* *} + + {α α}_{22} S S {((ω ω))}^{* *} {e e}^{jω jω {τ τ}_{22}} {N N}_{11} ((ω ω)) + + {N N}_{11} ((ω ω)) {N N}_{22} {((ω ω))}^{* *}$

所述麦克风阵列可以按十字形排列，也可以按照其他方式排列。The microphone array can be arranged in a cross shape or in other ways.

本发明的技术效果：Technical effect of the present invention:

本发明的方法通过对互相关运算中的加权函数进行改进，降低了背景噪声、混响等干扰因素的影响，提高了定位的准确性。且麦克风设置简单，可设置较少数量的麦克风对也能取得相应的定位效果。The method of the invention reduces the influence of interference factors such as background noise and reverberation by improving the weighting function in the cross-correlation operation, and improves the positioning accuracy. Moreover, the microphones are set simply, and a relatively small number of microphone pairs can be set to obtain a corresponding positioning effect.

附图说明 Description of drawings

图1是声源与麦克风对模型图；Figure 1 is a model diagram of a sound source and a microphone pair;

图2是麦克风阵列模型图；Fig. 2 is a microphone array model diagram;

图3是本发明方法的流程框图。Fig. 3 is a flowchart of the method of the present invention.

具体实施方法Specific implementation method

图3显示了本发明的实现技术步骤，具体过程如下：Fig. 3 has shown the realization technique step of the present invention, and concrete process is as follows:

1.麦克风阵列中的麦克风分别进行声源信号采集并进行预处理，将信号转换为数字信号，并存储。1. The microphones in the microphone array respectively collect and preprocess the sound source signal, convert the signal into a digital signal, and store it.

2.对1采集的信号进行有效音频信号的端点检测，在此实例中位语音端点检测，如果有有效音频信号的存在则开始进行声源定位运算，如果没有有效音频信号的存在则不进行定位的运算。并对采集的有效音频信号数据在时间域进行分帧处理，即每采集到一定量的数据(帧长大小的数据)就进行一次声源定位的运算。2. Carry out the endpoint detection of the effective audio signal on the signal collected in 1. In this example, the audio endpoint detection is performed. If there is an effective audio signal, the sound source localization operation will be started. If there is no effective audio signal, the positioning will not be performed. operation. And the collected effective audio signal data is processed in the time domain by frame division, that is, every time a certain amount of data (frame length data) is collected, a sound source localization operation is performed.

有效音频信号是根据声源的频率范围来确定的。以人声定位系统为例，由于人声的频率范围大部分在80Hz-1.2kHz之内，因此我们只取这个频率段的信号进行相关性运算，假如信号的采样频率为44.1kHz，其频率范围为0-22.5kHz，我们只取80Hz-1.2kHz频率段范围内的傅里叶变化值进行相关性运算，此种方法可以消除、降低大部分噪声的影响，如60Hz的电磁噪声、麦克风采样产生的高频噪声等，提高定位的准确度，并大幅降低了运算的复杂度，频域相关性部分的运算量降低为原算法的1/20，大幅的提高了定位的效率跟实时性，因此也大幅降低了机器人的反应时间。A valid audio signal is determined based on the frequency range of the sound source. Taking the human voice positioning system as an example, since the frequency range of the human voice is mostly within 80Hz-1.2kHz, we only take the signal in this frequency range for correlation calculation. If the sampling frequency of the signal is 44.1kHz, its frequency range 0-22.5kHz, we only take the Fourier change value in the frequency range of 80Hz-1.2kHz for correlation calculation, this method can eliminate and reduce the influence of most noise, such as 60Hz electromagnetic noise, microphone sampling High-frequency noise, etc., improve the accuracy of positioning, and greatly reduce the complexity of the calculation. The calculation amount of the frequency domain correlation part is reduced to 1/20 of the original algorithm, which greatly improves the efficiency and real-time performance of positioning. Therefore, It also greatly reduces the reaction time of the robot.

而对于其他声源信号，则可以根据以上原则选择相应的频率段作为有效音频信号。For other sound source signals, the corresponding frequency segment can be selected as an effective audio signal according to the above principles.

3.将2中处理后的采集信号进行快速傅里叶变换：3. Perform fast Fourier transform on the acquired signal processed in 2:

$S S ((ω ω)) = = {Σ Σ}_{m m = = 11}^{N N} x x [[n no]] exp exp ((- - jwm jwm)) - - - - - - ((1212))$

4.计算麦克风所在空间环境信噪比SNR，确定信噪比参数ρ的值；4. Calculate the signal-to-noise ratio SNR of the space environment where the microphone is located, and determine the value of the signal-to-noise ratio parameter ρ;

下面给出一个ρ与SNR的对应范围，发明并不受此限制。A corresponding range between ρ and SNR is given below, and the invention is not limited thereto.

$\{\begin{matrix} SNR \leq 10 \\ 10 < SNR \leq 25 \\ 25 < SNR \end{matrix}$ 对应ρ的范围 $\{\begin{matrix} ρ = [\begin{matrix} 0.2 & 0.55 \end{matrix}] \\ ρ = [\begin{matrix} 0.55 & 0.75 \end{matrix}] \\ ρ = [\begin{matrix} 0.75 & 0.9 \end{matrix}] \end{matrix}$ $\{\begin{matrix} SNR \leq 10 \\ 10 < SNR \leq 25 \\ 25 < SNR \end{matrix}$ Corresponding to the range of ρ $\{\begin{matrix} ρ = [\begin{matrix} 0.2 & 0.55 \end{matrix}] \\ ρ = [\begin{matrix} 0.55 & 0.75 \end{matrix}] \\ ρ = [\begin{matrix} 0.75 & 0.9 \end{matrix}] \end{matrix}$

即，比如SNR≤10时，ρ＝0.2-0.55，ρ可取值0.2-0.55之间的数值。That is, for example, when SNR≤10, ρ=0.2-0.55, and ρ can take a value between 0.2-0.55.

5.对麦克风对采集的经傅里叶变换的信号进行加权互相关运算：5. Perform a weighted cross-correlation operation on the Fourier-transformed signal collected by the microphone:

6.对5中的计算结果进行快速傅里叶逆变换，将计算结果变换到时间域。6. Perform inverse fast Fourier transform on the calculation results in 5, and transform the calculation results into the time domain.

${R R}^{' '} [[n no]] = = \frac{11}{22 π π} {&Integral; &Integral;}_{- - π π}^{π π} {S S}^{' '} ((ω ω)) {e e}^{jωn jωn} dω dω$

7.对6中的计算结果搜索R′[n]中最大值的位置，即可得到声源到麦克风对的时间差D.7. Search for the position of the maximum value in R′[n] from the calculation results in 6, and you can get the time difference D from the sound source to the microphone pair.

8.根据步骤7的计算结果分别求出麦克风阵列中声源相对每对麦克风的位置，每个位置表示为一双曲面，这些位置的交叉点即各对应的双曲面的交叉点即为声源相对于麦克风阵列的位置。8. Calculate the position of the sound source relative to each pair of microphones in the microphone array according to the calculation results in step 7. Each position is represented as a hyperboloid, and the intersection of these positions, that is, the intersection of each corresponding hyperboloid, is the relative sound source. at the location of the microphone array.

麦克风阵列可以为两对或两对以上的麦克风，如果是两对麦克风可以按图2所示的十字形排列。麦克风的排列可以根据具体情况进行调整，本发明不对此进行限制。The microphone array can be two or more pairs of microphones, and if there are two pairs of microphones, they can be arranged in a cross shape as shown in FIG. 2 . The arrangement of the microphones can be adjusted according to specific conditions, and the present invention is not limited thereto.

Claims

1. A robot sound source positioning method comprises the following steps:

1) the robot adopts at least one pair of microphones to collect sound source signals, converts the signals into digital signals and stores the digital signals;

2) performing sound source localization calculation on the stored digital signal:

a) performing Fourier transform on the data signal collected by each microphone, and transforming the signal from a time domain to a frequency domain:

wherein x [ n ] is a signal in a time domain, ω is a frequency variable, j represents a unit of an imaginary part of a complex number, m represents a variable from 1 to n, and S (ω) is a signal in a frequency domain obtained after Fourier transform;

b) calculating the signal-to-noise ratio of the space environment where the microphone is located, and determining a signal-to-noise ratio parameter rho according to the signal-to-noise ratio, wherein rho is more than or equal to 0 and less than or equal to 1;

c) carrying out weighted cross-correlation operation on the frequency domain signals of the microphone pair:

wherein,

as a weighting function:

wherein phi is₁₂(ω) is the cross-power spectral density function;

d) performing inverse Fourier transform on the result of the cross-correlation operation, and transforming the result from a frequency domain to a time domain:

<math> <mrow> <msup> <mi>R</mi> <mo>′</mo> </msup> <mo>[</mo> <mi>n</mi> <mo>]</mo> <mo>=</mo> <mfrac> <mn>1</mn> <mrow> <mn>2</mn> <mi>π</mi> </mrow> </mfrac> <msubsup> <mo>&Integral;</mo> <mrow> <mo>-</mo> <mi>π</mi> </mrow> <mi>π</mi> </msubsup> <msup> <mi>R</mi> <mo>′</mo> </msup> <mrow> <mo>(</mo> <mi>ω</mi> <mo>)</mo> </mrow> <msup> <mi>e</mi> <mi>jωn</mi> </msup> <mi>dω</mi> <mo>;</mo> </mrow> </math>

e) searching the maximum value of R' n, finding out the position of the maximum value, and obtaining the time difference D from the sound source to the microphone pair;

f) the position of the sound source relative to the microphone pair is calculated from the time difference D and the distance D1 between the two microphones of the microphone pair.

2. The robotic sound source localization method of claim 1, characterized by a further optimized weighting function

Comprises the following steps:

wherein, γ₁₂(ω) is the coherence function:

wherein phi is₁(omega) and phi₂And (omega) is a function of the power spectral density of the signals received by each microphone of the microphone pair.

3. The robot sound source localization method according to claim 1 or 2, wherein the digital signal is subjected to end point detection of a valid audio signal and the signal containing the valid audio is subjected to sound source localization calculation before the sound source localization calculation is performed on the digital signal.

4. The robot sound source localization method according to claim 3, wherein the sound source localization calculation is performed for each time a set amount of data is acquired for a signal containing valid audio.

5. The method according to claim 4, wherein the set data amount is data of a frame length size.

6. A robot sound source localization method according to claim 1 or 2, characterized in that the fourier transform is a fast fourier transform.

7. A robot sound source localization method according to claim 1 or 2, wherein the robot collects sound source signals using a microphone array comprising two or more pairs of microphones.

8. The method of claim 7, wherein when the microphone array comprises two or more pairs of microphones, the position of the sound source relative to the microphone array is calculated after the position of the sound source relative to each pair of microphones is calculated, and the calculating method comprises: the intersection of the sound source with respect to the positions of the respective pairs of microphones is taken as the position of the sound source with respect to the microphone array.

9. A robot sound source localization method according to claim 1 or 2, characterized in that the robot is a mobile robot or a fixed robot.

10. The robot sound source localization method according to claim 1 or 2, wherein the sound source is a fixed sound source or a moving sound source.