
CN104142492A - SRP-PHAT multi-source spatial positioning method - Google Patents


Info

Publication number
CN104142492A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410366922.4A
Other languages
Chinese (zh)
Other versions
CN104142492B (en)
Inventor
孙明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan University
Original Assignee
Foshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan University
Priority to CN201410366922.4A
Publication of CN104142492A
Application granted
Publication of CN104142492B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20 Position of source determined by a plurality of spaced direction-finders

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses an SRP-PHAT multi-source spatial localization method. It first assumes that the number and spatial positions of all microphones in a uniform circular microphone array remain unchanged during data acquisition, and that the isotropic microphones are evenly distributed on a circle of radius r in the x-y plane. Polar coordinates represent the direction of arrival of the plane wave s, with the origin of the coordinate system at the center of the circular array. The multi-source signal is partitioned into non-overlapping sets of time-frequency points so that each time-frequency window contains only one active source signal, which satisfies the weak W-disjoint orthogonality condition. A Hamming window is selected, the steered response power function is computed with the SRP-PHAT algorithm to obtain an objective function, and the beam is steered over all possible receiving directions; the direction of maximum beam output power is the direction of the sound source. As a result, DOA estimation for multiple sound sources achieves good separation performance in acoustic environments with strong noise and moderate reverberation, the true peaks stand out clearly, and the localization accuracy is high.

Description

An SRP-PHAT multi-source spatial localization method
Technical field
The present invention relates to a spatial localization method, and in particular to an SRP-PHAT multi-source spatial localization method applicable to systems such as video conferencing, speech enhancement, hearing aids, hands-free telephones, and intelligent robots.
Background
Sound source localization is widely used in systems such as video conferencing, speech enhancement, hearing aids, hands-free telephones, and intelligent robots, and has attracted increasing attention in recent years.
The steered response power sound source localization algorithm with phase transform weighting (SRP-PHAT: Steered Response Power-Phase Transform) has become a mainstream algorithm. It combines the advantages of steered beamforming and GCC-PHAT and is robust at low signal-to-noise ratios. It localizes a single sound source well, but its main drawback is its heavy computational load, which limits its use in real-time systems.
Many researchers have tried to reduce the computation of its core step, the steered response power search. For example, a two-stage accelerated SRP-PHAT algorithm uses perpendicularly arranged arrays to convert the two-dimensional search into one-dimensional searches and applies a hierarchical, coarse-to-fine search strategy in the one-dimensional space. Similarly, an improved joint SRP-PHAT speech localization algorithm uses orthogonal linear microphone arrays to reduce the two-dimensional search space to two one-dimensional spaces, then performs a hierarchical search in each one-dimensional space to find the SRP maximum and determine the source position.
In practice, the positions of multiple sound sources often need to be estimated. The existing W-disjoint orthogonality assumption, which is based on the sparsity of speech signals, is not satisfied by multiple sound sources. As a result, the method has low spatial resolution and is easily affected by reverberation; in reverberant and noisy environments in particular, it cannot resolve two signal sources that are close in direction. Multi-source localization therefore has considerable theoretical significance and practical value.
Summary of the invention
The present invention overcomes the shortcomings of the prior art and provides an SRP-PHAT multi-source spatial localization method that can resolve multiple signal sources close in direction under reverberation and noise, with good localization performance.
To solve the above technical problem, the present invention is realized by the following technical solution:
An SRP-PHAT multi-source spatial localization method, characterized in that it comprises the following steps:
1) Establish the spatial coordinates under the stated assumptions. Assume first that the number and spatial positions of all microphones of the uniform circular microphone array remain unchanged during data acquisition, that the source-to-microphone distances satisfy the requirements of the sound field model, and that all microphones have identical physical properties. The isotropic microphones are evenly distributed on a circle of radius r in the x-y plane. Polar coordinates represent the direction of arrival of the plane wave s, the origin of the coordinate system is at the center of the circular array, the elevation angle of the signal is θ ∈ [0, π/2], and the azimuth is φ ∈ [0, 2π];
2) Partition the multi-source signal into non-overlapping sets of time-frequency points so that each time-frequency window contains only one active source signal, which satisfies the weak W-disjoint orthogonality condition. A Hamming window is chosen; when WDO_M = 1, W-disjoint orthogonality is satisfied exactly;
3) Use the SRP-PHAT algorithm to compute the phase-transform-weighted steered response power function over all microphone pairs and obtain an objective function. The beamformer steers its beam over all possible receiving directions, and the direction of maximum beam output power gives the direction of the sound source.
Further, said step 2) comprises:
First, two important performance criteria are introduced: (1) how well the mask preserves the source of interest; (2) how well the mask suppresses the interfering sources;
Consider partitioning the multi-source signal into non-overlapping sets of time-frequency points, with only one active source signal in each time-frequency window, so that approximately
$$ S_j(t,\omega)\, S_k(t,\omega) \approx 0, \quad \forall t, \omega $$
The time-frequency mask is defined as
$$ M_j(t,\omega) = \begin{cases} 1, & S_j(t,\omega) \neq 0 \\ 0, & \text{otherwise} \end{cases} $$
By estimating the time-frequency mask corresponding to each source, a given source j can be recovered from the mixture:
$$ S_j(t,\omega) = M_j(t,\omega)\, X(t,\omega), \quad \forall t, \omega $$
where M_j is the indicator function of the support of source j, and S_j(t, ω) and X(t, ω) are the time-frequency representations of s_j(t) and x(t), respectively.
For a given time-frequency mask M, the preserved-signal ratio PSR_M is defined as:
$$ PSR_M = \frac{\| M(t,\omega)\, S_j(t,\omega) \|^2}{\| S_j(t,\omega) \|^2} $$
PSR_M measures the percentage of the energy of source S_j that remains after masking;
Also define
$$ z_j(t) = \sum_{k=1,\, k \neq j}^{N} s_k(t) $$
where z_j(t) is the sum of the interfering sources active together with source s_j;
The signal-to-interference ratio after applying the time-frequency mask M is defined as:
$$ SIR_M = \frac{\| M(t,\omega)\, S_j(t,\omega) \|^2}{\| M(t,\omega)\, Z_j(t,\omega) \|^2} $$
where SIR_M measures the signal-to-interference ratio of the signal separated with the time-frequency mask M;
PSR_M and SIR_M together give the approximate W-disjoint orthogonality measure WDO_M:
$$ WDO_M = \frac{\| M(t,\omega)\, S_j(t,\omega) \|^2 - \| M(t,\omega)\, Z_j(t,\omega) \|^2}{\| S_j(t,\omega) \|^2} $$
Because speech signals have a sparse time-frequency representation, a small part of that representation carries the vast majority of the total power, and the magnitude of the product of two such representations is usually small. The weak W-disjoint orthogonality condition is therefore satisfied; in particular, when WDO_M = 1, W-disjoint orthogonality is satisfied exactly.
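The link between the three quantities follows by substituting the definitions above. The short derivation below is a sketch of that substitution, writing Z_j for the time-frequency representation of the interference sum z_j(t); it is an illustration and not part of the original text.

```latex
\begin{aligned}
WDO_M &= \frac{\|M S_j\|^2 - \|M Z_j\|^2}{\|S_j\|^2}
       = \frac{\|M S_j\|^2}{\|S_j\|^2}
         - \frac{\|M S_j\|^2}{\|S_j\|^2}\cdot\frac{\|M Z_j\|^2}{\|M S_j\|^2} \\
      &= PSR_M - \frac{PSR_M}{SIR_M}
\end{aligned}
```

Under these definitions, WDO_M = 1 requires that the mask keep all of the target energy (PSR_M = 1) and pass no interference energy (SIR_M unbounded).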
Further, said step 3) uses the SRP-PHAT algorithm for two microphones:
For an array with only two microphones, m_i and m_j, a signal arriving from a given azimuth and elevation reaches the two microphones with a delay difference Δτ_ij(θ, φ). The TDOA can be estimated by generalized cross-correlation (GCC), expressed as:
$$ \Delta\tau_{ij}(\theta,\varphi) = \arg\max_{\tau} P(\mathbf{r}) = \arg\max_{\tau} R_{s_i s_j}\big(\Delta\tau_{ij}(\theta,\varphi)\big) $$
where P(r) is the spatial likelihood function of the three-dimensional position vector r, obtained by evaluating all possible θ and φ. The generalized cross-correlation function R_{s_i s_j}(Δτ_ij(θ, φ)) can be expressed in the frequency domain as:
$$ R_{s_i s_j}\big(\Delta\tau_{ij}(\theta,\varphi)\big) = \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega)\, S_j^{*}(\omega)\, e^{j\omega \Delta\tau_{ij}(\theta,\varphi)}\, d\omega $$
where Ψ_ij(ω) is the weighting function and S_i(ω) S_j^*(ω) is the cross-spectral density function;
The phase transform (PHAT) is a typical weighting method.
The phase weighting function is defined as:
$$ \Psi_{ij}(\omega) = \frac{1}{\left| S_i(\omega)\, S_j^{*}(\omega) \right|} $$
By selecting a suitable weighting function, the delay-and-sum steered response power satisfies an optimal signal-to-noise ratio (SNR) criterion. The generalized cross-correlation R_{s_i s_j}(Δτ_ij(θ, φ)) shows a peak within the limited range of τ, and this peak corresponds to the TDOA of the signal propagating to microphones m_i and m_j.
Further, said step 3) uses the SRP-PHAT algorithm for a circular microphone array:
The generalized cross-correlations of all microphone pairs are summed:
$$ P(\Delta\tau_1, \Delta\tau_2, \ldots, \Delta\tau_N) = \sum_{i=1}^{N} \sum_{j=1}^{N} R_{s_i s_j}\big(\Delta\tau_{ij}(\theta,\varphi)\big) = \sum_{i=1}^{N} \sum_{j=1}^{N} \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega)\, S_j^{*}(\omega)\, e^{j\omega(\Delta\tau_i - \Delta\tau_j)}\, d\omega $$
where Δτ_1, Δτ_2, …, Δτ_N are the steering delays of the N microphones, Δτ_i = τ_i − τ_0 for i = 1, …, N, and τ_0 is the reference delay, taken as the minimum of all microphone delays.
Further, said step 3) uses the multi-source circular-array SRP-PHAT algorithm:
When two or more sound sources are present simultaneously, the SRP-PHAT peak of one source mixes with the SRP-PHAT peak of another. Spurious peaks can appear at some points, and the local maxima become difficult to find;
Using the approximate W-disjoint orthogonality of speech signals, the relative delay of each source signal arriving at the microphone array is estimated in the time-frequency domain, with the short-time Fourier transform serving as the approximately W-disjoint orthogonal transform.
Assume the frequency-domain signal model of the i-th microphone is:
$$ X_i[\omega,\tau] = S_n(\omega,\tau)\, e^{-j\omega \Delta\tau_{n,i}} + N_i[\omega,\tau] $$
Given a window function W, the short-time Fourier transform of s_j is S_j:
$$ S_j(t,\omega) = F^{W}\big(s_j(\cdot)\big)(t,\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W(\tau - t)\, s_j(\tau)\, e^{-i\omega\tau}\, d\tau $$
With an appropriate window function and window size, and under the approximate W-disjoint orthogonality assumption, only one sound source is active at any time-frequency point. The cross-spectrum is then:
$$ E\big[ X_i[\omega,\tau]\, X_j^{*}[\omega,\tau] \big] = \left| S_n(\omega,\tau) \right|^2 e^{-j\omega(\Delta\tau_{n,i} - \Delta\tau_{n,j})} $$
The delay Δτ_{n,i} − Δτ_{n,j} between microphone i and microphone j can be obtained from the cross-power spectrum.
Compared with the prior art, the beneficial effects of the present invention are as follows:
Theoretical analysis and simulation experiments show that the SRP-PHAT algorithm combined with approximate W-disjoint orthogonality on a circular array gives DOA estimation of multiple sound sources good separation performance in acoustic environments with strong noise and moderate reverberation. The true peaks stand out clearly, and the localization accuracy is high.
1. For uniform circular arrays, published work mostly addresses single-source localization, and multi-source localization with circular arrays has received comparatively little study. The method offers higher spatial resolution.
2. On the basis of the approximate W-disjoint orthogonality assumption, the SRP-PHAT algorithm gives DOA estimation of multiple sound sources good separation performance in acoustic environments with strong noise and moderate reverberation; the true peaks stand out clearly and the localization accuracy is high.
3. The problem of spurious spectral peaks is solved effectively, and 3 signal sources can be resolved.
4. The method is suitable for localization under moderate reverberation.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present invention and, together with the embodiments, serve to explain the invention without limiting it. In the drawings:
Fig. 1 is the geometry of the uniform circular array;
Fig. 2 is the beamforming principle of the uniform circular array;
Fig. 3 shows the WDO ratio (80%) for 3 sound sources;
Fig. 4 shows the WDO ratio (90%) for 3 sound sources;
Fig. 5 is the time-frequency analysis |S_1^W(t, ω)| of sound source s_1(t);
Fig. 6 is the time-frequency analysis |S_2^W(t, ω)| of sound source s_2(t);
Fig. 7 is the time-frequency analysis |S_1^W(t, ω) S_2^W(t, ω)|;
Fig. 8 is a block diagram of the method;
Fig. 9 is the uniform circular array;
Fig. 10 is the two-dimensional localization image for two sound sources at an SNR of 20 dB;
Fig. 11 is the two-dimensional localization image for two sound sources at an SNR of 30 dB;
Fig. 12 shows the azimuths of two sound sources measured by the circular array at an SNR of 20 dB;
Fig. 13 shows the azimuths of two sound sources measured by the circular array at an SNR of 30 dB;
Fig. 14 is a three-dimensional plot of the measured directions of two sound sources at an SNR of 30 dB;
Fig. 15 is the two-dimensional localization image for three sound sources at an SNR of 30 dB;
Fig. 16 shows the azimuths of three sound sources measured by the circular array with the improved method at an SNR of 30 dB;
Fig. 17 shows the azimuths of three sound sources measured by the circular array with the traditional method at an SNR of 30 dB;
Fig. 18 shows the signal waveforms received by the 8-element microphone array;
Fig. 19 shows the curves of SNR versus angular error for different T60 values.
Detailed description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. The preferred embodiments described here serve only to describe and explain the present invention and are not intended to limit it.
Step 1: localization model and beamforming with the uniform circular array.
A uniform circular array determines the spatial coordinate system. As shown in Fig. 1, isotropic microphones are evenly distributed on a circle of radius R in the x-y plane. Polar coordinates represent the direction of arrival of the plane wave s. The origin of the coordinate system is at the center of the circular array and serves as the reference point of the system. The elevation angle of the signal is θ ∈ [0, π/2] and the azimuth is φ ∈ [0, 2π]. Here r is the distance from the sound source to the center of the circular array, and r_i is the distance from the sound source to microphone m_i.
Assume the acoustic signal is
$$ s(r, t) = e^{j\omega_0 t} \qquad (1) $$
where ω_0 = 2πf is the angular frequency of the source signal,
C is the wave speed, C = 384 m/s, and
f is the frequency of the sound source (Hz).
The signal received by the i-th microphone is
$$ f_i(r, t) = s(t - \Delta\tau_i) \qquad (2) $$
The geometry is shown in Fig. 1, where:
r_i is the distance from the i-th microphone to the source;
r is the distance from the source to the center of the circular microphone array;
R is the radius of the circular array;
θ is the elevation angle of the sound source;
φ is the azimuth of the sound source;
the azimuth of the i-th microphone is indexed by i = 0, 1, 2, …, N−1.
The delay Δτ_i of each microphone before summation follows from these quantities and the wave speed C = 384 m/s.
As shown in Fig. 2, the delay-and-sum beamformer sums the time-shifted signals captured by all microphones. Superposing the contribution of every source at a far-field point gives the far-field pattern function of the circular array:
$$ y(t) = \frac{1}{N} \sum_{i=1}^{N} s(r_i, t - \Delta\tau_i) = \frac{1}{N} \sum_{i=1}^{N} e^{j\omega_0 (t - \Delta\tau_i)} = e^{j\omega_0 t}\, \frac{1}{N} \sum_{i=1}^{N} e^{-j\omega_0 \Delta\tau_i} \qquad (5) $$
Substituting (4) into (5) yields (6), in which the unit wave-number vector of the source appears and T denotes vector transposition.
Δτ_i = τ_i − τ_0, where τ_0 is the reference delay, taken as the minimum of all microphone delays.
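The delay-and-sum response in (5) can be sketched numerically. Because the delay formula is not reproduced in this text, the sketch below assumes a standard far-field model for a uniform circular array, with microphone i at azimuth 2πi/N, θ measured from the z-axis, and delays taken relative to the array center; the function names and the test values (1000 Hz, 0.1 m radius, 8 microphones) are illustrative assumptions only.

```python
import cmath
import math

C = 384.0  # wave speed quoted in the text (m/s)

def far_field_delays(theta, phi, radius, n_mics):
    """Assumed plane-wave delays (s) of each microphone relative to the array centre."""
    return [-(radius / C) * math.sin(theta) * math.cos(phi - 2 * math.pi * i / n_mics)
            for i in range(n_mics)]

def array_response(theta, phi, theta0, phi0, freq_hz, radius, n_mics):
    """Magnitude of the delay-and-sum output when the beam is steered to
    (theta0, phi0) and a unit plane wave arrives from (theta, phi)."""
    w0 = 2 * math.pi * freq_hz
    arrive = far_field_delays(theta, phi, radius, n_mics)
    steer = far_field_delays(theta0, phi0, radius, n_mics)
    # Steering subtracts the assumed delays; residual phases add coherently
    # only when the look direction matches the arrival direction.
    total = sum(cmath.exp(-1j * w0 * (a - s)) for a, s in zip(arrive, steer))
    return abs(total) / n_mics

look = (math.pi / 2, 0.5)  # steer into the array plane, azimuth 0.5 rad
on_target = array_response(*look, *look, freq_hz=1000.0, radius=0.1, n_mics=8)
off_target = array_response(math.pi / 2, 2.0, *look, freq_hz=1000.0, radius=0.1, n_mics=8)
```

The response is 1 when the look direction equals the arrival direction and drops for other directions, which is the property the beam scan relies on. A common reference delay τ_0 only adds a shared phase and does not change the magnitude.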
Step 2: approximate W-disjoint orthogonality assumption
The masking effect of the human ear is usually divided into frequency masking and temporal masking. Time-frequency masking methods assume that the source signals are sparse and separable, that is, that they satisfy W-disjoint orthogonality.
Suppose the signal x(t) consists of N source signals:
$$ x(t) = \sum_{j=1}^{N} s_j(t) \qquad (7) $$
Suppose there is a linear transform T that maps s_j to S_j, written T: s_j → S_j, with the following properties:
(1) T is invertible, i.e. T^{-1}(Ts) = T(T^{-1}s) = s;
(2) for j ≠ k, Λ_j ∩ Λ_k = ∅, where Λ_j is the support of S_j, Λ_j = supp S_j := {λ : S_j(λ) ≠ 0}; that is, the intersection of the sets Λ_j and Λ_k is empty.
If conditions (1) and (2) hold, the mixed signals in the set S can all be separated effectively.
Given a window function, if
$$ S_j(t,\omega)\, S_k(t,\omega) = 0, \quad \forall t, \omega \qquad (8) $$
then the two sources S_j and S_k are said to satisfy W-disjoint orthogonality.
However, the signals studied here do not satisfy the W-disjoint orthogonality assumption exactly: the product in expression (8) is seldom exactly zero.
For this reason, two important performance criteria are introduced: (1) how well the mask preserves the source of interest; (2) how well the mask suppresses the interfering sources.
Consider partitioning the multi-source signal into non-overlapping sets of time-frequency points, with only one active source signal in each time-frequency window, so that approximately
$$ S_j(t,\omega)\, S_k(t,\omega) \approx 0, \quad \forall t, \omega \qquad (9) $$
The time-frequency mask is defined as
$$ M_j(t,\omega) = \begin{cases} 1, & S_j(t,\omega) \neq 0 \\ 0, & \text{otherwise} \end{cases} \qquad (10) $$
By estimating the time-frequency mask corresponding to each source, a given source j can be recovered from the mixture:
$$ S_j(t,\omega) = M_j(t,\omega)\, X(t,\omega), \quad \forall t, \omega \qquad (11) $$
where M_j is the indicator function of the support of source j, and S_j(t, ω) and X(t, ω) are the time-frequency representations of s_j(t) and x(t), respectively.
For a given time-frequency mask M, the preserved-signal ratio PSR_M is defined as
$$ PSR_M = \frac{\| M(t,\omega)\, S_j(t,\omega) \|^2}{\| S_j(t,\omega) \|^2} \qquad (12) $$
PSR_M measures the percentage of the energy of source S_j that remains after masking.
Also define
$$ z_j(t) = \sum_{k=1,\, k \neq j}^{N} s_k(t) \qquad (13) $$
where z_j(t) is the sum of the interfering sources active together with source s_j.
The signal-to-interference ratio after applying the time-frequency mask M is defined as
$$ SIR_M = \frac{\| M(t,\omega)\, S_j(t,\omega) \|^2}{\| M(t,\omega)\, Z_j(t,\omega) \|^2} \qquad (14) $$
where SIR_M measures the signal-to-interference ratio of the signal separated with the time-frequency mask M.
PSR_M and SIR_M together give the approximate W-disjoint orthogonality measure WDO_M:
$$ WDO_M = \frac{\| M(t,\omega)\, S_j(t,\omega) \|^2 - \| M(t,\omega)\, Z_j(t,\omega) \|^2}{\| S_j(t,\omega) \|^2} \qquad (15) $$
Because speech signals have a sparse time-frequency representation, a small part of that representation carries the vast majority of the total power, and the magnitude of the product of two such representations is usually small. The weak W-disjoint orthogonality condition is therefore satisfied. The higher the degree of approximate W-disjoint orthogonality, the better the separation. The type and size of the window function are critical to obtaining a good time-frequency masking effect. In particular, when WDO_M = 1, W-disjoint orthogonality is satisfied exactly.
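The three measures in (12), (14) and (15) can be evaluated directly once the time-frequency values of the target and of the summed interference are known. The sketch below is a minimal illustration on hand-made values; the helper names and the choice of a mask that keeps points where the target is stronger than the interference are assumptions made for the example, not details given in the text.

```python
def energy(values):
    """Squared norm over a collection of time-frequency values."""
    return sum(abs(v) ** 2 for v in values)

def zero_db_mask(target, interference):
    """Assumed mask: keep a point when the target magnitude exceeds the interference."""
    return [1.0 if abs(s) > abs(z) else 0.0 for s, z in zip(target, interference)]

def wdo_measures(mask, target, interference):
    """Return (PSR_M, SIR_M, WDO_M) for flat lists of time-frequency values."""
    kept_target = energy(m * s for m, s in zip(mask, target))
    kept_interf = energy(m * z for m, z in zip(mask, interference))
    total_target = energy(target)
    psr = kept_target / total_target
    sir = kept_target / kept_interf if kept_interf > 0 else float("inf")
    wdo = (kept_target - kept_interf) / total_target
    return psr, sir, wdo

# Four time-frequency points: the sources overlap only at the third point.
target = [2 + 0j, 0j, 1 + 0j, 0j]
interference = [0j, 3 + 0j, 0.5 + 0j, 0j]
mask = zero_db_mask(target, interference)
psr, sir, wdo = wdo_measures(mask, target, interference)

# With no overlap at all, the mask removes every interference point.
disjoint = [0j, 3 + 0j, 0j, 0j]
_, _, wdo_disjoint = wdo_measures(zero_db_mask(target, disjoint), target, disjoint)
```

In the overlapping case the mask keeps all target energy but lets a little interference through, so WDO_M falls slightly below 1; in the disjoint case WDO_M equals 1, matching the special case noted above.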
According to the experiments of Scott Rickard (Scott Rickard, Radu Balan and Justinian Rosca. Real-time time-frequency based blind source separation. Proceedings ICA2001, pp. 651-656, December 2001), the WDO values for different numbers of sound sources at 0 dB are as follows:
N        2     3     4     5     6     7     8     9     10
WDO (%)  93.6  88.0  83.4  79.2  75.6  72.3  69.3  66.6  64
Figs. 3 and 4 show a simulation check for the case of 3 sound sources. The abscissa is the WDO value and the ordinate is the number of speech signal samples. With 3 sound sources, more than 80% of the signal is orthogonal.
Figs. 5, 6 and 7 show a further check of the approximate orthogonality condition for 2 sound sources. Time-frequency analysis is applied to the signals s_1(t) and s_2(t), giving |S_1^W(t, ω)| and |S_2^W(t, ω)|, and to their product |S_1^W(t, ω) S_2^W(t, ω)|; the abscissa is time and the ordinate is frequency. The window function W(t) is a Hamming window with a length of 64 ms. Figs. 5, 6 and 7 show that the product |S_1^W(t, ω) S_2^W(t, ω)| contains very few of the components of |S_1^W(t, ω)| and |S_2^W(t, ω)|, which demonstrates that the source signals satisfy approximate W-disjoint orthogonality.
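The check described above can be imitated on synthetic data. The sketch below computes Hamming-windowed short-time spectra of two signals and a normalized overlap of their magnitudes. The signals (two sinusoids), the 32-sample frame, the 16-sample hop and the `overlap` measure are all assumptions chosen to keep the example small; they are not the speech signals or the 64 ms window of the experiment.

```python
import cmath
import math

def hamming(length):
    """Symmetric Hamming window."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

def stft_mag(signal, frame_len, hop):
    """Magnitudes of a Hamming-windowed STFT, flattened over (frame, bin)."""
    window = hamming(frame_len)
    mags = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            mags.append(abs(coeff))
    return mags

def overlap(mag_a, mag_b):
    """Normalized sum of |A||B|: near 0 for disjoint supports, 1 for identical maps."""
    num = sum(a * b for a, b in zip(mag_a, mag_b))
    den = math.sqrt(sum(a * a for a in mag_a) * sum(b * b for b in mag_b))
    return num / den

n_samples = 128
s1 = [math.sin(2 * math.pi * 4 * t / 32) for t in range(n_samples)]
s2 = [math.sin(2 * math.pi * 10 * t / 32) for t in range(n_samples)]
m1 = stft_mag(s1, frame_len=32, hop=16)
m2 = stft_mag(s2, frame_len=32, hop=16)
cross_overlap = overlap(m1, m2)   # small: the two spectra barely share bins
self_overlap = overlap(m1, m1)    # 1 up to rounding: complete overlap
```

A small cross overlap plays the same role as the nearly empty product map in Fig. 7. Real speech overlaps more than two pure tones do, so the values here only illustrate the measure.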
Step 3: SRP-PHAT localization of multiple sound sources with a circular array, combined with approximate W-disjoint orthogonality
The SRP-PHAT algorithm computes the phase-transform-weighted steered response power function over all microphone pairs to obtain an objective function. An optimal beamformer is designed and its beam is steered over all possible receiving directions; the direction of maximum beam output power gives the direction of the sound source.
1. SRP-PHAT algorithm for two microphones
For an array with only two microphones, m_i and m_j, a signal arriving from a given azimuth and elevation reaches the two microphones with a delay difference Δτ_ij(θ, φ). The TDOA can be estimated by generalized cross-correlation (GCC), expressed as:
$$ \Delta\tau_{ij}(\theta,\varphi) = \arg\max_{\tau} P(\mathbf{r}) = \arg\max_{\tau} R_{s_i s_j}\big(\Delta\tau_{ij}(\theta,\varphi)\big) \qquad (16) $$
where P(r) is the spatial likelihood function of the three-dimensional position vector r, obtained by evaluating all possible θ and φ. The generalized cross-correlation function R_{s_i s_j}(Δτ_ij(θ, φ)) can be expressed in the frequency domain as:
$$ R_{s_i s_j}\big(\Delta\tau_{ij}(\theta,\varphi)\big) = \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega)\, S_j^{*}(\omega)\, e^{j\omega \Delta\tau_{ij}(\theta,\varphi)}\, d\omega \qquad (17) $$
where Ψ_ij(ω) is the weighting function and S_i(ω) S_j^*(ω) is the cross-spectral density function.
The phase transform (PHAT) is a typical weighting method.
The phase weighting function is defined as:
$$ \Psi_{ij}(\omega) = \frac{1}{\left| S_i(\omega)\, S_j^{*}(\omega) \right|} \qquad (18) $$
By selecting a suitable weighting function, the delay-and-sum steered response power satisfies an optimal signal-to-noise ratio (SNR) criterion. The generalized cross-correlation R_{s_i s_j}(Δτ_ij(θ, φ)) shows a peak within the limited range of τ, and this peak corresponds to the TDOA of the signal propagating to microphones m_i and m_j. The algorithm offers a degree of noise immunity, reverberation resistance, and robustness in sound source localization.
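A discrete version of (17) with the PHAT weight (18) can be sketched as follows. This is a minimal illustration, assuming sampled signals, a naive DFT, integer lags, and a circularly shifted test signal; the function names and the small constant `eps` that guards against division by zero are illustrative choices, not part of the text.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(n^2) discrete Fourier transform, adequate for short demo frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def gcc_phat(sig_i, sig_j, max_lag):
    """Lag (samples) maximizing the PHAT-weighted cross-correlation.
    A positive result means sig_i is delayed relative to sig_j."""
    n = len(sig_i)
    spec_i, spec_j = dft(sig_i), dft(sig_j)
    eps = 1e-12
    # Cross-spectrum divided by its magnitude: only the phase is kept (PHAT).
    phat = []
    for a, b in zip(spec_i, spec_j):
        cross = a * b.conjugate()
        phat.append(cross / (abs(cross) + eps))
    best_lag, best_val = 0, -float("inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(p * cmath.exp(2j * math.pi * k * lag / n)
                  for k, p in enumerate(phat)).real
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(64)]
true_delay = 3
delayed = [reference[(t - true_delay) % 64] for t in range(64)]  # circular shift
estimated = gcc_phat(delayed, reference, max_lag=8)
```

Because the weighting discards magnitude, the correlation collapses to a sharp peak at the true lag, which is the behavior the text describes for the GCC within the limited range of τ.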
2. Circular-array SRP-PHAT algorithm
The generalized cross-correlations of all microphone pairs are summed:
$$ P(\Delta\tau_1, \Delta\tau_2, \ldots, \Delta\tau_N) = \sum_{i=1}^{N} \sum_{j=1}^{N} R_{s_i s_j}\big(\Delta\tau_{ij}(\theta,\varphi)\big) = \sum_{i=1}^{N} \sum_{j=1}^{N} \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega)\, S_j^{*}(\omega)\, e^{j\omega(\Delta\tau_i - \Delta\tau_j)}\, d\omega \qquad (19) $$
where Δτ_1, Δτ_2, …, Δτ_N are the steering delays of the N microphones, Δτ_i = τ_i − τ_0 for i = 1, …, N, and τ_0 is the reference delay, taken as the minimum of all microphone delays.
As the number of microphones increases, the two-microphone SRP-PHAT method extends naturally to the circular-array SRP-PHAT method.
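The summation over pairs and the scan over directions can be sketched in the frequency domain. The example below is a simplified illustration under several assumptions: a single noise-free source in the array plane, the far-field delay model used for uniform circular arrays (microphone i at azimuth 2πi/N, delays relative to the array center), a 1-degree azimuth grid, and ten discrete frequencies. It also uses the fact that the PHAT-weighted cross-spectrum factorizes into per-channel phase terms, so the double sum over (i, j) equals the squared magnitude of a single sum.

```python
import cmath
import math
import random

C = 384.0  # wave speed quoted in the text (m/s)

def steering_delays(phi, radius, n_mics, theta=math.pi / 2):
    """Assumed far-field delays (s) for azimuth phi, relative to the array centre."""
    return [-(radius / C) * math.sin(theta) * math.cos(phi - 2 * math.pi * i / n_mics)
            for i in range(n_mics)]

def srp_phat(spectra, omegas, radius, n_grid=360):
    """spectra[i][k] is the spectrum of microphone i at angular frequency omegas[k].
    Returns the steered response power for each azimuth on a uniform grid."""
    n_mics = len(spectra)
    power = []
    for g in range(n_grid):
        delays = steering_delays(2 * math.pi * g / n_grid, radius, n_mics)
        p = 0.0
        for k, w in enumerate(omegas):
            # PHAT keeps only each channel's phase; the sum over all (i, j)
            # pairs then collapses to |sum_i u_i * exp(j w tau_i)|^2.
            acc = sum(spectra[i][k] / abs(spectra[i][k]) * cmath.exp(1j * w * delays[i])
                      for i in range(n_mics))
            p += abs(acc) ** 2
        power.append(p)
    return power

rng = random.Random(1)
omegas = [2 * math.pi * f for f in range(300, 3001, 300)]
source = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in omegas]
true_delays = steering_delays(math.radians(60), radius=0.1, n_mics=8)
spectra = [[s * cmath.exp(-1j * w * tau) for s, w in zip(source, omegas)]
           for tau in true_delays]
power = srp_phat(spectra, omegas, radius=0.1)
estimated_deg = max(range(len(power)), key=power.__getitem__)
```

With a single clean source the power map peaks at the true azimuth. With several simultaneous sources the same map can show mixed and spurious peaks, which motivates the per-bin processing described next in the text.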
3. Multi-source circular-array SRP-PHAT algorithm
When two or more sound sources are present simultaneously, the SRP-PHAT peak of one source mixes with the SRP-PHAT peak of another. Spurious peaks can appear at some points, and the local maxima become difficult to find.
Using the approximate W-disjoint orthogonality of speech signals described above, the relative delay of each source signal arriving at the microphone array is estimated in the time-frequency domain.
The short-time Fourier transform serves as the approximately W-disjoint orthogonal transform.
Assume the frequency-domain signal model of the i-th microphone is:
$$ X_i[\omega,\tau] = S_n(\omega,\tau)\, e^{-j\omega \Delta\tau_{n,i}} + N_i[\omega,\tau] \qquad (20) $$
Given a window function W, the short-time Fourier transform of s_j is S_j:
$$ S_j(t,\omega) = F^{W}\big(s_j(\cdot)\big)(t,\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W(\tau - t)\, s_j(\tau)\, e^{-i\omega\tau}\, d\tau \qquad (21) $$
With an appropriate window function and window size, and under the approximate W-disjoint orthogonality assumption, only one sound source is active at any time-frequency point. The cross-spectrum is then:
$$ E\big[ X_i[\omega,\tau]\, X_j^{*}[\omega,\tau] \big] = \left| S_n(\omega,\tau) \right|^2 e^{-j\omega(\Delta\tau_{n,i} - \Delta\tau_{n,j})} \qquad (22) $$
The delay Δτ_{n,i} − Δτ_{n,j} between microphones i and j can be obtained from the cross-power spectrum.
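Reading the delay off the cross-spectrum phase can be sketched for individual time-frequency points. The example below assumes noise-free observations, one dominant source per point, and delays small enough that the phase does not wrap; the helper names and numeric values are illustrative assumptions.

```python
import cmath
import math

def pair_delay(x_i, x_j, omega):
    """Delay difference tau_i - tau_j (s) from one time-frequency point.
    Valid while |omega * (tau_i - tau_j)| < pi, i.e. without phase wrapping."""
    cross = x_i * x_j.conjugate()
    return -cmath.phase(cross) / omega

def observe(amplitude, omega, tau):
    """Noise-free microphone value for a source amplitude delayed by tau."""
    return amplitude * cmath.exp(-1j * omega * tau)

omega = 2 * math.pi * 500.0
# Point A is dominated by one source, point B by a different source,
# so each point yields the delay difference of its own source.
point_a = (observe(0.8 + 0.3j, omega, 1.0e-4), observe(0.8 + 0.3j, omega, 2.5e-4))
point_b = (observe(-0.2 + 1.1j, omega, 3.0e-4), observe(-0.2 + 1.1j, omega, 1.0e-4))
delay_a = pair_delay(*point_a, omega)
delay_b = pair_delay(*point_b, omega)
```

Because the source magnitude cancels in the phase, each point reports only the delay difference of the source that dominates it. This is why per-point estimates from a mixture can be grouped by source when the approximate orthogonality assumption holds.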
Embodiment 1: localization of two sound sources
1. Selection of the uniform circular array localization model
The simulation covers different signal-to-noise ratios and reverberation conditions. The uniform circular array is placed in a room of 7 m × 8 m × 3.5 m, and the positions of its 8 microphones are [3.25, -1.6, 1.5], [3.25, 1.1, 1.5], [1.87, 3.75, 1.5], [1.0, 3.75, 1.5], [3.25, 1.8, 1.5], [3.25, -1.0, 1.5], [2.2, -3.75, 1.5], and [0.6, -3.75, 1.5].
2. Selection of the sound sources
The sound sources are randomly generated speech signals with an SNR of 0-30 dB. The random interference is a Gaussian signal that simulates an air conditioner, an electric fan, and noise from outside the window. The noise power reaches at most 10 dB, and the reverberation time is determined by the reflection coefficients of the room walls, floor, and ceiling.
3. Short-time Fourier transform (STFT) of the signals received by the array
Given a window function W, the short-time Fourier transform of s_j is S_j:
$$ S_j(t,\omega) = F^{W}\big(s_j(\cdot)\big)(t,\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W(\tau - t)\, s_j(\tau)\, e^{-i\omega\tau}\, d\tau \qquad (21) $$
The type and size of the window function are critical to obtaining a good time-frequency masking effect. Here a Hamming window of 1024 points is used.
4. Generalized cross-correlation with phase transform
With a suitable window function, good separation is achieved and approximate W-disjoint orthogonality is satisfied. The generalized cross-correlation can then be computed on this basis.
The generalized cross-correlation function R_{s_i s_j}(Δτ_ij(θ, φ)) can be expressed in the frequency domain as:
$$ R_{s_i s_j}\big(\Delta\tau_{ij}(\theta,\varphi)\big) = \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega)\, S_j^{*}(\omega)\, e^{j\omega \Delta\tau_{ij}(\theta,\varphi)}\, d\omega \qquad (17) $$
where the weighting function Ψ_ij(ω) is:
$$ \Psi_{ij}(\omega) = \frac{1}{\left| S_i(\omega)\, S_j^{*}(\omega) \right|} \qquad (18) $$
The generalized cross-correlations of all microphone pairs are summed:
$$ P(\Delta\tau_1, \Delta\tau_2, \ldots, \Delta\tau_N) = \sum_{i=1}^{N} \sum_{j=1}^{N} R_{s_i s_j}\big(\Delta\tau_{ij}(\theta,\varphi)\big) = \sum_{i=1}^{N} \sum_{j=1}^{N} \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega)\, S_j^{*}(\omega)\, e^{j\omega(\Delta\tau_i - \Delta\tau_j)}\, d\omega \qquad (19) $$
where Δτ_1, Δτ_2, …, Δτ_N are the steering delays of the N microphones, Δτ_i = τ_i − τ_0 for i = 1, …, N, and τ_0 is the reference delay, taken as the minimum of all microphone delays.
Once the maximum of P(Δτ_1, Δτ_2, …, Δτ_N) is found, the elevation angle θ and azimuth φ of the sound source can be determined.
5. the result after above step
Shown in Figure 10, Figure 11, be respectively circular array at 20dB, sound source wave field image under 30dB signal to noise ratio (S/N ratio).In figure, is microphone position, and zero represents the sound source of estimating, * for disturbing residing position.
The locus that Figure 10 shows that two sound sources is respectively [0.59,2.08,1.5] and [0.29 ,-1.37,1.5], and signal to noise ratio (S/N ratio) is 20dB.Random interfering signal is gaussian signal, is used for simulating air condition electric fan and from noise outside window, locus is respectively [2 ,-4,1.5], [3.5 ,-3.2,1.5], noise power can reach 10dB the most by force, and the corresponding reverberation time is determined by the reflection coefficient of room wall, floor and ceiling.
The locus that Figure 11 shows that two sound sources is respectively [1.5,2.1,1.5] and [2.1,0.8,1.5], and signal to noise ratio (S/N ratio) is 30dB.Be used for simulating air condition electric fan and from noise outside window away from two sound sources.
Adopt the SRP-PHAT algorithm of associating approximate W-separated quadrature to carry out orientation estimation, choose Hamming window, window size is 1024 points.Shown in Figure 10, Figure 11, be respectively circular array at 20dB, sound source wave field image under 30dB signal to noise ratio (S/N ratio).Is microphone position, and zero represents the sound source of estimating, * for disturbing residing position.Visible under identical background noise environment, the signal to noise ratio (S/N ratio) of signal more high position precision is also higher.
Figures 12 and 13 show the measured sound-source azimuths. In Fig. 5 the azimuths are φ1 = 74° and φ2 = −78°; although the two signals are close in azimuth and the signal-to-noise ratio is low, the two sound sources can essentially be resolved: spectral peaks appear at the true azimuths, no false spectral peaks appear, and the target azimuths are still estimated correctly. In Figure 13 the measured azimuths are φ1 = 17° and φ2 = 52°. Although the two signals are fairly close in azimuth, the two sound sources are fully resolved because the signal-to-noise ratio is high and the angular separation is larger. As the signal-to-noise ratio increases, the estimation error decreases and the estimation accuracy improves. The larger the angular difference between the two signals, the more accurate the estimate; once the angular difference is large enough, the estimation accuracy levels off.
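Reading several source azimuths off such a spectrum amounts to picking its largest local maxima. The text does not spell out a peak-picking rule, so the following is a generic sketch: the function `top_peaks`, the 1° grid and the synthetic two-lobe spectrum (lobes placed at 74° and −78° purely for illustration) are assumptions.

```python
import math


def top_peaks(spectrum, k):
    """Indices of the k largest local maxima of a circular (wrap-around) spectrum."""
    n = len(spectrum)
    peaks = [i for i in range(n)
             if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[(i + 1) % n]]
    peaks.sort(key=lambda i: spectrum[i], reverse=True)
    return sorted(peaks[:k])


# Synthetic azimuth spectrum on a 1-degree grid from -180 to 179 degrees.
angles = list(range(-180, 180))
spectrum = [math.exp(-((a - 74) / 10.0) ** 2) + 0.8 * math.exp(-((a + 78) / 10.0) ** 2)
            for a in angles]
estimated = [angles[i] for i in top_peaks(spectrum, 2)]
print(estimated)  # [-78, 74]
```

When false peaks are present, a rule of this kind returns them as readily as true ones, which is why the method suppresses them before the peak search rather than after it.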
The azimuth and pitch angles shown in Figure 14 are (φ1 = 74°, θ1 = 46°) and (φ2 = −78°, θ2 = 0°).
Embodiment 2: localization of three sound sources
When the number of sound sources increases to three, the false-spectral-peak problem cannot be solved well at low signal-to-noise ratios. At high signal-to-noise ratios the false-spectral-peak problem is essentially solved, and the method resolves multiple sound sources well.
The specific implementation steps are the same as in Embodiment 1 and are omitted here.
Figure 15 shows the two-dimensional image of three-source localization at a signal-to-noise ratio of 30 dB.
Figures 16 and 17 show the sound-source azimuths measured at a relatively high signal-to-noise ratio by the method proposed here and by the traditional SRP-PHAT method, respectively. The SRP-PHAT method based on approximate W-disjoint orthogonality effectively solves the false-spectral-peak problem and resolves the three signal sources, whereas the traditional SRP-PHAT method produces false spectral peaks and cannot resolve the three useful signals.
Figure 18 shows the sound-source signals received by the 8-element microphone array; the interference source has a larger effect on microphone No. 7, which is closest to it. Figure 19 shows curves of azimuth error versus signal-to-noise ratio for different reverberation times T60, with RT60 set to 300 ms, 450 ms and 600 ms. As T60 increases, the estimation error grows and the estimation accuracy falls. When reverberation is strong it is difficult to resolve the target azimuth, so the method is suited to localization under moderate reverberation.
The simulation results show that the SRP-PHAT algorithm with a uniform circular array has good localization performance, particularly when the SNR is high and the reverberation is moderate.
Because the W-disjoint orthogonality assumption, which rests on the sparsity of speech signals, does not hold for multiple sound sources, the present invention introduces two key characteristics of a speech signal after time-frequency masking, namely the signal retention rate and the signal-to-noise ratio after masking, and derives the approximate W-disjoint orthogonality condition. The multi-source signal is divided into non-overlapping sets of time-frequency points, each set containing the time-frequency components of a single source signal only, and the relative delay of each source signal at the microphone array is estimated in the time-frequency domain. A circular array, which offers higher spatial resolution, is used to obtain high-resolution estimates of the azimuth and pitch angle of multiple source signals simultaneously. This achieves spatial localization of the sound sources and overcomes the inability of existing sound-source localization methods to localize multiple overlapping sources effectively in three dimensions.
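The masking quantities named above can be made concrete with a toy calculation that follows the ratio definitions given in claim 2 (PSR_M, SIR_M and WDO_M). The sketch rests on two assumptions that are not stated in the text: the interference term in the WDO_M formula is taken to be the same summed interference used for SIR_M, and the mask keeps a time-frequency point whenever the target is stronger than the interference. The names `energy` and `mask_metrics` and the numbers are hypothetical.

```python
import math


def energy(values):
    """Squared norm of a flattened time-frequency array."""
    return sum(abs(v) ** 2 for v in values)


def mask_metrics(target, interference, mask):
    """Return (PSR, SIR, WDO) for flattened time-frequency arrays and a 0/1 mask."""
    kept_target = energy(m * s for m, s in zip(mask, target))
    kept_interf = energy(m * z for m, z in zip(mask, interference))
    total = energy(target)
    psr = kept_target / total                                 # share of target energy kept
    sir = kept_target / kept_interf if kept_interf > 0 else math.inf
    wdo = (kept_target - kept_interf) / total                 # equals 1 only for perfect separation
    return psr, sir, wdo


# Toy data: four time-frequency points of the target and of the summed interferers.
S_j = [3.0, 0.0, 0.0, 2.0]
Z_j = [1.0, 2.0, 1.0, 0.0]
M = [1 if abs(s) > abs(z) else 0 for s, z in zip(S_j, Z_j)]  # assumed masking rule
psr, sir, wdo = mask_metrics(S_j, Z_j, M)
print(M, psr, sir, wdo)
```

Here the mask keeps all of the target energy (PSR = 1) but also lets through one unit of interference energy, so WDO_M = 12/13 falls slightly short of the ideal value of 1.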
Finally, it should be noted that the above are only preferred embodiments of the present invention and do not limit it. Although the invention has been described in detail with reference to the embodiments, a person skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. An SRP-PHAT multi-source spatial positioning method, characterized by comprising the following steps:
1) computing the spatial coordinates under the following assumptions: the number and spatial positions of all microphones of the uniform circular microphone array remain unchanged during data acquisition; the distance between the sound source and the microphones meets the requirements of the sound field model; all microphones have identical physical properties; the isotropic microphones are uniformly distributed on a circle of radius r in the x-y plane; polar coordinates are used to represent the direction of arrival of the plane wave s; the origin of the coordinate system is at the center of the circular array; the pitch angle of the signal is θ ∈ [0, π/2] and the azimuth is φ ∈ [0, 2π];
2) dividing the multi-source signal into non-overlapping sets of time-frequency points, so that each time-frequency window contains only one active source signal and the weak W-disjoint orthogonality condition is satisfied; and selecting a Hamming window, W-disjoint orthogonality being satisfied when WDO_M = 1;
3) using the SRP-PHAT algorithm to compute the sum of the phase-transform steered response power functions of all microphone pairs to obtain an objective function; the steering beam of the beamformer scans all possible receiving directions, and the direction in which the beam output power is largest gives the direction of the sound source.

2. The SRP-PHAT multi-source spatial positioning method according to claim 1, characterized in that step 2) comprises:
first introducing two important criteria: (1) how well the mask preserves the sound source of interest; (2) how well the mask suppresses the interfering sound sources;
dividing the multi-source signal into non-overlapping sets of time-frequency points, each time-frequency window containing only one active source signal and approximately satisfying

$$S_j(t,\omega)\, S_k(t,\omega) \approx 0, \quad \forall t, \omega$$

defining the time-frequency mask as

$$M_j(t,\omega) = \begin{cases} 1, & S_j(t,\omega) \neq 0 \\ 0, & \text{otherwise} \end{cases}$$

by estimating the time-frequency mask corresponding to each source, a given source j can be obtained from the mixture:

$$S_j(t,\omega) = M_j(t,\omega)\, X(t,\omega), \quad \forall t, \omega$$

where M_j is the indicator function of the support of source j, and S_j(t, ω) and X(t, ω) are the time-frequency representations of s_j(t) and x(t), respectively;
for a given time-frequency mask M, defining the preserved-signal ratio PSR_M:

$$\mathrm{PSR}_M = \frac{\lVert M(t,\omega)\, S_j(t,\omega) \rVert^2}{\lVert S_j(t,\omega) \rVert^2}$$

PSR_M measures the percentage of the energy of source S_j that is retained after the mask is applied;
also defining

$$z_j(t) = \sum_{\substack{k=1 \\ k \neq j}}^{N} s_k(t)$$

where z_j(t) is the sum of all sources that interfere with source S_j;
defining the signal-to-interference ratio after applying the time-frequency mask M as:

$$\mathrm{SIR}_M = \frac{\lVert M(t,\omega)\, S_j(t,\omega) \rVert^2}{\lVert M(t,\omega)\, Z_j(t,\omega) \rVert^2}$$

where SIR_M measures the signal-to-interference ratio after the time-frequency mask M has been applied to separate the signals;
the approximate W-disjoint orthogonality WDO_M can be measured from PSR_M and SIR_M:

$$\mathrm{WDO}_M = \frac{\lVert M(t,\omega)\, S_j(t,\omega) \rVert^2 - \lVert M(t,\omega)\, Y_j(t,\omega) \rVert^2}{\lVert S_j(t,\omega) \rVert^2}$$

since a speech signal has a sparse time-frequency representation, the power of that representation accounts for the overwhelming share of the total power, and the magnitude of the product of the time-frequency representations is usually small, the weak W-disjoint orthogonality condition is satisfied; in particular, W-disjoint orthogonality is satisfied when WDO_M = 1.

3. The SRP-PHAT multi-source spatial positioning method according to claim 1, characterized in that, in step 3), for the two-microphone SRP-PHAT algorithm:
for an array of only two microphones, microphone m_i and microphone m_j, a signal arriving from a given azimuth and pitch angle reaches the two microphones with a time delay Δτ_ij(θ, φ); the TDOA can be estimated by generalized cross-correlation (GCC), expressed as:

$$\Delta\tau_{ij}(\theta,\phi) = \arg\max_{\tau} P(r) = \arg\max_{\tau} R_{s_i s_j}\big(\Delta\tau_{ij}(\theta,\phi)\big)$$

where P(r) is the spatial likelihood function of the three-dimensional space vector r, obtained by evaluating all possible θ and φ; the generalized cross-correlation function R_{s_i s_j}(Δτ_ij(θ, φ)) can be expressed in the frequency domain as:

$$R_{s_i s_j}\big(\Delta\tau_{ij}(\theta,\phi)\big) = \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega)\, S_j^{*}(\omega)\, e^{j\omega \Delta\tau_{ij}(\theta,\phi)}\, d\omega$$

where Ψ_ij(ω) is the weighting function and S_i(ω) S_j^*(ω) is the cross power spectral density function;
the phase transform (PHAT) is a typical weighting, and the phase weighting function is defined as:

$$\Psi_{ij}(\omega) = \frac{1}{\lvert S_i(\omega)\, S_j^{*}(\omega) \rvert}$$

by choosing a suitable weighting function so that the delay-and-sum steered response power satisfies the optimal signal-to-noise-ratio criterion, the generalized cross-correlation R_{s_i s_j}(Δτ_ij(θ, φ)) exhibits a peak within the restricted range of τ, corresponding to the TDOA between microphone m_i and microphone m_j.

4. The SRP-PHAT multi-source spatial positioning method according to claim 1, characterized in that, in step 3), for the SRP-PHAT algorithm with a circular microphone array:
the generalized cross-correlations of all microphone pairs are summed:

$$P(\Delta\tau_1, \Delta\tau_2, \ldots, \Delta\tau_N) = \sum_{i=1}^{N}\sum_{j=1}^{N} R_{s_i s_j}\big(\Delta\tau_{ij}(\theta,\phi)\big) = \sum_{i=1}^{N}\sum_{j=1}^{N}\int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega)\, S_j^{*}(\omega)\, e^{j\omega(\Delta\tau_i - \Delta\tau_j)}\, d\omega$$

where Δτ_1, Δτ_2, …, Δτ_N are the steerable delays of the N microphones, with Δτ_i = τ_i − τ_0 for i = 1, …, N; τ_0 is the reference delay estimate, taken as the smallest of all the microphone delays.

5. The SRP-PHAT multi-source spatial positioning method according to claim 1, characterized in that, in step 3), for the multi-source SRP-PHAT algorithm with a circular microphone array:
when two or more sound sources are present at the same time, the SRP-PHAT peak of one source mixes into the SRP-PHAT peak of another, false peaks arise at some points, and the local maximum peaks are hard to find;
using the approximate W-disjoint orthogonality of speech signals, the relative delay of each source signal at the microphone array is estimated in the time-frequency domain, with the short-time Fourier transform serving as the approximately W-disjoint orthogonal transform;
the frequency-domain representation of the signal model of the i-th microphone is assumed to be:

$$X_i[\omega,\tau] = S_n(\omega,\tau)\, e^{-j\omega\Delta\tau_{n,i}} + N_i[\omega,\tau]$$

given a window function W, the short-time Fourier transform of s_j is S_j:

$$S_j(t,\omega) = F^{W}\big(s_j(\cdot)\big)(t,\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} W(\tau - t)\, s_j(\tau)\, e^{-i\omega\tau}\, d\tau$$

by choosing an appropriate window function and window size, and under the assumption that the signals are approximately W-disjoint orthogonal, only one sound source is active at any time-frequency point, so the cross-spectrum is:

$$E\big[X_i[\omega,\tau]\, X_j^{*}[\omega,\tau]\big] = \lvert S_n(\omega,\tau) \rvert^2\, e^{-j\omega(\Delta\tau_{n,i} - \Delta\tau_{n,j})}$$

the delay Δτ_{n,i} − Δτ_{n,j} between microphone i and microphone j can then be obtained from the cross power spectrum.
CN201410366922.4A 2014-07-29 2014-07-29 SRP-PHAT multi-source spatial positioning method Expired - Fee Related CN104142492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410366922.4A CN104142492B (en) 2014-07-29 2014-07-29 SRP-PHAT multi-source spatial positioning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410366922.4A CN104142492B (en) 2014-07-29 2014-07-29 SRP-PHAT multi-source spatial positioning method

Publications (2)

Publication Number Publication Date
CN104142492A true CN104142492A (en) 2014-11-12
CN104142492B CN104142492B (en) 2017-04-05

Family

ID=51851720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410366922.4A Expired - Fee Related CN104142492B (en) 2014-07-29 2014-07-29 SRP-PHAT multi-source spatial positioning method

Country Status (1)

Country Link
CN (1) CN104142492B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090279714A1 (en) * 2008-05-06 2009-11-12 Samsung Electronics Co., Ltd. Apparatus and method for localizing sound source in robot
CN101762806A (en) * 2010-01-27 2010-06-30 华为终端有限公司 Sound source locating method and apparatus thereof
KR20140015893A (en) * 2012-07-26 2014-02-07 삼성테크윈 주식회사 Apparatus and method for estimating location of sound source

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAVID AYLLÓN ET AL.: "Real-time phase-isolation algorithm for speech separation", 《19TH EUROPEAN SIGNAL PROCESSING CONFERENCE》 *
M. SWARTLING ET AL.: "Source Localization for Multiple Speech Sources Using Low Complexity Non-Parametric Source Separation and Clustering", 《SIGNAL PROCESSING》 *

Also Published As

Publication number Publication date
CN104142492B (en) 2017-04-05

Similar Documents

Publication Publication Date Title
CN104142492A (en) SRP-PHAT multi-source spatial positioning method
Chen et al. Maximum-likelihood source localization and unknown sensor location estimation for wideband signals in the near-field
CN111123192B (en) A two-dimensional DOA localization method based on circular array and virtual expansion
Chen et al. Acoustic source localization and beamforming: theory and practice
Ren et al. A novel multiple sparse source localization using triangular pyramid microphone array
Jamali-Rad et al. Sparsity-aware multi-source TDOA localization
CN105301563B (en) A kind of double sound source localization method that least square method is converted based on consistent focusing
CN102103200A (en) Acoustic source spatial positioning method for distributed asynchronous acoustic sensor
CN105223551B (en) A wearable sound source location tracking system and method
Zou et al. Multisource DOA estimation based on time-frequency sparsity and joint inter-sensor data ratio with single acoustic vector sensor
Zhao et al. Open‐Lake Experimental Investigation of Azimuth Angle Estimation Using a Single Acoustic Vector Sensor
CN105158734B (en) A single vector hydrophone passive localization method based on the array invariant
Tellakula Acoustic source localization using time delay estimation
Padois et al. Acoustic source localization using a polyhedral microphone array and an improved generalized cross-correlation technique
He et al. Closed-form DOA estimation using first-order differential microphone arrays via joint temporal-spectral-spatial processing
Xia et al. Noise reduction method for acoustic sensor arrays in underwater noise
Xiong et al. Fibonacci array-based focused acoustic camera for estimating multiple moving sound sources
Liu et al. A multiple sources localization method based on TDOA without association ambiguity for near and far mixed field sources
KR20090128221A (en) Sound source location estimation method and system according to the method
Svaizer et al. Environment aware estimation of the orientation of acoustic sources using a line array
Dang et al. Multiple sound source localization based on a multi-dimensional assignment model
Wang et al. 3-D sound source localization with a ternary microphone array based on TDOA-ILD algorithm
Pertilä Acoustic source localization in a room environment and at moderate distances
CN114994608B (en) Sound source localization method of multi-device self-organizing microphone array based on deep learning
Sun et al. Indoor multiple sound source localization using a novel data selection scheme

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170405

Termination date: 20200729
