CN1589127A - Method and apparatus for removing noise from electronic signals - Google Patents
- Publication number
- CN1589127A (application CNA028231937A / CN02823193A)
- Authority
- CN
- China
- Prior art keywords
- signal
- noise
- transfer function
- acoustical
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02165—Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
Abstract
A method and system for removing acoustic noise from human speech is described. Acoustic noise is removed regardless of noise type, amplitude, or orientation. The system includes a processor coupled to microphones and a voice activity detection ('VAD') element. The processor executes denoising algorithms that generate transfer functions. The processor receives acoustic data from the microphones, together with data from the VAD indicating when voicing activity is present and when it is absent. The transfer functions are used to generate a denoised data stream.
Description
Related application
This patent application is a continuation-in-part of U.S. Patent Application Serial No. 09/905,361, filed July 12, 2001, which is incorporated herein by reference. This application also claims priority to U.S. Provisional Patent Application Serial No. 60/332,202, filed November 21, 2001.
Technical field
The invention relates to electronic systems and mathematical methods for removing or suppressing unwanted noise from acoustic transmissions or recordings.
Background art
In a typical acoustic application, speech from a human user is recorded, stored, and transmitted to a recipient at a different location. One or more noise sources in the user's environment may contaminate the desired signal (the user's speech) with unwanted noise, making it difficult or impossible for the recipient, whether human or machine, to understand the user's speech. With the proliferation of portable communication devices such as mobile phones and personal digital assistants, the problem has become especially serious. Prior-art methods exist for suppressing this added noise, but all have significant drawbacks. For example, existing methods are slow because of the computation time required, need cumbersome hardware, distort the desired signal substantially, or perform too poorly to be useful. Several of these methods are described in the textbook by Vaseghi, Advanced Digital Signal Processing and Noise Reduction (ISBN 0-471-62692-9).
Description of drawings
Figure 1 is a block diagram of a denoising system of an embodiment.
Figure 2 is a block diagram of the noise removal algorithm of an embodiment, assuming a single noise source with a direct path to the microphones.
Figure 3 is a front-end block diagram of the noise removal algorithm of an embodiment, generalized to n distinct noise sources (these noise sources may be reflections or echoes of one another).
Figure 4 is a front-end block diagram of the noise removal algorithm of an embodiment in the general case of n distinct noise sources and signal reflections.
Figure 5 is a flow diagram of the denoising method of an embodiment.
Figure 6 shows results of the noise suppression algorithm of an embodiment for an American English-speaking female in the presence of airport terminal noise that includes many other speakers and public address announcements.
Figure 7 is a block diagram of a physical microphone configuration for denoising using unidirectional and omnidirectional microphones, under the embodiments of Figures 2, 3, and 4.
Figure 8 shows a microphone configuration for denoising that includes two omnidirectional microphones, under an embodiment.
Figure 9 is a plot of the required C versus distance for the embodiment of Figure 8.
Figure 10 is a front-end block diagram of the noise removal algorithm of an embodiment in which the two microphones have different response characteristics.
Figure 11A plots the difference in frequency response (percent) between two microphones (4 cm apart) before compensation.
Figure 11B plots the difference in frequency response (percent) between the two microphones (4 cm apart) after DFT compensation, under an embodiment.
Figure 11C plots the difference in frequency response (percent) between the two microphones (4 cm apart) after time-domain filter compensation, under an alternative embodiment.
Detailed description
The following detailed description provides a thorough understanding of embodiments of the invention. One skilled in the relevant art, however, will recognize that the invention can be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the invention.
Unless otherwise described below, the construction and operation of the various blocks shown in the figures are of conventional design. As a result, such blocks need not be described in further detail, because they will be understood by those skilled in the relevant art. Such further detail is omitted for brevity and so as not to obscure the detailed description of the invention. Any necessary modifications to the blocks in the figures (or to other embodiments) can be readily made by one skilled in the relevant art based on the detailed description provided herein.
Figure 1 is a block diagram of a denoising system of an embodiment that uses knowledge of the physiological activity of voicing that accompanies speech production. The system includes microphones 10 and sensors 20 that provide signals to at least one processor 30. The processor includes a denoising subsystem or algorithm 40.
Figure 2 is a block diagram of the noise removal algorithm of an embodiment, showing the system components used. A single noise source and a direct path to the microphones are assumed. Figure 2 includes a graphic description of the process of an embodiment with a single signal source 100 and a single noise source 101. The algorithm uses two microphones, a "signal" microphone 1 ("MIC1") and a "noise" microphone 2 ("MIC2"), but is not so limited. MIC1 is assumed to capture mostly signal with some noise, while MIC2 captures mostly noise with some signal. The data from the signal source 100 to MIC1 is denoted s(n), where s(n) is a discrete sample of the signal from the signal source 100. The data from the signal source 100 to MIC2 is denoted s2(n). The data from the noise source 101 to MIC2 is denoted n(n). The data from the noise source 101 to MIC1 is denoted n2(n). Likewise, the data from MIC1 to the noise removal element 105 is denoted m1(n), and the data from MIC2 to the noise removal element 105 is denoted m2(n).
The noise removal element also receives a signal from a voicing activity detection ("VAD") element 104. The VAD 104 detects and uses physiological information to determine when the speaker is speaking. In various embodiments, the VAD includes a radio-frequency device, an electroglottograph, an ultrasound device, an acoustic throat microphone, and/or an airflow detector.
The transfer function from the signal source 100 to MIC1 and the transfer function from the noise source 101 to MIC2 are assumed to be unity. The transfer function from the signal source 100 to MIC2 is denoted H2(z), and the transfer function from the noise source 101 to MIC1 is denoted H1(z). The assumption of unity transfer functions does not limit the generality of the algorithm, because the actual relationships between the signal, the noise, and the microphones are simply ratios, and the ratios are redefined in this manner for simplicity.
In conventional two-microphone noise removal systems, the information from MIC2 is used to attempt to remove noise from MIC1. An unspoken assumption, however, is that the VAD element 104 is never perfectly accurate, so the denoising must be done cautiously in order to avoid removing too much of the signal along with the noise. If, instead, the VAD 104 is assumed to be perfect, so that its value equals 0 when no voicing is being produced and 1 when voicing is produced, a substantial improvement in noise removal can be realized.
Examine the case of a single noise source 101 with a direct path to the microphones, with reference to Figure 2. The total acoustic information coming into MIC1 is denoted m1(n). The total acoustic information coming into MIC2 is similarly denoted m2(n). In the z (digital frequency) domain, these are represented as M1(z) and M2(z). Then

M1(z) = S(z) + N2(z)
M2(z) = N(z) + S2(z)

with

N2(z) = N(z)H1(z)
S2(z) = S(z)H2(z)

so that

M1(z) = S(z) + N(z)H1(z)
M2(z) = N(z) + S(z)H2(z)     (Equation 1)
This is the general case for a two-microphone system. In a practical system there will always be some leakage of noise into MIC1, and some leakage of signal into MIC2. Equation 1 has four unknowns and only two known relationships, and therefore cannot be solved explicitly.
There is, however, another way to solve for some of the unknowns in Equation 1. The analysis starts with the case in which the signal is not being generated, that is, when the signal from the VAD element 104 equals 0 and speech is not being produced. In this case, s(n) = S(z) = 0, and Equation 1 reduces to

M1n(z) = N(z)H1(z)
M2n(z) = N(z)

where the n subscript on the M variables indicates that only noise is being received. This leads to

M1n(z) = M2n(z)H1(z)

H1(z) = M1n(z) / M2n(z)

H1(z) can be calculated using any of the available system identification algorithms and the microphone outputs when the system detects that only noise is being received. The calculation can be done adaptively, so that the system can react to changes in the noise.
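As one illustration of such a system identification step, H1(z) can be estimated in the frequency domain by averaging cross- and auto-spectra over noise-only frames, a standard estimator. The sketch below is illustrative; the frame length and names are assumptions, not taken from the patent.

```python
import numpy as np

def estimate_h1(mic1_noise, mic2_noise, nfft=256, eps=1e-12):
    """Estimate H1(z) = M1n(z)/M2n(z) from noise-only data.

    Averages cross-spectrum (MIC1 vs MIC2) over auto-spectrum of MIC2
    across frames - the standard frequency-domain transfer estimate.
    """
    frames = len(mic1_noise) // nfft
    num = np.zeros(nfft, dtype=complex)  # sum of M1n * conj(M2n)
    den = np.zeros(nfft)                 # sum of |M2n|^2
    for i in range(frames):
        seg1 = np.fft.fft(mic1_noise[i * nfft:(i + 1) * nfft])
        seg2 = np.fft.fft(mic2_noise[i * nfft:(i + 1) * nfft])
        num += seg1 * np.conj(seg2)
        den += np.abs(seg2) ** 2
    return num / (den + eps)
```

In operation this estimate would be refreshed whenever the VAD reports no voicing, which is what makes the calculation adaptive.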
One of the unknowns in Equation 1 can now be solved. The other unknown, H2(z), can be determined by using the instances when the VAD equals 1 and speech is being produced. When this occurs, and if the recent (perhaps less than 1 second) history of the microphones indicates low noise levels, it can be assumed that n(n) = N(z) ~ 0. Equation 1 then becomes

M1s(z) = S(z)
M2s(z) = S(z)H2(z)

which in turn leads to

M2s(z) = M1s(z)H2(z)

H2(z) = M2s(z) / M1s(z)

This is the inverse of the H1(z) calculation. Note, however, that different inputs are being used: previously only noise was being received, and now only signal is being received. While H2(z) is being calculated, the value calculated for H1(z) is held constant, and vice versa. Thus it is assumed that, while one of H1(z) and H2(z) is being calculated, the one not being calculated does not change substantially.
After H1(z) and H2(z) have been calculated, they are used to remove the noise from the signal. Equation 1 can be rewritten as

S(z) = M1(z) - N(z)H1(z)
N(z) = M2(z) - S(z)H2(z)
S(z) = M1(z) - [M2(z) - S(z)H2(z)]H1(z)
S(z)[1 - H2(z)H1(z)] = M1(z) - M2(z)H1(z)

so that N(z) can be substituted as shown and S(z) solved for:

S(z) = [M1(z) - M2(z)H1(z)] / [1 - H2(z)H1(z)]     (Equation 3)

If the transfer functions H1(z) and H2(z) can be described with sufficient accuracy, the noise can be completely removed and the original signal recovered. This result is independent of the amplitude and spectral characteristics of the noise. The only assumptions made are that the VAD performs well, that H1(z) and H2(z) are sufficiently accurate, and that when one of H1(z) and H2(z) is being calculated, the other does not change substantially. In practice these assumptions have proven reasonable.
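Applied per FFT bin, Equation 3 is a few lines of code. The following sketch assumes the transfer functions are already available as complex frequency responses on the same FFT grid; the small regularization constant is an illustrative guard, not part of the patent.

```python
import numpy as np

def denoise_frame(m1, m2, h1, h2, eps=1e-12):
    """Recover S per Equation 3: S = (M1 - M2*H1) / (1 - H2*H1).

    m1, m2: time-domain frames from MIC1 and MIC2.
    h1, h2: complex frequency responses sampled on the same FFT grid.
    """
    M1 = np.fft.fft(m1)
    M2 = np.fft.fft(m2)
    S = (M1 - M2 * h1) / (1.0 - h2 * h1 + eps)
    return np.fft.ifft(S).real
```

With exact H1 and H2 the subtraction cancels the noise term bin by bin, independent of the noise's amplitude or spectrum, which is the point of Equation 3.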
The noise removal algorithm described herein is easily generalized to include any number of noise sources. Figure 3 is a front-end block diagram of the noise removal algorithm of an embodiment, generalized to n distinct noise sources. These distinct noise sources may be reflections or echoes of one another, but are not so limited. Several noise sources are shown, each with a transfer function, or path, to each microphone. The path previously named H2 has been relabeled H0, so that the labeling of the path from noise source 2 to MIC1 is more convenient. The outputs of each microphone, transformed to the z domain, are:

M1(z) = S(z) + N1(z)H1(z) + N2(z)H2(z) + ... + Nn(z)Hn(z)
M2(z) = S(z)H0(z) + N1(z)G1(z) + N2(z)G2(z) + ... + Nn(z)Gn(z)     (Equation 4)
When there is no signal (VAD = 0), this becomes (dropping the z for clarity)

M1n = N1H1 + N2H2 + ... + NnHn
M2n = N1G1 + N2G2 + ... + NnGn     (Equation 5)
A new transfer function, analogous to H1(z) above, can now be defined:

H̃1(z) = M1n(z) / M2n(z)     (Equation 6)

H̃1(z) depends only on the noise sources and their respective transfer functions, and can be calculated whenever no signal is being transmitted. Once again, an n subscript on a microphone input denotes that only noise is being detected, while an s subscript denotes that only signal is being received by the microphones.
Examining Equation 4 while assuming that no noise is being produced gives

M1s = S
M2s = SH0

Thus H0 can be solved for as before, using any available transfer function calculation algorithm. Mathematically,

H0(z) = M2s(z) / M1s(z)

Using the definition of H̃1 in Equation 6, Equation 4 can be rewritten as

M1(z) - S(z) = [M2(z) - S(z)H0(z)]H̃1(z)     (Equation 7)

and solving for S yields

S(z) = [M1(z) - M2(z)H̃1(z)] / [1 - H0(z)H̃1(z)]     (Equation 8)

Equation 8 is the same as Equation 3, with H0 taking the place of H2 and H̃1 taking the place of H1. Thus the noise removal algorithm still holds mathematically for any number of noise sources, including echoes of noise sources. Once again, if H0 and H̃1 can be estimated with sufficiently high accuracy, and the assumption of a single path from the signal to the signal microphone holds, the noise can be removed completely.
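The reduction of many noise sources to the single effective transfer function H̃1 can be checked numerically. The snippet below uses flat (frequency-independent) path gains, chosen arbitrarily for illustration, mixes two independent noise sources, and recovers S exactly via Equations 6 and 8.

```python
import numpy as np

rng = np.random.default_rng(5)
nfft = 128
S = np.fft.fft(rng.standard_normal(nfft))    # speech spectrum
N1 = np.fft.fft(rng.standard_normal(nfft))   # noise source 1
N2 = np.fft.fft(rng.standard_normal(nfft))   # noise source 2
h1, h2, g1, g2, h0 = 0.4, 0.2, 0.9, 0.7, 0.1  # flat path gains (illustrative)

M1 = S + N1 * h1 + N2 * h2        # Equation 4, first line
M2 = S * h0 + N1 * g1 + N2 * g2   # Equation 4, second line
M1n = N1 * h1 + N2 * h2           # noise-only observation (VAD = 0)
M2n = N1 * g1 + N2 * g2
H1t = M1n / M2n                   # H~1 per Equation 6, per bin
S_rec = (M1 - M2 * H1t) / (1 - h0 * H1t)  # Equation 8
assert np.allclose(S_rec, S)      # exact recovery despite two noise sources
```

The per-bin ratio H̃1 absorbs both noise paths at once, which is why the two-microphone formula needs no knowledge of how many sources are present.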
Most situations involve multiple noise sources as well as signal reflections. Figure 4 is a front-end block diagram of the noise removal algorithm of an embodiment in the general case of n distinct noise sources and signal reflections. Here, reflections of the signal enter both microphones. This is the most general case, because reflections of a noise source into a microphone are accurately modeled simply as additional noise sources. For clarity, the direct path from the signal to MIC2 has been changed from H0(z) to H00(z), and the reflected paths to MIC1 and MIC2 are denoted H01(z) and H02(z), respectively.
The input into the microphones now becomes

M1(z) = S(z) + S(z)H01(z) + N1(z)H1(z) + N2(z)H2(z) + ... + Nn(z)Hn(z)
M2(z) = S(z)H00(z) + S(z)H02(z) + N1(z)G1(z) + N2(z)G2(z) + ... + Nn(z)Gn(z)     (Equation 9)
When the VAD = 0, the inputs become (dropping the z again)

M1n = N1H1 + N2H2 + ... + NnHn
M2n = N1G1 + N2G2 + ... + NnGn

This is the same as Equation 5, so the calculation of H̃1 in Equation 6 is unchanged, as expected. Examining the noise-free case, Equation 9 becomes

M1s = S + SH01
M2s = SH00 + SH02

This leads to the definition of H̃0:

H̃0(z) = M2s(z) / M1s(z) = [H00(z) + H02(z)] / [1 + H01(z)]     (Equation 10)

Using the definition of H̃0 (as in Equation 7) to rewrite Equation 9 once more gives

M1(z) - S(z)[1 + H01(z)] = [M2(z) - S(z)(H00(z) + H02(z))]H̃1(z)     (Equation 11)

and, after some algebraic manipulation, finally

S(z)[1 + H01(z)] = [M1(z) - M2(z)H̃1(z)] / [1 - H̃0(z)H̃1(z)]     (Equation 12)

Equation 12 is the same as Equation 8, with H̃0 taking the place of H0 and the additional factor (1 + H01) on the left side. This extra factor means that S cannot be solved for directly in this case; instead, a solution is obtained for the signal plus the addition of all of its echoes. This is not a bad situation, however, as there are many conventional methods for handling echo suppression, and even if the echoes are not suppressed, they are unlikely to affect the intelligibility of the speech to any great extent. The more complex calculation of H̃0 is needed to account for the signal echoes in MIC2, which act like noise sources.
Figure 5 is a flow diagram of a denoising method of an embodiment. In operation, acoustic signals are received at 502. Physiological information associated with human voicing activity is received at 504. A first transfer function representative of the acoustic signal is calculated upon determining that voicing information is absent from the acoustic signal for at least one specified period of time, at 506. A second transfer function representative of the acoustic signal is calculated upon determining that voicing information is present in the acoustic signal for at least one specified period of time, at 508. Noise is removed from the acoustic signal using at least one combination of the first transfer function and the second transfer function, producing a denoised acoustic data stream, at 510.
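The flow of Figure 5 can be sketched as a frame loop. This is a minimal illustration under simplifying assumptions (per-bin frequency-domain transfer functions, exponential smoothing with an arbitrary constant, a caller-supplied VAD flag); it is not the patent's implementation.

```python
import numpy as np

class Denoiser:
    """Frame-by-frame sketch of the Figure 5 flow (steps 502-510)."""

    def __init__(self, nfft=256, alpha=0.95):
        self.nfft = nfft
        self.alpha = alpha                       # smoothing for adaptive updates
        self.h1 = np.zeros(nfft, dtype=complex)  # noise-path estimate (506)
        self.h2 = np.zeros(nfft, dtype=complex)  # signal-path estimate (508)

    def process(self, m1, m2, vad):
        M1, M2 = np.fft.fft(m1), np.fft.fft(m2)
        if not vad:   # 506: no voicing -> update H1 = M1n/M2n
            self.h1 = self.alpha * self.h1 + (1 - self.alpha) * M1 / (M2 + 1e-12)
        else:         # 508: voicing (assumes low noise) -> update H2 = M2s/M1s
            self.h2 = self.alpha * self.h2 + (1 - self.alpha) * M2 / (M1 + 1e-12)
        # 510: combine the two transfer functions per Equation 3
        S = (M1 - M2 * self.h1) / (1.0 - self.h2 * self.h1 + 1e-12)
        return np.fft.ifft(S).real
```

The VAD gates which transfer function adapts in each frame, so one is always held constant while the other is being calculated, matching the assumption stated above.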
The noise removal algorithm, or denoising algorithm, has been described from the simple case of a single noise source with a direct path through to multiple noise sources with reflections and echoes. The algorithm as shown will be robust under varying environmental conditions. If H̃1 and H̃0 can be estimated well, and one remains substantially constant while the other is being calculated, then the type and number of noise sources are unimportant. If the user environment contains echoes from noise sources, they can be compensated for. If signal echoes are also present, they will affect the cleaned signal, but in most cases these effects should be negligible.
In operation, the algorithm of an embodiment has demonstrated excellent results in dealing with a variety of noise types, amplitudes, and orientations. However, approximations and adjustments always have to be made when moving from mathematical concepts to engineering applications. In Equation 3, H2(z) is assumed to be small, so that H2(z)H1(z) ≈ 0 and Equation 3 reduces to

S(z) ≈ M1(z) - M2(z)H1(z)

This means that only H1(z) needs to be calculated, speeding up the process and greatly reducing the amount of computation required. With a correct selection of microphones, this approximation is easily realized.
Another approximation in an embodiment involves the filter used. The actual H1(z) will undoubtedly have both poles and zeros, but for stability and simplicity an all-zero finite impulse response (FIR) filter is used. With enough taps (on the order of 60), the approximation to the actual H1(z) is very good.
Regarding subband selection, the wider the frequency range over which a transfer function must be calculated, the more difficult it is to calculate accurately. Therefore the acoustic data was divided into 16 subbands, with the lowest frequency at 50 Hz and the highest at 3700 Hz. The denoising algorithm was then applied to each subband in turn, and the 16 denoised data streams were recombined to yield the denoised acoustic data. This works very well, but any combination of subbands (e.g., 4, 6, 8, or 32 subbands, equally spaced, perceptually spaced, and so on) can be used and has been found to work as well.
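A minimal way to picture the subband scheme is to group FFT bins into equal bands between 50 and 3700 Hz and apply the simplified correction S ≈ M1 - M2·H1 with a per-band H1. The band edges, sample rate, and per-band handling below are illustrative assumptions, not the patent's exact scheme.

```python
import numpy as np

def denoise_subbands(m1, m2, h1_bands, n_bands=16, lo=50.0, hi=3700.0, fs=8000):
    """Split the spectrum into n_bands equal bands between lo and hi,
    denoise each band with its own H1 estimate, and recombine."""
    n = len(m1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    M1, M2 = np.fft.rfft(m1), np.fft.rfft(m2)
    S = M1.copy()                              # bins outside [lo, hi) pass through
    edges = np.linspace(lo, hi, n_bands + 1)
    for b in range(n_bands):
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        S[mask] = M1[mask] - M2[mask] * h1_bands[b]  # simplified Equation 3
    return np.fft.irfft(S, n)
```

Estimating a separate (here scalar) H1 per band is what keeps each per-band calculation narrow and therefore tractable, which is the rationale given above.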
The amplitude of the noise was constrained in an embodiment so that the microphones used did not saturate (that is, they operated within a linear response region). It is important that the microphones operate linearly to ensure the best performance. Even with this restriction, signals with very low signal-to-noise ratio (SNR) can be denoised (down to -10 dB or less).
The calculation of H1(z) was accomplished every 10 milliseconds using the least-mean-squares (LMS) method, a common adaptive transfer function technique. An explanation may be found in Adaptive Signal Processing (1985) by Widrow and Stearns, published by Prentice-Hall, ISBN 0-13-004029-0.
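An LMS-style adaptation of the all-zero FIR approximation to H1(z) might look as follows. The normalized step size (a textbook NLMS variant) and tap count are illustrative choices, not the patent's exact parameters.

```python
import numpy as np

def lms_identify(x, d, taps=60, mu=0.01):
    """Identify an FIR approximation of H1(z) by LMS adaptation.

    x: reference input (noise microphone MIC2).
    d: desired output (noise as picked up by MIC1 through H1).
    Returns the adapted FIR weights.
    """
    w = np.zeros(taps)            # all-zero (FIR) filter weights
    buf = np.zeros(taps)          # most recent inputs, newest first
    for n in range(len(x)):
        buf[1:] = buf[:-1]
        buf[0] = x[n]
        y = w @ buf               # filter output
        e = d[n] - y              # estimation error
        # normalized update keeps adaptation stable for varying input power
        w += mu * e * buf / (buf @ buf + 1e-8)
    return w
```

Running this over each 10 ms noise-only segment, with the weights carried between segments, is one conventional way to realize the adaptive calculation described above.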
The VAD of an embodiment is derived from a radio-frequency sensor and the two microphones, yielding very high accuracy (99%) for both voiced and unvoiced speech. The VAD of an embodiment uses a radio-frequency (RF) interferometer to detect tissue motion associated with human speech production, but is not so limited. It is therefore completely acoustic-noise free, and is able to function in any acoustic noise environment. A simple energy measurement of the RF signal can be used to determine whether voiced speech is occurring. Unvoiced speech can be determined using conventional acoustic-based methods, by proximity to voiced sections determined using the RF sensor or similar voicing sensors, or through a combination of the above. Since there is much less energy in unvoiced speech, its detection accuracy is not as critical as for voiced speech.
Given reliably detected voiced and unvoiced speech, the algorithm of an embodiment can be implemented. Once again, it is useful to repeat that the noise removal algorithm does not depend on how the VAD is obtained, only that it is sufficiently accurate, especially for voiced speech. If speech is not detected and the transfer functions are trained on the speech, the resulting denoised acoustic data can be distorted.
Data were collected in four channels, one for MIC1, one for MIC2, and two for the radio-frequency sensor that detects tissue motion associated with voiced speech. The data were sampled simultaneously at 40 kHz, then digitally filtered and decimated to 8 kHz. The high sampling rate was used to reduce any aliasing that might result from the analog-to-digital process. A four-channel National Instruments A/D board was used along with LabVIEW to capture and store the data. The data were then read into a C program and denoised 10 milliseconds at a time.
Figure 6 shows a result of the noise suppression algorithm of an embodiment for an American English-speaking female in the presence of airport terminal noise that includes many other speakers and public address announcements. The speaker is uttering the numbers 406-5562 in the midst of the airport terminal noise. The mixed noisy data was denoised 10 milliseconds at a time, and before denoising the 10 milliseconds of data were prefiltered from 50 to 3700 Hz. A noise reduction of approximately 17 dB is evident. No post filtering was done on this sample; thus all of the noise reduction realized is due to the algorithm of an embodiment. It is clear that the algorithm adjusts to the noise rapidly, and that it is capable of removing the difficult-to-remove noise of other speakers. Many different types of noise have been tested with similar results, including street noise, helicopter noise, music, and sine waves. Also, the orientation of the noise can be varied substantially without significantly changing the noise suppression performance. Finally, the distortion of the cleaned speech is very low, ensuring good performance for speech recognition engines and human receivers alike.
The noise removal algorithm of an embodiment has been shown to be viable under any environmental conditions. The type and amount of noise are inconsequential if a good estimate has been made of H̃1 and H̃0. If the user environment is such that echoes are present, they can be compensated for if they are from a noise source. If signal echoes are also present, they will affect the cleaned signal, but the effect should be negligible in most environments.
Figure 7 is a block diagram of a physical configuration for denoising under the embodiments of Figures 2, 3, and 4, using a unidirectional "noise" microphone MIC2 and a unidirectional "speech" microphone MIC1. As described above, the path from the speech to the noise microphone (MIC2) is approximated as zero, and this approximation can be realized through careful placement of the unidirectional microphones. When the noise is oriented opposite the signal location (noise source N1), good performance (20-40 dB of noise suppression) can be realized. When the noise source is oriented on the same side as the speaker (noise source N2), however, performance drops to only 10-20 dB of noise suppression. This decline in suppression ability can be attributed to the steps taken to ensure that H2 is close to zero. These steps include using a unidirectional microphone as the noise microphone (MIC2), so that there is very little signal in the noise data. Because a unidirectional microphone rejects acoustic information from a particular direction, it also rejects noise coming from the same direction as the speech. This limits the ability of the adaptive algorithm to characterize and remove noise from certain locations, such as N2. The same effect occurs when a unidirectional microphone is used as the speech microphone M1.
If, however, the unidirectional microphone M2 is replaced with an omnidirectional microphone, then M2 captures a significant amount of signal. This runs counter to the assumption above that H2 is zero, with the result that a significant amount of signal is removed during voicing, and the denoising becomes "designaling". This result is unacceptable if signal distortion is to be kept to a minimum. Therefore, to reduce distortion, the value of H2 must be calculated. However, H2 cannot be calculated in noisy situations, or in situations where noise is mistakenly classified as speech and is not removed.
Experience with acoustic-only microphone arrays suggests a solution to this problem: a small two-microphone array. Figure 8 shows a microphone configuration of an embodiment for denoising that includes two omnidirectional microphones. The same effect could be reached using two unidirectional microphones oriented in the same direction (toward the signal source); this embodiment, however, uses two omnidirectional microphones, which capture similar information from the sound source in the direction of the signal source. The relative positions of the signal source and the two microphones are fixed and known. By placing the microphones a distance d apart, corresponding to n discrete time samples, and placing the speaker on the axis of the array, H2 can be fixed in the form Cz^-n, where C is the difference in amplitude between the signal data at M1 and M2. For the following, n = 1 is assumed, although any integer other than 0 could be used; considering causality, a positive integer is recommended. Since the amplitude of a spherical pressure source varies as 1/r, this allows the orientation of and distance to the source to be characterized. The required C can be estimated as follows:
Figure 9 shows the relationship between the required C and distance for the embodiment of Figure 8. As can be seen, the curve tends asymptotically toward C = 1.0; C reaches 0.9 at about 38 centimeters (slightly more than 1 foot) and 0.94 at 60 centimeters. At the distances at which handsets and headsets are normally used (4 to 12 centimeters), C varies between about 0.5 and 0.75. This differs by about 19-44% from the C of noise sources located 60 centimeters or more away, and clearly most noise sources are farther away than that. A system using this configuration can therefore distinguish noise from signal very effectively, even when they arrive from similar directions.
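As a quick check of these numbers, the 1/r spherical-source model above can be evaluated directly. The sketch below assumes a speed of sound of 343 m/s and an 8 kHz sampling rate (neither value is stated in the text; they are chosen so that the microphone spacing d corresponds to n = 1 sample):

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed
FS = 8000.0             # Hz, assumed sampling rate

# Microphone spacing chosen so that d corresponds to n = 1 discrete
# time sample, as in the text.
d = SPEED_OF_SOUND / FS  # about 4.3 cm

def required_C(ds, d=d):
    """Ratio of the 1/r amplitudes at M2 and M1 for a spherical
    source located a distance ds from M1 on the array axis."""
    return (1.0 / (ds + d)) / (1.0 / ds)  # simplifies to ds / (ds + d)

for ds_cm in (4, 12, 38, 60, 100):
    print(f"{ds_cm:3d} cm -> C = {required_C(ds_cm / 100.0):.2f}")
```

Under these assumptions C comes out near 0.5-0.75 at handset distances (4-12 cm), near 0.9 at 38 cm, and near 0.96 at 1 m, consistent with the curve described for Figure 9.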
To determine the effect on denoising of a poor estimate of C, assume that the estimate is C = nC0, where C0 is the true value of C. Using the definitions above, the signal is obtained as
S(z) = (M1(z) - M2(z)H1(z)) / (1 - H1(z)H2(z))
Assuming H2(z) is small enough, the signal is approximately
S(z) ≈ M1(z) - M2(z)H1(z)
If there is no speech, this result is exact, since H2 = 0. However, if speech is being produced, H2 is nonzero. If the true H2 is C0·z^-1 and the estimate nC0·z^-1 is used in its place, the calculated signal becomes
S(z) = S_true(z)·(1 - C0·z^-1·H1(z)) / (1 - n·C0·z^-1·H1(z))
where S_true(z) denotes the true signal. The above equation can be written as
S(z) = S_true(z)·(1 - C0·z^-1·H1(z)) / [(1 - C0·z^-1·H1(z)) + (1 - n)·C0·z^-1·H1(z)]
The last term in the denominator determines the error caused by the poor estimate of C. Label this term E:
E = (1 - n)·C0·z^-1·H1(z)
Since z^-1·H1(z) is a filter, its magnitude will always be positive. Therefore the change in the calculated amount of signal caused by E depends entirely on (1 - n).
There are two possible errors: underestimation of C (n < 1) and overestimation of C (n > 1). In the first case, C is estimated to be smaller than it really is, or equivalently the signal source is closer than estimated. (1 - n) is then positive, so E is positive; the denominator is therefore too large, and the resulting amount of clean signal is too small. This is de-signaling. In the second case, the signal source is farther away than estimated, E is negative, and S becomes larger than it should be; in this case the denoising is insufficient. Since the ideal is very low signal distortion, any estimation error should therefore be made toward overestimating C.
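The sign behavior described above can be checked numerically at a single frequency. In the sketch below, X stands for the (positive) magnitude of z^-1·H1(z); the values of C0 and X are arbitrary illustrative choices, not taken from the text:

```python
# Signal gain factor (1 - C0*X) / (1 - n*C0*X), where n*C0 is the
# estimated C and C0 the true value, evaluated at one frequency.
C0 = 0.5  # true C (illustrative)
X = 0.8   # |z^-1 * H1(z)| at this frequency (illustrative, positive)

def signal_gain(n):
    return (1.0 - C0 * X) / (1.0 - n * C0 * X)

print(signal_gain(0.8))  # n < 1: E > 0, gain < 1 -> de-signaling
print(signal_gain(1.2))  # n > 1: E < 0, gain > 1 -> under-denoising
```

With n = 1 (a perfect estimate) the gain is exactly 1; underestimation shrinks the recovered signal and overestimation inflates it, matching the discussion above.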
The results above show that noise located within the same solid angle as the signal (in the direction of M1) will in fact be removed, to a degree that depends on the change in C between the signal location and the noise location. Thus, for a handheld device with M1 about 4 centimeters from the mouth, the required C is approximately 0.5, while for noise at about 1 meter C is approximately 0.96. Estimating C = 0.5 therefore means that C is underestimated for the noise, and the noise will be removed; the amount removed depends directly on (1 - n). This algorithm therefore uses both the direction and the range of the signal to extract the signal from the noise.
A further point concerns the stability of this technique. Clearly, deconvolving (1 - H1H2) raises stability questions, since the inverse of (1 - H1H2) must be calculated at the beginning of each acoustic segment. Treating H2 as constant helps limit the computation required, because the inverse then need not be calculated for every voiced window, only for the first. However, the inverse of 1 - H1H2 would still need to be recalculated each time a new acoustic segment is encountered, and each such recalculation is expensive, so this approach remains only an approximation.
Fortunately, the choice of H2 above eliminates the required deconvolution. From the discussion above, the signal can be written as
S(z) = (M1(z) - M2(z)H1(z)) / (1 - H1(z)H2(z))
This equation can be rewritten as
S(z) = M1(z) - M2(z)H1(z) + S(z)H2(z)H1(z)
or
S(z) = M1(z) - H1(z)·[M2(z) - S(z)H2(z)]
Since H2(z) has the form Cz^-1, the corresponding time-domain sequence is
s[n] = m1[n] - h1[n] * (m2[n] - C·s[n-1])
(where * denotes convolution), meaning that the current signal sample requires only the current MIC1 signal, the current MIC2 signal, and previous signal samples. No deconvolution is needed, only a simple subtraction followed by a convolution like the one before. The increase in required computation is minimal, so this improvement is simple to implement.
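A minimal sketch of this recursion follows. The signals, the noise path h1, and the value of C are all synthetic illustrative choices; the microphone signals are constructed to match the model above (m2 carries the noise plus the one-sample-delayed, C-attenuated signal, and m1 carries the signal plus the h1-filtered noise), so the recursion should recover the signal essentially exactly:

```python
import random

def denoise(m1, m2, h1, C):
    """Time-domain recursion:
    s[n] = m1[n] - sum_k h1[k] * (m2[n-k] - C * s[n-k-1])."""
    s = []
    for n in range(len(m1)):
        acc = m1[n]
        for k, hk in enumerate(h1):
            if n - k < 0:
                break
            s_prev = s[n - k - 1] if n - k - 1 >= 0 else 0.0
            acc -= hk * (m2[n - k] - C * s_prev)
        s.append(acc)
    return s

# Synthetic example (all values illustrative).
random.seed(0)
N = 200
h1 = [0.6, 0.3, 0.1]  # noise path to MIC1
C = 0.5               # signal attenuation at MIC2, one-sample delay
s_true = [random.uniform(-1, 1) for _ in range(N)]
noise = [random.uniform(-1, 1) for _ in range(N)]

def fir(h, x, n):
    return sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)

m2 = [noise[n] + C * (s_true[n - 1] if n >= 1 else 0.0) for n in range(N)]
m1 = [s_true[n] + fir(h1, noise, n) for n in range(N)]

s_hat = denoise(m1, m2, h1, C)
err = max(abs(a - b) for a, b in zip(s_hat, s_true))
print("max reconstruction error:", err)
```

Because each m2[n-k] - C·s[n-k-1] term collapses to the pure noise sample, the subtraction removes exactly the h1-filtered noise at MIC1, illustrating why no deconvolution is needed.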
The effect of differences in microphone response can be shown by augmenting the configurations examined in Figures 2, 3 and 4 to include the transfer functions A(z) and B(z), which represent the frequency responses of MIC1 and MIC2 together with their associated filtering and amplification responses. Figure 10 shows a front-end block diagram of the denoising algorithm in an embodiment in which the two microphones MIC1 and MIC2 have different response characteristics.
Figure 10 includes a schematic description of the process of an embodiment, with a signal source 1000 and a noise source 1001. The algorithm uses two microphones: a "signal" microphone 1 ("MIC1") and a "noise" microphone 2 ("MIC2"), but is not so limited. MIC1 is assumed to capture mostly signal with some noise, while MIC2 captures mostly noise with some signal. The data from the signal source 1000 to MIC1 is denoted s(n), where s(n) is a discrete sampling of the analog signal from the signal source 1000. The data from the signal source 1000 to MIC2 is denoted s2(n). The data from the noise source 1001 to MIC2 is denoted n(n), and the data from the noise source 1001 to MIC1 is denoted n2(n).
The transfer function A(z) represents the frequency response of MIC1 together with its filtering and amplification responses, and the transfer function B(z) likewise represents the frequency response of MIC2 together with its filtering and amplification responses. The output of A(z) is denoted m1(n), and the output of B(z) is denoted m2(n). The signals m1(n) and m2(n) are received by the noise removal element 1005, which processes them and outputs "clean speech".
In what follows, the term "frequency response of MIC X" includes the combined effect of the microphone itself and of any amplification or filtering applied to the microphone data during recording. Separating signal and noise (dropping the "z" for clarity) gives
M1 = A·(S + H1·N)
M2 = B·(N + H2·S)
and substituting the latter into the former yields
S·(1 - H1·H2) = M1/A - (H1/B)·M2
This equation seems to show that the difference in frequency response between MIC1 and MIC2 has an effect. However, care must be taken with the quantities actually measured. Previously (before the microphone frequency responses were considered), H1 was determined using
H1(z) = M1n(z) / M2n(z)
where the subscript n indicates that the calculation takes place only during windows containing only noise. Examining the equations above, however, when there is no signal the following is measured at the microphones:
M1 = H1·N·A
M2 = N·B
Therefore H1 should be calculated as
H1 = (M1/M2)·(B/A)
Yet the calculation of H1(z) does not take B(z) and A(z) into account, so what is actually measured is simply the ratio of the signals at the two microphones:
H̃1 = M1/M2 = H1·(A/B)
where H̃1 denotes the measured response and H1 the true response. The calculation for H2 is analogous to that for H1 and gives
H̃2 = H2·(B/A)
Substituting H̃1 and H̃2 back into the equation for S above yields
S(z)·A(z) = (M1(z) - M2(z)·H̃1(z)) / (1 - H̃1(z)·H̃2(z))
or
S(z)·A(z) = M1(z) - M2(z)·H̃1(z) + S(z)·A(z)·H̃1(z)·H̃2(z)
When the frequency responses of the microphones are not included, this equation is identical to the one before, with S(z)·A(z) replacing S(z) and the measured values H̃1 and H̃2 replacing the true H1(z) and H2(z). In theory, therefore, the algorithm is independent of the microphones and of the associated filter and amplifier responses.
In practice, however, it is assumed that H2 = Cz^-1 with C a constant, whereas the measured quantity is actually
H̃2 = Cz^-1·(B(z)/A(z))
The result therefore depends on B(z) and A(z), both of which are unknown. If the frequency responses of the microphones differ substantially, as is common, especially when inexpensive microphones are used, problems can arise. This means that the data from MIC2 should be compensated so that it has the proper relationship to the data from MIC1. This can be accomplished by recording a broadband signal at MIC1 and MIC2 from a source located at the distance and orientation expected for the actual signal (the actual signal source may be used). The discrete Fourier transform (DFT) of each microphone signal is then calculated, along with the transform magnitude in each frequency bin. The DFT magnitude of MIC2 in each frequency bin is then set equal to C times the DFT magnitude of MIC1. If M1[n] denotes the magnitude of the n-th frequency bin of the DFT of MIC1, the factor multiplying M2[n] is
F[n] = C·M1[n] / M2[n]
The inverse transform is then applied, combining the new MIC2 DFT magnitudes with the previous MIC2 DFT phases. In this way MIC2 is resynthesized so that the relation
M2(z) = M1(z)·Cz^-1
holds whenever only speech is present. The conversion can also be implemented in the time domain using a filter that mimics the characteristics of F as closely as possible (for example, a MATLAB function such as fir2.m can use the calculated F[n] values to construct a suitable FIR filter).
Figure 11A shows the difference in frequency response (in percent) between the two microphones (at a distance of 4 centimeters) before compensation. Figure 11B shows the difference in frequency response (in percent) between the two microphones (at 4 centimeters) after DFT compensation, and Figure 11C shows the same after time-domain filter compensation. These figures demonstrate the effectiveness of the compensation methods described above: even with two inexpensive microphones, whether omnidirectional or unidirectional, either compensation method can restore the correct relationship between the two microphones.
As long as the amplification and filtering remain relatively constant, this conversion should also remain relatively constant, so it may suffice to perform the compensation only once, at the manufacturing stage. If desired, however, the algorithm can be run with H2 assumed to be 0 until the system is used in an environment with very little noise and a strong signal; the compensation coefficients F[n] can then be calculated and used from that point on. Since little denoising is required when the noise is very small, this calculation does not unduly affect the denoising algorithm. The denoising coefficients can also be updated whenever the noise environment is most favorable, to obtain the best accuracy.
Each of the blocks and steps depicted in the figures comprises a set of operations that need not be described in detail here. Based on the figures and the detailed description above, those skilled in the art can create programs, algorithms, source code, microcode, or program logic arrays implementing the invention. The routines described herein may include any one of the following, or a combination of one or more of the following: routines stored in non-volatile memory (not shown) that forms part of one or more associated processors; routines implemented using conventional programmed logic arrays or circuit elements; routines stored on removable media such as disks; routines downloaded from a server or stored locally; and routines hardwired or preprogrammed in a chip such as an electrically erasable programmable read-only memory ("EEPROM") semiconductor chip, an application-specific integrated circuit (ASIC), or a digital signal processing (DSP) integrated circuit.
Unless the context clearly requires otherwise, the words "comprise", "comprising" and the like used in the description and the claims are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is, in the sense of "including, but not limited to". Words using the singular or plural number also include the plural or singular number, respectively. Additionally, the words "herein", "hereunder" and words of similar import, when used in this application, refer to this application as a whole and not to any particular portion of this application.
The above description of embodiments of the invention is not intended to be exhaustive or to limit the invention to the precise forms described. Rather, the specific embodiments of and examples for the invention are described herein for illustrative purposes, and those skilled in the art can make various equivalent modifications within the scope of the invention. The teachings of the invention provided here can be applied to other machine vision systems, not only the data collection symbol reader described above. Moreover, the elements and acts of the various embodiments described above can be combined to provide further embodiments.
All of the references and U.S. patent applications cited herein are incorporated by reference. If necessary, aspects of the invention can be modified to employ the systems, functions and concepts of these references, thereby providing further embodiments of the invention.
Claims (33)
1. A method for removing noise from electrical signals, comprising:
receiving a plurality of acoustic signals at a first receiver;
receiving a plurality of acoustic signals at a second receiver, the plurality of acoustic signals including at least one noise signal produced by at least one noise source and at least one speech signal produced by at least one signal source, the at least one signal source comprising a speaker, wherein the relative positions of the signal source, the first receiver and the second receiver are fixed and known;
receiving physiological information associated with human voicing activity of the speaker, including whether voicing activity is present;
generating at least one first transfer function representative of the plurality of noise signals in response to a determination that at least one specified period of the plurality of acoustic signals contains no voicing activity;
generating at least one second transfer function representative of the plurality of acoustic signals in response to a determination that at least one specified period of the plurality of acoustic signals contains voicing information; and
removing noise from the plurality of acoustic signals using at least one combination of the at least one first transfer function and the at least one second transfer function, thereby generating at least one denoised data stream.
2. The method of claim 1, wherein the first receiver and the second receiver each comprise a microphone selected from among omnidirectional microphones and unidirectional microphones.
3. The method of claim 1, wherein the plurality of acoustic signals are received using discrete time sampling, and the first receiver and the second receiver are separated by a distance d, where d corresponds to n discrete time samples.
4. The method of claim 1, wherein the at least one second transfer function is determined as a function of the difference in amplitude between the signal data of the first receiver and the signal data of the second receiver.
5. The method of claim 1, wherein removing noise from the plurality of acoustic signals includes using the direction and range from the at least one first receiver to the at least one signal source.
6. The method of claim 1, wherein, when the frequency responses of the at least one first receiver and the at least one second receiver differ, the signal data from the at least one second receiver is compensated so that it has the proper relationship to the signal data from the at least one first receiver.
7. The method of claim 6, wherein compensating the signal data from the at least one second receiver includes recording a broadband signal at the at least one first receiver and the at least one second receiver from a source located at the distance and orientation expected for the signal from the at least one signal source.
8. The method of claim 6, wherein compensating the signal data from the at least one second receiver includes frequency-domain compensation.
9. The method of claim 8, wherein the frequency-domain compensation includes:
performing a frequency transform calculation on the signal data from each of the at least one first receiver and the at least one second receiver;
calculating the magnitude of the frequency transform in each frequency bin; and
setting the frequency transform magnitude of the signal data from the at least one second receiver in each frequency bin to a value related to the frequency transform magnitude of the signal data from the at least one first receiver.
10. The method of claim 6, wherein compensating the signal data from the at least one second receiver includes time-domain compensation.
11. The method of claim 6, further comprising:
initially setting the at least one second transfer function to 0; and
calculating compensation coefficients when the at least one noise signal is very small relative to the at least one speech signal.
12. The method of claim 1, wherein the plurality of acoustic signals include at least one reflection of the at least one noise signal and at least one reflection of the at least one speech signal.
13. The method of claim 1, wherein receiving physiological information includes using at least one detector to receive physiological data associated with human voicing, the detector being selected from among speech microphones, radio frequency devices, electroglottographs, ultrasound devices, acoustic throat microphones and airflow detectors.
14. The method of claim 1, wherein generating the at least one first transfer function and the at least one second transfer function includes using at least one technique from among adaptive techniques and recursive techniques.
15. A system for removing noise from acoustic signals, comprising:
at least one receiver, comprising:
at least one signal receiver for receiving at least one acoustic signal from a signal source; and
at least one noise receiver for receiving at least one noise signal from a noise source, wherein the relative positions of the signal source, the at least one signal receiver and the at least one noise receiver are fixed and known;
at least one sensor for receiving physiological information associated with human voicing activity; and
at least one processor coupled among the at least one receiver and the at least one sensor and generating a plurality of transfer functions, wherein at least one first transfer function representative of the at least one acoustic signal is generated in response to a determination that at least one specified period of the at least one acoustic signal contains no voicing information, wherein at least one second transfer function representative of the at least one acoustic signal is generated in response to a determination that at least one specified period of the at least one acoustic signal contains voicing information, and wherein noise is removed from the at least one acoustic signal using at least one combination of the at least one first transfer function and the at least one second transfer function.
16. The system of claim 15, wherein the at least one sensor includes at least one radio frequency ("RF") interferometer that detects tissue motion associated with human speech.
17. The system of claim 15, wherein the at least one sensor includes at least one sensor selected from among speech microphones, radio frequency devices, electroglottographs, ultrasound devices, acoustic throat microphones and airflow detectors.
18. The system of claim 15, wherein the at least one processor is configured to:
divide the acoustic data of the at least one acoustic signal into a plurality of subbands;
remove noise from each subband using at least one combination of the at least one first transfer function and the at least one second transfer function, generating a plurality of denoised audio data streams; and
combine the plurality of denoised audio data streams to generate the at least one denoised audio data stream.
19. The system of claim 15, wherein the at least one signal receiver and the at least one noise receiver are each microphones selected from among omnidirectional microphones and unidirectional microphones.
20. A signal processing system coupled among at least one user and at least one electronic device, the signal processing system comprising:
at least one first receiver for receiving at least one acoustic signal from a signal source;
at least one second receiver for receiving at least one noise signal from a noise source, wherein the relative positions of the signal source, the at least one first receiver and the at least one second receiver are fixed and known; and
at least one denoising subsystem for removing noise from the acoustic signal, the denoising subsystem comprising:
at least one processor coupled among the at least one first receiver and the at least one second receiver; and
at least one sensor coupled to the at least one processor, wherein the at least one sensor receives physiological information associated with human voicing activity, and wherein the at least one processor generates a plurality of transfer functions, at least one first transfer function representative of the at least one acoustic signal being generated in response to a determination that at least one specified period of the at least one acoustic signal contains no voicing information, at least one second transfer function representative of the at least one acoustic signal being generated in response to a determination that at least one specified period of the at least one acoustic signal contains voicing information, and noise being removed from the at least one acoustic signal using at least one combination of the at least one first transfer function and the at least one second transfer function to generate at least one denoised data stream.
21. The signal processing system of claim 20, wherein the first receiver and the second receiver are each microphones selected from among omnidirectional microphones and unidirectional microphones.
22. The signal processing system of claim 20, wherein the at least one acoustic signal is received using discrete time sampling, and the first receiver and the second receiver are separated by a distance d, where d corresponds to n discrete time samples.
23. The signal processing system of claim 20, wherein the at least one second transfer function is determined as a function of the difference in amplitude between the signal data of the first receiver and the signal data of the second receiver.
24. The signal processing system of claim 20, wherein removing noise from the at least one acoustic signal includes using the direction and range from the at least one first receiver to the at least one signal source.
25. The signal processing system of claim 20, wherein, when the frequency responses of the at least one first receiver and the at least one second receiver differ, the signal data from the at least one second receiver is compensated so that it has the proper relationship to the signal data from the at least one first receiver.
26. The signal processing system of claim 25, wherein compensating the signal data from the at least one second receiver includes recording a broadband signal at the at least one first receiver and the at least one second receiver from a source located at the distance and orientation expected for the signal from the at least one signal source.
27. The signal processing system of claim 25, wherein compensating the signal data from the at least one second receiver includes frequency-domain compensation.
28. The signal processing system of claim 27, wherein the frequency-domain compensation includes:
performing a frequency transform calculation on the signal data from each of the at least one first receiver and the at least one second receiver;
calculating the magnitude of the frequency transform in each frequency bin; and
setting the frequency transform magnitude of the signal data from the at least one second receiver in each frequency bin to a value related to the frequency transform magnitude of the signal data from the at least one first receiver.
29. The signal processing system of claim 25, wherein compensating the signal data from the at least one second receiver includes time-domain compensation.
30. The signal processing system of claim 25, wherein the compensation further comprises:
initially setting the at least one second transfer function to 0; and
calculating compensation coefficients when the at least one noise signal is very small relative to the at least one speech signal.
31. The signal processing system of claim 20, wherein the at least one acoustic signal includes at least one reflection of the at least one noise signal and at least one reflection of the at least one acoustic signal.
32. The signal processing system of claim 20, wherein receiving physiological information includes using at least one detector to receive physiological data associated with human voicing, the detector being selected from among speech microphones, radio frequency devices, electroglottographs, ultrasound devices, acoustic throat microphones and airflow detectors.
33. The signal processing system of claim 20, wherein generating the at least one first transfer function and the at least one second transfer function includes using at least one technique from among adaptive techniques and recursive techniques.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US32220201P | 2001-11-21 | 2001-11-21 | |
US60/322,202 | 2001-11-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1589127A true CN1589127A (en) | 2005-03-02 |
Family
ID=32680708
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNA028231937A Pending CN1589127A (en) | 2001-11-21 | 2002-11-21 | Method and apparatus for removing noise from electronic signals |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1480589A1 (en) |
JP (1) | JP2005529379A (en) |
KR (1) | KR100936093B1 (en) |
CN (1) | CN1589127A (en) |
AU (1) | AU2002359445A1 (en) |
WO (1) | WO2004056298A1 (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8019091B2 (en) | 2000-07-19 | 2011-09-13 | Aliphcom, Inc. | Voice activity detector (VAD) -based multiple-microphone acoustic noise suppression |
CA2502369C (en) | 2002-10-17 | 2012-01-10 | Arthur Prochazka | Method and apparatus for controlling a device or process with vibrations generated by tooth clicks |
US9066186B2 (en) | 2003-01-30 | 2015-06-23 | Aliphcom | Light-based detection for acoustic applications |
US9099094B2 (en) | 2003-03-27 | 2015-08-04 | Aliphcom | Microphone array with rear venting |
JP2010152240A (en) * | 2008-12-26 | 2010-07-08 | Panasonic Corp | Noise control device |
JP2014194437A (en) * | 2011-06-24 | 2014-10-09 | Nec Corp | Voice processing device, voice processing method and voice processing program |
KR101968158B1 (en) * | 2017-05-29 | 2019-08-13 | 주식회사 에스원 | Appartus and method for separating valid signal |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3176474B2 (en) * | 1992-06-03 | 2001-06-18 | 沖電気工業株式会社 | Adaptive noise canceller device |
US5459814A (en) * | 1993-03-26 | 1995-10-17 | Hughes Aircraft Company | Voice activity detector for speech signals in variable background noise |
US5406622A (en) * | 1993-09-02 | 1995-04-11 | At&T Corp. | Outbound noise cancellation for telephonic handset |
JP2758846B2 (en) * | 1995-02-27 | 1998-05-28 | 埼玉日本電気株式会社 | Noise canceller device |
JP2874679B2 (en) * | 1997-01-29 | 1999-03-24 | 日本電気株式会社 | Noise elimination method and apparatus |
US6430295B1 (en) * | 1997-07-11 | 2002-08-06 | Telefonaktiebolaget Lm Ericsson (Publ) | Methods and apparatus for measuring signal level and delay at multiple sensors |
- 2002
- 2002-11-21 AU AU2002359445A patent/AU2002359445A1/en not_active Abandoned
- 2002-11-21 WO PCT/US2002/037399 patent/WO2004056298A1/en not_active Application Discontinuation
- 2002-11-21 JP JP2004562239A patent/JP2005529379A/en active Pending
- 2002-11-21 EP EP02793985A patent/EP1480589A1/en not_active Withdrawn
- 2002-11-21 KR KR1020047007752A patent/KR100936093B1/en not_active Expired - Fee Related
- 2002-11-21 CN CNA028231937A patent/CN1589127A/en active Pending
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101903942B (en) * | 2007-12-21 | 2013-09-18 | 沃福森微电子股份有限公司 | Noise cancellation system with gain control based on noise level |
CN101625871B (en) * | 2008-07-11 | 2012-07-04 | 富士通株式会社 | Noise suppressing apparatus, noise suppressing method and mobile phone |
US9135924B2 (en) | 2008-07-11 | 2015-09-15 | Fujitsu Limited | Noise suppressing device, noise suppressing method and mobile phone |
CN101859563A (en) * | 2009-04-09 | 2010-10-13 | 哈曼国际工业有限公司 | Active Noise Control System Based on Audio System Output |
CN103635964A (en) * | 2011-06-30 | 2014-03-12 | 汤姆逊许可公司 | Method and apparatus for changing relative positions of sound objects contained within higher-order ambisonics representation |
CN103635964B (en) * | 2011-06-30 | 2016-05-04 | 汤姆逊许可公司 | Method and apparatus for changing the relative positions of sound objects contained within a higher-order Ambisonics representation |
US9338574B2 (en) | 2011-06-30 | 2016-05-10 | Thomson Licensing | Method and apparatus for changing the relative positions of sound objects contained within a Higher-Order Ambisonics representation |
AU2012278094B2 (en) * | 2011-06-30 | 2017-07-27 | Interdigital Madison Patent Holdings | Method and apparatus for changing the relative positions of sound objects contained within a higher-order ambisonics representation |
CN112889110A (en) * | 2018-10-15 | 2021-06-01 | 索尼公司 | Audio signal processing apparatus and noise suppression method |
US12131747B2 (en) | 2018-10-15 | 2024-10-29 | Sony Corporation | Voice signal processing apparatus and noise suppression method |
Also Published As
Publication number | Publication date |
---|---|
KR20040077661A (en) | 2004-09-06 |
WO2004056298A1 (en) | 2004-07-08 |
EP1480589A1 (en) | 2004-12-01 |
KR100936093B1 (en) | 2010-01-11 |
AU2002359445A1 (en) | 2004-07-14 |
JP2005529379A (en) | 2005-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1284139C (en) | Noise reduction method and device | |
CN1145931C (en) | Method for reducing noise in speech signal and system and telephone using the method | |
CN1110034C (en) | Spectral-subtraction noise suppression method | |
CN100338649C (en) | Reconstruction of the spectrum of an audio signal with incomplete spectrum based on frequency translation | |
US20020039425A1 (en) | Method and apparatus for removing noise from electronic signals | |
CN1264138C (en) | Method and arrangement for reproducing, decoding and synthesizing phoneme signals | |
CN1193644C (en) | System and method for dual microphone signal noise reduction using spectral subtraction | |
KR101224755B1 (en) | Multi-sensory speech enhancement using a speech-state model | |
EP3262641B1 (en) | Systems and methods for speech restoration | |
CN1662018A (en) | Method and device for multi-sensor speech enhancement on mobile equipment | |
CN1589127A (en) | Method and apparatus for removing noise from electronic signals | |
CN1159703C (en) | Sound recognition system | |
US20040133421A1 (en) | Voice activity detector (VAD) -based multiple-microphone acoustic noise suppression | |
CN1493073A (en) | Noise removing method and device | |
CN101031963A (en) | Method for processing noisy sound signal and device for realizing the method | |
CN1653520A (en) | Method of determining uncertainty associated with acoustic distortion-based noise reduction | |
CN1269012A (en) | Signal processing system for sensing a periodic signal in noise | |
CN1622200A (en) | Method and apparatus for multi-sensory speech enhancement | |
CN1781141A (en) | Improved audio coding system and method using spectral component coupling and spectral component regeneration | |
CN102117618A (en) | Method, device and system for eliminating music noise | |
CN1161750C (en) | Speech coding and decoding method and device, telephone device, pitch conversion method and medium | |
CN1261713A (en) | Reseiving device and method, communication device and method | |
CN1866356A (en) | Broadband beamforming method and apparatus | |
US20030128848A1 (en) | Method and apparatus for removing noise from electronic signals | |
CN1770264A (en) | Noise removing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |