FI115328B

FI115328B - Voice activity detection

Info

Publication number: FI115328B
Application number: FI20010933A
Authority: FI
Inventors: Daniel Kenneth Freeman; Ivan Boyd
Original assignee: Lg Electronics Inc
Priority date: 1988-03-11
Filing date: 2001-05-04
Publication date: 2005-04-15
Also published as: CA1335003C; AU608432B2; DE68910859D1; EP0335521B1; IE890774L; EP0335521A1; NO903936L; FI904410A0; PT89978B; HK135896A; JP2000148172A; EP0548054B1; DE68929442T2; DK215690A; NO903936D0; IE61863B1; NO316610B1; JPH03504283A; KR900700993A; EP0548054A3

Abstract

The first aspect provides voice activity detection apparatus for receiving an input signal, estimating the noise signal component of the input signal and continually forming a measure M of the spectral similarity between a portion of the input signal and the noise signal. A circuit compares a parameter derived from the measure M with a threshold value T and produces an output indicating the presence or absence of speech, depending on whether or not that value is exceeded. A second aspect covers voice activity detection apparatus which continually forms a spectral distortion measure and carries out a comparison.

Description

Voice activity detection.

The present application is a divisional of application FI 904410.

A voice activity detector is a device which is supplied with a signal in order to detect periods of speech, or periods containing only noise. Although the present invention is not limited thereto, one application of particular interest for such detectors is in mobile radio telephone systems, where a speech coder can exploit knowledge of the presence or absence of speech to improve the utilisation of the radio spectrum, and where the noise level (from a vehicle-mounted unit) is also likely to be high.

The essence of voice activity detection is to find a measure which differs appreciably between speech and non-speech periods. In apparatus which includes a speech coder, a number of parameters are readily available from the various stages of the coder, and it is therefore desirable to economise on the processing needed by using one such parameter. In many environments, the main noise sources occur in known, defined regions of the frequency spectrum. For example, in a moving car much of the noise (e.g. engine noise) is concentrated in the low-frequency regions of the spectrum.

Where such knowledge of the spectral position of the noise is available, it is desirable to base the decision as to whether speech is present or absent on measurements taken from that portion of the spectrum which contains relatively little noise. It would, of course, be possible in practice to pre-filter the signal before the analysis performed to detect speech activity, but where the voice activity detector follows the output of a speech coder, pre-filtering would distort the voice signal to be coded.

The invention accordingly provides voice activity detection apparatus comprising:

(i) a first voice activity detector operable to form a measure of the spectral similarity between a portion of an input signal and a portion of the input signal which is deemed to be free of speech, so as to produce an output signal indicating the presence or absence of speech in the input signal;

(ii) a store for storing data derived from said speech-free portion; and

(iii) an auxiliary voice activity detector;

characterised in that the auxiliary voice activity detector controls the updating of the store, the auxiliary voice activity detector operating by forming a measure of the spectral similarity between a current portion of the input signal and an earlier portion of the input signal.

Preferably, the measure is the Itakura-Saito distortion measure.

Other aspects of the present invention are as defined in the claims.

An embodiment of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:

Figures 1 and 2 show possible components of an embodiment of the invention, and

Figure 3 shows a preferred embodiment of the present invention.

The general principle underlying the first voice activity detector according to an embodiment of the invention is as follows.

A frame of n signal samples ($s_0, s_1, s_2, s_3, s_4, \ldots, s_{n-1}$), when passed through a notional fourth-order finite impulse response (FIR) digital filter with impulse response ($1, h_0, h_1, h_2, h_3$), yields the filtered signal (ignoring samples from previous frames)

$$
\begin{aligned}
s' = {} & (s_0),\\
& (s_1 + h_0 s_0),\\
& (s_2 + h_0 s_1 + h_1 s_0),\\
& (s_3 + h_0 s_2 + h_1 s_1 + h_2 s_0),\\
& (s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0),\\
& (s_5 + h_0 s_4 + h_1 s_3 + h_2 s_2 + h_3 s_1),\\
& (s_6 + h_0 s_5 + h_1 s_4 + h_2 s_3 + h_3 s_2),\\
& (s_7 + \ldots), \ldots
\end{aligned}
$$

The zero-order autocorrelation coefficient is the sum of the squares of the terms, which may be normalised, i.e. divided by the total number of terms (for frames of constant length it is simplest to omit the division). That of the filtered signal is thus

$$
R'_0 = \sum_{i=0}^{n-1} (s'_i)^2
$$

and this is therefore a measure of the power of the notional filtered signal s', in other words of that part of the signal s which falls within the passband of the notional filter.
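The filtering step and the power measure described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the frame values and the coefficients in `h` are assumptions chosen only to make the arithmetic easy to follow.

```python
def fir_filter_frame(samples, h):
    """Filter one frame with impulse response (1, h[0], h[1], ...).

    Samples before the start of the frame are treated as zero, which
    matches "ignoring samples from previous frames" in the text.
    """
    taps = [1.0] + list(h)
    filtered = []
    for i in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if i - k >= 0:
                acc += tap * samples[i - k]
        filtered.append(acc)
    return filtered


def zero_order_autocorrelation(signal):
    """Un-normalised R0: the sum of the squared samples (frame power)."""
    return sum(x * x for x in signal)


# Hypothetical frame and filter coefficients, for illustration only.
frame = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
h = [0.5, 0.0, 0.0, 0.0]

filtered = fir_filter_frame(frame, h)
filtered_power = zero_order_autocorrelation(filtered)
print(filtered, filtered_power)
```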

Expanding this expression, and ignoring the first four terms, gives

$$
\begin{aligned}
R'_0 ={} & (s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0)^2 + (s_5 + h_0 s_4 + h_1 s_3 + h_2 s_2 + h_3 s_1)^2 + \ldots \\
={} & s_4^2 + h_0 s_4 s_3 + h_1 s_4 s_2 + h_2 s_4 s_1 + h_3 s_4 s_0 \\
& + h_0 s_4 s_3 + h_0^2 s_3^2 + h_0 h_1 s_3 s_2 + h_0 h_2 s_3 s_1 + h_0 h_3 s_3 s_0 \\
& + h_1 s_4 s_2 + h_0 h_1 s_3 s_2 + h_1^2 s_2^2 + h_1 h_2 s_2 s_1 + h_1 h_3 s_2 s_0 \\
& + h_2 s_4 s_1 + h_0 h_2 s_3 s_1 + h_1 h_2 s_2 s_1 + h_2^2 s_1^2 + h_2 h_3 s_1 s_0 \\
& + h_3 s_4 s_0 + h_0 h_3 s_3 s_0 + h_1 h_3 s_2 s_0 + h_2 h_3 s_1 s_0 + h_3^2 s_0^2 \\
& + \ldots \\
={} & R_0 \,(1 + h_0^2 + h_1^2 + h_2^2 + h_3^2) \\
& + R_1 \,(2h_0 + 2h_0 h_1 + 2h_1 h_2 + 2h_2 h_3) \\
& + R_2 \,(2h_1 + 2h_1 h_3 + 2h_0 h_2) \\
& + R_3 \,(2h_2 + 2h_0 h_3) \\
& + R_4 \,(2h_3)
\end{aligned}
$$

$R'_0$ can thus be obtained from a combination of the autocorrelation coefficients $R_i$, weighted by the bracketed constants, which determine the frequency band to which the value of $R'_0$ responds. The bracketed terms are in fact the autocorrelation coefficients of the impulse response of the notional filter, so that the above expression may be simplified to

$$
R'_0 = R_0 H_0 + 2 \sum_{i=1}^{N} R_i H_i \qquad (1)
$$

where N is the filter order and $H_i$ are the (un-normalised) autocorrelation coefficients of the impulse response of the filter.
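Equation 1 can be checked numerically against direct filtering. The sketch below is illustrative only; the function names and sample values are assumptions. The test frame ends with N zeros so that the filter output dies away inside the frame, in which case the two computations agree to rounding error; on an ordinary frame the edge terms make Equation 1 an approximation.

```python
def autocorrelation(x, max_lag):
    """Un-normalised autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    return [sum(x[j] * x[j - lag] for j in range(lag, n))
            for lag in range(max_lag + 1)]


def filtered_power_eq1(signal, h):
    """Equation 1: R'0 = R0*H0 + 2 * sum_{i=1..N} Ri*Hi."""
    taps = [1.0] + list(h)      # impulse response (1, h0, h1, ...)
    order = len(taps) - 1       # N, the filter order
    R = autocorrelation(signal, order)
    H = autocorrelation(taps, order)
    return R[0] * H[0] + 2.0 * sum(R[i] * H[i] for i in range(1, order + 1))


def filtered_power_direct(signal, h):
    """Reference: actually filter the frame, then sum the squared outputs."""
    taps = [1.0] + list(h)
    total = 0.0
    for i in range(len(signal)):
        y = sum(t * signal[i - k] for k, t in enumerate(taps) if i - k >= 0)
        total += y * y
    return total


# Hypothetical values. The four trailing zeros remove the frame-edge terms.
h = [0.5, -0.25, 0.125, 0.1]
padded = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0, 0.0, 0.0, 0.0, 0.0]

via_eq1 = filtered_power_eq1(padded, h)
via_filter = filtered_power_direct(padded, h)
print(via_eq1, via_filter)
```

The weighted-sum route needs only N + 1 multiplications per frame once the autocorrelation coefficients are available, which they normally are in an LPC coder.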

In other words, the effect of filtering a signal on its autocorrelation coefficients can be simulated by forming a weighted sum of the autocorrelation coefficients of the (unfiltered) signal, using the impulse response that the required filter would have had.

Thus a relatively simple algorithm, involving only a small number of multiplication operations, can simulate the effect of a digital filter which would typically require a hundred times that number of multiplications.

This filtering operation may alternatively be viewed as a form of spectrum comparison, in which the signal spectrum is matched against a reference spectrum (the inverse of the response of the notional filter). Since the notional filter in this application is selected so as to approximate the inverse of the noise spectrum, the operation may be viewed as a spectral comparison between the speech and noise spectra, and the zeroth autocorrelation coefficient thus generated (i.e. the energy of the inverse-filtered signal) as a measure of the dissimilarity between the spectra. The Itakura-Saito distortion measure is used in linear predictive coding (LPC) to assess the match between the predictor filter and the input spectrum, and in one form it is expressed as

$$
M = R_0 A_0 + 2 \sum_{i=1}^{N} R_i A_i
$$

where $A_0$ etc. are the autocorrelation coefficients of the LPC parameter set. It will be seen that this is closely similar to the relationship derived above. The LPC coefficients are the taps of an FIR filter having the inverse spectral response of the input signal, so that the LPC coefficient set is the impulse response of the inverse LPC filter. It is therefore apparent that the Itakura-Saito distortion measure is in fact merely a form of Equation 1 in which the filter response H is the inverse of the spectral shape of an all-pole model of the input signal.
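The role of the measure as a check on the match between an LPC filter and a spectrum can be illustrated with a small sketch. The text does not say how the LPC set is obtained; the Levinson-Durbin recursion used here is a standard method and is an assumption, as are all names and sample values. When a frame is measured against its own LPC set, M equals the residual energy of the prediction, and any other coefficient set of the same order gives a value at least as large.

```python
def autocorrelation(x, max_lag):
    """Un-normalised autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    return [sum(x[j] * x[j - lag] for j in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(R, order):
    """Return the inverse-filter taps (1, a1, ..., aN) and residual energy."""
    a = [1.0] + [0.0] * order
    err = R[0]
    for m in range(1, order + 1):
        if err <= 0.0:          # degenerate frame (e.g. silence): stop early
            break
        acc = R[m] + sum(a[k] * R[m - k] for k in range(1, m))
        reflection = -acc / err
        updated = a[:]
        for k in range(1, m):
            updated[k] = a[k] + reflection * a[m - k]
        updated[m] = reflection
        a = updated
        err *= 1.0 - reflection * reflection
    return a, err


def itakura_saito_form(R, lpc):
    """M = R0*A0 + 2*sum(Ri*Ai), with A the autocorrelation of the LPC set."""
    order = len(lpc) - 1
    A = autocorrelation(lpc, order)
    return R[0] * A[0] + 2.0 * sum(R[i] * A[i] for i in range(1, order + 1))


# Hypothetical frame and model order, for illustration only.
frame = [1.0, 0.8, 0.3, -0.2, -0.6, -0.7, -0.4, 0.1, 0.5, 0.7]
order = 2
R = autocorrelation(frame, order)
own_lpc, residual_energy = levinson_durbin(R, order)

matched = itakura_saito_form(R, own_lpc)
mismatched = itakura_saito_form(R, [1.0, 0.5, 0.0])  # arbitrary other filter
print(matched, residual_energy, mismatched)
```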

It is in fact also possible to transpose the spectra, using the LPC coefficients of the test spectrum and the autocorrelation coefficients of the reference spectrum, to obtain a different measure of spectral similarity.

The I-S distortion measure is discussed further in A Buzo, A H Gray, R M Gray and J D Markel, "Speech Coding based upon Vector Quantisation", IEEE Trans on ASSP, Vol ASSP-28, No 5, October 1980.

Since the signal frames have only a finite length, and a number of terms (N, where N is the filter order) are neglected, the above result is only an approximation. It nevertheless gives a surprisingly good indication of the presence or absence of speech, and can therefore be used as a measure M in speech detection. In an environment where the noise spectrum is well known and stationary, it is quite possible simply to use fixed coefficients $h_0$, $h_1$ etc. to model the inverse noise filter.

However, apparatus which can adapt to different noise environments is much more widely useful.

In the embodiment of Figure 1, a signal from a microphone (not shown) is received at an input 1 and converted to digital samples s at a suitable sampling rate by an analogue-to-digital converter 2. An LPC analysis unit 3 (part of an LPC coder of known type) then derives, for successive frames of n (e.g. 160) samples, a set of N (e.g. 8 or 12) LPC filter coefficients $L_i$, which are transmitted to represent the input speech. The speech signal s is also supplied to a correlator unit 4 (normally part of the LPC coder 3, since the autocorrelation vector $R_i$ of the speech is also produced as a step in the LPC analysis, although a separate correlator could equally be provided). The correlator 4 produces the autocorrelation vector $R_i$, comprising the zero-order correlation coefficient $R_0$ and at least two further autocorrelation coefficients $R_1$, $R_2$, $R_3$. These are then supplied to a multiplier unit 5.

A second input 11 is connected to a second microphone located at a distance from the speaker, so that this microphone receives only background noise. The input from this microphone is converted to a digital input sample train by an A/D converter 12 and LPC-analysed by a second LPC analyser 13. The "noise" LPC coefficients produced by analyser 13 are passed to a correlator unit 14, and the autocorrelation vector thus produced is multiplied term by term with the autocorrelation coefficients $R_i$ of the input signal from the speech microphone in multiplier 5. The weighted coefficients thus produced are combined in adder 6 according to Equation 1, so as to apply a filter having the inverse shape of the noise spectrum from the noise-only microphone (which in practice is the same as the shape of the noise spectrum at the signal-plus-noise microphone) and thus to filter out most of the noise. The resulting measure M is compared with a threshold in thresholder 7 to produce a logic output 8 indicating the presence or absence of speech. If M is high, speech is deemed to be present.
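The signal path of Figure 1 can be sketched in a few lines. This is an illustration under stated assumptions: the noise LPC set, the threshold and the two test frames are hypothetical, the function names are not from the source, and the LPC analysis of the noise channel (analyser 13) is taken as already done. The comments map each step to the reference numerals in the text.

```python
def autocorrelation(x, max_lag):
    """Un-normalised autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    return [sum(x[j] * x[j - lag] for j in range(lag, n))
            for lag in range(max_lag + 1)]


def detect_speech(frame, noise_lpc, threshold):
    """Return (M, speech_flag) for one frame from the main microphone."""
    order = len(noise_lpc) - 1
    R = autocorrelation(frame, order)        # correlator 4
    H = autocorrelation(noise_lpc, order)    # correlator 14
    weighted = [R[i] * H[i] for i in range(order + 1)]   # multiplier 5
    M = weighted[0] + 2.0 * sum(weighted[1:])            # adder 6, Equation 1
    return M, M > threshold                  # thresholder 7, logic output 8


# Hypothetical values: a first-order inverse filter for low-frequency noise.
noise_lpc = [1.0, -0.9]
threshold = 10.0

steady = [1.0] * 8                # slowly varying, noise-like frame
alternating = [1.0, -1.0] * 4     # high-frequency content the filter passes

m_noise, noise_flag = detect_speech(steady, noise_lpc, threshold)
m_speech, speech_flag = detect_speech(alternating, noise_lpc, threshold)
print(m_noise, noise_flag, m_speech, speech_flag)
```

With these values the steady frame is largely cancelled by the inverse noise filter, while the alternating frame passes through and pushes M over the threshold.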

This embodiment does, however, require two microphones and two LPC analysers, which adds to the cost and complexity of the equipment required.

Alternatively, another embodiment uses a corresponding measure formed from the autocorrelations derived from the noise microphone 11 and the LPC coefficients derived from the main microphone 1, so that an additional autocorrelator is needed instead of an additional LPC analyser.

These embodiments can therefore operate in different environments having noise at different frequencies, or where the noise spectrum changes within a given environment.


In the embodiment of Figure 2, there is provided a buffer 15 which stores a set of LPC coefficients (or the autocorrelation vector of the set) derived from the microphone input 1 during a period identified as a "non-speech" (i.e. noise-only) period. These coefficients are then used to derive a measure using Equation 1, which of course also corresponds to the Itakura-Saito distortion measure, except that a single stored frame of LPC coefficients, corresponding to an approximation of the inverse noise spectrum, is used rather than the current frame of LPC coefficients.

The LPC coefficient vector output by analyser 3 is also routed to a correlator 14, which produces the autocorrelation vector of the LPC coefficient vector. The buffer memory 15 is controlled by the speech/non-speech output of thresholder 7 in such a way that during "speech" frames the buffer retains the "noise" autocorrelation coefficients, but during "noise" frames a new set of LPC coefficients may be used to update the buffer, for example by means of a multiple switch 16 via which the outputs of correlator 14, each carrying one autocorrelation coefficient, are connected to the buffer 15. It will be appreciated that correlator 14 could be positioned after buffer 15. Further, the speech/non-speech decision for updating the coefficients need not be taken from output 8, but could be (and preferably is) derived in some other way.

Since periods without speech occur frequently, the LPC coefficients stored in the buffer are updated from time to time, so that the apparatus is able to track changes in the noise spectrum. It will be appreciated that such updating of the buffer may be needed only occasionally, or may take place only once at the start of operation of the detector, if (as is often the case) the noise spectrum is relatively stationary over time; in a mobile radio environment, however, frequent updating is preferred.
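The gating of the buffer update can be sketched as below. This is a hedged illustration rather than the patented circuit: the class and parameter names are hypothetical, the per-frame autocorrelation vectors are assumed to be supplied by the coder, and the optional `update_allowed` argument stands for a non-speech decision derived elsewhere, which the text says is preferred over using output 8 itself.

```python
def measure(R, A):
    """Equation 1 with the stored coefficient autocorrelations A as weights."""
    return R[0] * A[0] + 2.0 * sum(r * a for r, a in zip(R[1:], A[1:]))


class NoiseBufferDetector:
    """Detector whose noise model is refreshed only in non-speech frames."""

    def __init__(self, initial_noise_A, threshold):
        self.noise_A = list(initial_noise_A)   # contents of buffer 15
        self.threshold = threshold

    def step(self, R, frame_A, update_allowed=None):
        """Classify one frame, then update the buffer if permitted.

        R        autocorrelation vector of the frame (correlator 4)
        frame_A  autocorrelation of the frame's LPC set (correlator 14)
        """
        is_speech = measure(R, self.noise_A) > self.threshold
        if update_allowed is None:
            # Fallback: reuse this detector's own decision (output 8).
            update_allowed = not is_speech
        if update_allowed:
            self.noise_A = list(frame_A)       # switch 16 closed
        return is_speech


# Hypothetical frames: a quiet noise frame followed by a loud speech frame.
detector = NoiseBufferDetector(initial_noise_A=[1.0, 0.0], threshold=5.0)
decisions = [
    detector.step(R=[1.0, 0.9], frame_A=[1.81, -0.9]),
    detector.step(R=[100.0, 10.0], frame_A=[2.0, 1.0]),
]
print(decisions, detector.noise_A)
```

The buffer takes the first frame's coefficients because that frame is judged noise-only, and keeps them through the second frame because speech is detected.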

In a modification of this embodiment, the system initially uses Equation 1 with coefficient terms corresponding to a simple fixed high-pass filter, and subsequently begins to adapt by switching over to the "noise period" LPC coefficients. If speech detection fails for any reason, the system can revert to the simple high-pass filter.


The above measure can be normalised by dividing through by $R_0$, so that the expression to be compared with the threshold has the form

$$
M = A_0 + 2 \sum_{i=1}^{N} \frac{R_i A_i}{R_0}
$$

This measure is independent of the total signal energy in a frame, and is thus compensated for changes in overall signal level, but it gives a less marked contrast between "noise" and "speech" levels and is therefore preferably not used in high-noise environments.
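A short sketch of the normalised form shows its independence from signal level. The function name and the numbers are illustrative assumptions, and rejecting a frame with zero energy is a choice made here, not something the text specifies.

```python
def normalised_measure(R, A):
    """M = A0 + 2 * sum_{i>=1}(Ri * Ai) / R0."""
    if R[0] <= 0.0:
        # Assumption: a frame with no energy cannot be normalised.
        raise ValueError("R[0] must be positive")
    return A[0] + 2.0 * sum(r * a for r, a in zip(R[1:], A[1:])) / R[0]


# Hypothetical autocorrelation vectors.
R = [4.0, 2.0, 1.0]
A = [1.5, -0.5, 0.25]

quiet = normalised_measure(R, A)
loud = normalised_measure([10.0 * r for r in R], A)  # same spectrum, more energy
print(quiet, loud)
```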

Instead of using LPC analysis to derive the inverse filter coefficients of the noise signal (from either the noise microphone or noise-only periods, as in the various embodiments described above), it is possible to model the inverse noise spectrum using an adaptive filter of known type. Since the noise spectrum changes only slowly (as discussed below), the relatively slow coefficient adaptation rate common in such filters is acceptable. In one embodiment, which corresponds to Figure 1, the LPC analysis unit 13 is simply replaced by an adaptive filter (for example a transversal FIR filter or a lattice filter), connected so as to whiten the noise input by modelling the inverse filter, and its coefficients are supplied as before to autocorrelator 14.

In a second embodiment, corresponding to that of Figure 2, the LPC analysis means 3 is replaced by such an adaptive filter and the buffer means 15 is omitted, but switch 16 operates to prevent the adaptive filter from adapting its coefficients during speech periods.
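The text names only "an adaptive filter of known type". As one possible reading, the sketch below uses a normalised LMS (NLMS) transversal predictor, which is an assumption, as are the step size, the filter order and the test signal. The `adapt` flag plays the part of switch 16: when it is false the coefficients are frozen.

```python
import math


def nlms_step(weights, history, sample, adapt, mu=0.5, eps=1e-9):
    """One step of an NLMS linear predictor.

    history holds the most recent input samples, newest first.
    Returns (prediction error, weights after the step).
    """
    prediction = sum(w * x for w, x in zip(weights, history))
    error = sample - prediction
    if adapt:   # frozen during speech periods
        norm = eps + sum(x * x for x in history)
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
    return error, weights


def run_predictor(signal, order, adapt_flags):
    """Run the predictor over a signal; return (errors, final weights)."""
    weights = [0.0] * order
    history = [0.0] * order
    errors = []
    for sample, adapt in zip(signal, adapt_flags):
        error, weights = nlms_step(weights, history, sample, adapt)
        errors.append(error)
        history = [sample] + history[:-1]
    return errors, weights


# Hypothetical narrow-band "noise": a single sinusoid.
signal = [math.sin(1.0 * n) for n in range(400)]
errors, weights = run_predictor(signal, 2, [True] * len(signal))

# Taps of the whitening (inverse) filter, as would be passed to correlator 14.
inverse_filter_taps = [1.0] + [-w for w in weights]

early = sum(e * e for e in errors[:50])
late = sum(e * e for e in errors[-50:])
print(inverse_filter_taps, early, late)
```

The residual energy falls as the predictor converges, which is the whitening behaviour the text relies on.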

A second voice activity detector, intended for use with an embodiment of the invention, will now be described.

From the foregoing, it is clear that the LPC coefficient vector is simply the impulse response of an FIR filter whose response approximates the inverse spectral shape of the input signal. When the Itakura-Saito distortion measure between adjacent frames is formed, it is in fact equal to the power of the signal as filtered by the LPC filter of the previous frame. Thus, if the spectra of adjacent frames differ little, a correspondingly small portion of the frame's spectral power escapes filtering and the measure is small. Correspondingly, a large difference between frames produces a large Itakura-Saito distortion measure, so that the measure reflects the spectral similarity of adjacent frames. In a speech coder it is desirable to minimize the data rate, so the frame length is made as long as possible. In other words, if the frame length is long enough, the speech signal should exhibit a significant spectral change from frame to frame (if it does not, the coding is redundant). Noise, on the other hand, has a spectral shape that varies slowly from frame to frame, so that during a period in which speech is absent from the signal the Itakura-Saito distortion measure is correspondingly small, since applying the inverse LPC filter of the previous frame "filters out" most of the noise power.
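The equivalence described above, between the distortion measure and the power of the signal after filtering, can be checked numerically. The following is a minimal sketch, assuming unnormalized autocorrelations and a full (zero-padded) convolution; the function names are illustrative and do not come from the specification.

```python
def autocorr(seq, max_lag):
    """Unnormalized autocorrelation of seq for lags 0..max_lag."""
    n = len(seq)
    return [sum(seq[k] * seq[k + lag] for k in range(n - lag))
            for lag in range(max_lag + 1)]


def fir_filter_full(x, a):
    """Full convolution of signal x with FIR impulse response a."""
    y = [0.0] * (len(x) + len(a) - 1)
    for n, xn in enumerate(x):
        for k, ak in enumerate(a):
            y[n + k] += xn * ak
    return y


def filtered_power_direct(x, a):
    """Power of x after filtering, computed sample by sample."""
    return sum(v * v for v in fir_filter_full(x, a))


def filtered_power_from_autocorr(x, a):
    """Same power, computed from the two autocorrelation vectors only."""
    p = len(a) - 1
    r = autocorr(x, p)      # signal autocorrelation R_i
    big_a = autocorr(a, p)  # filter-coefficient autocorrelation A_i
    return r[0] * big_a[0] + 2 * sum(r[i] * big_a[i] for i in range(1, p + 1))
```

The second form needs only a handful of autocorrelation terms per frame, which is presumably why the measure is attractive where a speech coder already computes those terms.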


The Itakura-Saito distortion measure between adjacent frames of a noisy signal containing intermittent speech is typically higher during periods of speech than during periods of noise. The degree of variation (as indicated by the standard deviation) is also higher, and less intermittently variable.

It should be noted that the standard deviation of the standard deviation of the measure M is also a reliable measure. The effect of forming each standard deviation is, in essence, to smooth the measure.

In this second form of the voice activity detector, the measured parameter used to decide whether speech is present is preferably the standard deviation of the Itakura-Saito distortion measure, but other measures of variation and other spectral distortion measures (based, for example, on FFT analysis) could also be used.
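As an illustration of using the variation of the distortion measure as the decision parameter, the sketch below computes a running standard deviation over the most recent frames. The window length and the use of the population standard deviation are assumptions made for the example; the specification does not fix either.

```python
from collections import deque
from statistics import pstdev


def rolling_std(measures, window=8):
    """Standard deviation of the last `window` distortion measures per frame.

    `window` is a hypothetical parameter chosen for illustration only.
    """
    recent = deque(maxlen=window)
    out = []
    for m in measures:
        recent.append(m)
        out.append(pstdev(recent))
    return out
```

Applying `rolling_std` to its own output gives the "standard deviation of the standard deviation" mentioned in the text, which smooths the parameter further.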

The use of an adaptive threshold in voice activity detection has been found advantageous. Such thresholds must not be adjusted during speech periods, or the speech signal will be thresholded out. The threshold adapter must therefore be controlled by a speech/non-speech control signal, and this control signal should preferably be independent of the output of the threshold adapter.

The threshold T is adjusted adaptively so that it is kept just above the level of the measure M when only noise is present. Since the measure generally varies randomly when noise is present, the threshold is varied by determining an average level over a number of blocks and setting the threshold at a level proportional to this average. In a noisy environment, however, this is not usually sufficient, and so an assessment of the degree of variation of the parameter over several blocks is also taken into account.

The threshold value T is therefore preferably calculated according to the expression

$$T = M' + K \cdot d$$

where M' is the average value of the measure over a number of consecutive frames, d is the standard deviation of the measure over those frames, and K is a constant (which may typically be 2).
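The threshold expression can be sketched as follows. This is a minimal example, assuming that the population standard deviation is meant (the specification does not say which estimator is used); the function name is illustrative.

```python
from statistics import mean, pstdev


def adaptive_threshold(noise_measures, k=2.0):
    """Return T = M' + K*d for measures taken over recent noise-only frames.

    noise_measures: values of the measure over several consecutive frames.
    k: the constant K (the text gives 2 as a typical value).
    """
    m_avg = mean(noise_measures)   # M', the average of the measure
    d = pstdev(noise_measures)     # d, its standard deviation (assumed population form)
    return m_avg + k * d
```

For example, measures of 1.0 and 3.0 give an average of 2.0 and a standard deviation of 1.0, so with K = 2 the threshold is 4.0.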

In practice, it is preferable not to resume adaptation immediately after speech has been indicated to be absent, but to wait in order to ensure that the fall is stable (so as to avoid rapid repeated switching between the adapting and non-adapting states).
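One way to realize this waiting behavior is a simple frame counter, sketched below. The class name and the `hold_frames` parameter are hypothetical; the specification does not state how long the wait should be.

```python
class AdaptationGate:
    """Allows threshold adaptation only after speech has been absent
    for more than `hold_frames` consecutive frames (assumed parameter)."""

    def __init__(self, hold_frames=5):
        self.hold_frames = hold_frames
        self.quiet_run = 0  # consecutive frames with speech absent

    def step(self, speech_absent):
        """Feed one frame decision; return True if adaptation may proceed."""
        self.quiet_run = self.quiet_run + 1 if speech_absent else 0
        return self.quiet_run > self.hold_frames
```

Any frame in which speech is indicated resets the counter, so brief gaps between words do not restart adaptation.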

Referring to Figure 3, in a preferred embodiment of the present invention incorporating the above features, an input 1 receives a signal which has been sampled and digitized by an analog-to-digital converter (ADC) 2, and the signal is supplied to the input of an inverse filter analyzer 3, which in practice forms part of the speech coder with which the voice activity detector is intended to operate, and which generates the coefficients L_i (typically 8) of a filter corresponding to the inverse of the input signal spectrum.

The digitized signal is also supplied to an autocorrelator 4 (which is included as part of analyzer 3), which generates the autocorrelation vector R_i of the input signal (or at least as many low-order terms as there are LPC coefficients). The operation of these parts of the apparatus is as described with reference to Figures 1 and 2. Preferably, the autocorrelation coefficients R_i are then averaged over several successive speech frames (typically 5-20 ms long) to improve their reliability. This may be achieved by storing each set of autocorrelation coefficients output by the autocorrelator 4 in a buffer 4a, and using an averager 4b to form a weighted sum of the current autocorrelation coefficients R_i and the coefficients of earlier frames stored in, and supplied from, buffer 4a. The averaged autocorrelation coefficients Ra_i thus derived are supplied to weighting and adding means 5, 6, which also receive, from an autocorrelator 14 via a buffer 15, the autocorrelation vector A_i of the stored noise-period inverse filter coefficients L_i, and which form from Ra_i and A_i a measure M preferably defined as follows:

$$M = A_0 + 2 \sum_{i \geq 1} \frac{Ra_i A_i}{R_0}$$

This measure is then compared with a threshold level in the thresholder 7, and the logical result provides an indication of the presence or absence of speech at output 8.
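The averaging step and the measure M described above might be sketched as follows. The weights are left to the caller because the specification does not give them, and the normalizing term R_0 is passed explicitly rather than guessed; all names are illustrative.

```python
def weighted_average(frames, weights):
    """Weighted sum of per-frame autocorrelation vectors (averager 4b).

    frames: list of autocorrelation vectors, e.g. previous frames plus current.
    weights: one weight per frame (assumed to be chosen by the designer).
    """
    order = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(order)]


def similarity_measure(ra, a, r0):
    """M = A_0 + 2 * sum_{i>=1} Ra_i * A_i / R_0.

    ra: averaged input autocorrelation coefficients Ra_i.
    a:  autocorrelation A_i of the stored noise-period filter coefficients.
    r0: zero-lag term R_0 used for energy normalization.
    """
    return a[0] + 2 * sum(ra[i] * a[i] for i in range(1, len(a))) / r0


def speech_present(measure, threshold):
    """Thresholder 7: speech is indicated when the measure exceeds T."""
    return measure > threshold
```

With Ra = [4.0, 2.0, 1.0], A = [1.5, -0.5, 0.25] and R_0 = 4.0, the sum is 2(-0.5) + 1(0.25) = -0.75, so M = 1.5 + 2(-0.75)/4 = 1.125.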


In order that the inverse filter coefficients L_i correspond to a fair estimate of the inverse of the noise spectrum, it is desirable to update these coefficients during periods of noise (and, of course, not to update them during periods of speech). It is, however, preferable that the speech/non-speech decision on which the updating is based does not depend on the result of the updating, since otherwise a single wrongly identified signal frame may cause the voice activity detector subsequently to go "out of lock" and to identify the following frames wrongly. It is therefore preferable to use a control signal generating circuit 20, which is in effect a separate voice activity detector and which forms an independent control signal indicating the presence or absence of speech, in order to control the inverse filter analyzer 3 (that is, buffer 8), so that the inverse filter autocorrelation coefficients A_i used to form the measure M are updated only during "noise" periods. The control signal generating circuit 20 includes an LPC analyzer 21 (which may likewise form part of the speech coder and, specifically, may be implemented by analyzer 3), which generates a set of LPC coefficients M_i corresponding to the input signal, and an autocorrelator 21a (which may be implemented by autocorrelator 3a), which derives the autocorrelation coefficients B_i of the coefficients M_i. If analyzer 21 is implemented by analyzer 3, then M_i = L_i and B_i = A_i. These autocorrelation coefficients are then supplied to weighting and adding means 22, 23 (equivalent to elements 5, 6), which also receive the autocorrelation vector R_i of the input signal from autocorrelator 4. A measure of the spectral similarity between the incoming speech frame and the preceding speech frame is thus calculated. This measure may be the Itakura-Saito distortion measure between the coefficients R_i of the current frame and the coefficients B_i of the preceding frame, as set out above, or it may instead be derived by calculating the Itakura-Saito distortion measure for the coefficients R_i and B_i of the current frame and subtracting (in subtractor 25) the corresponding earlier measure stored in buffer 24, so as to generate a spectral difference signal (in either case, the measure is energy-normalized by dividing by R_0). The buffer 24 is then, of course, updated. After thresholding in thresholder 26, this spectral difference signal provides, as explained above, an indicator of the presence or absence of speech. We have found, however, that although this measure is excellent for distinguishing noise from unvoiced speech (a task of which known systems are generally incapable), it is in general somewhat less able to distinguish noise from voiced speech.
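The second variant described above, in which the previous frame's measure is held in a buffer and subtracted from the current one, could be sketched as below. This is an illustration under stated assumptions: the measure is taken in the autocorrelation form used elsewhere in this description, and the first frame (for which no earlier measure exists) is assumed to give a zero difference. The class and method names are hypothetical.

```python
class SpectralDifference:
    """Tracks the frame-to-frame change of the energy-normalized measure."""

    def __init__(self):
        self.previous = None  # plays the role of buffer 24

    def update(self, r, b):
        """r: input autocorrelation R_i; b: LPC-coefficient autocorrelation B_i.

        Returns the current measure minus the stored one, then updates the buffer.
        """
        raw = r[0] * b[0] + 2 * sum(r[i] * b[i] for i in range(1, len(b)))
        measure = raw / r[0]  # energy normalization by R_0
        diff = 0.0 if self.previous is None else measure - self.previous
        self.previous = measure  # buffer 24 is updated after the subtraction
        return diff
```

The returned difference would then be passed to a thresholder, corresponding to element 26, to give the speech/non-speech indication.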

Accordingly, circuit 20 preferably further includes a voiced speech detection circuit comprising a pitch analyzer 27 (which in practice may operate as part of the speech coder and, in particular, may measure the long-term predictor lag value produced in a multipulse LPC coder). The pitch analyzer 27 generates a logic signal which is "true" when voiced speech is detected, and this signal, together with the thresholded measure derived from thresholder 26 (which is generally "true" when unvoiced speech is present), is supplied to the inputs of a NOR gate 28 to generate a signal which is "false" when speech is present and "true" when noise is present. This signal is supplied to buffer 8 (or to the inverse filter analyzer 3), so that the inverse filter coefficients L_i are updated only during noise periods.
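The gating logic of the NOR gate can be written out directly. The sketch below is illustrative only; the function names and the representation of the coefficient store as a Python list are assumptions.

```python
def noise_update_allowed(voiced_detected, spectral_change_detected):
    """NOR of the two detector outputs: true only when neither indicates speech."""
    return not (voiced_detected or spectral_change_detected)


def maybe_update_noise_store(store, frame_coeffs, voiced_detected,
                             spectral_change_detected):
    """Overwrite the stored noise coefficients only during noise periods.

    Returns True if the store was updated.
    """
    if noise_update_allowed(voiced_detected, spectral_change_detected):
        store[:] = frame_coeffs
        return True
    return False
```

Because the decision uses only the auxiliary detector's outputs, a wrong decision by the main detector cannot by itself corrupt the stored noise estimate.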


The threshold adapter 29 is also connected to receive the non-speech control output of the control signal generating circuit 20. The output of the threshold adapter 29 is supplied to thresholder 7. The threshold adapter increments or decrements the threshold in steps which are proportional to the current threshold value, until the threshold approximates the noise power level (which may conveniently be derived, for example, from the weighting and adding circuits 22, 23).
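A single adaptation step of this kind might look as follows. The step fraction is an assumed value, and the function name is illustrative; the specification states only that the step is proportional to the current threshold.

```python
def adapt_threshold(threshold, noise_level, speech_absent, step_fraction=0.05):
    """Move the threshold one proportional step toward the noise level.

    The threshold is left unchanged while speech is indicated.
    step_fraction is a hypothetical tuning parameter.
    """
    if not speech_absent:
        return threshold
    step = step_fraction * threshold
    if threshold < noise_level:
        return threshold + step
    if threshold > noise_level:
        return threshold - step
    return threshold
```

Because the step scales with the threshold itself, the adapter moves quickly at high levels and finely at low levels, settling to within one step of the noise level.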

When the input signal is very low, it may be advantageous for the threshold to be set automatically to a fixed, low level, since at low signal levels the signal quantization effect produced by the analog-to-digital converter 2 may give unreliable results.

In addition, "hangover" generating means 30 may be provided, which measure the duration of the indications of speech after thresholder 7; when the presence of speech has been indicated for a period exceeding a predetermined time constant, the output is held high for a short "hangover" period. In this way, clipping of the middle of low-level speech bursts is avoided, and appropriate selection of the time constant prevents triggering of the hangover generator 30 by short noise spikes which are falsely indicated as speech.
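The hangover behavior can be sketched on a sequence of per-frame decisions. Both parameters below are assumptions introduced for the example: the specification speaks of a predetermined time constant and a short hangover period without giving values.

```python
def apply_hangover(decisions, min_speech_frames=3, hangover_frames=4):
    """Hold the output high after sufficiently long speech bursts.

    decisions: per-frame booleans from the thresholder (True = speech).
    min_speech_frames: speech must persist longer than this to arm the hangover.
    hangover_frames: number of frames the output stays high afterwards.
    """
    out = []
    run = 0    # length of the current run of speech frames
    hang = 0   # remaining hangover frames
    for speech in decisions:
        if speech:
            run += 1
            if run > min_speech_frames:
                hang = hangover_frames  # burst is long enough: arm the hangover
            out.append(True)
        else:
            run = 0
            if hang > 0:
                hang -= 1
                out.append(True)        # still within the hangover period
            else:
                out.append(False)
    return out
```

A single-frame spike never arms the hangover, so it is not extended, whereas a burst longer than `min_speech_frames` keeps the output high for `hangover_frames` further frames.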

It will of course be appreciated that all of the above functions may be performed by a single suitably programmed digital processing means, such as a digital signal processing (DSP) chip forming part of an LPC codec thus implemented (this is the preferred implementation), or by a suitably programmed microcomputer or microcontroller chip with associated memory devices.

As described above, the voice detection apparatus may conveniently be implemented as part of an LPC codec. Alternatively, where the autocorrelation coefficients of the signal or related measures (partial correlation, or "parcor", coefficients) are transmitted to a distant station, the voice detection may take place remotely from the codec.


Claims

1. Voice activity detector apparatus comprising: (i) a first voice activity detector (3-6, 14), which operates by forming a measure of the spectral similarity between a portion of the input signal and a portion of the input signal deemed to be free of speech, for producing an output signal indicating the presence or absence of speech in the input signal; (ii) a memory (15) storing data obtained from said speech-free portion; and (iii) an auxiliary voice activity detector (20); characterized in that the auxiliary voice activity detector (20) alone controls the updating of the memory (15), the auxiliary voice activity detector (20) operating by forming a measure of the spectral similarity between the current portion of the input signal and a preceding portion of the signal.

2. Voice activity detector apparatus comprising: (i) means (1) for receiving an input signal; (ii) a memory (15) for storing a noise-representing signal, which signal represents an estimate of the noise component of said input signal; (iii) means (3-6, 14) for periodically forming, from the input signal and said noise-representing signal, a measure of the spectral similarity between the input signal and said estimated noise component of the input signal; (iv) means (7) for comparing the measure with a threshold value T to produce an output signal indicating the presence or absence of speech; (v) an auxiliary voice activity detector (20); (vi) updating means for updating the memory from the input signal; characterized in that the auxiliary voice activity detector is operable, in dependence on a measure of the spectral similarity between the current portion of the input signal and a preceding portion of the input signal, to produce a control signal indicating the presence or absence of speech, and that the memory updating means are operable to update the memory only when said control signal indicates the absence of speech.

3. Apparatus according to claim 2, characterized in that it further comprises means for adjusting said threshold value during periods in which said control signal indicates the absence of speech.

4. Apparatus according to claim 2 or 3, characterized in that said auxiliary voice activity detector further comprises voiced speech detection means (27), which include a pitch analyzer for generating a signal indicating the presence of voiced speech, the control exercised by the auxiliary voice activity detector (20) also depending on said signal.

5. Apparatus for encoding a speech signal, characterized in that it comprises apparatus according to any preceding claim.

6. Mobile telephone apparatus, characterized in that it comprises apparatus according to any preceding claim.

7. A method of detecting voice activity in an input signal, comprising the following steps: receiving said input signal; estimating the noise signal component of said input signal; storing data representing said noise signal component; forming a measure M of the spectral similarity between a portion of the input signal and said noise signal component; and comparing a parameter derived from the measure M with a first threshold value T to produce a primary voice activity indication showing the presence or absence of speech, depending on whether or not that value is exceeded; wherein said estimating step includes an auxiliary voice activity detection, the method being characterized in that said auxiliary voice activity detection comprises: forming a spectral distortion measure of the similarity between the current portion of the input signal and preceding portions of the input signal; comparing the spectral distortion measure with a threshold value to produce an indication of the presence or absence of speech, depending on whether or not that value is exceeded; and updating said stored data from the input signal only from those periods in which said auxiliary voice activity indication shows the absence of speech.