WO2003063138A1 - Voice activity detector and validator for noisy environments - Google Patents

Info

Publication number
WO2003063138A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
frame
input
communication unit
signal
Prior art date
Application number
PCT/EP2003/000271
Other languages
English (en)
French (fr)
Inventor
Douglas Ralph Ealey
Holly Louise Kelleher
David John Benjamin Pearce
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc filed Critical Motorola Inc
Priority to KR10-2004-7011459A priority Critical patent/KR20040075959A/ko
Priority to JP2003562919A priority patent/JP2005516247A/ja
Publication of WO2003063138A1 publication Critical patent/WO2003063138A1/en
Priority to FI20041013A priority patent/FI124869B/fi

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L25/84 - Detection of presence or absence of voice signals for discriminating voice from noise
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L2025/783 - Detection of presence or absence of voice signals based on threshold decision
    • G10L2025/786 - Adaptive threshold

Definitions

  • This invention relates to detection of speech (commonly known as voice activity detection (VAD)) within a noisy environment.
  • the invention is applicable to, but not limited to, energy acceleration measurement of voice signals in a speech detection system.
  • GSM: global system for mobile communications.
  • TETRA: TErrestrial Trunked RAdio.
  • a voice activity detector operates under the assumption that speech is present only in part of the audio signal. This assumption is usually correct, since there are many audio signal intervals that exhibit only silence or background noise.
  • a voice activity detector can be used for many purposes. These include suppressing overall transmission activity in a transmission system, when there is no speech, thus potentially saving power and channel bandwidth. When the VAD detects that speech activity has resumed, it can reinitiate transmission activity.
  • A voice activity detector can also be used in conjunction with speech storage devices, by differentiating audio portions which include speech from those that do not.
  • Conventional methods for detecting voice are based, at least in part, on methods for detecting and assessing the power of a speech signal.
  • the estimated power is compared to either a constant or an adaptive threshold, in order to make a decision on whether the signal was speech or not.
  • the main advantage of these methods is their low complexity, which makes them suitable for low- processing resource implementations.
  • The main disadvantage of such methods is that background noise can inadvertently result in "speech" being detected when no "speech" is actually present. Alternatively, "speech" that is present may not be detected because it is obscured, and difficult to detect due to the background noise.
  • Some methods for detecting speech activity are directed at noisy mobile environments and are based on adaptive filtering of the speech signal. This reduces the noise content from the signal, prior to the final decision.
  • the frequency spectrum and noise level may vary because the method will be used for different speakers and in different environments.
  • the input filter and thresholds are often adaptive so as to track these variations .
  • European Patent application No. EP-A- 0785419 by Benyassine et al is directed to a method for voice activity detection that includes the following steps : (i) Extracting a predetermined set of parameters from the incoming speech signal for each frame, and (ii) Making a frame voicing decision of the incoming speech signal for each frame according to a set of difference measures extracted from the predetermined set of parameters.
  • the VAD in cellular systems is biased in order to ensure that when a party speaks, the radio, including the speech codec and RF circuitry etc., will be active to convey that speech to the other party in the presence of background noise and other impairments. However, this leads to transmission of data when a party is not speaking. The cost of this is slightly lower battery life and slightly increased interference to co-channel users in other cells of the system. These are essentially second (or higher) order effects.
  • Known VADs/VODs (voice activity or voice onset detectors) use characteristics of the speech, such as harmonic structure (e.g., via autocorrelation), to distinguish voiced speech.
  • these structural indicators can fail, either due to disruption of the speech structure or due to structure in the noise. This might be e.g., engine, tyre or air-conditioning noise in a car.
  • these methods are poor at detecting unvoiced speech.
  • Noise levels in one set of examples may be greater than speech levels in another; this makes it impossible to set a single, fixed threshold value.
  • the traditional method to overcome this is to average the first 100msec or so of an utterance on the assumption that this is representative of noise, creating an ad hoc threshold for that utterance. Again, however, this is insufficient for non-stationary noise where the noise may rapidly diverge from the initial estimate, where the noise has high variance or where the first few frames actually contain speech rather than the presumed noise.
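  • For illustration only, the following is a minimal sketch of the traditional approach described above (it is not part of the claimed invention): average the first ~100 msec of an utterance, assume it is noise, and derive an ad hoc energy threshold for that utterance. The frame length and the margin factor are assumptions of this sketch.

```python
# Illustrative sketch of the prior-art ad hoc threshold criticised above.
import numpy as np

def naive_vad(frame_energies, frame_ms=10, lead_in_ms=100, margin=2.0):
    lead_in = max(1, lead_in_ms // frame_ms)             # e.g. first 10 frames
    noise_estimate = np.mean(frame_energies[:lead_in])   # presumed noise floor
    threshold = margin * noise_estimate                  # ad hoc threshold (margin assumed)
    # A frame is flagged as speech whenever its energy exceeds the threshold;
    # this fails in exactly the non-stationary cases described above.
    return [e > threshold for e in frame_energies]
```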
  • a communication unit as claimed in claim 1.
  • a method of detecting a speech signal input to a communication unit as claimed in claim 11.
  • a method of deciding whether a signal input to a communication unit is speech or noise as claimed in claim 14.
  • The present invention aims to address the case of arbitrary-amplitude, non-stationary noise, by the use of an energy acceleration measurement in preference to an energy amplitude measurement to denote the presence, or absence, of speech.
  • FIG. 1 illustrates a block diagram of a communication unit adapted to perform the voice activity detection and validation of the preferred embodiment of the present invention
  • FIG. 2 illustrates a flowchart of an energy acceleration based voice activity detector for noisy environments in accordance with a preferred embodiment of the present invention
  • FIG. 3 illustrates a flowchart of an energy acceleration based voice activity validation for noisy environments in accordance with a preferred embodiment of the present invention
  • FIG. 4 illustrates a buffer operation in accordance with a preferred embodiment of the present invention.
  • Voiced speech has a comparatively high energy acceleration value, as its onset is dependent upon the activation of the vocal cords, which are either vibrating or still.
  • unvoiced onsets e.g. plosives
  • the inventors have recognised that, in a representational domain emphasising voicing such as a narrowband power spectrum or the Mel-spectrum, the resultant energy acceleration is significantly higher than non-stationary noise.
  • impulsive noises e.g. a hand clap
  • the inventors have appreciated that one can additionally discriminate against these noises by concentrating on energy in the frequency region that is likely to contain a fundamental pitch of the voice signal.
  • the inventors of the present invention propose to use an unstructured characteristic of speech, namely energy acceleration (or acceleration of some metric reflecting the speech energy or components thereof) .
  • DSR distributed speech recognition
  • ETSI European Telecommunications Standards Institute
  • STQ Transmission and Quality aspects
  • Referring to FIG. 1, a block diagram of an audio subscriber unit 100, adapted to support the inventive concepts of the preferred embodiments of the present invention, is shown.
  • The preferred embodiment of the present invention is described with respect to a wireless audio communication unit, for example one capable of operating in the 3rd generation partnership project (3GPP) standard for future cellular wireless communication systems and offering DSR capabilities.
  • the inventive concepts herein described, relating to voice activity detection and validation thereof, are equally applicable to any electronic device that responds to voice signals, and which may benefit from improved voice activity detection circuitry.
  • the audio subscriber unit 100 contains an antenna 102 preferably coupled to a duplex filter, antenna switch or circulator 104 that provides isolation between receive and transmit chains within the audio subscriber unit 100.
  • the receiver chain includes receiver front-end circuitry 106 (effectively providing reception, filtering and intermediate or base-band frequency conversion) .
  • the front-end circuit 106 is serially coupled to a signal processing function (generally realised by a digital signal processor (DSP)) 108.
  • the signal processing function 108 performs signal demodulation, error correction and formatting.
  • Recovered data from the signal processing function 108 is serially coupled to an audio processing function 109, which formats the received signal in a suitable manner to send to an audio enunciator/display 111.
  • the signal processing function 108 and audio processing function 109 may be provided within the same physical device.
  • a controller 114 is configured to control the information flow and operational state of the elements of the subscriber unit 100.
  • As regards the transmit chain, this essentially includes an audio input device 120 coupled in series through the audio processing function 109, signal processing function 108, transmitter/modulation circuitry 122 and a power amplifier 124.
  • the processor 108, transmitter/modulation circuitry 122 and the power amplifier 124 are operationally responsive to the controller.
  • the power amplifier output is coupled to the duplex filter, antenna switch or circulator 104, and antenna 102 to radiate the final radio frequency signal.
  • audio processing function 109 includes a voice activity (or voice onset) detection (VAD) function 130 operably coupled to a voice activity decision function 135.
  • the VAD function 130 and voice activity decision function 135 have been adapted to provide improved voice detection and decision mechanism, the operation of which is further described with respect to FIG. 2 and FIG. 3.
  • The voice activity detector function 130 includes a frame-by-frame detection stage consisting of three measurements. The three frequency range measurements include: a 'whole spectrum' energy acceleration measurement; a sub-region measurement concentrated on the spectral region most likely to contain the fundamental pitch; and a variance measurement over the lower half of the spectrum, each of which is described further below.
  • the voice activity decision function 135 performs a decision based on a buffer of measurements, which are analysed for their speech likelihood.
  • the final decision from the decision stage is applied retrospectively to the earliest frame in the buffer.
  • a timer/counter 118 is also adapted to perform the timing functions in the detection and decision processes of FIG. 2 and FIG. 3.
  • the signal processor function 108, audio processing function 109, VAD function 130 and voice activity decision function 135 may be implemented as distinct, operably- coupled, processing elements. Alternatively, one or more processors may be used to implement one or more of the corresponding processing operations. In a yet further alternative embodiment, the aforementioned functions may be implemented as a mixture of hardware, software or firmware elements, using application specific integrated circuits (ASICs) and/or processors, for example digital signal processors (DSPs) .
  • the various components within the audio subscriber unit 100 can be realised in discrete or integrated component form, with an ultimate structure therefore being merely an arbitrary selection.
  • The method returns values that can be interpreted as ranging from 'deceleration' (values below 1) to 'acceleration' (values above 1).
  • the preferred VAD and parameter initialisation systems within the detection stage are summarised in the flowchart of FIG. 2.
  • In non-stationary noise, long-term energy thresholds are not a reliable indicator of speech, and structural characteristics of the speech (e.g. harmonics) can likewise be unreliable.
  • the preferred voice activity detector uses a noise-robust characteristic of the speech, namely the energy acceleration associated with voice onset.
  • The preferred VAD mechanism relates to a 'whole spectrum' measurement process.
  • A frame counter is initially assessed to determine whether it is less than 'N', which defines the number of buffered frames, as shown in step 205.
  • 'N' is set to '15', assuming it has been established that each frame increments by, say, 10 msec. If the frame counter is less than 'N' in step 205, then the rolling average for an initial acceleration test is updated, as in step 210. If the frame counter is not less than 'N' in step 205, then step 210 is skipped.
  • A determination is then made to assess whether the energy acceleration measurement is within one or more specified margin(s), as shown in step 235. If the energy acceleration measurement is within one or more specified margin(s) in step 235, then the rolling average is updated with the results of a further energy acceleration test, as in step 240. If the energy acceleration measurement is not within one or more specified margin(s) in step 235, then step 240 is skipped.
  • the frame counter is then incremented, as in step 275, and the process repeats from step 205.
  • a sub-region measurement process shown in optional steps 215 and 245 may be performed.
  • a particular sub-region of the spectrum is selected as that sub-region most likely to contain the fundamental pitch.
  • A determination is made to check whether the energy acceleration measurement is greater than the threshold value, as shown in step 220. If the energy acceleration measurement is greater than the threshold value in step 220, the process of initialising other parameters is suspended, as shown in step 225. If the energy acceleration measurement is not greater than the threshold value in step 220, the initialisation of other parameters is updated, as in step 230. The process then returns to step 235 as shown.
  • A further preferred determination is made after the determination to assess whether the energy acceleration measurement is within one or more specified margin(s) in step 235.
  • the deceleration value is assessed to determine if it is 'high' in step 250 and, if so, the rolling average for the energy acceleration test is slowly updated, as shown in step 255.
  • the process then returns to the whole spectrum method in step 260.
  • The preferred embodiment of the present invention incorporates the sub-region detector in order to augment the whole spectrum measurement.
  • a further measurement process is preferably performed using the 'acceleration' of the variance of values within, for example, the lower half of the spectrum of each frame.
  • the variance measure detects structure within the lower half of the spectrum, making it highly sensitive to voiced speech.
  • the variance measurement follows the approach of the sub-region process, with the lower half of the spectrum being the particular sub-region selected. This variance measurement further complements the whole spectrum measurement approach, which is better able to detect unvoiced and plosive speech.
  • The whole-spectrum detector uses the known Mel-filtered spectral representation of the filter gains generated by the first stage of the double Wiener filter.
  • a single input value is obtained by squaring the sum of the Mel filter banks.
  • the whole-spectrum detector in the preferred embodiment of the invention, applies the following process to all frames, as described below:
  • Step one initialises the noise estimate Tracker in the following manner:
  • Tracker = MAX(Tracker, Input).
  • the energy acceleration measure prevents the Tracker being updated if speech occurs within the lead-in time of 15 frames .
  • Step two updates the Tracker value if the current input is similar to the noise estimate, in the following manner:
  • Step three provides a failsafe mechanism for those instances where there is speech or an uncharacteristically large noise content within the first few frames. This causes the resulting erroneously high noise estimate to decay. Step three preferably functions in the following manner:
  • Step four returns a 'true' speech determination if the current input is more than 165% larger than the Tracker, in the following manner:
  • the ratio of the instantaneous input to the short-term mean Tracker is a function of the energy acceleration of successive inputs.
  • UpperBound is 150% and LowerBound is 75%;
  • Threshold is 165%. Notably, there is no update if the value is greater than UpperBound, or between LowerBound and Floor.
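  • As an illustration of steps one to four above, a hedged sketch follows (it is not taken from the patent text). The 15-frame lead-in, the 165% Threshold and the 150%/75% bounds are stated above; the Floor value, the Tracker smoothing factor, the failsafe decay rate, the acceleration limit used during the lead-in, and the reading of the Threshold as an input/Tracker ratio are assumptions of this sketch.

```python
# Hedged sketch of the whole-spectrum detector (steps one to four above).
import numpy as np

class WholeSpectrumDetector:
    LEAD_IN = 15        # frames used to initialise the noise estimate (from the text)
    THRESHOLD = 1.65    # 165% decision Threshold                       (from the text)
    UPPER_BOUND = 1.50  # no Tracker update above this input/Tracker ratio (from the text)
    LOWER_BOUND = 0.75  # lower edge of the "similar to noise" band        (from the text)
    FLOOR = 0.25        # assumed; the text names a Floor but gives no value
    ACCEL_LIMIT = 1.65  # assumed acceleration limit during the lead-in
    ALPHA = 0.1         # assumed Tracker smoothing factor
    DECAY = 0.98        # assumed failsafe decay rate (step three)
    PATIENCE = 10       # assumed frames of "input far below Tracker" before decaying

    def __init__(self):
        self.tracker = 0.0
        self.frame = 0
        self.low_count = 0

    def process(self, mel_banks, accel):
        """mel_banks: Mel filter-bank gains for one frame; accel: the frame's
        energy acceleration measure (see the rolling-average sketch below)."""
        x = float(np.sum(mel_banks)) ** 2            # squared sum of the banks

        # Step one: grow the noise estimate during the lead-in, unless the
        # acceleration measure indicates that the frame contains speech.
        if self.frame < self.LEAD_IN and accel <= self.ACCEL_LIMIT:
            self.tracker = max(self.tracker, x)

        ratio = x / self.tracker if self.tracker > 0.0 else float("inf")

        # Step two: update the Tracker only when the input resembles the noise
        # estimate; the text forbids updates above UpperBound and in the gap
        # between LowerBound and Floor.
        if ratio <= self.UPPER_BOUND and (ratio >= self.LOWER_BOUND or ratio <= self.FLOOR):
            self.tracker += self.ALPHA * (x - self.tracker)
            self.low_count = 0
        elif ratio < self.LOWER_BOUND:
            # Step three (failsafe): an input that stays far below the Tracker
            # suggests the first frames contained speech or a noise burst, so
            # let the erroneously high estimate decay.
            self.low_count += 1
            if self.low_count > self.PATIENCE:
                self.tracker *= self.DECAY

        self.frame += 1
        # Step four: declare speech when the input/Tracker ratio exceeds the Threshold.
        return ratio > self.THRESHOLD
```

  • In this reading, the Tracker behaves as a short-term noise mean, and the step four decision fires on the sharp energy rise at voice onset rather than on an absolute level.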
  • the energy acceleration input can be calculated either as: double-differentiation of successive inputs, or estimated by tracking the ratio of two rolling averages of the inputs.
  • the ratio of fast and slow-adapting rolling averages reflects the energy acceleration of successive inputs.
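  • The short sketch below illustrates both options given above: discrete double-differentiation of successive inputs, and the ratio of a fast-adapting to a slow-adapting rolling average. The smoothing coefficients are assumptions; the text does not specify them.

```python
# Illustrative sketch of the two energy acceleration measures described above.
def double_diff(x_prev2, x_prev1, x_curr):
    # Discrete second derivative of successive energy inputs.
    return x_curr - 2.0 * x_prev1 + x_prev2

class AccelerationEstimator:
    """Tracks the ratio of a fast- and a slow-adapting rolling average."""
    def __init__(self, fast_alpha=0.5, slow_alpha=0.05):   # coefficients assumed
        self.fast = None
        self.slow = None
        self.fast_alpha = fast_alpha
        self.slow_alpha = slow_alpha

    def update(self, x):
        if self.fast is None:            # first frame: seed both averages
            self.fast = self.slow = float(x)
        self.fast += self.fast_alpha * (x - self.fast)
        self.slow += self.slow_alpha * (x - self.slow)
        # Values well above 1 indicate acceleration (energy rising faster than
        # the long-term trend); values below 1 indicate deceleration.
        return self.fast / max(self.slow, 1e-12)
```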
  • The sub-band detector preferably uses the average of the second, third and fourth Mel-filter banks derived for the whole-spectrum measurement.
  • the detector then applies the following process to all frames, in the manner described below:
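  • A brief illustrative sketch of how the sub-band input could be formed is given below; the 0-based indexing of the Mel filter banks is an assumption.

```python
# Illustrative sketch of the sub-band input.
import numpy as np

def subband_input(mel_banks):
    # Average of the second, third and fourth Mel filter banks: the region
    # most likely to contain the fundamental pitch of the voice.
    return float(np.mean(mel_banks[1:4]))

# This value is then fed, frame by frame, through the same Tracker-based
# process as sketched for the whole-spectrum measurement.
```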
  • the variance of the values comprising the lower frequency half of the narrowband spectral representation of the gain for each frame is used as an input.
  • The detector then applies exactly the same process as for the whole spectrum measurement.
  • The variance is calculated as $\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N}\left(w_{i} - \bar{w}\right)^{2}$, where $\bar{w} = \frac{1}{N}\sum_{i=1}^{N} w_{i}$, N = FFT length / 4, and the $w_{i}$ are the values of the narrowband spectral representation of the gain.
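  • A corresponding illustrative sketch of this variance measurement follows; it assumes the narrowband gain representation spans half the FFT length, so that its lower-frequency half contains N = FFT length / 4 values.

```python
# Illustrative sketch of the per-frame variance measurement.
import numpy as np

def lower_half_variance(gain_spectrum):
    """gain_spectrum: narrowband spectral representation of the gain for one
    frame (assumed to hold FFT length / 2 values)."""
    n = len(gain_spectrum) // 2                    # lower-frequency half
    w = np.asarray(gain_spectrum[:n], dtype=float)
    return float(np.mean((w - w.mean()) ** 2))     # population variance of the w_i

# The per-frame variance is then passed through the same Tracker-based process
# as the other two measurements.
```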
  • The three measures detailed above are presented to a VAD decision algorithm, as shown in the flowchart of FIG. 3. Successive inputs are presented to a buffer, which provides contextual analysis. This introduces a frame delay equal to the length of the buffer minus one frame.
  • Referring to FIG. 3, a flowchart 300 of an acceleration-based voice activity validation process for noisy environments is illustrated, in accordance with a preferred embodiment of the present invention.
  • the decision logic applies a number, and preferably each, of the following steps:
  • The input for the current frame is defined as 'true' (T) if any of the three measurements returns a true speech indication.
  • Step 3: The algorithm searches for the longest contiguous sequence, M, of 'true' values in the buffer, as in step 310. Hence, for example, for the sequence 'T T F T T T F', M would equal '3'.
  • Step 4
  • If T > 0, output TRUE; else output FALSE (step 365). That is, if the timer is greater than zero in step 365, the process outputs a 'true' speech decision, as shown in step 370. Alternatively, if the timer is not greater than zero in step 365, the process outputs a 'noise' decision, as shown in step 375.
  • Step 7: Frame++, shift the buffer left and return to step 1.
  • the buffer is left-shifted to accommodate the next input, as shown with respect to FIG. 4.
  • the output speech decision is applied to the frame being ejected from the buffer. The process then repeats again, at step 305, for the next true/false input to the data buffer.
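  • A hedged sketch of this buffered decision logic is given below. The buffer length of 8 and the hangover of five frames are taken from the preferred embodiment described with reference to FIG. 4 below; the minimum run length needed to arm the timer, and the rule that re-arms the timer to 'buffer length plus hangover' frames, are assumptions chosen to approximate the lead-in, lead-out and hangover behaviour of FIG. 4.

```python
# Hedged sketch of the buffered decision stage (steps 1-7 above).
from collections import deque

BUFFER_LEN = 8   # frames held for contextual analysis (preferred embodiment)
HANGOVER = 5     # hangover frames after speech ends   (preferred embodiment)
M_MIN = 3        # assumed minimum contiguous run of 'true' values

def longest_true_run(buf):
    best = run = 0
    for value in buf:
        run = run + 1 if value else 0
        best = max(best, run)
    return best                                   # 'T T F T T T F' -> 3

class DecisionStage:
    def __init__(self):
        self.buf = deque([False] * BUFFER_LEN, maxlen=BUFFER_LEN)
        self.timer = 0

    def push(self, frame_is_true):
        """frame_is_true: True if any of the three measurements fired.
        Returns the decision for the frame being ejected from the buffer,
        i.e. delayed by BUFFER_LEN - 1 frames."""
        self.buf.append(frame_is_true)            # shift buffer, eject oldest
        if longest_true_run(self.buf) >= M_MIN:
            # Arm the timer so the ejected frame and the frames that follow
            # are flagged as speech (lead-in, lead-out and hangover).
            self.timer = BUFFER_LEN + HANGOVER
        decision = self.timer > 0                 # if T > 0, output TRUE
        if self.timer > 0:
            self.timer -= 1
        return decision
```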
  • The decision mechanism may not be based on one or more timer(s), and may make a decision purely on whether one or more energy acceleration thresholds are exceeded.
  • Frames #2-#5 indicate 'true' due to the buffer lead-in function.
  • Frames #6-#8 indicate 'true' as the positions of the actual original 'true' speech inputs.
  • Frames #9-#12 indicate 'true' due to the buffer lead-out function.
  • Frames #13-#18 indicate 'true' in response to the timer hangover that is used.
  • the buffer length and hangover timers can be adjusted dynamically to suit the audio communication unit's needs.
  • The preferred values of a buffer length 'N' of 8 and a hangover timer of five frames are used for explanatory purposes only.
  • the energy acceleration measure performed in the method steps of FIG. 2 can be used to validate the initialisation of other parameters.
  • A spectral subtraction scheme requires an initial estimate of the noise, based on the first ten frames (typically 100 msec) of speech. Even in stationary noise, several events may occur to invalidate the initial estimate, for example the presence of speech, or an uncharacteristically large noise burst, within those first frames.
  • the energy acceleration measure can identify this and so suspend noise-based initialisations, as shown in step 225 of FIG. 2, or force the use of default estimates.
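  • As an illustration of this validation step, a small sketch follows; the acceleration limit and the fallback behaviour are assumptions of the sketch, not values given in the text.

```python
# Illustrative sketch of validating the noise initialisation with the
# acceleration measure.
import numpy as np

INIT_FRAMES = 10      # ~100 msec at 10 msec frames, as stated above
ACCEL_LIMIT = 1.65    # assumed: re-uses the 165% figure as an acceleration limit

def init_noise_estimate(frame_spectra, accelerations, default=None):
    """frame_spectra: per-frame spectra; accelerations: acceleration per frame."""
    clean = [s for s, a in zip(frame_spectra[:INIT_FRAMES],
                               accelerations[:INIT_FRAMES])
             if a <= ACCEL_LIMIT]                 # drop frames flagged as speech
    if not clean:                                 # initialisation suspended:
        return default                            # force the default estimate
    return np.mean(np.asarray(clean), axis=0)     # noise spectrum estimate
```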
  • a communication unit includes an audio processing unit having a voice activity detection mechanism.
  • the voice activity detection mechanism provides an indication of energy acceleration of a signal input to the communication unit and determines whether said input signal is speech or noise based on said indication.
  • the method includes the steps of indicating the acceleration of an input signal to the communication unit; and determining whether said input signal is speech or noise based on said step of indicating.
  • the method includes the step of deciding whether said input signal is speech or noise based on an energy acceleration, for example using a frame average or a rolling average of a number of input signals.
  • the energy acceleration based voice activity detector and validator for noisy environments described above provides the advantages of noise robustness and fast response.
  • Because the preferred embodiment uses a measure dependent upon energy acceleration, instead of an absolute measurement, the inventive concepts herein described can be applied to speech of any input level.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Telephone Function (AREA)
PCT/EP2003/000271 2002-01-24 2003-01-10 Voice activity detector and validator for noisy environments WO2003063138A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR10-2004-7011459A KR20040075959A (ko) 2002-01-24 2003-01-10 잡음 환경들에 대한 음성 활동도 검출기 및 밸리데이터 (Voice activity detector and validator for noisy environments)
JP2003562919A JP2005516247A (ja) 2002-01-24 2003-01-10 雑音環境のための音声活動検出器及び有効化器 (Voice activity detector and validator for noisy environments)
FI20041013A FI124869B (fi) 2002-01-24 2004-07-22 Ääniaktiviteetin tunnistin ja hyväksyjä kohinallisia ympäristöjä varten (Voice activity detector and validator for noisy environments)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0201585.7 2002-01-24
GB0201585A GB2384670B (en) 2002-01-24 2002-01-24 Voice activity detector and validator for noisy environments

Publications (1)

Publication Number Publication Date
WO2003063138A1 true WO2003063138A1 (en) 2003-07-31

Family

ID=9929648

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2003/000271 WO2003063138A1 (en) 2002-01-24 2003-01-10 Voice activity detector and validator for noisy environments

Country Status (6)

Country Link
JP (2) JP2005516247A (zh)
KR (2) KR100976082B1 (zh)
CN (1) CN1307613C (zh)
FI (1) FI124869B (zh)
GB (1) GB2384670B (zh)
WO (1) WO2003063138A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010048999A1 (en) * 2008-10-30 2010-05-06 Telefonaktiebolaget Lm Ericsson (Publ) Telephony content signal discrimination
KR101196518B1 (ko) 2011-04-05 2012-11-01 한국과학기술연구원 실시간 음성 활동 검출 장치 및 검출 방법
US8909522B2 (en) 2007-07-10 2014-12-09 Motorola Solutions, Inc. Voice activity detector based upon a detected change in energy levels between sub-frames and a method of operation
US20160267923A1 (en) * 2015-03-09 2016-09-15 Tomoyuki Goto Communication apparatus, communication system, method of storing log data, and storage medium

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100657912B1 (ko) * 2004-11-18 2006-12-14 삼성전자주식회사 잡음 제거 방법 및 장치
CN100543841C (zh) * 2005-10-21 2009-09-23 神基科技股份有限公司 音源处理电路结构及其处理方法
JP4758879B2 (ja) * 2006-12-14 2011-08-31 日本電信電話株式会社 仮音声区間決定装置、方法、プログラム及びその記録媒体、音声区間決定装置、方法
CN102044241B (zh) * 2009-10-15 2012-04-04 华为技术有限公司 一种实现通信系统中背景噪声的跟踪的方法和装置
EP2561508A1 (en) * 2010-04-22 2013-02-27 Qualcomm Incorporated Voice activity detection
US8898058B2 (en) 2010-10-25 2014-11-25 Qualcomm Incorporated Systems, methods, and apparatus for voice activity detection
RU2544293C1 (ru) * 2013-10-11 2015-03-20 Сергей Александрович Косарев Способ измерения физической величины с помощью мобильного электронного устройства и внешнего блока
US9953661B2 (en) * 2014-09-26 2018-04-24 Cirrus Logic Inc. Neural network voice activity detection employing running range normalization
CN104575498B (zh) * 2015-01-30 2018-08-17 深圳市云之讯网络技术有限公司 有效语音识别方法及系统
CN109841223B (zh) * 2019-03-06 2020-11-24 深圳大学 一种音频信号处理方法、智能终端及存储介质
US11217262B2 (en) * 2019-11-18 2022-01-04 Google Llc Adaptive energy limiting for transient noise suppression
CN112820324B (zh) * 2020-12-31 2024-06-25 平安科技(深圳)有限公司 多标签语音活动检测方法、装置及存储介质
KR102453919B1 (ko) 2022-05-09 2022-10-12 (주)피플리 인공지능 기반 문화 콘텐츠 관련 가이드 음원의 검증 방법, 장치 및 시스템

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0785419A2 (en) * 1996-01-22 1997-07-23 Rockwell International Corporation Voice activity detection
US6009391A (en) * 1997-06-27 1999-12-28 Advanced Micro Devices, Inc. Line spectral frequencies and energy features in a robust signal recognition system

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT1209561B (it) * 1983-07-14 1989-08-30 Gte Laboratories Inc Rivelazione complementare della parola.
JP2559475B2 (ja) * 1988-09-22 1996-12-04 積水化学工業株式会社 音声検出方式
JPH03114100A (ja) * 1989-09-28 1991-05-15 Matsushita Electric Ind Co Ltd 音声区間検出装置
JP3024447B2 (ja) * 1993-07-13 2000-03-21 日本電気株式会社 音声圧縮装置
JP3109978B2 (ja) * 1995-04-28 2000-11-20 松下電器産業株式会社 音声区間検出装置
JPH10171497A (ja) * 1996-12-12 1998-06-26 Oki Electric Ind Co Ltd 背景雑音除去装置
US5946649A (en) * 1997-04-16 1999-08-31 Technology Research Association Of Medical Welfare Apparatus Esophageal speech injection noise detection and rejection
JP3297346B2 (ja) * 1997-04-30 2002-07-02 沖電気工業株式会社 音声検出装置
JPH10327089A (ja) * 1997-05-23 1998-12-08 Matsushita Electric Ind Co Ltd 携帯電話装置
JPH113091A (ja) * 1997-06-13 1999-01-06 Matsushita Electric Ind Co Ltd 音声信号の立ち上がり検出装置
FR2768544B1 (fr) * 1997-09-18 1999-11-19 Matra Communication Procede de detection d'activite vocale
JP4221537B2 (ja) * 2000-06-02 2009-02-12 日本電気株式会社 音声検出方法及び装置とその記録媒体

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0785419A2 (en) * 1996-01-22 1997-07-23 Rockwell International Corporation Voice activity detection
US6009391A (en) * 1997-06-27 1999-12-28 Advanced Micro Devices, Inc. Line spectral frequencies and energy features in a robust signal recognition system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8909522B2 (en) 2007-07-10 2014-12-09 Motorola Solutions, Inc. Voice activity detector based upon a detected change in energy levels between sub-frames and a method of operation
WO2010048999A1 (en) * 2008-10-30 2010-05-06 Telefonaktiebolaget Lm Ericsson (Publ) Telephony content signal discrimination
CN102272826A (zh) * 2008-10-30 2011-12-07 爱立信电话股份有限公司 电话内容信号鉴别
US8407044B2 (en) 2008-10-30 2013-03-26 Telefonaktiebolaget Lm Ericsson (Publ) Telephony content signal discrimination
CN102272826B (zh) * 2008-10-30 2015-10-07 爱立信电话股份有限公司 电话内容信号鉴别
KR101196518B1 (ko) 2011-04-05 2012-11-01 한국과학기술연구원 실시간 음성 활동 검출 장치 및 검출 방법
US20160267923A1 (en) * 2015-03-09 2016-09-15 Tomoyuki Goto Communication apparatus, communication system, method of storing log data, and storage medium

Also Published As

Publication number Publication date
CN1623186A (zh) 2005-06-01
GB2384670A (en) 2003-07-30
KR100976082B1 (ko) 2010-08-16
KR20090127182A (ko) 2009-12-09
KR20040075959A (ko) 2004-08-30
CN1307613C (zh) 2007-03-28
FI124869B (fi) 2015-02-27
JP2005516247A (ja) 2005-06-02
GB2384670B (en) 2004-02-18
GB0201585D0 (en) 2002-03-13
FI20041013L (fi) 2004-09-22
JP2010061151A (ja) 2010-03-18

Similar Documents

Publication Publication Date Title
JP2010061151A (ja) 雑音環境のための音声活動検出器及び有効化器
KR100944252B1 (ko) 오디오 신호 내에서 음성활동 탐지
US6810273B1 (en) Noise suppression
KR101852892B1 (ko) 음성 인식 방법, 음성 인식 장치 및 전자 장치
US8977556B2 (en) Voice detector and a method for suppressing sub-bands in a voice detector
US20020165711A1 (en) Voice-activity detection using energy ratios and periodicity
US20080095384A1 (en) Apparatus and method for detecting voice end point
US8751221B2 (en) Communication apparatus for adjusting a voice signal
JP3878482B2 (ja) 音声検出装置および音声検出方法
US8924199B2 (en) Voice correction device, voice correction method, and recording medium storing voice correction program
JP2007179073A (ja) 音声活性検出装置及び移動局並びに音声活性検出方法
EP3438979B1 (en) Estimation of background noise in audio signals
EP2743923B1 (en) Voice processing device, voice processing method
KR100848798B1 (ko) 배경 노이즈의 고속 동적 추정을 위한 방법
JPH05244105A (ja) 音声検出方法および装置
US8788265B2 (en) System and method for babble noise detection
CN111128244B (zh) 基于过零率检测的短波通信语音激活检测方法
US6633847B1 (en) Voice activated circuit and radio using same
KR101336203B1 (ko) 전자기기에서 음성 검출 방법 및 장치
Wang et al. An effective voice activity detection algorithm in mobile communication corrupted by impulse noise
WO2007040883A2 (en) Voice activity detector

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1878/DELNP/2004

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: 2003562919

Country of ref document: JP

Ref document number: 20041013

Country of ref document: FI

WWE Wipo information: entry into national phase

Ref document number: 20038026821

Country of ref document: CN

Ref document number: 1020047011459

Country of ref document: KR

122 Ep: pct application non-entry in european phase
WWE Wipo information: entry into national phase

Ref document number: 1020097022615

Country of ref document: KR