WO2003063138A1 - Voice activity detector and validator for noisy environments - Google Patents
- Publication number
- WO2003063138A1 (application PCT/EP2003/000271)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speech
- frame
- input
- communication unit
- signal
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
- G10L2025/786—Adaptive threshold
Definitions
- This invention relates to the detection of speech (commonly known as voice activity detection (VAD)) within a noisy environment.
- VAD voice activity detection
- the invention is applicable to, but not limited to, energy acceleration measurement of voice signals in a speech detection system.
- GSM global system for mobile communications
- TETRA TErrestrial Trunked RAdio
- a voice activity detector operates under the assumption that speech is present only in part of the audio signal. This assumption is usually correct, since there are many audio signal intervals that exhibit only silence or background noise.
- a voice activity detector can be used for many purposes. These include suppressing overall transmission activity in a transmission system, when there is no speech, thus potentially saving power and channel bandwidth. When the VAD detects that speech activity has resumed, it can reinitiate transmission activity.
- a voice activity detector can also be used in conjunction with speech storage devices, by differentiating audio portions which include speech from those that are
- Conventional methods for detecting voice are based, at least in part, on methods for detecting and assessing the power of a speech signal.
- the estimated power is compared to either a constant or an adaptive threshold, in order to make a decision on whether the signal was speech or not.
- the main advantage of these methods is their low complexity, which makes them suitable for implementations with limited processing resources.
- the main disadvantage of such methods is that background noise can inadvertently result in "speech" being detected when no "speech" is actually present. Alternatively, "speech" that is present may not be detected because it is obscured by the background noise.
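The conventional power-threshold approach described above can be sketched as follows. This is a minimal illustration only: the function names, the sample values and the threshold of 0.05 are assumptions chosen for the example and do not come from the patent. It shows how a frame of loud noise is wrongly flagged as speech by a constant threshold.

```python
from typing import List, Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def power_threshold_vad(frames: Sequence[Sequence[float]],
                        threshold: float) -> List[bool]:
    """Flag a frame as speech when its power exceeds a constant threshold."""
    return [frame_power(f) > threshold for f in frames]


# Hypothetical frames: quiet background, speech, and noise louder than speech.
quiet = [0.01, -0.01, 0.02, -0.02]
speech = [0.5, -0.4, 0.6, -0.5]
loud_noise = [0.7, -0.7, 0.7, -0.7]

decisions = power_threshold_vad([quiet, speech, loud_noise], threshold=0.05)
print(decisions)  # the loud noise frame is flagged as speech (a false alarm)
```

Raising the threshold to reject the loud noise frame would also reject the speech frame, which is the weakness the text attributes to amplitude-based decisions.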
- Some methods for detecting speech activity are directed at noisy mobile environments and are based on adaptive filtering of the speech signal. This reduces the noise content of the signal prior to the final decision.
- the frequency spectrum and noise level may vary because the method will be used for different speakers and in different environments.
- the input filter and thresholds are often adaptive so as to track these variations.
- European Patent Application No. EP-A-0785419 by Benyassine et al. is directed to a method for voice activity detection that includes the following steps: (i) extracting a predetermined set of parameters from the incoming speech signal for each frame, and (ii) making a frame voicing decision of the incoming speech signal for each frame according to a set of difference measures extracted from the predetermined set of parameters.
- the VAD in cellular systems is biased in order to ensure that when a party speaks, the radio, including the speech codec and RF circuitry etc., will be active to convey that speech to the other party in the presence of background noise and other impairments. However, this leads to transmission of data when a party is not speaking. The cost of this is slightly lower battery life and slightly increased interference to co-channel users in other cells of the system. These are essentially second (or higher) order effects.
- VADs/VODs voice activity or voice onset detectors
- characteristics of the speech such as harmonic structure (e.g., via autocorrelation) to distinguish voiced speech.
- these structural indicators can fail, either due to disruption of the speech structure or due to structure in the noise. This might be e.g., engine, tyre or air-conditioning noise in a car.
- these methods are poor at detecting unvoiced speech.
- noise levels in one set of examples may be greater than speech levels in another - this makes it impossible to set a threshold value.
- the traditional method to overcome this is to average the first 100 ms or so of an utterance, on the assumption that this is representative of noise, creating an ad hoc threshold for that utterance. Again, however, this is insufficient for non-stationary noise, where the noise may rapidly diverge from the initial estimate, where the noise has high variance, or where the first few frames actually contain speech rather than the presumed noise.
- a communication unit as claimed in claim 1.
- a method of detecting a speech signal input to a communication unit as claimed in claim 11.
- a method of deciding whether a signal input to a communication unit is speech or noise as claimed in claim 14.
- the present invention aims to address the case of arbitrary amplitude, non-stationary noise, by the use of an energy acceleration measurement in preference to an energy amplitude measurement to denote the presence, or absence, of speech.
- FIG. 1 illustrates a block diagram of a communication unit adapted to perform the voice activity detection and validation of the preferred embodiment of the present invention
- FIG. 2 illustrates a flowchart of an energy acceleration based voice activity detector for noisy environments in accordance with a preferred embodiment of the present invention
- FIG. 3 illustrates a flowchart of an energy acceleration based voice activity validation for noisy environments in accordance with a preferred embodiment of the present invention
- FIG. 4 illustrates a buffer operation in accordance with a preferred embodiment of the present invention.
- Voiced speech has a comparatively high-energy acceleration value, as its onset is dependent upon the activation of the vocal cords, which are either vibrating or still.
- unvoiced onsets e.g. plosives
- the inventors have recognised that, in a representational domain emphasising voicing, such as a narrowband power spectrum or the Mel-spectrum, the resultant energy acceleration of voiced speech is significantly higher than that of non-stationary noise.
- impulsive noises e.g. a hand clap
- the inventors have appreciated that one can additionally discriminate against these noises by concentrating on energy in the frequency region that is likely to contain a fundamental pitch of the voice signal.
- the inventors of the present invention propose to use an unstructured characteristic of speech, namely energy acceleration (or acceleration of some metric reflecting the speech energy or components thereof).
- DSR distributed speech recognition
- ETSI European Telecommunications Standards Institute
- STQ Transmission and Quality aspects
- Referring to FIG. 1, a block diagram of an audio subscriber unit 100, adapted to support the inventive concepts of the preferred embodiments of the present invention, is shown.
- the preferred embodiment of the present invention is described with respect to a wireless audio communication unit, for example one capable of operating in the 3rd Generation Partnership Project (3GPP) standard for future cellular wireless communication systems and offering DSR capabilities.
- 3GPP 3rd Generation Partnership Project
- the inventive concepts herein described, relating to voice activity detection and validation thereof, are equally applicable to any electronic device that responds to voice signals, and which may benefit from improved voice activity detection circuitry.
- the audio subscriber unit 100 contains an antenna 102 preferably coupled to a duplex filter, antenna switch or circulator 104 that provides isolation between receive and transmit chains within the audio subscriber unit 100.
- the receiver chain includes receiver front-end circuitry 106 (effectively providing reception, filtering and intermediate or base-band frequency conversion).
- the front-end circuit 106 is serially coupled to a signal processing function (generally realised by a digital signal processor (DSP)) 108.
- DSP digital signal processor
- the signal processing function 108 performs signal demodulation, error correction and formatting.
- Recovered data from the signal processing function 108 is serially coupled to an audio processing function 109, which formats the received signal in a suitable manner to send to an audio enunciator/display 111.
- the signal processing function 108 and audio processing function 109 may be provided within the same physical device.
- a controller 114 is configured to control the information flow and operational state of the elements of the subscriber unit 100.
- this essentially includes an audio input device 120 coupled in series through the audio processing function 109, signal processing function 108, transmitter/modulation circuitry 122 and a power amplifier 124.
- the processor 108, transmitter/modulation circuitry 122 and the power amplifier 124 are operationally responsive to the controller.
- the power amplifier output is coupled to the duplex filter, antenna switch or circulator 104, and antenna 102 to radiate the final radio frequency signal.
- audio processing function 109 includes a voice activity (or voice onset) detection (VAD) function 130 operably coupled to a voice activity decision function 135.
- the VAD function 130 and voice activity decision function 135 have been adapted to provide an improved voice detection and decision mechanism, the operation of which is further described with respect to FIG. 2 and FIG. 3.
- the voice activity detector function 130 includes a frame-by-frame detection stage consisting of three measurements. The three measurements, each described further below, are a whole-spectrum measurement, a spectral sub-region measurement and a spectral variance measurement.
- the voice activity decision function 135 performs a decision based on a buffer of measurements, which are analysed for their speech likelihood.
- the final decision from the decision stage is applied retrospectively to the earliest frame in the buffer.
- a timer/counter 118 is also adapted to perform the timing functions in the detection and decision processes of FIG. 2 and FIG. 3.
- the signal processor function 108, audio processing function 109, VAD function 130 and voice activity decision function 135 may be implemented as distinct, operably-coupled processing elements. Alternatively, one or more processors may be used to implement one or more of the corresponding processing operations. In a yet further alternative embodiment, the aforementioned functions may be implemented as a mixture of hardware, software or firmware elements, using application specific integrated circuits (ASICs) and/or processors, for example digital signal processors (DSPs).
- ASICs application specific integrated circuits
- DSPs digital signal processors
- the various components within the audio subscriber unit 100 can be realised in discrete or integrated component form, with an ultimate structure therefore being merely an arbitrary selection.
- the method returns values that can be interpreted as 'deceleration' or 'acceleration'.
- the preferred VAD and parameter initialisation systems within the detection stage are summarised in the flowchart of FIG. 2.
- in non-stationary noise, long-term energy thresholds are not a reliable indicator of speech.
- structure of the speech e.g. harmonics
- the preferred voice activity detector uses a noise-robust characteristic of the speech, namely the energy acceleration associated with voice onset.
- the preferred VAD mechanism relates to a 'whole spectrum' measurement process.
- a frame counter is initially assessed to determine whether it is less than 'N', which defines the number of buffered frames, as shown in step 205.
- 'N' is set to 15, assuming it has been established that each frame increments by, say, 10 ms. If the frame counter is less than 'N' in step 205, then the rolling average for an initial acceleration test is updated, as in step 210. If the frame counter is not less than 'N' in step 205, then step 210 is skipped.
- A determination is then made to assess whether the energy acceleration measurement is within one or more specified margin(s), as shown in step 235. If the energy acceleration measurement is within one or more specified margin(s) in step 235, then the rolling average is updated with the results of a further energy acceleration test, as in step 240. If the energy acceleration measurement is not within one or more specified margin(s) in step 235, then step 240 is skipped.
- the frame counter is then incremented, as in step 275, and the process repeats from step 205.
- a sub-region measurement process shown in optional steps 215 and 245 may be performed.
- a particular sub-region of the spectrum is selected as that sub-region most likely to contain the fundamental pitch.
- a determination is made to check whether the energy acceleration measurement is greater than the threshold value, as shown in step 220. If the energy acceleration measurement is greater than the threshold value in step 220, the process of initialising other parameters is suspended, as shown in step 225. If the energy acceleration measurement is not greater than the threshold value in step 220, the initialisation of other parameters is updated, as in step 230. The process then returns to step 235 as shown.
- a further preferred determination is made after the determination to assess whether the energy acceleration measurement is within one or more specified margin(s) in step 235.
- the deceleration value is assessed to determine if it is 'high' in step 250 and, if so, the rolling average for the energy acceleration test is slowly updated, as shown in step 255.
- the process then returns to the whole spectrum method in step 260.
- the preferred embodiment of the present invention incorporates the sub-region detector in order to augment the whole spectrum measurement.
- a further measurement process is preferably performed using the 'acceleration' of the variance of values within, for example, the lower half of the spectrum of each frame.
- the variance measure detects structure within the lower half of the spectrum, making it highly sensitive to voiced speech.
- the variance measurement follows the approach of the sub-region process, with the lower half of the spectrum being the particular sub-region selected. This variance measurement further complements the whole spectrum measurement approach, which is better able to detect unvoiced and plosive speech.
- the whole-spectrum detector uses the known Mel-filtered spectral representation of the filter gains generated by the first stage of the double Wiener filter.
- a single input value is obtained by squaring the sum of the Mel filter banks.
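As a small illustration of the single input value described above, the sketch below squares the sum of the Mel filter bank outputs for one frame. The function name and the example values are hypothetical; the Mel filter bank itself is assumed to have been computed elsewhere.

```python
from typing import Sequence


def whole_spectrum_input(mel_bank_outputs: Sequence[float]) -> float:
    """Square of the sum of the Mel filter bank outputs for one frame."""
    total = sum(mel_bank_outputs)
    return total * total


# Hypothetical Mel filter bank outputs for a single frame.
example_banks = [0.5, 1.0, 1.5]
value = whole_spectrum_input(example_banks)
```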
- the whole-spectrum detector in the preferred embodiment of the invention, applies the following process to all frames, as described below:
- Step one initialises the noise estimate Tracker in the following manner:
- Tracker = MAX(Tracker, Input).
- the energy acceleration measure prevents the Tracker being updated if speech occurs within the lead-in time of 15 frames.
- Step two updates the Tracker value if the current input is similar to the noise estimate, in the following manner:
- Step three provides a failsafe mechanism for those instances where there is speech or an uncharacteristically large noise content within the first few frames. This causes the resulting erroneously high noise estimate to decay. Step three preferably functions in the following manner:
- Step four returns a 'true' speech determination if the current input is more than 165% larger than the Tracker, in the following manner:
- the ratio of the instantaneous input to the short-term mean Tracker is a function of the energy acceleration of successive inputs.
- UpperBound is 150% and LowerBound 75%;
- Threshold is 165%. Notably, there is no update if the value is greater than UpperBound, or between LowerBound and Floor.
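The four Tracker steps above can be sketched as a small per-frame detector. The 15-frame lead-in, UpperBound, LowerBound and Threshold values are taken from the text. Everything else is an assumption made for illustration: the text names Floor but gives no value, the update equations for steps two and three are not reproduced, so simple exponential smoothing with hypothetical coefficients is used, the acceleration limit is hypothetical, and the 165% threshold is read here as a ratio of 1.65.

```python
from dataclasses import dataclass

LEAD_IN_FRAMES = 15  # from the text: lead-in time of 15 frames
UPPER_BOUND = 1.50   # from the text: UpperBound is 150%
LOWER_BOUND = 0.75   # from the text: LowerBound is 75%
THRESHOLD = 1.65     # from the text: Threshold is 165% (read as a ratio)
FLOOR = 0.50         # ASSUMPTION: Floor is named in the text without a value
ACCEL_LIMIT = 2.5    # ASSUMPTION: hypothetical acceleration limit for step one
ALPHA = 0.8          # ASSUMPTION: hypothetical smoothing factor for step two
BETA = 0.97          # ASSUMPTION: hypothetical decay factor for step three


@dataclass
class WholeSpectrumDetector:
    tracker: float = 0.0  # running noise estimate
    frame: int = 0        # frame counter

    def step(self, value: float, acceleration: float) -> bool:
        # Step one: initialise the noise estimate during the lead-in,
        # unless the acceleration measure suggests speech is present.
        if self.frame < LEAD_IN_FRAMES and acceleration < ACCEL_LIMIT:
            self.tracker = max(self.tracker, value)
        # Step two: update only when the input resembles the noise estimate.
        if self.tracker * LOWER_BOUND < value < self.tracker * UPPER_BOUND:
            self.tracker = ALPHA * self.tracker + (1 - ALPHA) * value
        # Step three: failsafe, let an erroneously high estimate decay.
        if value < self.tracker * FLOOR:
            self.tracker = BETA * self.tracker + (1 - BETA) * value
        # Step four: speech if the input is well above the noise estimate.
        self.frame += 1
        return value > self.tracker * THRESHOLD


detector = WholeSpectrumDetector()
noise_flags = [detector.step(1.0, acceleration=1.0) for _ in range(20)]
speech_flag = detector.step(3.0, acceleration=3.0)
```

With these assumed values there is no update when the input lies above UpperBound or between Floor and LowerBound, which matches the behaviour the text notes.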
- the energy acceleration input can be calculated either as: double-differentiation of successive inputs, or estimated by tracking the ratio of two rolling averages of the inputs.
- the ratio of fast and slow-adapting rolling averages reflects the energy acceleration of successive inputs.
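Both ways of obtaining the energy acceleration input mentioned above can be sketched briefly. The first function is a discrete second difference; the second tracks the ratio of a fast-adapting to a slow-adapting average. The use of exponential averages and the adaptation rates of 0.5 and 0.05 are assumptions for illustration, since the text does not specify the form of the rolling averages.

```python
from typing import List, Sequence


def second_difference(inputs: Sequence[float]) -> List[float]:
    """Discrete double differentiation: x[n] - 2*x[n-1] + x[n-2]."""
    return [inputs[n] - 2 * inputs[n - 1] + inputs[n - 2]
            for n in range(2, len(inputs))]


def average_ratio(inputs: Sequence[float],
                  fast: float = 0.5, slow: float = 0.05) -> List[float]:
    """Ratio of a fast-adapting to a slow-adapting rolling average.

    Values above 1 indicate rising energy, values below 1 falling energy.
    """
    fast_avg = slow_avg = inputs[0]
    ratios = []
    for x in inputs:
        fast_avg += fast * (x - fast_avg)
        slow_avg += slow * (x - slow_avg)
        ratios.append(fast_avg / slow_avg if slow_avg > 0 else 1.0)
    return ratios


rising = average_ratio([1.0] * 5 + [4.0])   # energy jumps up on the last frame
falling = average_ratio([4.0] * 5 + [1.0])  # energy drops on the last frame
```

The ratio form needs no explicit differentiation and is insensitive to the absolute input level, because scaling every input by a constant leaves the ratio unchanged.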
- the sub-band detector preferably uses the average of the second, third and fourth Mel-filter banks derived for the
- the detector then applies the following process to all frames, in the manner described below:
- the variance of the values comprising the lower frequency half of the narrowband spectral representation of the gain for each frame is used as an input.
- the detector then applies exactly the same process as for the whole spectrum measurement.
- the variance is calculated over N values, where N = FFT Length/4 and w_i are the values of the narrowband spectral representation of the gain.
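A minimal sketch of the variance input follows. It assumes the ordinary population variance over the first N = FFT Length/4 gain values; the patent's own variance equation is not reproduced in this text, so the exact form used here is an assumption, as are the function name and example values.

```python
from typing import Sequence


def lower_half_variance(gains: Sequence[float], fft_length: int) -> float:
    """Population variance of the first N = fft_length / 4 gain values.

    ASSUMPTION: the standard variance definition is used, because the
    original equation is not available in this text.
    """
    n = fft_length // 4
    w = gains[:n]
    mean = sum(w) / n
    return sum((x - mean) ** 2 for x in w) / n


# Hypothetical gains for a 16-point FFT (8 bins; the lower half is 4 bins).
structured = lower_half_variance([1.0, 3.0, 1.0, 3.0, 9.0, 9.0, 9.0, 9.0], 16)
flat = lower_half_variance([2.0] * 8, 16)
```

A spectrum with pronounced structure in its lower half gives a larger value than a flat one, which is why the text describes the measure as sensitive to voiced speech.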
- the three measures detailed above are presented to a VAD decision algorithm, as shown in the flowchart of FIG. 3. Successive inputs are presented to a buffer, which provides contextual analysis. This introduces a frame delay equal to the length of the buffer minus one frame.
- Referring to FIG. 3, a flowchart 300 of an acceleration-based voice activity validation process for noisy environments is illustrated, in accordance with a preferred embodiment of the present invention.
- the decision logic applies a number, and preferably each, of the following steps:
- Input V_N is defined as 'true' (T) if any of the three measurements returns a true speech indication.
- The algorithm searches for the longest contiguous sequence of 'true' values in the buffer, as in step 310. Hence, for example, for the sequence 'T T F T T T F', M would equal 3.
- Step 3:
- Step 4
- If T > 0, output TRUE; else output FALSE. If the timer is greater than zero, in step 365, the process outputs a 'true' speech decision, as shown in step 370. Alternatively, if the timer is not greater than zero, in step 365, the process outputs a 'noise' decision, as shown in step 375.
- Step 7: Frame++, shift buffer left and return to step 1.
- the buffer is left-shifted to accommodate the next input, as shown with respect to FIG. 4.
- the output speech decision is applied to the frame being ejected from the buffer. The process then repeats again, at step 305, for the next true/false input to the data buffer.
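The buffered decision logic above can be sketched as follows. The buffer length of 8 and the hangover of five frames are the explanatory values given in the text. The rules that set and decrement the timer (steps 3 to 5) are not reproduced in this text, so the rule used here, which reloads the timer whenever the longest run reaches a minimum length and otherwise counts down, is a simplified assumption, as are the class name and MIN_RUN.

```python
from collections import deque
from typing import Iterable

BUFFER_LEN = 8  # from the text: explanatory buffer length
HANGOVER = 5    # from the text: explanatory hangover timer, in frames
MIN_RUN = 3     # ASSUMPTION: hypothetical minimum run length for speech


def longest_true_run(flags: Iterable[bool]) -> int:
    """Length M of the longest contiguous sequence of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


class VadValidator:
    def __init__(self) -> None:
        # A full deque with maxlen drops its leftmost entry on append,
        # which plays the role of the left shift of the buffer.
        self.buffer = deque([False] * BUFFER_LEN, maxlen=BUFFER_LEN)
        self.timer = 0

    def push(self, measurements: Iterable[bool]) -> bool:
        """Add one frame; the result applies to the oldest buffered frame."""
        self.buffer.append(any(measurements))  # true if any measure fires
        m = longest_true_run(self.buffer)
        if m >= MIN_RUN:
            self.timer = HANGOVER              # assumed timer reload rule
        elif self.timer > 0:
            self.timer -= 1                    # assumed hangover countdown
        return self.timer > 0                  # if T > 0 output TRUE


validator = VadValidator()
inputs = [False, True, True, True] + [False] * 12
outputs = [validator.push([flag, False, False]) for flag in inputs]
```

Because each decision is applied to the frame at the front of the buffer, the output lags the input by the buffer length minus one frame, as the text notes.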
- the decision mechanism need not be based on one or more timer(s), and may make a decision purely on whether one or more energy acceleration thresholds are exceeded.
- Frames #2-#5 indicate 'true' due to the buffer lead-in function.
- Frames #6-#8 indicate 'true' as the positions of the actual original 'true' speech inputs.
- Frames #9-#12 indicate 'true' due to the buffer lead-out function.
- Frames #13-#18 indicate 'true' in response to the timer hangover that is used.
- the buffer length and hangover timers can be adjusted dynamically to suit the audio communication unit's needs.
- the buffer length 'N' of 8 and the hangover timer of five frames used in the preferred embodiment are for explanatory purposes only.
- the energy acceleration measure performed in the method steps of FIG. 2 can be used to validate the initialisation of other parameters.
- a spectral subtraction scheme requires an initial estimate of the noise, based on the first ten frames (typically 100 ms) of speech. Even in stationary noise, several events may occur to invalidate the initial estimate. Examples of such events include:
- the energy acceleration measure can identify this and so suspend noise-based initialisations, as shown in step 225 of FIG. 2, or force the use of default estimates.
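The idea of suspending a noise-based initialisation can be sketched as below. The ten-frame initialisation window comes from the text; the function name, the acceleration threshold of 2.5 and the default estimate are hypothetical, and stopping the averaging at the first high-acceleration frame is only one possible reading of "suspend".

```python
from typing import Sequence


def initial_noise_estimate(energies: Sequence[float],
                           accelerations: Sequence[float],
                           accel_threshold: float = 2.5,
                           init_frames: int = 10,
                           default_estimate: float = 1.0) -> float:
    """Average the leading frames until the acceleration suggests speech."""
    total = 0.0
    count = 0
    for energy, accel in zip(energies[:init_frames],
                             accelerations[:init_frames]):
        if accel > accel_threshold:
            break              # suspend the initialisation (cf. step 225)
        total += energy        # keep updating the estimate (cf. step 230)
        count += 1
    # Fall back to a default estimate if no frame could be used.
    return total / count if count else default_estimate


estimate = initial_noise_estimate(
    energies=[1.0, 1.0, 1.0, 8.0, 9.0],
    accelerations=[1.0, 1.0, 1.0, 3.0, 1.2],
)
```

In this example the speech onset in the fourth frame is excluded, so the estimate reflects only the three noise frames that precede it.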
- a communication unit includes an audio processing unit having a voice activity detection mechanism.
- the voice activity detection mechanism provides an indication of energy acceleration of a signal input to the communication unit and determines whether said input signal is speech or noise based on said indication.
- the method includes the steps of indicating the acceleration of an input signal to the communication unit; and determining whether said input signal is speech or noise based on said step of indicating.
- the method includes the step of deciding whether said input signal is speech or noise based on an energy acceleration, for example using a frame average or a rolling average of a number of input signals.
- the energy acceleration based voice activity detector and validator for noisy environments described above provides the advantages of noise robustness and fast response.
- because the preferred embodiment uses a measure dependent upon energy acceleration, instead of an absolute measurement, the inventive concepts herein described can be applied to speech of any input level.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Mobile Radio Communication Systems (AREA)
- Telephone Function (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2004-7011459A KR20040075959A (ko) | 2002-01-24 | 2003-01-10 | Voice activity detector and validator for noisy environments |
JP2003562919A JP2005516247A (ja) | 2002-01-24 | 2003-01-10 | Voice activity detector and validator for noisy environments |
FI20041013A FI124869B (fi) | 2002-01-24 | 2004-07-22 | Voice activity detector and validator for noisy environments |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0201585.7 | 2002-01-24 | ||
GB0201585A GB2384670B (en) | 2002-01-24 | 2002-01-24 | Voice activity detector and validator for noisy environments |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2003063138A1 true WO2003063138A1 (en) | 2003-07-31 |
Family
ID=9929648
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2003/000271 WO2003063138A1 (en) | 2002-01-24 | 2003-01-10 | Voice activity detector and validator for noisy environments |
Country Status (6)
Country | Link |
---|---|
JP (2) | JP2005516247A (zh) |
KR (2) | KR100976082B1 (zh) |
CN (1) | CN1307613C (zh) |
FI (1) | FI124869B (zh) |
GB (1) | GB2384670B (zh) |
WO (1) | WO2003063138A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010048999A1 (en) * | 2008-10-30 | 2010-05-06 | Telefonaktiebolaget Lm Ericsson (Publ) | Telephony content signal discrimination |
KR101196518B1 (ko) | 2011-04-05 | 2012-11-01 | 한국과학기술연구원 | Real-time voice activity detection apparatus and detection method |
US8909522B2 (en) | 2007-07-10 | 2014-12-09 | Motorola Solutions, Inc. | Voice activity detector based upon a detected change in energy levels between sub-frames and a method of operation |
US20160267923A1 (en) * | 2015-03-09 | 2016-09-15 | Tomoyuki Goto | Communication apparatus, communication system, method of storing log data, and storage medium |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100657912B1 (ko) * | 2004-11-18 | 2006-12-14 | 삼성전자주식회사 | Noise removal method and apparatus |
CN100543841C (zh) * | 2005-10-21 | 2009-09-23 | 神基科技股份有限公司 | Sound source processing circuit structure and processing method thereof |
JP4758879B2 (ja) * | 2006-12-14 | 2011-08-31 | 日本電信電話株式会社 | Provisional speech section determination device, method, program and recording medium therefor; speech section determination device and method |
CN102044241B (zh) * | 2009-10-15 | 2012-04-04 | 华为技术有限公司 | Method and apparatus for tracking background noise in a communication system |
EP2561508A1 (en) * | 2010-04-22 | 2013-02-27 | Qualcomm Incorporated | Voice activity detection |
US8898058B2 (en) | 2010-10-25 | 2014-11-25 | Qualcomm Incorporated | Systems, methods, and apparatus for voice activity detection |
RU2544293C1 (ru) * | 2013-10-11 | 2015-03-20 | Сергей Александрович Косарев | Method for measuring a physical quantity by means of a mobile electronic device and an external unit |
US9953661B2 (en) * | 2014-09-26 | 2018-04-24 | Cirrus Logic Inc. | Neural network voice activity detection employing running range normalization |
CN104575498B (zh) * | 2015-01-30 | 2018-08-17 | 深圳市云之讯网络技术有限公司 | Effective speech recognition method and system |
CN109841223B (zh) * | 2019-03-06 | 2020-11-24 | 深圳大学 | Audio signal processing method, intelligent terminal and storage medium |
US11217262B2 (en) * | 2019-11-18 | 2022-01-04 | Google Llc | Adaptive energy limiting for transient noise suppression |
CN112820324B (zh) * | 2020-12-31 | 2024-06-25 | 平安科技(深圳)有限公司 | Multi-label voice activity detection method, apparatus and storage medium |
KR102453919B1 (ko) | 2022-05-09 | 2022-10-12 | (주)피플리 | Method, apparatus and system for verifying guide sound sources related to artificial-intelligence-based cultural content |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0785419A2 (en) * | 1996-01-22 | 1997-07-23 | Rockwell International Corporation | Voice activity detection |
US6009391A (en) * | 1997-06-27 | 1999-12-28 | Advanced Micro Devices, Inc. | Line spectral frequencies and energy features in a robust signal recognition system |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IT1209561B (it) * | 1983-07-14 | 1989-08-30 | Gte Laboratories Inc | Complementary speech detection. |
JP2559475B2 (ja) * | 1988-09-22 | 1996-12-04 | 積水化学工業株式会社 | Voice detection system |
JPH03114100A (ja) * | 1989-09-28 | 1991-05-15 | Matsushita Electric Ind Co Ltd | Speech section detection device |
JP3024447B2 (ja) * | 1993-07-13 | 2000-03-21 | 日本電気株式会社 | Speech compression device |
JP3109978B2 (ja) * | 1995-04-28 | 2000-11-20 | 松下電器産業株式会社 | Speech section detection device |
JPH10171497A (ja) * | 1996-12-12 | 1998-06-26 | Oki Electric Ind Co Ltd | Background noise removal device |
US5946649A (en) * | 1997-04-16 | 1999-08-31 | Technology Research Association Of Medical Welfare Apparatus | Esophageal speech injection noise detection and rejection |
JP3297346B2 (ja) * | 1997-04-30 | 2002-07-02 | 沖電気工業株式会社 | Voice detection device |
JPH10327089A (ja) * | 1997-05-23 | 1998-12-08 | Matsushita Electric Ind Co Ltd | Mobile telephone device |
JPH113091A (ja) * | 1997-06-13 | 1999-01-06 | Matsushita Electric Ind Co Ltd | Speech signal onset detection device |
FR2768544B1 (fr) * | 1997-09-18 | 1999-11-19 | Matra Communication | Voice activity detection method |
JP4221537B2 (ja) * | 2000-06-02 | 2009-02-12 | 日本電気株式会社 | Voice detection method and apparatus, and recording medium therefor |
-
2002
- 2002-01-24 GB GB0201585A patent/GB2384670B/en not_active Expired - Lifetime
-
2003
- 2003-01-10 KR KR1020097022615A patent/KR100976082B1/ko not_active Expired - Lifetime
- 2003-01-10 JP JP2003562919A patent/JP2005516247A/ja active Pending
- 2003-01-10 WO PCT/EP2003/000271 patent/WO2003063138A1/en active Application Filing
- 2003-01-10 CN CNB038026821A patent/CN1307613C/zh not_active Expired - Lifetime
- 2003-01-10 KR KR10-2004-7011459A patent/KR20040075959A/ko not_active Ceased
-
2004
- 2004-07-22 FI FI20041013A patent/FI124869B/fi active IP Right Grant
-
2009
- 2009-11-02 JP JP2009251650A patent/JP2010061151A/ja active Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8909522B2 (en) | 2007-07-10 | 2014-12-09 | Motorola Solutions, Inc. | Voice activity detector based upon a detected change in energy levels between sub-frames and a method of operation |
WO2010048999A1 (en) * | 2008-10-30 | 2010-05-06 | Telefonaktiebolaget Lm Ericsson (Publ) | Telephony content signal discrimination |
CN102272826A (zh) * | 2008-10-30 | 2011-12-07 | 爱立信电话股份有限公司 | Telephony content signal discrimination |
US8407044B2 (en) | 2008-10-30 | 2013-03-26 | Telefonaktiebolaget Lm Ericsson (Publ) | Telephony content signal discrimination |
CN102272826B (zh) * | 2008-10-30 | 2015-10-07 | 爱立信电话股份有限公司 | Telephony content signal discrimination |
KR101196518B1 (ko) | 2011-04-05 | 2012-11-01 | 한국과학기술연구원 | Real-time voice activity detection apparatus and detection method |
US20160267923A1 (en) * | 2015-03-09 | 2016-09-15 | Tomoyuki Goto | Communication apparatus, communication system, method of storing log data, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN1623186A (zh) | 2005-06-01 |
GB2384670A (en) | 2003-07-30 |
KR100976082B1 (ko) | 2010-08-16 |
KR20090127182A (ko) | 2009-12-09 |
KR20040075959A (ko) | 2004-08-30 |
CN1307613C (zh) | 2007-03-28 |
FI124869B (fi) | 2015-02-27 |
JP2005516247A (ja) | 2005-06-02 |
GB2384670B (en) | 2004-02-18 |
GB0201585D0 (en) | 2002-03-13 |
FI20041013L (fi) | 2004-09-22 |
JP2010061151A (ja) | 2010-03-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2010061151A (ja) | | Voice activity detector and validator for noisy environments |
KR100944252B1 (ko) | | Voice activity detection in an audio signal |
US6810273B1 (en) | | Noise suppression |
KR101852892B1 (ko) | | Voice recognition method, voice recognition apparatus, and electronic device |
US8977556B2 (en) | | Voice detector and a method for suppressing sub-bands in a voice detector |
US20020165711A1 (en) | | Voice-activity detection using energy ratios and periodicity |
US20080095384A1 (en) | | Apparatus and method for detecting voice end point |
US8751221B2 (en) | | Communication apparatus for adjusting a voice signal |
JP3878482B2 (ja) | | Voice detection device and voice detection method |
US8924199B2 (en) | | Voice correction device, voice correction method, and recording medium storing voice correction program |
JP2007179073A (ja) | | Voice activity detection device, mobile station, and voice activity detection method |
EP3438979B1 (en) | | Estimation of background noise in audio signals |
EP2743923B1 (en) | | Voice processing device, voice processing method |
KR100848798B1 (ko) | | Method for fast dynamic estimation of background noise |
JPH05244105A (ja) | | Voice detection method and apparatus |
US8788265B2 (en) | | System and method for babble noise detection |
CN111128244B (zh) | | Voice activation detection method for shortwave communication based on zero-crossing rate detection |
US6633847B1 (en) | | Voice activated circuit and radio using same |
KR101336203B1 (ko) | | Method and apparatus for detecting voice in an electronic device |
Wang et al. | | An effective voice activity detection algorithm in mobile communication corrupted by impulse noise |
WO2007040883A2 (en) | | Voice activity detector |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | WWE | WIPO information: entry into national phase | Ref document number: 1878/DELNP/2004; Country of ref document: IN |
| | WWE | WIPO information: entry into national phase | Ref document number: 2003562919; Country of ref document: JP. Ref document number: 20041013; Country of ref document: FI |
| | WWE | WIPO information: entry into national phase | Ref document number: 20038026821; Country of ref document: CN. Ref document number: 1020047011459; Country of ref document: KR |
| | 122 | EP: PCT application non-entry in European phase | |
| | WWE | WIPO information: entry into national phase | Ref document number: 1020097022615; Country of ref document: KR |