CN111128244B - Short wave communication voice activation detection method based on zero crossing rate detection - Google Patents
- Publication number
- CN111128244B (application number CN201911414641.0A)
- Authority
- CN
- China
- Prior art keywords
- frame
- audio data
- voice
- crossing rate
- short
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Noise Elimination (AREA)
Abstract
The invention discloses a short-wave communication voice activation detection method based on zero-crossing rate detection, belonging to the technical field of voice detection. An autocorrelation technique is used to distinguish voice from background noise effectively; multiple statistics and multiple decision thresholds are used to reduce the false-detection and missed-detection probability of voice activation detection; the algorithm is simple and reliable, has low computational complexity and good real-time performance, and is highly portable across multiple processing platforms; and the zero-crossing rate detection resists single-tone interference, effectively preventing single tones and howling from interfering with voice detection and improving the reliability of the decision.
Description
Technical Field
The invention belongs to the technical field of short-wave communication, and particularly relates to a short-wave communication voice activation detection method based on zero crossing rate detection.
Background
Voice activation detection (VAD), also called end-point detection (EPD), aims to correctly distinguish voice from various kinds of background noise, and has important applications in voice signal processing, especially in the field of acoustic signal processing. In speech recognition, the segments of a signal that contain speech and those that do not are generally separated by an endpoint detection algorithm, and the speech segments are then recognized from specific features of speech. Studies have shown that, even in a quiet environment, more than half of the recognition errors of a speech recognition system come from the endpoint detector. As the first step of a speech recognition system, endpoint detection is therefore critical, especially in strong background noise, where its accuracy largely determines whether the subsequent processing can be carried out effectively. The diversity of speech and background noise complicates the VAD problem.
Essentially, all VAD techniques rest on finding statistics that effectively distinguish speech segments from the speech-free noise background, and ultimately reduce to a threshold decision. The conventional statistical features in use today mainly include short-time energy, short-time zero-crossing rate, the short-time autocorrelation function, information entropy, the cepstrum, and Mel coefficients; most VAD techniques are built on different combinations of these. With the development of digital signal processing and the growing computing capacity of processing equipment, newer VAD algorithms have appeared, such as the wavelet transform method, approximate entropy, support vector machines (SVM), and neural networks.
In general, the detection effect of a single statistical judgment is not ideal, and is often suitable for certain specific occasions. Because the background noise in different environments has larger change, and the voice changes along with the changes of the gender, age, language, tone, sound intensity, speech speed and the like of a speaker, the joint judgment criterion based on multiple statistics and multiple judgment thresholds becomes the direction of VAD detection research.
In a short-wave radio station, voice signal detection is a precondition for squelch. Squelch is one of the basic functions of a radio: when a voice signal is present, the receiver's audio output is turned on and normal communication is maintained; when there is no voice signal and only noise, the audio output is turned off. The basic process is to detect whether a voice signal is present and then control the audio output accordingly. In small, portable military radio equipment, where power consumption is constrained, VAD is also used to reduce power consumption during voice-free periods and extend the operating life of the equipment.
Because the computing power and power budget of the equipment are limited, the VAD algorithm cannot be too complex, and the processing delay (mainly the decision delay at the onset and at the end of speech) cannot be too large; that is, the VAD algorithm must be capable of near-real-time processing. In addition, it should work normally in complex background noise and be adaptive to some degree. These factors require the VAD algorithm to be simple to implement and reliable in detection. It is therefore necessary to find a voice detection method with relatively simple computation and relatively reliable results.
The short-wave voice detection methods currently in use are: (1) a method combining short-time energy and short-time average amplitude, which relies on the variation of the voice signal amplitude over time: unvoiced segments have small amplitude, with energy concentrated in the high-frequency band, while voiced segments have larger amplitude, with energy concentrated in the low-frequency band; (2) a detection method based on pre-emphasis and comparison of the standard deviation with a preset threshold.
In the existing methods, one major problem of the short-time energy function En is that it is too sensitive to the signal level; because the sum of squares of the signal samples must be calculated, it overflows easily in practical applications (e.g., on fixed-point devices). En is therefore generally replaced by the average amplitude function Mn. However, the difference in Mn between unvoiced and voiced sounds, and between sound and silence, is then less pronounced than with the short-time energy En. As a result, speech that fails to open the squelch and noise that fails to be squelched are common in practice, and the methods are strongly affected by the environment: the same voice activation detection method needs its parameters readjusted when the environment changes. Moreover, when the signal contains unwanted components such as noise and single tones, whose energy is concentrated in the low-frequency band, an energy-based decision can misjudge them as voice.
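The level sensitivity described above can be illustrated with a small sketch. The function names and the sample values are hypothetical and chosen only for illustration; Mn is taken here as the mean absolute sample value, which is one common definition and an assumption about the source's exact form.

```python
def short_time_energy(frame):
    """Short-time energy En: sum of squared samples in the frame."""
    return sum(s * s for s in frame)


def average_magnitude(frame):
    """Average amplitude Mn: mean of the absolute sample values."""
    return sum(abs(s) for s in frame) / len(frame)


frame = [1000, -2000, 3000, -4000]   # hypothetical 16-bit sample values
louder = [2 * s for s in frame]      # the same frame with the level doubled

# Doubling the level quadruples En but only doubles Mn.
energy_ratio = short_time_energy(louder) / short_time_energy(frame)
magnitude_ratio = average_magnitude(louder) / average_magnitude(frame)
print(energy_ratio, magnitude_ratio)
```

Because En grows with the square of the level, its accumulator needs far more headroom than the one for Mn, which is why fixed-point implementations tend to prefer Mn despite its weaker contrast.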
Disclosure of Invention
In order to solve the problems, the invention aims to provide a short-wave communication voice activation detection method based on zero-crossing rate detection.
In order to achieve the above purpose, the present invention adopts the following technical scheme.
A short wave communication voice activation detection method based on zero crossing rate detection comprises the following steps:
Step 1: acquire the captured audio stream, i.e. N frames of audio data $x(N)$, where each frame has length N, and apply band-pass filtering, framing and windowing, and normalization to it in sequence to obtain the corresponding N frames of preprocessed audio data $x''(N)$;
Step 2: calculate the short-time autocorrelation values of each preprocessed frame and their mean; for each point of the frame, determine whether its autocorrelation value exceeds 3 times the mean autocorrelation value of that frame; if so, set the autocorrelation value of that point to 0; otherwise, go to step 3;
Step 3: calculate the standard deviation $\mathrm{std}(\mathrm{stat})_m$ of each preprocessed frame;
Step 4: perform zero-crossing rate detection on each frame processed in step 2 to obtain the average zero-crossing rate of each frame;
Step 5: determine whether the standard deviations $\mathrm{std}(\mathrm{stat})_m$ of M consecutive preprocessed frames are not smaller than a preset first-level threshold; if so, determine that voice data is input and go to step 6; otherwise, determine that no voice is input;
Step 6: determine whether the average zero-crossing rates of S consecutive preprocessed frames are not smaller than a preset second-level threshold; if so, determine that the input is voice; otherwise, determine that the input is noise. Voice activation detection is then complete.
Further, the framing and windowing process is as follows: for each band-pass-filtered frame of audio data, a window function is used to extract a segment of sampling points from the middle of the frame as the windowed data, i.e.
$$x_m'(N) = x_m(N) \mathbin{.*} \mathrm{Hamming}(N)$$
where $x_m(N)$ is the m-th band-pass-filtered frame of audio data, m = 1, 2, …, N; $\mathrm{Hamming}(N)$ is a Hamming window function of length N; and $.*$ denotes element-wise multiplication.
Further, the normalization process is specifically as follows:
First, calculate the average value $\mathrm{mean}(x_m'(N))$ of the N sample points of each frame $x_m'(N)$ in the windowed data $x'(N)$.
Second, compare a preset empirical value a with this average value to obtain a correction factor: $\mathrm{factor}_{x_m'(N)} = a / \mathrm{mean}(x_m'(N))$.
Finally, use the correction factor $\mathrm{factor}_{x_m'(N)}$ to normalize each windowed frame, obtaining the preprocessed audio data: $x_m''(N) = \mathrm{factor}_{x_m'(N)} \cdot x_m'(N)$.
Further, the short-time autocorrelation value $R_m(k)$ of each preprocessed frame and its mean $\mathrm{mean}(R_m(k))$ are calculated as follows:
where i denotes the i-th sampling point, $x_m''(i)$ is the i-th sample of the m-th preprocessed frame, and $x_m''(i+k)$ is the sample of the m-th preprocessed frame delayed by k.
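The formula itself is not reproduced in this text. A common textbook form that is consistent with the symbol definitions given here is sketched below; the summation limits and the averaging over all N lags are assumptions, not the patent's confirmed expression.

$$R_m(k) = \sum_{i=0}^{N-1-k} x_m''(i)\, x_m''(i+k), \qquad \mathrm{mean}(R_m(k)) = \frac{1}{N} \sum_{k=0}^{N-1} R_m(k)$$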
Further, zero-crossing rate detection is performed on each frame of audio data processed in step 2, specifically as follows:
Since the audio data is a wideband non-stationary signal, the short-time average zero-crossing rate is calculated as follows:
$$Z_n = \frac{1}{2N} \sum_{k} \left| \mathrm{sgn}[R_m(k)] - \mathrm{sgn}[R_m(k-1)] \right|$$
where $|\cdot|$ denotes the absolute value and $\mathrm{sgn}[\cdot]$ is the sign function:
$$\mathrm{sgn}[R_m(k)] = \begin{cases} 1, & R_m(k) > 0 \\ 0, & R_m(k) = 0 \\ -1, & R_m(k) < 0 \end{cases}$$
When the signs of two adjacent sampling points are the same, no zero crossing occurs; when they are opposite, $|\mathrm{sgn}[R_m(k)] - \mathrm{sgn}[R_m(k-1)]| = 2$. Therefore, for each frame of data, the sum is divided by 2N to give the average zero-crossing rate.
Further, if there is no voice input for 3 seconds, the voice output is turned off.
Compared with the prior art, the invention has the following beneficial effects: an autocorrelation technique is used to distinguish voice from background noise effectively; multiple statistics and multiple decision thresholds are used to reduce the false-detection and missed-detection probability of VAD; the algorithm is simple and reliable, has low computational complexity and good real-time performance, and is highly portable across multiple processing platforms; and the zero-crossing rate detection resists single-tone interference, effectively preventing single tones and howling from interfering with voice detection and improving the reliability of the decision.
The invention detects voice activation through short-time autocorrelation, standard deviation, and zero-crossing rate detection, which improves the listening comfort of short-wave voice communication and greatly reduces the probability, in practical use, of noise failing to be squelched and of speech failing to open the squelch. This makes the short-wave voice squelch function more widely applicable.
Drawings
The invention will now be described in further detail with reference to the drawings and to specific examples.
FIG. 1 is a block flow diagram of an implementation of the present invention.
Detailed Description
Embodiments and effects of the present invention are described in further detail below with reference to the accompanying drawings.
Because the computing power and power budget of the equipment are limited, the VAD algorithm cannot be too complex, and the processing delay (mainly the decision delay at the onset and at the end of speech) cannot be too large; that is, the VAD algorithm must be capable of near-real-time processing. In addition, it should work normally in complex background noise and be adaptive to some degree. These factors require the VAD algorithm to be simple to implement and reliable in detection.
Based on the application requirements, referring to fig. 1, the short-wave communication voice activation detection method based on zero-crossing rate detection provided by the invention specifically comprises the following steps:
Step 1: acquire the captured audio stream, i.e. N frames of audio data $x(N)$, where each frame has length N, and apply band-pass filtering, framing and windowing, and normalization to it in sequence to obtain the corresponding N frames of preprocessed audio data $x''(N)$;
The specific framing and windowing process is as follows: for each band-pass-filtered frame of audio data, a window function is used to extract a segment of sampling points from the middle of the frame as the windowed data, i.e.
$$x_m'(N) = x_m(N) \mathbin{.*} \mathrm{Hamming}(N)$$
where $x_m(N)$ is the m-th band-pass-filtered frame of audio data, m = 1, 2, …, N; $\mathrm{Hamming}(N)$ is a Hamming window function of length N; and $.*$ denotes element-wise multiplication. Specifically, the length of each frame is N = 256 samples, and exactly the middle 200 of the 256 points are extracted, from point 28 to point 228, to reduce inter-frame interference in the frequency domain during framing.
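The windowing step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the Hamming coefficients 0.54 and 0.46 are the standard textbook values, and applying a 200-point window to samples 28 to 227 is one reading of the text (the source does not say whether the window spans 256 or 200 points).

```python
import math


def hamming(length):
    """Standard Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(length-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def window_frame(frame, start=28, stop=228):
    """Keep samples start..stop-1 of the frame and taper them with a Hamming window."""
    segment = frame[start:stop]
    weights = hamming(len(segment))
    return [s * w for s, w in zip(segment, weights)]


frame = [1.0] * 256              # hypothetical constant test frame
windowed = window_frame(frame)   # 200 tapered samples from the middle of the frame
print(len(windowed), windowed[0], max(windowed))
```

With a constant input the output is simply the window shape, which makes the taper toward both ends of the extracted segment easy to inspect.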
The specific normalization process is as follows:
First, calculate the average value $\mathrm{mean}(x_m'(N))$ of the N sample points of each frame $x_m'(N)$ in the windowed data $x'(N)$.
Second, compare a preset empirical value a (obtained during debugging in the actual environment) with this average value to obtain a correction factor: $\mathrm{factor}_{x_m'(N)} = a / \mathrm{mean}(x_m'(N))$.
Finally, use the correction factor $\mathrm{factor}_{x_m'(N)}$ to normalize each windowed frame, obtaining the preprocessed audio data: $x_m''(N) = \mathrm{factor}_{x_m'(N)} \cdot x_m'(N)$.
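A minimal sketch of this normalization follows. The averaging formula is not reproduced in the text, so the sketch assumes the mean of the absolute sample values (a plain mean of zero-centred audio would be close to zero); the function name, the guard for silent frames, and the value of a are likewise hypothetical.

```python
def normalize_frame(windowed, a):
    """Scale a frame so that its mean absolute amplitude equals a."""
    # Assumption: the "average value" is the mean of the absolute sample values.
    level = sum(abs(s) for s in windowed) / len(windowed)
    if level == 0:
        return list(windowed)     # silent frame: nothing to scale
    factor = a / level            # correction factor a / mean
    return [factor * s for s in windowed]


frame = [0.5, -1.0, 1.5, -2.0]    # hypothetical windowed samples
normalized = normalize_frame(frame, a=2.5)
print(normalized)
```

Scaling every frame to the same average level is what lets fixed thresholds be reused in the later decision steps across different input gains.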
Step 2: calculate the short-time autocorrelation values of each preprocessed frame and their mean; for each point of the frame, determine whether its autocorrelation value exceeds 3 times the mean autocorrelation value of that frame; if so, set the autocorrelation value of that point to 0; otherwise, go to step 3;
A correlation function measures the similarity of two signals in the time domain. When the correlation of two signals is large, one signal may be a delayed or advanced copy of the other; when the correlation is 0, the two signals are uncorrelated. Noise is suppressed by exploiting this correlation property of signals.
The autocorrelation function reflects how similar a signal is to a delayed copy of itself.
The short-time autocorrelation value $R_m(k)$ of each preprocessed frame and its mean $\mathrm{mean}(R_m(k))$ are calculated as follows:
where i denotes the i-th sampling point, $x_m''(i)$ is the i-th sample of the m-th preprocessed frame, and $x_m''(i+k)$ is the sample of the m-th preprocessed frame delayed by k.
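Step 2 can be sketched as below. Since the formula is not reproduced in the text, the sketch assumes the usual unnormalized autocorrelation over lags 0 to N-1 and compares signed values (not magnitudes) against 3 times the mean; the function names and the test frame are hypothetical.

```python
def short_time_autocorrelation(frame):
    """R(k) = sum over i of frame[i] * frame[i + k], for lags k = 0 .. len(frame)-1."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(n)]


def suppress_outliers(r, ratio=3.0):
    """Set to 0 every autocorrelation value that exceeds ratio times the mean of r."""
    mean_r = sum(r) / len(r)
    return [0.0 if value > ratio * mean_r else value for value in r]


frame = [3.0, 0.0, 0.0, 1.0]             # hypothetical preprocessed frame
r = short_time_autocorrelation(frame)    # lag-0 term dominates
cleaned = suppress_outliers(r)           # the dominant lag-0 term is zeroed
print(r, cleaned)
```

This direct form costs on the order of N squared multiplications per frame; for N = 256 that is modest, which fits the stated goal of low computational complexity.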
Step 3: calculate the standard deviation $\mathrm{std}(\mathrm{stat})_m$ of each preprocessed frame;
In practice, the amplitude of $\mathrm{std}(\mathrm{stat})_m$ needs to be adjusted slightly to prevent data overflow in subsequent calculations.
Step 4: perform zero-crossing rate detection on each frame processed in step 2 to obtain the average zero-crossing rate of each frame;
The zero-crossing rate $Z_n$ describes how often the signal crosses the horizontal axis. For a continuous signal, it is observed as the time-domain waveform crossing the horizontal axis; for a discrete signal, a zero crossing occurs when adjacent samples have different algebraic signs, so the rate is the number of times the samples change sign.
Since the audio data is a wideband non-stationary signal, the short-time average zero-crossing rate is calculated as follows:
$$Z_n = \frac{1}{2N} \sum_{k} \left| \mathrm{sgn}[R_m(k)] - \mathrm{sgn}[R_m(k-1)] \right|$$
where $|\cdot|$ denotes the absolute value and $\mathrm{sgn}[\cdot]$ is the sign function:
$$\mathrm{sgn}[R_m(k)] = \begin{cases} 1, & R_m(k) > 0 \\ 0, & R_m(k) = 0 \\ -1, & R_m(k) < 0 \end{cases}$$
When the signs of two adjacent sampling points are the same, no zero crossing occurs; when they are opposite, $|\mathrm{sgn}[R_m(k)] - \mathrm{sgn}[R_m(k-1)]| = 2$. Therefore, for each frame of data, the sum is divided by 2N to give the average zero-crossing rate.
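The zero-crossing computation described above can be sketched as follows. The function names are hypothetical, and summing over k = 1 to N-1 is an assumption about the summation limits; the division by 2N follows the text.

```python
def sgn(value):
    """Sign function: 1 for positive, 0 for zero, -1 for negative."""
    return (value > 0) - (value < 0)


def average_zero_crossing_rate(r):
    """Sum |sgn(r[k]) - sgn(r[k-1])| over adjacent points, divided by 2*len(r)."""
    total = sum(abs(sgn(r[k]) - sgn(r[k - 1])) for k in range(1, len(r)))
    return total / (2 * len(r))


alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips at every step
constant = [1.0, 1.0, 1.0, 1.0]        # no sign change at all
print(average_zero_crossing_rate(alternating),
      average_zero_crossing_rate(constant))
```

Each sign flip contributes 2 to the sum, so dividing by 2N turns the total into crossings per sample.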
Step 5: determine whether the standard deviations $\mathrm{std}(\mathrm{stat})_m$ of M consecutive preprocessed frames are not smaller than a preset first-level threshold; if so, determine that voice data is input and go to step 6; otherwise, determine that no voice is input; where M ≥ 3.
Compare $\mathrm{std}(\mathrm{stat})_m$ with the preset first-level threshold: over a run of consecutive frames (the number of consecutive frames is chosen according to the computational load, generally between 10 and 30), compare the $\mathrm{std}(\mathrm{stat})_m$ calculated for each frame with the preset first-level threshold, count the frames in the run whose value is not smaller than the threshold, and record the count as peak_count.
If at least 3 such frames occur in the run, i.e. peak_count ≥ 3, the first-level decision is passed; otherwise it is not passed. When the decision is not passed, return to step 1 and continue voice detection.
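The first-level decision can be sketched as a simple count over a run of frames. The name peak_count comes from the text; the function name, the threshold, and the standard-deviation values are hypothetical.

```python
def first_level_decision(std_values, threshold, min_hits=3):
    """Count frames whose std is not smaller than threshold; pass if count >= min_hits."""
    peak_count = sum(1 for s in std_values if s >= threshold)
    return peak_count >= min_hits, peak_count


# Hypothetical per-frame standard deviations for a run of 10 consecutive frames.
stds = [0.2, 0.9, 1.1, 0.3, 1.4, 0.1, 0.2, 0.8, 0.1, 0.2]
passed, peak_count = first_level_decision(stds, threshold=0.8)
print(passed, peak_count)
```

Counting hits within the run, rather than requiring every frame to exceed the threshold, tolerates short dips in level inside an utterance.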
Step 6: determine whether the average zero-crossing rates of S consecutive preprocessed frames are not smaller than a preset second-level threshold; if so, determine that the input is voice; otherwise, determine that the input is noise. Voice activation detection is then complete. Here S ≥ 5.
The second-level decision is made only after the first-level decision has been passed. In the second-level decision, if the zero-crossing rates of consecutive frames (5 consecutive frames in practical use of this scheme) are not smaller than the preset second-level threshold, the input is judged to be voice; otherwise it is judged to be noise. When voice is detected, the relevant flag is set, the data is stored in the corresponding buffer, and the voice data is output when the 20 ms timer interrupt occurs.
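The second-level decision can be sketched as a search for a run of consecutive frames at or above the threshold. The function name and the numeric values are hypothetical; the run length of 5 follows the text.

```python
def second_level_decision(zcr_values, threshold, run_length=5):
    """Return True if run_length consecutive frames have ZCR >= threshold."""
    run = 0
    for z in zcr_values:
        run = run + 1 if z >= threshold else 0   # a low frame resets the run
        if run >= run_length:
            return True
    return False


steady = [0.3, 0.3, 0.3, 0.3, 0.3]        # hypothetical per-frame ZCR values
interrupted = [0.3, 0.3, 0.1, 0.3, 0.3]   # one frame falls below the threshold
print(second_level_decision(steady, threshold=0.2),
      second_level_decision(interrupted, threshold=0.2))
```

Requiring consecutive frames, unlike the counting rule of the first level, makes this stage stricter and helps reject short noise bursts.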
Taking the pauses within speech and their duration into account, the voice output is closed when the decision requirements have not been met for about 3 s, i.e. when the standard deviation has remained below the preset threshold for 3 s.
After voice detection is complete, voice data is output after adaptive filtering, while non-voice data is not processed; the purpose is to enhance the voice quality and improve listening comfort.
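The 3 s hold before the output closes can be sketched as a frame counter. The class name and interface are hypothetical, and the sketch assumes one decision per 20 ms update, so that 150 updates correspond to 3 s.

```python
class SquelchTimer:
    """Keep the audio output open until hold_frames updates pass without speech."""

    def __init__(self, hold_frames):
        self.hold_frames = hold_frames
        self.silent_frames = hold_frames   # the output starts closed

    def update(self, is_speech):
        """Feed one decision; return True while the output should stay open."""
        if is_speech:
            self.silent_frames = 0
        elif self.silent_frames < self.hold_frames:
            self.silent_frames += 1
        return self.silent_frames < self.hold_frames


timer = SquelchTimer(hold_frames=150)   # assumed: 150 updates of 20 ms = 3 s
print(timer.update(False), timer.update(True))
```

Counting whole frames instead of accumulating seconds avoids floating-point drift in the timeout comparison.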
The invention exploits the randomness of noise: the mean of its autocorrelation values is small, and so is their standard deviation. In contrast, the autocorrelation values of a voice signal are large on average, their standard deviation is also large, and the variance of the autocorrelation varies considerably between frames. The variance of the autocorrelation and the corresponding statistics are therefore used to decide whether voice is present, and VAD is performed on that basis.
Typically, the voice sampling frequency is 9.6 kHz, the data frame length is 20 ms (a voice signal is generally considered substantially stationary over 10 ms to 30 ms), and 256 points are processed at a time. A second-level decision is added to prevent noise from being misjudged as voice. The method is extensible: on the basis of this algorithm, double or even multiple thresholds can be adopted, with upper and lower threshold bounds, to improve detection accuracy at the cost of a moderate increase in implementation complexity. The invention mainly concerns the digital processing of the voice signal and assumes that the corresponding preprocessing, such as low-pass filtering and gain amplification, has been performed before VAD processing.
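As a quick check of the figures above, the arithmetic can be worked through directly. The variable names are hypothetical; how the source reconciles the 20 ms frame with the 256-point block (for example by overlap or padding) is not stated, so the sketch only reports the two quantities.

```python
sample_rate_hz = 9600    # sampling frequency given in the text
frame_samples = 256      # points processed at a time, as given in the text

# Duration of a 256-sample block at 9.6 kHz, in milliseconds.
frame_duration_ms = 1000 * frame_samples / sample_rate_hz

# Number of samples that arrive in one 20 ms interval at 9.6 kHz.
samples_in_20_ms = sample_rate_hz * 20 // 1000

print(frame_duration_ms, samples_in_20_ms)
```

A 256-point block therefore spans about 26.7 ms, which still lies inside the 10 ms to 30 ms range over which speech is treated as stationary, while 20 ms corresponds to 192 samples.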
The invention uses an autocorrelation technique to distinguish voice from background noise effectively; multiple statistics and multiple decision thresholds reduce the false-detection and missed-detection probability of VAD; the algorithm is simple and reliable, has low computational complexity and good real-time performance, and is highly portable across multiple processing platforms; and the zero-crossing rate detection resists single-tone interference, effectively preventing single tones and howling from interfering with voice detection and improving the reliability of the decision.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (7)
1. The short-wave communication voice activation detection method based on zero-crossing rate detection is characterized by comprising the following steps of:
Step 1: acquire the captured audio stream, i.e. N frames of audio data $x(N)$, where each frame has length N, and apply band-pass filtering, framing and windowing, and normalization to it in sequence to obtain the corresponding N frames of preprocessed audio data $x''(N)$;
Step 2: calculate the short-time autocorrelation values of each preprocessed frame and their mean; for each point of the frame, determine whether its autocorrelation value exceeds 3 times the mean autocorrelation value of that frame; if so, set the autocorrelation value of that point to 0; otherwise, go to step 3;
Step 3: calculate the standard deviation $\mathrm{std}(\mathrm{stat})_m$ of each preprocessed frame;
Step 4: perform zero-crossing rate detection on each frame processed in step 2 to obtain the average zero-crossing rate of each frame;
Step 5: determine whether the standard deviations $\mathrm{std}(\mathrm{stat})_m$ of M consecutive preprocessed frames are not smaller than a preset first-level threshold; if so, determine that voice data is input and go to step 6; otherwise, determine that no voice is input;
Step 6: determine whether the average zero-crossing rates of S consecutive preprocessed frames are not smaller than a preset second-level threshold; if so, determine that the input is voice; otherwise, determine that the input is noise. Voice activation detection is then complete.
2. The short-wave communication voice activation detection method based on zero-crossing rate detection according to claim 1, wherein the framing and windowing process is as follows: for each band-pass-filtered frame of audio data, a window function is used to extract a segment of sampling points from the middle of the frame as the windowed data, i.e.
$$x_m'(N) = x_m(N) \mathbin{.*} \mathrm{Hamming}(N)$$
where $x_m(N)$ is the m-th band-pass-filtered frame of audio data, m = 1, 2, …, N; $\mathrm{Hamming}(N)$ is a Hamming window function of length N; and $.*$ denotes element-wise multiplication.
3. The short-wave communication voice activation detection method based on zero-crossing rate detection according to claim 1, wherein the normalization process is specifically as follows:
First, calculate the average value $\mathrm{mean}(x_m'(N))$ of the N sample points of each frame $x_m'(N)$ in the windowed data $x'(N)$.
Second, compare a preset empirical value a with this average value to obtain a correction factor: $\mathrm{factor}_{x_m'(N)} = a / \mathrm{mean}(x_m'(N))$.
Finally, use the correction factor $\mathrm{factor}_{x_m'(N)}$ to normalize each windowed frame, obtaining the preprocessed audio data: $x_m''(N) = \mathrm{factor}_{x_m'(N)} \cdot x_m'(N)$.
4. The short-wave communication voice activation detection method based on zero-crossing rate detection according to claim 1, wherein the short-time autocorrelation value $R_m(k)$ of each preprocessed frame and its mean $\mathrm{mean}(R_m(k))$ are calculated as follows:
where i denotes the i-th sampling point, $x_m''(i)$ is the i-th sample of the m-th preprocessed frame, and $x_m''(i+k)$ is the sample of the m-th preprocessed frame delayed by k.
6. The short-wave communication voice activation detection method based on zero-crossing rate detection according to claim 1, wherein zero-crossing rate detection is performed on each frame of audio data processed in step 2, specifically as follows:
Since the audio data is a wideband non-stationary signal, the short-time average zero-crossing rate is calculated as follows:
where $|\cdot|$ denotes the absolute value, $R_m(k)$ is the short-time autocorrelation value of the m-th preprocessed frame, and $\mathrm{sgn}[\cdot]$ is the sign function:
$$\mathrm{sgn}[R_m(k)] = \begin{cases} 1, & R_m(k) > 0 \\ 0, & R_m(k) = 0 \\ -1, & R_m(k) < 0 \end{cases}$$
When the signs of two adjacent sampling points are the same, no zero crossing occurs; when they are opposite, $|\mathrm{sgn}[R_m(k)] - \mathrm{sgn}[R_m(k-1)]| = 2$.
7. The method for detecting voice activation of short-wave communication based on zero-crossing rate detection according to claim 1, wherein if there is no voice input for 3 seconds continuously, the voice output is turned off.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911414641.0A CN111128244B (en) | 2019-12-31 | 2019-12-31 | Short wave communication voice activation detection method based on zero crossing rate detection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911414641.0A CN111128244B (en) | 2019-12-31 | 2019-12-31 | Short wave communication voice activation detection method based on zero crossing rate detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111128244A CN111128244A (en) | 2020-05-08 |
CN111128244B (en) | 2023-05-02 |
Family
ID=70506643
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911414641.0A Active CN111128244B (en) | 2019-12-31 | 2019-12-31 | Short wave communication voice activation detection method based on zero crossing rate detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111128244B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115699173B (en) * | 2020-06-16 | 2024-11-29 | 华为技术有限公司 | Voice activity detection method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN87100209A (en) * | 1987-01-10 | 1987-10-21 | 上海工业大学 | The method of digital phonemic tone conversion and device |
CN102194452A (en) * | 2011-04-14 | 2011-09-21 | 西安烽火电子科技有限责任公司 | Voice activity detection method in complex background noise |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7769585B2 (en) * | 2007-04-05 | 2010-08-03 | Avidyne Corporation | System and method of voice activity detection in noisy environments |
Non-Patent Citations (1)
Title |
---|
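Xu Zhi. Research on a three-threshold multi-level decision voice activation detection algorithm. Electronic Technology (电子技术), 2015, (05), full text. * |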
Also Published As
Publication number | Publication date |
---|---|
CN111128244A (en) | 2020-05-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ramírez et al. | Efficient voice activity detection algorithms using long-term speech information | |
Moattar et al. | A simple but efficient real-time voice activity detection algorithm | |
Parchami et al. | Recent developments in speech enhancement in the short-time Fourier transform domain | |
JP5905608B2 (en) | Voice activity detection in the presence of background noise | |
CN101197130B (en) | Sound activity detecting method and detector thereof | |
Ramirez et al. | Voice activity detection. fundamentals and speech recognition system robustness | |
US6993481B2 (en) | Detection of speech activity using feature model adaptation | |
CN102194452B (en) | Voice activity detection method in complex background noise | |
Evangelopoulos et al. | Multiband modulation energy tracking for noisy speech detection | |
WO2008058842A1 (en) | Voice activity detection system and method | |
JP2006079079A (en) | Distributed speech recognition system and its method | |
CN112951259B (en) | Audio noise reduction method and device, electronic equipment and computer readable storage medium | |
Zaw et al. | The combination of spectral entropy, zero crossing rate, short time energy and linear prediction error for voice activity detection | |
CN105023572A (en) | Noised voice end point robustness detection method | |
CN102667927A (en) | Method and background estimator for voice activity detection | |
CN110782910A (en) | A Howling Audio Detection System with High Detection Rate | |
Khoa | Noise robust voice activity detection | |
US20220301582A1 (en) | Method and apparatus for determining speech presence probability and electronic device | |
Özaydın | Examination of energy based voice activity detection algorithms for noisy speech signals | |
CN111128244B (en) | Short wave communication voice activation detection method based on zero crossing rate detection | |
Górriz et al. | An effective cluster-based model for robust speech detection and speech recognition in noisy environments | |
Moattar et al. | A new approach for robust realtime voice activity detection using spectral pattern | |
Ramírez et al. | A new adaptive long-term spectral estimation voice activity detector | |
Khoury et al. | I-Vectors for speech activity detection. | |
Dov et al. | Voice activity detection in presence of transients using the scattering transform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||