
CN103646649A - High-efficiency voice detecting method - Google Patents


Info

Publication number
CN103646649A
CN103646649A (application CN201310743203.5A)
Authority
CN
China
Prior art keywords
audio
speech
segment
frame
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310743203.5A
Other languages
Chinese (zh)
Other versions
CN103646649B (en)
Inventor
陶建华 (Tao Jianhua)
刘斌 (Liu Bin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Extreme Element Hangzhou Intelligent Technology Co Ltd
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation, Chinese Academy of Sciences
Priority to CN201310743203.5A
Publication of CN103646649A
Application granted
Publication of CN103646649B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a high-efficiency speech detection method. The method comprises the following steps: analyzing the short-time energy and short-time zero-crossing rate of the original audio in the time domain and removing part of the non-speech signals; analyzing the spectral envelope characteristics and entropy characteristics of the subbands of the retained audio signal in the frequency domain and removing further non-speech signals; grouping consecutive frames with similar features among the retained frames into audio segments; computing the mean of the Mel-frequency cepstral coefficients of the frames in each audio segment, feeding the means into a speech Gaussian mixture model and several non-speech Gaussian mixture models, and making a segment-level decision on whether each segment contains speech data according to the output probabilities of the models, thereby obtaining the final speech detection result. The method can detect speech signals in audio data streams under a variety of complex environments and can locate the boundaries between speech and non-speech segments with relatively high accuracy.

Description

An efficient speech detection method
Technical field
The present invention relates to the field of intelligent information processing, and in particular to an efficient speech detection method.
Background technology
Speech is one of the principal means of human communication, and speech detection has always occupied an important position in the field of speech signal processing. A speech detection system serves as a preprocessing module for speech recognition, speaker recognition, speech coding, and similar tasks, so its robustness directly affects the performance of those downstream modules. How to locate speech segments accurately and efficiently in the presence of random noise under complex environments, and to distinguish speech from non-speech signals effectively, has become a research hotspot at home and abroad and is attracting increasing attention. Speech detection has great practical value, and high-quality robust speech detection techniques are widely applied in communication systems, multimedia systems, speech recognition systems, and voiceprint recognition systems.
Mainstream speech detection methods fall into two categories: parameter-based methods and model-based methods. Parameter-based methods analyze the speech signal at the signal level, computing speech parameters in the time domain, frequency domain, or another transform domain, and determine whether the audio stream contains speech by setting appropriate thresholds; commonly used parameters include short-time energy, short-time zero-crossing rate, the energy proportion of each frequency band, and harmonic components. Model-based methods train models on large-scale speech data and use these intelligent mathematical models to distinguish speech from various non-speech signals accurately; common examples include methods based on Gaussian mixture models, artificial neural networks, and hidden Markov models. Model-based methods require annotated large-scale data to train reliable detection models and are therefore supervised, whereas parameter-based methods need no model training and are unsupervised. Current mainstream methods detect speech quickly and accurately in quiet environments, and achieve high accuracy under stationary noise and under non-stationary noise at high signal-to-noise ratios; however, faced with non-stationary random noise under complex environments, their performance degrades severely.
Summary of the invention
To solve one or more of the above problems, the invention provides an efficient speech detection method that can detect speech signals in an audio stream quickly and accurately under a variety of complex environments and can locate the boundaries between speech and non-speech segments with relatively high accuracy.
The speech detection method provided by the invention comprises the following steps:
Step S10: obtain the original audio, analyze its short-time energy and short-time zero-crossing rate in the time domain, and use these two parameters to remove part of the non-speech signals from the original audio;
Step S20: for the audio signal retained after step S10, analyze the spectral envelope characteristics and entropy characteristics of its subbands in the frequency domain, and further remove part of the non-speech signals;
Step S30: for the retained frames awaiting screening, group consecutive frames with similar features into audio segments;
Step S40: for each audio segment awaiting screening, make a segment-level decision with Gaussian mixture models on whether the segment contains speech data, and obtain the final speech detection result.
As can be seen from the above technical scheme, the invention provides an efficient and robust speech detection method with the following beneficial effects:
(1) The method can serve as the front-end module of various speech recognition systems; by accurately removing non-speech data from the audio stream to be recognized, it improves the efficiency and robustness of the recognition system.
(2) The method can serve as the front-end module of various speech coding systems; by accurately locating the boundaries between speech and non-speech segments, it allows the coding system to transmit only speech data, improving communication efficiency.
(3) The method detects speech data quickly and accurately under various stationary and non-stationary random noise environments, effectively distinguishes speech from various non-speech signals, and is not restricted by speaker, environment, or language.
Brief description of the drawings
Fig. 1 is a flowchart of the speech detection method according to an embodiment of the invention;
Fig. 2 is a flowchart of the time-domain analysis part of the method;
Fig. 3 is a flowchart of the frequency-domain analysis part of the method;
Fig. 4 is a flowchart of the audio-frame clustering part of the method;
Fig. 5 is a flowchart of the segment-level decision by Gaussian mixture models;
Fig. 6 is a flowchart of the off-line training process of the Gaussian mixture models.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in more detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
It should be noted that similar or identical parts carry the same reference numerals throughout the drawings and the description. Implementations not illustrated or described here take forms known to persons of ordinary skill in the relevant technical field. In addition, although examples of parameters with particular values may be given herein, the parameters need not exactly equal those values and may approximate them within acceptable error margins or design constraints.
The present invention proposes an efficient speech detection mechanism that performs two-stage speech detection on an audio stream. First, time-domain and frequency-domain features divide the original audio into non-speech data and data awaiting screening; the data awaiting screening are then segmented by spectrogram-derived features, and segment-wise speech detection is performed with a Gaussian mixture model of speech data and Gaussian mixture models of non-speech data.
Overall, the speech detection method comprises a time-domain analysis step, a frequency-domain analysis step, an audio clustering step, and a segment-level decision step. Fig. 1 is a flowchart of the method according to an embodiment of the invention; as shown in Fig. 1, the method comprises the following steps:
Step S10: obtain the original audio, analyze its short-time energy and short-time zero-crossing rate in the time domain, and use these two parameters to remove part of the non-speech signals from the original audio.
Short-time energy effectively detects voiced sounds and the short-time zero-crossing rate effectively detects unvoiced sounds; fusing the two parameters removes part of the non-speech signals effectively.
Fig. 2 is a flowchart of the time-domain analysis part; as shown in Fig. 2, step S10 further comprises the following steps:
Step S11: divide the original audio into frames at equal intervals and compute the short-time energy and short-time zero-crossing rate of each frame;
Step S12: compare the short-time energy and short-time zero-crossing rate of each frame against preset low and high thresholds, classify each frame as silence, transition, or speech according to the comparison, remove the silence and transition signals from the original audio, and retain only the speech-segment signals.
The classification works as follows: if the short-time energy or the short-time zero-crossing rate exceeds its low threshold, the signal is marked as entering the transition state; in the transition state, if both parameters fall back below their low thresholds, the signal returns to silence, while if either parameter exceeds its high threshold, the signal is considered to enter a speech segment; within a speech segment, if both parameters drop below their low thresholds and remain there longer than a predetermined duration, the speech segment is considered ended.
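The silence/transition/speech state machine of steps S11 and S12 can be sketched in Python as follows. The frame length, threshold values, and hang-over count below are illustrative assumptions, not values specified by the patent:

```python
import numpy as np

def frame_energy_zcr(signal, frame_len=256):
    """Split the signal into equal-length frames; return per-frame
    short-time energy and zero-crossing rate."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy = np.sum(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy, zcr

def double_threshold_vad(energy, zcr, e_lo, e_hi, z_lo, z_hi, min_gap=3):
    """Label frames with the silence/transition/speech state machine
    described above; returns a boolean speech mask per frame."""
    SILENCE, TRANSITION, SPEECH = 0, 1, 2
    state, below = SILENCE, 0
    mask = np.zeros(len(energy), dtype=bool)
    for i, (e, z) in enumerate(zip(energy, zcr)):
        if state == SILENCE:
            if e > e_lo or z > z_lo:        # either parameter crosses its low threshold
                state = TRANSITION
        if state == TRANSITION:
            if e > e_hi or z > z_hi:        # any parameter crosses its high threshold
                state = SPEECH
            elif e < e_lo and z < z_lo:     # both fall back below the low thresholds
                state = SILENCE
        if state == SPEECH:
            mask[i] = True
            if e < e_lo and z < z_lo:
                below += 1
                if below > min_gap:         # sustained low values end the speech segment
                    state, below = SILENCE, 0
            else:
                below = 0
    return mask
```

A tone burst surrounded by silence is marked as speech, with a short hang-over after the burst governed by `min_gap`.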
Step S20: for the audio signal retained after step S10, analyze the spectral envelope characteristics and entropy characteristics of its subbands in the frequency domain, and further remove part of the non-speech signals.
Analyzing the subband spectral envelope characteristics in the frequency domain comprises the following steps:
First, divide the audio signal into several subbands;
Next, apply band-pass filtering within the frequency range of each subband to obtain the subband audio signals;
Then, apply the Hilbert transform to each subband signal to obtain its spectral envelope;
Finally, analyze the statistical characteristics of the envelope signals for the subband containing obvious formant characteristics and the subband containing mostly noise components.
The statistical characteristics of the spectral envelope signal comprise its mean and variance; specifically, the features computed are: (1) the envelope variance of the subband containing obvious formant characteristics; (2) the difference between the envelope means of the subband containing obvious formant characteristics and the subband containing mostly noise components.
Analyzing the subband entropy characteristics in the frequency domain comprises the following steps:
First, in long-span mode, compute the entropy at each frequency bin of the current frame using the current frame and several adjacent frames;
Then, compute the mean and variance of the entropy within a particular subband range to determine the complexity of the current speech frame.
Fusing the short-span subband spectral envelope characteristics with the long-span subband entropy characteristics further removes part of the non-speech signals: for each frame, these two characteristics allow a frequency-domain analysis of the signal under various complex background noises, classifying speech versus non-speech signals and rejecting further non-speech frames.
Fig. 3 is a flowchart of the frequency-domain analysis part; as shown in Fig. 3, removing further non-speech signals according to the subband spectral envelope and entropy characteristics comprises the following steps:
Step S21: for each frame, first apply high-pass filtering to remove power-line interference (in an embodiment of the invention, a 4th-order Chebyshev high-pass filter), then apply a window function to the filtered signal (in an embodiment, a Hamming window);
Step S22: divide the windowed audio signal into N frequency bands (in an embodiment, five bands: 0-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz), band-pass filter the signal within each band (in an embodiment, with 6th-order Butterworth filters), and obtain the N subband signals;
Step S23: apply the Hilbert transform to each subband signal to obtain the corresponding spectral envelope signal.
For voiced signals, the envelope of the 500-1000 Hz band contains obvious formant characteristics, while under noisy conditions the envelope of the 3000-4000 Hz band contains mostly noise components; in an embodiment of the invention, therefore, only the 500-1000 Hz and 3000-4000 Hz subbands are Hilbert-transformed.
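Steps S22 and S23 (band-pass filtering followed by a Hilbert-transform envelope) can be sketched with SciPy as below. The 6th-order Butterworth filters and the two subbands follow the embodiment; the 16 kHz sampling rate is an assumption made here so that the 3000-4000 Hz band stays below the Nyquist frequency:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def subband_envelope(x, fs, band, order=6):
    """Band-pass x to `band` (Hz) with an order-6 Butterworth filter, then
    take the magnitude of the analytic signal as the subband envelope."""
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    sub = sosfiltfilt(sos, x)
    return np.abs(hilbert(sub))

def envelope_stats(x, fs, bands=((500, 1000), (3000, 4000))):
    """Envelope statistics used by the later decision (step S24): the mean
    and variance of each subband envelope of interest."""
    stats = []
    for band in bands:
        env = subband_envelope(x, fs, band)
        stats.append((env.mean(), env.var()))
    return stats
```

For a 700 Hz tone, the 500-1000 Hz envelope mean is close to the tone amplitude while the 3000-4000 Hz envelope is nearly zero, which is the contrast the envelope features exploit.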
Step S24: perform statistical analysis on the envelope signals obtained in step S23, computing their means and variances within the respective subband ranges, to obtain the spectral envelope decision output.
Let μ₁ and σ₁ denote the mean and variance of the 500-1000 Hz subband envelope, and μ₂ and σ₂ the mean and variance of the 3000-4000 Hz subband envelope. The spectral envelope decision output VAD_envelope can then be expressed as:
VAD_envelope = σ₁σ₂ − (μ₂ − μ₁)
Through this analysis of the subband spectral envelopes, the decision output VAD_envelope is obtained.
Step S25: compute the Fourier magnitude spectrum for the current frame and several adjacent frames to obtain the Fourier magnitude at each frequency bin of the different frames; for each frequency bin, compute the entropy of the current frame at that bin using the adjacent frames; within the subband range containing obvious formant characteristics (in an embodiment of the invention, the 500-1000 Hz band), compute the variance of the bin entropies as the long-span decision output VAD_entropy.
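One possible reading of step S25's long-span entropy, sketched in numpy. The FFT size and the normalisation of the per-bin magnitude distribution across frames are assumptions; the patent does not spell them out:

```python
import numpy as np

def long_span_entropy(frames, fs, band=(500, 1000), nfft=512):
    """For each FFT bin, compute the entropy of the magnitude distribution
    across the stacked frames (current frame plus its neighbours), then
    return the variance of the bin entropies inside `band`."""
    mag = np.abs(np.fft.rfft(frames, n=nfft, axis=1))     # (n_frames, n_bins)
    p = mag / (mag.sum(axis=0, keepdims=True) + 1e-12)    # per-bin distribution over frames
    ent = -(p * np.log(p + 1e-12)).sum(axis=0)            # entropy per frequency bin
    bins = np.fft.rfftfreq(nfft, d=1.0 / fs)
    sel = (bins >= band[0]) & (bins <= band[1])
    return ent[sel].var()
```

Stationary content (identical adjacent frames) yields maximal, uniform bin entropies and hence near-zero variance, while spectrally varying content yields a larger variance, which is what the long-span judgment measures.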
Step S26: fuse the two decision outputs obtained in steps S24 and S25 into a comprehensive judgment, yielding the final frequency-domain decision result VAD_freq, expressed as:
VAD_freq = λ₁·VAD_envelope + λ₂·VAD_entropy
If VAD_freq is above a threshold, the frame is labeled a speech frame; if VAD_freq is below the threshold, the frame is labeled a non-speech frame. In addition, the data labeled as speech frames are extended in length: the start frame of each speech segment is extended forward by 3 frames, and the ending frame is extended backward by 3 frames.
After this processing, further non-speech signals have been removed from the audio.
Step S30: for the retained frames awaiting screening, group consecutive frames with similar features into audio segments; subsequent speech detection is carried out per audio segment.
Fig. 4 is a flowchart of the audio-frame clustering part; as shown in Fig. 4, step S30 further comprises the following steps:
Step S31: for each frame awaiting screening, considering the perceptual characteristics of human hearing, divide the signal into several subbands in the Mel domain, obtaining the subband signals through a Mel filter bank;
Step S32: compute the entropy of each subband per frame to measure the proportion of each subband's energy, and set subband weights according to auditory perception: the low-frequency subbands that reflect formant characteristics receive relatively large weights, and the high-frequency subbands relatively small ones;
Step S33: taking the subband entropies as feature parameters, compute the similarity between adjacent frames (taking the subband weights into account), and group adjacent frames with similar features into one audio segment according to a conventional metric function, such that the distance between any two frames within a segment is less than a threshold T.
In this way, the audio is divided into segments based on the subband entropies of the frames; each segment contains similar frames, and subsequent speech detection operates on segments.
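One way to realise the frame grouping of steps S31-S33 is the greedy adjacent-frame merge below. It is a simplification: the patent requires every pairwise distance within a segment to stay below the threshold T, whereas this sketch only checks consecutive frames, and the weighted Euclidean metric is an assumed choice of metric function:

```python
import numpy as np

def segment_frames(features, weights, threshold):
    """`features` holds one subband-entropy vector per frame; adjacent frames
    whose weighted Euclidean distance stays below `threshold` are merged into
    one audio segment.  Returns (start, end) index pairs, end exclusive."""
    w = np.asarray(weights, dtype=float)
    segments, start = [], 0
    for i in range(1, len(features)):
        d = np.sqrt(np.sum(w * (features[i] - features[i - 1]) ** 2))
        if d >= threshold:              # feature jump: close the current segment
            segments.append((start, i))
            start = i
    segments.append((start, len(features)))
    return segments
```

Frames with nearly identical entropy vectors land in one segment; a jump in the feature vector starts a new segment.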
Step S40: for each audio segment awaiting screening, compute the mean of the per-frame Mel cepstral coefficients within the segment, feed the mean parameters into the speech Gaussian mixture model and the various non-speech Gaussian mixture models, and make a segment-level decision on whether the segment contains speech data according to the output probability of each model, finally obtaining the speech detection result.
Fig. 5 is a flowchart of the segment-level decision by Gaussian mixture models; as shown in Fig. 5, step S40 specifically comprises: first extract the M-order (e.g., 13-order) static Mel cepstral coefficients of each frame in the segment, then compute their first-order and second-order differences, obtaining 3*M Mel cepstral coefficients per frame; compute the mean of the per-frame coefficients and use this 3*M-dimensional mean vector for detection: input it into the Gaussian mixture model of speech and the Gaussian mixture models of the various non-speech classes, and if the speech model outputs the maximum probability, judge the segment to be a speech signal, otherwise a non-speech signal.
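A minimal numpy sketch of the segment-level decision: each model is represented here as a (weights, means, variances) triple of a diagonal-covariance GMM, and the single-component toy models in the usage example are purely illustrative:

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    x = np.asarray(x, dtype=float)
    comp = []
    for w, m, v in zip(weights, means, variances):
        m, v = np.asarray(m, dtype=float), np.asarray(v, dtype=float)
        ll = -0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)
        comp.append(np.log(w) + ll)
    return np.logaddexp.reduce(comp)

def decide_segment(segment_features, models):
    """Average the per-frame cepstral features over the segment, score the
    mean vector under each model, and return the most likely label."""
    mean_vec = np.mean(segment_features, axis=0)
    scores = {label: gmm_loglik(mean_vec, *params) for label, params in models.items()}
    return max(scores, key=scores.get)
```

A segment is labeled "speech" only when the speech model's log-likelihood exceeds that of every non-speech model, mirroring the maximum-probability rule above.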
Step S40 also requires selecting various types of audio to train the Gaussian mixture model of speech and the Gaussian mixture models of the various non-speech classes; this guarantees model robustness and improves detection accuracy. The class of each audio file must be annotated for training.
Fig. 6 is a flowchart of the off-line training process of the Gaussian mixture models; as shown in Fig. 6, training comprises the following steps:
Step S41: filter the entire training audio library: apply the time-domain and frequency-domain analyses of steps S10 and S20 to the audio signals to remove part of the non-speech signals; subsequent steps train only on the remaining signals awaiting screening;
Step S42: classify the filtered audio signals according to their class annotations, i.e., divide them into speech signals and non-speech signals; the non-speech signals are further subdivided by signal characteristics (in an embodiment of the invention, into background music, animal sounds, stationary noise, and non-stationary noise, with a separate Gaussian mixture model trained for each type);
Step S43: extract Mel cepstral coefficients frame by frame from the classified signals: first the M-order static parameters, then their first-order and second-order differences, yielding 3*M-dimensional parameters; group consecutive frames with similar features into audio segments by the method of step S30, and compute the mean of the per-frame coefficients within each segment as the feature parameter for training the Gaussian mixture models;
Step S44: for the speech signals and each class of non-speech signals, train a Gaussian mixture model on the 3*M-order Mel cepstral coefficients, determining the weight, mean, and variance of each Gaussian component by EM iteration (in an embodiment of the invention, each model contains 32 Gaussian components).
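The EM training of steps S41-S44 can be sketched with scikit-learn's GaussianMixture, an off-the-shelf EM implementation rather than the patent's own; feature extraction is assumed to have already produced segment-mean cepstral vectors per class:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_gmms(features_by_class, n_components=32):
    """Fit one diagonal-covariance GMM per audio class by EM.
    `features_by_class` maps a class label ('speech', 'background_music', ...)
    to an (n_segments, 3M) array of segment-mean cepstral features.
    n_components=32 follows the embodiment; use fewer for small sets."""
    models = {}
    for label, feats in features_by_class.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag",
                              max_iter=100, random_state=0)
        models[label] = gmm.fit(np.asarray(feats))
    return models
```

After training, each model's `score` method returns the average log-likelihood of a feature vector, which is the quantity compared in the segment-level decision of step S40.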
In summary, the present invention proposes an efficient speech detection method that performs two-stage detection on an audio stream. First, the signal is analyzed in the time and frequency domains, and appropriate parameter thresholds divide it into non-speech data and data awaiting screening; the data awaiting screening are then examined with robust parametric models to judge whether they contain speech. The method detects speech data quickly and accurately under various stationary and non-stationary random noise environments, effectively distinguishes speech from various non-speech signals, and is not restricted by speaker, environment, or language.
It should be noted that the implementation of each component is not limited to the variants mentioned in the embodiments; persons of ordinary skill in the art may readily substitute alternatives, for example:
(1) When analyzing the signal in the frequency domain, the division into the five subbands 0-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz according to the hearing characteristics of the human ear may be replaced by other subband division methods, such as dividing the subbands with Mel filters.
(2) In building the Gaussian mixture models, the specified number of mixture components may be adjusted; for example, the speech model may contain 32 Gaussian components and the non-speech models 64.
The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the invention. It should be understood that the foregoing describes only specific embodiments of the invention and does not limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A speech detection method, characterized in that the method comprises the following steps:
Step S10: obtain the original audio, analyze its short-time energy and short-time zero-crossing rate in the time domain, and use these two parameters to remove part of the non-speech signals from the original audio;
Step S20: for the audio signal retained after step S10, analyze the spectral envelope characteristics and entropy characteristics of its subbands in the frequency domain, and further remove part of the non-speech signals;
Step S30: for the retained frames awaiting screening, group consecutive frames with similar features into audio segments;
Step S40: for each audio segment awaiting screening, make a segment-level decision with Gaussian mixture models on whether the segment contains speech data, and obtain the final speech detection result.
2. The method according to claim 1, characterized in that step S10 further comprises:
Step S11: divide the original audio into frames at equal intervals and compute the short-time energy and short-time zero-crossing rate of each frame;
Step S12: compare the short-time energy and short-time zero-crossing rate of each frame against preset low and high thresholds, classify each frame as silence, transition, or speech according to the comparison, remove the silence and transition signals from the original audio, and retain only the speech-segment signals.
3. The method according to claim 2, characterized in that if the short-time energy or the short-time zero-crossing rate exceeds the low threshold, the signal is marked as entering the transition state; in the transition state, if both parameters fall back below the low threshold, the signal returns to silence; in the transition state, if either parameter exceeds the high threshold, the signal is considered to enter a speech segment; within a speech segment, if both parameters drop below the low threshold and the duration exceeds a predetermined threshold, the speech segment is considered ended.
4. The method according to claim 1, characterized in that, in step S20, analyzing the statistical characteristics of the spectral envelope of each subband in the frequency domain comprises:
first, dividing the audio signal into several subbands;
then, applying band-pass filtering within the frequency range of each subband to obtain the subband audio signals;
then, applying the Hilbert transform to each subband signal to obtain its spectral envelope;
finally, analyzing the statistical characteristics of the envelope signals for the subband containing obvious formant characteristics and the subband containing mostly noise components.
5. The method according to claim 4, characterized in that the statistical characteristics of the spectral envelope signal comprise the mean and variance of the spectral envelope, the features to be computed specifically being: the envelope variance of the subband containing obvious formant characteristics; and the difference between the envelope means of the subband containing obvious formant characteristics and the subband containing mostly noise components.
6. The method according to claim 1, characterized in that, in step S20, analyzing the entropy characteristics of the subbands in the frequency domain comprises:
first, in long-span mode, computing the entropy at each frequency bin of the current frame using the current frame and several adjacent frames;
then, computing the mean and variance of the entropy within a particular subband range to determine the complexity of the current speech frame.
7. The method according to claim 1, characterized in that, in step S20, further removing part of the non-speech signals according to the spectral envelope statistics and entropy characteristics of the subbands comprises:
Step S21: for each frame, first apply high-pass filtering to remove power-line interference, then apply a window function to the high-pass-filtered signal;
Step S22: divide the windowed audio signal into N frequency bands, band-pass filter the signal within each band, and obtain the N subband signals;
Step S23: apply the Hilbert transform to each subband signal to obtain the corresponding spectral envelope signal;
Step S24: perform statistical analysis on the envelope signals obtained in step S23 to obtain the spectral envelope decision output;
Step S25: compute the Fourier magnitude spectrum for the current frame and several adjacent frames to obtain the Fourier magnitude at each frequency bin of the different frames; for each frequency bin, compute the entropy of the current frame at that bin using the adjacent frames; within the subband range containing obvious formant characteristics, compute the variance of the bin entropies as the long-span decision output;
Step S26: fuse the two decision outputs obtained in steps S24 and S25 into a comprehensive judgment, yielding the final frequency-domain decision result; if the frequency-domain decision result is above a threshold, label the frame a speech frame, and if below the threshold, label it a non-speech frame.
8. The method according to claim 1, characterized in that step S30 further comprises:
Step S31: for each frame awaiting screening, considering the perceptual characteristics of human hearing, divide the audio signal into several subbands in the Mel domain;
Step S32: compute the entropy of each subband per frame to measure the proportion of each subband's energy, and set the weight of each subband according to auditory perception characteristics;
Step S33: taking the subband entropies as feature parameters, compute the similarity between adjacent frames, taking the subband weights into account, and group adjacent frames with similar features into one audio segment according to a metric function.
9. The method according to claim 1, characterized in that step S40 specifically comprises:
for each audio segment awaiting screening, computing the mean of the per-frame Mel cepstral coefficients within the segment, feeding the mean parameters into the speech Gaussian mixture model and the various non-speech Gaussian mixture models, and making a segment-level decision on whether the segment contains speech data according to the output probability of each model, finally obtaining the speech detection result.
10. The method according to claim 1, characterized in that the training of the Gaussian mixture models in step S40 specifically comprises:
Step S41: filter the entire training audio library: apply the time-domain and frequency-domain analyses of steps S10 and S20 to the audio signals to remove part of the non-speech signals; subsequent steps train only on the remaining signals awaiting screening;
Step S42: classify the filtered audio signals according to their class annotations, i.e., divide the filtered audio signals into speech signals and non-speech signals;
Step S43: extract Mel cepstral coefficients frame by frame from the classified signals: first the M-order static parameters, then their first-order and second-order differences, yielding 3*M-dimensional parameters; group consecutive frames with similar features into audio segments by the method of step S30, and compute the mean of the per-frame Mel cepstral coefficients within each segment as the feature parameter for training the Gaussian mixture models;
Mel cepstral coefficient in the segment respectively, and uses it as the characteristic parameter of training Gaussian mixture model; 步骤S44,对语音信号和不同类别的非语音信号分别进行高斯混合模型的训练,即通过EM迭代训练确定不同高斯混合模型中各个高斯成分的权重、均值和方差。In step S44, Gaussian mixture models are trained on speech signals and non-speech signals of different categories, that is, weights, mean values, and variances of each Gaussian component in different Gaussian mixture models are determined through EM iterative training.
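For illustration, the subband spectral-envelope analysis of steps S22 to S24 (band-pass filtering, Hilbert envelope, and the envelope statistics of claim 5) can be sketched in NumPy. This is a minimal sketch, not the patented implementation: the sampling rate, the band edges, and the crude FFT-mask band-pass filter are assumptions of this example.

```python
import numpy as np

def hilbert_envelope(x):
    """Spectral envelope via the analytic signal: zero the negative
    frequencies of the FFT, double the positive ones, take |ifft|."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(X * h))

def bandpass(x, fs, lo, hi):
    """Crude FFT-mask band-pass filter (illustrative only)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, len(x))

# Example frame: a formant-like 500 Hz tone plus weak wide-band noise.
fs = 8000
t = np.arange(1024) / fs
rng = np.random.default_rng(0)
frame = np.cos(2 * np.pi * 500 * t) + 0.05 * rng.standard_normal(len(t))

formant_env = hilbert_envelope(bandpass(frame, fs, 300, 800))   # formant band
noise_env = hilbert_envelope(bandpass(frame, fs, 3000, 3800))   # noise band

# The two features named in claim 5: formant-band envelope variance and the
# mean difference between the formant-band and noise-band envelopes.
features = (formant_env.var(), formant_env.mean() - noise_env.mean())
```

A speech-dominated frame yields a large mean difference between the formant-band envelope and the noise-band envelope, which is the second feature of claim 5.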
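The long-span entropy of claim 6 and step S25 measures, at each frequency point, how the spectral magnitude is distributed over the current frame and its neighbours: stationary noise spreads its energy evenly across frames, giving nearly the same entropy at every bin and hence a small variance over the subband, while non-stationary speech does not. A minimal sketch, with the across-frame normalization an assumption of this example:

```python
import numpy as np

def long_span_entropy_variance(mag_frames, band):
    """mag_frames: (T, K) Fourier magnitudes of the current frame and its
    T-1 neighbours; band: slice of frequency bins (e.g. a formant subband).
    For each bin, normalize the T magnitudes into a distribution, compute
    its entropy, and return the variance of the entropies over `band`."""
    mag = np.asarray(mag_frames, dtype=float)
    p = mag / (mag.sum(axis=0, keepdims=True) + 1e-12)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=0)  # one value per bin
    return entropy[band].var()

# Stationary "noise": identical spectra in every frame, so every bin has
# entropy log(T) and the variance over the band is essentially zero.
flat = np.tile(np.linspace(1.0, 2.0, 64), (5, 1))
v_noise = long_span_entropy_variance(flat, slice(8, 40))

# "Speech-like": energy at some bins concentrated in a single frame, so
# the entropies differ across bins and the variance is clearly larger.
burst = flat.copy()
burst[2, 10:20] *= 50.0
v_speech = long_span_entropy_variance(burst, slice(8, 40))
```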
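The segment-level decision of claim 9 averages the per-frame Mel-frequency cepstral coefficients over a segment and scores the average under the speech model and the non-speech models. The sketch below substitutes a single diagonal Gaussian for each Gaussian mixture model (the one-component special case) and uses made-up model parameters purely for illustration; a real system would train full mixtures with EM as in claim 10.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def classify_segment(segment_mfcc, models):
    """segment_mfcc: (n_frames, dim) MFCCs of one audio segment.
    models: {label: (mean, var)} with dim-length arrays.
    Returns the label whose model gives the averaged MFCC vector the
    highest log-likelihood (the segment-level decision of claim 9)."""
    feat = segment_mfcc.mean(axis=0)          # per-segment MFCC mean
    scores = {label: diag_gauss_loglik(feat, m, v)
              for label, (m, v) in models.items()}
    return max(scores, key=scores.get)

# Toy 3-dimensional models (hypothetical parameters, for illustration only).
models = {
    "speech": (np.zeros(3), np.ones(3)),
    "music":  (np.full(3, 4.0), np.ones(3)),
    "noise":  (np.full(3, -4.0), np.ones(3)),
}
segment = np.array([[0.2, -0.1, 0.3],
                    [0.1, 0.0, -0.2],
                    [-0.3, 0.2, 0.1]])   # frames near the "speech" mean
label = classify_segment(segment, models)
```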
CN201310743203.5A 2013-12-30 2013-12-30 A high-efficiency speech detection method Active CN103646649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310743203.5A CN103646649B (en) A high-efficiency speech detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310743203.5A CN103646649B (en) A high-efficiency speech detection method

Publications (2)

Publication Number Publication Date
CN103646649A true CN103646649A (en) 2014-03-19
CN103646649B CN103646649B (en) 2016-04-13

Family

ID=50251851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310743203.5A Active CN103646649B (en) A high-efficiency speech detection method

Country Status (1)

Country Link
CN (1) CN103646649B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040122667A1 (en) * 2002-12-24 2004-06-24 Mi-Suk Lee Voice activity detector and voice activity detection method using complex laplacian model
CN101197130A (en) * 2006-12-07 2008-06-11 华为技术有限公司 Voice activity detection method and voice activity detector
CN102473412A (en) * 2009-07-21 2012-05-23 日本电信电话株式会社 Audio signal section estimateing apparatus, audio signal section estimateing method, program therefor and recording medium
CN103165127A (en) * 2011-12-15 2013-06-19 佳能株式会社 Sound segmentation equipment, sound segmentation method and sound detecting system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Shen Hongli: "Research on voice activity detection algorithms and their application in speech coders", Wanfang Data *
Zhang Zhao, Guo Wu: "A voice activity detection algorithm combining model and energy for speaker recognition", Journal of Chinese Computer Systems *

Cited By (62)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104318927A (en) * 2014-11-04 2015-01-28 东莞市北斗时空通信科技有限公司 Anti-noise low-bitrate speech coding method and decoding method
CN104464722B (en) * 2014-11-13 2018-05-25 北京云知声信息技术有限公司 Voice activity detection method and apparatus based on time domain and frequency domain
CN104464722A (en) * 2014-11-13 2015-03-25 北京云知声信息技术有限公司 Voice activity detection method and equipment based on time domain and frequency domain
CN104934043A (en) * 2015-06-17 2015-09-23 广东欧珀移动通信有限公司 Audio processing method and device
CN105118522A (en) * 2015-08-27 2015-12-02 广州市百果园网络科技有限公司 Noise detection method and device
CN105118522B (en) * 2015-08-27 2021-02-12 广州市百果园网络科技有限公司 Noise detection method and device
CN105788592A (en) * 2016-04-28 2016-07-20 乐视控股(北京)有限公司 Audio classification method and apparatus thereof
CN106020445A (en) * 2016-05-05 2016-10-12 广东小天才科技有限公司 Method for automatically identifying wearing by left hand and right hand and wearing equipment
CN105843400A (en) * 2016-05-05 2016-08-10 广东小天才科技有限公司 Somatosensory interaction method and device and wearable device
TWI659412B (en) * 2016-10-11 2019-05-11 中國商芋頭科技(杭州)有限公司 Voice activation detection method and device
KR20190061076A (en) * 2016-10-12 2019-06-04 알리바바 그룹 홀딩 리미티드 Method and device for detecting an audio signal
CN106887241A (en) * 2016-10-12 2017-06-23 阿里巴巴集团控股有限公司 A kind of voice signal detection method and device
US10706874B2 (en) * 2016-10-12 2020-07-07 Alibaba Group Holding Limited Voice signal detection method and apparatus
WO2018068636A1 (en) * 2016-10-12 2018-04-19 阿里巴巴集团控股有限公司 Method and device for detecting audio signal
CN109788922A (en) * 2016-10-14 2019-05-21 公立大学法人大阪府立大学 Swallow diagnostic device and program
WO2018068639A1 (en) * 2016-10-14 2018-04-19 腾讯科技(深圳)有限公司 Data recovery method and apparatus, and storage medium
US11246526B2 (en) 2016-10-14 2022-02-15 University Public Corporation Osaka Swallowing diagnosis apparatus and storage medium
CN106548782A (en) * 2016-10-31 2017-03-29 维沃移动通信有限公司 The processing method and mobile terminal of acoustical signal
CN106653047A (en) * 2016-12-16 2017-05-10 广州视源电子科技股份有限公司 Automatic gain control method and device for audio data
CN106782508A (en) * 2016-12-20 2017-05-31 美的集团股份有限公司 The cutting method of speech audio and the cutting device of speech audio
CN107039035A (en) * 2017-01-10 2017-08-11 上海优同科技有限公司 A kind of detection method of voice starting point and ending point
CN107045870A (en) * 2017-05-23 2017-08-15 南京理工大学 A kind of the Method of Speech Endpoint Detection of feature based value coding
CN107910017A (en) * 2017-12-19 2018-04-13 河海大学 A kind of method that threshold value is set in noisy speech end-point detection
CN108269566A (en) * 2018-01-17 2018-07-10 南京理工大学 A kind of thorax mouth wave recognition methods based on multiple dimensioned sub-belt energy collection feature
CN109036470A (en) * 2018-06-04 2018-12-18 平安科技(深圳)有限公司 Speech differentiation method, apparatus, computer equipment and storage medium
CN109036470B (en) * 2018-06-04 2023-04-21 平安科技(深圳)有限公司 Voice distinguishing method, device, computer equipment and storage medium
CN108831508A (en) * 2018-06-13 2018-11-16 百度在线网络技术(北京)有限公司 Voice activity detection method, device and equipment
CN109147795A (en) * 2018-08-06 2019-01-04 珠海全志科技股份有限公司 Voice print database transmission, recognition methods, identification device and storage medium
CN109347580A (en) * 2018-11-19 2019-02-15 湖南猎航电子科技有限公司 A kind of adaptive threshold signal detecting method of known duty ratio
CN109347580B (en) * 2018-11-19 2021-01-19 湖南猎航电子科技有限公司 Self-adaptive threshold signal detection method with known duty ratio
CN111261143B (en) * 2018-12-03 2024-03-22 嘉楠明芯(北京)科技有限公司 Voice wakeup method and device and computer readable storage medium
CN111261143A (en) * 2018-12-03 2020-06-09 杭州嘉楠耘智信息科技有限公司 Voice wake-up method and device and computer readable storage medium
CN109448750A (en) * 2018-12-20 2019-03-08 西京学院 A kind of sound enhancement method improving bioradar voice quality
CN109801646A (en) * 2019-01-31 2019-05-24 北京嘉楠捷思信息技术有限公司 Voice endpoint detection method and device based on fusion features
CN109801646B (en) * 2019-01-31 2021-11-16 嘉楠明芯(北京)科技有限公司 Voice endpoint detection method and device based on fusion features
CN111916068A (en) * 2019-05-07 2020-11-10 北京地平线机器人技术研发有限公司 Audio detection method and device
CN110097895A (en) * 2019-05-14 2019-08-06 腾讯音乐娱乐科技(深圳)有限公司 A kind of absolute music detection method, device and storage medium
CN110349597A (en) * 2019-07-03 2019-10-18 山东师范大学 A kind of speech detection method and device
CN110349597B (en) * 2019-07-03 2021-06-25 山东师范大学 A kind of voice detection method and device
CN110600010A (en) * 2019-09-20 2019-12-20 上海优扬新媒信息技术有限公司 Corpus extraction method and apparatus
CN110600010B (en) * 2019-09-20 2022-05-17 度小满科技(北京)有限公司 Corpus extraction method and apparatus
CN110636176A (en) * 2019-10-09 2019-12-31 科大讯飞股份有限公司 Call fault detection method, device, equipment and storage medium
CN110636176B (en) * 2019-10-09 2022-05-17 科大讯飞股份有限公司 Call fault detection method, device, equipment and storage medium
CN111415685A (en) * 2020-03-26 2020-07-14 腾讯科技(深圳)有限公司 Audio signal detection method, device, equipment and computer readable storage medium
CN111398944A (en) * 2020-04-09 2020-07-10 浙江大学 A Radar Signal Processing Method for Identity Recognition
CN111883182A (en) * 2020-07-24 2020-11-03 平安科技(深圳)有限公司 Human voice detection method, device, equipment and storage medium
CN111883182B (en) * 2020-07-24 2024-03-19 平安科技(深圳)有限公司 Human voice detection method, device, equipment and storage medium
WO2021135547A1 (en) * 2020-07-24 2021-07-08 平安科技(深圳)有限公司 Human voice detection method, apparatus, device, and storage medium
CN112466331A (en) * 2020-11-11 2021-03-09 昆明理工大学 Voice music classification model based on beat spectrum characteristics
CN112562735B (en) * 2020-11-27 2023-03-24 锐迪科微电子(上海)有限公司 Voice detection method, device, equipment and storage medium
CN112562735A (en) * 2020-11-27 2021-03-26 锐迪科微电子(上海)有限公司 Voice detection method, device, equipment and storage medium
CN112528920A (en) * 2020-12-21 2021-03-19 杭州格像科技有限公司 Pet image emotion recognition method based on depth residual error network
CN112767920A (en) * 2020-12-31 2021-05-07 深圳市珍爱捷云信息技术有限公司 Method, device, equipment and storage medium for recognizing call voice
CN113160853A (en) * 2021-03-31 2021-07-23 深圳鱼亮科技有限公司 Voice endpoint detection method based on real-time face assistance
CN113192488A (en) * 2021-04-06 2021-07-30 青岛信芯微电子科技股份有限公司 Voice processing method and device
CN113541867A (en) * 2021-06-30 2021-10-22 南京奥通智能科技有限公司 Remote communication module for converged terminal
CN113593599A (en) * 2021-09-02 2021-11-02 北京云蝶智学科技有限公司 Method for removing noise signal in voice signal
CN114839595A (en) * 2022-04-10 2022-08-02 哈尔滨工业大学 Improved time delay estimation method for real-time sound source positioning
CN115424626A (en) * 2022-08-05 2022-12-02 浙江大华技术股份有限公司 Method and device for voice activity detection
CN116364107A (en) * 2023-03-20 2023-06-30 珠海亿智电子科技有限公司 Voice signal detection method, device, equipment and storage medium
CN118411982A (en) * 2024-04-23 2024-07-30 山西警察学院 English voice signal processing recognition method based on artificial intelligence
CN118411982B (en) * 2024-04-23 2024-10-29 山西警察学院 English voice signal processing recognition method based on artificial intelligence

Also Published As

Publication number Publication date
CN103646649B (en) 2016-04-13

Similar Documents

Publication Publication Date Title
CN103646649A (en) High-efficiency voice detecting method
CN104835498B (en) Method for recognizing sound-groove based on polymorphic type assemblage characteristic parameter
Evangelopoulos et al. Multiband modulation energy tracking for noisy speech detection
CN103854662B (en) Adaptive voice detection method based on multiple domain Combined estimator
CN103489446B (en) Based on the twitter identification method that adaptive energy detects under complex environment
US20090076814A1 (en) Apparatus and method for determining speech signal
CN113823293B (en) Speaker recognition method and system based on voice enhancement
CN102509547A Method and system for voiceprint recognition based on vector quantization
CN104900235A (en) Voiceprint recognition method based on pitch period mixed characteristic parameters
CN103310789A (en) Sound event recognition method based on optimized parallel model combination
CN108922541A (en) Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model
Ghaemmaghami et al. Noise robust voice activity detection using features extracted from the time-domain autocorrelation function
EP2817800A1 (en) Modified mel filter bank structure using spectral characteristics for sound analysis
Archana et al. Gender identification and performance analysis of speech signals
CN115662464B (en) Method and system for intelligently identifying environmental noise
Jaafar et al. Automatic syllables segmentation for frog identification system
CN109473102A (en) A kind of robot secretary intelligent meeting recording method and system
Radmard et al. A new method of voiced/unvoiced classification based on clustering
Venter et al. Automatic detection of African elephant (Loxodonta africana) infrasonic vocalisations from recordings
CN103258537A (en) Method utilizing characteristic combination to identify speech emotions and device thereof
Tripathi et al. Speaker recognition
Chu et al. A noise-robust FFT-based auditory spectrum with application in audio classification
Kaminski et al. Automatic speaker recognition using a unique personal feature vector and Gaussian Mixture Models
Dov et al. Voice activity detection in presence of transients using the scattering transform
Estrebou et al. Voice recognition based on probabilistic SOM

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20170508

Address after: 100094 No. 405-346, 4th Floor, Building A, No. 1, Courtyard 2, Yongcheng North Road, Haidian District, Beijing

Patentee after: Beijing Rui Heng Heng Xun Technology Co., Ltd.

Address before: 100190 No. 95 Zhongguancun East Road, Haidian District, Beijing

Patentee before: Institute of Automation, Chinese Academy of Sciences

TR01 Transfer of patent right

Effective date of registration: 20181218

Address after: 100190 No. 95 Zhongguancun East Road, Haidian District, Beijing

Patentee after: Institute of Automation, Chinese Academy of Sciences

Address before: 100094 No. 405-346, 4th floor, Building A, No. 1, Courtyard 2, Yongcheng North Road, Haidian District, Beijing

Patentee before: Beijing Rui Heng Heng Xun Technology Co., Ltd.

TR01 Transfer of patent right

Effective date of registration: 20190528

Address after: 310019 Room 1105, 11th Floor, Building 4, No. 9 Jiuhuan Road, Jianggan District, Hangzhou, Zhejiang

Patentee after: Extreme Element (Hangzhou) Intelligent Technology Co., Ltd.

Address before: 100190 No. 95 Zhongguancun East Road, Haidian District, Beijing

Patentee before: Institute of Automation, Chinese Academy of Sciences

CP01 Change in the name or title of a patent holder

Address after: 310019 Room 1105, 11th Floor, Building 4, No. 9 Jiuhuan Road, Jianggan District, Hangzhou, Zhejiang

Patentee after: Zhongke Extreme Element (Hangzhou) Intelligent Technology Co., Ltd.

Address before: 310019 Room 1105, 11th Floor, Building 4, No. 9 Jiuhuan Road, Jianggan District, Hangzhou, Zhejiang

Patentee before: Extreme Element (Hangzhou) Intelligent Technology Co., Ltd.