CN114429769A - Glass breaking sound detection method and related equipment - Google Patents
- Publication number
- CN114429769A (CN202011180908.7A)
- Authority
- CN
- China
- Prior art keywords
- audio frame
- energy
- preset
- sound
- frequency band
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Description
Technical field
The present application belongs to the technical field of audio processing, and in particular relates to a glass breaking sound detection method and related equipment.
Background
Traditional glass breakage detection generally installs a piezoelectric sheet module on the glass and determines whether the glass is broken according to the current generated by the module. Although this approach can effectively detect glass breakage, the piezoelectric sheet module is troublesome to install and affects the appearance. To avoid installing such a module, glass breakage can instead be detected by judging whether an acquired sound signal contains the sound of glass breaking. Existing sound-based methods are generally either based on neural networks or decide from the strength of the sound signal whether it is a glass breaking sound. However, neural-network-based processing is computationally complex, and deciding from signal strength alone gives inaccurate detection.
Summary of the invention
In view of this, the embodiments of the present application provide a glass breaking sound detection method and related equipment, which can improve the accuracy of detecting glass breaking sounds while keeping the computation simple.
A first aspect of the embodiments of the present application provides a glass breaking sound detection method, including:
acquiring a sound signal and dividing it into frames to obtain M audio frames, where M is an integer greater than 0;
if the current audio frame is determined to be a crackling sound according to its spectral parameters, determining a crackling sound peak according to the spectral parameters of the current audio frame and of the audio frames after it;
if the attenuation, relative to the crackling sound peak, of the energy of each of the N_p audio frames after the current audio frame is within a preset attenuation range, determining that the sound signal is a glass breaking sound, where N_p is an integer greater than 0 and M > N_p.
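The framing step and the final attenuation check can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names, the use of full-band mean-square energy in decibels, and the idea of expressing the preset attenuation range as a pair of decibel bounds are all hypothetical choices that the text does not specify.

```python
import math

def frame_signal(samples, frame_len):
    """Split samples into consecutive non-overlapping frames.

    A trailing partial frame is dropped (an assumption; the text does not
    say how the last frame is handled).
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def frame_energy_db(frame, floor=1e-12):
    """Mean-square energy of one frame in decibels (assumed energy measure)."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, floor))

def decay_within_range(energies_db, peak_db, start, n_p, min_drop_db, max_drop_db):
    """Check the N_p frames after index `start`.

    Returns True only if every one of them has dropped from the peak by an
    amount inside [min_drop_db, max_drop_db] (the hypothetical form of the
    "preset attenuation range").
    """
    following = energies_db[start + 1:start + 1 + n_p]
    if len(following) < n_p:
        return False  # not enough frames left to decide
    return all(min_drop_db <= peak_db - e <= max_drop_db for e in following)
```

For example, with `energies = [frame_energy_db(f) for f in frame_signal(samples, 256)]`, a call such as `decay_within_range(energies, peak_db, start, n_p, 3.0, 20.0)` would accept a signal whose next `n_p` frames each sit between 3 dB and 20 dB below the peak; the bounds here are placeholders.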
In a possible implementation of the first aspect, determining the crackling sound peak according to the spectral parameters of the current audio frame and of the audio frames after it, if the current audio frame is determined to be a crackling sound according to its spectral parameters, includes:
if the current audio frame is determined to be a crackling sound according to its spectral parameters, and white noise is detected in the N_B audio frames after the current audio frame, determining the crackling sound peak according to the spectral parameters of the current audio frame and of the audio frames after it, where N_B is an integer greater than 0 and M > N_B.
In a possible implementation of the first aspect, the audio frame includes at least one frequency band, the spectral parameters include the energy of the frequency band, and detecting white noise in the N_B audio frames after the current audio frame includes:
if a target audio frame exists among the N_B audio frames after the current audio frame, determining a calculation formula for a first count value, where the target audio frame is an audio frame whose frequency band energies satisfy a first preset relational expression, and the calculation formula for the first count value corresponds to the first preset relational expression;
determining the first count value according to its calculation formula, and if the first count value is greater than a preset first threshold, determining that the target audio frame is white noise.
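The text leaves both the first preset relational expression and the count formula unspecified. The sketch below fills them in with assumptions purely for illustration: a frame counts as a target frame when its band energies are roughly flat (largest band energy at most `max_ratio` times the smallest), and the first count value is simply the number of such frames. All names and parameters are hypothetical.

```python
def is_flat(bands, max_ratio):
    """Assumed stand-in for the first preset relational expression:
    band energies are 'flat' if max/min does not exceed max_ratio."""
    lowest = min(bands)
    return lowest > 0 and max(bands) / lowest <= max_ratio

def white_noise_detected(band_energies, start, n_b, max_ratio, first_threshold):
    """Count flat frames among the n_b frames after index `start`.

    band_energies[n] is the list of band energies of frame n.
    The count (assumed form of the first count value) must exceed
    first_threshold for white noise to be reported.
    """
    window = band_energies[start + 1:start + 1 + n_b]
    count = sum(1 for bands in window if is_flat(bands, max_ratio))
    return count > first_threshold
```

A flatness test is used here because white noise spreads energy evenly across frequency; the actual relational expression in the patent may differ.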
In a possible implementation of the first aspect, the audio frame includes at least one subframe and at least one frequency band, the spectral parameters include the energy of the subframe and the energy of the frequency band, and determining that the current audio frame is a crackling sound according to its spectral parameters includes:
if the maximum energy among the subframes of the current audio frame is greater than a preset second threshold, then:
determining the energy ratio of the current audio frame according to the energy of its subframes;
determining the energy ratio of the current audio frame to the previous audio frame according to the frequency band energy of the current audio frame and the frequency band energy of the previous audio frame;
determining the energy difference between the current audio frame and the previous audio frame according to the subframe energy of the current audio frame and the subframe energy of the previous audio frame;
determining the energy ratio of the previous audio frame according to the energy of its subframes;
if the energy ratio of the current audio frame to the previous audio frame is greater than a first preset value, and the energy ratio of the current audio frame is greater than a second preset value;
or, if the energy ratio of the current audio frame to the previous audio frame is greater than the first preset value, and the energy ratio of the previous audio frame is greater than the second preset value;
or, if the energy ratio of the current audio frame to the previous audio frame is greater than a third preset value, and the energy ratio of the current audio frame is greater than a fourth preset value;
or, if the energy ratio of the current audio frame to the previous audio frame is greater than the third preset value, and the energy ratio of the previous audio frame is greater than the fourth preset value;
or, if the energy difference between the current audio frame and the previous audio frame is greater than a fifth preset value, the energy ratio of the current audio frame to the previous audio frame is greater than a sixth preset value, and the energy ratio of the current audio frame is greater than a seventh preset value;
or, if the energy difference between the current audio frame and the previous audio frame is greater than the fifth preset value, the energy ratio of the current audio frame to the previous audio frame is greater than the sixth preset value, and the energy ratio of the previous audio frame is greater than the seventh preset value;
then determining that the current audio frame is a crackling sound.
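The six alternative conditions above collapse into three once one notices that each pair differs only in whether the current or the previous frame's own energy ratio is tested. A minimal sketch of that decision logic follows. It takes the four quantities as already computed, because the text does not give their formulas; the class name, field names, and any threshold values passed in are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CrackleThresholds:
    second_threshold: float  # minimum for the largest subframe energy
    p1: float  # first preset value  (inter-frame energy ratio)
    p2: float  # second preset value (a frame's own energy ratio)
    p3: float  # third preset value  (inter-frame energy ratio)
    p4: float  # fourth preset value (a frame's own energy ratio)
    p5: float  # fifth preset value  (inter-frame energy difference)
    p6: float  # sixth preset value  (inter-frame energy ratio)
    p7: float  # seventh preset value (a frame's own energy ratio)

def is_crackling(max_subframe_energy, inter_ratio, inter_diff,
                 cur_ratio, prev_ratio, t):
    """Apply the gate on subframe energy, then the six OR-ed conditions.

    inter_ratio: energy ratio of the current frame to the previous frame
    inter_diff:  energy difference between current and previous frame
    cur_ratio / prev_ratio: each frame's own (subframe-based) energy ratio
    """
    if max_subframe_energy <= t.second_threshold:
        return False
    # "current OR previous frame's ratio exceeds X" equals "the larger exceeds X".
    own = max(cur_ratio, prev_ratio)
    return ((inter_ratio > t.p1 and own > t.p2)
            or (inter_ratio > t.p3 and own > t.p4)
            or (inter_diff > t.p5 and inter_ratio > t.p6 and own > t.p7))
```

Using `max(cur_ratio, prev_ratio)` is only a compact rewriting of the paired conditions; it does not change which frames are accepted.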
In a possible implementation of the first aspect, the audio frame includes at least one frequency band, the spectral parameters include the energy of the frequency band, and determining the crackling sound peak according to the spectral parameters of the current audio frame and of the audio frames after it includes:
taking the sum of the energies of all frequency bands of the current audio frame as the crackling sound peak;
taking the audio frame following the current audio frame as a first audio frame;
calculating the sum of the energies of all frequency bands of the first audio frame;
if that sum is greater than the crackling sound peak, updating the crackling sound peak to that sum;
taking the audio frame following the first audio frame as the new first audio frame, and returning to the step of calculating the sum of the energies of all frequency bands of the first audio frame and the subsequent steps, until a preset end condition is satisfied.
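The loop above is a running maximum over per-frame band-energy sums. A minimal sketch follows; the preset end condition is not specified in the text, so this version assumes it is "a fixed number of frames examined, or the end of the signal", and the name `max_lookahead` is hypothetical.

```python
def track_crackle_peak(band_energies, start, max_lookahead):
    """Return the crackling sound peak found from frame `start` onward.

    band_energies[n] is the list of band energies of frame n.
    The peak starts as the band-energy sum of the current frame and is
    raised whenever a later frame has a larger sum. The loop stops after
    max_lookahead frames or at the end of the signal (assumed end condition).
    """
    peak = sum(band_energies[start])
    end = min(len(band_energies), start + 1 + max_lookahead)
    for n in range(start + 1, end):
        total = sum(band_energies[n])
        if total > peak:
            peak = total
    return peak
```

Searching forward matters because the frame that first triggers crackle detection is typically the onset, and the energy can keep rising for a few frames before it begins to decay.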
In a possible implementation of the first aspect, the audio frame includes at least one frequency band, the spectral parameters include the energy of the frequency band, and determining that the sound signal is a glass breaking sound, if the attenuation relative to the crackling sound peak of the energy of each of the N_p audio frames after the current audio frame is within the preset attenuation range, includes:
if the attenuation, relative to the crackling sound peak, of the energy of each of the N_p audio frames after the current audio frame is within the preset attenuation range, and the sound signal satisfies a first preset condition, determining that the sound signal is a glass breaking sound; the first preset condition includes any one or more of the following: the duration of the sound signal is within a preset duration; the spectral parameters of the sound signal conform to preset spectral characteristics; the energy difference between frequency bands of the sound signal is within a first preset range; the difference between target frequencies of the sound signal is within a second preset range, where a target frequency is the frequency corresponding to the energy peak of a frequency band.
In a possible implementation of the first aspect, the duration of the sound signal being within the preset duration includes:
calculating the decay speed and/or decay time of the frequency band energy of the sound signal relative to the crackling sound peak;
if the decay speed is within a preset speed range and/or the decay time is within a preset time range, determining that the duration of the sound signal is within the preset duration.
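The text does not define decay time or decay speed precisely. One plausible reading, sketched below under that assumption, is: decay time is the time until the energy first falls a fixed number of decibels below the peak, and decay speed is the drop divided by that time. The function names, the decibel representation, and the choice to require both ranges (the text allows "and/or") are hypothetical.

```python
def decay_time_and_speed(energies_db, peak_index, drop_db, frame_seconds):
    """Find the first frame after the peak that is at least drop_db lower.

    Returns (decay_time_seconds, decay_speed_db_per_second), or None if the
    energy never falls that far.
    """
    peak_db = energies_db[peak_index]
    for n in range(peak_index + 1, len(energies_db)):
        drop = peak_db - energies_db[n]
        if drop >= drop_db:
            elapsed = (n - peak_index) * frame_seconds
            return elapsed, drop / elapsed
    return None

def duration_ok(result, time_range, speed_range):
    """True if both decay time and decay speed fall inside their preset ranges."""
    if result is None:
        return False
    elapsed, speed = result
    return (time_range[0] <= elapsed <= time_range[1]
            and speed_range[0] <= speed <= speed_range[1])
```

Checking only one of the two ranges would be an equally valid reading of the "and/or" wording.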
In a possible implementation of the first aspect, the spectral parameters of the sound signal conforming to the preset spectral characteristics includes:
determining the number of audio frames in the sound signal whose frequency band energies satisfy a second preset relational expression;
if the number of audio frames satisfying the second preset relational expression and the duration of the sound signal satisfy a third preset relational expression, determining that the energy of a first preset frequency band of the sound signal conforms to the preset spectral characteristics.
In a possible implementation of the first aspect, the energy difference between frequency bands of the sound signal being within the first preset range includes:
if, in the sound signal, the frequency band energy of the n-th audio frame and the frequency band energy of the (n-1)-th audio frame satisfy a fourth preset relational expression,
or the frequency band energy of the n-th audio frame and the frequency band energy of the (n-2)-th audio frame satisfy a fifth preset relational expression, updating a set second count value, where n is an integer greater than 2 and M ≥ n;
if the second count value and the duration of the sound signal satisfy a sixth preset relational expression, determining that the energy difference between frequency bands of the sound signal is within the first preset range.
In a possible implementation of the first aspect, the difference between target frequencies of the sound signal being within the second preset range includes:
if, in the sound signal, the difference between the target frequency of the i-th frequency band of the n-th audio frame and the target frequency of the i-th frequency band of the (n-1)-th audio frame is within a preset difference range,
or the difference between the target frequency of the i-th frequency band of the n-th audio frame and the target frequency of the i-th frequency band of the (n-2)-th audio frame is within the preset difference range, updating a set third count value, where n is an integer greater than 2, M ≥ n, and i is an integer greater than 0;
if the third count value and the duration of the sound signal satisfy a seventh preset relational expression, determining that the difference between target frequencies of the sound signal is within the second preset range.
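The target-frequency check can be sketched as a counter over frames. Two details are assumptions made only for this illustration: the preset difference range is taken to be a symmetric tolerance `max_diff_hz`, and the seventh preset relational expression is taken to be "the count is at least a given fraction of the number of frames". All names are hypothetical.

```python
def count_stable_peak_frequencies(peak_freqs, band, max_diff_hz):
    """Compute an assumed form of the third count value for one band.

    peak_freqs[n][i] is the target frequency (frequency of the energy peak)
    of band i in frame n. Frames are zero-indexed here, so index 2 is the
    third frame, matching "n greater than 2" in one-based numbering.
    """
    count = 0
    for n in range(2, len(peak_freqs)):
        current = peak_freqs[n][band]
        near_prev = abs(current - peak_freqs[n - 1][band]) <= max_diff_hz
        near_prev2 = abs(current - peak_freqs[n - 2][band]) <= max_diff_hz
        if near_prev or near_prev2:
            count += 1
    return count

def frequencies_are_stable(count, num_frames, min_fraction):
    """Assumed seventh relation: count must reach min_fraction of all frames."""
    return count >= min_fraction * num_frames
```

Comparing against both of the two preceding frames tolerates a single outlier frame without resetting the evidence that the band's peak frequency is steady.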
A second aspect of the embodiments of the present application provides a glass breaking sound detection apparatus, including:
an acquisition module, configured to acquire a sound signal and divide it into frames to obtain M audio frames, where M is an integer greater than 0;
a calculation module, configured to, if the current audio frame is determined to be a crackling sound according to its spectral parameters, determine a crackling sound peak according to the spectral parameters of the current audio frame and of the audio frames after it;
a determination module, configured to determine that the sound signal is a glass breaking sound if the attenuation, relative to the crackling sound peak, of the energy of each of the N_p audio frames after the current audio frame is within a preset attenuation range, where N_p is an integer greater than 0 and M > N_p.
In a possible implementation of the second aspect, the calculation module includes a first calculation unit, and the first calculation unit is configured to:
if the current audio frame is determined to be a crackling sound according to its spectral parameters, and white noise is detected in the N_B audio frames after the current audio frame, determine the crackling sound peak according to the spectral parameters of the current audio frame and of the audio frames after it, where N_B is an integer greater than 0 and M > N_B.
In a possible implementation of the second aspect, the audio frame includes at least one frequency band, the spectral parameters include the energy of the frequency band, and the first calculation unit is specifically configured to: if a target audio frame exists among the N_B audio frames after the current audio frame, determine a calculation formula for a first count value, where the target audio frame is an audio frame whose frequency band energies satisfy a first preset relational expression, and the calculation formula for the first count value corresponds to the first preset relational expression;
determine the first count value according to its calculation formula, and if the first count value is greater than a preset first threshold, determine that the target audio frame is white noise.
In a possible implementation of the second aspect, the audio frame includes at least one subframe and at least one frequency band, the spectral parameters include the energy of the subframe and the energy of the frequency band, the calculation module further includes a second calculation unit, and the second calculation unit is specifically configured to:
if the maximum energy among the subframes of the current audio frame is greater than a preset second threshold, then:
determine the energy ratio of the current audio frame according to the energy of its subframes;
determine the energy ratio of the current audio frame to the previous audio frame according to the frequency band energy of the current audio frame and the frequency band energy of the previous audio frame;
determine the energy difference between the current audio frame and the previous audio frame according to the subframe energy of the current audio frame and the subframe energy of the previous audio frame;
determine the energy ratio of the previous audio frame according to the energy of its subframes;
if the energy ratio of the current audio frame to the previous audio frame is greater than a first preset value, and the energy ratio of the current audio frame is greater than a second preset value;
or, if the energy ratio of the current audio frame to the previous audio frame is greater than the first preset value, and the energy ratio of the previous audio frame is greater than the second preset value;
or, if the energy ratio of the current audio frame to the previous audio frame is greater than a third preset value, and the energy ratio of the current audio frame is greater than a fourth preset value;
or, if the energy ratio of the current audio frame to the previous audio frame is greater than the third preset value, and the energy ratio of the previous audio frame is greater than the fourth preset value;
or, if the energy difference between the current audio frame and the previous audio frame is greater than a fifth preset value, the energy ratio of the current audio frame to the previous audio frame is greater than a sixth preset value, and the energy ratio of the current audio frame is greater than a seventh preset value;
or, if the energy difference between the current audio frame and the previous audio frame is greater than the fifth preset value, the energy ratio of the current audio frame to the previous audio frame is greater than the sixth preset value, and the energy ratio of the previous audio frame is greater than the seventh preset value;
then determine that the current audio frame is a crackling sound.
In a possible implementation of the second aspect, the audio frame includes at least one frequency band, the spectral parameters include the energy of the frequency band, the calculation module further includes a third calculation unit, and the third calculation unit is specifically configured to:
take the sum of the energies of all frequency bands of the current audio frame as the crackling sound peak;
take the audio frame following the current audio frame as a first audio frame;
calculate the sum of the energies of all frequency bands of the first audio frame;
if that sum is greater than the crackling sound peak, update the crackling sound peak to that sum;
take the audio frame following the first audio frame as the new first audio frame, and return to the step of calculating the sum of the energies of all frequency bands of the first audio frame and the subsequent steps, until a preset end condition is satisfied.
In a possible implementation of the second aspect, the audio frame includes at least one frequency band, the spectral parameters include the energy of the frequency band, and the determination module is specifically configured to:
if the attenuation, relative to the crackling sound peak, of the energy of each of the N_p audio frames after the current audio frame is within the preset attenuation range, and the sound signal satisfies a first preset condition, determine that the sound signal is a glass breaking sound; the first preset condition includes any one or more of the following: the duration of the sound signal is within a preset duration; the spectral parameters of the sound signal conform to preset spectral characteristics; the energy difference between frequency bands of the sound signal is within a first preset range; the difference between target frequencies of the sound signal is within a second preset range, where a target frequency is the frequency corresponding to the energy peak of a frequency band.
In a possible implementation of the second aspect, the determination module is further configured to:
calculate the decay speed and/or decay time of the frequency band energy of the sound signal relative to the crackling sound peak;
if the decay speed is within a preset speed range and/or the decay time is within a preset time range, determine that the duration of the sound signal is within the preset duration.
In a possible implementation of the second aspect, the determination module is further configured to:
determine the number of audio frames in the sound signal whose frequency band energies satisfy a second preset relational expression;
if the number of audio frames satisfying the second preset relational expression and the duration of the sound signal satisfy a third preset relational expression, determine that the energy of a first preset frequency band of the sound signal conforms to the preset spectral characteristics.
In a possible implementation of the second aspect, the determining module is further configured to:
if, in the sound signal, the energy of the frequency band of the n-th audio frame and the energy of the frequency band of the (n-1)-th audio frame satisfy a fourth preset relational expression,
or the energy of the frequency band of the n-th audio frame and the energy of the frequency band of the (n-2)-th audio frame satisfy a fifth preset relational expression, update a set second count value, where n is an integer greater than 2 and M≥n;
if the second count value and the duration of the sound signal satisfy a sixth preset relational expression, determine that the energy difference between the frequency bands of the sound signal is within the first preset range.
In a possible implementation of the second aspect, the determining module is further configured to:
if, in the sound signal, the difference between the target frequency of the i-th frequency band of the n-th audio frame and the target frequency of the i-th frequency band of the (n-1)-th audio frame is within a preset difference range,
or the difference between the target frequency of the i-th frequency band of the n-th audio frame and the target frequency of the i-th frequency band of the (n-2)-th audio frame is within the preset difference range, update a set third count value, where n is an integer greater than 2, M≥n, and i is an integer greater than 0;
if the third count value and the duration of the sound signal satisfy a seventh preset relational expression, determine that the difference between the target frequencies of the sound signal is within the second preset range.
A third aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the glass breaking sound detection method according to the first aspect.
A fourth aspect of the embodiments of the present application provides a glass breaking sound detection system, including a sound collection device, an alarm device, and the electronic device according to the third aspect, where the electronic device is communicatively connected to the sound collection device and the alarm device.
A fifth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the glass breaking sound detection method according to the first aspect.
A sixth aspect of the embodiments of the present application provides a computer program product that, when run on an electronic device, causes the electronic device to execute the glass breaking sound detection method according to the first aspect.
Compared with the prior art, the embodiments of the present application have the following beneficial effects. A sound signal is acquired and divided into M audio frames. If the current audio frame is determined to be a crackling sound according to its spectral parameters, the crackling sound peak is determined according to the spectral parameters of the current audio frame and of the audio frames after it. If the attenuation, relative to the crackling sound peak, of the energy of each of the Np audio frames following the current audio frame is within the preset attenuation range, the sound signal is determined to be a glass breaking sound. Short pops such as clapping, knocking, or rain are also detected as crackling sounds, but their energy decays faster than that of a glass breaking sound. Therefore, once a crackling sound is detected, the attenuation of the energy of the following Np audio frames relative to the crackling sound peak is further calculated, and the sound signal is determined to be a glass breaking sound only when every attenuation is within the preset range. Because the case where the crackling sound is a short pop can be excluded, the accuracy of glass breaking sound detection improves. Because the energy attenuation of an audio frame can be determined from its spectral parameters alone, the computation is also simpler than that of processing methods that use a neural network algorithm to determine whether glass has broken.
Description of the Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below.
FIG. 1 is a schematic diagram of a glass breaking sound detection system provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a sound collection device provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of an implementation of the glass breaking sound detection method provided by an embodiment of the present application;
FIG. 4 is a flowchart of detecting whether a sound signal is a short pop, provided by an embodiment of the present application;
FIG. 5 is a flowchart of detecting whether the current audio frame is a crackling sound, provided by an embodiment of the present application;
FIG. 6 is a flowchart of white noise detection provided by an embodiment of the present application;
FIG. 7 is a flowchart of a glass breaking sound detection method provided by another embodiment of the present application;
FIG. 8 is a flowchart of determining whether a sound signal is within a preset duration, provided by an embodiment of the present application;
FIG. 9 is a flowchart of determining whether the spectral parameters of a sound signal conform to preset spectral characteristics, provided by an embodiment of the present application;
FIG. 10 is a flowchart of determining whether the energy difference between frequency bands of a sound signal is within a first preset range, provided by an embodiment of the present application;
FIG. 11 is a flowchart of determining whether the difference between target frequencies of a sound signal is within a second preset range, provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of a glass breaking sound detection apparatus provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.
The technical solutions described in the present application are illustrated below through specific embodiments.
It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
In addition, in the description of the present application, the terms "first", "second", "third", and so on are used only to distinguish descriptions and should not be understood as indicating or implying relative importance.
Existing methods for detecting glass breaking from a sound signal are generally either neural-network-based processing methods or methods that decide whether a sound is a glass breaking sound according to the strength of the sound signal. Neural-network-based methods are computationally complex, and methods based on signal strength detect inaccurately. For this reason, the embodiments of the present application provide a glass breaking sound detection method that improves detection accuracy while remaining computationally simple.
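The glass breaking sound detection method provided by the embodiments of the present application is applied to a glass breaking sound detection system. As shown in FIG. 1, the system includes a sound collection device 100, an electronic device 200, and an alarm device 300, and the electronic device 200 is communicatively connected to the sound collection device 100 and the alarm device 300. The sound collection device 100, the electronic device 200, and the alarm device 300 may be integrated in a single device or may be mutually independent devices. If the electronic device 200 is an independent device, it may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server.
As shown in FIG. 2, in a possible implementation, the sound collection device 100 includes a microphone 11, an amplifier 12, a filter 13, and an analog-to-digital converter 14. The microphone 11 collects sound, converts it into a current signal, and feeds the current signal to the amplifier 12. The amplifier 12 adjusts the current signal according to a set amplification ratio and feeds the adjusted signal to the filter 13. The filter 13 performs signal enhancement, signal equalization, and filtering on the adjusted signal and feeds the filtered signal to the analog-to-digital converter 14. The filtered signal is an analog signal; the analog-to-digital converter 14 converts it into a digital signal according to a set sampling frequency and bit depth and outputs the digital signal to the electronic device 200. The electronic device 200 then converts the digital signal into an analog signal, which is the sound signal referred to below.
The electronic device 200 determines whether the received sound signal is a glass breaking sound. If so, it generates a glass breaking alarm and sends the alarm to the alarm device 300, and the alarm device 300 emits sound or light according to the alarm to notify the user that glass has broken.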
As shown in FIG. 3, the glass breaking sound detection method provided by an embodiment of the present application includes:
S101: Acquire a sound signal and divide it into frames to obtain M audio frames, where M is an integer greater than 0.
The duration of each audio frame may be 30 ms, and a Hamming window may be used to frame the sound signal. Because a Hamming window reduces information loss, framing the sound signal with a Hamming window reduces the spectral distortion of the sound signal.
In a possible implementation, after the sound signal is framed, spectrum compensation is applied to the audio frames to compensate for distortion of the sound signal, so that the amplitude response of the processed sound signal tends to be similar in every frequency band.
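The framing step in S101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the 8000 Hz sample rate used in the usage note, and the choice of non-overlapping frames are all assumptions, since the text specifies only the 30 ms frame length and the Hamming window.

```python
import math

def hamming(n_samples):
    """Hamming window: w[k] = 0.54 - 0.46 * cos(2*pi*k / (N - 1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n_samples - 1))
            for k in range(n_samples)]

def split_into_frames(samples, sample_rate, frame_ms=30):
    """Split samples into windowed frames of frame_ms milliseconds.

    Assumption: frames do not overlap and a trailing partial frame is
    dropped; the source text does not say how frame boundaries are chosen.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 1:
        raise ValueError("sample_rate too low for the requested frame length")
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

For example, `split_into_frames(signal, 8000)` yields frames of 240 samples each; M is then `len(frames)`.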
S102: If the current audio frame is determined to be a crackling sound according to its spectral parameters, determine the crackling sound peak according to the spectral parameters of the current audio frame and the spectral parameters of the audio frames after it.
Specifically, the spectral parameters of an audio frame are obtained by applying a Fourier transform to the frame and by framing it again into subframes. An audio frame is a time-domain signal; applying a Fourier transform to it yields the corresponding frequency-domain signal. The frequency-domain signal is divided into several frequency bands, and the amplitudes of the frequencies within each band are averaged to obtain the energy of that band. From the divided bands, the peak value of each band and the frequency corresponding to that peak can also be determined. Framing the audio frame again yields at least one subframe, and taking the root mean square of the time-domain waveform data of each subframe gives the energy of that subframe. The band energies, the band peak values and their corresponding frequencies, and the subframe energies are all spectral parameters.
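The spectral parameters described above can be computed roughly as in the sketch below. It is illustrative only: the function names and the band edges are hypothetical, and a naive O(N^2) discrete Fourier transform stands in for whatever transform an actual implementation would use.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags

def band_parameters(frame, sample_rate, band_edges_hz):
    """Per band: (energy, peak amplitude, frequency of the peak in Hz).

    Band energy is the mean amplitude over the band, as in the text.
    band_edges_hz is a list of (low, high) pairs; the values are assumptions.
    """
    mags = magnitude_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    params = []
    for low, high in band_edges_hz:
        bins = [k for k in range(len(mags)) if low <= k * bin_hz < high]
        if not bins:
            params.append((0.0, 0.0, None))
            continue
        energy = sum(mags[k] for k in bins) / len(bins)
        peak_bin = max(bins, key=lambda k: mags[k])
        params.append((energy, mags[peak_bin], peak_bin * bin_hz))
    return params

def subframe_energies(frame, n_subframes=4):
    """Root mean square of each subframe's time-domain samples."""
    size = len(frame) // n_subframes
    if size < 1:
        raise ValueError("frame too short for the requested subframe count")
    return [math.sqrt(sum(x * x for x in frame[i * size:(i + 1) * size]) / size)
            for i in range(n_subframes)]
```

The third element of each tuple corresponds to what the summary calls the target frequency of a band.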
In a possible implementation, if the maximum subframe energy of the current audio frame is detected to be greater than a preset value, the current audio frame is determined to be a crackling sound. After that, the sum of the band energies of the current audio frame may be taken as the crackling sound peak. Alternatively, after the sum of the band energies of the current audio frame is taken as the crackling sound peak, the audio frame following the current audio frame is taken as the first audio frame, and the sum of the band energies of the first audio frame is calculated. If this sum is greater than the crackling sound peak, the crackling sound peak is updated to this sum. The audio frame following the first audio frame is then taken as the first audio frame, and the process returns to the step of calculating the sum of the band energies of the first audio frame and the subsequent steps, until a preset end condition is satisfied. The preset end condition may be that a preset duration has elapsed, or that the sum of the band energies of the first audio frame falls within a preset energy range.
S103: If the attenuation, relative to the crackling sound peak, of the energy of each of the Np audio frames following the current audio frame is within the preset attenuation range, determine that the sound signal is a glass breaking sound, where Np is an integer greater than 0 and M>Np.
Specifically, the attenuation may be calculated from the ratio or the difference between the energy of each of the Np audio frames following the current audio frame and the crackling sound peak. Np is a preset empirical value, for example, Np=4.
In a possible implementation, starting from the audio frame after the current audio frame, the ratio of the sum of the band energies of each audio frame to the crackling sound peak is calculated. If the ratio is greater than a preset ratio, the energy attenuation of that audio frame is within the preset attenuation range. If, for every one of the Np audio frames following the current audio frame, the ratio of its energy sum to the crackling sound peak is greater than the preset ratio, the sound signal is determined to be a glass breaking sound.
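The ratio-based check in S103 reduces to a short predicate. The sketch below assumes the per-frame band energy sums have already been computed; the function name and the default `min_ratio` value are hypothetical, since the text gives no example for the preset ratio.

```python
def decays_slowly_enough(frame_energy_sums, crackle_peak, n_p=4, min_ratio=0.1):
    """Return True if each of the first n_p frames after the crackling frame
    keeps an energy-to-peak ratio above min_ratio.

    frame_energy_sums: band energy sums of the frames after the crackling
    frame, in order. min_ratio stands in for the "preset ratio" (assumed value).
    """
    if crackle_peak <= 0 or len(frame_energy_sums) < n_p:
        return False
    return all(e / crackle_peak > min_ratio for e in frame_energy_sums[:n_p])
```

A short pop such as a clap would typically fail this check within the first few frames, because its energy falls to a small fraction of the peak almost immediately.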
In the above embodiment, short pops such as clapping, knocking, squeezing a bottle, mahjong tiles, a coin hitting the floor, or rain are also detected as crackling sounds, but their energy decays faster than that of a glass breaking sound. Therefore, when a crackling sound is detected, the attenuation of the energy of the Np audio frames following the current audio frame relative to the crackling sound peak is further calculated. If every attenuation is within the preset range, the sound signal is determined to be a glass breaking sound; otherwise, it is not. This excludes the case where the crackling sound is a short pop and improves the accuracy of glass breaking sound detection. Because the energy attenuation of an audio frame can be determined from its spectral parameters, the computation is also simpler than that of methods that detect glass breaking sound with a neural network algorithm.
In a possible implementation, the flow for detecting whether the sound signal is a short pop is shown in FIG. 4. First, the sum of the band energies of the current audio frame and the maximum of its band energies are calculated, and either the sum or the maximum is set as the crackling sound peak. After the crackling sound peak is set, the first audio frame after the current audio frame is taken as the first audio frame, and a frame count value FrameCount is set with an initial value of 1. For the first audio frame, it is first determined whether FrameCount≤Np holds. If FrameCount≤Np does not hold, detection proceeds to the next stage, for example, determining the glass breaking sound or continuing to examine other spectral parameters of the sound signal. If FrameCount≤Np holds, FrameCount is updated to FrameCount+1, and the sum Efs of the band energies of the first audio frame, or the maximum Efmax of its band energies, is calculated. For example, if the crackling sound peak was set to the sum of the band energies of the current audio frame, the sum of the band energies of the first audio frame is calculated; if the crackling sound peak was set to the maximum band energy of the current audio frame, the maximum band energy of the first audio frame is calculated.
Take the case where the crackling sound peak is the sum of the band energies of the current audio frame. If the sum of the band energies of the first audio frame is greater than the crackling sound peak, a new crackling sound peak has been detected: the crackling sound peak is updated to the sum of the band energies of the first audio frame, and the position of the new peak, that is, the current FrameCount, is recorded. After the peak is updated, the audio frame following the first audio frame (the second audio frame) is taken as the first audio frame and the detection process above is repeated, that is, the flow returns to the step of determining whether FrameCount≤Np holds and the subsequent steps. If the sum of the band energies of the second audio frame is greater than the crackling sound peak, the peak is updated again to that sum, the audio frame following the second audio frame is taken as the first audio frame, and the detection process is repeated.
In another possible implementation, to avoid identifying an abnormal signal as the crackling sound peak, whether to update the crackling sound peak may instead be decided by the formula Efs>Ef_max*Ar and Efs*As>Ef_min, where "*" denotes multiplication, Efs is the sum of the band energies of the first audio frame, Ef_max is the currently set crackling sound peak, Ef_min is the set minimum value of the crackling sound peak, and Ar and As are set constants, for example, 0.8≤Ar<1 and As≤0.7. If Efs satisfies this formula, a new crackling sound peak has been detected.
If the sum of the band energies of the first audio frame is less than or equal to the crackling sound peak, no new peak has been detected, and the attenuation of this sum relative to the crackling sound peak is calculated. If the attenuation is within the preset attenuation range, the first audio frame is not decaying quickly; the next audio frame is taken as the first audio frame and the detection process above is repeated, that is, the flow returns to the step of determining whether FrameCount≤Np holds and the subsequent steps. If the attenuation is not within the preset attenuation range, the energy of the first audio frame is decaying too fast; the first audio frame is determined to be a short pop, and the flow returns to the initial detection state, that is, it waits again to acquire a sound signal.
Whether the energy of the first audio frame decays too fast may be determined by the formula Efs*Decay_i<Ef_max, where Decay_i is the energy decay coefficient corresponding to the first audio frame, i=FrameCount, and Decay_i increases as FrameCount increases, for example, Decay_i=FrameCount-2. If Efs satisfies Efs*Decay_i<Ef_max, the energy of the first audio frame decays too fast; otherwise, it does not.
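The FIG. 4 flow (frame counting, peak update, and the Efs*Decay_i<Ef_max test) can be put together as in the sketch below. It assumes the peak is the sum of band energies and that i is the value of FrameCount after it has been incremented. The `decay` function is a hypothetical stand-in that merely increases with FrameCount, not the coefficient from the text, and the function name and return labels are likewise illustrative.

```python
def check_short_pop(following_energies, initial_peak, n_p=4,
                    decay=lambda frame_count: float(frame_count)):
    """Walk the frames after a crackling frame and look for fast decay.

    following_energies: Efs (band energy sum) of each following frame.
    decay: hypothetical coefficient Decay_i, increasing with FrameCount.
    Returns ("short_pop", peak) if a frame decays too fast, otherwise
    ("continue", peak) so the caller can move on to the next stage.
    """
    peak = initial_peak
    frame_count = 1
    for efs in following_energies:
        if frame_count > n_p:        # FrameCount <= Np no longer holds
            break
        frame_count += 1             # FrameCount := FrameCount + 1
        if efs > peak:               # new crackling sound peak detected
            peak = efs
            continue
        if efs * decay(frame_count) < peak:   # Efs * Decay_i < Ef_max
            return "short_pop", peak
    return "continue", peak
```

With the default coefficient, `check_short_pop([10, 1, 0, 0], 100)` stops at the first frame, while a slowly decaying sequence such as `[90, 70, 50, 40]` passes all Np frames.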
In the above embodiment, the band energies of each audio frame are calculated to determine whether a new crackling sound peak has appeared, and the crackling sound peak is updated if so. This improves the accuracy of the determined crackling sound peak, which in turn improves the accuracy of the calculated energy attenuation and of short pop detection.
In a possible implementation, the flow for detecting whether the current audio frame is a crackling sound is shown in FIG. 5. First, the maximum subframe energy E_Max(n) of the current audio frame is calculated. If E_Max(n) is less than a preset second threshold E_MAX, the flow returns to examine the next audio frame. If E_Max(n) is greater than E_MAX, the following quantities are determined: the energy ratio E_Ratio(n) of the current audio frame, from the subframe energies of the current audio frame; the energy ratio E_Ratio(n-1) of the previous audio frame, from the subframe energies of the previous audio frame; the energy ratio Ef_Ratio(n) between the current audio frame and the previous audio frame, from the band energies of the two frames; and the energy difference E_Dist(n) between the current audio frame and the previous audio frame, from the subframe energies of the two frames.
Taking an audio frame with four subframes as an example, the energy ratio of the current audio frame is E_Ratio(n) = MAX(E_j(n) / E_{j-1}(n)) for j = 1 to 4, where E_j(n) is the energy of the j-th subframe of the current audio frame, E_j(n-1) is the energy of the j-th subframe of the previous audio frame, E_j(n) / E_{j-1}(n) is the ratio of the energy of a subframe to the energy of the subframe before it, MAX takes the maximum value, and E_{j-1}(n)>0. The energy ratio E_Ratio(n-1) of the previous audio frame is calculated in the same way. The energy ratio between the current audio frame and the previous audio frame is Ef_Ratio(n) = Efh(n) / Efh(n-1), where Efh(n) is the energy of the highest frequency band in the current audio frame, Efh(n-1) is the energy of the highest frequency band in the previous audio frame, and Efh(n-1)>0. The energy difference between the current audio frame and the previous audio frame is E_Dist(n) = MAX(E_j(n) - E_j(n-1)) for j = 1 to 4.
If the energy ratio Ef_Ratio(n) between the current audio frame and the previous audio frame is greater than a first preset value Ef_RATIO_THRD_1, and the energy ratio E_Ratio(n) of the current audio frame is greater than a second preset value E_RATIO_THRD_1;
or, if Ef_Ratio(n) is greater than the first preset value Ef_RATIO_THRD_1, and the energy ratio E_Ratio(n-1) of the previous audio frame is greater than the second preset value E_RATIO_THRD_1;
or, if Ef_Ratio(n) is greater than a third preset value Ef_RATIO_THRD_2, and E_Ratio(n) is greater than a fourth preset value E_RATIO_THRD_2;
or, if Ef_Ratio(n) is greater than the third preset value Ef_RATIO_THRD_2, and E_Ratio(n-1) is greater than the fourth preset value E_RATIO_THRD_2;
or, if the energy difference E_Dist(n) between the current audio frame and the previous audio frame is greater than a fifth preset value ED_MAX, Ef_Ratio(n) is greater than a sixth preset value Ef_RATIO_THRD_3, and E_Ratio(n) is greater than a seventh preset value E_RATIO_THRD_3;
or, if E_Dist(n) is greater than the fifth preset value ED_MAX, Ef_Ratio(n) is greater than the sixth preset value Ef_RATIO_THRD_3, and E_Ratio(n-1) is greater than the seventh preset value E_RATIO_THRD_3;
then the current audio frame is determined to be a crackling sound. Otherwise, the audio frame following the current audio frame is taken as the current audio frame and the detection process above is repeated, that is, the flow returns to the step of calculating the maximum subframe energy of the current audio frame and the subsequent steps.
Here, E0(n)=E4(n-1), that is, the energy of the 0th subframe of the current audio frame is the energy of the 4th subframe of the previous audio frame. E_RATIO_THRD_1, E_RATIO_THRD_2, E_RATIO_THRD_3, Ef_RATIO_THRD_1, Ef_RATIO_THRD_2, Ef_RATIO_THRD_3 and ED_MAX are parameters preset according to experience. For example, E_MAX=5000, E_RATIO_THRD_1=10, E_RATIO_THRD_2=20, E_RATIO_THRD_3=5, Ef_RATIO_THRD_1=40, Ef_RATIO_THRD_2=20, Ef_RATIO_THRD_3=10.
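The seven alternative conditions above collapse into three rules, because each threshold pair may be met by either the current or the previous frame's energy ratio. The following Python sketch illustrates that reading. The function and class names are hypothetical, the energy ratios and E_Dist(n) are assumed to have been computed beforehand, and the default for ED_MAX borrows the value 5000 that the example lists under the name E_MAX, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrackleThresholds:
    """Example values from the text; names follow the preset parameters."""
    ef_ratio_thrd_1: float = 40
    e_ratio_thrd_1: float = 10
    ef_ratio_thrd_2: float = 20
    e_ratio_thrd_2: float = 20
    ef_ratio_thrd_3: float = 10
    e_ratio_thrd_3: float = 5
    # Assumption: the example gives "E_MAX=5000"; it is used here for ED_MAX.
    ed_max: float = 5000


def is_crackle(ef_ratio, e_ratio_cur, e_ratio_prev, e_dist,
               t=CrackleThresholds()):
    """Return True if any of the seven alternative conditions holds.

    ef_ratio     -- Ef_Ratio(n), energy ratio between current and previous frame
    e_ratio_cur  -- energy ratio of the current frame
    e_ratio_prev -- energy ratio of the previous frame
    e_dist       -- E_Dist(n), energy difference between the two frames
    """
    rule_1 = ef_ratio > t.ef_ratio_thrd_1 and (
        e_ratio_cur > t.e_ratio_thrd_1 or e_ratio_prev > t.e_ratio_thrd_1)
    rule_2 = ef_ratio > t.ef_ratio_thrd_2 and (
        e_ratio_cur > t.e_ratio_thrd_2 or e_ratio_prev > t.e_ratio_thrd_2)
    rule_3 = e_dist > t.ed_max and ef_ratio > t.ef_ratio_thrd_3 and (
        e_ratio_cur > t.e_ratio_thrd_3 or e_ratio_prev > t.e_ratio_thrd_3)
    return rule_1 or rule_2 or rule_3
```

For instance, `is_crackle(50, 12, 0, 0)` is accepted by the first rule, while `is_crackle(25, 15, 15, 0)` fails all three.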
In the above embodiment, when the maximum subframe energy of the current audio frame is detected to exceed the preset second threshold, four quantities are further calculated: the energy ratio of the current audio frame, the energy ratio of the previous audio frame, the energy ratio between the current audio frame and the previous audio frame, and the energy difference between the current audio frame and the previous audio frame. These quantities are then used to decide whether the current audio frame is a crackling sound, which improves the accuracy of crackling sound detection.
In a possible implementation, after determining from the spectral parameters of the current audio frame that the current audio frame is a crackling sound, the electronic device determines whether white noise exists in the NB audio frames following the current audio frame, where NB is a preset empirical value, for example, NB=3. If white noise is detected in the NB audio frames after the current audio frame, the attenuation of the energy of the audio frames after the current audio frame relative to the crackling sound peak is further calculated, thereby further improving the accuracy of glass breaking sound detection.
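In a possible implementation, the white noise detection flow is shown in FIG. 6. A frame count value FrameCount is set for the audio frames, with an initial value of 1, and the first audio frame after the current audio frame is taken as the first audio frame. For the first audio frame, it is first judged whether FrameCount≤NB holds. If it holds, FrameCount is updated to FrameCount+1 and the spectral parameters of the first audio frame are calculated. In this embodiment of the present application, the spectral parameters of the first audio frame are the energies of its frequency bands. Specifically, the first audio frame is divided into several frequency bands according to a preset band division rule. For example, with the highest frequency set to 8000 Hz and the sampling frequency set to 16000 Hz, the first audio frame is divided into 10 frequency bands, Ef0 through Ef9, covering the following ranges: Ef0: 0~500 Hz, Ef1: 500~1000 Hz, Ef2: 1000~1500 Hz, Ef3: 1500~2000 Hz, Ef4: 2000~2500 Hz, Ef5: 2500~3500 Hz, Ef6: 3500~4500 Hz, Ef7: 4500~5500 Hz, Ef8: 5500~6500 Hz, Ef9: 6500~8000 Hz. Ef9 is the highest band Efh.
After the energy of each band is calculated, a first count value WhiteNoiseDecisionCnt is set, with an initial value of 0. If the band energies of the first audio frame satisfy the first preset relational expression, the first audio frame is taken as the target audio frame, and the first count value is determined according to the calculation formula of the first calculated value. If the first count value is less than the first threshold WNDC, the next audio frame after the first audio frame is taken as the first audio frame and the above detection process is repeated, that is, the method returns to the step of judging whether FrameCount≤NB holds and the subsequent steps. If the first count value is greater than the first threshold WNDC, WhiteNoiseDetected=WhiteNoiseDetected+1 is recorded, and it is then judged whether WhiteNoiseDetected>0 holds. If it holds, the first audio frame is white noise and detection continues to the next stage, namely calculating the attenuation of the energy of the audio frames after the current audio frame relative to the crackling sound peak. If WhiteNoiseDetected>0 does not hold, the method returns to the initial detection state. If the FrameCount corresponding to the first audio frame does not satisfy FrameCount≤NB, it is directly judged whether WhiteNoiseDetected>0 holds.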
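As an illustration of the example band split (10 bands up to 8000 Hz at a 16000 Hz sampling frequency), the following Python sketch sums the bins of a one-sided power spectrum into the bands Ef0 to Ef9. It assumes the power spectrum of the frame has already been computed; the function name and the way boundary bins are assigned (lower edge inclusive, upper edge exclusive, except for the last band) are assumptions and are not taken from the text.

```python
# Band edges in Hz for Ef0..Ef9, taken from the example in the text.
BAND_EDGES_HZ = [0, 500, 1000, 1500, 2000, 2500, 3500, 4500, 5500, 6500, 8000]


def band_energies(power_spectrum, sample_rate=16000):
    """Sum one-sided power-spectrum bins into the 10 example bands.

    power_spectrum -- |X[k]|^2 for k = 0..N/2 (N is the FFT length)
    Returns a list [Ef0, ..., Ef9]; Ef9 is the highest band Efh.
    """
    n_bins = len(power_spectrum)
    n_fft = 2 * (n_bins - 1)          # FFT length implied by a one-sided spectrum
    n_bands = len(BAND_EDGES_HZ) - 1
    energies = [0.0] * n_bands
    for k, power in enumerate(power_spectrum):
        freq = k * sample_rate / n_fft
        for b in range(n_bands):
            lo, hi = BAND_EDGES_HZ[b], BAND_EDGES_HZ[b + 1]
            is_last = b == n_bands - 1
            # Assumption: a bin on a shared edge belongs to the upper band;
            # the bin at exactly 8000 Hz belongs to the last band.
            if lo <= freq < hi or (is_last and freq == hi):
                energies[b] += power
                break
    return energies
```

With a 32-point FFT the bins are 500 Hz apart, so a flat spectrum of 17 unit bins puts one bin in each of Ef0 to Ef4, two in each of Ef5 to Ef8, and four in Ef9.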
Here, the first preset relational expression is any one or more of the following: (Ef_i > Ef_{i+1}*WN_VAR) and (Ef_{i+1} > Ef_i*WN_VAR), i=0,1,2,…,8; (Ef_i > Ef_{i+2}*WN_VAR) and (Ef_{i+2} > Ef_i*WN_VAR), i=0,1,2,…,7; (Ef_i > Ef_{i+3}*WN_VAR) and (Ef_{i+3} > Ef_i*WN_VAR), i=0,1,2,…,6; (Ef1+Ef2+Ef3+Ef4)*WN_VAR > Ef0; (Ef5+Ef6+Ef7+Ef8)*WN_VAR > Efh. Here Ef_i denotes the energy of band Efi.
In a possible implementation, the first calculated value WhiteNoiseDecisionCnt is calculated as follows: if (Ef_i > Ef_{i+1}*WN_VAR) and (Ef_{i+1} > Ef_i*WN_VAR), WhiteNoiseDecisionCnt is increased by N1; if (Ef_i > Ef_{i+2}*WN_VAR) and (Ef_{i+2} > Ef_i*WN_VAR), WhiteNoiseDecisionCnt is increased by N2; if (Ef_i > Ef_{i+3}*WN_VAR) and (Ef_{i+3} > Ef_i*WN_VAR), WhiteNoiseDecisionCnt is increased by N3; if (Ef1+Ef2+Ef3+Ef4)*WN_VAR > Ef0, WhiteNoiseDecisionCnt is increased by N4; if (Ef5+Ef6+Ef7+Ef8)*WN_VAR > Efh, WhiteNoiseDecisionCnt is increased by N5. WN_VAR, N1, N2, N3, N4, N5 and WNDC are all preset empirical values, for example, WN_VAR=0.5, N1=3, N2=2, N3=1, N4=3, N5=3, WNDC=7. Finally, the value of WhiteNoiseDecisionCnt is determined from its total increment, that is, the sum of the accumulated values of N1, N2, N3, N4 and N5.
It should be noted that, in other possible implementations, any one or more of the above calculation formulas may be used as the calculation formula of the first calculated value WhiteNoiseDecisionCnt.
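The count can be sketched in Python as follows, using the example values WN_VAR=0.5, N1=3, N2=2, N3=1, N4=3, N5=3 and WNDC=7. The sketch assumes that each band pair satisfying a relation contributes its own increment (the "accumulated value" reading of the text); the function names are hypothetical.

```python
WN_VAR = 0.5
N1, N2, N3, N4, N5 = 3, 2, 1, 3, 3
WNDC = 7


def white_noise_decision_cnt(ef):
    """Compute WhiteNoiseDecisionCnt from the 10 band energies Ef0..Ef9."""
    if len(ef) != 10:
        raise ValueError("expected 10 band energies Ef0..Ef9")
    cnt = 0
    # Pairwise flatness tests between bands 1, 2 and 3 positions apart.
    for offset, weight in ((1, N1), (2, N2), (3, N3)):
        for i in range(len(ef) - offset):
            a, b = ef[i], ef[i + offset]
            if a > b * WN_VAR and b > a * WN_VAR:
                cnt += weight
    # Group tests against the lowest band Ef0 and the highest band Efh.
    if sum(ef[1:5]) * WN_VAR > ef[0]:
        cnt += N4
    if sum(ef[5:9]) * WN_VAR > ef[9]:
        cnt += N5
    return cnt


def exceeds_white_noise_threshold(ef):
    """True if the count for this frame is greater than the threshold WNDC."""
    return white_noise_decision_cnt(ef) > WNDC
```

With WN_VAR=0.5 each pairwise test asks whether two band energies lie within a factor of two of each other, so a flat spectrum scores high and a steeply sloped spectrum scores low.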
In the above embodiment, because the relationship between the spectral energies reflects the energy distribution of an audio frame, detecting whether the audio frame is white noise from that relationship improves detection accuracy.
In other possible implementations, whether white noise exists in the audio frames following the current audio frame may also be determined according to whether the band energies of the current audio frame are within a preset energy range.
In another embodiment, after determining from the spectral parameters of the current audio frame that the current audio frame is a crackling sound, the electronic device determines whether white noise exists in the NB audio frames following the current audio frame. If white noise is detected in those NB audio frames, the attenuation of the energy of the audio frames after the current audio frame relative to the crackling sound peak is further calculated. If the attenuation of the energy of each of the Np audio frames after the current audio frame relative to the crackling sound peak is within the preset attenuation range, it is further judged whether the sound signal satisfies a first preset condition. If the sound signal satisfies the first preset condition, the sound signal is determined to be a glass breaking sound, which further improves the accuracy of glass breaking sound detection. The first preset condition includes any one or more of the following: the duration of the sound signal is within a preset duration; the spectral parameters of the sound signal conform to preset spectral characteristics; the energy difference between the frequency bands of the sound signal is within a first preset range; the difference between the target frequencies of the sound signal is within a second preset range, where a target frequency is the frequency corresponding to the energy peak of a frequency band.
Here, detecting whether the spectral parameters of the sound signal conform to the preset spectral characteristics serves to exclude sound signals that are special audio, such as human voices, wood knocking, drums, musical instruments, metal knocking, high-heeled footsteps, and rain. Detecting whether the energy difference between the frequency bands of the sound signal is within the first preset range serves to exclude sound signals with a fixed spectrum, such as air conditioners, fans, vacuum cleaners, and motors. Detecting whether the difference between the target frequencies of the sound signal is within the second preset range serves to exclude sound signals that are resonant audio, such as bells, alarms, horns, and the knocking of a cup.
In a possible implementation, the first preset condition includes all of the following: the duration of the sound signal is within the preset duration, the spectral parameters of the sound signal conform to the preset spectral characteristics, the energy difference between the frequency bands of the sound signal is within the first preset range, and the difference between the target frequencies of the sound signal is within the second preset range. Correspondingly, the flow of the glass breaking sound detection method is shown in FIG. 7. After receiving the sound signal, the electronic device first sets itself to the initial detection state and then pre-processes the sound signal; pre-processing consists of the framing of the sound signal and the spectral compensation of the audio frames described above. After pre-processing, the spectral parameters of the audio frames are obtained and the first stage, crackling sound detection, is performed. Crackling sound detection judges whether the current audio frame is a crackling sound. If it is not, the method returns to the initial detection state; if it is, the second stage, white noise detection, is performed. White noise detection judges whether white noise is detected in the NB audio frames after the current audio frame. If no white noise is detected in any of those NB audio frames, the method returns to the initial detection state; if white noise is detected, the third stage, short pop detection, is performed. Short pop detection judges, from the attenuation of the audio frames after the current audio frame relative to the crackling sound peak, whether the sound signal is a short pop. If the sound signal is a short pop, the method returns to the initial detection state; if it is not, the fourth stage is performed, which comprises sound length detection, special audio detection, fixed spectrum detection, and resonant audio detection. If the length of the sound signal is within the preset duration, and the sound signal is not special audio, not a fixed-spectrum sound signal, and not resonant audio, the sound signal is determined to be a glass breaking sound and a glass breaking alarm is generated; otherwise, the method returns to the initial detection state.
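The four-stage flow of FIG. 7 can be summarized as a chain of early exits. The Python sketch below shows only that control structure; every stage is passed in as a callable, and all names (`Stages`, `classify_sound`, the field names and the two return strings) are hypothetical placeholders rather than parts of the described method.

```python
from typing import Any, Callable, NamedTuple


class Stages(NamedTuple):
    """Hypothetical bundle of per-stage tests; each takes the sound signal."""
    is_crackle: Callable[[Any], bool]
    has_white_noise: Callable[[Any], bool]
    is_short_pop: Callable[[Any], bool]
    duration_ok: Callable[[Any], bool]
    is_special_audio: Callable[[Any], bool]
    is_fixed_spectrum: Callable[[Any], bool]
    is_resonant_audio: Callable[[Any], bool]


def classify_sound(signal, stages):
    """Return 'alarm' for a glass breaking sound, otherwise 'reset'."""
    if not stages.is_crackle(signal):        # stage 1: crackling sound
        return "reset"
    if not stages.has_white_noise(signal):   # stage 2: white noise
        return "reset"
    if stages.is_short_pop(signal):          # stage 3: short pop
        return "reset"
    # Stage 4: sound length, special audio, fixed spectrum, resonant audio.
    if (stages.duration_ok(signal)
            and not stages.is_special_audio(signal)
            and not stages.is_fixed_spectrum(signal)
            and not stages.is_resonant_audio(signal)):
        return "alarm"
    return "reset"
```

Passing the stages as callables keeps the control flow separate from the individual tests, so each test can be swapped for one of the alternative implementations the text mentions.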
Here, the electronic device determines whether the sound signal is within the preset duration according to the decay speed and/or decay time of the band energy of the sound signal relative to the crackling sound peak. If the decay speed is within a preset speed range, and/or the decay time is within a preset time range, the duration of the sound signal is determined to be within the preset duration.
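In a possible implementation, the flow for judging whether the sound signal is within the preset duration is shown in FIG. 8. The audio frame following the current audio frame is taken as the first audio frame. First, the number E_cnt of subframes in the first audio frame whose energy is greater than the preset first detection energy E_MIN is calculated. For example, the first audio frame includes 4 subframes; E_cnt is initialized to 0, the energy Ej (j=1~4) of each subframe is calculated in turn, and if Ej>E_MIN, E_cnt is updated to E_cnt+1. Next, the number Emax_cnt of subframes in the first audio frame whose energy is greater than the preset second detection energy E_MAX is calculated, where E_MAX>E_MIN. For example, Emax_cnt is initialized to 0, the energy Ej of each subframe is calculated in turn, and if Ej>E_MAX, Emax_cnt is updated to Emax_cnt+1. Then the maximum subframe energy Ej_Max of the first audio frame is calculated. After E_cnt, Emax_cnt and Ej_Max are obtained, it is first judged whether the sound end condition is reached, that is, whether the sound is about to end. In one possible implementation, the average energy of the first audio frame, (E1+E2+E3+E4)/4, is calculated; if it is less than 1/4 of the crackling sound peak, the sound end condition is reached. In another possible implementation, the maximum subframe energy Ej_Max is calculated; if Ej_Max<E_MIN, the sound end condition is reached. Here, the crackling sound peak is the maximum band energy in the audio frame detected as the crackling sound. If the sound is about to end, it is judged whether E_cnt≥T_MIN and Emax_cnt>0 hold. If they do not hold, the energy of the first audio frame is small, which further indicates that the energy of the sound signal decays quickly, and the method returns to the initial detection state. If they hold, it is judged whether E_cnt≤T_MAX holds. If E_cnt≤T_MAX holds, the duration of the sound signal is within the preset duration and detection continues to the next stage; otherwise, the method returns to the initial detection state. T_MIN and T_MAX are preset constants, with T_MAX>T_MIN.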
If the sound end condition is not reached, it is judged whether the average energy of the first audio frame is less than 1/2 of the crackling sound peak. If it is, it is judged whether E_cnt≥T_DECAY2 holds, where T_DECAY2 is the preset frame count value of the audio frame at which the energy has decayed to 1/2 of the crackling sound peak. If E_cnt≥T_DECAY2 holds, the energy decay time of the sound signal is long, and the method returns to the initial detection state. If the average energy of the first audio frame is greater than or equal to 1/2 of the crackling sound peak, or E_cnt<T_DECAY2, the first audio frame is close to the current audio frame and its energy has not yet begun to decay substantially; the next audio frame after the first audio frame is then taken as the first audio frame and the above detection process is repeated, that is, the method returns to the step of calculating the number E_cnt of subframes in the first audio frame whose energy is greater than the preset first detection energy E_MIN and the subsequent steps.
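The duration check of FIG. 8 can be sketched in Python as below. Several points are assumptions: the text gives no values for E_MIN, T_MIN, T_MAX or T_DECAY2, so they are plain parameters; E_cnt and Emax_cnt are assumed to accumulate across successive frames, since they are compared against frame-count-like constants; and only the first variant of the sound end condition (average energy below 1/4 of the crackling sound peak) is used. The function name and the three-valued return are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def duration_within_limits(frames: Iterable[Sequence[float]], peak: float,
                           e_min: float, e_max: float,
                           t_min: int, t_max: int,
                           t_decay2: int) -> Optional[bool]:
    """Check the sound length from the frames after the crackling frame.

    frames -- per-frame subframe energies (E1..E4 in the example)
    peak   -- crackling sound peak
    Returns True to continue to the next stage, False to return to the
    initial detection state, None if the frames ran out undecided.
    """
    e_cnt = 0      # subframes above E_MIN (assumed to accumulate over frames)
    emax_cnt = 0   # subframes above E_MAX (assumed to accumulate over frames)
    for subframes in frames:
        e_cnt += sum(1 for e in subframes if e > e_min)
        emax_cnt += sum(1 for e in subframes if e > e_max)
        average = sum(subframes) / len(subframes)
        if average < peak / 4:                 # sound end condition reached
            if not (e_cnt >= t_min and emax_cnt > 0):
                return False                   # energy decayed too quickly
            return e_cnt <= t_max              # within the preset duration?
        if average < peak / 2 and e_cnt >= t_decay2:
            return False                       # energy decays too slowly
        # Otherwise the energy has not decayed much yet: examine next frame.
    return None
```

The alternative end condition in the text, Ej_Max<E_MIN, would replace the `average < peak / 4` test with `max(subframes) < e_min`.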
In other possible implementations, the duration of the sound signal may be calculated from the FrameCount of the first audio frame at the moment the first audio frame reaches the sound end condition. Alternatively, the decay speed of the energy of the first audio frame relative to the crackling sound peak may be determined from the ratio of the first audio frame to the crackling sound peak and the time between the current audio frame and the first audio frame.
In a possible implementation, the method by which the electronic device judges whether the spectral parameters of the sound signal conform to the preset spectral characteristics includes: determining the number of audio frames in the sound signal whose band energies satisfy a second preset relational expression; if the number of audio frames satisfying the second preset relational expression and the duration of the sound signal satisfy a third preset relational expression, determining that the energy of the first preset frequency band of the sound signal conforms to the preset spectral characteristics. Here, an audio frame includes 10 frequency bands, Ef0, Ef1, Ef2, Ef3, Ef4, Ef5, Ef6, Ef7, Ef8 and Ef9, with Ef9=Efh, and Ef0 further includes the two bands Ef00 (0~250 Hz) and Ef01 (250~500 Hz). The second preset relational expression is any one or more of the following: Efh*5≤Ef0, Efh*5≤Ef1, Efh*8≤Ef1+Ef2, (Ef5+Ef6+Ef7+Ef8+Efh)*3≤(Ef0+Ef1+Ef2+Ef3+Ef4), Ef00*5≤Ef01, Ef0*5≤Ef1, (Ef0+Ef1+Ef2)*12≤(Ef4+Ef5+Ef6+Ef7+Ef8+Efh).
If the band energies of an audio frame satisfy the second preset relational expression, the audio frame conforms to a special audio feature. The duration of the sound signal can be represented by the frame count value FrameCount of the audio frame at which the sound signal ends. The third preset relational expression can be used to characterize the magnitude relationship or proportional relationship between the number of audio frames satisfying the second preset relational expression and the duration of the sound signal.
In a possible implementation, the flow by which the electronic device judges whether the spectral parameters of the sound signal conform to the preset spectral characteristics is shown in FIG. 9. After the current audio frame is determined to be a crackling sound, the audio frame following the current audio frame is taken as the first audio frame, and the parameters SpecialSpectrumCount(i)=0, i=1,2,3,…, are set. First, whether the first audio frame conforms to a special audio feature is judged according to whether its band energies satisfy the second preset relational expression. If the band energies of the first audio frame satisfy the second preset relational expression, the first audio frame conforms to a special audio feature, and the number of audio frames whose band energies satisfy the second preset relational expression is increased by 1. In this embodiment of the present application, SpecialSpectrumCount(i) is updated to SpecialSpectrumCount(i)+Wi, where Wi is a preset constant that need not be 1; different second preset relational expressions correspond to different special audio features, and the values of Wi differ accordingly.
In a possible implementation, SpecialSpectrumCount(i) is calculated as follows:
If Efh*5≤Ef0, SpecialSpectrumCount(1) is updated to SpecialSpectrumCount(1)+1;
If Efh*5≤Ef1, SpecialSpectrumCount(2) is updated to SpecialSpectrumCount(2)+1;
If Efh*8≤Ef1+Ef2, SpecialSpectrumCount(3) is updated to SpecialSpectrumCount(3)+2;
If (Ef5+Ef6+Ef7+Ef8+Efh)*3≤(Ef0+Ef1+Ef2+Ef3+Ef4), SpecialSpectrumCount(4) is updated to SpecialSpectrumCount(4)+3;
If Ef00*5≤Ef01, SpecialSpectrumCount(5) is updated to SpecialSpectrumCount(5)+2;
If Ef0*5≤Ef1, SpecialSpectrumCount(6) is updated to SpecialSpectrumCount(6)+2;
If (Ef0+Ef1+Ef2)*12≤(Ef4+Ef5+Ef6+Ef7+Ef8+Efh), SpecialSpectrumCount(7) is updated to SpecialSpectrumCount(7)+3.
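A per-frame update of the seven counters can be sketched in Python as follows. The relations and increments are those listed above; the function name and the choice to store SpecialSpectrumCount(1) to SpecialSpectrumCount(7) in a 0-indexed list are illustrative assumptions.

```python
# Increments W1..W7 for SpecialSpectrumCount(1)..(7), as listed in the text.
WEIGHTS = (1, 1, 2, 3, 2, 2, 3)


def update_special_spectrum_counts(counts, ef, ef00, ef01):
    """Return the updated counters after examining one audio frame.

    counts -- list of 7 values; counts[0] is SpecialSpectrumCount(1)
    ef     -- band energies Ef0..Ef9 (ef[9] is Efh)
    ef00, ef01 -- energies of the sub-bands 0~250 Hz and 250~500 Hz
    """
    efh = ef[9]
    hits = (
        efh * 5 <= ef[0],
        efh * 5 <= ef[1],
        efh * 8 <= ef[1] + ef[2],
        (sum(ef[5:9]) + efh) * 3 <= sum(ef[0:5]),
        ef00 * 5 <= ef01,
        ef[0] * 5 <= ef[1],
        sum(ef[0:3]) * 12 <= sum(ef[4:10]),
    )
    return [c + w if hit else c for c, w, hit in zip(counts, WEIGHTS, hits)]
```

Calling the function once per audio frame, starting from seven zeros, accumulates the counters over the whole sound signal.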
After SpecialSpectrumCount(i) is calculated, the next audio frame after the first audio frame is taken as the first audio frame and the above detection process is repeated, that is, the method returns to the step of judging whether the first audio frame conforms to a special audio feature and the subsequent steps. If the band energies of the first audio frame do not satisfy the second preset relational expression, the first audio frame does not conform to a special audio feature; the next audio frame is likewise taken as the first audio frame and the detection process is repeated until the last audio frame of the sound signal has been examined. After the last audio frame has been examined, it is judged whether SpecialSpectrumCount(i) and the duration FrameCount of the sound signal satisfy the third relational expression. If the third relational expression is not satisfied, the spectral parameters of the sound signal conform to the preset spectral characteristics, that is, the sound signal meets the detection conditions for special audio and is therefore special audio rather than a glass breaking sound, and the method returns to the initial detection state. If the third relational expression is satisfied, the spectral parameters of the sound signal do not conform to the preset spectral characteristics, that is, the sound signal does not meet the detection conditions for special audio, and the next stage of detection is performed.
Here, the third relational expression may be a magnitude relationship between SpecialSpectrumCount(i) and the duration FrameCount of the sound signal. In this embodiment of the present application, in order to exclude special-audio sound signals, a relational expression under which the spectral parameters of the sound signal conform to the preset spectral characteristics can be set, that is, the conditions under which SpecialSpectrumCount(i) and the duration FrameCount of the sound signal do not satisfy the third relational expression. For example, SpecialSpectrumCount(i) and the duration FrameCount of the sound signal do not satisfy the third relational expression when:
(SpecialSpectrumCount(1)>3) and (SpecialSpectrumCount(1)≥FrameCount*2-2),
or (SpecialSpectrumCount(2)>3) and (SpecialSpectrumCount(2)≥FrameCount*2-2),
or (SpecialSpectrumCount(3)>3) and (SpecialSpectrumCount(3)≥FrameCount*2-4),
or (SpecialSpectrumCount(4)>3) and (SpecialSpectrumCount(4)≥FrameCount-2),
or (SpecialSpectrumCount(5)>3) and (SpecialSpectrumCount(5)≥FrameCount*2-2),
or (SpecialSpectrumCount(6)>3) and (SpecialSpectrumCount(6)≥FrameCount*2-2),
or (SpecialSpectrumCount(7)>3) and (SpecialSpectrumCount(7)≥FrameCount-3),
or (SpecialSpectrumCount>10) and (SpecialSpectrumCount≥FrameCount*4-3), where SpecialSpectrumCount=SpecialSpectrumCount(1)+SpecialSpectrumCount(2)+…+SpecialSpectrumCount(7).
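The eight alternative conditions above can be evaluated with a short Python function. This is a sketch under the reading that any one condition is enough to treat the sound signal as special audio; the function name and the 0-indexed list of counters are assumptions.

```python
def is_special_audio(counts, frame_count):
    """True if the counters mark the sound signal as special audio.

    counts      -- list of 7 values; counts[0] is SpecialSpectrumCount(1)
    frame_count -- FrameCount, the duration of the sound signal in frames
    """
    fc = frame_count
    # Right-hand sides of the seven per-counter conditions, in order.
    limits = (fc * 2 - 2, fc * 2 - 2, fc * 2 - 4, fc - 2,
              fc * 2 - 2, fc * 2 - 2, fc - 3)
    if any(c > 3 and c >= limit for c, limit in zip(counts, limits)):
        return True
    total = sum(counts)  # SpecialSpectrumCount, the sum of all seven counters
    return total > 10 and total >= fc * 4 - 3
```

For example, with FrameCount=5 and SpecialSpectrumCount(3)=8, the third condition holds (8>3 and 8≥6), so the signal is classified as special audio.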
In the above embodiment, setting second relational expressions corresponding to multiple special audio features allows special-audio sound signals to be excluded more comprehensively, which improves the accuracy of glass breaking sound detection.
In a possible implementation, the method by which the electronic device judges whether the energy difference between the frequency bands of the sound signal is within the first preset range includes: if, in the sound signal, the band energies of the n-th audio frame and the band energies of the (n-1)-th audio frame satisfy a fourth preset relational expression, or the band energies of the n-th audio frame and the band energies of the (n-2)-th audio frame satisfy a fifth preset relational expression, updating a set second count value, where n is an integer greater than 2 and M≥n; if the second count value and the duration of the sound signal satisfy a sixth preset relational expression, determining that the energy difference between the frequency bands of the sound signal is within the first preset range.
Here, the (n-1)-th audio frame is the audio frame immediately before the n-th audio frame, and the (n-2)-th audio frame is the audio frame two frames before the n-th audio frame. If the band energies of the n-th audio frame and the band energies of the (n-1)-th audio frame satisfy the fourth preset relational expression, the band energies of the two frames are within a preset variation range of each other. If the band energies of the n-th audio frame and the band energies of the (n-2)-th audio frame satisfy the fifth preset relational expression, the band energies of those two frames are within a preset variation range of each other. The duration of the sound signal can be represented by the frame count value FrameCount of the audio frame at which the sound signal ends, and the sixth preset relational expression can be used to characterize the magnitude relationship or ratio relationship between the second count value and the duration of the sound signal. If the second count value and the duration of the sound signal do not satisfy the sixth preset relational expression, each audio frame in the sound signal has an unchanging frequency response relative to its adjacent audio frames, that is, the sound signal is a fixed-spectrum sound signal.
In a possible implementation, the flow by which the electronic device judges whether the energy difference between the frequency bands of the sound signal is within the first preset range is shown in FIG. 10. After the current audio frame is determined to be a crackling sound, for the n-th audio frame, the second count value is set to ConstantSpectrumCount=0. It is then judged whether the band energies of the n-th audio frame and the band energies of the (n-1)-th audio frame satisfy the fourth preset relational expression, and whether the band energies of the n-th audio frame and the band energies of the (n-2)-th audio frame satisfy the fifth preset relational expression. If only the fourth preset relational expression is satisfied, the band energies of the n-th and (n-1)-th audio frames are within the preset variation range, and NC=NC1. If only the fifth preset relational expression is satisfied, the band energies of the n-th and (n-2)-th audio frames are within the preset variation range, and NC=NC2. If both the fourth and the fifth preset relational expressions are satisfied, NC=NC1+NC2. The second count value ConstantSpectrumCount is then updated to ConstantSpectrumCount+NC. NC1 and NC2 are preset constants, for example, NC1=2 and NC2=1. After ConstantSpectrumCount is calculated, the above detection process is repeated for the (n+1)-th audio frame, that is, the method returns to the step of judging whether the band energies of the n-th audio frame and the band energies of the (n-1)-th audio frame satisfy the fourth preset relational expression and the subsequent steps.
If neither the energy difference between the bands of the nth and (n-1)th audio frames nor the energy difference between the bands of the nth and (n-2)th audio frames is within the preset variation range, the detection is repeated for the (n+1)th audio frame, and so on until the last audio frame of the sound signal has been processed. After the last audio frame has been processed, the device checks whether ConstantSpectrumCount and FrameCount satisfy the sixth relation. If they do not, the energy difference between the frequency bands of the sound signal is not within the first preset range: the sound signal meets the fixed-spectrum detection condition, so it is a fixed-spectrum sound rather than a glass-breaking sound, and the device returns to the initial detection state. If the sixth relation is satisfied, the energy difference between the frequency bands of the sound signal is within the first preset range, the sound signal does not meet the fixed-spectrum detection condition, and detection proceeds to the next stage.
In one possible implementation, the fourth preset relation is (Ef(i)>Ef_Pre(i)*SP_VAR) and (Ef1_Pre(i)>Ef(i)*SP_VAR), and the fifth preset relation is (Ef(i)>Ef_Pre2(i)*SP_VAR2) and (Ef1_Pre2(i)>Ef(i)*SP_VAR2), where i=1,2,3,…; Ef(i) is the energy of the ith band of the nth audio frame, Ef_Pre(i) is the energy of the ith band of the (n-1)th audio frame, Ef_Pre2(i) is the energy of the ith band of the (n-2)th audio frame, and SP_VAR and SP_VAR2 are preset constants, for example SP_VAR=0.8 and SP_VAR2=0.5. The sixth relation is ConstantSpectrumCount<FrameCount*0.6.
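The fixed-spectrum check described above can be sketched in Python as follows. This is only an illustrative sketch, not the claimed implementation. It assumes that Ef1_Pre(i) and Ef1_Pre2(i) denote the same quantities as Ef_Pre(i) and Ef_Pre2(i), that a relation must hold for every band i of a frame, and that FrameCount equals the number of frames passed in; the function names are hypothetical.

```python
SP_VAR = 0.8     # example value from the text
SP_VAR2 = 0.5    # example value from the text
NC1, NC2 = 2, 1  # example values from the text


def within_range(cur, ref, factor):
    """Return True if, for every band, cur and ref stay within `factor` of each other.

    Assumption: the relation has to hold for all bands of the frame.
    """
    return all(c > r * factor and r > c * factor for c, r in zip(cur, ref))


def constant_spectrum_count(band_energies):
    """Accumulate ConstantSpectrumCount; band_energies[n][i] is Ef(i) of frame n."""
    count = 0
    for n in range(2, len(band_energies)):
        cur = band_energies[n]
        if within_range(cur, band_energies[n - 1], SP_VAR):   # fourth relation
            count += NC1
        if within_range(cur, band_energies[n - 2], SP_VAR2):  # fifth relation
            count += NC2
    return count


def satisfies_sixth_relation(count, frame_count):
    """Sixth relation: the spectrum changes enough to rule out a fixed-spectrum sound."""
    return count < frame_count * 0.6
```

A steady tone accumulates a large count and fails the sixth relation, while a signal whose band energies change quickly keeps the count low and passes.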
In other possible implementations, whether the energy difference between audio frames is within the first preset range may also be determined from the relationship between the sums of the band energies of the audio frames.
In the above embodiment, computing the difference between the band energies of the nth and (n-1)th audio frames, and the difference between the band energies of the nth and (n-2)th audio frames, makes it possible to exclude fixed-spectrum sound signals and improves the accuracy of glass-breaking sound detection.
In one possible implementation, the method by which the electronic device determines that the difference between the target frequencies of the sound signal is within the second preset range includes: if, in the sound signal, the difference between the target frequency of the ith band of the nth audio frame and the target frequency of the ith band of the (n-1)th audio frame is within the preset difference range, or the difference between the target frequency of the ith band of the nth audio frame and the target frequency of the ith band of the (n-2)th audio frame is within the preset difference range, updating the set third count value, where n is an integer greater than 2 with M≥n, and i is an integer greater than 0; and if the third count value and the duration of the sound signal satisfy the seventh preset relation, determining that the difference between the target frequencies of the sound signal is within the second preset range.
Because the target frequency is the frequency corresponding to the energy peak of a band, a difference within the preset difference range between the target frequencies of the ith band of the nth and (n-1)th audio frames means that the same frequency peak appears in both frames. Likewise, a difference within the preset difference range between the target frequencies of the ith band of the nth and (n-2)th audio frames means that the same frequency peak appears in the nth and (n-2)th audio frames. The duration of the sound signal can be expressed by FrameCount, the frame count value of the audio frame at which the sound signal ends, and the seventh preset relation characterizes the magnitude or ratio relationship between the third count value and the duration of the sound signal. If the third count value and the duration of the sound signal do not satisfy the seventh preset relation, a fixed resonance frequency persists in the sound signal for a relatively long time; that is, the sound signal is a resonant sound.
In one possible implementation, the electronic device determines whether the difference between the target frequencies of the sound signal is within the second preset range by the process shown in FIG. 11. After the current audio frame is determined to be a crackling sound, the nth audio frame is first divided into NA frequency bands, where NA is a constant set from empirical values, and the third count value is set to SameMaxPeakPosCnt(i)=0 for i=1,2,3,…,NA.
After the band division, the target frequency of each band of the nth audio frame is computed, that is, the frequency corresponding to the energy peak of each band. In one possible implementation, let the amplitude in band i be X(k); the largest of the amplitude peaks is then the energy peak. Here k is the frequency index within band i, each index corresponds to a frequency, and k is an integer greater than 0. For example, for a band of 500 to 600 Hz, 500 Hz corresponds to k=1, 505 Hz corresponds to k=2, and so on. For each amplitude X(k), if X(k)>X(k-1) and X(k)>X(k-2)*2 and X(k)>X(k-3)*3 and X(k)>X(k+1) and X(k)>X(k+2)*2 and X(k)>X(k+3)*3, then X(k) is an amplitude peak, and the position of that peak is the frequency index k. The maximum of all amplitude peaks is then taken as the energy peak MaxPeakPosi(n) of the band, and the frequency corresponding to the energy peak is given by its position, that is, the frequency index k of the energy peak.
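The peak test above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: indices are zero-based, only positions with three neighbors on each side are tested (the text does not say how band edges are handled), and the function name max_peak_position is hypothetical.

```python
def max_peak_position(X):
    """Return the index k of the largest amplitude peak in a band, or None.

    X is the list of amplitudes X(k) of one band. A sample counts as a peak
    when it exceeds its neighbors by the factors given in the text.
    Assumption: indices without three neighbors on both sides are skipped.
    """
    best_k = None
    for k in range(3, len(X) - 3):
        is_peak = (
            X[k] > X[k - 1] and X[k] > X[k - 2] * 2 and X[k] > X[k - 3] * 3
            and X[k] > X[k + 1] and X[k] > X[k + 2] * 2 and X[k] > X[k + 3] * 3
        )
        # Keep the peak with the largest amplitude seen so far.
        if is_peak and (best_k is None or X[k] > X[best_k]):
            best_k = k
    return best_k
```

Returning None when no sample qualifies lets the caller treat a band without a clear peak separately, which is a design choice of this sketch rather than something the text specifies.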
After the positions of the energy peaks of the bands of the nth audio frame have been computed, the same method is used to compute the energy peak MaxPeakPosi(n-1) of the (n-1)th audio frame and its position. The device then checks whether the position of MaxPeakPosi(n) and the position of MaxPeakPosi(n-1) differ by no more than the preset difference range, which can be set to 1 or 2. If they do, the difference between the target frequencies of the ith band of the nth and (n-1)th audio frames is within the preset difference range, and NS=NS1, where NS1 is a preset constant, for example NS1=1. The same method is used to compute the energy peak MaxPeakPosi(n-2) of the (n-2)th audio frame and its position, and the device checks whether the position of MaxPeakPosi(n) and the position of MaxPeakPosi(n-2) are within the preset difference range. If they are, the difference between the target frequencies of the ith band of the nth and (n-2)th audio frames is within the preset difference range, and NS=NS2, where NS2 is a preset constant, for example NS2=1. If both the positions of MaxPeakPosi(n) and MaxPeakPosi(n-1) and the positions of MaxPeakPosi(n) and MaxPeakPosi(n-2) are within the preset difference range, NS=NS1+NS2. The third count value SameMaxPeakPosCnt(i) is then updated to SameMaxPeakPosCnt(i)+NS. After SameMaxPeakPosCnt(i) has been updated, the same detection is repeated for the (n+1)th audio frame; that is, the process returns to the band-division step and performs the subsequent steps.
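The counter update described above might look like the following Python sketch. It assumes the peak positions have already been computed per frame and per band, reads "within the preset difference range" as an absolute index difference of at most max_diff, and skips bands where no peak was found; the function name and the None convention are hypothetical.

```python
NS1, NS2 = 1, 1  # example values from the text


def same_peak_counts(peak_pos, max_diff=1):
    """Accumulate SameMaxPeakPosCnt(i) for every band i.

    peak_pos[n][i] is the peak index of band i in frame n, or None if the
    band has no peak. max_diff is the preset difference range (1 or 2 in
    the text).
    """
    num_bands = len(peak_pos[0])
    counts = [0] * num_bands
    for n in range(2, len(peak_pos)):
        for i in range(num_bands):
            cur = peak_pos[n][i]
            if cur is None:
                continue
            # Compare against frame n-1 (adds NS1) and frame n-2 (adds NS2).
            for ref, inc in ((peak_pos[n - 1][i], NS1), (peak_pos[n - 2][i], NS2)):
                if ref is not None and abs(cur - ref) <= max_diff:
                    counts[i] += inc
    return counts
```

A band whose peak stays near the same index in every frame gains up to NS1+NS2 per frame, while a band whose peak keeps moving gains nothing.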
If neither the difference between the target frequencies of the ith band of the nth and (n-1)th audio frames nor the difference between the target frequencies of the ith band of the nth and (n-2)th audio frames is within the preset difference range, the detection is repeated for the (n+1)th audio frame, and so on until the last audio frame of the sound signal has been processed. After the last audio frame has been processed, the device checks whether SameMaxPeakPosCnt(i) and FrameCount satisfy the seventh relation. If they do not, the difference between the target frequencies of the sound signal is not within the second preset range: the sound signal meets the resonant-sound detection condition, so it is a resonant sound rather than a glass-breaking sound, and the device returns to the initial detection state. If the seventh relation is satisfied, the difference between the target frequencies of the sound signal is within the second preset range, and detection continues with the next stage.
In the embodiments of the present application, to exclude resonant sound signals, a relation can be defined under which the sound signal meets the resonant-sound detection condition, that is, the condition under which SameMaxPeakPosCnt(i) and FrameCount do not satisfy the seventh relation. For example, these conditions include:
FrameCount>5 and SameMaxPeakPosCnt(0)≥FrameCount*1.5-2;
or FrameCount>5 and SameMaxPeakPosCnt(1)≥FrameCount*1.5-2;
or FrameCount>5 and SameMaxPeakPosCnt(2)≥FrameCount*1.5-2;
or FrameCount>5 and SameMaxPeakPosCnt(3)≥FrameCount*1.5-2;
or FrameCount>5 and SameMaxPeakPosCnt(0)+SameMaxPeakPosCnt(1)≥FrameCount*2-2;
or FrameCount>5 and SameMaxPeakPosCnt(0)+SameMaxPeakPosCnt(1)+SameMaxPeakPosCnt(2)+SameMaxPeakPosCnt(3)≥FrameCount*3-2.
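The example conditions listed above can be combined into a single predicate. The Python sketch below is illustrative only; it assumes four bands indexed 0 to 3, as in the listed conditions, and the function name is hypothetical.

```python
def is_resonant(cnt, frame_count):
    """Return True if the signal meets the resonant-sound detection condition.

    cnt[i] is SameMaxPeakPosCnt(i) for bands 0..3; frame_count is FrameCount.
    True means the seventh relation is NOT satisfied, so the signal is
    rejected as a resonant sound.
    """
    if frame_count <= 5:
        return False
    return (
        any(c >= frame_count * 1.5 - 2 for c in cnt[:4])  # any single band
        or cnt[0] + cnt[1] >= frame_count * 2 - 2          # two lowest bands
        or sum(cnt[:4]) >= frame_count * 3 - 2             # all four bands
    )
```

Every listed condition requires FrameCount>5, so that check is done once up front.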
In the above embodiment, computing the difference between the target frequencies of the ith band of the nth and (n-1)th audio frames, and the difference between the target frequencies of the ith band of the nth and (n-2)th audio frames, makes it possible to exclude resonant sound signals and improves the accuracy of glass-breaking sound detection.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process is determined by its function and internal logic, and the numbering does not limit the implementation of the embodiments of the present application in any way.
Corresponding to the glass-breaking sound detection method described in the above embodiments, FIG. 12 shows a structural block diagram of the glass-breaking sound detection device provided by an embodiment of the present application. For ease of description, only the parts relevant to the embodiments of the present application are shown.
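As shown in FIG. 12, the glass-breaking sound detection device includes:
an acquisition module 10, configured to acquire a sound signal and divide the sound signal into frames to obtain M audio frames, where M is an integer greater than 0;
a calculation module 20, configured to, if the current audio frame is determined to be a crackling sound according to the spectral parameters of the current audio frame, determine a crackle peak according to the spectral parameters of the current audio frame and the spectral parameters of the audio frames following the current audio frame;
a determination module 30, configured to determine that the sound signal is a glass-breaking sound if the attenuation of the energy of each of the Np audio frames following the current audio frame, relative to the crackle peak, is within a preset attenuation range, where Np is an integer greater than 0 and M>Np.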
In one possible implementation, the calculation module 20 includes a first calculation unit, and the first calculation unit is configured to:
if the current audio frame is determined to be a crackling sound according to the spectral parameters of the current audio frame, and white noise is detected in the NB audio frames following the current audio frame, determine the crackle peak according to the spectral parameters of the current audio frame and the spectral parameters of the audio frames following the current audio frame, where NB is an integer greater than 0 and M>NB.
In one possible implementation, the audio frame includes at least one frequency band, the spectral parameters include the energy of the frequency band, and the first calculation unit is specifically configured to: if a target audio frame exists among the NB audio frames following the current audio frame, determine the calculation formula of a first count value, where the target audio frame is an audio frame whose band energies satisfy a first preset relation, and the calculation formula of the first count value corresponds to the first preset relation;
determine the first count value according to its calculation formula, and if the first count value is greater than a preset first threshold, determine that the target audio frame is white noise.
In one possible implementation, the audio frame includes at least one subframe and at least one frequency band, the spectral parameters include the energy of the subframe and the energy of the frequency band, and the calculation module 20 further includes a second calculation unit, which is specifically configured to:
if the maximum subframe energy of the current audio frame is greater than a preset second threshold, then:
determine the energy ratio of the current audio frame according to the subframe energies of the current audio frame;
determine the energy ratio between the current audio frame and the previous audio frame according to the band energies of the current audio frame and the band energies of the previous audio frame;
determine the energy difference between the current audio frame and the previous audio frame according to the subframe energies of the current audio frame and the subframe energies of the previous audio frame;
determine the energy ratio of the previous audio frame according to the subframe energies of the previous audio frame;
if the energy ratio between the current audio frame and the previous audio frame is greater than a first preset value, and the energy ratio of the current audio frame is greater than a second preset value;
or if the energy ratio between the current audio frame and the previous audio frame is greater than the first preset value, and the energy ratio of the previous audio frame is greater than the second preset value;
or if the energy ratio between the current audio frame and the previous audio frame is greater than a third preset value, and the energy ratio of the current audio frame is greater than a fourth preset value;
or if the energy ratio between the current audio frame and the previous audio frame is greater than the third preset value, and the energy ratio of the previous audio frame is greater than the fourth preset value;
or if the energy difference between the current audio frame and the previous audio frame is greater than a fifth preset value, the energy ratio between the current audio frame and the previous audio frame is greater than a sixth preset value, and the energy ratio of the current audio frame is greater than a seventh preset value;
or if the energy difference between the current audio frame and the previous audio frame is greater than the fifth preset value, the energy ratio between the current audio frame and the previous audio frame is greater than the sixth preset value, and the energy ratio of the previous audio frame is greater than the seventh preset value;
then determine that the current audio frame is a crackling sound.
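The six alternative conditions above reduce to three pairs that differ only in whether the current or the previous frame's own energy ratio is tested. The Python sketch below shows that consolidation. It takes the ratios and the difference as already-computed inputs, because this passage does not give their formulas, and every threshold value in CrackleThresholds is a hypothetical placeholder, since the text only calls them "preset".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrackleThresholds:
    # Hypothetical placeholder values; the text does not specify them.
    second_threshold: float = 1000.0  # on the maximum subframe energy
    p1: float = 8.0
    p2: float = 2.0
    p3: float = 4.0
    p4: float = 4.0
    p5: float = 500.0
    p6: float = 2.0
    p7: float = 3.0


def is_crackle(max_subframe_energy, inter_ratio, energy_diff,
               cur_ratio, prev_ratio, th=CrackleThresholds()):
    """Decide whether the current frame is a crackling sound.

    inter_ratio: energy ratio between the current and the previous frame.
    energy_diff: energy difference between the current and the previous frame.
    cur_ratio / prev_ratio: each frame's own energy ratio.
    """
    if max_subframe_energy <= th.second_threshold:
        return False

    def either(limit):
        # Each condition pair accepts the current or the previous frame's ratio.
        return cur_ratio > limit or prev_ratio > limit

    return (
        (inter_ratio > th.p1 and either(th.p2))
        or (inter_ratio > th.p3 and either(th.p4))
        or (energy_diff > th.p5 and inter_ratio > th.p6 and either(th.p7))
    )
```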
In one possible implementation, the audio frame includes at least one frequency band, the spectral parameters include the energy of the frequency band, and the calculation module 20 further includes a third calculation unit, which is specifically configured to:
take the sum of the band energies of the current audio frame as the crackle peak;
take the audio frame following the current audio frame as a first audio frame;
compute the sum of the band energies of the first audio frame;
if the sum of the band energies of the first audio frame is greater than the crackle peak, update the crackle peak to the sum of the band energies of the first audio frame;
take the audio frame following the first audio frame as the new first audio frame, and return to the step of computing the sum of the band energies of the first audio frame and the subsequent steps, until a preset end condition is satisfied.
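The peak-tracking loop above amounts to a running maximum over the per-frame sums of band energies. The Python sketch below illustrates it; the text leaves the "preset end condition" unspecified, so a fixed number of look-ahead frames (max_frames) is used here purely as an assumption, and the function name is hypothetical.

```python
def crackle_peak(band_energies, start, max_frames):
    """Track the crackle peak starting at frame index `start`.

    band_energies[n] is the list of band energies of frame n.
    Assumed end condition: stop after max_frames following frames or at the
    end of the signal, whichever comes first.
    """
    peak = sum(band_energies[start])
    end = min(len(band_energies), start + 1 + max_frames)
    for frame in band_energies[start + 1:end]:
        total = sum(frame)
        if total > peak:
            peak = total  # a later frame is louder, so it becomes the peak
    return peak
```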
In one possible implementation, the audio frame includes at least one frequency band, the spectral parameters include the energy of the frequency band, and the determination module 30 is specifically configured to:
if the attenuation of the energy of each of the Np audio frames following the current audio frame, relative to the crackle peak, is within the preset attenuation range, and the sound signal satisfies a first preset condition, determine that the sound signal is a glass-breaking sound; the first preset condition includes any one or more of the following: the duration of the sound signal is within a preset duration, the spectral parameters of the sound signal conform to preset spectral characteristics, the energy difference between the frequency bands of the sound signal is within a first preset range, and the difference between the target frequencies of the sound signal is within a second preset range, where the target frequency is the frequency corresponding to the energy peak of a frequency band.
In one possible implementation, the determination module 30 is further configured to:
compute the decay speed and/or decay time of the band energy of the sound signal relative to the crackle peak;
if the decay speed is within a preset speed range, and/or the decay time is within a preset time range, determine that the duration of the sound signal is within the preset duration.
In one possible implementation, the determination module 30 is further configured to:
determine the number of audio frames in the sound signal whose band energies satisfy a second preset relation;
if the number of audio frames satisfying the second preset relation and the duration of the sound signal satisfy a third preset relation, determine that the energy of the first preset frequency band of the sound signal conforms to the preset spectral characteristics.
In one possible implementation, the determination module 30 is further configured to:
if, in the sound signal, the band energy of the nth audio frame and the band energy of the (n-1)th audio frame satisfy the fourth preset relation,
or the band energy of the nth audio frame and the band energy of the (n-2)th audio frame satisfy the fifth preset relation, update the set second count value, where n is an integer greater than 2 and M≥n;
if the second count value and the duration of the sound signal satisfy the sixth preset relation, determine that the energy difference between the frequency bands of the sound signal is within the first preset range.
In one possible implementation, the determination module 30 is further configured to:
if, in the sound signal, the difference between the target frequency of the ith band of the nth audio frame and the target frequency of the ith band of the (n-1)th audio frame is within the preset difference range,
or the difference between the target frequency of the ith band of the nth audio frame and the target frequency of the ith band of the (n-2)th audio frame is within the preset difference range, update the set third count value, where n is an integer greater than 2 with M≥n, and i is an integer greater than 0;
if the third count value and the duration of the sound signal satisfy the seventh preset relation, determine that the difference between the target frequencies of the sound signal is within the second preset range.
It should be noted that the information exchange and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application. For their specific functions and technical effects, refer to the method embodiments; they are not repeated here.
FIG. 13 is a schematic diagram of the electronic device provided by Embodiment 3 of the present application. As shown in FIG. 13, the electronic device of this embodiment includes a processor 21, a memory 22, and a computer program 23 stored in the memory 22 and executable on the processor 21. When executing the computer program 23, the processor 21 implements the steps of the above glass-breaking sound detection method embodiments, for example steps S101 to S103 shown in FIG. 3. Alternatively, when executing the computer program 23, the processor 21 implements the functions of the modules/units in the above device embodiments, for example the functions of the acquisition module 10 through the determination module 30 shown in FIG. 12.
By way of example, the computer program 23 may be divided into one or more modules/units, which are stored in the memory 22 and executed by the processor 21 to carry out the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and these instruction segments describe the execution of the computer program 23 in the electronic device.
Those skilled in the art will understand that FIG. 13 is only an example of an electronic device and does not limit it. The electronic device may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, and the like.
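The processor 21 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 22 may be an internal storage unit of the electronic device, such as a hard disk or main memory of the electronic device. The memory 22 may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device. Further, the memory 22 may include both an internal storage unit and an external storage device of the electronic device. The memory 22 is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is about to be output.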
Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional units and modules is only an example. In practice, the above functions may be assigned to different functional units and modules as required; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of the present application. For the specific working processes of the units and modules in the above system, refer to the corresponding processes in the foregoing method embodiments; they are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed or recorded in one embodiment, refer to the relevant descriptions of the other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other ways. For example, the apparatus/electronic device embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. On this understanding, all or part of the processes in the methods of the above embodiments may also be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the foregoing method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present application.
The embodiments described above are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.
Claims (14)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011180908.7A CN114429769B (en) | 2020-10-29 | 2020-10-29 | Glass breaking sound detection method and related equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011180908.7A CN114429769B (en) | 2020-10-29 | 2020-10-29 | Glass breaking sound detection method and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114429769A (en) | 2022-05-03 |
CN114429769B (en) | 2025-04-04 |
Family
ID=81309659
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202011180908.7A | Active | CN114429769B (en) | 2020-10-29 | 2020-10-29 | Glass breaking sound detection method and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114429769B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5742232A (en) * | 1994-07-18 | 1998-04-21 | Nippondenso Co., Ltd. | Glass breaking detection device |
US5796336A (en) * | 1996-03-08 | 1998-08-18 | Denso Corporation | Glass breakage detecting device |
US6236313B1 (en) * | 1997-10-28 | 2001-05-22 | Pittway Corp. | Glass breakage detector |
US20050199064A1 (en) * | 2004-02-10 | 2005-09-15 | Samsung Electronics Co., Ltd. | Apparatus, method, and medium for detecting and discriminating impact sound |
US20060100828A1 (en) * | 2004-11-10 | 2006-05-11 | Samsung Electronics Co., Ltd. | Impulse event separating apparatus and method |
KR20160120018A (en) * | 2015-04-07 | 2016-10-17 | 주식회사 에스원 | Abnormal voice detecting method and system |
CN108492837A (en) * | 2018-03-23 | 2018-09-04 | 腾讯音乐娱乐科技(深圳)有限公司 | Detection method, device and the storage medium of audio burst white noise |
Non-Patent Citations (2)
Title |
---|
ARSLAN, Y等: "Impulsive Sound Detection and Gunshot Recognition", 2015 23RD SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 19 May 2015 (2015-05-19) * |
赵杰: "基于深度学习的危险声音检测技术研究", 中国优秀硕士学位论文全文数据库 信息科技辑, no. 2020, 15 August 2020 (2020-08-15) * |
Also Published As
Publication number | Publication date |
---|---|
CN114429769B (en) | 2025-04-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6606167B2 (en) | Voice section detection method and apparatus | |
WO2021223518A1 (en) | Wind noise suppression method applicable to artificial cochlea, and system thereof | |
WO2020168981A1 (en) | Wind noise suppression method and apparatus | |
CN113614828B (en) | Method and apparatus for fingerprinting an audio signal via normalization | |
US12235896B2 (en) | Methods and apparatus to fingerprint an audio signal via exponential normalization | |
CN110853677B (en) | Method, device, terminal and non-transitory computer-readable storage medium for drum beat recognition of songs | |
CN114429769B (en) | Glass breaking sound detection method and related equipment | |
Cheng et al. | Improving piano note tracking by HMM smoothing | |
CN103929704A (en) | Self-adaption acoustic feedback elimination method and system based on transformation domain | |
JP4630956B2 (en) | Howling frequency component enhancement method and apparatus, howling detection method and apparatus, howling suppression method and apparatus, peak frequency component enhancement method and apparatus | |
WO2020107455A1 (en) | Voice processing method and apparatus, storage medium, and electronic device | |
CN108986799A (en) | A kind of reverberation parameters estimation method based on cepstral filtering | |
CN114420100A (en) | Voice detection method and device, electronic equipment and storage medium | |
CN113593604A (en) | Method, device and storage medium for detecting audio quality | |
CN114049882A (en) | Noise reduction model training method, device and storage medium | |
WO2018176654A1 (en) | Gain adjustment method and apparatus, audio coder, and loudspeaker device | |
JP4165059B2 (en) | Active silencer | |
CN112017649B (en) | Audio processing method, device, electronic device and readable storage medium | |
CN109308910B (en) | Method and apparatus for determining bpm of audio | |
CN102256201A (en) | Automatic environmental identification method used for hearing aid | |
CN116129924B (en) | A Noise Recognition Method Based on Speex | |
KR20200124526A (en) | Sporadic noise detecting apparatus | |
CN113257284B (en) | Voice activity detection model training, voice activity detection method and related device | |
CN118737196A (en) | Noise suppression method, storage medium, electronic device and chip | |
CN118748767A (en) | Noise detection method, storage medium, electronic device and chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||