CN107358964B

CN107358964B - Method for detecting alert signals in a changing environment

Info

Publication number: CN107358964B
Application number: CN201710223382.8A
Authority: CN
Inventors: A.伊耶; J.L.哈钦斯; R.A.克赖菲尔特
Original assignee: Harman International Industries Inc
Current assignee: Harman International Industries Inc
Priority date: 2016-04-07
Filing date: 2017-04-07
Publication date: 2023-08-04
Anticipated expiration: 2037-04-07
Also published as: US20180014112A1; CN116844559A; EP3229487A1; US10555069B2; US9749733B1; CN107358964A; EP3229487B1

Abstract

In an audio system, audio signals are pre-processed to provide input signals to a fast detector and a slow detector, the input signals including alert signals and ambient sound. The slow detector determines the ambient sound level of the input signal that is output to the alert signal detector. The alert signal detector uses the ambient sound level to calculate an adaptive threshold level using an adaptive threshold function. The fast detector determines an envelope level of the input signal output to the alert signal detector. The alert signal detector compares the envelope level to the adaptive threshold level to determine whether an alert signal is present in the input signal. The adaptive threshold level varies in accordance with the ambient sound level of the input signal, and the alert signal detection of the audio system is automatically adapted to a changing acoustic environment having a different ambient sound level.

Description

Method for detecting alert signals in a changing environment

Background

Field of the Embodiments of the Present Disclosure

Embodiments of the present disclosure relate generally to audio signal processing and, more particularly, to methods for detecting alert signals in changing environments.

Description of the Related Art

Headphones, earphones, earbuds, and other personal listening devices are commonly used by individuals who wish to hear sound from a particular type of audio source, such as music, speech, or movie soundtracks, without disturbing other people nearby. These types of sounds are generally referred to herein as "entertainment" signals, and each such entertainment signal is characterized herein as an audio signal that is present over a sustained period of time.

Typically, a personal listening device includes an audio plug for insertion into the audio output of an audio playback device. The audio plug connects to a cable that carries the audio signal from the audio playback device to the personal listening device. To provide high-quality audio, such personal listening devices typically include speaker components that cover the entire ear or completely seal the ear canal. Personal listening devices are designed to provide a good acoustic seal, thereby reducing audio signal leakage and improving the quality of the listener's experience, particularly with respect to low-frequency response.

One drawback of the personal listening device designs described above is that, because the device forms a good acoustic seal with the ear, the user's ability to hear environmental sound is substantially reduced, which can present a considerable safety concern to the user. For example, the user may be unable to hear certain important sounds from the environment, such as the sound of an oncoming vehicle, a person speaking, or an alarm. These types of important sounds emanating from the environment are referred to herein as "priority" or "alert" signals, and each such signal is generally characterized as an intermittent audio signal that acts as an interruption to the more persistent sound produced by an entertainment signal or to other aspects of the listening environment.

One approach to addressing the above problem involves attempting to detect alert signals present in the listening environment using one or more microphones integrated into the listening device. When an alert signal is detected, the listening device may, for example, automatically reduce the sound level of the entertainment signal and replay the alert signal to the user to make the user aware of it. However, conventional solutions for detecting alert signals are computationally complex and require considerable processing resources to achieve acceptable performance. Furthermore, such solutions do not account for changing acoustic environments and therefore do not provide satisfactory performance across different acoustic environments.

As the foregoing illustrates, more effective techniques for detecting alert signals within a listening environment, which can be implemented in a personal listening device, would be useful.

Summary

Various embodiments set forth an audio processing system that includes a slow detector configured to determine an ambient sound level of an audio input signal that includes ambient sound and to transmit the ambient sound level to an alert signal detector. The audio processing system also includes a fast detector configured to determine an envelope level of the audio input signal and to transmit the envelope level to the alert signal detector. The audio processing system further includes an alert signal detector configured to determine an adaptive threshold level based on the ambient sound level and to determine whether an alert signal is present in the audio input signal by comparing the envelope level with the adaptive threshold level.

Other embodiments include, without limitation, a computer-readable medium containing instructions for performing one or more aspects of the disclosed techniques, as well as a method for performing one or more aspects of the disclosed techniques.

At least one advantage of the disclosed approach is that it allows an audio processing system that detects alert signals in changing acoustic environments to be implemented in a simple and low-cost manner.

Brief Description of the Drawings

So that the manner in which the recited features of the one or more embodiments set forth above can be understood in detail, a more particular description of the one or more embodiments, briefly summarized above, may be had by reference to certain specific embodiments, some of which are illustrated in the accompanying drawings. It is to be noted, however, that the drawings illustrate only typical embodiments and are therefore not to be considered limiting of scope in any way, as the scope of the various embodiments encompasses other embodiments as well.

FIG. 1 illustrates an audio processing system configured to implement one or more aspects of the various embodiments;

FIG. 2 illustrates an exemplary adaptive threshold function implemented by the alert signal detector of FIG. 1, according to various embodiments; and

FIG. 3 is a flowchart of method steps for detecting an alert signal within an audio signal, according to various embodiments.

Detailed Description

In the following description, numerous specific details are set forth to provide a more thorough understanding of certain specific embodiments. However, it will be apparent to one skilled in the art that other embodiments may be practiced without one or more of these specific details or with additional specific details.

System Overview

FIG. 1 illustrates an audio processing system 100 configured to implement one or more aspects of the various embodiments. As shown, the audio processing system 100 includes, without limitation, components such as a microphone 110, a sound environment processor (SEP) 120, a band-pass filter (BPF) 130, a fast root mean square (RMS) detector 150, a slow RMS detector 160, an alert signal detector 170, and a detection receiving device 190. Each component of the audio processing system 100 shown in FIG. 1 may be implemented in software and/or hardware. For example, each component may be implemented in hardware using hardwired digital and/or analog circuits and/or in software using a memory unit and a processor unit. In general, a processor unit may be any technically feasible hardware unit capable of processing data and/or executing software applications. For example, a processor may include a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of different processing units, such as a CPU configured to operate in conjunction with a GPU. The memory unit is configured to store software applications and data. Instructions from the software constructs within the memory unit are executed by the processor to implement the inventive operations and functions described herein.

Generally, the microphone 110 captures sound from the environment and sends the captured audio signal to the sound environment processor 120. The captured audio signal contains environmental sound that includes alert signals and ambient sound. The sound environment processor 120 performs noise reduction on the audio signal and transmits the processed signal to the band-pass filter 130, which produces a band-pass filtered signal (input signal 140) that is transmitted to the fast RMS detector 150 and the slow RMS detector 160. The input signal 140 received by the fast and slow RMS detectors 150 and 160 contains alert signals and ambient sound. The slow RMS detector 160 is configured to determine the ambient sound level of the input signal 140, which is output to the alert signal detector 170. The alert signal detector 170 uses the ambient sound level to calculate an adaptive threshold level by applying an adaptive threshold function. The fast RMS detector 150 is configured to determine the envelope level of the input signal 140, which is also output to the alert signal detector 170. The alert signal detector 170 compares the envelope level with the adaptive threshold level to determine whether an alert signal is currently present in the input signal 140. The alert signal detector 170 sends a detection signal to the detection receiving device 190, the detection signal indicating whether an alert signal has been detected by the alert signal detector 170. The detection receiving device 190 receives the detection signal and performs one or more operations based on the state of the detection signal.

As described above, the sound environment processor 120 and the band-pass filter 130 pre-process the captured audio signal to produce the input signal 140 received by the fast and slow RMS detectors 150 and 160. In other embodiments, different pre-processing steps, or no pre-processing steps, are performed on the captured audio signal to produce the input signal 140. Regardless of the pre-processing steps, the audio input signal 140 (received by the fast and slow RMS detectors 150 and 160) contains environmental sound that includes alert signals and ambient sound. As described above, the alert signal detector 170 determines an adaptive threshold level based on the ambient sound level of the input signal 140 (as detected by the slow RMS detector 160) and then determines whether an alert signal is present by comparing the envelope level of the input signal 140 (as detected by the fast RMS detector 150) with the adaptive threshold level. Because the adaptive threshold level varies with the ambient sound level of the input signal 140, the detection of alert signals also varies with the ambient sound level. Accordingly, the alert signal detection function of the audio processing system 100 automatically adapts to changing acoustic environments having different ambient sound levels, without end-user input or intervention. Varying the adaptive threshold level according to the ambient sound level makes alert signal detection more accurate and results in fewer false detections across different acoustic environments. The fast and slow RMS detectors 150 and 160 also provide a low-complexity solution while still delivering good performance.

As shown in FIG. 1, the sound environment processor 120 receives an input audio signal from one or more microphones 110 that capture sound emanating from the environment. In some embodiments, the sound environment processor 120 receives sound emanating from the environment electronically rather than via the one or more microphones 110. The sound environment processor 120 performs noise reduction on the input audio signal. The sound environment processor 120 cleans and enhances the input audio signal by removing one or more noise signals, including, without limitation, microphone (mic) hiss, steady-state noise, very low-frequency sound (such as traffic rumble), and other low-level steady-state sounds, while leaving any potential alert signal intact. In general, a low-level sound is a sound whose signal level is below a loudness threshold. In some embodiments, a gate may be used to remove such low-level signals from the input signal before the processed signal is transmitted as output to the band-pass filter 130.

In general, a steady-state sound is a sound whose spectrum remains relatively constant, or changes slowly, over time, in contrast to a transient sound, such as an alert signal, whose spectrum changes rapidly over time. In one example, and without limitation, the sound of an idling car could be considered a steady-state sound, whereas the sound of an accelerating car or a car with a revving engine would not. In another example, and without limitation, the sound of operatic singing could be considered a steady-state sound, whereas the sound of speech would not. In yet another example, and without limitation, the sound of very slow symphonic music could be considered a steady-state sound, whereas the sound of relatively faster percussive music would not. Potential alert signals include sounds that are not low-level steady-state sounds, such as a person speaking or a car horn.

The sound environment processor 120 outputs the noise-reduced signal to the band-pass filter 130. The band-pass filter 130 is applied to the noise-reduced signal to produce a band-pass filtered signal. The band-pass filter 130 passes only frequencies within a predetermined frequency range in order to further isolate the signal content and focus on the particular frequency range of interest that contains alert signals. In some embodiments, the band-pass filter 130 passes frequencies in the range of 500-1800 Hz. In other embodiments, the band-pass filter 130 passes frequencies in a different range. In some embodiments, the band-pass filter 130 operates in the time domain, thereby avoiding the cost of converting the signal to the frequency domain.
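The time-domain filtering described above can be illustrated with a minimal sketch. The source does not specify the filter order or design, so the single second-order (biquad) section, the 16 kHz sample rate, and the names `bandpass_coefficients`, `bandpass`, and `gain_at` are all assumptions made for illustration; only the 500-1800 Hz passband comes from the text.

```python
import math

def bandpass_coefficients(low_hz, high_hz, sample_rate_hz):
    """Coefficients of a second-order band-pass section (assumed design)."""
    f0 = math.sqrt(low_hz * high_hz)          # geometric center of the passband
    q = f0 / (high_hz - low_hz)               # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)        # feed-forward coefficients
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)  # feedback coefficients
    return b, a

def bandpass(samples, b, a):
    """Filter sample by sample in the time domain (no frequency transform)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

def gain_at(freq_hz, sample_rate_hz=16000, count=8000):
    """Measure the steady-state gain of the filter for a sine at freq_hz."""
    tone = [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]
    b, a = bandpass_coefficients(500.0, 1800.0, sample_rate_hz)
    filtered = bandpass(tone, b, a)
    half = count // 2                         # skip the start-up transient
    rms_in = math.sqrt(sum(x * x for x in tone[half:]) / half)
    rms_out = math.sqrt(sum(y * y for y in filtered[half:]) / half)
    return rms_out / rms_in
```

Under these assumptions, a 1 kHz tone passes nearly unchanged while tones at 100 Hz and 6 kHz are strongly attenuated. A practical design would likely use a steeper, higher-order filter.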

The band-pass filter 130 outputs the band-pass filtered signal (audio input signal 140) to the fast RMS detector 150 and the slow RMS detector 160. In general, the audio input signal 140 received by the fast and slow RMS detectors 150 and 160 contains environmental sound that includes alert signals and ambient sound. The fast and slow RMS detectors 150 and 160 may comprise time-domain detectors, which measure the sound energy of the input signal 140 over a specified period of time, for detecting these two different types of sound. The fast and slow RMS detectors 150 and 160 may do so by detecting the average RMS level of the audio energy in the input signal 140 over time periods of different lengths. In other embodiments, the fast and slow RMS detectors 150 and 160 may use alternative signal level measurement techniques rather than detecting the RMS level of the signal. In one example, and without limitation, the fast and slow RMS detectors 150 and 160 use a more sophisticated psychoacoustic signal level measurement technique. In further embodiments, different types of detectors may be used, such as peak detectors, envelope detectors, energy detectors, or frequency-domain detectors.

The slow RMS detector 160 may be configured to detect and output the average energy level in the input signal 140 over a relatively long period of time (compared to the fast RMS detector 150). The average energy level in the input signal 140 over this relatively long period may be referred to herein as the ambient sound level. Ambient sound includes steady-state sounds of relatively low signal amplitude that remain relatively constant over time (compared to alert signals), such as traffic noise, pedestrian noise, and other background noise. The ambient sound level is used to calculate the adaptive threshold by applying an adaptive threshold function, as discussed below with respect to FIG. 2.

The fast RMS detector 150 may be configured to detect and output the average energy in the input signal 140 over a relatively short period of time (compared to the slow RMS detector 160). The average energy in the input signal 140 over this relatively short period may be referred to herein as the envelope level of the input signal 140. The fast RMS detector 150 is used to help determine whether the input signal 140 currently includes an alert signal. Alert signals include transient sounds of relatively high signal amplitude that change rapidly over time (compared to ambient sound), such as a person shouting or a car horn. An alert signal may therefore be characterized by a spike of high sound energy over a short period of time. An alert signal is detected based on the envelope level of the input signal 140 (as output by the fast RMS detector 150) and the adaptive threshold. For example, if the envelope level output by the fast RMS detector 150 exceeds the adaptive threshold, it may be determined that an alert signal is currently present in the input signal 140.

In some embodiments, the outputs of the fast RMS detector 150 and the slow RMS detector 160 are each represented by the following equation:

v[n] = a*u[n] + (1-a)*v[n-1]    (1)

In equation (1):

v[n] = the current output value of the RMS detector;

a = the time coefficient of the detector;

u[n] = the input signal 140; and

v[n-1] = the previous output value of the RMS detector.

The output value of each RMS detector 150 and 160 may be sampled at a predetermined sampling frequency. Thus, v[n] may equal the current output value of the detector at the current sample point, and v[n-1] may equal the previous output value of the RMS detector at the previous sample point. As shown, the current output value v[n] of the RMS detector is based on the previous output value v[n-1] of the RMS detector, the time coefficient "a" of the detector, and the received input signal u[n]. Accordingly, each RMS detector 150 and 160 may contain a memory component (not shown) for storing previous output values and a processor component (not shown) for computing the current output value from the previous output value, the time coefficient "a", and the received input signal. In some embodiments, the received input signal u[n] equals the band-pass filtered signal received from the band-pass filter 130. In other embodiments, the received input signal u[n] equals the band-pass filtered signal after it has been rectified and converted to the logarithmic domain by the RMS detector (as discussed below).

In some embodiments, v[n] equals the average energy level of the received input signal u[n] over a time period defined by the time coefficient "a" of the detector. In these embodiments, the fast RMS detector 150 and the slow RMS detector 160 are distinguished by different values of the time coefficient "a". The output v[n] of the fast RMS detector 150 may equal the average energy level of the received input signal u[n] over a first time period, and the output v[n] of the slow RMS detector 160 may equal the average energy level of the received input signal u[n] over a second time period, the first time period being shorter than the second. For example, the first time period of the fast RMS detector 150 may be approximately 22 ms, while the second time period of the slow RMS detector 160 may be approximately 128 ms. In this example, at each sample point, the fast RMS detector 150 may output the average energy level of the received input signal u[n] over the last 22 ms, while the slow RMS detector 160 may output the average energy level of the received input signal u[n] over the last 128 ms. In other embodiments, other values are used for the first and second time periods.
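As a rough illustration of equation (1), the sketch below runs the same recursion with two different time coefficients. The source does not say how "a" is derived from the 22 ms and 128 ms periods, so the exponential mapping in `time_coefficient`, the 16 kHz sample rate, and the function names are assumptions; the input is taken to be an already rectified level.

```python
import math

def time_coefficient(period_s, sample_rate_hz):
    """Hypothetical mapping from an averaging period to the coefficient a."""
    return 1.0 - math.exp(-1.0 / (period_s * sample_rate_hz))

def smooth(u, a, v0=0.0):
    """Apply equation (1): v[n] = a*u[n] + (1-a)*v[n-1]."""
    v_prev = v0
    out = []
    for sample in u:
        v_prev = a * sample + (1.0 - a) * v_prev  # only v[n-1] must be stored
        out.append(v_prev)
    return out

SAMPLE_RATE_HZ = 16000                             # assumed sample rate
a_fast = time_coefficient(0.022, SAMPLE_RATE_HZ)   # fast detector, about 22 ms
a_slow = time_coefficient(0.128, SAMPLE_RATE_HZ)   # slow detector, about 128 ms

# A rectified level that jumps from 0 to 1, as a sudden loud sound would.
step = [0.0] * 100 + [1.0] * 2000
fast_out = smooth(step, a_fast)
slow_out = smooth(step, a_slow)
```

With these values the fast output has almost reached the new level after 2000 samples, while the slow output is still well below it, which is the behavior that separates the envelope level from the ambient sound level.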

In alternative embodiments, the fast and slow RMS detectors 150 and 160 each comprise a log-domain RMS detector. In these embodiments, the received input signal u[n] (comprising the band-pass filtered signal) is rectified and converted to the logarithmic (dB unit) domain by the RMS detector. In these embodiments, the outputs of the fast RMS detector 150 and the slow RMS detector 160 are each represented by the following equation:

v[n] = a*log(abs(u[n])) + (1-a)*v[n-1]    (2)

For example, according to equation (2), at each sample point, the fast RMS detector 150 may output the average energy level (in the logarithmic domain) of the received input signal u[n] over the last 22 ms, while the slow RMS detector 160 may output the average energy level (in the logarithmic domain) of the received input signal u[n] over the last 128 ms. An advantage of implementing the fast and slow RMS detectors 150 and 160 as log-domain RMS detectors is that their output values are expressed in the logarithmic domain (e.g., dB FS). Consequently, any subsequent multiplication and/or division involving the output values of the fast and slow RMS detectors 150 and 160 is replaced by addition and/or subtraction of logarithmic values (e.g., when calculating the adaptive threshold, as discussed below). In addition, log-domain values can be converted to dB values by multiplying them by a constant factor.
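A minimal sketch of the log-domain detector of equation (2) follows. The `20*log10` scaling to dB, the `FLOOR` constant that guards against log(0), the initial value, and the function names are assumptions added for illustration and are not specified in the source.

```python
import math

FLOOR = 1e-6  # assumed lower bound (-120 dB) so that log(0) never occurs

def to_db(sample):
    """Rectify a sample and convert it to dB (20*log10 is an assumed scaling)."""
    return 20.0 * math.log10(max(abs(sample), FLOOR))

def log_detector(samples, a, v0=-120.0):
    """Equation (2): v[n] = a*log(abs(u[n])) + (1-a)*v[n-1], in dB units."""
    v_prev = v0
    out = []
    for sample in samples:
        v_prev = a * to_db(sample) + (1.0 - a) * v_prev
        out.append(v_prev)
    return out

quiet = log_detector([0.01] * 500, a=0.1)
loud = log_detector([0.02] * 500, a=0.1)   # same signal scaled by a factor of 2
gain_db = loud[-1] - quiet[-1]             # about 6.02 dB
```

The last line shows the property the text relies on: scaling the signal by a factor becomes adding a constant number of dB to the detector output, so later threshold arithmetic needs only additions and subtractions.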

As shown in FIG. 1, the fast RMS detector 150 and the slow RMS detector 160 each send their output to the alert signal detector 170. As discussed above, the output of the slow RMS detector 160 comprises the ambient sound level of the input signal 140, which is received by the alert signal detector 170. The alert signal detector 170 then uses the ambient sound level to calculate the adaptive threshold by applying an adaptive threshold function. The adaptive threshold specifies a sound energy level that varies with the ambient sound level. The output of the fast RMS detector 150 comprises the envelope level of the input signal 140, which is also received by the alert signal detector 170. The alert signal detector 170 then determines whether the received input signal currently contains an alert signal by comparing the envelope level with the adaptive threshold. For example, if the envelope level output by the fast RMS detector 150 is equal to or greater than the adaptive threshold level, it may be determined that an alert signal is currently present in the received input signal. Otherwise, it may be determined that no alert signal is currently present in the received input signal.

Thus, the alert signal detector 170 determines the adaptive threshold based on the ambient sound level of the received input signal and then determines whether an alert signal is present in the received input signal by comparing the envelope level of the received input signal with the adaptive threshold. Because the adaptive threshold specifies a sound energy level that varies with the ambient sound level of the received input signal, the detection of alert signals in the received input signal also varies with the ambient sound level. Accordingly, the alert signal detection function of the audio processing system 100 automatically adapts to changing acoustic environments: when the ambient sound level of the environment changes, the adaptive threshold used to detect alert signals changes automatically, without end-user input or intervention. In some embodiments, the adaptive threshold automatically increases when the ambient sound level increases and automatically decreases when the ambient sound level decreases (as discussed below with respect to FIG. 2).
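The comparison described above can be sketched as follows. The actual adaptive threshold function is the one discussed with respect to FIG. 2 and is not reproduced here, so the fixed-margin rule in `adaptive_threshold`, the value of `MARGIN_DB`, and the function names are purely hypothetical stand-ins that only preserve the stated property that the threshold rises and falls with the ambient sound level.

```python
MARGIN_DB = 12.0  # hypothetical margin; not taken from the source

def adaptive_threshold(ambient_db, margin_db=MARGIN_DB):
    """Assumed threshold rule: ambient sound level plus a fixed margin in dB."""
    return ambient_db + margin_db

def alert_present(envelope_db, ambient_db):
    """An alert is flagged when the envelope level reaches the threshold."""
    return envelope_db >= adaptive_threshold(ambient_db)

# The same -30 dB envelope is judged differently in two environments.
in_quiet_room = alert_present(envelope_db=-30.0, ambient_db=-60.0)
on_busy_street = alert_present(envelope_db=-30.0, ambient_db=-35.0)
```

Under this assumed rule, the -30 dB sound stands out in the quiet room but not on the busy street, which is the environment-dependent behavior the passage describes.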

In some embodiments, the alert signal detector 170 also provides a conditional ambient update feature. In these embodiments, the ambient sound level (which is output by the slow RMS detector 160) is updated based on whether an alert signal is detected by the alert signal detector 170. As used herein, the "current" ambient sound level is the ambient sound level at the "current" sample point that is received by the alert signal detector 170 and used to detect an alert signal. If no alert signal is detected, the current ambient sound level is updated at the next sample point to produce the next ambient sound level (per the normal operation of the audio processing system 100). If an alert signal is detected, however, the current ambient sound level is not updated at the next sample point; instead, the current ambient sound level continues to be used by the alert signal detector 170 to detect the alert signal. The current ambient sound level is continuously recirculated and used by the alert signal detector 170 at subsequent sample points to detect the alert signal until the alert signal detector 170 determines that the alert signal is no longer present in the input signal 140. After the alert signal detector 170 determines that the alert signal is no longer present in the input signal 140, the current ambient sound level is updated at the next sample point to produce the next ambient sound level (per the normal operation of the audio processing system 100). This ensures that the relatively high energy level of the alert signal does not artificially raise the ambient sound level at subsequent sample points, which in turn would artificially raise the adaptive threshold. By recirculating the current ambient sound level, a more realistic ambient sound level is input to the alert signal detector 170.

As shown in FIG. 1, to implement the conditional ambient update feature, the alert signal detector 170 sends a control signal 180 to the slow RMS detector 160. The state of the control signal 180 is based on whether an alert signal is detected. If no alert signal is detected by the alert signal detector 170, the alert signal detector 170 sends the control signal 180 to the slow RMS detector 160 to cause the slow RMS detector 160 to operate normally and update the ambient sound level at the next sampling point. If an alert signal is detected by the alert signal detector 170, the alert signal detector 170 sends the control signal 180 to the slow RMS detector 160 to cause the slow RMS detector 160 to not update the ambient sound level at the next sampling point and/or to continue outputting (recycling) the current ambient sound level. After the alert signal detector 170 determines that the alert signal is no longer present in the input signal 140, the alert signal detector 170 sends the control signal 180 to the slow RMS detector 160 to cause the slow RMS detector 160 to operate normally and update the ambient sound level at the next sampling point.
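The conditional ambient update can be sketched as a level detector whose internal state is frozen while an alert is flagged. This is a minimal illustration, not the described implementation: the class name `SlowLevelDetector`, the one-pole averaging form, the 48 kHz sample rate, and the `hold` argument (standing in for the state of control signal 180) are all assumptions made for the example.

```python
import math


class SlowLevelDetector:
    """Hypothetical slow level detector whose update can be frozen.

    Assumption: a one-pole average of the squared signal stands in for the
    slow RMS detector; the text here does not specify the detector equations.
    """

    def __init__(self, sample_rate_hz=48000.0, time_constant_s=0.128):
        # Smoothing coefficient of the one-pole averager.
        self.alpha = 1.0 - math.exp(-1.0 / (sample_rate_hz * time_constant_s))
        self.mean_square = 1e-12  # small floor avoids log10(0)

    def update(self, sample, hold):
        # hold=True models the "alert detected" state of the control signal:
        # the state is left untouched, so the current ambient level is recycled.
        if not hold:
            self.mean_square += self.alpha * (sample * sample - self.mean_square)
        return 10.0 * math.log10(max(self.mean_square, 1e-12))  # level in dB FS
```

Freezing the internal state, rather than only the output, matters in this sketch: if the state kept integrating the alert's energy, the ambient level would jump upward as soon as the hold was released.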

The alert signal detector 170 also sends a detection signal to a detection receiving device 190, the detection signal indicating whether an alert signal has been detected by the alert signal detector 170. The detection receiving device 190 comprises a device that makes use of the alert signal detection capability of the audio processing system 100. The detection receiving device 190 receives the detection signal and performs further operations based on the state of the detection signal. For example, the detection receiving device 190 may comprise a listening device that, if the detection signal indicates that an alert signal has been detected, reduces the sound level of the entertainment signal and/or plays back the alert signal through the listening device. As another example, the detection receiving device 190 may change the settings of an algorithm based on the state of the detection signal, such as modifying environment-specific or sound-specific audio processing settings. For example, when the detection signal indicates that an alert signal has been detected, noise reduction settings may be modified to increase the intelligibility of the input signal. In other embodiments, the detection receiving device 190 uses the detection signal for different purposes and performs different operations based on the state of the detection signal.
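One possible reaction of a detection receiving device, reducing the entertainment level and passing the ambient signal through while an alert is flagged, can be sketched as follows. The function names, the -12 dB ducking amount, and the simple additive mix are hypothetical choices for illustration; the text does not specify any of them.

```python
def entertainment_gain_db(alert_detected, duck_db=-12.0):
    """Gain in dB applied to the entertainment signal (duck_db is hypothetical)."""
    return duck_db if alert_detected else 0.0


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def mix_output(entertainment_sample, ambient_sample, alert_detected):
    """Duck the entertainment signal and pass ambient sound through on an alert."""
    gain = db_to_linear(entertainment_gain_db(alert_detected))
    passthrough = ambient_sample if alert_detected else 0.0
    return gain * entertainment_sample + passthrough
```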

Adaptive Threshold Function

As discussed above, the adaptive threshold specifies a sound energy level that varies with the ambient sound level of the input signal 140. The adaptive threshold is a function of the ambient sound level (as detected by the slow RMS detector 160), so that the adaptive threshold changes automatically when the ambient sound level of the environment changes. The adaptive threshold function may express the adaptive threshold as a transfer function of the ambient level. In some embodiments, the adaptive threshold function comprises a linear function, a piecewise linear function, or a curvilinear function. In other embodiments, the adaptive threshold function comprises any other type of transfer function that depends on the ambient level of the input signal 140.

In some embodiments, the adaptive threshold function comprises a piecewise linear function represented by the following equation:

If x[n] < b, y[n] = A1*x[n] + B    (3)

If b < x[n], y[n] = A2*x[n] + C

The adaptive threshold function can also be expressed in a different form by the following equation:

y[n] = max(A*x[n] + B, x[n] + C)    (4)

In equations (3) and (4),

y[n] = adaptive threshold level;

x[n] = ambient sound level (output of the slow RMS detector 160);

A1*x[n] + B = first threshold function;

A2*x[n] + C = second threshold function;

x[n] < b = first range of ambient sound levels;

b < x[n] = second range of ambient sound levels; and

b = transition sound level.
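Equations (3) and (4) can be sketched in Python as shown below. Several points are assumptions rather than statements from the text: the coefficient values used in the usage note (A1 = 0.5, B = -29, A2 = 1, C = 3.5) are hypothetical and were chosen only so that the two segments meet at -65 dB FS; the equations do not say how x[n] = b is handled, so the sketch assigns it to the second segment; and equation (4) is read here as the special case A2 = 1 with A = A1 < 1.

```python
def adaptive_threshold_piecewise(x_db, a1, b_off, a2, c_off, transition_db):
    """Equation (3): two linear segments split at the transition level b.

    x_db is the ambient sound level in dB FS; the return value is the
    adaptive threshold level in dB FS. x_db == transition_db is assigned
    to the second segment (an assumption; the source leaves it undefined).
    """
    if x_db < transition_db:
        return a1 * x_db + b_off
    return a2 * x_db + c_off


def adaptive_threshold_max(x_db, a, b_off, c_off):
    """Equation (4): the larger of a sloped line and a unit-slope line."""
    return max(a * x_db + b_off, x_db + c_off)
```

With the hypothetical coefficients above, both forms give -69.0 dB FS at an ambient level of -80 dB FS and -36.5 dB FS at -40 dB FS. The two forms agree whenever the lines cross at the transition level, because a line with slope below 1 lies above the unit-slope line for all ambient levels below the crossing point.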

FIG. 2 illustrates an exemplary adaptive threshold function implemented by the alert signal detector of FIG. 1, according to various embodiments. The x-axis represents the ambient sound level (in dB FS) and the y-axis represents the adaptive threshold level (in dB FS). The adaptive threshold function shown in FIG. 2 is represented by equation (3). An ambient curve 210 represents the ambient sound level x[n] (in dB FS). The ambient curve 210 is divided into a first range 220 of ambient sound levels (below a transition sound level 240) and a second range 230 of ambient sound levels (above the transition sound level 240). A threshold curve 250 represents the adaptive threshold sound level y[n] (in dB FS). The threshold curve 250 is divided into a first threshold line 260, which is a function of the first range 220 of ambient sound levels (below the transition sound level 240), and a second threshold line 270, which is a function of the second range 230 of ambient sound levels (above the transition sound level 240).

The first threshold line 260 is determined by the first threshold function (A1*x[n]+B) defined for the first range 220 of ambient sound levels, and the second threshold line 270 is determined by the second threshold function (A2*x[n]+C) defined for the second range 230 of ambient sound levels. By designing different adaptive threshold functions for different ranges of ambient sound levels (delimited by the transition sound level 240), the adaptive threshold function itself can change based on the range of the ambient sound level. In this way, an adaptive threshold function can be designed specifically for a particular range of ambient sound levels to produce the best performance results. For example, a first threshold function may be defined that works better at "low" ambient sound levels, and a second threshold function may be defined that works better at "high" ambient sound levels. In further embodiments, different adaptive threshold functions may be defined for two or more different ranges of ambient sound levels (e.g., low, medium, and high ambient sound levels). The transition sound level 240 that defines and separates the first and second ranges of ambient sound levels may be determined experimentally to produce the best performance results. In some embodiments, the transition sound level 240 is approximately equal to an ambient sound level of -65 dB FS.

In the example of FIG. 2, the first and second threshold functions are linear functions with different slope coefficients "A1" and "A2". In other embodiments, the first threshold function and/or the second threshold function may comprise a non-linear function. For the first threshold function, "A1" is the slope coefficient of the first threshold line 260, and "B" is the point at which the first threshold line 260 would cross the y-axis (at an ambient sound level of 0 dB FS) if extended to the y-axis. For the second threshold function, "A2" is the slope coefficient of the second threshold line 270, and "C" is the point at which the second threshold line 270 crosses the y-axis (at an ambient sound level of 0 dB FS). The slope coefficients A1 and A2 control the steepness with which the adaptive threshold increases or decreases in response to changes in the ambient sound level. The value of B determines the ambient sound level (e.g., -65 dB FS) at which the change in steepness begins. The value of C determines the scaling factor applied to the ambient sound level to compute the adaptive threshold.

The values of A1 and B may be determined experimentally to provide the best performance results for the first range 220 of ambient sound levels, and the values of A2 and C may be determined experimentally to provide the best performance results for the second range 230 of ambient sound levels. For example, it has been found experimentally that scaling the ambient sound level by a constant scaling factor to determine the adaptive threshold level works well for the higher range 230 of ambient sound levels. Accordingly, the slope A2 of the second threshold line 270 for the second range 230 of ambient sound levels may be set equal to 1, which produces an adaptive threshold level equal to the ambient sound level multiplied by a constant scaling factor. It has also been found experimentally that an adaptive threshold level equal to the ambient sound level multiplied by a constant scaling factor of approximately 1.5 works well for the higher range 230 of ambient sound levels. In the second threshold line 270, the value of C determines the resulting constant scaling factor. Therefore, a value of C may be used in the second threshold line 270 that produces a constant scaling factor of approximately 1.5 for the higher range 230 of ambient sound levels.

However, it has been found experimentally that an adaptive threshold level equal to the ambient sound level multiplied by a constant scaling factor does not work well for the lower range 220 of ambient sound levels. This is because the average energy of the ambient sound is so low that, if a constant scaling factor is used, many types of sounds that are not alert signals (e.g., footsteps, dropped keys) may be incorrectly detected as alert signals. Therefore, at lower ambient sound levels, a non-constant (variable) scaling factor may be used that increases as the ambient sound level decreases. Accordingly, the slope A1 of the first threshold line 260 for the lower range 220 of ambient sound levels may be set to less than 1, which produces a variable scaling factor that increases as the ambient sound level decreases. The variable scaling factor is applied to the ambient sound level to determine the adaptive threshold level.
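The difference between a constant and a variable scaling factor follows from working in decibels: a unit-slope line adds a fixed dB offset to the ambient level, which corresponds to multiplying the linear level by a fixed ratio, whereas a slope below 1 makes the offset grow as the ambient level falls. The sketch below illustrates this under stated assumptions: the scaling factor is treated as an amplitude ratio (20*log10), which the text does not specify, and the slope 0.5 and intercept -29 used in the usage note are hypothetical.

```python
import math


def scale_factor(ambient_db, slope, intercept_db):
    """Linear amplitude ratio threshold/ambient implied by a dB-domain line.

    Assumption: levels are amplitude-referenced, so a difference of d dB
    corresponds to a ratio of 10 ** (d / 20).
    """
    threshold_db = slope * ambient_db + intercept_db
    return 10.0 ** ((threshold_db - ambient_db) / 20.0)


# Offset C (in dB) that yields a scaling factor of 1.5 when the slope is 1,
# under the amplitude-ratio assumption above (roughly 3.5 dB).
C_DB = 20.0 * math.log10(1.5)
```

With slope 1 and intercept `C_DB`, `scale_factor` returns 1.5 (up to rounding) at any ambient level. With the hypothetical slope 0.5 and intercept -29, the factor is larger at -90 dB FS than at -70 dB FS, which is the variable behavior described for the lower range.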

Detecting an Alert Signal in an Audio Signal

FIG. 3 is a flow diagram of method steps for detecting an alert signal within an audio signal, according to various embodiments. Although the method steps are described in conjunction with the systems of FIGS. 1-2, persons skilled in the art will understand that any system configured to perform the method steps, in any order, is within the scope of the present disclosure.

As shown, a method 300 begins at step 305, where the acoustic environment processor 120 receives sound from the surrounding environment via an audio signal. The audio signal captures the surrounding sound, which includes alert signals and ambient sound. The acoustic environment processor 120 performs noise reduction on the audio signal and transmits the processed signal to the band pass filter 130. At step 310, the band pass filter 130 receives the processed signal, applies band pass filtering to produce a band pass filtered signal, and transmits the band pass filtered signal (the audio input signal 140) to the fast RMS detector 150 and the slow RMS detector 160. The input signal 140 contains alert signals and ambient sound.

At step 315, the fast and slow RMS detectors 150 and 160 each receive the input signal 140. The fast and slow RMS detectors 150 and 160 may comprise time domain detectors that measure the average RMS level of the audio energy in the input signal 140 over time periods of different lengths, the time period of the fast RMS detector 150 (e.g., 22 ms) being shorter than the time period of the slow RMS detector 160 (e.g., 128 ms). In some embodiments, the fast and slow RMS detectors 150 and 160 each comprise a log domain RMS detector that first rectifies the received input signal 140 and converts the received input signal 140 to the logarithmic (dB) domain. The slow RMS detector 160 determines the ambient sound level of the input signal 140 and transmits the ambient sound level to the alert signal detector 170. The fast RMS detector 150 determines the envelope level of the input signal 140 and transmits the envelope level to the alert signal detector 170.
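The relationship between the fast and slow detectors can be illustrated with a pair of level detectors that differ only in their time constants. This is a sketch under assumptions, not the detectors as described: it averages the squared signal with a one-pole smoother and then converts to dB, which may differ from the order of operations of the log domain detectors; the class name `LogRmsDetector` and the 48 kHz sample rate are hypothetical, while the 22 ms and 128 ms values are the example time periods given in the text.

```python
import math


class LogRmsDetector:
    """Hypothetical RMS level detector with output in dB FS."""

    def __init__(self, time_constant_s, sample_rate_hz=48000.0):
        # One-pole smoothing coefficient derived from the time constant.
        self.alpha = 1.0 - math.exp(-1.0 / (time_constant_s * sample_rate_hz))
        self.mean_square = 1e-12  # small floor avoids log10(0)

    def process(self, sample):
        # Squaring rectifies the sample; the smoother averages the result.
        self.mean_square += self.alpha * (sample * sample - self.mean_square)
        # 10*log10 of mean square equals 20*log10 of the RMS value.
        return 10.0 * math.log10(max(self.mean_square, 1e-12))


fast_detector = LogRmsDetector(time_constant_s=0.022)  # envelope level
slow_detector = LogRmsDetector(time_constant_s=0.128)  # ambient sound level
```

When a loud sound begins, the fast detector's output rises well before the slow detector's output does, and that gap is what allows the envelope level to cross a threshold derived from the ambient level.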

At step 320, the alert signal detector 170 receives the ambient sound level and the envelope level of the input signal 140. At step 325, the alert signal detector 170 applies an adaptive threshold function to determine an adaptive threshold level based on the ambient sound level. For example, the adaptive threshold function may comprise a linear function, a piecewise linear function, or a curvilinear function.

At step 330, the alert signal detector 170 determines whether an alert signal is present in the input signal 140. The alert signal detector 170 does so by comparing the received envelope level of the input signal 140 with the adaptive threshold level. For example, if the envelope level is equal to or greater than the adaptive threshold level, the alert signal detector 170 determines that an alert signal is present in the input signal 140. Otherwise, the alert signal detector 170 determines that an alert signal is not currently present in the received input signal 140.

If the alert signal detector 170 determines (step 330: NO) that an alert signal is not present, the method 300 continues at step 340. If the alert signal detector 170 determines (step 330: YES) that an alert signal is present, the alert signal detector 170 (at step 335) sends the control signal 180 to the slow RMS detector 160 to cause the slow RMS detector 160 to not update the ambient sound level at the next sampling point and to continue outputting (recycling) the current ambient sound level until the alert signal detector 170 determines that the alert signal is no longer present in the input signal 140. The method 300 then continues at step 340.

At step 340, the alert signal detector 170 sends a detection signal to the detection receiving device 190, the detection signal indicating whether an alert signal has been detected by the alert signal detector 170. The detection receiving device 190 receives the detection signal and performs further operations based on the state of the detection signal. The method 300 then returns to step 305, described above. In various embodiments, the steps of the method 300 may be performed in a continuous loop until some event occurs, such as powering down a device that includes the audio processing system 100.
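Steps 315 through 340 can be sketched as a per-sample loop. Everything beyond the step structure is an assumption made for the example: the one-pole averaging detectors, the 48 kHz sample rate, the threshold coefficients (`a`, `b_off`, `c_off`, used in the form of equation (4)), and the `initial_level_db` parameter, which seeds both detectors at a plausible ambient level so that start-up transients do not register as an alert. Noise reduction and band pass filtering (steps 305 and 310) are assumed to have been applied to `samples` already.

```python
import math


def level_db(mean_square):
    """Convert a mean-square value to a level in dB FS."""
    return 10.0 * math.log10(max(mean_square, 1e-12))


def detect_alerts(samples, sample_rate_hz=48000.0, initial_level_db=-60.0,
                  fast_tc_s=0.022, slow_tc_s=0.128,
                  a=0.5, b_off=-29.0, c_off=3.5):
    """Return one detection flag per sample (all coefficients hypothetical)."""
    fast_alpha = 1.0 - math.exp(-1.0 / (fast_tc_s * sample_rate_hz))
    slow_alpha = 1.0 - math.exp(-1.0 / (slow_tc_s * sample_rate_hz))
    fast_ms = slow_ms = 10.0 ** (initial_level_db / 10.0)
    alert = False
    flags = []
    for sample in samples:
        power = sample * sample
        # Step 315: the fast detector tracks the envelope level.
        fast_ms += fast_alpha * (power - fast_ms)
        # Step 335: the slow detector is frozen while an alert is flagged,
        # so the current ambient level is recycled.
        if not alert:
            slow_ms += slow_alpha * (power - slow_ms)
        envelope_db = level_db(fast_ms)
        ambient_db = level_db(slow_ms)
        # Step 325: adaptive threshold from the ambient level, equation (4) form.
        threshold_db = max(a * ambient_db + b_off, ambient_db + c_off)
        # Step 330: an alert is present if the envelope reaches the threshold.
        alert = envelope_db >= threshold_db
        # Step 340: report the detection state.
        flags.append(alert)
    return flags
```

Because the flag computed for one sample gates the slow detector on the following sample, the ambient level stops updating at the next sampling point after a detection, mirroring the behavior described for control signal 180.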

In sum, in the audio processing system 100, a captured audio signal is processed by the acoustic environment processor and the band pass filter to provide the audio input signal 140 to the fast RMS detector 150 and the slow RMS detector 160, the input signal 140 containing alert signals and ambient sound. The slow RMS detector 160 determines the ambient sound level of the input signal 140, which is output to the alert signal detector 170. The alert signal detector 170 applies an adaptive threshold function to the ambient sound level to compute an adaptive threshold level. The fast RMS detector 150 determines the envelope level of the input signal 140, which is output to the alert signal detector 170. The alert signal detector 170 compares the envelope level with the adaptive threshold level to determine whether an alert signal is currently present in the input signal 140. Because the adaptive threshold level varies with the ambient sound level of the input signal 140, detection of the alert signal also varies with the ambient sound level. Accordingly, the alert signal detection function of the audio processing system 100 automatically adapts to changing acoustic environments having different ambient sound levels, without end-user input or intervention.

At least one advantage of the approach described herein is that the audio processing system can be implemented in a simple and low-cost manner while still detecting alert signals in a changing acoustic environment. Another advantage of the approach described herein is that the adaptive threshold level (used to detect alert signals) changes automatically based on the ambient sound level of the environment, thereby enabling accurate detection of alert signals across different acoustic environments.

The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Aspects of the present embodiments may be embodied as a system, method, or program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.), or an embodiment combining software and hardware aspects, all of which may generally be referred to herein as a "circuit," "component," "module," or "system." Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.

Any combination of one or more computer readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such processors may be, without limitation, general purpose processors, special purpose processors, application-specific processors, or field-programmable processors or gate arrays.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims

1. An audio processing system, comprising:

a slow detector configured to determine an ambient sound level associated with an audio input signal comprising ambient sound;

a fast detector configured to determine an envelope level associated with the audio input signal; and

an alert signal detector configured to:

determine an adaptive threshold level based on the ambient sound level; and

compare the envelope level with the adaptive threshold level to determine whether an alert signal is present in the audio input signal.

2. The audio processing system of claim 1, wherein:

the fast detector includes a time domain detector that determines an average energy level associated with the audio input signal during a first time period; and

the slow detector includes a time domain detector that determines an average energy level associated with the audio input signal during a second time period, wherein the second time period is greater than the first time period.

3. The audio processing system of claim 1, wherein each of the slow detector and the fast detector comprises a logarithmic domain Root Mean Square (RMS) detector.

4. The audio processing system of claim 1, further comprising:

an acoustic environment processor for receiving an audio signal from a microphone and performing one or more noise reduction operations on the audio signal to produce a processed signal; and

a band pass filter attenuating the processed signal outside a predetermined frequency range to produce a band pass filtered signal, wherein the band pass filtered signal is the audio input signal received by the slow detector and fast detector.

5. The audio processing system of claim 1, wherein the alert signal detector is further configured to transmit a detection signal to a detection receiving device, wherein the detection signal indicates whether an alert signal has been detected.

6. The audio processing system of claim 1, wherein the alert signal detector is configured to apply an adaptive threshold function to the ambient sound level to determine the adaptive threshold level, wherein the adaptive threshold function comprises a linear function, a piecewise linear function, or a curvilinear function.

7. The audio processing system of claim 1, wherein the adaptive threshold level increases when the ambient sound level increases and the adaptive threshold level decreases when the ambient sound level decreases.

8. The audio processing system of claim 1, wherein the alert signal detector is further configured to cause the slow detector to refrain from updating the ambient sound level associated with the audio input signal until the alert signal is not present in the audio input signal.

9. A computer-implemented method for detecting an alert signal within an audio input signal, the method comprising:

determining an ambient sound level associated with the audio input signal, wherein the audio input signal comprises one or more sounds from the surrounding environment;

determining an envelope level associated with the audio input signal;

determining an adaptive threshold level based on the ambient sound level; and

10. The computer-implemented method of claim 9, wherein:

determining the envelope level associated with the audio input signal includes determining an average energy level of the audio input signal during a first period of time; and

determining the ambient sound level associated with the audio input signal includes determining an average energy level of the audio input signal during a second time period, the second time period being longer than the first time period.

11. The computer-implemented method of claim 9, wherein determining the adaptive threshold level comprises applying an adaptive threshold function to the ambient sound level, the adaptive threshold function comprising a linear function, a piecewise linear function, or a curvilinear function.

12. The computer-implemented method of claim 9, wherein determining the adaptive threshold level comprises applying a first adaptive threshold function to the ambient sound level for a first range of ambient sound levels and a second adaptive threshold function to the ambient sound level for a second range of ambient sound levels.

13. The computer-implemented method of claim 12, wherein:

the first range of ambient sound levels is lower than the second range of ambient sound levels;

the first adaptive threshold function comprises a linear function having a first slope; and

the second adaptive threshold function includes a linear function having a second slope that is greater than the first slope.

14. The computer-implemented method of claim 13, wherein the first slope is less than 1 and the second slope is equal to 1.

15. The computer-implemented method of claim 12, wherein:

for the first range of ambient sound levels, the first adaptive threshold function generates an adaptive threshold level equal to the ambient sound level multiplied by a non-constant scaling factor; and

for the second range of ambient sound levels, the second adaptive threshold function generates an adaptive threshold level equal to the ambient sound level multiplied by a constant scaling factor.

16. The computer-implemented method of claim 9, further comprising:

upon determining that the alert signal is present in the audio input signal, causing the slow detector to refrain from updating the ambient sound level of the audio input signal until the alert signal is no longer present in the audio input signal.

17. A computer readable storage medium comprising instructions that when executed by a processor cause the processor to detect an alert signal within an audio input signal by:

receiving an ambient sound level associated with the audio input signal, wherein the audio input signal comprises one or more sounds from an ambient environment;

receiving an envelope level associated with the audio input signal;

determining an adaptive threshold level based on the ambient sound level; and

comparing the envelope level with the adaptive threshold level to determine whether the alert signal is present in the audio input signal.

18. The computer-readable storage medium of claim 17, wherein:

19. The computer-readable storage medium of claim 17, wherein determining the adaptive threshold level comprises applying an adaptive threshold function to the ambient sound level, the adaptive threshold function comprising a piecewise linear function comprising at least a first threshold function and a second threshold function.

20. The computer-readable storage medium of claim 17, wherein determining the adaptive threshold level comprises applying a first adaptive threshold function to the ambient sound level for a first range of ambient sound levels and applying a second adaptive threshold function to the ambient sound level for a second range of ambient sound levels.