CN108141694A - Event detection for playback management in an audio device - Google Patents
Event detection for playback management in an audio device
- Publication number
- CN108141694A (CN201680058340.7A)
- Authority
- CN
- China
- Prior art keywords
- ambient sound
- detecting
- sound
- microphone
- input signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/10—Earpieces; Attachments therefor ; Earphones; Monophonic headphones
- H04R1/1083—Reduction of ambient noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/002—Damping circuit arrangements for transducers, e.g. motional feedback circuits
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/81—Detection of presence or absence of voice signals for discriminating voice from music
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2410/00—Microphones
- H04R2410/05—Noise reduction with a separate noise microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
According to embodiments of the present disclosure, a method for processing audio information in an audio device may include reproducing audio information by generating an audio output signal for delivery to at least one transducer of the audio device, receiving at least one input signal indicative of ambient sound external to the audio device, detecting a near-field sound in the ambient sound from the at least one input signal, and, in response to detecting the near-field sound, modifying a characteristic of the audio information reproduced to the at least one transducer.
Description
Cross-Reference to Related Applications
This disclosure claims priority to U.S. Non-Provisional Patent Application Serial No. 15/229,429, filed August 5, 2016, which claims priority to U.S. Provisional Patent Application Serial No. 62/202,303, filed August 7, 2015, U.S. Provisional Patent Application Serial No. 62/237,868, filed October 6, 2015, and U.S. Provisional Patent Application Serial No. 62/351,499, filed June 17, 2016, each of which is incorporated herein by reference.
Technical Field
The field of representative embodiments of this disclosure relates to methods, apparatuses, or implementations concerning or relating to playback management in an audio device. Applications include the detection of certain ambient events, including but not limited to near-field sound detection, proximity sound detection, and tonal alarm detection that use spatial processing based on signals received from multiple microphones.
Background
Personal audio devices have become prevalent and are used in a wide variety of ambient environments. The headphones used with these audio devices have become advanced enough that occlusion, whether achieved by passive or active methods, prevents the user from keeping track of the ambient sound field outside the audio device. Although increased isolation and uninterrupted listening are preferable in most cases, for safety or an enhanced user experience it is sometimes essential that the user hear certain ambient events and take appropriate action in response. For example, if a user is listening to music through headphones and is interrupted by someone trying to start a conversation, it may be difficult to carry on the conversation unless the user pauses the playback signal or reduces its volume. U.S. Patent No. 7,903,825 proposes an audio device in which the playback signal is modified according to the ambient sound field. As another example, U.S. Patent No. 8,804,974 teaches ambient event detection in a personal audio device, which can then be used to implement event-based modification of the playback content. The above references also teach the use of microphones to detect various acoustic events. As another example, U.S. Application Serial No. 14/324,286, filed July 7, 2014, teaches using a speech detector as an event detector to adjust the playback signal during a conversation. As another example, U.S. Patent No. 8,565,446 teaches using direction-of-arrival (DOA) estimates and estimates of the interference-to-desired (near-field) speech signal ratio, both obtained from a set of multiple microphones, to detect desired speech in the presence of non-stationary background noise in order to control the speech enhancement algorithm in a noise reduction echo cancellation (NREC) system. Likewise, U.S. Application Serial No. 13/199,593 teaches that the maximum of a normalized cross-correlation statistic, obtained by cross-correlation analysis of multiple microphones, can be an effective discriminator for detecting near-field speech. U.S. Patent No. 8,126,706 proposes a music detector for NREC systems, based on a spectral flatness measure, that distinguishes the presence of background noise from that of background music. U.S. Patent No. 7,903,825, U.S. Patent No. 8,804,974, U.S. Application Serial No. 14/324,286, U.S. Patent No. 8,565,446, U.S. Application Serial No. 13/199,593, and U.S. Patent No. 8,126,706 are incorporated herein by reference.
Summary
In accordance with the teachings of the present disclosure, one or more disadvantages and problems associated with existing approaches to event detection for playback management in a personal audio device may be reduced or eliminated.
According to embodiments of the present disclosure, a method for processing audio information in an audio device may include reproducing audio information by generating an audio output signal for delivery to at least one transducer of the audio device, receiving at least one input signal indicative of ambient sound external to the audio device, detecting a near-field sound in the ambient sound from the at least one input signal, and, in response to detecting the near-field sound, modifying a characteristic of the audio information reproduced to the at least one transducer.
According to these and other embodiments of the present disclosure, an integrated circuit for implementing at least a portion of an audio device may include: an audio output configured to reproduce audio information by generating an audio output signal for delivery to at least one transducer of the audio device; a microphone input configured to receive an input signal indicative of ambient sound external to the audio device; and a processor configured to detect a near-field sound in the ambient sound from the input signal and, in response to detecting the near-field sound, modify a characteristic of the audio information.
According to these and other embodiments of the present disclosure, a method for processing audio information in an audio device may include reproducing audio information by generating an audio output signal for delivery to at least one transducer of the audio device, receiving at least one input signal indicative of ambient sound external to the audio device, detecting an audio event from the at least one input signal, and, in response to detecting that the audio event persists for at least a predetermined time, modifying a characteristic of the audio information reproduced to the at least one transducer.
According to these and other embodiments of the present disclosure, an integrated circuit for implementing at least a portion of an audio device may include: an audio output configured to reproduce audio information by generating an audio output signal for delivery to at least one transducer of the audio device; a microphone input configured to receive an input signal indicative of ambient sound external to the audio device; and a processor configured to detect an audio event from the input signal and, in response to detecting that the audio event persists for at least a predetermined time, modify a characteristic of the audio information reproduced to the at least one transducer.
Technical advantages of the present disclosure will be readily apparent to one of ordinary skill in the art from the figures, description, and claims included herein. The objects and advantages of the embodiments will be realized and achieved at least by the elements, features, and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are examples and explanatory, and are not restrictive of the claims set forth in this disclosure.
Brief Description of the Drawings
A more complete understanding of the present embodiments and certain advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:
FIG. 1 illustrates an example of a use case scenario in which such detectors may be used in conjunction with a playback management system to enhance the user experience, in accordance with embodiments of the present disclosure;
FIG. 2 illustrates an example playback management system that modifies a playback signal based on a decision of an event detector, in accordance with embodiments of the present disclosure;
FIG. 3 illustrates an example event detector, in accordance with embodiments of the present disclosure;
FIG. 4 illustrates functional blocks of a system for deriving near-field spatial statistics that may be used to detect audio events, in accordance with embodiments of the present disclosure;
FIG. 5 illustrates example fusion logic for detecting near-field sound, in accordance with embodiments of the present disclosure;
FIG. 6 illustrates example fusion logic for detecting proximity sound, in accordance with embodiments of the present disclosure;
FIG. 7 illustrates an embodiment of a proximity speech detector, in accordance with embodiments of the present disclosure;
FIG. 8 illustrates example fusion logic for detecting a tonal alarm event, in accordance with embodiments of the present disclosure;
FIG. 9 illustrates an example timing diagram showing delay and hysteresis logic that may be applied to an instantaneous audio event detection signal to generate a validated audio event signal, in accordance with embodiments of the present disclosure; and
FIG. 10 illustrates different audio event detectors with delay and hysteresis logic, in accordance with embodiments of the present disclosure.
Detailed Description
In accordance with embodiments of this disclosure, systems and methods are presented that may use at least three different audio event detectors in an automatic playback management framework. Such audio event detectors for an audio device may include: a near-field detector that may detect sound in the near field of the audio device, such as when the user of the audio device (e.g., a user who is wearing or otherwise using the audio device) speaks; a proximity detector that may detect sound in proximity to the audio device, such as when another person near the user of the audio device speaks; and a tonal alarm detector that detects acoustic alarms that may have originated in the vicinity of the audio device. FIG. 1 illustrates an example of a use case scenario in which such detectors may be used in conjunction with a playback management system to enhance the user experience, in accordance with embodiments of the present disclosure.
FIG. 2 illustrates an example playback management system that modifies a playback signal based on a decision of an event detector 2, in accordance with embodiments of the present disclosure. Signal processing functionality in a processor 50 may include an acoustic echo canceller 1 that may cancel the acoustic echo received at microphone 52 due to echo coupling between an output audio transducer 51 (e.g., a loudspeaker) and microphone 52. The echo-reduced signal may be passed to event detector 2, which may detect one or more different ambient events, including without limitation a near-field event detected by a near-field detector 3 (e.g., including but not limited to speech from the user of the audio device), a proximity event detected by a proximity detector 4 (e.g., including but not limited to speech or other ambient sound other than near-field sound), and/or a tonal alarm event detected by an alarm detector 5. If an audio event is detected, an event-based playback control 6 may modify a characteristic of the audio information (shown as "playback content" in FIG. 2) reproduced to output audio transducer 51. The audio information may include any information that may be reproduced at output audio transducer 51, including without limitation downlink speech associated with a telephone conversation received via a communication network (e.g., a cellular network) and/or internal audio from an internal audio source (e.g., a music file, a video file, etc.).
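The hand-off from event detector 2 to event-based playback control 6 can be sketched as follows. This is a minimal illustration only: the description says a "characteristic" of the audio information is modified without fixing which one, so the choice of playback gain as that characteristic, the gain values, the priority order among events, and all names (`EventDecisions`, `playback_gain`, `apply_playback_control`) are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class EventDecisions:
    """Hypothetical container for the three event detector outputs."""
    near_field: bool = False   # near-field detector 3
    proximity: bool = False    # proximity detector 4
    tonal_alarm: bool = False  # alarm detector 5


# Assumed gain values; the source does not specify any.
GAIN_NORMAL = 1.0
GAIN_DUCKED = 0.2
GAIN_MUTED = 0.0


def playback_gain(events: EventDecisions) -> float:
    """Map detector decisions to a playback gain (assumed policy)."""
    if events.tonal_alarm:
        return GAIN_MUTED
    if events.near_field or events.proximity:
        return GAIN_DUCKED
    return GAIN_NORMAL


def apply_playback_control(playback_frame, events: EventDecisions):
    """Scale one frame of playback samples by the selected gain."""
    gain = playback_gain(events)
    return [gain * sample for sample in playback_frame]
```

Other modifications, such as pausing playback, would fit the same structure by replacing the gain mapping.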
FIG. 3 illustrates an example event detector, in accordance with embodiments of the present disclosure. As shown in FIG. 3, the example event detector may include a voice activity detector 10, a music detector 9, a direction-of-arrival estimator 7, a near-field spatial information extractor 8, a background noise sound pressure level estimator 11, and decision fusion logic 12. Decision fusion logic 12 uses information from voice activity detector 10, music detector 9, direction-of-arrival estimator 7, near-field spatial information extractor 8, and background noise sound pressure level estimator 11 to detect audio events, including without limitation near-field sound, proximity sound other than near-field sound, and tonal alarms.
Near-field detector 3 may detect near-field sound, including speech. When such near-field sound is detected, it may be desirable to modify the audio information reproduced to output audio transducer 51, because detection of near-field sound may indicate that the user is participating in a conversation. Such near-field detection may need to detect near-field sound in acoustically noisy conditions and to be robust against false detection of near-field sound under highly diverse background noise conditions (e.g., background noise in a restaurant, acoustic noise while driving a car, etc.). As described in more detail below, near-field detection may require spatial sound processing using multiple microphones 52. In some embodiments, such near-field sound detection may be implemented in a manner identical or similar to that described in U.S. Patent No. 8,565,446 and/or U.S. Application Serial No. 13/199,593.
Proximity detector 4 may detect ambient sound other than near-field sound (e.g., speech from a person close to the user, background music, etc.). As described in more detail below, because it may be difficult to distinguish proximity sound from non-stationary background noise and background music, the proximity detector may use the music detector and the noise sound pressure level estimate to disable proximity detection by proximity detector 4, in order to avoid a poor user experience caused by false detection of proximity sound. In some embodiments, such proximity sound detection may be implemented in a manner identical or similar to that described in U.S. Patent No. 8,126,706, U.S. Patent No. 8,565,446, and/or U.S. Application Serial No. 13/199,593.
Tonal alarm detector 5 may detect tonal alarms (e.g., sirens) near the audio device. To maximize the user experience, it may be desirable for tonal alarm detector 5 to ignore certain alarms (e.g., faint or low-volume alarms). As described in more detail below, tonal alarm detection may require spatial sound processing using multiple microphones 52. In some embodiments, such proximity sound detection may be implemented in a manner identical or similar to that described in U.S. Patent No. 8,126,706 and/or U.S. Application Serial No. 13/199,593.
FIG. 4 illustrates functional blocks of a system for deriving near-field spatial statistics that may be used to detect audio events, in accordance with embodiments of the present disclosure. A sound pressure level analysis 41 may be performed on microphones 52 by estimating the inter-microphone sound pressure level difference (imd) between a near microphone and a far microphone (e.g., as described in U.S. Application Serial No. 13/199,593). A cross-correlation analysis 13 may be performed on the signals received by microphones 52 to obtain direction-of-arrival information DOA for the ambient sound impinging on microphones 52 (e.g., as described in U.S. Patent No. 8,565,446). Cross-correlation analysis 13 may also yield the maximum normalized correlation value normMaxCorr (e.g., as described in U.S. Application Serial No. 13/199,593). Voice activity detector 10 may detect the presence of speech and generate a signal speechDet indicating the presence or absence of speech in the ambient sound (e.g., as described in the probability-based speech presence/absence approach of U.S. Patent No. 7,492,889). Beamformers 15 may generate a near-field signal estimate and an interference signal estimate from the signals of microphones 52, and noise analysis 14 may use these estimates to determine the noise sound pressure level noiseLevel of the ambient sound and the interference-to-near-field signal ratio idr. U.S. Patent No. 8,565,446 describes an example method for estimating the ratio idr using a pair of beamformers 15. A voice activity detector 36 may use the interference estimate to detect any speech signal that does not originate from the desired signal direction (proxSpeechDet). Noise analysis 14 may be performed by updating the interference signal energy based on the direction-of-arrival estimate DOA whenever the DOA of the ambient sound is outside the acceptance angle of near-field sound. The direction of arrival of near-field sound may be known a priori for a given microphone array configuration in the industrial design of a personal audio device.
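Three of the statistics named above (imd, normMaxCorr, and DOA) can be illustrated with a short two-microphone sketch. It does not reproduce the methods of the cited patents and applications, which this text only references. It assumes a plain time-domain normalized cross-correlation over a small lag range, a frame-energy ratio in dB for imd, and a far-field delay-to-angle conversion. All function names, the sign convention of the lag, and the 343 m/s speed of sound are assumptions.

```python
import math


def inter_mic_level_difference_db(near, far, eps=1e-12):
    """imd: frame-energy ratio (dB) between the near and far microphone."""
    e_near = sum(x * x for x in near) / len(near)
    e_far = sum(x * x for x in far) / len(far)
    return 10.0 * math.log10((e_near + eps) / (e_far + eps))


def max_normalized_cross_correlation(x, y, max_lag):
    """Return (normMaxCorr, lag) searched over lags in [-max_lag, max_lag].

    A positive lag means y is a delayed copy of x (assumed convention).
    """
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0.0:
        return 0.0, 0
    best_corr, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):  # only overlapping samples contribute
                acc += x[i] * y[j]
        corr = acc / norm
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return best_corr, best_lag


def lag_to_doa_degrees(lag, sample_rate_hz, mic_spacing_m, speed_of_sound=343.0):
    """Far-field conversion of an inter-microphone delay to an angle."""
    s = lag * speed_of_sound / (sample_rate_hz * mic_spacing_m)
    s = max(-1.0, min(1.0, s))  # clamp against rounding and spatial aliasing
    return math.degrees(math.asin(s))
```

A high normMaxCorr indicates a single coherent source dominating both microphones, which is why it can serve as a discriminator alongside the DOA estimate.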
The various statistics generated by the system of FIG. 4 may then be used to detect the presence of near-field sound. FIG. 5 illustrates example fusion logic for detecting near-field sound, in accordance with embodiments of the present disclosure. As shown in FIG. 5, near-field speech may be detected when all of the following criteria are met:
· the direction-of-arrival estimate DOA of the ambient sound is within the acceptance angle of near-field sound (block 16);
· the maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres1 (block 17);
· the interference-to-near-field desired signal ratio idr is less than a threshold idrThres1 (block 18);
· voice activity is detected, as indicated by the signal speechDet (block 19);
· the inter-microphone sound pressure level difference statistic imd is greater than a threshold imdTh (block 42).
In some embodiments, the thresholds idrThres and imdTh may be adjusted dynamically based on the background noise sound pressure level estimate.
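The five criteria of FIG. 5 amount to a logical AND, which the following sketch makes explicit. The field names follow the signal names in the text, but the container classes and every numeric default (acceptance angle bounds and thresholds) are placeholder assumptions, since the source gives no values. Dynamic threshold adjustment is not modelled.

```python
from dataclasses import dataclass


@dataclass
class SpatialStats:
    """Per-frame statistics from the system of FIG. 4."""
    doa_deg: float        # direction-of-arrival estimate DOA
    norm_max_corr: float  # normMaxCorr
    idr: float            # interference-to-near-field signal ratio
    imd_db: float         # inter-microphone level difference
    speech_det: bool      # speechDet from the voice activity detector


@dataclass
class NearFieldConfig:
    """Placeholder thresholds; all values here are assumptions."""
    doa_min_deg: float = 60.0
    doa_max_deg: float = 120.0
    norm_max_corr_thres1: float = 0.6
    idr_thres1: float = 0.5
    imd_th_db: float = 3.0


def detect_near_field(stats: SpatialStats, cfg: NearFieldConfig) -> bool:
    """Return True only when all five criteria of FIG. 5 hold."""
    return bool(
        cfg.doa_min_deg <= stats.doa_deg <= cfg.doa_max_deg  # block 16
        and stats.norm_max_corr > cfg.norm_max_corr_thres1   # block 17
        and stats.idr < cfg.idr_thres1                       # block 18
        and stats.speech_det                                 # block 19
        and stats.imd_db > cfg.imd_th_db                     # block 42
    )
```

Requiring every criterion at once trades some missed detections for fewer false triggers, which matches the stated goal of avoiding false detection in noisy conditions.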
近距检测器4的近距检测可能不同于近场检测器3的近场声音检测,这是因为近距话音的信号特性可能与诸如音乐和噪声的周围信号非常相似。因此,近距检测器4必须避免近距话音的错误检测以达到可接受的用户体验。因此,只要背景中有音乐,就可使用音乐检测器9来停用近距检测。同样地,只要背景噪声声压级高于某个阈值,就可停用近距检测器4。可以先验判定背景噪声阈值,使得低于阈值声压级的错误检测的可能性非常低。图6示出根据本公开的实施例用于检测近距声音(例如,话音)的示例性融合逻辑。此外,可能存在许多产生瞬时性声学刺激的环境噪声源。这些噪声类型可被话音检测器错误地检测为话音信号。为了降低错误检测的可能性,可使用来自音乐检测器9的频谱平坦度测度(SFM)统计量来区分话音与瞬时噪声。例如,可在一段时间内追踪SFM,并可计算同一时间段内最大SFM值和最小SFM值之差,该差定义为sfmSwing。sfmSwing的值对于瞬时噪声信号通常可能很小,这是因为这些信号的频谱成分是宽带性的且它们在短时间间隔内(300ms-500ms)趋于平稳。sfmSwing的值对于话音信号可能较高,这是因为话音信号的频谱成分可能比瞬时信号变化更快。如图6所示,当满足下面所有标准时,可以检测到近距声音(例如,话音):Proximity detection by proximity detector 4 may differ from near field sound detection by near field detector 3 because the signal characteristics of near speech may be very similar to ambient signals such as music and noise. Therefore, the proximity detector 4 must avoid false detection of near speech in order to achieve an acceptable user experience. Therefore, the music detector 9 can be used to disable proximity detection as long as there is music in the background. Likewise, the proximity detector 4 may be deactivated as long as the background noise sound pressure level is above a certain threshold. The background noise threshold can be determined a priori such that the probability of false detection below the threshold sound pressure level is very low. FIG. 6 illustrates exemplary fusion logic for detecting close sounds (eg, speech) according to an embodiment of the disclosure. In addition, there may be many sources of ambient noise that produce transient acoustic stimuli. These types of noise can be falsely detected as speech signals by speech detectors. To reduce the possibility of false detections, the Spectral Flatness Measure (SFM) statistic from the music detector 9 can be used to distinguish speech from transient noise. For example, SFM can be tracked over a period of time, and the difference between the maximum and minimum SFM values over the same period of time can be calculated, defined as sfmSwing. 
The value of sfmSwing may generally be small for transient noise signals because the spectral components of these signals are broadband in nature and they level off over short time intervals (300ms-500ms). The value of sfmSwing may be higher for speech signals, since the spectral content of speech signals may vary faster than instantaneous signals. As shown in Figure 6, proximity sounds (e.g., voices) can be detected when all of the following criteria are met:
· No music is detected in the background (block 20);
· The direction-of-arrival estimate DOA is within the acceptance angle for proximity sound (block 21);
· The maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres2 (block 22);
· The background noise sound pressure level noiseLevel is below a threshold noiseLevelTh (block 23);
· Proximity voice activity is detected, as indicated by the signal proxSpeechDet (block 19);
· The SFM variation statistic sfmSwing is greater than a threshold sfmSwingTh (block 37);
· The ratio of interference to near-field desired signal, idr, is greater than a threshold idrThres2 (block 40);
· The inter-microphone level difference statistic imd is close to 0 dB (block 43).
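The sfmSwing statistic and the FIG. 6 criteria above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the struct layouts, the function names, the DOA acceptance bounds doaMin/doaMax, and the tolerance imdTol used to decide "close to 0 dB" are all hypothetical, since the text gives no numeric values or units for them.

```c
#include <stdbool.h>
#include <stddef.h>

/* sfmSwing: difference between the largest and smallest SFM value
 * seen over a tracking window of n frames (0 for an empty window). */
float sfm_swing(const float *sfm, size_t n)
{
    if (n == 0) return 0.0f;
    float lo = sfm[0], hi = sfm[0];
    for (size_t i = 1; i < n; ++i) {
        if (sfm[i] < lo) lo = sfm[i];
        if (sfm[i] > hi) hi = sfm[i];
    }
    return hi - lo;
}

/* Per-frame statistics named in the FIG. 6 criteria. */
typedef struct {
    bool  musicDet;       /* music present in background (block 20) */
    float doa;            /* direction-of-arrival estimate (unit assumed) */
    float normMaxCorr;    /* maximum normalized cross-correlation */
    float noiseLevel;     /* background noise level */
    bool  proxSpeechDet;  /* proximity voice activity (block 19) */
    float sfmSwing;       /* max-min SFM over the tracking window */
    float idr;            /* interference to near-field desired signal ratio */
    float imd;            /* inter-microphone level difference, dB */
} ProxStats;

/* Thresholds. doaMin, doaMax and imdTol are hypothetical parameters. */
typedef struct {
    float doaMin, doaMax;
    float normMaxCorrThres2;
    float noiseLevelTh;
    float sfmSwingTh;
    float idrThres2;
    float imdTol;         /* |imd| <= imdTol is treated as "close to 0 dB" */
} ProxThresholds;

/* True only when every FIG. 6 criterion holds at once. */
bool prox_sound_detected(const ProxStats *s, const ProxThresholds *t)
{
    float imdAbs = s->imd < 0.0f ? -s->imd : s->imd;
    return !s->musicDet                                    /* block 20 */
        && s->doa >= t->doaMin && s->doa <= t->doaMax      /* block 21 */
        && s->normMaxCorr > t->normMaxCorrThres2           /* block 22 */
        && s->noiseLevel < t->noiseLevelTh                 /* block 23 */
        && s->proxSpeechDet                                /* block 19 */
        && s->sfmSwing > t->sfmSwingTh                     /* block 37 */
        && s->idr > t->idrThres2                           /* block 40 */
        && imdAbs <= t->imdTol;                            /* block 43 */
}
```

Because the criteria are combined with a logical AND, a single failing statistic (for example, music being detected) is enough to suppress the proximity detection.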
In some embodiments, music detector 9, which detects the presence of background music, may be implemented using the music detector taught in U.S. Patent No. 8,126,706. Another embodiment of the proximity speech detector is shown in FIG. 7, in accordance with embodiments of the present disclosure. According to this embodiment, proximity speech may be detected if the following conditions are met:
· The ratio of interference to near-field desired signal, idr, is greater than a threshold idrThres2 (block 39);
· Proximity voice activity is detected (block 27);
· The maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres3 (block 28);
· The direction-of-arrival estimate DOA is within the acceptance angle for proximity sound (block 29);
· No music is detected in the background (block 30);
· Low-level or medium-level background noise is present, or no background noise is present (block 31). This condition is verified by comparing the estimated background noise sound pressure level against a threshold noiseLevelThLo. If a low noise level is detected, the following two conditions are also tested to confirm the presence of proximity speech:
· The SFM variation statistic sfmSwing is greater than a threshold sfmSwingTh (block 38);
· The inter-microphone level difference statistic imd is close to 0 dB (block 44).
If the above background noise level condition is not met at block 31, the following conditions may indicate proximity speech, improving the detection rate of proximity speech without increasing the occurrence of false alarms (e.g., due to background noise conditions):
· Stationary background noise is present (block 32). Stationary background noise may be detected by calculating the peak-to-root-mean-square ratio of the SFM generated by the music detector (block 9) over a period of time. Specifically, if this ratio is high, non-stationary noise may be present, because the spectral flatness measure of non-stationary noise tends to change faster than that of stationary noise;
· A high noise level is present (block 32). A high-noise condition may be detected if the estimated background noise is greater than a threshold noiseLevelLo and less than a threshold noiseLevelHi. If the above stationary noise and direction-of-arrival conditions are not met at block 32, the presence of the following set of two conditions may indicate the presence of proximity speech:
· A close-talking proximity talker is present (block 33). A close-talking proximity talker may be detected when the maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres4 (normMaxCorrThres4 may be greater than normMaxCorrThres3, to indicate the presence of a close talker);
· Low-level, medium-level, or high-level background noise is present, or no background noise is present (block 34). This condition may be detected if the estimated background noise sound pressure level is less than a threshold noiseLevelThHi.
If the above direction-of-arrival condition is not met at block 29, the presence of the following conditions may indicate proximity speech:
· No music is present (block 35);
· A close-talking proximity talker is present (block 33). A close-talking proximity talker may be detected when the maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres4 (normMaxCorrThres4 may be greater than normMaxCorrThres3, to indicate the presence of a close talker);
· Low-level, medium-level, or high-level background noise is present, or no background noise is present (block 34). This condition may be detected if the estimated background noise sound pressure level is less than a threshold noiseLevelThHi.
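One possible reading of the FIG. 7 conditions above is sketched below in C. The branch order is an interpretation of the prose, since the flowchart itself is not reproduced here. The sketch assumes that noiseLevelLo and noiseLevelHi name the same thresholds as noiseLevelThLo and noiseLevelThHi, and that the stationarity test uses a hypothetical threshold peakToRmsTh on the peak-to-RMS ratio of the SFM. All struct and function names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stationary-noise test (block 32): a high peak-to-RMS ratio of the SFM
 * over a window suggests non-stationary noise. Squared quantities are
 * compared so that no square root is needed:
 *   peak / rms <= th   <=>   peak^2 * n <= th^2 * sum(sfm^2). */
bool noise_is_stationary(const float *sfm, size_t n, float peakToRmsTh)
{
    if (n == 0) return false;
    float peak = 0.0f, sumSq = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = sfm[i] < 0.0f ? -sfm[i] : sfm[i];
        if (a > peak) peak = a;
        sumSq += a * a;
    }
    return peak * peak * (float)n <= peakToRmsTh * peakToRmsTh * sumSq;
}

typedef struct {
    bool  idrOk;            /* idr > idrThres2 (block 39) */
    bool  proxSpeechDet;    /* proximity voice activity (block 27) */
    float normMaxCorr;      /* maximum normalized cross-correlation */
    bool  doaOk;            /* DOA within acceptance angle (block 29) */
    bool  musicDet;         /* music present (blocks 30 and 35) */
    float noiseLevel;       /* estimated background noise level */
    bool  sfmSwingOk;       /* sfmSwing > sfmSwingTh (block 38) */
    bool  imdNearZero;      /* imd close to 0 dB (block 44) */
    bool  stationaryNoise;  /* result of noise_is_stationary (block 32) */
} Fig7Inputs;

typedef struct {
    float normMaxCorrThres3, normMaxCorrThres4;
    float noiseLevelThLo, noiseLevelThHi;
} Fig7Thresholds;

bool prox_speech_detected_fig7(const Fig7Inputs *in, const Fig7Thresholds *t)
{
    if (!in->idrOk || !in->proxSpeechDet) return false;          /* 39, 27 */
    if (in->normMaxCorr <= t->normMaxCorrThres3) return false;   /* 28 */
    if (in->musicDet) return false;                              /* 30, 35 */

    bool closeTalker = in->normMaxCorr > t->normMaxCorrThres4;   /* 33 */
    bool belowHigh   = in->noiseLevel < t->noiseLevelThHi;       /* 34 */

    if (!in->doaOk)                       /* fallback when block 29 fails */
        return closeTalker && belowHigh;

    if (in->noiseLevel < t->noiseLevelThLo)                      /* 31 */
        return in->sfmSwingOk && in->imdNearZero;                /* 38, 44 */

    /* Noise is at or above the low threshold here, so "high noise"
     * reduces to being below the high threshold (block 32). */
    if (in->stationaryNoise && belowHigh) return true;

    return closeTalker && belowHigh;                             /* 33, 34 */
}
```

The fallback branches only relax the direction-of-arrival or noise requirements when the cross-correlation is strong enough to suggest a close talker, which matches the stated goal of raising the detection rate without adding false alarms.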
Tonal alarm detector 5 may be configured to detect tonal alarm signals, where the acoustic bandwidth of such alarm signals is also narrow (e.g., sirens, buzzers). In some embodiments, the tonality of ambient sound may be detected by splitting the time-domain signal into multiple sub-bands using a time-to-frequency transform and calculating the spectral flatness measure in each sub-band, shown in FIG. 6 as the signal sfm[] generated by music detector 9. The spectral flatness measures sfm[] from all sub-bands may be evaluated, and a tonal alarm may be detected if the spectrum is flat in most, but not all, of the sub-bands. Moreover, a playback management system may not need to detect far-field alarm signals. Near-field spatial statistics 8 of FIG. 3 may therefore be used to distinguish far-field alarm signals from near-field signals. FIG. 8 illustrates example fusion logic for detecting a tonal alarm event (e.g., siren, buzzer), in accordance with embodiments of the present disclosure. As shown in FIG. 8, a tonal alarm event may be detected when all of the following criteria are met:
· The direction-of-arrival estimate DOA is within the acceptance angle for alarm signals (block 24);
· The maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres5 (block 25);
· The spectral flatness measure sfm[] indicates that the noise spectrum is flat in most, but not all, of the sub-bands (block 26).
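The FIG. 8 criteria can be sketched in C as shown below. The text does not say how "flat in most, but not all" is quantified, so this sketch assumes a per-band flatness threshold flatTh and a minimum count of flat bands minFlatBands, both hypothetical. It also assumes the common convention that a larger SFM value means a flatter (more noise-like) sub-band and a smaller value means a more tonal one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Block 26: the spectrum is "flat in most, but not all" sub-bands when
 * at least minFlatBands bands are flat and at least one band is not. */
bool spectrum_mostly_flat(const float *sfm, size_t nBands,
                          float flatTh, size_t minFlatBands)
{
    size_t flat = 0;
    for (size_t i = 0; i < nBands; ++i) {
        if (sfm[i] > flatTh) ++flat;
    }
    return flat >= minFlatBands && flat < nBands;
}

/* FIG. 8 fusion: all three criteria must hold. doaOk stands for
 * "DOA within the alarm acceptance angle" (block 24). */
bool tonal_alarm_detected(bool doaOk,
                          float normMaxCorr, float normMaxCorrThres5,
                          const float *sfm, size_t nBands,
                          float flatTh, size_t minFlatBands)
{
    return doaOk                                                  /* block 24 */
        && normMaxCorr > normMaxCorrThres5                        /* block 25 */
        && spectrum_mostly_flat(sfm, nBands, flatTh, minFlatBands); /* 26 */
}
```

A narrowband siren would leave one or a few sub-bands tonal while the rest stay noise-like, which is the pattern the "most, but not all" test is meant to catch; a spectrum that is flat everywhere is rejected.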
In practice, the instantaneous audio event detections by near-field detector 3, proximity detector 4, and tonal alarm detector 5, as shown in FIGS. 5, 6, 7, and 8, may indicate false audio events. It may therefore be desirable to validate an instantaneous audio event detection signal before passing it to playback control section 6. FIG. 9 illustrates an example timing diagram showing delay (hold-off) and hysteresis (hang-over) logic that may be applied to an instantaneous audio event detection signal to generate a validated audio event signal, in accordance with embodiments of the present disclosure. As shown in FIG. 9, the delay logic may generate the validated audio event signal in response to an audio event (e.g., near-field sound, proximity sound, tonal alarm event) being instantaneously detected for at least a predetermined time, while the hysteresis logic may continue to assert the validated audio event signal until the instantaneous detection of the audio event has ceased for a second predetermined time.
The following pseudocode may demonstrate application of the delay (hold-off) and hysteresis (hang-over) logic to reduce false detection of audio events, in accordance with embodiments of the present disclosure.
/* If the instantaneous detect is true, increment the hold-off counter and reset the hang-over counter. */
if (instDet == TRUE)
{
    holdOffCntr = holdOffCntr + 1;
    hangOverCntr = 0;
}
/* If the instantaneous detect is false, increment the hang-over counter and reset the hold-off counter. */
else
{
    hangOverCntr = hangOverCntr + 1;
    holdOffCntr = 0;
}
/******************
 * Hold-off Logic *
 ******************/
/* Valid detect transitions to true if the instantaneous detect has been continuously true for a certain time and the previous valid detect is false. */
if (holdOffCntr > holdOffThres && validDet == FALSE)
{
    validDet = TRUE;
    holdOffCntr = 0;
    hangOverCntr = 0;
}
/*******************
 * Hang-over Logic *
 *******************/
/* Valid detect transitions to false if the instantaneous detect has been continuously false for a certain time and the previous valid detect is true. */
if (hangOverCntr > hangOverThres && validDet == TRUE)
{
    validDet = FALSE;
    holdOffCntr = 0;
    hangOverCntr = 0;
}
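For readers who want to trace the hold-off and hang-over logic, it can be wrapped as a compilable C function. This is a sketch under assumptions: the EventValidator struct and the validate_event name are illustrative and do not appear in the disclosure, and one call is assumed to correspond to one processing frame.

```c
#include <stdbool.h>

/* Counter and output state for one detector. */
typedef struct {
    int  holdOffCntr;
    int  hangOverCntr;
    bool validDet;
} EventValidator;

/* Process one frame of instantaneous detection and return the
 * validated detection for that frame. */
bool validate_event(EventValidator *v, bool instDet,
                    int holdOffThres, int hangOverThres)
{
    if (instDet) {
        v->holdOffCntr += 1;
        v->hangOverCntr = 0;
    } else {
        v->hangOverCntr += 1;
        v->holdOffCntr = 0;
    }

    /* Hold-off: assert only after a sustained run of true frames. */
    if (v->holdOffCntr > holdOffThres && !v->validDet) {
        v->validDet = true;
        v->holdOffCntr = 0;
        v->hangOverCntr = 0;
    }

    /* Hang-over: release only after a sustained run of false frames. */
    if (v->hangOverCntr > hangOverThres && v->validDet) {
        v->validDet = false;
        v->holdOffCntr = 0;
        v->hangOverCntr = 0;
    }
    return v->validDet;
}
```

With holdOffThres = 2 and hangOverThres = 3, the validated signal goes true on the third consecutive true frame and returns to false on the fourth consecutive false frame. A long-running implementation would probably also saturate the counters, because the hold-off counter keeps incrementing while an event persists.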
A validated event may be further validated before a playback mode switching control is generated. For example, the following pseudocode may demonstrate application of delay and hysteresis logic for gracefully switching between a conversational mode (e.g., in which the audio information reproduced to output audio transducer 51 may be modified in response to an audio event) and a normal playback mode (e.g., in which the audio information reproduced to output audio transducer 51 is not modified).
/***********************************
 * Conversational Mode Enter Logic *
 ***********************************/
/* Increment the time-to-enter-conversational-mode counter if the event detect is true and the mode is not the conversational mode. If the counter exceeds the threshold, switch to conversational mode and reset the counters. Note that the event detect need not be true contiguously. */
if (convModeEn == FALSE && validDet == TRUE)
{
    timeToEnterConvModeCntr = timeToEnterConvModeCntr + 1;
    if (timeToEnterConvModeCntr > timeToEnterConvModeThres)
    {
        convModeEn = TRUE;
        timeToEnterConvModeCntr = 0;
        timeToExitConvModeCntr = 0;
    }
}
/**********************************
 * Conversational Mode Exit Logic *
 **********************************/
/* Increment the time-to-exit-conversational-mode counter if the event detect is false and the mode is the conversational mode. If the counter exceeds the threshold, switch to normal mode and reset the counters. Note that the event detect must be false contiguously. */
if (convModeEn == TRUE && validDet == FALSE)
{
    timeToExitConvModeCntr++;
    if (timeToExitConvModeCntr > timeToExitConvModeThres)
    {
        convModeEn = FALSE;
        timeToEnterConvModeCntr = 0;
        timeToExitConvModeCntr = 0;
    }
}
else
{
    timeToExitConvModeCntr = 0;
}
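The asymmetry between entering and leaving conversational mode is easier to see in a compilable form. The C sketch below wraps the same logic in a function; the ConvModeState struct and the update_conv_mode name are illustrative assumptions, and one call is assumed to correspond to one processing frame.

```c
#include <stdbool.h>

typedef struct {
    int  timeToEnterConvModeCntr;
    int  timeToExitConvModeCntr;
    bool convModeEn;
} ConvModeState;

/* Update the playback mode from the validated event flag and return
 * true while conversational mode is enabled. */
bool update_conv_mode(ConvModeState *s, bool validDet,
                      int enterThres, int exitThres)
{
    /* Enter logic: the counter accumulates across gaps, so the event
     * does not have to be true on consecutive frames. */
    if (!s->convModeEn && validDet) {
        s->timeToEnterConvModeCntr += 1;
        if (s->timeToEnterConvModeCntr > enterThres) {
            s->convModeEn = true;
            s->timeToEnterConvModeCntr = 0;
            s->timeToExitConvModeCntr = 0;
        }
    }

    /* Exit logic: any frame that is not "in conversational mode with
     * no event" clears the counter, so the event must stay false
     * contiguously before the mode is left. */
    if (s->convModeEn && !validDet) {
        s->timeToExitConvModeCntr += 1;
        if (s->timeToExitConvModeCntr > exitThres) {
            s->convModeEn = false;
            s->timeToEnterConvModeCntr = 0;
            s->timeToExitConvModeCntr = 0;
        }
    } else {
        s->timeToExitConvModeCntr = 0;
    }
    return s->convModeEn;
}
```

With enterThres = 1 and exitThres = 2, the frame sequence true, false, true enters conversational mode on the third frame despite the gap, whereas leaving the mode takes three consecutive false frames, and a single true frame in between restarts that count.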
FIG. 10 illustrates different audio event detectors with delay and hysteresis logic, in accordance with embodiments of the present disclosure. The delay period and/or hysteresis period may be set differently for each detector. In addition, in some embodiments, playback management may be controlled differently based on the type of event detected. In these and other embodiments, as shown in FIG. 9, the playback gain (and thus the audio information reproduced at output audio transducer 51) may be attenuated whenever one or more of the audio events is detected. In these and other embodiments, to provide a smooth gain transition, the playback gain may be smoothed using a first-order exponential averaging filter represented by the following pseudocode:
if (convModeEn == TRUE)
{
    playBackGain = (1 - alpha) * convModeGain + alpha * playBackGain;
}
else
{
    playBackGain = (1 - beta) * normalModeGain + beta * playBackGain;
}
The smoothing parameters α (alpha) and β (beta) may be set to different values to adjust the gain slope.
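The effect of the smoothing parameters can be illustrated with a small C function that applies one filter update per call. The function name and the linear gain values used in the usage note are assumptions for illustration; the disclosure does not give numeric values for convModeGain, normalModeGain, alpha, or beta.

```c
#include <stdbool.h>

/* One update of the first-order exponential averaging filter.
 * alpha is used while ramping toward convModeGain, beta while ramping
 * back toward normalModeGain. Both are expected to lie in [0, 1). */
float smooth_playback_gain(float playBackGain, bool convModeEn,
                           float convModeGain, float normalModeGain,
                           float alpha, float beta)
{
    float target = convModeEn ? convModeGain : normalModeGain;
    float coeff  = convModeEn ? alpha : beta;
    return (1.0f - coeff) * target + coeff * playBackGain;
}
```

Each update multiplies the remaining distance to the target gain by the smoothing coefficient, so after k updates the distance is the coefficient raised to the power k times the initial distance. For example, starting from a gain of 1.0 with convModeGain = 0.25 and alpha = 0.5, the first update yields 0.625. A coefficient closer to 1 gives a slower ramp, which is why a larger beta than alpha would duck the playback quickly and restore it gradually.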
It should be understood, especially by those having ordinary skill in the art with the benefit of this disclosure, that the various operations described herein, particularly in connection with the figures, may be implemented by other circuitry or other hardware components. The order in which the operations of a given method are performed may be changed, and various elements of the systems illustrated herein may be added, reordered, combined, omitted, modified, etc. This disclosure is intended to embrace all such modifications and changes, and accordingly the above description should be regarded as illustrative rather than restrictive.
Similarly, although this disclosure refers to specific embodiments, certain modifications and changes may be made to those embodiments without departing from the scope of this disclosure. Moreover, any benefits, advantages, or solutions to problems described herein with regard to specific embodiments are not intended to be construed as critical, required, or essential features or elements.
Further embodiments will likewise be apparent to those having ordinary skill in the art with the benefit of this disclosure, and such embodiments should be deemed to be encompassed herein.
Claims (68)
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562202303P | 2015-08-07 | 2015-08-07 | |
US62/202,303 | 2015-08-07 | ||
US201562237868P | 2015-10-06 | 2015-10-06 | |
US62/237,868 | 2015-10-06 | ||
US201662351499P | 2016-06-17 | 2016-06-17 | |
US62/351,499 | 2016-06-17 | ||
PCT/US2016/045834 WO2017027397A2 (en) | 2015-08-07 | 2016-08-05 | Event detection for playback management in an audio device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108141694A (en) | 2018-06-08 |
CN108141694B (en) | 2021-03-16 |
Family
ID=62079093
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201680058340.7A Active CN108141694B (en) | 2015-08-07 | 2016-08-05 | Event detection for playback management in audio devices |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3332558B1 (en) |
CN (1) | CN108141694B (en) |
Also Published As
Publication number | Publication date |
---|---|
EP3332558B1 (en) | 2021-12-01 |
CN108141694B (en) | 2021-03-16 |
EP3332558A2 (en) | 2018-06-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||