
CN118841022A - Audio processing method, processing system, medium and program product - Google Patents

Audio processing method, processing system, medium and program product

Info

Publication number
CN118841022A
Authority
CN
China
Prior art keywords
audio
noise
sound source
propagation environment
slave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411192487.8A
Other languages
Chinese (zh)
Inventor
尹文斌
赵跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jingxiong Technology Co ltd
Original Assignee
Shenzhen Jingxiong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jingxiong Technology Co ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application relates to the technical field of audio processing, and in particular to an audio processing method, a processing system, a medium and a program product. The method comprises: acquiring propagation environment parameters, and determining target array microphone configuration parameters corresponding to the propagation environment based on the propagation environment parameters, wherein the target array microphone configuration parameters comprise quantity, type and spacing; collecting propagation environment audio corresponding to the propagation environment based on the target array microphone configuration parameters; identifying audio features contained in the propagation environment audio, and determining main audio and slave audio from the propagation environment audio based on the audio features, wherein the audio features comprise frequency, time domain and variation amplitude; determining a slave audio signal corresponding to the slave audio based on the main audio, the audio features and a preset mathematical model; and performing noise reduction on the propagation environment audio based on the slave audio signal to obtain noise-reduced audio, and performing audio propagation according to the noise-reduced audio. The application helps improve the accuracy of the audio processing result.

Description

Audio processing method, processing system, medium and program product

Technical Field

The present application relates to the field of audio processing technology, and in particular to an audio processing method, a processing system, a medium and a program product.

Background Art

As sound propagates, it may encounter obstacles such as walls, ceilings, and floors, producing reverberation and echoes. These can interfere with the original audio and make it blurred and unclear. In addition, the environment in which the sound propagates may contain considerable ambient noise, such as air conditioning, conversation, and footsteps. Ambient noise may also mask the original audio so that it cannot be propagated clearly.

In the related art, array microphone technology is generally used to suppress the reverberation, echoes, and ambient noise encountered during sound propagation in order to improve audio propagation quality. However, when the propagation environment contains multiple sound sources and multiple noise sources, processing the propagated audio with array microphone technology alone may fail to distinguish and handle the different sound sources and noise sources effectively. This reduces the accuracy of the audio processing result and degrades audio propagation quality.

Summary of the Invention

In order to improve the accuracy of audio processing results and thereby improve audio propagation quality, the present application provides an audio processing method, a processing system, a medium and a program product.

In a first aspect, the present application provides an audio processing method, which adopts the following technical solution:

An audio processing method, comprising:

acquiring propagation environment parameters, and determining target array microphone configuration parameters corresponding to the propagation environment based on the propagation environment parameters, wherein the propagation environment parameters include venue size, venue layout, and venue materials, and the target array microphone configuration parameters include quantity, type, and spacing;

collecting propagation environment audio corresponding to the propagation environment based on the target array microphone configuration parameters;

identifying audio features contained in the propagation environment audio, and determining main audio and slave audio from the propagation environment audio based on the audio features, wherein the audio features include frequency, time domain, and variation amplitude;

determining a slave audio signal corresponding to the slave audio based on the main audio, the audio features, and a preset mathematical model;

performing noise reduction on the propagation environment audio based on the slave audio signal to obtain noise-reduced audio, and performing audio propagation according to the noise-reduced audio.

With the above technical solution, audio is collected with microphone configuration parameters matched to the propagation environment rather than with fixed configuration parameters, which helps widen the audio collection range and provides high-quality raw audio data for subsequent audio processing. In addition, the audio features are used to distinguish the main audio from the slave audio in the propagation environment audio, that is, to assign priorities within the propagation environment audio, so that the main audio can be given priority propagation resources. Furthermore, the slave audio is processed with the preset mathematical model to obtain the slave audio signal, and the propagation environment audio is then denoised based on that signal, which helps improve the efficiency and accuracy of noise reduction and thereby the audio propagation quality.
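The final noise reduction step can be illustrated with a minimal time-domain sketch. It assumes that the slave audio signal has already been estimated and is sample-aligned with the propagation environment audio. The function name `reduce_noise` and the `suppression` factor are hypothetical; the application does not prescribe a specific subtraction rule.

```python
from typing import List, Sequence


def reduce_noise(environment_audio: Sequence[float],
                 slave_signal: Sequence[float],
                 suppression: float = 1.0) -> List[float]:
    """Subtract a scaled slave audio signal from the environment audio.

    Assumption (not from the application): both inputs are mono sample
    sequences of equal length, and `suppression` in [0, 1] controls how
    much of the estimated slave signal is removed.
    """
    if len(environment_audio) != len(slave_signal):
        raise ValueError("inputs must have the same number of samples")
    if not 0.0 <= suppression <= 1.0:
        raise ValueError("suppression must lie in [0, 1]")
    # Sample-by-sample subtraction in the time domain.
    return [x - suppression * n
            for x, n in zip(environment_audio, slave_signal)]
```

With `suppression=1.0` and a perfect estimate of the slave signal, the output equals the main audio; smaller values leave part of the slave audio in place, which may be preferable when the estimate is unreliable.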

In a possible implementation, determining the target array microphone configuration parameters corresponding to the propagation environment based on the propagation environment parameters includes:

determining a corresponding target venue model based on the propagation environment parameters and a preset standard venue model;

determining a plurality of initial array microphone configurations according to the propagation environment parameters, and performing an audio propagation simulation for each initial array microphone configuration with the target venue model to obtain initial simulation parameters corresponding to each initial array microphone configuration, wherein each set of initial simulation parameters includes a simulated echo parameter and a simulated reverberation parameter;

matching preset propagation standard parameters against each set of initial simulation parameters to obtain a simulation score for each, and determining the initial simulation parameters with the highest simulation score as the target array microphone configuration parameters corresponding to the propagation environment.

With the above technical solution, the target array microphone configuration is determined from the propagation environment parameters, which improves how well the array microphone configuration fits the propagation environment and thereby the audio coverage within it. In addition, determining the target array microphone configuration through both model simulation and the preset propagation standard parameters helps improve the speed and accuracy of that determination.
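The selection step can be sketched as follows. The acoustic simulation itself is outside the scope of this sketch, so the simulated echo and reverberation values are taken as inputs. The class names, field names, the scoring formula, and the scale constants are all illustrative assumptions; the application only states that the standard parameters are matched against each simulation result and that the highest score wins, read here as selecting the configuration whose simulated parameters score highest.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ArrayConfig:
    count: int          # number of microphones
    mic_type: str       # e.g. "cardioid"
    spacing_m: float    # spacing between microphones, in metres


@dataclass(frozen=True)
class SimulationParams:
    echo_level_db: float   # simulated echo parameter (hypothetical unit)
    reverb_time_s: float   # simulated reverberation parameter


def simulation_score(sim: SimulationParams,
                     standard: SimulationParams,
                     echo_scale_db: float = 10.0,
                     reverb_scale_s: float = 0.5) -> float:
    """Score in (0, 1]; 1.0 means the simulation matches the standard."""
    deviation = (abs(sim.echo_level_db - standard.echo_level_db) / echo_scale_db
                 + abs(sim.reverb_time_s - standard.reverb_time_s) / reverb_scale_s)
    return 1.0 / (1.0 + deviation)


def select_target_config(simulated: Dict[ArrayConfig, SimulationParams],
                         standard: SimulationParams) -> Tuple[ArrayConfig, float]:
    """Return the configuration with the highest simulation score."""
    if not simulated:
        raise ValueError("at least one simulated configuration is required")
    best = max(simulated, key=lambda c: simulation_score(simulated[c], standard))
    return best, simulation_score(simulated[best], standard)
```

A reciprocal-of-deviation score is only one possible matching rule; any monotone measure of closeness to the preset propagation standard parameters would fit the description equally well.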

In a possible implementation, determining the slave audio signal corresponding to the slave audio based on the main audio, the audio features, and the preset mathematical model includes:

identifying main audio features corresponding to the main audio from the audio features, and determining related slave audio and unrelated slave audio from the slave audio based on the main audio features and the audio features;

acquiring a noise adjustment quantization parameter, and determining the slave audio signal corresponding to the slave audio according to the related slave audio, the unrelated slave audio, the noise adjustment quantization parameter, and the preset mathematical model.

With the above technical solution, because different kinds of noise call for different noise reduction methods, the slave audio is further subdivided using the main audio features, which helps improve the efficiency and accuracy of the noise reduction process. The final slave audio signal is then determined from the noise adjustment quantization parameter and the preset mathematical model, which makes it possible to reduce or eliminate noise components in a targeted manner and improves the accuracy of the resulting slave audio signal.
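This part of the text does not state the form of the preset mathematical model. Purely as an illustration, the sketch below assumes a per-sample weighted sum of the related and unrelated slave audio, scaled by the noise adjustment quantization parameter. The function name, the weights, and the linear form are hypothetical.

```python
from typing import List, Sequence


def slave_audio_signal(related: Sequence[float],
                       unrelated: Sequence[float],
                       noise_adjustment: float,
                       related_weight: float = 0.5,
                       unrelated_weight: float = 1.0) -> List[float]:
    """Combine related and unrelated slave audio into one slave signal.

    Assumed model (not from the application):
        s[i] = noise_adjustment * (related_weight * related[i]
                                   + unrelated_weight * unrelated[i])
    """
    if len(related) != len(unrelated):
        raise ValueError("inputs must have the same number of samples")
    return [noise_adjustment * (related_weight * r + unrelated_weight * u)
            for r, u in zip(related, unrelated)]
```

Weighting the related slave audio less than the unrelated slave audio reflects one possible design choice: components correlated with the main audio are removed more cautiously than components that are independent of it.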

In a possible implementation, when it is detected that the main audio has at least two main audio features, the method further includes:

acquiring sound wave recording information, and determining the sound source position corresponding to each main audio feature based on the sound wave recording information, wherein the sound wave recording information includes the time at which the main audio arrives at each microphone of the array microphone;

acquiring the sound source environment audio generated within the range corresponding to each sound source position, and determining the sound source slave audio from the corresponding sound source environment audio based on each main audio feature;

identifying the noise information of each sound source slave audio, and determining, by comparing the noise information, whether the noise information of all the sound source slave audio is the same, wherein the noise information includes noise type and noise loudness;

if so, applying the same noise reduction method to the sound source environment audio corresponding to the at least two main audio features.

With the above technical solution, the arrival time of the main audio at each microphone makes it possible to accurately locate the sound source position of each main audio. In addition, comparing the noise information at the sound source positions of the different main audio shows whether the same noise reduction operation can be applied to all of them, so that a complicated noise reduction determination does not have to be repeated for each main audio, which helps increase the noise reduction processing rate.
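Locating a source from per-microphone arrival times is commonly done from time differences of arrival (TDOA). The sketch below is one simple way to do this, a brute-force grid search in two dimensions; the application does not say which localization method is used. The function name, the planar room model, the grid step, and the speed of sound constant are assumptions.

```python
import math
from typing import Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 °C

Point = Tuple[float, float]


def locate_source(mic_positions: Sequence[Point],
                  arrival_times: Sequence[float],
                  room_size: Point,
                  step: float = 0.1) -> Point:
    """Estimate a 2-D source position from arrival times at each microphone.

    Only differences relative to the first microphone are used, so the
    unknown emission time cancels out.
    """
    if len(mic_positions) != len(arrival_times) or len(mic_positions) < 3:
        raise ValueError("need at least three microphones with one arrival time each")
    ref_pos, ref_time = mic_positions[0], arrival_times[0]
    best_point, best_error = (0.0, 0.0), math.inf
    for ix in range(int(round(room_size[0] / step)) + 1):
        for iy in range(int(round(room_size[1] / step)) + 1):
            candidate = (ix * step, iy * step)
            ref_dist = math.dist(candidate, ref_pos)
            error = 0.0
            for pos, t in zip(mic_positions[1:], arrival_times[1:]):
                # Predicted time difference of arrival for this candidate.
                predicted = (math.dist(candidate, pos) - ref_dist) / SPEED_OF_SOUND
                error += (predicted - (t - ref_time)) ** 2
            if error < best_error:
                best_point, best_error = candidate, error
    return best_point
```

The grid search trades accuracy for simplicity: resolution is limited by `step`, and a practical system would more likely refine the estimate with a least-squares solver and account for measurement noise.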

In a possible implementation, identifying the noise information of each sound source slave audio and determining, by comparing the noise information, whether the noise information of all the sound source slave audio is the same includes:

identifying the frequency distribution and peak value of each sound source slave audio in a spectrogram, and determining a first noise display image for each sound source slave audio based on its frequency distribution and peak value;

identifying the audio loudness of each sound source slave audio, and determining a second noise display image for each sound source slave audio based on a loudness mapping relationship, wherein the loudness mapping relationship is the correspondence between audio loudness and second noise display images;

superimposing the first noise display image and the second noise display image of each sound source slave audio to obtain a superimposed noise display image for each sound source slave audio;

performing image matching on the superimposed noise display images of the sound source slave audio, and determining from the image matching result whether the noise information of all the sound source slave audio is the same, wherein a matching value between the superimposed noise display images of all the sound source slave audio that is not lower than a preset matching value indicates that the noise information of all the sound source slave audio is the same.

With the above technical solution, presenting the noise information of each sound source slave audio in the form of an image lets the relevant staff see the characteristics of, and similarities between, the sound source slave audio at a glance. In addition, because the superimposed noise display image of a sound source slave audio is generally built from extracted key features, judging by image matching whether the noise information of multiple sound source slave audio is similar makes it easier to ignore secondary or redundant data and thus improves the accuracy of the judgment.
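A toy version of the image-based comparison is sketched below. Images are small binary grids; the first image is a bar chart of the spectrum, the second marks a row chosen from the loudness, and matching uses the Jaccard index of the set pixels. The rasterization scheme, the similarity measure, the threshold, and all names are illustrative assumptions; the application does not specify them.

```python
from itertools import combinations
from typing import List, Sequence

Image = List[List[int]]


def spectrum_image(magnitudes: Sequence[float], height: int = 8) -> Image:
    """First noise display image: one bar per frequency bin, scaled to the peak."""
    peak = max(magnitudes) or 1.0
    image = [[0] * len(magnitudes) for _ in range(height)]
    for col, magnitude in enumerate(magnitudes):
        for row in range(round(magnitude / peak * height)):
            image[height - 1 - row][col] = 1
    return image


def loudness_image(loudness_db: float, width: int,
                   height: int = 8, max_db: float = 100.0) -> Image:
    """Second noise display image: a horizontal line whose row encodes loudness."""
    row = min(height - 1, max(0, int(loudness_db / max_db * height)))
    image = [[0] * width for _ in range(height)]
    image[height - 1 - row] = [1] * width
    return image


def superimpose(first: Image, second: Image) -> Image:
    """Pixel-wise overlay of two images of equal size."""
    return [[max(p, q) for p, q in zip(ra, rb)] for ra, rb in zip(first, second)]


def match_value(a: Image, b: Image) -> float:
    """Jaccard index of the set pixels of two images."""
    pairs = [(p, q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    union = sum(1 for p, q in pairs if p or q)
    both = sum(1 for p, q in pairs if p and q)
    return both / union if union else 1.0


def same_noise(images: Sequence[Image], preset_match: float = 0.9) -> bool:
    """True when every pair of images matches at or above the preset value."""
    return all(match_value(a, b) >= preset_match
               for a, b in combinations(images, 2))
```

Because the spectrum image is normalized to its own peak, it captures only the shape of the frequency distribution; the absolute level enters through the loudness image, which mirrors the split between noise type and noise loudness in the text.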

In a possible implementation, when a person of interest is present in the propagation environment, the method further includes:

acquiring the position and the hearing loss parameters of the person of interest;

determining a sound source distance and a sound source direction based on the position of the person of interest and the sound source position corresponding to the main audio, and determining a loudness adjustment parameter for the position of the person of interest based on the sound source distance, the sound source direction, and the loudness mapping relationship;

optimizing the noise-reduced audio based on the loudness adjustment parameter and the hearing loss parameters to obtain optimized noise-reduced audio, and feeding the optimized noise-reduced audio back to the hearing aid device worn by the person of interest.

With the above technical solution, when a person of interest is present in the propagation environment, the noise-reduced audio is promptly optimized according to that person's position and hearing loss so as to produce an audio output suited to the person's own hearing condition. This personalized adjustment improves the person's listening experience and also improves the robustness of the audio processing.
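One way to picture the loudness adjustment is sketched below. It assumes free-field attenuation of about 6 dB per doubling of distance (the inverse distance law) and compensates hearing loss with the half-gain rule of thumb, that is, a gain of roughly half the hearing loss in dB. Neither rule is stated in the application, the sound source direction is ignored for brevity, and the function names, reference distance, and gain cap are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def loudness_adjustment_db(listener: Point, source: Point,
                           hearing_loss_db: float,
                           reference_distance_m: float = 1.0,
                           max_gain_db: float = 30.0) -> float:
    """Gain in dB for a listener at a given distance with a given hearing loss."""
    distance = max(math.dist(listener, source), reference_distance_m)
    # Inverse distance law: level drops by 20*log10(d / d_ref) dB in free field.
    distance_gain = 20.0 * math.log10(distance / reference_distance_m)
    # Half-gain rule of thumb for hearing loss compensation (assumption).
    hearing_gain = 0.5 * hearing_loss_db
    return min(distance_gain + hearing_gain, max_gain_db)


def optimize_audio(noise_reduced: Sequence[float], gain_db: float) -> List[float]:
    """Apply a gain in dB to noise-reduced audio samples."""
    factor = 10.0 ** (gain_db / 20.0)
    return [sample * factor for sample in noise_reduced]
```

The gain cap is a safety choice of this sketch: without it, a distant listener with severe hearing loss could receive uncomfortably loud output. A real fitting would use frequency-dependent hearing loss data rather than a single number.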

In a second aspect, the present application provides a processing system, which adopts the following technical solution:

A processing system, comprising:

at least one processor;

a memory;

at least one application program, wherein the at least one application program is stored in the memory and configured to be executed by the at least one processor, and the at least one application program is configured to execute the above audio processing method.

In a third aspect, the present application provides a computer-readable storage medium, which adopts the following technical solution:

A computer-readable storage medium storing a computer program that can be loaded by a processor to execute the above audio processing method.

In a fourth aspect, the present application provides a computer program product, which adopts the following technical solution:

A computer program product comprising a computer program, wherein the computer program implements the above audio processing method when executed by a processor.

In summary, the present application provides at least one of the following beneficial technical effects:

Audio is collected with microphone configuration parameters matched to the propagation environment rather than with fixed configuration parameters, which helps widen the audio collection range and provides high-quality raw audio data for subsequent audio processing. In addition, the audio features are used to distinguish the main audio from the slave audio in the propagation environment audio, that is, to assign priorities within the propagation environment audio, so that the main audio can be given priority propagation resources. Furthermore, the slave audio is processed with the preset mathematical model to obtain the slave audio signal, and the propagation environment audio is then denoised based on that signal, which helps improve the efficiency and accuracy of noise reduction and thereby the audio propagation quality.

The arrival time of the main audio at each microphone makes it possible to accurately locate the sound source position of each main audio. In addition, comparing the noise information at the sound source positions of the different main audio shows whether the same noise reduction operation can be applied to all of them, so that a complicated noise reduction determination does not have to be repeated for each main audio, which helps increase the noise reduction processing rate.

Brief Description of the Drawings

FIG. 1 is a schematic flow chart of an audio processing method in an embodiment of the present application;

FIG. 2 is a schematic flow chart of another audio processing method in an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a processing system in an embodiment of the present application.

Detailed Description

The present application is described in further detail below with reference to FIGS. 1-3.

After reading this specification, those skilled in the art may make modifications to this embodiment that involve no inventive contribution as needed; such modifications are protected by patent law as long as they fall within the scope of the claims of this application.

To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without inventive effort fall within the scope of protection of this application.

It should be noted that, in the optional embodiments of the present application, when the embodiments are applied to specific products or technologies, any subject information and related data involved must be obtained with the permission or consent of the subject, and the collection, use, and processing of such data must comply with the applicable laws, regulations, and standards of the relevant countries and regions. In other words, any subject-related data involved in the embodiments must be obtained with the authorization and consent of the subject and of the relevant authorities, and in compliance with those laws, regulations, and standards. Where personal information is involved, the individual's consent must be obtained for all of it; where sensitive information is involved, the separate consent of the information subject must be obtained. The embodiments themselves must likewise be implemented only with the subject's authorization and consent.

Specifically, an embodiment of the present application provides an audio processing method executed by a processing system. The processing system may be a server or a terminal device. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides cloud computing services. The terminal device may be, but is not limited to, a smartphone, a tablet computer, a laptop computer, or a desktop computer. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the embodiments of the present application.

Referring to FIG. 1, which is a schematic flow chart of an audio processing method in an embodiment of the present application, the method includes steps S110 to S150, wherein:

Step S110: Acquire propagation environment parameters, and determine target array microphone configuration parameters corresponding to the propagation environment based on the propagation environment parameters, wherein the propagation environment parameters include venue size, venue layout, and venue materials, and the target array microphone configuration parameters include quantity, type, and spacing.

Specifically, the propagation environment is the venue in which audio processing is required, for example an interactive classroom, an auditorium, or a lecture hall; the specific propagation environment is not limited in the embodiments of this application. The propagation environment parameters include, but are not limited to, venue size, venue layout, and venue materials. Because these parameters may affect how the array microphone should be arranged, they need to be analyzed during audio processing; the more propagation environment parameters are available, the better the result of audio processing based on them. Different propagation environments have different propagation environment parameters, which can be uploaded by the relevant staff.

An array microphone consists of multiple microphones and can collect audio at different positions. Different propagation environments place different requirements on audio propagation. For example, environments such as shopping malls and airports have high noise reduction requirements and often use a 6-microphone configuration, while environments such as homes and offices have relatively low noise reduction requirements and often use a 2-microphone or 4-microphone configuration. The array microphone configuration parameters include not only the number of microphones but also the microphone type, the spacing between microphones, and the orientation of the microphones. The microphone type may include unidirectional, bidirectional, and cardioid microphones. A unidirectional microphone only receives sound from a specified direction. A bidirectional microphone receives sound from the front and the rear and is suitable for meetings, interviews, and similar settings. A cardioid microphone receives sound from the front and both sides and is suitable for recording studios, concerts, and similar settings. The target array microphone configuration may combine different types of microphones; that is, suitable microphone types are usually selected and combined according to actual needs. In the target array microphone configuration, the larger the spacing between the microphones, the greater the gain that the target array microphone can achieve.

When determining the target array microphone configuration parameters corresponding to the propagation environment parameters, a parameter mapping relationship uploaded to the processing system in advance by the relevant staff can be used. The parameter mapping relationship contains the target array microphone configuration parameters corresponding to different propagation environment parameters and can be determined by the relevant staff from experimental data. Further, to improve the speed and accuracy of determining the target array microphone configuration, determining the target array microphone configuration parameters corresponding to the propagation environment based on the propagation environment parameters can specifically include:

Determining a corresponding target venue model based on the propagation environment parameters and a preset standard venue model; determining multiple initial array microphone configurations according to the propagation environment parameters, and performing an audio propagation simulation for each initial array microphone configuration with the target venue model to obtain the initial simulation parameters corresponding to each initial array microphone configuration, each initial simulation parameter including a simulated echo parameter and a simulated reverberation parameter; matching preset propagation standard parameters against each initial simulation parameter to obtain a simulation score for each initial simulation parameter, and determining the initial array microphone configuration parameters corresponding to the initial simulation parameter with the highest simulation score as the target array microphone configuration parameters corresponding to the propagation environment.

The preset standard venue model is a basic model corresponding to the propagation environment. Determining the target venue model based on the propagation environment parameters and the preset standard venue model means adjusting the basic model according to the propagation environment parameters until it becomes the target venue model, whose model parameters are consistent with the propagation environment parameters. There is a correspondence between the propagation environment parameters and the initial array microphone configurations, an initial array microphone configuration being an array microphone configuration that can be used in the propagation environment. This correspondence can be determined by the relevant staff from experimental data or historical operating experience and then uploaded to the processing system; its specific content is not limited in the embodiments of the present application. Different initial array microphone configurations produce different audio propagation effects in the propagation environment. Therefore, after the multiple initial array microphone configurations are determined according to the propagation environment parameters, an audio propagation simulation is performed for each of them, and the target array microphone configuration suited to the propagation environment is selected from the initial array microphone configurations according to the simulation results.

Because echo parameters and reverberation parameters affect audio propagation quality, selecting the target array microphone configuration from the initial array microphone configurations requires analyzing the initial simulation parameters produced by the audio propagation simulation of each initial array microphone configuration. The initial simulation parameters include simulated echo parameters and simulated reverberation parameters. The simulated echo parameters describe the sound reflected back after the simulated sound waves meet obstacles during the audio propagation simulation, and usually include an echo attenuation rate and a threshold: the echo attenuation rate determines the ratio of the echo audio to the original audio, and the threshold determines the maximum signal amplitude at which dynamic reverberation starts. Reverberation characterizes the reflection of the simulated sound within the target venue model. The reverberation parameters may include reverberation duration, reverberation volume, and reverberation decay rate. A reverberation duration that is too long may make the audio linger and blur the clarity of the original audio; a reverberation volume that is too high may make the audio too noisy during propagation, while one that is too low may leave the audio lacking in depth and distort the original audio; the reverberation decay rate is the ratio of high-frequency attenuation to low-frequency attenuation, and adjusting this ratio helps balance the timbre and brightness of the original audio.

The initial simulation parameters include, but are not limited to, simulated echo parameters and simulated reverberation parameters, and the preset propagation standard parameters likewise include at least a preset standard simulated echo parameter and a preset standard simulated reverberation parameter. Matching the preset propagation standard parameters against each initial simulation parameter yields a simulation score for each initial simulation parameter. The higher the simulation score, the closer the initial simulation parameter is to the preset propagation standard parameters, that is, the better it meets the preset standard. The initial simulation parameter with the highest simulation score is determined as the target initial simulation parameter, and the initial array microphone configuration parameters corresponding to the target initial simulation parameter are determined as the target array microphone configuration parameters.
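The selection step described above can be sketched as follows. The embodiment does not specify how the simulation score is computed, so the scoring formula here (inverse of summed relative deviation from the preset standard) is an assumption, as are the names `SimResult`, `simulation_score`, and `pick_target_config`, and the choice of one echo value and one reverberation value per simulation.

```python
from dataclasses import dataclass

@dataclass
class SimResult:
    config_name: str    # hypothetical label of an initial array microphone configuration
    echo_decay: float   # simulated echo attenuation rate
    reverb_time: float  # simulated reverberation duration in seconds

def simulation_score(sim, std_echo_decay, std_reverb_time):
    """Assumed score in (0, 1]; 1.0 means the simulation equals the preset standard."""
    # Relative deviation from each preset standard value (standards assumed non-zero).
    echo_dev = abs(sim.echo_decay - std_echo_decay) / std_echo_decay
    reverb_dev = abs(sim.reverb_time - std_reverb_time) / std_reverb_time
    return 1.0 / (1.0 + echo_dev + reverb_dev)

def pick_target_config(sims, std_echo_decay, std_reverb_time):
    """Return the simulated configuration with the highest simulation score."""
    return max(sims, key=lambda s: simulation_score(s, std_echo_decay, std_reverb_time))

candidates = [
    SimResult("dual-mic", 0.5, 1.2),
    SimResult("4-mic", 0.6, 0.65),
    SimResult("6-mic", 0.9, 0.3),
]
target = pick_target_config(candidates, std_echo_decay=0.6, std_reverb_time=0.6)
print(target.config_name)
```

Any monotone scoring rule would serve equally well here; the only property the embodiment relies on is that a higher score means a closer match to the preset standard.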

Step S120: Collect the propagation environment audio corresponding to the propagation environment based on the target array microphone configuration parameters.

Specifically, once the target array microphone configuration parameters are determined, microphone control instructions can be generated from them to prompt the relevant staff to adjust and set up the microphones in the propagation environment, so that audio in the propagation environment is collected according to the target array microphone configuration parameters.

Step S130: Identify the audio features contained in the propagation environment audio, and determine the main audio and the slave audio from the propagation environment audio based on the audio features, where the audio features include frequency, time domain, and variation amplitude.

Specifically, the propagation environment audio is the audio generated within the propagation environment and contains the main audio and the slave audio. The slave audio can also be called noise audio, and the main audio can be the audio produced by a teacher or lecturer. Because the main audio is generally highly recognizable, the main audio and the slave audio can be distinguished by identifying the audio features in the propagation environment audio, such as frequency, time domain, and variation amplitude. Regarding frequency features: the main audio usually has specific frequencies in different time periods, for example a characteristic pitch or sound quality, whereas the frequency distribution of the slave audio, such as traffic noise or crowd noise, may be wider and irregular. Regarding time domain features: the main audio generally has continuity and recognizable speech syllables, whereas the slave audio may be continuous or intermittent and usually has no recognizable speech syllables. Regarding variation amplitude: the main audio generally varies widely, from soft to loud, whereas the slave audio generally varies little and has a relatively stable volume. Therefore, the propagation environment audio can be divided by identifying the audio features it contains. The slave audio is the part of the propagation environment audio that is not main audio, but the noise sources behind it may differ, for example air conditioning noise, vehicle traffic noise, or conversation noise.
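As a minimal illustration of the variation-amplitude feature described above, the sketch below measures how much the short-time RMS level of a stream changes and labels the stream accordingly. The embodiment does not give a concrete rule, so the frame length, the threshold `variation_threshold`, and the function names are all assumptions; a practical classifier would combine this with the frequency and time-domain features as well.

```python
import math

def frame_rms(samples, frame_len):
    """Root-mean-square level of each consecutive, non-overlapping frame."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def amplitude_variation(samples, frame_len=160):
    """Spread between the loudest and the quietest frame."""
    rms = frame_rms(samples, frame_len)
    return max(rms) - min(rms)

def label_stream(samples, variation_threshold=0.1, frame_len=160):
    """Assumed rule: wide level variation suggests main audio, stable level suggests slave audio."""
    if amplitude_variation(samples, frame_len) > variation_threshold:
        return "main"
    return "slave"

RATE = 8000  # samples per second, illustrative
# Steady 100 Hz hum with constant volume, similar to equipment noise.
hum = [0.2 * math.sin(2 * math.pi * 100 * n / RATE) for n in range(1600)]
# Tone whose level rises from soft to loud, standing in for a speaker's voice.
voice = [(0.05 + 0.75 * n / 1600) * math.sin(2 * math.pi * 200 * n / RATE) for n in range(1600)]
print(label_stream(hum), label_stream(voice))
```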

Step S140: Determine the slave audio signal corresponding to the slave audio based on the main audio, the audio features, and a preset mathematical model.

Specifically, the preset mathematical model can be a mathematical model for suppressing the slave audio. The slave audio can be determined from the propagation environment audio based on the audio features, and converting the slave audio into a slave audio signal provides a basis for the subsequent noise reduction processing. To improve the accuracy of this conversion and the effect of the noise reduction processing, the slave audio can be further divided. Determining the slave audio signal corresponding to the slave audio based on the main audio, the audio features, and the preset mathematical model can specifically include:

Identifying, from the audio features, the main audio features corresponding to the main audio, and determining related slave audio and unrelated slave audio from the slave audio based on the main audio features and the audio features; obtaining a noise adjustment quantization parameter, and determining the slave audio signal corresponding to the slave audio according to the related slave audio, the unrelated slave audio, the noise adjustment quantization parameter, and the preset mathematical model.

Specifically, the slave audio contains related slave audio and unrelated slave audio. The related slave audio is slave audio related to the main audio: for example, when the main audio is the audio produced by a speaker using a microphone, the related slave audio can be the noise of the microphone itself or the speaker's breathing. The unrelated slave audio can be background noise unrelated to the main audio, such as fan noise, traffic noise, and other random noise.

To improve the accuracy of determining the slave audio signal, the preset mathematical model can be n(t) = αP_i(t) + N_i(t), where α is the noise adjustment quantization parameter; P_i(t) follows a Poisson distribution and represents the signal corresponding to the related slave audio; N_i(t) follows a Gaussian distribution and represents the signal corresponding to the unrelated slave audio; and n(t) is the slave audio signal. The noise adjustment quantization parameter can be entered by the relevant staff. The larger it is, the greater the weight of the related slave audio in the slave audio; conversely, the smaller it is, the smaller that weight. Finally, the preset mathematical model combines the related slave audio and the unrelated slave audio as a weighted sum according to the obtained noise adjustment quantization parameter to give the final slave audio signal.

When converting the related slave audio into an audio signal that follows a Poisson distribution, the Poisson distribution parameter can be determined first; it characterizes the reference value of the audio amplitude per unit time. The expected number of times the audio amplitude reaches the reference value within a fixed time period must also be determined. From the reference value and the expected number of occurrences, the related slave audio signal following a Poisson distribution can be obtained, where the reference value includes but is not limited to the audio amplitude. When converting the unrelated slave audio into an audio signal that follows a Gaussian distribution, the expected value and standard deviation set by the relevant staff can be obtained: the expected value is the center of the Gaussian distribution and characterizes the average amplitude of the audio signal, and the standard deviation is the spread of the Gaussian distribution and characterizes how much the volume of the audio signal fluctuates. After the Gaussian distribution parameters are determined, a random number generator can be used to draw Gaussian random samples that match the expected value and standard deviation. The generated random samples are arranged into a time series, each sample being the amplitude of the unrelated slave audio at one time point, and the signal corresponding to the unrelated slave audio is determined from these samples.
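The model n(t) = αP_i(t) + N_i(t) can be illustrated with a short sketch that draws the related component from a Poisson distribution and the unrelated component from a Gaussian distribution. The embodiment leaves the conversion method open, so the interpretation used here (a Poisson event count per sample scaled by a reference amplitude) is an assumption, as are the names `poisson_sample`, `slave_audio_signal`, `lam`, and `base_amplitude`.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def slave_audio_signal(n_samples, alpha, lam, base_amplitude, mu, sigma, seed=0):
    """Generate n(t) = alpha * P_i(t) + N_i(t) as a list of amplitudes.

    lam            expected number of occurrences per time step (assumed meaning)
    base_amplitude reference amplitude contributed by each occurrence (assumed meaning)
    mu, sigma      expected value and standard deviation of the Gaussian part
    """
    rng = random.Random(seed)
    signal = []
    for _ in range(n_samples):
        p_t = base_amplitude * poisson_sample(lam, rng)  # related slave audio P_i(t)
        n_t = rng.gauss(mu, sigma)                       # unrelated slave audio N_i(t)
        signal.append(alpha * p_t + n_t)
    return signal

noise = slave_audio_signal(5000, alpha=0.5, lam=2.0, base_amplitude=0.1, mu=0.0, sigma=0.01)
print(sum(noise) / len(noise))  # sample mean, expected near alpha * base_amplitude * lam
```

Raising `alpha` increases the share of the related component in the result, which matches the role the embodiment assigns to the noise adjustment quantization parameter.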

The embodiments of the present application do not limit the specific way of converting the related slave audio into an audio signal that follows a Poisson distribution, nor the specific way of converting the unrelated slave audio into an audio signal that follows a Gaussian distribution, as long as the signal conversion can be performed. The related slave audio usually has some association with the main audio signal; if it is caused by a series of independent trigger events that are evenly distributed in time, converting it into an audio signal that follows a Poisson distribution is appropriate. Because the Gaussian distribution describes the probability distribution of random variables well, converting the unrelated slave audio into an audio signal that follows a Gaussian distribution is reasonable. Different noises call for different noise reduction processing, so subdividing the slave audio by the main audio features improves the efficiency and accuracy of noise reduction. Determining the final slave audio signal from the noise adjustment quantization parameter and the preset mathematical model then makes it possible to reduce or eliminate noise components in a targeted way and improves the accuracy of the slave audio signal.

Step S150: Perform noise reduction processing on the propagation environment audio based on the slave audio signal to obtain noise-reduced audio, and perform audio propagation according to the noise-reduced audio.

Specifically, when performing noise reduction processing on the propagation environment audio based on the slave audio signal, the signal features contained in the slave audio signal, such as signal period, signal phase, and signal spectrum, can be identified first. Targeted noise reduction is then applied to the propagation environment audio based on these signal features to suppress the slave audio, which makes it easier to amplify or enhance the main audio. The noise reduction algorithm used can be a deep learning algorithm: the signal features and the propagation environment audio are fed into a trained noise reduction model to obtain the noise-reduced audio. The noise reduction model is obtained by training on a large number of noise reduction samples, which contain noise audio and noise audio features.

In the embodiments of the present application, audio is collected with microphone configuration parameters matched to the propagation environment rather than with fixed configuration parameters, which widens the audio collection range and provides high-quality original audio data for subsequent audio processing. In addition, the main audio and the slave audio in the propagation environment audio are distinguished by audio features, that is, priorities within the propagation environment audio are distinguished, so that the main audio can be given priority propagation resources. Furthermore, the slave audio is processed with the preset mathematical model to obtain the slave audio signal, and noise reduction is then applied to the propagation environment audio based on the slave audio signal, which improves the efficiency and accuracy of the noise reduction processing and thereby the audio propagation quality.

Further, when it is detected that the main audio has at least two main audio features, the method provided in the embodiments of the present application further includes steps S1 to S4, as shown in FIG. 2, wherein:

Step S1: Acquire sound wave recording information, and determine the sound source position corresponding to each main audio feature based on the sound wave recording information, where the sound wave recording information contains the arrival time of the main audio at each microphone in the array microphone.

Specifically, when the propagation environment is an interactive classroom, an interview room, or a similar place, the propagation environment audio contains at least two main audio features, and when a discussion or debate takes place in an interactive classroom, more than two main audio features may appear. When at least two main audio features appear in the propagation environment audio, their sound source positions may differ. Analyzing the sound wave recording information gives the arrival time of each main audio feature at the microphones, and from these arrival times the sound source position corresponding to each main audio feature, that is, to each main audio, can be determined.

For any main audio, the sound source position can be determined from the sound wave recording information as follows: calculate the time differences with which the same main audio arrives at different microphones, then use the target array microphone configuration parameters, the speed of sound, and these time differences to estimate the sound source position corresponding to the main audio. A sound source localization algorithm based on time difference of arrival can be used for the estimate; the specific estimation method is not limited in the embodiments of the present application, as long as the sound source position corresponding to the main audio can be determined. Applying these steps to each main audio yields the sound source position corresponding to each main audio.
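One simple way to realize a localization based on time difference of arrival is a grid search: for every candidate point in the room, predict the arrival-time differences between microphone pairs and keep the point whose predictions best match the measurements. The sketch below does this in two dimensions. The embodiment does not prescribe a particular algorithm, so the grid search, the function name `locate_source`, the rectangular room, and the 343 m/s sound speed are assumptions for illustration.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees Celsius

def locate_source(mic_positions, arrival_times, room_w, room_h, step=0.05):
    """Grid-search the 2-D point whose predicted TDOAs best match the measured ones."""
    pairs = list(combinations(range(len(mic_positions)), 2))
    # Measured time difference of arrival for every microphone pair.
    measured = {(i, j): arrival_times[i] - arrival_times[j] for i, j in pairs}
    best, best_err = None, float("inf")
    nx, ny = round(room_w / step) + 1, round(room_h / step) + 1
    for ix in range(nx):
        for iy in range(ny):
            x, y = ix * step, iy * step
            dist = [math.hypot(x - mx, y - my) for mx, my in mic_positions]
            # Squared mismatch between predicted and measured TDOAs.
            err = sum(
                ((dist[i] - dist[j]) / SPEED_OF_SOUND - measured[(i, j)]) ** 2
                for i, j in pairs
            )
            if err < best_err:
                best, best_err = (x, y), err
    return best

# Four microphones in the corners of a 4 m x 3 m room; the true source is at (1.0, 2.0).
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
true_source = (1.0, 2.0)
emit_time = 0.5  # unknown to the algorithm; it cancels out in the differences
arrivals = [emit_time + math.hypot(true_source[0] - mx, true_source[1] - my) / SPEED_OF_SOUND
            for mx, my in mics]
estimate = locate_source(mics, arrivals, room_w=4.0, room_h=3.0)
print(estimate)
```

Because only differences of arrival times are used, the emission time does not need to be known. Closed-form or least-squares solvers are faster than a grid search, but the grid search keeps the principle visible.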

Step S2: Acquire the sound source environment audio generated within the range corresponding to each sound source position, and determine the sound source slave audio from the corresponding sound source environment audio based on each main audio feature.

Specifically, the range corresponding to each sound source position is the area centered on the sound source position with a preset distance as its radius. After the area corresponding to each sound source position is determined, the sound source environment audio within that range is obtained from the microphones located in the area. The preset distance is not limited in the embodiments of the present application and can be set by the relevant technical personnel. For the specific way of determining the sound source slave audio from the corresponding sound source environment audio based on the main audio features, refer to the description of step S130, which is not repeated here.

Step S3: Identify the noise information of each sound source slave audio, and determine by comparing the noise information whether the noise information corresponding to all sound source slave audio is the same, where the noise information includes noise type and noise loudness.

Specifically, when the sound source positions differ, the noise information contained in the corresponding sound source slave audio may be the same or different. For example, suppose the propagation environment is an interactive classroom, the sound source position of main audio a is the podium, the sound source position of main audio b is the back of the classroom, and an air conditioner is installed at the back of the classroom. In that case the air conditioning noise in the sound source slave audio corresponding to main audio b may be louder than in the sound source slave audio corresponding to main audio a.
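A direct comparison of noise type and noise loudness, as named in step S3, can be sketched as below. The frequency bands follow the embodiment's definition of noise types (main frequency below 300 Hz is low-frequency, 300 to 800 Hz is medium-frequency, above 800 Hz is high-frequency). The `NoiseInfo` structure, the decibel unit, and the loudness tolerance `loudness_tol_db` are assumptions; the embodiment does not state how close two loudness values must be to count as the same.

```python
from dataclasses import dataclass

@dataclass
class NoiseInfo:
    dominant_hz: float  # main frequency of the sound source slave audio
    loudness_db: float  # noise loudness; the dB unit is assumed for illustration

def noise_type(dominant_hz):
    """Classify noise by main frequency using the 300 Hz and 800 Hz boundaries."""
    if dominant_hz < 300:
        return "low"
    if dominant_hz <= 800:
        return "mid"
    return "high"

def same_noise_info(infos, loudness_tol_db=1.0):
    """True when every source has the same noise type and loudness within the tolerance."""
    types = {noise_type(info.dominant_hz) for info in infos}
    loudness = [info.loudness_db for info in infos]
    return len(types) == 1 and max(loudness) - min(loudness) <= loudness_tol_db

# Podium versus back of the classroom next to an air conditioner.
podium = NoiseInfo(dominant_hz=120.0, loudness_db=40.0)
back_row = NoiseInfo(dominant_hz=120.0, loudness_db=52.0)
print(same_noise_info([podium, back_row]))
```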

The noise information includes, but is not limited to, noise type and noise loudness. The noise types are low-frequency noise, medium-frequency noise, and high-frequency noise: low-frequency noise has a main frequency below 300 Hz, medium-frequency noise has a main frequency between 300 Hz and 800 Hz, and high-frequency noise has a main frequency above 800 Hz. Whether different sound source slave audio is consistent can be judged by whether the noise type and noise loudness are consistent; in addition to noise type and noise loudness, the time domain features of different sound source slave audio can also be compared and analyzed. Further, to improve the accuracy of the judgment, identifying the noise information of each sound source slave audio and determining by comparison whether the noise information corresponding to all sound source slave audio is the same can specifically include:

Identifying the frequency distribution and peak value of each sound source slave audio in its spectrogram, and determining the first noise display map corresponding to each sound source slave audio based on that frequency distribution and peak value.

Specifically, spectrum analysis is first performed on each sound source slave audio to obtain its spectrogram, from which the frequency distribution and peak value of that sound source slave audio can be determined. The first noise display map is a three-dimensional map formed by superimposing the peak value on a basic distribution map. The basic distribution map contains multiple scattered points, and different frequency distributions correspond to different dispersion degree maps of these points. The dispersion degree map for a given frequency distribution can be determined through a dispersion mapping relationship, which contains the dispersion degree maps corresponding to different frequency distributions. After the dispersion degree map is determined, the peak parameter corresponding to the peak value is superimposed at a preset position in the dispersion degree map to obtain the first noise display map. During superposition, the peak value can first be converted into a peak parameter so that the first noise display maps of different sound source slave audio can be compared. The conversion ratio used to convert the peak value into the peak parameter and the preset position can both be uploaded to the processing system by the relevant staff; their specific content is not limited in the embodiments of the present application.

Identifying the audio loudness of each sound source slave audio, and determining the second noise display map corresponding to each sound source slave audio based on a loudness mapping relationship, where the loudness mapping relationship is the correspondence between audio loudness and the second noise display map.

Specifically, different audio loudness values correspond to different second noise display maps: the greater the audio loudness, the longer the graphic edge length of the corresponding second noise display map. The second noise display map is a two-dimensional map that describes audio loudness through its edge length, that is, the area of the second noise display map represents the audio loudness, which lets the relevant staff see the differences between sound source slave audio at a glance. For example, the second noise display map corresponding to sound source slave audio a is a circle with a graphic edge length of 4 cm, and the second noise display map corresponding to sound source slave audio b is a circle with a graphic edge length of 7 cm. For ease of comparison, the basic shape of the second noise display maps stays the same and only the graphic edge length differs. The loudness mapping relationship contains the second noise display maps corresponding to different audio loudness values; its specific content is not limited in the embodiments of the present application, as long as the second noise display map corresponding to each sound source slave audio can be determined from the loudness mapping relationship.

Superimpose the first noise display map and the second noise display map of each sound-source slave audio to obtain a superimposed noise display map for each sound-source slave audio.

Specifically, the first noise display map and the second noise display map of each sound-source slave audio may be superimposed by pixel-level superposition, position superposition, transparency superposition, or similar methods. The superposition method is not limited in the embodiments of the present application, as long as image superposition can be achieved. When pixel-level superposition is used, a mathematical operation is applied to the first noise display map and the second noise display map at the pixel level to obtain the superimposed noise display map.
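As one possible reading of pixel-level superposition, the sketch below blends two grayscale images with a weighted sum per pixel. The application only says that "a mathematical operation" is applied at the pixel level, so the weighted blend, the `alpha` parameter, and the list-of-lists image representation are assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale pixels in [0.0, 1.0], row-major


def superimpose(first: Image, second: Image, alpha: float = 0.5) -> Image:
    """Blend two equally sized grayscale images pixel by pixel.

    alpha weights the first image and (1 - alpha) weights the second.
    """
    if len(first) != len(second) or any(
        len(a) != len(b) for a, b in zip(first, second)
    ):
        raise ValueError("images must have the same dimensions")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [
        [alpha * p + (1.0 - alpha) * q for p, q in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


spectrum_map = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for a first noise display map
loudness_map = [[0.0, 0.0], [1.0, 1.0]]   # stand-in for a second noise display map
overlay = superimpose(spectrum_map, loudness_map)
```

Setting `alpha` away from 0.5 corresponds to a transparency-style superposition in which one map dominates the other.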

Perform image matching on the superimposed noise display maps of the sound-source slave audios, and determine from the matching result whether the noise information of all sound-source slave audios is the same. When the matching values between the superimposed noise display maps of all sound-source slave audios are not lower than a preset matching value, the noise information of all sound-source slave audios is the same.

Specifically, to match two superimposed noise display maps, the edge information of each map may be determined first, and the matching value between the two maps is then obtained by comparing their edge information. When the matching value between two superimposed noise display maps is not lower than the preset matching value, the noise information of the corresponding sound-source slave audios is consistent. The edge information of each superimposed noise display map may be determined with the Canny edge detection algorithm, a gradient operator, or similar methods; the specific method is not limited in the embodiments of the present application. The preset matching value may be 98% or 95%, and the specific value can be set by relevant staff according to actual needs. The matching value is computed in this way for every pair of superimposed noise display maps. When the matching value of every pair is not lower than the preset matching value, all superimposed noise display maps are consistent, that is, the noise information of all corresponding sound-source slave audios is consistent.
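The pairwise matching procedure can be sketched as follows. This is a simplified illustration rather than the application's method: it uses a forward-difference gradient operator instead of Canny, defines the matching value as the fraction of pixels whose edge flags agree, and assumes all images share the same dimensions. The function names and the edge threshold are hypothetical.

```python
from itertools import combinations
from typing import List

Image = List[List[float]]  # grayscale pixels in [0.0, 1.0], row-major


def edge_mask(img: Image, threshold: float = 0.5) -> List[List[bool]]:
    """Binary edge mask from forward-difference gradients (a simple gradient operator)."""
    h, w = len(img), len(img[0])
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = img[y][x + 1] - img[y][x] if x + 1 < w else 0.0
            gy = img[y + 1][x] - img[y][x] if y + 1 < h else 0.0
            row.append(abs(gx) + abs(gy) >= threshold)
        mask.append(row)
    return mask


def match_value(a: Image, b: Image) -> float:
    """Fraction of pixels whose edge flags agree (assumes equal image sizes)."""
    mask_a, mask_b = edge_mask(a), edge_mask(b)
    total = len(mask_a) * len(mask_a[0])
    agree = sum(
        pa == pb
        for row_a, row_b in zip(mask_a, mask_b)
        for pa, pb in zip(row_a, row_b)
    )
    return agree / total


def all_consistent(images: List[Image], preset: float = 0.95) -> bool:
    """True when every pair of images reaches the preset matching value."""
    return all(match_value(a, b) >= preset for a, b in combinations(images, 2))
```

Checking every pair, rather than comparing each map against a single reference, mirrors the text's requirement that any two superimposed noise display maps meet the preset matching value.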

Because the superimposed noise display map of a sound-source slave audio is generally obtained by extracting key features, judging whether the noise information of multiple sound-source slave audios is similar by image matching makes it easier to ignore secondary or redundant data, which helps improve the accuracy of the judgment.

Step S4: If so, apply the same noise reduction method to the sound-source environment audio corresponding to the at least two main audio features.

Specifically, if the superimposed noise display maps of the sound-source environment audio corresponding to the at least two main audio features are determined to be consistent, the at least two sound-source environment audios can be noise-reduced in the same way. For the noise reduction of the sound-source environment audio corresponding to any one main audio feature, refer to steps S110 to S150 in the above embodiment, which are not repeated here. Comparing whether the noise information at the sound source positions of different main audios is consistent makes it possible to decide whether the main audios at different sound source positions can share one noise reduction operation, instead of running the full procedure for determining a noise reduction operation for every main audio, which helps increase the noise reduction processing rate.

Further, when a person of interest is present in the propagation environment, the method provided in the embodiments of the present application further includes:

Acquire the position of interest and the hearing loss parameters of the person of interest; determine a sound source interval and a sound source direction based on the position of interest and the sound source position corresponding to the main audio, and determine a loudness adjustment parameter corresponding to the position of interest based on the sound source interval, the sound source direction, and the loudness mapping relationship; optimize the noise-reduced audio based on the loudness adjustment parameter and the hearing loss parameters to obtain optimized noise-reduced audio, and feed the optimized noise-reduced audio back to the hearing assistance device worn by the person of interest.

Specifically, the person of interest may be a hearing-impaired person, and the position of interest is the position of that person in the propagation environment. When it is detected that the propagation environment contains a person of interest, the position of interest can be determined by identifying the person in an image of the propagation environment, where the image may be captured by an image acquisition device installed in the propagation environment and uploaded to the processing system. The hearing loss parameters characterize the degree of hearing loss of the person of interest and may be obtained after the person of interest has given authorization.

The sound source interval and sound source direction between the person of interest and the sound source position corresponding to the main audio may affect the loudness at which the person of interest receives the main audio. The loudness adjustment parameter corresponding to the position of interest therefore needs to be determined from the loudness mapping relationship: the longer the sound source interval and the larger the angle corresponding to the sound source direction, the higher the loudness adjustment parameter value. The specific content of the loudness mapping relationship is not limited in the embodiments of the present application. It can be determined by relevant staff from historical experimental data and uploaded to the processing system, as long as the loudness adjustment parameter value corresponding to the position of interest can be determined from it.

A loss-adjusted loudness is determined from the hearing loss parameters of the hearing-impaired person. For high-frequency hearing loss, the loudness of the high-frequency components may need to be increased; for low-frequency hearing loss, the loudness of the low-frequency components may need to be increased. Different hearing-impaired persons have different loss-adjusted loudness values. Once the loss-adjusted loudness is determined, the noise-reduced audio is optimized using both the loss-adjusted loudness and the loudness adjustment parameter, so that the optimized noise-reduced audio suits the hearing characteristics and preferences of the hearing-impaired person. The hearing assistance device may include a hearing aid, a cochlear implant, and the like. Such personalized adjustment helps improve the audio experience of the person of interest and the robustness of the audio processing.
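The combination of the positional loudness adjustment and the hearing-loss compensation can be sketched as per-band gains. Everything numeric here is an assumption: the application leaves the loudness mapping to historical experimental data, so the linear coefficients, the compensation fraction, the gain cap, and the two-band split are hypothetical placeholders. The sketch only preserves the stated trends, namely that a longer interval or larger angle gives a higher adjustment and that the band with more hearing loss receives more gain.

```python
from typing import Dict


def loudness_adjustment_db(
    interval_m: float,
    angle_deg: float,
    db_per_m: float = 0.5,      # hypothetical coefficient
    db_per_deg: float = 0.02,   # hypothetical coefficient
) -> float:
    """Hypothetical monotone mapping: farther and more off-axis means more gain."""
    return db_per_m * max(interval_m, 0.0) + db_per_deg * abs(angle_deg)


def band_gains_db(
    hearing_loss_db: Dict[str, float],
    adjustment_db: float,
    compensation: float = 0.5,  # illustrative fraction of the loss to restore
    max_gain_db: float = 30.0,  # illustrative safety cap
) -> Dict[str, float]:
    """Per-band gain = positional adjustment + a fraction of the loss, capped."""
    return {
        band: min(adjustment_db + compensation * loss, max_gain_db)
        for band, loss in hearing_loss_db.items()
    }


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


adjustment = loudness_adjustment_db(interval_m=4.0, angle_deg=50.0)
gains = band_gains_db({"low": 10.0, "high": 60.0}, adjustment)
factors = {band: db_to_linear(g) for band, g in gains.items()}
```

The resulting linear factors would be applied to the corresponding frequency bands of the noise-reduced audio before it is sent to the hearing assistance device; the cap is included because unbounded amplification is undesirable in any hearing-related application.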

An embodiment of the present application provides a processing system. As shown in FIG. 3, the processing system 300 includes a processor 301 and a memory 303. The processor 301 and the memory 303 are connected, for example through a bus 302. Optionally, the processing system 300 may also include a transceiver 304. It should be noted that, in practice, the number of transceivers 304 is not limited to one, and the structure of the processing system 300 does not limit the embodiments of the present application.

The processor 301 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with the disclosure of this application. The processor 301 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

The bus 302 may include a path that carries information between the above components. The bus 302 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 302 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one line is drawn in FIG. 3, but this does not mean that there is only one bus or only one type of bus.

The memory 303 may be, without limitation, a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer.

The memory 303 stores the application code for executing the solution of the present application, and its execution is controlled by the processor 301. The processor 301 executes the application code stored in the memory 303 to implement the content shown in the foregoing method embodiments.

The processing system includes, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. It may also be a server or the like. The processing system shown in FIG. 3 is only an example and should not limit the functions or scope of use of the embodiments of the present application.

An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When the program runs on a computer, the computer can execute the corresponding content of the foregoing method embodiments.

An embodiment of the present application provides a computer program product that includes a computer program. When the computer program is executed by a processor, the method of any of the above embodiments is implemented.

It should be understood that, although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed sequentially, and they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

The above describes only some implementations of the present application. It should be noted that a person of ordinary skill in the art can make improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also fall within the scope of protection of the present application.

Claims (9)

1. An audio processing method, characterized by comprising:
acquiring propagation environment parameters, and determining, based on the propagation environment parameters, target array microphone configuration parameters corresponding to the propagation environment, wherein the propagation environment parameters include venue size, venue layout, and venue materials, and the target array microphone configuration parameters include quantity, type, and spacing;
collecting, based on the target array microphone configuration parameters, propagation environment audio corresponding to the propagation environment;
identifying audio features contained in the propagation environment audio, and determining main audio and slave audio from the propagation environment audio based on the audio features, wherein the audio features include frequency, time domain, and variation amplitude;
determining a slave audio signal corresponding to the slave audio based on the main audio, the audio features, and a preset mathematical model;
performing noise reduction on the propagation environment audio based on the slave audio signal to obtain noise-reduced audio, and performing audio propagation according to the noise-reduced audio.

2. The audio processing method according to claim 1, characterized in that determining, based on the propagation environment parameters, the target array microphone configuration parameters corresponding to the propagation environment comprises:
determining a corresponding target venue model based on the propagation environment parameters and a preset standard venue model;
determining a plurality of initial array microphone configurations according to the propagation environment parameters, and performing an audio propagation simulation for each initial array microphone configuration with the target venue model to obtain initial simulation parameters corresponding to each initial array microphone configuration, wherein each set of initial simulation parameters includes a simulated echo parameter and a simulated reverberation parameter;
matching preset propagation standard parameters against each set of initial simulation parameters to obtain a simulation score for each, and determining the initial simulation parameters with the highest simulation score as the target array microphone configuration parameters corresponding to the propagation environment.

3. The audio processing method according to claim 1, characterized in that determining the slave audio signal corresponding to the slave audio based on the main audio, the audio features, and the preset mathematical model comprises:
identifying, from the audio features, main audio features corresponding to the main audio, and determining related slave audio and unrelated slave audio from the slave audio based on the main audio features and the audio features;
acquiring a noise adjustment quantization parameter, and determining the slave audio signal corresponding to the slave audio according to the related slave audio, the unrelated slave audio, the noise adjustment parameter, and the preset mathematical model.

4. The audio processing method according to claim 1, characterized in that, when it is detected that the main audio has at least two main audio features, the method further comprises:
acquiring sound wave recording information, and determining the sound source position corresponding to each main audio feature based on the sound wave recording information, wherein the sound wave recording information includes the arrival time of the main audio at each microphone of the array microphone;
acquiring the sound-source environment audio generated within the range corresponding to each sound source position, and determining sound-source slave audio from the corresponding sound-source environment audio based on each main audio feature;
identifying the noise information of each sound-source slave audio, and determining, by comparing the noise information, whether the noise information of all sound-source slave audios is the same, wherein the noise information includes noise type and noise loudness;
if so, applying the same noise reduction method to the sound-source environment audio corresponding to the at least two main audio features.

5. The audio processing method according to claim 4, characterized in that identifying the noise information of each sound-source slave audio and determining, by comparing the noise information, whether the noise information of all sound-source slave audios is the same comprises:
identifying the frequency distribution and peaks of each sound-source slave audio in a spectrogram, and determining a first noise display map corresponding to each sound-source slave audio based on its frequency distribution and peaks;
identifying the audio loudness of each sound-source slave audio, and determining a second noise display map corresponding to each sound-source slave audio based on a loudness mapping relationship, wherein the loudness mapping relationship is the correspondence between audio loudness and the second noise display map;
superimposing the first noise display map and the second noise display map of each sound-source slave audio to obtain a superimposed noise display map for each sound-source slave audio;
performing image matching on the superimposed noise display maps of the sound-source slave audios, and determining from the matching result whether the noise information of all sound-source slave audios is the same, wherein, when the matching values between the superimposed noise display maps of all sound-source slave audios are not lower than a preset matching value, the noise information of all sound-source slave audios is the same.

6. The audio processing method according to claim 1, characterized in that, when a person of interest is present in the propagation environment, the method further comprises:
acquiring the position of interest and the hearing loss parameters of the person of interest;
determining a sound source interval and a sound source direction based on the position of interest and the sound source position corresponding to the main audio, and determining a loudness adjustment parameter corresponding to the position of interest based on the sound source interval, the sound source direction, and a loudness mapping relationship;
optimizing the noise-reduced audio based on the loudness adjustment parameter and the hearing loss parameters to obtain optimized noise-reduced audio, and feeding the optimized noise-reduced audio back to the hearing assistance device worn by the person of interest.

7. A processing system, characterized in that the processing system comprises:
at least one processor;
a memory;
at least one application program, wherein the at least one application program is stored in the memory and configured to be executed by the at least one processor, and the at least one application program is configured to execute the audio processing method according to any one of claims 1 to 6.

8. A computer-readable storage medium, characterized by storing a computer program that can be loaded by a processor to execute the audio processing method according to any one of claims 1 to 6.

9. A computer program product, characterized by comprising a computer program which, when executed by a processor, implements the steps of the audio processing method according to any one of claims 1 to 6.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202411050845 2024-08-01
CN2024110508451 2024-08-01

Publications (1)

Publication Number Publication Date
CN118841022A true CN118841022A (en) 2024-10-25

Family

ID=93149209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411192487.8A Pending CN118841022A (en) 2024-08-01 2024-08-28 Audio processing method, processing system, medium and program product

Country Status (1)

Country Link
CN (1) CN118841022A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119562180A (en) * 2025-02-05 2025-03-04 广东鼎创智造科技有限公司 A microphone optimization control method and system based on scene adaptation

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119562180A (en) * 2025-02-05 2025-03-04 广东鼎创智造科技有限公司 A microphone optimization control method and system based on scene adaptation
CN119562180B (en) * 2025-02-05 2025-05-13 广东鼎创智造科技有限公司 Microphone optimal control method and system based on scene adaptation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination