
CN115334366A - A modeling method for interactive immersive sound field roaming - Google Patents


Info

Publication number
CN115334366A
Authority
CN
China
Prior art keywords
virtual
sound
sound field
output result
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210978930.9A
Other languages
Chinese (zh)
Inventor
刘京宇
蒋鉴
任鹏昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Communication University of China
Priority to CN202210978930.9A
Publication of CN115334366A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8106Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g. 3D video

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Architecture (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Stereophonic System (AREA)

Abstract

The invention provides a modeling method for interactive immersive sound field roaming, a sound field roaming method, and a sound field roaming system. The sound field roaming method comprises the following steps: determining N first positions of N virtual musical instruments in a virtual sound field space and a second position of a virtual character in the virtual sound field space, wherein the virtual character is operated by a user to stop or move in the virtual sound field space, and the N virtual musical instruments, the virtual sound field space, and the virtual character are realized through virtual reality technology; determining relative position information between the N first positions and the second position, wherein the N first positions are N virtual sound source positions and the second position is a virtual listening position; processing N first audio signals with a sound field space model according to the relative position information to obtain a second audio signal; and, in response to a playing operation by the user, playing the second audio signal to the user.

Description

A modeling method for interactive immersive sound field roaming

Technical Field

The present invention relates to the field of performing arts technology, and more particularly to a modeling method for interactive immersive sound field roaming, a sound field roaming method, and a sound field roaming system.

Background Art

Recreating the auditory effect of a real sound field space (such as a concert hall) in a virtual system is crucial to the experience of audiences and music listeners. For musicians in the performance industry, rehearsal and performance simulation systems built on virtual platforms can help artists move the stage from offline to online, alleviating the current difficulties of touring, the loss of talent, and the difficulty of establishing cultural and artistic brands.

In the course of implementing the embodiments of the present invention, the inventors, starting from the field of music performance and considering the needs of audiences, musicians, conductors, and audio engineers, identified the following problems:

(1) For orchestra conductors and musicians, the positions of the orchestra sections cannot be changed in the prior art, so the acoustic effect of different section placements cannot be switched in real time, which reduces the efficiency of evaluating orchestral performances.

(2) For recording engineers, the prior art can neither compare and switch the acoustic effects of different recording formats in real time nor adjust the volume balance between sections in real time, so music mixing is very inefficient.

(3) For the auditory effect of a concert, the prior art can neither render in real time the acoustic effect at an arbitrary position in a concert hall nor simulate the auditory effect of different sound field spaces (for example, different concert halls, natural scenes, and living environments).

Therefore, how to reproduce the acoustic effects of different sound field spaces is an urgent problem to be solved.

Summary of the Invention

In view of the above problems, the present invention provides a modeling method for interactive immersive sound field roaming, a sound field roaming method, and a sound field roaming system.

One aspect of the embodiments of the present invention provides an auralization-based interactive immersive sound field roaming method, comprising: determining N first positions of N virtual musical instruments in a virtual sound field space and a second position of a virtual character in the virtual sound field space, wherein the virtual character is operated by a user to stop or move in the virtual sound field space; determining relative position information between the N first positions and the second position, wherein the N first positions are N virtual sound source positions, the second position is a virtual listening position, and N is an integer greater than or equal to 1; processing N first audio signals with a sound field space model according to the relative position information to obtain a second audio signal, wherein the sound field space model simulates the propagation of the N first audio signals in physical space, and the N first audio signals correspond one-to-one to the N virtual musical instruments; and, in response to a playback operation by the user, playing the second audio signal to the user.

According to an embodiment of the present invention, the sound field space model includes a direct sound processing model, an early reflection model, and a late reverberation model, and processing the N first audio signals with the sound field space model to obtain the second audio signal includes: attenuating the N first audio signals with the direct sound processing model to obtain a first output result; feeding the first output result into the early reflection model for reflection processing to obtain a second output result; feeding the first output result into the late reverberation model for reverberation processing to obtain a third output result; and obtaining the second audio signal from the second output result and the third output result.
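As a rough illustration of this three-branch structure (a sketch only, not the patented implementation; the function names and the simple summing of branches are assumptions), the direct-sound output can feed both the early-reflection and late-reverberation stages before the two branch outputs are combined:

```python
def render_sound_field(dry_signals, direct_gains, early_reflect, late_reverb):
    """Three-branch sketch: direct-sound attenuation produces the first
    output result, which feeds both an early-reflection stage and a
    late-reverberation stage; their outputs are summed into the second
    audio signal."""
    n = len(dry_signals[0])
    # First output result: mix of the N distance-attenuated direct sounds.
    first = [sum(g * s[i] for g, s in zip(direct_gains, dry_signals))
             for i in range(n)]
    second = early_reflect(first)   # second output result (early reflections)
    third = late_reverb(first)      # third output result (late reverberation)
    return [a + b for a, b in zip(second, third)]
```

With placeholder stages that simply scale the signal, a unit input attenuated by 0.5 and 0.25 in the two branches sums to 0.75 per sample.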

According to an embodiment of the present invention, the relative position information includes distance information, and attenuating the N first audio signals with the direct sound processing model includes: processing the N first audio signals according to the distance information with N distance attenuation curves, wherein the N distance attenuation curves correspond one-to-one to the N first audio signals, and any two of the N distance attenuation curves may be identical or different.
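One common family of distance attenuation curves is the inverse-distance law; the sketch below is an assumed example of such a curve (the reference distance and rolloff parameters are illustrative), not the specific per-instrument curves used by the invention:

```python
def inverse_distance_gain(distance, ref_distance=1.0, rolloff=1.0):
    """One possible distance attenuation curve (inverse-distance law).
    Gain is 1.0 at the reference distance and falls off beyond it; the
    patent allows a different curve for each of the N instruments."""
    d = max(distance, ref_distance)  # no boost inside the reference radius
    return ref_distance / (ref_distance + rolloff * (d - ref_distance))
```

For example, with the defaults the gain is 1.0 at 1 m and 0.5 at 2 m.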

According to an embodiment of the present invention, processing the N first audio signals according to the distance information with the N distance attenuation curves includes applying cone attenuation to at least one of the N first audio signals, specifically: for any one of the at least one audio signal, obtaining a propagation distance based on the interior space information of the virtual sound field space; taking the virtual sound source position corresponding to that audio signal as the sphere center and the propagation distance as the radius to obtain a spherical propagation region for that audio signal; dividing the spherical propagation region into an inner-angle region, an outer-angle region, and a transition region between the inner-angle region and the outer-angle region; and attenuating the audio signal according to the actual region to which the second position belongs to obtain the first output result, wherein the actual region is any one of the inner-angle region, the outer-angle region, and the transition region.
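A minimal sketch of cone (inner/outer-angle) attenuation, assuming a unit-length source direction vector and linear interpolation across the transition region (the patent does not specify the interpolation law, so that choice is an assumption):

```python
import math

def cone_gain(source_pos, source_dir, listener_pos,
              inner_deg, outer_deg, outer_gain):
    """Gain depending on which region the listening position falls in:
    inner-angle region -> full gain; outer-angle region -> outer_gain;
    transition region -> linear blend between the two."""
    to_listener = [l - s for l, s in zip(listener_pos, source_pos)]
    norm = math.sqrt(sum(c * c for c in to_listener)) or 1.0
    cos_a = sum(d * c for d, c in zip(source_dir, to_listener)) / norm
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    if angle <= inner_deg / 2:
        return 1.0              # inner-angle region: no cone attenuation
    if angle >= outer_deg / 2:
        return outer_gain       # outer-angle region: fully attenuated
    # transition region between the inner and outer angles
    t = (angle - inner_deg / 2) / (outer_deg / 2 - inner_deg / 2)
    return 1.0 + t * (outer_gain - 1.0)
```

A listener straight ahead gets gain 1.0; directly behind, outer_gain; 90° off-axis with a 90°/270° cone lands halfway through the transition.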

According to an embodiment of the present invention, M virtual (image) sound sources are computed from the N virtual sound source positions and the geometry of the virtual sound field space, and S sound reflection paths are computed from the second position and the geometry, M and S each being an integer greater than or equal to 1; feeding the first output result into the early reflection model for reflection processing to obtain the second output result includes: performing reflection processing on the first output result according to the M virtual sound sources and the S sound reflection paths to obtain the second output result.
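For the simplest geometry, an axis-aligned rectangular room, the first-order image (mirror) sources can be obtained by reflecting the source position across each of the six walls. This is a deliberately simplified, assumed version of computing the M virtual sources from source positions and room geometry:

```python
def first_order_image_sources(source, room):
    """First-order image sources of a point source in an axis-aligned
    rectangular room [0,Lx] x [0,Ly] x [0,Lz]: mirror the source across
    each wall plane. Higher orders repeat the mirroring recursively."""
    x, y, z = source
    Lx, Ly, Lz = room
    return [
        (-x, y, z), (2 * Lx - x, y, z),  # mirrored across the two x walls
        (x, -y, z), (x, 2 * Ly - y, z),  # mirrored across the two y walls
        (x, y, -z), (x, y, 2 * Lz - z),  # mirrored across floor and ceiling
    ]
```

Each image source then contributes one reflection path whose length (image source to listener) gives the reflection's delay and attenuation.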

According to an embodiment of the present invention, before the S sound reflection paths are computed, the method further includes: taking the virtual character as a ray source and emitting virtual rays from the second position; and detecting auditory interaction information through the virtual rays, wherein the auditory interaction information includes the distance between the virtual character and the walls of the virtual sound field space and the material information of those walls.
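In a Unity implementation this detection would typically be an engine raycast (e.g., Physics.Raycast returning hit distance and collider data). Purely for illustration, a simplified hand-rolled version against infinite wall planes (the wall representation and material strings are assumptions) might look like:

```python
def raycast_walls(origin, direction, walls):
    """Cast a ray from the listening position and return (distance,
    material) of the nearest wall hit. Each wall is an infinite plane
    given as (point_on_plane, unit_normal, material)."""
    best = None
    for point, normal, material in walls:
        denom = sum(d * n for d, n in zip(direction, normal))
        if abs(denom) < 1e-9:
            continue  # ray is parallel to this wall plane
        t = sum((p - o) * n for p, o, n in zip(point, origin, normal)) / denom
        if t > 0 and (best is None or t < best[0]):
            best = (t, material)
    return best
```

The returned distance and material can then feed the per-wall absorption used when building the reflection paths.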

According to an embodiment of the present invention, the late reverberation model includes K impulse response signals recorded in K physical environments, and feeding the first output result into the late reverberation model for reverberation processing to obtain the third output result includes: in response to the user selecting a first virtual sound field space from the K virtual sound field spaces, invoking a first impulse response signal, wherein the first virtual sound field space is constructed from a first physical environment among the K physical environments, and K is an integer greater than or equal to 1; and convolving the first output result with the first impulse response signal to obtain the third output result.
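The convolution of the first output result with the selected impulse response can be sketched in direct form as below; a real-time system would normally use FFT-based or partitioned convolution instead, but the arithmetic is the same:

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of the first output result with an
    impulse response recorded in the selected physical environment,
    yielding the third output result (the reverberated signal)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out
```

Convolving a unit impulse with an impulse response reproduces the response itself, which is a quick sanity check for any implementation.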

According to an embodiment of the present invention, the method further includes: in response to a first instruction from the user to move the virtual character, moving the virtual character to a third position; updating the virtual listening position to the third position; and re-performing the operations of determining the relative position information, obtaining the second audio signal, and playing the second audio signal to the user.

According to an embodiment of the present invention, the method further includes: in response to a second instruction from the user to move at least one virtual musical instrument, moving the at least one virtual musical instrument to a fourth position; updating the corresponding position of the at least one virtual musical instrument among the N virtual sound source positions to the fourth position; and re-performing the operations of determining the relative position information, obtaining the second audio signal, and playing the second audio signal to the user.

Another aspect of the embodiments of the present invention provides an auralization-based interactive immersive sound field roaming system, comprising: a position determination unit configured to determine N first positions of N virtual musical instruments in a virtual sound field space and a second position of a virtual character in the virtual sound field space, wherein the virtual character is operated by a user to stop or move in the virtual sound field space; a relative position unit configured to determine relative position information between the N first positions and the second position, wherein the N first positions are N virtual sound source positions, the second position is a virtual listening position, and N is an integer greater than or equal to 1; a signal processing unit configured to process N first audio signals with a sound field space model according to the relative position information to obtain a second audio signal, wherein the sound field space model simulates the propagation of the N first audio signals in physical space, and the N first audio signals correspond one-to-one to the N virtual musical instruments; and an audio playback unit configured to play the second audio signal to the user in response to a playback operation by the user.

Another aspect of the embodiments of the present invention provides a modeling method for interactive immersive sound field roaming, comprising: obtaining a direct sound processing model for attenuating N first audio signals to obtain a first output result, wherein the N first audio signals propagate from N virtual sound source positions to a virtual listening position, and N is an integer greater than or equal to 1; obtaining an early reflection model for performing reflection processing on the first output result to obtain a second output result; obtaining a late reverberation model for performing reverberation processing on the first output result to obtain a third output result; and setting up a main output bus for obtaining a second audio signal from the second output result and the third output result, wherein the second audio signal is the audio signal obtained by simulating the propagation of the N first audio signals in physical space.

Another aspect of the embodiments of the present invention provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method described above.

Another aspect of the embodiments of the present invention also provides a computer-readable storage medium on which executable instructions are stored, the instructions, when executed by a processor, causing the processor to perform the method described above.

One or more embodiments of the present invention can provide an auralized virtual sound field space with a virtual character that the user can operate; as the virtual character moves, the system simulates sound propagation in a real environment and plays the resulting audio to the user. The positions of the N virtual musical instruments and of the virtual character are determined and taken as the N virtual sound source positions and the virtual listening position, respectively. Then, the relative position information between the N virtual sound source positions and the virtual listening position is determined. Next, the N first audio signals are processed with the sound field space model to simulate how they would propagate in a physical environment at that relative position, obtaining the second audio signal. Finally, the second audio signal is played to the user. An immersive, real-time interactive sound field roaming function for stage performances is thereby realized.

The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented according to the contents of the specification, and in order to make the above and other objects, features, and advantages of the present invention more apparent, specific embodiments of the present invention are set forth below.

Brief Description of the Drawings

Through the following description of embodiments of the present invention with reference to the accompanying drawings, the above and other objects, features, and advantages of the present invention will become clearer. In the drawings:

Fig. 1 schematically shows a flowchart of an auralization-based interactive immersive sound field roaming method according to an embodiment of the present invention;

Fig. 2 schematically shows a flowchart of obtaining a second audio signal according to an embodiment of the present invention;

Fig. 3 schematically shows a flowchart of processing a first audio signal according to an embodiment of the present invention;

Fig. 4 schematically shows a flowchart of cone attenuation processing according to an embodiment of the present invention;

Fig. 5 schematically shows a flowchart of reflection processing according to an embodiment of the present invention;

Fig. 6 schematically shows a flowchart of detecting auditory interaction information according to an embodiment of the present invention;

Fig. 7 schematically shows a flowchart of obtaining a third output result according to an embodiment of the present invention;

Fig. 8 schematically shows a flowchart of updating a virtual listening position according to an embodiment of the present invention;

Fig. 9 schematically shows a flowchart of updating a virtual sound source position according to an embodiment of the present invention;

Fig. 10 schematically shows a technical architecture diagram suitable for implementing the modeling method for interactive immersive sound field roaming according to an embodiment of the present invention;

Fig. 11 schematically shows a system development architecture diagram suitable for implementing the modeling method for interactive immersive sound field roaming according to an embodiment of the present invention;

Fig. 12 schematically shows a structural block diagram of an auralization-based interactive immersive sound field roaming system according to an embodiment of the present invention; and

Fig. 13 shows a schematic structural diagram of a computing device according to an embodiment of the present invention.

Detailed Description of Embodiments

First, relevant terms involved in the embodiments of the present invention are explained, for a better understanding of the present invention.

Auralization: the technique of creating audible sound files from digital (analog, measured, or synthesized) data.

Interactive: the user can interact with the virtual objects provided by the embodiments of the present invention through operations, obtaining real-time functions such as sound field roaming, sound field switching, section position switching, and audio processing.

Immersion: simulating the real-time acoustic environment of a live music performance, so that the user has an immersive auditory experience in the sound field space.

Sound field roaming: the process in which the virtual character moves within at least part of the virtual sound field space.

Reflected sound: sound waves in a room arriving from the ceiling and walls that contribute to a higher sound pressure level.

Direct sound: sound that travels in a straight line from the sound source to the receiver without any reflection.

Early reflections: also called initial reflections; the portion of reflected sound that arrives immediately after the direct sound and is beneficial to sound quality.

Reverberant sound: the superposition, at a given moment, of all first-order and higher-order reflections when the sound in a room reaches a steady state or the source sounds continuously.

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present invention and to fully convey its scope to those skilled in the art.

The embodiments of the present invention provide a modeling method for interactive immersive sound field roaming, a sound field roaming method, and a sound field roaming system. Taking a virtual concert hall as an example, and starting from the field of music performance while combining the needs of audiences, musicians, conductors, and audio engineers, the invention is based on the idea of a multifunctional, customizable, interactive, immersive concert hall; through geometric-acoustics simulation algorithms and multi-engine cross-platform collaboration, it solves the technical problem of real-time simulation of concert sound field acoustics. For conductors and musicians, it solves the technical problem that the acoustic effect of section placement cannot be switched in real time; it can serve as a virtual rehearsal hall to simulate performances of different arrangements and genres, improving the efficiency of evaluating orchestral performances. For audio engineers (such as recording engineers), it solves the technical problem of real-time comparison and switching of acoustic effects at different recording positions, enables real-time control of the volume balance between sections, and improves their working efficiency. For the auditory effect of a concert, it solves the technical problem of simulating in real time the acoustic effect at any position in the concert hall, and it can simulate in real time the auditory effect of different sound field spaces.

Fig. 1 schematically shows a flowchart of the auralization-based interactive immersive sound field roaming method according to an embodiment of the present invention.

As shown in Fig. 1, the auralization-based interactive immersive sound field roaming method of this embodiment includes operations S110 to S140.

In operation S110, N first positions of N virtual musical instruments in the virtual sound field space and a second position of a virtual character in the virtual sound field space are determined, wherein the virtual character is operated by the user to stop or move in the virtual sound field space.

Exemplarily, things in the real world are simulated in a virtual digital space to build a realistic, virtual, interactive three-dimensional environment, such as the N virtual instrument models, the virtual sound field space model, and the virtual character model. The virtual character can roam in the virtual sound field space (move arbitrarily within at least part of the area). In other words, the virtual sound field space is a virtual three-dimensional space, which may include environment information of a real three-dimensional space.

In some embodiments, the system scene model is built around a Chinese national orchestra (an example only) and a concert hall scene, including one concert hall model, one roaming character model, and eleven Chinese national orchestra instrument models: the pipa, daruan, zhongruan, sanxian, and yangqin of the plucked string group; the bangdi, nanxiao, and sheng of the wind group; the erhu and zhonghu of the bowed string group; and the bianzhong (chime bells) of the percussion group. Scripts for character roaming and instrument movement are written in C#, the programming language supported by Unity. The asset sources are converted to the FBX file format and imported into the Unity scene, after which the material textures are assigned to the physical models.

Exemplarily, since a concert hall scene has no conspicuous luminaires but instead hides its light sources in the architecture to achieve a warm, blended lighting effect, lightmap baking is used as the main method of scene light rendering. The scene uses a directional light as the main light source, supplemented by dozens of simulated light types, including point lights and area lights, to illuminate the stage area and the dark corners of the auditorium. In addition, a lighting network composed of multiple light probes is applied to the scene to dynamically illuminate moving objects in the concert hall, including the character and instrument models.

In operation S120, relative position information between N first positions and a second position is determined, where the N first positions are N virtual sound source positions, the second position is a virtual listening position, and N is an integer greater than or equal to 1.

Exemplarily, the N first positions and the second position are mapped to position information in real (physical) space, so that after the N virtual sound source positions and the virtual listening position are mapped to real positions, the propagation of the audio signals is simulated. The relative position information can reflect the relative position between the user and the instruments in real space.
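The position mapping described above boils down to computing, for each source, a distance and a direction relative to the listener. A minimal sketch of that computation follows (illustrative only; the function name and coordinate convention are assumptions, not part of the patent):

```python
import math

def relative_position(source_pos, listener_pos):
    """Return (distance, unit direction vector) from the listener to a source.

    source_pos / listener_pos: (x, y, z) coordinates in the mapped
    physical space. The distance feeds the attenuation model and the
    direction feeds the spatialization stage.
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    dz = source_pos[2] - listener_pos[2]
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    if dist == 0.0:
        return 0.0, (0.0, 0.0, 0.0)
    return dist, (dx / dist, dy / dist, dz / dist)

# Listener at the origin, an instrument 3 m to the right and 4 m ahead:
dist, direction = relative_position((3.0, 0.0, 4.0), (0.0, 0.0, 0.0))
print(dist)       # 5.0
print(direction)  # (0.6, 0.0, 0.8)
```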

In operation S130, the N first audio signals are processed with a sound field space model according to the relative position information to obtain a second audio signal, where the sound field space model is used to simulate the propagation of the N first audio signals in physical space, and the N first audio signals correspond one-to-one to the N virtual instruments.

Exemplarily, the sound field space model may provide an immersive spatial sound field simulation framework based on the physical principles of sound propagation, performing signal processing from three perspectives: sound emission, propagation path, and receiver. For example, the 11 first audio signals are audio files corresponding one-to-one to the 11 Chinese national orchestra instrument models; each audio file may contain the complete audio signal of a real instrument's part in a piece of music recorded in advance.

In operation S140, the second audio signal is played to the user in response to the user's play operation.

Exemplarily, the present invention does not limit when the user performs the play operation; it may occur before, during, or after the execution of operations other than operation S140. For example, the user may click the play button during operation S110, or before operation S110.

According to the embodiments of the present invention, an audible virtual sound field space can be provided with which the user can interact to control a virtual character and play audio, and the sound propagation phenomena of a real environment (such as a concert hall) are simulated when playing audio for the user. The positions of the N virtual instruments and of the virtual character are determined and used as the N virtual sound source positions and the virtual listening position, respectively. Then, the relative position information between the N virtual sound source positions and the virtual listening position is determined. Next, the N first audio signals are processed with the sound field space model to simulate their propagation in a physical environment based on that relative position, obtaining the second audio signal. Finally, the second audio signal is played to the user. An immersive, real-time interactive sound field roaming function for stage performances is thereby realized.

Fig. 2 schematically shows a flowchart of obtaining the second audio signal according to an embodiment of the present invention. Fig. 3 schematically shows a flowchart of processing the first audio signals according to an embodiment of the present invention.

As shown in Fig. 2, processing the N first audio signals with the sound field space model to obtain the second audio signal in operation S130 includes operations S210 to S240. The sound field space model includes a direct sound processing model, an early reflection model, and a late reverberation model.

In operation S210, the N first audio signals are attenuated with the direct sound processing model to obtain a first output result.

Exemplarily, in the virtual sound field space, each sound (first audio signal) is set as a point source whose position coordinates are assigned to one of the N virtual instrument models in the scene. Direct sound refers to the portion of energy that travels from the source to the receiver (the virtual character) without any reflection under free field conditions; during propagation, the sound energy is attenuated by the surrounding environment. In physical propagation theory, the energy of a point source decays according to the inverse square law: the amplitude is proportional to the reciprocal of the propagation distance, i.e., each doubling of distance reduces the amplitude by 6 dB. As the energy radiates outward it spreads over an ever larger area, so the energy received at any given point becomes smaller and smaller.
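The inverse square law stated above can be checked numerically. The following fragment is only an illustration of the 1/distance amplitude rule (the actual attenuation in the system is configured through the distance curves described later):

```python
import math

def direct_gain(distance, ref_distance=1.0):
    """Amplitude gain of a point source relative to ref_distance.

    Amplitude is proportional to 1/distance, so each doubling of the
    distance lowers the level by 20*log10(2), roughly 6.02 dB.
    The gain is clamped to 1.0 inside the reference distance.
    """
    return ref_distance / max(distance, ref_distance)

def gain_db(distance, ref_distance=1.0):
    return 20.0 * math.log10(direct_gain(distance, ref_distance))

print(round(gain_db(2.0), 2))  # about -6.02 dB at double the distance
print(round(gain_db(4.0), 2))  # about -12.04 dB at four times the distance
```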

Referring to Fig. 3, playing the first audio signal of a virtual instrument may be called a playback event. In the structural hierarchy of the direct sound processing model, each first audio signal corresponds to an audio track: a track processed by the direct sound model and the early reflection model is called a dry track, and a track processed by the direct sound model and the late reverberation model is called a wet track.

The two playback events shown in Fig. 3 are the same event. In some embodiments, the output of the direct sound processing model may be used as the input of both the early reflection model and the late reverberation model. Alternatively, two direct sound processing models may be provided, corresponding to the early reflection model and the late reverberation model respectively.

In operation S220, the first output result is input into the early reflection model for reflection processing to obtain a second output result.

Exemplarily, as a sound wave (audio signal) continues to propagate from the source, it collides with the surrounding media; part of the energy is absorbed by the material, part continues to propagate forward, and the rest is reflected. The first few reflections of the sound wave are defined as early reflections. Early reflections arrive with a certain time delay relative to the direct sound, and their propagation directions are diverse. The time differences, directionality, and sound energy information carried by the early reflections are perceived by the listener (i.e., the virtual character), forming a preliminary judgment of the listener's orientation and localization within the space; they further reveal the size and shape of the room, and produce different acoustic effects as the wall materials change.

In operation S230, the first output result is input into the late reverberation model for reverberation processing to obtain a third output result.

Exemplarily, sound waves propagate continuously in a real concert hall; each time they encounter an obstacle they undergo one reflection and absorption, and the remaining energy continues to propagate. After many repeated reflections and absorptions, the sum of the remaining sound wave energy in the room is called reverberation. After the source stops emitting sound, the reverberation persists and slowly fades away through further reflection and absorption. By listening to the late reverberation of different spaces, the volume of a space and its unique architectural-acoustic signature can be perceived intuitively. The low-frequency content of reverberation is always greater than the high-frequency content, and its decay time is longer, because low-frequency sound has a longer wavelength: it more easily bends around obstacles without being reflected and is less readily absorbed by them. Moreover, sound attenuates as it propagates through air, but under the same propagation conditions low-frequency sound attenuates less than high-frequency sound.

In operation S240, a second audio signal is obtained according to the second output result and the third output result.

Referring to Fig. 3, the second output result and the third output result are summed into the master output bus, and the master output bus outputs the second audio signal. A bus is a route for audio signals; it can lead either to another bus or directly to an output.

In some embodiments, the final output of the master output bus includes the direct sound of the 11 dry instrument tracks, the early reflections, and the wet reverberation. That is, the first output result is fed into the master output bus, the early reflection model, and the late reverberation model respectively.

According to the embodiments of the present invention, the dry and wet audio signals are processed separately. After early reflection processing, the dry signal contains rich distance and direction information and is sent directly to the master output bus. The reverberation signal contains no dry sound. In accordance with real sound propagation behavior, the relative energy of pure reverberation grows as the propagation distance increases, and the distance between the listener and the source helps form spatial position information. As the sound propagates through the virtual space, the dry and wet signals are smoothly rendered, on their way to the listener, into a single reverberant sound effect carrying this combined information.

According to an embodiment of the present invention, the above relative position information includes distance information, and attenuating the N first audio signals with the direct sound processing model in operation S210 to obtain the first output result includes: processing the N first audio signals with N distance attenuation curves according to the distance information to obtain the first output result, where the N distance attenuation curves correspond one-to-one to the N first audio signals, and any two of the N distance attenuation curves may be the same or different.

The distance modeling used to simulate natural sound attenuation is completed in this step. The N first audio signals can be classified, and corresponding distance attenuation curves are created according to the classification results to construct the distance attenuation model. The point of maximum attenuation is determined by the maximum attenuation distance value, which forms a spherical attenuation range of that radius around each sound source. The attenuation curves can be customized, and control points can be added for fine adjustment. Distance attenuation curves include linear, constant, logarithmic, power, and S-shaped curves. For example, a logarithmic curve is selected here, and interpolation is used to approximate a curve that sounds more realistic.
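A curve of this kind can be realized as a set of control points with interpolation between them. The sketch below uses linear interpolation between control points sampled from a roughly logarithmic falloff; all numeric values are hypothetical placeholders, not the curves tuned in the actual system:

```python
MAX_DISTANCE = 50.0  # hypothetical maximum attenuation distance in meters

# Control points (distance, gain) sampled from a logarithmic-style falloff;
# extra points could be inserted here to fine-tune the perceived curve.
CONTROL_POINTS = [(1.0, 1.0), (5.0, 0.65), (15.0, 0.35), (30.0, 0.15), (50.0, 0.0)]

def curve_gain(distance):
    """Piecewise-linear interpolation over the control points."""
    if distance <= CONTROL_POINTS[0][0]:
        return CONTROL_POINTS[0][1]
    if distance >= MAX_DISTANCE:
        return 0.0  # outside the spherical attenuation range
    for (d0, g0), (d1, g1) in zip(CONTROL_POINTS, CONTROL_POINTS[1:]):
        if d0 <= distance <= d1:
            t = (distance - d0) / (d1 - d0)
            return g0 + t * (g1 - g0)

print(curve_gain(0.5))   # 1.0: inside the reference distance
print(curve_gain(10.0))  # about 0.5: halfway between the 5 m and 15 m points
print(curve_gain(60.0))  # 0.0: beyond the attenuation sphere
```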

In some embodiments, to simulate the effect of air absorption, recursive filters are applied to certain high and low frequencies.
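As an illustration of such a recursive filter, a first-order low-pass IIR section is sketched below; mapping its coefficient to the source-listener distance would mimic how air preferentially absorbs high frequencies over long paths. This is a generic textbook filter, not the specific filter design used in the system:

```python
def one_pole_lowpass(samples, a):
    """First-order recursive low-pass: y[n] = (1-a)*x[n] + a*y[n-1].

    0 <= a < 1; a larger `a` removes more high-frequency energy and
    could be tied to propagation distance to model air absorption.
    """
    out = []
    y = 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

# An alternating +1/-1 signal is the highest frequency a discrete
# signal can carry; the filter should shrink it noticeably.
nyquist = [1.0, -1.0] * 4
filtered = one_pole_lowpass(nyquist, a=0.6)
print(max(abs(v) for v in filtered) < 1.0)  # True: high band attenuated
```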

Fig. 4 schematically shows a flowchart of cone attenuation processing according to an embodiment of the present invention.

As shown in Fig. 4, this embodiment includes performing cone attenuation processing on at least one of the N first audio signals, specifically including performing operations S410 to S440 on any one of the at least one audio signal.

In operation S410, a propagation distance is obtained based on internal space information of the virtual sound field space.

Exemplarily, the internal space information may be three-dimensional spatial parameters of the virtual sound field space, such as the dimensions of the interior, the architectural layout, or the layout of the parts of a concert (such as the stage, the auditorium, and the orchestra). The propagation distance may be the distance from a sound source to a given wall inside the virtual sound field space.

In operation S420, a spherical propagation area of the audio signal is obtained by taking the virtual sound source position corresponding to the audio signal as the center of the sphere and the propagation distance as the radius.

In operation S430, the spherical propagation area is divided into an inner-angle region, an outer-angle region, and a transition region between the inner-angle and outer-angle regions.

In operation S440, corresponding attenuation processing is performed on the audio signal according to the actual region to which the second position belongs, obtaining the first output result, where the actual region is any one of the inner-angle region, the outer-angle region, and the transition region.

According to the embodiments of the present invention, the direct sound of a point source reveals the initial virtual source position information of the virtual sound field space. Instruments with pronounced directivity are given a sound-cone attenuation mode so as to carry more stage-acoustic information that interacts with changes in the listener's orientation. In cone attenuation, a sphere centered at the geometric center of the instrument model, with the propagation distance as its radius, is divided into inner-angle, outer-angle, and transition regions. In the inner-angle region, the output bus volume is not attenuated; in the outer-angle region, the output volume is attenuated and the filtering effect reaches the maximum level set by the system; in the transition region between the inner and outer angles, linear interpolation is used to roll off the bus output volume. The directivity of instrument sound in spatial audio is thus achieved through cone attenuation. When the system runs, different degrees of volume attenuation and filtering are presented as the listener's orientation changes (for example, facing toward, sideways to, or away from the instrument), so the listener can directly perceive how the sound changes with direction.
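The inner-angle/outer-angle/transition logic above can be sketched as a pure function of the angle between the instrument's facing direction and the listener. All angle and gain values here are hypothetical defaults for illustration, not parameters from the patent:

```python
def cone_gain(angle_deg, inner_deg=60.0, outer_deg=150.0, outer_gain=0.3):
    """Cone attenuation for a directional source.

    angle_deg: angle between the source's facing direction and the
    line toward the listener (0 means the listener is straight ahead).
    Inside the inner cone the bus volume is untouched; beyond the
    outer cone it is held at `outer_gain`; in between, linear
    interpolation rolls the volume off smoothly.
    """
    half_inner = inner_deg / 2.0
    half_outer = outer_deg / 2.0
    if angle_deg <= half_inner:
        return 1.0          # inner-angle region: no attenuation
    if angle_deg >= half_outer:
        return outer_gain   # outer-angle region: maximum attenuation
    t = (angle_deg - half_inner) / (half_outer - half_inner)
    return 1.0 + t * (outer_gain - 1.0)  # transition region

print(cone_gain(0.0))    # 1.0: facing the listener
print(cone_gain(52.5))   # about 0.65: halfway through the transition
print(cone_gain(180.0))  # 0.3: listener behind the instrument
```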

In some embodiments, plug-in algorithms, geometric modeling, filter design, and other methods can be used together to calculate and simulate the spatialized early reflections of the concert hall model, providing an initial reconstruction of its spatial sound field information.

Fig. 5 schematically shows a flowchart of reflection processing according to an embodiment of the present invention.

As shown in Fig. 5, the reflection processing of this embodiment includes operations S510 to S530, where operation S530 is one embodiment of operation S220.

In the virtual sound field space, the position information of the roaming listener (the virtual listening position) is a factor considered when processing the audio signals. The listener is allowed to roam throughout the concert hall while receiving early-reflection information from all directions. The reception of early reflections is directly related to the distance, direction, and material information of the various reflecting walls (i.e., the obstacles along the sound propagation paths); together these form the listener's auditory perception and affect the judgment of spaciousness and immersion. This part can be detected in real time and fed back through computation.

In operation S510, M image (virtual) sources are calculated according to the N virtual sound source positions and the geometry of the virtual sound field space.

Exemplarily, acoustic reflection geometric modeling can be performed, for example, computing spatialized early reflections with an image source technique based on multi-tap time-varying delay lines. Its premise is that each reflecting surface is regarded as infinitely large and ideally rigid, under which assumption the reflection model is physically accurate. In the image source simulation, a mirror-image source is formed behind each reflecting surface at a distance equal to that of the real source, and the line connecting it with the sounding body is orthogonal to the reflecting surface. The order of the early reflections increases with the complexity of the reflecting surfaces in the room geometry: the original source is reflected by every surface of the room geometry, first producing the first-order reflections, after which all higher-order reflections are obtained recursively from the previous order. The highest order of early reflections simulated here is the fourth, and the reflections of each order may be called image sources.
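For the special case of a rectangular ("shoebox") room, the first-order image sources are obtained simply by mirroring the real source across each of the six surfaces. The sketch below illustrates only that mirroring step under the axis-aligned-box assumption (the patent does not restrict the hall geometry to a box, and higher orders would mirror these images again):

```python
def first_order_image_sources(source, room_size):
    """First-order image sources for an axis-aligned box room.

    source: (x, y, z) with the room spanning [0, L] x [0, W] x [0, H].
    Mirroring across the wall at coordinate 0 maps x -> -x; across the
    opposite wall at coordinate L it maps x -> 2L - x, and likewise
    for the other two axes.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room_size[axis]):
            img = list(source)
            img[axis] = 2.0 * wall - img[axis]
            images.append(tuple(img))
    return images

# A source 2 m above the floor in a 10 x 8 x 6 m hall:
imgs = first_order_image_sources((3.0, 4.0, 2.0), (10.0, 8.0, 6.0))
print(len(imgs))  # 6 first-order images, one per surface
print(imgs[0])    # (-3.0, 4.0, 2.0): mirrored across the x = 0 wall
```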

Exemplarily, the geometry includes three-dimensional geometric information of the virtual sound field space, such as its size and shape.

In operation S520, S sound reflection paths are calculated according to the second position and the geometry, where M and S are each integers greater than or equal to 1.

The image sources depend only on the real sources and the geometry of the virtual sound field space; this means that if both remain stationary, the information of all image sources is unchanged, so all image sources can be precomputed. The real-time calculation of sound reflection paths related to the moving listener, however, is repeatedly performed on its own as the listener's position changes.

Exemplarily, the spatial surface materials in the virtual sound field space may be preset. The wall material determines exactly how much energy is filtered out when a sound wave penetrates an obstacle. In this concert hall system, the sound absorption of wall materials is simulated through a frequency-band filter design. The spatial surface material model is implemented with four-band filter attenuation; the absorption bands are divided into low, mid-low, mid-high, and high, with the default mapping intervals shown in Table 1.

Table 1. Absorption band mapping intervals

Type name    Frequency range
Low          < 250 Hz
Mid-low      > 250 Hz and < 1,000 Hz
Mid-high     > 1,000 Hz and < 4,000 Hz
High         > 4,000 Hz

Each reflecting surface in the model obtained by acoustic reflection geometric modeling is assigned a spatial surface material, simulating the continuous filtering of the signal. During actual operation of the virtual sound field space, the absorption effects of all materials are superimposed as the sound waves arrive in succession. Different filter parameters are set for the interior walls, reflecting panels, and seats of the virtual sound field space; these values match the published absorption coefficients of the actual materials, ensuring that the acoustic properties of real physical materials are accurately reproduced in the virtual scene.
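The band mapping of Table 1 combined with per-material attenuation can be sketched as follows. The absorption coefficients below are placeholder values for illustration only, not the coefficients used in the system:

```python
# Default absorption-band intervals from Table 1.
def band_of(freq_hz):
    if freq_hz < 250.0:
        return "low"
    if freq_hz < 1000.0:
        return "mid-low"
    if freq_hz < 4000.0:
        return "mid-high"
    return "high"

# Hypothetical per-band absorption coefficients for two example materials.
MATERIALS = {
    "plaster_wall": {"low": 0.10, "mid-low": 0.06, "mid-high": 0.04, "high": 0.05},
    "upholstered_seat": {"low": 0.45, "mid-low": 0.60, "mid-high": 0.70, "high": 0.70},
}

def reflected_fraction(material, freq_hz):
    """Energy fraction surviving one reflection off `material`."""
    return 1.0 - MATERIALS[material][band_of(freq_hz)]

print(band_of(800.0))                                  # mid-low
print(reflected_fraction("upholstered_seat", 2000.0))  # about 0.30
```

Successive reflections would multiply these per-bounce fractions together, which is how the absorption effects "superimpose as the sound waves arrive in succession."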

In operation S530, reflection processing is performed on the first output result according to the M image sources and the S sound reflection paths to obtain the second output result.

According to the embodiments of the present invention, the second output result reveals the size and shape of the room; it is directly related to the distance, direction, and material information of the various reflecting obstacles along the propagation paths, which together form the listener's judgment of spatial position. This part should be detected and calculated in real time.

Fig. 6 schematically shows a flowchart of detecting auditory interaction information according to an embodiment of the present invention.

Before operation S520, as shown in Fig. 6, detecting auditory interaction information in this embodiment includes operations S610 to S620.

In operation S610, the virtual character is taken as the ray origin, and virtual rays are emitted from the second position.

Exemplarily, the virtual rays can simulate the casting of light, detecting the surrounding environment through the propagation of the rays and the feedback they return.

In operation S620, the auditory interaction information is detected through the virtual rays, where the auditory interaction information includes the distance between the virtual character and the walls in the virtual sound field space and the material information of those walls.

Exemplarily, ray casting is used by the system to detect the orientation information between the sound sources and the receiver as well as the environment information around the propagation paths. On the basis of the acoustic reflection geometric model, while the virtual sound field space is running, the roaming virtual character emits rays in all directions; information relevant to auditory interaction, such as the distance from the character to a wall and the sound-reflecting materials of the surrounding walls, is detected in real time and fed into the algorithm. Whenever a change occurs in the system, for example when a sound source is activated or deactivated, when the roaming character's position changes, or when the building geometry around the sources and receiver changes, the rays are re-emitted.
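The core of this probing — casting a ray from the character and measuring the distance to the first wall it reaches — can be illustrated with a simple ray/axis-aligned-plane intersection. This is only a toy stand-in for an engine raycast such as Unity's Physics.Raycast; the function and its convention are assumptions:

```python
def ray_to_wall_distance(origin, direction, wall_axis, wall_coord):
    """Distance along `direction` from `origin` to the plane
    p[wall_axis] == wall_coord, or None if the ray points away.

    origin, direction: 3-tuples. If `direction` is a unit vector,
    the returned value is the distance in scene units.
    """
    d = direction[wall_axis]
    if d == 0.0:
        return None  # ray runs parallel to the wall
    t = (wall_coord - origin[wall_axis]) / d
    return t if t > 0.0 else None

# Character at x = 2 m, facing +x, wall at x = 10 m:
print(ray_to_wall_distance((2.0, 1.7, 5.0), (1.0, 0.0, 0.0), 0, 10.0))   # 8.0
# Facing away from the wall: no hit.
print(ray_to_wall_distance((2.0, 1.7, 5.0), (-1.0, 0.0, 0.0), 0, 10.0))  # None
```

In the real system the hit would also report the wall's surface material, which is then looked up in the material/filter tables described above.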

According to the embodiments of the present invention, the sound reflection paths can be accurately calculated from the distance between the character and the walls, the sound-reflecting materials of the surrounding walls, and similar information.

In some embodiments, considering that a real room's geometric surfaces are not the infinite rigid bodies of the ideal case but have boundaries, the additional physical propagation phenomena occurring at the boundaries, such as diffraction and transmission, also need to be taken into account beyond the early surface reflections. Diffraction is defined as the physical phenomenon in which a sound wave deviates from straight-line propagation when it encounters an obstacle, manifested as the wave bending around the obstacle's edge. The amount of diffraction depends on the wavelength of the sound relative to the size of the obstacle: the longer the wavelength relative to the obstacle, the more readily the wave bends around it. The ray-based diffraction model combines the uniform theory of diffraction and defines a visible region, a reflection region, and a shadow region: the sound wave propagates from the direction of the reflection region, is reflected by the reflecting surface, then bends and propagates through the visible region into the shadow region, where the sound is audible but the source is not visible. Transmission specifically describes the obstruction of the propagating sound by obstacles between the emitter and the listener. Filters are applied to simulate the degree of transmission, i.e., an array is set up associating each sound-reflecting material with a transmission loss value.

Fig. 7 schematically shows a flowchart of obtaining the third output result according to an embodiment of the present invention.

As shown in Fig. 7, inputting the first output result into the late reverberation model for reverberation processing to obtain the third output result in operation S230 includes operations S710 to S720.

In operation S710, in response to the user selecting a first virtual sound field space from K virtual sound field spaces, a first impulse response signal is invoked, where the first virtual sound field space is constructed from a first physical environment among K physical environments, and K is an integer greater than or equal to 1.

Exemplarily, the user may be provided with K virtual sound field space models constructed in one-to-one correspondence with K physical environments. A physical environment includes three-dimensional spatial information.

The room can here be compared to the concept of a "system" in the field of signal processing; more specifically, it can be treated as a linear time-invariant (LTI) system. The dry audio signal is regarded as the system input, and the reverberant sound produced after passing through the room as the system output. When several different dry signals are input, the output obtained is the superposition of the outputs that each signal would produce if input individually, and the time of input does not affect the result. Further, if the input signal covers all frequencies, the output naturally contains the system's response at all frequencies. In digital signal terms, such an input is called an impulse and the system's output is called the impulse response; when an impulse is played in a specific room, the resulting impulse response therefore contains all the spatial information of that room.
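The two LTI properties used above — superposition and the fact that a unit impulse reproduces the system's impulse response — can be verified directly with a discrete convolution. This is a toy numerical illustration, not the system's actual reverb engine:

```python
def convolve(x, h):
    """Discrete convolution: y[n] = sum over k of x[k] * h[n-k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

h = [1.0, 0.5, 0.25]   # toy "room" impulse response
x1 = [1.0, 0.0, 2.0]   # first dry signal
x2 = [0.0, 3.0, -1.0]  # second dry signal
mixed = [a + b for a, b in zip(x1, x2)]

# Linearity: convolving the mix equals summing the individual outputs.
lhs = convolve(mixed, h)
rhs = [a + b for a, b in zip(convolve(x1, h), convolve(x2, h))]
print(lhs == rhs)  # True

# A unit impulse reproduces the impulse response itself.
print(convolve([1.0], h) == h)  # True
```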

Exemplarily, when constructing the K virtual sound field space models, the three-dimensional spatial information and scenes of famous concert halls around the world, as well as of natural scenes and everyday environments, can be reproduced. The K virtual sound field space models can be selected by the user and switched in real time.

The virtual sound field space can simulate spaces such as the Amsterdam, Berlin, Boston, and Chicago concert halls, as well as glaciers, bridge arches, karst caves, and indoor stadiums. In the signal preprocessing stage, for example, the selected impulse response signals include those recorded in the Amsterdam, Berlin, Boston, and Chicago concert halls; for each hall there is a left-channel signal and a right-channel signal, each in mono format, which together form a stereo pair. As another example, the impulse response signals include some natural and everyday environments, intended to give users an opportunity to study environments suited to experimental music works, such as glaciers, bridge arches, karst caves, and indoor stadiums, all in stereo format.

In operation S720, the first output result is convolved with the first impulse response signal to obtain a third output result.

For several reasons, reverberant sound contains a higher proportion of low-frequency energy, and because low-frequency waves radiate with less pronounced directivity than high-frequency waves, reverberation carries far weaker directional cues than the early reflections do. Therefore, when simulating the late reverberation of the concert hall system, sampling the hall's acoustics and controlling a reverberation effect with those acoustic parameters through a convolution algorithm keeps the load on the computer's central processor low.

Exemplarily, by selecting different impulse response signals carrying spatial information and convolving them with the input dry audio signal, the spatial impression of different sound field environments can be faithfully reproduced.

Before the convolution algorithm is applied, a necessary step is to upmix the two mono audio files into one stereo impulse response signal. Every processed stereo impulse response signal is then convolved with the likewise stereo dry music signal, and the output is a music signal with a sense of space.
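That per-channel workflow can be sketched as follows (Python/NumPy; the array contents are placeholders, not real measured impulse responses): the left dry channel is convolved with the left impulse response channel, and the right with the right.

```python
import numpy as np

def convolve_stereo(dry, ir):
    """Convolve a stereo dry signal (shape (n, 2)) with a stereo
    impulse response (shape (m, 2)), left with left and right with
    right, returning a stereo wet signal of shape (n + m - 1, 2)."""
    n, m = len(dry), len(ir)
    wet = np.zeros((n + m - 1, 2))
    for ch in (0, 1):
        wet[:, ch] = np.convolve(dry[:, ch], ir[:, ch])
    return wet

# Upmix two mono impulse responses (left and right capture points)
# into one stereo impulse response; the values are illustrative.
ir_left = np.array([1.0, 0.5, 0.25])
ir_right = np.array([1.0, 0.4, 0.16])
ir_stereo = np.stack([ir_left, ir_right], axis=1)

# A stereo unit impulse as the "dry" input.
dry = np.stack([np.array([1.0, 0.0, 0.0, 0.0])] * 2, axis=1)
wet = convolve_stereo(dry, ir_stereo)
# Convolving a unit impulse returns the impulse response itself.
assert np.allclose(wet[:3, 0], ir_left) and np.allclose(wet[:3, 1], ir_right)
```

In the actual system the convolution is performed by the reverberation effect plug-in rather than in user code, but the channel pairing is the same.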

Referring to Fig. 3, and similarly to the early-reflection simulation, several auxiliary buses carrying convolution reverberation effects are added to the project, and each impulse response signal is applied as convolution data to form one concert hall's reverberation. Transcoding of the impulse response signals is done offline; while the concert hall system is running, the preprocessed impulse responses are convolved directly with the input dry audio, and the remaining real-time digital signal processing is performed at the same time. In this system, that processing consists of invoking and switching the "states" of audio interaction events. The spatial reverberation of each real concert hall is predefined as a global state; at runtime, a global state is triggered by the operator's command and assigned to the corresponding audio objects, and the audio parameter changes preset in that state are applied to those objects.

When the user selects a concert hall scene, the system responds with the corresponding audio state: the auxiliary bus carrying that hall's convolution reverberation is activated and routed to the main output bus of the audio chain. Inside each auxiliary bus, the input level, channel configuration, and balance of the impulse response signal, together with the dry level, reverberation level, equalization, filtering, delay time, and other parameters of the convolution, have been carefully tuned so that the reverberant sound is undistorted and balanced and the halls can be switched smoothly and naturally.

Referring to Fig. 3, the overall hierarchy contains 11 dry instrument tracks and 11 reverberation-only instrument tracks. The dry tracks are sent directly to the main output bus as the direct sound and, at the same time, to the early-reflection auxiliary bus carrying the early-reflection simulation plug-in to form the early reflections of the concert hall system. The 11 reverberation-only tracks are sent directly to the convolution reverberation auxiliary bus carrying the convolution reverberation plug-in; after algorithmic processing and parameter adjustment they form the wet-only reverberation of several famous concert halls.

Convolution reverberation suffers a continuity problem when reconstructing an architectural sound field. Specifically, a single application of convolution reverberation can convolve only one impulse response with the dry signal, and that impulse response was in fact recorded, or computed by simulation, at one particular point in the hall. Strictly speaking, what is heard is therefore the spatial audio information perceived while standing at that one point. When the listening environment changes from a flat 2D scene to a 3D scene in which a character can roam, the listening position changes in real time, and the convolved reverberation the listener hears no longer reflects the true spatial impression: the offline-rendered convolution reverberation is static, carries no time-varying spatial position information, and cannot present the dynamic changes in sound caused by head rotation. Additional interactive sound information tied to the motion of the roaming character must therefore be added to the reverberant sound field simulation.

Referring to Fig. 3 and operations S210 to S240, the dry audio and the wet sound produced by the convolution reverberation are processed separately. The dry audio corresponds to the original instrument tracks, which carry a distance attenuation model and a cone attenuation model; their physical propagation models contain rich distance and direction information, and they are sent directly to the main output bus. The convolution reverberation auxiliary bus outputs only reverberation, without any dry sound. Following real sound propagation behavior, the relative energy of the pure reverberation grows with propagation distance, so the distance between the listener and the source helps form the spatial position cue. As the sound propagates through the virtual space, the dry and wet components reaching the listener are smoothly rendered into a single reverberant sound carrying the combined information.
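The distance cue described above can be sketched as a pair of gain laws (illustrative only; the inverse-distance curve below is a common attenuation model, not necessarily the exact curves used by the patent's plug-ins): the direct sound falls off with distance while the diffuse reverberation stays roughly constant, so the wet-to-dry ratio grows with distance.

```python
def dry_wet_gains(distance, ref=1.0, min_d=0.1):
    """Illustrative distance-based gain pair: the direct (dry) sound
    falls off roughly with 1/distance, while the diffuse reverberant
    (wet) level stays comparatively constant, so the wet/dry ratio
    increases with distance and serves as a distance cue."""
    d = max(distance, min_d)
    dry = min(ref / d, 1.0)  # inverse-distance attenuation, capped at unity
    wet = 0.5                # diffuse reverberation: roughly position-independent
    return dry, wet

near_dry, near_wet = dry_wet_gains(1.0)
far_dry, far_wet = dry_wet_gains(8.0)
# Moving away lowers the direct sound but not the reverberation,
# so the wet-to-dry ratio increases with distance.
assert far_dry < near_dry
assert far_wet / far_dry > near_wet / near_dry
```

Rendering the two components separately and mixing them at the listener is what lets the system vary this ratio in real time as the character roams.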

Fig. 8 schematically shows a flow chart of updating the virtual listening position according to an embodiment of the present invention.

As shown in Fig. 8, updating the virtual listening position in this embodiment includes operations S810 to S830.

In operation S810, in response to a first user instruction to move the virtual character, the virtual character is moved to a third position.

Exemplarily, a user can steer the virtual character to roam through the virtual sound field space. Multiple users can each steer their own virtual character, and the characters are independent of one another.

Exemplarily, the camera is set to a first-person perspective. The virtual character can be steered to roam, stand, and listen at different locations in the concert hall. To better reproduce the real-world experience of identifying a listening position, the system adds a head-rotation function: the audience can turn the roaming character's head with the keyboard arrow keys and thereby perceive the change in the direction of the sound sources. The combination of sight and hearing gives the concert hall audience an immersive experience.
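The perceived change of source direction under head rotation reduces to a simple geometric computation. A minimal sketch (positions and angle convention are assumptions for illustration, not the system's actual API):

```python
import math

def source_azimuth(listener_pos, listener_yaw_deg, source_pos):
    """Azimuth of a source relative to the listener's facing direction,
    in degrees, positive counter-clockwise (to the listener's left).
    Positions are (x, y) pairs in the horizontal plane."""
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    world_angle = math.degrees(math.atan2(dy, dx))
    # Wrap the relative angle into (-180, 180].
    return (world_angle - listener_yaw_deg + 180.0) % 360.0 - 180.0

# A source straight ahead moves to the listener's right when the head turns left.
assert source_azimuth((0, 0), 0.0, (1, 0)) == 0.0
assert source_azimuth((0, 0), 90.0, (1, 0)) == -90.0
```

The engine recomputes this relative angle every frame as the arrow keys change the character's yaw, which is what drives the binaural rendering described later.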

In operation S820, the virtual listening position is updated to the third position.

In operation S830, the operations of determining the relative position information, obtaining the second audio signal, and playing the second audio signal to the user are performed again; that is, operations S120 to S140 are re-executed.

Since comparing seats in different parts of the hall is the audience's most pressing need, the concert hall system realizes interaction between the listener and the sound field through real-time auditory simulation. While roaming, as the character's position changes, the interactions between the character and the nearby architecture and sound sources are computed in real time and fed back as rays carrying sound information. Listeners may roam to different positions to compare how the concert sounds; they can also walk onto the stage to experience the conductor's working perspective, or stand close to each instrument to study its acoustic characteristics.

It should be noted that the position information, geometry, and interior of the virtual space (the virtual sound field space) described above have a mapping relationship with real space, and the processing, propagation, and playback of audio signals in the virtual space reproduce, by simulation, the corresponding effects in real space.

Fig. 9 schematically shows a flow chart of updating a virtual sound source position according to an embodiment of the present invention.

As shown in Fig. 9, updating the virtual sound source position in this embodiment includes operations S910 to S930.

In operation S910, in response to a second user instruction to move at least one virtual instrument, the at least one virtual instrument is moved to a fourth position.

In operation S920, the position corresponding to the at least one virtual instrument among the N virtual sound source positions is updated to the fourth position.

Exemplarily, each kind of virtual instrument may include several instances of that kind. One or more virtual instruments can be moved within the virtual sound field space to adjust the placement of the parts. Virtual sound source position coordinates are assigned to the N kinds of virtual instrument models in the scene and change as the instrument models are moved at runtime.
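The bookkeeping implied by operations S910 to S930 can be sketched as follows (the instrument names and coordinates are hypothetical): moving one instrument updates its entry among the N virtual source positions, after which the listener-relative vectors are recomputed.

```python
# Hypothetical positions in the horizontal plane; N = 2 here.
source_positions = {"pipa": (2.0, 5.0), "sheng": (-1.0, 4.0)}
listening_position = (0.0, 0.0)

def relative_vectors(sources, listener):
    """Listener-relative displacement of every virtual source."""
    lx, ly = listener
    return {name: (x - lx, y - ly) for name, (x, y) in sources.items()}

before = relative_vectors(source_positions, listening_position)
# Second instruction: move one instrument to a "fourth position".
source_positions["pipa"] = (4.0, 1.0)
after = relative_vectors(source_positions, listening_position)

# Only the moved instrument's relative position changes.
assert before["sheng"] == after["sheng"]
assert before["pipa"] != after["pipa"]
```

Re-running the relative-position step is exactly the re-execution of operations S120 to S140 described above.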

In operation S930, the operations of determining the relative position information, obtaining the second audio signal, and playing the second audio signal to the user are performed again; that is, operations S120 to S140 are re-executed.

Exemplarily, orchestras, conductors, and musicians can rehearse online (the first audio signals may be pre-recorded or captured and processed during a live performance) without having to be physically present. Various classic orchestra seating layouts can be preset for the operator to switch with one key, and instrument positions can be adjusted by dragging during the performance. This function can be used to study the stage acoustics of classical, improved, and newly invented instruments in different positions.

According to an embodiment of the present invention, an orchestra and its conductor can carry out an online simulated rehearsal by moving at least one virtual instrument and then replaying the audio. The simulated rehearsal provides functions such as instrument acoustic simulation, part placement adjustment, and real-time switching of the concert hall sound field.

Fig. 10 schematically shows a technical architecture diagram of a modeling method suitable for realizing interactive immersive sound field roaming according to an embodiment of the present invention. Fig. 11 schematically shows the corresponding system development architecture diagram.

Referring to Fig. 10 and Fig. 11, this embodiment uses technical means such as digital twins, virtual reality, sound field simulation, and interactive immersion to build a customizable, interactive, immersive virtual sound field space that can be roamed; for example, the N kinds of virtual instruments, the virtual sound field space, and the virtual characters are realized through digital twin and virtual reality technology.

To reproduce the architectural acoustics of a real concert hall, virtual reality technology and binaural room impulse responses are used to simulate an acoustic environment (i.e., the interactive immersive sound field) based on sound propagation principles, geometric feature models, and acoustic materials.

Referring to Fig. 10 and Fig. 11, and in combination with one or more of the embodiments described with reference to Figs. 1 to 9, this embodiment provides a modeling method for auralization-based interactive immersive sound field roaming; through this modeling method, the user's sound field roaming can be realized. The method includes: obtaining a direct sound processing model, which attenuates N kinds of first audio signals to obtain a first output result, the N kinds of first audio signals propagating from N virtual sound source positions to a virtual listening position, N being an integer greater than or equal to 1; obtaining an early reflection model, which applies reflection processing to the first output result to obtain a second output result; obtaining a late reverberation model, which applies reverberation processing to the first output result to obtain a third output result; and setting a main output bus, which obtains a second audio signal from the second output result and the third output result, the second audio signal being the audio signal obtained by simulating the propagation of the N kinds of first audio signals in physical space.

It should be noted that, referring to Fig. 10 and Fig. 11 and in combination with one or more of the embodiments described with reference to Figs. 1 to 9, one or more steps of the sound field roaming method of the present disclosure are implemented on the basis of one or more steps of the corresponding modeling method, and are not repeated here.

The 3D development engine Unity serves as the scene rendering platform and is integrated with professional modeling software and the interactive audio engine Wwise; the UI system developed alongside also runs on Unity. Finally, all of the above is integrated into one application: a concert hall system that implements the interactive immersive sound field roaming method. As shown in Fig. 11, communication between Wwise and Unity is based on audio event packaging: all audio assets, events, and state properties are packaged into a sound bank. Through the API, the bank is delivered to Unity, where a series of event commands can be invoked from C# scripts.

Exemplarily, the currently defined synchronizer logic may include two state groups, which respectively control the grouped playback of instrument tracks and the dynamic recall and bypass of the convolution reverberation auxiliary buses. The playback states of the instrument tracks are defined as follows: in the mute state of the wind group, the dry/wet levels of the bangdi, nanxiao, and sheng are set to negative infinity while those of the other instruments are set to 0; in the mute state of the plucked-string group, the dry/wet levels of the pipa, zhongruan, daruan, sanxian, and yangqin are set to negative infinity while those of the other instruments are set to 0; and so on. The dynamic control states of the convolution reverberation are defined as follows: among the 5 concert hall reverberations and 4 natural and everyday environments, the currently selected convolution reverberation auxiliary bus is set to 0 while the remaining convolution reverberation auxiliary buses are set to negative infinity; the early-reflection auxiliary bus is set to 0; and the bypass reverberation of each auxiliary bus is set to negative infinity.
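The level convention used by these states can be sketched in a few lines (a hypothetical illustration, not Wwise's actual state API): levels are stored in decibels, where 0 dB means unity gain and negative infinity means silence, i.e., mute.

```python
import math

NEG_INF = float("-inf")

# Hypothetical mute state for the wind group: wind instruments at -inf dB,
# one non-wind track (pipa) left at 0 dB for contrast.
wind_group_muted = {"bangdi": NEG_INF, "nanxiao": NEG_INF,
                    "sheng": NEG_INF, "pipa": 0.0}

def db_to_linear(db):
    """Convert a level in dB to a linear gain; -inf dB maps to 0 (silence)."""
    return 0.0 if db == NEG_INF else 10.0 ** (db / 20.0)

gains = {track: db_to_linear(db) for track, db in wind_group_muted.items()}
assert gains["bangdi"] == 0.0  # muted
assert gains["pipa"] == 1.0    # unity gain
```

Switching a state group then amounts to swapping in a different dB table, which the audio engine converts to per-track gains at runtime.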

For professional audio engineers, a series of audio functions has been designed, allowing professionals to practice their skills in a virtual workplace and music lovers to experience and explore. Since the digital audio workstation is the working environment most familiar to audio professionals, an interactive control system in the form of a DAW is a core requirement. The concert hall system provides several comprehensive control panels that support custom adjustment; all parameter changes take effect during the virtual orchestra's performance without causing pauses or stuttering.

For mixing engineers and musical acoustics researchers, the volume of every instrument track in the orchestra can be adjusted, and the system supports selective playback and muting when mixing a particular instrument group or adjusting the overall level of a piece. The system also supports switching and bypassing the reverberation effects.

The main work of recording engineers and stage technicians is handling microphones, and the selection and matching of multiple microphones calls for dedicated analysis and design. A recording engineer's task is to deliver music first-hand, but it is hard to grow quickly without on-site experience; one solution is to simulate recording online and prepare thoroughly. The system therefore supports switching between, and critically listening to, audio files recorded with microphones of different polar patterns and frequency responses. This function can also contribute to building a digital microphone library.

In some embodiments, a monitoring setup can be provided on the user-experience side: a head tracker is used with the system to track, in real time, the direction and angle of the listener's head rotation, so as to simulate the change in the apparent direction of the sound sources as the head turns. When listening over headphones, the user places the head tracker in the middle of the headband, directly above the head, and connects and pairs it to the computer via Bluetooth.

Head tracking is based on the binaural effect. Accurate head position information is obtained from the head tracker, and filtering, delay, sound reflection, and sound field displacement are processed accordingly, achieving a binaural rendering of the real sound field without adding coloration. Head tracking also offers a practical personalized head-modeling function, whose core principle is modeling the head-related transfer function (HRTF). The HRTF describes the physical process by which sound waves propagate from a spatial source to the two ears, including the diffraction and scattering caused by the listener's anatomy (head, torso, pinnae, and so on). In other words, the HRTF reflects the changes in amplitude and phase that sound waves undergo on their way from the source to the ears. Data such as the listener's head circumference and interaural distance are obtained through the tracker, and the interaural delay and the filtering and gain required for each ear are computed and simulated to compensate for the body-filtering effects present in a real sound field.
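One widely used closed-form model for the interaural delay mentioned above is Woodworth's spherical-head approximation; the sketch below uses it for illustration (the patent does not state which ITD model the system uses, and the default head radius is just a common average value):

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth's far-field approximation of the interaural time
    difference (ITD) in seconds: the extra path to the far ear is
    r * (theta + sin(theta)) for a source at azimuth theta.
    head_radius_m defaults to an average adult head radius."""
    theta = math.radians(azimuth_deg)
    return head_radius_m * (theta + math.sin(theta)) / c

# ITD is zero for a source straight ahead and grows toward the side.
assert itd_woodworth(0.0) == 0.0
assert itd_woodworth(90.0) > itd_woodworth(30.0)
```

A personalized version would substitute the head radius derived from the tracker's measurement of the listener's head circumference, which is exactly the compensation described in the paragraph above.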

According to embodiments of the present invention, on the one hand a concert hall scene is built by virtual reality means and interactive functions are realized; on the other hand, the auralization chain from the sound source, through the space, to the receiver is realized by simulating sound transmission and propagation, and the result is finally presented as binaural audio. The functional design of the concert hall system centers on the needs of the different groups involved in musical performance. For concert audiences, to achieve immersion, realism, and binaural localization that mimics a real space, a series of functions such as sound field exploration, sound image localization, and virtual space simulation is designed. For orchestras and conductors, to ease the difficulties of touring worldwide, to help the shift from offline to online performance, and to enable online simulated rehearsal, functions such as instrument acoustic simulation, part placement adjustment, and real-time switching of the concert hall sound field are designed. For audio engineers, to emulate a digital audio workstation and make it easy for mixing and recording engineers to work and practice their skills online, a user interaction control system with real-time mixing is designed.

Based on the auralization-based interactive immersive sound field roaming method described above, the present invention further provides an auralization-based interactive immersive sound field roaming system, which is described in detail below with reference to Fig. 12.

Fig. 12 schematically shows a structural block diagram of an auralization-based interactive immersive sound field roaming system 1200 according to an embodiment of the present invention.

As shown in Fig. 12, the auralization-based interactive immersive sound field roaming system 1200 may include a position determination unit 1210, a relative position unit 1220, a signal processing unit 1230, and an audio playback unit 1240.

The position determination unit 1210 may perform operation S110 to determine the N first positions of the N kinds of virtual instruments in the virtual sound field space and the second position of the virtual character in the virtual sound field space, where the virtual character is operated by the user to stop or move within the virtual sound field space.

The position determination unit 1210 may also perform operations S810 to S820 and operations S910 to S920, which are not repeated here.

The relative position unit 1220 may perform operation S120 to determine the relative position information between the N first positions and the second position, where the N first positions are the N virtual sound source positions, the second position is the virtual listening position, and N is an integer greater than or equal to 1.

The signal processing unit 1230 may perform operation S130 to process the N kinds of first audio signals with the sound field space model according to the relative position information and obtain the second audio signal; the sound field space model simulates the propagation of the N kinds of first audio signals in physical space, and the N kinds of first audio signals correspond one-to-one to the N kinds of virtual instruments.

The signal processing unit 1230 may also perform operations S210 to S240, S410 to S440, S510 to S530, S610 to S620, and S710 to S720, which are not repeated here.

The audio playback unit 1240 may perform operation S140 to play the second audio signal to the user in response to the user's playback operation.

Fig. 13 shows a schematic structural diagram of a computing device according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the concrete implementation of the computing device.

As shown in Fig. 13, the computing device may include a processor 1302, a communications interface 1304, a memory 1306, and a communication bus 1308.

Wherein:

The processor 1302, the communications interface 1304, and the memory 1306 communicate with one another through the communication bus 1308.

The communications interface 1304 is used for communicating with network elements of other devices, such as clients or other servers.

The processor 1302 is configured to execute the program 1310, and may in particular execute the relevant steps in the sound field roaming method embodiments described above.

Specifically, the program 1310 may include program code, and the program code includes computer operation instructions.

The processor 1302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs together with one or more ASICs.

The memory 1306 is used to store the program 1310. The memory 1306 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory.

The program 1310 may in particular be used to cause the processor 1302 to execute the sound field roaming method in any of the method embodiments described above. For the concrete implementation of each step in the program 1310, reference may be made to the corresponding steps and the descriptions of the corresponding units in the embodiments above, which are not repeated here. Those skilled in the art will clearly appreciate that, for convenience and brevity of description, the specific working processes of the devices and modules described above may be found in the corresponding process descriptions in the foregoing method embodiments and are likewise not repeated here.

The present invention further provides a computer-readable storage medium, which may be included in the device/apparatus/system described in the above embodiments, or may exist on its own without being assembled into that device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to the embodiments of the present invention.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device.

Various general-purpose systems may also be used with the teachings herein. The structure required to construct such a system is apparent from the above description. Moreover, the embodiments of the present invention are not directed to any particular programming language. It should be understood that the content of the present invention described herein may be implemented in a variety of programming languages, and the above description of a specific language is intended to disclose the best mode of the present invention.

Numerous specific details are set forth in the description provided herein. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be appreciated that, in the foregoing description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment.

Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.

Furthermore, those skilled in the art will understand that, although some embodiments herein include certain features that are included in other embodiments but not in others, combinations of features from different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any order; these words may be interpreted as names. Unless otherwise specified, the steps in the above embodiments should not be construed as limiting the execution order.

Claims (11)

1. An audibility-based interactive immersive sound field roaming method, comprising:
determining N first positions of N virtual musical instruments in a virtual sound field space and a second position of a virtual character in the virtual sound field space, wherein the virtual character is used for being operated by a user to stop or move in the virtual sound field space;
determining relative position information between the N first positions and the second positions, wherein the N first positions are N virtual sound source positions, the second positions are virtual listening positions, and N is an integer greater than or equal to 1;
processing N first audio signals by using a sound field space model according to the relative position information to obtain second audio signals, wherein the sound field space model is used for simulating the propagation of the N first audio signals in a physical space, and the N first audio signals are in one-to-one correspondence with the N virtual musical instruments;
and responding to the playing operation of the user, and playing the second audio signal to the user.
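The relative-position step in claim 1 reduces to a per-source distance and bearing computation between each virtual sound source position and the virtual listening position. A minimal 2-D sketch (the function names and the restriction to 2-D are illustrative assumptions, not specified by the patent):

```python
import math

def relative_position(source_pos, listener_pos):
    """Distance and azimuth (radians) of one virtual sound source
    relative to the virtual listening position, in 2-D."""
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def relative_positions(source_positions, listener_pos):
    # one (distance, azimuth) pair per virtual instrument position
    return [relative_position(p, listener_pos) for p in source_positions]
```

The resulting (distance, azimuth) pairs would then drive the attenuation and spatialization stages of the sound field space model.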
2. The method according to claim 1, wherein the sound field space model comprises a direct sound processing model, an early reflected sound model and a late reverberation sound model, and wherein processing the N first audio signals with the sound field space model to obtain the second audio signal comprises:
performing attenuation processing on the N first audio signals by using the direct sound processing model to obtain a first output result;
inputting the first output result into the early reflected sound model for reflection processing to obtain a second output result;
inputting the first output result into the late reverberation model for reverberation processing to obtain a third output result;
and obtaining the second audio signal according to the second output result and the third output result.
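The three-branch structure of claim 2 — one direct-sound result feeding both the early-reflection and late-reverberation models, whose outputs are summed into the second audio signal — can be sketched with toy DSP blocks. All parameters here (gain, tap list, tail length, decay) are illustrative; the patent does not specify them:

```python
def direct(sig, gain):
    # direct sound: simple attenuation -> "first output result"
    return [x * gain for x in sig]

def early_reflections(sig, taps):
    # taps: list of (delay_in_samples, gain) -> "second output result"
    out = [0.0] * (len(sig) + max(d for d, _ in taps))
    for delay, g in taps:
        for i, x in enumerate(sig):
            out[i + delay] += x * g
    return out

def late_reverb(sig, length, decay):
    # crude exponential tail -> "third output result"
    out = [0.0] * (len(sig) + length)
    for i, x in enumerate(sig):
        for j in range(length):
            out[i + j] += x * (decay ** j) / length
    return out

def render(sig, gain, taps, length, decay):
    first = direct(sig, gain)
    second = early_reflections(first, taps)
    third = late_reverb(first, length, decay)
    n = max(len(second), len(third))
    second = second + [0.0] * (n - len(second))
    third = third + [0.0] * (n - len(third))
    return [a + b for a, b in zip(second, third)]  # second audio signal
```

Note that, as in the claim, both reflection branches consume the same first output result, and only their outputs are mixed.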
3. The method as recited in claim 2, wherein the relative position information comprises distance information, and wherein performing attenuation processing on the N first audio signals using the direct sound processing model comprises:
and processing the N first audio signals according to the distance information using N distance attenuation curves, wherein the N distance attenuation curves correspond one-to-one to the N first audio signals, and any two of the N distance attenuation curves may be the same or different.
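Per-instrument distance attenuation curves, as in claim 3, might look like the following sketch, where each signal carries its own gain-versus-distance function. The two curve shapes shown (inverse-distance and linear roll-off) are common conventions, not taken from the patent:

```python
def inverse_distance_gain(ref_distance=1.0):
    """Inverse-distance attenuation: unity inside the reference
    distance, 1/r roll-off beyond it."""
    def curve(d):
        return ref_distance / max(d, ref_distance)
    return curve

def linear_gain(max_distance):
    """Linear roll-off to silence at max_distance."""
    def curve(d):
        return max(0.0, 1.0 - d / max_distance)
    return curve

def attenuate(signals, distances, curves):
    # one curve per signal; curves may coincide or differ per instrument
    return [[x * curve(d) for x in sig]
            for sig, d, curve in zip(signals, distances, curves)]
```

A cello and a flute could thus be given different roll-offs by passing different curve functions in the same call.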
4. The method according to claim 3, wherein said processing the N first audio signals according to the distance information using N distance attenuation curves comprises performing cone attenuation processing on at least one of the N first audio signals, specifically comprising: for any one of the at least one audio signal,
obtaining a propagation distance based on the internal space information of the virtual sound field space;
taking the position of a virtual sound source corresponding to the audio signal as the position of a sphere center, and taking the propagation distance as a radius to obtain a spherical propagation area of the audio signal;
dividing the spherical propagation region into an inner angle region, an outer angle region, and a transition region between the inner angle region and the outer angle region;
and performing corresponding attenuation processing on the audio signal according to the actual region to which the second position belongs to obtain the first output result, wherein the actual region is any one of the inner angle region, the outer angle region and the transition region.
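The inner/transition/outer classification of claim 4 resembles the cone attenuation found in common game-audio engines. A sketch of the angular part only (the sphere radius derived from the room's internal dimensions is omitted; the angle thresholds and outer gain are hypothetical defaults, not from the patent):

```python
import math

def cone_gain(source_pos, source_dir, listener_pos,
              inner_deg=60.0, outer_deg=120.0, outer_gain=0.25):
    """Gain applied to a directional source depending on which angular
    region (inner / transition / outer) the listener falls into."""
    vx = listener_pos[0] - source_pos[0]
    vy = listener_pos[1] - source_pos[1]
    r = math.hypot(vx, vy)
    if r == 0.0:
        return 1.0  # listener at the source: treat as on-axis
    cosv = (vx * source_dir[0] + vy * source_dir[1]) / (
        r * math.hypot(*source_dir))
    ang = math.degrees(math.acos(max(-1.0, min(1.0, cosv))))
    half_in, half_out = inner_deg / 2, outer_deg / 2
    if ang <= half_in:        # inner angle region: full level
        return 1.0
    if ang >= half_out:       # outer angle region: fixed attenuation
        return outer_gain
    t = (ang - half_in) / (half_out - half_in)   # transition region
    return 1.0 + t * (outer_gain - 1.0)          # linear cross-fade
```

The transition region thus interpolates linearly between the inner-region gain of 1.0 and the outer-region gain.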
5. The method of claim 2, further comprising:
calculating M virtual sound sources according to the N virtual sound source positions and the geometric form of the virtual sound field space;
calculating S sound reflection paths according to the second position and the geometric form, wherein M and S are integers which are larger than or equal to 1 respectively;
wherein the inputting of the first output result into the early reflected sound model for reflection processing to obtain a second output result comprises:
and performing reflection processing on the first output result according to the M virtual sound sources and the S sound reflection paths to obtain a second output result.
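Deriving M virtual (mirror) sound sources from the room geometry, as in claim 5, is the classic image-source method for early reflections. A first-order sketch for a 2-D rectangular room (the patent does not restrict the geometry; the rectangular room here is an assumption):

```python
import math

def first_order_image_sources(source, room_w, room_h):
    """First-order mirror images of a 2-D source in a rectangular room
    with walls at x=0, x=room_w, y=0 and y=room_h."""
    x, y = source
    return [(-x, y), (2 * room_w - x, y), (x, -y), (x, 2 * room_h - y)]

def reflection_path_lengths(source, listener, room_w, room_h):
    # one specular reflection path per wall: |image - listener|
    return [math.hypot(ix - listener[0], iy - listener[1])
            for ix, iy in first_order_image_sources(source, room_w, room_h)]
```

Each image source defines one sound reflection path whose length sets the delay, and whose wall sets the absorption, of that early reflection.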
6. The method of claim 5, wherein prior to said calculating S sound reflection paths, the method further comprises:
taking the virtual character as a ray source, and emitting virtual rays from the second position;
and detecting auditory interaction information through the virtual ray, wherein the auditory interaction information comprises the distance between the virtual character and the wall in the virtual sound field space and the material information of the wall in the virtual sound field space.
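Claim 6's ray-based probing of wall distance can be sketched for an axis-aligned rectangular room; the returned wall name could then index a table of wall-material data. The room shape, wall naming, and the assumption of a unit-length direction (so the parameter t is a distance) are all illustrative:

```python
def ray_wall_hit(pos, direction, room_w, room_h):
    """March a virtual ray from the listener position until it hits a
    wall of the axis-aligned room [0, room_w] x [0, room_h]; returns
    (distance along the ray, wall name)."""
    dx, dy = direction
    candidates = []
    if dx:
        candidates += [((0.0 - pos[0]) / dx, 'west'),
                       ((room_w - pos[0]) / dx, 'east')]
    if dy:
        candidates += [((0.0 - pos[1]) / dy, 'south'),
                       ((room_h - pos[1]) / dy, 'north')]
    hits = [(t, w) for t, w in candidates if t > 0]
    return min(hits) if hits else (float('inf'), None)
```

Casting several such rays from the second position yields both distances to the walls and, via the wall names, their material information.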
7. The method of claim 2, wherein the late reverberation sound model comprises K impulse response signals obtained by recording K physical environments, and wherein inputting the first output result into the late reverberation sound model for reverberation processing to obtain a third output result comprises:
in response to the user selecting a first virtual sound field space from K virtual sound field spaces, invoking a first impulse response signal, wherein the first virtual sound field space is constructed according to a first physical environment among the K physical environments, and K is an integer greater than or equal to 1;
and performing convolution calculation on the first output result and the first impulse response signal to obtain a third output result.
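The convolution of claim 7 — the first output result convolved with the selected impulse response — in its direct time-domain form (a production system would typically use FFT-based partitioned convolution for long impulse responses):

```python
def convolve(signal, impulse_response):
    """Direct time-domain convolution: output length is
    len(signal) + len(impulse_response) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out
```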
8. The method of claim 1, wherein the method further comprises:
in response to a first instruction from the user to move the virtual character, causing the virtual character to move to a third position;
updating the virtual listening position to the third position;
and re-executing the operations of determining the relative position information, obtaining the second audio signal and playing the second audio signal to the user.
9. The method of claim 1, wherein the method further comprises:
in response to a second instruction from the user to move at least one virtual instrument, causing the at least one virtual instrument to move to a fourth position;
updating the corresponding position of the at least one virtual musical instrument in the N virtual sound source positions to the fourth position;
and re-executing the operations of determining the relative position information, obtaining the second audio signal and playing the second audio signal to the user.
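Claims 8 and 9 describe the same re-render loop triggered from either side: move the listener or move an instrument, update the stored position, then re-derive the relative positions and re-render the second audio signal. A minimal state-holder sketch (the class name and render callback are illustrative):

```python
class SoundFieldSession:
    def __init__(self, source_positions, listener_pos, render_fn):
        self.sources = list(source_positions)  # N virtual sound sources
        self.listener = listener_pos           # virtual listening position
        self.render_fn = render_fn             # maps relative positions -> audio

    def move_listener(self, new_pos):          # claim 8
        self.listener = new_pos
        return self._rerender()

    def move_source(self, index, new_pos):     # claim 9
        self.sources[index] = new_pos
        return self._rerender()

    def _rerender(self):
        # re-determine relative position information, then re-render
        rel = [(sx - self.listener[0], sy - self.listener[1])
               for sx, sy in self.sources]
        return self.render_fn(rel)
```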
10. An audibility-based interactive immersive sound field roaming system, comprising:
a position determination unit for determining N first positions of N kinds of virtual musical instruments in a virtual sound field space, and a second position of a virtual character in the virtual sound field space, wherein the virtual character is used for being operated by a user to stop or move in the virtual sound field space;
a relative position unit, configured to determine relative position information between the N first positions and the second position, where the N first positions are N virtual sound source positions, the second position is a virtual listening position, and N is an integer greater than or equal to 1;
a signal processing unit, configured to process N types of first audio signals by using a sound field space model according to the relative position information, to obtain a second audio signal, where the sound field space model is used to simulate propagation of the N types of first audio signals in a physical space, and the N types of first audio signals are in one-to-one correspondence with the N types of virtual musical instruments;
and the audio playing unit is used for responding to the playing operation of the user and playing the second audio signal to the user.
11. A method of modeling interactive immersive sound field roaming, comprising:
obtaining a direct sound processing model, wherein the direct sound processing model is used for carrying out attenuation processing on N types of first audio signals to obtain a first output result, the N types of first audio signals are respectively transmitted to virtual listening positions from N virtual sound source positions, and N is an integer greater than or equal to 1;
obtaining an early reflected sound model, wherein the early reflected sound model is used for performing reflection processing on the first output result to obtain a second output result;
obtaining a late reverberation sound model for performing reverberation processing on the first output result to obtain a third output result;
and setting a main output bus, wherein the main output bus is used for obtaining a second audio signal according to the second output result and the third output result, and the second audio signal is obtained by simulating the propagation of the N types of first audio signals in a physical space.
CN202210978930.9A 2022-08-16 2022-08-16 A modeling method for interactive immersive sound field roaming Pending CN115334366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210978930.9A CN115334366A (en) 2022-08-16 2022-08-16 A modeling method for interactive immersive sound field roaming


Publications (1)

Publication Number Publication Date
CN115334366A true CN115334366A (en) 2022-11-11

Family

ID=83923906


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019165845A (en) * 2018-03-22 2019-10-03 株式会社コーエーテクモゲームス Program, image processing method, and information processing device
US10524080B1 (en) * 2018-08-23 2019-12-31 Apple Inc. System to move a virtual sound away from a listener using a crosstalk canceler
US20200037091A1 (en) * 2017-03-27 2020-01-30 Gaudio Lab, Inc. Audio signal processing method and device
CN112567767A (en) * 2018-06-18 2021-03-26 奇跃公司 Spatial audio for interactive audio environments
CN114035764A (en) * 2021-11-05 2022-02-11 郑州捷安高科股份有限公司 Three-dimensional sound effect simulation method, device, equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025036422A1 (en) * 2023-08-15 2025-02-20 华为技术有限公司 Audio processing method and electronic device
CN118890592A (en) * 2024-07-29 2024-11-01 江苏奥格视特信息科技有限公司 A method and system for realizing metaverse immersive audio
CN118890592B (en) * 2024-07-29 2025-02-21 江苏奥格视特信息科技有限公司 Implementation method and system of meta-universe immersive sound


Legal Events

Date Code Title Description
PB01 Publication
CB03 Change of inventor or designer information
Inventor after: Liu Jingyu, Jiang Jian, Jiang Yujian, Song Yang, Ren Penghao
Inventor before: Liu Jingyu, Jiang Jian, Ren Penghao
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20221111)