RU2815296C1 - Device and method for rendering audio scene using pipeline cascades

Info

Publication number: RU2815296C1
Application number: RU2022126526A
Authority: RU
Inventors: Франк ВЕФЕРС; Зимон ШВЕР
Original assignee: Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф.
Priority date: 2020-03-13
Filing date: 2021-03-12
Publication date: 2024-03-13

Abstract

FIELD: physics.

SUBSTANCE: invention relates to computer engineering. The technical result is achieved in that the central controller controls the control layers of the pipeline stages so that the first control layer or the second control layer prepares another configuration, for example a second configuration of the first or second audio data processor, during or after operation of that audio data processor in the first configuration. The new configuration for the first or second audio data processor is thus prepared while the audio data processor belonging to that pipeline stage is still operating in accordance with the other configuration, or is configured in another configuration in case the processing task with the earlier configuration has already been completed. The central controller controls the first and second control layers using a switch control to reconfigure the first audio data processor or the second audio data processor to the other configuration at a certain time instant.

EFFECT: reduction of artifacts in sound scenes with dynamically changing elements in virtual reality or augmented reality applications.

21 cl, 9 dwg

Description

The present invention relates to audio signal processing and, in particular, to audio signal processing of sound scenes occurring, for example, in virtual reality or augmented reality applications.

Geometric acoustics is applied in auralization, i.e. real-time and offline audio rendering of auditory scenes and environments. This includes virtual reality (VR) and augmented reality (AR) systems such as the MPEG-I 6-DoF audio renderer. To render complex audio scenes with six degrees of freedom (DoF), the field of geometric acoustics is applied, in which the propagation of sound is modeled with methods known from optics, such as ray tracing. In particular, reflections at walls are modeled on the basis of models derived from optics, in which the angle of reflection of a ray reflected at a wall is equal to its angle of incidence.
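The specular reflection rule described above can be sketched in a few lines. This is only an illustration of the optical law of reflection; the function name `reflect` and the example vectors are hypothetical and do not come from the patent.

```python
def reflect(direction, normal):
    """Mirror a ray direction at a plane whose unit normal is `normal`.

    The reflected ray leaves the wall at the same angle to the normal
    at which the incoming ray arrived (angle of incidence = angle of reflection).
    """
    d_dot_n = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2.0 * d_dot_n * n for d, n in zip(direction, normal))


# A ray travelling diagonally downwards hits a floor whose normal points up (+y).
incoming = (1.0, -1.0, 0.0)
floor_normal = (0.0, 1.0, 0.0)
outgoing = reflect(incoming, floor_normal)  # the vertical component flips sign
```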

Real-time auralization systems, such as the audio renderer in a virtual reality (VR) or augmented reality (AR) system, usually render early reflections based on geometry data of the reflective environment. A geometric acoustics method such as the image source method, in combination with ray tracing, is then used to find valid propagation paths for the reflected sound. These methods are valid if the reflecting planar surfaces are large compared to the wavelength of the incident sound. The distance from the reflection point on the surface to the boundaries of the reflecting surface must also be large compared to the wavelength of the incident sound.
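As a rough illustration of the image source method mentioned above, the sketch below mirrors a source at a reflecting plane and measures the length of the resulting first-order reflection path. The helper `mirror_point`, the coordinates, and the choice of the plane y = 0 are assumptions made for the example; visibility checks against the surface boundaries are omitted.

```python
import math


def mirror_point(point, plane_point, plane_normal):
    """Mirror `point` at the plane through `plane_point` with unit normal `plane_normal`."""
    signed_dist = sum((p - q) * n for p, q, n in zip(point, plane_point, plane_normal))
    return tuple(p - 2.0 * signed_dist * n for p, n in zip(point, plane_normal))


source = (1.0, 2.0, 0.0)
listener = (4.0, 2.0, 0.0)

# Reflecting plane: the floor y = 0 with its normal pointing up.
image = mirror_point(source, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))

# The straight line from the image source to the listener has the same length
# as the reflected path source -> floor -> listener.
path_length = math.dist(image, listener)
```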

Audio rendering in virtual reality (VR) and augmented reality (AR) is performed for the listener (user). The inputs to this process are (usually anechoic) audio signals of sound sources. A variety of signal processing techniques is then applied to these input signals to simulate and incorporate the relevant acoustic effects, such as sound transmission through walls, windows and doors, diffraction and occlusion by solid or permeable structures, sound propagation over large distances, reflections in half-open and enclosed environments, Doppler shifts of moving sources and listeners, etc. The output of audio rendering is audio signals that create a realistic, three-dimensional acoustic impression of the presented VR/AR scene when delivered to the listener via headphones or loudspeakers.

Rendering is listener-centric, and the system must react to user motion and interaction instantly, without significant delays. The audio signals must therefore be processed in real time. User input manifests itself in changes to the signal processing (e.g., different filters). These changes must be incorporated into the rendering without audible artifacts.

Most audio renderers use a predefined fixed signal processing structure (a block diagram applied to multiple channels, see e.g. [1]) with a fixed computation time budget for each individual audio source (e.g., 16x object sources, 2x third-order Ambisonics). These solutions can render dynamic scenes by updating position-dependent filters and reverberation parameters, but they do not allow sources to be added or removed dynamically at runtime.

Moreover, a fixed signal processing architecture can be rather inefficient when rendering complex scenes, since a large number of sources must all be processed in the same way. Newer rendering concepts facilitate clustering and level-of-detail (LOD) concepts in which, depending on perception, sources are combined and rendered with different signal processing. Source clustering (see [2]) enables renderers to handle complex scenes with hundreds of objects. In such a configuration the cluster budget is still fixed, which can lead to audible artifacts of extensive clustering in complex scenes.
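To make the idea of clustering under a fixed budget concrete, here is a minimal sketch that greedily merges the two closest clusters until the budget is met. It is not the method of reference [2]; the function name `cluster_sources`, the purely geometric distance criterion, and the example positions are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Average position of all sources in a cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def cluster_sources(positions, budget):
    """Greedily merge the two closest clusters until at most `budget` remain."""
    clusters = [[p] for p in positions]
    while len(clusters) > budget:
        # combinations() yields i < j, so popping j never shifts index i.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: math.dist(centroid(clusters[ij[0]]), centroid(clusters[ij[1]])),
        )
        clusters[i].extend(clusters.pop(j))
    return [centroid(c) for c in clusters]


# Two nearby sources and one distant source, rendered with a budget of two clusters.
rendered = cluster_sources([(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (10.0, 0.0, 0.0)], budget=2)
```

With a fixed budget, the two nearby sources are always merged, even if the listener stands right between them. That is the kind of artifact a fixed cluster budget can produce.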

It is an object of the present invention to provide an improved concept for rendering an audio scene.

This object is achieved by an apparatus for rendering a sound scene according to claim 1, a method of rendering a sound scene according to claim 21, or a computer program according to claim 22.

The present invention is based on the finding that a pipeline-like rendering architecture is useful for rendering a complex sound scene with many sources in an environment where frequent changes of the sound scene can occur. The pipeline-like rendering architecture comprises a first pipeline stage comprising a first control layer and a reconfigurable first audio data processor. Furthermore, a second pipeline stage is provided, which is located, with respect to the pipeline flow, after the first pipeline stage. This second pipeline stage again comprises a second control layer and a reconfigurable second audio data processor. Both the first and the second pipeline stage are configured to operate in accordance with a certain configuration of the reconfigurable first audio data processor at a certain time during processing. To control the pipeline architecture, a central controller is provided for controlling the first control layer and the second control layer. The control takes place in response to the sound scene, i.e. in response to an original sound scene or a change of the sound scene.
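The arrangement described above (stages that each pair a control layer with an audio data processor, driven by a central controller) can be sketched as follows. This is a simplified model under stated assumptions: the class names, the use of a single gain as the "configuration", and the scene dictionary keys are all hypothetical, and letting the controller also push audio through the stages is a shortcut for the example.

```python
class PipelineStage:
    """One stage: a control layer (configure) plus an audio data processor (process)."""

    def __init__(self, name, derive_gain):
        self.name = name
        self.derive_gain = derive_gain  # control layer: maps the scene to a configuration
        self.gain = 1.0                 # current configuration of the processor

    def configure(self, scene):
        self.gain = self.derive_gain(scene)

    def process(self, frame):
        return [self.gain * x for x in frame]


class CentralController:
    """Forwards the scene to every control layer and chains the processors."""

    def __init__(self, stages):
        self.stages = stages

    def update(self, scene):
        for stage in self.stages:
            stage.configure(scene)

    def render(self, frame):
        for stage in self.stages:  # pipeline flow: first stage, then second stage
            frame = stage.process(frame)
        return frame


controller = CentralController([
    PipelineStage("distance", lambda scene: 1.0 / max(scene["distance_m"], 1.0)),
    PipelineStage("master", lambda scene: scene["master_gain"]),
])
controller.update({"distance_m": 2.0, "master_gain": 0.5})
output = controller.render([1.0, -1.0, 0.5])
```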

To achieve synchronized operation of the apparatus across all pipeline stages when the first or the second reconfigurable audio data processor needs to be reconfigured, the central controller controls the control layers of the pipeline stages such that the first control layer or the second control layer prepares another configuration, such as a second configuration of the first or second reconfigurable audio data processor, during or after operation of the reconfigurable audio data processor in the first configuration. Hence, a new configuration for the reconfigurable first or second audio data processor is prepared while the reconfigurable audio data processor belonging to this pipeline stage is still operating in accordance with a different configuration, or is configured in a different configuration in case the processing task with the earlier configuration is already done. To make sure that both pipeline stages operate synchronously so as to obtain a so-called "atomic operation" or "atomic update", the central controller controls the first and second control layers using a switch control to reconfigure the reconfigurable first audio data processor or the reconfigurable second audio data processor to the different configuration at a certain time instant. Even when only a single pipeline stage is reconfigured, embodiments of the present invention nevertheless guarantee that, owing to the switch control at the certain time instant, the correct audio sample data are processed in the audio workflow, by providing audio stream input or output buffers included in the corresponding render lists.
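The prepare-then-switch behaviour described above resembles a double-buffered configuration: the new configuration is staged while the old one keeps running, and a single switch command makes it active in every stage at the same frame boundary. The sketch below illustrates this under simplifying assumptions; `ReconfigurableProcessor`, its `prepare` and `switch` methods, and the gain-only configuration are hypothetical names for the example.

```python
class ReconfigurableProcessor:
    """Audio data processor with an active and a pending configuration."""

    def __init__(self, config):
        self.active = config
        self.pending = None

    def prepare(self, config):
        # Staging a configuration does not touch the running processing.
        self.pending = config

    def switch(self):
        # Called on the switch control: the pending configuration becomes active.
        if self.pending is not None:
            self.active, self.pending = self.pending, None

    def process(self, frame):
        return [self.active["gain"] * x for x in frame]


stage_a = ReconfigurableProcessor({"gain": 1.0})
stage_b = ReconfigurableProcessor({"gain": 1.0})
frame = [1.0, 1.0]

before = stage_b.process(stage_a.process(frame))

stage_a.prepare({"gain": 0.5})
stage_b.prepare({"gain": 4.0})
still_old = stage_b.process(stage_a.process(frame))  # prepared, but not yet switched

for stage in (stage_a, stage_b):  # switch control applied at one frame boundary
    stage.switch()
after = stage_b.process(stage_a.process(frame))
```

Because both stages switch between two frames, no frame is ever processed with the new configuration in one stage and the old configuration in the other.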

Preferably, the apparatus for rendering the sound scene has more pipeline stages than the first and second pipeline stages. However, even in a system with a first and a second pipeline stage and no additional pipeline stage, the synchronized switching of the pipeline stages in response to the switch control is necessary to obtain an improved high-quality audio rendering operation that is, at the same time, highly flexible.

Particularly in complex virtual reality scenes where the user can move in three directions and can additionally turn the head in three further directions, i.e. in a six-degrees-of-freedom (6 DoF) scenario, frequent and sudden changes of filters in the rendering pipeline occur. For example, when the listener's head moves or the listener walks around, a switch from one head-related transfer function to another head-related transfer function is required.

Other situations in which flexible, high-quality rendering is problematic arise because the number of sources to be rendered changes all the time as the listener moves around in a virtual or augmented reality scene. This can happen, for example, because certain image sources become visible at a certain user position, or because additional diffraction effects have to be taken into account. Furthermore, in some situations many different closely spaced sources can be clustered, whereas, when the user comes close to these sources, clustering is no longer feasible, since the user is so close that each source has to be rendered at its individual position. Such audio scenes are therefore problematic in that filters, the number of sources to be rendered or, generally, parameters have to be changed all the time. On the other hand, it is useful to distribute the different rendering operations across different pipeline stages, which allows efficient and high-speed rendering and makes real-time rendering achievable in complex audio environments.

A further example of a significant parameter change is that, as soon as the user comes closer to a source or an image source, the frequency-dependent distance attenuation and the propagation delay change with the distance between the user and the sound source. Similarly, the frequency-dependent characteristics of a reflective surface can change depending on the configuration between the user and the reflecting object. Furthermore, depending on whether the user is close to a diffracting object, further away from it, or at a different angle, the frequency-dependent diffraction characteristics also change. Thus, if all these tasks are distributed across different pipeline stages, continuous changes of these pipeline stages must be possible and must be performed synchronously. All this is achieved by means of the central controller, which controls the control layers of the pipeline stages to prepare a new configuration during or after operation of the corresponding configurable audio data processor in the earlier configuration. In response to the switch control for all stages of the pipeline that are affected by a control update, the reconfiguration takes place at a certain time instant that is identical, or at least very similar, across the pipeline stages of the apparatus for rendering the sound scene.
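To show how such parameters depend on the listener position, the following sketch computes a propagation delay and a distance gain from the source-listener distance. It deliberately simplifies: the text speaks of frequency-dependent attenuation, whereas this example uses a frequency-independent 1/r gain, a speed of sound of 343 m/s, and a 48 kHz sample rate, all of which are assumptions.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees Celsius


def propagation_parameters(distance_m, sample_rate_hz=48000):
    """Return (delay in samples, linear gain) for a source at `distance_m`."""
    delay_samples = distance_m / SPEED_OF_SOUND_M_S * sample_rate_hz
    gain = 1.0 / max(distance_m, 1.0)  # 1/r law, clamped below 1 m to avoid blow-up
    return delay_samples, gain


far = propagation_parameters(10.0)   # listener far from the source
near = propagation_parameters(2.0)   # listener has walked closer
```

Both values change whenever the listener moves, so a stage that applies them has to be reconfigured continuously.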

The present invention is advantageous in that it enables high-quality real-time auralization of auditory scenes with dynamically changing elements, for example moving sources and listeners. The present invention thus contributes to perceptually convincing soundscapes, which are a significant factor for the sense of immersion in a virtual scene.

Embodiments of the present invention use separate and concurrent workflows, threads or processes, which are well suited to the situation of rendering dynamic auditory scenes.

1. Interaction workflow: handling changes in the virtual scene (e.g., user motion, user interaction, scene animations, etc.) that occur at arbitrary points in time.

2. Control workflow: a snapshot of the current state of the virtual scene leads to updates of the signal processing and its parameters.

3. Processing workflow: execution of the real-time signal processing, i.e. taking a frame of input samples and computing the corresponding frame of output samples.

The execution time of the control workflow varies at runtime, depending on the computations required by the triggering change, similar to the frame loop in visual computing. Preferred embodiments of the invention are advantageous in that such variations in the execution of the control workflow do not adversely affect the processing workflow, which runs concurrently in the background. Since real-time audio is processed block by block, the admissible computation time for the processing workflow is usually limited to a few milliseconds.
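The "few milliseconds" figure follows directly from the block size and the sample rate. The numbers below (256 samples per block at 48 kHz) are example values chosen here, not values stated in the patent.

```python
def block_budget_ms(block_size, sample_rate_hz):
    """Time available to compute one output block before the next one is due."""
    return 1000.0 * block_size / sample_rate_hz


budget = block_budget_ms(256, 48000)      # about 5.33 ms per block
small_budget = block_budget_ms(64, 48000)  # smaller blocks leave even less time
```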

The processing workflow, which runs concurrently in the background, is handled by the first and second reconfigurable audio data processors. The control workflow is initiated by the central controller and then implemented, at the pipeline stage level, by the control layers of the pipeline stages in parallel to the background operation of the processing workflow. The interaction workflow is implemented, at the level of the pipeline rendering apparatus, by an interface of the central controller to external devices such as a head tracker or a similar device, or is controlled by an audio scene having a moving source or geometry, which represents a change of the sound scene, as well as a change of the user's orientation or location, i.e. generally of the user's position.

An advantage of the present invention is that multiple objects in the scene can be changed coherently and sampled synchronously owing to the centrally controlled switch control procedure. Furthermore, this procedure allows so-called atomic updates of multiple elements, which must be supported by the control workflow and the processing workflow so that the audio processing is not interrupted by changes at the highest level, i.e. in the interaction workflow, or at the intermediate level, i.e. in the control workflow.

Preferred embodiments of the present invention relate to an apparatus for rendering a sound scene that implements a modular audio rendering pipeline, in which the steps necessary for the auralization of virtual auditory scenes are divided into several stages, each of which is independently responsible for certain perceptual effects. The individual partitioning into at least two, or preferably even more, individual pipeline stages depends on the application and is preferably defined by the author of the rendering system, as illustrated later.

The present invention provides a generic structure for the rendering pipeline that facilitates parallel processing and dynamic reconfiguration of the signal processing parameters depending on the current state of the virtual scene. In this process, embodiments of the present invention ensure

a) that each stage can change its DSP processing dynamically (e.g., number of channels, updated filter coefficients) without producing audible artifacts, and that any update of the rendering pipeline based on recent changes in the scene is handled synchronously and, if required, atomically,

b) that changes in the scene (e.g., listener movement) can be received at arbitrary points in time and do not influence the real-time performance of the system and, in particular, the DSP processing, and

c) that individual stages can benefit from the functionality of other stages in the pipeline (e.g., unified directivity rendering for primary and image sources, or opaque clustering for complexity reduction).
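One common way to change DSP parameters without audible artifacts, as required in item a), is to crossfade between the output computed with the old parameters and the output computed with the new ones over one frame. The sketch below uses a linear ramp and a plain gain change; the function name `crossfade` and the ramp shape are assumptions for illustration and not a mechanism specified in this passage.

```python
def crossfade(old_frame, new_frame):
    """Blend linearly from `old_frame` to `new_frame` across one frame."""
    n = len(old_frame)
    out = []
    for i, (old, new) in enumerate(zip(old_frame, new_frame)):
        w = (i + 1) / n  # fade weight rises to 1.0 at the last sample of the frame
        out.append((1.0 - w) * old + w * new)
    return out


signal = [1.0, 1.0, 1.0, 1.0]
old_out = [1.0 * x for x in signal]  # output with the old gain of 1.0
new_out = [0.0 * x for x in signal]  # output with the new gain of 0.0 (source removed)
faded = crossfade(old_out, new_out)  # ramps down smoothly instead of jumping to zero
```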

Preferred embodiments of the present invention are discussed below with reference to the accompanying drawings, in which:

Fig. 1 illustrates the input/output of a render stage;

Fig. 2 illustrates the state transitions of render elements;

Fig. 3 illustrates an overview of the rendering pipeline;

Fig. 4 illustrates an example structure of a virtual reality auralization pipeline;

Fig. 5 illustrates a preferred implementation of the apparatus for rendering a sound scene;

Fig. 6 illustrates an example implementation for changing the metadata of existing render elements;

Fig. 7 illustrates another example for reducing the number of render elements, for example by clustering;

Fig. 8 illustrates another example implementation for adding new render elements, for example for early reflections; and

Fig. 9 illustrates a flowchart showing the control flow from a high-level event, namely an audio scene (change), down to the low-level fade-in or fade-out of old or new elements, or the cross-fade of filters or parameters.

Fig. 5 illustrates an apparatus for rendering a sound scene or audio scene received by a central controller 100. The apparatus comprises a first pipeline stage 200 with a first control layer 201 and a reconfigurable first audio data processor 202. Furthermore, the apparatus comprises a second pipeline stage 300 located, with respect to the pipeline flow, after the first pipeline stage 200. The second pipeline stage 300 can be placed immediately after the first pipeline stage 200, or one or more pipeline stages can be located between the pipeline stage 300 and the pipeline stage 200. The second pipeline stage 300 comprises a second control layer 301 and a reconfigurable second audio data processor 302. Furthermore, an optional n-th pipeline stage 400 is illustrated, which comprises an n-th control layer 401 and a reconfigurable n-th audio data processor 402. In the exemplary embodiment of Fig. 5, the result of the pipeline stage 400 is the already rendered audio scene, i.e. the result of the complete processing of the audio scene, or of the audio scene changes, that arrived at the central controller 100. The central controller 100 is configured to control the first control layer 201 and the second control layer 301 in response to the sound scene.

"In response to the sound scene" means in response to a complete scene input at an initialization or start time instant, or in response to sound scene changes which, together with the previous scene that existed before the sound scene changes, again represent a complete sound scene to be processed by the central controller 100. In particular, the central controller 100 controls the first and second control layers and, if present, any other control layers such as the n-th control layer 401, so that a new or second configuration of the first, the second and/or the n-th reconfigurable audio data processor is prepared while the corresponding reconfigurable audio data processor operates in the background in accordance with an earlier or first configuration. For this background mode, it is not decisive whether the reconfigurable audio data processor is still operating, i.e. receiving input samples and calculating output samples. Instead, a certain pipeline stage may already have completed its tasks. Thus, the preparation of the new configuration takes place during or after the operation of the corresponding reconfigurable audio data processor in the earlier configuration.

In order to make sure that atomic updates of the individual pipeline stages 200, 300, 400 are possible, the central controller outputs a switch control 110 to reconfigure the individual reconfigurable first or second audio data processor at a certain time instant. Depending on the specific application or sound scene change, only a single pipeline stage can be reconfigured at the certain time instant, or two pipeline stages such as the pipeline stages 200, 300 are both reconfigured at the certain time instant, or all pipeline stages of the complete apparatus for rendering the sound scene, or only a subgroup having more than two but fewer than all pipeline stages, can be provided with the switch control to be reconfigured at the certain time instant. To this end, the central controller 100 has a control line to each control layer of the corresponding pipeline stage, in addition to the processing workflow connection that serially connects the pipeline stages. Furthermore, the control workflow connection, which is discussed below, can also be provided via the first structure for the central switch control 110. In preferred embodiments, however, the control workflow is also carried out via the serial connection among the pipeline stages, so that the central connection between each control layer of an individual pipeline stage and the central controller 100 is reserved only for the switch control 110, in order to obtain atomic updates and, therefore, a correct and high-quality audio rendering even in complex environments.
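The prepare-then-switch behaviour described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiments: the class names PipelineStage and CentralController, the prepare and switch methods, and the dictionary-based configurations are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineStage:
    """Hypothetical stage: a control layer plus a reconfigurable processor."""
    name: str
    active_config: dict                     # configuration the processor runs with
    pending_config: Optional[dict] = None   # prepared by the control layer

    def prepare(self, new_config: dict) -> None:
        # Control layer: build the next configuration while the audio
        # processor keeps using active_config (or has already finished).
        self.pending_config = dict(new_config)

    def switch(self) -> None:
        # Switch control: the prepared configuration becomes the active one.
        if self.pending_config is not None:
            self.active_config = self.pending_config
            self.pending_config = None


class CentralController:
    """Hypothetical controller issuing one switch instant for all stages."""

    def __init__(self, stages):
        self.stages = list(stages)

    def update(self, new_configs: dict) -> None:
        # Phase 1: every affected stage prepares its new configuration.
        for stage in self.stages:
            if stage.name in new_configs:
                stage.prepare(new_configs[stage.name])
        # Phase 2: all prepared configurations are switched in together,
        # assumed here to happen between two audio blocks.
        for stage in self.stages:
            stage.switch()


stages = [PipelineStage("first", {"channels": 2}),
          PipelineStage("second", {"channels": 2})]
controller = CentralController(stages)

stages[0].prepare({"channels": 4})
before_switch = stages[0].active_config["channels"]  # old configuration still active
controller.update({"second": {"channels": 1}})
```

Keeping preparation and switching as separate steps is what allows one, two, or all stages to change at the same time instant; in a multi-threaded renderer the switch step would additionally need synchronization with the audio thread, which this sketch omits.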

The following section describes a general audio rendering pipeline composed of independent render stages, each with separate, synchronized control and processing workflows (Fig. 1). A central controller ensures that all stages in the pipeline can be updated together atomically.

Every render stage has a control part and a processing part with separate inputs and outputs corresponding to the control and processing workflow, respectively. In the pipeline, the outputs of one render stage are the inputs of the subsequent render stage, while a common interface guarantees that render stages can be reorganized and replaced, depending on the application.

This common interface is described as a flat list of render elements that is provided to the render stage in the control workflow. A render element combines processing instructions (i.e. metadata such as position, orientation, equalization, etc.) with an audio stream buffer (single- or multi-channel). The mapping of buffers to render elements is arbitrary, so that multiple render elements can refer to the same buffer.
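As an illustration of this interface, a render element and a flat render list might be modelled as below. The field names (item_id, metadata, buffers) and the metadata keys are assumptions made for this sketch; the description does not prescribe a concrete data layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RenderItem:
    """Hypothetical render element: metadata plus references to buffers."""
    item_id: int
    metadata: Dict[str, object]     # e.g. position, orientation, equalization
    buffers: List[List[float]]      # references to audio stream buffers


# One mono audio stream buffer holding four samples.
shared = [0.0] * 4

# A flat render list in which two render elements refer to the same buffer.
render_list = [
    RenderItem(1, {"position": (1.0, 0.0, 0.0)}, [shared]),
    RenderItem(2, {"position": (-1.0, 0.0, 0.0), "eq_gain_db": -3.0}, [shared]),
]

# Samples written to the buffer are visible through both render elements,
# because the list stores references rather than copies.
shared[0] = 0.5
```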

Every render stage ensures that subsequent stages can read the correct audio samples from the audio stream buffers corresponding to the connected render elements at the rate of the processing workflow. To achieve this, every render stage creates a processing diagram from the information in the render elements that describes the necessary DSP steps and its input and output buffers. Additional data may be required to construct the processing diagram (e.g. geometry in the scene or personalized HRIR sets) and is provided by the controller. After the control update has propagated through the whole pipeline, the processing diagrams are lined up for synchronization and handed over to the processing workflow simultaneously for all render stages. The exchange of processing diagrams is triggered independently of the real-time audio block rate, while the individual stages must guarantee that no audible artifacts occur due to the exchange. If a render stage only acts on metadata, the DSP workflow can be a no-operation.

The controller maintains a list of render elements corresponding to the actual audio sources in the virtual scene. In the control workflow, the controller starts a new control update by passing a new list of render elements to the first render stage, atomically accumulating all metadata changes resulting from user interaction and other changes in the virtual scene. Control updates are triggered at a fixed rate, which may depend on the available computational resources, but only after the previous update has finished. A render stage creates a new list of output render elements from the input list. In this process, it can modify existing metadata (e.g. add an equalization characteristic), as well as add new render elements and deactivate or remove existing ones. Render elements follow a defined life cycle (Fig. 2) that is communicated via a state indicator on each render element (e.g. “activate”, “deactivate”, “active”, “inactive”). This allows subsequent render stages to update their DSP diagrams according to newly created or obsolete render elements. The artifact-free fade-in and fade-out of render elements on state changes are handled by the controller.
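The life cycle and the fades tied to it can be pictured with a small state machine. The four state names are taken from the text; the transition order (“activate” leads to “active”, “deactivate” leads to “inactive”) and the linear one-block ramp are assumptions of this sketch, since Fig. 2 is not reproduced here.

```python
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"
    ACTIVATE = "activate"
    ACTIVE = "active"
    DEACTIVATE = "deactivate"


# Assumed transitions applied once a transitional block has been rendered.
NEXT_AFTER_UPDATE = {
    State.ACTIVATE: State.ACTIVE,
    State.DEACTIVATE: State.INACTIVE,
    State.ACTIVE: State.ACTIVE,
    State.INACTIVE: State.INACTIVE,
}


def fade_gains(state, block_size):
    """Per-sample gains for one block; linear ramp in transitional states."""
    if state is State.ACTIVATE:       # fade-in of a new render element
        return [(n + 1) / block_size for n in range(block_size)]
    if state is State.DEACTIVATE:     # fade-out of an obsolete render element
        return [1.0 - (n + 1) / block_size for n in range(block_size)]
    return [1.0 if state is State.ACTIVE else 0.0] * block_size


state = State.ACTIVATE
gains_in = fade_gains(state, 4)       # ramp from 0.25 up to 1.0
state = NEXT_AFTER_UPDATE[state]      # element is now fully active
gains_out = fade_gains(State.DEACTIVATE, 4)
```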

In a real-time application, the processing workflow is triggered by the callback from the audio hardware. When a new block of samples is requested, the controller fills the buffers of the render elements it maintains with input samples (e.g. from disk or from incoming audio streams). The controller then sequentially triggers the processing part of the render stages, which act on the audio stream buffers according to their current processing diagrams.
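The processing workflow just described, in which buffers are filled and the stages are then triggered in order, might look roughly like this. The Controller class, the audio_callback method and the simple gain stages are hypothetical stand-ins; a real audio callback would be invoked by the audio hardware driver rather than called by hand.

```python
from typing import Callable, Dict, List

Buffers = Dict[str, List[float]]


def gain_stage(gain: float) -> Callable[[Buffers], None]:
    """Hypothetical processing part: scales every buffer in place."""
    def process(buffers: Buffers) -> None:
        for samples in buffers.values():
            for n in range(len(samples)):
                samples[n] *= gain
    return process


class Controller:
    def __init__(self, sources, stages):
        self.sources = sources    # buffer name -> iterator of input samples
        self.stages = stages      # processing parts, in pipeline order
        self.buffers: Buffers = {name: [] for name in sources}

    def audio_callback(self, block_size: int) -> Buffers:
        # Step 1: fill the buffers of the maintained render elements with
        # new input samples (zero-padded when a source runs out).
        for name, source in self.sources.items():
            self.buffers[name] = [next(source, 0.0) for _ in range(block_size)]
        # Step 2: trigger the processing part of each stage sequentially.
        for process in self.stages:
            process(self.buffers)
        return self.buffers


controller = Controller({"voice": iter([1.0, 2.0, 3.0])},
                        [gain_stage(0.5), gain_stage(0.5)])
block = controller.audio_callback(4)
```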

The rendering pipeline can contain one or more spatializers (Fig. 3) that are similar to a render stage, but the output of their processing part is a mixed representation of the whole virtual auditory scene as described by the final list of render elements, which can be played back directly over a specified playback method (e.g. over stereo headphones or multi-channel loudspeaker setups). However, additional render stages can follow a spatializer (e.g. for limiting the dynamic range of the output signal).

Advantages of the proposed solution

Compared with the prior art, the inventive audio rendering pipeline can handle highly dynamic scenes with the flexibility to adapt the processing to different hardware or user requirements. This section lists several advantages over established methods.

- New audio elements can be added to the virtual scene, and unneeded ones removed from it, at runtime.

Similarly, render stages can dynamically adjust the level of detail of their rendering based on the available computational resources and perceptual requirements.

- Depending on the application, render stages can be reordered, or new render stages can be inserted at arbitrary positions in the pipeline (e.g. a clustering or visualization stage), without changing other parts of the software. Individual render stage implementations can be changed without having to change other render stages.

- Multiple spatializers can share a common processing pipeline, enabling, for example, multi-user VR setups or rendering over headphones and loudspeakers in parallel with minimal computational overhead.

- Changes in the virtual scene (e.g. caused by a high-rate head-tracking device) are accumulated at a dynamically adjustable control rate, reducing the computational effort, e.g. for filter switching. At the same time, scene updates that explicitly require atomicity (e.g. parallel movement of audio sources) are guaranteed to be executed simultaneously across all render stages.

- The control rate and the processing rate can be adjusted separately, based on the requirements of the user and of the (audio playback) hardware.

Example

A practical example of a rendering pipeline for creating virtual acoustic environments for VR applications can contain the following render stages in the given order (see also Fig. 4):

1. Transmission: Reduction of a complex scene with multiple adjoining subspaces by downmixing the signals and reverberation of parts far from the listener into a single render element (possibly with spatial extent).

Processing part: Downmix of signals into combined audio stream buffers and processing of the audio samples with established techniques to create late reverberation.

2. Extent: Rendering of the perceptual effect of spatially extended sound sources by creating multiple, spatially separate render elements.

Processing part: Distribution of the input audio signal to several buffers for the new render elements (possibly with additional processing such as decorrelation).

3. Early reflections: Incorporation of perceptually relevant geometric reflections on surfaces by creating representative render elements with corresponding equalization and position metadata.

Processing part: Distribution of the input audio signal to several buffers for the new render elements.

4. Clustering: Combination of multiple render elements with perceptually indistinguishable positions into a single render element to reduce the computational complexity of subsequent stages.

Processing part: Downmix of signals into combined audio stream buffers.

5. Diffraction: Addition of the perceptual effects of occlusion and diffraction of propagation paths by geometry.

6. Propagation: Rendering of perceptual effects on the propagation path (e.g. direction-dependent radiation characteristics, medium absorption, propagation delay, etc.).

Processing part: Filtering, fractional delay lines, etc.

7. Stereo spatializer: Rendering of the remaining render elements into a listener-centric stereo sound output.

Processing part: HRIR filtering, downmixing, etc.
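To make one of the listed stages concrete, the clustering stage (item 4) can be sketched as follows. The criterion used here for “perceptually indistinguishable” positions, namely that positions fall into the same cell of a coarse grid, is purely an assumption for illustration, as are the function name cluster and the cell parameter; a real implementation would use a perceptual model.

```python
from collections import defaultdict


def cluster(items, cell=0.5):
    """Merge render elements whose positions fall into the same grid cell.

    items: list of (position, samples) pairs, position being an (x, y, z)
    tuple. Returns a list of (centroid, downmixed_samples) pairs.
    """
    groups = defaultdict(list)
    for position, samples in items:
        key = tuple(round(c / cell) for c in position)   # assumed criterion
        groups[key].append((position, samples))

    merged = []
    for members in groups.values():
        count = len(members)
        # Control part: one position (the centroid) for the merged element.
        centroid = tuple(sum(p[i] for p, _ in members) / count
                         for i in range(3))
        # Processing part: downmix into one combined audio stream buffer.
        mix = [sum(frame) for frame in zip(*(s for _, s in members))]
        merged.append((centroid, mix))
    return merged


items = [((1.0, 0.0, 0.0), [0.5, 0.25]),
         ((1.1, 0.0, 0.0), [0.25, 0.25]),
         ((4.0, 0.0, 0.0), [1.0, 1.0])]
clustered = cluster(items)   # the first two elements are merged into one
```

Subsequent stages such as propagation or the spatializer then process two render elements instead of three, which is the complexity reduction the clustering stage aims at.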

In the following, Figs. 1 to 4 are described in other words. Fig. 1 illustrates, for example, the first pipeline stage 200, also termed “render stage”, which comprises the control layer 201, indicated as “controller” in Fig. 1, and the reconfigurable first audio data processor 202, indicated as “DSP” (digital signal processor). However, the pipeline stage or render stage 200 of Fig. 1 can also be regarded as the second pipeline stage 300 or the n-th pipeline stage 400 of Fig. 5.

The pipeline stage 200 receives, as an input via an input interface, an input render list 500 and outputs, via an output interface, an output render list 600. If the second pipeline stage 300 of Fig. 5 is connected immediately downstream, the input render list of the second pipeline stage 300 is the output render list 600 of the first pipeline stage 200, since the pipeline stages are serially connected for the pipeline flow.

Each render list 500 contains a selection of render elements, each illustrated by a column in the input render list 500 or the output render list 600. Each render element comprises a render element identifier 501, render element metadata 502, indicated as “x” in Fig. 1, and one or more audio stream buffers, depending on how many audio objects or individual audio streams belong to the render element. The audio stream buffers are indicated by “O” and are preferably implemented as memory references to actual physical buffers in a working memory portion of the apparatus for rendering the sound scene; this memory can, for example, be managed by the central controller or by any other means of memory management. Alternatively, the render list can contain audio stream buffers that represent physical memory portions, but it is preferred to implement the audio stream buffers 503 as said references to some physical memory.

Likewise, the output render list 600 again has one column for each render element, and each render element is identified by a render element identifier 601, corresponding metadata 602, and audio stream buffers 603. The metadata 502 or 602 of a render element may include the source position, the source type, an equalizer associated with a particular source or, more generally, frequency-selective behavior associated with that source. Thus, the pipeline stage 200 receives the input render list 500 as its input and generates the output render list 600 as its output. In the DSP 202, the audio sample values identified by the respective audio stream buffers are processed as required by the current configuration of the reconfigurable audio data processor 202, for example as specified by a processing diagram generated by the control layer 201 for the digital signal processor 202. Because the input render list 500 contains, for example, three render elements and the output render list 600 contains, for example, four render elements, i.e. more render elements than the input, the pipeline stage 200 can perform, for example, upmixing. In another implementation, the first render element, with four audio signals, may be downmixed into a render element with a single channel. The second render element may remain unaffected by the processing, i.e. it may simply be copied from input to output, and the third render element may likewise remain unaffected by the rendering stage. Only the last output render element in the output render list 600 is then generated by the DSP, for example by combining the second and third render elements of the input render list 500 into a single output audio stream for the audio stream buffer of the fourth render element of the output render list.

Fig. 2 illustrates the state diagram that determines whether a render element is "live". Preferably, the current state of the state diagram is also stored in the render element metadata 502 or in the identification field of the render element. From the start node 510, two different kinds of activation are possible. One is a normal activation, which leads to the activation state 511. The other is an immediate activation, which leads directly to the active state 512. The difference between the two procedures is that a fade-in procedure is performed during the transition from the activation state 511 to the active state 512.

If a render element is active, it is processed, and it can be deactivated either immediately or normally. In the latter case, the deactivation state 514 is entered, and a fade-out procedure is performed for the transition from the deactivation state 514 to the inactive state 513. In the case of immediate deactivation, a direct transition from state 512 to state 513 takes place. From the inactive state, the render element can return either through an immediate reactivation or through a reactivation instruction that leads to the activation state 511; if neither a reactivation command nor an immediate reactivation command is received, control may pass to the exit node 515.
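The state diagram of Fig. 2 can be written down as a transition table. The sketch below is hedged: the event names are hypothetical, the start node 510 is represented by `None`, and the target of an immediate reactivation is assumed to be the active state 512, which the text does not state explicitly.

```python
from enum import Enum


class State(Enum):
    ACTIVATING = 511    # fade-in in progress
    ACTIVE = 512
    INACTIVE = 513
    DEACTIVATING = 514  # fade-out in progress
    DISPOSED = 515      # exit node


# (current state, event) -> next state; None stands for the start node 510.
TRANSITIONS = {
    (None, "activate"): State.ACTIVATING,
    (None, "activate_instant"): State.ACTIVE,
    (State.ACTIVATING, "fade_in_done"): State.ACTIVE,
    (State.ACTIVE, "deactivate"): State.DEACTIVATING,
    (State.ACTIVE, "deactivate_instant"): State.INACTIVE,
    (State.DEACTIVATING, "fade_out_done"): State.INACTIVE,
    (State.INACTIVE, "reactivate"): State.ACTIVATING,
    # Assumption: immediate reactivation skips the fade-in.
    (State.INACTIVE, "reactivate_instant"): State.ACTIVE,
    (State.INACTIVE, "dispose"): State.DISPOSED,
}


def step(state, event):
    """Return the next state, or raise ValueError for a forbidden transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event!r}") from None
```

A table keeps the allowed transitions in one place, so a forbidden transition such as disposing an active element fails loudly instead of silently changing state.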

Fig. 3 illustrates an overview of the rendering pipeline, where the audio scene is illustrated in block 50 and where the individual control flows are also illustrated. The central switching control flow is indicated at 110. The control workflow 130 is illustrated as running from the controller 100 to the first stage 200 and from there along the corresponding serial control workflow line 130. Thus, Fig. 3 illustrates an implementation in which the control workflow also enters the initial stage of the pipeline and from there propagates sequentially to the last stage. Likewise, the processing workflow 120 runs from the controller 100 through the reconfigurable audio data processors of the individual pipeline stages to the final stages, where Fig. 3 illustrates two final stages: a loudspeaker output stage 400a with spatial-domain converter 1, or a headphone output stage 400b with spatial-domain converter M.

Fig. 4 illustrates a virtual reality rendering pipeline having the audio scene representation 50, the controller 100 and, as the first pipeline stage, a transmission pipeline stage 200. The second pipeline stage 300 is implemented as an extent rendering stage. The third pipeline stage 400 is implemented as an early reflection pipeline stage. The fourth pipeline stage is implemented as a clustering pipeline stage 551. The fifth pipeline stage is implemented as a diffraction pipeline stage 552. The sixth pipeline stage is implemented as a propagation pipeline stage 553, and the final, seventh pipeline stage 554 is implemented as a stereophonic spatial-domain converter that finally produces the headphone signals for headphones worn by a listener navigating the virtual reality or augmented reality audio scene.

Next, Figs. 6, 7 and 8 are used to illustrate and discuss some examples of configuring and reconfiguring pipeline stages.

Fig. 6 illustrates the procedure for changing the metadata of existing render elements.

Situation

Two object audio sources are represented as two render elements (RI). The directivity stage is responsible for directional filtering of the sound source signal. The propagation stage is responsible for rendering the propagation delay based on the distance to the listener. The stereophonic spatial-domain converter is responsible for the stereophonic rendering and for downmixing the scene to a stereophonic signal.
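The propagation delay mentioned above follows from the source-to-listener distance divided by the speed of sound. A minimal sketch, assuming a speed of sound of 343 m/s and a linearly interpolated read from a delay history (the function names and the history layout are hypothetical):

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def delay_in_samples(distance_m: float, sample_rate: float) -> float:
    """Propagation delay for a given distance, expressed in (fractional) samples."""
    return distance_m / SPEED_OF_SOUND * sample_rate


def read_delayed(history, delay: float) -> float:
    """Read the value `delay` samples in the past by linear interpolation.

    history[-1] is the newest sample; len(history) must exceed delay + 1.
    """
    i = int(delay)
    frac = delay - i
    newer = history[-1 - i]   # integer part of the delay
    older = history[-2 - i]   # one sample further back
    return (1.0 - frac) * newer + frac * older
```

A fractional read position lets the delay follow a continuously changing distance without jumping by whole samples.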

At some control step, the positions of the RIs change compared with the previous control steps, which requires changes in the DSP processing of every individual stage. The acoustic scene must be updated synchronously so that, for example, the perceptual effect of a change in distance is synchronized with the perceptual effect of a change in the angle of incidence relative to the listener.

Implementation

The render list is propagated through the complete pipeline in each control step. While a control step is being executed, the DSP processing parameters remain constant for all stages until the last stage (the spatial-domain converter) has processed the new render list. After that, all stages change their DSP parameters synchronously at the start of the next DSP step.

Each stage is responsible for updating its DSP processing parameters without audible artifacts (for example, an output crossfade for FIR filter updates, or linear interpolation for delay lines).
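One way to picture an output crossfade for an FIR filter update is to run the old and the new filter on the same block and blend their outputs with a linear ramp. The sketch below is an assumption about how such a crossfade could look, not the method of the description; it ignores filter history across block boundaries for brevity.

```python
from typing import List, Sequence


def fir(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form FIR over one block, assuming silence before the block."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def crossfaded_update(x: Sequence[float],
                      h_old: Sequence[float],
                      h_new: Sequence[float]) -> List[float]:
    """Filter the block with both coefficient sets and fade old -> new."""
    y_old, y_new = fir(x, h_old), fir(x, h_new)
    n = len(x)
    out = []
    for i in range(n):
        w = i / (n - 1) if n > 1 else 1.0   # ramp from 0.0 to 1.0
        out.append((1.0 - w) * y_old[i] + w * y_new[i])
    return out
```

The block starts with the output of the old filter and ends with the output of the new one, so the coefficient change does not produce a step in the signal.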

RIs may contain fields for pooling metadata. Thus, for example, the directivity stage need not filter the signal itself but may instead update an EQ field in the RI metadata. A subsequent EQ stage then applies the combined EQ field of all preceding stages to the signal.
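Metadata pooling can be illustrated with a pooled EQ field that each stage adds to during the control step. In this sketch the field name `eq_db`, the band layout and the helper `pool_eq` are hypothetical; the only acoustic fact used is that the gains of cascaded filters add in decibels.

```python
from typing import Dict


def pool_eq(metadata: Dict[str, object],
            gains_db: Dict[int, float]) -> Dict[str, object]:
    """Return new metadata with this stage's per-band gains (dB) added to the pool."""
    pooled = dict(metadata.get("eq_db", {}))
    for band_hz, gain in gains_db.items():
        pooled[band_hz] = pooled.get(band_hz, 0.0) + gain
    return {**metadata, "eq_db": pooled}


# Two stages contribute in the control step; no audio is filtered yet.
md = pool_eq({"position": (1.0, 0.0, 0.0)}, {1000: -3.0})   # e.g. directivity
md = pool_eq(md, {1000: -1.5, 4000: -6.0})                  # e.g. occlusion
```

A later EQ stage would then design a single filter from `md["eq_db"]` instead of running one filter per contributing stage.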

Key benefits

- guaranteed atomicity of scene changes (both across stages and across RIs)

- larger DSP configuration changes do not block audio signal processing and are executed synchronously once all stages/spatial-domain converters are ready

- with clearly defined responsibilities, the other pipeline stages are independent of the algorithm used for a specific task (for example, the clustering method, or even whether clustering is available)

- metadata pooling allows many stages (directivity, occlusion, etc.) to operate only in the control step.
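The atomic, non-blocking switch listed among the benefits can be sketched with a pending configuration per stage that is swapped in only when every stage is ready. The class and function names are hypothetical, and real stages would perform the swap at a DSP block boundary.

```python
class Stage:
    def __init__(self, name, config):
        self.name = name
        self.active = config    # first configuration, currently used by the DSP
        self.pending = None     # second configuration, prepared in the background

    def prepare(self, config):
        """Control layer: build the next configuration while audio keeps running."""
        self.pending = config

    def ready(self):
        return self.pending is not None

    def switch(self):
        self.active, self.pending = self.pending, None


def switch_all(stages):
    """Swap configurations on every stage, but only if all of them are ready."""
    if not all(stage.ready() for stage in stages):
        return False
    for stage in stages:
        stage.switch()
    return True
```

Until `switch_all` succeeds, every stage keeps processing with its old configuration, so a slow stage delays the scene change but never blocks the audio.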

In particular, in the example of Fig. 6 the input render list is identical to the output render list 500. Specifically, the render list has a first render element 511 and a second render element 512, each with a single audio stream buffer.

In the first rendering stage or pipeline stage 200, which is the directivity stage in this example, a first FIR filter 211 is applied to the first render element 511, and another directivity filter or FIR filter 212 is applied to the second render element 512. Furthermore, in the second rendering stage or second pipeline stage 300, which in this embodiment is the propagation stage, a first interpolating delay line 311 is applied to the first render element 511, and a second interpolating delay line 312 is applied to the second render element 512.

Furthermore, in the third pipeline stage 400, connected after the second pipeline stage 300, a first stereophonic FIR filter 411 is used for the first render element 511, and a second FIR filter 412 is used for the second render element 512. In the stereophonic spatial-domain converter, the two filter outputs are downmixed in the adder 413 to obtain the stereophonic output signal. Thus, based on the two object signals indicated by the render elements 511, 512, the stereophonic signal is generated at the output of the adder 413 (not shown in Fig. 6). As discussed, all elements 211, 212, 311, 312, 411, 412 are changed in response to the switching control at the same moment in time, under the control of the control layers 201, 301, 401. Fig. 6 illustrates a situation in which the number of objects indicated in the render list 500 remains unchanged, but the metadata of the objects changes because an object position has changed. Alternatively, the metadata of the objects and, in particular, the object positions remain unchanged, but because the listener moves, the relationship between the listener and the respective (fixed) object changes. This results in changes to the FIR filters 211, 212, to the delay lines 311, 312, and to the FIR filters 411, 412, which are implemented, for example, as head-related transfer function filters that change with every change of the source or object position or of the listener position, as measured, for example, by a head tracking module.

Fig. 7 illustrates a further example, relating to the reduction of render elements (by clustering).

Situation

In a complex auditory scene, the render list may contain a large number of RIs that are perceptually close, i.e. the difference in their positions cannot be distinguished by the listener. To reduce the computational load on subsequent stages, a clustering stage can replace multiple individual RIs with a single representative RI.

At some control step, the scene configuration may change in such a way that clustering is no longer perceptually feasible. In this case, the clustering stage becomes inactive and passes the render list through unchanged.

Implementation

When some incoming RIs are clustered, the original RIs are deactivated in the outgoing render list. The reduction is opaque to subsequent stages, and the clustering stage must guarantee that, as soon as the new outgoing render list becomes active, valid samples are provided in the buffers associated with the representative RI.

When a cluster is no longer feasible, the new outgoing render list of the clustering stage contains the original, non-clustered RIs. Subsequent stages must process them individually, starting with the next DSP parameter change (for example, by adding a new FIR filter, delay line, etc. to their DSP diagram).
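The replacement of several close RIs by one representative RI can be sketched as follows. The description does not specify a clustering algorithm; the greedy merge and the plain distance threshold used here are assumptions that stand in for a real perceptual criterion, and the `(position, samples)` tuples are a hypothetical simplification of a render element.

```python
import math


def cluster(items, max_dist):
    """Greedily merge items whose position lies within max_dist of a cluster seed.

    items: list of (position, samples) tuples. Returns one (centroid, summed
    samples) tuple per cluster, i.e. one representative render element each.
    """
    clusters = []
    for pos, samples in items:
        for c in clusters:
            if math.dist(pos, c["seed"]) <= max_dist:
                c["positions"].append(pos)
                c["samples"] = [a + b for a, b in zip(c["samples"], samples)]
                break
        else:
            clusters.append(
                {"seed": pos, "positions": [pos], "samples": list(samples)}
            )
    out = []
    for c in clusters:
        n = len(c["positions"])
        centroid = tuple(
            sum(p[i] for p in c["positions"]) / n for i in range(len(c["seed"]))
        )
        out.append((centroid, c["samples"]))
    return out
```

Setting `max_dist` to zero leaves distinct positions unmerged, which corresponds to the case where the clustering stage passes the render list through unchanged.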

Key benefits

- opaque reduction of RIs lowers the computational load on subsequent stages without explicit reconfiguration

- because DSP parameter changes are atomic, stages can handle changing numbers of incoming and outgoing RIs without artifacts

Fig. 7 shows that the input render list 500 contains three render elements 521, 522, 523, and the output render list 600 contains two render elements 623, 624.

The first render element 521 comes from the output of the FIR filter 221. The second render element 522 is generated at the output of the FIR filter 222 of the directivity stage, and the third render element 523 is obtained at the output of the FIR filter 223 of the first pipeline stage 200, which is the directivity stage. Note that saying a render element is output by a filter refers to the audio samples in the audio stream buffer of that render element.

In the example of Fig. 7, the render element 523 remains unaffected by the clustering stage 300 and becomes the output render element 623. The render element 521 and the render element 522, however, are downmixed into the downmixed render element 324, which appears in the output render list 600 as the output render element 624. The downmixing in the clustering stage 300 is indicated by position 321 for the first render element 521 and position 322 for the second render element 522.

Again, the third pipeline stage in Fig. 7 is the stereophonic spatial-domain converter 400. The render element 624 is processed by the first stereophonic FIR filter 424, the render element 623 is processed by the stereophonic FIR filter 423, and the outputs of both filters are summed in the adder 413 to produce the stereophonic output signal.

Fig. 8 illustrates another example, showing the addition of new render elements (for early reflections).

Situation

In geometric room acoustics, it can be useful to model reflected sound by means of image sources (i.e. two point sources with the same signal whose positions are mirror-symmetric with respect to the reflecting surface). If the configuration of listener, source and reflecting surface in the scene is favorable for a reflection, the early reflection stage adds to its outgoing render list a new RI that represents the image source.
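The position of an image source follows from mirroring the source position across the reflecting plane: p' = p - 2((p - q)·n)n, with q a point on the plane and n its unit normal. A minimal sketch (the function name and argument layout are hypothetical):

```python
import math


def mirror_source(source, plane_point, plane_normal):
    """Mirror a point source across a plane given by a point and a normal."""
    norm = math.sqrt(sum(c * c for c in plane_normal))
    n = [c / norm for c in plane_normal]            # unit normal
    # Signed distance from the source to the plane.
    d = sum((s - q) * c for s, q, c in zip(source, plane_point, n))
    return tuple(s - 2.0 * d * c for s, c in zip(source, n))
```

For a wall in the plane x = 0, a source at (1, 2, 3) yields an image source at (-1, 2, 3), which carries the same signal as the original source.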

The audibility of image sources usually changes quickly when the listener moves. The early reflection stage can activate and deactivate RIs in every control step, and subsequent stages must adjust their DSP processing accordingly.

Implementation

Stages after the early reflection stage can process a reflection RI in the normal way, because the early reflection stage guarantees that the corresponding audio buffer contains the same samples as the original RI. Thus, perceptual effects such as propagation delay can be handled identically for original RIs and for reflections, without explicit reconfiguration. To improve efficiency when the activity status of an RI changes frequently, stages can keep the required DSP artifacts (for example, FIR filter instances) for reuse.

Stages can treat render elements with certain properties differently. For example, a render element created by the reverb stage (depicted as element 532 in Fig. 8) may be skipped by the early reflection stage and processed only by the spatial-domain converter. In this way, a render element can provide the functionality of a downmix bus. Likewise, a stage may process render elements generated by the early reflection stage with a lower-quality DSP algorithm, because they are usually less acoustically prominent.

Key benefits

- different render elements can be processed differently based on their properties

- a stage that creates new render elements can benefit from the processing of subsequent stages without explicit reconfiguration

The render list 500 contains a first render element 531 and a second render element 532. Each has a single audio stream buffer, which can carry, for example, a mono or a stereo signal.

The first pipeline stage 200 is a reverb stage, which has, for example, generated the render element 531. The render list 500 additionally has the render element 532. In the early reflection stage 300, the render element 531 and, in particular, its audio samples are represented by the input 331 of a copy operation. The input 331 of the copy operation is copied to the output audio stream buffer 331, which corresponds to the audio stream buffer of the render element 631 of the output render list 600. Furthermore, the other copied audio object 333 corresponds to the render element 633. In addition, as indicated, the render element 532 of the input render list 500 is simply copied or passed on to the render element 632 of the output render list.

Then, in the third pipeline stage, which in the above example is the stereophonic spatial-domain converter, the stereophonic FIR filter 431 is applied to the first render element 631, the stereophonic FIR filter 433 is applied to the second render element 633, and the third stereophonic FIR filter 432 is applied to the third render element 632. The contributions of all three filters are then summed accordingly, i.e. channel by channel, by the adder 413, and the adder 413 outputs a left signal on one side and a right signal on the other side for headphones or, more generally, for stereophonic reproduction.

Fig. 9 illustrates an overview of the individual control procedures, from the high-level control via the audio scene interface of the central controller down to the low-level control performed by the control layer of a pipeline stage.

At certain times, which may be irregular and depend on the behavior of the listener as determined, for example, by a head tracking module, the central controller receives an audio scene or an audio scene change, as indicated at step 91. At step 92, the central controller determines a rendering list for each pipeline stage under its control. In particular, the control updates, which are then sent from the central controller to the individual pipeline stages, are initiated at regular intervals, i.e. at a certain update rate or update frequency.
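The decoupling of irregular scene changes from regular control updates can be sketched as follows. The class `CentralController` and its methods are hypothetical names. Keeping only the most recent scene change between two control ticks, and giving every stage the same list of object names, are simplifying assumptions of this sketch and are not stated in the text.

```python
from typing import Dict, List, Optional, Tuple

Scene = Dict[str, Tuple[float, float, float]]  # object name -> position (assumed)


class CentralController:
    """Collects scene changes at any time; emits rendering lists on regular ticks."""

    def __init__(self, stage_names: List[str]) -> None:
        self.stage_names = list(stage_names)
        self._pending: Optional[Scene] = None

    def receive_scene_change(self, scene: Scene) -> None:
        # Step 91: may be called at irregular times, e.g. driven by head tracking.
        self._pending = dict(scene)

    def control_tick(self) -> Optional[Dict[str, List[str]]]:
        # Step 92: called at the regular update rate; one rendering list per stage.
        if self._pending is None:
            return None
        scene, self._pending = self._pending, None
        return {name: sorted(scene) for name in self.stage_names}
```

Calling `control_tick()` from a timer gives the regular update rate, while `receive_scene_change()` can be called from any event source.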

As shown at step 93, the central controller sends an individual rendering list to the control layer of each corresponding pipeline stage. This may be done centrally, for example via the switching control infrastructure, but it is preferably done sequentially via the first pipeline stage and from there to the next pipeline stage, and so on, as indicated by the control workflow line 130 in Fig. 3. At a further step 94, each control layer builds its processing diagram for the new configuration of the corresponding reconfigurable audio data processor. The old configuration is also referred to as the “first configuration”, and the new configuration as the “second configuration”.
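The preparation of a second configuration while the first configuration remains active can be sketched as a simple double-buffering scheme. All names (`ControlLayer`, `distribute`, `switch_all`) are hypothetical, and the "processing diagram" is reduced here to a tuple of placeholder processing steps; the actual diagram structure is not specified in this passage.

```python
from typing import List, Optional, Sequence, Tuple

Diagram = Tuple[Tuple[str, str], ...]


class ControlLayer:
    """Holds the active (first) and a prepared (second) configuration of one stage."""

    def __init__(self, initial_list: Sequence[str]) -> None:
        self.active: Diagram = self._build_diagram(initial_list)
        self.prepared: Optional[Diagram] = None

    @staticmethod
    def _build_diagram(render_list: Sequence[str]) -> Diagram:
        # Placeholder: one processing step per rendering element.
        return tuple(("process", item) for item in render_list)

    def prepare(self, render_list: Sequence[str]) -> None:
        # Step 94: build the new diagram; the active one keeps running untouched.
        self.prepared = self._build_diagram(render_list)

    def is_ready(self) -> bool:
        return self.prepared is not None

    def switch(self) -> None:
        # Reconfigure the processor to the prepared configuration.
        if self.prepared is not None:
            self.active, self.prepared = self.prepared, None


def distribute(stages: List[ControlLayer], render_lists: List[Sequence[str]]) -> None:
    """Step 93: pass one rendering list to each stage, in pipeline order."""
    for stage, render_list in zip(stages, render_lists):
        stage.prepare(render_list)


def switch_all(stages: List[ControlLayer]) -> bool:
    """Switch every stage at the same time, but only if all stages are ready."""
    if not all(stage.is_ready() for stage in stages):
        return False
    for stage in stages:
        stage.switch()
    return True
```

Because `prepare` never touches `active`, audio processing can continue in the first configuration for as long as the preparation takes.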

At step 95, the control layer receives the switching control from the central controller and reconfigures its associated reconfigurable audio data processor to the new configuration. The receipt of the switching control by the control layer at step 95 may occur in response to the central controller receiving a ready message from all pipeline stages, or in response to the central controller sending the corresponding switching control instruction a certain period of time after the initiation of the update performed at step 93. Then, at step 96, the control layer of the corresponding pipeline stage takes care of fading out elements that do not exist in the new configuration and of fading in new elements that do not exist in the old configuration. For objects that exist in both the old and the new configuration and whose metadata have changed, for example the distance to the source or a new HRTF filter due to a movement of the listener's head, a crossfade of the filters or of the filtered data, for example to move smoothly from one distance to another, is also controlled by the control layer at step 96.
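The fade-in, fade-out and crossfade behavior of step 96 can be sketched as per-block gain ramps. The linear ramp over exactly one block is an assumption made for this sketch; the text does not specify the ramp shape or its duration, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def linear_ramp(block_len: int) -> List[float]:
    """Gain rising linearly from 1/block_len to 1.0 over one block."""
    return [(n + 1) / block_len for n in range(block_len)]


def transition_gains(
    old_items: Iterable[str], new_items: Iterable[str], block_len: int
) -> Dict[str, List[float]]:
    """Per-element gain curves for the first block after switching."""
    old, new = set(old_items), set(new_items)
    up = linear_ramp(block_len)
    down = [1.0 - g for g in up]
    gains: Dict[str, List[float]] = {}
    for item in old | new:
        if item in old and item in new:
            gains[item] = [1.0] * block_len  # element persists: no fade
        elif item in new:
            gains[item] = list(up)           # new element: fade in
        else:
            gains[item] = list(down)         # removed element: fade out
    return gains


def crossfade(old_block: Sequence[float], new_block: Sequence[float]) -> List[float]:
    """Blend data filtered with the old settings into data filtered with the new ones."""
    up = linear_ramp(len(old_block))
    return [(1.0 - g) * o + g * n for g, o, n in zip(up, old_block, new_block)]
```

For an element whose HRTF filter changed, `old_block` and `new_block` would be the same input block filtered with the old and the new filter respectively.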

The actual processing in the new configuration starts with a callback from the audio hardware. In other words, in the preferred embodiment, the processing workflow is triggered after the reconfiguration to the new configuration. When a new block of samples is requested, the central controller fills the audio stream buffers of the rendering elements it maintains with input samples, for example from disk or from incoming audio streams. The controller then triggers the processing part of the rendering stages, i.e. the reconfigurable audio data processors, sequentially, and the reconfigurable audio data processors act on the audio stream buffers according to their current configuration, i.e. their current processing diagrams. Thus, the central controller fills the audio stream buffers of the first pipeline stage in the device for rendering the sound scene. However, there are also situations in which the input buffers of other pipeline stages have to be filled by the central controller. Such a situation may arise, for example, when there were no spatially extended sound sources in earlier situations of the audio scene, so that stage 300 in Fig. 4 was not present in this earlier situation. The listener then moves to a location in the virtual audio scene where a spatially extended sound source is visible or is to be rendered as a spatially extended sound source because the listener is very close to it. At this point in time, in order to introduce this spatially extended sound source via block 300, the central controller 100 will supply, typically via the transmission stage 200, a new rendering list for the extended rendering stage 300.
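The callback-driven processing workflow can be sketched as follows. The names `Controller`, `audio_callback` and `Stage` are hypothetical; each stage is modelled as a plain function standing in for a reconfigurable audio data processor in its currently active configuration, and missing input samples are padded with silence as a simplifying assumption.

```python
from typing import Callable, Iterator, List

Stage = Callable[[List[float]], List[float]]


class Controller:
    """Fills the first buffer with input samples, then runs the stages in order."""

    def __init__(self, source: Iterator[float], stages: List[Stage]) -> None:
        self.source = source  # e.g. samples read from disk or an incoming stream
        self.stages = stages  # processors in their currently active configuration

    def audio_callback(self, block_len: int) -> List[float]:
        # The audio hardware requests a new block: fill the input buffer first.
        block = [next(self.source, 0.0) for _ in range(block_len)]
        # Then trigger the processors sequentially, each acting on the
        # output buffer of its predecessor.
        for stage in self.stages:
            block = stage(block)
        return block
```

Because the callback only reads `self.stages`, a control layer can replace a stage's configuration between two callbacks without interrupting the audio workflow.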

Bibliography

[1] Wenzel, E. M., Miller, J. D., and Abel, J. S. "Sound Lab: A real-time, software-based system for the study of spatial hearing." Audio Engineering Society Convention 108. Audio Engineering Society, 2000.

[2] Tsingos, N., Gallo, E., and Drettakis, G. "Perceptual audio rendering of complex virtual environments." ACM Transactions on Graphics (TOG) 23.3 (2004): 249-258.

Claims

1. A device for rendering a sound stage (50), comprising:

a first pipeline stage (200) comprising a first control layer (201) and a reconfigurable first audio data processor (202), wherein the reconfigurable first audio data processor (202) is configured to operate in accordance with the first configuration of the reconfigurable first audio data processor (202);

a second pipeline stage (300) located, relative to the pipeline flow, after the first pipeline stage (200), wherein the second pipeline stage (300) comprises a second control layer (301) and a reconfigurable second audio data processor (302), wherein the reconfigurable second audio data processor (302) is configured to operate in accordance with a first configuration of the reconfigurable second audio data processor (302); and

a central controller (100) for controlling the first control layer (201) and the second control layer (301) in response to the sound stage (50), such that the first control layer (201) prepares a second configuration of the reconfigurable first audio data processor (202) during or after the operation of the reconfigurable first audio data processor (202) in the first configuration of the reconfigurable first audio data processor (202), or such that the second control layer (301) prepares a second configuration of the reconfigurable second audio data processor (302) during or after the operation of the reconfigurable second audio data processor (302) in the first configuration of the reconfigurable second audio data processor (302), and

wherein the central controller (100) is configured to control the first control layer (201) or the second control layer (301) using switching control (110) to reconfigure the reconfigurable first audio data processor (202) into the second configuration for the reconfigurable first audio data processor (202) or to reconfigure the reconfigurable second audio data processor (302) into the second configuration for the reconfigurable second audio data processor (302) at a certain point in time.

2. The apparatus of claim 1, wherein the central controller (100) is configured to control the first control layer (201) to prepare the second configuration of the reconfigurable first audio data processor (202) during operation of the reconfigurable first audio data processor (202) in the first configuration of the reconfigurable first audio data processor (202), and

controlling a second control layer (301) to prepare a second configuration of the reconfigurable second audio data processor (302) during operation of the reconfigurable second audio data processor (302) in the first configuration of the reconfigurable second audio data processor (302), and

controlling the first control layer (201) and the second control layer (301) using switching control (110) to reconfigure the reconfigurable first audio data processor (202) into the second configuration for the reconfigurable first audio data processor (202) and to reconfigure the reconfigurable second audio data processor (302) into the second configuration for the reconfigurable second audio data processor (302) at a certain point in time.

3. The apparatus of claim 1 or 2, wherein the first pipeline stage (200) or the second pipeline stage (300) comprises an input interface configured to receive an input rendering list (500), wherein the input rendering list comprises an input list (501) of rendering elements, metadata (502) for each rendering element and an audio stream buffer (503) for each rendering element,

wherein at least the first pipeline stage (200) comprises an output interface configured to output an output rendering list (600), wherein the output rendering list comprises an output list (601) of rendering elements, metadata (602) for each rendering element and an audio stream buffer (603) for each rendering element, and

wherein, when the second pipeline stage (300) is connected to the first pipeline stage (200), the output render list of the first pipeline stage (200) is the input render list for the second pipeline stage (300).

4. The apparatus of claim 3, wherein the first pipeline stage (200) is configured to write audio samples into a corresponding audio stream buffer (603) specified in the output list (600) of rendering elements, such that the second pipeline stage (300), which follows the first pipeline stage (200), is capable of extracting audio stream samples from the corresponding audio stream buffer (603) at the rate of the processing workflow.

5. An apparatus as claimed in any one of the preceding claims, wherein the central controller (100) is configured to supply an input or output rendering list (500, 600) to the first or second pipeline stage (200, 300), wherein the first or second configuration of the reconfigurable first or second audio data processor (202, 302) comprises a processing diagram, wherein the first or second control layer (201, 301) is configured to create the processing diagram for the second configuration from the input or output rendering list (500, 600) received from the central controller (100) or from the previous pipeline stage,

wherein the processing diagram comprises audio data processor stages and references to input and output buffers of the corresponding first or second reconfigurable audio data processor.

6. The device according to claim 5, wherein the central controller (100) is configured to supply additional data necessary for creating the processing diagram to the first or second pipeline stage (200, 300), wherein the additional data is not included in the input rendering list (500) or the output rendering list (600).

7. The apparatus of any one of the preceding claims, wherein the central controller (100) is configured to receive sound stage changes (50) via the sound stage interface at the time the sound stage changes,

wherein the central controller (100) is configured to generate a first rendering list for the first pipeline stage (200) and a second rendering list for the second pipeline stage (300) in response to the sound stage change and based on the current sound stage specified by the sound stage change, and wherein the central controller (100) is configured to send the first rendering list to the first control layer (201) and the second rendering list to the second control layer (301) after the sound stage change time.

8. The device according to claim 7,

wherein the first control layer (201) is configured to calculate a second configuration of the first reconfigurable audio data processor (202) from the first rendering list after the sound stage change time, and

the second control layer (301) is configured to calculate a second configuration of the second reconfigurable audio data processor (302) from the second rendering list, and

wherein the central controller (100) is configured to initiate switching control (110) simultaneously for the first and second pipeline stages (200, 300).

9. The apparatus of any one of the preceding claims, wherein the central controller (100) is configured to use switching control (110) without regard to the audio sample calculation operation performed by the first and second reconfigurable audio data processors (202, 302).

10. The device according to any one of the preceding claims,

wherein the central controller (100) is configured to receive (91) changes in the audio scene (50) at change times having an irregular frequency,

the central controller (100) is configured to supply (93) control instructions to the first and second control layers (201, 301) at a regular control frequency, and

wherein the reconfigurable first and second audio data processors (202, 302) operate at an audio block frequency, calculating output audio samples from input audio samples received from an input buffer of the reconfigurable first or second audio data processor, wherein the output samples are stored in an output buffer of the reconfigurable first or second audio data processor, and wherein the control frequency is lower than the audio block frequency.

11. The device according to any one of the preceding claims,

wherein the central controller (100) is configured to initiate switching control (110) a certain period of time after controlling the first and second control layers (201, 301) to prepare the second configuration, or in response to a ready signal received from the first and second pipeline stages (200, 300), indicating that the first and second pipeline stages (200, 300) are ready to transition to the corresponding second configuration.

12. The device according to any one of the preceding claims,

wherein the first or second pipeline stage (200, 300) is configured to create an output list (600) of rendering elements from the input list (500) of rendering elements,

wherein the creation comprises changing metadata for rendering elements from the input list and writing the changed metadata to the output list, or

comprises calculating output audio data for render elements using input audio data extracted from the input stream buffer of the input render list, and writing the output audio data to the output stream buffer of the output render list (600).

13. The device according to any one of the preceding claims,

wherein the first or second control layer (201, 301) is configured to control the first or second reconfigurable audio data processor to fade in a new rendering element to be processed after the switching control (110), or to fade out an old rendering element that no longer exists after the switching control (110) but existed before the switching control (110).

14. The device according to any one of the preceding claims,

wherein each rendering element of the list of rendering elements includes, in the input list or output list of the first or second rendering stage, a status indicator indicating at least one of the following states: rendering active, rendering to be activated, rendering inactive, rendering to be deactivated.

15. The device according to any one of the preceding claims,

wherein the central controller (100) is configured to fill, in response to a request from the first or second rendering stage, render element input buffers maintained by the central controller (100) with new samples and

the central controller (100) is configured to sequentially initiate the reconfigurable first and second audio data processors (202, 302) such that the reconfigurable first and second audio data processors (202, 302) act on the respective input buffers of the rendering elements in accordance with the first or second configuration, depending on which configuration is currently active.

16. The device according to any one of the preceding claims,

wherein the second pipeline stage (300) is a spatial domain conversion stage that provides, as an output, a channel representation for playback through headphones or a speaker system.

17. The device according to any one of the preceding claims,

wherein the first and second pipeline stages (200, 300) comprise at least one of the following group of stages:

a transmission stage (200), an extension stage (300), an early reflection stage (400), a clustering stage (551), a diffraction stage (552), a propagation stage (553), a spatial domain conversion stage (554), a limiter stage and a visualization stage.

18. The device according to any one of the preceding claims,

wherein the first pipeline stage (200) is a directionality stage (200) for one or more rendering elements, and wherein the second pipeline stage (300) is a propagation stage (300) for one or more rendering elements,

the central controller (100) is configured to receive changes to the audio scene (50) indicating that one or more rendering elements have one or more new positions,

the central controller (100) is configured to control the first control layer (201) and the second control layer (301) to adapt filter settings for the first and second reconfigurable audio data processors to one or more new positions, and

the first control layer (201) and the second control layer (301) are configured to transition to the second configuration at a certain point in time, wherein, when transitioning to the second configuration, a crossfade operation from the first configuration to the second configuration is performed in the reconfigurable first or second audio data processor (202, 302).

19. The device according to any one of claims 1 to 17, wherein the first pipeline stage (200) is a directionality stage (200) and the second pipeline stage (300) is a clustering stage (300),

the central controller (100) is configured to receive changes to the audio scene (50) indicating that the clustering of rendering elements is to be stopped, and

the central controller (100) is configured to control the first control layer (201) to deactivate the reconfigurable audio data processor of the clustering stage and to copy the input list of rendering elements to the output list of rendering elements of the second pipeline stage (300).

20. The device according to any one of claims 1 to 17,

wherein the first pipeline stage (200) is a reverb stage and the second pipeline stage (300) is an early reflection stage,

the central controller (100) is configured to receive audio scene changes (50) indicating that an additional image source needs to be added, and

the central controller (100) is configured to control the control layer of the second pipeline stage (300) to multiply a rendering element from the input rendering list to obtain a multiplied rendering element (333) and to add the multiplied rendering element (333) to the output rendering list of the second pipeline stage (300).

21. A method for rendering a sound stage (50) using an apparatus comprising: a first pipeline stage (200) comprising a first control layer (201) and a reconfigurable first audio data processor (202), wherein the reconfigurable first audio data processor (202) is configured to operate in accordance with a first configuration of the reconfigurable first audio data processor (202); and a second pipeline stage (300) located, relative to the pipeline flow, after the first pipeline stage (200), wherein the second pipeline stage (300) comprises a second control layer (301) and a reconfigurable second audio data processor (302), wherein the reconfigurable second audio data processor (302) is configured to operate in accordance with a first configuration of the reconfigurable second audio data processor (302), the method comprising the steps of:

controlling the first control layer (201) and the second control layer (301) in response to the sound stage (50) such that the first control layer (201) prepares a second configuration of the reconfigurable first audio data processor (202) during or after operation of the reconfigurable first audio data processor (202) in the first configuration of the reconfigurable first audio data processor (202), or such that the second control layer (301) prepares a second configuration of the reconfigurable second audio data processor (302) during or after operation of the reconfigurable second audio data processor (302) in the first configuration of the reconfigurable second audio data processor (302), and

controlling the first control layer (201) or the second control layer (301) using switching control (110) to reconfigure the reconfigurable first audio data processor (202) into the second configuration for the reconfigurable first audio data processor (202) or to reconfigure the reconfigurable second audio data processor (302) into the second configuration for the reconfigurable second audio data processor (302) at a certain point in time.