CN114928788B

CN114928788B - A method for decoding sound field playback space based on sparse plane wave decomposition

Info

Publication number: CN114928788B
Application number: CN202210370600.1A
Authority: CN
Inventors: 曾向阳; 洪汐; 杜博凯; 路东东
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2022-04-10
Filing date: 2022-04-10
Publication date: 2025-02-21
Anticipated expiration: 2042-04-10
Also published as: CN114928788A

Abstract

The present invention belongs to the technical field of spatial sound field playback, and specifically relates to a sound field playback spatial decoding method of higher-order Ambisonics based on sparse plane wave decomposition. The method realizes sound field playback through two stages. The first stage pre-acquires the driving functions with which the loudspeaker system plays back plane waves in different directions, and is called the pre-calculation stage; the second stage performs sparse plane wave decomposition on the target sound field to calculate the playback driving function, and is called the playback decoding stage. The present invention uses sparse plane wave decomposition in the spherical harmonic domain to play back the sound field, and effectively improves the playback accuracy when the target sound field is generated by a small number of sound sources. The method is simple and low in cost, and can achieve higher-precision playback from lower-order spherical harmonic coefficients of the target sound field; it is computationally fast, since the plane wave generator of the playback system can be measured and calculated in advance and the loudspeaker drive signals can be obtained quickly in the playback stage; at the same time, a larger optimal listening range is guaranteed.

Description

Sound field replay spatial decoding method based on sparse plane wave decomposition

Technical Field

The invention belongs to the technical field of spatial sound field reproduction, and particularly relates to a sound field reproduction spatial decoding method of higher-order Ambisonics based on sparse plane wave decomposition.

Background

Sound field playback using loudspeaker arrays is an important way to achieve acoustic virtual reality and has been widely studied in recent decades. Its purpose is to provide listeners with varied auditory experiences, such as listening to music, watching a movie, or playing a video game. Higher-order Ambisonics (HOA) is a common sound field playback method. HOA is divided into an encoding stage and a decoding stage. In the encoding stage, the target sound field is decomposed in the spherical harmonic domain to obtain a set of spherical harmonic coefficients representing the original sound field. In the decoding stage, the loudspeaker array driving function is obtained such that the spherical harmonic coefficients of the playback sound field equal those obtained in the encoding stage.

The existing HOA decoding method is usually the mode matching method (Poletti M A. Three-dimensional surround sound systems based on spherical harmonics[J]. J. Audio Eng. Soc., 2005, 53(11): 1004–1025.), i.e. the sound field is decomposed into a superposition of spherical harmonics (basis functions) in the spherical coordinate system, and the spherical harmonic modes of the primary sound field and the playback sound field are matched to solve for the driving function of the secondary sound sources (the loudspeaker array). However, with the mode matching method, the order at which the target sound field is recorded has a great influence on the accuracy and on the size of the sweet spot. The size of the sweet spot is inversely proportional to frequency and proportional to the mode order. With the mode order fixed at 4, the sweet spot shrinks to less than the size of a human head at a frequency of 2 kHz.

In order to improve the replay performance and keep the sweet spot size within a certain range, the invention provides a higher-order Ambisonics decoding method based on sparse plane wave decomposition for the case where the target sound field is dominated by a small number of sound sources.

Disclosure of Invention

In order to avoid the defects of the prior art, the invention provides a high-order Ambisonics decoding method based on sparse plane wave decomposition.

Technical Solution

The invention uses sparse plane wave decomposition in the spherical harmonic domain to decompose the target sound field into a limited number of plane waves, and replays the target sound field according to the plane wave driving functions of the replay system obtained in advance.

The invention realizes sound field replay in two stages. The first stage, called the pre-calculation stage, obtains in advance the driving functions with which the loudspeaker system replays plane waves in different directions. The second stage, called the replay decoding stage, performs sparse plane wave decomposition on the target sound field to calculate the replay driving function. The specific implementation steps of the two stages are as follows:

Pre-calculation stage:

Step 1. Determine the sound pressure of the target sound field, either by recording the ambient sound pressure with a higher-order Ambisonics microphone array or by synthesizing the sound pressure of a desired acoustic scene.

Step 2. Measure the spherical harmonic coefficients of the target sound field. The sound pressure of the target sound field can be captured by a spherical microphone array and decomposed in the spherical harmonic domain to obtain a set of spherical harmonic coefficients of the sound field. Commonly used spherical microphone arrays include open-sphere arrays, spherical arrays of cardioid microphones, dual-radius open-sphere arrays, and rigid-sphere arrays.

Step 3. Perform plane wave decomposition of the measured or synthesized target sound field pressure in the spherical harmonic domain. The spherical harmonic coefficients of the target sound field may be represented as a weighted combination of a set of plane wave spherical harmonic coefficients.

Step 4. Represent the spherical harmonic coefficients of the target sound field as a weighted combination of a set of plane wave spherical harmonic coefficients. The plane wave weights are sparse, and a sparse solution for them can be obtained by using the ℓ1 norm.
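As a rough illustration of Step 4, the sketch below solves the penalized problem min 0.5·||Pw − A||² + λ·||w||₁ with a plain proximal-gradient (ISTA) loop. The patent does not say which ℓ1 formulation or solver it uses, so the penalized form, the solver, the value of λ, and the random stand-in dictionary are all assumptions. P, A and w play the same roles as in formula (4) of the detailed description.

```python
import numpy as np

def soft_threshold(z, t):
    """Complex soft-thresholding: shrink magnitudes by t, keep phases."""
    mag = np.abs(z)
    return z * np.maximum(1.0 - t / np.maximum(mag, 1e-15), 0.0)

def ista_l1(P, A, lam, n_iter=2000):
    """Minimise 0.5*||P w - A||_2^2 + lam*||w||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(P, 2) ** 2    # 1 / Lipschitz constant of the gradient
    w = np.zeros(P.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = P.conj().T @ (P @ w - A)       # gradient of the quadratic term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Hypothetical test problem: a random complex dictionary stands in for the
# plane wave spherical harmonic coefficients (assumption, not patent data).
rng = np.random.default_rng(0)
n_coeff, n_dirs = 25, 100                     # (N+1)^2 with N = 4, and L_w directions
P = rng.standard_normal((n_coeff, n_dirs)) + 1j * rng.standard_normal((n_coeff, n_dirs))
w_true = np.zeros(n_dirs, dtype=complex)
w_true[[7, 42]] = [1.0, 0.5j]                 # only two active plane waves
A = P @ w_true

w_hat = ista_l1(P, A, lam=0.05)
print("largest weights at indices:", np.argsort(np.abs(w_hat))[-2:])
```

ISTA is chosen here only because it fits in a few lines; any solver for ℓ1-regularized least squares could be substituted.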

Playback decoding stage:

Step 1. For a fixed loudspeaker system and playback environment, measure the transfer functions from the loudspeakers to the playback area.

Step 2. Calculate the weight matrix of the plane wave generator. The plane wave generator is obtained by minimizing the error between the replayed sound field and the plane wave sound field over the whole replay area. This calculation yields the loudspeaker driving signals that generate plane waves in different directions.

Step 3. Combine the calculated plane wave weights with the plane wave generator to obtain the driving function of the loudspeakers in the higher-order Ambisonics decoding method.

Advantageous effects

The invention provides a method for replaying a sound field using sparse plane wave decomposition in the spherical harmonic domain, which effectively improves the replay accuracy when the target sound field is generated by a small number of sound sources. Compared with the traditional method, the method is simple and lower in cost, and it achieves higher-accuracy reproduction from lower-order spherical harmonic coefficients of the target sound field. It is also computationally fast: the plane wave generator of the reproduction system can be measured and calculated in advance, so the driving signals of the loudspeaker system are obtained quickly in the reproduction stage. At the same time, it guarantees a larger optimal listening range than the traditional method. Practical tests show that the invention achieves higher replay accuracy than the traditional method and is also suitable for target sound field replay in reverberant environments.

Drawings

FIG. 1 is a schematic diagram of the spatial distribution of the loudspeakers in an embodiment.

FIG. 2 is a comparison of the playback errors of the two methods over different sweet-spot ranges in a reverberant environment in an embodiment.

FIG. 3 is a flow chart of the sound field playback method of the present invention in an embodiment.

Detailed Description

The sound field reproduction method of the present invention will now be described in detail with reference to an embodiment and the drawings, but embodiments of the present invention are not limited thereto.

The embodiment reproduces the sound field radiated by a point source located at (3, 0, 0). The playback array consists of 144 loudspeakers uniformly distributed on a sphere of radius 1 m. The playback target region is a 0.5 m × 0.5 m square area centered at the coordinate origin, which is also the center of the loudspeaker array.

Pre-calculation stage:

Step 1. Determine the sound pressure p of the target sound field, which is either the ambient sound pressure recorded with a higher-order Ambisonics microphone array or the synthesized sound pressure of a desired sound scene. The target sound field pressure is expressed as a spherical harmonic domain decomposition:
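The formula itself is not preserved in this text. A reconstruction that matches the symbols defined just below, assuming the standard expansion of a sound field in a source-free region, and writing φ for the pitch angle, A_n^m(ω) for the spherical harmonic coefficients and Y_n^m for the spherical harmonics (symbol names assumed), would be:

```latex
p(r,\theta,\varphi,\omega) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} A_n^m(\omega)\, j_n(kr)\, Y_n^m(\theta,\varphi) \tag{1}
```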

where ω = 2πf is the angular frequency, f is the frequency, r is the distance, θ is the azimuth, φ is the pitch angle, k = ω/c is the wave number, c is the speed of sound, n is the order, and m is the degree. j_n(·) is the n-th order spherical Bessel function, A_n^m(ω) is the frequency-dependent spherical harmonic coefficient, and Y_n^m(θ, φ) are the spherical harmonics:
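The defining formula (2) is likewise missing here. One common convention, assumed rather than taken from the patent, measures φ from the z-axis and reads the P_n of the following sentence as the associated Legendre function P_n^{|m|}:

```latex
Y_n^m(\theta,\varphi) = \sqrt{\frac{2n+1}{4\pi}\,\frac{(n-|m|)!}{(n+|m|)!}}\; P_n^{|m|}(\cos\varphi)\, e^{i m \theta} \tag{2}
```

Other normalizations and angle conventions are in use; the remaining formulas only require that the same convention is applied throughout.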

where P_n is an n-th order Legendre polynomial.

Step 2. Plane wave decomposition in the spherical harmonic domain. The spherical harmonic coefficients of the desired sound field may be represented as a weighted combination of a set of plane wave spherical harmonic coefficients:
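Formula (3) does not survive in this text. Assuming that a unit-amplitude plane wave from direction (θ_l, φ_l) has spherical harmonic coefficients 4π i^n [Y_n^m(θ_l, φ_l)]* (the sign and time conventions are assumptions), a form consistent with the definitions that follow is:

```latex
A_n^m(\omega) = \sum_{l=1}^{L_w} w_l \, 4\pi i^{n} \left[ Y_n^m(\theta_l,\varphi_l) \right]^{*} \tag{3}
```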

where L_w is the number of plane wave bases, w_l is the weight of the l-th plane wave, and "*" denotes the complex conjugate.

Step 3. Considering all orders and degrees, formula (3) is expressed in matrix form:

Pw=A (4)

where P is an (N+1)² × L_w matrix containing the plane wave spherical harmonic coefficients of the different orders and degrees, N being the highest order retained, and w is an L_w × 1 vector of the plane wave weights in the different directions.
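To make the dimensions concrete, the following sketch assembles a dictionary P for N = 4. The spherical harmonic normalization, the polar angle measured from the z-axis, the 4π i^n Y* form of the plane wave coefficients, and the 36 candidate directions on the horizontal plane are all assumptions chosen for illustration.

```python
import numpy as np
from math import factorial, pi

def assoc_legendre(n, m, x):
    """P_n^m(x) for 0 <= m <= n (Condon-Shortley phase), by upward recurrence."""
    x = np.asarray(x, dtype=float)
    pmm = np.ones_like(x)
    if m > 0:
        somx2 = np.sqrt(1.0 - x * x)
        fact = 1.0
        for _ in range(m):                 # builds (-1)^m (2m-1)!! (1-x^2)^(m/2)
            pmm = -pmm * fact * somx2
            fact += 2.0
    if n == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm          # P_{m+1}^m
    for l in range(m + 2, n + 1):
        pmm, pmmp1 = pmmp1, (x * (2 * l - 1) * pmmp1 - (l + m - 1) * pmm) / (l - m)
    return pmmp1

def sph_harm_nm(n, m, azimuth, polar):
    """Y_n^m = sqrt(...) * P_n^|m|(cos polar) * exp(i m azimuth) (assumed convention)."""
    am = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * pi) * factorial(n - am) / factorial(n + am))
    return norm * assoc_legendre(n, am, np.cos(polar)) * np.exp(1j * m * azimuth)

def plane_wave_dictionary(order, azimuth, polar):
    """Rows: all (n, m) pairs up to `order`; columns: candidate directions."""
    rows = [4 * pi * (1j ** n) * np.conj(sph_harm_nm(n, m, azimuth, polar))
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.array(rows)

order = 4
azimuth = np.linspace(0.0, 2.0 * pi, 36, endpoint=False)
polar = np.full(36, pi / 2)                # hypothetical grid: horizontal plane only
P = plane_wave_dictionary(order, azimuth, polar)
print(P.shape)                             # (25, 36), i.e. (N+1)^2 x L_w
```

A practical dictionary would sample directions over the whole sphere; the horizontal ring just keeps the example short.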

Step 4. Solve for the ℓ1-norm sparse solution of formula (4):
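Formula (5) is absent from this text. Two standard ways of posing an ℓ1-norm sparse solution of Pw = A are shown below; which of them the patent uses is not recoverable, and the tolerance ε and penalty weight λ are assumed symbols:

```latex
\min_{w}\ \|w\|_1 \quad \text{s.t.} \quad \|Pw - A\|_2 \le \varepsilon
\qquad \text{or} \qquad
\min_{w}\ \tfrac{1}{2}\,\|Pw - A\|_2^2 + \lambda\,\|w\|_1 \tag{5}
```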

Playback decoding stage:

Step 1. Suppose that the loudspeaker weight vector d generates a sound field governed by L_w plane waves, denoted as:
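Formula (6) is not preserved. Since formula (9) gives d = X f(P, A) with w = f(P, A), one consistent reading, with x_l (an assumed symbol) denoting the loudspeaker weight vector that generates the l-th plane wave, is:

```latex
d = \sum_{l=1}^{L_w} w_l\, x_l = X w, \qquad X = [\, x_1,\ x_2,\ \dots,\ x_{L_w} \,] \tag{6}
```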

where x_l is the loudspeaker weight vector that generates the l-th plane wave, and X is the generator weight matrix whose columns are these vectors.

Step 2. X can be pre-calculated once the loudspeaker array and the playback acoustic environment are fixed. For the l-th plane wave:
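Formula (7) is missing at this point. Writing x_l for the generator of the l-th plane wave and p_l for the vector of its sound pressures at the control points (both assumed symbols), the matching condition would plausibly read:

```latex
G\, x_l = p_l \tag{7}
```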

where G is the matrix of transfer functions from the loudspeakers to the control points, and p_l is the sound pressure produced at the control points by the l-th plane wave. The solution for the generator x_l of the l-th plane wave is then:
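The solution formula (8) is not reproduced here. Given the regularization parameter λ₀ and the conjugate transpose named in the next sentence, the standard Tikhonov-regularized least-squares solution is the likely form (I is the identity matrix; x_l and p_l are assumed symbols for the generator and the plane wave pressure vector):

```latex
x_l = \left( G^{H} G + \lambda_0 I \right)^{-1} G^{H} p_l \tag{8}
```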

where λ₀ is the regularization parameter and the superscript H denotes the conjugate transpose.

Step 3. Denoting the solution of formula (5) as w = f(P, A), the driving function of the loudspeakers in the higher-order Ambisonics decoding method of the present invention is:

d=Xf(P,A) (9)
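The decoding steps can be sketched numerically as follows. This is only an illustration under stated assumptions: the loudspeakers are modeled as free-field monopoles (the patent measures the transfer functions instead), a Fibonacci lattice stands in for the uniform spherical layout, and the frequency, the value of λ₀, the 50 candidate directions, and the hand-set sparse weight vector w are all hypothetical.

```python
import numpy as np

def fibonacci_sphere(n, radius=1.0):
    """Roughly uniform points on a sphere (stand-in for the array layout)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)
    azim = np.pi * (1.0 + 5.0 ** 0.5) * i
    return radius * np.stack([np.sin(polar) * np.cos(azim),
                              np.sin(polar) * np.sin(azim),
                              np.cos(polar)], axis=1)

def free_field_transfer(points, sources, k):
    """G[m, s]: pressure at control point m from a unit monopole at source s."""
    dist = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=2)
    return np.exp(1j * k * dist) / (4.0 * np.pi * dist)

def plane_wave_generators(G, p_pw, lam0):
    """Columns x_l = (G^H G + lam0 I)^(-1) G^H p_l, for all plane waves at once."""
    GhG = G.conj().T @ G
    return np.linalg.solve(GhG + lam0 * np.eye(GhG.shape[0]), G.conj().T @ p_pw)

c, f = 343.0, 500.0                                  # assumed sound speed and frequency
k = 2.0 * np.pi * f / c
speakers = fibonacci_sphere(144, radius=1.0)         # 144 loudspeakers, 1 m radius

# Control points: 0.5 m x 0.5 m horizontal grid centred on the origin.
g = np.linspace(-0.25, 0.25, 11)
xx, yy = np.meshgrid(g, g)
ctrl = np.stack([xx.ravel(), yy.ravel(), np.zeros(xx.size)], axis=1)

G = free_field_transfer(ctrl, speakers, k)           # shape (121, 144)
directions = fibonacci_sphere(50)                    # L_w = 50 unit direction vectors
p_pw = np.exp(-1j * k * ctrl @ directions.T)         # plane waves arriving from them
X = plane_wave_generators(G, p_pw, lam0=1e-6)        # shape (144, 50), precomputed once

w = np.zeros(50, dtype=complex)
w[3] = 1.0                                           # sparse weights, set by hand here
d = X @ w                                            # loudspeaker driving function
err = np.linalg.norm(G @ d - p_pw @ w) / np.linalg.norm(p_pw @ w)
print(f"relative error at the control points: {err:.2e}")
```

Because X depends only on the array and the room, it is computed once; at playback time only the sparse weights w change, and the driving function follows from a single matrix-vector product.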

Claims

1. A sound field replay spatial decoding method based on sparse plane wave decomposition is characterized in that sound field replay is realized through two stages, wherein the first stage obtains a driving function of a loudspeaker system for replaying plane waves in different directions in advance, which is called a pre-calculation stage, the second stage carries out sparse plane wave decomposition on a target sound field to calculate a replay driving function, which is called a replay decoding stage,

The specific process of the pre-calculation stage is as follows:

Step 1, determining sound pressure of a target sound field, namely recording sound field environment sound pressure by using a high-order ambisonic microphone array or synthesizing expected acoustic scene sound pressure;

Step 2, measuring spherical harmonic coefficients of the target sound field, wherein the measured sound pressure of the target sound field can be effectively collected by a spherical microphone array and decomposed in the spherical harmonic domain to obtain a group of spherical harmonic coefficients of the sound field;

Step 3, carrying out plane wave decomposition on the measured or synthesized target sound field sound pressure in the spherical harmonic domain, wherein the spherical harmonic coefficients of the target sound field can be expressed as a weighted combination of a group of plane wave spherical harmonic coefficients;

Step 4, representing the spherical harmonic coefficients of the target sound field as a weighted combination of a group of plane wave spherical harmonic coefficients, wherein the plane wave weights are sparse, and the sparse solution of the plane wave weights can be obtained by using the ℓ1 norm;

The specific process of the playback decoding stage is as follows:

Step 1, for a fixed loudspeaker system and playback environment, measuring the transfer function from the loudspeakers to the playback area;

Step 2, calculating a weight matrix of the plane wave generator, wherein the plane wave generator can be obtained by calculating the minimum error between the replay sound field and the plane wave sound field in the whole replay area;

Step 3, combining the calculated plane wave weights with the plane wave generator to obtain the driving function of the loudspeakers in the higher-order Ambisonics decoding method.