Spatial decoding method for sound field reproduction based on sparse plane wave decomposition
Technical Field
The invention belongs to the technical field of spatial sound field reproduction, and in particular relates to a higher-order Ambisonics spatial decoding method for sound field reproduction based on sparse plane wave decomposition.
Background
Reproducing sound fields with loudspeaker arrays is an important way to achieve acoustic virtual reality and has been widely studied in recent decades. The purpose of sound field reproduction is to give listeners an immersive auditory experience when, for example, listening to music, watching a movie, or playing a video game. Higher-order Ambisonics (HOA) is a common sound field reproduction method. HOA is divided into an encoding stage and a decoding stage. In the encoding stage, the target sound field is decomposed in the spherical harmonic domain to obtain a set of spherical harmonic coefficients representing the original sound field. In the decoding stage, the loudspeaker array driving function is obtained such that the spherical harmonic coefficients of the reproduced sound field equal the coefficients obtained in the encoding stage.
The existing HOA decoding method is usually the mode matching method (Poletti M A. Three-dimensional surround sound systems based on spherical harmonics [J]. J. Audio Eng. Soc., 2005, 53(11): 1004–1025.), i.e. the sound field is decomposed into a superposition of spherical harmonics (basis functions) in the spherical coordinate system, and the spherical harmonic modes of the primary sound field and the reproduced sound field are matched to solve for the driving function of the secondary sound sources (the loudspeaker array). With mode matching, however, the order to which the target sound field is recorded strongly affects the accuracy and the size of the optimum listening area (sweet spot). The size of the sweet spot is inversely proportional to frequency and proportional to the mode order. At a fixed mode order of 4, the sweet spot shrinks to less than the size of a human head at a frequency of 2 kHz.
In order to improve reproduction performance and keep the sweet spot size within a given range, the invention provides a higher-order Ambisonics decoding method based on sparse plane wave decomposition for the case in which the target sound field is dominated by a small number of sound sources.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a higher-order Ambisonics decoding method based on sparse plane wave decomposition.
Technical Solution
The invention uses sparse plane wave decomposition in the spherical harmonic domain to decompose the target sound field into a limited number of plane waves, and reproduces the target sound field using plane wave driving functions of the playback system that are obtained in advance.
The invention realizes sound field reproduction in two stages. The first stage, called the pre-calculation stage, obtains in advance the driving functions with which the loudspeaker system reproduces plane waves from different directions. The second stage, called the playback decoding stage, performs sparse plane wave decomposition on the target sound field and calculates the playback driving functions. The specific implementation steps of the two stages are as follows:
Pre-calculation stage:
Step 1. Determine the sound pressure of the target sound field, either by recording the ambient sound pressure with a higher-order Ambisonics microphone array or by synthesizing the sound pressure of a desired acoustic scene.
Step 2. Measure the spherical harmonic coefficients of the target sound field. The sound pressure of the target sound field can be captured effectively by a spherical microphone array and decomposed in the spherical harmonic domain to obtain a set of spherical harmonic coefficients of the sound field. Commonly used spherical microphone arrays include open-sphere arrays, spherical arrays of cardioid microphones, dual-radius open-sphere arrays, rigid-sphere arrays, and the like.
Step 3. Perform plane wave decomposition of the measured or synthesized target sound pressure in the spherical harmonic domain. The spherical harmonic coefficients of the target sound field can be represented as a weighted combination of a set of plane wave spherical harmonic coefficients.
Step 4. Solve for the plane wave weights of this weighted combination. The plane wave weights are sparse, and a sparse solution for them can be obtained by minimizing the 1-norm.
Playback decoding stage:
Step 1. For a fixed loudspeaker system and playback environment, measure the transfer functions from the loudspeakers to the playback area.
Step 2. Calculate the weight matrix of the plane wave generator. The plane wave generator is obtained by minimizing the error between the reproduced sound field and the plane wave sound field over the whole playback area. This calculation yields the loudspeaker driving signals that generate plane waves from different directions.
Step 3. Combine the calculated plane wave weights with the plane wave generator to obtain the loudspeaker driving function of the higher-order Ambisonics decoding method.
Advantageous effects
The invention provides a sound field reproduction method based on sparse plane wave decomposition in the spherical harmonic domain, which effectively improves reproduction accuracy when the target sound field is generated by a small number of sound sources. Compared with the conventional method, the method is simple and lower in cost, since it achieves higher-accuracy reproduction from lower-order spherical harmonic coefficients of the target sound field. It also runs quickly: the plane wave generator of the playback system can be measured and calculated in advance, so the loudspeaker driving signals are obtained rapidly in the reproduction stage. In addition, the method guarantees a larger optimum listening range than the conventional method. Practical tests show that the invention achieves higher reproduction accuracy than the conventional method and is also applicable to target sound field reproduction in reverberant environments.
Drawings
FIG. 1 is a schematic diagram of the spatial distribution of the loudspeakers in an embodiment.
FIG. 2 is a comparison of the playback errors of the two methods for different sweet-spot ranges in a reverberant environment in an embodiment.
FIG. 3 is a flow chart of the sound field reproduction method of the invention in an embodiment.
Detailed Description
The sound field reproduction method of the invention will now be described in detail with reference to an embodiment and the drawings, but embodiments of the invention are not limited thereto.
The sound field to be reproduced is that radiated by a point source located at (3, 0, 0). The playback loudspeaker array consists of 144 loudspeakers distributed uniformly on a sphere of radius 1 m. The target playback region is a square area of 0.5 m by 0.5 m whose center is the coordinate origin, which is also the center of the loudspeaker array.
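The geometry of this embodiment can be set up numerically as follows. This is only a sketch under stated assumptions that are not in the text: a Fibonacci lattice stands in for the "uniform" loudspeaker layout, the square region is placed in the z = 0 plane and sampled on an 11 by 11 grid, coordinates are in metres, the field is free-field, and the frequency (500 Hz) and sound speed (343 m/s) are illustrative values. All function names are hypothetical.

```python
import numpy as np

def fibonacci_sphere(n, radius=1.0):
    """Return n nearly uniform points on a sphere (Fibonacci lattice, an assumption)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)       # polar angle measured from +z
    azim = np.pi * (1.0 + 5.0 ** 0.5) * i      # golden-ratio azimuth increment
    return radius * np.stack([np.sin(polar) * np.cos(azim),
                              np.sin(polar) * np.sin(azim),
                              np.cos(polar)], axis=1)

def control_grid(side=0.5, n_per_side=11):
    """Grid of control points in the z = 0 plane, centred on the origin (assumed)."""
    ax = np.linspace(-side / 2, side / 2, n_per_side)
    gx, gy = np.meshgrid(ax, ax)
    return np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)

def point_source_pressure(src, pts, freq, c=343.0):
    """Free-field pressure of a unit point source, e^{ikR}/(4 pi R) convention (assumed)."""
    k = 2 * np.pi * freq / c
    dist = np.linalg.norm(pts - src, axis=1)
    return np.exp(1j * k * dist) / (4 * np.pi * dist)

speakers = fibonacci_sphere(144, radius=1.0)   # 144 loudspeakers on a 1 m sphere
points = control_grid()                        # 0.5 m x 0.5 m target region
p_target = point_source_pressure(np.array([3.0, 0.0, 0.0]), points, freq=500.0)
```

Here `p_target` plays the role of the synthesized target sound pressure p at the control points; a measured field would replace it in practice.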
Preprocessing:
Step 1. Determine the sound pressure p of the target sound field; p may be the ambient sound pressure recorded with a higher-order Ambisonics microphone array, or the synthesized sound pressure of a desired acoustic scene. The target sound pressure is decomposed in the spherical harmonic domain as:
$p(r,\theta,\phi,\omega)=\sum_{n=0}^{\infty}\sum_{m=-n}^{n}A_{nm}(\omega)\,j_n(kr)\,Y_{nm}(\theta,\phi)$ (1)
where ω = 2πf is the angular frequency, f is the frequency, r is the distance, θ is the azimuth angle, φ is the pitch angle, k = ω/c is the wave number, c is the speed of sound, n is the order, and m is the degree. $j_n(\cdot)$ is the spherical Bessel function of order n, $A_{nm}(\omega)$ is the frequency-dependent spherical harmonic coefficient, and $Y_{nm}(\theta,\phi)$ is the spherical harmonic:
$Y_{nm}(\theta,\phi)=\sqrt{\frac{2n+1}{4\pi}\,\frac{(n-m)!}{(n+m)!}}\;P_{nm}(\cos\phi)\,e^{im\theta}$ (2)
where $P_{nm}(\cdot)$ is the associated Legendre function of order n and degree m.
Step 2. Plane wave decomposition in the spherical harmonic domain. The spherical harmonic coefficients $A_{nm}(\omega)$ of the desired sound field can be represented as a weighted combination of a set of plane wave spherical harmonic coefficients:
$A_{nm}(\omega)=\sum_{l=1}^{L_w} w_l\,4\pi i^{n}\,Y_{nm}^{*}(\theta_l,\phi_l)$ (3)
where $L_w$ is the number of plane wave bases, $w_l$ is the weight of the l-th plane wave, $(\theta_l,\phi_l)$ is the direction of the l-th plane wave, and "*" denotes complex conjugation.
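The conjugated spherical harmonic in the plane wave coefficients comes from the standard plane wave expansion. As a sketch, assuming a plane wave written as $e^{i\mathbf{k}_l\cdot\mathbf{r}}$ (the opposite sign convention replaces $i^n$ by $(-i)^n$), with $\hat{\mathbf{r}}$ the unit vector toward the field point and $\hat{\mathbf{k}}_l$ the unit vector of the l-th plane wave direction (both symbols introduced here for illustration):

```latex
e^{i\mathbf{k}_l\cdot\mathbf{r}}
  = 4\pi \sum_{n=0}^{\infty}\sum_{m=-n}^{n} i^{n}\, j_n(kr)\,
    Y_{nm}(\hat{\mathbf{r}})\, Y_{nm}^{*}(\hat{\mathbf{k}}_l)
```

Matching this term by term against an expansion of the sound pressure in $j_n(kr)\,Y_{nm}$ shows that a single unit-amplitude plane wave has coefficients $4\pi i^{n} Y_{nm}^{*}(\hat{\mathbf{k}}_l)$; by superposition, $L_w$ plane waves with weights $w_l$ give the weighted sum of these coefficients.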
Step 3. Considering all orders and degrees, equation (3) is written in matrix form:
Pw=A (4)
where P is an $(N+1)^2 \times L_w$ matrix containing the plane wave spherical harmonic coefficients of all orders and degrees up to the truncation order N, w is an $L_w \times 1$ vector of the plane wave weights for the different directions, and A is the $(N+1)^2 \times 1$ vector of spherical harmonic coefficients of the target sound field.
Step 4. Solve equation (4) for its sparse 1-norm solution:
$\min_{w}\ \|w\|_1 \quad \text{subject to} \quad Pw=A$ (5)
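One way to obtain a sparse 1-norm solution numerically is sketched below. It does not claim to be the solver used in the invention: it swaps in the penalized (LASSO) form 0.5·||Pw − A||² + λ||w||₁ and solves it with the iterative shrinkage-thresholding algorithm (ISTA). The dictionary P is a random complex stand-in rather than true plane wave coefficients, and the sizes, the value of `lam`, and the function name `ista_l1` are all assumptions made for illustration.

```python
import numpy as np

def ista_l1(P, A, lam, n_iter=3000):
    """Minimise 0.5*||P w - A||_2^2 + lam*||w||_1 over complex w using ISTA."""
    step = 1.0 / np.linalg.norm(P, 2) ** 2          # 1 / Lipschitz constant of the gradient
    w = np.zeros(P.shape[1], dtype=complex)
    for _ in range(n_iter):
        z = w - step * (P.conj().T @ (P @ w - A))   # gradient step on the data term
        mag = np.abs(z)
        shrink = np.maximum(1.0 - step * lam / np.maximum(mag, 1e-300), 0.0)
        w = shrink * z                              # complex soft threshold (keeps the phase)
    return w

rng = np.random.default_rng(0)
n_coef, n_waves = 25, 100   # (N+1)^2 with N = 4, and 100 candidate directions (assumed)
# Stand-in dictionary: random complex columns, NOT true plane wave coefficients.
P = rng.standard_normal((n_coef, n_waves)) + 1j * rng.standard_normal((n_coef, n_waves))
w_true = np.zeros(n_waves, dtype=complex)
w_true[[7, 42]] = [1.0, 0.8j]                       # two active plane waves
A = P @ w_true                                      # synthetic target coefficients

lam = 0.01 * np.max(np.abs(P.conj().T @ A))         # small penalty relative to the data
w_hat = ista_l1(P, A, lam)
support = set(np.argsort(np.abs(w_hat))[-2:].tolist())  # indices of the two largest weights
```

The soft threshold acts on the magnitude of each complex weight, so most entries of `w_hat` are driven exactly to zero while the active plane waves keep their phase. A constrained solver (for example a convex-optimization package) could be used instead when the equality form of the problem is required.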
Decoding:
Step 1. Suppose the loudspeaker weight vector d generates a sound field composed of $L_w$ plane waves, written as:
$d=\sum_{l=1}^{L_w} w_l\,x_l = Xw$ (6)
where $x_l$ is the loudspeaker weight vector that generates the l-th plane wave, and $X=[x_1,\dots,x_{L_w}]$ is the generator weight matrix.
Step 2. X can be pre-calculated once the loudspeaker array and the playback acoustic environment are fixed. For the l-th plane wave:
$G\,x_l = p_l$ (7)
where G is the matrix of transfer functions from the loudspeakers to the control points, and $p_l$ is the sound pressure produced at the control points by the l-th plane wave. The solution for $x_l$ is then:
$x_l=(G^{H}G+\lambda_0 I)^{-1}G^{H}p_l$ (8)
where $\lambda_0$ is the regularization parameter, I is the identity matrix, and H denotes the conjugate transpose.
Step 3. Writing the solution of equation (5) as w = f(P, A), the loudspeaker driving function in the higher-order Ambisonics decoding method of the invention is:
d=Xf(P,A) (9)
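The decoding steps can be illustrated with a small free-field simulation. This is a sketch under assumptions that are not in the text: free-field transfer functions $e^{ikR}/(4\pi R)$ replace measured ones, a Fibonacci lattice supplies both the 144 loudspeaker positions and 64 candidate plane wave directions, the control points form an 11 by 11 grid in the z = 0 plane, and the frequency (500 Hz), `lam0`, and the two active weights in `w` are illustrative stand-ins for the measured quantities and for w = f(P, A).

```python
import numpy as np

def fibonacci_sphere(n, radius=1.0):
    """Nearly uniform points on a sphere (Fibonacci lattice, an assumption)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)
    azim = np.pi * (1.0 + 5.0 ** 0.5) * i
    return radius * np.stack([np.sin(polar) * np.cos(azim),
                              np.sin(polar) * np.sin(azim),
                              np.cos(polar)], axis=1)

c, freq = 343.0, 500.0                      # assumed sound speed and frequency
k = 2 * np.pi * freq / c
speakers = fibonacci_sphere(144)            # loudspeakers on a 1 m sphere
ax = np.linspace(-0.25, 0.25, 11)
gx, gy = np.meshgrid(ax, ax)
points = np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)
directions = fibonacci_sphere(64)           # candidate plane wave directions (unit vectors)

# G[m, s]: free-field transfer function from loudspeaker s to control point m.
dist = np.linalg.norm(points[:, None, :] - speakers[None, :, :], axis=2)
G = np.exp(1j * k * dist) / (4 * np.pi * dist)
# Column l: pressure of the l-th unit plane wave at the control points.
P_pw = np.exp(1j * k * (points @ directions.T))

# Regularized least squares for every plane wave at once:
# X = (G^H G + lam0 I)^-1 G^H P_pw, one column x_l per plane wave.
lam0 = 1e-6
GH = G.conj().T
X = np.linalg.solve(GH @ G + lam0 * np.eye(G.shape[1]), GH @ P_pw)

# Sparse weights standing in for w = f(P, A): two active plane waves.
w = np.zeros(directions.shape[0], dtype=complex)
w[[3, 20]] = [1.0, 0.5 - 0.5j]
d = X @ w                                   # loudspeaker driving signals, d = X w

target = P_pw @ w                           # desired pressure at the control points
rel_err = np.linalg.norm(G @ d - target) / np.linalg.norm(target)
```

Because X depends only on the loudspeaker layout and the playback environment, it is computed once; at playback time each new target field costs only the sparse solve for w and one matrix-vector product, and only the columns of X with nonzero weights contribute to d. `rel_err` reports the relative reproduction error at the control points for this simulated case.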