EP2757811A1 - Modal beamforming - Google Patents
- Publication number: EP2757811A1
- Application number: EP13152209.6A
- Authority: EP (European Patent Office)
- Prior art keywords
- eigenbeam
- white noise
- regularization
- function
- parameter
- Prior art date
- Legal status: Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
- H04R2201/00—Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
- H04R2201/40—Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
- H04R2201/401—2D or 3D arrays of transducers
Definitions
- The ambisonic components up to Mth order can be calculated from the Q microphone signals:
- B = W⁻¹ (YᵀY)⁻¹ Yᵀ pa
- B = diag{Wm}⁻¹ Y⁺ pa
- diag{EQm(ka)}: diagonal matrix having the radial equalizing functions EQm(ka), in which 0 ≤ m ≤ M.
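The pseudoinverse extraction above can be illustrated with a short NumPy sketch. This is only a numerical illustration of the formula B = diag{Wm}⁻¹ Y⁺ pa, not the patent's implementation; the matrix shapes and the per-component weight vector are assumptions of this sketch.

```python
import numpy as np

def ambisonic_components(p_a, Y, w):
    """Extract ambisonic components B from Q pressure samples p_a.

    p_a: (Q,) pressure samples on the sphere at one frequency
    Y:   (Q, N) spherical harmonics sampled at the Q microphone
         positions, with N components up to order M
    w:   (N,) radial weights W_m, one entry per component (entries
         belonging to the same order m share the same value)
    """
    # Least-squares solve applies the pseudoinverse Y+ = (Y^T Y)^-1 Y^T
    y_plus_p, *_ = np.linalg.lstsq(Y, p_a, rcond=None)
    # B = diag{W_m}^-1 Y+ p_a
    return y_plus_p / w
```

With more microphones than components (Q > N) the least-squares problem is overdetermined, which is the usual operating point of a spherical array.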
- An arrangement for extracting the N ambisonic components B from the wave field pa is illustrated in FIG. 4.
- The related sound field is defined solely by the pressure distribution pa(θq, ϕq) on the sphere's surface, which can be easily measured by sound pressure sensors (microphones).
- Inner sources, i.e., sources inside the measurement sphere, and outer sources, i.e., sources outside the measurement sphere, can be distinguished. The outer sources serve to model the scattered field occurring at the surface of a scattering sphere.
- ∇: Nabla operator expressed in spherical coordinates.
- A parameter called susceptibility K(ω), or its reciprocal, the white noise gain WNG(ω), may be used.
- The white noise gain WNG(ω) captures most effects and problems caused by microphone noise, changes in the transfer function, and variations of the microphone positions, so that it is representative of the sensitivity of the beamformer.
- A white noise gain WNG(ω) > 0 [dB] characterizes sufficient suppression of uncorrelated errors and is thus indicative of robust system behavior, while a white noise gain WNG(ω) < 0 [dB] indicates an amplification of the noise and is therefore indicative of increasingly unstable system behavior.
- The array gain G(ω) is the ratio of the energy of sound coming from the look direction of the beamformer to the energy of omnidirectionally incoming sound.
- The array gain G(ω) is thus a measure of the improvement in the acoustic signal-to-noise ratio (SNR) based on the directivity of the modal beamformer for sound coming from the look direction.
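Both quantities can be computed from a beamformer weight vector using the standard textbook expressions WNG = |wᴴd|² / (wᴴw) and G = |wᴴd|² / (wᴴΓw), where d is the steering vector and Γ the noise coherence matrix. The sketch below uses these generic forms, not the patent's modal-domain expressions in terms of EQm and Tm.

```python
import numpy as np

def wng_db(w, d):
    """White noise gain of beamformer weights w for steering vector d.

    WNG = |w^H d|^2 / (w^H w); values above 0 dB indicate that
    uncorrelated sensor self-noise is suppressed (robust behavior).
    """
    num = np.abs(np.vdot(w, d)) ** 2
    den = np.real(np.vdot(w, w))
    return 10.0 * np.log10(num / den)

def array_gain_db(w, d, gamma):
    """Array gain against a noise field with coherence matrix gamma.

    For a spherically isotropic (diffuse) field, this is the ratio of
    the energy from the look direction to omnidirectional energy.
    """
    num = np.abs(np.vdot(w, d)) ** 2
    den = np.real(np.vdot(w, gamma @ w))
    return 10.0 * np.log10(num / den)
```

For a simple delay-and-sum beamformer with Q microphones, both measures evaluate to 10·log10(Q) dB against spatially white noise.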
- parameters required for calculation are set to a starting value or a constant value, as the case may be.
- The following parameters may be set, for instance, to:
- Regularization provides the ability to achieve a robust system by adjusting the regularization parameter β(ω). This is a trade-off between higher robustness, i.e., a higher white noise gain WNGdB(ω), and less directivity in the look direction Ψ(θ0, ϕ0, ω), i.e., a decreasing array gain GdB(ω).
- The adaptation process begins with the maximum directivity GdBMax(ω), which is then decreased by increasing the regularization parameter β(ω) until the desired white noise gain threshold WNGdBMin is no longer undercut.
- Steps 4, 5, and 6 serve to calculate the white noise gain WNGdB(ω).
- In step 4, the regularization filter Tm(ω), or Tm(ka), is calculated as outlined above using the regularization parameter β(ω).
- In step 5, the transfer function EQm(ω) is calculated as outlined above using the current version of the transfer function Tm(ω) of the regularization filter or the current version of the regularization parameter β(ω).
- In step 6, the white noise gain WNGdB(ω) is calculated as outlined above using the transfer function EQm(ω) and the current version of the transfer function Tm(ω) of the regularization filter (regularization function). Steps 4 and 5 may be taken simultaneously or in opposite order.
- In step 10, the directivity Ψ(θ0, ϕ0, ω) of the modal beamformer is calculated for sound coming from the look direction using the transfer function EQm(ω) provided in step 5.
- In step 12, the current white noise gain WNGdB(ω) is compared with the predetermined white noise gain threshold WNGdBMin(ω), and it is checked whether the regularization parameter β(ω) has reached its maximum.
- The process continues with step 14 if the adaptation process for the current angular frequency ω has been completed, i.e., if the current equalizing function EQm(ω) has been limited to the given threshold or if the current regularization parameter has reached its maximum.
- In step 14, the current angular frequency ω is checked to see whether it has reached its maximum value ωMax. If ω < ωMax, the process jumps back to step 2 using the current angular frequency ω. Otherwise, i.e., if the equalizing filter has been adapted for the complete set of frequencies, the filter coefficients are output in step 15.
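The adaptation loop described in the steps above can be sketched per frequency as follows. The function and parameter names, and the Tikhonov-style way in which β enters the equalizer, are illustrative assumptions of this sketch rather than the exact scheme of the flow chart.

```python
import numpy as np

def adapt_beta(b_m, wng_db_fn, wng_db_min=-10.0, beta_max=1.0, step=1e-4):
    """For one angular frequency, raise the regularization parameter
    beta from 0 (maximum directivity) until the white noise gain no
    longer undercuts wng_db_min, or until beta reaches its maximum.

    b_m:        complex modal responses at this frequency
    wng_db_fn:  callable mapping the equalizer functions EQ_m to the
                resulting white noise gain in dB
    """
    beta = 0.0
    while True:
        # Tikhonov-style regularized equalizer (illustrative form)
        eq_m = np.conj(b_m) / (np.abs(b_m) ** 2 + beta)
        if wng_db_fn(eq_m) >= wng_db_min or beta >= beta_max:
            break
        beta += step
    return beta, eq_m
```

Running this once per frequency bin yields the frequency-dependent regularization parameter β(ω) of the kind shown in FIG. 9.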
- The directivity characteristic of the beamformer is a 4th-order cardioid, and the minimum white noise gain WNGdB(ω) used in the adaptation process is -10 [dB].
- FIG. 9 illustrates the regularization parameter over frequency β(ω) for a common 4th-order modal beamformer.
- Regularization, i.e., limiting the maximum directivity index, applies for frequencies up to, for instance, 750 [Hz].
- In this way, values above a lower threshold WNGdBMin of -10 [dB] may be maintained.
- The exemplary beamformer exhibits the desired directivity of a 4th-order cardioid.
- FIG. 10 illustrates the corresponding white noise gain WNG for the above-mentioned 4th-order beamformer, which supports the findings in connection with the diagram of FIG. 9.
- The corresponding directivity index DI and the array gain GdB(ω), as shown in FIG. 11, illustrate that the maximum array gain GdB(ω) remains below approximately 10 [dB], depending on the frequency.
- FIG. 16 depicts the resulting directivity of the beamformer outlined above in the look direction Ψ(θ0, ϕ0, ω) as amplitudes over frequency.
Description
- The embodiments disclosed herein refer to sound capture systems and methods, particularly to sound capture methods that employ modal beamforming.
- Beamforming sound capture systems comprise at least (a) an array of two or more microphones and (b) a beamformer that combines audio signals generated by the microphones to form an auditory scene representative of at least a portion of an acoustic sound field. Due to the underlying geometry, it is natural to represent the sound field captured on the surface of a sphere with respect to spherical harmonics. In this context, spherical harmonics are also known as acoustic modes (or eigenbeams) and the appending signal-processing techniques as modal beamforming.
- Two spherical microphone array configurations are commonly employed: the sphere may exist physically, or may merely be conceptual. In the first configuration, the microphones are arranged around a rigid sphere made of, for example, wood or hard plastic. In the second configuration, the microphones are arranged in free-field around an "open" sphere, referred to as an open-sphere configuration. Although the rigid-sphere configuration provides a more robust numerical formulation, the open-sphere configuration might be more desirable in practice at low frequencies where large spheres are realized.
- Beamforming techniques allow for the controlling of the characteristics of the microphone array in order to achieve a desired directivity. One of the most general formulations is the filter-and-sum beamformer, which has readily been generalized by the concept of modal subspace decomposition. This approach finds optimum finite impulse response (FIR) filter coefficients for each microphone by solving an eigenvalue problem and projecting the desired beam pattern onto the set of eigenbeam patterns found.
- Beamforming sound capture systems enable picking up acoustic signals dependent on their direction of propagation. The directional pattern of the microphone array can be varied over a wide range due to the degrees of freedom offered by the plurality of microphones and the processing of the associated beamformer. This enables, for example, steering the look direction, adapting the pattern according to the actual acoustic situation, and/or zooming in to or out from an acoustic source. All this can be done by controlling the beamformer, which is typically implemented via software, such that no mechanical alteration of the microphone array is needed. However, common beamformers fail to be directive at very low frequencies. Therefore, modal beamformers having less frequency-dependent directivity are desired.
- A method for generating an auditory scene comprises: receiving eigenbeam outputs generated by decomposing a plurality of audio signals, each audio signal having been generated by a different microphone of a microphone array, wherein each eigenbeam output corresponds to a different eigenbeam for the microphone array; generating the auditory scene based on the eigenbeam outputs and their corresponding eigenbeams, wherein generating the auditory scene comprises applying a weighting value to each eigenbeam output to form a steered eigenbeam output; and combining the weighted eigenbeams to generate the auditory scene, wherein generating the auditory scene further comprises applying a regularized equalizer filter to each eigenbeam output or steered eigenbeam output, the regularized equalizer filter(s) being configured to compensate for acoustic deficiencies of the microphone array and having a regularized equalization function.
- A modal beamformer system for generating an auditory scene comprises: a steering unit that is configured to receive eigenbeam outputs, the eigenbeam outputs having been generated by decomposing a plurality of audio signals, each audio signal having been generated by a different microphone of a microphone array, wherein each eigenbeam output corresponds to a different eigenbeam for the microphone array and the microphones are arranged on a rigid or open sphere; a weighting unit that is configured to generate the auditory scene based on the eigenbeam outputs and their corresponding eigenbeams, wherein generating the auditory scene comprises applying a weighting value to each eigenbeam output to form a steered eigenbeam output; and a summing element configured to combine the weighted eigenbeams to generate the auditory scene, wherein the weighting unit or the summing element are further configured to apply a regularized equalizer filter to each eigenbeam output or steered eigenbeam output, the regularized equalizer filter(s) being configured to compensate for acoustic deficiencies of the microphone array and having a regularized equalization function.
- The figures identified below are illustrative of some embodiments of the invention. The figures are not intended to limit the invention recited in the appended claims. The embodiments, both as to their organization and manner of operation, together with further objects and advantages thereof, may best be understood with reference to the following description, taken in connection with the accompanying drawings, in which:
- FIG. 1 is a schematic representation of a generalized structure of a sound capture system that employs modal beamforming;
- FIG. 2 is a schematic representation of a possible microphone array for the sound capture system of FIG. 1;
- FIG. 3 is a schematic representation of a more detailed structure of a sound capture system that employs modal beamforming;
- FIG. 4 is a schematic representation of an arrangement for extracting ambisonic components with which an arbitrary sound field can be coded and/or decoded;
- FIG. 5 is a schematic representation of an arrangement for measuring a sound pressure field;
- FIG. 6 is a schematic diagram illustrating the radial function of a spherical microphone array;
- FIG. 7 is a schematic diagram illustrating the magnitude frequency response of the equalizer filter corresponding to the radial function illustrated in FIG. 6;
- FIG. 8 is a flow chart illustrating the process of calculating the equalizer filter referred to above in connection with FIG. 7;
- FIG. 9 is a schematic diagram illustrating the regularization parameter over frequency for an improved 4th-order modal beamformer with a given minimal white noise gain of -10 [dB];
- FIG. 10 is a schematic diagram corresponding to the flow chart of FIG. 8 and the diagram of FIG. 9, and illustrating the white noise gain for a 4th-order modal beamformer utilizing a regularized equalizing filter;
- FIG. 11 is a schematic diagram corresponding to the flow chart of FIG. 8 and the diagram of FIG. 9, and illustrating the directivity index for a 4th-order modal beamformer utilizing a regularized equalizing filter;
- FIG. 12 is a schematic diagram illustrating the magnitude frequency response of the improved regularized equalizing filter;
- FIG. 13 is a schematic diagram illustrating the corresponding phase response of the improved filter of FIG. 12;
- FIG. 14 is a schematic diagram illustrating the magnitude frequency response of an improved, regularized equalizing filter;
- FIG. 15 is a schematic diagram illustrating the corresponding phase frequency response of the improved filter of FIG. 14; and
- FIG. 16 is a schematic diagram illustrating the cylindrical view of the directional pattern of the improved 4th-order modal beamformer over frequency.
FIG. 1 is a block diagram illustrating the basic structure of a beamforming sound capture system as described in more detail, for instance, in WO 03/061336. The sound capture system comprises a plurality Q of microphones Mic1, Mic2, ... MicQ configured to form a microphone array, a matrixing unit MU (also known as a modal decomposer or eigenbeam former), and a modal beamformer BF. As shown in FIG. 1, modal beamformer BF comprises a steering unit SU, a weighting unit WU, and a summing element SE, each of which will be discussed in further detail later in this specification. Each microphone Mic1, Mic2, ... MicQ generates a time-varying analog or digital audio signal S1(θ1,ϕ1,ka), S2(θ2,ϕ2,ka) ... SQ(θQ,ϕQ,ka) corresponding to the sound incident at the location of that microphone.
- Matrixing unit MU decomposes (according to Y⁺ = (YᵀY)⁻¹Yᵀ) the audio signals S1(θ1,ϕ1,ka), S2(θ2,ϕ2,ka) ... SQ(θQ,ϕQ,ka) generated by the different microphones Mic1, Mic2, ... MicQ to generate a set of spherical harmonics Y+1 0,0(θ,ϕ), Y+1 1,0(θ,ϕ), ... Y+σ m,n(θ,ϕ), also known as eigenbeams or modal outputs, where each spherical harmonic corresponds to a different mode for the microphone array. The spherical harmonics are then processed by beamformer BF to generate an auditory scene that is represented in the present example by output signal OUT (= Ψ(θDes,ϕDes)). In this specification, the term auditory scene is used generically to refer to any desired output from a sound capture system, such as the system of
FIG. 1. The definition of the particular auditory scene will vary from application to application. For example, the output generated by beamformer BF may correspond to one or more output signals, e.g., one for each speaker used to generate the resultant auditory scene. Moreover, depending on the application, beamformer BF may simultaneously generate beampatterns for two or more different auditory scenes, each of which can be independently steered to any direction in space. In certain implementations of the sound capture system, microphones Mic1, Mic2, ... MicQ may be mounted on the surface of an acoustically rigid sphere or may be arranged on a virtual (open) sphere to form the microphone array. Alternatively, weighting unit WU may be arranged upstream of steering unit SU so that the non-steered eigenbeams are weighted (not shown).
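The decomposition performed by matrixing unit MU, i.e., applying the pseudoinverse Y⁺ to the microphone signals, can be sketched with NumPy and SciPy. The use of scipy.special.sph_harm (with its azimuth-before-polar argument convention) and the column ordering of the harmonics are assumptions of this sketch, not details taken from the patent.

```python
import numpy as np
from scipy.special import sph_harm

def eigenbeam_matrix(theta, phi, order):
    """Build the Q x N spherical-harmonic matrix Y for microphones at
    polar angles theta and azimuths phi, up to the given order, with
    N = (order + 1)**2 components.
    """
    cols = [sph_harm(m, n, phi, theta)   # scipy expects (m, n, azimuth, polar)
            for n in range(order + 1)
            for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)

def eigenbeam_outputs(signals, Y):
    """Decompose microphone signals into eigenbeam (modal) outputs by
    applying the (conjugate) pseudoinverse of Y."""
    y_plus = np.linalg.pinv(Y)
    return y_plus @ signals
```

A sound field consisting of a single spherical harmonic sampled at sufficiently many microphone positions is recovered as a single nonzero modal output.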
FIG. 2 shows a schematic diagram of a possible microphone array MA for the sound capture system of FIG. 1. In particular, microphone array MA comprises the Q microphones Mic1, Mic2, ... MicQ of FIG. 1 mounted on the surface of an acoustically rigid sphere RS in a "truncated icosahedron" pattern. Each microphone Mic1, Mic2, ... MicQ in microphone array MA generates one of the audio signals S1(θ1,ϕ1,ka), S2(θ2,ϕ2,ka) ... SQ(θQ,ϕQ,ka) that is transmitted to matrixing unit MU of FIG. 1 via some suitable (e.g., wired or wireless) connection (not shown in FIG. 2). The continuous spherical sensor may be replaced by a discrete spherical array, in particular when the subsequent processing is digital-signal processing.
- Referring again to
FIG. 1, beamformer BF exploits the geometry of the spherical array of FIG. 2 and relies on the spherical harmonic decomposition of the incoming sound field by matrixing unit MU to construct a desired spatial response. In beamformer BF, steering unit SU generates (according to Y+σ m,n(θDes,ϕDes)) steered spherical harmonics Y+1 0,0(θDes,ϕDes), Y+1 1,0(θDes,ϕDes), ... Y+σ m,n(θDes,ϕDes) from the spherical harmonics Y+1 0,0(θ,ϕ), Y+1 1,0(θ,ϕ), ... Y+σ m,n(θ,ϕ), which are further processed by weighting unit WU and summing element SE. Beamformer BF can provide continuous steering of the beampattern in 3-D space by changing a few scalar multipliers, while the filters determining the beampattern itself remain constant. The shape of the beampattern is invariant with respect to the steering direction. Beamformer BF needs only one filter per spherical harmonic (in weighting unit WU), rather than one per microphone as in known beamforming concepts, which significantly reduces the computational cost.
- The sound capture system of
FIG. 1 with the spherical array geometry of FIG. 2 enables accurate control over the beampattern in 3-D space. In addition to pencil-like beams, the sound capture system can also provide multi-direction beampatterns or toroidal beampatterns giving uniform directivity in one plane. These properties can be useful for applications such as general multichannel speech pick-up, video conferencing, and direction of arrival (DOA) estimation. It can also be used as an analysis tool for room acoustics to measure, e.g., directional properties of the sound field. The sound capture system of FIG. 1 offers another advantage: it supports decomposition of the sound field into mutually orthogonal components, the eigenbeams (i.e., spherical harmonics), which can also be used to reproduce the sound field. The eigenbeams are also suitable for wave field synthesis (WFS) methods that enable spatially accurate sound reproduction in a fairly large volume, allowing for reproduction of the sound field present around the recording sphere. This allows for all kinds of general real-time spatial audio.
- A circuit that provides the beamforming functionality is shown in detail in
FIG. 3. The modal beamformer circuit of FIG. 3 receives the Q audio signals S1(θ1,ϕ1,ka), S2(θ2,ϕ2,ka) ... SQ(θQ,ϕQ,ka) provided by microphones Mic1, Mic2, ... MicQ, transforms the audio signals into the spherical harmonics Y+1 0,0(θ,ϕ), Y+1 1,0(θ,ϕ), ... Y+σ m,n(θ,ϕ), and steers the spherical harmonics. The circuit of FIG. 3 may be realized by hardware (and software) components that (together) build matrixing unit MU and the modal beamformer, which includes steering unit SU, modal weighting unit WU, and summing element SE. Matrixing unit MU and steering unit SU include coefficient elements CE that multiply the respective input signals with given coefficients and adders AD that sum up the coefficient-weighted input signals so that the audio signals S1(θ1,ϕ1,ka), S2(θ2,ϕ2,ka) ... SQ(θQ,ϕQ,ka) are decomposed into the eigenbeams, i.e., the spherical harmonics Y+1 0,0(θ,ϕ), Y+1 1,0(θ,ϕ), ... Y+σ m,n(θ,ϕ), which are then processed to provide the steered spherical harmonics Y+1 0,0(θDes,ϕDes), Y+1 1,0(θDes,ϕDes), ... Y+σ m,n(θDes,ϕDes). Modal weighting unit WU includes delay elements DE, coefficient elements CE, and adders AD, which are connected to form FIR filters for weighting. The output signals of these FIR filters are summed up by summing element SE.
- Matrixing unit MU in the modal beamformer of
FIG. 3 is responsible for decomposing the sound field picked up by microphones Mic1, Mic2, ... MicQ into the different eigenbeam outputs, i.e., the spherical harmonics Y+1 0,0(θ,ϕ), Y+1 1,0(θ,ϕ), ... Y+σ m,n(θ,ϕ), corresponding to the zero-order, first-order, and second-order spherical harmonics. This can also be seen as a transformation in which the sound field is transformed from the time or frequency domain into the "modal domain". To simplify a time-domain implementation, one can also work with the real and imaginary parts of the spherical harmonics. This results in real-valued coefficients, which are more suitable for a time-domain implementation. If the sensitivity equals the imaginary part of a spherical harmonic, then the beampattern of the corresponding array factor will also be the imaginary part of this spherical harmonic. Steering unit SU allows for steering the look direction by the angles θDes and ϕDes. Weighting unit WU compensates for the frequency-dependent sensitivity of the modes (eigenbeams), i.e., it applies modal weighting over frequency, to the effect that the modal composition is adjusted, e.g., equalized. Equalizing is used to compensate for deficiencies of the microphone array, e.g., self-noise of the microphones, location errors of the microphones at the surface of the sphere, and other electrical and mechanical drawbacks. Summing element SE performs the actual beamforming for the sound capture system by summing up the weighted harmonics to yield the beamformer output OUT = ψ(θDes, ϕDes), i.e., the auditory scene. - Due to self-noise amplification, the order of a modal beamformer has to be reduced toward low frequencies, leading to a gradually decreasing directivity with decreasing frequency. 
Regularization of the radial filter is configured such that, for example, the white noise gain does not fall below a given limit (e.g., WNGdBMin = -10 [dB] (±3 [dB])) in order to keep the robustness, i.e., the self-noise amplification, within a tolerable range, while a constant transfer function in look direction over frequency, such as 0 [dB], is maintained. In this way, an optimum balance between robustness and directivity is achieved, leading to a modal beamformer with enhanced properties: the directivity is maximized while the transfer function in look direction is kept at a frequency-independent constant value and the robustness does not fall below a minimum threshold. Regularization may be achieved by adapting the weighting coefficients of the FIR filters in weighting unit WU to an optimum.
- But before going into detail on the regularization process, some general issues are discussed, in particular issues with regard to the measurement of the acoustic wave field via a rigid spherical microphone array. In general, the sound pressure values pa(θq, ϕq) at the positions θq, ϕq of the Q microphones located at radius a, in which 1 ≤ q ≤ Q, can be described by way of the Fourier-Bessel series truncated to the Mth order as follows: pa(θq,ϕq) = Σm=0...M Wm(ka) Σn=0...m Σσ=±1 Bσ m,n Yσ m,n(θq,ϕq),
in which: pa(θq, ϕq) is the sound pressure measured by the qth microphone located at position θq, ϕq on the surface of a sphere having a radius a; Wm(ka) is the radial function that describes the acoustic wave field in the vicinity of the sphere center, i.e., at a certain distance from the center; and Bσ m,n is the complex, mth-order, nth-degree ambisonic component, the set of which completely describes wave fields up to the Mth order.
- An arrangement for extracting the N ambisonic components B from the wave field pa is illustrated in
FIG. 4 . The room and, thus, the spherical harmonics Y+1 0,0(θ,ϕ), Y+1 1,0(θ,ϕ), ... Y+σ m,n(θ,ϕ) are sampled by way of matrix Y+ at the position(s) θq, ϕq with the Q microphones, in which:
and
so that the N = (M+1)² ambisonic components of Mth order can be calculated from the samples. - Combining the Q microphone signals (1 ≤ q ≤ Q), i.e., S1(θ1,ϕ1,ka), S2(θ2,ϕ2,ka) ... SQ(θQ,ϕQ,ka), by way of matrix Y+ into N output signals, which correspond to signals that would have been obtained if the wave field had been sampled with N microphones having a certain directivity, can be seen as a transformation from the time domain into the spatial domain. The spherical harmonic signals generated in this way are then weighted by way of a radial equalizing function EQm(ka) to provide frequency-independent, normalized-to-1 ambisonic components Bσ m,n, i.e., the ambisonic signals B.
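The matrixing operation described above can be sketched as follows. This is a minimal illustration only: it assumes SciPy's complex spherical-harmonic convention, and the microphone positions and signals are random placeholders, not the patent's array geometry.

```python
# Sketch: build the sampled spherical-harmonics matrix Y for Q microphone
# positions and decompose the Q pressure signals into N = (M+1)^2 eigenbeam
# (ambisonic) signals via the pseudoinverse Y+ = pinv(Y).
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order_M, theta, phi):
    """Complex spherical harmonics sampled at the mic positions.

    theta: polar angles (0..pi), phi: azimuth angles (0..2*pi), both length Q.
    Returns a (Q, (M+1)^2) matrix with one column per harmonic Y_{m,n}.
    """
    cols = []
    for m in range(order_M + 1):
        for n in range(-m, m + 1):
            # SciPy argument order: sph_harm(order, degree, azimuth, polar)
            cols.append(sph_harm(n, m, phi, theta))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
Q, M = 32, 4                               # 32 microphones, 4th-order decomposition
theta = np.arccos(rng.uniform(-1, 1, Q))   # quasi-uniform sampling of the sphere
phi = rng.uniform(0, 2 * np.pi, Q)

Y = sh_matrix(M, theta, phi)               # (Q, 25)
Y_plus = np.linalg.pinv(Y)                 # the matrixing operation "Y+"

S = rng.standard_normal(Q)                 # one snapshot of the Q microphone signals
B = Y_plus @ S                             # N = (M+1)^2 eigenbeam outputs
print(B.shape)                             # (25,)
```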
- Referring now to
FIG. 5, the derivation of the radial function Wm(ka) of a rigid closed sphere with microphones arranged on the sphere's surface can be described as follows: at the surface of a rigid closed sphere, the velocity va is zero, i.e., va(θq,ϕq,ka) = 0. - Therefore, the related sound field is defined solely by the pressure distribution pa(θq, ϕq) on the sphere's surface, which can easily be measured by sound pressure sensors (microphones). Mathematically, the underlying, physically motivated condition that va(θq,ϕq,ka) = 0 holds at the surface of a rigid body can be met when inner sources (i.e., sources inside the measurement sphere) and outer sources (i.e., sources outside the measurement sphere) are superposed, as illustrated in
FIG. 5. For instance, the inner sources serve to model the scattered field occurring at the surface of the rigid sphere. The derivation starts from the general form of the Fourier-Bessel series.
- Furthermore, it is required that the velocity at the sphere's surface, i.e., at r = a, is zero:
VInc(θq,Φq,ka) + VScat(θq,Φq,ka) = 0 or
VScat(θq,Φq,ka) = - VInc(θq,Φq,ka), in which
VInc(θq,Φq,ka) = velocity at the qth microphone at position (θq,Φq) caused by the plane wave from the outer source, and
VScat(θq,Φq,ka) = velocity at the qth microphone at position (θq,Φq) caused by the spherical waves from the inner sources.
- The Euler equation links the sound velocity v(θq,Φq,ka) to the sound pressure p(θq,Φq,ka); moreover, both sound velocity v(θq,Φq,ka) and sound pressure p(θq,Φq,ka) can be expressed as weighted sums of spherical harmonics according to the Fourier-Bessel series:
so that the following relationship between sound velocity v(θq,Φq,ka) and sound pressure p(θq,Φq,ka) at the surface of a rigid sphere applies:
- From the two previous equations, a simplified relationship can be provided for the sound pressure pscat(θq,Φq,ka) that results from the sound field of the spherical waves radiated by the inner sound sources and that can be measured on the sphere's surface (r = a) at the positions (θq,Φq) where the Q pressure sensors (microphones) are arranged, thereby neglecting the constants jρck and 4π:
- An accordingly calculated magnitude frequency response for the radial functions wm(ka)=1/EQm(ka) for a sphere radius of a=0.9m in a spectral range of 50Hz to 6700Hz for orders m up to M=10 is shown in
FIG. 6. The corresponding radial equalizing function EQm(ka) for orders m up to M=4 is depicted in FIG. 7. The equations outlined above provide a least-squares solution that offers the smallest mean-squared error, but they cannot be used per se in connection with small or very small wm(ka) values. Such small values, however, occur at higher orders m and/or lower frequencies f, so that instabilities may arise due to amplified noise of the sensors or the measurement system, positioning errors of the microphones, or irregularities in the frequency characteristic, which may deteriorate the results.
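The radial function of a rigid sphere is commonly written in terms of spherical Bessel and Hankel functions. Since the patent's equation images are not reproduced here, the following sketch uses the standard rigid-sphere mode-strength formula b_m(ka) = j_m(ka) - (j_m'(ka)/h_m'(ka))·h_m(ka) as an assumption for the role of wm(ka):

```python
# Sketch (not the patent's exact formula): rigid-sphere radial ("mode strength")
# function; its inverse is the ideal radial EQ filter, which explodes where the
# mode strength is small (high orders m at low ka).
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def rigid_sphere_mode_strength(m, ka):
    """m-th order radial function of a rigid sphere (second-kind Hankel convention)."""
    jm = spherical_jn(m, ka)
    jm_d = spherical_jn(m, ka, derivative=True)
    # spherical Hankel function of the second kind and its derivative
    hm = spherical_jn(m, ka) - 1j * spherical_yn(m, ka)
    hm_d = spherical_jn(m, ka, derivative=True) - 1j * spherical_yn(m, ka, derivative=True)
    return jm - (jm_d / hm_d) * hm

ka = np.linspace(0.05, 12.0, 500)   # ka = 2*pi*f*a/c over the audio band
w = np.array([rigid_sphere_mode_strength(m, ka) for m in range(5)])  # orders 0..4
print(np.abs(w).shape)              # (5, 500); higher orders collapse toward zero at small ka
```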
- If ε = 0, the system works as a least-squares beamformer (the ideal case shown above, i.e., without any regularization), which leads to the solution with the highest directivity but also the least robustness. If ε → ∞, the system works as a delay-and-sum beamformer, which delivers the maximum possible robustness but the least directivity. With the regularization function Tm(ka) = |wm(ka)|² / (|wm(ka)|² + ε), the radial equalizing functions EQm(ka) can be further simplified to read as EQm(ka) = Tm(ka)/wm(ka) = wm*(ka) / (|wm(ka)|² + ε).
- Thus, with the regularization parameters ε(ka) or ε(ω) one can control the modal beamformer to exhibit a certain robustness with respect to the inherent noise that is amplified where wm(ka) is small, in particular at lower frequencies.
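The regularization described by the claims (Tm = |wm|²/(|wm|² + ε) and EQm = Tm/wm) can be sketched as follows; the wm values below are illustrative placeholders, not measured radial functions:

```python
# Sketch of the Tikhonov-style regularized radial EQ: the regularization
# function T_m caps the inversion gain where the mode strength w_m is tiny.
import numpy as np

def regularized_eq(w_m, eps):
    """Regularized inverse of the radial function: conj(w)/(|w|^2 + eps)."""
    t_m = np.abs(w_m) ** 2 / (np.abs(w_m) ** 2 + eps)   # regularization function T_m
    return t_m / w_m                                     # equals conj(w_m)/(|w_m|^2 + eps)

w_m = np.array([1.0, 0.1, 1e-3, 1e-6], dtype=complex)    # mode strengths, large to tiny
print(np.abs(regularized_eq(w_m, eps=0.0)))    # plain inversion: gain explodes to 1e6
print(np.abs(regularized_eq(w_m, eps=1e-4)))   # regularized: worst-case gain stays bounded
```

Note the trade-off: ε = 0 reproduces the pure least-squares inverse, while increasing ε bounds the maximum gain (and hence the self-noise amplification) at the cost of attenuating the weak modes.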
- In order to calculate appropriate values for the regularization parameters ε(ka) or ε(ω), a parameter called susceptibility K(ω) or its reciprocal white noise gain WNG(ω) may be used. For instance, white noise gain WNG(ω) addresses most effects and problems caused by microphone noise, changes in the transfer function, and variations of the microphone positions, so that it is representative of the sensitivity of the beamformer. A white noise gain WNG(ω) > 0 [dB] characterizes a sufficient suppression of uncorrelated errors and is thus indicative of a robust system behavior, while a white noise gain WNG(ω) < 0 [dB] is indicative of an amplification of the noise and is therefore indicative of an increasingly unstable system behavior.
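As a sketch of how the white noise gain can be evaluated, the generic array-processing definition WNG = |wᴴd|²/(wᴴw) is assumed here (d being the look-direction steering vector and w the beamforming weights); this is not the patent's exact modal-domain expression:

```python
# Sketch of the white-noise-gain check used to steer the regularization.
import numpy as np

def wng_db(w, d):
    """White noise gain in dB; > 0 dB suppresses uncorrelated sensor noise."""
    num = np.abs(np.vdot(w, d)) ** 2
    den = np.real(np.vdot(w, w))
    return 10.0 * np.log10(num / den)

Q = 32
d = np.ones(Q, dtype=complex)      # steering vector for a broadside look direction
w_ds = d / Q                        # delay-and-sum weights: maximum robustness
print(round(wng_db(w_ds, d), 2))    # 10*log10(Q) ≈ 15.05 dB for Q = 32
```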
- The array gain G(ω) is the ratio of the energy of sound coming from the look direction of the beamformer to the energy of omnidirectionally incoming sound.
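This ratio can be sketched for omnidirectional sensors in a free field using a diffuse-field coherence matrix Γij = sin(k·dij)/(k·dij); the definition and the two-microphone example below are generic illustrations, not taken from the patent:

```python
# Sketch of the array gain: look-direction output energy over the output
# energy under spherically isotropic (diffuse) sound.
import numpy as np

def array_gain_db(w, d_look, positions, k):
    """w: weights, d_look: steering vector, positions: (Q,3) in m, k: wavenumber."""
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    gamma = np.sinc(k * dist / np.pi)     # diffuse-field coherence sin(kd)/(kd)
    num = np.abs(np.vdot(w, d_look)) ** 2
    den = np.real(np.conj(w) @ gamma @ w)
    return 10.0 * np.log10(num / den)

# Two widely spaced omnidirectional mics with delay-and-sum weights: the
# diffuse-field coherence is nearly zero, so the gain approaches 3 dB.
pos = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
k = 2 * np.pi * 1000.0 / 343.0            # wavenumber at 1 kHz, c = 343 m/s
g = array_gain_db(np.array([0.5, 0.5]), np.array([1.0, 1.0]), pos, k)
print(round(g, 1))
```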
- For instance, when M = 4, the achievable maximum array gain GdBmax(ω) is approximately 14 dB.
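The approximately 14 dB follow from the maximum array gain of an Mth-order spherical harmonic beamformer, Gmax = (M+1)², a standard result stated here as an assumption since the corresponding equation images are not reproduced:

```latex
G_{\max}(\omega) = (M+1)^2, \qquad
G_{dB\max}(\omega) = 10\log_{10}\!\left[(M+1)^2\right] = 20\log_{10}(M+1)
\;\;\overset{M=4}{=}\;\; 20\log_{10} 5 \approx 13.98\ \mathrm{dB}.
```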
- Referring now to
FIG. 8, an exemplary iterative process of adapting the parameters of a modal beamformer is described in detail. In an initializing step 1, parameters required for calculation are set to a starting value or a constant value, as the case may be. The following parameters may, for instance, be set as follows: - WNG parameter
- Minimum white noise gain threshold WNGdBMin(ω), which is not undercut by the regularized modal beamformer; for instance, WNGdBMin = -10 [dB]. - Offset ΔWNGdB in [dB], by which the minimum white noise gain threshold WNGdBMin(ω) is overcut or undercut during the adaptation process; for instance, ΔWNGdB = 0.5 [dB].
- Regularization parameter ε(ω)
- Maximum regularization parameter εMax, which is the upper limit for the regularization parameter ε(ω); for instance, εMax = 1.
- Step size by which the regularization parameter ε(ω) is increased or decreased.
- Frequency ω
- Start value of the (angular) frequency for the adaptation process; for instance, ω = 2π·1 [Hz].
- Step size by which the (angular) frequency is increased or decreased when the adaptation is completed at a certain frequency; for instance, Δω = 2π·1 [Hz].
- Maximum (angular) frequency at which an adaptation is performed; for instance, ωMax = πfs [Hz], with fs being the sampling frequency.
- Then the adaptation process is started in
step 2. In step 3, the regularization parameter is set to, e.g., ε(ω) = 0 for the current frequency ω under investigation. Regularization provides the ability to achieve a robust system by way of adjusting the regularization parameter ε(ω). This is a trade-off between higher robustness, i.e., a higher white noise gain WNGdB(ω), and less directivity in look direction ψ(θ0,ϕ0,ω), i.e., a decreasing array gain GdB(ω). If the regularization parameter is set to ε(ω) = 0, the adaptation process begins with the maximum directivity GdBMax(ω), which is then decreased by the increasing regularization parameter ε(ω) until the desired white noise gain threshold WNGdBMin is no longer undercut.
In step 4, the regularization filter Tm(ω) or Tm(ka) is calculated as outlined above using the regularization parameter ε(ω). In step 5, the transfer function EQm(ω) is calculated as outlined above using the current version of the transfer function Tm(ω) of the regularization filter or the current version of the regularization parameter ε(ω). In step 6, the white noise gain WNGdB(ω) is calculated as outlined above using the transfer function EQm(ω) and the current version of the transfer function Tm(ω) of the regularization filter (regularization function).
- In step 10, the directivity ψ(θ0,ϕ0,ω) of the modal beamformer is calculated for sound coming from the look direction using the transfer function EQm(ω) provided in step 5.
- In step 12, the current white noise gain WNGdB(ω) is compared with the predetermined white noise gain threshold WNGdBMin(ω), and it is checked whether the regularization parameter ε(ω) has reached its maximum, according to (|WNGdBMin - WNGdB(ω)| > ΔWNGdB) and (ε(ω) ≤ εMax). If both requirements are met, i.e., if (|WNGdBMin - WNGdB(ω)| > ΔWNGdB) & (ε(ω) ≤ εMax), the adaptation process is not yet finished, resulting in jumping back to step 3 and starting again with an updated regularization parameter ε(ω). - Otherwise, i.e., if the adaptation process for the current angular frequency ω has been completed so that the current equalizing function EQm(ω) has been limited to the given threshold, or if the current regularization parameter has reached its maximum, the angular frequency ω is incremented according to ω = ω + Δω in step 13, which is followed by step 14.
- In step 14, the current angular frequency ω is checked to see if it has reached its maximum value ωMax. If ω < ωMax, the process jumps back to
step 2 using the current angular frequency ω. Otherwise, i.e., if the equalizing filter has been adapted for the complete set of frequencies, the filter coefficients are outputted in step 15. - Referring to
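The iterative process outlined above can be sketched as follows. This is a simplified, scalar stand-in: the mode strength w(ω), the EQ, and the WNG computation below are toy placeholders, not the patent's exact formulas.

```python
# Sketch of the per-frequency adaptation loop of FIG. 8: start at eps = 0
# (maximum directivity) and raise eps until the white noise gain no longer
# undercuts the minimum threshold, or eps reaches its maximum.
import numpy as np

def adapt_epsilon(w_of_omega, omegas, wng_min_db=-10.0, d_wng_db=0.5,
                  eps_max=1.0, eps_step=1e-4):
    def wng_db_of(w, eps):
        eq = np.conj(w) / (np.abs(w) ** 2 + eps)   # regularized radial EQ gain
        return -20.0 * np.log10(np.abs(eq))        # toy WNG: inverse of the EQ gain
    result = {}
    for om in omegas:
        w, eps = w_of_omega(om), 0.0
        while wng_db_of(w, eps) < wng_min_db - d_wng_db and eps <= eps_max:
            eps += eps_step                         # trade directivity for robustness
        result[om] = eps
    return result

# Toy mode strength that decays toward low frequencies, as higher-order modes do.
eps_per_omega = adapt_epsilon(lambda om: om / (1.0 + om ** 2), np.linspace(0.1, 5.0, 8))
print(all(e >= 0.0 for e in eps_per_omega.values()))  # True
```

At low frequencies (small mode strength) the loop settles on a nonzero ε, i.e., reduced directivity; where the WNG already meets the threshold, ε stays at 0 and the full directivity is retained.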
FIGS. 9 through 16, measurements made with an exemplary arrangement in combination with an exemplary adaptation method are described in detail. The arrangement includes a sphere having a radius of a = 0.09 [m] and the shape of a truncated icosahedron, which is a blend of two Platonic solids, i.e., an icosahedron and a dodecahedron. The number of microphones arranged on the sphere is Q = 32. The directivity characteristic of the beamformer is a 4th-order cardioid, and the minimum white noise gain WNGdB(ω) used in the adaptation process is -10 [dB].
FIG. 9 illustrates the regularization parameter over frequency ε(ω) for a common 4th-order modal beamformer. As can be seen from FIG. 9, with regularization, i.e., limiting the maximum directivity index for frequencies up to, for instance, 750 [Hz], values above a minimum lower threshold WNGdBMin of -10 [dB] can be maintained. Above 750 [Hz], the exemplary beamformer exhibits the desired directivity of a 4th-order cardioid. FIG. 10 illustrates the corresponding white noise gain WNG for the above-mentioned 4th-order beamformer, which supports the findings in connection with the diagram of FIG. 9. The corresponding directivity index DI and the array gain GdB(ω) as shown in FIG. 11 illustrate that the maximum array gain GdB(ω) remains below approximately 10 [dB], depending on the frequency. - However, applying the adapted regularization filter Tm(ω) described herein causes a monotonic decrease of the array gain GdB(ω) down to 7.5 [dB] at 20 [Hz], as shown in
FIG. 11. The magnitude frequency responses of the M regularization filters Tm(ω) applied thereby are shown in FIG. 12, and their corresponding frequency-independent phase characteristic is illustrated in FIG. 13. - Further applying the optimized radial equalizing filter EQm(ω) yields an improved regularized equalizing filter whose magnitude frequency response is depicted in
FIG. 14 and whose phase frequency response is depicted in FIG. 15. The directivity of the corresponding improved beamformer is a 4th-order cardioid at frequencies above 650 [Hz], a 3rd-order cardioid between 300 [Hz] and 650 [Hz], a 2nd-order cardioid between 70 [Hz] and 300 [Hz], and a 1st-order cardioid below 70 [Hz]. FIG. 16 depicts the resulting directivity of the beamformer outlined above in look direction ψ(θ0,ϕ0,ω) as amplitudes over frequency. - While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms of the invention. The words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. Additionally, the features of various implementing embodiments may be combined to form further embodiments of the invention.
Claims (15)
- A method for generating an auditory scene, comprising: receiving eigenbeam outputs, the eigenbeam outputs having been generated by decomposing a plurality of audio signals, each audio signal having been generated by a different microphone of a microphone array, wherein each eigenbeam output corresponds to a different eigenbeam for the microphone array, and the microphones are arranged on a rigid or open sphere; and generating the auditory scene based on the eigenbeam outputs and their corresponding eigenbeams, wherein generating the auditory scene comprises applying a weighting value to each eigenbeam output to form a steered eigenbeam output; and combining the weighted eigenbeams to generate the auditory scene, wherein generating the auditory scene further comprises applying a regularized equalizer filter to each eigenbeam output or steered eigenbeam output, the regularized equalizer filter(s) being configured to compensate for acoustic deficiencies of the microphone array and having a regularized equalization function.
- The method of claim 1 wherein the regularized equalization function is a radial equalization function that comprises the quotient of a regularization function limiting the radial equalization function and a radial function describing an acoustic wave field in the vicinity of the surface of the rigid sphere or the center of the open sphere.
- The method of claim 2 wherein the regularization function is the quotient of the absolute value of the square of the radial function and the sum of the absolute value of the square of the radial function and a regularization parameter, the regularization parameter being set to a value greater than 0 and smaller than a maximum value that is smaller than infinity.
- The method of claim 3 wherein the maximum value of the regularization parameter is 1.
- The method of claim 3 or 4 wherein the regularization parameter depends on a susceptibility parameter that is the reciprocal of a white noise gain parameter, the white noise gain parameter being greater than a minimum white noise gain parameter that is not undercut by the equalizer filter.
- The method of claim 5 wherein the minimum white noise gain parameter is -10 [dB].
- The method of any one of claims 3 through 6 wherein the regularization parameter is adapted in an iterative process.
- The method of claim 7 wherein, for a given frequency, the iterative process comprises: setting at least the minimum white noise gain parameter and the regularization parameters to a starting value or a constant value; and calculating the white noise gain, the regularization function, and the radial equalization function; and comparing the calculated white noise gain parameter with the set minimum white noise gain parameter; and calculating the directivity for sound coming from the look direction using the radial equalization function; and scaling the radial equalization function; and comparing the calculated white noise gain with the set minimum white noise gain and checking if the regularization parameter has reached its maximum; if both requirements are met, the adaptation process is not yet finished, resulting in jumping back and starting again with an updated regularization parameter; otherwise the process for the current frequency has been completed and the frequency is incremented; and checking if the current frequency has reached its maximum value; if the frequency has not reached its maximum, the process jumps back and starts again with another frequency; otherwise the filter coefficients are outputted.
- The method of claim 8 wherein the iterative process comprises an offset white noise gain parameter by which the minimum white noise gain parameter is overcut or undercut at maximum during adaptation.
- A modal beamformer system for generating an auditory scene, comprising: a steering unit that is configured to receive eigenbeam outputs, the eigenbeam outputs having been generated by decomposing a plurality of audio signals, each audio signal having been generated by a different microphone of a microphone array, wherein each eigenbeam output corresponds to a different eigenbeam for the microphone array, and the microphones are arranged on a rigid or open sphere; and a weighting unit that is configured to generate the auditory scene based on the eigenbeam outputs and their corresponding eigenbeams, wherein generating the auditory scene comprises applying a weighting value to each eigenbeam output to form a steered eigenbeam output; and a summing element configured to combine the weighted eigenbeams to generate the auditory scene, wherein the weighting unit or the summing element are further configured to apply a regularized equalizer filter to each eigenbeam output or steered eigenbeam output, the regularized equalizer filter(s) being configured to compensate for acoustic deficiencies of the microphone array and having a regularized equalization function.
- The system of claim 10 wherein the regularized equalization function is a radial equalization function that comprises the quotient of a regularization function limiting the radial equalization function and a radial function describing an acoustic wave field in the vicinity of the sphere.
- The system of claim 11 wherein the regularization function is the quotient of the absolute value of the square of the radial function and the sum of the absolute value of the square of the radial function and a regularization parameter, the regularization parameter being set to a value greater than 0 and smaller than a maximum value that is smaller than infinity.
- The system of claim 12 wherein the maximum value of the regularization parameter is 1.
- The system of claim 12 or 13 wherein the regularization parameter depends on a susceptibility parameter that is the reciprocal of a white noise gain parameter, the white noise gain parameter being greater than a minimum white noise gain parameter that is not undercut by the equalizer filter.
- The system of claim 14 wherein the minimum white noise gain parameter is -10 [dB].
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13152209.6A EP2757811B1 (en) | 2013-01-22 | 2013-01-22 | Modal beamforming |
Publications (2)
Publication Number | Publication Date |
---|---|
EP2757811A1 true EP2757811A1 (en) | 2014-07-23 |
EP2757811B1 EP2757811B1 (en) | 2017-11-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20130122 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
R17P | Request for examination filed (corrected) |
Effective date: 20150108 |
|
RBV | Designated contracting states (corrected) |
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
17Q | First examination report despatched |
Effective date: 20151116 |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
INTG | Intention to grant announced |
Effective date: 20170526 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP Ref country code: AT Ref legal event code: REF Ref document number: 943133 Country of ref document: AT Kind code of ref document: T Effective date: 20171115 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602013028624 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: MP Effective date: 20171101 |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG4D |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK05 Ref document number: 943133 Country of ref document: AT Kind code of ref document: T Effective date: 20171101 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180201 Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180201 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180301 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180202 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602013028624 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20180802 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180131 |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180122 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: MM4A |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST Effective date: 20180928 |
|
REG | Reference to a national code |
Ref country code: BE Ref legal event code: MM Effective date: 20180131 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180131 |
Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180131 |
Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180131 |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180122 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180122 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
Ref country code: HU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO Effective date: 20130122 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MK Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20171101 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: AL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20171101 |
|
P01 | Opt-out of the competence of the unified patent court (upc) registered |
Effective date: 20230526 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20231219 Year of fee payment: 12 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20241219 Year of fee payment: 13 |