2024 32nd European Signal Processing Conference (EUSIPCO), 2024
Loudspeaker array beamforming is a common signal processing technique for acoustic directivity control and robust audio reproduction. Unlike their microphone counterparts, loudspeaker constraints are often heterogeneous due to arrayed transducers with varying operating ranges in frequency, acoustic-electrical sensitivity, efficiency, and directivity. This work proposes a frequency-regularization method for generalized Rayleigh quotient directivity specifications and two novel beamformer designs that optimize for maximum efficiency constant directivity (MECD) and maximum sensitivity constant directivity (MSCD). We derive fast-converging and analytic solutions from their quadratic equality constrained quadratic program formulations. Experiments optimize generalized directivity index constrained beamformer designs for a full-band heterogeneous array.
2023 31st European Signal Processing Conference (EUSIPCO), 2023
This paper presents several novel techniques for stereo to multichannel upmixing under barycentric constraints and beamformer formulations in the short-time Fourier transform (STFT) domain. We derive optimal solutions to center channel extraction and power-averaged monoization problems for a barycentric weighted mid-side decomposition. We then generalize passive multichannel upmixing via an active pan-potted barycentric beamformer formulation with sparse Chebyshev polynomial directivity. Experiments analyze center channel leakage, evaluate subjective listening tests, and render sample multichannel upmixes over varied speaker arrangements.
EURASIP Journal on Audio, Speech, and Music Processing, 2021
Microphone and speaker array designs have increasingly diverged from simple topologies due to diversity of physical host geometries and use cases. Effective beamformer design must now account for variation in the array’s acoustic radiation pattern, spatial distribution of target and noise sources, and intended beampattern directivity. Relevant tasks such as representing complex pressure fields, specifying spatial priors, and composing beampatterns can be efficiently synthesized using spherical harmonic (SH) basis functions. This paper extends the expansion of common stationary covariance functions onto the SHs and proposes models for encoding magnitude functions on a sphere. Conventional beamformer designs are reformulated in terms of magnitude density functions and beampatterns along SH bases. Applications to speaker far-field response fitting, cross-talk cancelation design, and microphone beampattern fitting are presented.
Gaussian process regression (GPR) is a powerful non-linear technique for Bayesian inference and prediction. One drawback is its O(N^3) computational complexity for both prediction and hyperparameter estimation for N input points, which has led to much work in sparse GPR methods. When the covariance function is expressible as a tensor product kernel (TPK) and the inputs form a multidimensional grid, it was shown that the costs for exact GPR can be reduced to a sub-quadratic function of N. We extend these exact fast algorithms to sparse GPR and remark on a connection to Gaussian process latent variable models (GPLVMs). In practice, the inputs may also violate the multidimensional grid constraints, so we pose and efficiently solve missing and extra data problems for both exact and sparse grid GPR. We demonstrate our method on synthetic, text scan, and magnetic resonance imaging (MRI) data reconstructions.
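As a rough illustration of why grid inputs help (a sketch of the standard Kronecker identity, not the paper's algorithm): for a tensor product kernel on a two-dimensional grid, the covariance matrix takes the form K1 ⊗ K2, and its product with a vector can be computed as K1 X K2^T on the reshaped vector without forming the full matrix. All function names and the small integer matrices below are hypothetical and chosen only for the example.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def kron(a, b):
    """Dense Kronecker product, built only to check the fast version."""
    return [[a[i][k] * b[j][l]
             for k in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for j in range(len(b))]

def dense_matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def kron_matvec(k1, k2, x_flat):
    """Compute (k1 kron k2) @ x_flat without forming the Kronecker product.

    x_flat is the row-major flattening of an n1-by-n2 grid of values,
    so the result is the row-major flattening of k1 @ X @ k2^T.
    """
    n1, n2 = len(k1), len(k2)
    x = [x_flat[i * n2:(i + 1) * n2] for i in range(n1)]
    y = matmul(matmul(k1, x), transpose(k2))
    return [v for row in y for v in row]

# Hypothetical per-dimension kernel matrices on a 2-by-3 grid.
K1 = [[2, 1], [1, 2]]
K2 = [[3, 0, 1], [0, 3, 0], [1, 0, 3]]
X_FLAT = [1, 2, 3, 4, 5, 6]

fast = kron_matvec(K1, K2, X_FLAT)
slow = dense_matvec(kron(K1, K2), X_FLAT)
```

Here `fast` and `slow` agree, but the fast path only touches the small per-dimension matrices; the same structure carries over to solves and eigendecompositions, which is where the savings for exact GPR on grids come from.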
We parallelize a version of the active-set iterative algorithm derived from the original works of Lawson and Hanson (1974) on multi-core architectures. This algorithm requires the solution of an unconstrained least squares problem in every step of the iteration for a matrix composed of the passive columns of the original system matrix. To achieve improved performance, we use parallelizable procedures to efficiently update and downdate the QR factorization of the matrix at each iteration, to account for inserted and removed columns. We use a reordering strategy of the columns in the decomposition to reduce computation and memory access costs. We consider graphics processing units (GPUs) as a new mode for efficient parallel computations and compare our implementations to those on multi-core CPUs. Both synthetic and non-synthetic data are used in the experiments.
Head-related impulse responses (HRIRs) are subject-dependent and direction-dependent filters used in spatial audio synthesis. They describe the scattering response of the head, torso, and pinnae of the subject. We propose a structural factorization of the HRIRs into a product of non-negative and Toeplitz matrices; the factorization is based on a novel extension of a non-negative matrix factorization algorithm. As a result, the HRIR becomes expressible as a convolution between a direction-independent resonance filter and a direction-dependent reflection filter. Further, the reflection filter can be made sparse with minimal HRIR distortion. The described factorization is shown to be applicable to the arbitrary source signal case and allows one to employ time-domain convolution at a computational cost lower than using convolution in the frequency domain. Index Terms: Head-related impulse response, non-negative matrix factorization, Toeplitz, convolution, sparsity.
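To make the cost argument concrete, here is a minimal sketch (not the authors' implementation) of applying a filter that factors into a short resonance filter convolved with a sparse reflection filter. The sparse stage costs one multiply-add per input sample per nonzero tap. The filter values, the tap dictionary layout, and the function names are assumptions made for the example.

```python
def convolve_full(x, h):
    """Direct time-domain convolution of two sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def convolve_sparse(x, taps, filter_length):
    """Convolve x with a sparse filter given as {delay: gain}.

    Only the nonzero taps are visited, so the work is
    len(x) * len(taps) multiply-adds instead of len(x) * filter_length.
    """
    y = [0] * (len(x) + filter_length - 1)
    for delay, gain in taps.items():
        for i, xi in enumerate(x):
            y[i + delay] += gain * xi
    return y

# Hypothetical filters: a short resonance and a sparse reflection.
resonance = [1, 2, 1]
reflection_dense = [1, 0, 0, -2, 0, 1]
reflection_taps = {0: 1, 3: -2, 5: 1}   # same filter, nonzeros only

source = [3, 1, 4, 1, 5]

# Reference: build the full filter first, then filter the source.
hrir = convolve_full(resonance, reflection_dense)
reference = convolve_full(source, hrir)

# Factored path: resonance stage, then the sparse reflection stage.
factored = convolve_sparse(convolve_full(source, resonance),
                           reflection_taps, len(reflection_dense))
```

Both paths produce the same output here because convolution is associative; the factored path simply skips the zero taps of the reflection filter.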
Incomplete factorizations of sparse symmetric positive definite (SSPD) matrices have been used to generate preconditioners for various iterative solvers. These solvers generally use preconditioners derived from the matrix system in order to reduce the total number of iterations until convergence. In this report, we investigate the findings of ref. [1] on their method for computing preconditioners from SSPD matrices. In particular, we focus on their first supernodal Cholesky factorization algorithm designed for matrices with naturally occurring block structures. The supernodal incomplete Cholesky algorithm for preconditioner generation is motivated by how the Cholesky factorization accesses column nodes, the overhead from indirect addressing of the SSPD matrix, and the memory advantages obtained from level 3 BLAS routines with dense blocking. We introduce this motivation and explain some priors such as supernodal elimination trees [2] in the background section. In Matlab, ...
Scientific computing and numerical analysis techniques have been widely adapted and implemented in the Fortran language since its inception. Many successive versions of Fortran have allowed new functionalities to be incorporated into the language while maintaining backward compatibility with older code. The current standard, Fortran 2003, is extensively used in high-performance computing on supercomputers and computer clusters. While Fortran users may have access to these nodes, fast processing times are still possible on a single desktop system. The graphical processing unit (GPU) on modern graphics cards can perform a large number of instructions in parallel. By employing programmable GPUs and their respective frameworks, many scientific computations are sped up over their CPU counterparts at a high cost of rewriting code for specific GPU architectures. To address this issue, we have developed a middleware library on top of Fortran 95 and later versions that interfaces wi...
The fast multipole method (FMM) performs fast approximate kernel summation to a specified tolerance $\epsilon$ by using a hierarchical division of the domain, which groups source and receiver points into regions that satisfy local separation and the well-separated pair decomposition properties. While square tilings and quadtrees are commonly used in 2D, we investigate alternative tilings and associated spatial data structures: regular hexagons (septree) and triangles (triangle-quadtree). We show that both structures satisfy separation properties for the FMM and prove their theoretical error bounds and computational costs. Empirical runtime and error analysis of our implementations are provided.
Advances in speaker, room, and device acoustic modeling have given rise to large-scale simulations of their spatial-frequency responses suitable for tasks such as rapid hardware prototyping, audio front-end algorithm validation, and back-end data-set augmentation for machine learning. Joint modeling of sources, rooms, and receivers is computationally prohibitive due to the large combinatorial space, coupling between models, and overhead cost of data exchange. To address these issues, we introduce the complex spherical harmonics as a separable set of basis functions for representing each of these models and their first-order interactions. We then present a partitioned frequency-dependent image-source model expanded into the spherical harmonics for efficient impulse response synthesis. Results are validated against real-world measurements.
Title of dissertation: FAST NUMERICAL AND MACHINE LEARNING ALGORITHMS FOR SPATIAL AUDIO REPRODUCTION. Yuancheng Luo, Doctor of Philosophy, 2014. Dissertation directed by: Professor Ramani Duraiswami, Department of Computer Science. Audio reproduction technologies have undergone several revolutions from a purely mechanical, to electromagnetic, and into a digital process. These changes have resulted in steady improvements in the objective qualities of sound capture/playback on increasingly portable devices. However, most mobile playback devices remove important spatial-directional components of externalized sound which are natural to the subjective experience of human hearing. Fortunately, the missing spatial-directional parts can be integrated back into audio through a combination of computational methods and physical knowledge of how sound scatters off of the listener’s anthropometry in the sound-field. The former employs signal processing techniques for rendering the sound-field. The la...
I-vectors are concise representations of speaker characteristics. Recent progress in i-vector-related research has utilized their ability to capture speaker and channel variability to develop efficient automatic speaker verification (ASV) systems. Inter-speaker relationships in the i-vector space are non-linear. Accomplishing effective speaker verification requires good modeling of these non-linearities and can be cast as a machine learning problem. Kernel partial least squares (KPLS) can be used for discriminative training in the i-vector space. However, this framework suffers from training data imbalance and asymmetric scoring. We use “one shot similarity scoring” (OSS) to address this. The resulting ASV system (OSS-KPLS) is tested across several conditions of the NIST SRE 2010 extended core data set and compared against state-of-the-art systems: Joint Factor Analysis (JFA), Probabilistic Linear Discriminant Analysis (PLDA), and Cosine Distance Scoring (CDS) classifiers. Improve...
Head-related impulse responses (HRIRs) are subject-dependent and direction-dependent filters used in spatial audio synthesis. They describe the scattering response of the head, torso, and pinnae of the subject. We propose a structural factorization of the HRIRs into a product of non-negative and Toeplitz matrices; the factorization is based on a novel extension of a non-negative matrix factorization algorithm. As a result, the HRIR becomes expressible as a convolution between a direction-independent resonance filter and a direction-dependent reflection filter. Further, the reflection filter can be made sparse with minimal HRIR distortion. The described factorization is shown to be applicable to the arbitrary source signal case and allows one to employ time-domain convolution at a computational cost lower than using convolution in the frequency domain.
Gaussian process regression (GPR) is a powerful non-linear technique for Bayesian inference and prediction. One drawback is its O(N^3) computational complexity for both prediction and hyperparameter estimation for N input points, which has led to much work in sparse GPR methods. When the covariance function is expressible as a tensor product kernel (TPK) and the inputs form a multidimensional grid, it was shown that the costs for exact GPR can be reduced to a sub-quadratic function of N. We extend these exact fast algorithms to sparse GPR and remark on a connection to Gaussian process latent variable models (GPLVMs). In practice, the inputs may also violate the multidimensional grid constraints, so we pose and efficiently solve missing and extra data problems for both exact and sparse grid GPR. We demonstrate our method on synthetic, text scan, and magnetic resonance imaging (MRI) data reconstructions.
Papers by Yuancheng Luo