Venkataramani et al., 2018 - Google Patents

End-to-end source separation with adaptive front-ends

Document ID: 12781241671791412537
Author: Venkataramani S; Casebeer J; Smaragdis P
Publication year: 2018
Publication venue: 2018 52nd Asilomar Conference on Signals, Systems, and Computers

Snippet

Source separation and other audio applications have traditionally relied on the use of short-time Fourier transforms as a front-end frequency domain representation step. The unavailability of a neural network equivalent to forward and inverse transforms hinders the …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition

Similar Documents

Publication	Publication Date	Title
Venkataramani et al.	2018	End-to-end source separation with adaptive front-ends
Luo et al.	2019	Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation
Venkataramani et al.	2017	Adaptive front-ends for end-to-end source separation
Qian et al.	2017	Speech Enhancement Using Bayesian Wavenet
Koizumi et al.	2022	SpecGrad: Diffusion probabilistic model based neural vocoder with adaptive noise spectral shaping
US20230317056A1 (en)	2023-10-05	Audio generator and methods for generating an audio signal and training an audio generator
Geng et al.	2020	End-to-end speech enhancement based on discrete cosine transform
Mysore et al.	2012	Variational inference in non-negative factorial hidden Markov models for efficient audio source separation
US20070154033A1 (en)	2007-07-05	Audio source separation based on flexible pre-trained probabilistic source models
CN108198566A (en)	2018-06-22	Information processing method and device, electronic device and storage medium
Takeuchi et al.	2020	Invertible DNN-based nonlinear time-frequency transform for speech enhancement
CN116013343A (en)	2023-04-25	Speech enhancement method, electronic device and storage medium
Wang et al.	2025	A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments
Baby et al.	2021	Speech dereverberation using variational autoencoders
Lostanlen et al.	2023	Fitting auditory filterbanks with multiresolution neural networks
Venkataramani et al.	2018	End-to-end networks for supervised single-channel speech separation
Gandhiraj et al.	2007	Auditory-based wavelet packet filterbank for speech recognition using neural network
Nie et al.	2016	Exploiting spectro-temporal structures using NMF for DNN-based supervised speech separation
Sivapatham et al.	2022	Gammatone filter bank-deep neural network-based monaural speech enhancement for unseen conditions
Venkataramani et al.	2020	End-to-end non-negative autoencoders for sound source separation
Lee et al.	2017	Discriminative training of complex-valued deep recurrent neural network for singing voice separation
JP7641371B2 (en)	2025-03-06	Apparatus for providing a processed audio signal, method for providing a processed audio signal, apparatus for providing neural network parameters, and method for providing neural network parameters
Al-Ali et al.	2021	Enhanced forensic speaker verification performance using the ICA-EBM algorithm under noisy and reverberant environments
Wall et al.	2016	Recurrent lateral inhibitory spiking networks for speech enhancement
Guzewich et al.	2018	Cross-Corpora Convolutional Deep Neural Network Dereverberation Preprocessing for Speaker Verification and Speech Enhancement