Venkataramani et al., 2018 - Google Patents
End-to-end source separation with adaptive front-ends (Venkataramani et al., 2018)
- Document ID
- 12781241671791412537
- Authors
- Venkataramani S
- Casebeer J
- Smaragdis P
- Publication year
- 2018
- Publication venue
- 2018 52nd Asilomar Conference on Signals, Systems, and Computers
Snippet
Source separation and other audio applications have traditionally relied on the use of short-time Fourier transforms as a front-end frequency domain representation step. The unavailability of a neural network equivalent to forward and inverse transforms hinders the …
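For context, the adaptive front-end the snippet refers to can be pictured as a pair of learned 1-D convolutions standing in for the STFT/ISTFT pair, so the transform itself becomes trainable end to end. The sketch below is a minimal illustration of that idea, not the authors' implementation; the module name, layer sizes, hop length, and the ReLU non-linearity are assumptions made for the example.

```python
# Minimal sketch (not the paper's code): a learnable analysis/synthesis
# front-end that replaces the STFT/ISTFT pair with 1-D convolutions.
# All layer sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class AdaptiveFrontEnd(nn.Module):
    def __init__(self, n_filters=1024, kernel_size=1024, hop=16):
        super().__init__()
        # "Forward transform": learned filterbank applied to the raw waveform.
        self.analysis = nn.Conv1d(1, n_filters, kernel_size, stride=hop, bias=False)
        # "Inverse transform": learned overlap-add synthesis back to a waveform.
        self.synthesis = nn.ConvTranspose1d(n_filters, 1, kernel_size, stride=hop, bias=False)

    def forward(self, waveform):
        # waveform: (batch, 1, samples)
        latent = torch.relu(self.analysis(waveform))  # non-negative, spectrogram-like representation
        # A separation network would estimate a mask over `latent` here.
        return self.synthesis(latent)                 # reconstructed waveform

# Usage: round-trip a one-second mixture at 16 kHz through the front-end.
if __name__ == "__main__":
    mix = torch.randn(1, 1, 16000)
    model = AdaptiveFrontEnd()
    out = model(mix)
    print(out.shape)  # (1, 1, 16000), up to edge effects
```

The ReLU keeps the latent representation non-negative, mirroring the role a magnitude spectrogram plays for a mask-based separator, which is the usual motivation for this style of learned front-end.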
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
Similar Documents
Publication | Title
---|---
Venkataramani et al. | End-to-end source separation with adaptive front-ends
Luo et al. | Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation
Venkataramani et al. | Adaptive front-ends for end-to-end source separation
Qian et al. | Speech Enhancement Using Bayesian Wavenet
Koizumi et al. | SpecGrad: Diffusion probabilistic model based neural vocoder with adaptive noise spectral shaping
US20230317056A1 (en) | Audio generator and methods for generating an audio signal and training an audio generator
Geng et al. | End-to-end speech enhancement based on discrete cosine transform
Mysore et al. | Variational inference in non-negative factorial hidden Markov models for efficient audio source separation
US20070154033A1 (en) | Audio source separation based on flexible pre-trained probabilistic source models
CN108198566A (en) | Information processing method and device, electronic device and storage medium
Takeuchi et al. | Invertible DNN-based nonlinear time-frequency transform for speech enhancement
CN116013343A (en) | Speech enhancement method, electronic device and storage medium
Wang et al. | A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments
Baby et al. | Speech dereverberation using variational autoencoders
Lostanlen et al. | Fitting auditory filterbanks with multiresolution neural networks
Venkataramani et al. | End-to-end networks for supervised single-channel speech separation
Gandhiraj et al. | Auditory-based wavelet packet filterbank for speech recognition using neural network
Nie et al. | Exploiting spectro-temporal structures using NMF for DNN-based supervised speech separation
Sivapatham et al. | Gammatone filter bank-deep neural network-based monaural speech enhancement for unseen conditions
Venkataramani et al. | End-to-end non-negative autoencoders for sound source separation
Lee et al. | Discriminative training of complex-valued deep recurrent neural network for singing voice separation
JP7641371B2 (en) | Apparatus for providing a processed audio signal, method for providing a processed audio signal, apparatus for providing neural network parameters, and method for providing neural network parameters
Al-Ali et al. | Enhanced forensic speaker verification performance using the ICA-EBM algorithm under noisy and reverberant environments
Wall et al. | Recurrent lateral inhibitory spiking networks for speech enhancement
Guzewich et al. | Cross-Corpora Convolutional Deep Neural Network Dereverberation Preprocessing for Speaker Verification and Speech Enhancement