Venkataramani et al., 2018

End-to-end source separation with adaptive front-ends

Document ID
12781241671791412537
Author
Venkataramani S
Casebeer J
Smaragdis P
Publication year
2018
Publication venue
2018 52nd Asilomar Conference on Signals, Systems, and Computers

Snippet

Source separation and other audio applications have traditionally relied on the use of short-time Fourier transforms as a front-end frequency domain representation step. The unavailability of a neural network equivalent to forward and inverse transforms hinders the …

Classifications

    • G PHYSICS
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
                    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
                        • G10L21/0208 Noise filtering
                        • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
                    • G10L21/003 Changing voice quality, e.g. pitch or formants
                        • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
                            • G10L21/013 Adapting to target pitch
                • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
                    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
                        • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
                • G10L19/00 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
                • G10L15/00 Speech recognition

Similar Documents

Publication, Title
Venkataramani et al. End-to-end source separation with adaptive front-ends
Luo et al. Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation
Venkataramani et al. Adaptive front-ends for end-to-end source separation
Qian et al. Speech Enhancement Using Bayesian Wavenet
Koizumi et al. SpecGrad: Diffusion probabilistic model based neural vocoder with adaptive noise spectral shaping
US20230317056A1 (en) Audio generator and methods for generating an audio signal and training an audio generator
Geng et al. End-to-end speech enhancement based on discrete cosine transform
Mysore et al. Variational inference in non-negative factorial hidden Markov models for efficient audio source separation
US20070154033A1 (en) Audio source separation based on flexible pre-trained probabilistic source models
CN108198566A (en) Information processing method and device, electronic device and storage medium
Takeuchi et al. Invertible DNN-based nonlinear time-frequency transform for speech enhancement
CN116013343A (en) Speech enhancement method, electronic device and storage medium
Wang et al. A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments
Baby et al. Speech dereverberation using variational autoencoders
Lostanlen et al. Fitting auditory filterbanks with multiresolution neural networks
Venkataramani et al. End-to-end networks for supervised single-channel speech separation
Gandhiraj et al. Auditory-based wavelet packet filterbank for speech recognition using neural network
Nie et al. Exploiting spectro-temporal structures using NMF for DNN-based supervised speech separation
Sivapatham et al. Gammatone filter bank-deep neural network-based monaural speech enhancement for unseen conditions
Venkataramani et al. End-to-end non-negative autoencoders for sound source separation
Lee et al. Discriminative training of complex-valued deep recurrent neural network for singing voice separation
JP7641371B2 (en) Apparatus for providing a processed audio signal, method for providing a processed audio signal, apparatus for providing neural network parameters, and method for providing neural network parameters
Al-Ali et al. Enhanced forensic speaker verification performance using the ICA-EBM algorithm under noisy and reverberant environments
Wall et al. Recurrent lateral inhibitory spiking networks for speech enhancement
Guzewich et al. Cross-Corpora Convolutional Deep Neural Network Dereverberation Preprocessing for Speaker Verification and Speech Enhancement