Edraki et al., 2024 - Google Patents
Speaker adaptation for enhancement of bone-conducted speech
- Document ID: 11411193931536919947
- Authors: Edraki A, Chan W, Jensen J, Fogerty D
- Publication year: 2024
- Publication venue: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Snippet: Deep neural network (DNN)-based speech enhancement models often face challenges in maintaining their performance for speakers not encountered during training. This challenge is exacerbated in applications such as enhancement and bandwidth extension of bone …
Classifications
- G10L21/0208—Noise filtering
- G10L21/013—Adapting to target pitch
- G10L17/04—Training, enrolment or model building
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
- G10L19/018—Audio watermarking, i.e. embedding inaudible data in the audio signal
- H04R2430/03—Synergistic effects of band splitting and sub-band processing
- H04R2225/43—Signal processing in hearing aids to enhance the speech intelligibility
- H04R25/407—Circuits for combining signals of a plurality of transducers
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
Similar Documents
- Gabbay et al., Seeing through noise: Visually driven speaker separation and enhancement
- Lv et al., S-DCCRN: Super wide band DCCRN with learnable complex feature for speech enhancement
- Ren et al., A causal U-Net based neural beamforming network for real-time multi-channel speech enhancement
- US20200005770A1, Sound processing apparatus
- Edraki et al., Speaker adaptation for enhancement of bone-conducted speech
- Akeroyd et al., The 2nd Clarity Enhancement Challenge for hearing aid speech intelligibility enhancement: Overview and outcomes
- Ju et al., TEA-PSE 2.0: Sub-band network for real-time personalized speech enhancement
- Strake et al., INTERSPEECH 2020 Deep Noise Suppression Challenge: A fully convolutional recurrent network (FCRN) for joint dereverberation and denoising
- Wang et al., Wavelet speech enhancement based on nonnegative matrix factorization
- Li et al., Single-channel speech dereverberation via generative adversarial training
- Rao et al., INTERSPEECH 2021 ConferencingSpeech Challenge: Towards far-field multi-channel speech enhancement for video conferencing
- Healy et al., A causal and talker-independent speaker separation/dereverberation deep learning algorithm: Cost associated with conversion to real-time capable operation
- Healy et al., A talker-independent deep learning algorithm to increase intelligibility for hearing-impaired listeners in reverberant competing talker conditions
- CN111968627B, Bone conduction voice enhancement method based on joint dictionary learning and sparse representation
- Shankar et al., Influence of MVDR beamformer on a speech enhancement based smartphone application for hearing aids
- Liu et al., Gesper: A restoration-enhancement framework for general speech reconstruction
- Ohlenbusch et al., Multi-microphone noise data augmentation for DNN-based own voice reconstruction for hearables in noisy environments
- Goehring et al., Speech enhancement for hearing-impaired listeners using deep neural networks with auditory-model based features
- Tolooshams et al., A training framework for stereo-aware speech enhancement using deep neural networks
- Gaultier et al., Recovering speech intelligibility with deep learning and multiple microphones in noisy-reverberant situations for people using cochlear implants
- Kashani et al., Speech enhancement via deep spectrum image translation network
- CN111009259B, Audio processing method and device
- Magadum et al., An innovative method for improving speech intelligibility in automatic sound classification based on Relative-CNN-RNN
- Dashtipour et al., Evaluating the audio-visual speech enhancement challenge (AVSEC) baseline model using an out-of-domain free-flowing corpus
- Gergen et al., Source separation by feature-based clustering of microphones in ad hoc arrays