Keynote Slides
Convolutional-Recurrent Networks for Speech Enhancement
Han Zhao¹, Shuayb Zarar², Ivan Tashev² and Chin-Hui Lee³
Apr. 19th
Speech Enhancement — Motivation
[Spectrogram: clean speech]
[Spectrogram: noisy speech]
• Distribution mismatch between clean and noisy speech
• Speech enhancement: recover the clean speech from the noisy signal
Outline
• Background
• Data-driven Approach
• Conclusion
Background
Problem setup:
• Observed noisy signal y = x + n, where x is the clean signal and n is additive noise
• Noise type: classic methods assume stationary noise with specific characteristics
Pros:
• Simple and computationally efficient
• Optimal under appropriate statistical assumptions
• Interpretable
Cons:
• Limited to stationary noise
• Restricted to noise with specific characteristics
Data-driven Approach
Goal:
• Build a function approximator f such that f(noisy speech) ≈ clean speech
Pipeline:
• STFT the noisy waveform, enhance in the spectrogram domain, then ISTFT to reconstruct the waveform
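The STFT → enhance → ISTFT pipeline can be sketched as follows. This is an illustrative skeleton, not the authors' code: `enhance_spectrogram` is a hypothetical placeholder for the learned model, and keeping the noisy phase is one common convention the slides do not specify.

```python
import numpy as np
from scipy.signal import stft, istft

def enhance_spectrogram(mag):
    # Placeholder for the learned model: here it passes the
    # magnitude through unchanged (identity "enhancement").
    return mag

fs = 16000
t = np.arange(fs) / fs
noisy = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)

# Analysis: complex spectrogram of the noisy signal
_, _, Z = stft(noisy, fs=fs, nperseg=512)

# Enhance the magnitude, reuse the noisy phase (a common choice)
mag, phase = np.abs(Z), np.angle(Z)
Z_hat = enhance_spectrogram(mag) * np.exp(1j * phase)

# Synthesis: back to the time-domain waveform
_, enhanced = istft(Z_hat, fs=fs, nperseg=512)
```

With the identity "model", the ISTFT reconstructs the input almost exactly, which is a useful sanity check before plugging in a real network.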
Convolutional-Recurrent Networks for SE
Problem setup:
• Estimate the clean spectrogram from the noisy spectrogram, frame by frame
Observations:
• Spectrograms exhibit local patterns along the time and frequency axes
• Speech frames have temporal dependencies that span many time steps
Proposed: Convolution + bi-LSTM + Linear Regression
Objective: minimize the mean squared error (MSE) between the predicted and clean spectrograms
Convolution layer:
• A convolution kernel of size (b, w) is slid over the zero-padded spectrogram of size (t, f)
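The convolution step above can be sketched in numpy. This is an illustrative single-kernel version (the actual layer uses many kernels and learned weights); the zero padding keeps the output at the spectrogram's (t, f) shape.

```python
import numpy as np

def conv2d_same(spec, kernel):
    # Slide one (b, w) kernel over a zero-padded (t, f) spectrogram,
    # producing a same-size feature map (cross-correlation, the usual
    # CNN convention).
    t, f = spec.shape
    b, w = kernel.shape
    padded = np.pad(spec, ((b // 2, b - 1 - b // 2),
                           (w // 2, w - 1 - w // 2)))
    out = np.empty((t, f))
    for i in range(t):
        for j in range(f):
            out[i, j] = np.sum(padded[i:i + b, j:j + w] * kernel)
    return out

spec = np.random.randn(100, 256)      # (time, frequency) spectrogram
kernel = np.random.randn(3, 5)        # kernel of size (b, w) = (3, 5)
features = conv2d_same(spec, kernel)  # shape (100, 256)
```

In practice this loop would be replaced by a framework convolution; the sketch only makes the (b, w)-over-(t, f) sliding explicit.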
bi-directional LSTM
State transition function of an LSTM cell (standard formulation):
  i_t = σ(W_i x_t + U_i h_{t−1} + b_i)
  f_t = σ(W_f x_t + U_f h_{t−1} + b_f)
  o_t = σ(W_o x_t + U_o h_{t−1} + b_o)
  c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_c x_t + U_c h_{t−1} + b_c)
  h_t = o_t ⊙ tanh(c_t)
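The standard LSTM state transition can be written out in numpy. Parameter names and the stacked-gate layout below are illustrative, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b stack the four gate blocks: input (i), forget (f),
    # output (o) and candidate (g), each of hidden size H.
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0*H:1*H])      # input gate
    f = sigmoid(z[1*H:2*H])      # forget gate
    o = sigmoid(z[2*H:3*H])      # output gate
    g = np.tanh(z[3*H:4*H])      # candidate cell state
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 8, 4
h, c = np.zeros(H), np.zeros(H)
W = rng.normal(size=(4*H, D))
U = rng.normal(size=(4*H, H))
b = np.zeros(4*H)
h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```

A bi-directional LSTM runs one such recurrence forward and one backward over the frame sequence and concatenates the two hidden states at each time step.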
Linear Regression with Projection
At each time step t, the clean frame is predicted by a linear projection of the hidden state:
  x̂_t = W h_t + b
MSE objective:
  L = (1/T) Σ_t ‖x̂_t − x_t‖²
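The per-frame projection and MSE objective can be sketched directly. The function and dimension names here are illustrative placeholders.

```python
import numpy as np

def mse_loss(H, X, W, b):
    # H: (T, hidden) bi-LSTM states; X: (T, freq) clean frames.
    X_hat = H @ W + b                           # x̂_t = W h_t + b for every t
    return np.mean(np.sum((X_hat - X) ** 2, axis=1))

T, hidden, freq = 50, 16, 256
rng = np.random.default_rng(1)
H = rng.normal(size=(T, hidden))
X = rng.normal(size=(T, freq))
W = rng.normal(size=(hidden, freq))
b = np.zeros(freq)
loss = mse_loss(H, X, W, b)
```

When the targets equal the projections exactly, the loss is zero, which makes the objective easy to sanity-check.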
Experiments
Evaluation Metrics
• Signal-to-Noise Ratio (SNR), in dB
• Log-Spectral Distance (LSD), mean squared error (MSE), word error rate (WER), and PESQ (reported in the result tables)
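One common definition of the SNR metric in dB (the slides do not spell out the exact variant used) can be sketched as:

```python
import numpy as np

def snr_db(clean, estimate):
    # SNR in dB: signal energy over residual-noise energy.
    noise = clean - estimate
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

t = np.linspace(0.0, 1.0, 16000)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.01 * np.ones_like(clean)  # small constant offset as "noise"
value = snr_db(clean, noisy)
```

Higher values mean less residual noise relative to the clean reference, matching the direction of the SNR columns in the tables below.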
Comparison with State-of-the-Art Methods
• Classic noise suppressor
• DNN-Symmetric (Xu et al., 2015)
  • Multilayer perceptron, 3 hidden layers (2048 units each), 11-frame context window
• DNN-Causal (Tashev et al., 2016)
  • Multilayer perceptron, 3 hidden layers (2048 units each), 7-frame causal window
• Deep-RNN (Maas et al., 2012)
  • Recurrent autoencoder, 3 hidden layers (500 units each), 3-frame context window
Comparison with State-of-the-Art Methods (seen noise)

Method       SNR    LSD    MSE      WER    PESQ
Noisy data   15.18  23.07  0.04399  15.40  2.26
Classic NS   18.82  22.24  0.03985  14.77  2.40
DNN-s        44.51  19.89  0.03436  55.38  2.20
DNN-c        40.70  20.09  0.03485  54.92  2.17
RNN          41.08  17.49  0.03533  44.93  2.19
Ours         49.79  15.17  0.03399  14.64  2.86
Clean data   57.31   1.01  0.0000    2.19  4.48
Comparison with State-of-the-Art Methods (unseen noise)

Method       SNR    LSD    MSE      WER    PESQ
Noisy data   14.78  23.76  0.04786  18.40  2.09
Classic NS   19.73  22.82  0.04201  15.54  2.26
DNN-s        40.47  21.07  0.03741  54.77  2.16
DNN-c        38.70  21.38  0.03718  54.13  2.13
RNN          44.60  18.81  0.03665  52.05  2.06
Ours         39.70  17.06  0.04721  16.71  2.73
Clean data   58.35   1.15  0.0000    1.83  4.48
Case Study
[Spectrogram comparisons, noisy input vs. clean target, for each system:]
• MS-Cortana
• DNN
• RNN
• Ours
Conclusion
• Convolutions help capture local patterns
Thanks