Abstract
Dual-channel speech enhancement based on traditional beamforming struggles to suppress noise effectively. In recent years, replacing beamforming with a neural network that learns spectral characteristics has emerged as a promising alternative. This paper proposes an end-to-end dual-channel speech enhancement model based on neural network adaptive beamforming. First, an LSTM layer processes the raw speech waveform directly to estimate time-domain beamforming filter coefficients for each channel; each filter is convolved with its input channel and the filtered channels are summed. Second, we modify the fully-convolutional time-domain audio separation network (Conv-TasNet) into a network suited to speech enhancement, called Denoising-TasNet, which further enhances the beamformer output. Experimental results show that the proposed method outperforms a convolutional recurrent network (CRN) model and several popular noise reduction methods.
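The filter-and-sum step described in the abstract can be illustrated with a minimal sketch. It covers only the "convolve and sum" operation and assumes the filter taps are already given; in the paper they are estimated by an LSTM, which is not reproduced here. The function names `fir_filter` and `filter_and_sum`, the causal truncated convolution, and the hand-picked example taps are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filtering: y[t] = sum_k taps[k] * signal[t - k].

    The output is truncated to the input length (an assumption of this sketch).
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if t - k >= 0:
                acc += h * signal[t - k]
        out.append(acc)
    return out


def filter_and_sum(channels: Sequence[Sequence[float]],
                   filters: Sequence[Sequence[float]]) -> List[float]:
    """Filter each channel with its own taps, then sum across channels."""
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    n = len(channels[0])
    if any(len(c) != n for c in channels):
        raise ValueError("all channels must have the same length")
    out = [0.0] * n
    for x, h in zip(channels, filters):
        y = fir_filter(x, h)
        for t in range(n):
            out[t] += y[t]
    return out


# Hypothetical two-microphone example: channel 1 is channel 0 delayed by one sample.
mic0 = [1.0, 2.0, 3.0, 4.0]
mic1 = [0.0, 1.0, 2.0, 3.0]
# Hand-picked taps (not learned): delay mic0 by one sample, then average both channels.
taps = [[0.0, 0.5], [0.5, 0.0]]
enhanced = filter_and_sum([mic0, mic1], taps)
```

With these taps the two channels are time-aligned before averaging, so `enhanced` equals `mic1`. A learned beamformer would instead predict the taps from the input waveforms.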
Copyright information
© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Jiang, T., Liu, H., Shuai, C., Wang, M., Zhou, Y., Gan, L. (2022). Dual-Channel Speech Enhancement Using Neural Network Adaptive Beamforming. In: Gao, H., Wun, J., Yin, J., Shen, F., Shen, Y., Yu, J. (eds) Communications and Networking. ChinaCom 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 433. Springer, Cham. https://doi.org/10.1007/978-3-030-99200-2_37
DOI: https://doi.org/10.1007/978-3-030-99200-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99199-9
Online ISBN: 978-3-030-99200-2
eBook Packages: Computer Science (R0)