Abstract
Most blind signal separation algorithms address the separation problem in the absence of noise, yet the presence of noise degrades the quality of the separated signals. This paper deals with the blind separation of audio signals from noisy mixtures. Instead of performing the separation on the mixtures in the time domain, the blind signal separation algorithm is applied to the discrete cosine transform, the discrete sine transform, or the discrete wavelet transform of the mixed signals. All of these transforms have an energy compaction property, which concentrates most of the signal energy in a few transform-domain coefficients and leaves the remaining coefficients close to zero. As a result, the separation is performed on a few coefficients in the transform domain. Another advantage of separation in transform domains is that the effect of noise on the signals is smaller there than in the time domain. The paper also investigates the role of speech enhancement techniques as pre- and post-processing steps for the blind signal separation process. The speech enhancement techniques considered are spectral subtraction, Wiener filtering, adaptive Wiener filtering, and wavelet denoising. Both blind signal separation and noise reduction are applied within a real speaker identification system to reduce the effect of interference and noise on the system performance. The simulation results confirm the superiority of transform-domain separation over time-domain separation and the importance of the wavelet denoising technique when it is used as a pre-processing step for noise reduction. Moreover, the speaker identification system performance is enhanced with blind signal separation and noise reduction.
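The two properties the abstract relies on can be illustrated with a minimal sketch. It is not the paper's implementation: the test signals `s1` and `s2`, the 2x2 mixing matrix `A`, the frame length `N`, the 99% energy threshold, and the helper names `dct2_ortho` and `count_for_energy` are all assumptions chosen for illustration. The sketch checks how many coefficients hold most of the energy before and after an orthonormal DCT, and that an instantaneous linear mixture keeps the same mixing matrix in the DCT domain.

```python
import math

def dct2_ortho(x):
    """Orthonormal DCT-II, evaluated directly in O(N^2) for clarity."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        acc = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        out.append(scale * acc)
    return out

def count_for_energy(values, fraction=0.99):
    """Number of largest-magnitude entries needed to hold `fraction` of the energy."""
    energies = sorted((v * v for v in values), reverse=True)
    target = fraction * sum(energies)
    running = 0.0
    for count, e in enumerate(energies, start=1):
        running += e
        if running >= target:
            return count
    return len(energies)

N = 128  # assumed frame length

# Hypothetical smooth sources (stand-ins for audio frames, not the paper's data).
s1 = [math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 5 * t / N)
      for t in range(N)]
s2 = [math.exp(-((t - 64) / 20.0) ** 2) for t in range(N)]

# Assumed 2x2 instantaneous (memoryless) mixing matrix.
A = [[1.0, 0.6],
     [0.4, 1.0]]
x1 = [A[0][0] * a + A[0][1] * b for a, b in zip(s1, s2)]
x2 = [A[1][0] * a + A[1][1] * b for a, b in zip(s1, s2)]

S1, S2 = dct2_ortho(s1), dct2_ortho(s2)
X1, X2 = dct2_ortho(x1), dct2_ortho(x2)

# Energy compaction: entries needed for 99% of the energy of s1.
k_time = count_for_energy(s1)
k_dct = count_for_energy(S1)

# Linearity: the DCT of each mixture equals the same mix of the source DCTs.
max_err = max(
    max(abs(X1[k] - (A[0][0] * S1[k] + A[0][1] * S2[k])) for k in range(N)),
    max(abs(X2[k] - (A[1][0] * S1[k] + A[1][1] * S2[k])) for k in range(N)),
)

print("entries for 99% energy, time domain:", k_time)
print("entries for 99% energy, DCT domain:", k_dct)
print("max deviation from linear mixing in DCT domain:", max_err)
```

Because the transform is linear, an unmixing matrix estimated from the transform coefficients of an instantaneous mixture would also apply to the time-domain mixtures. The sketch assumes a memoryless mixing model; whether the paper's mixtures follow that model is not stated in the abstract.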
Cite this article
Hammam, H., El-Shafai, W., Hassan, E. et al. Blind signal separation with Noise Reduction for efficient speaker identification. Int J Speech Technol 24, 235–250 (2021). https://doi.org/10.1007/s10772-019-09641-6
DOI: https://doi.org/10.1007/s10772-019-09641-6