Abstract
This paper proposes an adaptive framework for audio retrieval in live teleconferencing environments with multiple participants. The framework uses a non reference anchor array (NRA) to capture the interfering speech sources, in addition to the primary array that captures the speech source of interest (SOI). A linearly constrained minimum variance (LC-MV) beamformer preserves the signal arriving from the look direction while nulling interference arriving from the non-look directions. The reverberant component of the speech acquired by this framework is then removed by a novel method that uses the linear prediction (LP) residual cepstrum. This method does not require computing the acoustic impulse response (AIR) of the teleconferencing room and is therefore computationally efficient. The NRA framework can thus remove correlated noise arriving from the direction of the SOI and also dereverberate the noise-free signal. The performance of the proposed framework is evaluated through experiments on clean speech acquisition from distant microphone arrays. Experiments on distant speech recognition are also conducted using the TIMIT and MONC databases. The experimental results indicate a reasonable improvement over correlation, subspace, and standard minimum variance beamforming methods. The application of the framework to audio retrieval in a live teleconferencing environment with multiple participants is also discussed.
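To make the beamforming constraint concrete, the following is a minimal sketch of a generic narrowband LC-MV (often written LCMV) beamformer, not the NRA framework itself. It minimises the output power w^H R w subject to C^H w = f, with unit gain toward the look direction and a null toward one interferer. The uniform linear array, microphone count, spacing, frequency, source angles, and synthetic covariance matrix are all assumptions chosen for illustration and do not come from the paper.

```python
import numpy as np

def steering_vector(n_mics, spacing_m, angle_deg, freq_hz, c=343.0):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    delays = np.arange(n_mics) * spacing_m * np.sin(np.deg2rad(angle_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def lcmv_weights(R, C, f):
    """Closed-form LCMV solution w = R^-1 C (C^H R^-1 C)^-1 f."""
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, f)

# Hypothetical setup: 6 microphones, 4 cm spacing, one 1 kHz frequency bin.
n_mics, spacing, freq = 6, 0.04, 1000.0
a_soi = steering_vector(n_mics, spacing, 0.0, freq)   # look direction (SOI)
a_int = steering_vector(n_mics, spacing, 40.0, freq)  # interfering talker

# Synthetic covariance: one strong interferer plus weak white sensor noise.
R = 10.0 * np.outer(a_int, a_int.conj()) + 0.01 * np.eye(n_mics)

# Constraints: pass the SOI with unit gain, place a null on the interferer.
C = np.column_stack([a_soi, a_int])
f = np.array([1.0, 0.0])
w = lcmv_weights(R, C, f)

gain_soi = abs(w.conj() @ a_soi)  # beamformer response toward the SOI
gain_int = abs(w.conj() @ a_int)  # beamformer response toward the interferer
print(f"SOI gain: {gain_soi:.6f}, interferer gain: {gain_int:.2e}")
```

In a real system the covariance R would be estimated from the microphone signals per frequency bin, and the interferer's steering vector would have to be estimated as well; here both are known by construction.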







Acknowledgments
This work was supported in part by the DeITY, Government of India, and in part by the BSNL Telecom Center of Excellence, IIT Kanpur.
Cite this article
Nathwani, K., Shukla, A., Khunteta, S. et al. An Adaptive Non Reference Anchor Array Framework for Audio Retrieval in Teleconferencing Environment. J Sign Process Syst 74, 91–102 (2014). https://doi.org/10.1007/s11265-013-0786-7
DOI: https://doi.org/10.1007/s11265-013-0786-7