Abstract
Typical voice conversion functions based on Gaussian mixture models are opaque, in the sense that it is not straightforward to link the conversion parameters to their physical implications. Following recent work, in this paper we study how physically meaningful constraints can be imposed on a system operating in the cepstral domain in order to obtain more informative conversion functions. The resulting method can be used to study the differences between source and target voices in terms of formant location in frequency, spectral tilt, and amplitude in specific bands.
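To make the opacity concrete, here is a minimal sketch of one common form of GMM-based conversion function: a posterior-weighted sum of per-component linear transforms applied to a cepstral frame. This form is an assumption made for illustration, since the abstract does not state the paper's own formulation. All names (`gmm_posteriors`, `convert_frame`, `A`, `b`) are hypothetical, and diagonal covariances are assumed for brevity.

```python
import numpy as np

def gmm_posteriors(x, weights, means, variances):
    """Posterior probability of each Gaussian component given frame x.

    Assumes diagonal covariances (an illustrative simplification).
    x: (D,), weights: (M,), means: (M, D), variances: (M, D)
    """
    # Log-density of x under every component, up to numerical precision.
    log_pdf = -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)
                      + np.sum((x - means) ** 2 / variances, axis=1))
    log_post = np.log(weights) + log_pdf
    log_post -= np.max(log_post)  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def convert_frame(x, weights, means, variances, A, b):
    """Posterior-weighted linear conversion: sum_i p_i(x) * (A_i @ x + b_i).

    A: (M, D, D) per-component matrices, b: (M, D) per-component biases.
    """
    p = gmm_posteriors(x, weights, means, variances)
    return np.einsum("i,ijk,k->j", p, A, x) + p @ b
```

In this unconstrained form every entry of each `A[i]` is a free parameter, so inspecting the matrices says little about formant shifts, spectral tilt, or band amplitudes. Constraining the matrices and biases to physically motivated families is the kind of restriction the abstract refers to.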
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Erro, D., Alonso, A., Serrano, L., Navas, E., Hernáez, I. (2013). Towards Physically Interpretable Parametric Voice Conversion Functions. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science, vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_10
DOI: https://doi.org/10.1007/978-3-642-38847-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38846-0
Online ISBN: 978-3-642-38847-7
eBook Packages: Computer Science (R0)