Towards Physically Interpretable Parametric Voice Conversion Functions

Daniel Erro, Agustín Alonso, Luis Serrano, Eva Navas & Inma Hernáez

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7911)

Included in the following conference series:

International Conference on Nonlinear Speech Processing

Abstract

Typical voice conversion functions based on Gaussian mixture models are opaque: it is not straightforward to link the conversion parameters to their physical implications. Following recent work, this paper studies how physically meaningful constraints can be imposed on a system operating in the cepstral domain to obtain more informative conversion functions. The resulting method can be used to study the differences between source and target voices in terms of formant location in frequency, spectral tilt, and amplitude in specific bands.
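
As a rough illustration of what a physically interpretable parameter in the cepstral domain can look like, the sketch below adds an offset to the first cepstral coefficient and measures the resulting change in the log-amplitude envelope. It is not the authors' method, which this page does not describe in detail. The envelope convention (log|S(ω)| = c[0] + 2 Σ c[n] cos(nω)), the function names `log_envelope` and `convert`, and all numeric values are assumptions made for the example.

```python
import math

def log_envelope(cepstrum, omega):
    """Log-amplitude envelope at normalized frequency omega (radians, 0..pi).

    Assumed convention: log|S(omega)| = c[0] + 2 * sum_{n>=1} c[n] * cos(n * omega).
    Other cepstral conventions differ by constant factors.
    """
    return cepstrum[0] + 2.0 * sum(
        c * math.cos(n * omega) for n, c in enumerate(cepstrum) if n >= 1
    )

def convert(cepstrum, offset):
    """Hypothetical constrained conversion: add an offset vector to the cepstrum."""
    return [c + s for c, s in zip(cepstrum, offset)]

# Illustrative values only; not taken from the paper.
source = [0.5, 0.30, -0.10, 0.05]
tilt = [0.0, 0.20, 0.0, 0.0]  # offset on c[1] only

converted = convert(source, tilt)

# Log-amplitude change at low, middle, and high frequency.
omegas = [0.0, math.pi / 2, math.pi]
gain = [log_envelope(converted, w) - log_envelope(source, w) for w in omegas]

for w, g in zip(omegas, gain):
    print(f"omega = {w:.3f} rad: log-amplitude change = {g:+.3f}")
```

Because the envelope is linear in the cepstrum under this convention, the offset on c[1] contributes 2 · 0.2 · cos(ω): a boost at low frequencies, no change at ω = π/2, and an attenuation near ω = π. In that sense the single parameter reads directly as a change of spectral tilt.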



Author information

Authors and Affiliations

AHOLAB, University of the Basque Country (UPV/EHU), Bilbao, Spain
Daniel Erro, Agustín Alonso, Luis Serrano, Eva Navas & Inma Hernáez
Basque Foundation for Science (IKERBASQUE), Bilbao, Spain
Daniel Erro


Editor information

Editors and Affiliations

TCTS Lab, University of Mons, 31, Boulevard Dolez, 7000, Mons, Belgium
Thomas Drugman
TCTS Lab, University of Mons, 31, Boulevard Dolez, 7000, Mons, Belgium
Thierry Dutoit

About this paper

Cite this paper

Erro, D., Alonso, A., Serrano, L., Navas, E., Hernáez, I. (2013). Towards Physically Interpretable Parametric Voice Conversion Functions. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science, vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_10

DOI: https://doi.org/10.1007/978-3-642-38847-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38846-0
Online ISBN: 978-3-642-38847-7
eBook Packages: Computer Science, Computer Science (R0)
