Towards Physically Interpretable Parametric Voice Conversion Functions

  • Conference paper
Advances in Nonlinear Speech Processing (NOLISP 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7911)

Abstract

Typical voice conversion functions based on Gaussian mixture models are opaque, in the sense that it is not straightforward to link the conversion parameters to their physical implications. Following recent work, in this paper we study how physically meaningful constraints can be imposed on a system operating in the cepstral domain to obtain more informative conversion functions. The resulting method can be used to study the differences between source and target voices in terms of formant location in frequency, spectral tilt, and amplitude in specific bands.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Erro, D., Alonso, A., Serrano, L., Navas, E., Hernáez, I. (2013). Towards Physically Interpretable Parametric Voice Conversion Functions. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science, vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_10

  • DOI: https://doi.org/10.1007/978-3-642-38847-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38846-0

  • Online ISBN: 978-3-642-38847-7

  • eBook Packages: Computer Science, Computer Science (R0)
