Surgery of Speech Synthesis Models to Overcome the Scarcity of Training Data

  • Conference paper
Advances in Speech and Language Technologies for Iberian Languages (IberSPEECH 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10077)

Abstract

In previous work, we developed an HMM-based TTS system for a Basque dialect spoken in southern France. We observed that French words, which are frequent in daily conversation, were not pronounced properly by the system because the training corpus contained very few instances of some French phones. This paper reports our attempt to improve the pronunciation of these phones without redesigning the corpus or recording the speaker again. Inspired by techniques used to adapt synthetic voices using dysarthric speech, we transplant phones from a different, French voice into our Basque voice, and we report the slight improvements observed after this surgery.

Acknowledgements

This work has been partially funded by the Spanish Ministry of Economy and Competitiveness (RESTORE project, TEC2015-67163-C2-1-R MINECO/FEDER, UE) and the Basque Government (ELKAROLA project, KK-2015/00098). The research stay of A. Pierard at UPV/EHU was funded by the Erasmus program. The French database used in this study was generously provided by Acapela Group. We thank B. Picart for his help.

Corresponding author

Correspondence to I. Hernaez.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Pierard, A., Erro, D., Hernaez, I., Navas, E., Dutoit, T. (2016). Surgery of Speech Synthesis Models to Overcome the Scarcity of Training Data. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_8

  • DOI: https://doi.org/10.1007/978-3-319-49169-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49168-4

  • Online ISBN: 978-3-319-49169-1

  • eBook Packages: Computer Science, Computer Science (R0)
