Abstract
In previous work, we developed an HMM-based text-to-speech (TTS) system for a Basque dialect spoken in southern France. We observed that French words, which are frequent in daily conversation, were not pronounced properly by the system because the training corpus contained very few instances of some French phones. This paper reports our attempt to improve the pronunciation of these phones without redesigning the corpus or recording the speaker again. Inspired by techniques used to adapt synthetic voices to dysarthric speech, we transplant phones from a different French voice into our Basque voice, and we report the slight improvements obtained after this surgery.
Acknowledgements
This work has been partially funded by the Spanish Ministry of Economy and Competitiveness (RESTORE project, TEC2015-67163-C2-1-R MINECO/FEDER, UE) and the Basque Government (ELKAROLA project, KK-2015/00098). The research stay of A. Pierard at UPV/EHU was funded by the Erasmus program. The French database used in this study was generously provided by Acapela Group. We thank B. Picart for his help.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Pierard, A., Erro, D., Hernaez, I., Navas, E., Dutoit, T. (2016). Surgery of Speech Synthesis Models to Overcome the Scarcity of Training Data. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol. 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_8
DOI: https://doi.org/10.1007/978-3-319-49169-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49168-4
Online ISBN: 978-3-319-49169-1
eBook Packages: Computer Science, Computer Science (R0)