Surgery of Speech Synthesis Models to Overcome the Scarcity of Training Data

Arnaud Pierard, D. Erro, I. Hernaez, E. Navas & Thierry Dutoit

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10077)

Included in the following conference series:

International Conference on Advances in Speech and Language Technologies for Iberian Languages

Abstract

In previous work, we developed an HMM-based TTS system for a Basque dialect spoken in southern France. We observed that French words, which are frequent in daily conversations, were not pronounced properly by the TTS system because the training corpus contained very few instances of some French phones. This paper reports our attempt to improve the pronunciation of these phones without redesigning the corpus or recording the speaker again. Inspired by techniques used to adapt synthetic voices using dysarthric speech, we transplant phones from a different French voice into our Basque voice, and we show the slight improvements found after surgery.



Acknowledgements

This work has been partially funded by the Spanish Ministry of Economy and Competitiveness (RESTORE project, TEC2015-67163-C2-1-R MINECO/FEDER, UE) and the Basque Government (ELKAROLA project, KK-2015/00098). The research stay of A. Pierard at UPV/EHU was funded by the Erasmus program. The French database used in this study was generously provided by Acapela Group. We thank B. Picart for his help.

Author information

Authors and Affiliations

TCTS, University of Mons, Mons, Belgium
Arnaud Pierard & Thierry Dutoit
AHOLAB, University of the Basque Country (UPV/EHU), Bilbao, Spain
D. Erro, I. Hernaez & E. Navas
IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
D. Erro

Corresponding author

Correspondence to I. Hernaez.

Editor information

Editors and Affiliations

INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Alberto Abad
I3A/University of Zaragoza, Zaragoza, Spain
Alfonso Ortega
DETI/IEETA, University of Aveiro, Aveiro, Portugal
António Teixeira
AtlantTIC Research Center, Universidad de Vigo, Vigo, Spain
Carmen García Mateo
Universitat Politècnica de València, Valencia, Spain
Carlos D. Martínez Hinarejos
University of Coimbra, Coimbra, Portugal
Fernando Perdigão
INESC-ID/ISCTE-IUL, Lisbon, Portugal
Fernando Batista
INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Nuno Mamede

About this paper

Cite this paper

Pierard, A., Erro, D., Hernaez, I., Navas, E., Dutoit, T. (2016). Surgery of Speech Synthesis Models to Overcome the Scarcity of Training Data. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_8

DOI: https://doi.org/10.1007/978-3-319-49169-1_8
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49168-4
Online ISBN: 978-3-319-49169-1
eBook Packages: Computer Science, Computer Science (R0)
