Boundary Refining Aiming at Speech Synthesis Applications

Monique V. Nicodem¹,
Sandra G. Kafka¹,
Rui Seara Jr.¹ &
Rui Seara¹

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5190)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

In concatenative synthesis, speech is produced by joining segments selected automatically from the units of a previously segmented database. The quality of the resulting synthetic speech generally improves when the database is segmented accurately. Segmentation tools often perform better with a hybrid approach that combines HMM modeling with a boundary refining process. Such refining has been carried out successfully using techniques based on neural networks. This paper presents a set of networks that outperform other topologies discussed in the literature. The networks are trained on a clustered training set, in which similar phonetic transitions are grouped together.
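
The hybrid scheme in the abstract can be illustrated with a minimal sketch, under stated assumptions. An HMM proposes a boundary frame, the phonetic transition is mapped to a cluster, and a refiner belonging to that cluster searches a small window around the proposal. The broad phone classes, the `energy_jump` scorer, the window size, and all function names below are hypothetical and are not taken from the paper. In particular, the paper uses trained neural networks as refiners, whereas this sketch substitutes a simple stand-in score.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical broad phone classes used to group similar transitions.
# The clustering actually used in the paper is not given in the abstract.
PHONE_CLASS: Dict[str, str] = {
    "a": "vowel", "e": "vowel", "i": "vowel", "o": "vowel", "u": "vowel",
    "p": "plosive", "t": "plosive", "k": "plosive",
    "s": "fricative", "f": "fricative",
    "m": "nasal", "n": "nasal",
    "sil": "silence",
}

Cluster = Tuple[str, str]
Scorer = Callable[[Sequence[float], int], float]


def transition_cluster(left: str, right: str) -> Cluster:
    """Map a phone-to-phone transition to a (class, class) cluster key."""
    return (PHONE_CLASS.get(left, "other"), PHONE_CLASS.get(right, "other"))


def energy_jump(frames: Sequence[float], t: int) -> float:
    """Stand-in scorer (assumption): change in a per-frame feature at frame t.

    A trained network would take the place of this function.
    """
    return abs(frames[t] - frames[t - 1])


def refine_boundary(frames: Sequence[float], hmm_frame: int,
                    scorer: Scorer, window: int = 3) -> int:
    """Return the frame near the HMM proposal with the highest score."""
    if len(frames) < 2:
        return hmm_frame  # nothing to search
    lo = max(1, hmm_frame - window)
    hi = min(len(frames) - 1, hmm_frame + window)
    if lo > hi:
        return hmm_frame  # proposal lies outside the searchable range
    return max(range(lo, hi + 1), key=lambda t: scorer(frames, t))


def refine_all(frames: Sequence[float],
               boundaries: List[Tuple[str, str, int]],
               scorers: Dict[Cluster, Scorer],
               default: Scorer = energy_jump,
               window: int = 3) -> List[int]:
    """Refine each (left phone, right phone, HMM frame) with its cluster's scorer."""
    refined = []
    for left, right, frame in boundaries:
        scorer = scorers.get(transition_cluster(left, right), default)
        refined.append(refine_boundary(frames, frame, scorer, window))
    return refined


# Toy per-frame feature: the true change is at frame 5, the HMM proposes frame 3.
frames = [0.0, 0.0, 0.0, 0.1, 0.1, 1.0, 1.0, 1.0, 1.0, 1.0]
scorers = {("silence", "vowel"): energy_jump}
print(refine_all(frames, [("sil", "a", 3)], scorers))
```

Keying the refiners by transition cluster lets each one specialize on acoustically similar boundaries, which is the motivation the abstract gives for clustering the training set.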

This work was partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq), Studies and Projects Funding Body (FINEP), and Dígitro Tecnologia Ltda.

Author information

Authors and Affiliations

LINSE – Circuits and Signal Processing Laboratory, Department of Electrical Engineering, Federal University of Santa Catarina, Brazil
Monique V. Nicodem, Sandra G. Kafka, Rui Seara Jr. & Rui Seara

Editor information

António Teixeira, Vera Lúcia Strube de Lima, Luís Caldas de Oliveira, Paulo Quaresma

About this paper

Cite this paper

Nicodem, M.V., Kafka, S.G., Seara, R., Seara, R. (2008). Boundary Refining Aiming at Speech Synthesis Applications. In: Teixeira, A., de Lima, V.L.S., de Oliveira, L.C., Quaresma, P. (eds) Computational Processing of the Portuguese Language. PROPOR 2008. Lecture Notes in Computer Science, vol 5190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85980-2_8

DOI: https://doi.org/10.1007/978-3-540-85980-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85979-6
Online ISBN: 978-3-540-85980-2
eBook Packages: Computer Science, Computer Science (R0)
