Abstract
In concatenative synthesis, speech is produced by joining segments that are automatically selected from the units of a previously segmented database. The quality of the resulting synthetic speech generally improves when the database is segmented with accurate tools. The performance of such tools is often enhanced by a hybrid approach that combines HMM-based modeling with a boundary refining process. Such refining has been carried out successfully using techniques based on neural networks. This paper presents a set of networks that outperform other topologies discussed in the literature. These networks are trained by clustering the training set so that similar phonetic transitions are grouped together.
This work was partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq), Studies and Projects Funding Body (FINEP), and Dígitro Tecnologia Ltda.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nicodem, M.V., Kafka, S.G., Seara, R., Seara, R. (2008). Boundary Refining Aiming at Speech Synthesis Applications. In: Teixeira, A., de Lima, V.L.S., de Oliveira, L.C., Quaresma, P. (eds) Computational Processing of the Portuguese Language. PROPOR 2008. Lecture Notes in Computer Science, vol 5190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85980-2_8
DOI: https://doi.org/10.1007/978-3-540-85980-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85979-6
Online ISBN: 978-3-540-85980-2
eBook Packages: Computer Science (R0)