Computer Science > Artificial Intelligence

arXiv:2306.02593 (cs)

[Submitted on 5 Jun 2023]

Title: Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

Authors: Dengfeng Ke, Yayue Deng, Yukang Jia, Jinlong Xue, Qi Luo, Ya Li, Jianqing Sun, Jiaen Liang, Binghuai Lin

Abstract: Regressive text-to-speech (TTS) systems use an attention mechanism to generate the alignment between the text and the acoustic feature sequence. This alignment determines synthesis robustness (e.g., the occurrence of skipping, repeating, and collapse) and, via duration control, rhythm. However, current attention algorithms used in speech synthesis cannot control rhythm using external duration information to generate natural speech while ensuring robustness. In this study, we propose Rhythm-controllable Attention (RC-Attention) based on Tacotron2, which improves robustness and naturalness simultaneously. The proposed attention adopts a trainable scalar learned from four kinds of information to achieve rhythm control, which makes rhythm control more robust and natural, even when the synthesized sentences are much longer than those in the training corpus. We use word error counting and an AB preference test to measure the robustness of the proposed method and the naturalness of the synthesized speech, respectively. Results show that RC-Attention has the lowest word error rate, nearly 0.6%, compared with 11.8% for the baseline system. Moreover, nearly 60% of subjects prefer the speech synthesized with RC-Attention to that synthesized with Forward Attention, because the former has a more natural rhythm.

Comments:	5 pages, 3 figures, Published in: 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2306.02593 [cs.AI]
	(or arXiv:2306.02593v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2306.02593
Related DOI:	https://doi.org/10.1109/ISCSLP57327.2022.10037822

Submission history

From: Yayue Deng
[v1] Mon, 5 Jun 2023 04:52:33 UTC (1,654 KB)
