Using Transfer Learning to Realize Low Resource Dungan Language Speech Synthesis
Figure 1. The framework of Tacotron2+WaveRNN-based Dungan speech synthesis.
Figure 2. Procedure of Dungan text analysis.
Figure 3. Structure of a Dungan character.
Figure 4. The framework of BLSTM-CRF-based Dungan prosodic boundary prediction. The input is a Dungan sentence with prosodic information.
Figure 5. The framework of Transformer-based Dungan character-to-unit conversion. The inputs are a Dungan sentence with prosodic information (left) and its corresponding Pinyin sequence (right); the output is the Pinyin sequence with prosodic information.
Figure 6. Procedure of training the Dungan acoustic model with transfer learning.
Figure 7. The average MOS scores of synthesized Dungan speech with 95% confidence intervals.
Figure 8. The average MOS scores of synthesized Mandarin speech with 95% confidence intervals.
Figure 9. The average DMOS scores of synthesized Dungan speech with 95% confidence intervals.
Figure 10. The average DMOS scores of synthesized Mandarin speech with 95% confidence intervals.
Abstract
1. Introduction
- Front-end: We implemented a complete text analyzer for the Dungan language, comprising modules for text normalization, word segmentation, prosodic boundary prediction, and Transformer-based unit generation. The analyzer converts Dungan sentences into initials and finals annotated with prosodic labels, which serve as the speech synthesis units.
- Back-end: We realized sequence-to-sequence Dungan speech synthesis by adapting a pre-trained Mandarin acoustic model within the Tacotron2+WaveRNN framework, replacing Tacotron2’s location-sensitive attention with forward attention to improve convergence speed and stability (see the sketch after this list).
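As a minimal, hedged illustration of the attention mechanism named above, the PyTorch sketch below implements the forward-attention recursion of Zhang and Ling (ICASSP 2018), in which each step's content-based alignment is gated by the previous step's weights so that the alignment can only stay in place or advance monotonically along the encoder states. The function name and tensor shapes are our own assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def forward_attention_step(log_energies: torch.Tensor,
                           prev_alpha: torch.Tensor) -> torch.Tensor:
    """One decoder step of forward attention (a sketch, not the authors' code).

    log_energies: (batch, enc_len) unnormalized scores produced by any
                  content-based attention at the current decoder step.
    prev_alpha:   (batch, enc_len) forward attention weights from the
                  previous step; initialize as [1, 0, 0, ...] at step 0.
    """
    y = torch.softmax(log_energies, dim=-1)  # content-based alignment y_t(n)
    # Shift prev_alpha right by one encoder position, so position n can be
    # reached either by staying (alpha_{t-1}(n)) or advancing (alpha_{t-1}(n-1)).
    shifted = F.pad(prev_alpha, (1, 0))[:, :-1]
    alpha = (prev_alpha + shifted) * y       # monotonic forward recursion
    return alpha / (alpha.sum(dim=-1, keepdim=True) + 1e-8)  # renormalize
```

Because probability mass can only stay put or move one encoder position forward per decoder step, alignment failures such as skipping or repeating are suppressed, which is the usual explanation for the faster and more stable convergence reported here on the small Dungan corpus.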
2. Models and Methods
2.1. Text Analyzer of Dungan Language
2.1.1. Speech Synthesis Unit of Dungan Language
2.1.2. Text Normalization
2.1.3. Word Segmentation
2.1.4. Prosodic Boundary Prediction
2.1.5. Transformer-Based Character-to-Unit Conversion
2.2. Transfer Learning-Based Dungan Acoustic Model
2.3. Pre-Trained Tacotron2-Based Mandarin Acoustic Model
3. Results
3.1. Evaluation on Transformer-Based Dungan Character-to-Unit Conversion
3.2. Evaluation on Transfer Learning-Based Dungan Acoustic Models
3.2.1. Corpus
3.2.2. Experimental Setup
Dungan Monolingual Speaker-Dependent Model
Mandarin Monolingual Speaker-Dependent Model
Mandarin and Dungan Bilingual Speaker-Dependent Model
- MDSD-Tacotron+Griffin-Lim
- MDSM-Tacotron+Griffin-Lim
- MDSD-Tacotron2+WaveNet
- MDSM-Tacotron2+WaveNet
- MDSD-Tacotron2+WaveRNN
- MDSM-Tacotron2+WaveRNN
3.2.3. Objective Evaluations
3.2.4. Subjective Evaluation
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Type | Units |
---|---|
Initials | /b/, /p/, /m/, /f/, /v/, /z/, /c/, /s/, /d/, /t/, /n/, /l/, /zh/, /ch/, /sh/, /r/, /j/, /q/, /x/, /g/, /k/, /ng/, /h/, // (zero initial) |
Finals | /ii/, /iii/, /i/, /u/, /y/, /a/, /ia/, /ua/, /e/, /ue/, /ye/, /iE/, /ap/, /ai/, /uai/, /ei/, /ui/, /ao/, /iao/, /ou/, /iou/, /an/, /ian/, /uan/, /yan/, /aN/, /iaN/, /uaN/, /uN/, /iN/, /yN/ |
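To make the role of this inventory concrete, the following small Python sketch (our illustration, not the paper's implementation) splits a romanized syllable into an initial and a final by greedy longest-prefix match against the initials above, falling back to the zero initial // for vowel-initial syllables. The example syllables are hypothetical.

```python
# Initials from the table above, sorted longest-first so that /zh/, /ch/,
# /sh/, /ng/ are tried before their single-letter prefixes.
INITIALS = sorted(
    ["b", "p", "m", "f", "v", "z", "c", "s", "d", "t", "n", "l",
     "zh", "ch", "sh", "r", "j", "q", "x", "g", "k", "ng", "h"],
    key=len, reverse=True)

def split_syllable(syllable: str) -> tuple[str, str]:
    """Split a romanized syllable into (initial, final).

    Falls back to the zero initial (written // in the table) when no
    initial prefix matches, e.g. for vowel-initial syllables.
    """
    for ini in INITIALS:
        if syllable.startswith(ini) and len(syllable) > len(ini):
            return ini, syllable[len(ini):]
    return "", syllable  # zero initial

print(split_syllable("zhuan"))  # ('zh', 'uan')
print(split_syllable("an"))     # ('', 'an')
```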
Parameter | Value |
---|---|
Attention layers | 6 |
Heads | 8 |
Batch size | 32 |
Hidden | 513 |
Dropout | 0.1 |
Learning rate | 0.0001 |
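Below is a hedged sketch of how a character-to-unit Transformer with the hyperparameters from the table above might be instantiated in PyTorch. The vocabulary sizes are placeholders, positional encodings are omitted for brevity, and we use a model dimension of 512 because nn.Transformer requires it to be divisible by the 8 heads (the table's 513 appears to be a misprint). This is an illustration under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hyperparameters from the table; d_model assumed 512 (must divide by nhead).
D_MODEL, N_HEADS, N_LAYERS, DROPOUT = 512, 8, 6, 0.1
CHAR_VOCAB, UNIT_VOCAB = 4000, 100  # placeholder vocabulary sizes

class CharToUnit(nn.Module):
    """Encoder-decoder Transformer mapping Dungan characters to
    initial/final units with prosodic labels (a sketch)."""

    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(CHAR_VOCAB, D_MODEL)
        self.tgt_emb = nn.Embedding(UNIT_VOCAB, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=N_HEADS,
            num_encoder_layers=N_LAYERS, num_decoder_layers=N_LAYERS,
            dropout=DROPOUT, batch_first=True)
        self.out = nn.Linear(D_MODEL, UNIT_VOCAB)

    def forward(self, src, tgt):
        # Causal mask: each target position attends only to earlier positions.
        L = tgt.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.transformer(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=mask)
        return self.out(h)

model = CharToUnit()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate 0.0001
```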
Precision (%) | Recall (%) | F1 (%) |
---|---|---|
90.12 | 89.91 | 90.01 |
Parameter | Tacotron | Tacotron2 | Forward-Attention Tacotron2 |
---|---|---|---|
Vocoder | Griffin-Lim | WaveNet | WaveRNN |
Encoder: embedding | Phoneme (256) | Phoneme (512) | Phoneme (512) |
Encoder: pre-net | FFN (256, 128) | - | FFN (512, 256) |
Encoder: core | CBHG (256) | CNN (512), Bi-LSTM (512) | CNN (256), Bi-LSTM (256, 512) |
Decoder: post-net | CBHG (256) | CNN (512) | CNN (512) |
Decoder: decoder RNN | GRU (256, 256) | - | LSTM (512, 256) |
Decoder: attention | Additive (256) | Location-sensitive (128) | Forward (256) |
Decoder: attention RNN | GRU (256) | LSTM (1024, 1024) | LSTM (256) |
Decoder: pre-net | FFN (256, 128) | FFN (256, 256) | FFN (256, 128) |
Model | Tacotron+Griffin-Lim | Tacotron2+WaveNet | Tacotron2+WaveRNN |
---|---|---|---|
MCD (dB) | 9.675 | 9.572 | 9.502 |
BAP (dB) | 0.189 | 0.187 | 0.170 |
F0 RMSE (Hz) | 32.785 | 32.692 | 32.087 |
V/UV error (%) | 9.867 | 9.721 | 9.875 |
Model | Tacotron+Griffin-Lim | Tacotron2+WaveNet | Tacotron2+WaveRNN |
---|---|---|---|
MCD (dB) | 5.460 | 5.291 | 5.036 |
BAP (dB) | 0.174 | 0.171 | 0.169 |
F0 RMSE (Hz) | 14.629 | 13.986 | 13.647 |
V/UV error (%) | 5.619 | 5.793 | 5.762 |
Model | Tacotron+Griffin-Lim | Tacotron2+WaveNet | Tacotron2+WaveRNN |
---|---|---|---|
MCD (dB) | 7.523 | 7.419 | 7.395 |
BAP (dB) | 0.178 | 0.175 | 0.174 |
F0 RMSE (Hz) | 26.891 | 26.753 | 26.617 |
V/UV error (%) | 7.774 | 7.693 | 7.607 |
Model | Tacotron+Griffin-Lim | Tacotron2+WaveNet | Tacotron2+WaveRNN |
---|---|---|---|
MCD (dB) | 5.339 | 5.241 | 5.108 |
BAP (dB) | 0.174 | 0.173 | 0.171 |
F0 RMSE (Hz) | 13.775 | 13.326 | 13.092 |
V/UV error (%) | 5.542 | 5.472 | 5.481 |
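For readers reproducing the objective scores above, mel-cepstral distortion is conventionally computed with Kubichek's formula. A minimal NumPy sketch, assuming the reference and synthesized mel-cepstra are already time-aligned (e.g., by DTW) and exclude the 0th (energy) coefficient:

```python
import numpy as np

def mel_cepstral_distortion(ref: np.ndarray, syn: np.ndarray) -> float:
    """Average MCD in dB between two aligned mel-cepstral sequences.

    ref, syn: (frames, coeffs) arrays of mel-cepstral coefficients,
    time-aligned and excluding the 0th coefficient.
    """
    diff = ref - syn
    # Per-frame MCD: (10 / ln 10) * sqrt(2 * sum of squared differences)
    mcd = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(mcd))
```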
Model | IR (%) | DFR (%) |
---|---|---|
DSD-Tacotron+Griffin-Lim | 82.93 | 79.64 |
DSD-Tacotron2+WaveNet | 86.67 | 82.43 |
DSD-Tacotron2+WaveRNN | 89.41 | 84.39 |
MDSD-Tacotron+Griffin-Lim | 95.03 | 91.14 |
MDSD-Tacotron2+WaveNet | 96.69 | 94.43 |
MDSD-Tacotron2+WaveRNN | 98.47 | 97.39 |
Test | DSD-Tacotron+Griffin-Lim | DSD-Tacotron2+WaveNet | DSD-Tacotron2+WaveRNN | MDSD-Tacotron+Griffin-Lim | MDSD-Tacotron2+WaveNet | MDSD-Tacotron2+WaveRNN | Neutral |
---|---|---|---|---|---|---|---|
1 | 12.7 | 22.9 | 52.6 | - | - | - | 11.8 |
2 | 29.5 | 32.0 | 27.6 | - | - | - | 10.9 |
3 | - | - | - | 17.7 | - | 69.9 | 12.4 |
4 | - | - | - | 3.2 | | 70.8 | 11.3 |
5 | - | - | - | - | 17.1 | 72.1 | 10.8 |
Test | MSD-Tacotron+Griffin-Lim | MSD-Tacotron2+WaveNet | MSD-Tacotron2+WaveRNN | MDSM-Tacotron+Griffin-Lim | MDSM-Tacotron2+WaveNet | MDSM-Tacotron2+WaveRNN | Neutral |
---|---|---|---|---|---|---|---|
1 | - | 24.54 | 63.56 | - | - | - | 11.9 |
2 | - | 19.98 | 67.42 | - | - | - | 12.6 |
3 | - | - | - | - | 11.8 | 71.9 | 16.3 |
4 | - | - | - | 14.4 | - | 75.1 | 10.5 |
5 | - | - | - | - | 10.7 | 79.6 | 9.7 |
Liu, M.; Jiang, R.; Yang, H. Using Transfer Learning to Realize Low Resource Dungan Language Speech Synthesis. Appl. Sci. 2024, 14, 6336. https://doi.org/10.3390/app14146336