This project aims to build a multilingual TTS pipeline, with a main focus on releasing the first high-performing COQUI🐸TTS (Text-to-Speech) based neural voice cloning systems for Bangla. It supports different SOTA models for Bangla as well as a multilingual (Arabic+Bengali) code-mixed TTS pipeline.

mobassir94/comprehensive-bangla-tts

UPDATE-2 (8/24/2023)

Trained a Bangla VITS model with phonemes.

  1. training notebook -> https://github.com/mobassir94/comprehensive-bangla-tts/blob/main/bn_vits_tts/Bangla_phoneme_ViTS_trainer.ipynb

  2. test/inference notebook -> https://github.com/mobassir94/comprehensive-bangla-tts/blob/main/bn_vits_tts/Bangla_phoneme_ViTS_inference.ipynb

All weight files -> https://www.kaggle.com/datasets/mobassir/comprehensive-bangla-tts

UPDATE-1

We are delighted to let you know that the Bangla TTS work of this repo is now available in the well-known COQUI🐸TTS (Text-to-Speech) toolkit. Please check https://github.com/coqui-ai/TTS/releases and this Colab demo as well -> https://github.com/mobassir94/comprehensive-bangla-tts/blob/main/Bangla_text_to_speech_(TTS).ipynb

However, to use the multilingual TTS pipeline you still need the codebase of this repository. Thanks.
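As a rough illustration of using the Bangla voices through Coqui TTS, the sketch below only builds the argument list for Coqui's `tts` command-line tool. The model identifier `tts_models/bn/custom/vits-male` is an assumption about how the release names the model; running `tts --list_models` prints the names that are actually available. The helper name `build_tts_command` and the output file name are hypothetical.

```python
import shlex


def build_tts_command(text, out_path="output.wav",
                      model_name="tts_models/bn/custom/vits-male"):
    """Build the argument list for Coqui's `tts` CLI.

    The default model_name is an assumption; check `tts --list_models`.
    """
    return ["tts", "--text", text, "--model_name", model_name, "--out_path", out_path]


cmd = build_tts_command("আমার সোনার বাংলা")
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

With the `TTS` package installed, the list can be passed to `subprocess.run(cmd, check=True)` to synthesize the audio file.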

Mission and Vision

With the infinite kindness, mercy, and blessings of Allah, we are launching an open-source Islamic book reader system today for everyone who knows or speaks Bangla and Arabic. Even though it is spoken by more than 210 million people as a first or second language, Bangla is still a low-resource language. It is also a very difficult language because of its many sounds and spelling rules. Additionally, its script is vastly different from that of English and other Latin-script languages.

The main purpose of building this comprehensive multilingual speech synthesis system was to reach people with the Hadith and the Glorious Quran in the Bengali language.

Our Contributions

  • Collected/scraped various important Bangla-Arabic or English-Arabic hadith, tafsir, and seerah books from the internet and translated English-Arabic to Bangla-Arabic using a powerful Bangla neural machine translator. You will find our scraper with comprehensive documentation here: https://github.com/mnansary/hadith-srcapper

  • To the best of our knowledge (based on extensive Google searches, research, and human validation), the Bangla VITS TTS (text-to-speech) system that we trained and used for reading various Bangla tafsir and hadith is the highest-performing state-of-the-art (SOTA) Bangla neural voice cloning system released publicly and for free as of Thursday, December 29, 2022. It beats earlier TTS systems such as gtts, silero-tts, and indic-tts by a large margin in terms of quality.

  • Built the first-ever multilingual book-reading pipeline that can read Bangla+Arabic code-mixed books with ease.

  • We read all the books or sources chapter by chapter and made audiobooks.

  • Performed audiobook-to-videobook conversion using ffmpeg.
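The exact ffmpeg invocation used for the videobooks is not given here. A minimal sketch, assuming each videobook is a single still image shown for the length of the audio, could build the command like this (the function name and all file names are hypothetical):

```python
def build_videobook_command(audio_path, image_path, video_path):
    """Build an ffmpeg command that pairs one still image with an audio track.

    This is a common still-image recipe, not necessarily the one this repo uses.
    """
    return [
        "ffmpeg", "-y",                 # overwrite the output file if it exists
        "-loop", "1", "-i", image_path,  # repeat the cover image as the video stream
        "-i", audio_path,                # the synthesized audiobook chapter
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac", "-b:a", "192k",
        "-pix_fmt", "yuv420p",           # pixel format accepted by most players
        "-shortest",                     # stop when the audio ends
        video_path,
    ]


videobook_cmd = build_videobook_command("chapter_001.wav", "cover.png", "chapter_001.mp4")
print(" ".join(videobook_cmd))
```

With ffmpeg installed, the list can be run with `subprocess.run(videobook_cmd, check=True)`, once per chapter.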

The entire process may not be 100% accurate. The English-to-Bengali translation may contain errors in many cases, and because the books are not read by humans (which would be very time-consuming and expensive), the system sometimes makes critical pronunciation mistakes as well. We hope that these problems will be solved by subsequent improvements to this work, InSha'Allah.

Training and inference

We used the fantastic coqui-ai 🐸💬 toolkit for Bangla text-to-speech training, with the IITM dataset converted to LJSpeech format. We trained 4 models: glowtts (male), glowtts (female), vits (male), and vits (female). glowtts didn't perform as well as expected because of the attached vocoder used by coqui-ai. To improve glowtts performance, one needs to train the spectrogram model and the vocoder separately and use a more powerful vocoder instead, such as HiFi-GAN 2. The vits male and female variants are our best models, and we used them for making most of the audiobooks. In the Comprehensive_Bangla_Text_to_Speech_(TTS) demo notebook you can see that the sound quality of the vits model is almost as good as the training dataset, which can be found here: https://www.kaggle.com/datasets/mobassir/comprehensive-bangla-tts. This means that end-to-end vits can clone a human voice with high quality and that its attached vocoder does a good enough job. One way to improve its performance could be to build a robust G2P model for Bangla and use phonemes during training.
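For readers unfamiliar with the LJSpeech layout mentioned above, it consists of a `wavs/` folder plus a `metadata.csv` file with one pipe-separated line per clip: file id, raw transcript, normalized transcript. The sketch below writes such a file. It is not the conversion script used for the IITM data; the function name and the sample id are hypothetical, and the same text is written to both transcript columns for simplicity.

```python
import tempfile
from pathlib import Path


def write_ljspeech_metadata(samples, dataset_dir):
    """Write an LJSpeech-style metadata.csv for (wav_stem, transcript) pairs."""
    dataset_dir = Path(dataset_dir)
    # LJSpeech keeps the audio clips in a wavs/ subfolder next to metadata.csv.
    (dataset_dir / "wavs").mkdir(parents=True, exist_ok=True)
    lines = []
    for wav_stem, transcript in samples:
        text = " ".join(transcript.split())  # collapse runs of whitespace
        if "|" in text:
            raise ValueError(f"transcript for {wav_stem} contains the '|' separator")
        lines.append(f"{wav_stem}|{text}|{text}")
    metadata_path = dataset_dir / "metadata.csv"
    metadata_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return metadata_path


with tempfile.TemporaryDirectory() as tmp_dir:
    path = write_ljspeech_metadata([("utt_0001", "  আমি   ভাত খাই ")], tmp_dir)
    metadata_text = path.read_text(encoding="utf-8")
    print(metadata_text)
```

Each `wav_stem` is expected to match a file `wavs/<wav_stem>.wav`; copying or resampling the audio is left out of this sketch.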

Each directory in this repo contains a .txt file describing what the code in that folder does.

For a multilingual (Bangla+Arabic) inference demo, you can check the Colab tutorial Multilingual_(ben+ara)_tts_inference_colab_demo.ipynb. A video tutorial of the API version is available here
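How the pipeline separates Bangla from Arabic text is not described in this section. One plausible first step, sketched here as an assumption rather than the repository's actual method, is to split code-mixed text into runs by Unicode block so that each run can be sent to the matching voice. The function names are hypothetical, and only the basic Bengali (U+0980 to U+09FF) and Arabic (U+0600 to U+06FF) blocks are checked.

```python
def script_of(ch):
    """Return 'bn', 'ar', or None for characters outside both blocks."""
    cp = ord(ch)
    if 0x0980 <= cp <= 0x09FF:
        return "bn"
    if 0x0600 <= cp <= 0x06FF:
        return "ar"
    return None


def split_by_script(text, default="bn"):
    """Split text into (language, segment) runs.

    Spaces, digits, and punctuation stay attached to the run they follow.
    """
    segments = []
    current_lang = None
    buffer = []
    for ch in text:
        lang = script_of(ch)
        # A script change closes the current run and starts a new one.
        if lang is not None and current_lang is not None and lang != current_lang:
            segments.append((current_lang, "".join(buffer)))
            buffer = []
        if lang is not None:
            current_lang = lang
        buffer.append(ch)
    if buffer:
        segments.append((current_lang or default, "".join(buffer)))
    return segments


mixed = "আল্লাহ الله মহান"
for lang, segment in split_by_script(mixed):
    print(lang, repr(segment))
```

Joining the segments in order reproduces the input string, so the synthesized audio clips can be concatenated in the same order.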

Check out some of the samples generated by our system :

Google Drive Link -> Bangla Islamic Audiobooks

Multilingual (Bangla+arabic) Audiobooks

| Books | Total hadiths/surahs | English-to-Bangla neural machine translated? | Neural speech synthesized multilingual (Bangla+Arabic) audiobooks |
|---|---|---|---|
| তাফসীর ইবনে কাসীর | 114 (surah) | Yes | https://www.youtube.com/playlist?list=PLsHVxzxNumvPOOnpy0om5F8uSm66gEbwF |
| বাংলা সীরাহঃনবীজি সাল্লাল্লাহু আলাইহি ওয়াসাল্লাম এর জীবনী by Dr. Yasir Qadhi | 101 lectures | Yes | https://www.youtube.com/playlist?list=PLsHVxzxNumvPSbuqcL8oSWoxCPpZ2A3HT |
| তাফসীরে জাকারিয়া (Tafsir Abu Bakar Zakaria) | 114 (surah) | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvOintrZMeFFubL5132E72Yl |
| তাফসীরে আহসানুল বায়ান | 114 (surah) | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvOT0a1ioq5fubqnDAjZ7PVj |
| তাফসীরে জালালাইন (Tafsir AL Jalalain) | 114 (surah) | Yes | https://www.youtube.com/playlist?list=PLsHVxzxNumvNbYBLhNoAIxw7BaS3yY2XB |
| সহিহ বুখারী | 7563 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvNIlU0TjaQaAUAWr9DuZedv |
| সহিহ মুসলিম | 7500 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvOmpGmZKy38RvOYwAWKssDu |
| সুনানে আন-নাসায়ী | 5758 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvNoGSguLsp3ePTR4WOUhZvT |
| সুনানে আবু দাউদ | 5274 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvNZ2QPues46JtcRrwK4QF0I |
| জামে' আত-তিরমিজি | 3956 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvMb31g0oeJLxmYlufZrYC0X |
| সুনানে ইবনে মাজাহ | 4341 | No | সুনানে ইবনে মাজাহ |
| মুয়াত্তা ইমাম মালিক | 1832 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvPn_8D5bn86OTQ9WFRocGGj |
| রিয়াদুস সলেহিন | 1905 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvON9GuH8N28YbJiJV0c-abc |
| বুলুগুল মারাম | 1568 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvMx11DlgONaLej3IeyVTXig |
| আল লু'লু ওয়াল মারজান | 1906 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvOELySX1jhuO2tlzpmnmrvq |
| হাদিস সম্ভার | 2013 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvPCCit-aKpSjls4KgabZOhb |
| সিলসিলা সহিহা | 60 | No | https://www.youtube.com/watch?v=geVWWA8RX3Q&list=PLsHVxzxNumvOsZibj3sZRJxt1uZH_k6n7&index=11 |
| জাল জয়িফ হাদিস সিরিজ | 102 | No | https://www.youtube.com/watch?v=R1CU0AAiB7Y&list=PLsHVxzxNumvOsZibj3sZRJxt1uZH_k6n7 |
| মিশকাতুল মাসাবিহ | 2758 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvNnnPeAOIhxBWcmlrvblKxb |
| ৪০ হাদিস | 42 | No | https://www.youtube.com/watch?v=ROMcvpPpvoE&list=PLsHVxzxNumvOsZibj3sZRJxt1uZH_k6n7&index=2 |
| আদাবুল মুফরাদ | 1336 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvNQfGS2d0DQHRX6_TYlnO1m |
| জুজ'উল রাফায়েল ইয়াদাইন | 56 | No | https://www.youtube.com/watch?v=mQtAo_xEhgs&list=PLsHVxzxNumvOsZibj3sZRJxt1uZH_k6n7&index=9 |
| সহিহ হাদিসে কুদসি | 163 | No | https://www.youtube.com/watch?v=mqUIy6d6UfI&list=PLsHVxzxNumvOsZibj3sZRJxt1uZH_k6n7&index=8 |
| ১০০ সুসাব্যস্ত হাদিস | 101 | No | https://www.youtube.com/watch?v=ZBs-ZyI3brw&list=PLsHVxzxNumvOsZibj3sZRJxt1uZH_k6n7&index=3 |
| মিশকাতে জয়িফ হাদিস | 106 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvOsZibj3sZRJxt1uZH_k6n7 |
| শামায়েলে তিরমিযি | 320 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvNduizMiAvAVWRUu0UEx2Bw |
| সহিহ তারগিব ওয়াত তাহরিব | 200 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvPI5tg5cDoWBCWLQjXcSlBg |
| সহিহ ফাযায়েলে আমল | 151 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvMMUpTUsJykeFyWzttWKZyw |
| ঊপদেশ | 234 | No | https://www.youtube.com/playlist?list=PLsHVxzxNumvPzHcoR1gDHv0fm-6je0GvM |
| রমযান বিষয়ে জাল ও দুর্বল হাদিসসমূহ | 36 | No | https://www.youtube.com/watch?v=MJi1V7e5ai8&list=PLsHVxzxNumvOsZibj3sZRJxt1uZH_k6n7&index=10 |
| মুসনাদে আহমাদ | | | |
| জুজ'উল কিরাত | | | |
| সুনান আদ-দারিমী | | | |
| তাহাবী শরিফ | | | |
| সুনান দারাকুতনী | | | |

Issues

  • GitHub automatically strips HTML-like tags from Python code written in Jupyter notebooks; please check issue #1.
  • gruut doesn't support Bangla. If possible, build a strong G2P model for Bangla; it should help improve the performance of our Bangla TTS.

References :

  1. https://aclanthology.org/2020.lrec-1.789.pdf
  2. https://arxiv.org/pdf/2106.06103.pdf
  3. https://arxiv.org/abs/2005.11129
  4. https://aclanthology.org/2020.emnlp-main.207.pdf

Acknowledgements

Apsis Solutions Ltd.

bengali.ai
