- Department of Electrical Engineering, National Taiwan University
- Taipei, Taiwan
Stars
Python scripts to bulk upload local images as emojis to Slack
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
Unsupervised Speech Decomposition Via Triple Information Bottleneck
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Instant voice cloning by MIT and MyShell.
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Pikachu Volleyball implemented in JavaScript by reverse engineering the original game
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Unofficial implementation of NVIDIA P-Flow TTS paper
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
An Open Source text-to-speech system built by inverting Whisper.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"
phoneme tokenizer and grapheme-to-phoneme model for 8k languages
fine-tune Whisper model for Taiwanese speech recognition
"Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences", ICASSP 2023
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Grapheme to phoneme conversion with deep learning.
A tokenizer, text cleaner, and phonemizer for many human languages.
A playbook for systematically maximizing the performance of deep learning models.
Kaldi-style neural network training in PyTorch for use in place of nnet3 in Kaldi.