Libriheavy (Code from SAIC-Cambridge) (#2781) · speechbrain/speechbrain@948fa2b · GitHub


Commit 948fa2b

shucongzhang, Shucong Zhang/Embedded AI /SRUK/Engineer/Samsung Electronics, TParcollet, and Adel-Moumen authored
Libriheavy (Code from SAIC-Cambridge) (#2781)
* libriheavy AED
* Update README.md
* modified README and added the test csv
* update PR
* add the possibility to have a different audio backend in SB
* followup backend
* add Readme in root folder
* add dynamic backend
* add backend info in header of train.py
* placeholder
* dev split should be defined through the yaml
* data root + dev split
* dev
* READMEs
* pre-commit
* pre-commit fix: how is it possible???
* add dataclass doc
* last pre-commit inchallah
* remove links
* fix link
* remove speed perturb as unused

---------

Co-authored-by: Shucong Zhang/Embedded AI /SRUK/Engineer/Samsung Electronics <s1.zhang@sruk-ccn4.eu.corp.samsungelectronics.net>
Co-authored-by: Parcollet Titouan <parcollet.titouan@gmail.com>
Co-authored-by: Adel Moumen <adelmoumen.pro@gmail.com>
Co-authored-by: Adel Moumen <88119391+Adel-Moumen@users.noreply.github.com>
1 parent 79c4fd1 commit 948fa2b


14 files changed (+1475 / -74 lines changed)


recipes/DNS/dns_download.py

Lines changed: 2 additions & 6 deletions
@@ -463,7 +463,6 @@ def decompress_file(file, decompress_path, split_name):
 tar.close()
 else:
 print("Unsupported file format. Only zip and bz2 files are supported.")
-# os.remove(file)
 
 
 def rename_rirs(decompress_path):
@@ -473,11 +472,8 @@ def rename_rirs(decompress_path):
 
 Arguments
 ---------
-decompress_path (str): The path to the directory containing the RIRs
-
-Returns
--------
-None
+decompress_path : str
+The path to the directory containing the RIRs
 """
 try:
 os.rename(

recipes/IEMOCAP/emotion_recognition/iemocap_prepare.py

Lines changed: 7 additions & 7 deletions
@@ -147,7 +147,7 @@ def skip(*filenames):
 Arguments
 ---------
 *filenames : tuple
-List of paths to check for existence.
+A list of paths to check for existence.
 
 Returns
 -------
@@ -212,7 +212,7 @@ def split_sets(speaker_dict, split_ratio):
 ---------
 speaker_dict : list
 a dictionary of speaker id and its corresponding audio information
-split_ratio: list
+split_ratio : list
 List composed of three integers that sets split ratios for train,
 valid, and test sets, respectively.
 For instance split_ratio=[80, 10, 10] will assign 80% of the sentences
@@ -256,7 +256,7 @@ def transform_data(path_loadSession):
 Returns
 -------
 speaker_dict : dict
-Mapping from speaker id to waveform.
+Map from speaker id to wav.
 
 Example
 -------
@@ -312,12 +312,12 @@ def load_session(pathSession):
 
 Arguments
 ---------
-pathSession: str
-Path folder of IEMOCAP session.
+pathSession: str
+Path folder of IEMOCAP session.
 Returns
 -------
-improvisedUtteranceList: list
-List of improvised utterancefor IEMOCAP session.
+improvisedUtteranceList: list
+List of improvised utterancefor IEMOCAP session.
 """
 pathEmo = pathSession + "/dialog/EmoEvaluation/"
 pathWavFolder = pathSession + "/sentences/wav/"

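Several docstring edits in this commit move parameter entries toward the numpydoc-style `name : type` form. One example is `split_ratio: list`, which becomes `split_ratio : list`. Another is the `Arguments` rewrite in `dns_download.py`. The code below is a minimal sketch of how that convention could be checked automatically. `find_bad_param_lines` is a hypothetical helper, not SpeechBrain's actual pre-commit tooling. It assumes that parameter lines share one indentation level and that their descriptions are indented further.

```python
import re

# Hypothetical helper, not SpeechBrain's real pre-commit tooling. It assumes the
# numpydoc-like layout used in these recipes: a section title, a dashed
# underline, parameter lines at one indentation level, and descriptions
# indented further.
WELL_FORMED = re.compile(r"^\*{0,2}[A-Za-z_]\w* : \S")


def find_bad_param_lines(docstring):
    """Return parameter lines of the Arguments section not written as 'name : type'."""
    lines = docstring.expandtabs().splitlines()
    bad = []
    in_args = False
    param_indent = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        # A section title is a non-empty line followed by a dashed underline.
        if stripped and following and set(following) == {"-"}:
            in_args = stripped == "Arguments"
            param_indent = None
            continue
        # Skip other sections, blank lines, and the underline itself.
        if not in_args or not stripped or set(stripped) == {"-"}:
            continue
        indent = len(line) - len(line.lstrip())
        if param_indent is None:
            param_indent = indent  # the first entry sets the parameter indentation
        if indent == param_indent and not WELL_FORMED.match(stripped):
            bad.append(stripped)
    return bad
```

Called on a function's `__doc__`, the helper returns the offending parameter lines, or an empty list when the `Arguments` section already follows the `name : type` form.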
recipes/IEMOCAP/iemocap_prepare.py

Lines changed: 4 additions & 4 deletions
@@ -312,12 +312,12 @@ def load_session(pathSession):
 
 Arguments
 ---------
-pathSession: str
-Path folder of IEMOCAP session.
+pathSession: str
+Path folder of IEMOCAP session.
 Returns
 -------
-improvisedUtteranceList: list
-List of improvised utterancefor IEMOCAP session.
+improvisedUtteranceList: list
+List of improvised utterancefor IEMOCAP session.
 """
 pathEmo = pathSession + "/dialog/EmoEvaluation/"
 pathWavFolder = pathSession + "/sentences/wav/"

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+# Libriheavy Dataset
+This folder contains the scripts to train a Transformer-based speech recognizer.
+
+1. Please download Libri-Light following the instructions at https://github.com/facebookresearch/libri-light/tree/main/data_preparation
+After this step, please make sure you have all the splits (small, medium, and large) in one folder.
+Please note that the large split alone (large.tar) is 3.05TB, so the download can take quite a while.
+
+2. Please clone the repository https://github.com/k2-fsa/libriheavy and follow its instructions to prepare the Libriheavy manifests.
+After this step, please make sure you have all the Libriheavy manifest files ("jsonl.gz") in one folder.
+
+**Note 1:** This recipe relies on the `soundfile` backend for fast audio processing. Libriheavy comes with long audio files that must be read in chunks, and in our experiments `soundfile` was the only audio backend fast enough to read them. You can change the backend dynamically through the `audio_backend` parameter in the YAML file, or override it with `--audio_backend` on the command line.
+
+**Note 2:** If you don't have the `large` folder but want to run this recipe with the `small` and/or `medium` splits, you need to download the official `dev` and `test` splits from the LibriSpeech dataset. This is necessary because the `dev` and `test` splits for Libriheavy are located in the `large` folder. You can download LibriSpeech at http://www.openslr.org/12 and run the `librispeech_prepare.py` script from the `recipes/LibriSpeech/` folder. Then, specify the `dev_splits` and `test_splits` parameters in the YAML file.
+
+# How to run
+```shell
+python train.py hparams/transformer.yaml --data_folder=/path/to/Libri-Light --manifest_folder=/path/to/Libriheavy
+```
+
+# LibriSpeech Dev/Test Results
+Results obtained by training on the Libriheavy large split and testing on the LibriSpeech dev/test sets.
+
+| Release | hyperparams file | Dev Clean WER (Transformer LM) | Test Clean WER (Transformer LM) | Test Other WER (Transformer LM) | HuggingFace link | Model link | GPUs |
+|:-------------:|:-------------:|:-------------:|:---------------------------:| :-----:| :-----:| :-----:| :--------:|
+| 24-12-09 | conformer_large.yaml | 1.58 | 1.74 | 3.92 | Not Avail. | Not Avail. | 8xA100 80GB |
+
+
+# **About SpeechBrain**
+- Website: https://speechbrain.github.io/
+- Code: https://github.com/speechbrain/speechbrain/
+- HuggingFace: https://huggingface.co/speechbrain/
+
+
+# **Citing SpeechBrain**
+Please cite SpeechBrain if you use it for your research or business.
+
+```bibtex
+@misc{speechbrainV1,
+  title={Open-Source Conversational AI with SpeechBrain 1.0},
+  author={Mirco Ravanelli and Titouan Parcollet and Adel Moumen and Sylvain de Langen and Cem Subakan and Peter Plantinga and Yingzhi Wang and Pooneh Mousavi and Luca Della Libera and Artem Ploujnikov and Francesco Paissan and Davide Borra and Salah Zaiem and Zeyu Zhao and Shucong Zhang and Georgios Karakasidis and Sung-Lin Yeh and Pierre Champion and Aku Rouhe and Rudolf Braun and Florian Mai and Juan Zuluaga-Gomez and Seyed Mahed Mousavi and Andreas Nautsch and Xuechen Liu and Sangeet Sagar and Jarod Duret and Salima Mdhaffar and Gaelle Laperriere and Mickael Rouvier and Renato De Mori and Yannick Esteve},
+  year={2024},
+  eprint={2407.00463},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG},
+  url={https://arxiv.org/abs/2407.00463},
+}
+@misc{speechbrain,
+  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
+  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
+  year={2021},
+  eprint={2106.04624},
+  archivePrefix={arXiv},
+  primaryClass={eess.AS},
+  note={arXiv:2106.04624}
+}
+```
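Step 1 of the README expects all Libri-Light splits to live in one folder, which is what `--data_folder` points to in the "How to run" command. The code below sketches a minimal pre-flight check. It assumes each split was extracted into a sub-folder named `small`, `medium`, or `large`. `missing_splits` is a hypothetical helper, not part of the recipe.

```python
from pathlib import Path

# Assumption: each Libri-Light split was extracted into a sub-folder named after
# the split, all inside the single folder passed as --data_folder.
LIBRILIGHT_SPLITS = ("small", "medium", "large")


def missing_splits(data_folder, splits=LIBRILIGHT_SPLITS):
    """Return the requested split names that have no sub-folder in data_folder."""
    root = Path(data_folder)
    return [name for name in splits if not (root / name).is_dir()]
```

Following Note 2, a run without the large split could check only `("small", "medium")` and take its dev/test data from LibriSpeech instead.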

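Step 2 of the README leaves the Libriheavy manifests as `jsonl.gz` files in the folder passed as `--manifest_folder`. These are gzip-compressed JSON Lines files, with one JSON object per line. The sketch below lists and streams such files using only the Python standard library. `list_manifests` and `iter_manifest` are hypothetical helpers. Field names such as `id`, `start`, or `duration` are assumptions and should be checked against the manifests produced by the libriheavy scripts.

```python
import gzip
import json
from pathlib import Path


def list_manifests(manifest_folder):
    """Return the sorted *.jsonl.gz files found directly inside manifest_folder."""
    return sorted(Path(manifest_folder).glob("*.jsonl.gz"))


def iter_manifest(path):
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file."""
    # Text mode ("rt") decompresses and decodes on the fly, one line at a time,
    # so the whole manifest never has to sit in memory.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because `iter_manifest` is a generator, it can, for example, count entries or sum an assumed `duration` field without loading the whole manifest into memory.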
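Note 1 of the README explains that Libriheavy comes with long audio files that have to be read in chunks. The code below is a minimal sketch of that idea using `soundfile`. It assumes a segment is described by a start time and a duration in seconds, and that the audio is 16 kHz. `segment_to_frames` and `read_segment` are hypothetical helpers, not the recipe's actual data pipeline.

```python
def segment_to_frames(start, duration, sample_rate=16000):
    """Convert a segment given in seconds into [start_frame, stop_frame) sample indices."""
    if start < 0 or duration <= 0:
        raise ValueError("start must be >= 0 and duration must be > 0")
    start_frame = int(round(start * sample_rate))
    # Derive the stop from the rounded duration so equal durations give equal lengths.
    stop_frame = start_frame + int(round(duration * sample_rate))
    return start_frame, stop_frame


def read_segment(path, start, duration, sample_rate=16000):
    """Read only the requested segment of a long audio file with soundfile."""
    import soundfile as sf  # imported lazily so the frame arithmetic runs without it

    start_frame, stop_frame = segment_to_frames(start, duration, sample_rate)
    # soundfile.read returns only the frames in [start, stop), so the rest of the
    # long file is never loaded into memory.
    audio, file_rate = sf.read(path, start=start_frame, stop=stop_frame, dtype="float32")
    if file_rate != sample_rate:
        raise ValueError(f"expected {sample_rate} Hz audio, got {file_rate} Hz")
    return audio
```

The stop frame is derived from the rounded duration rather than from `start + duration`. That way, segments of equal duration always yield the same number of samples, wherever they start in the file.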