spatiospectral_diarization

Combining local, spatial segmentation and global, embedding-based speaker assignment for diarization

Diarization is the task of determining "who spoke when" in a given audio recording. Popular current systems follow a hybrid design: a local segmentation module is followed by a global speaker assignment module, which assigns a speaker identity to each segment. This repository implements a spatio-spectral diarization pipeline with the same structure, but replaces the local segmentation stage with a TDOA-based spatial segmentation module, as introduced in "Spatio-spectral diarization of meetings by combining TDOA-based segmentation and speaker embedding-based clustering".

The segmentation module is based on the spatial diarization pipeline proposed in "Spatial Diarization for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks" (Tobias Gburrek, Joerg Schmalenstroer, Reinhold Haeb-Umbach, 2023 Asilomar Conference).
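To illustrate what a TDOA feature is, the following is a minimal sketch of a textbook GCC-PHAT estimator for a single microphone pair. It is not the segmentation code of this repository; the function name `gcc_phat_tdoa` and its parameters are hypothetical and only serve the illustration.

```python
import numpy as np


def gcc_phat_tdoa(ref, sig, max_shift):
    """Estimate the delay (in samples) of `sig` relative to `ref` via GCC-PHAT.

    Hypothetical helper for illustration; `max_shift` must be >= 1.
    """
    if max_shift < 1:
        raise ValueError("max_shift must be at least 1")
    n_fft = len(ref) + len(sig)  # zero-pad to avoid circular wrap-around
    ref_spec = np.fft.rfft(ref, n=n_fft)
    sig_spec = np.fft.rfft(sig, n=n_fft)
    cross_spec = sig_spec * np.conj(ref_spec)
    # PHAT weighting: discard the magnitude, keep only the phase
    cross_spec /= np.maximum(np.abs(cross_spec), 1e-12)
    gcc = np.fft.irfft(cross_spec, n=n_fft)
    # Reorder so that the lags run from -max_shift to +max_shift
    gcc = np.concatenate((gcc[-max_shift:], gcc[:max_shift + 1]))
    return int(np.argmax(gcc)) - max_shift


# Toy example: white noise and a copy of it delayed by 5 samples
rng = np.random.default_rng(0)
reference = rng.standard_normal(4096)
delayed = np.concatenate((np.zeros(5), reference[:-5]))
tdoa_samples = gcc_phat_tdoa(reference, delayed, max_shift=32)
print(tdoa_samples, "samples =", tdoa_samples / 16000, "s at 16 kHz")
```

A positive result means that the second signal lags behind the first. Stacking such estimates over all microphone pairs gives one TDOA vector per time frame, which is the kind of spatial feature a TDOA-based segmentation can operate on.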

NOTE: This repository is still undergoing changes. The diarization pipeline is functional and can be applied to arbitrary multi-channel recordings, but the documentation is still incomplete and the code is being revised for clarity and usability. Expect further changes over the next few weeks.

Content

- A multi-channel, spatio-spectral diarization pipeline
  - A spatial multi-channel segmentation module utilizing time difference of arrival (TDOA) features
  - Beamforming and TDOA segment refinement to remove segments corresponding to reflections
  - A global speaker assignment module using d-vector-based speaker embeddings obtained from the beamformed speech segments
- Scripts to reproduce the results of the reference publication
  - Diarization of the LibriWASN and LibriCSS datasets
  - Evaluation in a semi-static meeting scenario
- Modular design to enable further research and exchanging individual components of the pipeline
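The idea behind the global speaker assignment can be sketched as a clustering of per-segment speaker embeddings. The snippet below is a generic stand-in (greedy agglomerative clustering on cosine similarity), not the clustering implemented in this repository; the function name `cluster_embeddings` and the `threshold` value are assumptions chosen for illustration.

```python
import numpy as np


def cluster_embeddings(embeddings, threshold=0.5):
    """Greedy agglomerative clustering on the cosine similarity of centroids.

    Hypothetical helper: returns one integer speaker label per embedding.
    """
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    clusters = [[i] for i in range(len(emb))]  # start with one cluster per segment
    while len(clusters) > 1:
        # Unit-norm centroid of every current cluster
        centroids = np.stack([emb[c].mean(axis=0) for c in clusters])
        centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
        sim = centroids @ centroids.T
        np.fill_diagonal(sim, -np.inf)  # ignore self-similarity
        a, b = np.unravel_index(np.argmax(sim), sim.shape)
        if sim[a, b] < threshold:
            break  # the remaining clusters are too dissimilar to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(emb), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Toy example: four 2-dimensional "embeddings" from two well-separated speakers
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(cluster_embeddings(toy_embeddings))
```

Segments that receive the same label are attributed to the same speaker. In practice the embeddings would be d-vectors with a few hundred dimensions, and the stopping threshold would have to be tuned on held-out data.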

Installation

After cloning the repository, you can install the package using pip:

git clone https://github.com/fgnt/spatiospectral_diarization.git
pip install ./spatiospectral_diarization

The code snippet below shows how to apply the pipeline directly to a recording. The example notebook gives further details on how to use the diarization pipeline and exchange parts of it (still WIP: to come in the next update).

Applying the pipeline to a recording

We provide the full diarization pipeline pre-packaged as a single Python class. To apply it to a multi-channel recording, use the following code snippet:

from spatiospectral_diarization.spatio_spectral_pipeline import SpatioSpectralDiarization
import paderbox as pb

pipeline = SpatioSpectralDiarization(
    sample_rate=16000,  # Sample rate of the audio data
)

audio_signal = pb.io.load_audio('path/to/your/multi_channel_audio.wav')

output = pipeline(audio_signal)

The pipeline expects synchronized signals, both in terms of sampling rate offset (SRO) and sampling time offset (STO). If you want to apply the pipeline to data obtained in a distributed setup, e.g., from multiple recording devices, we recommend applying the synchronization modules from paderwasn to the audio data before applying the diarization pipeline.

The pipeline outputs a dictionary containing the following entries:

- diarization_estimate: a dictionary containing all speakers with the onsets and offsets of each speaker detected in the recording
- activity_segments: a list with all segments estimated in the spatial segmentation component
- tdoa_vectors: a list containing the corresponding average time differences of arrival (TDOAs) for each segment
- embeddings: the speaker embeddings for each segment
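As a sketch of how the diarization estimate could be post-processed, the snippet below converts it into RTTM lines, a common exchange format for diarization scoring tools. The exact layout of `diarization_estimate` is not documented here, so the code assumes a mapping from speaker label to a list of `(onset, offset)` pairs given in samples. Check this assumption against the actual pipeline output before use. The helper `to_rttm_lines` and the `recording_id` argument are hypothetical.

```python
def to_rttm_lines(diarization_estimate, sample_rate=16000, recording_id="recording"):
    """Convert {speaker: [(onset, offset), ...]} (in samples, assumed) to RTTM lines."""
    lines = []
    for speaker, intervals in diarization_estimate.items():
        for onset, offset in intervals:
            start = onset / sample_rate
            duration = (offset - onset) / sample_rate
            lines.append(
                f"SPEAKER {recording_id} 1 {start:.3f} {duration:.3f} "
                f"<NA> <NA> {speaker} <NA> <NA>"
            )
    # Sort by start time so that the file reads chronologically
    return sorted(lines, key=lambda line: float(line.split()[3]))


# Mock estimate in the assumed layout (not real pipeline output)
mock_estimate = {"spk0": [(16000, 32000)], "spk1": [(0, 8000)]}
for line in to_rttm_lines(mock_estimate):
    print(line)
```

With the real pipeline, the call would be `to_rttm_lines(output['diarization_estimate'])`, provided the assumed layout holds.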

For more details on how to apply and modify the pipeline, please refer to the example notebook in this repository.

Reproducing the LibriWASN & LibriCSS results

NOTE: The code is still undergoing final revision. Data preparation scripts are available. Instructions on how to reproduce the results from the paper will be added in the next update.

Citation

To cite this package, please refer to the following publication:

@inproceedings{cordgburrek2025spatiospectral_diarization,
      title={Spatio-spectral diarization of meetings by combining {TDOA}-based segmentation and speaker embedding-based clustering}, 
      author={Tobias Cord-Landwehr and Tobias Gburrek and Marc Deegen and Reinhold Haeb-Umbach},
      year={2025},
      booktitle={Proceedings of Interspeech},
      publisher={ISCA}, 
}
