github-mi/forced_align
CTC Forced Alignment - Batched Implementation

A high-performance, batched implementation of the Connectionist Temporal Classification (CTC) forced-alignment algorithm. It provides two optimized versions: a PyTorch implementation and a native CUDA implementation. Both deliver substantial speedups over the standard torchaudio.functional.forced_align when processing multiple sequences.

Features

  • Native CUDA Support: Custom CUDA kernels for maximum performance
  • Full Batch Support: Unlike the original torchaudio implementation, which only supports batch_size=1, our implementation supports arbitrary batch sizes
  • Drop-in Replacement: Same API with an additional use_cuda parameter
  • GPU Accelerated: Optimized for CUDA devices with significant speedup over sequential processing

Installation

# Compile CUDA extension for maximum performance
python setup.py build_ext --inplace

Performance

Benchmark Results

Configuration: Input Length=100, Target Length=20, Vocabulary Size=10,000
Hardware: NVIDIA GeForce RTX 5090

| Batch Size | TorchAudio | CUDA Implementation | PyTorch Implementation | CUDA Speedup | PyTorch Speedup |
|-----------:|-----------:|--------------------:|-----------------------:|-------------:|----------------:|
| 1          | 2.0ms      | 1.9ms               | 10.1ms                 | 1.07x        | 0.20x           |
| 8          | 17.1ms     | 1.9ms               | 10.4ms                 | 8.87x        | 1.64x           |
| 64         | 137.4ms    | 2.2ms               | 10.5ms                 | 61.72x       | 13.08x          |

Batch Processing Efficiency

| Batch Size | CUDA (Time per Sample) | PyTorch (Time per Sample) | CUDA Efficiency | PyTorch Efficiency |
|-----------:|-----------------------:|--------------------------:|----------------:|-------------------:|
| 1          | 1.89ms                 | 10.13ms                   | 1.00x           | 1.00x              |
| 8          | 0.24ms                 | 1.30ms                    | 7.82x           | 7.79x              |
| 64         | 0.03ms                 | 0.16ms                    | 54.29x          | 62.03x             |
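The per-sample columns follow directly from the batch timings: time per sample is the batch time divided by the batch size, and efficiency is the batch-1 per-sample time divided by the batch-N per-sample time. A quick sanity check in plain Python, using the CUDA column above (small differences from the table come from rounding in the reported timings):

```python
# Derive per-sample time and scaling efficiency from batch timings.
# Timings (ms) are taken from the CUDA column of the benchmark table.
timings = {1: 1.89, 8: 1.9, 64: 2.2}  # batch_size -> total time in ms

# Time per sample: total batch time divided by batch size.
per_sample = {b: t / b for b, t in timings.items()}

# Efficiency: how much cheaper one sample gets relative to batch_size=1.
efficiency = {b: per_sample[1] / per_sample[b] for b in timings}

print(round(per_sample[64], 2))  # ~0.03 ms per sample
print(round(efficiency[8], 1))   # ~8x
```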

Benchmarking

Run the benchmark script to compare performance:

python benchmark.py

This will test various batch sizes and provide detailed performance comparisons between our implementation and torchaudio.functional.forced_align.

Usage

import torch
from ctc import ctc_forced_align

# Generate sample data
batch_size, input_length, target_length, vocab_size = 4, 100, 20, 1000
log_probs = torch.randn(batch_size, input_length, vocab_size)
log_probs = torch.log_softmax(log_probs, dim=-1)
targets = torch.randint(1, vocab_size, (batch_size, target_length))
input_lengths = torch.full((batch_size,), input_length)
target_lengths = torch.full((batch_size,), target_length)

# Perform forced alignment
alignments, scores = ctc_forced_align(
    log_probs=log_probs,
    targets=targets,
    input_lengths=input_lengths,
    target_lengths=target_lengths,
    blank=0
)

print(f"Alignment shape: {alignments.shape}")  # (batch_size, input_length)
print(f"Scores shape: {scores.shape}")  # (batch_size, input_length)
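The alignments tensor gives one token id per frame, blanks included. A common post-processing step is to collapse it into per-token frame spans (blanks separate repeated tokens, so consecutive identical non-blank frames belong to one token). The helper below is a plain-Python sketch of that step, not part of this repository:

```python
def merge_alignment(frame_labels, blank=0):
    """Collapse a frame-level alignment into (token, start, end) spans.

    frame_labels: per-frame token ids, e.g. one row of `alignments`.
    Returns a list of (token_id, start_frame, end_frame_exclusive).
    """
    spans = []
    prev = blank
    for t, lab in enumerate(frame_labels):
        if lab != blank:
            if lab == prev:
                # Same token continuing: extend the current span.
                spans[-1] = (lab, spans[-1][1], t + 1)
            else:
                # New token occurrence starts at this frame.
                spans.append((lab, t, t + 1))
        prev = lab
    return spans

print(merge_alignment([0, 3, 3, 0, 3, 5, 5]))
# [(3, 1, 3), (3, 4, 5), (5, 5, 7)]
```

For a batched result you would apply this per row, e.g. `merge_alignment(alignments[i].tolist())`.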

Algorithm

The implementation uses dynamic programming to find the optimal alignment path between the target sequence and the emission probabilities. The algorithm:

  1. Constructs an expanded target sequence with blanks
  2. Uses forward-backward algorithm principles for efficient computation
  3. Maintains backpointers for path reconstruction
  4. Returns the most likely alignment path
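As a reference for the four steps above, here is a minimal single-sequence Viterbi sketch in plain Python. It is illustrative only: the repository's implementations are batched and vectorized, and the function name here is not the repository's API.

```python
import math

def forced_align_reference(log_probs, targets, blank=0):
    """Viterbi forced alignment for one sequence.

    log_probs: T x V nested list of per-frame log-probabilities
    targets:   target label sequence (no blanks)
    Returns the most likely per-frame label path of length T.
    """
    T = len(log_probs)
    # Step 1: expand the target with blanks: [b, y1, b, y2, ..., b].
    ext = [blank]
    for y in targets:
        ext += [y, blank]
    S = len(ext)

    NEG = -math.inf
    alpha = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]

    # A valid path starts at the first blank or the first label.
    alpha[0][0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = log_probs[0][ext[1]]

    # Steps 2-3: dynamic programming over the trellis with backpointers.
    for t in range(1, T):
        for s in range(S):
            cands = [(alpha[t - 1][s], s)]          # stay on the same state
            if s >= 1:
                cands.append((alpha[t - 1][s - 1], s - 1))  # advance by one
            # Skipping the blank is allowed only between distinct labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((alpha[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            alpha[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev

    # Step 4: the path must end on the last blank or the last label;
    # follow backpointers to reconstruct it.
    s = S - 1
    if S > 1 and alpha[T - 1][S - 2] > alpha[T - 1][S - 1]:
        s = S - 2
    path = []
    for t in range(T - 1, -1, -1):
        path.append(ext[s])
        s = back[t][s]
    return path[::-1]

# Tiny demo: 3 frames, vocabulary {0: blank, 1: token}.
lp = [[math.log(0.9), math.log(0.1)],
      [math.log(0.1), math.log(0.9)],
      [math.log(0.9), math.log(0.1)]]
print(forced_align_reference(lp, [1]))  # [0, 1, 0]
```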

Acknowledgments

The PyTorch implementation borrows concepts from vadimkantorov/ctc.

The CUDA implementation is built upon the CUDA source code of torchaudio.functional.forced_align, extended to support batch sizes greater than 1, which the original torchaudio implementation does not handle.

License

MIT License - see LICENSE file for details.
