A high-performance, batched implementation of the Connectionist Temporal Classification (CTC) forced-alignment algorithm. We provide two optimized versions: a pure-PyTorch implementation and a native CUDA implementation. Both deliver substantial speedups over the standard torchaudio.functional.forced_align when processing multiple sequences.
- Native CUDA Support: Custom CUDA kernels for maximum performance
- Full Batch Support: Unlike the original torchaudio implementation, which only supports batch_size=1, our implementation supports arbitrary batch sizes
- Drop-in Replacement: Same API with an additional `use_cuda` parameter
- GPU Accelerated: Optimized for CUDA devices with significant speedup over sequential processing
```bash
# Compile CUDA extension for maximum performance
python setup.py build_ext --inplace
```

Configuration: Input Length=100, Target Length=20, Vocabulary Size=10,000
Hardware: NVIDIA GeForce RTX 5090
| Batch Size | TorchAudio | CUDA Implementation | PyTorch Implementation | CUDA Speedup | PyTorch Speedup |
|---|---|---|---|---|---|
| 1 | 2.0ms | 1.9ms | 10.1ms | 1.07x | 0.20x |
| 8 | 17.1ms | 1.9ms | 10.4ms | 8.87x | 1.64x |
| 64 | 137.4ms | 2.2ms | 10.5ms | 61.72x | 13.08x |
| Batch Size | CUDA (Time per Sample) | PyTorch (Time per Sample) | CUDA Efficiency | PyTorch Efficiency |
|---|---|---|---|---|
| 1 | 1.89ms | 10.13ms | 1.00x | 1.00x |
| 8 | 0.24ms | 1.30ms | 7.82x | 7.79x |
| 64 | 0.03ms | 0.16ms | 54.29x | 62.03x |
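The per-sample numbers above follow directly from the batch timings: time per sample is the batch time divided by the batch size, and efficiency is the batch-size-1 per-sample time divided by that value. A minimal sketch of that arithmetic (the helper names are ours, for illustration only):

```python
def time_per_sample(batch_time_ms, batch_size):
    """Average wall-clock time attributed to one sequence in the batch."""
    return batch_time_ms / batch_size

def batching_efficiency(batch_time_ms, batch_size, single_sample_ms):
    """Per-sample speedup relative to batch size 1 (1.0 = no gain)."""
    return single_sample_ms / time_per_sample(batch_time_ms, batch_size)

# CUDA timings from the table: 1.89 ms at batch size 1, 1.9 ms at batch size 8
cuda_eff_8 = batching_efficiency(1.9, 8, 1.89)  # close to the table's 7.82x
```

Recomputing from the rounded timings shown in the table gives values slightly off from the table's efficiency column, which was derived from the unrounded measurements.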
Run the benchmark script to compare performance:

```bash
python benchmark.py
```

This will test various batch sizes and provide detailed performance comparisons between our implementation and torchaudio.functional.forced_align.
```python
import torch
from ctc import ctc_forced_align

# Generate sample data
batch_size, input_length, target_length, vocab_size = 4, 100, 20, 1000
log_probs = torch.randn(batch_size, input_length, vocab_size)
log_probs = torch.log_softmax(log_probs, dim=-1)
targets = torch.randint(1, vocab_size, (batch_size, target_length))
input_lengths = torch.full((batch_size,), input_length)
target_lengths = torch.full((batch_size,), target_length)

# Perform forced alignment
alignments, scores = ctc_forced_align(
    log_probs=log_probs,
    targets=targets,
    input_lengths=input_lengths,
    target_lengths=target_lengths,
    blank=0,
)

print(f"Alignment shape: {alignments.shape}")  # (batch_size, input_length)
print(f"Scores shape: {scores.shape}")         # (batch_size, input_length)
```

The implementation uses dynamic programming to find the optimal alignment path between the target sequence and the emission probabilities. The algorithm:
- Constructs an expanded target sequence with blanks
- Uses forward-backward algorithm principles for efficient computation
- Maintains backpointers for path reconstruction
- Returns the most likely alignment path
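The steps above amount to a Viterbi pass over the blank-expanded target. A minimal single-sequence sketch (plain Python for clarity; `viterbi_forced_align` is our illustrative name, not the library API, and it assumes the input is long enough to emit every label):

```python
import math

def viterbi_forced_align(log_probs, target, blank=0):
    """log_probs: T rows of per-class log-probabilities; target: label list."""
    T = len(log_probs)
    # Expanded target with blanks interleaved: [b, y1, b, y2, ..., yL, b]
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)

    NEG_INF = -math.inf
    dp = [[NEG_INF] * S for _ in range(T)]
    bp = [[0] * S for _ in range(T)]  # backpointers for path reconstruction

    dp[0][0] = log_probs[0][blank]
    if S > 1:
        dp[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay at s, step from s-1,
            # or skip a blank from s-2 (only between distinct labels)
            best, arg = dp[t - 1][s], s
            if s >= 1 and dp[t - 1][s - 1] > best:
                best, arg = dp[t - 1][s - 1], s - 1
            if (s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]
                    and dp[t - 1][s - 2] > best):
                best, arg = dp[t - 1][s - 2], s - 2
            dp[t][s] = best + log_probs[t][ext[s]]
            bp[t][s] = arg

    # A valid path ends on the final label or the trailing blank
    s = S - 1 if dp[T - 1][S - 1] >= dp[T - 1][S - 2] else S - 2
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = bp[t][s]
    return path
```

The batched implementations run this recurrence for all sequences in parallel (vectorizing over the batch dimension on GPU) rather than looping per sequence.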
The PyTorch implementation borrows concepts from vadimkantorov/ctc.
The CUDA implementation builds on the CUDA source code of torchaudio.functional.forced_align, with significant enhancements to support batch processing (batch_size > 1).
MIT License - see LICENSE file for details.