This page describes the purpose, scope, and high-level architecture of the SpecForge codebase. It covers what SpecForge does, the supported algorithms, the SpecBundle initiative, and how all major subsystems relate to each other. For installation instructions, see Getting Started. For deep dives into specific algorithms, see Core Concepts.
SpecForge is a training framework for speculative decoding draft models, built and maintained by the SGLang team. Its output — trained draft model weights — is designed to be loaded directly into the SGLang inference server to accelerate LLM inference.
Speculative decoding works by pairing a small, fast draft model with a larger target model. The draft model proposes multiple tokens ahead, and the target model verifies them in parallel. When the draft is accurate, multiple tokens are accepted per forward pass, yielding significant throughput gains. SpecForge provides the tooling to train and evaluate these draft models.
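The draft/verify loop described above can be sketched with toy stand-in "models" (both model functions below are invented stubs for illustration; real systems compare token distributions and accept probabilistically rather than matching greedy tokens):

```python
# Toy illustration of speculative decoding's draft-then-verify loop.
# "Models" are stubbed as simple functions over token-ID lists.

def draft_propose(prefix, k):
    # Hypothetical fast draft model: accurate for the first couple of
    # tokens, then drifts away from the target's choices.
    out, last = [], prefix[-1]
    for i in range(k):
        last = last + 1 if i < 2 else last + 2
        out.append(last)
    return out

def target_next(prefix):
    # Hypothetical target model: the "ground truth" next token.
    return (prefix[-1] + 1) % 100

def speculative_step(prefix, k=4):
    """Accept draft tokens while they match the target's choice."""
    accepted = []
    for tok in draft_propose(prefix, k):
        if tok != target_next(prefix + accepted):
            break
        accepted.append(tok)
    # On mismatch (or exhaustion) the target supplies one token itself,
    # so every verification pass yields at least one accepted token.
    accepted.append(target_next(prefix + accepted))
    return accepted

tokens = speculative_step([7], k=4)  # → [8, 9, 10]: two draft tokens plus one target token
```

Here a single target forward pass yields three tokens instead of one, which is the source of the throughput gain; the better the draft model, the longer the accepted runs.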
The project addresses three gaps in the open-source ecosystem:
| Gap | SpecForge Response |
|---|---|
| No production-ready training infrastructure | FSDP, tensor parallelism, online/offline modes |
| Scarce high-quality draft model checkpoints | SpecBundle: curated, openly released weights |
| Training at insufficient scale | Large-scale multi-domain dataset support |
Sources: README.md14-25
SpecForge implements two speculative decoding training algorithms.
| Algorithm | Training Script | Key Idea |
|---|---|---|
| EAGLE3 | train_eagle3.py | Draft model uses hidden states from multiple target model layers; trains with Test-Time Training (TTT) unrolling |
| DFlash | train_dflash.py | Block-wise draft generation with anchor positions, Flex Attention block masking, and exponential loss decay |
For a conceptual explanation of each algorithm, see EAGLE3 Algorithm and DFlash Algorithm.
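The "exponential loss decay" mentioned in the DFlash row can be illustrated as follows. This is a hedged sketch of the general weighting idea only; the exact scheme and the `decay` rate are assumptions, not taken from the SpecForge source:

```python
# Illustrative exponential decay of per-position training losses:
# later draft positions are harder to predict, so they are down-weighted.
def decayed_loss(position_losses, decay=0.8):
    # decay is a hypothetical hyperparameter in (0, 1].
    weights = [decay ** i for i in range(len(position_losses))]
    total = sum(w * l for w, l in zip(weights, position_losses))
    return total / sum(weights)
```

With this weighting, an error at the first draft position contributes more to the loss than the same error at a later position.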
Sources: docs/index.rst1-54
*(Diagram: SpecForge subsystem map)*
Sources: README.md14-36 docs/index.rst1-54 docs/community_resources/specbundle.md1-94
*(Diagram: code-level subsystem map)*
Sources: docs/index.rst1-54 specforge/tracker.py36-284
SpecBundle is an open initiative, jointly driven by the SpecForge team and industry partners (Ant Group, Meituan, Nex-AGI, EigenAI), to release production-grade EAGLE3 draft model weights for mainstream open-source LLMs.
Released model families (as of the current version):
| Series | Example Target Model |
|---|---|
| Llama | meta-llama/Llama-3.1-8B-Instruct, Llama-3.3-70B-Instruct, Llama-4 Scout/Maverick |
| Qwen | Qwen/Qwen3-30B-A3B, Qwen3-235B-A22B |
| Qwen Coder | Qwen/Qwen3-Coder-30B-A3B, Qwen3-Coder-480B-A35B |
| Ling | inclusionAI/Ling-flash-2.0 |
| Kimi | moonshotai/Kimi-K2-Instruct |
| GPT-OSS | openai/gpt-oss-20b, gpt-oss-120b |
| Nex | nex-agi/Qwen3-30B-A3B-Nex-N1 |
SpecBundle models are loaded directly into SGLang via the `--speculative-algorithm EAGLE3` and `--speculative-draft-model-path` flags. See SpecBundle Collection for full details.
Sources: docs/community_resources/specbundle.md1-94 README.md27-36
The primary deployment target for SpecForge-trained models is the SGLang serving framework. After training, the draft model checkpoint can be passed directly to `sglang.launch_server`:

```bash
python3 -m sglang.launch_server \
    --model <target-model-path> \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path <draft-model-path> \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4
```
SpecForge also patches SGLang's distributed initialization to support training-time use of a live SGLang server as the target model backend. The `SGLangEagle3TargetModel` class wraps this interaction, and `SGLangBackendArgs` controls server configuration. See SGLang Integration for implementation details.
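As a rough, hypothetical sketch of the shape this wrapping might take (only the class names `SGLangEagle3TargetModel` and `SGLangBackendArgs` come from the text above; every field, method, and payload key below is invented for illustration and does not reflect SGLang's actual API):

```python
from dataclasses import dataclass

@dataclass
class SGLangBackendArgs:
    # Hypothetical server-configuration fields for illustration only.
    base_url: str = "http://localhost:30000"
    timeout_s: float = 60.0

class SGLangEagle3TargetModel:
    """Illustrative wrapper around a live SGLang server used as the
    training-time target model. Only the general request-building
    shape is shown; the real class lives in SpecForge."""

    def __init__(self, args: SGLangBackendArgs):
        self.args = args

    def build_request(self, input_ids):
        # Package token IDs into a payload a server backend might accept.
        # Field names here are assumptions, not SGLang's actual schema.
        return {
            "input_ids": list(input_ids),
            # Draft training needs the target's hidden states, not just logits.
            "return_hidden_states": True,
        }

model = SGLangEagle3TargetModel(SGLangBackendArgs())
payload = model.build_request([1, 2, 3])
```

The point of such a wrapper is that the trainer can treat a remote serving process and an in-process target model interchangeably behind one interface.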
Sources: docs/community_resources/specbundle.md25-36 docs/benchmarks/benchmark.md42-56
Training runs integrate with multiple experiment-tracking backends via the `Tracker` abstract class in specforge/tracker.py36-71. The concrete implementations and the registry are:
| Key in `TRACKER_REGISTRY` | Class |
|---|---|
| `"wandb"` | `WandbTracker` |
| `"swanlab"` | `SwanlabTracker` |
| `"tensorboard"` | `TensorboardTracker` |
| `"mlflow"` | `MLflowTracker` |
| `"none"` | `NoOpTracker` |
The factory function `create_tracker` in specforge/tracker.py292-297 instantiates the appropriate tracker based on the `--report-to` CLI argument.
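The registry-plus-factory pattern described above can be sketched as follows. The registry keys mirror the table, but the class bodies are illustrative stand-ins, not the real implementations in `specforge/tracker.py`:

```python
# Hedged sketch of a tracker registry and factory; only the "none"
# backend is implemented here to keep the example self-contained.
from abc import ABC, abstractmethod

class Tracker(ABC):
    @abstractmethod
    def log(self, metrics: dict, step: int) -> None:
        """Record a dict of scalar metrics at a given training step."""

class NoOpTracker(Tracker):
    def log(self, metrics, step):
        pass  # "none": silently discard metrics

# The real registry also maps "wandb", "swanlab", "tensorboard", "mlflow".
TRACKER_REGISTRY = {"none": NoOpTracker}

def create_tracker(report_to: str) -> Tracker:
    # Resolve the --report-to value to a tracker class and instantiate it.
    try:
        return TRACKER_REGISTRY[report_to]()
    except KeyError:
        raise ValueError(f"Unknown tracker backend: {report_to!r}")

tracker = create_tracker("none")
tracker.log({"loss": 0.1}, step=1)
```

Keeping the mapping in a plain dict makes it easy to register a new backend without touching the training loop, which only ever sees the `Tracker` interface.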
Sources: specforge/tracker.py277-297
*(Diagram: top-level directory map)*
Sources: docs/index.rst1-54 README.md1-70
| Topic | Wiki Page |
|---|---|
| Installation and first run | Getting Started |
| Speculative decoding concepts | Core Concepts |
| Data preparation scripts | Data Pipeline |
| Draft and target model classes | Model Architecture |
| EAGLE3 and DFlash training | Training System |
| Config files and utilities | Configuration System |
| Deploying with SGLang | Deployment and Inference |
| Running benchmarks | Benchmarking and Evaluation |
| SpecBundle pre-trained weights | SpecBundle Collection |