This page describes the purpose, scope, and high-level architecture of the SpecForge codebase. It covers what SpecForge does, the supported algorithms, the SpecBundle initiative, and how all major subsystems relate to each other. For installation instructions, see Getting Started. For deep dives into specific algorithms, see Core Concepts.
SpecForge is a training framework for speculative decoding draft models, built and maintained by the SGLang team. Its output — trained draft model weights — is designed to be loaded directly into the SGLang inference server to accelerate LLM inference.
Speculative decoding works by pairing a small, fast draft model with a larger target model. The draft model proposes multiple tokens ahead, and the target model verifies them in parallel. When the draft is accurate, multiple tokens are accepted per forward pass, yielding significant throughput gains. SpecForge provides the tooling to train and evaluate these draft models.
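The draft/verify loop described above can be sketched with toy stand-in "models" (both model functions below are invented stubs for illustration; real systems compare token distributions and accept probabilistically rather than matching greedy tokens):

```python
# Toy illustration of speculative decoding's draft-then-verify loop.
# "Models" are stubbed as simple functions over token-ID lists.

def draft_propose(prefix, k):
    # Hypothetical fast draft model: accurate for the first couple of
    # tokens, then drifts away from the target's choices.
    out, last = [], prefix[-1]
    for i in range(k):
        last = last + 1 if i < 2 else last + 2
        out.append(last)
    return out

def target_next(prefix):
    # Hypothetical target model: the "ground truth" next token.
    return (prefix[-1] + 1) % 100

def speculative_step(prefix, k=4):
    """Accept draft tokens while they match the target's choice."""
    accepted = []
    for tok in draft_propose(prefix, k):
        if tok != target_next(prefix + accepted):
            break
        accepted.append(tok)
    # On mismatch (or exhaustion) the target supplies one token itself,
    # so every verification pass yields at least one accepted token.
    accepted.append(target_next(prefix + accepted))
    return accepted

tokens = speculative_step([7], k=4)  # → [8, 9, 10]: two draft tokens plus one target token
```

Here a single target forward pass yields three tokens instead of one, which is the source of the throughput gain; the better the draft model, the longer the accepted runs.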
The project addresses three gaps in the open-source ecosystem:
| Gap | SpecForge Response |
|---|---|
| No production-ready training infrastructure | FSDP, tensor parallelism, online/offline modes |
| Scarce high-quality draft model checkpoints | SpecBundle: curated, openly released weights |
| Training at insufficient scale | Large-scale multi-domain dataset support |
Sources: README.md14-25
SpecForge implements two speculative decoding training algorithms.
| Algorithm | Training Script | Key Idea |
|---|---|---|
| EAGLE3 | train_eagle3.py | Draft model uses hidden states from multiple target model layers; trains with Test-Time Training (TTT) unrolling |
| DFlash | train_dflash.py | Block-wise draft generation with anchor positions, Flex Attention block masking, and exponential loss decay |
For a conceptual explanation of each algorithm, see EAGLE3 Algorithm and DFlash Algorithm.
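The "exponential loss decay" mentioned in the DFlash row can be illustrated as follows. This is a hedged sketch of the general weighting idea only; the exact scheme and the `decay` rate are assumptions, not taken from the SpecForge source:

```python
# Illustrative exponential decay of per-position training losses:
# later draft positions are harder to predict, so they are down-weighted.
def decayed_loss(position_losses, decay=0.8):
    # decay is a hypothetical hyperparameter in (0, 1].
    weights = [decay ** i for i in range(len(position_losses))]
    total = sum(w * l for w, l in zip(weights, position_losses))
    return total / sum(weights)
```

With this weighting, an error at the first draft position contributes more to the loss than the same error at a later position.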
Sources: docs/index.rst1-54
*(Diagram: SpecForge subsystem map)*
Sources: README.md14-36 docs/index.rst1-54 docs/community_resources/specbundle.md1-94
*(Diagram: code-level subsystem map)*
Sources: docs/index.rst1-54 specforge/tracker.py36-284
SpecBundle is an open initiative, jointly driven by the SpecForge team and industry partners (Ant Group, Meituan, Nex-AGI, EigenAI), to release production-grade EAGLE3 draft model weights for mainstream open-source LLMs.
Released model families (as of the current version):
| Series | Example Target Model |
|---|---|
| Llama | meta-llama/Llama-3.1-8B-Instruct, Llama-3.3-70B-Instruct, Llama-4 Scout/Maverick |
| Qwen | Qwen/Qwen3-30B-A3B, Qwen3-235B-A22B |
| Qwen Coder | Qwen/Qwen3-Coder-30B-A3B, Qwen3-Coder-480B-A35B |
| Ling | inclusionAI/Ling-flash-2.0 |
| Kimi | moonshotai/Kimi-K2-Instruct |
| GPT-OSS | openai/gpt-oss-20b, gpt-oss-120b |
| Nex | nex-agi/Qwen3-30B-A3B-Nex-N1 |
SpecBundle models are loaded directly into SGLang via the `--speculative-algorithm EAGLE3` and `--speculative-draft-model-path` flags. See SpecBundle Collection for full details.
Sources: docs/community_resources/specbundle.md1-94 README.md27-36
The primary deployment target for SpecForge-trained models is the SGLang serving framework. After training, the draft model checkpoint can be passed directly to `sglang.launch_server`:

```bash
python3 -m sglang.launch_server \
    --model <target-model-path> \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path <draft-model-path> \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4
```
SpecForge also patches SGLang's distributed initialization to support training-time use of a live SGLang server as the target model backend. The `SGLangEagle3TargetModel` class wraps this interaction, and `SGLangBackendArgs` controls server configuration. See SGLang Integration for implementation details.
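As a rough, hypothetical sketch of the shape this wrapping might take (only the class names `SGLangEagle3TargetModel` and `SGLangBackendArgs` come from the text above; every field, method, and payload key below is invented for illustration and does not reflect SGLang's actual API):

```python
from dataclasses import dataclass

@dataclass
class SGLangBackendArgs:
    # Hypothetical server-configuration fields for illustration only.
    base_url: str = "http://localhost:30000"
    timeout_s: float = 60.0

class SGLangEagle3TargetModel:
    """Illustrative wrapper around a live SGLang server used as the
    training-time target model. Only the general request-building
    shape is shown; the real class lives in SpecForge."""

    def __init__(self, args: SGLangBackendArgs):
        self.args = args

    def build_request(self, input_ids):
        # Package token IDs into a payload a server backend might accept.
        # Field names here are assumptions, not SGLang's actual schema.
        return {
            "input_ids": list(input_ids),
            # Draft training needs the target's hidden states, not just logits.
            "return_hidden_states": True,
        }

model = SGLangEagle3TargetModel(SGLangBackendArgs())
payload = model.build_request([1, 2, 3])
```

The point of such a wrapper is that the trainer can treat a remote serving process and an in-process target model interchangeably behind one interface.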
Sources: docs/community_resources/specbundle.md25-36 docs/benchmarks/benchmark.md42-56
Training runs integrate with multiple experiment-tracking backends via the `Tracker` abstract class in specforge/tracker.py36-71. The concrete implementations and the registry are:
| Key in `TRACKER_REGISTRY` | Class |
|---|---|
| `"wandb"` | `WandbTracker` |
| `"swanlab"` | `SwanlabTracker` |
| `"tensorboard"` | `TensorboardTracker` |
| `"mlflow"` | `MLflowTracker` |
| `"none"` | `NoOpTracker` |
The factory function `create_tracker` in specforge/tracker.py292-297 instantiates the appropriate tracker based on the `--report-to` CLI argument.
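The registry-plus-factory pattern described above can be sketched as follows. The registry keys mirror the table, but the class bodies are illustrative stand-ins, not the real implementations in `specforge/tracker.py`:

```python
# Hedged sketch of a tracker registry and factory; only the "none"
# backend is implemented here to keep the example self-contained.
from abc import ABC, abstractmethod

class Tracker(ABC):
    @abstractmethod
    def log(self, metrics: dict, step: int) -> None:
        """Record a dict of scalar metrics at a given training step."""

class NoOpTracker(Tracker):
    def log(self, metrics, step):
        pass  # "none": silently discard metrics

# The real registry also maps "wandb", "swanlab", "tensorboard", "mlflow".
TRACKER_REGISTRY = {"none": NoOpTracker}

def create_tracker(report_to: str) -> Tracker:
    # Resolve the --report-to value to a tracker class and instantiate it.
    try:
        return TRACKER_REGISTRY[report_to]()
    except KeyError:
        raise ValueError(f"Unknown tracker backend: {report_to!r}")

tracker = create_tracker("none")
tracker.log({"loss": 0.1}, step=1)
```

Keeping the mapping in a plain dict makes it easy to register a new backend without touching the training loop, which only ever sees the `Tracker` interface.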
Sources: specforge/tracker.py277-297
*(Diagram: top-level directory map)*
Sources: docs/index.rst1-54 README.md1-70
| Topic | Wiki Page |
|---|---|
| Installation and first run | Getting Started |
| Speculative decoding concepts | Core Concepts |
| Data preparation scripts | Data Pipeline |
| Draft and target model classes | Model Architecture |
| EAGLE3 and DFlash training | Training System |
| Config files and utilities | Configuration System |
| Deploying with SGLang | Deployment and Inference |
| Running benchmarks | Benchmarking and Evaluation |
| SpecBundle pre-trained weights | SpecBundle Collection |