8000 upgrade to sglang==0.5.9 and support qwen3.5 eagle3 by jiapingW · Pull Request #496 · sgl-project/SpecForge · GitHub

upgrade to sglang==0.5.9 and support qwen3.5 eagle3 #496

Open
jiapingW wants to merge 3 commits into main from qwen3.5

Conversation

@jiapingW (Collaborator) commented Mar 8, 2026

Motivation

sglang has gone through several version updates and now supports new models, as well as Eagle3 for some existing models, so we have upgraded SpecForge's sglang dependency to version 0.5.9. This also enables training with Qwen3.5. The sglang branch that currently supports Qwen3.5 is at https://github.com/jiapingW/sglang/tree/qwen3.5-eagle3; we will upstream it to the main sglang repository soon.

This PR also lays the groundwork for training Eagle3 draft models against future model releases. We will validate its effectiveness soon, covering both some previously supported models and Qwen3.5.

This PR has some other updates:

  • Now supports resuming training. Checkpoints are named epoch_x_step_xxxx, and we resume from the checkpoint whose (x, xxxx) pair is largest in lexicographic order. For example, in a 10-epoch run, (5, 0) means 5 epochs remain, while (4, 20000) means training resumes within the 4th epoch, continuing from the previous step 20000, so roughly 6 epochs remain.
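The checkpoint-selection rule described above can be sketched as follows. This is an illustrative example, not SpecForge's actual code; the function name and the assumption that checkpoint directory names match epoch_x_step_xxxx exactly are mine.

```python
import re

def latest_checkpoint(names):
    """Return the checkpoint name with the largest (epoch, step) pair,
    or None if no name matches the epoch_x_step_xxxx pattern."""
    pattern = re.compile(r"epoch_(\d+)_step_(\d+)")
    parsed = []
    for name in names:
        m = pattern.fullmatch(name)
        if m:
            # Compare numerically, so (5, 0) beats (4, 20000).
            parsed.append(((int(m.group(1)), int(m.group(2))), name))
    return max(parsed)[1] if parsed else None

print(latest_checkpoint(["epoch_4_step_20000", "epoch_5_step_0", "epoch_4_step_19000"]))
# epoch_5_step_0
```

Note that comparing the parsed integers as a tuple (rather than comparing the raw strings) is what makes (5, 0) rank above (4, 20000).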

Modifications

Related Issues

Accuracy Test

qwen3: ✅, using online training on ultrachat with 4k length; the accept length is OK.
qwen3.5: TODO
gpt-oss: TODO

Benchmark & Profiling

Checklist

del chinese comment and del useless code

del comment

fix lint
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on updating the project's core sglang dependency to version 0.5.9. This upgrade is crucial for enabling compatibility with newer models like Qwen3.5 and enhancing support for Eagle3, which involves significant internal adjustments to how model parallelism and hidden states are managed within the specforge framework. The changes ensure the system can effectively utilize the latest features and optimizations provided by the updated sglang library, paving the way for future model integrations and improved training workflows.

Highlights

  • SGLang Dependency Upgrade: The sglang dependency has been upgraded from version 0.5.6 to 0.5.9 to leverage new features and model support.
  • Qwen3.5 and Eagle3 Model Support: Added configuration and an example script to support training with the Qwen3.5-35B-A3B model, specifically for Eagle3. This includes a new JSON config file and a shell script for offline training.
  • SGLang API Adaptations: Modified several internal specforge files to align with API changes in sglang 0.5.9, including updates to parallel group initialization, handling of model output structures, and parameter changes in SGLangRunner and initialize_model_parallel.
  • Parallelism Parameter Updates: Introduced new parameters for attention data parallel, attention context model parallel, and MoE data model parallel sizes in the initialize_model_parallel function, reflecting enhanced parallelism capabilities in the new sglang version.


Changelog
  • configs/qwen3.5-35b-a3b-eagle3.json
    • Added a new configuration file for the Qwen3.5-35B-A3B model with Eagle3 architecture, specifying model parameters like hidden size, vocabulary size, and attention heads.
  • examples/run_qwen3.5_35b_a3b_eagle3_offline.sh
    • Added a new example shell script to demonstrate the offline training process for the Qwen3.5-35B-A3B Eagle3 model, including steps for generating hidden states.
  • pyproject.toml
    • Updated the sglang dependency version from 0.5.6 to 0.5.9.
  • scripts/train_dflash.py
    • Removed redundant parentheses from import statements for specforge.utils.
  • specforge/modeling/target/eagle3_target_model.py
    • Updated the import path for prepare_mlp_sync_batch_raw from Scheduler to scheduler_dp_attn_mixin.
    • Modified from_pretrained to handle torch_dtype as 'auto' if None, a requirement for sglang 0.5.9.
    • Added is_draft_worker=False parameter to SGLangRunner initialization, aligning with sglang 0.5.9 API.
    • Adjusted the _extend method to correctly extract logits, auxiliary hidden states, and last hidden states from the new eagle3_output structure, which now includes a logits_output attribute.
    • Updated the call to prepare_mlp_sync_batch_raw in _maybe_prepare_mlp_sync_batch, removing spec_algorithm and speculative_num_draft_tokens parameters and adding attn_cp_size.
  • specforge/modeling/target/sglang_backend/model_runner.py
    • Removed the torch_compile parameter from initialize_model_parallel as it is no longer supported in sglang 0.5.9.
    • Added attention_data_parallel_size, attention_context_model_parallel_size, and moe_data_model_parallel_size parameters to initialize_model_parallel.
  • specforge/modeling/target/sglang_backend/patch.py
    • Modified the initialize_model_parallel function signature to remove torch_compile and include new parallel parameters: attention_data_parallel_size, attention_context_model_parallel_size, and moe_data_model_parallel_size.
    • Added extensive logic within initialize_model_parallel to set up _ATTN_CP, _ATTN_TP, and _MOE_DP parallel groups, which are new requirements in sglang 0.5.9.
    • Updated comments to reflect the removal of torch_compile and the new parallel group initialization.
    • Corrected a typo in a comment from b1 to g1 for pipeline model-parallel groups.
    • Added conditional checks for pynccl_comm existence before accessing it, as it may be None in sglang 0.5.9.
    • Updated initialize_dp_attention to reflect that _ATTN_TP_GROUP is now created in initialize_model_parallel and to pass attn_cp_size to compute_dp_attention_world_info.
  • specforge/modeling/target/sglang_backend/utils.py
    • Added List to typing imports.
    • Modified the replaced_logits_processor_forward_for_eagle3 function signature to accept aux_hidden_states as Optional[List[torch.Tensor]] and added a new hidden_states_before_norm parameter.
    • Ensured aux_pruned_states is correctly initialized to None if aux_hidden_states is not provided.
    • Updated the forward method of LogitsProcessorWrapper to pass the new hidden_states_before_norm parameter.
  • specforge/modeling/target/target_head.py
    • Modified the __init__ method to robustly retrieve hidden_size and vocab_size from self.config by first checking for a text_config attribute, improving compatibility with different model configurations.
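The text_config fallback described for target_head.py can be illustrated with a small sketch. This is a hedged example of the pattern, not the PR's exact code; the attribute names follow Hugging Face config conventions, where multimodal wrapper configs expose the language model's settings under text_config.

```python
def get_text_config(config):
    """Prefer config.text_config (present on multimodal wrapper configs)
    over the config object itself."""
    return getattr(config, "text_config", None) or config

# Hypothetical stand-ins for a plain LM config and a multimodal wrapper.
class Cfg:
    hidden_size = 4096
    vocab_size = 151936

class MultimodalCfg:
    text_config = Cfg()

for cfg in (Cfg(), MultimodalCfg()):
    text_cfg = get_text_config(cfg)
    print(text_cfg.hidden_size, text_cfg.vocab_size)
# 4096 151936
# 4096 151936
```

Reading hidden_size and vocab_size through this helper lets the same __init__ work for both plain and multimodal model configurations.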
Activity
  • The pull request was created by jiapingW to upgrade the sglang dependency and add support for Qwen3.5 Eagle3 models.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment
Code Review

This pull request upgrades sglang to version 0.5.9 and adds support for qwen3.5 eagle3. The changes primarily involve adapting the codebase to the new sglang API, including modifications to distributed initialization, model runner, and forward pass logic. The changes appear reasonable and necessary for the upgrade. I've identified a few areas for improvement, including a leftover debug print statement, some hardcoded values in an example script, and a minor typo in a docstring. Addressing these will enhance the code's quality and maintainability.

Comment on lines +123 to +129
# Debug: Print the values
dp_size = getattr(self.server_args, "dp_size", 1)
attn_cp_size = getattr(self.server_args, "attn_cp_size", 1)
moe_dp_size = getattr(self.server_args, "moe_dp_size", 1)
print(
    f"[DEBUG] tp_size={self.tp_size}, dp_size={dp_size}, attn_cp_size={attn_cp_size}, moe_dp_size={moe_dp_size}"
)

Severity: high

This block includes a debug print statement. Such statements should be removed from the final code to avoid polluting logs and to maintain code cleanliness.

Suggested change
# Debug: Print the values
dp_size = getattr(self.server_args, "dp_size", 1)
attn_cp_size = getattr(self.server_args, "attn_cp_size", 1)
moe_dp_size = getattr(self.server_args, "moe_dp_size", 1)
print(
    f"[DEBUG] tp_size={self.tp_size}, dp_size={dp_size}, attn_cp_size={attn_cp_size}, moe_dp_size={moe_dp_size}"
)
dp_size = getattr(self.server_args, "dp_size", 1)
attn_cp_size = getattr(self.server_args, "attn_cp_size", 1)
moe_dp_size = getattr(self.server_args, "moe_dp_size", 1)

Comment on lines +11 to +15
CUDA_VISIBLE_DEVICES=1,2,3,5 torchrun \
--standalone \
--nproc_per_node $NUM_GPUS \
scripts/prepare_hidden_states.py \
--target-model-path /data/jiapingW/pretrained_models/Qwen3.5-35B-A3B \

Severity: medium

The script contains hardcoded values for CUDA_VISIBLE_DEVICES and --target-model-path. This reduces portability and makes it difficult for other users to run the script on different machine configurations. It's recommended to parameterize these values using environment variables or script arguments.

Suggested change
CUDA_VISIBLE_DEVICES=1,2,3,5 torchrun \
--standalone \
--nproc_per_node $NUM_GPUS \
scripts/prepare_hidden_states.py \
--target-model-path /data/jiapingW/pretrained_models/Qwen3.5-35B-A3B \
CUDA_VISIBLE_DEVICES=${CUDA_DEVICES:-"1,2,3,5"} torchrun \
--standalone \
--nproc_per_node $NUM_GPUS \
scripts/prepare_hidden_states.py \
--target-model-path ${TARGET_MODEL_PATH:-"/data/jiapingW/pretrained_models/Qwen3.5-35B-A3B"} \

Comment on lines +26 to +44
# NUM_GPUS=2
# CUDA_VISIBLE_DEVICES=6,7 torchrun \
# --standalone \
# --nproc_per_node $NUM_GPUS \
# $ROOT_DIR/scripts/train_eagle3.py \
# --target-model-path /data/jiapingW/pretrained_models/Qwen3.5-35B-A3B \
# --draft-model-config $ROOT_DIR/configs/qwen3.5-35b-a3b-eagle3.json \
# --train-data-path $ROOT_DIR/cache/dataset/ultrachat_train.jsonl \
# --train-hidden-states-path $ROOT_DIR/cache/hidden_states/qwen3.5-35b-a3b-ultrachat \
# --build-dataset-num-proc $BUILD_DATASET_NUM_PROC \
# --output-dir $ROOT_DIR/outputs/qwen3.5-35b-a3b-ultrachat \
# --num-epochs 10 \
# --batch-size 1 \
# --tp-size 1 \
# --learning-rate 5e-5 \
# --max-length 4096 \
# --chat-template qwen \
# --cache-dir $ROOT_DIR/cache \
# --embedding-key "model.language_model.embed_tokens.weight"

Severity: medium

This large block of code for training is commented out, which can be confusing. If this section is not yet functional or is intended for reference, please add a comment explaining its status and usage. Otherwise, if it's obsolete, consider removing it to improve the script's clarity.

[g0, g1], [g2, g3], [g4, g5], [g6, g7]
2 pipeline model-parallel groups:
[g0, g2, g4, g6], [g1, g3, g5, g7]
[g0, g2, g4, g6], [b1, g3, g5, g7]

Severity: medium

There appears to be a typo in the docstring example. b1 should likely be g1 to maintain consistency with the g notation for GPUs.

Suggested change
[g0, g2, g4, g6], [b1, g3, g5, g7]
[g0, g2, g4, g6], [g1, g3, g5, g7]
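The rank layout in the docstring above (with the g1 typo fixed) can be reproduced with a small sketch: for 8 GPUs and TP size 2, tensor-parallel groups are contiguous blocks of ranks, while the 2 pipeline-parallel groups are strided. This illustrates the rank arithmetic only; it is not sglang's implementation, and the function names are mine.

```python
def tp_groups(world_size, tp_size):
    """Contiguous tensor-parallel groups: [0,1], [2,3], ..."""
    return [list(range(s, s + tp_size)) for s in range(0, world_size, tp_size)]

def pp_groups(world_size, num_groups):
    """Strided pipeline-parallel groups: rank i joins group i % num_groups."""
    return [list(range(start, world_size, num_groups)) for start in range(num_groups)]

print(tp_groups(8, 2))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(pp_groups(8, 2))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

The second printout matches the corrected docstring example: two pipeline groups, [g0, g2, g4, g6] and [g1, g3, g5, g7].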

