8000 upgrade to sglang==0.5.9 and support qwen3.5 eagle3 by jiapingW · Pull Request #496 · sgl-project/SpecForge · GitHub

upgrade to sglang==0.5.9 and support qwen3.5 eagle3 #496

Open
jiapingW wants to merge 3 commits into main from qwen3.5

Conversation

@jiapingW (Collaborator) commented Mar 8, 2026

Motivation

sglang has gone through several version updates and now supports new models, as well as Eagle3 for some existing models, so we have upgraded SpecForge's sglang dependency to version 0.5.9. This also enables training with Qwen3.5. The sglang branch that currently supports Qwen3.5 is at https://github.com/jiapingW/sglang/tree/qwen3.5-eagle3; we will upstream it to the main sglang repository soon.

This PR also lays the groundwork for training Eagle3 draft models against future model releases. We will validate its effectiveness soon, covering both some previously supported models and Qwen3.5.

This PR has some other updates:

  • Now supports resuming training. Checkpoints are named epoch_x_step_xxxx, and we resume from the checkpoint whose (x, xxxx) pair is largest in lexicographic order. For example, in a 10-epoch run, (5, 0) means 5 epochs remain, while (4, 20000) means training resumes within the 4th epoch, continuing from the previous step 20000, so roughly 6 epochs remain.
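The checkpoint-selection rule described above can be sketched as follows. This is an illustrative example, not SpecForge's actual code; the function name and the assumption that checkpoint directory names match epoch_x_step_xxxx exactly are mine.

```python
import re

def latest_checkpoint(names):
    """Return the checkpoint name with the largest (epoch, step) pair,
    or None if no name matches the epoch_x_step_xxxx pattern."""
    pattern = re.compile(r"epoch_(\d+)_step_(\d+)")
    parsed = []
    for name in names:
        m = pattern.fullmatch(name)
        if m:
            # Compare numerically, so (5, 0) beats (4, 20000).
            parsed.append(((int(m.group(1)), int(m.group(2))), name))
    return max(parsed)[1] if parsed else None

print(latest_checkpoint(["epoch_4_step_20000", "epoch_5_step_0", "epoch_4_step_19000"]))
# epoch_5_step_0
```

Note that comparing the parsed integers as a tuple (rather than comparing the raw strings) is what makes (5, 0) rank above (4, 20000).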

Modifications

Related Issues

Accuracy Test

qwen3: ✅, using online training on ultrachat with 4k length; the accept length is OK.
qwen3.5: TODO
gpt-oss: TODO

Benchmark & Profiling

Checklist

del chinese comment and del useless code

del comment

fix lint
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on updating the project's core sglang dependency to version 0.5.9. This upgrade is crucial for enabling compatibility with newer models like Qwen3.5 and enhancing support for Eagle3, which involves significant internal adjustments to how model parallelism and hidden states are managed within the specforge framework. The changes ensure the system can effectively utilize the latest features and optimizations provided by the updated sglang library, paving the way for future model integrations and improved training workflows.

Highlights

  • SGLang Dependency Upgrade: The sglang dependency has been upgraded from version 0.5.6 to 0.5.9 to leverage new features and model support.
  • Qwen3.5 and Eagle3 Model Support: Added configuration and an example script to support training with the Qwen3.5-35B-A3B model, specifically for Eagle3. This includes a new JSON config file and a shell script for offline training.
  • SGLang API Adaptations: Modified several internal specforge files to align with API changes in sglang 0.5.9, including updates to parallel group initialization, handling of model output structures, and parameter changes in SGLangRunner and initialize_model_parallel.
  • Parallelism Parameter Updates: Introduced new parameters for attention data parallel, attention context model parallel, and MoE data model parallel sizes in the initialize_model_parallel function, reflecting enhanced parallelism capabilities in the new sglang version.


Changelog
  • configs/qwen3.5-35b-a3b-eagle3.json
    • Added a new configuration file for the Qwen3.5-35B-A3B model with Eagle3 architecture, specifying model parameters like hidden size, vocabulary size, and attention heads.
  • examples/run_qwen3.5_35b_a3b_eagle3_offline.sh
    • Added a new example shell script to demonstrate the offline training process for the Qwen3.5-35B-A3B Eagle3 model, including steps for generating hidden states.
  • pyproject.toml
    • Updated the sglang dependency version from 0.5.6 to 0.5.9.
  • scripts/train_dflash.py
    • Removed redundant parentheses from import statements for specforge.utils.
  • specforge/modeling/target/eagle3_target_model.py
    • Updated the import path for prepare_mlp_sync_batch_raw from Scheduler to scheduler_dp_attn_mixin.
    • Modified from_pretrained to handle torch_dtype as 'auto' if None, a requirement for sglang 0.5.9.
    • Added is_draft_worker=False parameter to SGLangRunner initialization, aligning with sglang 0.5.9 API.
    • Adjusted the _extend method to correctly extract logits, auxiliary hidden states, and last hidden states from the new eagle3_output structure, which now includes a logits_output attribute.
    • Updated the call to prepare_mlp_sync_batch_raw in _maybe_prepare_mlp_sync_batch, removing spec_algorithm and speculative_num_draft_tokens parameters and adding attn_cp_size.
  • specforge/modeling/target/sglang_backend/model_runner.py
    • Removed the torch_compile parameter from initialize_model_parallel as it is no longer supported in sglang 0.5.9.
    • Added attention_data_parallel_size, attention_context_model_parallel_size, and moe_data_model_parallel_size parameters to initialize_model_parallel.
  • specforge/modeling/target/sglang_backend/patch.py
    • Modified the initialize_model_parallel function signature to remove torch_compile and include new parallel parameters: attention_data_parallel_size, attention_context_model_parallel_size, and moe_data_model_parallel_size.
    • Added extensive logic within initialize_model_parallel to set up _ATTN_CP, _ATTN_TP, and _MOE_DP parallel groups, which are new requirements in sglang 0.5.9.
    • Updated comments to reflect the removal of torch_compile and the new parallel group initialization.
    • Corrected a typo in a comment from b1 to g1 for pipeline model-parallel groups.
    • Added conditional checks for pynccl_comm existence before accessing it, as it may be None in sglang 0.5.9.
    • Updated initialize_dp_attention to reflect that _ATTN_TP_GROUP is now created in initialize_model_parallel and to pass attn_cp_size to compute_dp_attention_world_info.
  • specforge/modeling/target/sglang_backend/utils.py
    • Added List to typing imports.
    • Modified the replaced_logits_processor_forward_for_eagle3 function signature to accept aux_hidden_states as Optional[List[torch.Tensor]] and added a new hidden_states_before_norm parameter.
    • Ensured aux_pruned_states is correctly initialized to None if aux_hidden_states is not provided.
    • Updated the forward method of LogitsProcessorWrapper to pass the new hidden_states_before_norm parameter.
  • specforge/modeling/target/target_head.py
    • Modified the __init__ method to robustly retrieve hidden_size and vocab_size from self.config by first checking for a text_config attribute, improving compatibility with different model configurations.
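The text_config fallback described for target_head.py can be illustrated with a small sketch. This is a hedged example of the pattern, not the PR's exact code; the attribute names follow Hugging Face config conventions, where multimodal wrapper configs expose the language model's settings under text_config.

```python
def get_text_config(config):
    """Prefer config.text_config (present on multimodal wrapper configs)
    over the config object itself."""
    return getattr(config, "text_config", None) or config

# Hypothetical stand-ins for a plain LM config and a multimodal wrapper.
class Cfg:
    hidden_size = 4096
    vocab_size = 151936

class MultimodalCfg:
    text_config = Cfg()

for cfg in (Cfg(), MultimodalCfg()):
    text_cfg = get_text_config(cfg)
    print(text_cfg.hidden_size, text_cfg.vocab_size)
# 4096 151936
# 4096 151936
```

Reading hidden_size and vocab_size through this helper lets the same __init__ work for both plain and multimodal model configurations.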
Activity
  • The pull request was created by jiapingW to upgrade the sglang dependency and add support for Qwen3.5 Eagle3 models.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment
Code Review

This pull request upgrades sglang to version 0.5.9 and adds support for qwen3.5 eagle3. The changes primarily involve adapting the codebase to the new sglang API, including modifications to distributed initialization, model runner, and forward pass logic. The changes appear reasonable and necessary for the upgrade. I've identified a few areas for improvement, including a leftover debug print statement, some hardcoded values in an example script, and a minor typo in a docstring. Addressing these will enhance the code's quality and maintainability.

Comment on lines +123 to +129
# Debug: Print the values
dp_size = getattr(self.server_args, "dp_size", 1)
attn_cp_size = getattr(self.server_args, "attn_cp_size", 1)
moe_dp_size = getattr(self.server_args, "moe_dp_size", 1)
print(
    f"[DEBUG] tp_size={self.tp_size}, dp_size={dp_size}, attn_cp_size={attn_cp_size}, moe_dp_size={moe_dp_size}"
)

Severity: high

This block includes a debug print statement. Such statements should be removed from the final code to avoid polluting logs and to maintain code cleanliness.

Suggested change
# Debug: Print the values
dp_size = getattr(self.server_args, "dp_size", 1)
attn_cp_size = getattr(self.server_args, "attn_cp_size", 1)
moe_dp_size = getattr(self.server_args, "moe_dp_size", 1)
print(
    f"[DEBUG] tp_size={self.tp_size}, dp_size={dp_size}, attn_cp_size={attn_cp_size}, moe_dp_size={moe_dp_size}"
)
dp_size = getattr(self.server_args, "dp_size", 1)
attn_cp_size = getattr(self.server_args, "attn_cp_size", 1)
moe_dp_size = getattr(self.server_args, "moe_dp_size", 1)

Comment on lines +11 to +15
CUDA_VISIBLE_DEVICES=1,2,3,5 torchrun \
--standalone \
--nproc_per_node $NUM_GPUS \
scripts/prepare_hidden_states.py \
--target-model-path /data/jiapingW/pretrained_models/Qwen3.5-35B-A3B \

Severity: medium

The script contains hardcoded values for CUDA_VISIBLE_DEVICES and --target-model-path. This reduces portability and makes it difficult for other users to run the script on different machine configurations. It's recommended to parameterize these values using environment variables or script arguments.

Suggested change
CUDA_VISIBLE_DEVICES=1,2,3,5 torchrun \
--standalone \
--nproc_per_node $NUM_GPUS \
scripts/prepare_hidden_states.py \
--target-model-path /data/jiapingW/pretrained_models/Qwen3.5-35B-A3B \
CUDA_VISIBLE_DEVICES=${CUDA_DEVICES:-"1,2,3,5"} torchrun \
--standalone \
--nproc_per_node $NUM_GPUS \
scripts/prepare_hidden_states.py \
--target-model-path ${TARGET_MODEL_PATH:-"/data/jiapingW/pretrained_models/Qwen3.5-35B-A3B"} \

Comment on lines +26 to +44
# NUM_GPUS=2
# CUDA_VISIBLE_DEVICES=6,7 torchrun \
# --standalone \
# --nproc_per_node $NUM_GPUS \
# $ROOT_DIR/scripts/train_eagle3.py \
# --target-model-path /data/jiapingW/pretrained_models/Qwen3.5-35B-A3B \
# --draft-model-config $ROOT_DIR/configs/qwen3.5-35b-a3b-eagle3.json \
# --train-data-path $ROOT_DIR/cache/dataset/ultrachat_train.jsonl \
# --train-hidden-states-path $ROOT_DIR/cache/hidden_states/qwen3.5-35b-a3b-ultrachat \
# --build-dataset-num-proc $BUILD_DATASET_NUM_PROC \
# --output-dir $ROOT_DIR/outputs/qwen3.5-35b-a3b-ultrachat \
# --num-epochs 10 \
# --batch-size 1 \
# --tp-size 1 \
# --learning-rate 5e-5 \
# --max-length 4096 \
# --chat-template qwen \
# --cache-dir $ROOT_DIR/cache \
# --embedding-key "model.language_model.embed_tokens.weight"

Severity: medium

This large block of code for training is commented out, which can be confusing. If this section is not yet functional or is intended for reference, please add a comment explaining its status and usage. Otherwise, if it's obsolete, consider removing it to improve the script's clarity.

[g0, g1], [g2, g3], [g4, g5], [g6, g7]
2 pipeline model-parallel groups:
[g0, g2, g4, g6], [g1, g3, g5, g7]
[g0, g2, g4, g6], [b1, g3, g5, g7]

Severity: medium

There appears to be a typo in the docstring example. b1 should likely be g1 to maintain consistency with the g notation for GPUs.

Suggested change
[g0, g2, g4, g6], [b1, g3, g5, g7]
[g0, g2, g4, g6], [g1, g3, g5, g7]
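The rank layout in the docstring above (with the g1 typo fixed) can be reproduced with a small sketch: for 8 GPUs and TP size 2, tensor-parallel groups are contiguous blocks of ranks, while the 2 pipeline-parallel groups are strided. This illustrates the rank arithmetic only; it is not sglang's implementation, and the function names are mine.

```python
def tp_groups(world_size, tp_size):
    """Contiguous tensor-parallel groups: [0,1], [2,3], ..."""
    return [list(range(s, s + tp_size)) for s in range(0, world_size, tp_size)]

def pp_groups(world_size, num_groups):
    """Strided pipeline-parallel groups: rank i joins group i % num_groups."""
    return [list(range(start, world_size, num_groups)) for start in range(num_groups)]

print(tp_groups(8, 2))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(pp_groups(8, 2))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

The second printout matches the corrected docstring example: two pipeline groups, [g0, g2, g4, g6] and [g1, g3, g5, g7].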

