feat: MS-Swift Megatron #407

bradhilton · 2025-09-16T15:57:36Z

No description provided.

Configure Qwen3 MoE for LoRA SFT with Megatron-SWIFT. Co-authored-by: bhilton <bhilton@wandb.com>

Fixed indentation in the setup and run sections of config.yaml so that the Python scripts for data generation and model testing execute correctly. This also makes the configuration easier to read and maintain.

Modified config.yaml to rename the model and adjust GPU resources. Added a new Dockerfile to set up the environment with necessary dependencies for ms-swift-megatron, including SSH server configuration and preinstalled packages for SkyPilot.

Modified to-hf.sh, to-mcore.sh, and train.sh to use the new model version Qwen3-235B-A22B-Instruct-2507 and expanded CUDA_VISIBLE_DEVICES to include more GPUs. Adjusted dataset and training parameters in train.sh for improved performance.
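As a rough illustration of the change described above, the shared settings in the three scripts might look like the following. This is a minimal sketch, not the contents of to-hf.sh, to-mcore.sh, or train.sh: the `Qwen/` hub prefix, the specific device list, and the `MODEL` and `NUM_GPUS` variable names are assumptions, and no ms-swift or Megatron command is shown because the commit message does not give the flags used.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Model version named in the commit message; the "Qwen/" hub prefix is an assumption.
MODEL="Qwen/Qwen3-235B-A22B-Instruct-2507"

# Hypothetical device list: the commit does not state which GPUs were added.
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

# Derive the GPU count from the device list so the two values cannot drift apart.
IFS=',' read -ra DEVICES <<< "$CUDA_VISIBLE_DEVICES"
NUM_GPUS="${#DEVICES[@]}"

echo "model=${MODEL} gpus=${NUM_GPUS}"
```

Deriving the count from `CUDA_VISIBLE_DEVICES` rather than hard-coding it means a later change to the device list only has to be made in one place.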

cursoragent and others added 6 commits September 16, 2025 00:06

feat: Add Qwen3 MoE LoRA SFT config

e1c6617

Configure Qwen3 MoE for LoRA SFT with Megatron-SWIFT. Co-authored-by: bhilton <bhilton@wandb.com>

chore: Add ms-swift megatron scripts

f3b9e03

fix: Correct model loading path in train.sh script

7d4ed0e
