fix: Bump sglang version from 0.5.9 to 0.5.10 #529

moehanabi wants to merge 1 commit into sgl-project:main
Conversation
Pull request overview
This PR updates SpecForge’s SGLang integration to align with the sglang 0.5.10 release, including adjusting backend patching and CLI/runtime argument plumbing.
Changes:
- Bump the pinned `sglang` dependency from `0.5.9` to `0.5.10`.
- Update the SGLang backend patch to stop passing `pynccl_use_current_stream` when initializing model-parallel groups.
- Rename the piecewise CUDA graph flag/plumbing from "enable" to "enforce" and propagate the new keyword into SGLang backend kwargs.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `specforge/modeling/target/sglang_backend/patch.py` | Removes the now-unsupported `pynccl_use_current_stream` kwargs and keeps compatibility notes around TP group initialization. |
| `specforge/args.py` | Renames the piecewise CUDA graph flag/field, updates the kwargs mapping to SGLang, and updates the CLI flag name and help text. |
| `pyproject.toml` | Pins `sglang==0.5.10`. |
Comments suppressed due to low confidence (1)
specforge/modeling/target/sglang_backend/patch.py:163
- These inline notes reference sglang 0.5.9, but this PR bumps the dependency to 0.5.10. To avoid confusion, update the comments to be version-agnostic or reflect the new minimum supported version.
```python
# NOTE: Check pynccl_comm exists before accessing it (may be None in sglang 0.5.9)
if parallel_state._TP.pynccl_comm is not None:
    parallel_state._TP.pynccl_comm.disabled = False
if parallel_state._PDMUX_PREFILL_TP_GROUP.pynccl_comm is not None:
    parallel_state._PDMUX_PREFILL_TP_GROUP.pynccl_comm.disabled = False
```
```diff
@@ -151,9 +151,9 @@ def add_args(parser: argparse.ArgumentParser) -> None:
     help="Enable piecewise CUDA graph for SGLang backend",
```
The CLI help text for --sglang-enable-dp-lm-head appears incorrect (it mentions piecewise CUDA graph). This is user-facing and may confuse users; update the help string to describe DP LM head behavior instead.
```diff
-    help="Enable piecewise CUDA graph for SGLang backend",
+    help="Enable DP LM head for SGLang backend",
```
```diff
 parser.add_argument(
-    "--sglang-enable-piecewise-cuda-graph",
+    "--sglang-enforce-piecewise-cuda-graph",
     action="store_true",
-    help="Enable piecewise CUDA graph for SGLang backend's prefill",
+    help="Enforce piecewise CUDA graph for SGLang backend's prefill",
 )
```
Renaming --sglang-enable-piecewise-cuda-graph to --sglang-enforce-piecewise-cuda-graph is a breaking CLI change. Consider keeping the old flag as a deprecated alias (same dest) or clearly documenting the change so existing scripts don’t fail on upgrade.
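The deprecated-alias approach the reviewer mentions can be sketched with plain `argparse`: register both option strings against the same `dest`, hiding the old one from `--help`. This is an illustrative sketch, not the PR's code; the `dest` name mirrors the field rename described above.

```python
import argparse

parser = argparse.ArgumentParser()
# New canonical flag.
parser.add_argument(
    "--sglang-enforce-piecewise-cuda-graph",
    dest="sglang_enforce_piecewise_cuda_graph",
    action="store_true",
    help="Enforce piecewise CUDA graph for SGLang backend's prefill",
)
# Deprecated alias: same dest, hidden from --help, so existing scripts keep working.
parser.add_argument(
    "--sglang-enable-piecewise-cuda-graph",
    dest="sglang_enforce_piecewise_cuda_graph",
    action="store_true",
    help=argparse.SUPPRESS,
)

args = parser.parse_args(["--sglang-enable-piecewise-cuda-graph"])
print(args.sglang_enforce_piecewise_cuda_graph)  # True
```

On Python 3.13+, `add_argument(..., deprecated=True)` can additionally emit a warning when the old spelling is used.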
Code Review
This pull request updates the sglang dependency to version 0.5.10, renames the `sglang_enable_piecewise_cuda_graph` argument to `sglang_enforce_piecewise_cuda_graph` across the codebase, and removes the `pynccl_use_current_stream` parameter from model-parallel initialization calls. The main feedback is to correct the help text for the `--sglang-enable-dp-lm-head` argument, whose incorrect description (it mentions piecewise CUDA graph) is made more apparent by the changes in this PR.
| "--sglang-enforce-piecewise-cuda-graph", | ||
| action="store_true", | ||
| help="Enable piecewise CUDA graph for SGLang backend's prefill", | ||
| help="Enforce piecewise CUDA graph for SGLang backend's prefill", |
The help text for the renamed argument --sglang-enforce-piecewise-cuda-graph is correct, but it highlights a significant issue in the preceding argument's help text (line 151), which incorrectly describes --sglang-enable-dp-lm-head as enabling piecewise CUDA graphs. While line 151 is not directly modified in this diff, the rename here makes the duplication and inaccuracy more apparent to users. Consider fixing the help text for --sglang-enable-dp-lm-head in a follow-up or including it here if possible.
Force-pushed from 077636a to 0f6638d.
Force-pushed from 0f6638d to 9059c75.
Motivation
We need transformers 5.3.0 to train Qwen 3.5 series models, and sglang 0.5.10 is needed to support transformers 5.3.0.
Modifications
Ref:
- `check_model_inputs` into `capture_outputs` and `merge_with_config_defaults` + ensure correctness (huggingface/transformers#43862)
- `_init_weights` for ALL models (huggingface/transformers#42309)

Related Issues
Accuracy Test
Benchmark & Profiling
Checklist