Skip to content

fix: Bump sglang version from 0.5.9 to 0.5.10#529

Open
moehanabi wants to merge 1 commit intosgl-project:mainfrom
moehanabi:bump_sglang0.5.10
Open

fix: Bump sglang version from 0.5.9 to 0.5.10#529
moehanabi wants to merge 1 commit intosgl-project:mainfrom
moehanabi:bump_sglang0.5.10

Conversation

@moehanabi
Copy link
Copy Markdown
Contributor

@moehanabi moehanabi commented Apr 13, 2026

Motivation

We need transformers 5.3.0 to train qwen 3.5 series model, and we need sglang 0.5.10 to adapt transformers 5.3.0.

Modifications

Ref:

Related Issues

Accuracy Test

Benchmark & Profiling

Checklist

Copilot AI review requested due to automatic review settings April 13, 2026 03:09
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates SpecForge’s SGLang integration to align with the sglang 0.5.10 release, including adjusting backend patching and CLI/runtime argument plumbing.

Changes:

  • Bump the pinned sglang dependency from 0.5.9 to 0.5.10.
  • Update the SGLang backend patch to stop passing pynccl_use_current_stream when initializing model-parallel groups.
  • Rename the piecewise CUDA graph flag/plumbing from “enable” to “enforce” and propagate the new keyword into SGLang backend kwargs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
specforge/modeling/target/sglang_backend/patch.py Removes now-unsupported pynccl_use_current_stream kwargs and keeps compatibility notes around TP group initialization.
specforge/args.py Renames piecewise CUDA graph flag/field and updates kwargs mapping to SGLang; updates CLI flag name and help.
pyproject.toml Pins sglang==0.5.10.
Comments suppressed due to low confidence (1)

specforge/modeling/target/sglang_backend/patch.py:163

  • These inline notes reference sglang 0.5.9, but this PR bumps the dependency to 0.5.10. To avoid confusion, update the comments to be version-agnostic or reflect the new minimum supported version.
        # NOTE: Check pynccl_comm exists before accessing it (may be None in sglang 0.5.9)
        if parallel_state._TP.pynccl_comm is not None:
            parallel_state._TP.pynccl_comm.disabled = False
        if parallel_state._PDMUX_PREFILL_TP_GROUP.pynccl_comm is not None:
            parallel_state._PDMUX_PREFILL_TP_GROUP.pynccl_comm.disabled = False

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread specforge/args.py
@@ -151,9 +151,9 @@ def add_args(parser: argparse.ArgumentParser) -> None:
help="Enable piecewise CUDA graph for SGLang backend",
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI help text for --sglang-enable-dp-lm-head appears incorrect (it mentions piecewise CUDA graph). This is user-facing and may confuse users; update the help string to describe DP LM head behavior instead.

Suggested change
help="Enable piecewise CUDA graph for SGLang backend",
help="Enable DP LM head for SGLang backend",

Copilot uses AI. Check for mistakes.
Comment thread specforge/args.py
Comment on lines 153 to 157
parser.add_argument(
"--sglang-enable-piecewise-cuda-graph",
"--sglang-enforce-piecewise-cuda-graph",
action="store_true",
help="Enable piecewise CUDA graph for SGLang backend's prefill",
help="Enforce piecewise CUDA graph for SGLang backend's prefill",
)
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming --sglang-enable-piecewise-cuda-graph to --sglang-enforce-piecewise-cuda-graph is a breaking CLI change. Consider keeping the old flag as a deprecated alias (same dest) or clearly documenting the change so existing scripts don’t fail on upgrade.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the sglang dependency to version 0.5.10 and renames the sglang_enable_piecewise_cuda_graph argument to sglang_enforce_piecewise_cuda_graph across the codebase. It also removes the pynccl_use_current_stream parameter from model parallel initialization calls. Feedback suggests correcting the help text for the --sglang-enable-dp-lm-head argument, which currently contains an incorrect description that was made more apparent by the changes in this PR.

Comment thread specforge/args.py
"--sglang-enforce-piecewise-cuda-graph",
action="store_true",
help="Enable piecewise CUDA graph for SGLang backend's prefill",
help="Enforce piecewise CUDA graph for SGLang backend's prefill",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The help text for the renamed argument --sglang-enforce-piecewise-cuda-graph is correct, but it highlights a significant issue in the preceding argument's help text (line 151), which incorrectly describes --sglang-enable-dp-lm-head as enabling piecewise CUDA graphs. While line 151 is not directly modified in this diff, the rename here makes the duplication and inaccuracy more apparent to users. Consider fixing the help text for --sglang-enable-dp-lm-head in a follow-up or including it here if possible.

@moehanabi moehanabi force-pushed the bump_sglang0.5.10 branch 4 times, most recently from 077636a to 0f6638d Compare April 18, 2026 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants