fix: Bump sglang version from 0.5.9 to 0.5.10 by moehanabi · Pull Request #529 · sgl-project/SpecForge

moehanabi · 2026-04-13T03:09:33Z

Motivation

Training the Qwen 3.5 series models requires transformers 5.3.0, and sglang 0.5.10 is needed for compatibility with transformers 5.3.0.

Modifications

Ref:

[parallel_state Refactor 1/n] Remove stream of PyNCCL sglang#20866
Piecewise Cuda Graph set default sglang#16331
Refactor weight loading huggingface/transformers#41580
Correctly create tied key mapping in post_init, and dynamic tie weight huggingface/transformers#42270
Separate check_model_inputs into capture_outputs and merge_with_config_defaults + ensure correctness huggingface/transformers#43862
Add buffers to _init_weights for ALL models huggingface/transformers#42309
[loading] Really initialize on meta device for huge perf gains huggingface/transformers#42941
Move missing weights and non-persistent buffers to correct device earlier huggingface/transformers#43021

Related Issues

Accuracy Test

Benchmark & Profiling

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

Copilot

Pull request overview

This PR updates SpecForge’s SGLang integration to align with the sglang 0.5.10 release, including adjusting backend patching and CLI/runtime argument plumbing.

Changes:

Bump the pinned sglang dependency from 0.5.9 to 0.5.10.
Update the SGLang backend patch to stop passing pynccl_use_current_stream when initializing model-parallel groups.
Rename the piecewise CUDA graph flag/plumbing from “enable” to “enforce” and propagate the new keyword into SGLang backend kwargs.
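The second change above drops a keyword argument that the upstream initializer no longer accepts. The PR simply removes the argument; as an alternative for code that has to run against more than one release, a minimal sketch of signature-based filtering could look like the following. The two `init_group_*` functions are hypothetical stand-ins written for this example, not SGLang APIs.

```python
import inspect
from typing import Any, Callable, Dict


def filter_supported_kwargs(
    func: Callable[..., Any], kwargs: Dict[str, Any]
) -> Dict[str, Any]:
    """Keep only the kwargs that `func` declares (or all, if it takes **kwargs)."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


# Hypothetical stand-ins for a group initializer before and after the refactor.
def init_group_old(ranks, backend, pynccl_use_current_stream=False):
    return ("old", ranks, backend, pynccl_use_current_stream)


def init_group_new(ranks, backend):
    return ("new", ranks, backend)


requested = {
    "ranks": [0, 1],
    "backend": "nccl",
    "pynccl_use_current_stream": True,
}

# The same request dict works against both signatures.
old_result = init_group_old(**filter_supported_kwargs(init_group_old, requested))
new_result = init_group_new(**filter_supported_kwargs(init_group_new, requested))
```

Silently dropping an argument can hide real behavior changes, so with a single pinned version the explicit removal done in this PR is the simpler choice.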

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `specforge/modeling/target/sglang_backend/patch.py` | Removes now-unsupported `pynccl_use_current_stream` kwargs and keeps compatibility notes around TP group initialization. |
| `specforge/args.py` | Renames piecewise CUDA graph flag/field and updates kwargs mapping to SGLang; updates CLI flag name and help. |
| `pyproject.toml` | Pins `sglang==0.5.10`. |
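Because the dependency is pinned exactly, a mismatched environment is a likely source of confusing failures. A minimal sketch of a runtime check, assuming the pin stays an exact `==` match, is shown below. The helper names and the `PINNED_SGLANG` constant are made up for this example and are not part of SpecForge.

```python
from importlib import metadata
from typing import Optional

# Assumption: mirrors the `sglang==0.5.10` pin described for pyproject.toml.
PINNED_SGLANG = "0.5.10"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pin(installed: Optional[str], pinned: str = PINNED_SGLANG) -> Optional[str]:
    """Return a human-readable problem description, or None if the pin is met."""
    if installed is None:
        return "sglang is not installed"
    if installed != pinned:
        return f"sglang {installed} is installed, but {pinned} is pinned"
    return None


problem = check_pin(installed_version("sglang"))
if problem is not None:
    print(f"warning: {problem}")
```

The comparison is a plain string match, so a local or post-release build of the same version would be reported as a mismatch.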

Comments suppressed due to low confidence (1)

specforge/modeling/target/sglang_backend/patch.py:163

These inline notes reference sglang 0.5.9, but this PR bumps the dependency to 0.5.10. To avoid confusion, update the comments to be version-agnostic or reflect the new minimum supported version.

        # NOTE: Check pynccl_comm exists before accessing it (may be None in sglang 0.5.9)
        if parallel_state._TP.pynccl_comm is not None:
            parallel_state._TP.pynccl_comm.disabled = False
        if parallel_state._PDMUX_PREFILL_TP_GROUP.pynccl_comm is not None:
            parallel_state._PDMUX_PREFILL_TP_GROUP.pynccl_comm.disabled = False
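One way to make the guard above version-agnostic is to fold the `None` check into a small helper, so the comment no longer needs to name a release. This is only a sketch: `enable_pynccl` is a hypothetical helper, and the `SimpleNamespace` objects stand in for the real group coordinators.

```python
from types import SimpleNamespace


def enable_pynccl(group) -> bool:
    """Re-enable the PyNCCL communicator on `group` if one exists.

    Returns True when a communicator was found and enabled, False when the
    group is missing or has no communicator.
    """
    comm = getattr(group, "pynccl_comm", None)
    if comm is None:
        return False
    comm.disabled = False
    return True


# Stand-ins for a group with a communicator and one without.
with_comm = SimpleNamespace(pynccl_comm=SimpleNamespace(disabled=True))
without_comm = SimpleNamespace(pynccl_comm=None)

enabled_first = enable_pynccl(with_comm)
enabled_second = enable_pynccl(without_comm)
```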

Copilot · 2026-04-13T03:12:25Z

@@ -151,9 +151,9 @@ def add_args(parser: argparse.ArgumentParser) -> None:
            help="Enable piecewise CUDA graph for SGLang backend",


The CLI help text for --sglang-enable-dp-lm-head appears incorrect (it mentions piecewise CUDA graph). This is user-facing and may confuse users; update the help string to describe DP LM head behavior instead.

Suggested change

-            help="Enable piecewise CUDA graph for SGLang backend",
+            help="Enable DP LM head for SGLang backend",

Copilot · 2026-04-13T03:12:25Z

        parser.add_argument(
-            "--sglang-enable-piecewise-cuda-graph",
+            "--sglang-enforce-piecewise-cuda-graph",
            action="store_true",
-            help="Enable piecewise CUDA graph for SGLang backend's prefill",
+            help="Enforce piecewise CUDA graph for SGLang backend's prefill",
        )


Renaming --sglang-enable-piecewise-cuda-graph to --sglang-enforce-piecewise-cuda-graph is a breaking CLI change. Consider keeping the old flag as a deprecated alias (same dest) or clearly documenting the change so existing scripts don’t fail on upgrade.
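A minimal sketch of the deprecated-alias option, assuming the old and new flags are meant to have the same effect, could look like this. The `_DeprecatedStoreTrue` action and `build_parser` are hypothetical names for this example; only the two flag spellings and the help text come from the diff.

```python
import argparse
import warnings


class _DeprecatedStoreTrue(argparse.Action):
    """Like store_true, but emits a DeprecationWarning when used."""

    def __init__(self, option_strings, dest, help=None):
        # default=SUPPRESS leaves the default to the primary flag's action.
        super().__init__(
            option_strings=option_strings,
            dest=dest,
            nargs=0,
            default=argparse.SUPPRESS,
            help=help,
        )

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated; "
            "use --sglang-enforce-piecewise-cuda-graph instead",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, True)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--sglang-enforce-piecewise-cuda-graph",
        action="store_true",
        help="Enforce piecewise CUDA graph for SGLang backend's prefill",
    )
    # Old spelling kept as an alias that writes to the same destination.
    parser.add_argument(
        "--sglang-enable-piecewise-cuda-graph",
        dest="sglang_enforce_piecewise_cuda_graph",
        action=_DeprecatedStoreTrue,
        help="Deprecated alias for --sglang-enforce-piecewise-cuda-graph",
    )
    return parser
```

Whether an alias is appropriate depends on whether "enable" and "enforce" really mean the same thing after the upstream change; if they do not, documenting the rename may be the safer route.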

gemini-code-assist

Code Review

This pull request updates the sglang dependency to version 0.5.10 and renames the sglang_enable_piecewise_cuda_graph argument to sglang_enforce_piecewise_cuda_graph across the codebase. It also removes the pynccl_use_current_stream parameter from model parallel initialization calls. Feedback suggests correcting the help text for the --sglang-enable-dp-lm-head argument, which currently contains an incorrect description that was made more apparent by the changes in this PR.

gemini-code-assist · 2026-04-13T03:14:09Z

+            "--sglang-enforce-piecewise-cuda-graph",
            action="store_true",
-            help="Enable piecewise CUDA graph for SGLang backend's prefill",
+            help="Enforce piecewise CUDA graph for SGLang backend's prefill",


The help text for the renamed argument --sglang-enforce-piecewise-cuda-graph is correct, but it highlights a significant issue in the preceding argument's help text (line 151), which incorrectly describes --sglang-enable-dp-lm-head as enabling piecewise CUDA graphs. While line 151 is not directly modified in this diff, the rename here makes the duplication and inaccuracy more apparent to users. Consider fixing the help text for --sglang-enable-dp-lm-head in a follow-up or including it here if possible.
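A small regression check can keep this kind of copy-paste help text from coming back. The sketch below assumes the corrected help string suggested in the review; `help_for` and `build_demo_parser` are hypothetical helpers, and the lookup relies on argparse's private `_actions` list, which is not a documented API.

```python
import argparse
from typing import Optional


def help_for(parser: argparse.ArgumentParser, flag: str) -> Optional[str]:
    """Return the help text registered for `flag`, or None if not found."""
    for action in parser._actions:  # private attribute; may change across versions
        if flag in action.option_strings:
            return action.help
    return None


def build_demo_parser() -> argparse.ArgumentParser:
    # Demo parser with the corrected help text from the review suggestion.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--sglang-enable-dp-lm-head",
        action="store_true",
        help="Enable DP LM head for SGLang backend",
    )
    parser.add_argument(
        "--sglang-enforce-piecewise-cuda-graph",
        action="store_true",
        help="Enforce piecewise CUDA graph for SGLang backend's prefill",
    )
    return parser


demo = build_demo_parser()
dp_help = help_for(demo, "--sglang-enable-dp-lm-head")
# The DP LM head flag should not describe piecewise CUDA graphs.
assert dp_help is not None and "piecewise" not in dp_help.lower()
```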

moehanabi requested review from FlamingoPg and sleepcoo as code owners April 13, 2026 03:09

Copilot AI review requested due to automatic review settings April 13, 2026 03:09

moehanabi requested review from FrankLeeeee and shuaills as code owners April 13, 2026 03:09

Copilot started reviewing on behalf of moehanabi April 13, 2026 03:10

Copilot AI reviewed Apr 13, 2026

gemini-code-assist Bot reviewed Apr 13, 2026

moehanabi mentioned this pull request Apr 13, 2026

fix: Support different version of PCG args #517

Closed

6 tasks

moehanabi force-pushed the bump_sglang0.5.10 branch 4 times, most recently from 077636a to 0f6638d Compare April 18, 2026 14:03

fix: Bump sglang version from 0.5.9 to 0.5.10

9059c75

moehanabi force-pushed the bump_sglang0.5.10 branch from 0f6638d to 9059c75 Compare April 22, 2026 09:47

moehanabi wants to merge 1 commit into sgl-project:main from moehanabi:bump_sglang0.5.10
