The existing Wide-EP pattern (DPServer + decode-as-orchestrator PD + gang scheduling) passes vLLM-keyed kwargs and sets VLLM_RAY_BUNDLE_INDICES. SGLang consumes different names (dist_init_addr, moe_dp_size, enable_dp_attention, etc.) and has in-engine elastic expert backup. This issue tracks a parallel SGLang Wide-EP server class.
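To make the naming mismatch concrete, here is a minimal sketch of a typed config that only accepts the SGLang-side names mentioned in this issue. `SGLangWideEPArgs` and `build_engine_kwargs` are hypothetical names, and the field types and defaults are assumptions for illustration, not SGLang's actual defaults.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional


# Hypothetical container. Field names come from this issue; the types and
# defaults are illustrative assumptions, not SGLang's real defaults.
@dataclass(frozen=True)
class SGLangWideEPArgs:
    dist_init_addr: str
    moe_dp_size: int = 1
    moe_ep_size: int = 1
    enable_dp_attention: bool = False
    elastic_ep_backend: Optional[str] = None
    enable_elastic_expert_backup: bool = False

    def to_engine_kwargs(self) -> Dict[str, Any]:
        """Return SGLang-keyed kwargs, dropping optional values left unset."""
        return {k: v for k, v in asdict(self).items() if v is not None}


def build_engine_kwargs(user_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Reject keys this sketch does not know rather than passing them through."""
    known = {f.name for f in fields(SGLangWideEPArgs)}
    unknown = sorted(set(user_kwargs) - known)
    if unknown:
        raise ValueError(f"not SGLang Wide-EP kwargs: {unknown}")
    return SGLangWideEPArgs(**user_kwargs).to_engine_kwargs()
```

Failing fast on unknown keys means a vLLM-keyed kwarg such as `data_parallel_size` is caught at the server boundary rather than silently forwarded to the engine.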
Subitems
- … (moe_dp_size, moe_ep_size, enable_dp_attention, elastic_ep_backend, enable_elastic_expert_backup) are first-class through the new server.
- … RESTART_GANG handles control-plane / NCCL-level failures. Documented contract.
Open questions
- Failure-mode layering between engine-level elastic expert backup and Ray-side RESTART_GANG. Engine handles expert-local; gang handles control-plane. Explicit contract so the two don't fight.
- Whether expert-placement topology is observable at the Ray layer or stays engine-internal.
- Autoscaling behavior on wide-EP deployments: expert rebalance on scale-up, or opaque DP-group scaling.
- Whether the new server class subclasses DPServer or stands alone.
Upstream coordination
- Expert-placement state exposed over a TokenizerManager RPC for Ray-side observability.
- Clear engine signal when elastic backup is in progress, so gang restart can defer.
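One possible reading of the failure-layering contract, as a sketch only: the engine owns expert-local failures, the gang owns control-plane failures, and a gang restart defers while the engine reports a backup in progress. `FailureScope`, `Action`, `EngineStatus`, and `decide` are hypothetical names, and the `elastic_backup_in_progress` signal is the one requested upstream above, not an existing API.

```python
from dataclasses import dataclass
from enum import Enum


class FailureScope(Enum):
    EXPERT_LOCAL = "expert_local"    # e.g. a single expert replica is lost
    CONTROL_PLANE = "control_plane"  # e.g. an NCCL-level failure


class Action(Enum):
    LEAVE_TO_ENGINE = "leave_to_engine"  # elastic expert backup handles it
    DEFER = "defer"                      # wait for the engine, then re-check
    RESTART_GANG = "restart_gang"        # Ray-side RESTART_GANG


@dataclass(frozen=True)
class EngineStatus:
    # Hypothetical signal; this issue asks upstream for something like it.
    elastic_backup_in_progress: bool


def decide(scope: FailureScope, status: EngineStatus) -> Action:
    """Pick exactly one owner per failure so the two layers do not fight."""
    if scope is FailureScope.EXPERT_LOCAL:
        return Action.LEAVE_TO_ENGINE
    if status.elastic_backup_in_progress:
        # Assumption: a restart during an in-flight backup would discard it.
        return Action.DEFER
    return Action.RESTART_GANG
```

Keeping the decision in one pure function makes the contract easy to document and unit-test independently of Ray and the engine.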