[Serve][LLM][SGLang] Wide / elastic expert parallelism serving pattern #62793

@eicherseiji

The existing Wide-EP pattern (DPServer + decode-as-orchestrator PD + gang scheduling) passes vLLM-keyed kwargs and sets VLLM_RAY_BUNDLE_INDICES, so it is vLLM-specific. SGLang uses different argument names (dist_init_addr, moe_dp_size, enable_dp_attention, etc.) and has its own in-engine elastic expert backup. This issue tracks a parallel SGLang Wide-EP server class.

Subitems

  • A new SGLang DP / Wide-EP server class.
  • SGLang's MoE / EP engine args (moe_dp_size, moe_ep_size, enable_dp_attention, elastic_ep_backend, enable_elastic_expert_backup) are first-class through the new server.
  • Layering: engine-level elastic backup handles expert-local failures; gang RESTART_GANG handles control-plane / NCCL-level failures. This contract is documented.
  • SGLang Wide-EP release test on a real MoE model (DeepSeek-V3 class). Fault-injection variant kills one scheduler actor and asserts engine-level recovery.
  • SGLang Wide-EP user guide documenting topology and failure-mode contract.
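
As a rough illustration of what "first-class" engine args could look like, here is a minimal sketch. The argument names come from this issue; `SGLangWideEPArgs`, `to_engine_kwargs`, and all default values are hypothetical and not part of Ray Serve or SGLang.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SGLangWideEPArgs:
    """Hypothetical container for the SGLang engine args named in this issue."""

    moe_dp_size: int
    moe_ep_size: int
    enable_dp_attention: bool = False
    # The set of valid backend names is engine-defined; none is assumed here.
    elastic_ep_backend: Optional[str] = None
    enable_elastic_expert_backup: bool = False
    # Assumed to be filled in by the server class at startup.
    dist_init_addr: Optional[str] = None


def to_engine_kwargs(args: SGLangWideEPArgs) -> Dict[str, Any]:
    """Return SGLang-keyed kwargs, dropping unset (None) fields
    so the engine's own defaults apply."""
    return {k: v for k, v in asdict(args).items() if v is not None}
```

Keeping the SGLang names unchanged in one typed structure would avoid translating through vLLM-keyed kwargs.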

Open questions

  • Failure-mode layering between engine-level elastic expert backup and Ray-side RESTART_GANG. The engine handles expert-local failures; the gang handles control-plane failures. The contract needs to be explicit so the two mechanisms don't fight.
  • Whether expert-placement topology is observable at the Ray layer or stays engine-internal.
  • Autoscaling behavior on wide-EP deployments: expert rebalance on scale-up, or opaque DP-group scaling.
  • Whether the new server class subclasses DPServer or stands alone.
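
One way to make the layering question concrete is a single routing function that names the owner of each failure class. This is a sketch only: `FailureKind`, `RecoveryOwner`, and `recovery_owner` are hypothetical names. Falling back to a gang restart when elastic backup is disabled is also an assumption, not a decided contract.

```python
from enum import Enum


class FailureKind(Enum):
    EXPERT_LOCAL = "expert_local"
    CONTROL_PLANE = "control_plane"
    NCCL = "nccl"


class RecoveryOwner(Enum):
    ENGINE_ELASTIC_BACKUP = "engine_elastic_backup"
    RESTART_GANG = "restart_gang"


def recovery_owner(kind: FailureKind, elastic_backup_enabled: bool) -> RecoveryOwner:
    """Pick exactly one recovery layer per failure so the two never overlap."""
    if kind is FailureKind.EXPERT_LOCAL and elastic_backup_enabled:
        return RecoveryOwner.ENGINE_ELASTIC_BACKUP
    # Assumption: control-plane and NCCL-level failures, and expert-local
    # failures with backup disabled, all go to the Ray-side gang restart.
    return RecoveryOwner.RESTART_GANG
```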

Upstream coordination

  • Expert-placement state exposed over a TokenizerManager RPC for Ray-side observability.
  • Clear engine signal when elastic backup is in progress, so gang restart can defer.
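
If the engine exposed such a signal, the Ray side could defer a restart along these lines. This is a hedged sketch: `should_restart_gang`, its parameters, and the 60-second cap are hypothetical. The cap is an assumption added so that a stuck backup cannot block a restart indefinitely.

```python
def should_restart_gang(
    failure_detected: bool,
    backup_in_progress: bool,
    deferred_s: float,
    max_defer_s: float = 60.0,
) -> bool:
    """Decide whether the gang restart should fire now.

    backup_in_progress stands in for the engine signal requested above;
    deferred_s is how long the restart has already been held back.
    """
    if not failure_detected:
        return False
    if backup_in_progress and deferred_s < max_defer_s:
        # Let the engine-level elastic backup finish first.
        return False
    return True
```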

Labels: llm, serve
