Request-cost-aware router fairness priority#897
Request-cost-aware router fairness priority#897JagjeevanAK wants to merge 5 commits intovolcano-sh:mainfrom
Conversation
…dded the required V(4) enqueue debug log fields, and added targeted router unit tests for output-token parsing, weight application, historical-usage inclusion, request-size sensitivity, and log-gating behavior Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
|
Welcome @JagjeevanAK! It looks like this is your first PR to volcano-sh/kthena 🎉 |
There was a problem hiding this comment.
Code Review
This pull request updates the fairness scheduling logic by replacing generic token and request weights with specific input and output token weights. It introduces a new priority calculation that combines historical usage with an estimated request cost based on input tokens and requested output tokens. Comprehensive tests were added to verify these changes. Feedback highlights the need to handle multiple numeric types when parsing token fields and identifies inconsistencies between the new router logic and the existing datastore priority refresh mechanism.
There was a problem hiding this comment.
Pull request overview
This PR updates the router’s fairness enqueue priority to account for the estimated cost of the current request (input + requested output tokens) in addition to the user/model’s historical weighted usage, improving admission ordering for expensive requests while keeping the sliding-window fairness model.
Changes:
- Replace enqueue priority calculation with
historicalUsage + estimatedRequestCost(inputTokens, requestedOutputTokens). - Add parsing for requested output token budget from
max_completion_tokens/max_tokens. - Add V(4) debug logging for enqueue priority components and introduce unit tests for the new behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/kthena-router/router/router.go | Implements cost-aware enqueue priority calculation and adds debug logging for priority components. |
| pkg/kthena-router/router/router_fairness_test.go | Adds unit tests covering output-budget parsing, weight application, historical usage inclusion, request-size sensitivity, and log-gating. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
|
Addressed AI review findings in commit 68c4b4c on this PR branch:
Scope note for maintainers: for issue #757 this change intentionally uses Proposal context is documented on issue #757: #757 (comment) |
|
I would also like to see the proposal doc in this pr |
|
Sure, I have added Proposal in PR description. |
|
Adding label DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Pull request overview
This PR enhances the router’s fairness scheduling by making enqueue priority request-cost-aware (estimated input + requested output tokens) while preserving the existing sliding-window historical usage model.
Changes:
- Add request-cost components (input tokens + requested output token budget) into enqueue priority calculation and log detailed enqueue priority at klog V(4).
- Persist a per-request
PriorityOffsetso dequeue-time priority refresh can preserve request-size sensitivity. - Add unit tests covering output-budget parsing and priority composition behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/kthena-router/router/router.go | Computes request-cost-aware enqueue priority; adds V(4) enqueue debug log; stores per-request offset for refresh. |
| pkg/kthena-router/datastore/fairness_queue.go | Preserves request-size component across priority refresh/rebuild via PriorityOffset. |
| pkg/kthena-router/datastore/fairness_queue_test.go | Adds tests ensuring offset is preserved across refresh and heap rebuild. |
| pkg/kthena-router/router/router_fairness_test.go | Adds unit tests for output token parsing and request-cost-aware priority behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@JagjeevanAK Mind adding a doc under https://github.com/volcano-sh/kthena/tree/main/docs/proposal |
Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
hey can you check now ? CC: @hzxuzhonghu |
There was a problem hiding this comment.
Pull request overview
Enhances Kthena router fairness scheduling so enqueue ordering is request-cost-aware by combining historical sliding-window usage with an estimated cost for the in-flight request, while preserving request-size sensitivity across queue priority refreshes.
Changes:
- Compute enqueue priority as
historicalUsage + estimatedRequestCost(inputTokens, requestedOutputTokens)and add a V(4) debug log for the priority components. - Persist an enqueue-time
PriorityOffseton queued requests so dequeue-time refresh/rebuild keeps request-size sensitivity. - Add unit tests for output-budget parsing, priority calculation, request-size sensitivity, and offset preservation during refresh/rebuild.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/kthena-router/router/router.go | Implements request-cost-aware enqueue priority calculation, logging, and enqueues PriorityOffset. |
| pkg/kthena-router/datastore/fairness_queue.go | Preserves request-cost component during priority refresh/rebuild via PriorityOffset. |
| pkg/kthena-router/router/router_fairness_test.go | Adds unit tests for output token parsing, cost calculation, request-size sensitivity, and log gating. |
| pkg/kthena-router/datastore/fairness_queue_test.go | Adds tests asserting PriorityOffset is preserved across refresh and heap rebuild. |
| docs/proposal/request-cost-aware-fairness-priority.md | Adds an RFC-style proposal documenting the intended behavior and scope. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (r *Router) calculateRequestPriority(userID, modelName string, inputTokens int, modelRequest ModelRequest) enqueuePriorityDetails { | ||
| historicalUsage, err := r.store.GetTokenCount(userID, modelName) | ||
| if err != nil { | ||
| klog.Warningf("failed to get fairness usage for user=%s model=%s: %v", userID, modelName, err) | ||
| historicalUsage = 0 | ||
| } | ||
|
|
| t.Fatalf("Expected refreshed priority 15.0 with offset preserved, got %v", result.Priority) | ||
| } | ||
| } | ||
|
|
| newPri, err := CalculateFairnessPriority( | ||
| pq.tokenTracker, | ||
| req.UserID, | ||
| req.ModelName, | ||
| pq.config.TokenWeight, | ||
| pq.config.RequestNumWeight, | ||
| ) | ||
| if err == nil && newPri != req.Priority { | ||
| newPri += req.PriorityOffset | ||
| req.Priority = newPri | ||
| // Check if this request should still be dequeued | ||
| if len(pq.heap) > 0 && newPri > pq.heap[0].Priority { |
|
Donot contain merge-commits , i will first take a look at the proposal |
|
Okay sorry for that will keep in mind |
What type of PR is this?
/kind enhancement
What this PR does / why we need it:
This PR makes router fairness enqueue priority request-cost-aware by combining historical weighted usage with the current request’s estimated cost. The goal is to keep the existing sliding-window fairness model while making enqueue ordering reflect both prior usage and the size of the request being admitted.
Which issue(s) this PR fixes:
Fixes #757
Special notes for your reviewer:
The attached RFC plan as posted on issue while workspace and the implementation scope is intentionally narrow: it reuses the existing fairness weights, avoids API/CRD changes, and adds a V(4) enqueue debug log plus unit coverage for output-token parsing, weight application, historical usage inclusion, request-size sensitivity, and log-gating behavior.
I used AI assistance while preparing this PR description and implementation summary.
Does this PR introduce a user-facing change?:
Yes.
Router fairness enqueue priority now includes estimated current request cost in addition to historical weighted usage, improving admission ordering for larger requests.
PROPOSAL
I drafted an RFC-style proposal for this issue focused on a narrow fairness improvement in router enqueue priority.
Summary:
This proposal keeps the existing sliding-window fairness model, and adds estimated current request cost at enqueue time so requests are not ranked by historical aggregate alone.
Proposed priority:
Key changes:
Configuration/logging scope:
Compatibility:
Test plan:
Non-goals for this issue: