Request-cost-aware router fairness priority by JagjeevanAK · Pull Request #897 · volcano-sh/kthena

JagjeevanAK · 2026-04-19T01:57:38Z

What type of PR is this?

/kind enhancement

What this PR does / why we need it:

This PR makes router fairness enqueue priority request-cost-aware by combining historical weighted usage with the current request’s estimated cost. The goal is to keep the existing sliding-window fairness model while making enqueue ordering reflect both prior usage and the size of the request being admitted.

Which issue(s) this PR fixes:
Fixes #757

Special notes for your reviewer:

The attached RFC plan as posted on issue while workspace and the implementation scope is intentionally narrow: it reuses the existing fairness weights, avoids API/CRD changes, and adds a V(4) enqueue debug log plus unit coverage for output-token parsing, weight application, historical usage inclusion, request-size sensitivity, and log-gating behavior.

I used AI assistance while preparing this PR description and implementation summary.

Does this PR introduce a user-facing change?:

Yes.

Router fairness enqueue priority now includes estimated current request cost in addition to historical weighted usage, improving admission ordering for larger requests.

PROPOSAL

I drafted an RFC-style proposal for this issue focused on a narrow fairness improvement in router enqueue priority.

Summary:
This proposal keeps the existing sliding-window fairness model, and adds estimated current request cost at enqueue time so requests are not ranked by historical aggregate alone.

Proposed priority:

estimatedRequestCost = inputWeight * inputTokens + outputWeight * requestedOutputTokens
finalPriority = historicalWeightedUsage + estimatedRequestCost
Lower numeric value still means higher priority (unchanged queue behavior)

Key changes:

Keep the sliding-window token tracker as the source of historical usage per (user, model)
Update router enqueue priority helper in router.go to use:
- userID
- modelName
- inputTokens
- parsed request body
Resolve requested output tokens in this order:
- max_completion_tokens
- max_tokens
- else 0
Treat missing, invalid, or negative values as 0
Keep post-response accounting unchanged (real usage still updates token tracker after completion)

Configuration/logging scope:

Reuse only FAIRNESS_INPUT_TOKEN_WEIGHT and FAIRNESS_OUTPUT_TOKEN_WEIGHT for this issue
No CRD, CLI flag, or Helm value changes
No expansion into FAIRNESS_PRIORITY_REQUEST_NUM_WEIGHT in this issue
Add one V(4) enqueue debug log with:
- requestID, userID, model, historicalUsage, inputTokens, requestedOutputTokens, estimatedRequestCost, finalPriority
No new metrics required

Compatibility:

No public API/CRD changes
Internal router helper signature changes only
Fairness queue semantics remain the same except for the new request-cost input

Test plan:

Unit tests for output token parsing:
- max_completion_tokens, max_tokens, missing, invalid, negative
Unit tests for weight application and final score calculation
Router-level tests for:
- request-size sensitivity with same historical usage
- historical usage inclusion
- zero/missing output budget fallback to input-only cost
- debug-log gating behavior
Scope is unit tests only (no e2e/load testing in this issue)

Non-goals for this issue:

Cross-model fairness
Request-count weighting redesign
Queue refresh redesign
HA/distributed fairness state

…dded the required V(4) enqueue debug log fields, and added targeted router unit tests for output-token parsing, weight application, historical-usage inclusion, request-size sensitivity, and log-gating behavior Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

volcano-sh-bot · 2026-04-19T01:57:50Z

Welcome @JagjeevanAK! It looks like this is your first PR to volcano-sh/kthena 🎉

gemini-code-assist

Code Review

This pull request updates the fairness scheduling logic by replacing generic token and request weights with specific input and output token weights. It introduces a new priority calculation that combines historical usage with an estimated request cost based on input tokens and requested output tokens. Comprehensive tests were added to verify these changes. Feedback highlights the need to handle multiple numeric types when parsing token fields and identifies inconsistencies between the new router logic and the existing datastore priority refresh mechanism.

Copilot

Pull request overview

This PR updates the router’s fairness enqueue priority to account for the estimated cost of the current request (input + requested output tokens) in addition to the user/model’s historical weighted usage, improving admission ordering for expensive requests while keeping the sliding-window fairness model.

Changes:

Replace enqueue priority calculation with historicalUsage + estimatedRequestCost(inputTokens, requestedOutputTokens).
Add parsing for requested output token budget from max_completion_tokens / max_tokens.
Add V(4) debug logging for enqueue priority components and introduce unit tests for the new behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
pkg/kthena-router/router/router.go	Implements cost-aware enqueue priority calculation and adds debug logging for priority components.
pkg/kthena-router/router/router_fairness_test.go	Adds unit tests covering output-budget parsing, weight application, historical usage inclusion, request-size sensitivity, and log-gating.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

JagjeevanAK · 2026-04-19T02:16:46Z

Addressed AI review findings in commit 68c4b4c on this PR branch:

Requested output-token parsing now accepts float64, int, int64, and json.Number (invalid or negative values still fallback to 0).
Fairness queue refresh/rebuild now preserves enqueue-time request-cost contribution via a stored priority offset, so refresh no longer drops request-size sensitivity.

Scope note for maintainers: for issue #757 this change intentionally uses FAIRNESS_INPUT_TOKEN_WEIGHT and FAIRNESS_OUTPUT_TOKEN_WEIGHT for request-cost-aware enqueue behavior, while broader compatibility/deprecation handling around FAIRNESS_PRIORITY_* is being kept out of this narrowly scoped issue unless maintainers want it folded in here.

Proposal context is documented on issue #757: #757 (comment)

hzxuzhonghu · 2026-04-20T03:19:55Z

I would also like to see the proposal doc in this pr

JagjeevanAK · 2026-04-22T04:48:07Z

Sure, I have added Proposal in PR description.

volcano-sh-bot · 2026-04-22T04:50:23Z

Adding label do-not-merge/contains-merge-commits because PR contains merge commits, which are not allowed in this repository.
Use git rebase to reapply your commits on top of the target branch. Detailed instructions for doing so can be found here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copilot

Pull request overview

This PR enhances the router’s fairness scheduling by making enqueue priority request-cost-aware (estimated input + requested output tokens) while preserving the existing sliding-window historical usage model.

Changes:

Add request-cost components (input tokens + requested output token budget) into enqueue priority calculation and log detailed enqueue priority at klog V(4).
Persist a per-request PriorityOffset so dequeue-time priority refresh can preserve request-size sensitivity.
Add unit tests covering output-budget parsing and priority composition behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
pkg/kthena-router/router/router.go	Computes request-cost-aware enqueue priority; adds V(4) enqueue debug log; stores per-request offset for refresh.
pkg/kthena-router/datastore/fairness_queue.go	Preserves request-size component across priority refresh/rebuild via `PriorityOffset`.
pkg/kthena-router/datastore/fairness_queue_test.go	Adds tests ensuring offset is preserved across refresh and heap rebuild.
pkg/kthena-router/router/router_fairness_test.go	Adds unit tests for output token parsing and request-cost-aware priority behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

hzxuzhonghu · 2026-04-23T08:03:58Z

@JagjeevanAK Mind adding a doc under https://github.com/volcano-sh/kthena/tree/main/docs/proposal

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

volcano-sh-bot · 2026-04-23T09:08:27Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lizhencheng9527 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

docs/OWNERS
pkg/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

JagjeevanAK · 2026-04-23T09:10:58Z

hey can you check now ?

CC: @hzxuzhonghu

Copilot

Pull request overview

Enhances Kthena router fairness scheduling so enqueue ordering is request-cost-aware by combining historical sliding-window usage with an estimated cost for the in-flight request, while preserving request-size sensitivity across queue priority refreshes.

Changes:

Compute enqueue priority as historicalUsage + estimatedRequestCost(inputTokens, requestedOutputTokens) and add a V(4) debug log for the priority components.
Persist an enqueue-time PriorityOffset on queued requests so dequeue-time refresh/rebuild keeps request-size sensitivity.
Add unit tests for output-budget parsing, priority calculation, request-size sensitivity, and offset preservation during refresh/rebuild.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
pkg/kthena-router/router/router.go	Implements request-cost-aware enqueue priority calculation, logging, and enqueues `PriorityOffset`.
pkg/kthena-router/datastore/fairness_queue.go	Preserves request-cost component during priority refresh/rebuild via `PriorityOffset`.
pkg/kthena-router/router/router_fairness_test.go	Adds unit tests for output token parsing, cost calculation, request-size sensitivity, and log gating.
pkg/kthena-router/datastore/fairness_queue_test.go	Adds tests asserting `PriorityOffset` is preserved across refresh and heap rebuild.
docs/proposal/request-cost-aware-fairness-priority.md	Adds an RFC-style proposal documenting the intended behavior and scope.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+func (r *Router) calculateRequestPriority(userID, modelName string, inputTokens int, modelRequest ModelRequest) enqueuePriorityDetails {
+	historicalUsage, err := r.store.GetTokenCount(userID, modelName)
+	if err != nil {
+		klog.Warningf("failed to get fairness usage for user=%s model=%s: %v", userID, modelName, err)
+		historicalUsage = 0
+	}
+


+		t.Fatalf("Expected refreshed priority 15.0 with offset preserved, got %v", result.Priority)
+	}
+}
+


 				newPri, err := CalculateFairnessPriority(
 					pq.tokenTracker,
 					req.UserID,
 					req.ModelName,
 					pq.config.TokenWeight,
 					pq.config.RequestNumWeight,
 				)
 				if err == nil && newPri != req.Priority {
+					newPri += req.PriorityOffset
 					req.Priority = newPri
 					// Check if this request should still be dequeued
 					if len(pq.heap) > 0 && newPri > pq.heap[0].Priority {


hzxuzhonghu · 2026-04-23T11:23:33Z

Donot contain merge-commits , i will first take a look at the proposal

JagjeevanAK · 2026-04-23T13:27:43Z

Okay sorry for that will keep in mind

Copilot AI review requested due to automatic review settings April 19, 2026 01:57

volcano-sh-bot added the kind/enhancement New feature or request label Apr 19, 2026

volcano-sh-bot requested review from YaoZengzeng and hzxuzhonghu April 19, 2026 01:57

volcano-sh-bot added the size/L label Apr 19, 2026

gemini-code-assist Bot reviewed Apr 19, 2026

View reviewed changes

Comment thread pkg/kthena-router/router/router.go Outdated

Comment thread pkg/kthena-router/router/router.go

Copilot started reviewing on behalf of JagjeevanAK April 19, 2026 01:59 View session

Copilot AI reviewed Apr 19, 2026

View reviewed changes

Comment thread pkg/kthena-router/router/router.go

Comment thread pkg/kthena-router/router/router.go Outdated

Fix fairness refresh consistency and token parsing

68c4b4c

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

Merge branch 'main' into issue-757-request-cost-aware-fairness-priority

684c059

Copilot AI review requested due to automatic review settings April 22, 2026 04:50

volcano-sh-bot added the do-not-merge/contains-merge-commits label Apr 22, 2026

Copilot started reviewing on behalf of JagjeevanAK April 22, 2026 04:50 View session

Copilot AI reviewed Apr 22, 2026

View reviewed changes

Comment thread pkg/kthena-router/router/router.go

Comment thread pkg/kthena-router/router/router.go

Comment thread pkg/kthena-router/datastore/fairness_queue.go

JagjeevanAK and others added 2 commits April 23, 2026 14:29

Merge branch 'main' into issue-757-request-cost-aware-fairness-priority

4c5b917

doc: proposal docs for cost aware request and fairness priority

7549cc8

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

Copilot AI review requested due to automatic review settings April 23, 2026 09:08

volcano-sh-bot added size/XL and removed size/L labels Apr 23, 2026

Copilot started reviewing on behalf of JagjeevanAK April 23, 2026 09:08 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Conversation

JagjeevanAK commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PROPOSAL

Uh oh!

volcano-sh-bot commented Apr 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

JagjeevanAK commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hzxuzhonghu commented Apr 20, 2026

Uh oh!

JagjeevanAK commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

volcano-sh-bot commented Apr 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hzxuzhonghu commented Apr 23, 2026

Uh oh!

volcano-sh-bot commented Apr 23, 2026

Uh oh!

JagjeevanAK commented Apr 23, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

hzxuzhonghu commented Apr 23, 2026

Uh oh!

JagjeevanAK commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

JagjeevanAK commented Apr 19, 2026 •

edited

Loading

JagjeevanAK commented Apr 19, 2026 •

edited

Loading

JagjeevanAK commented Apr 22, 2026 •

edited

Loading