Skip to content

Request-cost-aware router fairness priority#897

Open
JagjeevanAK wants to merge 5 commits intovolcano-sh:mainfrom
JagjeevanAK:issue-757-request-cost-aware-fairness-priority
Open

Request-cost-aware router fairness priority#897
JagjeevanAK wants to merge 5 commits intovolcano-sh:mainfrom
JagjeevanAK:issue-757-request-cost-aware-fairness-priority

Conversation

@JagjeevanAK
Copy link
Copy Markdown
Contributor

@JagjeevanAK JagjeevanAK commented Apr 19, 2026

What type of PR is this?

/kind enhancement

What this PR does / why we need it:

This PR makes router fairness enqueue priority request-cost-aware by combining historical weighted usage with the current request’s estimated cost. The goal is to keep the existing sliding-window fairness model while making enqueue ordering reflect both prior usage and the size of the request being admitted.

Which issue(s) this PR fixes:
Fixes #757

Special notes for your reviewer:

The attached RFC plan as posted on issue while workspace and the implementation scope is intentionally narrow: it reuses the existing fairness weights, avoids API/CRD changes, and adds a V(4) enqueue debug log plus unit coverage for output-token parsing, weight application, historical usage inclusion, request-size sensitivity, and log-gating behavior.

I used AI assistance while preparing this PR description and implementation summary.

Does this PR introduce a user-facing change?:

Yes.

Router fairness enqueue priority now includes estimated current request cost in addition to historical weighted usage, improving admission ordering for larger requests.

PROPOSAL

I drafted an RFC-style proposal for this issue focused on a narrow fairness improvement in router enqueue priority.

Summary:
This proposal keeps the existing sliding-window fairness model, and adds estimated current request cost at enqueue time so requests are not ranked by historical aggregate alone.

Proposed priority:

  • estimatedRequestCost = inputWeight * inputTokens + outputWeight * requestedOutputTokens
  • finalPriority = historicalWeightedUsage + estimatedRequestCost
  • Lower numeric value still means higher priority (unchanged queue behavior)

Key changes:

  • Keep the sliding-window token tracker as the source of historical usage per (user, model)
  • Update router enqueue priority helper in router.go to use:
    • userID
    • modelName
    • inputTokens
    • parsed request body
  • Resolve requested output tokens in this order:
    • max_completion_tokens
    • max_tokens
    • else 0
  • Treat missing, invalid, or negative values as 0
  • Keep post-response accounting unchanged (real usage still updates token tracker after completion)

Configuration/logging scope:

  • Reuse only FAIRNESS_INPUT_TOKEN_WEIGHT and FAIRNESS_OUTPUT_TOKEN_WEIGHT for this issue
  • No CRD, CLI flag, or Helm value changes
  • No expansion into FAIRNESS_PRIORITY_REQUEST_NUM_WEIGHT in this issue
  • Add one V(4) enqueue debug log with:
    • requestID, userID, model, historicalUsage, inputTokens, requestedOutputTokens, estimatedRequestCost, finalPriority
  • No new metrics required

Compatibility:

  • No public API/CRD changes
  • Internal router helper signature changes only
  • Fairness queue semantics remain the same except for the new request-cost input

Test plan:

  • Unit tests for output token parsing:
    • max_completion_tokens, max_tokens, missing, invalid, negative
  • Unit tests for weight application and final score calculation
  • Router-level tests for:
    • request-size sensitivity with same historical usage
    • historical usage inclusion
    • zero/missing output budget fallback to input-only cost
    • debug-log gating behavior
  • Scope is unit tests only (no e2e/load testing in this issue)

Non-goals for this issue:

  • Cross-model fairness
  • Request-count weighting redesign
  • Queue refresh redesign
  • HA/distributed fairness state

…dded the required V(4) enqueue debug log fields, and added targeted router unit tests for output-token parsing, weight application, historical-usage inclusion, request-size sensitivity, and log-gating behavior

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
Copilot AI review requested due to automatic review settings April 19, 2026 01:57
@volcano-sh-bot volcano-sh-bot added the kind/enhancement New feature or request label Apr 19, 2026
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

Welcome @JagjeevanAK! It looks like this is your first PR to volcano-sh/kthena 🎉

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the fairness scheduling logic by replacing generic token and request weights with specific input and output token weights. It introduces a new priority calculation that combines historical usage with an estimated request cost based on input tokens and requested output tokens. Comprehensive tests were added to verify these changes. Feedback highlights the need to handle multiple numeric types when parsing token fields and identifies inconsistencies between the new router logic and the existing datastore priority refresh mechanism.

Comment thread pkg/kthena-router/router/router.go Outdated
Comment thread pkg/kthena-router/router/router.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the router’s fairness enqueue priority to account for the estimated cost of the current request (input + requested output tokens) in addition to the user/model’s historical weighted usage, improving admission ordering for expensive requests while keeping the sliding-window fairness model.

Changes:

  • Replace enqueue priority calculation with historicalUsage + estimatedRequestCost(inputTokens, requestedOutputTokens).
  • Add parsing for requested output token budget from max_completion_tokens / max_tokens.
  • Add V(4) debug logging for enqueue priority components and introduce unit tests for the new behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
pkg/kthena-router/router/router.go Implements cost-aware enqueue priority calculation and adds debug logging for priority components.
pkg/kthena-router/router/router_fairness_test.go Adds unit tests covering output-budget parsing, weight application, historical usage inclusion, request-size sensitivity, and log-gating.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/kthena-router/router/router.go
Comment thread pkg/kthena-router/router/router.go Outdated
Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
@JagjeevanAK
Copy link
Copy Markdown
Contributor Author

JagjeevanAK commented Apr 19, 2026

Addressed AI review findings in commit 68c4b4c on this PR branch:

  1. Requested output-token parsing now accepts float64, int, int64, and json.Number (invalid or negative values still fallback to 0).
  2. Fairness queue refresh/rebuild now preserves enqueue-time request-cost contribution via a stored priority offset, so refresh no longer drops request-size sensitivity.

Scope note for maintainers: for issue #757 this change intentionally uses FAIRNESS_INPUT_TOKEN_WEIGHT and FAIRNESS_OUTPUT_TOKEN_WEIGHT for request-cost-aware enqueue behavior, while broader compatibility/deprecation handling around FAIRNESS_PRIORITY_* is being kept out of this narrowly scoped issue unless maintainers want it folded in here.

Proposal context is documented on issue #757: #757 (comment)

@hzxuzhonghu
Copy link
Copy Markdown
Member

I would also like to see the proposal doc in this pr

@JagjeevanAK
Copy link
Copy Markdown
Contributor Author

JagjeevanAK commented Apr 22, 2026

Sure, I have added Proposal in PR description.

Copilot AI review requested due to automatic review settings April 22, 2026 04:50
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

Adding label do-not-merge/contains-merge-commits because PR contains merge commits, which are not allowed in this repository.
Use git rebase to reapply your commits on top of the target branch. Detailed instructions for doing so can be found here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the router’s fairness scheduling by making enqueue priority request-cost-aware (estimated input + requested output tokens) while preserving the existing sliding-window historical usage model.

Changes:

  • Add request-cost components (input tokens + requested output token budget) into enqueue priority calculation and log detailed enqueue priority at klog V(4).
  • Persist a per-request PriorityOffset so dequeue-time priority refresh can preserve request-size sensitivity.
  • Add unit tests covering output-budget parsing and priority composition behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
pkg/kthena-router/router/router.go Computes request-cost-aware enqueue priority; adds V(4) enqueue debug log; stores per-request offset for refresh.
pkg/kthena-router/datastore/fairness_queue.go Preserves request-size component across priority refresh/rebuild via PriorityOffset.
pkg/kthena-router/datastore/fairness_queue_test.go Adds tests ensuring offset is preserved across refresh and heap rebuild.
pkg/kthena-router/router/router_fairness_test.go Adds unit tests for output token parsing and request-cost-aware priority behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/kthena-router/router/router.go
Comment thread pkg/kthena-router/router/router.go
Comment thread pkg/kthena-router/datastore/fairness_queue.go
@hzxuzhonghu
Copy link
Copy Markdown
Member

@JagjeevanAK Mind adding a doc under https://github.com/volcano-sh/kthena/tree/main/docs/proposal

Copilot AI review requested due to automatic review settings April 23, 2026 09:08
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lizhencheng9527 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JagjeevanAK
Copy link
Copy Markdown
Contributor Author

hey can you check now ?

CC: @hzxuzhonghu

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enhances Kthena router fairness scheduling so enqueue ordering is request-cost-aware by combining historical sliding-window usage with an estimated cost for the in-flight request, while preserving request-size sensitivity across queue priority refreshes.

Changes:

  • Compute enqueue priority as historicalUsage + estimatedRequestCost(inputTokens, requestedOutputTokens) and add a V(4) debug log for the priority components.
  • Persist an enqueue-time PriorityOffset on queued requests so dequeue-time refresh/rebuild keeps request-size sensitivity.
  • Add unit tests for output-budget parsing, priority calculation, request-size sensitivity, and offset preservation during refresh/rebuild.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pkg/kthena-router/router/router.go Implements request-cost-aware enqueue priority calculation, logging, and enqueues PriorityOffset.
pkg/kthena-router/datastore/fairness_queue.go Preserves request-cost component during priority refresh/rebuild via PriorityOffset.
pkg/kthena-router/router/router_fairness_test.go Adds unit tests for output token parsing, cost calculation, request-size sensitivity, and log gating.
pkg/kthena-router/datastore/fairness_queue_test.go Adds tests asserting PriorityOffset is preserved across refresh and heap rebuild.
docs/proposal/request-cost-aware-fairness-priority.md Adds an RFC-style proposal documenting the intended behavior and scope.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +244 to +250
func (r *Router) calculateRequestPriority(userID, modelName string, inputTokens int, modelRequest ModelRequest) enqueuePriorityDetails {
historicalUsage, err := r.store.GetTokenCount(userID, modelName)
if err != nil {
klog.Warningf("failed to get fairness usage for user=%s model=%s: %v", userID, modelName, err)
historicalUsage = 0
}

t.Fatalf("Expected refreshed priority 15.0 with offset preserved, got %v", result.Priority)
}
}

Comment on lines 226 to 237
newPri, err := CalculateFairnessPriority(
pq.tokenTracker,
req.UserID,
req.ModelName,
pq.config.TokenWeight,
pq.config.RequestNumWeight,
)
if err == nil && newPri != req.Priority {
newPri += req.PriorityOffset
req.Priority = newPri
// Check if this request should still be dequeued
if len(pq.heap) > 0 && newPri > pq.heap[0].Priority {
@hzxuzhonghu
Copy link
Copy Markdown
Member

Donot contain merge-commits , i will first take a look at the proposal

@JagjeevanAK
Copy link
Copy Markdown
Contributor Author

Okay sorry for that will keep in mind

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve router request priority calculation beyond simple token count

4 participants