Negative output token count when reasoning > outputTokens (Kilo gateway / Moonshot kimi-k2.5 thinking) #9168

@lambertjosh

Summary

A user-reported session export contains an assistant message with a negative output token count (-555). Root cause is an unclamped subtraction in getUsage combined with inconsistent usage accounting from the Kilo gateway for Moonshot kimi-k2.5 (thinking variant).

Repro data

Session: ses_2612c7860ffeWRHlgvs83y9pV2 (export provided by user)
Offending message: msg_d9ed3e053001D3EyHWNlRsnlI4
Provider: kilo (Kilo gateway)
Model: moonshotai/kimi-k2.5, variant thinking
Agent: code-review

Stored tokens on the message:

| field       | value  |
| ----------- | ------ |
| total       | 55,963 |
| input       | 3,746  |
| output      | -555   |
| reasoning   | 3,364  |
| cache.read  | 49,408 |
| cache.write | 0      |

Reconstructing the raw gateway response (raw outputTokens = stored.output + stored.reasoning):

  • raw outputTokens = 2,809
  • raw reasoningTokens = 3,364
  • totalTokens = 55,963 ✓ (matches input + rawOutput + cacheRead)

So the Kilo gateway reported reasoningTokens (3364) > outputTokens (2809) for this turn. Every other assistant message in the same session had reasoning ≤ rawOutput and rendered fine.
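The reconstruction above can be checked mechanically. The following is a minimal sketch that uses only the numbers from the stored-tokens table; the `StoredTokens` interface and the `reconstructRawOutput` helper are hypothetical names for illustration, not part of the codebase.

```typescript
// Token counts as stored on the offending message (values from the export).
// The interface shape is an assumption made for this sketch.
interface StoredTokens {
  total: number;
  input: number;
  output: number;
  reasoning: number;
  cache: { read: number; write: number };
}

const stored: StoredTokens = {
  total: 55963,
  input: 3746,
  output: -555,
  reasoning: 3364,
  cache: { read: 49408, write: 0 },
};

// Undo the subtraction: stored.output = rawOutput - reasoning.
function reconstructRawOutput(t: StoredTokens): number {
  return t.output + t.reasoning;
}

const rawOutput: number = reconstructRawOutput(stored);
const recomputedTotal: number = stored.input + rawOutput + stored.cache.read;

console.log(rawOutput); // 2809
console.log(recomputedTotal === stored.total); // true
console.log(stored.reasoning > rawOutput); // true: reasoning exceeds raw output
```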

Root cause

packages/opencode/src/session/index.ts:295:

output: outputTokens - reasoningTokens,

This assumes the Vercel AI SDK v6 convention that outputTokens includes reasoningTokens, so subtraction yields "visible output". The gateway violated that invariant here, producing a negative value that then propagates into step-finish.tokens, the UI, and session exports.
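To illustrate the invariant, here is a small sketch of the same subtraction applied to a conforming and a violating usage report. `RawUsage` and `visibleOutput` are hypothetical, simplified names; the real usage object has more fields, and the "conforming" values are made up.

```typescript
// Simplified, hypothetical shape of the usage fields involved.
interface RawUsage {
  outputTokens: number;
  reasoningTokens: number;
}

// Mirrors the unclamped expression quoted above.
function visibleOutput(u: RawUsage): number {
  return u.outputTokens - u.reasoningTokens;
}

// Made-up values where outputTokens includes reasoningTokens.
const conforming: RawUsage = { outputTokens: 5000, reasoningTokens: 3364 };
// Values reconstructed in this issue, where the invariant is broken.
const violating: RawUsage = { outputTokens: 2809, reasoningTokens: 3364 };

console.log(visibleOutput(conforming)); // 1636
console.log(visibleOutput(violating)); // -555
```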

Impact

  • Display: negative numbers surface in the TUI, VS Code extension, and session export JSON.
  • Stats: packages/opencode/src/cli/cmd/stats.ts:214 and :274 sum tokens.output across messages, so lifetime stats are biased low when this quirk hits.
  • Cost: not affected for the kilo provider in this session, because KiloSession.providerCost returns the gateway-reported cost before the token-based formula runs (session/index.ts:304-309). If a request did fall through to the formula, the negative output * rate would be offset by reasoning * rate (both use costInfo.output, at :320 and :325), so the net cost would stay correct but the per-bucket breakdown would go negative.
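The cost offset and the stats bias can be worked through with the numbers from this message. This is only an arithmetic sketch: `outputRate` is a hypothetical flat rate in arbitrary units, and the neighbouring per-message outputs are made up.

```typescript
// Hypothetical flat rate (arbitrary cost units per token); real rates come
// from the model's pricing info.
const outputRate = 2;

const rawOutputTokens = 2809;
const reasoningTokens = 3364;

// Per-bucket costs when both buckets are priced at the same output rate.
const outputBucket: number = (rawOutputTokens - reasoningTokens) * outputRate;
const reasoningBucket: number = reasoningTokens * outputRate;
const netCost: number = outputBucket + reasoningBucket;

console.log(outputBucket); // -1110 (negative bucket)
console.log(reasoningBucket); // 6728
console.log(netCost === rawOutputTokens * outputRate); // true (5618)

// Stats-style aggregation: summing tokens.output across messages.
// 412 and 230 are made-up neighbours; -555 is the offending message.
const outputsPerMessage: number[] = [412, -555, 230];
const summedOutput: number = outputsPerMessage.reduce((acc, n) => acc + n, 0);

console.log(summedOutput); // 87, lower than either healthy message alone
```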

Two things to investigate

  1. Gateway side: why does Moonshot kimi-k2.5 (thinking) via the Kilo gateway report reasoningTokens > outputTokens for some tool-call turns? Likely double-counting or truncation in the thinking-variant usage mapping.

  2. CLI side: getUsage should be defensive against this regardless. Suggested fix at session/index.ts:295:

    output: Math.max(0, outputTokens - reasoningTokens),

    Also worth logging a warning when reasoningTokens > outputTokens so we can measure how often this occurs and across which providers/models.
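A rough sketch of the clamp plus warning, written as a standalone function rather than as a patch to getUsage. The names `splitOutputTokens`, `RawUsage`, and `UsageContext` are hypothetical, and the injected `warn` callback stands in for whatever logger the project actually uses.

```typescript
// Hypothetical, simplified inputs; the real getUsage takes more than this.
interface RawUsage {
  outputTokens: number;
  reasoningTokens: number;
}

interface UsageContext {
  providerID: string;
  modelID: string;
}

interface SplitTokens {
  output: number;
  reasoning: number;
}

function splitOutputTokens(
  usage: RawUsage,
  ctx: UsageContext,
  warn: (message: string) => void = console.warn,
): SplitTokens {
  const visible = usage.outputTokens - usage.reasoningTokens;
  if (visible < 0) {
    // Record the anomaly so its frequency per provider/model can be measured.
    warn(
      `reasoningTokens (${usage.reasoningTokens}) > outputTokens (${usage.outputTokens}) ` +
        `for ${ctx.providerID}/${ctx.modelID}`,
    );
  }
  // Clamp so a negative count never reaches storage, the UI, or exports.
  return { output: Math.max(0, visible), reasoning: usage.reasoningTokens };
}

const clamped: SplitTokens = splitOutputTokens(
  { outputTokens: 2809, reasoningTokens: 3364 },
  { providerID: "kilo", modelID: "moonshotai/kimi-k2.5" },
);
console.log(clamped.output); // 0
```

One thing to check, assuming the fall-through cost formula prices both buckets at the same rate: after clamping, output + reasoning no longer equals the raw outputTokens (0 + 3,364 versus 2,809 here), so the formula-based cost for affected turns would come out slightly higher than before.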

Files

  • packages/opencode/src/session/index.ts:258-329 (getUsage)
  • packages/opencode/src/cli/cmd/stats.ts:214, :274 (downstream aggregation)

Labels: bug
