Merge pull request #108 from getagentseal/fix/menubar-all-tab-stale-refresh

iamtoruk · web-flow · commit 44d5ffa03f9b · 2026-04-19T08:59:09.000-07:00
Release 0.8.0: model comparison, auto-refresh, menubar fix
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,22 @@
 
 ## Unreleased
 
+## 0.8.0 - 2026-04-19
+
+### Added
+- **`codeburn compare` command.** Side-by-side model comparison across any two models in your session data. Interactive model picker, period switching, and provider filtering.
+- **Compare view in dashboard.** Press `c` in the TUI to enter compare mode. Arrow keys switch periods, `b` to return.
+- **Performance metrics.** One-shot rate, retry rate, and self-correction detection per model. Self-corrections are detected by scanning JSONL transcripts for tool error followed by retry patterns.
+- **Efficiency metrics.** Cost per call, cost per edit turn, output tokens per call, and cache hit rate.
+- **Per-category one-shot rates.** Breaks down one-shot success by task category (Coding, Debugging, Feature Dev, etc.) for each model.
+- **Working style comparison.** Delegation rate, planning rate (TaskCreate, TaskUpdate, TodoWrite), average tools per turn, and fast mode usage.
+- **TUI auto-refresh enabled by default.** Dashboard now refreshes every 30 seconds out of the box. Pass `--refresh 0` to disable. Closes #107.
+- **36 comparison tests.** Full coverage for metric computation, category breakdown, working style, self-correction scanning, and planning tool detection. Total suite: 274 tests.
+
+### Fixed
+- **Planning rate showed ~0% in model comparison.** Only counted `EnterPlanMode` (rarely used) instead of all planning tools (TaskCreate, TaskUpdate, TodoWrite, EnterPlanMode, ExitPlanMode). Now detects planning at the turn level across all five tool types.
+- **Menubar "All" tab showed stale data.** Three-layer caching (300s in-memory TTL, daily disk cache, 60s parser cache) prevented tab switches from showing fresh numbers. Cache TTL reduced from 300s to 30s, tab switches always fetch fresh data, background refresh interval reduced from 60s to 15s.
+
 ## 0.7.4 - 2026-04-19
 
 ### Added
diff --git a/README.md b/README.md
@@ -51,7 +51,7 @@ codeburn report -p 30days       # rolling 30-day window
 codeburn report -p all          # every recorded session
 codeburn report --from 2026-04-01 --to 2026-04-10  # exact date range
 codeburn report --format json   # full dashboard data as JSON
-codeburn report --refresh 60    # auto-refresh every 60 seconds
+codeburn report --refresh 60    # auto-refresh every 60s (default: 30s)
 codeburn status                 # compact one-liner (today + month)
 codeburn status --format json
 codeburn export                 # CSV with today, 7 days, 30 days
@@ -60,7 +60,7 @@ codeburn optimize               # find waste, get copy-paste fixes
 codeburn optimize -p week       # scope the scan to last 7 days
 ```
 
-Arrow keys switch between Today / 7 Days / 30 Days / Month / All Time. Press `q` to quit, `1` `2` `3` `4` `5` as shortcuts. The dashboard also shows average cost per session and the five most expensive sessions across all projects.
+Arrow keys switch between Today / 7 Days / 30 Days / Month / All Time. Press `q` to quit, `1` `2` `3` `4` `5` as shortcuts, `c` to open model comparison. The dashboard auto-refreshes every 30 seconds by default (`--refresh 0` to disable). The dashboard also shows average cost per session and the five most expensive sessions across all projects.
 
 ### JSON output
 
@@ -176,7 +176,7 @@ The menu bar widget includes a currency picker with 17 common currencies. For an
 npx codeburn menubar
 ```
 
-One command: downloads the latest `.app`, installs into `~/Applications`, and launches it. Re-run with `--force` to reinstall. Native Swift + SwiftUI app lives in `mac/` (see `mac/README.md` for build details). Shows today's cost with a flame icon, opens a popover with agent tabs, period switcher (Today / 7 Days / 30 Days / Month / All), Trend / Forecast / Pulse / Stats / Plan insights, activity and model breakdowns, optimize findings, and CSV/JSON export. Refreshes live via FSEvents plus a 60-second poll.
+One command: downloads the latest `.app`, installs into `~/Applications`, and launches it. Re-run with `--force` to reinstall. Native Swift + SwiftUI app lives in `mac/` (see `mac/README.md` for build details). Shows today's cost with a flame icon, opens a popover with agent tabs, period switcher (Today / 7 Days / 30 Days / Month / All), Trend / Forecast / Pulse / Stats / Plan insights, activity and model breakdowns, optimize findings, and CSV/JSON export. Refreshes live via FSEvents plus a 15-second poll.
 
 ## What it tracks
 
@@ -250,6 +250,37 @@ Each finding shows the estimated token and dollar savings plus a ready-to-paste
 
 You can also open it inline from the dashboard: press `o` when a finding count appears in the status bar, `b` to return.
 
+## Compare
+
+Side-by-side model comparison across any two models in your session data. Pick any pair and see how they stack up on real usage from your own sessions.
+
+```bash
+codeburn compare                        # interactive model picker (default: all time)
+codeburn compare -p week                # last 7 days
+codeburn compare -p today               # today only
+codeburn compare --provider claude      # Claude Code sessions only
+```
+
+Or press `c` in the dashboard to enter compare mode. Arrow keys switch periods, `b` to return.
+
+**Metrics compared**
+
+| Section | Metric | What it measures |
+|---------|--------|-----------------|
+| Performance | One-shot rate | Edits that succeed without retries |
+| Performance | Retry rate | Average retries per edit turn |
+| Performance | Self-correction | Turns where the model corrected its own mistake |
+| Efficiency | Cost / call | Average cost per API call |
+| Efficiency | Cost / edit | Average cost per edit turn |
+| Efficiency | Output tok / call | Average output tokens per call |
+| Efficiency | Cache hit rate | Proportion of input from cache |
+
+**Per-category one-shot rates.** Breaks down one-shot success by task category (Coding, Debugging, Feature Dev, etc.) so you can see where each model excels or struggles.
+
+**Working style.** Compares delegation rate (agent spawns), planning rate (TaskCreate, TaskUpdate, TodoWrite usage), average tools per turn, and fast mode usage.
+
+All metrics are computed from your local session data. No LLM calls, fully deterministic.
+
 ## How it reads data
 
 **Claude Code** stores session transcripts as JSONL at `~/.claude/projects/<sanitized-path>/<session-id>.jsonl`. Each assistant entry contains model name, token usage (input, output, cache read, cache write), tool_use blocks, and timestamps.
@@ -280,6 +311,7 @@ src/
   parser.ts       JSONL reader, dedup, date filter, provider orchestration
   models.ts       LiteLLM pricing, cost calculation
   classifier.ts   13-category task classifier
+  compare-stats.ts Model comparison engine (metrics, category breakdown, working style)
   types.ts        Type definitions
   format.ts       Text rendering (status bar)
   menubar-json.ts Payload builder consumed by the native macOS menubar app in mac/
diff --git a/mac/Sources/CodeBurnMenubar/AppStore.swift b/mac/Sources/CodeBurnMenubar/AppStore.swift
@@ -1,7 +1,7 @@
 import Foundation
 import Observation
 
-private let cacheTTLSeconds: TimeInterval = 300
+private let cacheTTLSeconds: TimeInterval = 30
 
 struct CachedPayload {
     let payload: MenubarPayload
@@ -52,17 +52,15 @@ final class AppStore {
         payload.optimize.findingCount
     }
 
-    /// Switch to a period. Uses cached payload if fresh; otherwise fetches.
+    /// Switch to a period. Always fetches fresh data so the user never sees stale numbers.
     func switchTo(period: Period) async {
         selectedPeriod = period
-        if let cached = cache[currentKey], cached.isFresh { return }
         await refresh(includeOptimize: true)
     }
 
-    /// Switch to a provider filter. Uses cached payload if fresh; otherwise fetches.
+    /// Switch to a provider filter. Always fetches fresh data so the user never sees stale numbers.
     func switchTo(provider: ProviderFilter) async {
         selectedProvider = provider
-        if let cached = cache[currentKey], cached.isFresh { return }
         await refresh(includeOptimize: true)
     }
 
diff --git a/mac/Sources/CodeBurnMenubar/CodeBurnApp.swift b/mac/Sources/CodeBurnMenubar/CodeBurnApp.swift
@@ -2,7 +2,7 @@ import SwiftUI
 import AppKit
 import Observation
 
-private let refreshIntervalSeconds: UInt64 = 60
+private let refreshIntervalSeconds: UInt64 = 15
 private let nanosPerSecond: UInt64 = 1_000_000_000
 private let refreshIntervalNanos: UInt64 = refreshIntervalSeconds * nanosPerSecond
 /// Fixed so the popover's anchor point doesn't shift each time today's cost changes.
diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "codeburn",
-  "version": "0.7.4",
+  "version": "0.8.0",
   "description": "See where your AI coding tokens go - by task, tool, model, and project",
   "type": "module",
   "main": "./dist/cli.js",
diff --git a/src/cli.ts b/src/cli.ts
@@ -247,7 +247,7 @@ program
   .option('--format <format>', 'Output format: tui, json', 'tui')
   .option('--project <name>', 'Show only projects matching name (repeatable)', collect, [])
   .option('--exclude <name>', 'Exclude projects matching name (repeatable)', collect, [])
-  .option('--refresh <seconds>', 'Auto-refresh interval in seconds', parseInt)
+  .option('--refresh <seconds>', 'Auto-refresh interval in seconds (0 to disable)', parseInt, 30)
   .action(async (opts) => {
     let customRange: DateRange | null = null
     try {
@@ -502,7 +502,7 @@ program
   .option('--format <format>', 'Output format: tui, json', 'tui')
   .option('--project <name>', 'Show only projects matching name (repeatable)', collect, [])
   .option('--exclude <name>', 'Exclude projects matching name (repeatable)', collect, [])
-  .option('--refresh <seconds>', 'Auto-refresh interval in seconds', parseInt)
+  .option('--refresh <seconds>', 'Auto-refresh interval in seconds (0 to disable)', parseInt, 30)
   .action(async (opts) => {
     if (opts.format === 'json') {
       await runJsonReport('today', opts.provider, opts.project, opts.exclude)
@@ -518,7 +518,7 @@ program
   .option('--format <format>', 'Output format: tui, json', 'tui')
   .option('--project <name>', 'Show only projects matching name (repeatable)', collect, [])
   .option('--exclude <name>', 'Exclude projects matching name (repeatable)', collect, [])
-  .option('--refresh <seconds>', 'Auto-refresh interval in seconds', parseInt)
+  .option('--refresh <seconds>', 'Auto-refresh interval in seconds (0 to disable)', parseInt, 30)
   .action(async (opts) => {
     if (opts.format === 'json') {
       await runJsonReport('month', opts.provider, opts.project, opts.exclude)

Original file line number	Diff line number	Diff line change
`@@ -1,6 +1,6 @@`
`1`	`1`	`{`
`2`	`2`	`"name": "codeburn",`
`3`		`- "version": "0.7.4",`
	`3`	`+ "version": "0.8.0",`
`4`	`4`	`"description": "See where your AI coding tokens go - by task, tool, model, and project",`
`5`	`5`	`"type": "module",`
`6`	`6`	`"main": "./dist/cli.js",`