You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CHANGELOG.md
+16Lines changed: 16 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -2,6 +2,22 @@
2
2
3
3
## Unreleased
4
4
5
+
## 0.8.0 - 2026-04-19
6
+
7
+
### Added
8
+
-**`codeburn compare` command.** Side-by-side model comparison across any two models in your session data. Interactive model picker, period switching, and provider filtering.
9
+
-**Compare view in dashboard.** Press `c` in the TUI to enter compare mode. Arrow keys switch periods, `b` to return.
10
+
-**Performance metrics.** One-shot rate, retry rate, and self-correction detection per model. Self-corrections are detected by scanning JSONL transcripts for tool error followed by retry patterns.
11
+
-**Efficiency metrics.** Cost per call, cost per edit turn, output tokens per call, and cache hit rate.
12
+
-**Per-category one-shot rates.** Breaks down one-shot success by task category (Coding, Debugging, Feature Dev, etc.) for each model.
13
+
-**Working style comparison.** Delegation rate, planning rate (TaskCreate, TaskUpdate, TodoWrite), average tools per turn, and fast mode usage.
14
+
-**TUI auto-refresh enabled by default.** Dashboard now refreshes every 30 seconds out of the box. Pass `--refresh 0` to disable. Closes #107.
15
+
-**36 comparison tests.** Full coverage for metric computation, category breakdown, working style, self-correction scanning, and planning tool detection. Total suite: 274 tests.
16
+
17
+
### Fixed
18
+
-**Planning rate showed ~0% in model comparison.** Only counted `EnterPlanMode` (rarely used) instead of all planning tools (TaskCreate, TaskUpdate, TodoWrite, EnterPlanMode, ExitPlanMode). Now detects planning at the turn level across all five tool types.
19
+
-**Menubar "All" tab showed stale data.** Three-layer caching (300s in-memory TTL, daily disk cache, 60s parser cache) prevented tab switches from showing fresh numbers. Cache TTL reduced from 300s to 30s, tab switches always fetch fresh data, background refresh interval reduced from 60s to 15s.
codeburn optimize -p week # scope the scan to last 7 days
61
61
```
62
62
63
-
Arrow keys switch between Today / 7 Days / 30 Days / Month / All Time. Press `q` to quit, `1``2``3``4``5` as shortcuts. The dashboard also shows average cost per session and the five most expensive sessions across all projects.
63
+
Arrow keys switch between Today / 7 Days / 30 Days / Month / All Time. Press `q` to quit, `1``2``3``4``5` as shortcuts, `c` to open model comparison. The dashboard auto-refreshes every 30 seconds by default (`--refresh 0` to disable). The dashboard also shows average cost per session and the five most expensive sessions across all projects.
64
64
65
65
### JSON output
66
66
@@ -176,7 +176,7 @@ The menu bar widget includes a currency picker with 17 common currencies. For an
176
176
npx codeburn menubar
177
177
```
178
178
179
-
One command: downloads the latest `.app`, installs into `~/Applications`, and launches it. Re-run with `--force` to reinstall. Native Swift + SwiftUI app lives in `mac/` (see `mac/README.md` for build details). Shows today's cost with a flame icon, opens a popover with agent tabs, period switcher (Today / 7 Days / 30 Days / Month / All), Trend / Forecast / Pulse / Stats / Plan insights, activity and model breakdowns, optimize findings, and CSV/JSON export. Refreshes live via FSEvents plus a 60-second poll.
179
+
One command: downloads the latest `.app`, installs into `~/Applications`, and launches it. Re-run with `--force` to reinstall. Native Swift + SwiftUI app lives in `mac/` (see `mac/README.md` for build details). Shows today's cost with a flame icon, opens a popover with agent tabs, period switcher (Today / 7 Days / 30 Days / Month / All), Trend / Forecast / Pulse / Stats / Plan insights, activity and model breakdowns, optimize findings, and CSV/JSON export. Refreshes live via FSEvents plus a 15-second poll.
180
180
181
181
## What it tracks
182
182
@@ -250,6 +250,37 @@ Each finding shows the estimated token and dollar savings plus a ready-to-paste
250
250
251
251
You can also open it inline from the dashboard: press `o` when a finding count appears in the status bar, `b` to return.
252
252
253
+
## Compare
254
+
255
+
Side-by-side model comparison across any two models in your session data. Pick any pair and see how they stack up on real usage from your own sessions.
256
+
257
+
```bash
258
+
codeburn compare # interactive model picker (default: all time)
259
+
codeburn compare -p week # last 7 days
260
+
codeburn compare -p today # today only
261
+
codeburn compare --provider claude # Claude Code sessions only
262
+
```
263
+
264
+
Or press `c` in the dashboard to enter compare mode. Arrow keys switch periods, `b` to return.
265
+
266
+
**Metrics compared**
267
+
268
+
| Section | Metric | What it measures |
269
+
|---------|--------|-----------------|
270
+
| Performance | One-shot rate | Edits that succeed without retries |
271
+
| Performance | Retry rate | Average retries per edit turn |
272
+
| Performance | Self-correction | Turns where the model corrected its own mistake |
273
+
| Efficiency | Cost / call | Average cost per API call |
274
+
| Efficiency | Cost / edit | Average cost per edit turn |
275
+
| Efficiency | Output tok / call | Average output tokens per call |
276
+
| Efficiency | Cache hit rate | Proportion of input from cache |
277
+
278
+
**Per-category one-shot rates.** Breaks down one-shot success by task category (Coding, Debugging, Feature Dev, etc.) so you can see where each model excels or struggles.
279
+
280
+
**Working style.** Compares delegation rate (agent spawns), planning rate (TaskCreate, TaskUpdate, TodoWrite usage), average tools per turn, and fast mode usage.
281
+
282
+
All metrics are computed from your local session data. No LLM calls, fully deterministic.
283
+
253
284
## How it reads data
254
285
255
286
**Claude Code** stores session transcripts as JSONL at `~/.claude/projects/<sanitized-path>/<session-id>.jsonl`. Each assistant entry contains model name, token usage (input, output, cache read, cache write), tool_use blocks, and timestamps.
@@ -280,6 +311,7 @@ src/
280
311
parser.ts JSONL reader, dedup, date filter, provider orchestration
281
312
models.ts LiteLLM pricing, cost calculation
282
313
classifier.ts 13-category task classifier
314
+
compare-stats.ts Model comparison engine (metrics, category breakdown, working style)
283
315
types.ts Type definitions
284
316
format.ts Text rendering (status bar)
285
317
menubar-json.ts Payload builder consumed by the native macOS menubar app in mac/
0 commit comments