| 1 | +# ADR-007: Access Metadata Schema + DecayConfig Enable |
| 2 | + |
| 3 | +**Status:** Planned — not yet drafted |
| 4 | +**Depends on:** ADR-004 shipped |
| 5 | +**Related:** Consolidation improvements |
| 6 | + |
| 7 | +## What this ADR will decide |
| 8 | + |
| 9 | +Two coupled decisions: |
| 10 | +1. **Schema for access metadata** — how to track `last_retrieved_at`, `retrieval_count`, and possibly `last_modified_at` per memory file. |
| 11 | +2. **Enable `DecayConfig`** — turn on the existing (but disabled) decay configuration in `palinode/core/config.py` and use its tau values as a ranker term. |
| 12 | + |
| 13 | +These are in the same ADR because the decay ranker term can't exist without access metadata to feed it. |
| 14 | + |
| 15 | +## What this ADR will NOT decide |
| 16 | + |
| 17 | +- Whether to delete files based on decay. Nothing gets deleted. Decay is a ranker weight only. |
| 18 | +- Whether to compact / archive based on decay. Separate decision (probably ADR-005 territory). |
| 19 | +- On-read reconsolidation (ADR-006) — although access metadata is a prerequisite for ADR-006, the decision of *whether* to build ADR-006 is independent of this one. |
| 20 | + |
| 21 | +## Prerequisites before drafting |
| 22 | + |
| 23 | +- [ ] ADR-004 shipped (write-time contradiction check) — this ADR touches the same save path |
| 24 | +- [ ] Real data on how often files are actually retrieved in a typical week (so tau tuning has ground truth) |
| 25 | +- [ ] Real data on retrieval distribution — long-tailed or uniform? Informs whether decay actually matters |
| 26 | + |
| 27 | +## Questions to think through when writing |
| 28 | + |
| 29 | +### Where does access metadata live? |
| 30 | + |
| 31 | +**Three-way tension:** |
| 32 | + |
| 33 | +1. **Frontmatter in the file itself.** Source of truth, git-diffable, survives everything. But: every retrieval causes a file write, which causes a git commit, which causes a watcher reindex, which is expensive and noisy. |
| 34 | + |
| 35 | +2. **SQLite table in `.palinode.db`.** Fast to update, fast to query from the ranker. But: not part of the source-of-truth markdown, can drift from file state, lost if the db is rebuilt. |
| 36 | + |
| 37 | +3. **Hybrid: SQLite is live, periodic flush to frontmatter.** Best of both, but adds a "flush" operation that has its own failure modes. |
| 38 | + |
| 39 | +My current lean (not a decision): **SQLite only for live access metadata, no frontmatter persistence.** Rationale: access metadata is ephemeral by nature: it's a signal about what's hot *right now*. If the db is rebuilt, the ranker resets to "everything is equally cold," which is a fine reset state. The only thing lost is a signal that rebuilds itself as retrievals resume. |
| 40 | + |
| 41 | +But: if ADR-006 ever wants to use retrieval history ("this file was retrieved 10 times last week, so it's important"), SQLite-only storage loses that history whenever the db is rebuilt. Worth weighing. |
| 42 | + |
| 43 | +### What fields does the schema need? |
| 44 | + |
| 45 | +Minimum: |
| 46 | +- `file_path` (PK) |
| 47 | +- `last_retrieved_at` (timestamp) |
| 48 | +- `retrieval_count` (integer, monotonic) |
| 49 | + |
| 50 | +Possibly: |
| 51 | +- `last_retrieved_by` (session id? user id? source surface like "claude-code"?) |
| 52 | +- `retrieval_count_7d` / `retrieval_count_30d` (decay-window counts, maintained by a sweeper) |
| 53 | +- `last_modified_at` (file mtime at last index) |
| 54 | +- `contradiction_check_count` (how many times tier 2a has run on this file) |
| 55 | + |
| 56 | +Don't add fields speculatively. Add only what the ranker or a future ADR concretely needs. |
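
To make the minimum field list concrete, here is a minimal sketch of what the SQLite side could look like. The table name `access_metadata` and the helper names `open_access_db` and `record_retrieval` are hypothetical, not existing palinode code, and storing timestamps as ISO-8601 text is an assumption.

```python
import sqlite3

# Hypothetical table holding only the "Minimum" fields from the list above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS access_metadata (
    file_path         TEXT PRIMARY KEY,
    last_retrieved_at TEXT NOT NULL,
    retrieval_count   INTEGER NOT NULL DEFAULT 0
)
"""


def open_access_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the db and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_retrieval(conn: sqlite3.Connection, file_path: str, now_iso: str) -> None:
    """Bump the counter and timestamp for one file, creating the row if needed."""
    # Insert a zero-count row the first time a file is seen; no-op afterwards.
    conn.execute(
        "INSERT OR IGNORE INTO access_metadata "
        "(file_path, last_retrieved_at, retrieval_count) VALUES (?, ?, 0)",
        (file_path, now_iso),
    )
    # The count only ever increments, which keeps it monotonic.
    conn.execute(
        "UPDATE access_metadata "
        "SET last_retrieved_at = ?, retrieval_count = retrieval_count + 1 "
        "WHERE file_path = ?",
        (now_iso, file_path),
    )
    conn.commit()
```

Dropping this table resets every file to "never retrieved", which matches the reset state described under the SQLite-only lean.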
| 57 | + |
| 58 | +### What's a "retrieval"? |
| 59 | + |
| 60 | +Same question as ADR-006 (see that ADR for the option list). The choice here matters for decay: |
| 61 | +- If automatic context injection counts as a retrieval, every file in core memory gets bumped on every turn → "core" never decays → access metadata is useless as a ranker signal |
| 62 | +- If only explicit tool calls count, decay has signal but core files look stale → the ranker might downrank core files, which is wrong |
| 63 | + |
| 64 | +**Likely answer:** access metadata distinguishes *explicit* reads (tool calls) from *passive* injection. Only explicit reads update `retrieval_count`. Passive injection updates `last_injected_at` separately, or not at all. |
| 65 | + |
| 66 | +This is a schema-design decision that has to be made in ADR-007. |
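
The explicit/passive split in the "likely answer" can be sketched in a few lines. This is an illustration under assumptions, not a settled design: `AccessKind`, `AccessRecord`, and `record_access` are hypothetical names, and the sketch takes the "updates `last_injected_at` separately" branch rather than the "not at all" branch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AccessKind(Enum):
    EXPLICIT = "explicit"  # a tool call asked for this file
    PASSIVE = "passive"    # automatic context injection


@dataclass
class AccessRecord:
    file_path: str
    retrieval_count: int = 0
    last_retrieved_at: Optional[datetime] = None
    last_injected_at: Optional[datetime] = None


def record_access(rec: AccessRecord, kind: AccessKind, now: datetime) -> AccessRecord:
    """Only explicit reads feed the decay signal; injection is tracked apart."""
    if kind is AccessKind.EXPLICIT:
        rec.retrieval_count += 1
        rec.last_retrieved_at = now
    else:
        rec.last_injected_at = now
    return rec


core = AccessRecord("core/identity.md")
record_access(core, AccessKind.PASSIVE, datetime(2024, 1, 1, tzinfo=timezone.utc))
record_access(core, AccessKind.EXPLICIT, datetime(2024, 1, 2, tzinfo=timezone.utc))
```

Keeping `last_injected_at` as its own field would let the ranker exempt injected core files from decay without inflating `retrieval_count`.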
| 67 | + |
| 68 | +### Is decay actually useful? |
| 69 | + |
| 70 | +The existing `DecayConfig` has tau values (in days): |
| 71 | +- `tau_critical: 180` |
| 72 | +- `tau_decisions: 60` |
| 73 | +- `tau_insights: 90` |
| 74 | +- `tau_general: 30` |
| 75 | +- `tau_status: 7` |
| 76 | +- `tau_ephemeral: 1` |
| 77 | + |
| 78 | +These are guesses from before any data existed. Before this ADR can ship a "use decay in the ranker" decision, we need: |
| 79 | +- Evidence that unused memories are actually hurting the ranker (e.g., stale status files surfacing in search results ahead of current ones) |
| 80 | +- Empirical tau values, tuned from real retrieval patterns |
| 81 | + |
| 82 | +If neither of those is in hand, the ADR should narrow to **"add access metadata schema only, defer decay enabling."** Half an ADR. |
| 83 | + |
| 84 | +### How does this interact with the ranker? |
| 85 | + |
| 86 | +Current ranker (`palinode/core/store.py`) uses hybrid BM25 + vector similarity fused via RRF. Adding a decay term means: |
| 87 | +- Either multiply the existing RRF score by a decay factor (e.g., `score * exp(-age / tau)`, with `age` measured from `last_retrieved_at`) |
| 88 | +- Or add decay as a third signal in the fusion |
| 89 | + |
| 90 | +The ADR needs to pick one and justify it. Multiplication is simpler but makes decay dominate when age is large; fusion is more principled but introduces a tuning parameter. |
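
The two options can be compared side by side with a rough sketch. The function names are hypothetical, ranks are 1-based, and `k=60` is the commonly used RRF constant, not necessarily the value in `palinode/core/store.py`.

```python
import math
from typing import Sequence


def rrf(ranks: Sequence[int], k: int = 60) -> float:
    """Reciprocal rank fusion: sum of 1 / (k + rank) over all signals."""
    return sum(1.0 / (k + r) for r in ranks)


def decay_factor(age_days: float, tau_days: float) -> float:
    """Exponential decay: 1.0 at age 0, about 0.37 at age == tau."""
    return math.exp(-age_days / tau_days)


def score_multiplicative(bm25_rank: int, vector_rank: int,
                         age_days: float, tau_days: float) -> float:
    """Option 1: scale the fused BM25 + vector score by the decay factor."""
    return rrf([bm25_rank, vector_rank]) * decay_factor(age_days, tau_days)


def score_fused(bm25_rank: int, vector_rank: int, recency_rank: int) -> float:
    """Option 2: treat recency as a third ranked list inside the fusion."""
    return rrf([bm25_rank, vector_rank, recency_rank])
```

With `tau_general: 30`, a file last retrieved 90 days ago keeps `exp(-3)`, roughly 5%, of its score under multiplication, so even a top-ranked match can sink. Under fusion the same file still retains its full BM25 and vector contributions; a poor recency rank only shrinks the third term.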
| 91 | + |
| 92 | +### How does this interact with the watcher? |
| 93 | + |
| 94 | +The watcher daemon reindexes on file modify. If access metadata is in frontmatter, every retrieval triggers a file modify, which triggers a reindex, which could create a feedback loop. |
| 95 | + |
| 96 | +If access metadata is SQLite-only (my current lean), the watcher doesn't need to know about it at all — it stays focused on content indexing. |
| 97 | + |
| 98 | +### Failure modes |
| 99 | + |
| 100 | +- SQLite table corrupts → ranker loses decay signal, search still works (falls back to BM25+vector only) |
| 101 | +- Access metadata gets out of sync with actual file state → stale `last_retrieved_at`, but self-corrects on next retrieval |
| 102 | +- Ranker heavily favors recently-retrieved files → recency bias → user complains about "why does it keep showing me the thing I just read?" Need a brake. |
| 103 | + |
| 104 | +### What telemetry do we need to tune tau? |
| 105 | + |
| 106 | +- Histogram of retrieval intervals per file (time between retrievals) |
| 107 | +- Distribution of `retrieval_count` per file at end of a week |
| 108 | +- Correlation between "last retrieved > X days ago" and "never retrieved again" (predictive value of decay) |
| 109 | + |
| 110 | +Without telemetry, tuning tau is guessing. The ADR should ship telemetry alongside the feature. |
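
As a sketch of the first telemetry item, the interval histogram could be computed as below. It assumes a hypothetical append-only log of `(file_path, timestamp)` retrieval events; the minimum schema keeps only the latest timestamp per file, so it cannot produce these intervals on its own.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Sequence, Tuple


def retrieval_intervals_days(
    events: Iterable[Tuple[str, datetime]],
) -> Dict[str, List[float]]:
    """Gaps, in days, between consecutive retrievals of each file."""
    by_file: Dict[str, List[datetime]] = defaultdict(list)
    for path, ts in events:
        by_file[path].append(ts)
    intervals: Dict[str, List[float]] = {}
    for path, stamps in by_file.items():
        stamps.sort()
        intervals[path] = [
            (later - earlier).total_seconds() / 86400
            for earlier, later in zip(stamps, stamps[1:])
        ]
    return intervals


def histogram(values: Iterable[float], edges: Sequence[float]) -> List[int]:
    """Count values per bucket [edges[i], edges[i+1]); the last bucket is open-ended.

    Values below edges[0] are ignored.
    """
    counts = [0] * len(edges)
    for v in values:
        for i in reversed(range(len(edges))):
            if v >= edges[i]:
                counts[i] += 1
                break
    return counts
```

Bucket edges aligned with the configured taus (for example `[0, 1, 7, 30, 90]` days) would make it easy to see whether real retrieval gaps cluster anywhere near the guessed values.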
| 111 | + |
| 112 | +## Draft scope when written |
| 113 | + |
| 114 | +One ADR covering two coupled decisions: access metadata schema + enable decay as ranker term. The two are bundled because decay can't ship without metadata to read from. |
| 115 | + |
| 116 | +Possible split: ship access metadata schema alone first (ADR-007a), then decide decay enabling separately (ADR-007b) after the metadata has been collecting data for a few weeks. This is probably the right move if real data isn't available at draft time. |
| 117 | + |
| 118 | +Explicitly out of scope: on-read reconsolidation (ADR-006), decay-based deletion (never — not a thing we do), decay-based archival (ADR-005 territory). |
| 119 | + |
| 120 | +## Revisit trigger |
| 121 | + |
| 122 | +Draft this ADR when: |
| 123 | +- ADR-004 has been in production long enough that we can instrument retrieval |
| 124 | +- OR we hit a concrete ranker problem that access metadata would solve (stale results outranking fresh ones in user-visible search output) |
| 125 | +- OR we decide we want ADR-006 and need access metadata as a prerequisite |
| 126 | + |
| 127 | +No rush. This ADR is valuable but not urgent. |