
Commit c6eebdc

Paul Kyle and claude committed
feat: v0.6.0 — write-time contradiction check, ambient context search, Claude Code plugin + skills
New features:
- Write-time contradiction check (ADR-004): detects conflicting memories at save time
- Ambient context search (ADR-008): CWD-based project boosting for multi-project memory stores
- Claude Code plugin scaffold for marketplace submission
- 3 Claude Code skills: palinode-claude-code, palinode-memory, palinode-session
- ADR-004 through ADR-008 documenting consolidation and search architecture

Docs:
- WHY-LOCAL-MEMORY.md positioning document
- Research paper: Memory Compaction and Augmented Recall for Persistent AI Agents
- PRD (product requirements document)
- v0.6.0 CHANGELOG

Changed:
- MCP server: context-aware search, Streamable HTTP transport
- API server: cleaned up endpoints, removed internal migration tooling
- CLI: updated search/save, removed internal migrate commands
- Consolidation: cron improvements, write-time worker
- Version bump: 0.5.0 → 0.6.0

Removed:
- palinode/migration/ (internal tooling)
- plugin/ (replaced by claude-plugin/ scaffold)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent b012832 commit c6eebdc


54 files changed: +6633 −2601 lines

.gitignore

Lines changed: 1 addition & 0 deletions
Lines changed: 1 addition & 0 deletions

```diff
@@ -41,6 +41,7 @@ plugin/dist/
 palinode.egg-info/
 __pycache__/
 *.pyc
+*.cover
 venv/

 # Server logs
```

ADR-004-event-driven-consolidation.md

Lines changed: 340 additions & 0 deletions
Lines changed: 113 additions & 0 deletions
# ADR-005: Debounced Reflection Executor (Tier 2b)

**Status:** Planned — not yet drafted
**Depends on:** ADR-004 shipped and validated in production for at least 2 weeks
**Related:** Consolidation improvements

## What this ADR will decide

Whether to build a background "reflection executor" that coalesces write-time contradiction checks into batched reflection passes, fired by a debounce threshold rather than a wall clock.

Pattern reference: LangMem's `ReflectionExecutor` — fire on N pending events OR K minutes idle, whichever comes first, resetting the timer on new events.

## What this ADR will NOT decide

- Whether to remove the write-time synchronous path from ADR-004. That stays regardless.
- On-read reconsolidation (ADR-006)
- Access metadata schema (ADR-007)

## Prerequisites before drafting

- [ ] ADR-004 (tier 2a) is shipped and has been running in production for ≥ 2 weeks
- [ ] We have real data on write-time check latency under typical load
- [ ] We have real data on LLM call volume and cost under typical load
- [ ] We have real data on contradiction-check accuracy (false positives, missed contradictions)
- [ ] We know whether the async queue from ADR-004 is actually getting backed up, or whether per-save checks are fast enough not to need coalescing

## Questions to think through when writing

### Is this actually needed?

**The whole premise of ADR-005 is that write-time checks need to be coalesced.** If real-world data from ADR-004 shows per-save LLM calls are cheap enough and fast enough that no coalescing is needed, ADR-005 doesn't exist — we just let the tier 2a queue drain naturally and we're done.

Write the first section of ADR-005 as "when ADR-005 should NOT be written": include the concrete metrics (p50 check latency, queue depth, LLM call cost) that would make ADR-005 unnecessary. If those metrics are met by ADR-004 alone, close this ADR with status "not needed."

### What's the trigger function?

Options to consider:
- **Count-based:** fire after N pending checks accumulate
- **Time-based:** fire K minutes after the most recent save
- **Entity-based:** fire when the same entity file has been touched N times
- **Weight-based:** each save contributes "pending weight" based on type/importance; fire at threshold
- **Hybrid:** count OR time, whichever comes first (LangMem's choice)

Don't pick one until we have real data from ADR-004 about which signal is actually predictive of "this is a good time to consolidate."
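For reference, the hybrid option (count OR idle time, whichever comes first, with the timer reset on every new event) fits in a few lines of asyncio. This is an illustrative sketch only — `DebouncedReflectionQueue`, its parameters, and the `fire` callback are hypothetical names, not the real palinode API:

```python
import asyncio


class DebouncedReflectionQueue:
    """Hybrid debounce: fire after max_pending events OR idle_seconds of
    quiet, whichever comes first. New events reset the idle timer.
    Illustrative sketch only -- not the real palinode implementation."""

    def __init__(self, fire, max_pending=10, idle_seconds=300):
        self.fire = fire              # async callback: receives the pending batch
        self.max_pending = max_pending
        self.idle_seconds = idle_seconds
        self.pending = []
        self._timer = None

    async def add(self, event):
        self.pending.append(event)
        if len(self.pending) >= self.max_pending:
            await self._flush()       # count threshold hit: fire immediately
            return
        if self._timer:
            self._timer.cancel()      # new event: reset the idle timer
        self._timer = asyncio.get_running_loop().call_later(
            self.idle_seconds,
            lambda: asyncio.ensure_future(self._flush()),
        )

    async def _flush(self):
        if self._timer:
            self._timer.cancel()
            self._timer = None
        batch, self.pending = self.pending, []
        if batch:
            await self.fire(batch)    # one reflection pass over the batch
```

The idle-timer reset is the piece that distinguishes this from a plain interval: a burst of saves keeps pushing the flush out until either the burst ends or the count threshold caps the wait.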
### What does "scoped to touched entities" mean?

The v1 draft said "reflection pass scoped to touched entities." Concretely:
- If 5 facts about `projects/palinode` arrive in 15 minutes, does the reflection pass only look at `projects/palinode.md` and `projects/palinode-status.md`?
- Or does it also pull in related entities via graph walk?
- How do we compute the scope without scanning the whole store?
- Does the scope include facts that share vector similarity even if they don't share an entity ref?

### Where does the pending-state live?

Options:
- **In-process queue** (in the API server, like ADR-004's write-time queue)
- **SQLite table** (survives restarts, queryable for dashboards)
- **File-based markers** (matches ADR-004's CLI/plugin path, disk-durable)

The answer probably depends on whether ADR-004's asyncio queue has been causing problems. If it has, ADR-005 might need a more durable store.
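If the durable option wins, the SQLite table can be minimal. A sketch, under the assumption that pending state is just (entity, timestamp, weight) rows — the table name `pending_reflections` and the helper functions are hypothetical, not the real schema:

```python
import sqlite3

# Hypothetical durable pending-state table: survives restarts and is
# queryable for dashboards. Names are illustrative, not palinode's schema.
def init_pending(conn):
    conn.execute("""
        CREATE TABLE IF NOT EXISTS pending_reflections (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            entity_path TEXT NOT NULL,
            enqueued_at REAL NOT NULL,
            weight      REAL NOT NULL DEFAULT 1.0
        )
    """)

def enqueue(conn, entity_path, now, weight=1.0):
    conn.execute(
        "INSERT INTO pending_reflections (entity_path, enqueued_at, weight)"
        " VALUES (?, ?, ?)",
        (entity_path, now, weight),
    )

def drain(conn):
    """Pop all pending rows in insertion order, e.g. when the debounce fires."""
    rows = conn.execute(
        "SELECT id, entity_path FROM pending_reflections ORDER BY id"
    ).fetchall()
    conn.execute("DELETE FROM pending_reflections")
    return [path for _, path in rows]
```

Because the rows persist, a crashed daemon can resume by calling `drain` on startup — the restart-resume failure mode below falls out for free.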
### What happens to the nightly cron?

This is the decision that belongs in ADR-005, not ADR-004. Options:
- **Remove nightly cron entirely when tier 2b ships** — cleanest, but loses a fallback for very idle systems
- **Keep nightly cron as "catch-up" for entities that didn't cross the debounce threshold** — hybrid, might be redundant
- **Re-scope nightly to "sanity check only"** — fire once a day, emit a warning if any entity has unconsolidated facts older than 24h, don't actually do the LLM work

My current lean (not a decision): remove entirely. If an entity doesn't hit the debounce threshold in a day, it probably doesn't need consolidation. But wait for real data.

### What happens to the weekly cron?

This is also an ADR-005 decision, not ADR-004. Options:
- **Keep weekly as "global view" safety net** for MERGE and ARCHIVE operations that need the full corpus
- **Remove weekly entirely** — if tier 2b fires on all touched entities, weekly is redundant
- **Re-scope weekly to "storage maintenance only"** — archive old daily notes, compact SQLite, run the linter, but don't do any LLM-driven consolidation

The weakness of v1's "keep weekly as safety net" was that I couldn't justify it on concrete grounds. Before ADR-005 ships, I need a concrete answer to "what specific operation does weekly do that tier 2b can't?"

### Failure modes to cover

- Queue daemon dies — how do pending reflections resume?
- LLM down for an extended period — what's the backlog policy?
- Debounce trigger fires during an already-running reflection pass — does it queue or drop?
- Entity gets N contradicting facts in 30 seconds — does debounce wait or fire immediately?
- Two separate reflection passes touch the same file — lock? serialize? last-write-wins?

### What observability do we need?

- Queue depth metric
- Reflection pass latency metric
- Op counts per pass (how much work is each firing actually doing?)
- LLM call count + token count per day
- Contradiction-catch rate (how often does a reflection pass actually find something to UPDATE/SUPERSEDE?)

If the catch rate is low (most passes return NOOP), the whole tier is overengineered and we should stop.
## Draft scope when written

Single decision: whether and how to coalesce write-time contradiction checks into debounced reflection passes. Includes the concrete decisions about nightly and weekly cron fate because those are downstream of this one.

Explicitly out of scope: on-read reconsolidation (ADR-006), access metadata schema (ADR-007).

## Revisit trigger

Draft this ADR when any of the following is true:
- ADR-004 has been in production ≥ 2 weeks AND write-time check latency p95 > 3s
- OR ADR-004's queue has backed up past 500 pending jobs at any point
- OR LLM call cost from ADR-004 exceeds acceptable budget
- OR contradiction-catch rate from ADR-004 is high enough that coalescing would meaningfully reduce LLM calls

Otherwise: leave as planned, don't force it.

ADR-006-on-read-reconsolidation.md

Lines changed: 102 additions & 0 deletions
# ADR-006: On-Read Reconsolidation (Tier 4)

**Status:** Speculative — not yet drafted, may never be
**Depends on:** ADR-004 shipped; ADR-007 (access metadata) shipped
**Related:** Consolidation improvements

## What this ADR would decide

Whether to trigger a targeted consolidation pass on a specific memory file when that file is retrieved and then followed within some window by a save that contradicts or extends the retrieved content.

Biological analog: reconsolidation (Nader, Schafe & LeDoux 2000) — a retrieved memory becomes labile and requires re-stabilization when prediction error or novel information is present.

## Why this ADR is marked speculative

When I first wrote about this tier in ADR-004-v1, I was seduced by the biological analogy and underestimated how vague the implementation specifics are. Cold-eye review: every piece of "when a retrieved memory is followed within N minutes by a contradicting save, trigger a re-stabilization pass" is its own decision I haven't made.

This ADR may never be written. It's here so we don't lose the idea, not because we're committed to building it.

## Prerequisites before drafting

- [ ] ADR-004 (tier 2a) shipped and validated
- [ ] ADR-007 (access metadata) shipped — we need to know which files have been recently retrieved before we can trigger anything on retrieval
- [ ] Real data on the typical "read-then-save" pattern — how often does it happen? What's the typical gap?
- [ ] Real data on how often write-time dedup from ADR-004 *already* catches the contradictions this tier would catch (if write-time dedup catches them all, this tier is redundant)

## Questions to think through when writing

### Is this distinguishable from tier 2a in practice?

The biggest threat to ADR-006 being useful: **maybe write-time dedup from ADR-004 already does the work.** When you save a fact, `_check_contradictions` looks at the top-k similar existing memories. If the memory you just retrieved is semantically close to the memory you're saving, it'll show up in the top-k regardless of whether you retrieved it. So the contradiction gets caught at write time anyway.

The scenario where tier 4 adds value: you retrieved a memory, and the memory you're about to save is semantically *different* from the retrieved one but contradicts it anyway. Is this even a real scenario? Example:
- You retrieve `decisions/api-rest-vs-graphql.md` (about REST)
- You save `insights/graphql-subscriptions-are-actually-fine.md` (not semantically identical, but it contradicts the earlier decision)
- Vector similarity might not surface the REST decision in the top-k for the new GraphQL insight
- But the chronological read-then-save pattern would

Before drafting ADR-006, check whether this scenario actually happens in real usage data, or whether it's a theoretical concern.

### What counts as a "read"?

Options:
- MCP tool calls (`palinode_read`, `palinode_search`)
- API `GET /files/{path}` calls
- CLI `palinode read` / `palinode search`
- Automatic context injection (core memory, top-k search results)
- File watcher detecting a manual `cat` or editor open

Each of these is a different signal strength. Automatic injection is much noisier than an explicit tool call — every turn injects core memory, and we probably don't want every subsequent save to trigger a reconsolidation on every core file.

### What counts as "contradicts or extends"?

This is an LLM judgment call. Options:
- Re-run `_check_contradictions` but with the retrieved file forced into the top-k regardless of similarity
- Add a separate LLM call that asks "does the new save contradict the retrieved file specifically?"
- Heuristic: if the new save mentions any entity or keyword from the retrieved file, treat it as potentially contradicting

Each option adds LLM calls. Measure the cost before committing.

### What's the window?

- Minutes (N=5, N=15, N=60)?
- Turns (next N agent turns after the read)?
- Session (same Claude Code session)?
- No window (any save while the read is in "recently retrieved" metadata triggers)?
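Whatever window is chosen, the mechanism is small: a registry of recently retrieved paths that a save can consult. A minute-window sketch, assuming a minutes-based window — `RecentReads` and its method names are hypothetical, not part of palinode:

```python
import time


class RecentReads:
    """Track recently retrieved memory files so a subsequent save can be
    checked against them. Minutes-based window; names are illustrative."""

    def __init__(self, window_seconds=900):   # 15-minute window
        self.window_seconds = window_seconds
        self._reads = {}                      # path -> last read timestamp

    def record_read(self, path, now=None):
        self._reads[path] = time.time() if now is None else now

    def candidates_for_save(self, now=None):
        """Files read within the window: candidates for a targeted
        contradiction check against the incoming save."""
        now = time.time() if now is None else now
        return [p for p, t in self._reads.items()
                if now - t <= self.window_seconds]
```

The turn-based and session-based options would swap the timestamp for a turn counter or session id, but the shape — record on read, query on save — stays the same.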
### Does this compose with tier 2a?

If a read triggers a write-time contradiction check against the retrieved file, and the save also triggers its own write-time contradiction check (ADR-004), does the system do two LLM calls or one? If two, that's wasteful. If one, which one takes precedence?

### What happens for pure-read sessions?

If a user reads 20 files and never saves anything, nothing triggers. No cost. But then what's the value of tracking access metadata at all? Access metadata has value beyond this tier (ranker boost, decay), but ADR-006 specifically doesn't need it unless there's a save.

### Failure modes

- Retrieved file is deleted between read and save
- Retrieved file is already being consolidated by tier 2b
- User retrieves 50 files, saves 1 — check all 50 or just some subset?
- LLM hallucinates a contradiction that doesn't exist → a bad UPDATE gets applied

The last one is the scariest. Without a human in the loop, a confidently wrong LLM can corrupt memory. ADR-006 needs a safety mechanism (low-confidence flag? dry-run only? surface to a human for approval?) before shipping.

## Draft scope when written

This ADR is narrow: does retrieval + subsequent save trigger a targeted consolidation pass, yes or no, and if yes, how.

Out of scope: access metadata schema (ADR-007, which is a prerequisite), ranker weighting of access metadata (ADR-007), decay based on retrieval count (ADR-007).

## Revisit trigger

Draft this ADR only if **all** of the following are true:
- ADR-004 and ADR-007 are shipped
- Real-world data shows that the "read-then-contradicting-save" pattern happens often enough to matter
- Real-world data shows that write-time dedup (tier 2a) *isn't* already catching those contradictions
- We've designed a safety mechanism against LLM-hallucinated contradictions

If any of these is missing, leave this ADR as speculative and don't write it.

## Possible outcome

It is likely that this ADR gets closed as "not needed" after ADR-004 ships and we see that tier 2a already catches the contradictions that motivated this design. That's a fine outcome. Mark it "not needed" and move on.
Lines changed: 127 additions & 0 deletions
# ADR-007: Access Metadata Schema + DecayConfig Enable

**Status:** Planned — not yet drafted
**Depends on:** ADR-004 shipped
**Related:** Consolidation improvements

## What this ADR will decide

Two coupled decisions:

1. **Schema for access metadata** — how to track `last_retrieved_at`, `retrieval_count`, and possibly `last_modified_at` per memory file.
2. **Enable `DecayConfig`** — turn on the existing (but disabled) decay configuration in `palinode/core/config.py` and use its tau values as a ranker term.

These are in the same ADR because the decay ranker term can't exist without access metadata to feed it.

## What this ADR will NOT decide

- Whether to delete files based on decay. Nothing gets deleted. Decay is a ranker weight only.
- Whether to compact or archive based on decay. Separate decision (probably ADR-005 territory).
- On-read reconsolidation (ADR-006) — although access metadata is a prerequisite for ADR-006, the decision of *whether* to build ADR-006 is independent of this one.

## Prerequisites before drafting

- [ ] ADR-004 shipped (write-time contradiction check) — this ADR touches the same save path
- [ ] Real data on how often files are actually retrieved in a typical week (so tau tuning has ground truth)
- [ ] Real data on the retrieval distribution — long-tailed or uniform? This informs whether decay actually matters

## Questions to think through when writing

### Where does access metadata live?

**Three-way tension:**

1. **Frontmatter in the file itself.** Source of truth, git-diffable, survives everything. But: every retrieval causes a file write, which causes a git commit, which causes a watcher reindex, which is expensive and noisy.
2. **SQLite table in `.palinode.db`.** Fast to update, fast to query from the ranker. But: not part of the source-of-truth markdown, can drift from file state, lost if the db is rebuilt.
3. **Hybrid: SQLite is live, with a periodic flush to frontmatter.** Best of both, but adds a "flush" operation that has its own failure modes.

My current lean (not a decision): **SQLite only for live access metadata, no frontmatter persistence.** Rationale: access metadata is ephemeral by nature — it's a signal about what's hot *right now*. If the db is rebuilt, the ranker resets to "everything is equally cold," which is a fine reset state. We're not losing information we can't reconstruct.

But: if ADR-006 ever wants to use retrieval history ("this file was retrieved 10 times last week, so it's important"), SQLite-only breaks that. Worth weighing.

### What fields does the schema need?

Minimum:
- `file_path` (PK)
- `last_retrieved_at` (timestamp)
- `retrieval_count` (integer, monotonic)

Possibly:
- `last_retrieved_by` (session id? user id? source surface like "claude-code"?)
- `retrieval_count_7d` / `retrieval_count_30d` (decay-window counts, maintained by a sweeper)
- `last_modified_at` (file mtime at last index)
- `contradiction_check_count` (how many times tier 2a has run on this file)

Don't add fields speculatively. Add only what the ranker or a future ADR concretely needs.
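As a sketch, the minimum fields above map to one small SQLite table. The table name, column types, and the upsert helper are assumptions for illustration, not the real palinode schema:

```python
import sqlite3

# Illustrative minimum access-metadata table (SQLite-only option).
# Names and types are assumptions, not palinode's actual schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS access_metadata (
    file_path         TEXT PRIMARY KEY,
    last_retrieved_at REAL NOT NULL,
    retrieval_count   INTEGER NOT NULL DEFAULT 0
)
"""

def record_retrieval(conn, file_path, now):
    # Upsert: insert on first read; on later reads, bump the monotonic
    # count and refresh the timestamp.
    conn.execute("""
        INSERT INTO access_metadata (file_path, last_retrieved_at, retrieval_count)
        VALUES (?, ?, 1)
        ON CONFLICT(file_path) DO UPDATE SET
            last_retrieved_at = excluded.last_retrieved_at,
            retrieval_count = retrieval_count + 1
    """, (file_path, now))
```

A single upsert per explicit read keeps the write path cheap, and the ranker can join against this table at query time.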
### What's a "retrieval"?

Same question as ADR-006 (see that ADR for the option list). The choice here matters for decay:
- If automatic context injection counts as a retrieval, every file in core memory gets bumped on every turn → "core" never decays → access metadata is useless as a ranker signal
- If only explicit tool calls count, decay has signal but core files look stale → the ranker might downrank core files, which is wrong

**Likely answer:** access metadata distinguishes *explicit* reads (tool calls) from *passive* injection. Only explicit reads update `retrieval_count`. Passive injection updates `last_injected_at` separately, or not at all.

This is a schema-design decision that has to be made in ADR-007.

### Is decay actually useful?

The existing `DecayConfig` has tau values (in days):
- `tau_critical: 180`
- `tau_decisions: 60`
- `tau_insights: 90`
- `tau_general: 30`
- `tau_status: 7`
- `tau_ephemeral: 1`

These are guesses from before any data existed. Before this ADR can ship a "use decay in the ranker" decision, we need:
- Evidence that unused memories are actually hurting the ranker (e.g., stale status files surfacing in search results ahead of current ones)
- Empirical tau values, tuned from real retrieval patterns

If neither of those is in hand, the ADR should narrow to **"add access metadata schema only, defer decay enabling."** Half an ADR.

### How does this interact with the ranker?

The current ranker (`palinode/core/store.py`) uses hybrid BM25 + vector similarity fused via RRF. Adding a decay term means:
- Either multiply the existing RRF score by a decay factor (e.g., `score * exp(-age / tau)`)
- Or add decay as a third signal in the fusion

The ADR needs to pick one and justify it. Multiplication is simpler but lets decay dominate when age is large; fusion is more principled but introduces a tuning parameter.
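The multiplicative option follows directly from the tau values above. A sketch — the function name and the assumption that age is measured in days since last retrieval are illustrative:

```python
import math

# Tau values (days) from the existing DecayConfig, per memory category.
TAU_DAYS = {
    "critical": 180, "decisions": 60, "insights": 90,
    "general": 30, "status": 7, "ephemeral": 1,
}

def decayed_score(rrf_score, age_days, category):
    """Multiplicative option: scale the fused RRF score by an exponential
    decay factor based on age since last retrieval. Illustrative sketch."""
    tau = TAU_DAYS[category]
    return rrf_score * math.exp(-age_days / tau)
```

The spread between categories is the point: a status file unretrieved for 14 days keeps exp(−14/7) ≈ 13.5% of its score, while a decisions file of the same age keeps exp(−14/60) ≈ 79% — stale status sinks fastest, which is exactly the "stale status files outranking current ones" problem this term targets.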
### How does this interact with the watcher?

The watcher daemon reindexes on file modify. If access metadata is in frontmatter, every retrieval triggers a file modify, which triggers a reindex, which could create a feedback loop.

If access metadata is SQLite-only (my current lean), the watcher doesn't need to know about it at all — it stays focused on content indexing.

### Failure modes

- SQLite table corrupts → ranker loses the decay signal, but search still works (falls back to BM25+vector only)
- Access metadata gets out of sync with actual file state → stale `last_retrieved_at`, but it self-corrects on the next retrieval
- Ranker heavily favors recently-retrieved files → recency bias → user complains about "why does it keep showing me the thing I just read?" Need a brake.

### What telemetry do we need to tune tau?

- Histogram of retrieval intervals per file (time between retrievals)
- Distribution of `retrieval_count` per file at the end of a week
- Correlation between "last retrieved > X days ago" and "never retrieved again" (the predictive value of decay)

Without telemetry, tuning tau is guessing. The ADR should ship telemetry alongside the feature.

## Draft scope when written

Single coupled decision: access metadata schema + enabling decay as a ranker term. The two are bundled because one can't ship without the other (decay needs metadata to read from).

Possible split: ship the access metadata schema alone first (ADR-007a), then decide decay enabling separately (ADR-007b) after the metadata has been collecting data for a few weeks. This is probably the right move if real data isn't available at draft time.

Explicitly out of scope: on-read reconsolidation (ADR-006), decay-based deletion (never — not a thing we do), decay-based archival (ADR-005 territory).

## Revisit trigger

Draft this ADR when:
- ADR-004 has been in production long enough that we can instrument retrieval
- OR we hit a concrete ranker problem that access metadata would solve (stale results outranking fresh ones in user-visible search output)
- OR we decide we want ADR-006 and need access metadata as a prerequisite

No rush. This ADR is valuable but not urgent.
