Development Plan

Phased implementation roadmap for cognitive-memory-model. Each phase is independently useful and builds on the previous one.

Background & Motivation

LLMs have no real memory. Knowledge is either baked into weights, held in the ephemeral context window, or stored in explicit memory files that require deliberate read/write actions. If the context is cleared without writing something down, it's gone. This doesn't resemble human memory at all — humans recall information automatically when cues trigger associations.

This project builds an autoassociative memory system that passively monitors LLM/agent conversations, compresses them into gist representations, and automatically surfaces relevant memories when similar cues appear. The LLM does all reasoning; the memory system is a substrate that provides information.

Lessons from Prior Work

  • HDC role-filler binding with random expansion to 10,000D: worked but unnecessary if the LLM handles reasoning. Standard embedding vectors suffice for similarity-based retrieval.
  • Subject-relation-object triplets: too coarse to capture meaning.
  • AMR (Abstract Meaning Representation): richer structure, but the model used was very slow. Unclear whether the slowness came from the SDM (since solved with FAISS) or the AMR model itself.
  • Semantic framework doing reasoning: scope creep. The memory system tried to return structured responses to queries. This is the LLM's job.
  • FAISS IVF for O(1) lookup: proven to work in a separate project. See docs/FAISS-SDM.md for reference.

Key Design Decisions

  1. LLM-as-gist-encoder: use an LLM or small model to compress conversation turns into natural-language gist summaries + tags, rather than formal semantic representations (AMR, triplets).
  2. Embedding vectors for retrieval: standard embedding models (sentence-transformers, etc.) provide the similarity vectors. No HDC random expansion needed.
  3. FAISS for O(1) similarity search: use FAISS IVF index for content-addressable lookup at scale.
  4. Cognitive features via scoring modifiers: spreading activation, decay, priming, and importance are all implemented as modifiers on the base FAISS similarity score: final_score = similarity * decay(age) * importance * priming_boost (see the sketch after this list).
  5. Memory system does NOT reason: it stores, retrieves, and surfaces information. The LLM does all reasoning over recalled memories.
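
A minimal sketch of how decision 4 composes, with illustrative names rather than the actual cmm API:

```python
import math

def final_score(similarity: float, age_seconds: float, importance: float,
                priming_boost: float, decay_rate: float = 1e-5) -> float:
    """Compose the base FAISS similarity with the cognitive modifiers.

    Simplified: the real pipeline also folds access_count into decay
    (Phase 2) and computes priming_boost from recent activations (Phase 3).
    """
    decay = math.exp(-decay_rate * age_seconds)
    return similarity * decay * importance * priming_boost
```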

Phase 1: Core Memory Store + Encoding Pipeline — COMPLETE

Goal: Get the "zebra example" working end-to-end. Store conversation turns as gist memories, retrieve them by similarity when relevant cues appear.

Implementation

1.1 Conversation Parser — cmm/core/types.py, cmm/pipeline/conversation.py

  • ConversationTurn dataclass: {role: Role, content: str, timestamp: float}
  • CognitiveMemoryPipeline.ingest(role, content) — main entry point
  • CognitiveMemoryPipeline.process_turn(turn) — encode, store, retrieve in one call
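
A sketch of the turn type and entry points. The fields follow the bullet above; the Role members and the usage line are assumptions:

```python
from dataclasses import dataclass, field
from enum import Enum
import time

class Role(Enum):           # assumed members; the actual enum may differ
    USER = "user"
    ASSISTANT = "assistant"

@dataclass
class ConversationTurn:
    role: Role
    content: str
    timestamp: float = field(default_factory=time.time)

# Hypothetical usage: ingest wraps the raw text in a turn, then
# process_turn encodes, stores, and retrieves in one call.
# pipeline.ingest(Role.USER, "Zebras are native to Africa.")
```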

1.2 Gist Encoder — cmm/encoding/gist_encoder.py, cmm/encoding/ollama_gist_encoder.py

  • GistEncoder ABC with encode(turn, context?) -> Gist interface
  • PassthroughGistEncoder — baseline, passes raw text + keyword tag extraction (for testing)
  • OllamaGistEncoder — uses local Mistral 7B via Ollama to produce compressed 1-2 sentence gists + tags in JSON format; falls back to passthrough on failure
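
A sketch of the encoder interface, assuming a simple Gist shape (summary text plus tags) and a crude keyword heuristic for the passthrough baseline:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Gist:                     # assumed shape: compressed text plus tags
    text: str
    tags: list[str]

class GistEncoder(ABC):
    @abstractmethod
    def encode(self, turn, context: str | None = None) -> Gist: ...

class PassthroughGistEncoder(GistEncoder):
    """Baseline for testing: raw text plus naive keyword tags."""
    def encode(self, turn, context=None) -> Gist:
        words = [w.strip(".,!?").lower() for w in turn.content.split()]
        tags = sorted({w for w in words if len(w) > 4})[:5]   # crude keyword pick
        return Gist(text=turn.content, tags=tags)
```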

1.3 Embedding Layer — cmm/encoding/embedding.py

  • EmbeddingModel wrapping sentence-transformers (all-MiniLM-L6-v2, 384D)
  • Vectors L2-normalized for cosine similarity via inner product
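
Why normalization matters: for unit vectors, inner product equals cosine similarity, so a FAISS inner-product index ranks by cosine directly. A minimal sketch using the sentence-transformers encode API:

```python
import numpy as np

def embed_normalized(model, texts: list[str]) -> np.ndarray:
    """Embed texts and L2-normalize so dot product == cosine similarity."""
    vecs = np.asarray(model.encode(texts), dtype="float32")
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.clip(norms, 1e-12, None)
```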

1.4 FAISS Memory Store — cmm/core/memory_store.py

  • Starts with flat index (exact search), auto-trains IVF index once buffer reaches max(nlist * 10, 256) items
  • Metadata stored alongside: Memory dataclass with gist, tags, timestamp, importance, access_count, last_accessed, source_role
  • Thread-safe with locking; optional GPU support
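
A sketch of the flat-to-IVF handover using random stand-in vectors; the d and nlist values here are assumptions:

```python
import faiss
import numpy as np

d, nlist = 768, 100                       # embedding dims and IVF cell count (assumed)
train_threshold = max(nlist * 10, 256)    # buffer size that triggers IVF training

flat = faiss.IndexFlatIP(d)               # exact search while the store is small
vectors = np.random.rand(train_threshold, d).astype("float32")  # stand-ins
faiss.normalize_L2(vectors)
flat.add(vectors)

# Once the buffer is large enough, train an IVF index and migrate the vectors.
quantizer = faiss.IndexFlatIP(d)
ivf = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
ivf.train(vectors)
ivf.add(vectors)
scores, ids = ivf.search(vectors[:1], 5)  # probes a few cells instead of scanning all
```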

1.5 Retrieval Trigger — cmm/retrieval/retriever.py

  • On each turn: embed query → FAISS top-k → update access metadata → format for context injection
  • Retriever.format_for_context() produces [Recalled from memory...]...[End recalled memories] blocks with relevance scores, tags, emotional context, and agent attribution — clearly marked as memory (not user input)
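
An illustrative (not verbatim) context block; the exact layout is defined by Retriever.format_for_context():

```
[Recalled from memory (not user input)]
(0.83) [tags: zebra, africa] [valence: neutral] [agent: assistant]
  Zebras are native to Africa and live in herds.
[End recalled memories]
```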

Results

  • All success criteria met: zebra test passes (store zebra facts → 20 unrelated turns → recall "zebra" succeeds)
  • FAISS retrieval is sub-millisecond at test scale
  • Multiple topic discrimination works (Python vs. elephants vs. Eiffel Tower)

Phase 2: Temporal Decay + Recency Weighting + Working Memory — COMPLETE

Goal: Memories fade over time. Recent/frequently accessed memories are prioritized. A working memory buffer keeps just-activated memories warm.

Implementation

2.1 Temporal Decay + Rehearsal — cmm/retrieval/decay.py

  • DecayScorer applies exponential decay: decay = e^(-λ_eff * age)
  • Age measured from last_accessed, not creation time — accessed memories reset their decay clock
  • Rehearsal effect: λ_eff = λ / (1 + rehearsal_weight * ln(1 + access_count)) — frequently accessed memories decay slower, with log dampening
  • Default decay_rate=1e-5 (~50% decay after 19 hours without access)
  • Both decay_rate and rehearsal_weight configurable via CognitiveMemoryPipeline
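
A worked example of the decay and rehearsal formulas above (rehearsal_weight=1.0 is an assumed default):

```python
import math

def effective_lambda(decay_rate: float, access_count: int,
                     rehearsal_weight: float = 1.0) -> float:
    # Frequent access dampens decay logarithmically.
    return decay_rate / (1 + rehearsal_weight * math.log(1 + access_count))

def decay(age_seconds: float, access_count: int = 0,
          decay_rate: float = 1e-5) -> float:
    return math.exp(-effective_lambda(decay_rate, access_count) * age_seconds)

# Half-life check: ln(2) / 1e-5 ≈ 69,315 s ≈ 19.3 h, matching the ~19 h figure.
print(round(decay(69_315), 2))                   # 0.5  (never accessed)
print(round(decay(69_315, access_count=10), 2))  # 0.82 (rehearsal slows decay)
```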

2.2 Working Memory Buffer — cmm/retrieval/working_memory.py

  • WorkingMemory — fixed-capacity buffer (default 10 items) with turn-based TTL (default 5 turns)
  • Retrieved memories automatically enter working memory
  • Reactivation resets TTL and keeps the higher score
  • Lowest-scoring item evicted when at capacity
  • tick() called on each process_turn() — items expire after TTL turns without reactivation
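
A minimal sketch of the buffer mechanics described above (class and field names illustrative):

```python
from dataclasses import dataclass

@dataclass
class WMItem:
    memory_id: int
    score: float
    ttl: int

class WorkingMemory:
    def __init__(self, capacity: int = 10, ttl: int = 5):
        self.capacity, self.ttl = capacity, ttl
        self.items: dict[int, WMItem] = {}

    def activate(self, memory_id: int, score: float) -> None:
        item = self.items.get(memory_id)
        if item:                                  # reactivation: reset TTL, keep best score
            item.ttl = self.ttl
            item.score = max(item.score, score)
            return
        if len(self.items) >= self.capacity:      # evict the lowest-scoring item
            worst = min(self.items.values(), key=lambda i: i.score)
            del self.items[worst.memory_id]
        self.items[memory_id] = WMItem(memory_id, score, self.ttl)

    def tick(self) -> None:                       # called once per processed turn
        for item in list(self.items.values()):
            item.ttl -= 1
            if item.ttl <= 0:
                del self.items[item.memory_id]
```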

2.3 Retriever Integration — cmm/retrieval/retriever.py

  • Retrieval pipeline now: FAISS top-k*3 (overfetch) → apply similarity * decay * importance → merge working memory items (re-scored against current query) → threshold filter → sort → return top-k
  • Working memory items are re-scored against the current query embedding rather than carrying their activation scores from earlier turns (see Design Note below)
  • Pipeline ticks working memory on each turn automatically

Results

  • All success criteria met: old memories score lower, accessed memories resist decay, working memory keeps items warm for TTL turns and clears on topic shift
  • Scoring formula in practice: final_score = raw_similarity * decay(age, access_count) * importance

Design Note

Working memory items are re-scored against the current query rather than using their original activation score. This was necessary because stale working memory items (high score from a previous query context) would otherwise crowd out relevant FAISS results for the current query.


Phase 3: Spreading Activation + Priming — COMPLETE

Goal: Retrieving one memory activates related memories. Recent activations lower thresholds for associated concepts.

Implementation

3.1 Spreading Activation — cmm/retrieval/spreading_activation.py, cmm/retrieval/entity_index.py

  • SpreadingActivation expands retrieval via two paths:
    1. Embedding proximity: FAISS neighbor queries from seed embeddings
    2. Entity links: spaCy NER extracts named entities at storage time; EntityIndex maps entities to memory IDs; spreading traverses entity links to find cross-domain associations
  • Entity linking solves the cross-domain problem: "warehouse inspection on Industrial Way" and "hospital patients near Industrial Way" have only 0.18 embedding similarity but share the "Industrial Way" entity
  • Score decays per hop: spread_score = parent_score * spread_factor * neighbor_similarity (embedding path) or parent_score * spread_factor * entity_boost (entity path)
  • Multiple paths to the same memory are deduplicated, keeping max score
  • Embedding model upgraded from all-MiniLM-L6-v2 (384D) to all-mpnet-base-v2 (768D) for better cross-domain similarity
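
A one-hop sketch of both spreading paths; spread_factor, entity_boost, and the store/index interfaces are assumptions:

```python
import numpy as np

def spread(seeds, store, entity_index, spread_factor=0.5, entity_boost=1.0, k=5):
    """seeds: list of (memory_id, score, embedding) for the initial top-k."""
    activated: dict[int, float] = {}

    def offer(nid: int, score: float) -> None:
        activated[nid] = max(activated.get(nid, 0.0), score)  # dedup: keep max score

    for mem_id, score, emb in seeds:
        # Path 1: embedding proximity via a FAISS neighbor query from the seed.
        sims, ids = store.index.search(emb[None, :].astype("float32"), k)
        for sim, nid in zip(sims[0], ids[0]):
            if nid not in (mem_id, -1):
                offer(int(nid), score * spread_factor * float(sim))
        # Path 2: shared named entities (catches low-similarity cross-domain links).
        for entity in store.memories[mem_id].entities:        # assumed attribute
            for nid in entity_index.get(entity, []):
                if nid != mem_id:
                    offer(nid, score * spread_factor * entity_boost)
    return activated
```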

3.2 Priming State — cmm/retrieval/priming.py

  • PrimingState tracks recently activated memory IDs with turn counters
  • Boost formula: 1 + boost_strength * e^(-decay_rate * turns_since_activation) (default 1.3x at activation)
  • Reactivation resets the boost; auto-cleanup after max_turns (default 10)
  • Applied to both direct FAISS results and spread-activated results
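
A sketch of the priming state; boost_strength=0.3 reproduces the 1.3x figure, while the per-turn decay_rate here is an assumed value:

```python
import math

class PrimingState:
    def __init__(self, boost_strength: float = 0.3, decay_rate: float = 0.3,
                 max_turns: int = 10):
        self.boost_strength = boost_strength
        self.decay_rate = decay_rate            # per-turn boost decay (assumed default)
        self.max_turns = max_turns
        self.turns_since: dict[int, int] = {}   # memory_id -> turns since activation

    def activate(self, memory_id: int) -> None:
        self.turns_since[memory_id] = 0         # (re)activation resets the boost

    def tick(self) -> None:                     # advance one turn; drop stale entries
        self.turns_since = {m: t + 1 for m, t in self.turns_since.items()
                            if t + 1 <= self.max_turns}

    def boost(self, memory_id: int) -> float:
        t = self.turns_since.get(memory_id)
        if t is None:
            return 1.0
        # 1 + 0.3 * e^0 = 1.3x at activation, decaying toward 1.0
        return 1.0 + self.boost_strength * math.exp(-self.decay_rate * t)
```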

3.3 Retriever Pipeline — cmm/retrieval/retriever.py

  • Full 8-step pipeline: FAISS fetch → decay + importance + priming → initial top-k → spreading activation → priming on spread results → working memory merge → final sort → update access/WM/priming
  • Retriever.tick() now advances both working memory and priming in one call

Results

  • All success criteria met: "zebra" activates "Africa/wildlife" memories, animal priming boosts animal retrieval on subsequent turns, spreading stays focused (animal memories rank above programming ones)
  • Full scoring formula: final_score = similarity * decay(age, access_count) * importance * priming_boost

Design Note

The associative index (3.3 in original plan) was deferred — FAISS re-query works well for spreading activation and avoids maintaining a separate co-occurrence graph. Can be revisited if performance becomes a concern at scale.


Phase 4: Episodic → Semantic Consolidation — COMPLETE

Goal: Over time, specific episodic memories consolidate into general semantic knowledge. Like human sleep consolidation.

Implementation

4.1 Consolidation Engine — cmm/consolidation/consolidator.py

  • Consolidator clusters episodic memories by embedding cosine similarity (greedy single-linkage)
  • Configurable cluster_threshold (default 0.6) and min_cluster_size (default 3)
  • ConsolidationSummarizer ABC + SimpleConsolidationSummarizer (concatenation fallback)
  • Consolidated semantic memories stored with higher importance (default 2.0x)
  • Episodic memories in consolidated clusters get demoted (default 0.5x importance)
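
A sketch of the greedy single-linkage clustering, assuming L2-normalized embeddings (so dot product is cosine similarity):

```python
import numpy as np

def cluster_episodic(embeddings: np.ndarray, threshold: float = 0.6,
                     min_size: int = 3) -> list[list[int]]:
    clusters: list[list[int]] = []
    for i, vec in enumerate(embeddings):
        placed = False
        for cluster in clusters:
            # Single linkage: join if similar enough to ANY existing member.
            if any(float(vec @ embeddings[j]) >= threshold for j in cluster):
                cluster.append(i)
                placed = True
                break
        if not placed:
            clusters.append([i])                # start a new cluster
    return [c for c in clusters if len(c) >= min_size]
```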

4.2 LLM-Based Summarization — cmm/consolidation/ollama_summarizer.py

  • OllamaConsolidationSummarizer uses local Mistral 7B to generate consolidated summaries
  • Produces quality results like: "The user frequently discusses Python-related topics such as decorators, async/await, function optimization, and writing unit tests using pytest."
  • Same JSON output pattern as the gist encoder, with graceful fallback

4.3 Session Summaries — cmm/consolidation/session.py

  • SessionSummarizer creates session-level summaries from accumulated turn memories
  • Stored as semantic memories with moderate importance (1.5x), prefixed with "Session summary:"
  • pipeline.end_session() triggers summarization and resets session tracking

4.4 Pipeline Integration — cmm/pipeline/conversation.py

  • pipeline.consolidate() for manual trigger
  • Auto-consolidation fires every consolidation_threshold turns (default 50)
  • Session memory IDs tracked per-session so summaries only cover their own turns

Results

  • All success criteria met: Python debugging cluster produces "The user frequently discusses debugging Python scripts related to CSV parsing"; semantic memories retrievable by broad cues; episodic memories still retrievable by specific cues
  • LLM-based summarizer dramatically outperforms the simple concatenation fallback

Design Notes

  • Embedding similarity between diverse subtopics of the same domain (e.g., Python decorators vs async/await) is only 0.1-0.4. Cluster threshold needs to be set around 0.2-0.4 to group them, or the gist encoder needs to produce more similar phrasings.
  • Hierarchical retrieval (4.4 in original plan) happens naturally — FAISS returns both episodic and semantic memories, and the scoring formula (with importance weighting) gives semantic memories a natural advantage for broad queries.

Phase 5: Importance Weighting + Metamemory + Maintenance — COMPLETE

Goal: Not all memories are equal. The system detects importance signals, knows what it knows, and cleans up after itself.

Phase 5a: Importance Detection — cmm/scoring/importance.py

  • ImportanceScorer ABC + RuleBasedImportanceScorer with regex-based detection
  • Scoring tiers: corrections (2.0x), explicit instructions (2.0x), novel information (1.5x), normal (1.0x), routine/filler (0.5x)
  • Novelty detection: queries FAISS for max similarity to existing memories; below novelty_threshold (default 0.5) = novel
  • Integrated into process_turn() — importance is scored and assigned at storage time automatically
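
A sketch of the tiering logic; the regex patterns here are illustrative stand-ins for the actual rules:

```python
import re

# Illustrative patterns only; the real RuleBasedImportanceScorer rules may differ.
TIERS = [
    (re.compile(r"\b(actually|that's wrong|correction|i meant)\b", re.I), 2.0),  # corrections
    (re.compile(r"\b(always|never|remember to|from now on)\b", re.I), 2.0),      # instructions
    (re.compile(r"^\s*(hi|hello|thanks|ok(ay)?)\b[.!]*\s*$", re.I), 0.5),        # routine/filler
]

def score_importance(text: str, max_existing_similarity: float,
                     novelty_threshold: float = 0.5) -> float:
    for pattern, weight in TIERS:
        if pattern.search(text):
            return weight
    if max_existing_similarity < novelty_threshold:
        return 1.5          # nothing similar in FAISS yet: novel information
    return 1.0              # normal
```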

Results: corrections stored at 2.0x importance, greetings at 0.5x; corrections rank above normal memories for the same query due to importance multiplier in the scoring formula.

Phase 5b: Metamemory Signals — cmm/retrieval/metamemory.py

  • MetamemoryScorer classifies retrieval results into confidence levels: HIGH (≥0.7), MODERATE (≥0.4), LOW (≥0.2), NONE
  • MetamemoryResult wraps results with confidence, partial matches, and convenience properties (has_strong_match, has_tip_of_tongue)
  • Partial matches: candidates that scored above partial_threshold but below the retrieval threshold — "tip of the tongue" signals
  • pipeline.recall_with_metamemory() for metamemory-enriched retrieval
  • MetamemoryScorer.format_for_context() includes confidence level and a separate [Partial matches] section
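
A sketch of the confidence mapping (thresholds from the list above; the partial_threshold default is an assumption):

```python
from enum import Enum

class Confidence(Enum):
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"
    NONE = "none"

def classify(score: float) -> Confidence:
    if score >= 0.7:
        return Confidence.HIGH
    if score >= 0.4:
        return Confidence.MODERATE
    if score >= 0.2:
        return Confidence.LOW
    return Confidence.NONE

def partial_matches(scored, retrieval_threshold: float = 0.2,
                    partial_threshold: float = 0.1) -> list:
    # "Tip of the tongue": above partial_threshold but below the retrieval cutoff.
    return [(m, s) for m, s in scored if partial_threshold <= s < retrieval_threshold]
```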

Results: strong matches get HIGH/MODERATE confidence; completely unrelated queries get NONE; borderline queries surface partial matches as hints.

Phase 5c: Memory Maintenance — cmm/maintenance/maintenance.py

  • MemoryMaintainer.prune() removes memories where decay * importance < prune_threshold AND importance <= prune_min_importance. High-importance memories (corrections, instructions) are protected.
  • MemoryMaintainer.deduplicate() finds pairs with cosine similarity ≥ duplicate_threshold (default 0.95), keeps the one with higher importance/access, transfers access count to survivor.
  • MemoryMaintainer.get_health_metrics() returns HealthMetrics dataclass: total/episodic/semantic counts, avg importance, avg access count, pruned/merged counts.
  • MemoryMaintainer.maintain() runs deduplicate then prune in sequence.
  • pipeline.maintain() and pipeline.health() for easy access.
  • Added store.remove() and store.rebuild_index() to MemoryStore for index reconstruction after deletions.
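
A sketch of the prune predicate; the threshold defaults here are assumptions:

```python
import math

def should_prune(memory, now: float, decay_rate: float = 1e-5,
                 prune_threshold: float = 0.05,
                 prune_min_importance: float = 1.0) -> bool:
    lam = decay_rate / (1 + math.log(1 + memory.access_count))   # rehearsal-dampened
    decayed = math.exp(-lam * (now - memory.last_accessed))
    # Both conditions must hold, so high-importance memories are protected.
    return (decayed * memory.importance < prune_threshold
            and memory.importance <= prune_min_importance)
```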

Results: old low-importance memories are pruned; high-importance and recently-accessed memories survive; near-duplicates merge with access count transfer; FAISS index correctly rebuilt after maintenance.


Post-Phase Features — COMPLETE

Features originally listed as "Future Considerations" that have been implemented:

  • Multi-agent memory sharing — cmm/multi_agent/shared_store.py: SharedMemoryManager with per-agent pipelines, scoped visibility (PRIVATE/SHARED/TEAM), contradiction detection with agent attribution, and nightly consolidation.
  • Emotional valence tagging — cmm/scoring/valence.py: ValenceScorer tags each memory with valence (-1 to +1), arousal (0 to 1), and emotion labels. Emotional context surfaced in retrieval formatting.
  • Entity-linked spreading activation — cmm/retrieval/entity_index.py: spaCy NER extracts named entities at storage time. Spreading activation traverses both embedding neighbors AND entity links for cross-domain association.
  • Embedding model upgrade — Upgraded from all-MiniLM-L6-v2 (384D) to all-mpnet-base-v2 (768D) for better cross-domain similarity.

Integrations — COMPLETE

Multiple integration paths for different environments. See integrations/README.md for full guide.

  • HTTP Memory Server (integrations/claude-code/memory_server.py) — Language-agnostic REST API. Any application that can make HTTP calls gets autoassociative memory.
  • Claude Code Hooks (integrations/claude-code/hooks/) — UserPromptSubmit and Stop hooks for fully automatic two-way monitoring. Zero manual intervention.
  • Python API Middleware (integrations/middleware.py) — Wraps any OpenAI-compatible or Anthropic API call with automatic memory ingest + recall.
  • MCP Server (integrations/mcp/server.py) — Exposes memory as MCP tools for Claude Desktop, Cursor, and any MCP client. Semi-automatic (LLM decides when to call tools).
  • Gist encoder backends — Ollama (local), OpenAI-compatible (any provider), Anthropic (Claude API), Passthrough (no LLM needed).

All integration paths support true autoassociative memory except MCP tools, which require the LLM to decide to call the memory tools.


Persistence and Distributed Memory — COMPLETE

  • Persistence — pipeline.save(directory) / CognitiveMemoryPipeline.load(directory) save and restore the FAISS index, all memory metadata (importance, timestamps, agent_id, valence, etc.), and the entity index. Three files: faiss.index, memories.json, entities.json.
  • Distributed multi-agent server — The HTTP memory server (integrations/claude-code/memory_server.py) supports:
    • agent_id and session_id on every request for multi-agent tagging
    • ThreadingMixIn for concurrent access from multiple agents
    • --data-dir for automatic persistence (saves on shutdown, loads on startup)
    • --auto-save N for periodic background saves
    • --host 0.0.0.0 for network access (remote agents)
    • POST /consolidate for nightly consolidation
    • POST /contradictions for cross-agent contradiction detection
    • GET /stats for per-agent memory counts

The 60-90 agent, 30-person developer team scenario: all agents hit http://memory-server:7832/ingest_and_recall with their agent_id. The server maintains a shared FAISS index. Each agent's memories are tagged and scoped. Nightly consolidation runs via POST /consolidate.
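
A sketch of what an agent-side call might look like; the request and response fields beyond agent_id/session_id are assumptions, not a documented schema:

```python
import requests

resp = requests.post(
    "http://memory-server:7832/ingest_and_recall",
    json={
        "agent_id": "builder-17",       # per-agent tagging and scoping
        "session_id": "2025-06-01-a",   # per-session tracking
        "role": "user",                 # assumed field
        "content": "Deploy failed again on the staging cluster.",  # assumed field
    },
    timeout=10,
)
for memory in resp.json().get("memories", []):   # response shape assumed
    print(memory)
```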


Future Considerations (Not Planned)

Ideas captured for potential future work, not in current scope:

  • Multi-modal memories: storing memories from images, audio, structured data
  • Continual learning integration: using memory patterns to influence model fine-tuning
  • Hardware acceleration: replacing FAISS with TCAM/neuromorphic hardware for true O(1)