
Cognitive Memory Model (CMM)


Autoassociative cognitive memory for LLMs and AI agents. | Documentation | PyPI

LLMs have no real memory. Knowledge is either baked into weights, held in the ephemeral context window, or stored in files that require the agent to explicitly decide to read and write. CMM changes this — it gives LLMs automatic, cognitive-like memory that passively monitors conversations, encodes them into compressed gist representations, and surfaces relevant memories when associative cues appear. No one has to decide to "look something up."

Results

CMM improves LLM response accuracy by +67% keyword / +78% LLM-judge on contamination-free novel-fact benchmarks with Claude Opus 4.6.

[Figure: LLM Evaluation — Baseline vs CMM]

The improvement holds across model scales — both small (Mistral 7B) and frontier (Claude Opus 4.6) models benefit substantially.

[Figure: Cross-Model Comparison]

Key Features

CMM is the first LLM memory system to implement spreading activation, entity linking, priming, and metamemory in combination.

[Figure: Feature Comparison]

Cognitive Features

  • Spreading activation — dual-path: FAISS embedding neighbors + spaCy entity linking. Discovers cross-domain connections that flat retrieval misses.
  • Priming — recently activated memories boost related memories for subsequent turns.
  • Metamemory — confidence levels (HIGH/MODERATE/LOW/NONE) and "tip of the tongue" partial-match hints.
  • Grace-period temporal decay — no decay for 2 weeks, then frequency-dependent exponential decay. Frequently accessed memories lock permanently.
  • Importance scoring — corrections and instructions get 2x importance; routine exchanges get 0.5x. Importance scoring turned a 0.12-similarity allergy memory into a life-saving recall.
  • Emotional valence — each memory tagged with valence, arousal, and emotion label for empathetic recall.
  • Episodic → semantic consolidation — clusters similar episodic memories into general knowledge over time.
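The grace-period decay and rehearsal locking described above can be sketched as follows. This is an illustrative model only: the constants (14-day grace period, lock threshold, half-life) and the function name are assumptions, not CMM's actual parameters.

```python
GRACE_PERIOD_DAYS = 14   # no decay during the grace period (assumed constant)
LOCK_ACCESS_COUNT = 10   # accesses after which a memory never decays (assumed)

def decay_factor(age_days: float, access_count: int,
                 half_life_days: float = 30.0) -> float:
    """Grace-period exponential decay: full strength for two weeks,
    then decay whose rate slows with access frequency; heavily
    accessed memories are locked at full strength."""
    if age_days <= GRACE_PERIOD_DAYS or access_count >= LOCK_ACCESS_COUNT:
        return 1.0
    # Each access stretches the effective half-life.
    effective_half_life = half_life_days * (1 + access_count)
    elapsed = age_days - GRACE_PERIOD_DAYS
    return 0.5 ** (elapsed / effective_half_life)

print(decay_factor(7, 0))    # within grace period -> 1.0
print(decay_factor(44, 0))   # 30 days past grace, never accessed -> 0.5
print(decay_factor(44, 3))   # same age, 3 accesses -> decays more slowly
```

The key design point is that access frequency enters the exponent, so a frequently rehearsed memory decays slowly long before it locks outright.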

[Figure: Importance Scoring]

Cross-Domain Spreading Activation

In a test with 1,200 city case files and 5 hidden investigation chains, entity-linked spreading activation found 2.4x more connections than flat retrieval:

[Figure: Spreading Activation]
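A toy sketch of the dual-path spreading step: activation flows from seed memories through embedding-space neighbors and through shared named entities, attenuating on each hop. The graph data, damping factor, and function names here are illustrative, not CMM internals.

```python
from collections import defaultdict

# Toy memory graph with two edge types (all data here is illustrative).
embedding_neighbors = {
    "warehouse_inspection": ["shipping_records"],
    "hospital_report": ["environmental_data"],
}
entity_links = {  # named entity -> memories that mention it
    "Pier 40": ["warehouse_inspection", "hospital_report"],
}
entities_of = {
    "warehouse_inspection": ["Pier 40"],
    "hospital_report": ["Pier 40"],
    "shipping_records": [],
    "environmental_data": [],
}

def spread(seeds: dict, hops: int = 2, damping: float = 0.5) -> dict:
    """Propagate activation from seed memories through embedding
    neighbors and entity links, attenuating per hop."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(hops):
        nxt = defaultdict(float)
        for mem, act in frontier.items():
            targets = list(embedding_neighbors.get(mem, []))
            for ent in entities_of.get(mem, []):
                targets += [m for m in entity_links[ent] if m != mem]
            for t in targets:
                nxt[t] = max(nxt[t], act * damping)
        # Only keep nodes whose activation actually increased.
        frontier = {m: a for m, a in nxt.items() if a > activation.get(m, 0.0)}
        activation.update(frontier)
    return activation

result = spread({"warehouse_inspection": 1.0})
# Flat retrieval would return only the seed; spreading also reaches
# hospital_report (shared entity "Pier 40") and its embedding neighbor.
```

Entity links are what make the cross-domain jumps possible: a warehouse memory and a hospital memory may be far apart in embedding space yet share an entity.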

Demos

Four narrative demos showcase the cognitive features with Claude Opus 4.6:

| Demo | What it shows | Run it |
| --- | --- | --- |
| The Handoff | 3 agents disagree on an API rate limit. System surfaces the contradiction with agent attribution. | `python -m demos.handoff --backend anthropic` |
| Tip of the Tongue | Metamemory detects a partial match, surfaces a hint, priming enables full recall on follow-up. | `python -m demos.tip_of_tongue --backend anthropic` |
| Six Weeks with Alex | 6-week personal assistant: allergy recall, emotional empathy, correction handling, consolidation. | `python -m demos.six_weeks --backend anthropic` |
| The Investigator | 1,200 case files. Spreading activation chains warehouse inspection → hospital report → shipping records → environmental data. | `python -m demos.investigator --backend anthropic` |

Installation

```bash
# From PyPI
pip install cognitive-memory-model

# With optional integrations
pip install cognitive-memory-model[anthropic]   # Anthropic API support
pip install cognitive-memory-model[openai]      # OpenAI API support
pip install cognitive-memory-model[mcp]         # MCP server support
pip install cognitive-memory-model[all]         # Everything

# Download the spaCy model for entity linking
python -m spacy download en_core_web_sm
```

From source (for development):

```bash
git clone https://github.com/SyntheticCognitionLabs/cognitive-memory-model.git
cd cognitive-memory-model
python3 -m venv .venv
source .venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/cu126  # CUDA 12.6
pip install -e ".[dev]"
python -m spacy download en_core_web_sm
```

Quick Start

```python
from cmm.pipeline.conversation import CognitiveMemoryPipeline

pipeline = CognitiveMemoryPipeline()

# Ingest conversation turns — both sides are captured
pipeline.ingest("user", "I'm allergic to peanuts. I carry an EpiPen.")
pipeline.ingest("assistant", "Noted, severe peanut allergy.")
pipeline.ingest("user", "My project deadline is April 15th.")

# Later... the memory system automatically recalls relevant information
results = pipeline.recall("I'm ordering food for the team lunch")
print(pipeline.format_recalled(results))
# → [Recalled from memory...] I'm allergic to peanuts. I carry an EpiPen.

# Save to disk — memories persist across restarts
pipeline.save("./my_memory")

# Load and continue tomorrow
pipeline = CognitiveMemoryPipeline.load("./my_memory")
```

Integrations

Every integration supports true autoassociative memory except MCP tools.

| Integration | How | Autoassociative? | Language |
| --- | --- | --- | --- |
| HTTP Memory Server | REST API on localhost or network | Yes | Any |
| Claude Code Hooks | UserPromptSubmit + Stop hooks | Yes | Any |
| Python Middleware | Wraps OpenAI/Anthropic API calls | Yes | Python |
| MCP Server | `memory_recall`/`store` tools | Semi (tool-based) | Any MCP client |
| Direct Library | `CognitiveMemoryPipeline` API | Yes | Python |

Autoassociative means the memory system passively monitors all conversation turns and automatically surfaces relevant memories. No one decides to "look something up." Every path except MCP achieves this.
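That contract can be sketched as a wrap loop. `ToyMemory` below is a self-contained stand-in (keyword overlap instead of embeddings), not CMM's API; it only illustrates where ingest and recall sit relative to the model call.

```python
class ToyMemory:
    """Stand-in for the memory pipeline: stores raw turns and recalls
    any turn sharing at least two words with the cue."""
    def __init__(self):
        self.turns = []

    def ingest(self, role: str, text: str) -> None:
        self.turns.append(text)

    def recall(self, cue: str) -> list:
        cue_words = set(cue.lower().split())
        return [t for t in self.turns
                if len(cue_words & set(t.lower().split())) >= 2]

def chat(memory: ToyMemory, llm, user_msg: str) -> str:
    # 1. Passively recall relevant memories before the model sees the message...
    recalled = memory.recall(user_msg)
    prompt = "".join(f"[Recalled from memory] {m}\n" for m in recalled) + user_msg
    # 2. ...call the model...
    reply = llm(prompt)
    # 3. ...and passively store both sides of the exchange.
    memory.ingest("user", user_msg)
    memory.ingest("assistant", reply)
    return reply

mem = ToyMemory()
chat(mem, lambda prompt: "Got it.", "My project deadline is April 15th.")
# A later, related message arrives with the deadline pre-recalled:
reply = chat(mem, lambda prompt: prompt, "Remind me about the project deadline")
```

The point is that neither the user nor the model calls a memory tool; the wrapper recalls and stores on every turn, which is what separates the autoassociative paths from the MCP tool path.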

Use the Python Middleware (any LLM API)

```python
from integrations.middleware import MemoryMiddleware

mw = MemoryMiddleware(api_type="anthropic")
response = mw.chat("I'm allergic to peanuts.")
response = mw.chat("Order food for the team lunch.")
# ^ Automatically recalls the peanut allergy
```

Use Claude Code Hooks (zero-config)

```bash
# Start the memory server
python -m integrations.claude-code.memory_server --data-dir ./memory

# Add hooks to Claude Code settings — see integrations/claude-code/README.md
# That's it. Memory happens automatically on every message.
```

Multi-Agent Teams

Dozens of agents can share a single memory server. Each agent tags memories with their agent_id. Nightly consolidation merges, deduplicates, and detects contradictions.

```bash
# Central server (accessible to all agents)
python -m integrations.claude-code.memory_server \
    --host 0.0.0.0 --data-dir /shared/memory --auto-save 300
```

See integrations/README.md for full setup instructions, CRUD endpoints, and production deployment guidance.

Architecture

```
Conversation stream (user ↔ agent turns, reasoning steps)
       │
       ▼
┌─────────────────┐
│  Gist Encoder   │  LLM or small model: turn → compressed summary + tags
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Embedding Model │  gist text → 768D dense vector (all-mpnet-base-v2)
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────┐
│     FAISS-backed Memory Store   │  fast vector similarity search
│  + Entity Index (spaCy NER)     │  Named entity → memory linkage
└────────┬────────────────────────┘
         │
         ▼  (on each new turn)
┌─────────────────────────────────┐
│ Cognitive Retrieval Pipeline    │
│  1. FAISS similarity search     │
│  2. Temporal decay + rehearsal  │
│  3. Importance weighting        │
│  4. Priming boost               │
│  5. Spreading activation        │
│     (embedding + entity links)  │
│  6. Working memory merge        │
│  7. Metamemory confidence       │
└────────┬────────────────────────┘
         │
         ▼
  Inject recalled memories into LLM context
  (clearly marked as "from memory, not user input")
```

Scoring Formula

```
final_score = similarity × decay(age, access_count) × importance × priming_boost
```
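A direct transcription of the formula, with the decay term filled in using the grace-period behavior described under Cognitive Features. All constants here (14-day grace, lock threshold, half-life) are assumptions for illustration.

```python
def final_score(similarity: float, age_days: float, access_count: int,
                importance: float, priming_boost: float) -> float:
    """final_score = similarity * decay(age, access_count) * importance * priming_boost"""
    # Grace period: no decay for two weeks; heavily accessed memories lock.
    if age_days <= 14 or access_count >= 10:
        decay = 1.0
    else:
        # Exponential decay whose half-life stretches with access count.
        decay = 0.5 ** ((age_days - 14) / (30.0 * (1 + access_count)))
    return similarity * decay * importance * priming_boost

# A low-similarity but high-importance memory (the allergy example)
# can outrank a fresher, routine exchange:
allergy = final_score(0.12, 40, 5, importance=2.0, priming_boost=1.0)
routine = final_score(0.30, 1, 0, importance=0.5, priming_boost=1.0)
```

Because the factors multiply, the 2x importance weight on corrections and instructions is enough to compensate for a weak similarity signal, which is how the 0.12-similarity allergy memory still surfaces.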

Running Tests

```bash
pytest                                          # all tests
pytest --ignore=tests/test_ollama_encoder.py \
       --ignore=tests/test_phase4_integration.py  # fast (no Ollama needed)
pytest -v -k "zebra"                            # the zebra test
```

License

MIT

