Autoassociative cognitive memory for LLMs and AI agents.
LLMs have no real memory. Knowledge is either baked into weights, held in the ephemeral context window, or stored in files that require the agent to explicitly decide to read and write. CMM changes this — it gives LLMs automatic, cognitive-like memory that passively monitors conversations, encodes them into compressed gist representations, and surfaces relevant memories when associative cues appear. No one has to decide to "look something up."
CMM improves LLM response accuracy by +67% keyword / +78% LLM-judge on contamination-free novel-fact benchmarks with Claude Opus 4.6.
The improvement holds across model scales — both small (Mistral 7B) and frontier (Claude Opus 4.6) models benefit substantially.
CMM is the first LLM memory system to implement spreading activation, entity linking, priming, and metamemory in combination.
- Spreading activation — dual-path: FAISS embedding neighbors + spaCy entity linking. Discovers cross-domain connections that flat retrieval misses.
- Priming — recently activated memories boost related memories for subsequent turns.
- Metamemory — confidence levels (HIGH/MODERATE/LOW/NONE) and "tip of the tongue" partial-match hints.
- Grace-period temporal decay — no decay for 2 weeks, then frequency-dependent exponential decay. Frequently accessed memories lock permanently.
- Importance scoring — corrections and instructions get 2x importance; routine exchanges get 0.5x. Importance scoring turned a 0.12-similarity allergy memory into a life-saving recall.
- Emotional valence — each memory tagged with valence, arousal, and emotion label for empathetic recall.
- Episodic → semantic consolidation — clusters similar episodic memories into general knowledge over time.
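The grace-period decay curve can be sketched as follows. The 14-day window, lock threshold, and decay rate here are illustrative constants, not CMM's actual parameters:

```python
import math

GRACE_DAYS = 14      # no decay inside this window (illustrative)
LOCK_ACCESSES = 10   # frequently accessed memories stop decaying (illustrative)
BASE_RATE = 0.05     # per-day decay rate, softened by rehearsal (illustrative)

def decay(age_days: float, access_count: int) -> float:
    """Return a retention multiplier in (0, 1]."""
    if age_days <= GRACE_DAYS or access_count >= LOCK_ACCESSES:
        return 1.0  # grace period, or permanently locked by rehearsal
    # Frequency-dependent exponential decay: more accesses -> slower decay
    rate = BASE_RATE / (1 + access_count)
    return math.exp(-rate * (age_days - GRACE_DAYS))

print(decay(7, 0))    # → 1.0 (inside the grace period)
print(decay(30, 0))   # decays
print(decay(30, 12))  # → 1.0 (locked by frequent access)
```

The key property is that rehearsal flattens the curve: the same 30-day-old memory retains more the more often it has been recalled.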
In a test with 1,200 city case files and 5 hidden investigation chains, entity-linked spreading activation found 2.4x more connections than flat retrieval.
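The dual-path idea behind that result can be illustrated with a toy graph: seed memories activate neighbors through embedding similarity *and* through shared named entities, which is what lets a chain hop across domains. All data below is invented for illustration; this is not CMM's implementation:

```python
# Toy dual-path spreading activation (embedding neighbors + entity links).
memories = {
    "m1": {"text": "Warehouse 7 inspection flags chemical drums", "entities": {"Warehouse 7"}},
    "m2": {"text": "Hospital report: respiratory cases near Dockside", "entities": {"Dockside"}},
    "m3": {"text": "Shipping records route Warehouse 7 cargo via Dockside", "entities": {"Warehouse 7", "Dockside"}},
    "m4": {"text": "Environmental data shows solvent traces at Dockside", "entities": {"Dockside"}},
}
embedding_neighbors = {"m1": ["m3"], "m3": ["m1"], "m2": ["m4"], "m4": ["m2"]}

def spread(seed, hops=2):
    active, frontier = {seed}, {seed}
    for _ in range(hops):
        nxt = set()
        for m in frontier:
            nxt.update(embedding_neighbors.get(m, []))       # path 1: embedding neighbors
            for other, rec in memories.items():              # path 2: entity links
                if other != m and memories[m]["entities"] & rec["entities"]:
                    nxt.add(other)
        frontier = nxt - active
        active |= frontier
    return active

print(sorted(spread("m1")))  # → ['m1', 'm2', 'm3', 'm4']
```

Flat retrieval from the warehouse seed would stop at m3; the shared "Dockside" entity is what pulls in the hospital report and environmental data.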
Four narrative demos showcase the cognitive features with Claude Opus 4.6:
| Demo | What it shows | Run it |
|---|---|---|
| The Handoff | 3 agents disagree on an API rate limit. System surfaces contradiction with agent attribution. | python -m demos.handoff --backend anthropic |
| Tip of the Tongue | Metamemory detects a partial match, surfaces a hint, priming enables full recall on follow-up. | python -m demos.tip_of_tongue --backend anthropic |
| Six Weeks with Alex | 6-week personal assistant: allergy recall, emotional empathy, correction handling, consolidation. | python -m demos.six_weeks --backend anthropic |
| The Investigator | 1,200 case files. Spreading activation chains warehouse inspection → hospital report → shipping records → environmental data. | python -m demos.investigator --backend anthropic |
```shell
# From PyPI
pip install cognitive-memory-model

# With optional integrations (quoted so the brackets survive zsh)
pip install "cognitive-memory-model[anthropic]"  # Anthropic API support
pip install "cognitive-memory-model[openai]"     # OpenAI API support
pip install "cognitive-memory-model[mcp]"        # MCP server support
pip install "cognitive-memory-model[all]"        # Everything

# Download the spaCy model for entity linking
python -m spacy download en_core_web_sm
```

From source (for development):
```shell
git clone https://github.com/SyntheticCognitionLabs/cognitive-memory-model.git
cd cognitive-memory-model
python3 -m venv .venv
source .venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/cu126  # CUDA 12.6
pip install -e ".[dev]"
python -m spacy download en_core_web_sm
```

```python
from cmm.pipeline.conversation import CognitiveMemoryPipeline

pipeline = CognitiveMemoryPipeline()

# Ingest conversation turns — both sides are captured
pipeline.ingest("user", "I'm allergic to peanuts. I carry an EpiPen.")
pipeline.ingest("assistant", "Noted, severe peanut allergy.")
pipeline.ingest("user", "My project deadline is April 15th.")

# Later... the memory system automatically recalls relevant information
results = pipeline.recall("I'm ordering food for the team lunch")
print(pipeline.format_recalled(results))
# → [Recalled from memory...] I'm allergic to peanuts. I carry an EpiPen.

# Save to disk — memories persist across restarts
pipeline.save("./my_memory")

# Load and continue tomorrow
pipeline = CognitiveMemoryPipeline.load("./my_memory")
```

Every integration supports true autoassociative memory except MCP tools.
| Integration | How | Autoassociative? | Language |
|---|---|---|---|
| HTTP Memory Server | REST API on localhost or network | Yes | Any |
| Claude Code Hooks | UserPromptSubmit + Stop hooks | Yes | Any |
| Python Middleware | Wraps OpenAI/Anthropic API calls | Yes | Python |
| MCP Server | memory_recall/store tools | Semi (tool-based) | Any MCP client |
| Direct Library | CognitiveMemoryPipeline API | Yes | Python |
Autoassociative means the memory system passively monitors all conversation turns and automatically surfaces relevant memories. No one decides to "look something up." Every path except MCP achieves this.
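The distinction can be sketched with a toy wrapper in which every turn is both stored and used as a recall cue, with no explicit tool call. Keyword overlap stands in for embedding similarity here; none of this is CMM's code:

```python
def _words(s):
    """Naive tokenizer: lowercase, strip basic punctuation."""
    return set(s.lower().replace(",", " ").replace(".", " ").split())

class ToyAutoassociativeMemory:
    """Passively ingests every turn; every turn also acts as a recall cue."""
    def __init__(self):
        self.store = []

    def on_turn(self, role, text):
        # Recall is automatic: no one decides to "look something up"
        recalled = [m for m in self.store if _words(text) & _words(m)]
        self.store.append(text)
        return recalled

mem = ToyAutoassociativeMemory()
mem.on_turn("user", "I'm allergic to peanuts.")
mem.on_turn("assistant", "Noted, severe peanut allergy.")
print(mem.on_turn("user", "Ordering peanuts for the team lunch"))
# → ["I'm allergic to peanuts."]
```

A tool-based (semi-autoassociative) path, like MCP, would instead require the agent to call a memory_recall tool before the lunch order for the allergy to surface.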
```python
from integrations.middleware import MemoryMiddleware

mw = MemoryMiddleware(api_type="anthropic")
response = mw.chat("I'm allergic to peanuts.")
response = mw.chat("Order food for the team lunch.")
# ^ Automatically recalls the peanut allergy
```

```shell
# Start the memory server
python -m integrations.claude-code.memory_server --data-dir ./memory

# Add hooks to Claude Code settings — see integrations/claude-code/README.md
# That's it. Memory happens automatically on every message.
```

Dozens of agents can share a single memory server. Each agent tags memories with its agent_id. Nightly consolidation merges, deduplicates, and detects contradictions.

```shell
# Central server (accessible to all agents)
python -m integrations.claude-code.memory_server \
    --host 0.0.0.0 --data-dir /shared/memory --auto-save 300
```

See integrations/README.md for full setup instructions, CRUD endpoints, and production deployment guidance.
```
Conversation stream (user ↔ agent turns, reasoning steps)
         │
         ▼
┌─────────────────┐
│  Gist Encoder   │ LLM or small model: turn → compressed summary + tags
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Embedding Model │ gist text → 768D dense vector (all-mpnet-base-v2)
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────┐
│   FAISS-backed Memory Store     │ O(1) similarity search
│   + Entity Index (spaCy NER)    │ Named entity → memory linkage
└────────┬────────────────────────┘
         │
         ▼ (on each new turn)
┌─────────────────────────────────┐
│  Cognitive Retrieval Pipeline   │
│  1. FAISS similarity search     │
│  2. Temporal decay + rehearsal  │
│  3. Importance weighting        │
│  4. Priming boost               │
│  5. Spreading activation        │
│     (embedding + entity links)  │
│  6. Working memory merge        │
│  7. Metamemory confidence       │
└────────┬────────────────────────┘
         │
         ▼
Inject recalled memories into LLM context
(clearly marked as "from memory, not user input")
```
```
final_score = similarity × decay(age, access_count) × importance × priming_boost
```
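Assuming each factor is a multiplier around 1.0 (the actual ranges and constants are not specified here), the scoring line composes as:

```python
import math

def final_score(similarity, age_days, access_count,
                importance=1.0, priming_boost=1.0,
                grace_days=14, base_rate=0.05):
    """Compose the retrieval score; constants are illustrative, not CMM's."""
    if age_days <= grace_days:
        decay = 1.0  # grace period: no decay
    else:
        # frequency-dependent exponential decay after the grace period
        decay = math.exp(-(base_rate / (1 + access_count)) * (age_days - grace_days))
    return similarity * decay * importance * priming_boost

# A weak 0.12-similarity match can still rank high with 2x importance
# and a priming boost, as in the allergy-recall example above.
print(final_score(0.12, age_days=3, access_count=1, importance=2.0, priming_boost=1.5))
```

Because the factors multiply, importance and priming rescue low-similarity matches, while decay alone can never zero out a recently rehearsed memory.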
```shell
pytest                                            # all tests
pytest --ignore=tests/test_ollama_encoder.py \
       --ignore=tests/test_phase4_integration.py  # fast (no Ollama needed)
pytest -v -k "zebra"                              # the zebra test
```

MIT
- CLAUDE.md — Architecture and development guide
- ROADMAP.md — Future directions and contribution areas
- docs/DEVELOPMENT_PLAN.md — Phased implementation roadmap (all complete)
- docs/BENCHMARK_PLAN.md — Evaluation methodology and results
- integrations/README.md — Integration guide and production deployment
- docs/FAISS-SDM.md — FAISS IVF for O(1) content-addressable memory




