Self-organizing agent memory. No embeddings, no vector DB.
MemFabric is a memory system where the LLM organizes its own knowledge as plain markdown files with descriptive filenames. There is no vector database, no embedding pipeline, no retrieval model. The LLM reads a list of filenames, decides what's relevant, and reads those files. It stores new information by creating, appending to, or reorganizing files: merging, splitting, renaming, and synthesizing as its knowledge grows.
The server just stores and retrieves files. All intelligence (what to remember, how to organize, what's relevant) comes from the connected LLM.
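Since all the server does is store and retrieve files, the storage layer can be sketched in a few lines. This is illustrative only, not the actual server code: the function names mirror the MCP tools, and the `./data` directory stands in for `MEMFABRIC_DATA_DIR`.

```python
from pathlib import Path

DATA_DIR = Path("./data")  # assumption: stands in for MEMFABRIC_DATA_DIR

def remember(filename: str, content: str) -> None:
    """Append an entry to a memory file, creating it if needed."""
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    path = DATA_DIR / f"{filename}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(content.rstrip() + "\n")

def list_memories() -> list[dict]:
    """Return filename and size metadata for every memory file."""
    return [
        {"filename": p.stem, "size_bytes": p.stat().st_size}
        for p in sorted(DATA_DIR.glob("*.md"))
    ]

def read_memory(filename: str) -> str:
    """Return the full contents of one memory file."""
    return (DATA_DIR / f"{filename}.md").read_text(encoding="utf-8")
```

Everything else (deciding what to store, which files to read, when to reorganize) happens on the LLM side via tool calls.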
After a few weeks of conversations, your AI builds up memory like this:
```
memory/
  about-me.md            0.8 KB
  family-and-friends.md  1.4 KB
  food-and-cooking.md    1.1 KB
  health.md              0.6 KB
  travel.md              2.3 KB
```
Then in a new conversation:
```
You: My sister is visiting this weekend. What should I cook?

AI:  [reads family-and-friends.md — sister is Sarah, loves Italian]
     [reads health.md — you're lactose intolerant]

     Since Sarah loves Italian and you're dairy-free, how about
     a lemon pasta with roasted garlic and cherry tomatoes? You
     made something similar last month and liked it.

You: Perfect, let's do that.

AI:  [writes to food-and-cooking.md]

     Noted! I saved it so I'll remember for next time.
```
Two reads, one write. The AI connected your sister's preferences with your dietary needs without being reminded of either. No setup, no schema, no retrieval config. This is a small example; in practice you'll have 20-30 files or more, and the AI reorganizes them over time, merging, splitting, and rewriting to keep things clean.
- Personal memory for AI chatbots - give Claude, ChatGPT, or Gemini persistent memory that works across conversations and follows you between providers
- Memory for OpenClaw - equip open-source computer-use agents with long-term context about the user and their environment
- Multi-agent shared memory - multiple agents read and write to the same MemFabric instance, using filenames as a shared namespace
- Coding assistants - tools like Claude Code can remember codebase decisions, architecture context, and debugging history across sessions
Most memory systems for LLMs use embeddings and vector search to find relevant memories. This adds infrastructure, introduces retrieval errors, and creates a dependency on embedding quality. But LLMs are already excellent at judging relevance from natural language descriptions. That's literally what they do.
MemFabric exploits this: the filename is the retrieval mechanism. A file called audrey-career-and-promotions.md tells the LLM everything it needs to know about whether to read it. No embedding needed.
The LLM also manages file organization. When files get too large or topics overlap, the LLM merges, splits, or synthesizes them, just as a human would reorganize their notes. The memory structure therefore improves over time, adapting to the information actually being stored rather than following a fixed schema.
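To make "the filename is the retrieval mechanism" concrete, here is roughly what the model scans when deciding which files to read. A sketch only: in the real system the filenames surface through the `list_memories()` tool rather than an explicit prompt like this.

```python
def relevance_prompt(filenames: list[str], user_message: str) -> str:
    """Render the memory listing an LLM scans to pick relevant files."""
    listing = "\n".join(f"- {name}.md" for name in filenames)
    return (
        "Memory files:\n"
        f"{listing}\n\n"
        f"User message: {user_message}\n"
        "Which files should be read before answering? Name them."
    )

prompt = relevance_prompt(
    ["about-me", "family-and-friends", "health"],
    "My sister is visiting this weekend. What should I cook?",
)
```

A descriptive kebab-case name like `family-and-friends.md` carries enough signal that the model can judge relevance without any embedding lookup.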
Because all intelligence lives in the LLM, MemFabric gets better automatically as models improve. I tested 5 different models on the same memory task. The pattern is clear:
| Model | J-Score | Files created | Notes |
|---|---|---|---|
| Haiku 4.5 | 78.3% | 15 | Too few files, topics lumped together |
| DeepSeek V3 | 79.6% | 34 | Over-fragmented, too many small files |
| MiniMax M2.5 | 82.2% | 18 | Good balance |
| Claude Sonnet | 83.6% | 23 | Most balanced, best temporal reasoning |
| MiMo-V2-Pro | 84.2% | 27 | Best overall, strongest multi-hop |
More capable models produce better-organized memory files: they create the right number of files, choose more descriptive names, and capture more nuanced information. The gap between the weakest and strongest model is 5.9pp, with no changes to the server or tools.
This is the key advantage over embedding-based approaches: vector databases don't get better when models improve. MemFabric does. Every future model improvement translates directly into better memory organization and retrieval.
I evaluated MemFabric on LoCoMo (Long Conversational Memory), a benchmark of 10 multi-session conversations with 1,540 questions testing single-hop recall, multi-hop reasoning, temporal reasoning, and open-domain knowledge.
All results below use the same evaluation protocol (gpt-4o-mini as query/judge model, binary J-score) for fair comparison.
| # | System | J-Score | Infrastructure |
|---|---|---|---|
| 1 | MemMachine v0.1 | 84.9% | Vector DB + reranker |
| 2 | MemFabric | 82.6% | Markdown files |
| 3 | Engram | 80.0% | SQLite |
| 4 | Baseline (full context) | 77.2% | Full conversation in LLM context |
| 5 | Memobase | 75.8% | Database |
| 6 | Zep | 75.1% | Database |
| 7 | Mem0 Graph | 68.4% | Qdrant + graph store |
| 8 | Mem0 | 66.9% | Qdrant vector DB |
| 9 | LangMem | 58.1% | LangChain |
| 10 | OpenAI Memory | 52.9% | Proprietary |
Note: newer systems (SmartSearch, EverMemOS, MemMachine v0.2) report 91-93% using gpt-4.1-mini as both answer and judge model (vs. gpt-4o-mini above). Under that protocol, even the full-context baseline jumps from 77% to 91%, so these numbers are not directly comparable.
| Category | MemFabric | Full-context baseline | Delta (pp) |
|---|---|---|---|
| Overall | 82.6% | 77.2% | +5.4 |
| Open-domain | 86.7% | 90.1% | -3.4 |
| Single-hop | 80.5% | 76.2% | +4.3 |
| Multi-hop | 78.2% | 50.2% | +28.0 |
| Temporal | 67.7% | 57.3% | +10.4 |
Multi-hop reasoning (connecting facts scattered across different conversation sessions) is where organized memory files dominate. The full-context baseline has every detail available but struggles to connect information spread across 20K tokens of conversation. MemFabric's organized files co-locate related information, making cross-session reasoning dramatically easier.
The open-domain gap (-3.4pp) is the inherent cost of summarization: any fact not captured during ingest is a potential miss.
The ingest model matters most. Swapping the query model (GPT-4o-mini vs GPT-5.4 Mini) improved scores by 0.2pp. Swapping the ingest model (Haiku to MiMo) improved scores by 3.9pp. The bottleneck is memory quality during extraction, not retrieval intelligence.
File organization is retrieval. The sweet spot is 20-27 files per conversation, each covering one topic for one person. Too few files (15) lumps topics together and hurts retrieval precision. Too many files (34) overwhelms the query model with choices.
Cost: ~$8 for the full benchmark. Ingest ~$5 (MiMo via OpenRouter), query + judge ~$3 (GPT-4o-mini). Total time: 77 minutes.
Full experiment history and analysis: benchmarks/locomo/RESULTS.md
The server exposes 9 tools via the Model Context Protocol. Each tool description tells the LLM when and how to use it:
| Tool | Description |
|---|---|
| `remember(filename, content)` | Store a fact, preference, decision, or any information worth keeping across conversations. Always call `list_memories()` first to check for existing files. Uses semantic kebab-case filenames (e.g. "career-goals", "favorite-papers"). |
| `list_memories()` | List all memory files with metadata. Call at the start of every conversation and before `remember()`. Returns filenames, entry counts, last updated dates, and sizes. Scan filenames to decide which to read. |
| `read_memory(filename)` | Read the full contents of a memory file. Call after `list_memories()` to read files relevant to the current conversation. Always read before responding so answers reflect what you know about the user. |
| `read_all_memories()` | Dump the full contents of every memory file. Only call when the user explicitly asks (e.g. "show me everything you remember"). For normal use, prefer `list_memories()` + `read_memory()`. |
| `update_memory(filename, content)` | Replace the entire contents of a memory file. Use when information is outdated ("I moved to Berlin", "I quit that job") or when a file needs rewriting for clarity. Always call `read_memory()` first. |
| `reorganize(operations)` | Restructure memory by merging, splitting, synthesizing, or renaming files. Call when `get_status()` shows many files since the last reorganize, or when files overlap. Read files before reorganizing. |
| `get_rules()` | Read memory rules and onboarding status. Call at the start of every conversation. If `is_onboarded` is false, call `start()`. Rules define what to store, naming conventions, and language. |
| `edit_rules(content)` | Update memory behavior rules. Use when the user wants to change what gets stored, naming conventions, or language (e.g. "stop storing health info", "write entries in Spanish"). |
| `get_status()` | Check memory system health and whether reorganization is needed. Returns total files, files since last reorganize, largest files, and oldest untouched files. |
| `start()` | Begin user onboarding. Starts a conversation to learn the user's name, location, work, and preferences, then stores their profile. |
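Server-side, a tool like `get_status()` reduces to cheap filesystem checks. A sketch assuming a local `./data` directory; the threshold of 10 files since the last reorganize is a hypothetical constant, not a documented MemFabric value.

```python
from pathlib import Path

DATA_DIR = Path("./data")
REORG_THRESHOLD = 10  # assumption: illustrative, not a documented constant

def get_status(files_since_reorganize: int) -> dict:
    """Summarize memory health, mirroring the get_status tool's fields."""
    files = sorted(DATA_DIR.glob("*.md"),
                   key=lambda p: p.stat().st_size, reverse=True)
    return {
        "total_files": len(files),
        "files_since_reorganize": files_since_reorganize,
        "needs_reorganize": files_since_reorganize >= REORG_THRESHOLD,
        "largest_files": [p.name for p in files[:3]],
    }
```

The LLM reads this status and decides for itself whether to call `reorganize()`; the server never reorganizes anything on its own.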
The fastest way to try MemFabric is to run it locally as an MCP server. In this mode, only the AI client on your machine will have access to your memory. It won't be shared across devices or other AI tools.
```shell
cd mcp
uv sync
MEMFABRIC_DATA_DIR=./data uv run python server.py
# Server starts at http://localhost:8000, MCP endpoint at /mcp/
```

Then connect your local AI client to `http://localhost:8000/mcp/`. For example, in Claude Desktop, add this to your MCP config:
```json
{
  "mcpServers": {
    "memfabric": {
      "url": "http://localhost:8000/mcp/"
    }
  }
}
```

To share memory across Claude.ai, ChatGPT, and any other MCP client, deploy to a hosted server:
1. Push this repo to GitHub
2. Create a new project on Railway
3. Connect your GitHub repo and point it to the `mcp/` directory
4. Add a persistent volume mounted at `/data`
5. Set environment variables:
   - `MEMFABRIC_TOKEN`: secret token for auth (generate one with `openssl rand -hex 32`)
   - `MEMFABRIC_SERVER_URL`: your Railway public URL
6. Deploy
Claude.ai: Settings > Connectors > Add custom connector > enter https://your-app.up.railway.app/mcp/. OAuth flow handles auth automatically.
Tip: Name the connector something like "Memory — remembers everything across conversations". Claude sees this name in every conversation, so a descriptive name increases how often it actually uses the memory tools. You can also add a line to your Claude.ai custom instructions (Settings > Profile) like: "You have persistent memory via the Memory connector. Use it to store important facts about me and recall them in future conversations."
ChatGPT: Settings > Apps > Advanced Settings > Developer Mode > Create app > enter your MCP server URL. ChatGPT uses OAuth for MCP authentication.
Any MCP client: Point it at https://your-app.up.railway.app/mcp/ with OAuth or a Bearer token header (MEMFABRIC_TOKEN).
MIT