Self-organizing agent memory. No embeddings, no vector DB.
MemFabric is a memory system where the LLM organizes its own knowledge as plain markdown files with descriptive filenames. There is no vector database, no embedding pipeline, no retrieval model. The LLM reads a list of filenames, decides what's relevant, and reads those files. It stores new information by creating, appending to, or reorganizing files: merging, splitting, renaming, and synthesizing as its knowledge grows.
The server just stores and retrieves files. All intelligence (what to remember, how to organize, what's relevant) comes from the connected LLM.
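Since all the server does is store and retrieve files, the storage layer can be sketched in a few lines. This is illustrative only, not the actual server code: the function names mirror the MCP tools, and the `./data` directory stands in for `MEMFABRIC_DATA_DIR`.

```python
from pathlib import Path

DATA_DIR = Path("./data")  # assumption: stands in for MEMFABRIC_DATA_DIR

def remember(filename: str, content: str) -> None:
    """Append an entry to a memory file, creating it if needed."""
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    path = DATA_DIR / f"{filename}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(content.rstrip() + "\n")

def list_memories() -> list[dict]:
    """Return filename and size metadata for every memory file."""
    return [
        {"filename": p.stem, "size_bytes": p.stat().st_size}
        for p in sorted(DATA_DIR.glob("*.md"))
    ]

def read_memory(filename: str) -> str:
    """Return the full contents of one memory file."""
    return (DATA_DIR / f"{filename}.md").read_text(encoding="utf-8")
```

Everything else (deciding what to store, which files to read, when to reorganize) happens on the LLM side via tool calls.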
After a few weeks of conversations, your AI builds up memory like this:
```
memory/
  about-me.md            0.8 KB
  family-and-friends.md  1.4 KB
  food-and-cooking.md    1.1 KB
  health.md              0.6 KB
  travel.md              2.3 KB
```
Then in a new conversation:
```
You: My sister is visiting this weekend. What should I cook?

AI:  [reads family-and-friends.md — sister is Sarah, loves Italian]
     [reads health.md — you're lactose intolerant]

     Since Sarah loves Italian and you're dairy-free, how about
     a lemon pasta with roasted garlic and cherry tomatoes? You
     made something similar last month and liked it.

You: Perfect, let's do that.

AI:  [writes to food-and-cooking.md]

     Noted! I saved it so I'll remember for next time.
```
Two reads, one write. The AI connected your sister's preferences with your dietary needs without being reminded of either. No setup, no schema, no retrieval config. This is a small example; in practice you'll have 20-30 files or more, and the AI reorganizes them over time, merging, splitting, and rewriting to keep things clean.
- Personal memory for AI chatbots - give Claude, ChatGPT, or Gemini persistent memory that works across conversations and follows you between providers
- Memory for OpenClaw - equip open-source computer-use agents with long-term context about the user and their environment
- Multi-agent shared memory - multiple agents read and write to the same MemFabric instance, using filenames as a shared namespace
- Coding assistants - tools like Claude Code can remember codebase decisions, architecture context, and debugging history across sessions
Most memory systems for LLMs use embeddings and vector search to find relevant memories. This adds infrastructure, introduces retrieval errors, and creates a dependency on embedding quality. But LLMs are already excellent at judging relevance from natural language descriptions. That's literally what they do.
MemFabric exploits this: the filename is the retrieval mechanism. A file called audrey-career-and-promotions.md tells the LLM everything it needs to know about whether to read it. No embedding needed.
The LLM also manages file organization. When files get too large or topics overlap, the LLM merges, splits, or synthesizes them, just as a human would reorganize their notes. The memory structure therefore improves over time, adapting to the information actually being stored rather than following a fixed schema.
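To make "the filename is the retrieval mechanism" concrete, here is roughly what the model scans when deciding which files to read. A sketch only: in the real system the filenames surface through the `list_memories()` tool rather than an explicit prompt like this.

```python
def relevance_prompt(filenames: list[str], user_message: str) -> str:
    """Render the memory listing an LLM scans to pick relevant files."""
    listing = "\n".join(f"- {name}.md" for name in filenames)
    return (
        "Memory files:\n"
        f"{listing}\n\n"
        f"User message: {user_message}\n"
        "Which files should be read before answering? Name them."
    )

prompt = relevance_prompt(
    ["about-me", "family-and-friends", "health"],
    "My sister is visiting this weekend. What should I cook?",
)
```

A descriptive kebab-case name like `family-and-friends.md` carries enough signal that the model can judge relevance without any embedding lookup.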
Because all intelligence lives in the LLM, MemFabric gets better automatically as models improve. I tested 5 different models on the same memory task. The pattern is clear:
| Model | J-Score | Files created | Notes |
|---|---|---|---|
| Haiku 4.5 | 78.3% | 15 | Too few files, topics lumped together |
| DeepSeek V3 | 79.6% | 34 | Over-fragmented, too many small files |
| MiniMax M2.5 | 82.2% | 18 | Good balance |
| Claude Sonnet | 83.6% | 23 | Most balanced, best temporal reasoning |
| MiMo-V2-Pro | 84.2% | 27 | Best overall, strongest multi-hop |
More capable models produce better-organized memory files: they create the right number of files, choose more descriptive names, and capture more nuanced information. The gap between the weakest and strongest model is 5.9pp, with no changes to the server or tools.
This is the key advantage over embedding-based approaches: vector databases don't get better when models improve. MemFabric does. Every future model improvement translates directly into better memory organization and retrieval.
I evaluated MemFabric on LoCoMo (Long Conversational Memory), a benchmark of 10 multi-session conversations with 1,540 questions testing single-hop recall, multi-hop reasoning, temporal reasoning, and open-domain knowledge.
All results below use the same evaluation protocol (gpt-4o-mini as query/judge model, binary J-score) for fair comparison.
| # | System | J-Score | Infrastructure |
|---|---|---|---|
| 1 | MemMachine v0.1 | 84.9% | Vector DB + reranker |
| 2 | MemFabric | 82.6% | Markdown files |
| 3 | Engram | 80.0% | SQLite |
| 4 | Baseline (full context) | 77.2% | Full conversation in LLM context |
| 5 | Memobase | 75.8% | Database |
| 6 | Zep | 75.1% | Database |
| 7 | Mem0 Graph | 68.4% | Qdrant + graph store |
| 8 | Mem0 | 66.9% | Qdrant vector DB |
| 9 | LangMem | 58.1% | LangChain |
| 10 | OpenAI Memory | 52.9% | Proprietary |
Note: newer systems (SmartSearch, EverMemOS, MemMachine v0.2) report 91-93% using gpt-4.1-mini as both answer and judge model (vs. gpt-4o-mini above). Under that protocol, even the full-context baseline jumps from 77% to 91%, so these numbers are not directly comparable.
| Category | MemFabric | Full-context baseline | Delta (pp) |
|---|---|---|---|
| Overall | 82.6% | 77.2% | +5.4 |
| Open-domain | 86.7% | 90.1% | -3.4 |
| Single-hop | 80.5% | 76.2% | +4.3 |
| Multi-hop | 78.2% | 50.2% | +28.0 |
| Temporal | 67.7% | 57.3% | +10.4 |
Multi-hop reasoning (connecting facts scattered across different conversation sessions) is where organized memory files dominate. The full-context baseline has every detail available but struggles to connect information spread across 20K tokens of conversation. MemFabric's organized files co-locate related information, making cross-session reasoning dramatically easier.
The open-domain gap (-3.4pp) is the inherent cost of summarization: any fact not captured during ingest is a potential miss.
The ingest model matters most. Swapping the query model (GPT-4o-mini vs GPT-5.4 Mini) improved scores by 0.2pp. Swapping the ingest model (Haiku to MiMo) improved scores by 3.9pp. The bottleneck is memory quality during extraction, not retrieval intelligence.
File organization is retrieval. The sweet spot is 20-27 files per conversation, each covering one topic for one person. Too few files (15) lumps topics together and hurts retrieval precision. Too many files (34) overwhelms the query model with choices.
Cost: ~$8 for the full benchmark. Ingest ~$5 (MiMo via OpenRouter), query + judge ~$3 (GPT-4o-mini). Total time: 77 minutes.
Full experiment history and analysis: benchmarks/locomo/RESULTS.md
The server exposes 9 tools via the Model Context Protocol. Each tool description tells the LLM when and how to use it:
| Tool | Description |
|---|---|
| `remember(filename, content)` | Store a fact, preference, decision, or any information worth keeping across conversations. Always call `list_memories()` first to check for existing files. Uses semantic kebab-case filenames (e.g. "career-goals", "favorite-papers"). |
| `list_memories()` | List all memory files with metadata. Call at the start of every conversation and before `remember()`. Returns filenames, entry counts, last updated dates, and sizes. Scan filenames to decide which to read. |
| `read_memory(filename)` | Read the full contents of a memory file. Call after `list_memories()` to read files relevant to the current conversation. Always read before responding so answers reflect what you know about the user. |
| `read_all_memories()` | Dump the full contents of every memory file. Only call when the user explicitly asks (e.g. "show me everything you remember"). For normal use, prefer `list_memories()` + `read_memory()`. |
| `update_memory(filename, content)` | Replace the entire contents of a memory file. Use when information is outdated ("I moved to Berlin", "I quit that job") or when a file needs rewriting for clarity. Always call `read_memory()` first. |
| `reorganize(operations)` | Restructure memory by merging, splitting, synthesizing, or renaming files. Call when `get_status()` shows many files since the last reorganize, or when files overlap. Read files before reorganizing. |
| `get_rules()` | Read memory rules and onboarding status. Call at the start of every conversation. If `is_onboarded` is false, call `start()`. Rules define what to store, naming conventions, and language. |
| `edit_rules(content)` | Update memory behavior rules. Use when the user wants to change what gets stored, naming conventions, or language (e.g. "stop storing health info", "write entries in Spanish"). |
| `get_status()` | Check memory system health and whether reorganization is needed. Returns total files, files since last reorganize, largest files, and oldest untouched files. |
| `start()` | Begin user onboarding. Starts a conversation to learn the user's name, location, work, and preferences, then stores their profile. |
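Server-side, a tool like `get_status()` reduces to cheap filesystem checks. A sketch assuming a local `./data` directory; the threshold of 10 files since the last reorganize is a hypothetical constant, not a documented MemFabric value.

```python
from pathlib import Path

DATA_DIR = Path("./data")
REORG_THRESHOLD = 10  # assumption: illustrative, not a documented constant

def get_status(files_since_reorganize: int) -> dict:
    """Summarize memory health, mirroring the get_status tool's fields."""
    files = sorted(DATA_DIR.glob("*.md"),
                   key=lambda p: p.stat().st_size, reverse=True)
    return {
        "total_files": len(files),
        "files_since_reorganize": files_since_reorganize,
        "needs_reorganize": files_since_reorganize >= REORG_THRESHOLD,
        "largest_files": [p.name for p in files[:3]],
    }
```

The LLM reads this status and decides for itself whether to call `reorganize()`; the server never reorganizes anything on its own.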
The fastest way to try MemFabric is to run it locally as an MCP server. In this mode, only the AI client on your machine will have access to your memory. It won't be shared across devices or other AI tools.
```shell
cd mcp
uv sync
MEMFABRIC_DATA_DIR=./data uv run python server.py
# Server starts at http://localhost:8000, MCP endpoint at /mcp/
```

Then connect your local AI client to `http://localhost:8000/mcp/`. For example, in Claude Desktop, add this to your MCP config:
```json
{
  "mcpServers": {
    "memfabric": {
      "url": "http://localhost:8000/mcp/"
    }
  }
}
```

To share memory across Claude.ai, ChatGPT, and any other MCP client, deploy to a hosted server:
1. Push this repo to GitHub
2. Create a new project on Railway
3. Connect your GitHub repo and point it to the `mcp/` directory
4. Add a persistent volume mounted at `/data`
5. Set environment variables:
   - `MEMFABRIC_TOKEN`: secret token for auth (generate one with `openssl rand -hex 32`)
   - `MEMFABRIC_SERVER_URL`: your Railway public URL
6. Deploy
Claude.ai: Settings > Connectors > Add custom connector > enter https://your-app.up.railway.app/mcp/. OAuth flow handles auth automatically.
Tip: Name the connector something like "Memory — remembers everything across conversations". Claude sees this name in every conversation, so a descriptive name increases how often it actually uses the memory tools. You can also add a line to your Claude.ai custom instructions (Settings > Profile) like: "You have persistent memory via the Memory connector. Use it to store important facts about me and recall them in future conversations."
ChatGPT: Settings > Apps > Advanced Settings > Developer Mode > Create app > enter your MCP server URL. ChatGPT uses OAuth for MCP authentication.
Any MCP client: Point it at https://your-app.up.railway.app/mcp/ with OAuth or a Bearer token header (MEMFABRIC_TOKEN).
MIT