The knowledge graph is FlowAI's compiled understanding of your project. It solves the fundamental cold-start problem in multi-agent AI workflows: every agent session normally begins with no memory of past sessions, forcing agents to re-read raw files to understand context — wasting tokens and time, and increasing hallucination risk.
Instead of re-deriving knowledge on every run, FlowAI builds and maintains a persistent knowledge graph that agents read as their primary navigation layer.
> **Important:** The graph is a first-class capability, not an optional add-on.
> `flowai start` enforces that a graph exists and prompts you to build one.
Machine graph output (JSON, index, incremental cache) lives under a wiki directory
(default .flowai/wiki/, alongside other FlowAI machine state). Override with:
- `graph.wiki_dir` in `.flowai/config.json` (repo-relative path, e.g. `docs/custom-graph` if you must), or
- `FLOWAI_GRAPH_WIKI_DIR` (absolute path or path relative to the project root).
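For instance, a hypothetical override pinning the wiki directory to the repo-relative path from the example above might look like this in `.flowai/config.json` (only `graph.wiki_dir` is shown; other keys are omitted):

```json
{
  "graph": {
    "wiki_dir": "docs/custom-graph"
  }
}
```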
<wiki_dir>/
├── graph.json ← Full graph: nodes, edges, provenance, metadata
├── index.md ← Content catalog: every wiki page with a one-line summary
├── log.md ← Append-only log of all graph operations
└── cache/ ← SHA256 per-file hashes for incremental builds
Additionally, FlowAI generates a human-readable GRAPH_REPORT.md outside that cache tree. It is placed under docs/GRAPH_REPORT.md when a docs/ directory exists, otherwise at the repository root. Configure explicitly with graph.report_path in .flowai/config.json, or override with FLOWAI_GRAPH_REPORT_PATH.
docs/
└── GRAPH_REPORT.md ← Start here: god nodes, communities, suggested queries
The architectural map every agent reads before touching any file. Contains:
- God nodes — the highest-degree hubs in the codebase (everything converges here)
- Community summaries — clusters of closely-related modules
- Architectural insights — key design decisions extracted from source and docs
- Suggested queries — pre-generated questions about the architecture
- Ambiguous relationships — flagged for human review
The full graph in JSON. Each edge has a provenance tag:
| Tag | Meaning |
|---|---|
| `EXTRACTED` | Directly present in source (imports, function calls, links). High confidence. |
| `INFERRED` | Reasonable inference from context. Comes with a confidence score (0.0–1.0). Treat as a hypothesis. |
| `AMBIGUOUS` | Uncertain. Flagged for review. Do not assume it is correct. |
FlowAI's graph engine runs a dual-pass extraction:
Extracts the objective, deterministic structure from your code:
- Source file inventory from configured `scan_paths` (`src/`, `docs/`, `specs/` by default)
- Function definitions and call relationships
- Module imports and dependencies
- Markdown cross-references and links
- JSON configuration key mapping
Cross-language support: The structural pass extracts definitions and imports from:
| Language | Extracted |
|---|---|
| Bash | source, function definitions, command -v |
| Python | import, from, class, def |
| TypeScript/JavaScript | import, export, require |
| Go | import, func, type |
| Markdown | [links](), headings |
| JSON | Top-level keys |
All results are tagged EXTRACTED (high confidence). No inference involved.
Result: .flowai/wiki/cache/structural.json
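The structural pass is deliberately simple. A rough, illustrative analogue of what such a grep-based extractor does for a Python file (this is not FlowAI's actual implementation):

```shell
# Demo input file
cat > /tmp/example.py <<'EOF'
import os
from pathlib import Path

def main():
    pass
EOF

# Imports become candidate dependency edges
grep -E '^(import|from) ' /tmp/example.py

# Definitions become candidate function/class nodes
grep -E '^(def|class) ' /tmp/example.py
```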
Off by default. Set "semantic_enabled": true under graph in .flowai/config.json
to run the LLM on changed files (adds API cost and latency).
The LLM reads each changed file and extracts:
- Key concepts and their relationships to other concepts
- Design rationale from comments and documentation
- Architectural decisions and their consequences
- Patterns and anti-patterns
Results are tagged EXTRACTED, INFERRED, or AMBIGUOUS with confidence scores.
Result: .flowai/wiki/cache/semantic/<file-hash>.json
Both passes are merged into graph.json. Nodes are deduplicated by ID; edges are deduplicated by {source, target, relation}.
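A minimal sketch of that dedup rule in jq, using assumed field names rather than the real schema:

```shell
# Merged output with duplicates (field names assumed for illustration)
cat > /tmp/merged.json <<'EOF'
{"nodes":[{"id":"src/a.sh"},{"id":"src/a.sh"},{"id":"src/b.sh"}],
 "edges":[
   {"source":"src/a.sh","target":"src/b.sh","relation":"sources"},
   {"source":"src/a.sh","target":"src/b.sh","relation":"sources"},
   {"source":"src/a.sh","target":"src/b.sh","relation":"references"}
 ]}
EOF

# Dedup nodes by id, edges by the {source, target, relation} triple
jq '{nodes: (.nodes | unique_by(.id)),
     edges: (.edges | unique_by([.source, .target, .relation]))}' /tmp/merged.json
```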
Community detection runs a two-layer algorithm (no external dependencies):
1. Centrality classification — degree-based:
   - God — ≥10 edges (architectural load-bearers)
   - Hub — 5–9 edges (well-connected modules)
   - Leaf — <5 edges (peripheral files)
2. Label propagation — a 5-iteration algorithm in jq that groups related modules into `community_id` clusters. Each node adopts the most common label among its neighbors, converging on natural module boundaries.
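The degree-based classification can be sketched in jq like this (the edge schema is assumed for illustration; thresholds as listed above):

```shell
# Small edge list (assumed schema)
cat > /tmp/edges.json <<'EOF'
[{"source":"core","target":"a"},{"source":"core","target":"b"},
 {"source":"core","target":"c"},{"source":"core","target":"d"},
 {"source":"core","target":"e"},{"source":"a","target":"b"}]
EOF

# Count each node's degree (in + out), then classify by threshold
jq '[.[] | .source, .target] | group_by(.) |
    map({id: .[0], degree: length,
         class: (if length >= 10 then "god"
                 elif length >= 5 then "hub"
                 else "leaf" end)})' /tmp/edges.json
```

Here `core` touches five edges, so it classifies as a hub; the remaining nodes are leaves.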
Before every merge, the previous graph.json is backed up with a timestamp (e.g., graph.json.20260410T143000). Old backups are pruned to the configured retention limit.
Interactive rollback — flowai graph rollback presents a version browser:
FlowAI Graph — Version History
# Date Nodes Edges Size
-- -------------------- ----- ----- ----
0 (current) 142 387 48K <- active
1 2026-04-10 18:45:12 138 380 46K
2 2026-04-09 22:15:33 125 350 42K
Select version to restore (1-2) [1]: 2
!! WARNING: This will:
- Restore graph.json to version #2 (20260409T221533)
- DELETE 1 newer version(s) permanently
- A pre-rollback safety copy will be saved
Are you sure? [y/N]:
The rollback:
- Shows all backups with metadata (date, node/edge count, file size)
- Lets you pick which version to restore (gum choose or plain read)
- Warns before deleting newer versions (red "DELETE" warning)
- Always saves a `.pre-rollback` safety copy before overwriting
Non-interactive mode for scripts and CI:
flowai graph rollback --latest   # restores most recent backup, no confirmation

Configure retention in .flowai/config.json:
{
"graph": {
"versions_to_keep": 10
}
}

Default: 5 versions.
# Build or rebuild the full graph
flowai graph build
# Force rebuild (ignores cache — reprocesses all files)
flowai graph build --force
# Incremental update (only changed files)
flowai graph update
# Mine git log → IMPLEMENTS edges + spec evolution[] (compiled project history)
flowai graph chronicle
# Ingest a document into the wiki
flowai graph ingest docs/ARCHITECTURE.md
# Query the wiki (answer is filed back as a wiki page)
flowai graph query "How does the skill resolution chain work?"
# Health-check: orphans, contradictions, stale claims
flowai graph lint
# Show graph health in the terminal
flowai graph status
# Read GRAPH_REPORT.md in the terminal pager
flowai graph report

Raw git history is expensive for agents to re-read every session. After `flowai graph chronicle`,
the graph stores a persistent, incremental summary:
- `evolution[]` on spec nodes — commits whose messages reference a spec ID (the same ID must appear in the spec's `feature_ids`, including YAML `id:` merged at build time).
- `IMPLEMENTS` edges — code files touched in those commits, linked to the spec node.
This matches the spirit of a maintained wiki: compile once, query many times. It is
language- and layout-agnostic; monorepos can widen graph.scan_paths to include each package root.
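A spec node enriched by `chronicle` might look roughly like this; beyond `evolution`, `feature_ids`, and the `IMPLEMENTS` relation named above, the field names are illustrative, not a guaranteed schema:

```json
{
  "id": "specs/auth.spec.md",
  "type": "spec",
  "feature_ids": ["FEAT-012"],
  "evolution": [
    { "commit": "a1b2c3d", "subject": "FEAT-012: add token refresh" }
  ]
}
```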
Every FlowAI agent (master, plan, tasks, impl, review) automatically receives the graph context in its system prompt when a graph exists. The injected block looks like:
--- [FLOWAI KNOWLEDGE GRAPH] ---
A compiled knowledge graph of this codebase is available...
Graph: .flowai/wiki/graph.json — 234 nodes · 891 edges · 3 communities · built 2h ago
Start: docs/GRAPH_REPORT.md
Index: .flowai/wiki/index.md
Navigation protocol:
1. Read docs/GRAPH_REPORT.md before searching any files
2. Use index.md to find the exact wiki page for any concept
...
---
This is platform-level behavior — every agent gets it regardless of which skills are assigned.
The graph-aware-navigation skill (bundled, assigned to all roles by default) teaches agents
the full navigation protocol: GRAPH_REPORT.md → index.md → wiki pages → graph.json → source files.
The raw compiled graph database in `.flowai/wiki/` (including `graph.json` and `cache/`) is git-ignored by default (handled via `.gitignore`). Never force-commit it: like `node_modules`, it rapidly accumulates large incremental JSON blobs that cause constant merge conflicts.
Instead, every time FlowAI builds the graph locally, it automatically compiles and exports a concise, human-readable markdown dashboard: docs/GRAPH_REPORT.md
To seamlessly share architectural visibility with your team:
- Leave `.flowai/` strictly ignored in version control.
- Commit `docs/GRAPH_REPORT.md` cleanly to your repository.
- Your human team leads can immediately review graph health, coverage, and god nodes right on GitHub, while any developer who checks out the repository can rebuild the raw JSON locally by running `flowai graph build`.
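The sharing workflow above, sketched end to end in a scratch repository (paths follow the defaults in this document):

```shell
# Scratch repo for the demo
repo=$(mktemp -d) && cd "$repo" && git init -q .
mkdir -p docs && echo '# Graph Report' > docs/GRAPH_REPORT.md

# Keep machine graph state out of version control; share only the report
echo '.flowai/' >> .gitignore
git add .gitignore docs/GRAPH_REPORT.md
git -c user.email=ci@example.com -c user.name=ci commit -qm "docs: share graph report"

# Verify machine state would be ignored
git check-ignore .flowai/wiki/graph.json
```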
- Incremental builds: Only files with changed SHA256 hashes are reprocessed
- Token efficiency: Agents navigate compiled artifacts instead of raw files
- No external dependencies: The structural pass (Pass 1) requires only bash + jq
- Graceful degradation: If the graph is missing, agents fall back to raw file reads
  (but `flowai start` will prompt you to build it)
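A minimal sketch of the SHA256 skip logic behind incremental builds (the cache layout here is an assumption, not FlowAI's actual format):

```shell
# Demo cache directory and input file
cache=/tmp/flowai-cache && mkdir -p "$cache"
f=/tmp/demo.sh && echo 'echo hi' > "$f"

hash_file() { sha256sum "$1" | awk '{print $1}'; }
# Cache key derived from the file path
key=$(echo "$f" | sha256sum | awk '{print $1}')

# Reprocess only when the stored hash differs from the current one
if [ "$(cat "$cache/$key" 2>/dev/null)" != "$(hash_file "$f")" ]; then
  echo "changed: $f (reprocess)"
  hash_file "$f" > "$cache/$key"
else
  echo "unchanged: $f (skip)"
fi
```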
The graph section in .flowai/config.json:
{
"graph": {
"enabled": true,
"scan_paths": ["src", "docs", "specs"],
"ignore_patterns": ["*.generated.*", "*.min.js", "*.min.css"],
"max_age_hours": 24,
"auto_build": false,
"versions_to_keep": 5
}
}

| Key | Default | Description |
|---|---|---|
| `enabled` | `true` | Whether the graph system is active |
| `scan_paths` | `["src", "docs", "specs"]` | Project-relative directories to scan |
| `ignore_patterns` | `[]` | Glob patterns to exclude from scanning |
| `max_age_hours` | `24` | Age threshold before the graph is considered stale |
| `auto_build` | `false` | Reserved for future CI integration |
| `versions_to_keep` | `5` | Number of `graph.json` backups to retain |
FlowAI uses Spec-Driven Development (SDD): specs are the authoritative source of intent before any code is written. The graph engine treats spec files differently from source files:
| Property | Source file node | Spec node |
|---|---|---|
| Node type | `file` | `spec` |
| Trust level | standard | HIGH |
| Edge type | `sources`, `defines`, `references` | `SPECIFIES` |
| Extra metadata | — | `feature_ids`, `criteria` |
| In `GRAPH_REPORT.md` | God Nodes section | Spec Coverage section |
FlowAI detects spec files by path and naming convention:
- By path: `specs/`, `.specify/`, `spec/`
- By name: `*.spec.md`, `requirements*.md`, `acceptance*.md`, `adr*.md`, `rfc*.md`, `user-story*.md`, `prd*.md`, `feature*.md`
- Feature IDs: patterns like `UC-XXX-NNN`, `FEAT-NNN`, `STORY-NNN`, `REQ-NNN`, `RFC-NNN`
- Acceptance criteria: headings starting with Acceptance, Given, When, Then, Must, Should, Shall
- SPECIFIES edges: every project-relative link from a spec to a source file or wiki page creates a `SPECIFIES` edge, making spec-to-code traceability machine-readable
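For illustration only, here is how IDs of those shapes can be pulled out of a spec file with grep (not FlowAI's actual extractor):

```shell
# Demo spec file containing a few feature IDs
cat > /tmp/demo.spec.md <<'EOF'
# FEAT-012: Token refresh
Implements REQ-034 and relates to RFC-007.
EOF

# Extract IDs matching the documented patterns
grep -Eo '(UC-[A-Z]+-[0-9]+|FEAT-[0-9]+|STORY-[0-9]+|REQ-[0-9]+|RFC-[0-9]+)' \
  /tmp/demo.spec.md | sort -u
```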
The Spec Coverage section in GRAPH_REPORT.md shows:
- How many spec documents exist
- How many `SPECIFIES` edges were found (spec → code traceability)
- Which specs have zero implementation edges (possible unimplemented features)
Run flowai graph lint to detect specs with no corresponding implementation and
code that has no spec coverage.
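One of those lint checks can be sketched in jq; the node and edge fields here are assumptions for illustration, not the real schema:

```shell
# Tiny graph: two specs, one of which has a SPECIFIES edge
cat > /tmp/lint-graph.json <<'EOF'
{"nodes":[{"id":"specs/a.spec.md","type":"spec"},
          {"id":"specs/b.spec.md","type":"spec"},
          {"id":"src/a.sh","type":"file"}],
 "edges":[{"source":"specs/a.spec.md","target":"src/a.sh","relation":"SPECIFIES"}]}
EOF

# Report specs with no SPECIFIES edge at all
jq -r '[.edges[] | select(.relation=="SPECIFIES").source] as $covered
       | .nodes[] | select(.type=="spec" and (.id as $i | $covered | index($i) | not))
       | "uncovered spec: \(.id)"' /tmp/lint-graph.json
```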
Specs (.specify/, specs/) ─ SPECIFIES edges ─► Implementation (src/)
↑
flowai graph lint detects divergence
Before touching any source file, agents read the relevant spec node first. This ensures implementation faithfully reflects intent and reduces hallucinated behavior.
The persistent wiki pattern is inspired by Andrej Karpathy's llm-wiki concept — the idea of compiling knowledge into a persistent wiki maintained by the LLM rather than re-deriving context from scratch every session.
FlowAI's implementation is purpose-built for multi-agent agentic pipelines and Spec-Driven Development workflows. All code is original — graph extraction, community detection, wiki operations, and CLI integration are written from scratch in bash + jq, designed to work without Python, external graph databases, or network dependencies.