FlowAI Knowledge Graph

The knowledge graph is FlowAI's compiled understanding of your project. It solves the fundamental cold-start problem in multi-agent AI workflows: every agent session normally begins with no memory of past sessions, forcing agents to re-read raw files to understand context, which wastes tokens and time and increases the risk of hallucination.

Instead of re-deriving knowledge on every run, FlowAI builds and maintains a persistent knowledge graph that agents read as their primary navigation layer.

Important

The graph is a first-class capability, not an optional add-on. flowai start enforces that a graph exists and prompts you to build one.


What the Graph Contains

Machine graph output (JSON, index, incremental cache) lives under a wiki directory (default .flowai/wiki/, alongside other FlowAI machine state). Override with:

  • graph.wiki_dir in .flowai/config.json (repo-relative path, e.g. docs/custom-graph if you must), or
  • FLOWAI_GRAPH_WIKI_DIR (absolute path or path relative to project root).
<wiki_dir>/
├── graph.json       ← Full graph: nodes, edges, provenance, metadata
├── index.md         ← Content catalog: every wiki page with a one-line summary
├── log.md           ← Append-only log of all graph operations
└── cache/           ← SHA256 per-file hashes for incremental builds

Additionally, FlowAI generates a human-readable GRAPH_REPORT.md outside that cache tree. It is placed under docs/GRAPH_REPORT.md when a docs/ directory exists, otherwise at the repository root. Configure explicitly with graph.report_path in .flowai/config.json, or override with FLOWAI_GRAPH_REPORT_PATH.

docs/
└── GRAPH_REPORT.md  ← Start here: god nodes, communities, suggested queries

GRAPH_REPORT.md

The architectural map every agent reads before touching any file. Contains:

  • God nodes — the highest-degree hubs in the codebase (everything converges here)
  • Community summaries — clusters of closely-related modules
  • Architectural insights — key design decisions extracted from source and docs
  • Suggested queries — pre-generated questions about the architecture
  • Ambiguous relationships — flagged for human review

graph.json

The full graph in JSON. Each edge has a provenance tag:

| Tag | Meaning |
| --- | --- |
| EXTRACTED | Directly present in source (imports, function calls, links). High confidence. |
| INFERRED | Reasonable inference from context. Comes with a confidence score (0.0–1.0). Treat as hypothesis. |
| AMBIGUOUS | Uncertain. Flagged for review. Do not assume correct. |
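Agents and scripts can use these tags to weight evidence. A minimal provenance-aware filter, sketched in Python for illustration (FlowAI itself uses bash + jq, and the exact edge schema shown here is an assumption based on the table above):

```python
# Hypothetical edges illustrating the three provenance tags.
graph = {
    "edges": [
        {"source": "src/a.sh", "target": "src/b.sh", "relation": "calls",
         "provenance": "EXTRACTED"},
        {"source": "src/b.sh", "target": "docs/c.md", "relation": "references",
         "provenance": "INFERRED", "confidence": 0.6},
        {"source": "src/d.sh", "target": "src/e.sh", "relation": "calls",
         "provenance": "AMBIGUOUS"},
    ]
}

def trusted_edges(graph, min_confidence=0.8):
    """Keep EXTRACTED edges unconditionally; keep INFERRED edges only above
    a confidence floor; drop AMBIGUOUS edges (flagged for human review)."""
    keep = []
    for edge in graph["edges"]:
        tag = edge["provenance"]
        if tag == "EXTRACTED":
            keep.append(edge)
        elif tag == "INFERRED" and edge.get("confidence", 0.0) >= min_confidence:
            keep.append(edge)
    return keep
```

Lowering `min_confidence` trades precision for recall: at 0.8 only the EXTRACTED edge above survives, at 0.5 the INFERRED edge is kept as well.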

How the Graph is Built

FlowAI's graph engine runs a dual-pass extraction:

Pass 1: Structural (no LLM, pure bash)

Extracts the objective, deterministic structure from your code:

  • Source file inventory from configured scan_paths (src/, docs/, specs/ by default)
  • Function definitions and call relationships
  • Module imports and dependencies
  • Markdown cross-references and links
  • JSON configuration key mapping

Cross-language support: The structural pass extracts definitions and imports from:

| Language | Extracted |
| --- | --- |
| Bash | source, function definitions, command -v |
| Python | import, from, class, def |
| TypeScript/JavaScript | import, export, require |
| Go | import, func, type |
| Markdown | [links](), headings |
| JSON | Top-level keys |

All results are tagged EXTRACTED (high confidence). No inference involved.

Result: .flowai/wiki/cache/structural.json
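The pass-1 logic can be mirrored in a few lines. A simplified sketch in Python (FlowAI's real implementation is bash + jq; these regexes are deliberately naive and the sample source is hypothetical):

```python
import re

# Hypothetical file contents standing in for a scanned Python source file.
source = """
import os
from pathlib import Path

def build_graph():
    pass
"""

def extract_python(text):
    """Pass-1 style extraction: deterministic regexes, no LLM.
    Every hit is tagged EXTRACTED."""
    facts = []
    for m in re.finditer(r"^import\s+([\w.]+)", text, re.M):
        facts.append({"relation": "imports", "target": m.group(1), "tag": "EXTRACTED"})
    for m in re.finditer(r"^from\s+([\w.]+)\s+import", text, re.M):
        facts.append({"relation": "imports", "target": m.group(1), "tag": "EXTRACTED"})
    for m in re.finditer(r"^def\s+(\w+)", text, re.M):
        facts.append({"relation": "defines", "target": m.group(1), "tag": "EXTRACTED"})
    return facts
```

Because every fact comes from a literal pattern match, no confidence score is needed: the edge either exists in the text or it does not.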

Pass 2: Semantic (LLM, optional)

Off by default. Set "semantic_enabled": true under graph in .flowai/config.json to run the LLM on changed files (adds API cost and latency).

The LLM reads each changed file and extracts:

  • Key concepts and their relationships to other concepts
  • Design rationale from comments and documentation
  • Architectural decisions and their consequences
  • Patterns and anti-patterns

Results are tagged EXTRACTED, INFERRED, or AMBIGUOUS with confidence scores.

Result: .flowai/wiki/cache/semantic/<file-hash>.json

Merge + Community Detection

Both passes are merged into graph.json. Nodes are deduplicated by ID; edges are deduplicated by {source, target, relation}.
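The deduplication rule is simple enough to state in code. A Python sketch (the node and edge shapes are assumed from this page, and "later pass wins on collision" is an illustrative tie-break, not documented behavior):

```python
def merge_graphs(structural, semantic):
    """Merge pass outputs: nodes deduped by id, edges deduped by
    (source, target, relation)."""
    nodes = {}
    for n in structural["nodes"] + semantic["nodes"]:
        nodes[n["id"]] = n                       # later pass wins on collision
    edges = {}
    for e in structural["edges"] + semantic["edges"]:
        edges[(e["source"], e["target"], e["relation"])] = e
    return {"nodes": list(nodes.values()), "edges": list(edges.values())}
```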

Community detection runs a two-layer algorithm (no external dependencies):

  1. Centrality classification — degree-based:

    • God — ≥10 edges (architectural load-bearers)
    • Hub — 5–9 edges (well-connected modules)
    • Leaf — <5 edges (peripheral files)
  2. Label propagation — 5-iteration algorithm in jq that groups related modules into community_id clusters. Each node adopts the most common label among its neighbors, converging on natural module boundaries.
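Both layers are easy to prototype. A Python sketch of the same logic (FlowAI runs this in jq; the degree thresholds are the ones listed above, and the deterministic tie-breaking here is an illustrative choice):

```python
from collections import Counter, defaultdict

def classify(degree):
    """Degree thresholds from above: God >= 10, Hub 5-9, Leaf < 5."""
    return "God" if degree >= 10 else "Hub" if degree >= 5 else "Leaf"

def label_propagation(edges, iterations=5):
    """Each node repeatedly adopts the most common label among its
    neighbors, converging on natural module boundaries."""
    neighbors = defaultdict(set)
    for s, t in edges:
        neighbors[s].add(t)
        neighbors[t].add(s)
    labels = {n: n for n in neighbors}   # start: every node is its own community
    for _ in range(iterations):
        for n in sorted(neighbors):      # sorted for deterministic results
            counts = Counter(labels[m] for m in sorted(neighbors[n]))
            labels[n] = counts.most_common(1)[0][0]
    return labels
```

Disconnected components never exchange labels, so each ends up as its own community cluster.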

Graph Versioning & Rollback

Before every merge, the previous graph.json is backed up with a timestamp (e.g., graph.json.20260410T143000). Old backups are pruned to the configured retention limit.

Interactive rollback: flowai graph rollback presents a version browser:

FlowAI Graph — Version History
#   Date                   Nodes   Edges   Size
--  --------------------   -----   -----   ----
0   (current)                142     387    48K  <- active
1   2026-04-10 18:45:12      138     380    46K
2   2026-04-09 22:15:33      125     350    42K

Select version to restore (1-2) [1]: 2

!! WARNING: This will:
   - Restore graph.json to version #2 (20260409T221533)
   - DELETE 1 newer version(s) permanently
   - A pre-rollback safety copy will be saved

Are you sure? [y/N]:

The rollback:

  1. Shows all backups with metadata (date, node/edge count, file size)
  2. Lets you pick which version to restore (gum choose or plain read)
  3. Warns before deleting newer versions (red "DELETE" warning)
  4. Always saves a .pre-rollback safety copy before overwriting

Non-interactive mode for scripts and CI:

flowai graph rollback --latest   # restores most recent backup, no confirmation

Configure retention in .flowai/config.json:

{
  "graph": {
    "versions_to_keep": 10
  }
}

Default: 5 versions.


Commands

# Build or rebuild the full graph
flowai graph build

# Force rebuild (ignores cache — reprocesses all files)
flowai graph build --force

# Incremental update (only changed files)
flowai graph update

# Mine git log → IMPLEMENTS edges + spec evolution[] (compiled project history)
flowai graph chronicle

# Ingest a document into the wiki
flowai graph ingest docs/ARCHITECTURE.md

# Query the wiki (answer is filed back as a wiki page)
flowai graph query "How does the skill resolution chain work?"

# Health-check: orphans, contradictions, stale claims
flowai graph lint

# Show graph health in the terminal
flowai graph status

# Read GRAPH_REPORT.md in the terminal pager
flowai graph report

Chronicle & spec evolution (Karpathy-style compiled history)

Raw git history is expensive for agents to re-read every session. After flowai graph chronicle, the graph stores a persistent, incremental summary:

  • evolution[] on spec nodes — commits whose messages reference a spec ID (same ID must appear in the spec’s feature_ids, including YAML id: merged at build time).
  • IMPLEMENTS edges — code files touched in those commits, linked to the spec node.

This matches the spirit of a maintained wiki: compile once, query many times. It is language- and layout-agnostic; monorepos can widen graph.scan_paths to include each package root.
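A minimal sketch of the chronicle linking rule (Python for illustration; the commit and spec data shapes here are assumptions, as is the exact feature-ID regex):

```python
import re

# Illustrative pattern covering the ID formats mentioned in the docs.
SPEC_ID = re.compile(r"\b(?:UC-[A-Z]+-\d+|FEAT-\d+|STORY-\d+|REQ-\d+|RFC-\d+)\b")

def chronicle(commits, spec_feature_ids):
    """Link commits to spec nodes through shared feature IDs: commit
    messages yield evolution entries, touched files yield IMPLEMENTS edges."""
    evolution, implements = [], []
    for commit in commits:
        for fid in SPEC_ID.findall(commit["message"]):
            for spec, ids in spec_feature_ids.items():
                if fid in ids:
                    evolution.append({"spec": spec, "commit": commit["sha"],
                                      "feature_id": fid})
                    for path in commit["files"]:
                        implements.append({"source": path, "target": spec,
                                           "relation": "IMPLEMENTS"})
    return evolution, implements
```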


How Agents Use the Graph

Every FlowAI agent (master, plan, tasks, impl, review) automatically receives the graph context in its system prompt when a graph exists. The injected block looks like:

--- [FLOWAI KNOWLEDGE GRAPH] ---
A compiled knowledge graph of this codebase is available...
  Graph:  .flowai/wiki/graph.json — 234 nodes · 891 edges · 3 communities · built 2h ago
  Start:  docs/GRAPH_REPORT.md
  Index:  .flowai/wiki/index.md
Navigation protocol:
  1. Read docs/GRAPH_REPORT.md before searching any files
  2. Use index.md to find the exact wiki page for any concept
  ...
---

This is platform-level behavior — every agent gets it regardless of which skills are assigned.

The graph-aware-navigation skill (bundled, assigned to all roles by default) teaches agents the full navigation protocol: GRAPH_REPORT.md → index.md → wiki pages → graph.json → source files.


Sharing the Graph with Your Team

The raw compiled graph in .flowai/wiki/ (including graph.json and cache/) is automatically git-ignored by default (handled via .gitignore). Never force-commit it: like node_modules, it rapidly accumulates large incremental JSON blobs and is a reliable source of merge conflicts.

Instead, every local graph build also exports a concise, human-readable markdown dashboard: docs/GRAPH_REPORT.md

To seamlessly share architectural visibility with your team:

  1. Leave .flowai/ strictly ignored in version control.
  2. Commit docs/GRAPH_REPORT.md cleanly to your repository.
  3. Team leads can review graph health, spec coverage, and God nodes directly on GitHub, while any developer who clones the repository can rebuild the full JSON graph locally with flowai graph build.

Performance Characteristics

  • Incremental builds: Only files with changed SHA256 hashes are reprocessed
  • Token efficiency: Agents navigate compiled artifacts instead of raw files
  • No external dependencies: The structural pass (Pass 1) requires only bash + jq
  • Graceful degradation: If the graph is missing, agents fall back to raw file reads (but flowai start will prompt you to build it)

Configuration

The graph section in .flowai/config.json:

{
  "graph": {
    "enabled": true,
    "scan_paths": ["src", "docs", "specs"],
    "ignore_patterns": ["*.generated.*", "*.min.js", "*.min.css"],
    "max_age_hours": 24,
    "auto_build": false,
    "versions_to_keep": 5
  }
}
| Key | Default | Description |
| --- | --- | --- |
| enabled | true | Whether the graph system is active |
| scan_paths | ["src","docs","specs"] | Project-relative directories to scan |
| ignore_patterns | [] | Glob patterns to exclude from scanning |
| max_age_hours | 24 | Age threshold before graph is considered stale |
| auto_build | false | Reserved for future CI integration |
| versions_to_keep | 5 | Number of graph.json backups to retain |

Spec-Driven Development Integration

FlowAI uses Spec-Driven Development (SDD): specs are the authoritative source of intent before any code is written. The graph engine treats spec files differently from source files:

Spec nodes vs source nodes

| Property | Source file node | Spec node |
| --- | --- | --- |
| Node type | file | spec |
| Trust level | standard | HIGH |
| Edge types | sources, defines, references | SPECIFIES |
| Extra metadata | — | feature_ids, criteria |
| In GRAPH_REPORT.md | God Nodes section | Spec Coverage section |

What qualifies as a spec file

FlowAI detects spec files by path and naming convention:

  • By path: specs/, .specify/, spec/
  • By name: *.spec.md, requirements*.md, acceptance*.md, adr*.md, rfc*.md, user-story*.md, prd*.md, feature*.md

What gets extracted from specs

  • Feature IDs: patterns like UC-XXX-NNN, FEAT-NNN, STORY-NNN, REQ-NNN, RFC-NNN
  • Acceptance criteria: headings starting with Acceptance, Given, When, Then, Must, Should, Shall
  • SPECIFIES edges: every project-relative link from a spec to a source file or wiki page creates a SPECIFIES edge, making spec-to-code traceability machine-readable
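The SPECIFIES rule can be sketched as a link scan (Python for illustration; the link-filtering heuristics here are assumptions, not FlowAI's exact behavior):

```python
import re

def specifies_edges(spec_path, spec_text):
    """Turn each project-relative markdown link in a spec into a SPECIFIES
    edge, skipping external URLs and in-page anchors."""
    edges = []
    for m in re.finditer(r"\[[^\]]*\]\(([^)\s]+)\)", spec_text):
        target = m.group(1)
        if not target.startswith(("http://", "https://", "#", "mailto:")):
            edges.append({"source": spec_path, "target": target,
                          "relation": "SPECIFIES"})
    return edges
```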

Spec coverage in GRAPH_REPORT.md

The Spec Coverage section in GRAPH_REPORT.md shows:

  • How many spec documents exist
  • How many SPECIFIES edges were found (spec → code traceability)
  • Which specs have zero implementation edges (possible unimplemented features)

Run flowai graph lint to detect specs with no corresponding implementation and code that has no spec coverage.

SDD navigation protocol

Specs (.specify/, specs/)  ─ SPECIFIES edges ─►  Implementation (src/)
                      ↑
          flowai graph lint detects divergence

Before touching any source file, agents read the relevant spec node first. This ensures implementation faithfully reflects intent and reduces hallucinated behavior.


Inspiration

The persistent wiki pattern is inspired by Andrej Karpathy's llm-wiki concept — the idea of compiling knowledge into a persistent wiki maintained by the LLM rather than re-deriving context from scratch every session.

FlowAI's implementation is purpose-built for multi-agent agentic pipelines and Spec-Driven Development workflows. All code is original — graph extraction, community detection, wiki operations, and CLI integration are written from scratch in bash + jq, designed to work without Python, external graph databases, or network dependencies.