GBT turns OpenClaw's throwaway execution logs into a reusable experience tree.
That changes the economics of agent work.
Without GBT, every similar task forces the agent to plan again, reason again, debug again, and spend expensive tokens again. With GBT, successful runs and failed runs are distilled into reusable operational memory. The next time a similar task appears, OpenClaw can stop wasting tokens on rediscovering the same path and instead execute against a learned tree of concrete experience.
And when a covered task fails, GBT does not just shrug and move on. It queues that failed trajectory, waits for idle time, asks for approval, silently replays the task inside OpenClaw, localizes the broken step, repairs it, verifies the repaired run against the real environment, and only then writes the repaired experience back into the tree.
This is not a prompt wrapper. It is a persistent experience system for OpenClaw.
This plugin is built on the core ideas from the paper:
Traversal-as-Policy: Log-Distilled Gated Behavior Trees as Externalized, Verifiable Policies for Safe, Robust, and Efficient Agents
Peiran Li, Jiashuo Sun, Fangzhou Lin, Shuo Xing, Tianfu Fu, Suofei Feng, Chaoqun Ni, Zhengzhong Tu
https://arxiv.org/abs/2603.05517
The plugin version in this repository focuses on turning agent logs into reusable experience trees, routing covered tasks onto cheaper single-step executors, and replaying failed trajectories to repair them into reusable experience. As noted below, this current release does not implement the paper's safety-gate mechanism.
- Distills completed OpenClaw runs into reusable, non-task-specific macro nodes.
- Stores both success and failure paths instead of discarding failure.
- Switches covered tasks onto a cheaper executor model when existing experience is strong enough.
- Injects step-local runtime guidance so the executor can act like a disciplined single-step worker instead of a full long-horizon planner.
- Tracks runtime spine progress and recovery hints during execution.
- Queues failed covered runs for idle-time self-evolution.
- Replays failed tasks in fresh OpenClaw sessions, extracts the real transcript, verifies the repair, and only then writes the repaired path back into the tree.
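As a mental model of the persistent experience tree described above, one might picture nodes like the sketch below. The class and field names are illustrative only, not the plugin's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class MacroNode:
    """Hypothetical shape of one distilled macro step in the tree."""
    node_id: str
    summary: str    # distilled, non-task-specific description of the step
    outcome: str    # "success" or "failure" -- failures are kept, not discarded
    children: list = field(default_factory=list)

# A tiny tree: one task family with two learned macro steps.
root = MacroNode("root", "repo maintenance tasks", "success")
root.children.append(MacroNode("n1", "run test suite and collect failures", "success"))
root.children.append(MacroNode("n2", "edit flaky fixture without pinning versions", "failure"))

# Both outcomes stay in the tree so later matching can avoid the failed path.
print([c.outcome for c in root.children])  # ['success', 'failure']
```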
- Lower token cost on repeated task families.
- Less repeated planning on known workflows.
- Better reuse of hard-won debugging work.
- Stronger long-term personalization: each user grows their own tree from their own OpenClaw history.
- Better use of expensive reasoning models: spend them on repair and evolution, not on re-solving the same problem forever.
This release fully implements the core GBT and GBT-SE workflow for reusable experience distillation, covered-task guidance, and replay-backed self-evolution.
This release intentionally does not implement the paper's safety-gate system.
Before installing GBT, make sure you have:
- OpenClaw installed and working.
- Node.js available in the same environment that runs OpenClaw.
- Python 3.10+ available as `python3` or another executable you can point the plugin to.
- The Python `openai` package installed.
- At least one OpenClaw-capable model configured for your normal runs.
- OpenAI auth available if you want the bundled Python analysis path to do LLM-backed distillation / verification.
Install the Python dependency:
```bash
python3 -m pip install --upgrade openai
```

From this repository:
```bash
npm install
npm run build
```

If you want a tarball for installation:

```bash
npm pack
```

That produces a file like `gbt-skill-0.1.0.tgz`.
Install from the local repository:
```bash
openclaw plugins install .
```

Or install from the packed tarball:

```bash
openclaw plugins install ./gbt-skill-0.1.0.tgz
```

After install, restart OpenClaw if your setup requires a restart for plugin discovery.
Enable it through OpenClaw:
```bash
openclaw plugins enable gbt-skill
```

You can confirm it is visible with:

```bash
openclaw plugins list
openclaw plugins info gbt-skill
```

Add plugin config under `plugins.entries.gbt-skill.config` in your OpenClaw config.
Minimal example:
```json
{
  "plugins": {
    "entries": {
      "gbt-skill": {
        "enabled": true,
        "config": {
          "pythonExecutable": "python3",
          "stateSubdir": "gbt-skill",
          "cheaperModel": "gpt-4.1-mini",
          "cheaperProvider": "openai",
          "coverageThreshold": 0.6,
          "idleMinutes": 10
        }
      }
    }
  }
}
```

What the main config values mean:
- `pythonExecutable`: Python executable used to run the GBT engine.
- `stateSubdir`: Where GBT stores its tree, episodes, and self-evolve state.
- `cheaperModel`: Model used for covered tasks once GBT has reusable experience.
- `cheaperProvider`: Optional provider override for that cheaper executor.
- `coverageThreshold`: How confident GBT must be before switching into guided executor mode.
- `idleMinutes`: How long OpenClaw must stay idle before GBT asks whether to start self-evolution.
- `distillModel`: Optional analysis-model override. Leave it empty if you want GBT to inherit the model from the current OpenClaw run.
- `selfEvolveReplayCommand`: Optional override. Leave it empty to use GBT's built-in OpenClaw replay runner.
- `selfEvolveReplayCwd`: Optional working directory for a custom replay command.
- `selfEvolveReplayTimeoutSec`: Timeout for replay verification.
For most OpenAI-backed OpenClaw setups, you do not need to set distillModel.
GBT will inherit the main model from the run that just happened, and the bundled Python analysis engine will normalize OpenClaw model refs like `openai/gpt-5.4` into the directly callable OpenAI model name.
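As a rough illustration of what that normalization involves, the sketch below strips a provider prefix from a model ref. The function name and logic are hypothetical; the bundled engine's real normalization may handle more cases.

```python
def normalize_model_ref(ref: str) -> str:
    """Strip a provider prefix like 'openai/' so the raw model name can be
    passed to the OpenAI client directly. Illustrative sketch only."""
    provider, sep, model = ref.partition("/")
    # Only strip the prefix when it names the OpenAI provider;
    # refs without a prefix pass through unchanged.
    return model if sep and provider == "openai" else ref

print(normalize_model_ref("openai/gpt-5.4"))   # -> gpt-5.4
print(normalize_model_ref("gpt-4.1-mini"))     # -> gpt-4.1-mini
```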
Set distillModel only if one of these is true:
- you want GBT's internal distill / diagnose / replay-verification steps to use a different model than your main OpenClaw run
- your main OpenClaw model is not directly callable by the bundled Python OpenAI client
Example override:
```json
{
  "plugins": {
    "entries": {
      "gbt-skill": {
        "enabled": true,
        "config": {
          "distillModel": "gpt-5.4"
        }
      }
    }
  }
}
```

The built-in replay runner uses OpenClaw's embedded replay path plus the bundled Python analysis client for strict verification when a directly usable analysis model is available.
If your OpenClaw auth store does not already have an OpenAI profile, set an API key in the shell before replay or configure OpenClaw auth for OpenAI.
At minimum:
```bash
export OPENAI_API_KEY="your-key"
```

GBT's built-in runner will register `openai:default` in the local OpenClaw auth store if needed during replay.
If you leave distillModel empty and your main OpenClaw runs are not OpenAI-backed, replay still works, but some verification / diagnosis steps may fall back to heuristic analysis unless you provide an explicit OpenAI analysis model.
You do not need a special launch mode.
Once the plugin is enabled:
- completed runs are distilled into the tree
- covered tasks can route onto the cheaper executor
- failed covered runs are queued for self-evolution
GBT is meant to sit inside the normal OpenClaw loop, not beside it.
GBT will:
- Normalize the tool log.
- Segment it into macro steps.
- Distill reusable summaries and metadata.
- Add those nodes and paths into the persistent experience tree.
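The segmentation step above can be sketched very roughly: a flat tool log is grouped into macro steps at some boundary, here simply whenever the tool changes. This is an illustrative heuristic, not GBT's real (LLM-assisted) segmentation.

```python
def segment_log(events):
    """Group a flat list of tool-call events into macro steps,
    starting a new step whenever the tool name changes.
    Illustrative sketch of log segmentation, not the plugin's algorithm."""
    steps, current = [], []
    for ev in events:
        if current and ev["tool"] != current[-1]["tool"]:
            steps.append(current)
            current = []
        current.append(ev)
    if current:
        steps.append(current)
    return steps

# Two shell calls, one edit, one more shell call -> three macro steps.
log = [{"tool": "shell"}, {"tool": "shell"}, {"tool": "edit"}, {"tool": "shell"}]
print(len(segment_log(log)))  # 3
```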
GBT will:
- Match the task against the tree.
- Decide whether confidence is high enough.
- If covered, switch to your configured cheaper executor model.
- Inject macro-by-macro execution guidance into the prompt.
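The routing decision above amounts to a threshold check against the configured `coverageThreshold`. A minimal sketch, with a hypothetical function name and the example config values from this README:

```python
def choose_executor(confidence: float,
                    coverage_threshold: float = 0.6,
                    main_model: str = "main-reasoning-model",
                    cheaper_model: str = "gpt-4.1-mini") -> str:
    """Route onto the cheaper executor only when the tree match is
    confident enough; otherwise keep the main model. Sketch of the
    coverageThreshold semantics, not the plugin's actual code."""
    return cheaper_model if confidence >= coverage_threshold else main_model

print(choose_executor(0.75))  # gpt-4.1-mini  (covered: switch to cheap executor)
print(choose_executor(0.40))  # main-reasoning-model  (not covered: full planner)
```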
GBT will:
- Preserve the failed trajectory.
- Queue it for self-evolution.
- Wait until OpenClaw has been idle for the configured window.
- Ask you whether to start self-evolution.
- If approved, silently replay the task in a fresh OpenClaw session.
- Verify the repair using real transcript evidence before writing it back into the tree.
GBT exposes these commands:
- `/gbt status`
- `/gbt match <task>`
- `/gbt evolve approve`
- `/gbt evolve reject`
Use your normal strong model for the main OpenClaw run, and reserve the cheaper model only for covered execution.
Example:
- main OpenClaw model: stronger reasoning model
- `distillModel`: leave empty to inherit the main OpenClaw model, or set it explicitly only if you want a different analysis model
- `cheaperModel`: inexpensive executor model
That gives you the intended split:
- strong models for original execution and repair analysis
- cheaper models for repeated covered work
Check build and tests:
```bash
npm test
npm run build
```

Inspect plugin state:

```
/gbt status
```

Ask whether a task is covered:

```
/gbt match fix the failing parser test by editing the parser file and rerunning pytest
```

If GBT is active, you should start seeing:
- increasing node and episode counts
- covered-task matches with confidence scores
- failed covered jobs entering the self-evolve queue
This package is a real OpenClaw plugin:
- `package.json` declares `openclaw.extensions = ["./dist/index.js"]`
- `openclaw.plugin.json` defines plugin metadata and config schema
- the bundled plugin prompt assets ship under `skills/gbt`
- the Python engine ships inside the package under `gbt_skill`
- the built-in self-evolve replay runner is included and enabled by default
The current release has been validated with:
- `pytest -q`
- `npm test`
- `npm run build`
- real OpenClaw embedded replay smoke runs using the built-in replay runner
GBT is powerful, but the system is still constrained by the underlying runtime:
- if OpenClaw itself cannot access the required tools or workspace, replay cannot fix that
- if your model/provider auth is missing, self-evolve replay cannot run
- very hard long-horizon repair cases may need multiple replay attempts
GBT turns OpenClaw from a stateless executor that keeps relearning the same lessons into a system that remembers, reuses, repairs, and gets cheaper on the work it has already paid to understand.
If you find GBT useful in your work, please consider citing:
@misc{li2026traversalaspolicylogdistilledgatedbehavior,
title={Traversal-as-Policy: Log-Distilled Gated Behavior Trees as Externalized, Verifiable Policies for Safe, Robust, and Efficient Agents},
author={Peiran Li and Jiashuo Sun and Fangzhou Lin and Shuo Xing and Tianfu Fu and Suofei Feng and Chaoqun Ni and Zhengzhong Tu},
year={2026},
eprint={2603.05517},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2603.05517},
}