bug(gateguard): race condition in session state read-modify-write path

Follow-up to #1406, #1419, #1439. Splitting the state-layer concurrency concern into its own issue so it can be designed and reviewed independently of the hot-path performance fix that already landed via #1439.

## Problem

`scripts/hooks/gateguard-fact-force.js` persists session state (`checked` array + `last_active`) via a classic load-modify-save pattern with no mutual exclusion:

```js
function markChecked(key) {
  const state = loadState();                  // 1. read
  if (!state.checked.includes(key)) {
    state.checked.push(key);                  // 2. modify
    saveState(state);                         // 3. atomic file rename
  }
}
```

`saveState()` is atomic at the filesystem level (temp file + `fs.renameSync`), so we never see a *corrupted* state file. But the read-modify-write window itself is unprotected, and the hook runs as a fresh Node process per tool invocation — so there is no shared memory between concurrent hook processes to coordinate on.

## Concurrent scenario

Two tool calls that share a `CLAUDE_SESSION_ID` fire nearly simultaneously (parallel tool use, or rapid-fire edits on the same session):

1. Process A: `loadState()` → `{checked: [a], last_active: t0}`
2. Process B: `loadState()` → `{checked: [a], last_active: t0}`   (same snapshot)
3. Process A: `markChecked(b)` → writes `{checked: [a, b], last_active: t1}`
4. Process B: `markChecked(c)` → writes `{checked: [a, c], last_active: t2}`   (b is gone)

User-visible effect: the file `b` that process A just gated appears un-checked to the next invocation, so Claude gets re-prompted with the Fact-Forcing gate for a file it already investigated. Not data loss in the strict sense, but a UX regression proportional to how often parallel tool calls land on the same session.

The `READ_HEARTBEAT_MS` throttle landed in #1439 narrows the read-path window (~99% of invocations are now pure reads), but the `markChecked()` path still exhibits the full race on every new gate, and the throttled heartbeat write in `isChecked()` still reproduces it when it does fire.

## Candidate fixes

Three options, roughly ordered by engineering cost / platform portability trade-off:

### 1. `flock`-style advisory lock around read-modify-write

Wrap the entire `loadState` → mutate → `saveState` sequence in an exclusive lock on `STATE_FILE` (or a sibling `.lock` file). Portable options:

- [`proper-lockfile`](https://github.com/moxystudio/node-proper-lockfile) — cross-platform, already handles stale locks, small footprint.
- Raw `fs.openSync(path, 'wx')` sentinel file with retry — zero dependency, but requires reinventing stale-lock handling.
- Shell out to `flock` on POSIX + `powershell Wait-File` on Windows — brittle, not recommended.

**Pros:** closes the race fully. **Cons:** new dependency or stale-lock logic; slight latency on every mutation.

### 2. Merge-style saves

Before `fs.renameSync`, re-read the current on-disk state one last time and *merge* it with the in-memory state: union of `checked`, `max(last_active)`. Roughly:

```js
function saveState(state) {
  const disk = loadStateRaw();                // non-throwing
  state.checked = Array.from(new Set([...(disk.checked || []), ...state.checked]));
  state.last_active = Math.max(disk.last_active || 0, state.last_active);
  // ... existing prune + atomic write
}
```

**Pros:** zero new dependencies, no lock contention, matches the "additive-only state" semantics this code already has. **Cons:** still not strictly serializable (two merges can race), but entries are never *lost* — just occasionally duplicated before the next prune, which is fine.

### 3. In-memory coordination via short-lived lease

Maintain a per-session lease file (e.g. `state-<sid>.lease`) that hook processes claim for, say, 100ms before mutating. Simpler than flock but less robust than merge-style saves on crash/ctrl-C.

**Pros:** cheap. **Cons:** needs correct cleanup; behaves worst-case the same as option 2 under crashes.

## Recommendation

Option 2 (merge-style saves) feels like the right default for ECC's profile: no new dependency, tolerant of crashes, preserves the "additive entries + bounded pruning" semantics already in place. If/when GateGuard needs stronger invariants (e.g. session-level transactions across multiple files), revisit with option 1.

## Severity

P2 — UX regression only, no data loss. Easy to reproduce under parallel tool calls but rare in typical interactive sessions.

## Links

- #1406 — original multi-issue report
- #1419 — hot-path read/write removal (my initial PR, superseded)
- #1439 — GateGuard consolidation that landed the read-heartbeat throttle
- Prior discussion: https://github.com/affaan-m/everything-claude-code/pull/1419#issuecomment-4241045692 and https://github.com/affaan-m/everything-claude-code/pull/1439#issuecomment-4248766604

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

bug(gateguard): race condition in session state read-modify-write path #1441

Problem

Concurrent scenario

Candidate fixes

1. `flock`-style advisory lock around read-modify-write

2. Merge-style saves

3. In-memory coordination via short-lived lease

Recommendation

Severity

Links

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

bug(gateguard): race condition in session state read-modify-write path #1441

Description

Problem

Concurrent scenario

Candidate fixes

1. flock-style advisory lock around read-modify-write

2. Merge-style saves

3. In-memory coordination via short-lived lease

Recommendation

Severity

Links

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

1. `flock`-style advisory lock around read-modify-write