HistoryPruner: self-bootstrap in TryPruneHistory instead of relying on SyncServer by AnkushinDaniil · Pull Request #11254 · NethermindEth/nethermind

AnkushinDaniil · 2026-04-20T13:38:17Z

Changes

HistoryPruner.TryPruneHistory: remove the redundant pre-lock !_hasLoadedDeletePointer gate so the pruner self-bootstraps on first entry instead of depending on SyncServer.OnNewRange to call OldestBlockHeader at a %32 block boundary. The inner path at HistoryPruner.cs:309 already calls TryLoadDeletePointer() and handles failure — the outer check only served to suppress bootstrap until an external caller primed it.

Types of changes

What types of changes does your code introduce?

Bugfix (a non-breaking change that fixes an issue)

Testing

Requires testing

Yes

If yes, did you write tests?

Yes

Documentation

Requires documentation update

No

Requires explanation in Release Notes

No

Remarks

Root cause

On a node whose head never crosses a %32 block boundary after startup (frozen snapshot, peerless, no CL attached), history pruning never runs even when --History.Pruning=Rolling is configured. _hasLoadedDeletePointer only flips when TryLoadDeletePointer() runs, and in production that only happens via the OldestBlockHeader getter, which is only called from SyncServer.OnNewRange (SyncServer.cs:472) gated on block.Number % 32 == 0 (SyncServer.cs:469). Nothing inside HistoryPruner observes or controls that condition, so the pruner waits on an external event that may never fire.

Why safe

Concurrency unchanged: _currentlyPruning atomic still prevents concurrent pruning tasks; _pruneLock still serializes actual work inside TryPruneHistory.
Cost: one extra Monitor.TryEnter/Monitor.Exit per trigger during the brief startup window before bootstrap — one-time, not steady state. After first load, TryLoadDeletePointer() early-returns at line 531 with no DB read.
Pre-lock ShouldPruneHistory call remains valid before bootstrap: reads _deletePointer (default 0) and _lastPrunedTimestamp (default null); both branches return sensibly and let the flow fall into the lock path where bootstrap + prune happen atomically.

Relationship to #9677

The pre-lock gate was introduced in #9677 as a fast-path optimization. This PR removes only that one line and preserves every other improvement from #9677 (atomic _currentlyPruning, exclusive scheduling, TryScheduleTask rename)

claude · 2026-04-20T13:38:34Z

Claude finished @AnkushinDaniil's task in 2m 15s —— View job

Code Review

Gather context and read relevant files
Load applicable coding/robustness rules
Analyze the one-line fix in HistoryPruner.cs
Review the regression test in HistoryPrunerTests.cs
Check concurrency correctness
Post findings

Summary

This is a clean, minimal, well-reasoned fix. The root cause diagnosis is accurate and the solution correctly addresses it. No critical, high, or medium issues found.

No findings that block merge.

Fix analysis (`HistoryPruner.cs:295`)

The removed line:

!_hasLoadedDeletePointer ||

created a chicken-and-egg deadlock: _hasLoadedDeletePointer was only set inside TryLoadDeletePointer(), which was only reachable via:

OldestBlockHeader getter → gated on SyncServer.OnNewRange at block.Number % 32 == 0, or
The inner path in TryPruneHistory itself (line 308) — but gated by the outer !_hasLoadedDeletePointer check that was just removed.

After the fix, TryPruneHistory reaches the lock, calls TryLoadDeletePointer() (line 308), and self-bootstraps. The inner TryLoadDeletePointer() already handles the failure path gracefully and short-circuits on repeated calls (if (_hasLoadedDeletePointer) return true at line 530), so steady-state cost is negligible.

Concurrency correctness: All invariants preserved:

_pruneLock still serialises mutation of _deletePointer, _deletePointerHeader, _hasLoadedDeletePointer, and _lastPrunedTimestamp.
_currentlyPruning atomic still prevents concurrent task scheduling.
Monitor.TryEnter with 100ms timeout is still used — no deadlock risk introduced.

Pre-lock ShouldPruneHistory with unbootstrapped _deletePointer = 1: For Rolling mode this uses the head timestamp (not _deletePointer), so no false negative. For UseAncientBarriers mode, 1 < ancientBarrier is true on first entry, admitting the lock path where the inner ShouldPruneHistory re-validates after real bootstrap. This is a transient one-time permissive pass, not a correctness issue.

Test analysis (`HistoryPrunerTests.cs:97–150`)

The test correctly and precisely reproduces the failure: it omits the CheckOldestAndCutoff / OldestBlockHeader call that previously masked the bug by bootstrapping _hasLoadedDeletePointer as a side effect. The comment block at lines 99–103 clearly explains the regression scenario. Assertions cover pruned blocks, preserved blocks, genesis, and head — matching the existing test patterns exactly.

Low (non-blocking)

_hasLoadedDeletePointer is not volatile — the fast-path read in OldestBlockHeader:215 is outside any lock. A stale false read causes one extra Monitor.TryEnter/Monitor.Exit overhead per call, not incorrect behaviour (correctness is guaranteed by the re-check inside the lock). This is a pre-existing condition unchanged by this PR.

Overall: The fix is correct, safe, and appropriately tested. Recommend merge after the usual review from @LukaszRozmej given context ownership.

On a node whose head never crosses a %32 block boundary after startup (e.g. frozen snapshot, peerless, no CL), history pruning never runs. Root cause: the pre-lock gate in TryPruneHistory short-circuits on !_hasLoadedDeletePointer, and the only production caller that flips that flag is SyncServer.OnNewRange -> HistoryPruner.OldestBlockHeader, which is gated on block.Number % 32 == 0. Nothing inside HistoryPruner observes or controls that condition, so the pruner relies on an external trigger that may never fire. The inner lock path already calls TryLoadDeletePointer() and exits cleanly on failure, so the outer gate is redundant. Removing it lets TryPruneHistory self-bootstrap on the first successful entry (triggered by the one-shot FireProcessingQueueEmpty at BlockchainProcessor startup). The redundant gate was introduced in #9677 as a fast-path optimization; this removes only that one line and preserves every other improvement from that PR (atomic _currentlyPruning scheduling, exclusive pruning, etc.). Added regression test that reproduces the frozen-head condition.

LukaszRozmej

@claude can HistoryPrunerTests be deduplicated with TestCase/TestCaseSource?

claude · 2026-04-20T15:42:12Z

Claude finished @LukaszRozmej's task in 4m 44s —— View job

Deduplicating HistoryPrunerTests with TestCase/TestCaseSource

Read the test file to identify duplication patterns
Load applicable coding/robustness rules
Implement TestCase/TestCaseSource refactor
Push changes

Yes, done — committed in 40f7d00. -194 / +57 lines across the test file.

What changed

Tests 1–4 → one [TestCaseSource] method (Prunes_history)

The four pruning tests (Can_prune_blocks_older_than_specified_epochs, Can_prune_without_prior_oldest_block_read, Can_prune_to_ancient_barriers, Prunes_up_to_sync_pivot) shared identical structure. They collapse into PruningCases() + Prunes_history(...) parameterized on:

Parameter	Purpose
`historyConfig`	pruning mode + retention config
`syncPivot`	where `SyncPivot` is set
`primeWithOldestRead`	whether to call `CheckOldestAndCutoff` before pruning (the `false` case is the regression for this PR)
`expectedPruneBelow`	threshold for pruned vs preserved block assertions
`finalCutoff`	expected `CutoffBlockNumber` in the final state check

SetName(...) on each TestCaseData preserves the original test names for discovery and filtering.

Tests 7–8 → one [TestCase] method (Validates_config)

Can_accept_valid_config used Substitute.For<ISpecProvider>() (which returns MinHistoryRetentionEpochs = 0 by default) while Can_reject_invalid_config used a real TestSpecProvider. Both collapse with explicit minHistoryRetentionEpochs parameters; TestSpecProvider is used in both cases with the same effect.

Kept separate — Can_find_oldest_block (tests binary search via SetDeletePointerToOldestBlock, no per-block assertions) and Does_not_prune_when_disabled (only 10 blocks, no pruning, no SyncPivot set).
· Branch

Co-authored-by: Lukasz Rozmej <LukaszRozmej@users.noreply.github.com>

HistoryConfig.RetentionEpochs is uint; test parameter was int, causing CS0266 build failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

AnkushinDaniil requested review from LukaszRozmej and Marchhill as code owners April 20, 2026 13:38

github-actions bot added the bug fix + reliability label Apr 20, 2026

AnkushinDaniil force-pushed the fix/history-pruner-bootstrap-without-sync-server branch from 4e338d6 to d585c50 Compare April 20, 2026 13:43

LukaszRozmej approved these changes Apr 20, 2026

View reviewed changes

github-actions bot and others added 2 commits April 20, 2026 15:46

test: deduplicate HistoryPrunerTests with TestCaseSource/TestCase

40f7d00

Co-authored-by: Lukasz Rozmej <LukaszRozmej@users.noreply.github.com>

fix: correct uint type in Validates_config TestCase

b6ba767

HistoryConfig.RetentionEpochs is uint; test parameter was int, causing CS0266 build failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HistoryPruner: self-bootstrap in TryPruneHistory instead of relying on SyncServer#11254

HistoryPruner: self-bootstrap in TryPruneHistory instead of relying on SyncServer#11254
AnkushinDaniil wants to merge 3 commits intomasterfrom
fix/history-pruner-bootstrap-without-sync-server

AnkushinDaniil commented Apr 20, 2026 •

edited

Loading

Uh oh!

claude bot commented Apr 20, 2026 •

edited

Loading

Uh oh!

LukaszRozmej left a comment

Uh oh!

claude bot commented Apr 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AnkushinDaniil commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Types of changes

What types of changes does your code introduce?

Testing

Requires testing

If yes, did you write tests?

Documentation

Requires documentation update

Requires explanation in Release Notes

Remarks

Root cause

Why safe

Relationship to #9677

Uh oh!

claude bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Summary

Fix analysis (HistoryPruner.cs:295)

Test analysis (HistoryPrunerTests.cs:97–150)

Low (non-blocking)

Uh oh!

LukaszRozmej left a comment

Choose a reason for hiding this comment

Uh oh!

claude bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deduplicating HistoryPrunerTests with TestCase/TestCaseSource

What changed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

AnkushinDaniil commented Apr 20, 2026 •

edited

Loading

claude bot commented Apr 20, 2026 •

edited

Loading

Fix analysis (`HistoryPruner.cs:295`)

Test analysis (`HistoryPrunerTests.cs:97–150`)

claude bot commented Apr 20, 2026 •

edited

Loading