HistoryPruner: self-bootstrap in TryPruneHistory instead of relying on SyncServer#11254
HistoryPruner: self-bootstrap in TryPruneHistory instead of relying on SyncServer#11254AnkushinDaniil wants to merge 3 commits intomasterfrom
Conversation
|
Claude finished @AnkushinDaniil's task in 2m 15s —— View job Code Review
SummaryThis is a clean, minimal, well-reasoned fix. The root cause diagnosis is accurate and the solution correctly addresses it. No critical, high, or medium issues found. No findings that block merge. Fix analysis (
|
On a node whose head never crosses a %32 block boundary after startup (e.g. frozen snapshot, peerless, no CL), history pruning never runs. Root cause: the pre-lock gate in TryPruneHistory short-circuits on !_hasLoadedDeletePointer, and the only production caller that flips that flag is SyncServer.OnNewRange -> HistoryPruner.OldestBlockHeader, which is gated on block.Number % 32 == 0. Nothing inside HistoryPruner observes or controls that condition, so the pruner relies on an external trigger that may never fire. The inner lock path already calls TryLoadDeletePointer() and exits cleanly on failure, so the outer gate is redundant. Removing it lets TryPruneHistory self-bootstrap on the first successful entry (triggered by the one-shot FireProcessingQueueEmpty at BlockchainProcessor startup). The redundant gate was introduced in #9677 as a fast-path optimization; this removes only that one line and preserves every other improvement from that PR (atomic _currentlyPruning scheduling, exclusive pruning, etc.). Added regression test that reproduces the frozen-head condition.
4e338d6 to
d585c50
Compare
LukaszRozmej
left a comment
There was a problem hiding this comment.
@claude can HistoryPrunerTests be deduplicated with TestCase/TestCaseSource?
|
Claude finished @LukaszRozmej's task in 4m 44s —— View job Deduplicating HistoryPrunerTests with TestCase/TestCaseSource
Yes, done — committed in 40f7d00. -194 / +57 lines across the test file. What changedTests 1–4 → one The four pruning tests (
Tests 7–8 → one
Kept separate — |
Co-authored-by: Lukasz Rozmej <LukaszRozmej@users.noreply.github.com>
HistoryConfig.RetentionEpochs is uint; test parameter was int, causing CS0266 build failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Changes
HistoryPruner.TryPruneHistory: remove the redundant pre-lock!_hasLoadedDeletePointergate so the pruner self-bootstraps on first entry instead of depending onSyncServer.OnNewRangeto callOldestBlockHeaderat a%32block boundary. The inner path atHistoryPruner.cs:309already callsTryLoadDeletePointer()and handles failure — the outer check only served to suppress bootstrap until an external caller primed it.Types of changes
What types of changes does your code introduce?
Testing
Requires testing
If yes, did you write tests?
Documentation
Requires documentation update
Requires explanation in Release Notes
Remarks
Root cause
On a node whose head never crosses a
%32block boundary after startup (frozen snapshot, peerless, no CL attached), history pruning never runs even when--History.Pruning=Rollingis configured._hasLoadedDeletePointeronly flips whenTryLoadDeletePointer()runs, and in production that only happens via theOldestBlockHeadergetter, which is only called fromSyncServer.OnNewRange(SyncServer.cs:472) gated onblock.Number % 32 == 0(SyncServer.cs:469). Nothing insideHistoryPrunerobserves or controls that condition, so the pruner waits on an external event that may never fire.Why safe
_currentlyPruningatomic still prevents concurrent pruning tasks;_pruneLockstill serializes actual work insideTryPruneHistory.Monitor.TryEnter/Monitor.Exitper trigger during the brief startup window before bootstrap — one-time, not steady state. After first load,TryLoadDeletePointer()early-returns at line 531 with no DB read.ShouldPruneHistorycall remains valid before bootstrap: reads_deletePointer(default 0) and_lastPrunedTimestamp(default null); both branches return sensibly and let the flow fall into the lock path where bootstrap + prune happen atomically.Relationship to #9677
The pre-lock gate was introduced in #9677 as a fast-path optimization. This PR removes only that one line and preserves every other improvement from #9677 (atomic
_currentlyPruning, exclusive scheduling,TryScheduleTaskrename)