Skip to content

XDC Epoch switch manager#11140

Merged
batrr merged 18 commits intomasterfrom
feat/xdc-subnet-epoch-switch-manager
Apr 22, 2026
Merged

XDC Epoch switch manager#11140
batrr merged 18 commits intomasterfrom
feat/xdc-subnet-epoch-switch-manager

Conversation

@batrr
Copy link
Copy Markdown
Contributor

@batrr batrr commented Apr 14, 2026

Key differences in subnet implementation:

Epoch switches determined purely by blockNumber % epochLength == 0 (no round/quorum cert logic)
Penalties resolved from SubnetSnapshot.NextEpochPenalties instead of header penalties
GetBlockByEpochNumber not implemented

Types of changes

What types of changes does your code introduce?

  • New feature (a non-breaking change that adds functionality)
  • Refactoring

Testing

Requires testing

  • Yes

If yes, did you write tests?

  • Yes

@batrr batrr requested a review from alexb5dh April 14, 2026 12:52
@batrr batrr marked this pull request as ready for review April 14, 2026 12:52
@batrr batrr requested review from ak88 and cicr99 as code owners April 14, 2026 12:52
@NethermindEth NethermindEth deleted a comment from claude Bot Apr 15, 2026
@ak88
Copy link
Copy Markdown
Contributor

ak88 commented Apr 15, 2026

@claude code review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 15, 2026

Claude finished @ak88's task in 2m 20s —— View job


Code Review: XDC Epoch Switch Manager

Tasks

  • Gather context and understand request
  • Read coding rules (coding-style, robustness, test-infrastructure, di-patterns)
  • Read changed files
  • Post comprehensive review

The refactoring is well-structured overall — extracting shared logic into BaseEpochSwitchManager is a good architectural decision. Below are issues grouped by severity.


Bugs / Correctness

1. Null dereference on Tree.HeadBaseEpochSwitchManager.cs:117

XdcBlockHeader headOfChainHeader = (XdcBlockHeader)Tree.Head.Header;

Tree.Head can be null during startup or on an empty chain. This throws a NullReferenceException. Should guard:

if (Tree.Head is null) return null;
XdcBlockHeader headOfChainHeader = Tree.Head.Header as XdcBlockHeader;
if (headOfChainHeader is null) return null;

Fix this →


2. Hard cast on Tree.FindHeader(...) will throw on wrong block type — BaseEpochSwitchManager.cs:41,106

Both of these:

// Line 41
header = (XdcBlockHeader)Tree.FindHeader(header.ParentHash!) ?? throw new InvalidOperationException(...)

// Line 106
XdcBlockHeader h = (XdcBlockHeader)Tree.FindHeader(hash);

If FindHeader returns a non-XdcBlockHeader (e.g., during a chain reorganisation or wrong-network scenario), an InvalidCastException is thrown with no context. Prefer as with a meaningful error:

header = Tree.FindHeader(header.ParentHash) as XdcBlockHeader
    ?? throw new InvalidOperationException($"Parent block {header.ParentHash} not found or not an XdcBlockHeader");

Fix this →


3. ValidatorsAddress! null-forgiving may still be null — BaseEpochSwitchManager.cs:57

masterNodes = [.. header.ValidatorsAddress!.Value];

The guard above this only checks header.ExtraConsensusData is null, not header.ValidatorsAddress. If ExtraConsensusData is non-null but ValidatorsAddress is null, this throws. Either add an explicit check or return null early:

if (header.ValidatorsAddress is null) return null;
masterNodes = [.. header.ValidatorsAddress.Value];

Fix this →


4. Potential overflow in SubnetEpochSwitchManager.GetBlockByEpochNumberSubnetEpochSwitchManager.cs:49

long targetNumber = (long)targetEpoch * xdcSpec.EpochLength;

targetEpoch is ulong. If it exceeds long.MaxValue / EpochLength, the multiplication silently overflows in an unchecked context. Add a guard:

if (targetEpoch > (ulong)(long.MaxValue / xdcSpec.EpochLength)) return null;
long targetNumber = (long)targetEpoch * xdcSpec.EpochLength;

Fix this →


5. Dead code — EpochSwitchManager.cs:192-195

Inside TryBinarySearchBlockByEpochNumber, at line 184 the method already returns false if header.ExtraConsensusData is null. The re-check at lines 192–195 is therefore unreachable:

if (header.ExtraConsensusData is null)   // <-- dead: already returned false on line 184
{
    epochBlockInfo = null;
    return false;
}

Remove the duplicate guard.

Fix this →


Style

6. using directive ordering — BaseEpochSwitchManager.cs:11-12

using System; and using System.Collections.Generic; appear after the Nethermind-namespaced imports. C# convention (and the .editorconfig) places System.* namespaces first.


7. Comment typo — SubnetEpochSwitchManager.cs:19

// Subnet epoch switches are block-number-based, not round-based;

The trailing semicolon in the comment is a typo.


8. Missing XML documentation on public/internal types

Per coding-style.md, public APIs require <summary> documentation. BaseEpochSwitchManager, SubnetEpochSwitchManager, and their methods lack any XML docs. EpochSwitchManager has docs on two overrides but none on the base class entries.


Testing Gaps

9. GetBlockByEpochNumber is not tested

SubnetEpochSwitchManager.GetBlockByEpochNumber has no test coverage, including the overflow boundary and the null-head case.

10. No test for GetEpochSwitchInfo when header is not at epoch boundary

The while (!IsEpochSwitchAtBlock(header)) walk in BaseEpochSwitchManager.GetEpochSwitchInfo(XdcBlockHeader) is not exercised — all existing tests use block 0 which is always an epoch switch. A test with blockNumber = 5 (mid-epoch) verifying it correctly walks to the epoch-switch ancestor would improve coverage.

11. GetEpochSwitchInfo(Hash256) and GetEpochSwitchInfo(ulong) overloads are untested


Minor

12. SubnetEpochSwitchManager.GetBlockByEpochNumber comment intent

// This method is not used in subnet, implementation just in case

Per coding-style.md, comments should explain why, not what. If this method is truly unused, consider making that explicit (e.g., throw new NotSupportedException(...)) rather than providing a best-effort fallback that could silently return wrong results.


Summary: The refactoring is a good structural improvement. The primary concerns are the null-dereference on Tree.Head (item 1), the hard casts that swallow type-mismatch errors (item 2), and the ValidatorsAddress null-forgiving operator that may still produce a NRE (item 3). The overflow in GetBlockByEpochNumber (item 4) is lower severity but worth fixing given it operates on user-controlled epoch numbers. Test coverage for non-genesis epoch walks and the new GetBlockByEpochNumber implementation should be added before merge.

@batrr
Copy link
Copy Markdown
Contributor Author

batrr commented Apr 16, 2026

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 16, 2026

Claude finished @batrr's task in 3m 42s —— View job


Code Review: XDC Epoch Switch Manager

Tasks

  • Gather context and understand request
  • Read coding rules (coding-style, robustness, test-infrastructure, di-patterns)
  • Read changed files
  • Post comprehensive review

The PR addresses several issues from the prior review — using ordering, comment typo, ValidatorsAddress null-forgiving, Tree.Head null check, and dead code are all fixed. The remaining items below are what still needs attention.


Bugs / Correctness

1. Hard cast still present in EpochSwitchManager.cs:275

The previous review flagged two hard-cast sites (lines 41 and 106). Both were fixed, but a third instance was missed:

// EpochSwitchManager.cs ~line 275
XdcBlockHeader estBlockHeader = (XdcBlockHeader)Tree.FindHeader(estBlockNum);

If FindHeader returns a non-XdcBlockHeader this throws InvalidCastException with no context. Change to:

if (Tree.FindHeader(estBlockNum) is not XdcBlockHeader estBlockHeader) return null;

Fix this →


2. SubnetEpochSwitchManager.GetTimeoutCertificateEpochInfo / GetEpochSwitchInfo(ulong) will throw at runtime

BaseEpochSwitchManager.GetEpochSwitchInfo(ulong round) walks back through epochs by calling GetBlockByEpochNumber(--tempTCEpoch). For SubnetEpochSwitchManager, GetBlockByEpochNumber throws NotSupportedException. The call chain therefore crashes:

GetTimeoutCertificateEpochInfo(cert)
  → GetEpochSwitchInfo(cert.Round)          // base class
    → GetBlockByEpochNumber(tempTCEpoch--)  // SubnetEpochSwitchManager throws

If the subnet never receives timeout certificates this is latent only, but any errant call will crash with a confusing NotSupportedException instead of graceful degradation. Either:

  • Override GetEpochSwitchInfo(ulong) and GetTimeoutCertificateEpochInfo in SubnetEpochSwitchManager to throw NotSupportedException explicitly (makes the contract visible), or
  • Return null from those overrides (preserves interface contract for callers that handle null).

Fix this →


3. ActivatedSyncFeed.cs — branch is missing the dispose guard fix from master

The diff between this branch and origin/master shows ActivatedSyncFeed.cs on this branch has:

private readonly bool _disposed = false;    // readonly — can never be changed
...
public virtual void Dispose() => StateChanged -= OnStateChanged;   // _disposed never set true

SyncModeSelectorOnChanged guards with if (_disposed) return;, which is now permanently false — post-dispose calls go through. PR #11163 (dd2a2d7) has already fixed this on master. Rebase or merge master to pick up that fix; it shouldn't conflict with the XDC changes.


Design

4. Round2EpochBlockInfo is unnecessarily protectedEpochSwitchManager.cs:25

protected LruCache<ulong, BlockRoundInfo> Round2EpochBlockInfo { get; } = ...

This field is read and written only inside EpochSwitchManager itself (IsEpochSwitchAtBlock, GetBlockInfoInCache). No subclass uses it. Marking it protected exposes internals unnecessarily; it should be private.

Fix this →


Testing Gaps

5. No test for GetEpochSwitchInfo when block is mid-epoch

All existing tests use block 0, which is always an epoch switch. The while (!IsEpochSwitchAtBlock(header)) walk in BaseEpochSwitchManager.GetEpochSwitchInfo is never exercised. Add a test where the starting block is mid-epoch (e.g. block 5 with epochLength = 10) and verify it correctly walks up to the epoch-switch ancestor at block 0.

6. GetEpochSwitchInfo(Hash256) overload is untested

The GetEpochSwitchInfo(Hash256 hash) overload in the base class has no test coverage. A basic happy-path test verifying it resolves through the hash lookup would suffice.


Previously Flagged — Confirmed Fixed

  • Tree.Head null dereference → fixed via pattern matching (is not XdcBlockHeader)
  • ValidatorsAddress! null-forgiving → replaced with explicit null check + throw
  • FindHeader hard cast at lines 41/106 → fixed (line 275 above was missed)
  • Dead code in TryBinarySearchBlockByEpochNumber → removed
  • using directive ordering → fixed (System.* first)
  • Comment typo (trailing semicolon) → fixed
  • GetBlockByEpochNumber comment intent → now correctly throws NotSupportedException

Summary: Most review items were addressed. The primary remaining concerns are: (3) rebase onto master to get the ActivatedSyncFeed dispose-guard fix, (2) the subnet's implicit NotSupportedException propagation through the base-class round-based methods, and (1) the remaining hard cast in EpochSwitchManager.GetBlockByEpochNumber. Items 4–6 are lower priority but straightforward to fix.

@batrr
Copy link
Copy Markdown
Contributor Author

batrr commented Apr 16, 2026

@claude review. Most cases are covered in non-subnet tests.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 16, 2026

Claude finished @batrr's task in 3m 21s —— View job


Code Review: XDC Epoch Switch Manager

Tasks

  • Gather context and understand request
  • Read coding rules (coding-style, robustness)
  • Read changed files
  • Post comprehensive review

All items from the previous two reviews have been addressed. The author's point stands — the base-class paths (GetEpochSwitchInfo(XdcBlockHeader) mid-epoch walk, GetEpochSwitchInfo(Hash256), GetTimeoutCertificateEpochInfo) are exercised by the existing EpochSwitchManagerTests since they share the same base implementation. Below is what still needs attention.


Bug / Correctness

1. GetEpochSwitchInfo(ulong round) is semantically broken for subnet — BaseEpochSwitchManager.cs:130

The base class loop compares epochBlockInfo.Round to the requested round:

while (epochBlockInfo.Round > round)
{
    tempTCEpoch--;
    epochBlockInfo = GetBlockByEpochNumber(tempTCEpoch);
    ...
}

In the subnet, EpochSwitchBlockInfo.Round is always header.ExtraConsensusData?.BlockRound ?? 0, which is 0 for subnet epoch-switch blocks that carry no consensus round. Because 0 > round is never true for a ulong, the loop body never executes — the method always returns the current epoch's info, regardless of the requested round.

If GetTimeoutCertificateEpochInfo is ever called in a subnet context with a TC from a prior epoch it will silently return wrong results instead of failing loud. If the subnet never uses timeout certificates, this is dead code — but that contract isn't enforced anywhere.

Recommendation: Override GetEpochSwitchInfo(ulong) and GetTimeoutCertificateEpochInfo in SubnetEpochSwitchManager to throw new NotSupportedException(...), making the unsupported contract explicit and preventing accidental silent misuse:

public override EpochSwitchInfo? GetEpochSwitchInfo(ulong round)
    => throw new NotSupportedException("Round-based epoch lookup is not supported in subnet mode.");

public override EpochSwitchInfo? GetTimeoutCertificateEpochInfo(TimeoutCertificate cert)
    => throw new NotSupportedException("Timeout certificates are not used in subnet mode.");

Fix this →


Testing

2. SubnetEpochSwitchManager.GetBlockByEpochNumber has no test

This is a subnet-specific implementation with different logic from EpochSwitchManager.GetBlockByEpochNumber (block-number lookup vs. round-based binary search). It is not covered by the non-subnet tests. At minimum add:

  • Happy path: given a chain of 30 blocks with epochLength = 10, GetBlockByEpochNumber(2) returns block 20.
  • Null head: Tree.Head is null, returns null.
  • Out-of-range epoch: target epoch beyond chain head, returns null.

Minor

3. Missing XML documentation on new public/internal types

Per coding-style.md, all public APIs require <summary> docs. None of the new files add them:

  • BaseEpochSwitchManager class and ResolvePenalties
  • SubnetEpochSwitchManager class and overrides
  • IEpochSwitchManager methods remain undocumented

Previously Flagged — Confirmed Fixed

  • Tree.Head null dereference → pattern-matched in both base and subnet
  • FindHeader hard casts at all three sites → fixed
  • ValidatorsAddress! null-forgiving → replaced with explicit null check + throw
  • Round2EpochBlockInfo visibility → now private
  • Dead code in TryBinarySearchBlockByEpochNumber → removed
  • Overflow guard in GetBlockByEpochNumber → added
  • using directive ordering → correct (System.* first)
  • Master branch merge → complete (ActivatedSyncFeed dispose-guard included)

Summary: The refactoring is solid and the prior review items are all resolved. The one correctness concern worth fixing before merge is item 1 — the silent semantic mismatch in round-based lookups for subnet. A test for GetBlockByEpochNumber (item 2) would also be good to add given it's a new subnet-specific path with no non-subnet analogue.

Copy link
Copy Markdown
Contributor

@ak88 ak88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard casting to expected type is much better than silently failing and just returning null.

Comment thread src/Nethermind/Nethermind.Xdc/BaseEpochSwitchManager.cs Outdated
Comment thread src/Nethermind/Nethermind.Xdc/BaseEpochSwitchManager.cs Outdated
Comment thread src/Nethermind/Nethermind.Xdc/EpochSwitchManager.cs Outdated
Comment thread src/Nethermind/Nethermind.Xdc/BaseEpochSwitchManager.cs Outdated
Comment thread src/Nethermind/Nethermind.Xdc/SubnetEpochSwitchManager.cs Outdated
Comment thread src/Nethermind/Nethermind.Xdc/SubnetEpochSwitchManager.cs Outdated
Copy link
Copy Markdown
Contributor

@alexb5dh alexb5dh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check Claude-submitted issue about GetEpochSwitchInfo(ulong round) being broken for subnet - may be valid.

Comment thread src/Nethermind/Nethermind.Xdc.Test/SubnetEpochSwitchManagerTests.cs
Comment thread src/Nethermind/Nethermind.Xdc.Test/SubnetEpochSwitchManagerTests.cs Outdated
@batrr
Copy link
Copy Markdown
Contributor Author

batrr commented Apr 20, 2026

@claude review
Note: subnet headers do have rounds, but they aren't used for epoch switching. GetEpochSwitchInfo(ulong round) was computing epoch numbers via round / EpochLength, fixed by overriding blockNumber / EpochLength.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 20, 2026

Claude finished @batrr's task in 3m 49s —— View job


Code Review: XDC Epoch Switch Manager (Round 4)

Tasks

  • Gather context and understand request
  • Read coding rules (coding-style, robustness, test-infrastructure)
  • Read changed files and current branch state
  • Post comprehensive review

GetEpochSwitchInfo(ulong round) — Confirmed Correct

The prior concern (Round 3) was based on the incorrect assumption that subnet epoch-switch blocks carry Round == 0. Since subnet headers do have rounds, the base-class walk while (epochBlockInfo.Round > round) remains valid. The fix — overriding GetCurrentEpochNumber to return blockNumber / EpochLength instead of the old round / EpochLength — correctly initialises tempTCEpoch from block position, and SubnetEpochSwitchManager.GetBlockByEpochNumber then looks up the right block via targetEpoch * EpochLength. The semantics are now sound.

Hard Casts — Aligned with Reviewer Feedback

All (XdcBlockHeader?)Tree.FindHeader(...) and (XdcBlockHeader?)Tree.Head?.Header sites now use explicit casts that throw InvalidCastException on type mismatch, consistent with ak88's stated preference ("misconfigured → should throw"). No further action needed there.


Medium

1. Missing XML documentation on public IEpochSwitchManager (flagged in rounds 1–3, still unaddressed)

IEpochSwitchManager is public. Per coding-style.md, all public APIs require <summary> (and <param>/<returns> where applicable). See inline comment on IEpochSwitchManager.cs:11.


Low

2. using System ordering in test file

using System; appears last (line 12) in SubnetEpochSwitchManagerTests.cs. System.* namespaces must come first per C# convention and .editorconfig. See inline suggestion on line 12.

3. Assert.Throws<> inconsistency

GetEpochSwitchInfo_NonSubnetSnapshot_Throws uses Assert.Throws<ArgumentException>(...) while all other assertions use Assert.That(...). See inline suggestion on line 129.

4. No test for GetBlockByEpochNumber

SubnetEpochSwitchManager.GetBlockByEpochNumber has different logic from the mainnet variant (block-number lookup vs. round-based binary search) and is not covered by the shared base tests. A basic test — e.g., 30-block chain with epochLength=10, GetBlockByEpochNumber(2) returns block 20 — would give confidence in the new code path.


Previously Confirmed Fixed

Item Status
Tree.Head null dereference
FindHeader hard casts at all sites
ValidatorsAddress! null-forgiving
Round2EpochBlockInfo visibility
Dead code in TryBinarySearchBlockByEpochNumber
Overflow guard in GetBlockByEpochNumber
GetCurrentEpochNumber override for subnet (block-based)
Master branch merge (ActivatedSyncFeed dispose-guard)
using directive ordering in production files

The architecture is solid and all correctness items from the prior three rounds are resolved. Remaining items are documentation (medium) and two minor style points (low).

Comment thread src/Nethermind/Nethermind.Xdc.Test/SubnetEpochSwitchManagerTests.cs Outdated
Comment thread src/Nethermind/Nethermind.Xdc.Test/SubnetEpochSwitchManagerTests.cs Outdated
Comment thread src/Nethermind/Nethermind.Xdc/IEpochSwitchManager.cs
@batrr
Copy link
Copy Markdown
Contributor Author

batrr commented Apr 20, 2026

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 20, 2026

Claude finished @batrr's task in 4m 24s —— View job


Code Review: XDC Epoch Switch Manager (Round 5)

Tasks

  • Gather context and understand request
  • Read coding rules (coding-style, robustness, test-infrastructure)
  • Read changed files
  • Post comprehensive review

All prior High/Medium items from rounds 1–4 are resolved. The remaining findings are Low.


Low

1. Unused variable in EpochSwitchManager.IsEpochSwitchAtRound — line 74

ulong epochNumber = (ulong)xdcSpec.SwitchEpoch + currentRound / (ulong)xdcSpec.EpochLength;

epochNumber is computed but never used anywhere in the method. Dead code — remove it. See inline comment.


2. snapshot parameter unused in EpochSwitchManager.ResolvePenalties — line 99

The mainnet implementation reads penalties from the header, not the snapshot, so the snapshot parameter is ignored. Use _ discard to make this explicit:

protected override Address[] ResolvePenalties(XdcBlockHeader header, Snapshot _, IXdcReleaseSpec spec) =>

See inline comment.


3. No test for SubnetEpochSwitchManager.GetBlockByEpochNumber (persistent from rounds 3–4)

The subnet implementation uses block-number arithmetic (targetEpoch * EpochLength) rather than the mainnet's round-based binary search — the non-subnet tests don't cover it. Happy path, null head, and future-epoch cases are all untested. See inline comment.


Previously Confirmed Fixed

Item Status
Tree.Head null dereference
FindHeader hard casts at all sites
ValidatorsAddress! null-forgiving
Round2EpochBlockInfo visibility
Dead code in TryBinarySearchBlockByEpochNumber
Overflow guard in GetBlockByEpochNumber
GetCurrentEpochNumber override (block-based for subnet)
GetEpochSwitchInfo(ulong round) semantics ✅ (rounds present on subnet headers; loop is correct)
Master branch merge (ActivatedSyncFeed dispose-guard)
using directive ordering
Assert.Throws style consistency
IEpochSwitchManager XML documentation

Only Low findings remain. The refactoring is solid and the correctness concerns from prior rounds are all resolved.

Comment thread src/Nethermind/Nethermind.Xdc.Test/SubnetEpochSwitchManagerTests.cs
Comment thread src/Nethermind/Nethermind.Xdc/EpochSwitchManager.cs Outdated
Comment thread src/Nethermind/Nethermind.Xdc/EpochSwitchManager.cs Outdated
Comment thread src/Nethermind/Nethermind.Xdc/EpochSwitchManager.cs Outdated
Comment thread src/Nethermind/Nethermind.Xdc/EpochSwitchManager.cs Outdated
Comment thread src/Nethermind/Nethermind.Xdc/BaseEpochSwitchManager.cs Outdated
@batrr batrr merged commit b7cd74f into master Apr 22, 2026
438 checks passed
@batrr batrr deleted the feat/xdc-subnet-epoch-switch-manager branch April 22, 2026 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants