
Nethermind RLPx session recovery can fail after a small malformed connection storm #11232

@N0zoM1z0

Description

Nethermind appears to have a short-window session recovery issue after a small RLPx connection storm. In our devnet, one malformed/aborted RLPx connection attempt is enough for the next fresh RLPx+eth/69 session from the same test peer to fail while reading the peer status frame. The process stays up and the node remains reachable, so I am reporting this as a normal availability-hardening / session lifecycle bug rather than a security vulnerability.

The issue was found with WireWasp, our own devp2p/RLPx fuzzer and replay harness. The attached bundle includes the exact WireWasp cases, run configuration, raw replay logs, result summaries, and comparator output from Geth, Reth, Erigon, and Besu.

Steps to Reproduce

With the attached WireWasp reproduction bundle:

  1. Start the same-chain lab:

     go run ./cmd/wasp-lab \
       -config env/config.yaml \
       -inventory env/inventory.samechain.yaml \
       -tier samechain up

  2. Run the refreshed replay campaign:

     go run ./cmd/wasp-rlpx-yield \
       -config env/config.yaml \
       -inventory env/inventory.samechain.yaml \
       -targets geth-s,reth-s,erigon-s,besu-s,nethermind-s \
       -seed-dir findings/open/2026-04-16-nethermind-rlpx-storm-fresh-recovery \
       -track yield \
       -case-filter rlpx-finding-min \
       -replays 2 \
       -artifact-mode promoted \
       -out output/hunts/review-refresh-rlpx-storm-20260417

  3. Check evidence/artifacts/review-refresh-20260417/summary.tsv and the representative candidate directories.

General reproduction without WireWasp:

  1. Run Nethermind on a small devnet with JSON-RPC and devp2p TCP enabled.
  2. Establish a baseline RLPx session, negotiate eth/69, exchange status, then send a devp2p ping. This should succeed.
  3. Open one short malformed or aborted RLPx connection. For example, send a 307-byte payload of 0xff bytes, which matches the length of a legacy (pre-EIP-8) RLPx auth message but is not valid. Alternatively, open a connection and close it before the expected session completes.
  4. Immediately open a new clean RLPx session from the same test host / peer identity, negotiate eth/69, and send a valid ETH status packet. Then wait for the peer's status reply and for the answer to the recovery ping.
  5. Compare against other clients on the same chain.

Actual behavior

On Nethermind v1.36.2, all 12 refreshed Nethermind rows failed the fresh-session recovery probe:

  • outcome: RECOVERY_FAILED_FRESH_CONNECTION
  • reporting class: confirmed-recovery-failure
  • fresh recovery phase: status_read
  • fresh recovery error kind: snappy
  • negotiated capability before the failure: eth/69

The same campaign produced successful fresh recovery on Geth, Reth, and Besu. Erigon disconnected gracefully in this probe shape. That differs from Nethermind's behavior, where the fresh session's status is corrupt or missing, so the Erigon result was not classified as this recovery failure.

Representative Nethermind rows are in:

  • evidence/artifacts/review-refresh-20260417/summary.tsv
  • evidence/artifacts/review-refresh-20260417/representative-candidates/HIT-20260417T130148.281076469Z-496b6bd9/
  • evidence/artifacts/review-refresh-20260417/representative-candidates/HIT-20260417T130130.354018794Z-9efbd7d0/
  • evidence/artifacts/review-refresh-20260417/representative-candidates/HIT-20260417T130204.730184541Z-9889333f/

Expected behavior

Malformed or aborted RLPx connection attempts should be rejected or closed without affecting a later clean RLPx/ETH session. A fresh session should either complete the normal hello/status exchange and answer the recovery ping, or close cleanly according to policy. Prior malformed pre-auth traffic should not make the next clean session fail while reading ETH status.
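The expected behavior amounts to a simple acceptance rule for the fresh session. The following is a minimal sketch. It assumes a hypothetical `FreshProbe` record holding three things: the last phase reached, any local decode error kind, and whether the peer sent a devp2p Disconnect. Only `RECOVERY_FAILED_FRESH_CONNECTION`, `status_read`, and `snappy` come from the results above. The other labels and names are illustrative.

```go
package main

import "fmt"

// Phase is the last step the fresh session reached (hypothetical names).
type Phase string

const (
	PhaseStatusRead Phase = "status_read" // reading the peer's ETH status
	PhaseDone       Phase = "done"        // status exchanged and ping answered
)

// FreshProbe is a hypothetical record of one fresh-session probe.
type FreshProbe struct {
	Phase          Phase  // last phase reached
	ErrKind        string // "" if no local decode/IO error, e.g. "snappy"
	PeerDisconnect bool   // peer sent a devp2p Disconnect before closing
}

// Classify applies the rule: the fresh session must either finish the
// hello/status exchange and answer the ping, or close cleanly by policy.
func Classify(p FreshProbe) string {
	switch {
	case p.Phase == PhaseDone && p.ErrKind == "":
		return "RECOVERED"
	case p.PeerDisconnect && p.ErrKind == "":
		return "GRACEFUL_DISCONNECT"
	default:
		return "RECOVERY_FAILED_FRESH_CONNECTION"
	}
}

func main() {
	// Shape of the Nethermind rows: decode error while reading status.
	fmt.Println(Classify(FreshProbe{Phase: PhaseStatusRead, ErrKind: "snappy"}))
	// Shape of the Erigon result: clean Disconnect, no decode error.
	fmt.Println(Classify(FreshProbe{Phase: PhaseStatusRead, PeerDisconnect: true}))
}
```

Under this rule the Nethermind rows fail, while a clean Disconnect such as Erigon's is accepted as a policy decision.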

Screenshots

Not applicable. This is a wire-protocol replay issue. The bundle includes raw transcripts, request/response hex, structured JSON results, and container logs.

Desktop

  • Operating System: Linux
  • Version: Nethermind/v1.36.2+f5507dec/linux-x64/dotnet10.0.1
  • Installation Method: Docker image nethermind/nethermind:latest
  • Consensus Client: not used in this devnet reproduction
  • Network: same-chain devnet, network id 12345
  • Comparators: Geth v1.17.1, Reth v1.11.3, Erigon v3.1.0, Besu v26.2.0

Additional context

The claim here is intentionally bounded. I did not observe a crash or persistent node-wide outage in this evidence set. The finding is that Nethermind behaves differently from peer clients in the immediate fresh-session lifecycle after a small malformed/aborted RLPx storm.

Logs

The attached zip contains:

  • evidence/artifacts/review-refresh-20260417/summary.tsv
  • evidence/artifacts/review-refresh-20260417/summary.json
  • evidence/artifacts/review-refresh-20260417/yield_campaign_report.tsv
  • evidence/artifacts/review-refresh-20260417/conformance_campaign_report.tsv
  • evidence/artifacts/review-refresh-20260417/representative-candidates/*/event-log.jsonl
  • evidence/artifacts/review-refresh-20260417/representative-candidates/*/transcript/*.jsonl
  • evidence/artifacts/review-refresh-20260417/representative-candidates/*/container.log
  • evidence/config/env.config.yaml
  • evidence/config/inventory.samechain.yaml
  • evidence/testcases/*.yaml

evidence.zip
