Nethermind RLPx session recovery can fail after a small malformed connection storm

**Description**

Nethermind appears to mishandle session recovery in a short window after a small storm of malformed RLPx connections. In our devnet, a single malformed or aborted RLPx connection attempt is enough to make the next fresh RLPx + `eth/69` session from the same test peer fail while reading the peer's status frame. The process stays up and the node remains reachable, so I am reporting this as an availability-hardening / session-lifecycle bug rather than a security vulnerability.

The issue was found with WireWasp, our own devp2p/RLPx fuzzer and replay harness. The attached bundle includes the exact WireWasp cases, run configuration, raw replay logs, result summaries, and comparator output from Geth, Reth, Erigon, and Besu.

**Steps to Reproduce**

With the attached WireWasp reproduction bundle:

1. Start the same-chain lab:

```bash
go run ./cmd/wasp-lab \
  -config env/config.yaml \
  -inventory env/inventory.samechain.yaml \
  -tier samechain up
```

2. Run the refreshed replay campaign:

```bash
go run ./cmd/wasp-rlpx-yield \
  -config env/config.yaml \
  -inventory env/inventory.samechain.yaml \
  -targets geth-s,reth-s,erigon-s,besu-s,nethermind-s \
  -seed-dir findings/open/2026-04-16-nethermind-rlpx-storm-fresh-recovery \
  -track yield \
  -case-filter rlpx-finding-min \
  -replays 2 \
  -artifact-mode promoted \
  -out output/hunts/review-refresh-rlpx-storm-20260417
```

3. Check `evidence/artifacts/review-refresh-20260417/summary.tsv` and the representative candidate directories.

General reproduction without WireWasp:

1. Run Nethermind on a small devnet with JSON-RPC and devp2p TCP enabled.
2. Establish a baseline RLPx session, negotiate `eth/69`, exchange status, then send a devp2p ping. This should succeed.
3. Open one short malformed or aborted RLPx connection, for example a 307-byte invalid auth-sized payload of `0xff` bytes, or a connection that is opened and closed before completing the expected session.
4. Immediately open a new clean RLPx session from the same test host / peer identity, negotiate `eth/69`, send a valid ETH status packet, then wait for the peer's status and for the reply to the recovery ping.
5. Compare against other clients on the same chain.
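
For step 3, a minimal Go sketch of the malformed connection is shown here. It is not the WireWasp case from the bundle; the function names and the 3-second timeout are illustrative assumptions, and the target address is whatever devp2p TCP endpoint your devnet node exposes.

```go
package main

import (
	"bytes"
	"fmt"
	"net"
	"os"
	"time"
)

// authSizedGarbage returns n bytes of 0xff. With n = 307 this has the size
// described in step 3 but is not a valid RLPx auth message.
// (Hypothetical helper, not part of WireWasp.)
func authSizedGarbage(n int) []byte {
	return bytes.Repeat([]byte{0xff}, n)
}

// sendMalformed opens one TCP connection to addr, writes the payload, and
// closes the socket without waiting for any reply.
func sendMalformed(addr string, payload []byte, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return fmt.Errorf("dial %s: %w", addr, err)
	}
	defer conn.Close()
	if err := conn.SetWriteDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	_, err = conn.Write(payload)
	return err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: malformed <host:port>")
		return
	}
	// The timeout value is an arbitrary choice for this sketch.
	if err := sendMalformed(os.Args[1], authSizedGarbage(307), 3*time.Second); err != nil {
		fmt.Println("send failed:", err)
		return
	}
	fmt.Println("sent 307 bytes of 0xff and closed the connection")
}
```

The clean follow-up session in step 4 still needs a real RLPx client; this sketch only covers the malformed attempt that precedes it.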

**Actual behavior**

On Nethermind v1.36.2, all 12 Nethermind rows in the refreshed campaign failed the fresh-session recovery probe:

- outcome: `RECOVERY_FAILED_FRESH_CONNECTION`
- reporting class: `confirmed-recovery-failure`
- fresh recovery phase: `status_read`
- fresh recovery error kind: `snappy`
- negotiated capability before the failure: `eth/69`

The same campaign produced successful fresh recovery on Geth, Reth, and Besu. Erigon disconnected gracefully for this probe shape; that differs from Nethermind's corrupt or missing status on the fresh session, so it was not classified as this recovery failure.

Representative Nethermind rows are in:

- `evidence/artifacts/review-refresh-20260417/summary.tsv`
- `evidence/artifacts/review-refresh-20260417/representative-candidates/HIT-20260417T130148.281076469Z-496b6bd9/`
- `evidence/artifacts/review-refresh-20260417/representative-candidates/HIT-20260417T130130.354018794Z-9efbd7d0/`
- `evidence/artifacts/review-refresh-20260417/representative-candidates/HIT-20260417T130204.730184541Z-9889333f/`

**Expected behavior**

Malformed or aborted RLPx connection attempts should be rejected or closed without affecting a later clean RLPx/ETH session. A fresh session should either complete the normal hello/status exchange and answer the recovery ping, or close cleanly according to policy. Prior malformed pre-auth traffic should not make the next clean session fail while reading ETH status.
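
The expectation above can be written as a small predicate. This is only a sketch of how I read the acceptance criterion; the `FreshSession` type and its field names are hypothetical and do not come from WireWasp or Nethermind.

```go
package main

import "fmt"

// FreshSession is a hypothetical summary of one clean follow-up session.
type FreshSession struct {
	HelloDone       bool // devp2p hello exchanged
	StatusDone      bool // ETH status exchanged
	PingAnswered    bool // recovery ping was answered
	CleanDisconnect bool // peer closed the session cleanly per policy
}

// meetsExpectation reports whether a fresh session matches the expected
// behavior: either the full hello/status/ping path completes, or the peer
// closes the session cleanly.
func meetsExpectation(s FreshSession) bool {
	full := s.HelloDone && s.StatusDone && s.PingAnswered
	return full || s.CleanDisconnect
}

func main() {
	// Shape of the Nethermind rows: hello done, failure at status_read.
	failedAtStatus := FreshSession{HelloDone: true}
	// Shape of the Erigon rows: graceful disconnect.
	graceful := FreshSession{HelloDone: true, CleanDisconnect: true}

	fmt.Println("failed at status read:", meetsExpectation(failedAtStatus))
	fmt.Println("graceful disconnect:", meetsExpectation(graceful))
}
```

Under this reading, a graceful disconnect passes while a session that breaks at `status_read` does not, which is the distinction drawn between Erigon and Nethermind in this report.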

**Screenshots**

Not applicable. This is a wire-protocol replay issue. The bundle includes raw transcripts, request/response hex, structured JSON results, and container logs.

**Desktop (please complete the following information):**

- Operating System: Linux
- Version: `Nethermind/v1.36.2+f5507dec/linux-x64/dotnet10.0.1`
- Installation Method: Docker image `nethermind/nethermind:latest`
- Consensus Client: not used in this devnet reproduction
- Network: same-chain devnet, network id `12345`
- Comparators: Geth v1.17.1, Reth v1.11.3, Erigon v3.1.0, Besu v26.2.0

**Additional context**

The claim here is intentionally bounded. I did not observe a crash or persistent node-wide outage in this evidence set. The finding is that Nethermind behaves differently from the comparator clients in the immediate fresh-session lifecycle after a small malformed/aborted RLPx storm.

**Logs**

The attached zip contains:

- `evidence/artifacts/review-refresh-20260417/summary.tsv`
- `evidence/artifacts/review-refresh-20260417/summary.json`
- `evidence/artifacts/review-refresh-20260417/yield_campaign_report.tsv`
- `evidence/artifacts/review-refresh-20260417/conformance_campaign_report.tsv`
- `evidence/artifacts/review-refresh-20260417/representative-candidates/*/event-log.jsonl`
- `evidence/artifacts/review-refresh-20260417/representative-candidates/*/transcript/*.jsonl`
- `evidence/artifacts/review-refresh-20260417/representative-candidates/*/container.log`
- `evidence/config/env.config.yaml`
- `evidence/config/inventory.samechain.yaml`
- `evidence/testcases/*.yaml`

[evidence.zip](https://github.com/user-attachments/files/26857033/evidence.zip)
