fix(security): harden hbbs/hbbr against unauthenticated abuse and state exhaustion#642

Open
ssh4net wants to merge 5 commits into rustdesk:master from ssh4net:Security

Conversation


@ssh4net ssh4net commented Apr 13, 2026

Summary

This PR hardens rustdesk-server against malicious clients and generic internet abuse.

The main goal is to treat clients as a potential attack vector in their own right, rather than treating the server solely as a trust root that clients depend on. The changes focus on:

  • enforcing the configured deployment key on more rendezvous control-plane paths
  • reducing relay and rendezvous abuse as generic internet-facing services
  • bounding in-memory and persistent state growth
  • improving operator visibility into protection behavior
  • adding process-level regression tests for the hardened paths

This PR also includes the required shared protocol update in hbb_common so client and server use the same keyed request fields.

Why

Before this change, several server paths were still too permissive:

  • keyed hbbs deployments enforced the server key for punch-hole requests, but not for all registration or presence paths
  • hbbr could be abused to accumulate large numbers of half-open relay requests
  • websocket listeners trusted forwarded IP headers from arbitrary clients
  • punch request tracking and peer state could grow without strong bounds
  • persistent peer growth had no hard server-side cap
  • the server had limited operator-facing visibility into rate-limit and pruning events

That made the server vulnerable to:

  • unauthorized peer registration on keyed deployments
  • unauthenticated presence enumeration
  • relay/state exhaustion by malicious clients
  • weaker IP-based anti-abuse controls due to spoofed proxy headers
  • long-term database growth from registration churn

What changed

1. Keyed auth on hbbs control-plane requests

Vulnerability:

  • on keyed/private deployments, some hbbs requests still worked without the configured server key

Fix:

  • require the configured server key for:
    • RegisterPeer
    • RegisterPk
    • OnlineRequest
    • TestNatRequest
  • keep existing key enforcement for punch-hole / relay-related flows
  • return explicit LICENSE_MISMATCH for RegisterPk where the client benefits from a clear failure reason

Implementation notes:

  • added licence_key to the relevant rendezvous protocol messages
  • updated both client and server handling to use the same shared protocol fields
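
The gate itself reduces to a simple comparison. A minimal sketch of the intended check, assuming an empty configured key still means an open deployment; the helper name `key_matches` is hypothetical, not the PR's actual function:

```rust
/// Illustrative key check for keyed hbbs deployments: when no server key is
/// configured the deployment stays open; otherwise the client-supplied
/// `licence_key` field must match the configured key exactly.
/// (Name and "empty means open" behavior are assumptions, not the PR's code.)
fn key_matches(configured: &str, supplied: &str) -> bool {
    configured.is_empty() || configured == supplied
}
```

On a mismatch, paths like RegisterPk would then return the explicit LICENSE_MISMATCH result described above instead of silently proceeding.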

2. Relay half-open abuse caps in hbbr

Vulnerability:

  • attackers could create large numbers of pending relay UUIDs and hold server state for 30 seconds at a time

Fix:

  • cap total pending relays
  • cap pending relays per source IP
  • prune expired pending relay entries before admitting new ones
  • refuse overload explicitly instead of allowing silent hangs
  • return an explicit Key mismatch refusal on relay auth failure
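
The prune-then-admit flow above can be sketched as follows. The constants, the `PendingRelay` shape, and `admit_pending_relay` are illustrative stand-ins for whatever the PR actually names them:

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

// Illustrative defaults, not the PR's actual configuration values.
const MAX_PENDING_RELAYS: usize = 1024;
const MAX_PENDING_RELAYS_PER_IP: usize = 8;
const PENDING_RELAY_TTL: Duration = Duration::from_secs(30);

struct PendingRelay {
    ip: IpAddr,
    created_at: Instant,
}

/// Prune expired half-open entries first, then decide whether a new pending
/// relay from `ip` fits under both the global and the per-source-IP caps.
/// Returning false models an explicit refusal instead of a silent hang.
fn admit_pending_relay(pending: &mut HashMap<String, PendingRelay>, ip: IpAddr) -> bool {
    pending.retain(|_, p| p.created_at.elapsed() < PENDING_RELAY_TTL);
    if pending.len() >= MAX_PENDING_RELAYS {
        return false;
    }
    let per_ip = pending.values().filter(|p| p.ip == ip).count();
    per_ip < MAX_PENDING_RELAYS_PER_IP
}
```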

3. Do not trust forwarded IP headers by default

Vulnerability:

  • websocket listeners accepted X-Real-IP / X-Forwarded-For from arbitrary clients, weakening rate limits, logging, and abuse controls

Fix:

  • only honor forwarded proxy headers when TRUST_PROXY_HEADERS is explicitly enabled
  • otherwise use the real socket peer address
  • normalize X-Forwarded-For handling to the first address only
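
The header handling above can be sketched like this. Helper names are illustrative; the sketch assumes the rewritten address keeps the original socket port so concurrent clients behind one proxy IP stay distinguishable:

```rust
use std::net::{IpAddr, SocketAddr};

/// X-Forwarded-For may carry a chain ("client, proxy1, proxy2"); only the
/// first entry can plausibly be the original client, so everything after
/// the first comma is ignored.
fn parse_forwarded_ip(header: &str) -> Option<IpAddr> {
    header.split(',').next()?.trim().parse().ok()
}

/// Only rewrite the peer address when the operator has explicitly opted in
/// (modeled on the TRUST_PROXY_HEADERS switch described above); otherwise
/// the real socket peer address is used.
fn effective_addr(
    socket_addr: SocketAddr,
    forwarded: Option<&str>,
    trust_proxy_headers: bool,
) -> SocketAddr {
    if !trust_proxy_headers {
        return socket_addr; // default: never trust client-supplied headers
    }
    match forwarded.and_then(parse_forwarded_ip) {
        // Keep the original port as a per-connection discriminator.
        Some(ip) => SocketAddr::new(ip, socket_addr.port()),
        None => socket_addr,
    }
}
```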

4. Bound punch-hole request tracking

Vulnerability:

  • punch request tracking could grow indefinitely in memory

Fix:

  • prune old entries by retention window
  • cap the total retained punch request entries
  • apply pruning both on insert and inspection
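
A minimal sketch of that two-step bound (drop by age, then enforce a hard cap by evicting the oldest survivors); the function name and map shape are assumptions, not the PR's actual types:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Prune punch-request tracking: first drop entries older than the
/// retention window, then evict the oldest remaining entries until the
/// total is back under the hard cap. Intended to run on both insert and
/// inspection paths, as described above.
fn prune_punch_requests(
    map: &mut HashMap<String, Instant>,
    retention: Duration,
    max_entries: usize,
) {
    map.retain(|_, created| created.elapsed() < retention);
    while map.len() > max_entries {
        let oldest = map
            .iter()
            .min_by_key(|(_, created)| **created)
            .map(|(key, _)| key.clone());
        match oldest {
            Some(key) => {
                map.remove(&key);
            }
            None => break,
        }
    }
}
```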

5. Registration abuse reduction

Vulnerability:

  • repeated registration churn could consume server resources cheaply

Fix:

  • validate peer registration IDs before touching peer state
  • apply rate limiting to RegisterPeer in addition to existing RegisterPk protections
  • reject obviously invalid IDs early
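
"Reject obviously invalid IDs early" might look like the following. The exact policy here (6..=16 characters, ASCII alphanumeric plus '-' and '_') is an assumption for illustration, not necessarily the rule the PR implements:

```rust
/// Illustrative early ID validation: bound the length and restrict the
/// character set before any peer state is touched, so registration churn
/// with garbage IDs is rejected cheaply.
fn is_plausible_peer_id(id: &str) -> bool {
    (6..=16).contains(&id.len())
        && id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```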

6. In-memory peer cache growth controls

Vulnerability:

  • pending and cached peer state could grow without strong bounds

Fix:

  • add a global in-memory peer cache cap
  • add per-IP pending registration limits
  • evict cached peers with preference for inactive / older entries
  • keep registration placeholder growth bounded
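
The "prefer inactive / older entries" eviction order can be expressed as a single sort key. The struct is loosely modeled on the `PeerEvictionEntry` named in the review walkthrough; its fields and the function name are inferred for illustration:

```rust
use std::time::Instant;

struct PeerEvictionEntry {
    id: String,
    /// Pending registration placeholders (false) are evicted before
    /// fully registered peers (true).
    registered: bool,
    last_seen: Instant,
}

/// Pick up to `n` eviction victims: unregistered placeholders first, and
/// within each group the least recently seen peers first. Relies on tuple
/// ordering, where false < true and earlier Instants sort first.
fn select_evictions(mut entries: Vec<PeerEvictionEntry>, n: usize) -> Vec<String> {
    entries.sort_by_key(|e| (e.registered, e.last_seen));
    entries.into_iter().take(n).map(|e| e.id).collect()
}
```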

7. Persistent peer record cap and retention pruning

Vulnerability:

  • database-backed peer records could grow indefinitely over time

Fix:

  • add MAX_TOTAL_PEER_RECORDS
  • refuse creation of new peer records when the cap is hit
  • before refusing, prune stale records using PEER_RECORD_RETENTION_DAYS
  • continue allowing updates to existing records

Additional fix:

  • RegisterPk now returns the real backend result instead of always reporting OK
  • this was important for surfacing PEER_LIMIT_REACHED correctly on the wire
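
The cap and retention decisions reduce to two small predicates. The names `peer_limit_reached` and `peer_retention_prune_enabled` appear in the review diff below; the signatures and the "0 means disabled/unlimited" convention are inferred from that context, not confirmed:

```rust
/// True when the persistent peer table is at or over the configured cap.
/// A cap of 0 is treated as "no cap" (an assumed convention).
fn peer_limit_reached(current: i64, max_total: i64) -> bool {
    max_total > 0 && current >= max_total
}

/// Retention pruning only runs when PEER_RECORD_RETENTION_DAYS is set to a
/// positive value; 0 disables pruning (again, an assumed convention).
fn peer_retention_prune_enabled(retention_days: i64) -> bool {
    retention_days > 0
}
```

With these, the insert path checks the cap, prunes stale records if retention is enabled, re-checks, and only then refuses with PEER_LIMIT_REACHED.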

8. Listener-level rate limiting

Vulnerability:

  • abusive clients could repeatedly hit TCP and UDP listeners before deeper protocol checks

Fix:

  • add per-IP fixed-window limits for:
    • hbbs main TCP listener
    • hbbs NAT/test listener
    • hbbs websocket listener
    • hbbs UDP receive loop
    • hbbr TCP listener
    • hbbr websocket listener
  • keep loopback exempt for local admin usage
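
A per-IP fixed-window limiter with a loopback exemption can be sketched as below; the window size, quota, and type names are illustrative, not the PR's actual defaults:

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Minimal fixed-window per-IP limiter in the spirit of the listener-level
/// limits described above. Each IP gets a (window start, count) bucket;
/// when the window rolls over, the count resets.
struct FixedWindowLimiter {
    window: Duration,
    max_per_window: u32,
    buckets: HashMap<IpAddr, (Instant, u32)>,
}

impl FixedWindowLimiter {
    fn allow(&mut self, ip: IpAddr, now: Instant) -> bool {
        if ip.is_loopback() {
            return true; // keep local admin usage exempt
        }
        let bucket = self.buckets.entry(ip).or_insert((now, 0));
        if now.duration_since(bucket.0) >= self.window {
            *bucket = (now, 0); // window rolled over: start counting afresh
        }
        bucket.1 += 1;
        bucket.1 <= self.max_per_window
    }
}
```

A real deployment would also bound the bucket map itself (the review comments below flag exactly that for the connection-rate map), but that is omitted here for brevity.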

9. Operator-facing protection stats

Problem:

  • protections existed, but operator visibility was poor

Fix:

  • add protection-stats / ps admin output
  • report:
    • protection limit configuration
    • connection rate-limit hits
    • UDP rate-limit hits
    • punch-request pruning
    • relay pending rejections
    • peer pruning / peer limit events

Tests

Added process-level regression coverage in tests/server_protection_process.rs for:

  • hbbs admin ps
  • hbbr admin ps
  • hbbs OnlineRequest keyed auth enforcement
  • hbbs RegisterPk keyed auth enforcement
  • hbbr relay key mismatch refusal
  • hbbs persistent peer cap behavior
  • hbbs stale peer retention pruning

Validated with:

cargo test --test server_protection_process -- --test-threads=1

Security impact

This PR does not try to solve every operational abuse scenario, but it materially improves the default security posture for self-hosted RustDesk deployments:

  • private/keyed deployments now behave more consistently as private services
  • the server is harder to abuse as a generic tunnel or reconnaissance target
  • in-memory and persistent resource growth are bounded
  • anti-abuse controls are less vulnerable to trivial source-IP spoofing
  • failure modes are more explicit and test-covered

Notes

This PR depends on the matching shared hbb_common protocol update so client and server use the same keyed rendezvous request fields.

Summary by CodeRabbit

  • New Features

    • Configurable per‑IP rate limits for connections, control messages and UDP (with loopback exemption), plus protection reporting and admin view
    • Peer capacity/retention enforcement that can return PEER_LIMIT_REACHED when full and prunes stale records
    • Pending/active relay caps (global + per‑IP), clearer refusal reasons, timed holds, session timeouts and trusted‑proxy-aware client IP handling
    • Stricter registration/license validation (invalid ID / license mismatch / too‑frequent)
  • Tests

    • Expanded unit and integration tests covering protection, limits, pruning, eviction and auth behavior

Update Cargo.lock with multiple dependency bumps and additions, update the hbb_common submodule, and adjust source code across src/common.rs, src/database.rs, src/peer.rs, src/relay_server.rs and src/rendezvous_server.rs to match the updates. Commit also updates the local SQLite DB (db_v2.sqlite3) and adds a new integration test (tests/server_protection_process.rs) to exercise server protection behavior. These changes combine dependency maintenance with corresponding code and test updates to ensure compatibility and improved coverage.

Signed-off-by: Vlad <shaamaan@gmail.com>

coderabbitai Bot commented Apr 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds proxy-aware client IP extraction, per-IP and per-message rate limiting, relay/pending/peer capacity controls with pruning and eviction, registration/license validation, database peer retention caps, protection reporting, and extensive unit and integration tests.

Changes

Cohort / File(s) Summary
Rate Limiting & Protection Infrastructure
src/common.rs
Added trust_proxy_headers and apply_trusted_proxy_addr, per-IP in-memory rate limiters for connections/UDP/control messages, global Mutex-protected counters with pruning and eviction, loopback exemptions, protection event recording and reporting helpers, and unit tests.
Database Peer Capacity
src/database.rs
Added env-configurable peer caps/retention (MAX_TOTAL_PEER_RECORDS, PEER_RECORD_RETENTION_DAYS), peer_count and prune_old_peer_records helpers, new InsertPeerResult enum, changed insert_peer to return Inserted or PeerLimitReached, and tests for prune-before-insert behavior.
Peer Cache & Registration Controls
src/peer.rs
Introduced cache sizing/inactivity timeout, per-IP pending-registration caps, eviction selection (PeerEvictionEntry, select_peer_ids_to_evict), replaced get_or with get_or_for_registration(id, ip) -> Option<LockPeer) enforcing pending caps, added IP-block/change tracking and pruning, and tests.
Relay Server: Pending & Active Relays
src/relay_server.rs
Reworked pending relay storage to PendingRelay and added ACTIVE_RELAYS; added global and per-IP pending/active caps, hold/idle/session timeouts, admission checks via allow_connection_from_ip and apply_trusted_proxy_addr, explicit relay refusals (e.g., license/key mismatch), pruning, admin protection-stats, and tests.
Rendezvous Server: Validation, Rate Limits & Resource Caps
src/rendezvous_server.rs
Added registration-id and server-key validation, integrated get_or_for_registration and TOO_FREQUENT mapping, per-listener and forwarded-client rate limits, control-message rate limiting, punch/relay fanout caps with pruning, bounded TCP punch storage with TTL and eviction, CLI protection-stats, and tests.
Integration Tests & Helpers
tests/server_protection_process.rs
New integration test suite launching hbbs/hbbr processes, verifying admin ps output, license/key behaviors, peer-limit pruning with sqlite, relay fanout refusals, TCP punch TTL pruning, and providing process/port/binary helpers and framed protobuf utilities.
Dependency Update
libs/hbb_common
Submodule pointer bumped from 83419b65... to 36bf31de....

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant ConnHandler as Connection<br/>Handler
    participant RateLimiter as Rate Limiter<br/>(common.rs)
    participant Cache as Peer Cache
    participant DB as Database

    Client->>ConnHandler: Incoming TCP/UDP/control request
    ConnHandler->>RateLimiter: allow_connection_from_ip / allow_udp_packet_from_ip / allow_control_message_from_ip
    alt Rate limited
        RateLimiter-->>ConnHandler: blocked
        ConnHandler-->>Client: drop/refuse
    else Allowed
        ConnHandler->>Cache: get_or_for_registration(id, ip)
        alt Pending per-IP cap exceeded
            Cache-->>ConnHandler: None (TOO_FREQUENT)
            ConnHandler-->>Client: error response
        else Proceed
            Cache->>Cache: prune_cache_for_insert()/evict if needed
            Cache->>DB: insert_peer(...)
            alt DB at capacity
                DB->>DB: prune_old_peer_records()
                DB-->>Cache: PeerLimitReached
                ConnHandler-->>Client: PEER_LIMIT_REACHED
            else Inserted
                DB-->>Cache: Inserted(guid)
                ConnHandler-->>Client: success
            end
        end
    end
sequenceDiagram
    actor Client
    participant ReqHandler as Relay<br/>RequestHandler
    participant RateLimiter as Rate Limiter<br/>(common.rs)
    participant RelayQueue as Pending Relay Queue
    participant DB as Database

    Client->>ReqHandler: relay_request(stream, key, ip)
    ReqHandler->>RateLimiter: allow_connection_from_ip / validate license/key
    alt Key mismatch
        ReqHandler->>ReqHandler: send_relay_refuse("Key mismatch")
        ReqHandler-->>Client: refusal
    else Key valid
        ReqHandler->>RelayQueue: prune_expired_pending_relays()
        RelayQueue->>RelayQueue: compute global and per-IP pending counts
        alt Pending/active caps exceeded
            RelayQueue-->>ReqHandler: limit reason
            ReqHandler->>ReqHandler: send_relay_refuse(reason)
            ReqHandler-->>Client: refusal
        else Accept
            RelayQueue->>RelayQueue: insert PendingRelay{stream, ip, created_at}
            ReqHandler->>DB: pair and start ActiveRelay
            ReqHandler-->>Client: relay established
        end
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I nibble logs and prune the stale with care,
I count connections, forward headers fair,
I gate the hops by IP and time,
Evict the old, let lively peers chime,
A thump, a hop — protection everywhere!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 41.14%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the title accurately describes the main security-focused changes: hardening hbbs/hbbr against unauthenticated abuse and state exhaustion through rate limiting, resource caps, proxy header handling, and authentication enforcement.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/database.rs (1)

153-178: ⚠️ Potential issue | 🟠 Major

Make the peer-cap check atomic with the insert.

This is count -> maybe prune -> count -> insert across separate statements, and peer_count() grabs a fresh pooled connection each time. Concurrent registrations, or another server process writing the same SQLite file, can pass both checks and still insert past MAX_TOTAL_PEER_RECORDS, so the new cap is bypassable under load.

Suggested direction
-        if peer_limit_reached(self.peer_count().await?, self.max_total_peers) {
-            if peer_retention_prune_enabled(self.peer_record_retention_days) {
-                let deleted = self.prune_old_peer_records().await?;
-                if deleted > 0 {
-                    crate::common::record_protection_event("peer_records_pruned");
-                    log::info!("pruned {} old peer records before inserting {}", deleted, id);
-                }
-            }
-        }
-        if peer_limit_reached(self.peer_count().await?, self.max_total_peers) {
+        let mut conn = self.pool.get().await?;
+        sqlx::query("BEGIN IMMEDIATE")
+            .execute(conn.deref_mut())
+            .await?;
+
+        let total = sqlx::query!("select count(*) as count from peer")
+            .fetch_one(conn.deref_mut())
+            .await?
+            .count as i64;
+
+        if peer_limit_reached(total, self.max_total_peers) {
+            if peer_retention_prune_enabled(self.peer_record_retention_days) {
+                let cutoff = peer_retention_cutoff_arg(self.peer_record_retention_days);
+                let deleted = sqlx::query("delete from peer where created_at < datetime('now', ?)")
+                    .bind(cutoff)
+                    .execute(conn.deref_mut())
+                    .await?
+                    .rows_affected();
+                if deleted > 0 {
+                    crate::common::record_protection_event("peer_records_pruned");
+                    log::info!("pruned {} old peer records before inserting {}", deleted, id);
+                }
+            }
+        }
+
+        let total = sqlx::query!("select count(*) as count from peer")
+            .fetch_one(conn.deref_mut())
+            .await?
+            .count as i64;
+        if peer_limit_reached(total, self.max_total_peers) {
             crate::common::record_protection_event("peer_limit_reached");
             log::warn!("peer record limit reached, rejecting new peer {}", id);
+            sqlx::query("ROLLBACK").execute(conn.deref_mut()).await?;
             return Ok(InsertPeerResult::PeerLimitReached);
         }
         let guid = uuid::Uuid::new_v4().as_bytes().to_vec();
         sqlx::query!(
             "insert into peer(guid, id, uuid, pk, info) values(?, ?, ?, ?, ?)",
@@
-        .execute(self.pool.get().await?.deref_mut())
+        .execute(conn.deref_mut())
         .await?;
+        sqlx::query("COMMIT").execute(conn.deref_mut()).await?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/database.rs` around lines 153 - 178, Wrap the count->prune->count->insert
sequence in a single DB transaction using the same pooled connection so the
peer-cap check is atomic: acquire a connection once (let conn =
self.pool.get().await?), begin a transaction with an immediate/serializable lock
(use sqlx transaction on that conn), then call peer_count (or replace it with an
inline SELECT COUNT(*)) and prune_old_peer_records using that same conn (modify
prune_old_peer_records to accept &mut conn or run its SQL inline inside the
transaction), re-check peer_limit_reached against self.max_total_peers, and only
then perform the insert into peer and commit the transaction; if limit reached,
rollback and return InsertPeerResult::PeerLimitReached. Ensure all queries use
the transaction/conn rather than grabbing fresh connections.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/common.rs`:
- Around line 188-205: The map currently allocates an entry for every new source
IP (via CONNECTION_RATE_LIMITS and the key variable), which allows memory
exhaustion; before inserting a new ConnectionRateEntry, enforce a global cap
(e.g., MAX_CONNECTION_RATE_ENTRIES) by checking lock.len() and pruning stale
entries (reuse prune_connection_rate_limits) and if still at capacity evict the
oldest entry (compare last_seen_at) or simply treat the unseen IP as
rate-limited and return false instead of inserting; update the logic around
lock.entry(key).or_insert(...) to only create the entry when under the cap and
to perform eviction via last_seen_at on CONNECTION_RATE_LIMITS when needed.
- Around line 272-281: apply_trusted_proxy_addr currently rewrites trusted-proxy
peer addresses to SocketAddr::new(ip, 0), collapsing all clients behind the same
public IP into one synthetic address and causing collisions where tcp_punch
(keyed by SocketAddr in src/rendezvous_server.rs) overwrites concurrent
connections. Change apply_trusted_proxy_addr so that when a forwarded_ip is
present it preserves the original connection's port (use addr.port()) instead of
0 (i.e., construct the new SocketAddr from the parsed ip and the original addr's
port), keeping the per-connection discriminator while still using
parse_forwarded_ip and trust_proxy_headers().

In `@src/peer.rs`:
- Around line 214-231: The current flow in get_or_for_registration lets multiple
concurrent callers bypass MAX_PENDING_REGISTRATIONS_PER_IP because
can_cache_pending_registration() inspects state before prune_cache_for_insert()
and before acquiring the write lock; fix by performing the pending-registration
count and pruning while holding the same write lock used to insert the
placeholder: acquire the write lock (the same map.write() used later), call
prune_cache_for_insert() or inline its logic under that lock, recompute the
per-IP pending count, and only insert the new Arc<RwLock<Peer>> if the count is
below MAX_PENDING_REGISTRATIONS_PER_IP; if the cap is reached return None.
Ensure you still short-circuit and return an existing peer when w.get(id) is
present after acquiring the write lock.

In `@src/relay_server.rs`:
- Around line 396-404: The WS listener currently calls allow_connection_from_ip
for "hbbr-ws" using the raw peer addr returned by listener2.accept(), which
rate-limits the reverse proxy rather than the real client when
TRUST_PROXY_HEADERS is enabled; change the logic so that handle_connection (or
the WebSocket handshake in make_pair) performs the allow_connection_from_ip
check using the reconstructed client IP when proxy headers are trusted, rather
than using the listener peer addr. Concretely, remove or bypass the
allow_connection_from_ip call in the listener2.accept() branch and ensure
make_pair or the handshake path invokes allow_connection_from_ip("hbbr-ws",
client_ip) after extracting and validating proxy headers (respecting
TRUST_PROXY_HEADERS) so quotas are applied per real client IP.

In `@src/rendezvous_server.rs`:
- Around line 304-312: The rate-limit check currently performed right after
listener3.accept() using allow_connection_from_ip("hbbs-ws", addr) applies the
proxy IP rather than the real client IP and should be moved or adjusted so it
uses the post-proxy/forwarded client address; update the logic to perform the
allow_connection_from_ip call inside handle_listener/handle_listener_inner after
apply_trusted_proxy_addr() has resolved the real client IP (or, if
TRUST_PROXY_HEADERS is enabled, extract the forwarded address first and pass
that IP to allow_connection_from_ip) so per-client buckets are enforced
correctly instead of rate-limiting the proxy.

In `@tests/server_protection_process.rs`:
- Around line 52-80: reserve_hbbs_port and reserve_hbbr_port race because they
bind sockets, drop them, and return only port numbers; another process can claim
the ports before the child is spawned. Fix by changing these helpers to keep the
actual bound sockets alive until the child process is started: in
reserve_hbbs_port return the bound TcpListener(s) and UdpSocket (or a struct
holding them) instead of u16, and in reserve_hbbr_port return the bound
TcpListener(s); then pass those open descriptors (or rely on inheritable fds)
into the child spawn so the ports remain reserved until the child binds them.
Ensure functions named reserve_hbbs_port and reserve_hbbr_port are updated
accordingly and all call sites are adjusted to accept and hold the returned
listeners/sockets until after child startup.

---

Outside diff comments:
In `@src/database.rs`:
- Around line 153-178: Wrap the count->prune->count->insert sequence in a single
DB transaction using the same pooled connection so the peer-cap check is atomic:
acquire a connection once (let conn = self.pool.get().await?), begin a
transaction with an immediate/serializable lock (use sqlx transaction on that
conn), then call peer_count (or replace it with an inline SELECT COUNT(*)) and
prune_old_peer_records using that same conn (modify prune_old_peer_records to
accept &mut conn or run its SQL inline inside the transaction), re-check
peer_limit_reached against self.max_total_peers, and only then perform the
insert into peer and commit the transaction; if limit reached, rollback and
return InsertPeerResult::PeerLimitReached. Ensure all queries use the
transaction/conn rather than grabbing fresh connections.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aee2f984-90d0-4037-9ee6-eebb7a48e720

📥 Commits

Reviewing files that changed from the base of the PR and between 815c728 and 8da0090.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • db_v2.sqlite3
  • libs/hbb_common
  • src/common.rs
  • src/database.rs
  • src/peer.rs
  • src/relay_server.rs
  • src/rendezvous_server.rs
  • tests/server_protection_process.rs

Introduce MAX_CONNECTION_RATE_ENTRIES with a default and enforce a cap on connection-rate entries: evict the oldest entry when the cap is reached, record eviction/rejection protection events, and expose the cap in protection summaries. Fix forwarded IP port handling and add websocket-forwarded rate-limit checks in relay and rendezvous servers.

Refactor peer caching insertion: compute eviction candidates, remove excess peers before inserting pending registrations, and enforce per-IP pending registration limits without the previous helper. Update select_peer_ids_to_evict usage and minor formatting/await handling.

Revamp test process spawn helpers: replace single-port reservation with retrying pick_*_port_candidate functions, make spawn_hbbs/spawn_hbbr return the chosen port and child, preserve temp dirs on certain errors, and add/adjust tests (including a new test to assert connection-rate entry bounding). Miscellaneous whitespace, logging, and small API-call formatting improvements across files.

Signed-off-by: Vlad <shaamaan@gmail.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
src/peer.rs (3)

271-297: prune_cache_for_insert has a TOCTOU gap between snapshot and removal.

This method takes a read lock to build the snapshot, releases it, then acquires a write lock to remove entries. Between these operations, the map state could change (entries added/removed by concurrent calls). While this is less critical here because:

  1. It's called from get() which is a read path
  2. Worst case is slightly more or fewer evictions than intended

The get_or_for_registration method correctly holds the write lock throughout, which is the critical path for pending registration limits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/peer.rs` around lines 271 - 297, prune_cache_for_insert has a TOCTOU
window: it builds a snapshot under a read lock then releases it and later takes
a write lock to remove peers, allowing concurrent map changes between snapshot
and removal; fix by performing eviction decision while holding the write lock
(i.e., acquire self.map.write().await before computing entries and calling
select_peer_ids_to_evict) or, if you must keep the read snapshot,
re-check/compute entries again after acquiring the write lock and only remove
IDs that still exist and meet eviction criteria; reference
prune_cache_for_insert, select_peer_ids_to_evict, and self.map (and mirror the
write-lock approach used in get_or_for_registration).

227-238: Potential performance concern with async read locks inside write lock scope.

Within the write lock scope (let mut w = self.map.write().await), the code iterates over all peers and calls peer.read().await on each LockPeer. This holds the map write lock while waiting on potentially many individual peer read locks. Under high load with many cached peers, this could cause contention.

Consider whether extracting only the needed fields (ip, guid, last_reg_time) into a snapshot structure before eviction analysis could reduce lock hold time. However, given the security-focused nature of this PR and the existing pattern in prune_cache_for_insert, this may be acceptable for now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/peer.rs` around lines 227 - 238, The current loop holds
self.map.write().await while awaiting peer.read().await for each LockPeer,
causing contention; change to first capture a snapshot of the peer handles
(clone the peer_id and Arc/handle to LockPeer from self.map) while holding the
write lock, then immediately drop the write lock and iterate over that snapshot
performing peer.read().await to extract ip, guid and last_reg_time and build
PeerEvictionEntry and pending_ids_for_ip; this mirrors the behavior in
prune_cache_for_insert and ensures the write lock is not held during per-peer
async reads.

79-89: Consider accepting max_pending=0 as "unlimited" or documenting the behavior.

When max_pending is 0, pending_registration_limit_exceeded always returns false (no limit), which seems intentional. However, env_usize_or filters out 0 values via .filter(|value| *value > 0), so an operator cannot explicitly set MAX_PENDING_REGISTRATIONS_PER_IP=0 to disable the limit. This is fine if 0 means "use default", but may be surprising if an operator wants to disable the feature entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/peer.rs` around lines 79 - 89, The current behavior treats
MAX_PENDING_REGISTRATIONS_PER_IP=0 as "use default" because env_usize_or filters
out zero, while pending_registration_limit_exceeded already treats 0 as
"unlimited"; either make zero explicitly settable or document it. To fix, update
env_usize_or (or add env_usize_or_allow_zero) to stop filtering out zero so a
parsed 0 is returned, and keep pending_registration_limit_exceeded as-is (it
will then correctly treat 0 as unlimited), or alternatively add a comment on
pending_registration_limit_exceeded explaining that max_pending==0 means
unlimited and leave env_usize_or unchanged if you prefer default-on-missing
semantics. Ensure changes reference env_usize_or and
pending_registration_limit_exceeded so reviewers can find the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/peer.rs`:
- Around line 271-297: prune_cache_for_insert has a TOCTOU window: it builds a
snapshot under a read lock then releases it and later takes a write lock to
remove peers, allowing concurrent map changes between snapshot and removal; fix
by performing eviction decision while holding the write lock (i.e., acquire
self.map.write().await before computing entries and calling
select_peer_ids_to_evict) or, if you must keep the read snapshot,
re-check/compute entries again after acquiring the write lock and only remove
IDs that still exist and meet eviction criteria; reference
prune_cache_for_insert, select_peer_ids_to_evict, and self.map (and mirror the
write-lock approach used in get_or_for_registration).
- Around line 227-238: The current loop holds self.map.write().await while
awaiting peer.read().await for each LockPeer, causing contention; change to
first capture a snapshot of the peer handles (clone the peer_id and Arc/handle
to LockPeer from self.map) while holding the write lock, then immediately drop
the write lock and iterate over that snapshot performing peer.read().await to
extract ip, guid and last_reg_time and build PeerEvictionEntry and
pending_ids_for_ip; this mirrors the behavior in prune_cache_for_insert and
ensures the write lock is not held during per-peer async reads.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7ec488ab-b976-4661-8692-62aaf30bdb07

📥 Commits

Reviewing files that changed from the base of the PR and between 8da0090 and 8aa43d3.

📒 Files selected for processing (5)
  • src/common.rs
  • src/peer.rs
  • src/relay_server.rs
  • src/rendezvous_server.rs
  • tests/server_protection_process.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/common.rs
  • tests/server_protection_process.rs

ssh4net added 2 commits April 13, 2026 19:10
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/database.rs (2)

147-183: ⚠️ Potential issue | 🟠 Major

Make peer-limit enforcement atomic.

The cap check and insert are split across separate DB operations, so concurrent insert_peer() calls can all observe room below MAX_TOTAL_PEER_RECORDS and insert past the configured cap. Wrap count/prune/recheck/insert in one transaction with a write lock, or serialize new-peer inserts.

Suggested direction
     ) -> ResultType<InsertPeerResult> {
-        if peer_limit_reached(self.peer_count().await?, self.max_total_peers) {
+        // Run count/prune/recheck/insert atomically, e.g. inside a transaction
+        // started with a write lock, so concurrent registrations cannot exceed
+        // MAX_TOTAL_PEER_RECORDS.
+        if peer_limit_reached(self.peer_count().await?, self.max_total_peers) {
             if peer_retention_prune_enabled(self.peer_record_retention_days) {
                 let deleted = self.prune_old_peer_records().await?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/database.rs` around lines 147 - 183, The insert_peer logic is racy
because peer_count/prune/check and the actual insert are separate DB operations;
modify insert_peer to perform count, optional prune, recheck and insert inside a
single DB transaction (use sqlx transaction API) and acquire a write lock (e.g.,
SELECT COUNT(*) FOR UPDATE or an explicit table/advisory lock) so concurrent
calls serialize; call peer_count()/prune_old_peer_records() or their SQL
equivalents inside that transaction against the same connection, re-evaluate
peer_limit_reached using self.max_total_peers and peer_record_retention_days,
and only perform the INSERT when the recheck passes, returning
InsertPeerResult::PeerLimitReached otherwise, then commit/rollback the
transaction and propagate errors as before.
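The race being described can be shown with an in-memory analogue: the point is that the count check and the insert must happen under one serialization scope. This sketch uses a `std::sync::Mutex` over a `HashSet` where the real fix would use a sqlx transaction plus a write lock; the function and type names are illustrative, not the crate's API.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

// In-memory analogue of the transactional insert_peer fix: the cap check and
// the insert are atomic because both happen under one lock, so concurrent
// registrations can never observe a stale count and overshoot the cap.
fn try_insert_peer(store: &Mutex<HashSet<String>>, id: String, cap: usize) -> bool {
    let mut peers = store.lock().unwrap(); // check + insert serialized here
    if peers.len() >= cap {
        return false; // models InsertPeerResult::PeerLimitReached
    }
    peers.insert(id)
}

fn main() {
    let store = Arc::new(Mutex::new(HashSet::new()));
    let cap = 50;
    let handles: Vec<_> = (0..8)
        .map(|t| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                for i in 0..100 {
                    try_insert_peer(&store, format!("peer-{t}-{i}"), cap);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // 800 attempted inserts from 8 threads, yet the cap is never exceeded.
    assert_eq!(store.lock().unwrap().len(), cap);
    println!("ok");
}
```

With the check and insert split across separate lock acquisitions (as in the current DB code), the final count could exceed `cap` under contention; the transaction restores the same atomicity the single lock provides here.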

112-133: ⚠️ Potential issue | 🔴 Critical

Add a migration path before relying on peer.created_at.

Existing peer tables will not have the created_at column because CREATE TABLE IF NOT EXISTS does not modify existing tables. The code will fail at startup when creating index_peer_created_at on the non-existent column, or at runtime when prune_old_peer_records() executes. No schema migration or version tracking exists in the codebase to handle this.

Implement an explicit migration that checks for the missing column (via PRAGMA table_info(peer)), adds it if absent with a default timestamp, and backfills existing rows before the index is created or pruning occurs.

Suggested direction
         db.create_tables().await?;
+        db.ensure_peer_created_at_column().await?;
         Ok(db)

Implement ensure_peer_created_at_column() by inspecting PRAGMA table_info(peer), adding a nullable created_at column when missing, and backfilling existing rows with a default timestamp before creating the index.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/database.rs` around lines 112 - 133, Add a migration step to ensure the
peer.created_at column exists and is populated before creating the index or
pruning: implement a new async function ensure_peer_created_at_column() that
runs PRAGMA table_info(peer) to detect whether "created_at" exists, and if
missing executes an ALTER TABLE peer ADD COLUMN created_at datetime (nullable)
then UPDATE peer SET created_at = current_timestamp WHERE created_at IS NULL to
backfill existing rows; call ensure_peer_created_at_column() from
create_tables() before executing the SQL that creates index_peer_created_at and
also ensure prune_old_peer_records() only runs after this migration completes.
Reference these symbols: ensure_peer_created_at_column(), create_tables(),
prune_old_peer_records().
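The migration decision itself is simple enough to sketch as a pure function. The helper name and return shape are hypothetical; the input models the column names reported by `PRAGMA table_info(peer)`. Note the two-statement split is a real SQLite constraint: `ALTER TABLE ... ADD COLUMN` cannot use a non-constant default such as `current_timestamp`, so the column is added nullable and backfilled separately.

```rust
// Hypothetical planner mirroring ensure_peer_created_at_column(): given the
// column names from PRAGMA table_info(peer), return the statements that must
// run before index_peer_created_at is created or pruning executes.
fn peer_created_at_migration(existing_columns: &[&str]) -> Vec<&'static str> {
    if existing_columns.contains(&"created_at") {
        return Vec::new(); // column already present, nothing to do
    }
    vec![
        // SQLite forbids non-constant defaults in ADD COLUMN, so add the
        // column nullable first, then backfill existing rows.
        "ALTER TABLE peer ADD COLUMN created_at datetime",
        "UPDATE peer SET created_at = current_timestamp WHERE created_at IS NULL",
    ]
}

fn main() {
    // Fresh database: CREATE TABLE already defined created_at.
    assert!(peer_created_at_migration(&["guid", "id", "created_at"]).is_empty());
    // Pre-existing table from an older deployment: both statements required.
    assert_eq!(peer_created_at_migration(&["guid", "id"]).len(), 2);
    println!("ok");
}
```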
♻️ Duplicate comments (1)
src/relay_server.rs (1)

482-490: ⚠️ Potential issue | 🟠 Major

Apply the WebSocket rate limit after trusted proxy IP reconstruction.

This still charges hbbr-ws against the reverse proxy’s socket IP before the handshake extracts the forwarded client IP. With TRUST_PROXY_HEADERS enabled, one noisy proxied client can consume the shared proxy budget and throttle unrelated clients behind the same proxy.

Suggested direction
-                        if !crate::common::allow_connection_from_ip("hbbr-ws", addr) {
-                            log::warn!("Rate limit exceeded for hbbr-ws from {}", addr.ip());
-                            continue;
-                        }
                         stream.set_nodelay(true).ok();
                         handle_connection(stream, addr, &limiter, key, true).await;

Then perform the hbbr-ws admission check after apply_trusted_proxy_addr() in make_pair(), using the reconstructed address for trusted proxy deployments.

Also applies to: 543-559

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/relay_server.rs` around lines 482 - 490, The hbbr-ws admission/rate-check
is being done against the immediate socket addr (proxy IP) instead of the
reconstructed client IP; move the allow_connection_from_ip("hbbr-ws", addr)
check so it runs after trusted-proxy reconstruction (apply_trusted_proxy_addr)
inside make_pair()/connection setup where the forwarded client address is
established, and remove/replace the pre-handshake check in the listener accept
path; repeat the same change for the second occurrence (the block around the
other listener handling at lines ~543-559) so both ws paths use the
reconstructed addr for rate limiting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/peer.rs`:
- Around line 168-185: prune_ip_change_entries currently drops entries that have
only one IP because it requires both elapsed < IP_CHANGE_DUR_X2 and
value.1.len() > 1; change the retention condition so recent single-IP entries
are kept: retain entries where value.0.elapsed().as_secs() < IP_CHANGE_DUR_X2 OR
value.1.len() > 1. Update the function prune_ip_change_entries (and leave
evict_oldest_ip_change_entry, track_ip_change, IpChangesMap usage unchanged) so
first-seen IPs are preserved long enough for a second IP to be observed and
evicted only when truly expired or after multiple IPs exist.
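The corrected predicate can be sketched in isolation; the map layout `(first_seen, ips)` and the constant value are assumptions for illustration, matching the `IpChangesMap` usage the comment references.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const IP_CHANGE_DUR_X2: u64 = 2 * 86_400; // assumed retention window, seconds

// Sketch of the corrected prune_ip_change_entries predicate: keep an entry if
// it is still recent OR has already seen multiple IPs. The original `&&`
// dropped recent single-IP entries before a second IP could be observed.
fn prune_ip_change_entries(map: &mut HashMap<String, (Instant, Vec<String>)>) {
    map.retain(|_, (first_seen, ips)| {
        first_seen.elapsed().as_secs() < IP_CHANGE_DUR_X2 || ips.len() > 1
    });
}

fn main() {
    let old = Instant::now() - Duration::from_secs(IP_CHANGE_DUR_X2 + 1);
    let mut map = HashMap::new();
    map.insert("recent-single".into(), (Instant::now(), vec!["1.2.3.4".into()]));
    map.insert("old-single".into(), (old, vec!["1.2.3.4".into()]));
    map.insert("old-multi".into(), (old, vec!["1.2.3.4".into(), "5.6.7.8".into()]));

    prune_ip_change_entries(&mut map);

    // With `&&`, "recent-single" would have been dropped too; with `||` only
    // the truly expired single-IP entry is evicted.
    assert!(map.contains_key("recent-single"));
    assert!(map.contains_key("old-multi"));
    assert!(!map.contains_key("old-single"));
    println!("ok");
}
```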

In `@src/relay_server.rs`:
- Around line 577-630: The code inserts into ACTIVE_RELAYS keyed by rf.uuid
without checking if that UUID is already active, allowing a client to reuse an
active UUID and overwrite the existing entry; fix by checking ACTIVE_RELAYS for
rf.uuid before pairing and refuse or disambiguate the new request instead of
inserting over an existing active relay. Concretely: when you take the PEERS
pending and before creating/using active_relays.insert(rf.uuid.clone(), ...),
acquire ACTIVE_RELAYS.lock().await, check if
active_relays.contains_key(&rf.uuid) and if so call send_relay_refuse(&mut
stream, "uuid_in_use").await (and refuse the pending.stream too) and return;
otherwise perform the insert while still holding the lock (or perform an atomic
check-and-insert under the same lock) so you never overwrite an existing
ActiveRelay. Apply the same guard in the other matching block that also inserts
into ACTIVE_RELAYS.
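The atomic check-and-insert being requested reduces to one pattern: test for the key and insert while still holding the same guard. This sketch uses a `std::sync::Mutex` in place of the tokio mutex around `ACTIVE_RELAYS`, and `Err` models sending the `uuid_in_use` refusal; names and value types are illustrative.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Sketch of the guarded insert: the contains_key check and the insert happen
// under one lock acquisition, so a second request for an active UUID can
// never overwrite the existing entry.
fn claim_relay_uuid(
    active: &Mutex<HashMap<String, u32>>,
    uuid: &str,
    conn_id: u32,
) -> Result<(), &'static str> {
    let mut relays = active.lock().unwrap();
    if relays.contains_key(uuid) {
        return Err("uuid_in_use"); // refuse instead of overwriting
    }
    relays.insert(uuid.to_string(), conn_id);
    Ok(())
}

fn main() {
    let active = Mutex::new(HashMap::new());
    assert!(claim_relay_uuid(&active, "abc", 1).is_ok());
    assert_eq!(claim_relay_uuid(&active, "abc", 2), Err("uuid_in_use"));
    assert_eq!(active.lock().unwrap()["abc"], 1); // first claimant preserved
    println!("ok");
}
```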


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 10f0a319-02d8-4ec3-9c09-de8f2ca09ce6

📥 Commits

Reviewing files that changed from the base of the PR and between a7ad238 and 2401abd.

📒 Files selected for processing (6)
  • src/common.rs
  • src/database.rs
  • src/peer.rs
  • src/relay_server.rs
  • src/rendezvous_server.rs
  • tests/server_protection_process.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/server_protection_process.rs
  • src/rendezvous_server.rs

Adjust prune_ip_change_entries to retain entries that are either recent or have multiple IPs (use || instead of &&) to avoid prematurely dropping recent single-IP records, and add a unit test covering that behavior. In relay_server.rs, add checks to detect and reject relay requests when the relay UUID is already active (both for immediate and pending flows), log warnings, record protection events, and send refuse responses. Add corresponding tests and minor test imports to validate UUID reuse detection.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/relay_server.rs (2)

992-1006: std::sync::Mutex held across .await in async test.

TEST_RELAY_LIMITS_LOCK.lock().unwrap() returns a std::sync::MutexGuard that is held across ACTIVE_RELAYS.lock().await. Beyond tripping Clippy's await_holding_lock, if this module's test suite grows to use #[tokio::test(flavor = "multi_thread")], a panicked task holding the std mutex across an await could poison it and cause surprising failures. Prefer a tokio::sync::Mutex for tests that must await, or scope the std guard strictly around the synchronous mutations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/relay_server.rs` around lines 992 - 1006, The test
active_relay_uuid_reuse_is_detected holds a std::sync::MutexGuard from
TEST_RELAY_LIMITS_LOCK across an .await (ACTIVE_RELAYS.lock().await), which can
trigger Clippy's await_holding_lock and risk poisoning; to fix, either replace
TEST_RELAY_LIMITS_LOCK with a tokio::sync::Mutex so it can be .awaited safely,
or limit the scope of the std mutex guard by dropping it (end its scope) before
calling ACTIVE_RELAYS.lock().await; update the test to use
TEST_RELAY_LIMITS_LOCK (tokio mutex) or move the guard so no
std::sync::MutexGuard lives across the await in
active_relay_uuid_reuse_is_detected.
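The second option, scoping the guard, can be shown without tokio at all. In this sketch `after_guard` stands in for the `.await` point (`ACTIVE_RELAYS.lock().await` in the real test); the key move is that the std guard lives in its own block and is dropped before that point runs.

```rust
use std::sync::Mutex;

// Stand-in for the test-serialization lock in relay_server.rs tests.
static TEST_RELAY_LIMITS_LOCK: Mutex<()> = Mutex::new(());

// Sketch of the suggested fix: confine the std MutexGuard to its own scope so
// it is dropped before any .await. `after_guard` models the awaited section.
fn with_scoped_guard(after_guard: impl FnOnce()) {
    {
        let _guard = TEST_RELAY_LIMITS_LOCK.lock().unwrap();
        // Synchronous mutations of shared test state happen here.
    } // guard dropped: no std::sync::MutexGuard lives across the await point
    after_guard();
}

fn main() {
    with_scoped_guard(|| {
        // The guard is already released, so re-locking here succeeds; in the
        // async version this is where ACTIVE_RELAYS.lock().await would run.
        assert!(TEST_RELAY_LIMITS_LOCK.try_lock().is_ok());
    });
    println!("ok");
}
```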

677-680: Redundant self-scheduled removal overlaps with prune_expired_pending_relays.

Every new pending relay spawns (via the outer tokio::spawn in handle_connection) a task that sleeps for PENDING_RELAY_HOLD_SECS purely to PEERS.lock().await.remove(&rf.uuid). The insert path already calls prune_expired_pending_relays(&mut peers) using created_at, so the sleep-then-remove is duplicated work that also keeps a task alive for 30 s per registration. Consider dropping the sleep+remove and relying solely on the prune sweep (the entry's created_at already bounds its lifetime).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/relay_server.rs` around lines 677 - 680, The code spawns a
per-registration task that sleeps PENDING_RELAY_HOLD_SECS and then calls
PEERS.lock().await.remove(&rf.uuid), duplicating lifetime enforcement already
handled by prune_expired_pending_relays; remove the sleep + removal block from
the insertion path (the lines inserting PendingRelay::new and the subsequent
drop(peers); sleep(...).await; PEERS.lock().await.remove(&rf.uuid)) so pending
entries rely on their created_at and the existing prune_expired_pending_relays
sweep invoked from handle_connection/insert logic; keep the initial
peers.insert(rf.uuid.clone(), PendingRelay::new(...)) and ensure no other code
depends on the self-scheduled removal task.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/relay_server.rs`:
- Around line 660-677: The PEERS map insertion can overwrite an existing
PendingRelay for the same rf.uuid, disconnecting the original client; before
inserting into PEERS in the new-pending branch (where you call
prune_expired_pending_relays, pending_relay_limit_reason and then
peers.insert(rf.uuid.clone(), PendingRelay::new(...))), check whether PEERS
already contains rf.uuid (e.g. peers.contains_key(&rf.uuid) or use the entry
API) and if so, record the protection event
(crate::common::record_protection_event("relay_pending_rejected")), log a
warning similar to the existing one, call send_relay_refuse(&mut stream,
"uuid_in_use" or appropriate reason).await, drop the new stream and return — do
not overwrite the existing PendingRelay in PEERS. Ensure the check uses the same
rf.uuid and that the send_relay_refuse path is exercised for the second
contender.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2952d27a-c058-464c-9e27-51796275be28

📥 Commits

Reviewing files that changed from the base of the PR and between 2401abd and 8de597c.

📒 Files selected for processing (2)
  • src/peer.rs
  • src/relay_server.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/peer.rs

Comment thread src/relay_server.rs
Comment on lines +660 to +677
                        let mut peers = PEERS.lock().await;
                        prune_expired_pending_relays(&mut peers);
                        let pending_for_ip = peers.values().filter(|peer| peer.ip == ip).count();
                        if let Some(reason) =
                            pending_relay_limit_reason(peers.len(), pending_for_ip)
                        {
                            crate::common::record_protection_event("relay_pending_rejected");
                            log::warn!(
                                "Rejecting relay request {} from {}: {}",
                                rf.uuid,
                                addr,
                                reason
                            );
                            drop(peers);
                            send_relay_refuse(&mut stream, reason).await;
                            return;
                        }
                        peers.insert(rf.uuid.clone(), PendingRelay::new(Box::new(stream), ip));

⚠️ Potential issue | 🟠 Major

Missing PEERS UUID collision check — concurrent new-pending requests silently evict the earlier client.

The new-pending branch guards against rf.uuid already being in ACTIVE_RELAYS (line 648) but never checks whether it already exists in PEERS. Two clients that arrive with the same UUID during the 30 s hold window will both pass PEERS.remove() returning None at line 578, both pass the ACTIVE_RELAYS check, and then serialize on the PEERS lock at line 660. The second peers.insert(rf.uuid.clone(), …) at line 677 overwrites the first PendingRelay; its Box<dyn StreamTrait> is dropped and the original client is disconnected without a send_relay_refuse. A malicious peer that replays/guesses an active UUID can grief legitimate clients.

🔒 Proposed fix
                         drop(active_relays);
                         let mut peers = PEERS.lock().await;
                         prune_expired_pending_relays(&mut peers);
+                        if peers.contains_key(&rf.uuid) {
+                            drop(peers);
+                            crate::common::record_protection_event("relay_uuid_in_use");
+                            log::warn!(
+                                "Rejecting pending relay request {} from {}: uuid already pending",
+                                rf.uuid,
+                                addr
+                            );
+                            send_relay_refuse(&mut stream, "uuid_in_use").await;
+                            return;
+                        }
                         let pending_for_ip = peers.values().filter(|peer| peer.ip == ip).count();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
                        let mut peers = PEERS.lock().await;
                        prune_expired_pending_relays(&mut peers);
                        if peers.contains_key(&rf.uuid) {
                            drop(peers);
                            crate::common::record_protection_event("relay_uuid_in_use");
                            log::warn!(
                                "Rejecting pending relay request {} from {}: uuid already pending",
                                rf.uuid,
                                addr
                            );
                            send_relay_refuse(&mut stream, "uuid_in_use").await;
                            return;
                        }
                        let pending_for_ip = peers.values().filter(|peer| peer.ip == ip).count();
                        if let Some(reason) =
                            pending_relay_limit_reason(peers.len(), pending_for_ip)
                        {
                            crate::common::record_protection_event("relay_pending_rejected");
                            log::warn!(
                                "Rejecting relay request {} from {}: {}",
                                rf.uuid,
                                addr,
                                reason
                            );
                            drop(peers);
                            send_relay_refuse(&mut stream, reason).await;
                            return;
                        }
                        peers.insert(rf.uuid.clone(), PendingRelay::new(Box::new(stream), ip));
