fix(security): harden hbbs/hbbr against unauthenticated abuse and state exhaustion #642

ssh4net wants to merge 5 commits into rustdesk:master
Conversation
Update Cargo.lock with multiple dependency bumps and additions, update the hbb_common submodule, and adjust source code across src/common.rs, src/database.rs, src/peer.rs, src/relay_server.rs and src/rendezvous_server.rs to match the updates. Commit also updates the local SQLite DB (db_v2.sqlite3) and adds a new integration test (tests/server_protection_process.rs) to exercise server protection behavior. These changes combine dependency maintenance with corresponding code and test updates to ensure compatibility and improved coverage. Signed-off-by: Vlad <shaamaan@gmail.com>
Note: Reviews paused. This branch is under active development, so CodeRabbit automatically paused its review to avoid an influx of comments on new commits.
📝 Walkthrough

Adds proxy-aware client IP extraction, per-IP and per-message rate limiting, relay/pending/peer capacity controls with pruning and eviction, registration/license validation, database peer retention caps, protection reporting, and extensive unit and integration tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant ConnHandler as Connection<br/>Handler
    participant RateLimiter as Rate Limiter<br/>(common.rs)
    participant Cache as Peer Cache
    participant DB as Database
    Client->>ConnHandler: Incoming TCP/UDP/control request
    ConnHandler->>RateLimiter: allow_connection_from_ip / allow_udp_packet_from_ip / allow_control_message_from_ip
    alt Rate limited
        RateLimiter-->>ConnHandler: blocked
        ConnHandler-->>Client: drop/refuse
    else Allowed
        ConnHandler->>Cache: get_or_for_registration(id, ip)
        alt Pending per-IP cap exceeded
            Cache-->>ConnHandler: None (TOO_FREQUENT)
            ConnHandler-->>Client: error response
        else Proceed
            Cache->>Cache: prune_cache_for_insert()/evict if needed
            Cache->>DB: insert_peer(...)
            alt DB at capacity
                DB->>DB: prune_old_peer_records()
                DB-->>Cache: PeerLimitReached
                ConnHandler-->>Client: PEER_LIMIT_REACHED
            else Inserted
                DB-->>Cache: Inserted(guid)
                ConnHandler-->>Client: success
            end
        end
    end
```
```mermaid
sequenceDiagram
    actor Client
    participant ReqHandler as Relay<br/>RequestHandler
    participant RateLimiter as Rate Limiter<br/>(common.rs)
    participant RelayQueue as Pending Relay Queue
    participant DB as Database
    Client->>ReqHandler: relay_request(stream, key, ip)
    ReqHandler->>RateLimiter: allow_connection_from_ip / validate license/key
    alt Key mismatch
        ReqHandler->>ReqHandler: send_relay_refuse("Key mismatch")
        ReqHandler-->>Client: refusal
    else Key valid
        ReqHandler->>RelayQueue: prune_expired_pending_relays()
        RelayQueue->>RelayQueue: compute global and per-IP pending counts
        alt Pending/active caps exceeded
            RelayQueue-->>ReqHandler: limit reason
            ReqHandler->>ReqHandler: send_relay_refuse(reason)
            ReqHandler-->>Client: refusal
        else Accept
            RelayQueue->>RelayQueue: insert PendingRelay{stream, ip, created_at}
            ReqHandler->>DB: pair and start ActiveRelay
            ReqHandler-->>Client: relay established
        end
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/database.rs (1)
153-178: ⚠️ Potential issue | 🟠 Major — Make the peer-cap check atomic with the insert.

This is `count -> maybe prune -> count -> insert` across separate statements, and `peer_count()` grabs a fresh pooled connection each time. Concurrent registrations, or another server process writing the same SQLite file, can pass both checks and still insert past `MAX_TOTAL_PEER_RECORDS`, so the new cap is bypassable under load.

Suggested direction
```diff
-        if peer_limit_reached(self.peer_count().await?, self.max_total_peers) {
-            if peer_retention_prune_enabled(self.peer_record_retention_days) {
-                let deleted = self.prune_old_peer_records().await?;
-                if deleted > 0 {
-                    crate::common::record_protection_event("peer_records_pruned");
-                    log::info!("pruned {} old peer records before inserting {}", deleted, id);
-                }
-            }
-        }
-        if peer_limit_reached(self.peer_count().await?, self.max_total_peers) {
+        let mut conn = self.pool.get().await?;
+        sqlx::query("BEGIN IMMEDIATE")
+            .execute(conn.deref_mut())
+            .await?;
+
+        let total = sqlx::query!("select count(*) as count from peer")
+            .fetch_one(conn.deref_mut())
+            .await?
+            .count as i64;
+
+        if peer_limit_reached(total, self.max_total_peers) {
+            if peer_retention_prune_enabled(self.peer_record_retention_days) {
+                let cutoff = peer_retention_cutoff_arg(self.peer_record_retention_days);
+                let deleted = sqlx::query("delete from peer where created_at < datetime('now', ?)")
+                    .bind(cutoff)
+                    .execute(conn.deref_mut())
+                    .await?
+                    .rows_affected();
+                if deleted > 0 {
+                    crate::common::record_protection_event("peer_records_pruned");
+                    log::info!("pruned {} old peer records before inserting {}", deleted, id);
+                }
+            }
+        }
+
+        let total = sqlx::query!("select count(*) as count from peer")
+            .fetch_one(conn.deref_mut())
+            .await?
+            .count as i64;
+        if peer_limit_reached(total, self.max_total_peers) {
             crate::common::record_protection_event("peer_limit_reached");
             log::warn!("peer record limit reached, rejecting new peer {}", id);
+            sqlx::query("ROLLBACK").execute(conn.deref_mut()).await?;
             return Ok(InsertPeerResult::PeerLimitReached);
         }
         let guid = uuid::Uuid::new_v4().as_bytes().to_vec();
         sqlx::query!(
             "insert into peer(guid, id, uuid, pk, info) values(?, ?, ?, ?, ?)",
@@
-            .execute(self.pool.get().await?.deref_mut())
+            .execute(conn.deref_mut())
             .await?;
+        sqlx::query("COMMIT").execute(conn.deref_mut()).await?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database.rs` around lines 153 - 178, Wrap the count->prune->count->insert sequence in a single DB transaction using the same pooled connection so the peer-cap check is atomic: acquire a connection once (let conn = self.pool.get().await?), begin a transaction with an immediate/serializable lock (use sqlx transaction on that conn), then call peer_count (or replace it with an inline SELECT COUNT(*)) and prune_old_peer_records using that same conn (modify prune_old_peer_records to accept &mut conn or run its SQL inline inside the transaction), re-check peer_limit_reached against self.max_total_peers, and only then perform the insert into peer and commit the transaction; if limit reached, rollback and return InsertPeerResult::PeerLimitReached. Ensure all queries use the transaction/conn rather than grabbing fresh connections.
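The serialization idea behind this prompt can be shown without sqlx. The following is a dependency-free sketch only: `PeerStore`, its fields, and the numeric caps are illustrative stand-ins for the real SQLite-backed code, and a single mutex plays the role of the `BEGIN IMMEDIATE` transaction so that count, prune, re-check, and insert happen atomically.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum InsertOutcome {
    Inserted,
    PeerLimitReached,
}

// Illustrative stand-in for the database: id -> created_at (seconds).
struct PeerStore {
    inner: Mutex<HashMap<String, u64>>, // one lock serializes count+prune+insert
    max_total: usize,
    retention_secs: u64,
}

impl PeerStore {
    fn insert_peer(&self, id: &str, now: u64) -> InsertOutcome {
        // Holding the lock for the whole sequence makes the cap check atomic:
        // no other caller can observe "room" and insert concurrently.
        let mut map = self.inner.lock().unwrap();
        if map.len() >= self.max_total {
            // Prune expired records first, then re-check the cap.
            map.retain(|_, created| now.saturating_sub(*created) < self.retention_secs);
        }
        if map.len() >= self.max_total {
            return InsertOutcome::PeerLimitReached;
        }
        map.insert(id.to_string(), now);
        InsertOutcome::Inserted
    }
}

fn main() {
    let store = PeerStore {
        inner: Mutex::new(HashMap::new()),
        max_total: 2,
        retention_secs: 60,
    };
    assert_eq!(store.insert_peer("a", 0), InsertOutcome::Inserted);
    assert_eq!(store.insert_peer("b", 0), InsertOutcome::Inserted);
    // Cap reached and nothing old enough to prune yet.
    assert_eq!(store.insert_peer("c", 10), InsertOutcome::PeerLimitReached);
    // Much later both records exceed retention, so pruning makes room.
    assert_eq!(store.insert_peer("c", 100), InsertOutcome::Inserted);
    println!("ok");
}
```

In the real code the mutex's role is played by the database transaction itself, which is why the review insists every statement run on the same connection.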
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/common.rs`:
- Around line 188-205: The map currently allocates an entry for every new source
IP (via CONNECTION_RATE_LIMITS and the key variable), which allows memory
exhaustion; before inserting a new ConnectionRateEntry, enforce a global cap
(e.g., MAX_CONNECTION_RATE_ENTRIES) by checking lock.len() and pruning stale
entries (reuse prune_connection_rate_limits) and if still at capacity evict the
oldest entry (compare last_seen_at) or simply treat the unseen IP as
rate-limited and return false instead of inserting; update the logic around
lock.entry(key).or_insert(...) to only create the entry when under the cap and
to perform eviction via last_seen_at on CONNECTION_RATE_LIMITS when needed.
- Around line 272-281: apply_trusted_proxy_addr currently rewrites trusted-proxy
peer addresses to SocketAddr::new(ip, 0), collapsing all clients behind the same
public IP into one synthetic address and causing collisions where tcp_punch
(keyed by SocketAddr in src/rendezvous_server.rs) overwrites concurrent
connections. Change apply_trusted_proxy_addr so that when a forwarded_ip is
present it preserves the original connection's port (use addr.port()) instead of
0 (i.e., construct the new SocketAddr from the parsed ip and the original addr's
port), keeping the per-connection discriminator while still using
parse_forwarded_ip and trust_proxy_headers().
In `@src/peer.rs`:
- Around line 214-231: The current flow in get_or_for_registration lets multiple
concurrent callers bypass MAX_PENDING_REGISTRATIONS_PER_IP because
can_cache_pending_registration() inspects state before prune_cache_for_insert()
and before acquiring the write lock; fix by performing the pending-registration
count and pruning while holding the same write lock used to insert the
placeholder: acquire the write lock (the same map.write() used later), call
prune_cache_for_insert() or inline its logic under that lock, recompute the
per-IP pending count, and only insert the new Arc<RwLock<Peer>> if the count is
below MAX_PENDING_REGISTRATIONS_PER_IP; if the cap is reached return None.
Ensure you still short-circuit and return an existing peer when w.get(id) is
present after acquiring the write lock.
In `@src/relay_server.rs`:
- Around line 396-404: The WS listener currently calls allow_connection_from_ip
for "hbbr-ws" using the raw peer addr returned by listener2.accept(), which
rate-limits the reverse proxy rather than the real client when
TRUST_PROXY_HEADERS is enabled; change the logic so that handle_connection (or
the WebSocket handshake in make_pair) performs the allow_connection_from_ip
check using the reconstructed client IP when proxy headers are trusted, rather
than using the listener peer addr. Concretely, remove or bypass the
allow_connection_from_ip call in the listener2.accept() branch and ensure
make_pair or the handshake path invokes allow_connection_from_ip("hbbr-ws",
client_ip) after extracting and validating proxy headers (respecting
TRUST_PROXY_HEADERS) so quotas are applied per real client IP.
In `@src/rendezvous_server.rs`:
- Around line 304-312: The rate-limit check currently performed right after
listener3.accept() using allow_connection_from_ip("hbbs-ws", addr) applies the
proxy IP rather than the real client IP and should be moved or adjusted so it
uses the post-proxy/forwarded client address; update the logic to perform the
allow_connection_from_ip call inside handle_listener/handle_listener_inner after
apply_trusted_proxy_addr() has resolved the real client IP (or, if
TRUST_PROXY_HEADERS is enabled, extract the forwarded address first and pass
that IP to allow_connection_from_ip) so per-client buckets are enforced
correctly instead of rate-limiting the proxy.
In `@tests/server_protection_process.rs`:
- Around line 52-80: reserve_hbbs_port and reserve_hbbr_port race because they
bind sockets, drop them, and return only port numbers; another process can claim
the ports before the child is spawned. Fix by changing these helpers to keep the
actual bound sockets alive until the child process is started: in
reserve_hbbs_port return the bound TcpListener(s) and UdpSocket (or a struct
holding them) instead of u16, and in reserve_hbbr_port return the bound
TcpListener(s); then pass those open descriptors (or rely on inheritable fds)
into the child spawn so the ports remain reserved until the child binds them.
Ensure functions named reserve_hbbs_port and reserve_hbbr_port are updated
accordingly and all call sites are adjusted to accept and hold the returned
listeners/sockets until after child startup.
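The port-reservation fix above can be sketched with the standard library alone. This is a hypothetical shape for the test helpers, not the repository's code: the struct name and the pairing of one TCP listener with one UDP socket are assumptions; the point is that returning the bound sockets (instead of bare port numbers) keeps the OS reservation alive until the child process is spawned.

```rust
use std::net::{TcpListener, UdpSocket};

// Hypothetical replacement for reserve_hbbs_port(): return the bound sockets
// themselves so the ports stay reserved until the caller drops them (ideally
// right before, or after, handing them to the child process).
struct ReservedHbbsPorts {
    tcp: TcpListener,
    udp: UdpSocket,
}

fn reserve_hbbs_ports() -> std::io::Result<ReservedHbbsPorts> {
    // Port 0 asks the OS for any free port; the UDP socket is then bound to
    // the same port number so the server gets a matching TCP/UDP pair.
    let tcp = TcpListener::bind("127.0.0.1:0")?;
    let port = tcp.local_addr()?.port();
    let udp = UdpSocket::bind(("127.0.0.1", port))?;
    Ok(ReservedHbbsPorts { tcp, udp })
}

fn main() -> std::io::Result<()> {
    let reserved = reserve_hbbs_ports()?;
    let port = reserved.tcp.local_addr()?.port();
    // While `reserved` is alive, rebinding the same TCP port fails,
    // which is exactly the race protection the review asks for.
    assert!(TcpListener::bind(("127.0.0.1", port)).is_err());
    println!("reserved port {}", port);
    Ok(())
}
```

The binding may still collide on the UDP side if another process holds that UDP port; a retry loop (as the later `pick_*_port_candidate` helpers add) covers that case.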
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: aee2f984-90d0-4037-9ee6-eebb7a48e720
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (8)
db_v2.sqlite3, libs/hbb_common, src/common.rs, src/database.rs, src/peer.rs, src/relay_server.rs, src/rendezvous_server.rs, tests/server_protection_process.rs
Introduce MAX_CONNECTION_RATE_ENTRIES with a default and enforce a cap on connection-rate entries: evict the oldest entry when the cap is reached, record eviction/rejection protection events, and expose the cap in protection summaries. Fix forwarded IP port handling and add websocket-forwarded rate-limit checks in relay and rendezvous servers. Refactor peer caching insertion: compute eviction candidates, remove excess peers before inserting pending registrations, and enforce per-IP pending registration limits without the previous helper. Update select_peer_ids_to_evict usage and minor formatting/await handling. Revamp test process spawn helpers: replace single-port reservation with retrying pick_*_port_candidate functions, make spawn_hbbs/spawn_hbbr return the chosen port and child, preserve temp dirs on certain errors, and add/adjust tests (including a new test to assert connection-rate entry bounding). Miscellaneous whitespace, logging, and small API-call formatting improvements across files. Signed-off-by: Vlad <shaamaan@gmail.com>
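The bounded connection-rate map this commit describes can be sketched as follows. This is a minimal illustration, not the repository's implementation: the tiny cap, the plain `u64` timestamp, and the helper name `touch_ip` are assumptions standing in for the real `CONNECTION_RATE_LIMITS` machinery; it shows only the evict-oldest-on-full behavior.

```rust
use std::collections::HashMap;
use std::net::{IpAddr, Ipv4Addr};

const MAX_CONNECTION_RATE_ENTRIES: usize = 3; // tiny cap for illustration

#[derive(Clone, Copy)]
struct ConnectionRateEntry {
    last_seen_at: u64, // monotonic seconds in the real code; plain u64 here
}

// Sketch of the capped insert: when the map is full and the IP is unseen,
// evict the entry with the oldest last_seen_at before inserting, so the map
// can never grow without bound under an IP-spraying attack.
fn touch_ip(map: &mut HashMap<IpAddr, ConnectionRateEntry>, ip: IpAddr, now: u64) {
    if !map.contains_key(&ip) && map.len() >= MAX_CONNECTION_RATE_ENTRIES {
        if let Some(oldest) = map
            .iter()
            .min_by_key(|(_, e)| e.last_seen_at)
            .map(|(k, _)| *k)
        {
            map.remove(&oldest); // the real code would also record a protection event
        }
    }
    map.insert(ip, ConnectionRateEntry { last_seen_at: now });
}

fn main() {
    let mut map = HashMap::new();
    let ip = |n| IpAddr::V4(Ipv4Addr::new(10, 0, 0, n));
    touch_ip(&mut map, ip(1), 1);
    touch_ip(&mut map, ip(2), 2);
    touch_ip(&mut map, ip(3), 3);
    touch_ip(&mut map, ip(4), 4); // evicts ip(1), the oldest
    assert!(!map.contains_key(&ip(1)));
    assert!(map.contains_key(&ip(4)));
    assert_eq!(map.len(), MAX_CONNECTION_RATE_ENTRIES);
    println!("entries: {}", map.len());
}
```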
🧹 Nitpick comments (3)
src/peer.rs (3)
271-297: `prune_cache_for_insert` has a TOCTOU gap between snapshot and removal.

This method takes a read lock to build the snapshot, releases it, then acquires a write lock to remove entries. Between these operations, the map state could change (entries added/removed by concurrent calls). While this is less critical here because:
- It's called from `get()`, which is a read path
- Worst case is slightly more or fewer evictions than intended

The `get_or_for_registration` method correctly holds the write lock throughout, which is the critical path for pending registration limits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/peer.rs` around lines 271 - 297, prune_cache_for_insert has a TOCTOU window: it builds a snapshot under a read lock then releases it and later takes a write lock to remove peers, allowing concurrent map changes between snapshot and removal; fix by performing eviction decision while holding the write lock (i.e., acquire self.map.write().await before computing entries and calling select_peer_ids_to_evict) or, if you must keep the read snapshot, re-check/compute entries again after acquiring the write lock and only remove IDs that still exist and meet eviction criteria; reference prune_cache_for_insert, select_peer_ids_to_evict, and self.map (and mirror the write-lock approach used in get_or_for_registration).
227-238: Potential performance concern with async read locks inside write lock scope.

Within the write lock scope (`let mut w = self.map.write().await`), the code iterates over all peers and calls `peer.read().await` on each `LockPeer`. This holds the map write lock while waiting on potentially many individual peer read locks. Under high load with many cached peers, this could cause contention.

Consider whether extracting only the needed fields (ip, guid, last_reg_time) into a snapshot structure before eviction analysis could reduce lock hold time. However, given the security-focused nature of this PR and the existing pattern in `prune_cache_for_insert`, this may be acceptable for now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/peer.rs` around lines 227 - 238, The current loop holds self.map.write().await while awaiting peer.read().await for each LockPeer, causing contention; change to first capture a snapshot of the peer handles (clone the peer_id and Arc/handle to LockPeer from self.map) while holding the write lock, then immediately drop the write lock and iterate over that snapshot performing peer.read().await to extract ip, guid and last_reg_time and build PeerEvictionEntry and pending_ids_for_ip; this mirrors the behavior in prune_cache_for_insert and ensures the write lock is not held during per-peer async reads.
79-89: Consider accepting `max_pending=0` as "unlimited" or documenting the behavior.

When `max_pending` is 0, `pending_registration_limit_exceeded` always returns `false` (no limit), which seems intentional. However, `env_usize_or` filters out 0 values via `.filter(|value| *value > 0)`, so an operator cannot explicitly set `MAX_PENDING_REGISTRATIONS_PER_IP=0` to disable the limit. This is fine if 0 means "use default", but may be surprising if an operator wants to disable the feature entirely.

🤖 Prompt for AI Agents
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/peer.rs`:
- Around line 79-89: The current behavior treats
MAX_PENDING_REGISTRATIONS_PER_IP=0 as "use default" because env_usize_or filters
out zero, while pending_registration_limit_exceeded already treats 0 as
"unlimited"; either make zero explicitly settable or document it. To fix, update
env_usize_or (or add env_usize_or_allow_zero) to stop filtering out zero so a
parsed 0 is returned, and keep pending_registration_limit_exceeded as-is (it
will then correctly treat 0 as unlimited), or alternatively add a comment on
pending_registration_limit_exceeded explaining that max_pending==0 means
unlimited and leave env_usize_or unchanged if you prefer default-on-missing
semantics. Ensure changes reference env_usize_or and
pending_registration_limit_exceeded so reviewers can find the code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7ec488ab-b976-4661-8692-62aaf30bdb07
📒 Files selected for processing (5)
src/common.rs, src/peer.rs, src/relay_server.rs, src/rendezvous_server.rs, tests/server_protection_process.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- src/common.rs
- tests/server_protection_process.rs
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/database.rs (2)
147-183: ⚠️ Potential issue | 🟠 Major — Make peer-limit enforcement atomic.

The cap check and insert are split across separate DB operations, so concurrent `insert_peer()` calls can all observe room below `MAX_TOTAL_PEER_RECORDS` and insert past the configured cap. Wrap count/prune/recheck/insert in one transaction with a write lock, or serialize new-peer inserts.

Suggested direction

```diff
     ) -> ResultType<InsertPeerResult> {
-        if peer_limit_reached(self.peer_count().await?, self.max_total_peers) {
+        // Run count/prune/recheck/insert atomically, e.g. inside a transaction
+        // started with a write lock, so concurrent registrations cannot exceed
+        // MAX_TOTAL_PEER_RECORDS.
+        if peer_limit_reached(self.peer_count().await?, self.max_total_peers) {
             if peer_retention_prune_enabled(self.peer_record_retention_days) {
                 let deleted = self.prune_old_peer_records().await?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database.rs` around lines 147 - 183, The insert_peer logic is racy because peer_count/prune/check and the actual insert are separate DB operations; modify insert_peer to perform count, optional prune, recheck and insert inside a single DB transaction (use sqlx transaction API) and acquire a write lock (e.g., SELECT COUNT(*) FOR UPDATE or an explicit table/advisory lock) so concurrent calls serialize; call peer_count()/prune_old_peer_records() or their SQL equivalents inside that transaction against the same connection, re-evaluate peer_limit_reached using self.max_total_peers and peer_record_retention_days, and only perform the INSERT when the recheck passes, returning InsertPeerResult::PeerLimitReached otherwise, then commit/rollback the transaction and propagate errors as before.
112-133: ⚠️ Potential issue | 🔴 Critical — Add a migration path before relying on `peer.created_at`.

Existing `peer` tables will not have the `created_at` column because `CREATE TABLE IF NOT EXISTS` does not modify existing tables. The code will fail at startup when creating `index_peer_created_at` on the non-existent column, or at runtime when `prune_old_peer_records()` executes. No schema migration or version tracking exists in the codebase to handle this.

Implement an explicit migration that checks for the missing column (via `PRAGMA table_info(peer)`), adds it if absent with a default timestamp, and backfills existing rows before the index is created or pruning occurs.

Suggested direction

```diff
     db.create_tables().await?;
+    db.ensure_peer_created_at_column().await?;
     Ok(db)
```

Implement `ensure_peer_created_at_column()` by inspecting `PRAGMA table_info(peer)`, adding a nullable `created_at` column when missing, and backfilling existing rows with a default timestamp before creating the index.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database.rs` around lines 112 - 133, Add a migration step to ensure the peer.created_at column exists and is populated before creating the index or pruning: implement a new async function ensure_peer_created_at_column() that runs PRAGMA table_info(peer) to detect whether "created_at" exists, and if missing executes an ALTER TABLE peer ADD COLUMN created_at datetime (nullable) then UPDATE peer SET created_at = current_timestamp WHERE created_at IS NULL to backfill existing rows; call ensure_peer_created_at_column() from create_tables() before executing the SQL that creates index_peer_created_at and also ensure prune_old_peer_records() only runs after this migration completes. Reference these symbols: ensure_peer_created_at_column(), create_tables(), prune_old_peer_records().
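The migration's decision logic can be isolated from the database entirely. A minimal sketch, assuming the column names from `PRAGMA table_info(peer)` have already been collected into a slice; the SQL strings mirror the suggestion above, and running them through sqlx is left out.

```rust
// Sketch of the migration decision only: given the column names reported by
// PRAGMA table_info(peer), return the statements to run before creating
// index_peer_created_at.
fn peer_created_at_migration(columns: &[&str]) -> Vec<&'static str> {
    if columns.iter().any(|c| c.eq_ignore_ascii_case("created_at")) {
        Vec::new() // column already present, nothing to do
    } else {
        vec![
            "alter table peer add column created_at datetime",
            "update peer set created_at = current_timestamp where created_at is null",
        ]
    }
}

fn main() {
    // Fresh schema already has the column: no migration statements.
    assert!(peer_created_at_migration(&["guid", "id", "created_at"]).is_empty());
    // Legacy schema: add the column, then backfill before indexing/pruning.
    let steps = peer_created_at_migration(&["guid", "id", "uuid", "pk", "info"]);
    assert_eq!(steps.len(), 2);
    println!("{} migration step(s)", steps.len());
}
```

Keeping the detection pure like this makes the ordering constraint (migrate before index creation and before any prune) easy to unit-test.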
♻️ Duplicate comments (1)
src/relay_server.rs (1)
482-490: ⚠️ Potential issue | 🟠 Major — Apply the WebSocket rate limit after trusted proxy IP reconstruction.

This still charges `hbbr-ws` against the reverse proxy's socket IP before the handshake extracts the forwarded client IP. With `TRUST_PROXY_HEADERS` enabled, one noisy proxied client can consume the shared proxy budget and throttle unrelated clients behind the same proxy.

Suggested direction

```diff
-        if !crate::common::allow_connection_from_ip("hbbr-ws", addr) {
-            log::warn!("Rate limit exceeded for hbbr-ws from {}", addr.ip());
-            continue;
-        }
         stream.set_nodelay(true).ok();
         handle_connection(stream, addr, &limiter, key, true).await;
```

Then perform the `hbbr-ws` admission check after `apply_trusted_proxy_addr()` in `make_pair()`, using the reconstructed address for trusted proxy deployments.

Also applies to: 543-559
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/relay_server.rs` around lines 482 - 490, The hbbr-ws admission/rate-check is being done against the immediate socket addr (proxy IP) instead of the reconstructed client IP; move the allow_connection_from_ip("hbbr-ws", addr) check so it runs after trusted-proxy reconstruction (apply_trusted_proxy_addr) inside make_pair()/connection setup where the forwarded client address is established, and remove/replace the pre-handshake check in the listener accept path; repeat the same change for the second occurrence (the block around the other listener handling at lines ~543-559) so both ws paths use the reconstructed addr for rate limiting.
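The address reconstruction both this comment and the earlier `apply_trusted_proxy_addr` comment converge on can be sketched with `std::net` alone. This is an illustrative function, not the repository's: `forwarded_ip` stands in for the already-parsed and validated forwarded header, and the key point is preserving the original connection's port as the per-connection discriminator.

```rust
use std::net::{IpAddr, SocketAddr};

// Sketch of the reconstruction the reviewers describe: when a trusted proxy
// supplies a forwarded client IP, keep the original connection's port instead
// of collapsing every proxied client to port 0 (which would make them collide
// in maps keyed by SocketAddr, like tcp_punch).
fn reconstruct_client_addr(conn_addr: SocketAddr, forwarded_ip: Option<IpAddr>) -> SocketAddr {
    match forwarded_ip {
        Some(ip) => SocketAddr::new(ip, conn_addr.port()),
        None => conn_addr,
    }
}

fn main() {
    let proxy_conn: SocketAddr = "10.0.0.5:41234".parse().unwrap();
    let client_ip: IpAddr = "203.0.113.9".parse().unwrap();
    let rebuilt = reconstruct_client_addr(proxy_conn, Some(client_ip));
    // Rate limiting keys on the real client IP; pairing still sees a unique port.
    assert_eq!(rebuilt.ip(), client_ip);
    assert_eq!(rebuilt.port(), 41234);
    // Without a forwarded header the socket address passes through unchanged.
    assert_eq!(reconstruct_client_addr(proxy_conn, None), proxy_conn);
    println!("{}", rebuilt);
}
```

Running `allow_connection_from_ip` on the rebuilt address (rather than the accept-time address) is what moves the quota from the proxy onto the real client.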
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/peer.rs`:
- Around line 168-185: prune_ip_change_entries currently drops entries that have
only one IP because it requires both elapsed < IP_CHANGE_DUR_X2 and
value.1.len() > 1; change the retention condition so recent single-IP entries
are kept: retain entries where value.0.elapsed().as_secs() < IP_CHANGE_DUR_X2 OR
value.1.len() > 1. Update the function prune_ip_change_entries (and leave
evict_oldest_ip_change_entry, track_ip_change, IpChangesMap usage unchanged) so
first-seen IPs are preserved long enough for a second IP to be observed and
evicted only when truly expired or after multiple IPs exist.
In `@src/relay_server.rs`:
- Around line 577-630: The code inserts into ACTIVE_RELAYS keyed by rf.uuid
without checking if that UUID is already active, allowing a client to reuse an
active UUID and overwrite the existing entry; fix by checking ACTIVE_RELAYS for
rf.uuid before pairing and refuse or disambiguate the new request instead of
inserting over an existing active relay. Concretely: when you take the PEERS
pending and before creating/using active_relays.insert(rf.uuid.clone(), ...),
acquire ACTIVE_RELAYS.lock().await, check if
active_relays.contains_key(&rf.uuid) and if so call send_relay_refuse(&mut
stream, "uuid_in_use").await (and refuse the pending.stream too) and return;
otherwise perform the insert while still holding the lock (or perform an atomic
check-and-insert under the same lock) so you never overwrite an existing
ActiveRelay. Apply the same guard in the other matching block that also inserts
into ACTIVE_RELAYS.
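The check-and-insert-under-one-lock pattern that comment calls for can be sketched with a std mutex. This is a simplified stand-in, not the repository's code: `ActiveRelay` is reduced to a placeholder and `try_register_relay` is a hypothetical name; the real code holds a tokio mutex and sends `send_relay_refuse` on the error path.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

struct ActiveRelay; // placeholder for the real relay state

// Sketch of the guarded insert: the presence check and the insert happen
// under the same lock acquisition, so two requests reusing one UUID cannot
// both pass and the second cannot silently overwrite the first.
fn try_register_relay(
    active: &Mutex<HashMap<String, ActiveRelay>>,
    uuid: &str,
) -> Result<(), &'static str> {
    let mut map = active.lock().unwrap();
    if map.contains_key(uuid) {
        return Err("uuid_in_use"); // caller would send_relay_refuse(...)
    }
    map.insert(uuid.to_string(), ActiveRelay);
    Ok(())
}

fn main() {
    let active = Mutex::new(HashMap::new());
    assert!(try_register_relay(&active, "abc").is_ok());
    assert_eq!(try_register_relay(&active, "abc"), Err("uuid_in_use"));
    assert!(try_register_relay(&active, "def").is_ok());
    println!("ok");
}
```

Splitting the check and the insert across two lock acquisitions would reopen the race; the atomicity comes from doing both inside one guard.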
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 10f0a319-02d8-4ec3-9c09-de8f2ca09ce6
📒 Files selected for processing (6)
src/common.rs, src/database.rs, src/peer.rs, src/relay_server.rs, src/rendezvous_server.rs, tests/server_protection_process.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/server_protection_process.rs
- src/rendezvous_server.rs
Adjust prune_ip_change_entries to retain entries that are either recent or have multiple IPs (use || instead of &&) to avoid prematurely dropping recent single-IP records, and add a unit test covering that behavior. In relay_server.rs, add checks to detect and reject relay requests when the relay UUID is already active (both for immediate and pending flows), log warnings, record protection events, and send refuse responses. Add corresponding tests and minor test imports to validate UUID reuse detection.
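The `||` fix described in this commit can be sketched in isolation. A minimal illustration, not the repository's code: the map value is simplified to `(seconds_since_first_seen, observed_ips)` and the constant value is arbitrary; what matters is that the retain predicate keeps an entry while it is still recent *or* once it already tracks multiple IPs.

```rust
use std::collections::HashMap;

const IP_CHANGE_DUR_X2: u64 = 600; // illustrative retention window, seconds

// Sketch of the corrected retention predicate: the original `&&` dropped
// fresh single-IP entries before a second IP could ever be observed; `||`
// keeps them until they are truly expired.
// Map value: (seconds since first seen, observed IPs).
fn prune_ip_change_entries(map: &mut HashMap<String, (u64, Vec<String>)>) {
    map.retain(|_, (elapsed, ips)| *elapsed < IP_CHANGE_DUR_X2 || ips.len() > 1);
}

fn main() {
    let mut map = HashMap::new();
    map.insert("fresh-single".to_string(), (10, vec!["1.1.1.1".to_string()]));
    map.insert("old-single".to_string(), (9000, vec!["2.2.2.2".to_string()]));
    map.insert(
        "old-multi".to_string(),
        (9000, vec!["3.3.3.3".to_string(), "4.4.4.4".to_string()]),
    );
    prune_ip_change_entries(&mut map);
    assert!(map.contains_key("fresh-single")); // kept: still recent
    assert!(!map.contains_key("old-single")); // dropped: expired, one IP
    assert!(map.contains_key("old-multi")); // kept: multiple IPs observed
    println!("{} entries remain", map.len());
}
```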
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/relay_server.rs (2)
992-1006: `std::sync::Mutex` held across `.await` in async test.

`TEST_RELAY_LIMITS_LOCK.lock().unwrap()` returns a `std::sync::MutexGuard` that is held across `ACTIVE_RELAYS.lock().await`. Beyond tripping Clippy's `await_holding_lock`, if this module's test suite grows to use `#[tokio::test(flavor = "multi_thread")]`, a panicked task holding the std mutex across an await could poison it and cause surprising failures. Prefer a `tokio::sync::Mutex` for tests that must await, or scope the std guard strictly around the synchronous mutations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/relay_server.rs` around lines 992 - 1006, The test active_relay_uuid_reuse_is_detected holds a std::sync::MutexGuard from TEST_RELAY_LIMITS_LOCK across an .await (ACTIVE_RELAYS.lock().await), which can trigger Clippy's await_holding_lock and risk poisoning; to fix, either replace TEST_RELAY_LIMITS_LOCK with a tokio::sync::Mutex so it can be .awaited safely, or limit the scope of the std mutex guard by dropping it (end its scope) before calling ACTIVE_RELAYS.lock().await; update the test to use TEST_RELAY_LIMITS_LOCK (tokio mutex) or move the guard so no std::sync::MutexGuard lives across the await in active_relay_uuid_reuse_is_detected.
677-680: Redundant self-scheduled removal overlaps with `prune_expired_pending_relays`.

Every new pending relay spawns (via the outer `tokio::spawn` in `handle_connection`) a task that sleeps for `PENDING_RELAY_HOLD_SECS` purely to `PEERS.lock().await.remove(&rf.uuid)`. The insert path already calls `prune_expired_pending_relays(&mut peers)` using `created_at`, so the sleep-then-remove is duplicated work that also keeps a task alive for 30 s per registration. Consider dropping the sleep+remove and relying solely on the prune sweep (the entry's `created_at` already bounds its lifetime).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/relay_server.rs` around lines 677 - 680, The code spawns a per-registration task that sleeps PENDING_RELAY_HOLD_SECS and then calls PEERS.lock().await.remove(&rf.uuid), duplicating lifetime enforcement already handled by prune_expired_pending_relays; remove the sleep + removal block from the insertion path (the lines inserting PendingRelay::new and the subsequent drop(peers); sleep(...).await; PEERS.lock().await.remove(&rf.uuid)) so pending entries rely on their created_at and the existing prune_expired_pending_relays sweep invoked from handle_connection/insert logic; keep the initial peers.insert(rf.uuid.clone(), PendingRelay::new(...)) and ensure no other code depends on the self-scheduled removal task.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/relay_server.rs`:
- Around line 660-677: The PEERS map insertion can overwrite an existing
PendingRelay for the same rf.uuid, disconnecting the original client; before
inserting into PEERS in the new-pending branch (where you call
prune_expired_pending_relays, pending_relay_limit_reason and then
peers.insert(rf.uuid.clone(), PendingRelay::new(...))), check whether PEERS
already contains rf.uuid (e.g. peers.contains_key(&rf.uuid) or use the entry
API) and if so, record the protection event
(crate::common::record_protection_event("relay_pending_rejected")), log a
warning similar to the existing one, call send_relay_refuse(&mut stream,
"uuid_in_use" or appropriate reason).await, drop the new stream and return — do
not overwrite the existing PendingRelay in PEERS. Ensure the check uses the same
rf.uuid and that the send_relay_refuse path is exercised for the second
contender.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2952d27a-c058-464c-9e27-51796275be28
📒 Files selected for processing (2)
src/peer.rssrc/relay_server.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/peer.rs
```rust
let mut peers = PEERS.lock().await;
prune_expired_pending_relays(&mut peers);
let pending_for_ip = peers.values().filter(|peer| peer.ip == ip).count();
if let Some(reason) =
    pending_relay_limit_reason(peers.len(), pending_for_ip)
{
    crate::common::record_protection_event("relay_pending_rejected");
    log::warn!(
        "Rejecting relay request {} from {}: {}",
        rf.uuid,
        addr,
        reason
    );
    drop(peers);
    send_relay_refuse(&mut stream, reason).await;
    return;
}
peers.insert(rf.uuid.clone(), PendingRelay::new(Box::new(stream), ip));
```
Missing PEERS UUID collision check — concurrent new-pending requests silently evict the earlier client.
The new-pending branch guards against rf.uuid already being in ACTIVE_RELAYS (line 648) but never checks whether it already exists in PEERS. Two clients that arrive with the same UUID during the 30 s hold window will both pass PEERS.remove() returning None at line 578, both pass the ACTIVE_RELAYS check, and then serialize on the PEERS lock at line 660. The second peers.insert(rf.uuid.clone(), …) at line 677 overwrites the first PendingRelay; its Box<dyn StreamTrait> is dropped and the original client is disconnected without a send_relay_refuse. A malicious peer that replays/guesses an active UUID can grief legitimate clients.
🔒 Proposed fix

```diff
 drop(active_relays);
 let mut peers = PEERS.lock().await;
 prune_expired_pending_relays(&mut peers);
+if peers.contains_key(&rf.uuid) {
+    drop(peers);
+    crate::common::record_protection_event("relay_uuid_in_use");
+    log::warn!(
+        "Rejecting pending relay request {} from {}: uuid already pending",
+        rf.uuid,
+        addr
+    );
+    send_relay_refuse(&mut stream, "uuid_in_use").await;
+    return;
+}
 let pending_for_ip = peers.values().filter(|peer| peer.ip == ip).count();
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let mut peers = PEERS.lock().await;
prune_expired_pending_relays(&mut peers);
if peers.contains_key(&rf.uuid) {
    drop(peers);
    crate::common::record_protection_event("relay_uuid_in_use");
    log::warn!(
        "Rejecting pending relay request {} from {}: uuid already pending",
        rf.uuid,
        addr
    );
    send_relay_refuse(&mut stream, "uuid_in_use").await;
    return;
}
let pending_for_ip = peers.values().filter(|peer| peer.ip == ip).count();
if let Some(reason) =
    pending_relay_limit_reason(peers.len(), pending_for_ip)
{
    crate::common::record_protection_event("relay_pending_rejected");
    log::warn!(
        "Rejecting relay request {} from {}: {}",
        rf.uuid,
        addr,
        reason
    );
    drop(peers);
    send_relay_refuse(&mut stream, reason).await;
    return;
}
peers.insert(rf.uuid.clone(), PendingRelay::new(Box::new(stream), ip));
```
Summary

This PR hardens `rustdesk-server` against malicious clients and generic internet abuse. The main goal is to treat clients as a possible attack vector, not only the server as a trust root for clients. The changes focus on the areas described under "What changed" below.

This PR also includes the required shared protocol update in `hbb_common` so client and server use the same keyed request fields.

Why

Before this change, several server paths were still too permissive:

- `hbbs` deployments enforced the server key for punch-hole requests, but not for all registration or presence paths
- `hbbr` could be abused to accumulate large numbers of half-open relay requests

That made the server vulnerable to unauthenticated abuse and state exhaustion.
What changed

1. Keyed auth on `hbbs` control-plane requests

Vulnerability: `hbbs` requests still worked without the configured server key.

Fix: enforce keyed auth on

- `RegisterPeer`
- `RegisterPk`
- `OnlineRequest`
- `TestNatRequest`

and return `LICENSE_MISMATCH` for `RegisterPk`, where the client benefits from a clear failure reason.

Implementation notes:

- `licence_key` added to the relevant rendezvous protocol messages

2. Relay half-open abuse caps in `hbbr`

Vulnerability: `hbbr` could accumulate large numbers of half-open relay requests.

Fix:

- `Key mismatch` refusal on relay auth failure

3. Do not trust forwarded IP headers by default
Vulnerability:

- `X-Real-IP`/`X-Forwarded-For` from arbitrary clients was trusted, weakening rate limits, logging, and abuse controls

Fix:

- forwarded headers are honored only when `TRUST_PROXY_HEADERS` is explicitly enabled
- `X-Forwarded-For` handling is limited to the first address only

4. Bound punch-hole request tracking
Vulnerability:
Fix:
5. Registration abuse reduction
Vulnerability:
Fix:
- protections now cover `RegisterPeer` in addition to existing `RegisterPk` protections

6. In-memory peer cache growth controls
Vulnerability:
Fix:
7. Persistent peer record cap and retention pruning
Vulnerability:
Fix:
- `MAX_TOTAL_PEER_RECORDS`
- `PEER_RECORD_RETENTION_DAYS`

Additional fix:

- `RegisterPk` now returns the real backend result instead of always reporting `OK`
- `PEER_LIMIT_REACHED` is now reported correctly on the wire

8. Listener-level rate limiting
Vulnerability:
Fix:
- `hbbs` main TCP listener
- `hbbs` NAT/test listener
- `hbbs` websocket listener
- `hbbs` UDP receive loop
- `hbbr` TCP listener
- `hbbr` websocket listener

9. Operator-facing protection stats
Problem:
Fix:
- `protection-stats`/`ps` admin output

Tests
Added process-level regression coverage in `tests/server_protection_process.rs` for:

- `hbbs` admin `ps`
- `hbbr` admin `ps`
- `hbbs` `OnlineRequest` keyed auth enforcement
- `hbbs` `RegisterPk` keyed auth enforcement
- `hbbr` relay key mismatch refusal
- `hbbs` persistent peer cap behavior
- `hbbs` stale peer retention pruning

Validated with: `cargo test --test server_protection_process -- --test-threads=1`

Security impact
This PR does not try to solve every operational abuse scenario, but it materially improves the default security posture for self-hosted RustDesk deployments.

Notes
This PR depends on the matching shared `hbb_common` protocol update so client and server use the same keyed rendezvous request fields.

Summary by CodeRabbit
New Features
Tests