-
Notifications
You must be signed in to change notification settings - Fork 287
chore: make the callback sync #2467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟣 NFSHandler.OnNetworkRelease performs blocking I/O (blocking on <-doneCh inside chroot.Close()), which violates the new non-blocking contract added to MapSubscriber in this PR (map.go:14-21). This is a pre-existing issue: NetworkReleased was already invoked synchronously before this PR and NFSHandler was already blocking, but the PR replaces the old 'synchronous, caller can rely on completion' comment with a new 'must be non-blocking' requirement without updating NFSHandler to comply. A slow chroot teardown will stall the goroutine calling NetworkReleased, delaying sandbox network slot recycling.
Extended reasoning...
What the bug is
The PR adds an interface-level contract to MapSubscriber (map.go:14-18):
At the same time, the PR removes the old NetworkReleased comment that said "Subscribers are invoked synchronously so the caller can rely on them having completed before taking any follow-up action." The old contract explicitly allowed blocking; the new contract explicitly forbids it.
The code path that triggers the violation
NFSHandler.OnNetworkRelease (nfsproxy/chroot/nfs.go:93-113) acquires h.mu.Lock(), removes entries from an internal map, releases the lock, then calls chroot.Close() in a loop for each removed entry. chroot.Close() calls fs.ns.Close() (chroot.go:119-121), which dispatches to mountNS.Close() (mountns.go:76-99). mountNS.Close() closes the stopCh channel to signal a dedicated OS thread to restore the original mount namespace, then blocks on
<-doneCh(mountns.go:91), waiting for that thread to call unix.Setns() and signal completion. This is a genuine synchronous blocking operation that can take non-trivial time for each chroot being torn down.Why existing code doesn't prevent it
NetworkReleased calls m.trigger() synchronously (no goroutine dispatch), and trigger() iterates all subscribers calling OnNetworkRelease in sequence. There is no timeout, context cancellation check, or goroutine wrapper around the subscriber calls. OnNetworkRelease is called directly on the same goroutine that initiated the network release.
Impact
Each chroot teardown in OnNetworkRelease stalls the goroutine calling NetworkReleased for the duration of the mount namespace restoration syscall. When multiple network slots are released concurrently or sequentially, these stalls accumulate, delaying sandbox network slot recycling. Future contributors reading the MapSubscriber interface will see "must be non-blocking" and assume compliance, potentially introducing race conditions or deadlocks if they rely on that guarantee.
Proof by example
<-doneCh.How to fix
Wrap the blocking work in a goroutine inside NFSHandler.OnNetworkRelease, or provide a separate async teardown path. For example:
This would satisfy the new non-blocking contract while preserving the teardown semantics.