Skip to content

feat: add NetworkPolicy support for sandbox isolation#291

Open
Sanchit2662 wants to merge 2 commits intovolcano-sh:mainfrom
Sanchit2662:feat/sandbox-network-policy
Open

feat: add NetworkPolicy support for sandbox isolation#291
Sanchit2662 wants to merge 2 commits intovolcano-sh:mainfrom
Sanchit2662:feat/sandbox-network-policy

Conversation

@Sanchit2662
Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Sandboxes had zero network restrictions. A pod inside a sandbox could reach anything, internal or external. Since sandboxes run untrusted code this is a real isolation gap, so I picked up issue #216 to add NetworkPolicy support.

I added a SandboxNetworkPolicy field to both SandboxTemplate (AgentRuntime) and CodeInterpreterSandboxTemplate. It has three modes you pick from:

  • None - default, no policy created, nothing breaks for existing users - Restricted - deny-all baseline. DNS egress on port 53 (UDP+TCP) is always allowed so pods don't lose name resolution. Router ingress is always allowed too
    since the router has to be able to reach the sandbox - Custom - you pass raw k8s NetworkPolicy ingress/egress rules directly

A few specific things I was careful about:

Cross-namespace router ingress - using just PodSelector would only match pods in the same namespace as the NP. In a real deployment the router usually lives in a different namespace than the sandbox pods, so that would silently break the router rule. I fixed this by adding a NamespaceSelector using kubernetes.io/metadata.name so it works across namespaces.

Router selector is configurable - you can set it in values.yaml under workloadmanager.networkPolicy.routerSelector and routerNamespace, or pass --router-selector and --router-namespace flags. Defaults to app=agentcube-router and the workloadmanager's own namespace.

Per-session override - added NetworkPolicy field to CreateSandboxRequest. When set it fully replaces the template-level policy for that session only. No merging, just a clean replace.

Cleanup - NPs are deleted when you delete the sandbox through the API, on rollback if creation fails partway through, and in the GC loop. I also fixed a bug where the GC was returning early on IsNotFound before deleting the NP. Since the store record gets purged right after, that would've leaked the NP permanently. Now NP cleanup runs regardless of whether the sandbox was already gone.

Rollback - simplified it to always attempt NP delete when networkPolicy != nil rather than tracking a npCreated flag. This also handles the edge case where the k8s write succeeded but returned an error to us.

RBAC - added networking.k8s.io/networkpolicies create/get/list/watch/delete to the workloadmanager ClusterRole. Without this the API server returns 403 on NP create.

Which issue(s) this PR fixes:

Fixes #216

Special notes for your reviewer:

Known gap: for the warm-pool path (SandboxClaim), the NP gets created but its PodSelector uses SandboxNameLabelKey which agent-sandbox doesn't propagate onto warm pool pods. So the NP exists but won't enforce anything for warm-pool sessions right now. I want to confirm the label propagation behavior on the agent-sandbox side before fixing this, happy to do it as a follow-up.

No OwnerReferences on the NP for now. Cleanup is handled explicitly through the delete handler and GC. Can add owner refs in a follow-up if you'd prefer that approach.

Does this PR introduce a user-facing change?:

Added NetworkPolicy support for sandbox isolation. Set `networkPolicy.mode` on AgentRuntime or CodeInterpreter templates to `Restricted` (deny-all with DNS and router ingress allowed) or `Custom` (raw k8s rules). Defaults to `None` so existing setups are unaffected. Router selector and namespace are configurable via chart   values.
                                     

@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements network isolation for sandboxes using Kubernetes NetworkPolicies, supporting 'None', 'Restricted', and 'Custom' modes via the AgentRuntime and CodeInterpreter CRDs. The workload-manager is updated to manage the lifecycle of these policies, including automated creation and cleanup. A critical issue was identified in pkg/workloadmanager/workload_builder.go where the per-session NetworkPolicy override is ignored during the creation of CodeInterpreter sandboxes when warm pools are not utilized.

Comment thread pkg/workloadmanager/workload_builder.go Outdated
}
sandbox := buildSandboxObject(buildParams)
return sandbox, nil, sandboxEntry, nil
np := buildNetworkPolicy(sandboxName, namespace, codeInterpreterObj.Spec.Template.NetworkPolicy, routerSelector, routerNamespace)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The per-session NetworkPolicy override is ignored in this code path. While it is correctly handled in the warm-pool path (line 403) and in buildSandboxByAgentRuntime (line 305), it is missing here. This means that a custom NetworkPolicy provided in the CreateSandboxRequest will not be applied when creating a CodeInterpreter sandbox without a warm pool.

Suggested change
np := buildNetworkPolicy(sandboxName, namespace, codeInterpreterObj.Spec.Template.NetworkPolicy, routerSelector, routerNamespace)
np := buildNetworkPolicy(sandboxName, namespace, effectiveNetworkPolicy(sessionNetworkPolicy, codeInterpreterObj.Spec.Template.NetworkPolicy), routerSelector, routerNamespace)

- Add SandboxNetworkPolicy API type (None/Restricted/Custom modes)
- Build per-sandbox NetworkPolicy with router ingress always allowed
- Support cross-namespace router via NamespaceSelector
- Per-session NP override in CreateSandboxRequest (replace semantics)
- Configurable router selector/namespace via chart values and CLI flags
- Clean up NP on sandbox delete, rollback, and GC
- Add networkpolicies RBAC to workloadmanager ClusterRole

Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds optional Kubernetes NetworkPolicy generation/cleanup to improve sandbox network isolation in AgentCube’s Workload Manager, closing the “no network restrictions” gap for untrusted sandbox code execution.

Changes:

  • Introduces SandboxNetworkPolicy in runtime API templates (AgentRuntime + CodeInterpreter) and a per-session override via CreateSandboxRequest.
  • Implements NetworkPolicy building (None/Restricted/Custom) plus lifecycle management (create on sandbox create; best-effort delete on rollback, delete API, and GC).
  • Adds router selector/namespace configuration (flags + Helm values) to support cross-namespace router→sandbox ingress allow rules.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pkg/workloadmanager/workload_builder.go Wires template/session policy into sandbox build flow and returns a NetworkPolicy object for creation.
pkg/workloadmanager/server.go Adds config defaults for router selector/namespace used by Restricted mode ingress.
pkg/workloadmanager/networkpolicy_builder.go Builds NetworkPolicy objects for None/Restricted/Custom modes (router ingress + optional DNS egress).
pkg/workloadmanager/networkpolicy_builder_test.go Unit tests for NetworkPolicy building and “replace, not merge” selection behavior.
pkg/workloadmanager/k8s_client.go Adds helpers to create/delete NetworkPolicies and generalizes clientset type.
pkg/workloadmanager/handlers.go Creates NetworkPolicy during sandbox creation; deletes on rollback and explicit delete handler.
pkg/workloadmanager/handlers_test.go Updates handler tests for new builder signatures and adds session override forwarding test.
pkg/workloadmanager/garbage_collection.go Ensures NetworkPolicy cleanup runs even when Sandbox/SandboxClaim is already NotFound.
pkg/workloadmanager/garbage_collection_test.go Adds fake clientset to cover NP cleanup path without nil panics.
pkg/common/types/sandbox.go Extends CreateSandboxRequest to accept per-session NetworkPolicy override.
pkg/apis/runtime/v1alpha1/networkpolicy_types.go Adds new API types for SandboxNetworkPolicy modes and rules.
pkg/apis/runtime/v1alpha1/agent_type.go Adds NetworkPolicy field to AgentRuntime SandboxTemplate.
pkg/apis/runtime/v1alpha1/codeinterpreter_types.go Adds NetworkPolicy field to CodeInterpreterSandboxTemplate.
pkg/apis/runtime/v1alpha1/zz_generated.deepcopy.go Updates generated deep-copies for new NetworkPolicy fields/types.
manifests/charts/base/values.yaml Adds Helm values for router selector/namespace used in Restricted mode.
manifests/charts/base/templates/workloadmanager.yaml Passes router selector/namespace flags into workload-manager deployment.
manifests/charts/base/templates/rbac/workloadmanager.yaml Grants workload-manager RBAC to manage networkpolicies.
manifests/charts/base/crds/runtime.agentcube.volcano.sh_agentruntimes.yaml Updates AgentRuntime CRD schema with networkPolicy fields.
manifests/charts/base/crds/runtime.agentcube.volcano.sh_codeinterpreters.yaml Updates CodeInterpreter CRD schema with networkPolicy fields.
cmd/workload-manager/main.go Adds flags + selector parsing for router selector/namespace configuration.
cmd/workload-manager/main_test.go Unit tests for selector parsing logic.

Comment thread pkg/workloadmanager/workload_builder.go Outdated
}
sandbox := buildSandboxObject(buildParams)
return sandbox, nil, sandboxEntry, nil
np := buildNetworkPolicy(sandboxName, namespace, codeInterpreterObj.Spec.Template.NetworkPolicy, routerSelector, routerNamespace)
Comment on lines +388 to +394
// createNetworkPolicy creates a NetworkPolicy using the workloadmanager's own client.
func createNetworkPolicy(ctx context.Context, clientset kubernetes.Interface, np *networkingv1.NetworkPolicy) error {
_, err := clientset.NetworkingV1().NetworkPolicies(np.Namespace).Create(ctx, np, metav1.CreateOptions{})
if err != nil && !apierrors.IsAlreadyExists(err) {
return fmt.Errorf("failed to create network policy %s/%s: %w", np.Namespace, np.Name, err)
}
return nil
@@ -21,6 +21,7 @@ import (
"time"

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 20, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 41.55844% with 135 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.32%. Comparing base (3fddb1b) to head (bf23937).
⚠️ Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
pkg/apis/runtime/v1alpha1/zz_generated.deepcopy.go 0.00% 99 Missing ⚠️
pkg/workloadmanager/workload_builder.go 26.66% 11 Missing ⚠️
pkg/workloadmanager/handlers.go 46.66% 6 Missing and 2 partials ⚠️
pkg/workloadmanager/k8s_client.go 30.00% 6 Missing and 1 partial ⚠️
pkg/workloadmanager/garbage_collection.go 0.00% 2 Missing and 4 partials ⚠️
pkg/workloadmanager/server.go 0.00% 4 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #291      +/-   ##
==========================================
- Coverage   46.73%   46.32%   -0.42%     
==========================================
  Files          30       31       +1     
  Lines        2822     3033     +211     
==========================================
+ Hits         1319     1405      +86     
- Misses       1363     1481     +118     
- Partials      140      147       +7     
Flag Coverage Δ
unittests 46.32% <41.55%> (-0.42%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…eter path

- Fix session NP override being ignored in direct sandbox path
- Add CIDR format validation markers to SandboxEgressRule and SandboxIngressRule
- Add example manifest for NetworkPolicy modes (Restricted, Custom)

Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: Support network policy for sandbox

4 participants