scx_mitosis fails when loaded from inside a cgroup namespace (e.g., from within a container). The kernel's bpf_cgroup_from_id() checks cgroup_is_descendant(cgrp, current_cgns_cgroup_dfl()), so the hardcoded root_cgid of 1 (the global root) is not visible from a non-init cgroup namespace, causing ops.init() to fail with ENOENT.

Set root_cgid from userspace by reading the inode of /sys/fs/cgroup/, which returns the kernfs node ID of the cgroup root visible to the process. On the host this is 1 (no behavior change); inside a cgroup namespace it returns the namespace root's ID.

The BPF side needs several changes to handle tasks and cgroups outside the namespace boundary, since init_task and select_cpu run for all tasks on the system, including host processes whose cgroups were never initialized:

- Add is_root_cgroup() helper that matches both root_cgid and the global root at level 0, which appears when init_task runs for kthreads and systemd but is not the namespace root.
- init_cgrp_ctx_with_ancestors: skip ancestors at or above the namespace root. Their cgrp_ctx was never initialized and bpf_cgroup_from_id() cannot resolve them.
- init_cgrp_ctx: use fallible parent context lookup with cell 0 default instead of fatal scx_bpf_error, handling parents that are outside the namespace or were skipped during initialization.
- update_task_cell: fall back to root cgroup context unconditionally when a task's cgroup context is missing, rather than only for exiting tasks. This handles host tasks whose cgroups are outside the namespace.

Tested by loading scx_mitosis from inside a container's cgroup namespace on a multi-container host. The scheduler attaches, handles host and container tasks without errors, and survives multiple tick() cycles.
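The fallback rules in the BPF changes above can be modeled in plain Rust. This is only an illustrative sketch, not the actual mitosis.bpf.c code: the `Cgroup` struct, the map from cgroup ID to cell, and the function names `parent_cell_or_default` and `task_cell` are assumptions made for the example. Only `is_root_cgroup` borrows its name from the commit message, and its body here is a guess at the described behavior.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the fields of a cgroup that matter here.
struct Cgroup {
    id: u64,
    level: u32,
}

/// Models the described helper: a cgroup counts as "root" if it is the
/// namespace root (root_cgid) or the global root, which sits at level 0.
fn is_root_cgroup(cgrp: &Cgroup, root_cgid: u64) -> bool {
    cgrp.id == root_cgid || cgrp.level == 0
}

/// Models the fallible parent lookup: a missing parent context (outside the
/// namespace, or skipped during initialization) defaults to cell 0 instead
/// of being treated as a fatal error.
fn parent_cell_or_default(ctxs: &HashMap<u64, u32>, parent_id: u64) -> u32 {
    ctxs.get(&parent_id).copied().unwrap_or(0)
}

/// Models the task-side fallback: when the task's own cgroup has no context,
/// use the root cgroup's context unconditionally.
fn task_cell(ctxs: &HashMap<u64, u32>, task_cgid: u64, root_cgid: u64) -> Option<u32> {
    ctxs.get(&task_cgid).or_else(|| ctxs.get(&root_cgid)).copied()
}
```

In this model a host task whose cgroup was never initialized simply inherits whatever cell the namespace root maps to, which mirrors the "handle host tasks without errors" goal stated above.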
Summary
Fix scx_mitosis to work when loaded from inside a cgroup namespace (e.g., from within a container). The kernel's bpf_cgroup_from_id() enforces namespace visibility, so the hardcoded root_cgid of 1 is not visible from a non-init namespace, causing ops.init() to fail with ENOENT. Additional crashes occur because BPF callbacks like init_task and select_cpu run for all tasks on the system, including host processes whose cgroups are outside the namespace.
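To make the visibility failure concrete, here is a toy model of a namespace-scoped ID lookup. It is a sketch under assumptions, not kernel code: `CgroupTree`, `sample_tree`, and the IDs 10, 11, and 20 are invented for illustration, and `cgroup_from_id` here only imitates the behavior described above (resolve an ID only if it is a descendant of the caller's namespace root).

```rust
use std::collections::HashMap;

/// Toy cgroup hierarchy: maps each cgroup ID to its parent ID (None for the root).
struct CgroupTree {
    parent: HashMap<u64, Option<u64>>,
}

impl CgroupTree {
    /// Walks up from `cgrp`; true if `ancestor` is `cgrp` itself or above it.
    fn is_descendant(&self, cgrp: u64, ancestor: u64) -> bool {
        let mut cur = Some(cgrp);
        while let Some(id) = cur {
            if id == ancestor {
                return true;
            }
            cur = self.parent.get(&id).copied().flatten();
        }
        false
    }

    /// Imitates a namespace-aware lookup: the ID resolves only when it exists
    /// and lies inside the subtree rooted at `ns_root`.
    fn cgroup_from_id(&self, id: u64, ns_root: u64) -> Option<u64> {
        if self.parent.contains_key(&id) && self.is_descendant(id, ns_root) {
            Some(id)
        } else {
            None
        }
    }
}

/// Hypothetical layout: 1 is the global root, 10 a container's namespace
/// root, 11 a cgroup inside the container, 20 a host cgroup outside it.
fn sample_tree() -> CgroupTree {
    let mut parent = HashMap::new();
    parent.insert(1, None);
    parent.insert(10, Some(1));
    parent.insert(11, Some(10));
    parent.insert(20, Some(1));
    CgroupTree { parent }
}
```

With the namespace root set to 10, looking up ID 1 yields nothing in this model, which corresponds to the ENOENT seen in ops.init() when root_cgid is hardcoded to 1.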
Details
Userspace (main.rs):
Set root_cgid by reading the inode of /sys/fs/cgroup/, which returns the kernfs node ID of the cgroup root visible to the process. On the host this is 1 (no behavior change); inside a cgroup namespace it returns the namespace root's ID. This follows the same pattern used by scx_layered, which reads cgroup IDs via std::fs::metadata(path).ino() rather than hardcoding them.
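A minimal sketch of the inode read, assuming a Unix target. The helper names `cgroup_id_of` and `visible_root_cgid` are hypothetical and do not come from main.rs; only `std::fs::metadata` and `MetadataExt::ino()` are taken from the description above.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns the inode number of `path`. For a cgroup directory this is the
/// value the description above treats as the cgroup's kernfs node ID.
fn cgroup_id_of(path: impl AsRef<Path>) -> io::Result<u64> {
    Ok(fs::metadata(path)?.ino())
}

/// Hypothetical wrapper: the ID of the cgroup root visible to this process.
/// Fails with an io::Error if /sys/fs/cgroup/ is not mounted.
fn visible_root_cgid() -> io::Result<u64> {
    cgroup_id_of("/sys/fs/cgroup/")
}
```

Returning `io::Result` rather than panicking lets the caller decide how to react when the cgroup filesystem is not mounted at the expected path.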
BPF (mitosis.bpf.c):
Test Plan
Loaded scx_mitosis with --cpu-controller-disabled from inside a container's cgroup namespace on a multi-container host.
Verified: