mitosis: Fix cgroup namespace support #3509

Open
pcodes wants to merge 1 commit into sched-ext:main from pcodes:mitosis-cgroupns-fix
pcodes commented Apr 3, 2026

Summary

Fix scx_mitosis to work when loaded from inside a cgroup namespace (e.g., from within a container). The kernel's bpf_cgroup_from_id() enforces namespace visibility, so the hardcoded root_cgid of 1 is not visible from a non-init namespace, causing ops.init() to fail with ENOENT. Additional crashes occur because BPF callbacks like init_task and select_cpu run for all tasks on the system, including host processes whose cgroups are outside the namespace.

Details

Userspace (main.rs):
Set root_cgid by reading the inode of /sys/fs/cgroup/, which returns the kernfs node ID of the cgroup root visible to the process. On the host this is 1 (no behavior change); inside a cgroup namespace it returns the namespace root's ID. This follows the same pattern used by scx_layered, which reads cgroup IDs via std::fs::metadata(path).ino() rather than hardcoding them.
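
For illustration, the userspace lookup described above could look roughly like the sketch below. It is not the code from this PR: the function names `cgroup_id_of` and `visible_root_cgid` are hypothetical, and the sketch relies on the PR's statement that the inode of a cgroup directory is its kernfs node ID.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns the inode number of `path`.
/// For a cgroup directory this is assumed to be the cgroup ID
/// (the kernfs node ID), as described in the PR summary.
fn cgroup_id_of(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.ino())
}

/// Hypothetical helper: ID of the cgroup root visible to this process.
/// Per the PR, this is 1 on the host and the namespace root's ID
/// inside a cgroup namespace.
fn visible_root_cgid() -> io::Result<u64> {
    cgroup_id_of(Path::new("/sys/fs/cgroup/"))
}
```

The returned value would then be written into the BPF program's root_cgid before the scheduler is attached. How that hand-off is wired up in main.rs is not shown in this description.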

BPF (mitosis.bpf.c):

  • Add is_root_cgroup() helper that matches both root_cgid and the global root at level 0, which appears when init_task runs for kthreads and systemd but is not the namespace root.
  • init_cgrp_ctx_with_ancestors: skip ancestors at or above the namespace root. Their cgrp_ctx was never initialized and bpf_cgroup_from_id() cannot resolve them.
  • init_cgrp_ctx: use fallible parent context lookup with cell 0 default instead of fatal scx_bpf_error, handling parents that are outside the namespace or were skipped during initialization.
  • update_task_cell: fall back to root cgroup context unconditionally when a task's cgroup context is missing, rather than only for exiting tasks. This handles host tasks whose cgroups are outside the namespace.
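
The decision logic in the bullets above can be modeled in plain Rust as a rough sketch. This is not the BPF C code from mitosis.bpf.c: the `Cgroup` struct, the map of cell IDs keyed by cgroup ID, and the helper names `inherited_cell` and `task_cell` are all hypothetical stand-ins, and the assumption that a new cgroup context takes its cell from the parent's context is inferred from the description rather than confirmed.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the cgroup fields that matter here.
struct Cgroup {
    id: u64,
    level: u32,
}

/// Matches the namespace root (root_cgid) as well as the global root,
/// which sits at level 0 and shows up for kthreads and systemd.
fn is_root_cgroup(cgrp: &Cgroup, root_cgid: u64) -> bool {
    cgrp.id == root_cgid || cgrp.level == 0
}

/// Fallible parent lookup: use the parent's cell if its context exists,
/// otherwise default to cell 0 instead of raising a fatal error.
fn inherited_cell(cell_by_cgid: &HashMap<u64, u32>, parent_cgid: u64) -> u32 {
    cell_by_cgid.get(&parent_cgid).copied().unwrap_or(0)
}

/// Task lookup: prefer the task's own cgroup context, and fall back to
/// the root cgroup context whenever it is missing.
fn task_cell(cell_by_cgid: &HashMap<u64, u32>, task_cgid: u64, root_cgid: u64) -> Option<u32> {
    cell_by_cgid
        .get(&task_cgid)
        .or_else(|| cell_by_cgid.get(&root_cgid))
        .copied()
}
```

In this model, a host task whose cgroup is outside the namespace has no entry of its own, so `task_cell` resolves it through the root entry.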

Test Plan

Loaded scx_mitosis with --cpu-controller-disabled from inside a container's cgroup namespace on a multi-container host.
Verified:

  • Scheduler attaches successfully (no ops.init() failed error)
  • Host tasks (kthreads, systemd) are handled gracefully with cell 0 assignment
  • Container tasks are initialized and scheduled correctly
  • tick() runs without errors across multiple cycles
  • No scx_bpf_error messages in dmesg

scx_mitosis fails when loaded from inside a cgroup namespace (e.g., from
within a container). The kernel's bpf_cgroup_from_id() checks
cgroup_is_descendant(cgrp, current_cgns_cgroup_dfl()), so the hardcoded
root_cgid of 1 (the global root) is not visible from a non-init cgroup
namespace, causing ops.init() to fail with ENOENT.

Set root_cgid from userspace by reading the inode of /sys/fs/cgroup/,
which returns the kernfs node ID of the cgroup root visible to the
process. On the host this is 1 (no behavior change); inside a cgroup
namespace it returns the namespace root's ID.

The BPF side needs several changes to handle tasks and cgroups outside
the namespace boundary, since init_task and select_cpu run for all tasks
on the system, including host processes whose cgroups were never
initialized:

- Add is_root_cgroup() helper that matches both root_cgid and the
  global root at level 0, which appears when init_task runs for
  kthreads and systemd but is not the namespace root.

- init_cgrp_ctx_with_ancestors: skip ancestors at or above the namespace
  root. Their cgrp_ctx was never initialized and bpf_cgroup_from_id()
  cannot resolve them.

- init_cgrp_ctx: use fallible parent context lookup with cell 0 default
  instead of fatal scx_bpf_error, handling parents that are outside the
  namespace or were skipped during initialization.

- update_task_cell: fall back to root cgroup context unconditionally
  when a task's cgroup context is missing, rather than only for exiting
  tasks. This handles host tasks whose cgroups are outside the namespace.

Tested by loading scx_mitosis from inside a container's cgroup namespace
on a multi-container host. The scheduler attaches, handles host and
container tasks without errors, and survives multiple tick() cycles.
pcodes changed the title from "scx_mitosis: Fix cgroup namespace support" to "mitosis: Fix cgroup namespace support" Apr 3, 2026