When a container is spawned using a block-based snapshotter, urunc attempts to use the rootfs snapshot as a block device for the sandbox, assuming both the guest and the sandbox monitor support block devices and the corresponding filesystem.
However, urunc does not handle this process correctly. The logic runs in the urunc reexec process: after verifying that the snapshot can be used as a block device, urunc reexec unmounts the rootfs and then passes the snapshot to the sandbox monitor as a block device.
The issue arises because the reexec process runs inside the container’s mount namespace, whereas the shim mounts the snapshot in the parent mount namespace. Unless that mount has shared propagation (MS_SHARED), which it does not have by default, an unmount performed inside the container’s namespace does not propagate back to the parent namespace. As a result, the device remains mounted in the parent namespace.
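To make the propagation condition concrete, the following Go sketch checks whether a mount point belongs to a shared peer group by looking for a `shared:N` optional field in mountinfo-formatted text (the format of /proc/self/mountinfo described in proc(5)). It is an illustration only, not urunc's code: the function name `isSharedMount`, the sample entries, and the paths /run/rootfs and /run/shared are all hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sample is hypothetical mountinfo text: one mount without shared
// propagation and one mount that belongs to shared peer group 12.
const sample = `36 25 8:16 / /run/rootfs rw,relatime - ext4 /dev/sdb rw
37 25 8:32 / /run/shared rw,relatime shared:12 - ext4 /dev/sdc rw`

// isSharedMount scans mountinfo-formatted text and reports whether the
// entry for mountPoint carries a "shared:N" optional field. found is
// false when no entry for mountPoint exists. If several entries share
// the same mount point, the last one wins.
func isSharedMount(mountinfo, mountPoint string) (shared, found bool) {
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Field 5 (index 4) is the mount point; optional fields start
		// at index 6 and end at the "-" separator.
		if len(fields) < 7 || fields[4] != mountPoint {
			continue
		}
		found, shared = true, false
		for _, f := range fields[6:] {
			if f == "-" {
				break
			}
			if strings.HasPrefix(f, "shared:") {
				shared = true
			}
		}
	}
	return shared, found
}

func main() {
	for _, mp := range []string{"/run/rootfs", "/run/shared"} {
		shared, found := isSharedMount(sample, mp)
		fmt.Printf("%s: found=%t shared=%t\n", mp, found, shared)
	}
}
```

On a real host, the same function could be fed the contents of /proc/self/mountinfo as read from the parent mount namespace. An entry for the snapshot mount point without a `shared:N` field would indicate that an unmount performed in the container's namespace is not reflected in the parent.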
This can lead to data corruption, because the same block device stays mounted in the parent namespace while it is also attached to the sandbox. In practice, however, since the snapshot corresponds to the container’s root filesystem and is cleaned up after the container terminates, the corruption does not have a lasting impact.