cuda.core.system: Add MIG-related APIs #1916

Open
mdboom wants to merge 6 commits into NVIDIA:main from mdboom:cuda-core-system-dask-cuda

mdboom commented Apr 15, 2026

These APIs are required by dask-cuda.

mdboom added this to the cuda.core v1.0.0 milestone Apr 15, 2026
mdboom self-assigned this Apr 15, 2026
mdboom added the cuda.core label ("Everything related to the cuda.core module") Apr 15, 2026
mdboom requested a review from cpcloud April 15, 2026 17:20
mdboom force-pushed the cuda-core-system-dask-cuda branch from b877b46 to f29b935 April 20, 2026 17:21
mdboom requested a review from rparolin April 20, 2026 17:28
device, as a 5 part hexadecimal string, that augments the immutable,
board serial identifier.
"""
# NVML UUIDs have a `GPU-` or `MIG-` prefix. We remove that here.

Strong suggestion to move this comment into the docstring, mainly so that it appears in the online documentation.
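
A minimal sketch of what that could look like, assuming the prefix handling lives in a small helper. The name `strip_nvml_uuid_prefix` and the sample UUID are hypothetical and not taken from the PR; the only facts carried over from the diff are the `GPU-` and `MIG-` prefixes and that they are removed.

```python
def strip_nvml_uuid_prefix(raw_uuid: str) -> str:
    """Return an NVML UUID without its leading type prefix.

    NVML UUIDs have a ``GPU-`` or ``MIG-`` prefix. That prefix is
    removed here, so callers only see the identifier itself.

    Note: this helper is a hypothetical illustration, not the
    cuda.core implementation.
    """
    for prefix in ("GPU-", "MIG-"):
        if raw_uuid.startswith(prefix):
            return raw_uuid[len(prefix):]
    # No known prefix: return the value unchanged.
    return raw_uuid


# Made-up UUID, used only to show the behaviour.
print(strip_nvml_uuid_prefix("GPU-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"))
```

With the note in the docstring rather than in a `#` comment, it is available to any documentation tool that renders docstrings.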


rwgk commented Apr 20, 2026

Generated with Cursor GPT-5.4 Extra High Fast

I did not check these findings manually. (Recently such findings have become generally highly reliable.)


  1. High: cuda_core/cuda/core/system/_mig.pxi:126 uses self._handle inside MigInfo.parent, but MigInfo only stores self._device. Accessing device.mig.parent will raise AttributeError instead of returning the parent device.

  2. High: the get_device_count -> device_count rename was only half-applied. cuda_core/cuda/core/system/_mig.pxi:176 still calls self.get_device_count(), and cuda_core/tests/system/test_system_device.py:745 still calls mig.get_device_count(). On any system that exercises the MIG path, both mig.get_all_devices() and the test will fail with AttributeError.

  3. Medium: the MIG mode logic is inverted for normal callers. system.Device.get_all_devices() returns top-level NVML devices from cuda_core/cuda/core/system/_device.pyx:296, but cuda_core/cuda/core/system/_mig.pxi:46, cuda_core/cuda/core/system/_mig.pxi:67, and cuda_core/cuda/core/system/_mig.pxi:92 gate mode/pending_mode/setter on is_mig_device. That means device.mig.mode reports False and the setter raises on the parent GPU devices that actually own MIG mode, so the new API does not expose enabled MIG mode through the normal enumeration path.
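
The three findings can be reproduced in miniature with plain Python stand-ins. This is only a sketch under stated assumptions: `StubDevice`, `BrokenMigInfo`, and `FixedMigInfo` are hypothetical classes written for this comment, they do not call NVML, and the "fixed" variant shows one possible reading of the fixes rather than the change the PR should necessarily make.

```python
class StubDevice:
    """Hypothetical stand-in for a device object; not the cuda.core API."""

    def __init__(self, name, is_mig_device=False, mig_mode=False):
        self.name = name
        self.is_mig_device = is_mig_device
        self.mig_mode = mig_mode
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self
        self.children.append(child)


class BrokenMigInfo:
    """Mirrors the three reported patterns (illustrative only)."""

    def __init__(self, device):
        self._device = device  # only _device is stored, never _handle

    @property
    def parent(self):
        # Finding 1: _handle was never set, so this raises AttributeError.
        return self._handle.parent

    @property
    def device_count(self):
        return len(self._device.children)

    def get_all_devices(self):
        # Finding 2: the old method name no longer exists after the rename.
        return [self._device.children[i] for i in range(self.get_device_count())]

    @property
    def mode(self):
        # Finding 3: gating on is_mig_device hides the mode on the parent GPU.
        if not self._device.is_mig_device:
            return False
        return self._device.mig_mode


class FixedMigInfo(BrokenMigInfo):
    """One possible set of fixes, under the assumptions stated above."""

    @property
    def parent(self):
        return self._device.parent

    def get_all_devices(self):
        return [self._device.children[i] for i in range(self.device_count)]

    @property
    def mode(self):
        # Assumption: the mode is read from the device that owns it.
        return self._device.mig_mode


gpu = StubDevice("gpu0", mig_mode=True)
mig = StubDevice("mig0", is_mig_device=True)
gpu.add_child(mig)

print(BrokenMigInfo(gpu).mode)   # the gate hides the enabled mode
print(FixedMigInfo(gpu).mode)
print(FixedMigInfo(mig).parent.name)
print([d.name for d in FixedMigInfo(gpu).get_all_devices()])
```

A regression test along these lines, run against the real `MigInfo` on a MIG-capable system, would catch findings 1 and 2 as soon as the properties are touched.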
