cuda.core: two API contracts that could complicate future changes #1949

@mdboom

Description


Background

A comprehensive audit of all cuda_core wrapper code was performed to find places where the handling of input arguments or output values limits the capabilities of the underlying low-level cuda_bindings API. The audit identified 18 such limitations. A follow-up analysis then checked which of those would require breaking API or ABI changes to address, to make sure the project does not paint itself into a corner.

Good news: all 18 limitations can be resolved with purely additive changes (new keyword args with defaults, new methods/properties, new classes). However, two areas establish behavioral contracts or property shapes that deserve attention now, before more users depend on them.


1. Python int kernel argument convention

File: cuda/core/_kernel_arg_handler.pyx

When a bare Python int is passed as a kernel argument, it is unconditionally treated as an intptr_t (pointer address). This is documented in a code comment as an intentional judgment call:

We want to have a fast path to pass in Python integers as pointer addresses, but one could also (mistakenly) pass it with the intention of passing a scalar integer. It's a mistake because a Python int is ambiguous (arbitrary width). Our judgement call here is to treat it as a pointer address, without any warning!

Why this matters: This establishes a semantic contract that users will build against. If the project ever wanted int to mean "scalar integer of kernel-parameter-width" instead, that would be a silent behavioral breaking change — existing code passing pointer addresses as int would break without any error.
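To make the ambiguity concrete, here is a short stdlib-only sketch of how a caller disambiguates today: a bare int is taken as an address, while a typed scalar (ctypes shown here; numpy scalars work the same way) carries an explicit width. The variable names are illustrative, not part of the cuda.core API.

```python
import ctypes

# A bare Python int has arbitrary width, so cuda.core's kernel-arg handler
# treats it as a pointer address (intptr_t) rather than a scalar.
device_ptr = 0x7F00_0000_0000  # would be interpreted as a pointer address

# To pass a *scalar* integer, wrap it in a fixed-width type so the intended
# kernel-parameter width is explicit.
scalar_i32 = ctypes.c_int32(42)   # exactly 4 bytes, matches a CUDA `int`
scalar_u64 = ctypes.c_uint64(42)  # exactly 8 bytes, matches `unsigned long long`

print(ctypes.sizeof(scalar_i32), scalar_i32.value)  # 4 42
print(ctypes.sizeof(scalar_u64), scalar_u64.value)  # 8 42
```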

Recommendation: The current convention is defensible and the alternative (typed scalars via numpy/ctypes) covers the scalar case. However, consider:

  • Adding an explicit note in public documentation that int means pointer address
  • Optionally, adding a Pointer(addr) wrapper type so the intent is unambiguous, giving a future path to change the bare-int behavior if ever desired (with a deprecation cycle)
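The suggested wrapper type could look like the following minimal sketch. `Pointer` here is hypothetical (it does not exist in cuda.core today); it only illustrates how an explicit wrapper would make intent unambiguous and open a deprecation path for bare-int arguments later.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pointer:
    """Hypothetical wrapper marking an int as a device pointer address.

    Not part of cuda.core; shown only to illustrate how an explicit
    wrapper would disambiguate pointer addresses from scalar integers.
    """
    addr: int

    def __index__(self) -> int:
        # Lets the wrapper be used anywhere an integer address is expected.
        return self.addr

p = Pointer(0x7F00_0000_0000)   # unambiguously a pointer address
n = 42                          # a bare int, today still treated as an address
```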

Risk level: Low, as long as the convention is documented and stable.


2. KernelNode.config and MemcpyNode — lossy round-trip of graph node parameters

File: cuda/core/graph/_subclasses.pyx

KernelNode.config

KernelNode.config reconstructs a LaunchConfig from CUDA_KERNEL_NODE_PARAMS_v3 but silently drops cluster_dimension and cooperative_launch. The docstring acknowledges this:

cluster dimensions and cooperative_launch are not preserved by the CUDA driver's kernel node params, so they are not included.

Code that reads .config, mutates it, and passes it to a new launch will silently lose cluster/cooperative settings. Fixing this later (populating the missing fields) is purely additive and non-breaking.
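The read-mutate-relaunch hazard can be simulated without a GPU. The classes and field names below are simplified stand-ins for cuda.core's LaunchConfig and the driver's kernel node params, not the real API; they only show how fields the driver does not preserve vanish on round-trip.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass
class LaunchConfig:
    # Illustrative stand-in for cuda.core's LaunchConfig.
    grid: Tuple[int, ...]
    block: Tuple[int, ...]
    cluster: Optional[Tuple[int, ...]] = None  # not preserved by node params
    cooperative_launch: bool = False           # not preserved by node params

def to_node_params(cfg):
    # The driver's kernel node params carry only grid and block dimensions.
    return {"grid": cfg.grid, "block": cfg.block}

def config_from_node_params(params):
    # Reconstruction can only populate what the driver preserved.
    return LaunchConfig(grid=params["grid"], block=params["block"])

original = LaunchConfig(grid=(8, 1, 1), block=(256, 1, 1),
                        cluster=(2, 1, 1), cooperative_launch=True)
round_tripped = config_from_node_params(to_node_params(original))

# Mutate-and-relaunch silently loses the cluster/cooperative settings:
relaunch = replace(round_tripped, grid=(16, 1, 1))
print(relaunch.cluster, relaunch.cooperative_launch)  # None False
```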

MemcpyNode

MemcpyNode flattens a CUDA_MEMCPY3D_v2 descriptor to 1D — only dst, src, and size (all int) are exposed as public properties. The Height, Depth, srcPitch, srcHeight, dstPitch, dstHeight fields are discarded.

Why this matters: The current properties (dst: int, src: int, size: int) define a public contract. If users write code that unpacks these three values, adding richer 3D properties later is safe (additive), but changing the meaning or type of the existing properties would be breaking. As long as new dimensions are exposed via new properties (e.g. height, depth, src_pitch), there is no conflict.

Recommendation:

  • For KernelNode.config: populate the missing LaunchConfig fields as soon as the driver exposes them through node params, or store them at node-creation time. This is additive.
  • For MemcpyNode: add height, depth, src_pitch, dst_pitch etc. as new properties rather than changing dst/src/size. Document that the current 1D view is intentionally minimal.
  • Do not rename or retype the existing dst, src, size properties in the future — that would be a breaking change.
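The additive approach above can be sketched as follows. `MemcpyNode3DView` and its dict-backed descriptor are hypothetical (the real MemcpyNode wraps a CUDA_MEMCPY3D_v2 descriptor); the point is that the existing properties keep their exact names and int types, while 3D information arrives only as new properties with 1D-compatible defaults.

```python
class MemcpyNode3DView:
    """Hypothetical sketch: additive 3D properties on a 1D public contract."""

    def __init__(self, desc):
        self._desc = desc  # stand-in for a CUDA_MEMCPY3D_v2 descriptor

    # --- existing contract: unchanged names, unchanged int types ---
    @property
    def dst(self) -> int:
        return self._desc["dst"]

    @property
    def src(self) -> int:
        return self._desc["src"]

    @property
    def size(self) -> int:
        return self._desc["width_in_bytes"]

    # --- additive: new properties, defaulting to the 1D case ---
    @property
    def height(self) -> int:
        return self._desc.get("height", 1)

    @property
    def depth(self) -> int:
        return self._desc.get("depth", 1)

    @property
    def src_pitch(self) -> int:
        return self._desc.get("src_pitch", self.size)

node = MemcpyNode3DView({"dst": 0x2000, "src": 0x1000, "width_in_bytes": 4096})
print(node.dst, node.size, node.height, node.depth)  # old and new coexist
```

Existing code that unpacks only `dst`, `src`, and `size` keeps working unchanged, which is exactly the non-breaking property the recommendation asks for.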

Risk level: Low, as long as the additive-only approach is followed.

Metadata


    Labels

    cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements)
