Background
A comprehensive audit of all cuda_core wrapper code was performed to find places where the handling of input arguments or output values limits the capabilities of the underlying low-level cuda_bindings API. The audit identified 18 such limitations. A follow-up analysis checked which of those would require breaking API or ABI changes to address, so that the project doesn't paint itself into a corner.
Good news: all 18 limitations can be resolved with purely additive changes (new keyword args with defaults, new methods/properties, new classes). However, two areas establish behavioral contracts or property shapes that deserve attention now, before more users depend on them.
1. Python int kernel argument convention
File: cuda/core/_kernel_arg_handler.pyx
When a bare Python int is passed as a kernel argument, it is unconditionally treated as an intptr_t (pointer address). This is documented in a code comment as an intentional judgment call:
We want to have a fast path to pass in Python integers as pointer addresses, but one could also (mistakenly) pass it with the intention of passing a scalar integer. It's a mistake because a Python int is ambiguous (arbitrary width). Our judgement call here is to treat it as a pointer address, without any warning!
Why this matters: This establishes a semantic contract that users will build against. If the project ever wanted int to mean "scalar integer of kernel-parameter-width" instead, that would be a silent behavioral breaking change — existing code passing pointer addresses as int would break without any error.
Recommendation: The current convention is defensible and the alternative (typed scalars via numpy/ctypes) covers the scalar case. However, consider:
- Adding an explicit note in public documentation that int means pointer address
- Optionally, adding a Pointer(addr) wrapper type so the intent is unambiguous, giving a future path to change the bare-int behavior if ever desired (with a deprecation cycle)
Risk level: Low, as long as the convention is documented and stable.
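The convention and the optional wrapper can be illustrated with a small toy model. This is only a sketch: Pointer and classify_kernel_arg are hypothetical names that do not exist in cuda.core, the real handler is Cython code that is not reproduced here, and the bool branch is an assumption of the toy model rather than a statement about the real handler.

```python
import ctypes
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """Hypothetical wrapper that marks an integer as a pointer address."""
    addr: int


# Typed scalars the toy model accepts; the real handler supports more types.
_TYPED_SCALARS = (ctypes.c_int32, ctypes.c_int64, ctypes.c_float, ctypes.c_double)


def classify_kernel_arg(arg):
    """Toy model of the argument convention; returns (kind, ctypes value)."""
    if isinstance(arg, Pointer):
        # Explicit intent: always a pointer, whatever bare int means.
        return "pointer", ctypes.c_void_p(arg.addr)
    if isinstance(arg, bool):
        # bool is a subclass of int in Python, so it must be checked first.
        return "scalar", ctypes.c_bool(arg)
    if isinstance(arg, int):
        # Current convention: a bare Python int is a pointer address.
        return "pointer", ctypes.c_void_p(arg)
    if isinstance(arg, _TYPED_SCALARS):
        # Scalars carry an explicit width, so there is no ambiguity.
        return "scalar", arg
    raise TypeError(f"unsupported kernel argument type: {type(arg).__name__}")


print(classify_kernel_arg(0x1000)[0])
print(classify_kernel_arg(Pointer(0x2000))[0])
print(classify_kernel_arg(ctypes.c_int32(7))[0])
```

With a wrapper like this in place, code that writes Pointer(addr) would keep its meaning even if the bare-int branch were later deprecated or changed.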
2. KernelNode.config and MemcpyNode — lossy round-trip of graph node parameters
File: cuda/core/graph/_subclasses.pyx
KernelNode.config
KernelNode.config reconstructs a LaunchConfig from CUDA_KERNEL_NODE_PARAMS_v3 but silently drops cluster_dimension and cooperative_launch. The docstring acknowledges this:
cluster dimensions and cooperative_launch are not preserved by the CUDA driver's kernel node params, so they are not included.
Code that reads .config, mutates it, and passes it to a new launch will silently lose cluster/cooperative settings. Fixing this later (populating the missing fields) is purely additive and non-breaking.
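The read-mutate-relaunch hazard can be sketched with plain dataclasses. ToyLaunchConfig, ToyKernelNodeParams, and both helper functions are hypothetical stand-ins, not the cuda.core or driver types; the field names are chosen for illustration only.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

Dim3 = Tuple[int, int, int]


@dataclass(frozen=True)
class ToyLaunchConfig:
    """Hypothetical stand-in for a launch configuration."""
    grid: Dim3
    block: Dim3
    shmem_size: int = 0
    cluster: Optional[Dim3] = None
    cooperative_launch: bool = False


@dataclass(frozen=True)
class ToyKernelNodeParams:
    """Hypothetical stand-in for node params that lack cluster/cooperative fields."""
    grid: Dim3
    block: Dim3
    shmem_size: int


def to_node_params(cfg: ToyLaunchConfig) -> ToyKernelNodeParams:
    # Only the fields the node params can hold survive this step.
    return ToyKernelNodeParams(cfg.grid, cfg.block, cfg.shmem_size)


def config_from_node_params(params: ToyKernelNodeParams) -> ToyLaunchConfig:
    # Missing fields fall back to their defaults, with no warning.
    return ToyLaunchConfig(params.grid, params.block, params.shmem_size)


original = ToyLaunchConfig(
    grid=(8, 1, 1), block=(128, 1, 1), cluster=(2, 1, 1), cooperative_launch=True
)
recovered = config_from_node_params(to_node_params(original))

# A caller tweaks one field and relaunches, unaware of what was dropped.
modified = replace(recovered, shmem_size=1024)
print(modified.cluster, modified.cooperative_launch)
```

Populating the two fields later only changes values that are currently defaults, which is why the fix stays additive.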
MemcpyNode
MemcpyNode flattens a CUDA_MEMCPY3D_v2 descriptor to 1D — only dst, src, and size (all int) are exposed as public properties. The Height, Depth, srcPitch, srcHeight, dstPitch, dstHeight fields are discarded.
Why this matters: The current properties (dst: int, src: int, size: int) define a public contract. If users write code that unpacks these three values, adding richer 3D properties later is safe (additive), but changing the meaning or type of the existing properties would be breaking. As long as new dimensions are exposed via new properties (e.g. height, depth, src_pitch), there is no conflict.
Recommendation:
- For KernelNode.config: populate the missing LaunchConfig fields as soon as the driver exposes them through node params, or store them at node-creation time. This is additive.
- For MemcpyNode: add height, depth, src_pitch, dst_pitch etc. as new properties rather than changing dst/src/size. Document that the current 1D view is intentionally minimal.
- Do not rename or retype the existing dst, src, size properties in the future; that would be a breaking change.
Risk level: Low, as long as the additive-only approach is followed.
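The additive-only approach for MemcpyNode can be sketched as follows. ToyMemcpyNode and legacy_consumer are hypothetical; the extra property names follow the ones suggested above, and the default values are assumptions of this toy model, not the behavior of cuda.core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMemcpyNode:
    """Hypothetical node view: the existing 1D fields plus additive 3D fields."""
    # Existing public contract: names and int types stay unchanged.
    dst: int
    src: int
    size: int
    # Additive fields with defaults (assumed here to describe a plain 1D copy).
    height: int = 1
    depth: int = 1
    src_pitch: int = 0
    dst_pitch: int = 0


def legacy_consumer(node: ToyMemcpyNode) -> tuple:
    """Code written against the original three properties only."""
    return (node.dst, node.src, node.size)


flat = ToyMemcpyNode(dst=0x2000, src=0x1000, size=256)
pitched = ToyMemcpyNode(
    dst=0x2000, src=0x1000, size=256, height=4, src_pitch=512, dst_pitch=512
)

# Old callers see the same values for both nodes; new callers can read more.
print(legacy_consumer(flat) == legacy_consumer(pitched))
print(pitched.height, pitched.src_pitch)
```

Because the new fields are separate properties, code that reads only dst, src, and size is unaffected by their introduction.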