
Verify + document cuDNN/cuBLAS dispatch path on NVIDIA hosts #1159

@ooples

Description

Context

Tensors 0.46.0 exposes high-level cuDNN / cuBLAS wrappers:

  • `CuDnnConvolution.Conv2DForward(...)` (float)
  • `CuDnnBatchNorm.ForwardInference(...)` (float)
  • `CuBlasMatMul.MatMulFloat(...)` + `MatMulWithCachedWeightsFloat(...)`

These are the NVIDIA fast paths (tensor cores, cuBLAS-LT autotune). AiDotNet layers go through `Engine.Conv2D` / `Engine.BatchNorm` / `Engine.MatMul` which auto-dispatch to `DirectGpuEngine`. It is unclear from the public Tensors API whether `DirectGpuEngine.Conv2D` internally routes to `CuDnnConvolution.Conv2DForward` when CUDA + cuDNN are present, or whether it uses a generic kernel.
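If code inspection isn't practical, a quick runtime probe can answer the routing question. A minimal sketch, assuming the profiler surface named later in this issue (`EnableTensorsOpProfiling()`, `PerformanceProfiler`) exists roughly as described; all names here come from this issue, not a verified Tensors API:

```csharp
// Hedged probe: run one Conv2D through the engine and check whether the
// profiler trace shows a cuDNN entry. PerformanceProfiler, Trace, and
// Engine.Conv2D are names taken from this issue and may differ in the
// real Tensors 0.46.0 API.
PerformanceProfiler.EnableTensorsOpProfiling();

var input  = Tensor.Random(new[] { 1, 3, 32, 32 }); // NCHW
var kernel = Tensor.Random(new[] { 8, 3, 3, 3 });   // OIHW
Engine.Conv2D(input, kernel);

bool usedCuDnn = PerformanceProfiler.Trace
    .Any(e => e.Name.Contains("CuDnnConvolution"));
Console.WriteLine(usedCuDnn
    ? "Engine auto-routed through cuDNN"
    : "Engine used a generic kernel");
```

If the trace shows `CuDnnConvolution.Conv2DForward` on an NVIDIA host, ask 1 below is satisfied and this issue reduces to documentation.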

Ask

  1. Verify via instrumentation (or direct Tensors code inspection) whether the engine auto-routes through cuDNN/cuBLAS when they're available. If yes: document this and close.
  2. If not: wrap the three major layer ops (Conv2D, BatchNorm, MatMul) with an AiDotNet-side dispatcher:

```csharp
public static class GpuOptimalDispatch
{
    public static Tensor Conv2D(Tensor input, Tensor kernel, ...)
    {
        if (CuDnnConvolution.IsAvailable)
            return CuDnnConvolution.ForwardWithCache(input, kernel, ...); // needs new overload
        return AiDotNetEngine.Current.Conv2D(input, kernel, ...);
    }

    // similar for BatchNorm, MatMul
}
```

The public `Engine.*` methods stay unchanged; `GpuOptimalDispatch` is used from `Conv2DLayer` / `BatchNormalizationLayer` / `DenseLayer` when `T == float` and cuDNN is available.
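The `T == float` guard from the layer side could look like the sketch below. This is an illustration only: `GpuOptimalDispatch.MatMul`, `CuBlasMatMul.IsAvailable`, and the `_weightsFloat` cache are hypothetical names building on this issue's proposal, not confirmed APIs.

```csharp
// Hedged sketch of a DenseLayer-style forward pass: take the cuBLAS
// fast path only when the element type is float AND the NVIDIA backend
// is actually present; otherwise fall through to the generic engine.
public Tensor<T> Forward(Tensor<T> input)
{
    if (typeof(T) == typeof(float) && CuBlasMatMul.IsAvailable)
    {
        // Generic-to-concrete cast; safe because of the typeof check above.
        var floatInput = (Tensor<float>)(object)input;
        var result = GpuOptimalDispatch.MatMul(floatInput, _weightsFloat);
        return (Tensor<T>)(object)result;
    }
    return Engine.MatMul(input, _weights); // existing generic path, unchanged
}
```

The double cast through `object` is the usual C# idiom for specializing a generic method on a concrete type without reflection; the JIT elides the `typeof` branch for non-float instantiations.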

Acceptance

  • Tests on an NVIDIA host verify `CuDnnConvolution.Conv2DForward` is actually invoked (can be asserted via `PerformanceProfiler` trace once `EnableTensorsOpProfiling()` is on).
  • CPU-only hosts + non-NVIDIA GPUs fall through to existing paths with no change.
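Both acceptance bullets can be covered by one gated test. A sketch in xUnit style, assuming the profiler API named above; the early return makes the test a no-op on CPU-only and non-NVIDIA hosts, which also exercises the fall-through requirement:

```csharp
// Hedged acceptance-test sketch. CuDnnConvolution.IsAvailable,
// PerformanceProfiler, and Conv2DLayer's constructor signature are
// assumptions based on names used in this issue.
[Fact]
public void Conv2D_Routes_Through_CuDnn_On_Nvidia()
{
    if (!CuDnnConvolution.IsAvailable)
        return; // CPU-only / non-NVIDIA host: existing path, nothing to assert

    PerformanceProfiler.EnableTensorsOpProfiling();

    var layer = new Conv2DLayer(inChannels: 3, outChannels: 8, kernelSize: 3);
    layer.Forward(Tensor.Random(new[] { 1, 3, 32, 32 }));

    Assert.Contains(PerformanceProfiler.Trace,
        e => e.Name.Contains("CuDnnConvolution.Conv2DForward"));
}
```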

Relationship

Blocked on: nothing. Can be done as a follow-up after the current Tensors-parity PR lands.
Parallels: a Tensors-side fix would be cleaner: Tensors' `DirectGpuEngine` should auto-route to cuDNN for Conv2D/BatchNorm whenever its backend is CUDA and cuDNN is available. That is the real fix; the `GpuOptimalDispatch` wrapper above is a workaround.
