Overview
Extend the IL kernel generator's SIMD reduction support beyond the current Sum/Max/Min element-wise operations.
Parent issue: #545
Current State
| Reduction | Element-wise (axis=null) | Axis Reduction |
|-----------|--------------------------|----------------|
| Sum | ✅ SIMD | ❌ NDIterator |
| Max | ✅ SIMD | ❌ NDIterator |
| Min | ✅ SIMD | ❌ NDIterator |
| Prod | ❌ Scalar | ❌ NDIterator |
| ArgMax | ❌ Scalar | ❌ NDIterator |
| ArgMin | ❌ Scalar | ❌ NDIterator |
Existing SIMD code: ILKernelGenerator.cs:3996-4095
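For reference, a SIMD reduction typically has three phases: lane-wise (vertical) accumulation, a horizontal reduction across lanes, and a scalar tail for leftover elements. The existing Sum/Max/Min paths presumably follow a similar shape. The sketch below shows that pattern for Sum over a contiguous double buffer in plain C#. It assumes .NET 7+ (for the Vector256 operators and Vector256.Sum). SimdSumSketch is a hypothetical name, and this is not the IL that ILKernelGenerator.cs emits.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper, not the actual ILKernelGenerator output (which is emitted as IL).
public static class SimdSumSketch
{
    public static double Sum(ReadOnlySpan<double> data)
    {
        // View the buffer as whole Vector256<double> chunks (4 doubles each);
        // leftover elements are not part of this view.
        ReadOnlySpan<Vector256<double>> vectors =
            MemoryMarshal.Cast<double, Vector256<double>>(data);

        // 1. Vertical phase: lane-wise accumulation.
        Vector256<double> acc = Vector256<double>.Zero;
        foreach (Vector256<double> v in vectors)
            acc += v;

        // 2. Horizontal phase: collapse the four lanes into one scalar.
        double sum = Vector256.Sum(acc);

        // 3. Scalar tail for elements that did not fill a whole vector.
        for (int i = vectors.Length * Vector256<double>.Count; i < data.Length; i++)
            sum += data[i];

        return sum;
    }
}
```

Because the lanes are summed independently, the floating-point summation order differs from a sequential loop, so results can differ in the last bits for non-integer data.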
Task List
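Tier 1: Element-wise Improvements
- SIMD Prod (product reduction). .NET has no Vector256.HorizontalMultiply(), so the final product across lanes must be reduced manually.
- SIMD ArgMax/ArgMin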
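Since there is no horizontal-multiply helper, one option for SIMD Prod is to multiply lane-wise into a Vector256 accumulator initialized to 1, then fold the accumulator by multiplying its lower and upper 128-bit halves and finally the two remaining lanes. This is a minimal sketch assuming contiguous double data and .NET 7+; SimdProdSketch is a hypothetical name, not NumSharp code.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper illustrating a SIMD product reduction.
public static class SimdProdSketch
{
    public static double Prod(ReadOnlySpan<double> data)
    {
        ReadOnlySpan<Vector256<double>> vectors =
            MemoryMarshal.Cast<double, Vector256<double>>(data);

        // Start every lane at the multiplicative identity.
        Vector256<double> acc = Vector256.Create(1.0);
        foreach (Vector256<double> v in vectors)
            acc *= v; // lane-wise multiply

        // Manual horizontal product: fold 256 -> 128 bits, then multiply the last two lanes.
        Vector128<double> half = acc.GetLower() * acc.GetUpper();
        double prod = half.GetElement(0) * half.GetElement(1);

        // Scalar tail for elements that did not fill a whole vector.
        for (int i = vectors.Length * Vector256<double>.Count; i < data.Length; i++)
            prod *= data[i];

        return prod;
    }
}
```

For example, SimdProdSketch.Prod(new double[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }) returns 3628800. As with Sum, the multiplication order differs from a sequential loop, which matters only for rounding with non-integer inputs.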
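Tier 2: Axis Reductions
- Axis reductions are not vectorized: Default.Reduction.AMax.cs (like the other Default.Reduction.*.cs files) uses slow NDIterator loops.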
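To illustrate how an axis reduction can be vectorized instead of walked element by element, here is a sketch for Sum over a C-contiguous (row-major) 2-D double matrix stored as a flat array. For axis=1 each row is contiguous, so every row is an ordinary SIMD reduction. For axis=0 whole rows are added into an accumulator row, vectorized across columns, which also reads memory sequentially. Assumptions: contiguous storage, .NET 7+, double only; AxisSumSketch and its methods are hypothetical and do not reflect any actual AxisReductionKernel signature.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helpers for a rows x cols row-major matrix m.
public static class AxisSumSketch
{
    // axis=1 (inner axis): reduce each contiguous row with SIMD.
    public static double[] SumAxis1(double[] m, int rows, int cols)
    {
        var result = new double[rows];
        for (int r = 0; r < rows; r++)
        {
            ReadOnlySpan<double> row = m.AsSpan(r * cols, cols);
            ReadOnlySpan<Vector256<double>> vecs =
                MemoryMarshal.Cast<double, Vector256<double>>(row);

            Vector256<double> acc = Vector256<double>.Zero;
            foreach (Vector256<double> v in vecs)
                acc += v;

            double s = Vector256.Sum(acc);
            for (int c = vecs.Length * Vector256<double>.Count; c < cols; c++)
                s += row[c]; // scalar tail of the row
            result[r] = s;
        }
        return result;
    }

    // axis=0 (outer axis): accumulate rows into a result row, 4 columns at a time.
    public static double[] SumAxis0(double[] m, int rows, int cols)
    {
        var result = new double[cols];
        Span<Vector256<double>> accVecs =
            MemoryMarshal.Cast<double, Vector256<double>>(result.AsSpan());
        int tail = accVecs.Length * Vector256<double>.Count;

        for (int r = 0; r < rows; r++)
        {
            ReadOnlySpan<double> row = m.AsSpan(r * cols, cols);
            ReadOnlySpan<Vector256<double>> rowVecs =
                MemoryMarshal.Cast<double, Vector256<double>>(row);

            for (int i = 0; i < accVecs.Length; i++)
                accVecs[i] += rowVecs[i]; // lane-wise add into the accumulator row
            for (int c = tail; c < cols; c++)
                result[c] += row[c]; // scalar tail columns
        }
        return result;
    }
}
```

For the 2 x 5 matrix { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }, SumAxis1 gives { 15, 40 } and SumAxis0 gives { 7, 9, 11, 13, 15 }. Non-contiguous or strided views would still need a fallback path.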
Files to Modify
| File | Changes |
|------|---------|
| ILKernelGenerator.cs | Add Prod/ArgMax/ArgMin to element-wise SIMD paths |
| ReductionKernel.cs | Add AxisReductionKernel delegate if needed |
| Default.Reduction.*.cs | Refactor axis reductions to use IL kernels |
| DefaultEngine.ReductionOp.cs | Wire up new kernels |
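For the ArgMax/ArgMin part of the ILKernelGenerator.cs change, one common technique is to keep a vector of running per-lane maxima alongside a vector of the indices they came from, update both with a compare-and-select, and resolve ties across lanes at the end so that the first occurrence wins (matching NumPy's argmax). This sketch assumes double data and .NET 7+, and it deliberately ignores NaN handling (NumPy returns the index of the first NaN; this code does not). SimdArgMaxSketch is a hypothetical name. ArgMin is the same with LessThan and "<".

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper illustrating a SIMD argmax; NaN handling is intentionally omitted.
public static class SimdArgMaxSketch
{
    public static long ArgMax(ReadOnlySpan<double> data)
    {
        if (data.IsEmpty)
            throw new ArgumentException("attempt to get argmax of an empty sequence", nameof(data));

        ReadOnlySpan<Vector256<double>> vectors =
            MemoryMarshal.Cast<double, Vector256<double>>(data);
        int width = Vector256<double>.Count; // always 4 for double

        long best = 0;
        double bestValue = data[0];
        int tailStart = 1;

        if (vectors.Length > 0)
        {
            // Per-lane running maxima and the element index each came from.
            Vector256<double> maxVals = vectors[0];
            Vector256<long> maxIdx = Vector256.Create(0L, 1L, 2L, 3L);
            Vector256<long> curIdx = maxIdx;
            Vector256<long> step = Vector256.Create((long)width);

            for (int i = 1; i < vectors.Length; i++)
            {
                curIdx += step;
                // Strict '>' keeps the earlier index when a lane sees a tie.
                Vector256<double> gt = Vector256.GreaterThan(vectors[i], maxVals);
                maxVals = Vector256.ConditionalSelect(gt, vectors[i], maxVals);
                maxIdx = Vector256.ConditionalSelect(gt.AsInt64(), curIdx, maxIdx);
            }

            // Horizontal phase: largest value wins; equal values resolve to the smaller index.
            best = maxIdx.GetElement(0);
            bestValue = maxVals.GetElement(0);
            for (int lane = 1; lane < width; lane++)
            {
                double v = maxVals.GetElement(lane);
                long idx = maxIdx.GetElement(lane);
                if (v > bestValue || (v == bestValue && idx < best))
                {
                    bestValue = v;
                    best = idx;
                }
            }
            tailStart = vectors.Length * width;
        }

        // Scalar tail: these indices are all larger, so strict '>' preserves first occurrence.
        for (int i = tailStart; i < data.Length; i++)
        {
            if (data[i] > bestValue)
            {
                bestValue = data[i];
                best = i;
            }
        }
        return best;
    }
}
```

For example, SimdArgMaxSketch.ArgMax(new double[] { 0, 9, 0, 0, 9, 0, 0, 0 }) returns 1: lane 0 ends up holding the 9 at index 4, but the cross-lane tie-break picks the smaller index from lane 1.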
Benchmarks to Add
```csharp
// _array: 1-D array with 10M elements; _matrix: 2-D array (both initialized in the benchmark setup)
[Benchmark] public double Prod_10M() => np.prod(_array);
[Benchmark] public int ArgMax_10M() => np.argmax(_array);
[Benchmark] public NDArray Sum_Axis0() => np.sum(_matrix, axis: 0);
[Benchmark] public NDArray Max_Axis1() => np.amax(_matrix, axis: 1);
```
Success Criteria
- Prod element-wise: ≥1.5× faster than the current scalar path
- ArgMax/ArgMin element-wise: ≥1.5× faster than the current scalar path
- Inner-axis reductions: ≥2× faster than the current NDIterator path
- All existing reduction tests pass