Commit 3978fe2
[NPU] Add optimized NPU mhc (#1173)
Add Ascend NPU Triton kernels for the three mHC sub-operators:
- Fused matmul + RMS normalization (forward/backward)
- Sinkhorn routing with split pre/post/residual coefficients (forward/backward)
- Pre-aggregate weighted sum (forward/backward)
- Post + residual mixing (forward/backward)
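
For readers unfamiliar with the Sinkhorn step, here is a minimal pure-Python sketch of generic Sinkhorn-Knopp normalization: alternately rescale the rows and columns of a positive matrix until it is approximately doubly stochastic. It is an illustration only and is not taken from this commit. The function name `sinkhorn_normalize`, the use of `exp` on the logits, and the iteration count are assumptions, and the actual kernels are written in Triton and also split pre/post/residual coefficients, which this sketch does not attempt.

```python
import math


def sinkhorn_normalize(logits, num_iters=50):
    """Hypothetical reference: Sinkhorn-Knopp on a small square matrix.

    `logits` is a list of rows. Entries are exponentiated so the matrix
    is strictly positive, then rows and columns are rescaled in turn.
    """
    m = [[math.exp(v) for v in row] for row in logits]
    for _ in range(num_iters):
        # Row step: make every row sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column step: make every column sum to 1.
        col_sums = [sum(col) for col in zip(*m)]
        m = [[v / s for v, s in zip(row, col_sums)] for row in m]
    return m


example = sinkhorn_normalize([[0.0, 1.0, 2.0],
                              [1.0, 0.5, 0.0],
                              [2.0, 0.0, 1.0]])
row_sums = [sum(row) for row in example]
col_sums = [sum(col) for col in zip(*example)]
```

After enough iterations both `row_sums` and `col_sums` should be close to 1 for every row and column, which is the property the routing step relies on.
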
NPU optimizations applied:
- Unified UB tiling via compute_default_tiling_strategy for matrix
- Persistent grid-stride loops (tl.range + num_programs)
- Adaptive BLOCK_N/BLOCK_M for core utilisation at small seq_len
- Fused backward coefficient assembly kernel
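
The persistent grid-stride and adaptive block-size items can be sketched in plain Python, without Triton or NPU hardware. This is an assumption-laden illustration, not the code in this commit: `grid_stride_indices` mimics what a loop of the form `tl.range(pid, n, num_programs)` would visit for one program, and `pick_block_size` is a hypothetical heuristic that halves the block size until there are at least as many blocks as cores.

```python
def grid_stride_indices(pid, num_programs, n_items):
    """Items visited by program `pid` when a fixed number of programs
    stride over `n_items` work items (persistent grid-stride loop)."""
    return list(range(pid, n_items, num_programs))


def pick_block_size(n_rows, num_cores, max_block=64):
    """Hypothetical heuristic: shrink the block size (by halving) until
    the number of blocks is at least `num_cores`, so that small inputs
    still occupy every core."""
    block = max_block
    while block > 1 and (n_rows + block - 1) // block < num_cores:
        block //= 2
    return block


# Every work item is visited exactly once across all programs.
num_programs, n_items = 4, 10
visited = sorted(
    i
    for pid in range(num_programs)
    for i in grid_stride_indices(pid, num_programs, n_items)
)

small = pick_block_size(128, 20)      # few rows: block size shrinks
large = pick_block_size(100000, 20)   # many rows: block size stays at max
```

The design point is that the launch grid stays fixed at the core count while each program loops over its share of the work, and the block size only shrinks when the input is too small to give every core a block.
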
Hardware Type: Atlas 800I A2
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
1 parent 2ca3bd0 commit 3978fe2
File tree
2 files changed (+1686, -0) under src/liger_kernel/ops/backends/_ascend/ops