Context
Tensors 0.46.0 exposes `ICompiledPlan.ThenAsync(ICompiledPlan next)`, which chains two compiled plans into a pipeline that can run them on overlapping streams. AiDotNet does not use this today: each sub-model has its own standalone `CompiledModelHost`, and each `Predict` call is a synchronous compile+execute.
Opportunity
Models with natural multi-stage inference should chain their plans:
- `VAEModelBase`: encoder → sampler → decoder
- `DiffusionModelBase`: the same sub-net run 50× (the sub-net's plan could be chained to itself N times)
- Noise predictors with separate time-embedding + main-net passes
Chaining gives:
- Pipelining: stage 2 of batch N overlaps with stage 1 of batch N+1
- One compile call per pipeline instead of per stage
- A potentially shared stream across stages, reducing dispatch overhead
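The pipelining point can be illustrated with a minimal sketch. It does not use the Tensors API: plain async delegates stand in for compiled plans, and `PipelineSketch` and `RunPipelinedAsync` are hypothetical names introduced only for this example. It shows the scheduling pattern only, where stage 1 of batch N+1 is started before stage 2 of batch N is awaited.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical illustration: async delegates stand in for compiled plans.
// Nothing here is part of Tensors or AiDotNet.
public static class PipelineSketch
{
    public static async Task<List<TOut>> RunPipelinedAsync<TIn, TMid, TOut>(
        IReadOnlyList<TIn> batches,
        Func<TIn, Task<TMid>> stage1,
        Func<TMid, Task<TOut>> stage2)
    {
        var results = new List<TOut>(batches.Count);
        if (batches.Count == 0)
        {
            return results;
        }

        // Start stage 1 for the first batch.
        Task<TMid> pendingStage1 = stage1(batches[0]);

        for (int i = 0; i < batches.Count; i++)
        {
            TMid mid = await pendingStage1;

            // Start stage 1 of batch i+1 before awaiting stage 2 of batch i,
            // so the two can overlap when the stages are truly asynchronous.
            if (i + 1 < batches.Count)
            {
                pendingStage1 = stage1(batches[i + 1]);
            }

            results.Add(await stage2(mid));
        }

        return results;
    }
}
```

Results come back in batch order. Whether any real overlap happens depends on the stages actually running asynchronously (for example on separate streams), which this sketch does not model.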
Out of scope for the main Tensors-parity PR
This is an advanced integration requiring per-model-architecture audit: which sub-models have their own plans, which outputs/inputs thread together, whether the types match. Scope this separately.
Suggested path
- Audit `src/Diffusion/`, `src/NeuralNetworks/VariationalAutoencoder.cs`, and similar multi-stage models for sub-plan opportunities.
- Add a `ChainedCompiledModelHost` that accepts 2+ `CompiledModelHost` instances and chains them with `ThenAsync`.
- Add integration tests verifying that end-to-end output matches the sequential-`Predict` baseline.
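As a starting point for discussion, here is one possible shape for the helper. This is a sketch under stated assumptions: the return type and execution model of `ICompiledPlan.ThenAsync` and the surface of `CompiledModelHost` are not described in this issue, so `IStagePlan`, `DelegateStage`, and `ChainedHostSketch` are hypothetical stand-ins, and tensors are simplified to `float[]`. A real implementation would delegate the chaining to `ThenAsync` instead of awaiting each stage in turn.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a compiled plan. NOT the Tensors ICompiledPlan interface.
public interface IStagePlan
{
    Task<float[]> ExecuteAsync(float[] input);
}

// Wraps a plain function as a stage so the sketch can run without Tensors.
public sealed class DelegateStage : IStagePlan
{
    private readonly Func<float[], float[]> _fn;

    public DelegateStage(Func<float[], float[]> fn) =>
        _fn = fn ?? throw new ArgumentNullException(nameof(fn));

    public Task<float[]> ExecuteAsync(float[] input) => Task.Run(() => _fn(input));
}

// Hypothetical shape for the proposed ChainedCompiledModelHost.
public sealed class ChainedHostSketch
{
    private readonly IReadOnlyList<IStagePlan> _stages;

    public ChainedHostSketch(params IStagePlan[] stages)
    {
        if (stages is null || stages.Length < 2)
        {
            throw new ArgumentException("At least two stages are required.", nameof(stages));
        }

        _stages = stages.ToArray();
    }

    // Feeds each stage's output into the next stage, in order.
    public async Task<float[]> PredictAsync(float[] input)
    {
        float[] current = input;
        foreach (IStagePlan stage in _stages)
        {
            current = await stage.ExecuteAsync(current);
        }

        return current;
    }
}
```

With this shape, the integration test reduces to running the stages one by one as the baseline and asserting that `PredictAsync` returns the same values. The diffusion case could pass the same stage repeatedly, for example `Enumerable.Repeat(stage, 50).ToArray()`.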
Estimated scope
~300 LOC per model family it's applied to; ~800 LOC for the generic helper + tests.