
Use ICompiledPlan.ThenAsync for multi-stage model chaining (VAE, diffusion) #1157

@ooples

Description


Context

Tensors 0.46.0 exposes `ICompiledPlan.ThenAsync(ICompiledPlan next)`, which chains two compiled plans into a pipeline that can run them on overlapping streams. AiDotNet doesn't use this today: each sub-model has its own standalone `CompiledModelHost`, and each `Predict` call is a synchronous compile-and-execute.
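For orientation, a minimal sketch of the intended usage. Only `ICompiledPlan.ThenAsync` comes from the Tensors changelog; `Compile()`, `ExecuteAsync`, and the model variables are assumed names for illustration:

```csharp
// Sketch only — assumes a Compile()/ExecuteAsync surface alongside
// the documented ICompiledPlan.ThenAsync(ICompiledPlan next).
ICompiledPlan encoderPlan = encoderModel.Compile();
ICompiledPlan decoderPlan = decoderModel.Compile();

// One chained plan: the encoder's outputs feed the decoder, and the
// runtime is free to overlap the two stages on separate streams.
ICompiledPlan pipeline = encoderPlan.ThenAsync(decoderPlan);
var output = await pipeline.ExecuteAsync(input);
```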

Opportunity

Models with natural multi-stage inference should chain their plans:

  • `VAEModelBase`: encoder → sampler → decoder
  • `DiffusionModelBase`: the denoising loop runs the same sub-net ~50 times, so its plan could be chained to itself N times
  • Noise predictors with separate time-embedding + main-net passes
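The diffusion case above reduces to folding one plan into itself. A hypothetical sketch, assuming `ThenAsync` returns a new composable plan (all other names are illustrative):

```csharp
// Chain a compiled denoising step to itself `steps` times, producing a
// single pipeline that covers the whole sampling loop.
ICompiledPlan BuildDiffusionPipeline(ICompiledPlan subNetPlan, int steps)
{
    ICompiledPlan pipeline = subNetPlan;
    for (int i = 1; i < steps; i++)
        pipeline = pipeline.ThenAsync(subNetPlan);
    return pipeline; // one compile-time chain instead of N dispatches
}
```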

Chaining gives:

  • Pipelining: stage 2 of batch N overlaps with stage 1 of batch N+1
  • One compile call per pipeline instead of per stage
  • Potential shared stream for reduced dispatch

Out of scope for the main Tensors-parity PR

This is an advanced integration that requires a per-model-architecture audit: which sub-models have their own plans, how their outputs and inputs thread together, and whether the tensor types match. Scope it separately.

Suggested path

  1. Audit `src/Diffusion/`, `src/NeuralNetworks/VariationalAutoencoder.cs`, and similar multi-stage models for sub-plan opportunities.
  2. Add a `ChainedCompiledModelHost` that accepts two or more `CompiledModelHost` instances and fuses their plans with `ThenAsync`.
  3. Integration tests verifying end-to-end output matches the sequential-Predict baseline.
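One possible shape for step 2, purely as a sketch: a `.Plan` property on `CompiledModelHost`, an `ExecuteAsync` on the plan, and the `Tensor` type are all assumptions — only `ICompiledPlan.ThenAsync` is given:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Owns N sub-hosts and exposes their plans as one chained pipeline.
public sealed class ChainedCompiledModelHost
{
    private readonly ICompiledPlan _pipeline;

    public ChainedCompiledModelHost(params CompiledModelHost[] stages)
    {
        if (stages.Length < 2)
            throw new ArgumentException("Need at least two stages to chain.", nameof(stages));

        // Fold the stage plans left-to-right into a single pipeline.
        _pipeline = stages.Skip(1).Aggregate(
            stages[0].Plan,
            (acc, host) => acc.ThenAsync(host.Plan));
    }

    public Task<Tensor> PredictAsync(Tensor input) => _pipeline.ExecuteAsync(input);
}
```

The left-to-right fold keeps stage order identical to the sequential-`Predict` baseline, which is what the integration tests in step 3 would compare against.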

Estimated scope

~300 LOC per model family it's applied to; ~800 LOC for the generic helper plus tests.
