SparseFlow is a commercial inference product that reduces LLM inference cost on NVIDIA GPUs using structured sparsity.
This public repository is intentionally limited to product-facing material:
- Public benchmark framing
- Evaluation guidance
- Commercial access details
- Repository metadata for buyers and evaluators
Core implementation, kernel code, runtime internals, compiler passes, and deployment logic are kept private.
Current public benchmark framing for SparseFlow:
- 1.4x average speedup on validated production benchmark shapes
- 1.6x-1.7x peak gains on FFN-heavy inference paths
- 30-40% potential inference cost reduction on a good workload fit
- Zero model changes required to start evaluating
These are buyer-facing benchmark summaries, not a full public source release of the implementation.
- Validated: A100 (primary benchmark platform), RTX 3090
- In active validation: RTX 4090
- Architecturally supported directionally, but not yet publicly validated here: H100, additional RTX 30/40 variants
- Free benchmark review: Share representative model shapes, configs, or workload details and get a lightweight screening read on expected upside.
- Paid pilot: Validate SparseFlow against your workload and deployment path in a deeper engineering engagement.
For evaluation access, pilot discussions, or commercial conversations:
- Founder contact: gourav.kumar@maplesilicon.co
- General inquiries: info@maplesilicon.co
- Public website: maplesilicon.co
The materials published in this public repository remain under the existing MIT license. Commercial product access is handled separately.