SparseFlow is a commercial inference product that reduces LLM inference cost on NVIDIA GPUs using structured sparsity.
This public repository is intentionally limited to product-facing material:
- Public benchmark framing
- Evaluation guidance
- Commercial access details
- Repository metadata for buyers and evaluators
Core implementation, kernel code, runtime internals, compiler passes, and deployment logic are kept private.
Current public benchmark framing for SparseFlow:
- 1.4x average speedup on validated production benchmark shapes
- 1.6x-1.7x peak gains on FFN-heavy inference paths
- 30-40% potential inference cost reduction on a good workload fit
- Zero model changes required to start evaluating
These are buyer-facing benchmark summaries, not a full public source release of the implementation.
- Validated: A100 (primary benchmark platform), RTX 3090
- In active validation: RTX 4090
- Architecturally supported directionally, but not yet publicly validated here: H100, additional RTX 30/40 variants
- Free benchmark review: Share representative model shapes, configs, or workload details and get a lightweight screening read on expected upside.
- Paid pilot: Validate SparseFlow against your workload and deployment path in a deeper engineering engagement.
For evaluation access, pilot discussions, or commercial conversations:
- Founder contact: gourav.kumar@maplesilicon.co
- General inquiries: info@maplesilicon.co
- Public website: maplesilicon.co
The materials published in this public repository remain under the existing MIT license. Commercial product access is handled separately.