Assorted optimizations for sparse dense matrix multiplication #666
Merged
dkarrasch merged 10 commits into JuliaSparse:main on Apr 20, 2026
Conversation
Better performance for mutable types like BigFloat
…throwing function
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff            @@
##             main     #666     +/-  ##
=========================================
+ Coverage   84.36%   84.42%   +0.05%
=========================================
  Files          13       13
  Lines        9346     9400      +54
=========================================
+ Hits         7885     7936      +51
- Misses       1461     1464       +3
Member
Is it possible to update tests to increase the coverage?
Contributor
Author
I've added more direct tests for both the multiplications and the error checking.
ViralBShah
reviewed
Jan 6, 2026
Member
Let's give this a couple more days and merge.
dkarrasch
reviewed
Jan 6, 2026
dkarrasch pushed a commit that referenced this pull request on Apr 20, 2026
dkarrasch added a commit that referenced this pull request on Apr 20, 2026
The change is mainly based on testing of the `_spmul!(C::StridedMatrix, X::DenseMatrixUnion, A::SparseMatrixCSCUnion2, α::Number, β::Number)` function, which is also the function used for my performance numbers below. I've then applied the same improvements to other similar functions, though the gain for some of them may not be as big (since not all of the functions touched are amenable to vectorization).

The main improvement is to hoist the matrix size and pointer accesses out of the loop to work around JuliaLang/julia#60409. This change has as much as a 2x performance impact for complex numbers (it can be even bigger on armv8.3-a and above, i.e. including all Apple processors, by better triggering LLVM's complex-number multiplication pattern matching with llvm/llvm-project#173818).
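A minimal sketch of the hoisting idea (a hypothetical simplified kernel, not the actual `_spmul!`): the `rowvals`/`nonzeros` arrays of `A` and the row count of `X` are read once before the loops, so the compiler does not have to reload them on every iteration.

```julia
using SparseArrays

# Hypothetical, simplified kernel computing C .+= X * A over the CSC
# structure of A. The field accesses (rowvals, nonzeros) and size(X, 1)
# are hoisted out of the loops instead of being re-read per iteration.
function spmul_sketch!(C::Matrix, X::Matrix, A::SparseMatrixCSC)
    size(X, 2) == size(A, 1) || throw(DimensionMismatch())
    rv = rowvals(A)    # hoisted: row indices of stored entries
    nz = nonzeros(A)   # hoisted: stored values
    m = size(X, 1)     # hoisted: inner-loop trip count
    for j in 1:size(A, 2)
        for k in nzrange(A, j)
            a = nz[k]
            col = rv[k]
            @inbounds @simd for i in 1:m
                C[i, j] += X[i, col] * a
            end
        end
    end
    return C
end
```

With the loads hoisted, the innermost loop only touches `C`, `X`, and locals, which is what lets LLVM vectorize it.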
Adding `muladd` is the second most important change; it mostly affects complex numbers and BigFloat, since the cost of the operations saved is more significant relative to the bare memory accesses.

There are also other minor tweaks that are mostly useful for small matrices (~20% impact for a ~10x10 matrix). These were included mainly because I was working on another optimization that may not work well for small cases; I applied these optimizations to the small-matrix paths so that I can make a fair comparison of the effect of the other change. I'm not done testing the other change yet, but these small fixes are ready, so I've included them here.
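To illustrate the `muladd` point (a generic accumulation kernel as a sketch, not the PR's actual code): replacing `s += x[i] * y[i]` with `muladd(x[i], y[i], s)` permits a fused multiply-add, which saves comparatively more for element types like Complex and BigFloat where each multiply and add is expensive.

```julia
# Hypothetical accumulation kernel: muladd(a, b, c) computes a*b + c but
# allows the compiler (or the type's muladd method) to fuse the multiply
# and add instead of materializing the intermediate product.
function dot_muladd(x::AbstractVector, y::AbstractVector)
    s = zero(eltype(x)) * zero(eltype(y))
    @inbounds for i in eachindex(x, y)
        s = muladd(x[i], y[i], s)
    end
    return s
end
```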