This issue is filed by Claude based on the analysis it did.
--
Description
When running the full test suite with parallelize = True and multiple threads, an intermittent test failure occurs in CI. The failure is always in Raster Pairwise (or Network Pairwise, which shares the same solver code path). The failure manifests as a large residual in resistance comparisons, suggesting the CG solver returned corrupted results.
Reproduction
Occurs intermittently on all platforms in CI (macOS aarch64, ubuntu x86, Windows x64)
Most frequent on Windows (~1 in 3 runs)
Never reproduced locally on macOS/ARM with 8 threads over 10+ stress test trials
All platforms use JULIA_NUM_THREADS=2 in CI
Example failures
macOS aarch64: Network Pairwise, diff 0.125
ubuntu x86: Raster Pairwise, diff 3.65
Windows x64: Raster Pairwise, diffs 11.4 and 683804
Architecture
The parallel CG+AMG path uses Threads.@spawn to solve pairs concurrently. Each task:
Creates its own AMG preconditioner workspace via deepcopy(ml.workspace)
Shares the AMG hierarchy (levels, final_A, coarse_solver, smoothers) — all read-only
Calls Krylov.cg() which creates a fresh CgSolver per call
File I/O is serialized via IO_LOCK
Cumulative map accumulation is serialized via CUM_LOCK
All result arrays (ret, v, current, current_map) are task-local
All shared state appears to be read-only. The GaussSeidel smoother is immutable and stateless. Sparse matrix-vector multiply is pure Julia with no global state.
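The task layout described above can be sketched with Base Julia alone. This is a toy model, not the project's code: `Hierarchy`, `Workspace`, `solve_pair`, and `solve_all` are hypothetical stand-ins, and the "solve" is a diagonal scaling rather than CG. Only `CUM_LOCK`, `deepcopy`, and `Threads.@spawn` come from the issue text.

```julia
using Base.Threads: @spawn

# Hypothetical stand-in for the shared AMG hierarchy: never written after construction.
struct Hierarchy
    diag::Vector{Float64}
end

# Hypothetical stand-in for the per-task preconditioner workspace.
mutable struct Workspace
    scratch::Vector{Float64}   # overwritten on every solve
end

const CUM_LOCK = ReentrantLock()   # name taken from the issue; guards accumulation

# Toy "solve": divides the right-hand side by the diagonal, using the workspace
# as scratch. It mimics the memory access pattern only; it is not CG.
function solve_pair(h::Hierarchy, ws::Workspace, rhs::Vector{Float64})
    ws.scratch .= rhs ./ h.diag
    return copy(ws.scratch)        # task-local result
end

function solve_all(h::Hierarchy, template::Workspace, rhss)
    cum = zeros(length(h.diag))
    tasks = map(rhss) do rhs
        @spawn begin
            ws = deepcopy(template)      # private scratch buffers for this task
            v = solve_pair(h, ws, rhs)   # reads `h`, writes only `ws`
            lock(CUM_LOCK) do
                cum .+= v                # the only shared write, done under the lock
            end
            v
        end
    end
    return fetch.(tasks), cum
end

h = Hierarchy([2.0, 4.0])
results, cum = solve_all(h, Workspace(zeros(2)), [[2.0, 4.0], [4.0, 8.0]])
```

If the real code follows this shape exactly, a race would have to come from something the sketch does not model, for example a buffer reachable from the hierarchy that is written during a solve.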
Investigation done
deepcopy(ml.workspace) creates independent vectors (not aliased)
GaussSeidel struct is immutable with no hidden mutable state
cg creates fresh workspace per call
@threads :static does not fix the issue
smoothed_aggregation(matrix) per task works but is wasteful
File I/O is serialized via IO_LOCK
Cumulative map accumulation is serialized via CUM_LOCK
postprocess only reads shared data, creates local current maps
Possible causes
deepcopy and concurrent access patterns
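One way to check the "not aliased" claim about `deepcopy` is to compare buffer addresses field by field. The sketch below is an assumption-laden illustration: `ToyWorkspace` and `shares_memory` are hypothetical names, the real workspace may have different or nested fields, and the check only compares the start address of top-level `Array` fields.

```julia
# Hypothetical workspace type; the real AMG workspace layout is not shown in the issue.
struct ToyWorkspace
    x::Vector{Float64}
    res::Vector{Float64}
end

# Returns true if any top-level Array field of `a` starts at the same address
# as the corresponding field of `b`. Nested structs are not inspected.
function shares_memory(a::T, b::T) where {T}
    any(fieldnames(T)) do f
        fa, fb = getfield(a, f), getfield(b, f)
        fa isa Array && fb isa Array && pointer(fa) == pointer(fb)
    end
end

ws = ToyWorkspace(zeros(4), zeros(4))
same = shares_memory(ws, ws)               # an object trivially aliases itself
fresh = shares_memory(ws, deepcopy(ws))    # deepcopy allocates new buffers
```

A check like this says nothing about state held outside the workspace, so a clean result narrows the search rather than ruling out a race.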