
Intermittent parallel CG+AMG test failure on Windows CI #451

@ViralBShah


This issue was filed by Claude, based on its analysis.

--

Description

When running the full test suite with parallelize = True and multiple threads, an intermittent test failure occurs in CI. The failure is always in Raster Pairwise (or Network Pairwise, which shares the same solver code path). The failure manifests as a large residual in resistance comparisons, suggesting the CG solver returned corrupted results.

Reproduction

  • Occurs intermittently on all platforms in CI (macOS aarch64, ubuntu x86, Windows x64)
  • Most frequent on Windows (~1 in 3 runs)
  • Never reproduced locally on macOS/ARM with 8 threads over 10+ stress test trials
  • All platforms use JULIA_NUM_THREADS=2 in CI

Example failures

  • macOS aarch64: Network Pairwise, diff 0.125
  • ubuntu x86: Raster Pairwise, diff 3.65
  • Windows x64: Raster Pairwise, diffs 11.4 and 683804

Architecture

The parallel CG+AMG path uses Threads.@spawn to solve pairs concurrently. Each task:

  1. Creates its own AMG preconditioner workspace via deepcopy(ml.workspace)
  2. Shares the AMG hierarchy (levels, final_A, coarse_solver, smoothers) — all read-only
  3. Calls Krylov.cg() which creates a fresh CgSolver per call
  4. File I/O is serialized via IO_LOCK
  5. Cumulative map accumulation is serialized via CUM_LOCK
  6. All result arrays (ret, v, current, current_map) are task-local

All shared state appears to be read-only. The GaussSeidel smoother is immutable and stateless. Sparse matrix-vector multiply is pure Julia with no global state.
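The per-task pattern described above can be sketched in miniature. Everything here (`Toy`, `solve_pair`, `solve_all`, `CUM_LOCK`'s toy usage) is an illustrative stand-in for the real Raster/Network Pairwise solver path, not Circuitscape's actual API:

```julia
# Minimal runnable sketch of the spawn pattern, under the assumption that
# each task's only mutable state is a deepcopy'd scratch workspace and that
# shared accumulation is serialized through a lock.
const CUM_LOCK = ReentrantLock()          # serializes cumulative accumulation

struct Toy
    workspace::Vector{Float64}            # mutable scratch; must not be shared
end

# Stand-in for the CG+AMG solve: mutates its scratch, returns a result.
solve_pair(i, ws) = (fill!(ws, float(i)); sum(ws))

function solve_all(n, toy, cum)
    tasks = map(1:n) do i
        Threads.@spawn begin
            ws = deepcopy(toy.workspace)  # task-local copy, as in the real code
            r = solve_pair(i, ws)
            lock(CUM_LOCK) do             # accumulation serialized (cf. CUM_LOCK)
                cum[] += r
            end
            r
        end
    end
    fetch.(tasks)                         # results come back in pair order
end

toy = Toy(zeros(4))
cum = Ref(0.0)
results = solve_all(3, toy, cum)
# results == [4.0, 8.0, 12.0]; cum[] == 24.0
```

The property the real code relies on is the same as in this toy: each task mutates only its `deepcopy`'d scratch, while all writes to shared state go through a lock.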

Investigation done

  • Verified deepcopy(ml.workspace) creates independent vectors (not aliased)
  • Verified GaussSeidel struct is immutable with no hidden mutable state
  • Verified Krylov.jl cg creates fresh workspace per call
  • Verified no global mutable state in Krylov code paths
  • @threads :static does not fix the issue
  • Building a fresh smoothed_aggregation(matrix) per task works but is wasteful
  • All file writes protected by IO_LOCK
  • All cumulative accumulation protected by CUM_LOCK
  • postprocess only reads shared data, creates local current maps
  • Related: JuliaLinearAlgebra/AlgebraicMultigrid.jl#125 ("Fix smoothers and add a scratch") restructures smoothers with per-level scratch space
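The first verification in the list (that `deepcopy(ml.workspace)` yields independent, non-aliased vectors) can be checked with a pattern like the following; the vector-of-vectors layout is an assumption about the workspace's shape, not the actual AlgebraicMultigrid.jl structure:

```julia
# Assumed stand-in for an AMG workspace: a collection of scratch vectors.
ws = [zeros(8) for _ in 1:3]
copy_ws = deepcopy(ws)

# Distinct objects at every level: no aliasing between original and copy.
aliased = any(x === y for (x, y) in zip(ws, copy_ws))

# Mutating the copy must not leak back into the original.
copy_ws[1][1] = 1.0
leaked = ws[1][1] != 0.0
# aliased == false, leaked == false
```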

Possible causes

  1. Julia runtime task scheduling issue affecting memory visibility across threads
  2. Subtle interaction between deepcopy and concurrent access patterns
  3. Unknown thread-unsafe code path not yet identified
