Galloc issue #875

Open
calvinp0 wants to merge 4 commits into main from galloc_issue

Conversation

@calvinp0 (Member)

This pull request refactors how ARC handles memory allocation for jobs, standardizing all internal memory calculations to use base-2 (MiB/GiB) units and ensuring consistent, conservative memory requests across different cluster schedulers and Gaussian input files. It introduces explicit memory overhead factors and reserves headroom for non-Gaussian overhead, preventing over-allocation and improving stability, especially on capped nodes. The changes also update tests to reflect the new memory calculations.

Memory calculation and handling improvements:

  • Standardized all internal memory math to use base-2 units (GiB/MiB), introducing constants like MEMORY_GB_TO_MIB, DEFAULT_JOB_MEMORY_OVERHEAD, and CAPPED_JOB_MEMORY_OVERHEAD for clarity and consistency.
  • Refactored set_cpu_and_mem to always compute and store submit script memory in MiB, applying the appropriate overhead factor, and updated memory handling logic for different schedulers (SGE/OGE, PBS, Slurm) to use these MiB values.
  • Added submit_script_memory_mib attribute throughout the codebase for clarity and updated initialization accordingly.
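The unit standardization above can be sketched as follows. This is an illustrative sketch only: the constant names are taken from the PR description, but the helper function, its signature, and the specific overhead values (10% default, 5% capped, per the description below) are assumptions, not ARC's actual implementation.

```python
# Base-2 conversion: 1 GiB = 1024 MiB (constant names from the PR description).
MEMORY_GB_TO_MIB = 1024
DEFAULT_JOB_MEMORY_OVERHEAD = 1.10  # 10% overhead on ordinary nodes (assumed value)
CAPPED_JOB_MEMORY_OVERHEAD = 1.05   # 5% overhead on memory-capped nodes (assumed value)

def submit_script_memory_mib(job_memory_gb: float, capped: bool = False) -> int:
    """Hypothetical helper: convert a job's memory request (GiB) to the
    integer MiB value written into the submit script, including overhead."""
    overhead = CAPPED_JOB_MEMORY_OVERHEAD if capped else DEFAULT_JOB_MEMORY_OVERHEAD
    return int(job_memory_gb * MEMORY_GB_TO_MIB * overhead)
```

For example, a 14 GiB request becomes 15769 MiB on an ordinary node and 15052 MiB on a capped one, so the scheduler request always carries the overhead in base-2 units.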

Gaussian-specific memory allocation:

  • Added a fixed GAUSSIAN_MEMORY_HEADROOM_FRACTION (set to 0.90) so Gaussian jobs only consume a portion of the allocated memory, leaving room for scheduler and runtime overhead. Updated set_input_file_memory to use this fraction when setting %mem in Gaussian input files.
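The headroom rule can be sketched like so. The constant name and the 0.90 value come from the PR description; the helper function and the exact `%mem` formatting are hypothetical, shown only to make the arithmetic concrete.

```python
# Gaussian may use 90% of the submit-script allocation (value from the PR).
GAUSSIAN_MEMORY_HEADROOM_FRACTION = 0.90

def gaussian_mem_directive(submit_script_memory_mib: int) -> str:
    """Hypothetical helper: derive the Gaussian %mem line from the
    submit-script allocation, leaving 10% for non-Gaussian overhead."""
    mem_mib = int(submit_script_memory_mib * GAUSSIAN_MEMORY_HEADROOM_FRACTION)
    return f"%mem={mem_mib}mb"
```

So a 15769 MiB submit-script allocation yields `%mem=14192mb`, and the ~1.5 GiB difference absorbs scheduler and runtime allocations made outside Gaussian's explicit budget.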

Test updates:

  • Updated expected memory values and added coverage in adapter_test.py, gaussian_test.py, and trsh_test.py to reflect the new MiB-based calculations and headroom behavior.

Gaussian's `%mem` directive should not consume the entire scheduler allocation. ARC now sets `%mem` to a fixed fraction (90%) of the submit-script memory, reserving the remaining 10% for non-Gaussian overhead, such as the scheduler, scratch bookkeeping, and Gaussian allocations that occur outside the explicit memory budget.

This prevents job failures on capped nodes where the physical memory is slightly less than the requested amount and avoids triggering Gaussian allocation (galloc) errors or scheduler kills.
Replaces the hardcoded 95% memory allocation cap with a configurable value from default_job_settings.

Additionally, ensures troubleshooting terminates with a clear error message if a job requires more memory than the configured node-level cap allows. This prevents potential infinite resubmission loops when memory is the limiting factor for Gaussian or other ESS jobs.
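The loop-prevention logic can be illustrated as follows. The function name, the doubling growth policy, and the error wording are all assumptions for the sketch; the PR only specifies that troubleshooting must terminate with a clear error once the configured node-level cap is reached.

```python
def trsh_memory_mib(requested_mib: int, node_cap_mib: int) -> int:
    """Hypothetical troubleshooting step: grow the memory request, but fail
    loudly once the node-level cap is reached instead of resubmitting forever."""
    if requested_mib >= node_cap_mib:
        raise ValueError(
            f"Cannot troubleshoot further: requested memory ({requested_mib} MiB) "
            f"already meets the node cap ({node_cap_mib} MiB)."
        )
    # Grow the request (doubling is an assumed policy), but never exceed the cap.
    return min(requested_mib * 2, node_cap_mib)
```

The key property is the explicit raise: without it, a memory-limited Gaussian job would be resubmitted with the same (or a silently clamped) request indefinitely.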
Centralizes memory unit conversions using explicit MiB/GiB constants and clarifies ARC's use of base-2 units internally.

This update ensures consistent application of memory overhead (10% default, or 5% when capped) across different schedulers, and fixes an inconsistency in PBS memory requests: MiB values are now passed directly to the template rather than being converted to bytes.
Adds the `submit_script_memory_mib` attribute to the common job adapter and clarifies via comments in PBS submission templates that ARC passes memory values as base-2 integers (MiB). This ensures the internal standardization to MiB is correctly reflected in the adapter state and documented for scheduler directives.
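The per-scheduler handling can be sketched as below. The directive strings are typical examples of each scheduler's syntax, not ARC's actual submit templates, and the function is hypothetical; the point is that every branch now consumes the same base-2 MiB integer.

```python
def scheduler_memory_directive(scheduler: str, memory_mib: int, cpus: int = 1) -> str:
    """Hypothetical helper: emit a scheduler memory directive from a MiB value.
    PBS now receives MiB directly (no byte conversion)."""
    if scheduler == "pbs":
        # PBS: total job memory; the `mb` suffix is read as base-2 on these systems.
        return f"#PBS -l mem={memory_mib}mb"
    if scheduler == "slurm":
        # Slurm: a bare number for --mem-per-cpu is interpreted as MiB.
        return f"#SBATCH --mem-per-cpu={memory_mib // cpus}"
    if scheduler in ("sge", "oge"):
        # SGE/OGE: per-slot virtual memory limit.
        return f"#$ -l h_vmem={memory_mib // cpus}M"
    raise ValueError(f"Unknown scheduler: {scheduler}")
```

Keeping one MiB integer as the single source of truth is what removes the old PBS-specific byte conversion and makes the overhead factors comparable across schedulers.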

Copilot AI left a comment


Pull request overview

Refactors ARC’s job memory handling to use consistent base-2 units (MiB/GiB), apply explicit overhead factors, and reserve headroom for Gaussian %mem to avoid galloc/memory failures on capped nodes.

Changes:

  • Standardize internal submit-script memory calculations to MiB with explicit overhead factors and store submit_script_memory_mib.
  • Apply a fixed headroom fraction when setting Gaussian %mem, using submit-script memory as the budget source.
  • Update troubleshooting and unit tests to reflect the new memory cap/headroom behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| arc/settings/submit.py | Adds clarifying comments around scheduler memory units/templates. |
| arc/job/adapter.py | Introduces MiB conversion/overhead constants and refactors scheduler memory setting logic. |
| arc/job/adapters/common.py | Initializes the new submit_script_memory_mib attribute. |
| arc/job/adapters/gaussian.py | Sets Gaussian %mem from submit-script memory with reserved headroom. |
| arc/job/trsh.py | Uses the configured node-memory cap when attempting Gaussian memory troubleshooting; errors when capped. |
| arc/job/adapter_test.py | Updates expected scheduler memory values to match the new MiB logic. |
| arc/job/adapters/gaussian_test.py | Updates expected %mem and asserts headroom / new MiB attribute behavior. |
| arc/job/trsh_test.py | Adds coverage for the "already at node-memory cap" Gaussian troubleshooting path. |


Comment thread arc/settings/submit.py
Comment on lines +111 to +113
# ARC passes a base-2-derived MiB integer into this sample PBS
# template. On our PBS systems, the `mb` suffix is interpreted in
# base-2, so the directive stays `mem={memory}mb`.

Copilot AI Apr 19, 2026


The comment above this submit script says it's a "sample PBS template" and refers to a mem={memory}mb PBS directive, but the actual script below is a Slurm #SBATCH template using --mem-per-cpu={memory}. Please update/remove the comment so it matches the scheduler directives in this block (or move it to the actual PBS template section).

Suggested change
# ARC passes a base-2-derived MiB integer into this sample PBS
# template. On our PBS systems, the `mb` suffix is interpreted in
# base-2, so the directive stays `mem={memory}mb`.
# ARC passes a base-2-derived MiB integer into this sample Slurm
# template. The memory request is emitted via the Slurm directive
# `#SBATCH --mem-per-cpu={memory}`.

@calvinp0 calvinp0 removed the request for review from alongd April 19, 2026 14:35
@codecov

codecov bot commented Apr 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.39%. Comparing base (70dab25) to head (b0459fd).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #875      +/-   ##
==========================================
- Coverage   60.47%   60.39%   -0.08%     
==========================================
  Files         102      102              
  Lines       31102    31113      +11     
  Branches     8104     8104              
==========================================
- Hits        18808    18792      -16     
- Misses       9957     9975      +18     
- Partials     2337     2346       +9     
| Flag | Coverage | Δ |
| --- | --- | --- |
| functionaltests | 60.39% <ø> | -0.08% ⬇️ |
| unittests | 60.39% <ø> | -0.08% ⬇️ |

Flags with carried forward coverage won't be shown.


Labels

Module: trsh Troubleshooting

2 participants