Conversation
Gaussian's `%mem` directive should not consume the entire scheduler allocation. ARC now reserves a fixed fraction (90%) of the submit-script memory for non-Gaussian overhead, such as the scheduler, scratch bookkeeping, and Gaussian allocations that occur outside the explicit memory budget. This prevents job failures on capped nodes where the physical memory is slightly less than the requested amount and avoids triggering Gaussian allocation (galloc) errors or scheduler kills.
Replaces the hardcoded 95% memory allocation cap with a configurable value from default_job_settings. Additionally, ensures troubleshooting terminates with a clear error message if a job requires more memory than the configured node-level cap allows. This prevents potential infinite resubmission loops when memory is the limiting factor for Gaussian or other ESS jobs.
Centralizes memory unit conversions using explicit MiB/GiB constants and clarifies ARC's use of base-2 units internally. This update ensures consistent application of memory overhead (10% default or 5% when capped) across different schedulers and fixes an inconsistency in PBS memory requests where MiB is now passed directly to the template rather than being converted to bytes.
Adds the `submit_script_memory_mib` attribute to the common job adapter and clarifies via comments in PBS submission templates that ARC passes memory values as base-2 integers (MiB). This ensures the internal standardization to MiB is correctly reflected in the adapter state and documented for scheduler directives.
There was a problem hiding this comment.
Pull request overview
Refactors ARC’s job memory handling to use consistent base-2 units (MiB/GiB), apply explicit overhead factors, and reserve headroom for Gaussian %mem to avoid galloc/memory failures on capped nodes.
Changes:
- Standardize internal submit-script memory calculations to MiB with explicit overhead factors and store
submit_script_memory_mib. - Apply a fixed headroom fraction when setting Gaussian
%mem, using submit-script memory as the budget source. - Update troubleshooting and unit tests to reflect the new memory cap/headroom behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| arc/settings/submit.py | Adds clarifying comments around scheduler memory units/templates. |
| arc/job/adapter.py | Introduces MiB conversion/overhead constants and refactors scheduler memory setting logic. |
| arc/job/adapters/common.py | Initializes the new submit_script_memory_mib attribute. |
| arc/job/adapters/gaussian.py | Sets Gaussian %mem from submit-script memory with reserved headroom. |
| arc/job/trsh.py | Uses configured node-memory cap when attempting Gaussian memory troubleshooting; errors when capped. |
| arc/job/adapter_test.py | Updates expected scheduler memory values to match new MiB logic. |
| arc/job/adapters/gaussian_test.py | Updates expected %mem and asserts headroom / new MiB attribute behavior. |
| arc/job/trsh_test.py | Adds coverage for the “already at node-memory cap” Gaussian troubleshooting path. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # ARC passes a base-2-derived MiB integer into this sample PBS | ||
| # template. On our PBS systems, the `mb` suffix is interpreted in | ||
| # base-2, so the directive stays `mem={memory}mb`. |
There was a problem hiding this comment.
The comment above this submit script says it's a "sample PBS template" and refers to a mem={memory}mb PBS directive, but the actual script below is a Slurm #SBATCH template using --mem-per-cpu={memory}. Please update/remove the comment so it matches the scheduler directives in this block (or move it to the actual PBS template section).
| # ARC passes a base-2-derived MiB integer into this sample PBS | |
| # template. On our PBS systems, the `mb` suffix is interpreted in | |
| # base-2, so the directive stays `mem={memory}mb`. | |
| # ARC passes a base-2-derived MiB integer into this sample Slurm | |
| # template. The memory request is emitted via the Slurm directive | |
| # `#SBATCH --mem-per-cpu={memory}`. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #875 +/- ##
==========================================
- Coverage 60.47% 60.39% -0.08%
==========================================
Files 102 102
Lines 31102 31113 +11
Branches 8104 8104
==========================================
- Hits 18808 18792 -16
- Misses 9957 9975 +18
- Partials 2337 2346 +9
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This pull request refactors how ARC handles memory allocation for jobs, standardizing all internal memory calculations to use base-2 (MiB/GiB) units and ensuring consistent, conservative memory requests across different cluster schedulers and Gaussian input files. It introduces explicit memory overhead factors and reserves headroom for non-Gaussian overhead, preventing over-allocation and improving stability, especially on capped nodes. The changes also update tests to reflect the new memory calculations.
Memory calculation and handling improvements:
MEMORY_GB_TO_MIB,DEFAULT_JOB_MEMORY_OVERHEAD, andCAPPED_JOB_MEMORY_OVERHEADfor clarity and consistency.set_cpu_and_memto always compute and store submit script memory in MiB, applying the appropriate overhead factor, and updated memory handling logic for different schedulers (SGE/OGE, PBS, Slurm) to use these MiB values.submit_script_memory_mibattribute throughout the codebase for clarity and updated initialization accordingly.Gaussian-specific memory allocation:
GAUSSIAN_MEMORY_HEADROOM_FRACTION(set to 0.90) so Gaussian jobs only consume a portion of the allocated memory, leaving room for scheduler and runtime overhead. Updatedset_input_file_memoryto use this fraction when setting%memin Gaussian input files. [1] [2]Test updates:
%memin Gaussian input files, and added assertions to verify the new logic for memory headroom and MiB calculations. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23]