Skip to content

Limit concurrent workers in batch#873

Open
calvinp0 wants to merge 2 commits intomainfrom
pipe_concurrent
Open

Limit concurrent workers in batch#873
calvinp0 wants to merge 2 commits intomainfrom
pipe_concurrent

Conversation

@calvinp0
Copy link
Copy Markdown
Member

@calvinp0 calvinp0 commented Apr 18, 2026

This pull request adds support for throttling the number of concurrently running array jobs ("max_concurrent") in pipe runs, allowing users to limit how many jobs are active at once on supported schedulers (Slurm, PBS, SGE, HTCondor, and OGE). It introduces a new max_concurrent parameter, updates submit script templates and metadata handling, and adds comprehensive tests to ensure correct behavior across schedulers and edge cases.

Support for concurrent job throttling:

  • Added a max_concurrent parameter to PipeRun, allowing users to specify the maximum number of array jobs running simultaneously; this is validated and saved in run metadata, and is now loaded from disk for crash recovery. [1] [2] [3] [4] [5]
  • The submit_pipe_run function and settings now pass max_concurrent through to PipeRun. [1] [2]

Scheduler template and logic updates:

  • The resource calculation and submit script rendering logic now compute and inject the correct throttling syntax for each supported scheduler (Slurm, PBS, SGE, OGE, HTCondor), including clamping and disabling as appropriate. [1] [2] [3] [4] [5] [6] [7]

Testing improvements:

  • Added extensive tests to pipe_run_test.py to verify throttling behavior, correct template rendering for each scheduler, value validation, persistence across reloads, and edge cases such as clamping and unthrottled runs. [1] [2]

Internal code improvements:

  • Added a static method for validating max_concurrent values, ensuring only None or positive integers are accepted.
  • Refactored internal APIs to use the new throttle-aware resource and template logic. [1] [2] [3]

These changes make job submission more robust and flexible for large-scale compute campaigns.

Copilot AI review requested due to automatic review settings April 18, 2026 21:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds support for throttling batch-array concurrency (“max concurrent workers”) across supported schedulers so large PipeRuns can limit how many array tasks run simultaneously.

Changes:

  • Introduces pipe_settings.max_concurrent and persists it in PipeRun metadata.
  • Updates submit-script templates to accept an array_range plus optional scheduler-specific extra_directives.
  • Implements scheduler-specific throttle rendering in PipeRun and adds tests covering SLURM/PBS/SGE/OGE/HTCondor behavior and persistence.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
arc/settings/submit.py Parameterizes array size/range and injects optional extra directives for throttling support.
arc/settings/settings.py Adds new max_concurrent pipe setting to configure scheduler throttling.
arc/job/pipe/pipe_run.py Implements throttle calculation/rendering and persists max_concurrent in run.json.
arc/job/pipe/pipe_coordinator.py Wires max_concurrent from settings into PipeRun creation.
arc/job/pipe/pipe_run_test.py Adds unit tests verifying throttle rendering across schedulers and persistence.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread arc/settings/settings.py
'enabled': False, # Set to True to enable pipe mode (it is off by default, use it for large compute campaigns).
'min_tasks': 10, # Minimum batch size to trigger pipe mode.
'max_workers': 100, # Upper bound on array worker slots per PipeRun.
'max_concurrent': None, # Max array workers running simultaneously (like PBS `%N`). None = unthrottled.
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new setting comment documents only None as disabling throttling, but the implementation/tests treat max_concurrent=0 as disabled as well. Please update this comment to reflect that 0 (and potentially other non-positive values) disables throttling, or alternatively enforce/validate that max_concurrent must be None or a positive integer.

Suggested change
'max_concurrent': None, # Max array workers running simultaneously (like PBS `%N`). None = unthrottled.
'max_concurrent': None, # Max array workers running simultaneously (like PBS `%N`). None or non-positive values disable throttling.

Copilot uses AI. Check for mistakes.
Comment thread arc/job/pipe/pipe_run.py
Comment on lines +209 to +212
throttle = None
if self.max_concurrent is not None and self.max_concurrent > 0:
throttle = min(self.max_concurrent, array_size)
return cpus, memory_mb, array_size, throttle
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_concurrent has implicit semantics for non-positive values (e.g., 0 disables throttling) but this isn’t clearly part of the public API contract of PipeRun. Consider normalizing/validating max_concurrent at construction time (e.g., coerce non-positive to None, or raise on negatives) and documenting the supported values in the class docstring to avoid silent configuration mistakes.

Copilot uses AI. Check for mistakes.
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.37%. Comparing base (70dab25) to head (37352e3).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #873      +/-   ##
==========================================
- Coverage   60.47%   60.37%   -0.10%     
==========================================
  Files         102      102              
  Lines       31102    31132      +30     
  Branches     8104     8111       +7     
==========================================
- Hits        18808    18797      -11     
- Misses       9957     9986      +29     
- Partials     2337     2349      +12     
Flag Coverage Δ
functionaltests 60.37% <ø> (-0.10%) ⬇️
unittests 60.37% <ø> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Allow user to define how many workers to run in parallel rather than attempting to apply workers to every job in the array

Added validation for settings of max concurrent (So only positive int or None)
@calvinp0 calvinp0 requested review from Lilachn91 and alongd April 19, 2026 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants