Pipe job IDs submitted during the scheduler label loop weren't in the `server_job_ids` list (fetched earlier in the same iteration), causing `poll_pipes` to conclude the scheduler job was gone and mark all tasks FAILED TERMINAL. Append the job ID to `server_job_ids` on both initial submission and resubmission so `_is_scheduler_job_alive` sees it immediately.
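The fix pattern can be sketched as follows; `submit_pipe_job` and `is_scheduler_job_alive` are hypothetical stand-ins for the ARC internals named above, assuming the liveness check is a membership test against the previously fetched ID list:

```python
def submit_pipe_job(server_job_ids: list, submit) -> str:
    """Submit a pipe job and record its ID immediately, so a liveness
    check against server_job_ids (a snapshot fetched earlier in the same
    scheduler iteration) does not see the new job as missing."""
    job_id = submit()
    # Record on both the initial-submission and resubmission paths.
    if job_id not in server_job_ids:
        server_job_ids.append(job_id)
    return job_id


def is_scheduler_job_alive(job_id: str, server_job_ids: list) -> bool:
    """Stand-in for _is_scheduler_job_alive: membership in the ID list."""
    return job_id in server_job_ids


server_job_ids = ['1001', '1002']                       # fetched snapshot
new_id = submit_pipe_job(server_job_ids, submit=lambda: '1003')
print(is_scheduler_job_alive(new_id, server_job_ids))   # True
```

Without the append, the freshly submitted ID is absent from the stale snapshot and every task under it gets marked failed.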
delete_all_species_jobs blanket-set all output job_types to False, including rotors and bde, which are initialised to True by initialize_output_dict. For species with no torsional modes (e.g. a cyclic TS from THF), no scan jobs are ever spawned, so rotors stays False and check_all_done incorrectly marks the TS as unconverged, even when opt, freq, sp, and IRC all passed. Additionally, switch_ts did not reset rotors_dict, so determine_rotors was never re-called for the new TS geometry and stale scan results from the previous guess carried over. Changes:
- Preserve the True default for rotors/bde in delete_all_species_jobs, matching initialize_output_dict.
- Reset rotors_dict and number_of_rotors in switch_ts when job_types['rotors'] is enabled, so the new geometry gets fresh rotor detection.
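A minimal sketch of the preserved-default reset, assuming the output dict shape described above (the helper name is illustrative, not ARC's):

```python
def reset_output_job_types(output: dict, defaults_true=('rotors', 'bde')) -> dict:
    """Reset a species' job_type convergence flags, preserving the True
    default that initialize_output_dict gives rotors/bde, so job types
    that never spawn a job (e.g. rotors for a torsion-free species) are
    not spuriously reported as unconverged."""
    for job_type in output['job_types']:
        output['job_types'][job_type] = job_type in defaults_true
    return output


out = {'job_types': {'opt': True, 'freq': True, 'sp': True,
                     'rotors': True, 'bde': True}}
reset_output_job_types(out)
# opt/freq/sp are cleared for re-running; rotors/bde keep their True default
```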
Previously self.level.cabs was stored on Level but never templated, so F12 jobs silently ran with DimCABS = 0. Add a ${cabs} slot on the ! line after ${auxiliary_basis} and populate it from self.level.cabs.
Moved the separator space out of the template and into the cabs value itself so the non-F12 input file (the 99% case) gets the exact same output as before. Existing orca_test snapshots pass unchanged.
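The separator-in-the-value trick can be illustrated with `string.Template` (the method and basis keywords here are only examples, not ARC's actual template):

```python
from string import Template

# A ${cabs} slot on the '!' line after ${auxiliary_basis}; the leading
# separator space travels with the cabs value itself, so an empty value
# reproduces the pre-change non-F12 line byte-for-byte.
bang_line = Template('! ${method} ${basis} ${auxiliary_basis}${cabs}')

non_f12 = bang_line.substitute(method='wB97X-D3', basis='def2-TZVP',
                               auxiliary_basis='def2/J', cabs='')
f12 = bang_line.substitute(method='DLPNO-CCSD(T)-F12', basis='cc-pVTZ-F12',
                           auxiliary_basis='cc-pVTZ/C',
                           cabs=' cc-pVTZ-F12-CABS')
print(non_f12)  # identical to the pre-change output (no trailing space)
print(f12)      # CABS basis now reaches the input file
```

Keeping the space in the value rather than the template is what leaves existing snapshots unchanged for the 99% non-F12 case.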
Previously, sp_level configs like DLPNO-CCSD(T)-F12 with no 'cabs:' field would reach ORCA silently and run with DimCABS = 0, returning non-F12 energies (and in some cases crashing ARC downstream with no clear cause). Fail fast at input-file generation with a descriptive error pointing at the missing 'cabs:' key.
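A sketch of the fail-fast check, assuming the level spec is reachable as a dict at input-generation time (the helper and its signature are illustrative):

```python
def validate_f12_cabs(level: dict) -> None:
    """Refuse to generate an ORCA input for an F12 method that lacks a
    'cabs' entry: ORCA would otherwise run silently with DimCABS = 0 and
    return non-F12 energies."""
    method = level.get('method', '')
    if 'f12' in method.lower() and not level.get('cabs'):
        raise ValueError(
            f"sp_level method '{method}' is an F12 method, but no 'cabs:' "
            f"key was specified. ORCA would run with DimCABS = 0; please "
            f"add a CABS basis to the level definition.")


validate_f12_cabs({'method': 'DLPNO-CCSD(T)', 'basis': 'cc-pVTZ'})  # fine
# validate_f12_cabs({'method': 'DLPNO-CCSD(T)-F12'})  -> raises ValueError
```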
When a monoatomic species hits the DLPNO-incompatibility branch in run_sp, the level was rebuilt via Level(method=..., basis=..., software=..., args=...), silently dropping auxiliary_basis, cabs, dispersion, and all solvation_* fields. For an F12 sp_level on an atom this zeroed out CABS and aux, making ORCA run with DimCABS = 0. Rebuild from as_dict() and only override the method so the rest of the Level spec survives.
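The rebuild-from-full-spec idea, with a minimal dataclass standing in for ARC's Level (ARC's real class has more fields and its own as_dict(); this is only a sketch of the pattern):

```python
from dataclasses import dataclass, asdict


@dataclass
class Level:  # minimal stand-in for ARC's Level, for illustration only
    method: str
    basis: str = ''
    auxiliary_basis: str = ''
    cabs: str = ''


def rebuild_with_method(level: Level, new_method: str) -> Level:
    """Rebuild the level from its full serialized spec (analogous to
    Level.as_dict()) and override only the method, so auxiliary_basis,
    cabs, dispersion, solvation_*, etc. survive the rebuild."""
    spec = asdict(level)
    spec['method'] = new_method
    return Level(**spec)


old = Level('dlpno-ccsd(t)-f12', 'cc-pvdz-f12', 'cc-pvdz-f12/c', 'cc-pvdz-f12-cabs')
new = rebuild_with_method(old, 'ccsd(t)-f12')
# aux basis and cabs are preserved, unlike rebuilding from method/basis alone
```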
ORCA 5.x rejects non-RI F12-CC methods on open-shell species with 'F12-CC method not implemented for UHF references'. ORCA 6.x may handle it natively, so a per-run global override would be wrong. Add a reactive trsh path: detect the specific ORCA warning in determine_ess_status (keyword 'F12_UHF'), and in trsh_ess_job rebuild the level with '/RI' appended to the method (preserving aux, cabs, solvation, etc. via as_dict). Scoped to the failing job only.
Previous logic blanket-stripped 'dlpno-' from the method for every monoatomic species, but DLPNO works fine in ORCA for atoms with more than one electron (e.g. [O], [N], [C]). Only single-electron species lack pair correlations and need the HF fallback. Also narrow the trsh guard at trsh.py:1027 to match (is_h instead of is_monoatomic or is_h), so non-H monoatomics keep their DLPNO method through the retry path.
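The narrowed condition amounts to an electron count, not an atom count; a sketch with a tiny hand-rolled element table (ARC would get electron counts from its own species machinery):

```python
# Atomic numbers for the elements used below (illustrative subset).
ELECTRONS = {'H': 1, 'He': 2, 'C': 6, 'N': 7, 'O': 8}


def needs_hf_fallback(symbols, charge: int = 0) -> bool:
    """DLPNO needs at least one electron pair to correlate. Only a
    single-electron species (e.g. an H atom) lacks pair correlations and
    needs 'dlpno-' stripped down to the HF fallback; [O], [N], [C] etc.
    keep their DLPNO method."""
    n_electrons = sum(ELECTRONS[s] for s in symbols) - charge
    return len(symbols) == 1 and n_electrons == 1


print(needs_hf_fallback(['H']))  # True  -> strip 'dlpno-'
print(needs_hf_fallback(['O']))  # False -> keep DLPNO
```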
The old 'read user value OR default, then add_to_args' pattern always appended to args['keyword']['general'], even when the user already had the value under its named key. For a user who set dlpno_threshold: TightPNO, 'tightpno' was emitted twice — once from their key and once from 'general' — and ORCA rejected the input with UNRECOGNIZED OR DUPLICATED KEYWORD. Use setdefault() instead, so the default only lands when the user didn't supply the key. The value rides through update_input_dict_with_args on its canonical key, not under 'general', so each keyword is emitted exactly once.
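The setdefault pattern in isolation (dict structure and keyword names are illustrative):

```python
def apply_default_keyword(args: dict, key: str, default: str) -> None:
    """Place the default under its canonical key only when the user did
    not supply one, instead of unconditionally appending a copy to
    args['keyword']['general']."""
    args.setdefault('keyword', {}).setdefault(key, default)


user_args = {'keyword': {'dlpno_threshold': 'tightpno'}}
apply_default_keyword(user_args, 'dlpno_threshold', 'normalpno')
print(user_args['keyword']['dlpno_threshold'])  # 'tightpno' — user wins, emitted once

fresh_args = {}
apply_default_keyword(fresh_args, 'dlpno_threshold', 'normalpno')
print(fresh_args['keyword']['dlpno_threshold'])  # 'normalpno' — default only here
```

Because the value lives under exactly one key either way, downstream emission of the `!` line cannot duplicate it.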
The DFT branch unconditionally appended defgrid2/defgrid3 via add_to_args, so a user who set their own defgrid keyword for a DFT sp_level got it twice in the ! line. Apply the same setdefault pattern we just used for scf_convergence and dlpno_threshold under the key 'dft_grid', so the default only lands when the user hasn't already supplied one.
    global _diagnostics_logged
    if _diagnostics_logged:
        return
    _diagnostics_logged = True
Check notice from Code scanning / CodeQL: Unused global variable (Note). Copilot Autofix suggestion:
Use function-local persistent state instead of a module-level global flag.
Best fix (without changing behavior):
- Remove the `_diagnostics_logged = False` definition at module scope.
- In `_log_node_diagnostics()`, replace `global _diagnostics_logged` and the related checks with a function attribute (e.g. `_log_node_diagnostics._logged`), initialized lazily via `getattr(..., False)`.
- This preserves the "log once per worker process" behavior and eliminates the flagged global variable.
Edits required in arc/scripts/pipe_worker.py:
- Delete the module-level `_diagnostics_logged` definition.
- Update the lines inside `_log_node_diagnostics()` where the global is read and written.
No new imports or dependencies are needed.
Suggested changeset (1 file): arc/scripts/pipe_worker.py
```diff
@@ -43,9 +43,7 @@
 
 logger = logging.getLogger('pipe_worker')
 
-_diagnostics_logged = False
-
 
 def _log_node_diagnostics() -> None:
     """Dump hostname, PBS/Slurm array context, PATH, PYTHONPATH, TMPDIR, and
     group membership on first task failure. A dead compute node otherwise
@@ -54,10 +51,9 @@
     many sites. Logging once per worker process keeps the volume bounded
     when one bad node drains many tasks.
     """
-    global _diagnostics_logged
-    if _diagnostics_logged:
+    if getattr(_log_node_diagnostics, '_logged', False):
         return
-    _diagnostics_logged = True
+    _log_node_diagnostics._logged = True
     import socket
     import subprocess
     try:
```
When an ORCA job fails due to insufficient memory and hits a total memory limit, ARC now attempts to resolve the issue by reducing the number of CPU cores to increase the memory available per core.
- Added logic to calculate the maximum feasible CPU core count that fits within the total memory cap while meeting ORCA's per-core requirement.
- Ensures at least one core is used if viable, rather than failing the troubleshooting step.
- Prevents recalculating/inflating total memory when it is already constrained by a cap.
- Adjusted the conservative memory buffer added during total memory estimation from 5 GB to 3 GB.
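The core-count calculation reduces to integer arithmetic; a sketch with illustrative names (ARC's actual trsh code works in MB and carries more context):

```python
def reduce_cpus_for_memory_cap(cpus: int, per_core_gb: float,
                               total_cap_gb: float):
    """Largest core count whose total per-core memory requirement still
    fits under the node's total memory cap; keep at least one core, or
    report the job as not viable (None) if even one core cannot fit."""
    max_cores = int(total_cap_gb // per_core_gb)
    if max_cores < 1:
        return None  # not viable even on a single core
    return min(cpus, max_cores)


print(reduce_cpus_for_memory_cap(cpus=16, per_core_gb=4.0, total_cap_gb=40.0))  # 10
print(reduce_cpus_for_memory_cap(cpus=16, per_core_gb=4.0, total_cap_gb=2.0))   # None
```

Shrinking cores rather than requesting more total memory is the only lever left once the request is already pinned at the node's capacity.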
- Make the general memory error check in the scheduler case-insensitive and restrict it to cases where 'memory' is the sole error keyword.
- Add unit tests for ORCA memory troubleshooting verifying that CPU cores are reduced when the total memory limit is hit, while total memory is increased when no cap is present.
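The narrowed check can be sketched as follows (the helper name and the list-of-keywords shape are assumptions about the scheduler's error representation):

```python
def is_pure_memory_error(error_keywords) -> bool:
    """True only when 'memory' is the sole error keyword, matched
    case-insensitively; mixed-cause failures fall through to other
    troubleshooting branches."""
    kws = [k.lower() for k in error_keywords]
    return kws == ['memory']


print(is_pure_memory_error(['Memory']))            # True
print(is_pure_memory_error(['memory', 'syntax']))  # False
print(is_pure_memory_error([]))                    # False
```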
…ction When a pipe task fails with an Electronic Structure System (ESS) error, the coordinator now attempts to read the `parser_summary` from the worker's result file. If found, the task is ejected to the scheduler via `troubleshoot_ess` rather than blindly resubmitted. This lets the scheduler apply intelligent troubleshooting logic (such as adjusting memory or CPU cores) based on the specific failure mode encountered during the pipe run. Additionally, resource requirements (CPU and memory) are now explicitly passed when ejecting tasks, keeping the pipe environment and the scheduler consistent.
- Refresh scheduler-side job logs before parsing ESS status to detect server-reported out-of-memory (OOM) errors even when output files are absent or incomplete.
- Tag jobs whose requested memory is clipped to a node's capacity with a `max_total_job_memory` keyword.
- Ensure the capped-memory marker is preserved during status updates, so the troubleshooter can distinguish hitting a node limit from a simple insufficient memory request.
- Enable status determination from additional job info (scheduler logs) when the primary output file cannot be found.
Transition state connectivity and bond orders are not uniquely defined (partial forming/breaking bonds at the saddle point), so enforcing isomorphism against a stored 2D adjacency list rejects valid TS geometries.
- species.ARCSpecies.mol_from_xyz: short-circuit to accept the perceived molecule for TSs without running check_xyz_isomorphism.
- scheduler: add is_ts guards at the three remaining check_xyz_isomorphism call sites (check_directed_scan, check_directed_scan_job, troubleshoot_scan_job). Matches the existing is_ts guards in parse_composite_geo and parse_opt_geo.
- Fix a secondary bug in the non-TS warning message: log perceived_mol in the "and:" branch (previously logged self.mol twice).
- Update TS_search.rst and advanced.rst to document the policy.
- Tests: TS mol_from_xyz acceptance, plus two mock-based scheduler tests covering check_directed_scan_job and troubleshoot_scan_job.
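The guard shape at each call site is the same; a sketch with an injected checker (the signature is illustrative, not ARC's):

```python
def accept_geometry(species, perceived_mol, check_xyz_isomorphism) -> bool:
    """TS connectivity is ill-defined at the saddle point, so accept the
    perceived molecule for TSs outright; run the 2D isomorphism check
    only for stable species."""
    if getattr(species, 'is_ts', False):
        return True  # short-circuit: don't enforce 2D isomorphism on a TS
    return check_xyz_isomorphism(species, perceived_mol)


class TS:
    is_ts = True


class Stable:
    is_ts = False


always_fail = lambda s, m: False
print(accept_geometry(TS(), None, always_fail))      # True  (guard fires)
print(accept_geometry(Stable(), None, always_fail))  # False (check runs)
```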
r_cut_p_cut_isomorphic matched purely on molecular-formula fingerprint OR graph isomorphism, so constitutional isomers sharing a formula (e.g. the alpha vs beta pentyl-ether radicals in H-abstraction between identical donors) got paired as "isomorphic" and handed to map_two_species, which then failed to superimpose them and returned None. Add a strict flag that requires full graph isomorphism, and run pairing_reactants_and_products_for_mapping in two passes: strict first, then the loose fingerprint-or-isomorphic fallback for any unmatched r-cuts so rearrangements whose cuts aren't strictly isomorphic (e.g. Intra_Disproportionation, ring-openings) still work. Also guard glue_maps against a None per-pair map so the caller sees a clean None instead of a TypeError when a future mapping regression slips through.
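The two-pass pairing can be sketched generically; cuts here are (formula, graph) tuples and the two predicates are injected, all illustrative stand-ins for ARC's species objects:

```python
def pair_two_pass(r_cuts, p_cuts, is_isomorphic, same_formula):
    """Pass 1 pairs only strictly graph-isomorphic cuts; pass 2 applies
    the loose formula-or-isomorphic fallback to whatever is left, so
    rearrangements whose cuts are not strictly isomorphic still pair."""
    pairs = {}      # r-cut index -> p-cut index
    used_p = set()
    for strict in (True, False):
        for i, r in enumerate(r_cuts):
            if i in pairs:
                continue
            for j, p in enumerate(p_cuts):
                if j in used_p:
                    continue
                ok = is_isomorphic(r, p) if strict else (
                    is_isomorphic(r, p) or same_formula(r, p))
                if ok:
                    pairs[i] = j
                    used_p.add(j)
                    break
    return pairs


# Two constitutional isomers sharing a formula (alpha vs beta radicals):
r_cuts = [('C5H11O', 'alpha'), ('C5H11O', 'beta')]
p_cuts = [('C5H11O', 'beta'), ('C5H11O', 'alpha')]
iso = lambda a, b: a[1] == b[1]       # stand-in for graph isomorphism
formula = lambda a, b: a[0] == b[0]   # stand-in for formula fingerprint
print(pair_two_pass(r_cuts, p_cuts, iso, formula))  # {0: 1, 1: 0}
```

A loose-only single pass would have greedily matched alpha to beta on formula alone; the strict pass prevents exactly that mispairing.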
map_rxn blindly handed fragment_maps to glue_maps. Any None entry (from map_two_species giving up on a pair) would crash inside glue_maps trying to subscript the None. Detect None entries up front, log which pairs failed, and either advance to the next product_dict index (same recovery path used by the existing template-order error branch) or return None so upstream consumers can fall back cleanly.
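The up-front None detection can be sketched as follows (the glue operation is simplified to concatenation; logging is a print stand-in):

```python
def glue_fragment_maps(fragment_maps):
    """Detect None per-pair maps (map_two_species gave up on that pair)
    before gluing, report which pairs failed, and return a clean None
    instead of crashing with a TypeError on subscripting None."""
    bad = [i for i, m in enumerate(fragment_maps) if m is None]
    if bad:
        print(f'atom mapping failed for fragment pair(s) {bad}; cannot glue')
        return None
    glued = []
    for m in fragment_maps:
        glued.extend(m)  # stand-in for the real map-gluing logic
    return glued


print(glue_fragment_maps([[0, 1], None]))    # None, with a diagnostic line
print(glue_fragment_maps([[0, 1], [2, 3]]))  # [0, 1, 2, 3]
```

Returning None lets upstream callers take the same recovery path as the existing template-order error branch instead of aborting on an exception.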
NMD (check_normal_mode_displacement) only needs the reactive bond set in reactant index space, which is canonically given by the RMG family recipe (BREAK_BOND / FORM_BOND / CHANGE_BOND actions) plus r_label_map. Previously NMD routed through reaction.get_bonds, which hard-required self.atom_map and raised otherwise, so any atom-mapping failure killed the whole TS validation even though the reactive bonds were directly computable. Three changes:
- get_bonds: move the atom_map check below the r_bonds_only=True short-circuit. Reactant bonds never touch atom_map, so r_bonds_only should not require one.
- _get_reactive_bonds_from_family: new helper that reads the family's actions (via ReactionFamily(...).actions) and resolves labels through r_label_map into reactant-indexed bond tuples.
- get_formed_and_broken_bonds / get_changed_bonds: when self.atom_map is None but family + product_dicts are present, use the helper. Otherwise unchanged. Emits a warning so the fallback path is diagnosable.
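The label-resolution step of the helper can be sketched like this; the (action, label, label) tuple shape is a simplification of RMG's recipe format, which also carries bond-order changes:

```python
def reactive_bonds_from_actions(actions, r_label_map):
    """Resolve family recipe actions (BREAK_BOND / FORM_BOND /
    CHANGE_BOND) through r_label_map into reactant-indexed bond tuples,
    sorted and deduplicated. No atom map is needed: the labels already
    live in reactant index space."""
    bonds = set()
    for action, label_1, label_2 in actions:
        if action in ('BREAK_BOND', 'FORM_BOND', 'CHANGE_BOND'):
            i, j = r_label_map[label_1], r_label_map[label_2]
            bonds.add(tuple(sorted((i, j))))
    return sorted(bonds)


# e.g. an H-abstraction-like recipe with labeled atoms *1, *2, *3:
actions = [('BREAK_BOND', '*1', '*2'), ('FORM_BOND', '*2', '*3')]
r_label_map = {'*1': 0, '*2': 4, '*3': 7}
print(reactive_bonds_from_actions(actions, r_label_map))  # [(0, 4), (4, 7)]
```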
Addition of CREST Adapter that complements the heuristic adapter.
This pull request adds support for the CREST conformer and transition state (TS) search method to the ARC project, along with several related improvements and code cleanups. The most important changes include integrating CREST as a TS search adapter, updating configuration and constants, and enhancing the heuristics TS search logic for better provenance tracking and code clarity.
CREST Integration:
Added CREST to JobEnum (arc/job/adapter.py), included CREST in the list of adapters and the RMG family mapping, and registered it as a default incore adapter (arc/job/adapters/common.py, arc/job/adapters/ts/__init__.py). Added tests for the adapter (arc/job/adapters/ts/crest_test.py) and updated the Makefile.
Constants and Configuration:
Added an angstrom_to_bohr conversion constant to both the Cython and Python constants modules (arc/constants.pxd, arc/constants.py).
Heuristics TS Search Enhancements and Refactoring:
Enhanced the heuristics TS search logic for better provenance tracking and code clarity (arc/job/adapters/ts/heuristics.py).