Pipe job IDs submitted during the scheduler label loop weren't in the `server_job_ids` list (fetched earlier in the same iteration), causing `poll_pipes` to conclude the scheduler job was gone and mark all tasks FAILED TERMINAL. Append the job ID to `server_job_ids` on both initial submission and resubmission so `_is_scheduler_job_alive` sees it immediately.
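The fix pattern can be sketched as follows; `submit_pipe_job` and `is_scheduler_job_alive` are hypothetical stand-ins for the ARC internals named above, assuming the liveness check is a membership test against the previously fetched ID list:

```python
def submit_pipe_job(server_job_ids: list, submit) -> str:
    """Submit a pipe job and record its ID immediately, so a liveness
    check against server_job_ids (a snapshot fetched earlier in the same
    scheduler iteration) does not see the new job as missing."""
    job_id = submit()
    # Record on both the initial-submission and resubmission paths.
    if job_id not in server_job_ids:
        server_job_ids.append(job_id)
    return job_id


def is_scheduler_job_alive(job_id: str, server_job_ids: list) -> bool:
    """Stand-in for _is_scheduler_job_alive: membership in the ID list."""
    return job_id in server_job_ids


server_job_ids = ['1001', '1002']                       # fetched snapshot
new_id = submit_pipe_job(server_job_ids, submit=lambda: '1003')
print(is_scheduler_job_alive(new_id, server_job_ids))   # True
```

Without the append, the freshly submitted ID is absent from the stale snapshot and every task under it gets marked failed.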
delete_all_species_jobs blanket-set all output job_types to False, including rotors and bde, which are initialised to True by initialize_output_dict. For species with no torsional modes (e.g. a cyclic TS from THF), no scan jobs are ever spawned, so rotors stays False and check_all_done incorrectly marks the TS as unconverged, even when opt, freq, sp, and IRC all passed. Additionally, switch_ts did not reset rotors_dict, so determine_rotors was never re-called for the new TS geometry and stale scan results from the previous guess carried over. Changes:
- Preserve the True default for rotors/bde in delete_all_species_jobs, matching initialize_output_dict.
- Reset rotors_dict and number_of_rotors in switch_ts when job_types['rotors'] is enabled, so the new geometry gets fresh rotor detection.
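A minimal sketch of the preserved-default reset, assuming the output dict shape described above (the helper name is illustrative, not ARC's):

```python
def reset_output_job_types(output: dict, defaults_true=('rotors', 'bde')) -> dict:
    """Reset a species' job_type convergence flags, preserving the True
    default that initialize_output_dict gives rotors/bde, so job types
    that never spawn a job (e.g. rotors for a torsion-free species) are
    not spuriously reported as unconverged."""
    for job_type in output['job_types']:
        output['job_types'][job_type] = job_type in defaults_true
    return output


out = {'job_types': {'opt': True, 'freq': True, 'sp': True,
                     'rotors': True, 'bde': True}}
reset_output_job_types(out)
# opt/freq/sp are cleared for re-running; rotors/bde keep their True default
```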
Previously self.level.cabs was stored on Level but never templated, so F12 jobs silently ran with DimCABS = 0. Add a ${cabs} slot on the ! line after ${auxiliary_basis} and populate it from self.level.cabs.
Moved the separator space out of the template and into the cabs value itself so the non-F12 input file (the 99% case) gets the exact same output as before. Existing orca_test snapshots pass unchanged.
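The separator-in-the-value trick can be illustrated with `string.Template` (the method and basis keywords here are only examples, not ARC's actual template):

```python
from string import Template

# A ${cabs} slot on the '!' line after ${auxiliary_basis}; the leading
# separator space travels with the cabs value itself, so an empty value
# reproduces the pre-change non-F12 line byte-for-byte.
bang_line = Template('! ${method} ${basis} ${auxiliary_basis}${cabs}')

non_f12 = bang_line.substitute(method='wB97X-D3', basis='def2-TZVP',
                               auxiliary_basis='def2/J', cabs='')
f12 = bang_line.substitute(method='DLPNO-CCSD(T)-F12', basis='cc-pVTZ-F12',
                           auxiliary_basis='cc-pVTZ/C',
                           cabs=' cc-pVTZ-F12-CABS')
print(non_f12)  # identical to the pre-change output (no trailing space)
print(f12)      # CABS basis now reaches the input file
```

Keeping the space in the value rather than the template is what leaves existing snapshots unchanged for the 99% non-F12 case.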
Previously, sp_level configs like DLPNO-CCSD(T)-F12 with no 'cabs:' field would reach ORCA silently and run with DimCABS = 0, returning non-F12 energies (and in some cases crashing ARC downstream with no clear cause). Fail fast at input-file generation with a descriptive error pointing at the missing 'cabs:' key.
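A sketch of the fail-fast check, assuming the level spec is reachable as a dict at input-generation time (the helper and its signature are illustrative):

```python
def validate_f12_cabs(level: dict) -> None:
    """Refuse to generate an ORCA input for an F12 method that lacks a
    'cabs' entry: ORCA would otherwise run silently with DimCABS = 0 and
    return non-F12 energies."""
    method = level.get('method', '')
    if 'f12' in method.lower() and not level.get('cabs'):
        raise ValueError(
            f"sp_level method '{method}' is an F12 method, but no 'cabs:' "
            f"key was specified. ORCA would run with DimCABS = 0; please "
            f"add a CABS basis to the level definition.")


validate_f12_cabs({'method': 'DLPNO-CCSD(T)', 'basis': 'cc-pVTZ'})  # fine
# validate_f12_cabs({'method': 'DLPNO-CCSD(T)-F12'})  -> raises ValueError
```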
When a monoatomic species hits the DLPNO-incompatibility branch in run_sp, the level was rebuilt via Level(method=..., basis=..., software=..., args=...), silently dropping auxiliary_basis, cabs, dispersion, and all solvation_* fields. For an F12 sp_level on an atom this zeroed out CABS and aux, making ORCA run with DimCABS = 0. Rebuild from as_dict() and only override the method so the rest of the Level spec survives.
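The rebuild-from-full-spec idea, with a minimal dataclass standing in for ARC's Level (ARC's real class has more fields and its own as_dict(); this is only a sketch of the pattern):

```python
from dataclasses import dataclass, asdict


@dataclass
class Level:  # minimal stand-in for ARC's Level, for illustration only
    method: str
    basis: str = ''
    auxiliary_basis: str = ''
    cabs: str = ''


def rebuild_with_method(level: Level, new_method: str) -> Level:
    """Rebuild the level from its full serialized spec (analogous to
    Level.as_dict()) and override only the method, so auxiliary_basis,
    cabs, dispersion, solvation_*, etc. survive the rebuild."""
    spec = asdict(level)
    spec['method'] = new_method
    return Level(**spec)


old = Level('dlpno-ccsd(t)-f12', 'cc-pvdz-f12', 'cc-pvdz-f12/c', 'cc-pvdz-f12-cabs')
new = rebuild_with_method(old, 'ccsd(t)-f12')
# aux basis and cabs are preserved, unlike rebuilding from method/basis alone
```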
ORCA 5.x rejects non-RI F12-CC methods on open-shell species with 'F12-CC method not implemented for UHF references'. ORCA 6.x may handle it natively, so a per-run global override would be wrong. Add a reactive trsh path: detect the specific ORCA warning in determine_ess_status (keyword 'F12_UHF'), and in trsh_ess_job rebuild the level with '/RI' appended to the method (preserving aux, cabs, solvation, etc. via as_dict). Scoped to the failing job only.
Previous logic blanket-stripped 'dlpno-' from the method for every monoatomic species, but DLPNO works fine in ORCA for atoms with more than one electron (e.g. [O], [N], [C]). Only single-electron species lack pair correlations and need the HF fallback. Also narrow the trsh guard at trsh.py:1027 to match (is_h instead of is_monoatomic or is_h), so non-H monoatomics keep their DLPNO method through the retry path.
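The narrowed condition amounts to an electron count, not an atom count; a sketch with a tiny hand-rolled element table (ARC would get electron counts from its own species machinery):

```python
# Atomic numbers for the elements used below (illustrative subset).
ELECTRONS = {'H': 1, 'He': 2, 'C': 6, 'N': 7, 'O': 8}


def needs_hf_fallback(symbols, charge: int = 0) -> bool:
    """DLPNO needs at least one electron pair to correlate. Only a
    single-electron species (e.g. an H atom) lacks pair correlations and
    needs 'dlpno-' stripped down to the HF fallback; [O], [N], [C] etc.
    keep their DLPNO method."""
    n_electrons = sum(ELECTRONS[s] for s in symbols) - charge
    return len(symbols) == 1 and n_electrons == 1


print(needs_hf_fallback(['H']))  # True  -> strip 'dlpno-'
print(needs_hf_fallback(['O']))  # False -> keep DLPNO
```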
The old 'read user value OR default, then add_to_args' pattern always appended to args['keyword']['general'], even when the user already had the value under its named key. For a user who set dlpno_threshold: TightPNO, 'tightpno' was emitted twice — once from their key and once from 'general' — and ORCA rejected the input with UNRECOGNIZED OR DUPLICATED KEYWORD. Use setdefault() instead, so the default only lands when the user didn't supply the key. The value rides through update_input_dict_with_args on its canonical key, not under 'general', so each keyword is emitted exactly once.
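The setdefault pattern in isolation (dict structure and keyword names are illustrative):

```python
def apply_default_keyword(args: dict, key: str, default: str) -> None:
    """Place the default under its canonical key only when the user did
    not supply one, instead of unconditionally appending a copy to
    args['keyword']['general']."""
    args.setdefault('keyword', {}).setdefault(key, default)


user_args = {'keyword': {'dlpno_threshold': 'tightpno'}}
apply_default_keyword(user_args, 'dlpno_threshold', 'normalpno')
print(user_args['keyword']['dlpno_threshold'])  # 'tightpno' — user wins, emitted once

fresh_args = {}
apply_default_keyword(fresh_args, 'dlpno_threshold', 'normalpno')
print(fresh_args['keyword']['dlpno_threshold'])  # 'normalpno' — default only here
```

Because the value lives under exactly one key either way, downstream emission of the `!` line cannot duplicate it.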
The DFT branch unconditionally appended defgrid2/defgrid3 via add_to_args, so a user who set their own defgrid keyword for a DFT sp_level got it twice in the ! line. Apply the same setdefault pattern we just used for scf_convergence and dlpno_threshold under the key 'dft_grid', so the default only lands when the user hasn't already supplied one.
    global _diagnostics_logged
    if _diagnostics_logged:
        return
    _diagnostics_logged = True
Check notice from Code scanning / CodeQL: Unused global variable (Note). Copilot Autofix suggestion:
Use function-local persistent state instead of a module-level global flag.
Best fix (without changing behavior):
- Remove the `_diagnostics_logged = False` definition at module scope.
- In `_log_node_diagnostics()`, replace `global _diagnostics_logged` and the related checks with a function attribute (e.g. `_log_node_diagnostics._logged`), initialized lazily via `getattr(..., False)`.
- This preserves the "log once per worker process" behavior and eliminates the flagged global variable.
Edits required in arc/scripts/pipe_worker.py:
- Delete the module-level `_diagnostics_logged` definition.
- Update the lines inside `_log_node_diagnostics()` where the global is read and written.
No new imports or dependencies are needed.
Suggested changeset (1 file): arc/scripts/pipe_worker.py
```diff
@@ -43,9 +43,7 @@
 
 logger = logging.getLogger('pipe_worker')
 
-_diagnostics_logged = False
-
 
 def _log_node_diagnostics() -> None:
     """Dump hostname, PBS/Slurm array context, PATH, PYTHONPATH, TMPDIR, and
     group membership on first task failure. A dead compute node otherwise
@@ -54,10 +51,9 @@
     many sites. Logging once per worker process keeps the volume bounded
     when one bad node drains many tasks.
     """
-    global _diagnostics_logged
-    if _diagnostics_logged:
+    if getattr(_log_node_diagnostics, '_logged', False):
         return
-    _diagnostics_logged = True
+    _log_node_diagnostics._logged = True
     import socket
     import subprocess
     try:
```
When an ORCA job fails due to insufficient memory and hits a total memory limit, ARC now attempts to resolve the issue by reducing the number of CPU cores to increase the memory available per core.
- Added logic to calculate the maximum feasible CPU core count that fits within the total memory cap while meeting ORCA's per-core requirement.
- Ensures at least one core is used if viable, rather than failing the troubleshooting step.
- Prevents recalculating/inflating total memory when it is already constrained by a cap.
- Adjusted the conservative memory buffer added during total memory estimation from 5 GB to 3 GB.
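The core-count calculation reduces to integer arithmetic; a sketch with illustrative names (ARC's actual trsh code works in MB and carries more context):

```python
def reduce_cpus_for_memory_cap(cpus: int, per_core_gb: float,
                               total_cap_gb: float):
    """Largest core count whose total per-core memory requirement still
    fits under the node's total memory cap; keep at least one core, or
    report the job as not viable (None) if even one core cannot fit."""
    max_cores = int(total_cap_gb // per_core_gb)
    if max_cores < 1:
        return None  # not viable even on a single core
    return min(cpus, max_cores)


print(reduce_cpus_for_memory_cap(cpus=16, per_core_gb=4.0, total_cap_gb=40.0))  # 10
print(reduce_cpus_for_memory_cap(cpus=16, per_core_gb=4.0, total_cap_gb=2.0))   # None
```

Shrinking cores rather than requesting more total memory is the only lever left once the request is already pinned at the node's capacity.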
- Make the general memory error check in the scheduler case-insensitive and restrict it to cases where 'memory' is the sole error keyword.
- Add unit tests for ORCA memory troubleshooting verifying that CPU cores are reduced when the total memory limit is hit, while total memory is increased when no cap is present.
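The narrowed check can be sketched as follows (the helper name and the list-of-keywords shape are assumptions about the scheduler's error representation):

```python
def is_pure_memory_error(error_keywords) -> bool:
    """True only when 'memory' is the sole error keyword, matched
    case-insensitively; mixed-cause failures fall through to other
    troubleshooting branches."""
    kws = [k.lower() for k in error_keywords]
    return kws == ['memory']


print(is_pure_memory_error(['Memory']))            # True
print(is_pure_memory_error(['memory', 'syntax']))  # False
print(is_pure_memory_error([]))                    # False
```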
…ction When a pipe task fails with an Electronic Structure System (ESS) error, the coordinator now attempts to read the `parser_summary` from the worker's result file. If found, the task is ejected to the scheduler via `troubleshoot_ess` rather than blindly resubmitted. This lets the scheduler apply intelligent troubleshooting logic (such as adjusting memory or CPU cores) based on the specific failure mode encountered during the pipe run. Additionally, resource requirements (CPU and memory) are now explicitly passed when ejecting tasks, keeping the pipe environment and the scheduler consistent.
- Refresh scheduler-side job logs before parsing ESS status to detect server-reported out-of-memory (OOM) errors even when output files are absent or incomplete.
- Tag jobs whose requested memory is clipped to a node's capacity with a `max_total_job_memory` keyword.
- Ensure the capped-memory marker is preserved during status updates, so the troubleshooter can distinguish hitting a node limit from a simple insufficient memory request.
- Enable status determination from additional job info (scheduler logs) when the primary output file cannot be found.
Transition state connectivity and bond orders are not uniquely defined (partial forming/breaking bonds at the saddle point), so enforcing isomorphism against a stored 2D adjacency list rejects valid TS geometries.
- species.ARCSpecies.mol_from_xyz: short-circuit to accept the perceived molecule for TSs without running check_xyz_isomorphism.
- scheduler: add is_ts guards at the three remaining check_xyz_isomorphism call sites (check_directed_scan, check_directed_scan_job, troubleshoot_scan_job). Matches the existing is_ts guards in parse_composite_geo and parse_opt_geo.
- Fix a secondary bug in the non-TS warning message: log perceived_mol in the "and:" branch (previously logged self.mol twice).
- Update TS_search.rst and advanced.rst to document the policy.
- Tests: TS mol_from_xyz acceptance, plus two mock-based scheduler tests covering check_directed_scan_job and troubleshoot_scan_job.
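The guard shape at each call site is the same; a sketch with an injected checker (the signature is illustrative, not ARC's):

```python
def accept_geometry(species, perceived_mol, check_xyz_isomorphism) -> bool:
    """TS connectivity is ill-defined at the saddle point, so accept the
    perceived molecule for TSs outright; run the 2D isomorphism check
    only for stable species."""
    if getattr(species, 'is_ts', False):
        return True  # short-circuit: don't enforce 2D isomorphism on a TS
    return check_xyz_isomorphism(species, perceived_mol)


class TS:
    is_ts = True


class Stable:
    is_ts = False


always_fail = lambda s, m: False
print(accept_geometry(TS(), None, always_fail))      # True  (guard fires)
print(accept_geometry(Stable(), None, always_fail))  # False (check runs)
```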
r_cut_p_cut_isomorphic matched purely on molecular-formula fingerprint OR graph isomorphism, so constitutional isomers sharing a formula (e.g. the alpha vs beta pentyl-ether radicals in H-abstraction between identical donors) got paired as "isomorphic" and handed to map_two_species, which then failed to superimpose them and returned None. Add a strict flag that requires full graph isomorphism, and run pairing_reactants_and_products_for_mapping in two passes: strict first, then the loose fingerprint-or-isomorphic fallback for any unmatched r-cuts so rearrangements whose cuts aren't strictly isomorphic (e.g. Intra_Disproportionation, ring-openings) still work. Also guard glue_maps against a None per-pair map so the caller sees a clean None instead of a TypeError when a future mapping regression slips through.
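The two-pass pairing can be sketched generically; cuts here are (formula, graph) tuples and the two predicates are injected, all illustrative stand-ins for ARC's species objects:

```python
def pair_two_pass(r_cuts, p_cuts, is_isomorphic, same_formula):
    """Pass 1 pairs only strictly graph-isomorphic cuts; pass 2 applies
    the loose formula-or-isomorphic fallback to whatever is left, so
    rearrangements whose cuts are not strictly isomorphic still pair."""
    pairs = {}      # r-cut index -> p-cut index
    used_p = set()
    for strict in (True, False):
        for i, r in enumerate(r_cuts):
            if i in pairs:
                continue
            for j, p in enumerate(p_cuts):
                if j in used_p:
                    continue
                ok = is_isomorphic(r, p) if strict else (
                    is_isomorphic(r, p) or same_formula(r, p))
                if ok:
                    pairs[i] = j
                    used_p.add(j)
                    break
    return pairs


# Two constitutional isomers sharing a formula (alpha vs beta radicals):
r_cuts = [('C5H11O', 'alpha'), ('C5H11O', 'beta')]
p_cuts = [('C5H11O', 'beta'), ('C5H11O', 'alpha')]
iso = lambda a, b: a[1] == b[1]       # stand-in for graph isomorphism
formula = lambda a, b: a[0] == b[0]   # stand-in for formula fingerprint
print(pair_two_pass(r_cuts, p_cuts, iso, formula))  # {0: 1, 1: 0}
```

A loose-only single pass would have greedily matched alpha to beta on formula alone; the strict pass prevents exactly that mispairing.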
map_rxn blindly handed fragment_maps to glue_maps. Any None entry (from map_two_species giving up on a pair) would crash inside glue_maps trying to subscript the None. Detect None entries up front, log which pairs failed, and either advance to the next product_dict index (same recovery path used by the existing template-order error branch) or return None so upstream consumers can fall back cleanly.
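The up-front None detection can be sketched as follows (the glue operation is simplified to concatenation; logging is a print stand-in):

```python
def glue_fragment_maps(fragment_maps):
    """Detect None per-pair maps (map_two_species gave up on that pair)
    before gluing, report which pairs failed, and return a clean None
    instead of crashing with a TypeError on subscripting None."""
    bad = [i for i, m in enumerate(fragment_maps) if m is None]
    if bad:
        print(f'atom mapping failed for fragment pair(s) {bad}; cannot glue')
        return None
    glued = []
    for m in fragment_maps:
        glued.extend(m)  # stand-in for the real map-gluing logic
    return glued


print(glue_fragment_maps([[0, 1], None]))    # None, with a diagnostic line
print(glue_fragment_maps([[0, 1], [2, 3]]))  # [0, 1, 2, 3]
```

Returning None lets upstream callers take the same recovery path as the existing template-order error branch instead of aborting on an exception.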
NMD (check_normal_mode_displacement) only needs the reactive bond set in reactant index space, which is canonically given by the RMG family recipe (BREAK_BOND / FORM_BOND / CHANGE_BOND actions) plus r_label_map. Previously NMD routed through reaction.get_bonds, which hard-required self.atom_map and raised otherwise, so any atom-mapping failure killed the whole TS validation even though the reactive bonds were directly computable. Three changes:
- get_bonds: move the atom_map check below the r_bonds_only=True short-circuit. Reactant bonds never touch atom_map, so r_bonds_only should not require one.
- _get_reactive_bonds_from_family: new helper that reads the family's actions (via ReactionFamily(...).actions) and resolves labels through r_label_map into reactant-indexed bond tuples.
- get_formed_and_broken_bonds / get_changed_bonds: when self.atom_map is None but family + product_dicts are present, use the helper. Otherwise unchanged. Emits a warning so the fallback path is diagnosable.
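The label-resolution step of the helper can be sketched like this; the (action, label, label) tuple shape is a simplification of RMG's recipe format, which also carries bond-order changes:

```python
def reactive_bonds_from_actions(actions, r_label_map):
    """Resolve family recipe actions (BREAK_BOND / FORM_BOND /
    CHANGE_BOND) through r_label_map into reactant-indexed bond tuples,
    sorted and deduplicated. No atom map is needed: the labels already
    live in reactant index space."""
    bonds = set()
    for action, label_1, label_2 in actions:
        if action in ('BREAK_BOND', 'FORM_BOND', 'CHANGE_BOND'):
            i, j = r_label_map[label_1], r_label_map[label_2]
            bonds.add(tuple(sorted((i, j))))
    return sorted(bonds)


# e.g. an H-abstraction-like recipe with labeled atoms *1, *2, *3:
actions = [('BREAK_BOND', '*1', '*2'), ('FORM_BOND', '*2', '*3')]
r_label_map = {'*1': 0, '*2': 4, '*3': 7}
print(reactive_bonds_from_actions(actions, r_label_map))  # [(0, 4), (4, 7)]
```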
Addition of CREST Adapter that complements the heuristic adapter.
This pull request adds support for the CREST conformer and transition state (TS) search method to the ARC project, along with several related improvements and code cleanups. The most important changes include integrating CREST as a TS search adapter, updating configuration and constants, and enhancing the heuristics TS search logic for better provenance tracking and code clarity.
CREST Integration:
Added CREST to JobEnum (arc/job/adapter.py), included CREST in the list of adapters and the RMG family mapping, and registered it as a default incore adapter (arc/job/adapters/common.py, arc/job/adapters/ts/__init__.py). Added tests for the adapter (arc/job/adapters/ts/crest_test.py) and updated the Makefile.
Constants and Configuration:
Added an angstrom_to_bohr conversion constant to both the Cython and Python constants modules (arc/constants.pxd, arc/constants.py).
Heuristics TS Search Enhancements and Refactoring:
Enhanced the heuristics TS search logic for better provenance tracking and code clarity (arc/job/adapters/ts/heuristics.py).