Skip to content

Skip 2D-graph isomorphism enforcement for TS species#876

Open
calvinp0 wants to merge 1 commit intomainfrom
skip-ts-isomorphism-checks
Open

Skip 2D-graph isomorphism enforcement for TS species#876
calvinp0 wants to merge 1 commit intomainfrom
skip-ts-isomorphism-checks

Conversation

@calvinp0
Copy link
Copy Markdown
Member

Summary

  • ARC was enforcing 2D-graph isomorphism on TSs, but TS connectivity/bond-orders are not uniquely defined (partial forming/breaking bonds), so valid TS geometries were being rejected.
  • ARCSpecies.mol_from_xyz now short-circuits the isomorphism check for TSs and simply accepts the perceived molecule.
  • Scheduler: is_ts guards added at the three remaining check_xyz_isomorphism call sites (check_directed_scan, check_directed_scan_job, troubleshoot_scan_job). Matches the existing guards in parse_composite_geo and parse_opt_geo.
  • Secondary fix: the non-TS warning message in mol_from_xyz now logs perceived_mol in the "and:" branch (it was logging self.mol twice).
  • Docs updated in TS_search.rst and advanced.rst to describe the policy.

Design note for reviewer

I kept the policy at the call sites rather than pushing an early-return into check_xyz_isomorphism itself. Rationale: returning True from a method named "check..." when no check ran conflates "passed" with "skipped", and several callers propagate the value into logs/restart dicts (self.output[label]['isomorphism'] += '... passed ...'). The scheduler already used this call-site-guard pattern at parse_composite_geo (line 2434) and parse_opt_geo (line 2537), so this keeps the codebase consistent.

One thing worth a second look: at check_directed_scan_job, is_isomorphic=True is now stored into rotors_dict[i]['directed_scan'][...]['is_isomorphic'] for TSs. If any downstream consumer trusts that field as evidence a real check ran, a sentinel (None for skipped) or a rename would be better. I didn't change it because the same semantic already exists in the docstring for mol_from_xyz, but @alongd please weigh in.

Test plan

  • arc.species.species_test.TestARCSpecies.test_ts_mol_from_xyz_skips_isomorphism_enforcement — TS accepts perceived xyz-derived molecule even when stored 2D graph disagrees
  • arc.scheduler_test.TestScheduler.test_check_directed_scan_job_skips_isomorphism_for_ts — mock-based, asserts check_xyz_isomorphism is NOT called and is_isomorphic=True is recorded
  • arc.scheduler_test.TestScheduler.test_troubleshoot_scan_job_skips_isomorphism_for_ts — mock-based, asserts check_xyz_isomorphism is NOT called and final_xyz / run_opt_job path runs
  • Existing test_mol_from_xyz and test_check_xyz_isomorphism still pass (non-TS behavior unchanged)
  • Full suite run in CI

cc @alongd — would value your eyes on the design note above, especially whether the call-site-guard pattern is the right call or whether you'd prefer centralizing in check_xyz_isomorphism itself.

Transition state connectivity and bond orders are not uniquely defined
(partial forming/breaking bonds at the saddle point), so enforcing
isomorphism against a stored 2D adjacency list rejects valid TS
geometries.

- species.ARCSpecies.mol_from_xyz: short-circuit to accept the perceived
  molecule for TSs without running check_xyz_isomorphism.
- scheduler: add is_ts guards at the three remaining check_xyz_isomorphism
  call sites (check_directed_scan, check_directed_scan_job,
  troubleshoot_scan_job). Matches the existing is_ts guards in
  parse_composite_geo and parse_opt_geo.
- Fix secondary bug in the non-TS warning message: log perceived_mol
  in the "and:" branch (previously logged self.mol twice).
- Update TS_search.rst and advanced.rst to document the policy.
- Tests: TS mol_from_xyz acceptance, and two mock-based scheduler tests
  covering check_directed_scan_job and troubleshoot_scan_job.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts ARC’s 2D-graph isomorphism policy so transition-state (TS) species are not rejected due to non-unique TS connectivity/bond-order definitions.

Changes:

  • Skip 2D-graph isomorphism enforcement in ARCSpecies.mol_from_xyz for TS species and fix the non-TS warning to log the perceived molecule correctly.
  • Add is_ts guards at remaining scheduler call sites that previously enforced check_xyz_isomorphism for TS-related flows.
  • Add/extend unit tests and update docs to reflect the TS isomorphism policy.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
docs/source/advanced.rst Documents that isomorphism enforcement applies to non-TS species, and clarifies TS validation alternatives.
docs/source/TS_search.rst States TS geometries are not required to match a single strict 2D graph; TS validation uses TS-specific checks.
arc/species/species.py Skips isomorphism enforcement for TS species in mol_from_xyz; fixes warning output to reference perceived_mol.
arc/species/species_test.py Adds a unit test asserting TS mol_from_xyz accepts the perceived molecule even when 2D graph disagrees.
arc/scheduler.py Adds is_ts guards to skip check_xyz_isomorphism for TS in directed scan and troubleshooting paths.
arc/scheduler_test.py Adds mock-based tests ensuring scheduler paths do not call check_xyz_isomorphism for TS species.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread arc/species/species.py
Comment on lines +1644 to +1669
if self.is_ts:
if not self.keep_mol:
self.mol = perceived_mol
else:
try:
order_atoms(ref_mol=perceived_mol, mol=self.mol)
except SanitizationError:
else:
allow_nonisomorphic_2d = (self.charge is not None and self.charge) \
or self.mol.has_charge() or perceived_mol.has_charge() \
or (self.multiplicity is not None and self.multiplicity >= 3) \
or self.mol.multiplicity >= 3 or perceived_mol.multiplicity >= 3
isomorphic = self.check_xyz_isomorphism(mol=perceived_mol,
xyz=xyz,
allow_nonisomorphic_2d=allow_nonisomorphic_2d)
if not isomorphic:
logger.warning(f'XYZ and the 2D graph representation for {self.label} are not isomorphic.\nGot '
f'xyz:\n{xyz}\n\nwhich corresponds to {self.mol.copy(deep=True).to_smiles()}\n'
f'{self.mol.copy(deep=True).to_adjacency_list()}\n\nand: '
f'{perceived_mol.copy(deep=True).to_smiles()}\n'
f'{perceived_mol.copy(deep=True).to_adjacency_list()}')
raise SpeciesError(f'XYZ and the 2D graph representation for {self.label} are not compliant.')
if not self.keep_mol:
if is_mol_valid(perceived_mol, charge=self.charge, multiplicity=self.multiplicity, n_radicals=self.number_of_radicals):
self.mol = perceived_mol
else:
try:
order_atoms(ref_mol=perceived_mol, mol=self.mol)
except SanitizationError:
self.mol = perceived_mol
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mol_from_xyz() no longer handles keep_mol=True: for TSs it does nothing (so atom order may stay inconsistent with xyz), and for non-TSs the order_atoms() fallback is now under if not self.keep_mol, which appears inverted relative to the docstring (“prevent the generation of a new Molecule object”). This likely breaks workflows that set keep_mol=True to preserve the existing Molecule while reordering atoms (e.g., E0-only species created with keep_mol=True). Consider restoring the previous branching: when keep_mol is True, attempt order_atoms(ref_mol=perceived_mol, mol=self.mol) (with a fallback to assigning self.mol = perceived_mol on sanitization failure); when keep_mol is False, assign self.mol = perceived_mol (optionally gated by is_mol_valid).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the order_atoms() fallback was already gated under if not self.kee_mol on main. So both self.mol = perceived_mol assignment and the order_atoms fallback sat inside the same if not self.keep_mol block. So, keep_mol=True has always been don't touch self.mol. The PR perserves that exactly for non-TSs.

The TS branch is deliberately different: a transition state is not a species in the RMG sense, and a 2D molecular graph cannot represent the partial forming/breaking bonds at a saddle point

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.43%. Comparing base (70dab25) to head (6efe451).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #876      +/-   ##
==========================================
- Coverage   60.48%   60.43%   -0.05%     
==========================================
  Files         102      102              
  Lines       31102    31111       +9     
  Branches     8104     8109       +5     
==========================================
- Hits        18811    18803       -8     
+ Misses       9954     9952       -2     
- Partials     2337     2356      +19     
Flag Coverage Δ
functionaltests 60.43% <ø> (-0.05%) ⬇️
unittests 60.43% <ø> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants