Skip 2D-graph isomorphism enforcement for TS species#876
Skip 2D-graph isomorphism enforcement for TS species#876
Conversation
Transition state connectivity and bond orders are not uniquely defined (partial forming/breaking bonds at the saddle point), so enforcing isomorphism against a stored 2D adjacency list rejects valid TS geometries. - species.ARCSpecies.mol_from_xyz: short-circuit to accept the perceived molecule for TSs without running check_xyz_isomorphism. - scheduler: add is_ts guards at the three remaining check_xyz_isomorphism call sites (check_directed_scan, check_directed_scan_job, troubleshoot_scan_job). Matches the existing is_ts guards in parse_composite_geo and parse_opt_geo. - Fix secondary bug in the non-TS warning message: log perceived_mol in the "and:" branch (previously logged self.mol twice). - Update TS_search.rst and advanced.rst to document the policy. - Tests: TS mol_from_xyz acceptance, and two mock-based scheduler tests covering check_directed_scan_job and troubleshoot_scan_job.
There was a problem hiding this comment.
Pull request overview
Adjusts ARC’s 2D-graph isomorphism policy so transition-state (TS) species are not rejected due to non-unique TS connectivity/bond-order definitions.
Changes:
- Skip 2D-graph isomorphism enforcement in
ARCSpecies.mol_from_xyzfor TS species and fix the non-TS warning to log the perceived molecule correctly. - Add
is_tsguards at remaining scheduler call sites that previously enforcedcheck_xyz_isomorphismfor TS-related flows. - Add/extend unit tests and update docs to reflect the TS isomorphism policy.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| docs/source/advanced.rst | Documents that isomorphism enforcement applies to non-TS species, and clarifies TS validation alternatives. |
| docs/source/TS_search.rst | States TS geometries are not required to match a single strict 2D graph; TS validation uses TS-specific checks. |
| arc/species/species.py | Skips isomorphism enforcement for TS species in mol_from_xyz; fixes warning output to reference perceived_mol. |
| arc/species/species_test.py | Adds a unit test asserting TS mol_from_xyz accepts the perceived molecule even when 2D graph disagrees. |
| arc/scheduler.py | Adds is_ts guards to skip check_xyz_isomorphism for TS in directed scan and troubleshooting paths. |
| arc/scheduler_test.py | Adds mock-based tests ensuring scheduler paths do not call check_xyz_isomorphism for TS species. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if self.is_ts: | ||
| if not self.keep_mol: | ||
| self.mol = perceived_mol | ||
| else: | ||
| try: | ||
| order_atoms(ref_mol=perceived_mol, mol=self.mol) | ||
| except SanitizationError: | ||
| else: | ||
| allow_nonisomorphic_2d = (self.charge is not None and self.charge) \ | ||
| or self.mol.has_charge() or perceived_mol.has_charge() \ | ||
| or (self.multiplicity is not None and self.multiplicity >= 3) \ | ||
| or self.mol.multiplicity >= 3 or perceived_mol.multiplicity >= 3 | ||
| isomorphic = self.check_xyz_isomorphism(mol=perceived_mol, | ||
| xyz=xyz, | ||
| allow_nonisomorphic_2d=allow_nonisomorphic_2d) | ||
| if not isomorphic: | ||
| logger.warning(f'XYZ and the 2D graph representation for {self.label} are not isomorphic.\nGot ' | ||
| f'xyz:\n{xyz}\n\nwhich corresponds to {self.mol.copy(deep=True).to_smiles()}\n' | ||
| f'{self.mol.copy(deep=True).to_adjacency_list()}\n\nand: ' | ||
| f'{perceived_mol.copy(deep=True).to_smiles()}\n' | ||
| f'{perceived_mol.copy(deep=True).to_adjacency_list()}') | ||
| raise SpeciesError(f'XYZ and the 2D graph representation for {self.label} are not compliant.') | ||
| if not self.keep_mol: | ||
| if is_mol_valid(perceived_mol, charge=self.charge, multiplicity=self.multiplicity, n_radicals=self.number_of_radicals): | ||
| self.mol = perceived_mol | ||
| else: | ||
| try: | ||
| order_atoms(ref_mol=perceived_mol, mol=self.mol) | ||
| except SanitizationError: | ||
| self.mol = perceived_mol |
There was a problem hiding this comment.
mol_from_xyz() no longer handles keep_mol=True: for TSs it does nothing (so atom order may stay inconsistent with xyz), and for non-TSs the order_atoms() fallback is now under if not self.keep_mol, which appears inverted relative to the docstring (“prevent the generation of a new Molecule object”). This likely breaks workflows that set keep_mol=True to preserve the existing Molecule while reordering atoms (e.g., E0-only species created with keep_mol=True). Consider restoring the previous branching: when keep_mol is True, attempt order_atoms(ref_mol=perceived_mol, mol=self.mol) (with a fallback to assigning self.mol = perceived_mol on sanitization failure); when keep_mol is False, assign self.mol = perceived_mol (optionally gated by is_mol_valid).
There was a problem hiding this comment.
the order_atoms() fallback was already gated under if not self.kee_mol on main. So both self.mol = perceived_mol assignment and the order_atoms fallback sat inside the same if not self.keep_mol block. So, keep_mol=True has always been don't touch self.mol. The PR perserves that exactly for non-TSs.
The TS branch is deliberately different: a transition state is not a species in the RMG sense, and a 2D molecular graph cannot represent the partial forming/breaking bonds at a saddle point
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #876 +/- ##
==========================================
- Coverage 60.48% 60.43% -0.05%
==========================================
Files 102 102
Lines 31102 31111 +9
Branches 8104 8109 +5
==========================================
- Hits 18811 18803 -8
+ Misses 9954 9952 -2
- Partials 2337 2356 +19
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Summary
ARCSpecies.mol_from_xyznow short-circuits the isomorphism check for TSs and simply accepts the perceived molecule.is_tsguards added at the three remainingcheck_xyz_isomorphismcall sites (check_directed_scan,check_directed_scan_job,troubleshoot_scan_job). Matches the existing guards inparse_composite_geoandparse_opt_geo.mol_from_xyznow logsperceived_molin the "and:" branch (it was loggingself.moltwice).TS_search.rstandadvanced.rstto describe the policy.Design note for reviewer
I kept the policy at the call sites rather than pushing an early-return into
check_xyz_isomorphismitself. Rationale: returningTruefrom a method named "check..." when no check ran conflates "passed" with "skipped", and several callers propagate the value into logs/restart dicts (self.output[label]['isomorphism'] += '... passed ...'). The scheduler already used this call-site-guard pattern atparse_composite_geo(line 2434) andparse_opt_geo(line 2537), so this keeps the codebase consistent.One thing worth a second look: at
check_directed_scan_job,is_isomorphic=Trueis now stored intorotors_dict[i]['directed_scan'][...]['is_isomorphic']for TSs. If any downstream consumer trusts that field as evidence a real check ran, a sentinel (Nonefor skipped) or a rename would be better. I didn't change it because the same semantic already exists in the docstring formol_from_xyz, but @alongd please weigh in.Test plan
arc.species.species_test.TestARCSpecies.test_ts_mol_from_xyz_skips_isomorphism_enforcement— TS accepts perceived xyz-derived molecule even when stored 2D graph disagreesarc.scheduler_test.TestScheduler.test_check_directed_scan_job_skips_isomorphism_for_ts— mock-based, assertscheck_xyz_isomorphismis NOT called andis_isomorphic=Trueis recordedarc.scheduler_test.TestScheduler.test_troubleshoot_scan_job_skips_isomorphism_for_ts— mock-based, assertscheck_xyz_isomorphismis NOT called andfinal_xyz/run_opt_jobpath runstest_mol_from_xyzandtest_check_xyz_isomorphismstill pass (non-TS behavior unchanged)cc @alongd — would value your eyes on the design note above, especially whether the call-site-guard pattern is the right call or whether you'd prefer centralizing in
check_xyz_isomorphismitself.