Conversation
There was a problem hiding this comment.
Pull request overview
Refactors Scheduler.schedule_jobs() to ensure TS conformer-stage completion is evaluated even when the per-job processing loop exits early (e.g., due to troubleshooting/break), preventing successful TS conformers from being ignored.
Changes:
- Removes inline “all conformer jobs done” / “all TSG jobs done” checks from the per-job event loop.
- Adds a second-phase per-iteration stage-transition pass via
_check_conformer_stage_complete(),_check_tsg_stage_complete(), and_check_species_complete(). - Adds explicit handling for “all conformers failed (no energy)” and an isomorphism-troubleshooting exhaustion guard.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #864 +/- ##
=======================================
Coverage 60.08% 60.09%
=======================================
Files 102 102
Lines 31043 31080 +37
Branches 8082 8095 +13
=======================================
+ Hits 18652 18676 +24
- Misses 10079 10082 +3
- Partials 2312 2322 +10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
dd827e2 to
9c4221a
Compare
Refactor the scheduling logic to separate job termination processing from stage transition decisions. By moving conformer, TS guess, and species completion checks into dedicated methods called at the end of the polling loop, the state machine becomes more robust and ensures transitions are evaluated consistently across different execution modes.
| return | ||
| if any('tsg' in j for j in self.running_jobs.get(label, [])): | ||
| return | ||
| if not self.species_dict[label].is_ts: |
There was a problem hiding this comment.
intuitively, I'd start with this check (will return from this func more often than the other checks)
| del self.running_jobs[label] | ||
| for label in list(self.unique_species_labels): | ||
| if label in self.output and self.output[label]['convergence'] is False: | ||
| continue |
There was a problem hiding this comment.
This skips failed species but still runs all three checks for already-converged species. Maybe instead, continue if convergence is not None?
| return | ||
| if any(label in {t.owner_key for t in p.tasks} | ||
| for p in self.active_pipes.values() | ||
| if any(t.task_family in ('conf_opt', 'conf_sp', 'ts_opt') for t in p.tasks)): |
There was a problem hiding this comment.
This checks ts_opt , but the function is about conformers. If a species has active ts_opt jobs but its conformers are all done, this guard would incorrectly block the conformer stage from completing. Restrict to ('conf_opt', 'conf_sp') only?
| """ | ||
| if 'tsg' not in self.job_dict.get(label, {}): | ||
| return | ||
| if any('tsg' in j for j in self.running_jobs.get(label, [])): |
There was a problem hiding this comment.
If TSG jobs are submitted through pipe mode, the method only checks self.running_jobs for 'tsg' entries but doesn't check self.active_pipes
| e is not None for e in self.species_dict[label].conformer_energies) | ||
|
|
||
| if not has_successful_conformer: | ||
| logger.error(f'All conformer jobs for {label} failed. ' |
There was a problem hiding this comment.
should we also mark a failed conformer state in the species or .output in scheduler? (I don't remember but we should have a f;lag for this that should be turned on here)
| ) | ||
| return sched, spc.label | ||
|
|
||
| def test_check_conformer_stage_complete_spawns_opt_for_ts(self): |
There was a problem hiding this comment.
The test covers the TS path. Add a test for non-TS species?
The Bug
When the last conformer job for a TS exhausted troubleshooting, determine_most_likely_ts_conformer was never called — 47 successful conformers were ignored and TS0 was marked "did not converge."
Refactoring
Structural fix — Two-Phase Loop:
Three new methods:
Edge cases found:
All guards are idempotent — Phase 2 runs every iteration but each check fires at most once per species per stage.