# Published validation roadmap (#198)

## Summary

The q2mm rewrite now has strong unit and integration coverage, but it still needs a repo-visible validation program proving that it can reproduce or improve upon published Q2MM results.

This umbrella issue tracks that program end-to-end.

## Validation program

### Check 1 — Evaluate published force fields with q2mm engines

Goal: load the force field actually published in the literature, evaluate it against the corresponding QM reference data with the new q2mm engine stack, and save the resulting evidence (metrics, golden fixtures, and force-field artifacts).

For each actionable published system, Check 1 should answer:

- Can q2mm load the published FF correctly?
- Can q2mm evaluate it against the original QM reference data?
- Does the resulting fit quality match expectations from the literature and legacy code?
- Are the results saved in committed, reproducible artifacts?
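The last question can be made concrete with a golden-fixture check: record the evaluation metrics once in a committed JSON file, then have tests compare fresh results against it within a tolerance. A minimal stdlib-only sketch follows; `save_golden`, `compare_to_golden`, the metric names, and the fixture path are hypothetical and not part of the q2mm API.

```python
from __future__ import annotations

import json
import math
from pathlib import Path


def save_golden(path: Path, metrics: dict[str, float]) -> None:
    """Write evaluation metrics to a JSON fixture (hypothetical helper).

    Keys are sorted so the committed file produces stable, reviewable diffs.
    """
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n")


def compare_to_golden(
    path: Path,
    metrics: dict[str, float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-12,
) -> list[str]:
    """Compare fresh metrics against a saved fixture (hypothetical helper).

    Returns human-readable mismatch descriptions; an empty list means parity.
    """
    golden = json.loads(path.read_text())
    problems = []
    for key in sorted(set(golden) | set(metrics)):
        if key not in metrics:
            problems.append(f"missing metric: {key}")
        elif key not in golden:
            problems.append(f"unexpected metric: {key}")
        elif not math.isclose(golden[key], metrics[key], rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append(f"{key}: golden={golden[key]!r} current={metrics[key]!r}")
    return problems
```

Returning a list of mismatches rather than asserting inside the helper lets a test (including an xfail-gated one) report every drifting metric at once.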

### Check 2 — Re-derive the published force field with q2mm optimizers

Goal: starting from the untrained (pre-optimization) FF, use q2mm's current optimization stack to recover parameters that are as good as or better than the published FF, ideally in less time than the legacy Q2MM workflow.

For each actionable published system, Check 2 should answer:

- Can q2mm converge from the starting FF to the published region of parameter space?
- Does the final objective score match or beat the published FF under the same evaluation path?
- How close are the re-derived parameters to the published parameters?
- What are the wall-time and evaluation-count costs?
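The first three Check 2 questions reduce to comparing two parameter sets and two objective scores produced by the same evaluation path. A minimal sketch, assuming parameters are keyed by a shared label and that a lower objective score is better; `compare_rederived` and `Check2Comparison` are hypothetical names, not existing q2mm code.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Check2Comparison:
    rms_param_diff: float      # root-mean-square difference over all parameters
    max_abs_param_diff: float  # largest single-parameter absolute difference
    worst_param: str           # label of the parameter with the largest difference
    score_delta: float         # rederived_score - published_score (negative = better)


def compare_rederived(
    published: dict[str, float],
    rederived: dict[str, float],
    published_score: float,
    rederived_score: float,
) -> Check2Comparison:
    """Summarize how far a re-derived FF is from the published one (hypothetical)."""
    if set(published) != set(rederived):
        mismatched = sorted(set(published) ^ set(rederived))
        raise ValueError(f"parameter sets differ: {mismatched}")
    if not published:
        raise ValueError("no parameters to compare")
    diffs = {k: rederived[k] - published[k] for k in published}
    worst = max(diffs, key=lambda k: abs(diffs[k]))
    rms = math.sqrt(sum(d * d for d in diffs.values()) / len(diffs))
    return Check2Comparison(rms, abs(diffs[worst]), worst, rederived_score - published_score)
```

Raw differences mix units (force constants, equilibrium lengths, angles), so a real report would likely group the comparison by parameter type or normalize each difference before aggregating.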

## Current state

PR #196 established the foundation for this roadmap (the Jaguar Hessian atomic-unit (AU) bug fix and the Rh-enamide Check 1 evaluation harness). Later PRs built on it:

- **PR #224** — Fixed MM3 FLD parsing to load standard parameters (not just substructure), closing #197
- **PR #223** — Implemented QFUERZA analytical frequency gradients (#208)
- **PR #225** — Documentation and code cleanup sweep

The Check 1 parity gap for Rh-enamide is now attributed to MM3 functional-form differences between MacroModel and OpenMM (not a parsing bug). The golden fixture and xfail gates remain in place.

## Systems inventory

| System | Published FF | Structures | QM reference | Check 1 status | Check 2 status |
|---|---|---|---|---|---|
| Rh-enamide (Donoghue 2008) | Yes | Yes | Yes | Harness complete; parity gap attributed to MM3 functional-form differences (#197, closed) | Not started |
| OsO4 dihydroxylation (Norrby 2000) | Yes | No | No | Blocked by missing training data | Blocked |
| Ru ketone hydrogenation (Hansen 2016) | Yes | Partial | No | Blocked by missing QM data | Blocked |
| Sulfone | Yes | No | No | Blocked by missing training data | Blocked |
| Pd / Heck-family systems | Partial | Partial | No | Blocked by missing FF/QM data | Blocked |

## Exit criteria for this umbrella issue

This issue is complete when:

1. At least one published system has a fully documented, committed Check 1 result with saved golden fixture(s) and force-field artifacts.
2. At least one published system has a fully documented, committed Check 2 result comparing re-derived vs published parameters and scores.
3. The workflow for adding additional published systems is documented and repeatable.
4. Each remaining blocked system either has its missing data sourced or is explicitly documented as blocked by missing external assets.

## Immediate next steps

1. Start Rh-enamide Check 2:
   - create the re-derivation test
   - save comparison fixtures and the re-derived FF
   - benchmark convergence quality and runtime
2. Expand to the next actionable published system once Rh-enamide has both Check 1 and Check 2 evidence.
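The benchmarking step in item 1 needs evaluation counts and wall time regardless of which optimizer runs. A minimal sketch, assuming the optimizer calls a plain Python callable that maps a parameter vector to a scalar objective (lower is better); `InstrumentedObjective` is a hypothetical helper, not an existing q2mm class.

```python
from __future__ import annotations

import time
from typing import Callable, Sequence


class InstrumentedObjective:
    """Wrap an objective to count calls, time them, and track best-so-far.

    Hypothetical helper; assumes minimization.
    """

    def __init__(self, fn: Callable[[Sequence[float]], float]):
        self._fn = fn
        self.n_evals = 0
        self.wall_time = 0.0            # seconds spent inside the objective only
        self.best = float("inf")        # lowest objective value seen so far
        self.history: list[float] = []  # best-so-far value after each call

    def __call__(self, params: Sequence[float]) -> float:
        start = time.perf_counter()
        value = self._fn(params)
        self.wall_time += time.perf_counter() - start
        self.n_evals += 1
        self.best = min(self.best, value)
        self.history.append(self.best)
        return value
```

Because the wrapper is an ordinary callable, it can be passed to any optimizer that accepts one (for example `scipy.optimize.minimize`). Note that `wall_time` covers only time spent inside the objective; timing the whole optimizer call separately would capture optimizer overhead as well, and `history` gives a convergence curve against evaluation count.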

## Notes

- The goal is not just passing tests; it is preserving scientific evidence in committed artifacts.
- Check 1 must come before Check 2 for a given system.
- Enantiomeric excess (ee%) prediction is explicitly out of scope until the published-system validation path is established.

