Published validation roadmap #198

@ericchansen

Summary

The q2mm rewrite now has strong unit and integration coverage, but it still needs a repo-visible validation program proving that it can reproduce or improve upon published Q2MM results.

This umbrella issue tracks that program end-to-end.

Validation program

Check 1 — Evaluate published force fields with q2mm engines

Goal: load the force field actually published in the literature, evaluate it against the corresponding QM reference data with the new q2mm engine stack, and save the resulting evidence (metrics, golden fixtures, and force-field artifacts).

For each actionable published system, Check 1 should answer:

  • Can q2mm load the published FF correctly?
  • Can q2mm evaluate it against the original QM reference data?
  • Does the resulting fit quality match expectations from the literature and legacy code?
  • Are the results saved in committed, reproducible artifacts?
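One way to keep Check 1 evidence committed and reproducible is a small golden-fixture helper. The sketch below is illustrative only: `save_golden`, `compare_to_golden`, the metric names, and the numbers are hypothetical placeholders, not part of the q2mm API and not real results. It assumes the evaluation step yields a flat mapping of metric names to floats.

```python
import json
import math
import tempfile
from pathlib import Path


def save_golden(path: Path, metrics: dict[str, float]) -> None:
    """Write metrics as JSON with sorted keys so fixture diffs stay readable."""
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n")


def compare_to_golden(
    path: Path, metrics: dict[str, float], rel_tol: float = 1e-6
) -> dict[str, tuple[float, float]]:
    """Return {name: (golden, current)} for each drifted or missing metric."""
    golden = json.loads(path.read_text())
    mismatches = {}
    for name in sorted(set(golden) | set(metrics)):
        # A metric missing on either side becomes NaN, which never compares close.
        old = golden.get(name, math.nan)
        new = metrics.get(name, math.nan)
        if not math.isclose(old, new, rel_tol=rel_tol):
            mismatches[name] = (old, new)
    return mismatches


if __name__ == "__main__":
    # Placeholder values for illustration; not measured results.
    with tempfile.TemporaryDirectory() as tmp:
        fixture = Path(tmp) / "check1_golden.json"
        save_golden(fixture, {"objective_score": 1.5})
        print(compare_to_golden(fixture, {"objective_score": 1.5}))
```

An empty result means the current evaluation reproduces the committed fixture within tolerance; any entry in the result is a metric that needs review before the fixture is regenerated.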

Check 2 — Re-derive the published force field with q2mm optimizers

Goal: starting from the untrained or pre-optimization FF, use q2mm's current optimization stack to recover parameters that are as good as or better than the published FF, and ideally do so faster.

For each actionable published system, Check 2 should answer:

  • Can q2mm converge from the starting FF to the published region of parameter space?
  • Does the final objective score match or beat the published FF under the same evaluation path?
  • How close are the re-derived parameters to the published parameters?
  • What are the wall-time and evaluation-count costs?
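The Check 2 questions above could be answered with bookkeeping along these lines. This is a minimal sketch under stated assumptions: parameters are a flat name-to-value mapping, a lower objective score is better, and `CountingObjective`, `parameter_deviation`, and the toy objective are hypothetical names for illustration, not q2mm APIs.

```python
import time
from typing import Callable

Params = dict[str, float]


class CountingObjective:
    """Wrap an objective so every call is counted and timed."""

    def __init__(self, fn: Callable[[Params], float]) -> None:
        self.fn = fn
        self.n_evals = 0
        self.wall_time_s = 0.0

    def __call__(self, params: Params) -> float:
        start = time.perf_counter()
        try:
            return self.fn(params)
        finally:
            self.n_evals += 1
            self.wall_time_s += time.perf_counter() - start


def parameter_deviation(published: Params, rederived: Params) -> Params:
    """Relative deviation |new - old| / |old| for parameters present in both sets."""
    eps = 1e-12  # guards against division by zero for zero-valued parameters
    return {
        name: abs(rederived[name] - old) / max(abs(old), eps)
        for name, old in published.items()
        if name in rederived
    }


if __name__ == "__main__":
    # Toy objective standing in for the real evaluation path (assumption).
    objective = CountingObjective(lambda p: sum(v * v for v in p.values()))
    published = {"k_bond": 2.0, "r0": 1.0}   # placeholder values
    rederived = {"k_bond": 1.9, "r0": 1.0}   # placeholder values
    score_pub, score_new = objective(published), objective(rederived)
    print("matches or beats published:", score_new <= score_pub)
    print("deviations:", parameter_deviation(published, rederived))
    print("evaluations:", objective.n_evals)
```

In a real run the wrapped objective would be handed to the optimizer, so `n_evals` and `wall_time_s` reflect the full re-derivation rather than the two calls shown here.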

Current state

PR #196 established the foundation for this roadmap (Jaguar Hessian AU bug fix, Rh-enamide Check 1 evaluation harness). Subsequent PRs advanced the state:

The Check 1 parity gap for Rh-enamide is now attributed to differences in the MM3 functional form between MacroModel and OpenMM, not to a parsing bug. The golden fixture and xfail gates remain in place.

Systems inventory

| System | Published FF | Structures | QM reference | Check 1 status | Check 2 status |
|---|---|---|---|---|---|
| Rh-enamide (Donoghue 2008) | Yes | Yes | Yes | Harness complete; parity gap attributed to MM3 functional-form differences (#197, closed) | Not started |
| OsO4 dihydroxylation (Norrby 2000) | Yes | No | No | Blocked by missing training data | Blocked |
| Ru ketone hydrogenation (Hansen 2016) | Yes | Partial | No | Blocked by missing QM data | Blocked |
| Sulfone | Yes | No | No | Blocked by missing training data | Blocked |
| Pd / Heck-family systems | Partial | Partial | No | Blocked by missing FF/QM data | Blocked |
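The inventory could also be kept machine-readable so blocked systems and their missing assets are easy to list. The sketch below encodes the rows above; the `PublishedSystem` class and the rule that a system is actionable only when all three assets are "Yes" are assumptions made for illustration, not an existing q2mm structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedSystem:
    name: str
    published_ff: str   # "Yes", "Partial", or "No"
    structures: str
    qm_reference: str

    def missing_assets(self) -> list[str]:
        """List every asset that is not fully available."""
        assets = {
            "published FF": self.published_ff,
            "structures": self.structures,
            "QM reference": self.qm_reference,
        }
        return [label for label, status in assets.items() if status != "Yes"]

    def is_actionable(self) -> bool:
        # Assumed rule: all three assets must be fully present.
        return not self.missing_assets()


INVENTORY = [
    PublishedSystem("Rh-enamide (Donoghue 2008)", "Yes", "Yes", "Yes"),
    PublishedSystem("OsO4 dihydroxylation (Norrby 2000)", "Yes", "No", "No"),
    PublishedSystem("Ru ketone hydrogenation (Hansen 2016)", "Yes", "Partial", "No"),
    PublishedSystem("Sulfone", "Yes", "No", "No"),
    PublishedSystem("Pd / Heck-family systems", "Partial", "Partial", "No"),
]

actionable = [s.name for s in INVENTORY if s.is_actionable()]

if __name__ == "__main__":
    for system in INVENTORY:
        status = "actionable" if system.is_actionable() else "blocked"
        print(f"{system.name}: {status} {system.missing_assets()}")
```

Under that rule only Rh-enamide is actionable today, which is consistent with the status columns above.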

Exit criteria for this umbrella issue

This issue is complete when:

  1. At least one published system has a fully documented, committed Check 1 result with saved golden fixture(s) and force-field artifacts.
  2. At least one published system has a fully documented, committed Check 2 result comparing re-derived vs published parameters and scores.
  3. The workflow for adding additional published systems is documented and repeatable.
  4. Remaining blocked systems are either sourced with data or explicitly documented as blocked by missing external assets.

Immediate next steps

  1. Start Rh-enamide Check 2:
    • create the re-derivation test
    • save comparison fixtures and the re-derived FF
    • benchmark convergence quality and runtime
  2. Expand to the next actionable published system once Rh-enamide has both Check 1 and Check 2 evidence.

Notes

  • The goal is not just passing tests; it is preserving scientific evidence in committed artifacts.
  • Check 1 must come before Check 2 for a given system.
  • ee% prediction is explicitly out of scope until the published-system validation path is established.


Labels: roadmap (Umbrella tracking issues), validation (Parity checks and published FF comparisons)
