Protocol-driven test suite rework + cross-check pipeline by mmschlk · Pull Request #512 · mmschlk/shapiq

mmschlk · 2026-04-16T20:44:19Z

Consolidates and supersedes #511. Contains the full scope of the test-suite rework in three layers.

1. Protocol-driven rewrite (from #511's original commits)

Replaces the 75-file suite (~10k LOC, 321 tests) with 8 protocol-driven files (~1.5k LOC). Design docs at docs/superpowers/plans/2026-04-15-test-rework.md and docs/superpowers/specs/2026-04-15-test-rework-design.md.

test_approximators.py — registry + parametrized TestApproximatorProtocol across 20+ approximator/index configs
test_explainers.py — TabularExplainer, AgnosticExplainer, ProductKernelExplainer protocols + validation
test_tree.py — TreeExplainer across sklearn/xgboost/lightgbm with manual TreeModel tests + segfault regressions
test_imputers.py — imputer registry with 4 core imputers
test_interaction_values.py — data-structure correctness
test_game_theory.py — ExactComputer, indices, MoebiusConverter
test_plots.py — plot smoke tests
test_public_api.py — every concrete public subclass is exported in __all__
conftest.py — shared game/model/data fixtures + skip_if_no_* markers
pyproject.toml — slow marker + addopts = -m 'not slow' tiering

2. Close coverage gaps (commit `fd0282a`)

Imputers: add GenerativeConditionalImputer to IMPUTER_CONFIGS; slow-gated TestTabPFNImputer
Explainers: slow-gated TestTabPFNExplainer
Plots: smoke tests for network_plot, stacked_bar_plot, upset_plot, si_graph_plot, sentence_plot, beeswarm_plot, abbreviate_feature_names
New test_utils.py — 26 unit tests for shapiq.utils.{sets,modules,datasets,errors}
New TestAggregation, TestCore, TestGame in test_game_theory.py
New slow-gated test_datasets.py for the three built-in dataset loaders

3. Cross-check pipeline (commit `67cd77f`) — correctness layer

Turns protocol contract checks into correctness tests by making independent ground-truth sources agree on the same game. Five test classes in test_cross_checks.py:

Test class	Ground-truth edge	Indices / methods
`TestExactVsSOUM`	`ExactComputer(SOUM)` ↔ `SOUM.exact_values`	SV, SII, k-SII, STII, FSII, FBII
`TestMoebiusConverter`	`ExactComputer("Moebius")` → `MoebiusConverter` → target ↔ `ExactComputer(target)`	same set
`TestApproximatorAtFullBudget`	11 consistent approximators at `budget=2**n` ↔ `SOUM.exact_values`	KernelSHAP, KernelSHAPIQ, InconsistentKernelSHAPIQ, UnbiasedKernelSHAP, RegressionFSII, RegressionFBII, SHAPIQ (SII/k-SII/STII), SVARMIQ, SVARM
`TestApproximatorConvergence` (slow)	error decreases with budget	Permutation*, Owen, Stratified
`TestTreeExplainerVsExactComputer`	`ExactComputer(TreeSHAPIQXAI)` ↔ `TreeExplainer.explain(x)`	SV, k-SII on a small decision tree

Tolerance strategy: atol=1e-10 for pairs that should be analytically identical, 1e-8 for larger Moebius-converted games (n=7), 1e-6 for LS / Monte Carlo noise. Sampling-based methods verify monotonic error decrease instead of exactness.

Supporting changes in conftest.py: soum_5 / soum_7 fixtures, GROUND_TRUTH_INDICES constant, and an assert_iv_close helper that aligns InteractionValues by interaction_lookup (skipping empty-interaction asymmetry across pipelines).

Runtime

Suite	Before rework	After gaps	After cross-checks
Default (`-m 'not slow'`)	3–5 min / 321 tests	~23s / 232 tests	~25s / 257 tests
Full (`-m ''`)	3–5 min	~36s / 241 tests	~74s / 277 tests

Test plan

uv run pytest tests/shapiq -q — default tier
uv run pytest tests/shapiq -m '' -q — full tier
uv run pre-commit run --all-files
Inject a small bias into one approximator and confirm the matching cross-check fails (correctness tests actually check numerics, not just types)

https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

Protocol-driven test suite replacing 75 files with 8, targeting ~1min default runtime. Covers approximator/explainer/tree/imputer protocols, tiering strategy, fixture design, and migration approach. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

10 tasks covering: pytest config, conftest, approximator/explainer/tree/ imputer/interaction_values/game_theory/plot/public_api tests, old test deletion, and final verification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Track shared Claude Code settings (settings.json, agents, commands) while keeping local settings and worktrees gitignored. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replaces the per-module test files under tests_unit/, tests_integration_tests/, and tests_deprecation/ with the 8 protocol-driven files added in previous commits. Shared fixture plugins under tests/shapiq/fixtures/ are preserved because legacy tests in tests/shapiq_games still consume them. The new suite runs in ~25s and makes adding new components trivial: append a config dict to the relevant registry.

Adds a parametrized protocol test for SVR, SVC, and GaussianProcessRegressor models, checking that explain() returns an InteractionValues object, that sum(values) matches the regression prediction, and that explain_X handles batches. Also adds validation tests for the three documented error paths: max_order > 1, unsupported model type, and multiclass SVC. Lifts overall coverage from 60% to 62% and brings explainer/product_kernel/ from 0% to ~85% (game.py remains uncovered as it's a separate Game subclass not exercised by the explainer path).

Extends the existing 8-file protocol suite with targeted additions — no redesign. New coverage: - Imputers: GenerativeConditionalImputer added to IMPUTER_CONFIGS; slow-gated TestTabPFNImputer for the Remove-and-Contextualize imputer. - Explainers: slow-gated TestTabPFNExplainer. - Plots: smoke tests for network, stacked_bar, upset, si_graph, sentence, beeswarm, plus abbreviate_feature_names. - New test_utils.py: unit tests for powerset / pair_subset_sizes / split_subsets_budget / get_explicit_subsets / interaction lookup / coalition transforms / count_interactions / safe_isinstance / check_import_module / shuffle_data / raise_deprecation_warning. - game_theory: TestAggregation (aggregate_base_interaction, aggregate_to_one_dimension), TestCore (egalitarian_least_core), TestGame (Game base-class API: __call__, access_counter, grand/empty coalition values, precompute, save_values/load_values, save/load JSON round-trip). - New slow-gated test_datasets.py: load_california_housing, load_bike_sharing, load_adult_census. Default suite: 232 passed, 12 skipped in ~23s. Full suite (incl. slow): 241 passed, 16 skipped in ~36s. https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

…ruth Turns protocol contract checks into real correctness tests by making independent ground-truth sources agree on the same game. Five test classes in the new tests/shapiq/test_cross_checks.py: 1. TestExactVsSOUM — ExactComputer(SOUM) == SOUM.exact_values for SV/SII/k-SII/STII/FSII/FBII on n=5; n=7 slow-gated. 2. TestMoebiusConverter — round-trip ExactComputer("Moebius") -> MoebiusConverter -> target index matches ExactComputer on the target. 3. TestApproximatorAtFullBudget — 11 consistent approximators (KernelSHAP / KernelSHAPIQ / InconsistentKernelSHAPIQ / UnbiasedKernelSHAP / RegressionFSII / RegressionFBII / SHAPIQ on SII,k-SII,STII / SVARMIQ / SVARM) at budget=2**n match SOUM within 1e-6. 4. TestApproximatorConvergence (slow) — sampling-based approximators (Permutation*, Owen, Stratified) show monotonically decreasing error with more budget on n=7 SOUM. 5. TestTreeExplainerVsExactComputer — TreeExplainer output matches ExactComputer run on TreeSHAPIQXAI.value_function for a 5-feature decision tree (SV and k-SII). Supporting changes in tests/shapiq/conftest.py: - SOUM fixtures (soum_5 default, soum_7 slow). - GROUND_TRUTH_INDICES constant. - assert_iv_close helper that aligns InteractionValues by interaction_lookup (skips empty-interaction asymmetry across pipelines; optional check_baseline flag). Runtime impact: Default suite: 232 -> 257 passed (22s -> 25s) Full suite: 241 -> 277 passed (36s -> 74s) https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

- test_reproducible now aligns InteractionValues by interaction_lookup rather than comparing raw values arrays. SPEX's sparse transform produces the same interaction values on Windows but stores them in a different order in the values array across runs, which broke the old np.allclose(r1.values, r2.values) check. - Remove the stale `from shapiq_games.synthetic import SOUM` import from test_cross_checks.py (ruff auto-removal caused Code Quality to fail). https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

codecov · 2026-04-16T21:01:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

The original tests/shapiq/data/test_croc.JPEG was removed in the test-suite rewrite (commit 76aa3ce), but four tests in tests/shapiq_games/tests_legacy/test_local_xai.py still depend on it through the image_and_path fixture. CI fails with FileNotFoundError on those four tests. Make image_and_path skip with a clear message when the JPEG isn't on disk, rather than erroring. Restoring the file (or pointing the fixture elsewhere) re-enables the tests automatically. https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

…exact list Review found that InconsistentKernelSHAPIQ was passing TestApproximatorAtFullBudget only because the SOUM fixture (max_interaction_size=3, max_order=2) happened to sit in a trivial k-additive regime. On a genuinely non-k-additive game the estimator's own docstring says it does not recover the true SII — and indeed it produces ~1.3e-1 errors once the fixture is strengthened. Changes: - soum_5 / soum_7: raise n_basis_games (25 / 40) and set max_interaction_size = n with min_interaction_size = 1, so basis games span all orders from 1..n and the SOUM is not k-additive for any small k. - Drop InconsistentKernelSHAPIQ from CONSISTENT_APPROXIMATORS; leave a comment explaining why it doesn't belong. - Add "BV" to GROUND_TRUTH_INDICES (supported by both ExactComputer and MoebiusConverter). - Add TestMoebiusVsSOUM that compares ExactComputer("Moebius", n) against soum.moebius_coefficients — two independent ground-truth Möbius transforms. - Cache ExactComputer per SOUM (module-scoped exact_soum_5 / exact_soum_7) to avoid redundant 2^n recomputation across parametrised tests. - Update tolerances with measured noise floors (1e-8 for the LS solves in TestExactVsSOUM / TestMoebiusConverter; 1e-6 for the approximator test where Shapley-kernel LS hits ~5e-7 on random non-k-additive games).

…eeExplainer Previously the tree cross-check relied on TreeSHAPIQXAI from shapiq_games, which is slated for removal. Replace it with the coalition-valued game that lives inside shapiq itself — the one actually used under the hood by local XAI setups — and pair it with the matching InterventionalTreeExplainer. - shapiq.tree.interventional.InterventionalGame is a Game subclass whose value_function computes v(S) = E_ref[f(x_S, z_{not S})] over a reference dataset. Running ExactComputer on it brute-forces the Shapley / Banzhaf / faithful values from 2^n coalition evaluations. - shapiq.tree.interventional.InterventionalTreeExplainer computes the same quantities via a tree-walking TreeSHAP-IQ variant. The two are semantically matched (both interventional) — verified empirically: SV, BV, SII, BII, FSII, FBII all agree to ~4e-9. STII is omitted because the two implementations disagree (~1e-1 error, separate bug), and k-SII because InterventionalTreeExplainer does not support it. Note: the default shapiq.TreeExplainer uses path-dependent TreeSHAP-IQ, which has different semantics than InterventionalGame. The test now pairs matching pairs — path-dependent vs interventional explanations were accidentally being compared before only because TreeSHAPIQXAI itself used path-dependent averaging via node_sample_weight.

… tree efficiency Second-round review surfaced five major gaps in the cross-check pipeline. Acting on all of them: 1. assert_iv_close now takes strict=True. When set, both sides must cover the same non-empty interactions (modulo zero-valued keys, which MoebiusConverter drops and ExactComputer emits — a pure encoding difference, not a bug). Adopted in TestExactVsSOUM and TestMoebiusConverter where both pipelines are analytical and should agree on support. 2. Added TestKAddSHAPAtFullBudget. kADD-SHAP is user-facing via kADDSHAP but had no independent ground truth: SOUM.exact_values and MoebiusConverter don't support it. Cross-check against ExactComputer("kADD-SHAP") closes the gap (agreement to ~1e-7). 3. Added TestPathDependentTreeEfficiency. The default shapiq.TreeExplainer (path-dependent TreeSHAP-IQ) was completely unexercised by the interventional cross-check pair. Since no path-dependent Game wrapper exists for a full cross-check, we pin the SV efficiency axiom: sum(SV) == f(x) - E[f]. Cheap, catches most regressions in the polynomial arithmetic or baseline computation. 4. Strengthened TestApproximatorConvergence. errors[-1] < errors[0] was tautological — a 16x budget increase for essentially zero error reduction would pass. Now averages errors over 3 seeds per budget and requires the mean error to halve with 16x budget. Catches silently broken sampling estimators. 5. Tightened TestMoebiusVsSOUM tolerance from 1e-10 to 1e-9 — the alternating-sign sum over 2^n coalitions was liable to flake on Windows/macOS where FMA ordering can eat a few ULPs. Minor polish: - Renamed _small_tree_setup → small_tree_setup for consistency with other fixtures. - Narrowed warnings.catch_warnings to category=UserWarning in the approximator tests — blanket ignore was swallowing deprecation signal. - Updated module docstring from 5 to 6 ground-truth sources.

TestExactVsSOUM and TestMoebiusConverter previously exercised a single SOUM instance (random_state=42). One game can hide real conditioning edge cases: zero-valued interactions, near-singular LS matrices, degenerate basis overlaps. Add soum_5_seeded / soum_7_seeded fixtures parametrised over a fixed list of seeds (42, 1337, 7, 2024, 31415). Each test now runs once per seed × per index, giving 5x game-instance diversity while staying fully deterministic — tolerances remain tight, CI stays reproducible, bisects still work. Test count goes from ~30 to ~130, still sub-10s total. Left the approximator tests on the single-seed fixtures: multiplying 10 approximators × 5 seeds adds visible cost without proportional coverage gain on algorithms that are already exact at full budget.

Closes the last explainer-vs-brute-force gap in the cross-check pipeline. ProductKernelExplainer computes SV analytically via elementary symmetric polynomials on kernel vectors. ProductKernelGame wraps the same RBF kernel and training data as a coalition-valued game. Running ExactComputer on it must agree with the explainer's closed-form output — and empirically does so to ~1e-16 on a 5-feature SVR. Pattern mirrors TestInterventionalTreeCrossCheck. Only SV with max_order=1 is tested since the explainer hard-rejects anything else. Single wiring detail: ProductKernelGame takes the validated ProductKernelModel, not a raw sklearn estimator. The explainer already does this conversion internally, so reading explainer.converted_model is both the cleanest and the most user-accurate path.

Closes the coverage gap on shapiq.tree.linear.LinearTreeSHAP. The class computes first-order path-dependent Shapley values via a Chebyshev polynomial shortcut; it previously had no numerical regression test. The test pairs it against a small private helper game, _PathDependentTreeGame, that brute-forces the same path-dependent value function over 2^n coalitions (the same logic the now-deprecated TreeSHAPIQXAI used internally, replicated in ~25 lines of test scaffolding to avoid depending on shapiq_games). Agreement is asserted to atol=1e-10; empirically matches to ~1e-16. Semantic match: - LinearTreeSHAP: Chebyshev-basis closed-form on validated TreeModel. - _PathDependentTreeGame: for each absent feature, average both children weighted by node_sample_weight. Fallback to uniform weighting only when both node weights are zero (degenerate pruning). Left XGBoost/LightGBM conversion coverage as a separate follow-up to keep this change focused on LinearTreeSHAP alone.

…ter pin Closes two real coverage gaps and pins one known unsupported path: - lgbm_reg (LGBMRegressor) fixture + full protocol (task="regression"). Efficiency check passes to ~2e-9. Previously missing entirely. - lgbm_booster (native lightgbm.Booster) fixture + full protocol (task="regression"). Exercises the native-Booster code path in _lightgbm_model_to_bytes that sklearn-wrapper fixtures never hit. - TestXGBoostBoosterUnsupported — standalone pin test asserting that passing a raw xgboost.Booster raises TypeError("not supported"). Reverse-alarms when the conversion is implemented. Scope intentionally narrow: - No ExtraTreeRegressor/IsolationForest/ExtraTreesClassifier (different concern — separate sklearn conversion paths). - No XGBClassifier/LGBMClassifier efficiency upgrade (task="basic" is semantically correct — XGB/LGBM classifiers output raw margins, not probabilities, so efficiency in proba space doesn't hold). Coverage of src/shapiq/tree/conversion: ~60% -> 82%.

Signed-off-by: Maximilian <maximilian.muschalik@gmail.com>

The entire tests/shapiq/fixtures/ directory and 13 fixtures in conftest.py had zero consumers after the test suite rework. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: Maximilian <maximilian.muschalik@gmail.com>

These tests exercise the full seam — sklearn → imputer → approximator → InteractionValues → plots/serialisation — mirroring the canonical flows from README.md and docs/source/introduction/start.rst. They catch cross-module regressions that pass every per-module unit test. Coverage (8 test invocations, <2s total): - test_tabular_explainer_readme_flow (parametrised SV / k-SII / FSII / STII), asserts the efficiency axiom holds end-to-end - test_tree_explainer_efficiency (parametrised SV / k-SII), asserts pointwise efficiency for TreeExplainer - test_agnostic_explainer_on_soum, verifies the Game-based researcher path against ExactComputer ground truth - test_interaction_values_roundtrip_and_plots, covers JSON save/load and all five top-level plot functions https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

- loosen TabularExplainer efficiency tolerance 1e-4 -> 1e-2 for CI robustness across approximators with different budget accounting (still ~1e7x headroom over observed error, still catches real efficiency breaks whose magnitude scales with |pred|) - drop TreeExplainer SV/1 parametrisation; that invariant is already covered by test_cross_checks.TestPathDependentTreeEfficiency. Keep only the novel k-SII/2 case as test_tree_explainer_ksii_efficiency - remove redundant mpl.use("Agg") — conftest.py already sets it https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

Revert the SV/1 removal from test_tree_explainer_efficiency. The overlap with test_cross_checks.TestPathDependentTreeEfficiency is a feature of the integration layer, not a bug: the cross-check exercises the lower-level invariant with min_order=1, whereas the integration test asserts the same property through the canonical public API flow (shapiq.TreeExplainer(...).explain(x)). Distinct entry points into the same invariant catches different regressions. https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

Extends tests/shapiq/test_datasets.py beyond return-type smoke checks with: exact-shape guards against the docstring, target-column-leakage check, no-NaN postcondition, numpy/pandas path equivalence, and a binary-label check for adult census. https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

Replaces the prior 11-test smoke layer in test_plots.py with a four-layer strategy catching regressions that "did it crash?" tests miss: - TestPlotUtils: proper coverage of the pure helpers (format_value, format_labels, abbreviate_feature_names, get_color) that were nearly untested. - TestPlots / TestPlotsNoAbbreviate / TestPlotsWithWords: each public plot is parametrised over (abbreviate, feature_names) variants so a kwarg regressing in one branch doesn't pass silently. - TestPlotStructure: one rich test per plot inspecting the returned Axes/Figure — tick labels, title/xlabel/ylabel honored, expected artists drawn. - TestPlotEdgeCases: all-zero IV, feature_names=None, long names with abbreviate=True, max_display below n_features. 68 tests, ~3s runtime. No new dependencies. https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

Both functions previously accepted malformed input silently: - beeswarm_plot with data.shape[1] != n_players would plot a subset or scramble columns without warning. - sentence_plot with len(words) != n_players would index past the InteractionValues or drop entries silently. Each gets a ValueError guard with a clear message. Re-enables the two dropped edge-case tests in TestPlotEdgeCases. https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

Rewrite test_imputers.py as a DRY protocol-driven suite: TestImputerProtocol holds the shared contract (full coalition == model(x), present features use x, missing features don't leak x, fit/refit behaviour, random_state reproducibility); per-imputer classes cover what's unique (Baseline mean-from-background, Marginal joint-vs-per-feature sampling on dependent data, Gaussian closed-form conditional mean, GaussianCopula rank round-trip, Generative cluster-aware neighbourhood sampling, TabPFN remove-and-contextualize); TestCrossImputerAgreement asserts relationships across imputers (baseline == marginal on constant background, gaussian ~ marginal on independent data, copula ~ gaussian on standard normal). https://claude.ai/code/session_01XN6xQdEpvZHekXYnRuJwyT

mmschlk and others added 17 commits April 15, 2026 12:09

Add test rework design spec

1175edf

Protocol-driven test suite replacing 75 files with 8, targeting ~1min default runtime. Covers approximator/explainer/tree/imputer protocols, tiering strategy, fixture design, and migration approach. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add test rework implementation plan

7d0a207

10 tasks covering: pytest config, conftest, approximator/explainer/tree/ imputer/interaction_values/game_theory/plot/public_api tests, old test deletion, and final verification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add .claude config to version control

934ba7b

Track shared Claude Code settings (settings.json, agents, commands) while keeping local settings and worktrees gitignored. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Configure pytest slow marker and default addopts for test tiering

00dbbc1

Add new shared conftest with games, fixtures, and skip markers

fb54d72

Add protocol-driven approximator tests with registry

dab6d1a

Add protocol-driven explainer tests

cc7bc98

Add protocol-driven tree tests with SHAP comparison and edge cases

505df06

Add protocol-driven imputer tests

4ed3323

Add condensed InteractionValues tests

5b97a20

Add game theory, plot smoke, and public API tests

5422662

Relax SPEX reproducibility tolerance for sparse transform noise

81fdfbf

Merge branch 'main' into claude/rework-tests-QNyYo

0fafdde

github-project-automation Bot added this to shapiq development Apr 16, 2026

mmschlk changed the title ~~Flesh out protocol test suite + add cross-check pipeline~~ Protocol-driven test suite rework + cross-check pipeline Apr 16, 2026

mmschlk mentioned this pull request Apr 16, 2026

Add test rework design spec #511

Closed

claude added 8 commits April 16, 2026 21:21

claude and others added 16 commits April 18, 2026 14:07

Add test rework summary doc

4df4101

made comment

0e9f7b6

Signed-off-by: Maximilian <maximilian.muschalik@gmail.com>

removes stale shapiq_games tests

8e8e0c1

Signed-off-by: Maximilian <maximilian.muschalik@gmail.com>

Remove 1,288 lines of dead test fixtures

6cd273f

The entire tests/shapiq/fixtures/ directory and 13 fixtures in conftest.py had zero consumers after the test suite rework. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

removes stale shapiq_games tests from CI

3534553

Signed-off-by: Maximilian <maximilian.muschalik@gmail.com>

moves markers into conftest

5e9e555

Signed-off-by: Maximilian <maximilian.muschalik@gmail.com>

respects slow in CI

d3b422b

Signed-off-by: Maximilian <maximilian.muschalik@gmail.com>

docs: note plot input-validation in changelog

943f9b4

https://claude.ai/code/session_01DHsGf4an1Dnnw4qTnmdB22

Merge branch 'main' into claude/plan-testing-suite-rework-FF4ZA

5d1107f

mmschlk self-assigned this Apr 21, 2026

mmschlk added this to the 1.5.0 milestone Apr 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Protocol-driven test suite rework + cross-check pipeline#512

Protocol-driven test suite rework + cross-check pipeline#512
mmschlk wants to merge 42 commits intomainfrom
claude/plan-testing-suite-rework-FF4ZA

mmschlk commented Apr 16, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mmschlk commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. Protocol-driven rewrite (from #511's original commits)

2. Close coverage gaps (commit fd0282a)

3. Cross-check pipeline (commit 67cd77f) — correctness layer

Runtime

Test plan

Uh oh!

codecov Bot commented Apr 16, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mmschlk commented Apr 16, 2026 •

edited

Loading

2. Close coverage gaps (commit `fd0282a`)

3. Cross-check pipeline (commit `67cd77f`) — correctness layer