Add explainers for nearest neighbor models #487
Zaphoood wants to merge 89 commits into mmschlk:main
Conversation
Pull request overview
This PR introduces dedicated explainers for scikit-learn nearest-neighbor classifiers to support efficient data-valuation-style explanations, and adds accompanying benchmark games, tests, and documentation.
Changes:
- Add NN explainers (`KNNExplainer`, `WeightedKNNExplainer`, `ThresholdNNExplainer`) and benchmark games for exact/ground-truth comparisons.
- Extend explainer dispatch to recognize scikit-learn KNN/RadiusNeighbors classifiers and add utility APIs for converting first-order arrays to/from `InteractionValues`.
- Add unit tests/fixtures and a documentation notebook covering NN data valuation.
Reviewed changes
Copilot reviewed 27 out of 28 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `src/shapiq/explainer/utils.py` | Adds explainer type dispatch for sklearn KNN/RadiusNeighbors models. |
| `src/shapiq/explainer/nn/base.py` | Introduces shared NN explainer base (data extraction + validation). |
| `src/shapiq/explainer/nn/knn.py` | Adds unweighted KNN explainer implementation. |
| `src/shapiq/explainer/nn/weighted_knn.py` | Adds weighted KNN explainer implementation. |
| `src/shapiq/explainer/nn/threshold_nn.py` | Adds radius/threshold NN explainer implementation. |
| `src/shapiq/explainer/nn/_util.py` | Adds shared NN helper validation + ignored-parameter warning. |
| `src/shapiq/explainer/nn/games/*` | Adds benchmark “Game” implementations for exact Shapley ground truth. |
| `src/shapiq/interaction_values.py` | Adds `from_first_order_array` / `to_first_order_array` helpers. |
| `tests/shapiq/fixtures/data.py` | Adds a small multi-class dataset fixture + feature constant changes. |
| `tests/shapiq/fixtures/models.py` | Adds sklearn NN model fixtures. |
| `tests/shapiq/tests_unit/tests_explainer/tests_nn_explainers/*` | Adds unit tests for NN explainers, games, and utilities. |
| `tests/shapiq/tests_unit/test_interaction_values.py` | Adds tests for first-order array conversion helpers. |
| `docs/source/notebooks/tabular_notebooks/knn_data_valuation.ipynb` | New notebook demonstrating NN explainers for data valuation. |
| `docs/source/references.bib` | Adds citations for Jia 2019 / Wang 2023 / Wang 2024. |
| `CHANGELOG.md` | Adds changelog entry for NN explainers. |
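The first-order array helpers added to `src/shapiq/interaction_values.py` can be illustrated with a generic sketch. This is not shapiq's actual implementation; the standalone function names mirror the helpers but the dict-based representation is an assumption used purely for illustration. The idea: a first-order array stores one value per player, i.e. exactly the interactions of size one.

```python
import numpy as np


def from_first_order_array(arr: np.ndarray) -> dict:
    """Hypothetical sketch: turn a per-player value array into a mapping
    keyed by singleton coalitions (size-one interactions)."""
    return {(i,): float(v) for i, v in enumerate(arr)}


def to_first_order_array(values: dict, n_players: int) -> np.ndarray:
    """Hypothetical inverse: collect singleton-coalition values back
    into a dense per-player array."""
    arr = np.zeros(n_players)
    for (player,), v in values.items():  # keys are singleton tuples
        arr[player] = v
    return arr
```

The round trip is lossless because first-order (e.g. Shapley-value) explanations carry no higher-order interaction terms.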
Signed-off-by: Maximilian <maximilian.muschalik@gmail.com>
mmschlk left a comment
I applied some minor changes I noticed while going through the review. The biggest change is that I introduced a `ValidNNExplainerIndices = Literal["SV"]` type alias, which better captures what the current KNN explainers support (i.e., only SV is possible).
There is only one outstanding change that needs to be resolved before we can merge this in: we need to remove all notebooks from the repo, as I want to merge this PR before merging the KNN PR in. This means all notebooks need to be converted into the Sphinx Gallery format, with examples then living under examples/<folder/plot_.py. Notably, each example needs to be very quick and take no more than a couple of seconds. We will also have a way to host longer examples somewhere, but currently I would like to emphasize speed here (which the KNN explainers offer anyway).
This reverts commit c47ea53.
Motivation and Context
This PR adds three new explainers for efficiently explaining scikit-learn's `KNeighborsClassifier` and `RadiusNeighborsClassifier` models.

Public API Changes
How Has This Been Tested?
Unit tests have been added that compare the results of the explainers' efficient algorithms against baseline values obtained by computing the exact Shapley values of a ground-truth Game.
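Such a comparison can be sketched as follows for the unweighted KNN case, using the closed-form O(N log N) KNN-Shapley recursion of Jia et al. (2019) against a brute-force Shapley computation over the ground-truth game. This is illustrative code, not the repository's actual tests; all function names are assumptions.

```python
from itertools import combinations
from math import factorial

import numpy as np


def knn_shapley(dists, labels, y_test, k):
    """Shapley data values for an unweighted KNN classifier via the
    recursion of Jia et al. (2019): sort points by distance to the
    test point, then fill values from the farthest point inward."""
    n = len(dists)
    order = np.argsort(dists)                 # closest training point first
    match = (np.asarray(labels)[order] == y_test).astype(float)
    s = np.zeros(n)
    s[-1] = match[-1] / n                     # farthest point
    for j in range(n - 2, -1, -1):            # 1-based rank is j + 1
        rank = j + 1
        s[j] = s[j + 1] + (match[j] - match[j + 1]) / k * min(k, rank) / rank
    values = np.zeros(n)
    values[order] = s                         # map back to original indices
    return values


def knn_utility(subset, dists, labels, y_test, k):
    """Ground-truth game: fraction of correct labels among the (up to)
    k nearest neighbors contained in `subset`."""
    if not subset:
        return 0.0
    nearest = sorted(subset, key=lambda i: dists[i])[:k]
    return sum(labels[i] == y_test for i in nearest) / k


def brute_force_shapley(dists, labels, y_test, k):
    """Exact Shapley values by enumerating all coalitions (tiny N only)."""
    n = len(dists)
    values = np.zeros(n)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for r in range(n):
            for coal in combinations(rest, r):
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                gain = knn_utility(set(coal) | {i}, dists, labels, y_test, k) \
                    - knn_utility(set(coal), dists, labels, y_test, k)
                values[i] += w * gain
    return values
```

The brute-force game enumerates all 2^N coalitions, so it only serves as a ground truth for very small N; the recursion scales to full training sets.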
Checklist
`CHANGELOG.md` (if relevant for users).