DOC Add example comparing permutation importance and SHAP by lucyleeow · Pull Request #18139 · scikit-learn/scikit-learn

Closed
lucyleeow wants to merge 64 commits

Conversation

@lucyleeow (Member) commented Aug 11, 2020

Reference Issues/PRs

Related to development of inspection tools from: https://scikit-learn.fondation-inria.fr/technical-committee-february-3-2020/

What does this implement/fix? Explain your changes.

Add an example comparing permutation importance to SHAP, as implemented in the shap package, which works with scikit-learn estimators.

Any other comments?

First draft, happy to make changes.

@lucyleeow (Member, Author) commented Aug 11, 2020

Some considerations:

  • I thought about showing calculation time to demonstrate that TreeSHAP is much faster than KernelSHAP, but didn't know the best way. I don't think it is necessary to use a timing function that runs the function hundreds of times, as we don't need that accuracy (see the timing sketch after this list).
  • The Sundararajan paper mentions several other properties that TreeSHAP violates, but I only mentioned 2 as I thought they were more relevant but open to suggestions.
  • Should I add a summary at the end?
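
A low-tech option, sketched below, assuming single-run precision is enough to show an order-of-magnitude gap (the timed helper and the commented explainer names are hypothetical, not from the example):

    from time import perf_counter

    def timed(label, fn, *args):
        # One call is enough here: we only care about order-of-magnitude
        # differences between TreeSHAP and KernelSHAP, not microseconds.
        tic = perf_counter()
        result = fn(*args)
        print(f"{label}: {perf_counter() - tic:.2f} s")
        return result

    # Hypothetical usage with explainers as defined in the example:
    # timed("TreeSHAP", tree_explainer.shap_values, X_test)
    # timed("KernelSHAP", kernel_explainer.shap_values, X_test)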

cc @ogrisel

@cmarmo (Contributor) commented Aug 12, 2020

Thanks @lucyleeow !
@sebconort, @xrenard would you be interested in having a look at this PR? Thanks!

@jnothman (Member) left a comment:

small non-functional comment.

@@ -5,4 +5,5 @@ scikit-image
 pandas
 sphinx-gallery
 scikit-learn
+shap

Member:

How stable are the APIs of shap that we rely on here? Should we pin this dependency?

Member:

Small follow-up: do you know if the APIs of shap are stable enough, @lucyleeow?

Member:

It should be

@amueller (Member):

I'm fine with this, but I feel it's a bit of a departure from our usual way of doing things. We haven't really compared against other implementations or methods in our examples so far, with the goal of making them easier to execute for users.

What might also be interesting is to use a plain permutation version of what SHAP is doing, i.e. look at the average change in the response rather than in the overall accuracy, but I'm not sure if that's super easy to do with the current implementation.
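
A minimal sketch of that idea, as I understand it (the helper below is hypothetical, not a scikit-learn API): permute one column and report the mean absolute change in the model's predictions instead of the drop in score. It assumes a fitted estimator and a NumPy array X.

    import numpy as np

    def permutation_response_change(model, X, col, n_repeats=10, seed=0):
        """Mean absolute change in predictions when column `col` is permuted."""
        rng = np.random.default_rng(seed)
        baseline = model.predict(X)
        changes = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle only the column of interest, keeping the rest intact.
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            changes.append(np.abs(model.predict(X_perm) - baseline).mean())
        return float(np.mean(changes))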

Base automatically changed from master to main January 22, 2021 10:53
@glemaitre added this to the 1.0 milestone Feb 1, 2021
@glemaitre (Member):

I put the following reference: https://arxiv.org/pdf/2009.11023.pdf

I think that it would be nice to put the three methods into perspective.

I'm fine with this, but I feel it's a bit of a departure from our usual way of doing things. We haven't really compared against other implementations or methods in our examples so far, with the goal of making them easier to execute for users.

I agree with this point. I think that we first wanted to gain a bit of knowledge regarding the methods. Then, I am not yet sure how we could disseminate whatever we learn from the example, meaning: should it be in scikit-learn or in an external blog post?

@lucyleeow (Member, Author):

Happy to rework it. We could include SHAP in our discussion but not show any code or use the external dependency. Or if it fits better somewhere else, happy to help.

@adrinjalali (Member):

Removing the milestone, and a reminder ping.

@ogrisel (Member) commented Mar 9, 2022

I merged main to see if it fixes some of the problems in the CircleCI runs.

@ogrisel (Member) commented Mar 10, 2022

I think this example would benefit from a summary table contrasting the following properties for the three methods (permutation importances, TreeSHAP, KernelSHAP):

  • model agnosticism
  • ability to account for
  • misleading results under strong between-feature dependencies
  • relative computational cost
  • additive interpretation of the importances (the sum of feature importances has a meaning)
  • measures impact on the predictive score vs. impact on the decision function (the former implies the latter, but not necessarily the opposite on models with poor overall scores)

Furthermore, I think the paragraphs with item lists would benefit from putting the most important words in bold.

@lucyleeow (Member, Author):

Thanks for your input @ogrisel !

I was having problems with the plotting functions in the SHAP package: #18139 (comment)
It seems the API is not as polished as I thought and I am not sure which direction to go now.

@glemaitre (Member) commented Mar 11, 2022

Hmm, I am wondering whether they did a refactoring, because I no longer see KernelExplainer or TreeExplainer: https://shap.readthedocs.io/en/latest/api.html#explainers

Instead, it seems that you have a single Explainer class, and the type of explainer is a parameter.

I need a bit more time to figure it out because I am not finding a clear changelog.

@lucyleeow (Member, Author) commented Mar 12, 2022

@glemaitre yes, I think the refactoring happened in the last release: https://github.com/slundberg/shap/releases/tag/v0.36.0 but they have kept the code backwards compatible.

AFAICT, their Explainer docstring says it accepts 'kernel', but that doesn't seem to be implemented? https://github.com/slundberg/shap/blob/46b3800b31df04745416da27c71b216f91d61775/shap/explainers/_explainer.py#L171
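
For reference, a minimal sketch contrasting the two API styles, based on my reading of the shap release notes (exact behavior may vary across shap versions):

    import shap
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    X, y = make_regression(n_samples=100, n_features=5, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X, y)

    # Legacy style (pre-0.36): one class per algorithm.
    tree_values = shap.TreeExplainer(model).shap_values(X)

    # Unified style (0.36+): shap.Explainer dispatches on the model type
    # and returns a callable that produces an Explanation object.
    explainer = shap.Explainer(model, X)
    explanation = explainer(X)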

@ogrisel (Member) commented Mar 15, 2022

Note that for this example we could use the "exact" method, because we only have a few features (see the sketch below). But I find it interesting to present the kernel-based approximation.
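
A short sketch of that exact approach, assuming shap's Exact explainer, which enumerates feature coalitions and is only tractable for a handful of features:

    import shap
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    X, y = make_regression(n_samples=100, n_features=4, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X, y)

    # Exact enumeration over all 2**n_features coalitions: fine for 4
    # features, hopeless for dozens.
    explainer = shap.explainers.Exact(model.predict, X)
    shap_values = explainer(X)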

@ogrisel (Member) commented Mar 17, 2022

On top of the suggestion above (#18139 (comment)), I think it might be interesting to contrast permutation importances and SHAP values for noisy features on an overfitting model, such as a Random Forest with deep trees on a dataset with some noise in the target and not too many samples (maybe in a dedicated section at the bottom of the example); a sketch of this experiment follows the credit link below.

I would expect the SHAP values to be medium to large even for the noisy features (both on the training and the test set), because if the forest uses them, they should impact the decision function. However, PI measured on a held-out test set should reveal that those features are not useful for generalization (low PI values on the test set), even though they can cause the model to overfit (larger PI values on the training set).

Credit for this idea:

https://twitter.com/ChristophMolnar/status/1504390594277302277
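
A minimal sketch of the proposed experiment, with hypothetical dataset and hyperparameter choices: append pure-noise features, fit a deep forest, and compare permutation importances on the training and test sets.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    X, y = make_regression(n_samples=200, n_features=5, n_informative=5,
                           noise=20.0, random_state=0)
    # Append pure-noise features that a deep forest can overfit on.
    X = np.hstack([X, rng.normal(size=(X.shape[0], 3))])

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(max_depth=None, random_state=0).fit(X_train, y_train)

    pi_train = permutation_importance(model, X_train, y_train, n_repeats=10, random_state=0)
    pi_test = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    # Expectation: the 3 noise columns get non-trivial PI on the training
    # set but near-zero PI on the held-out test set.
    print(pi_train.importances_mean.round(3))
    print(pi_test.importances_mean.round(3))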

@jjerphan (Member) left a comment:

As pointed out by @glemaitre, probably we should watch the status of SHAP: shap/shap#2423.

@jjerphan (Member) left a comment:

Thinking again, I am 0 on integrating this example (+1 for the example you have done, but -1 as I do not think methods which rely on post hoc explanations are valid from a theoretical and epistemological perspective).

@glemaitre (Member):

-1 as I do not think methods which rely on post hoc explanations are valid from a theoretical and epistemological perspective

Can you elaborate on this latter aspect?

If there are issues with using those techniques, then I think it is actually worth having the example, to explain the limitations and shortcomings of these techniques, since in practice they are widely used.

@jjerphan (Member) commented Apr 21, 2022

Can you elaborate on this latter aspect?

Yes, can the axioms (listed in section 2.6 of the original paper) be realized in practice?

If there are issues with using those techniques, then I think it is actually worth having the example, to explain the limitations and shortcomings of these techniques, since in practice they are widely used.

I do agree that explaining the limitations and shortcomings is of value here. However, I don't think that the popularity of a method induces its validity (hence my original -1).

@glemaitre (Member):

Yes, can the axioms (listed in section 2.6 of the original paper) be realized in practice?

Computing the exact Shapley values will ensure that you have these properties. For a small number of features, you can still use the exact method.

Then, for a larger number of features, one will use algorithms that approximate the Shapley values. KernelSHAP and TreeSHAP are such algorithms. They come with additional assumptions under which the axioms remain valid (feature independence, type of feature perturbation, etc.). The axiom of symmetry can be violated depending on the type of feature perturbation used to compute the approximated Shapley values (http://proceedings.mlr.press/v108/janzing20a/janzing20a.pdf).

So after getting through SHAP a bit more while preparing the tutorial of PyData Berlin, I think that we can have 3 contributions in the documentation:

  • Explain how to read the additive SHAP values (see the sketch after this list)
    • The fact that it uses a baseline (mean predictions of the model) is not straightforward
  • Contrast it with permutation importance
    • Global vs. local explanation
    • Explanations that take true y vs. only explaining the model prediction
  • Highlight the limitations and the implication on the reported SHAP values
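
For the first bullet, a minimal sketch of the additivity ("local accuracy") reading, assuming shap's TreeExplainer on a regressor: the expected value (the baseline, i.e. the model's mean prediction) plus a sample's SHAP values recovers the model's prediction for that sample.

    import numpy as np
    import shap
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    X, y = make_regression(n_samples=100, n_features=5, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X, y)

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

    # Local accuracy: baseline + per-feature contributions == prediction.
    i = 0
    reconstructed = explainer.expected_value + shap_values[i].sum()
    assert np.isclose(reconstructed, model.predict(X[[i]])[0])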

However, I don't think that the popularity of a method induces its validity (hence my original -1).

I completely agree with the point that popularity does not imply validity. And this is the exact reason why I think it is important to highlight the limitations, and the assumptions under which it is relevant to use such a method.

@lorentzenchr (Member) commented Apr 21, 2022

Disclosure first: I'm a big fan of Shapley values. They are a very important tool in production settings to gain trust through explainability of ML models. I'm not aware of a better tool (if it weren't for its computational intractability...).

... but I feel it's a bit of a departure from our usual way of doing things. We haven't really compared against other implementations or methods in our examples so far, with the goal of making them easier to execute for users.

I like this example, but I'm also -1 on including it here - at the moment. I think there are better places:

@lucyleeow (Member, Author):

I like this example, but I'm also -1 on including it here

I would agree with this, mainly due to concerns about the robustness of the package's API (#18139 (comment)); the repo has also been a bit quiet and there are a lot of open issues.

@glemaitre removed the cython label Apr 26, 2022
@glemaitre (Member):

@lucyleeow We discussed this PR in yesterday's developer call.

We agreed that it is worth having the example, but due to the current lack of API robustness, we should not have it in the main repository. We think that moving it to an example post on the blog website would be better.

We can promote the blog post via communication platforms to get visibility, and make sure to maintain it alongside further SHAP development.

I will open an issue on the blog repository such that we can start to move forward there.

@lucyleeow (Member, Author):

Thanks for making the decision! I am happy to help where I can! Just let me know

@glemaitre (Member):

Thanks for making the decision! I am happy to help where I can! Just let me know

I will set up the example on the blog repository and ping you on the associated PR.

@cmarmo added the "Needs Decision - Close" label May 10, 2022
@cmarmo (Contributor) commented May 10, 2022

@lucyleeow, would you be interested in solving scikit-learn/blog#107 as a follow-up of this pull request? Then this one could maybe be closed? Thanks!

@lucyleeow (Member, Author):

Thanks, it's on my to-do list! I will have to play around with the SHAP API again.

@lucyleeow (Member, Author) commented Sep 16, 2022

I think this can be closed now, I have opened scikit-learn/blog#139 in the blog repo.

Edit: Will close in a few days if there are no objections.

@lucyleeow closed this Sep 24, 2022
Labels: Documentation, Needs Decision - Close
9 participants