DOC Add example comparing permutation importance and SHAP #18139
Conversation
Some considerations:
cc @ogrisel
Thanks @lucyleeow!
Small non-functional comment.
@@ -5,4 +5,5 @@ scikit-image
 pandas
 sphinx-gallery
 scikit-learn
+shap
How stable are the APIs of shap that we rely on here? Should we pin this dependency?
Small follow-up: do you know if APIs of shap are stable enough, @lucyleeow?
It should be
I'm fine with this, but I feel it's a bit of a departure from our usual way of doing things. We haven't really compared against other implementations or methods in our examples so far, with the goal of keeping them easy for users to execute. What might also be interesting is to use a plain permutation version of what SHAP is doing, i.e. look at the average change in the response rather than in the overall accuracy, but I'm not sure that's easy to do with the current implementation.
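The "plain permutation version" suggested above can be sketched without the shap dependency: permute one feature at a time and record the average absolute change in the model's *predictions*, rather than the drop in a score as `sklearn.inspection.permutation_importance` does. This is an illustrative sketch, not scikit-learn API; the helper name and the choice of mean absolute change are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)


def mean_response_change(model, X, feature_idx, n_repeats=5, rng=rng):
    """Average |f(X) - f(X with one feature permuted)| over repeats.

    Hypothetical helper: measures the change in the model's response,
    not the change in a score, when a feature is shuffled.
    """
    baseline = model.predict(X)
    changes = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        X_perm[:, feature_idx] = rng.permutation(X_perm[:, feature_idx])
        changes.append(np.abs(model.predict(X_perm) - baseline).mean())
    return float(np.mean(changes))


importances = [mean_response_change(model, X, j) for j in range(X.shape[1])]
```

Unlike score-based permutation importance, this quantity is non-negative by construction and does not require the true targets `y` once the model is fit.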
I put the following reference: https://arxiv.org/pdf/2009.11023.pdf. I think that it would be nice to put the three methods into perspective.
I agree with this point. I think that we wanted first to gain a bit of knowledge regarding the methods. Then, I am not yet sure how we could disseminate whatever we would learn from the example, meaning: should it be in scikit-learn or in an external blog post?
Happy to rework it. We could include SHAP in our discussion but not show any code or use the external dependency. Or, if this fits better somewhere else, happy to help there.
Removing the milestone, and a reminder ping.
I merged main to see if it fixes some of the problems of the CircleCI runs.
…y with pip when installing shap
I think this example would benefit from a summary table contrasting the following properties for the three methods (permutation importances, TreeSHAP, KernelSHAP):
Furthermore, I think the paragraphs with item lists would benefit from putting the most important words in bold.
Thanks for your input @ogrisel! I was having problems with the plotting functions in the SHAP package: #18139 (comment)
Uhm, I am wondering if they made a refactoring, because I don't see the previous explainer classes anymore. Instead it seems that you have a single class. I need a bit more time to figure it out because I am not finding a clear changelog.
@glemaitre yes, I think the refactoring happened in the last release: https://github.com/slundberg/shap/releases/tag/v0.36.0 but they have kept the code backwards compatible. AFAICT, their explainer docstring says they accept 'kernel' but it doesn't seem to be implemented? https://github.com/slundberg/shap/blob/46b3800b31df04745416da27c71b216f91d61775/shap/explainers/_explainer.py#L171
Note that for this example we could use the
On top of the suggestion above (#18139 (comment)), I think it might be interesting to contrast permutation importances and SHAP values for noisy features on an overfitting model, such as a Random Forest with deep trees on a dataset with some noise in the target and not too many samples (maybe in a dedicated section at the bottom of the example). I would expect the SHAP values to be medium to large even for the noisy features (both on the training and the test set), because if the forest uses them, they should impact the decision function. However, PI measured on a held-out test set should reveal that those features are not useful to generalize (low PI values on the test set) but can cause the model to overfit (larger PI values on the training set). Credit for this idea: https://twitter.com/ChristophMolnar/status/1504390594277302277
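The experiment proposed above can be sketched with scikit-learn alone: append pure-noise columns to an informative dataset, fit a deep random forest on few samples, and compare permutation importance of the noise features on the training set versus the held-out set. All dataset sizes and hyperparameters below are illustrative assumptions, not values from this PR.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
# 3 informative features plus target noise, then 3 pure-noise features.
X, y = make_regression(n_samples=150, n_features=3, noise=20.0, random_state=0)
X = np.hstack([X, rng.normal(size=(X.shape[0], 3))])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Deep trees + few samples: the forest can memorize the target noise
# using the noise features.
forest = RandomForestRegressor(n_estimators=50, max_depth=None, random_state=0)
forest.fit(X_train, y_train)

pi_train = permutation_importance(
    forest, X_train, y_train, n_repeats=10, random_state=0
)
pi_test = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=0
)

# Columns 3-5 are pure noise: expect larger PI on train than on test.
noise_pi_train = pi_train.importances_mean[3:].mean()
noise_pi_test = pi_test.importances_mean[3:].mean()
```

The gap between `noise_pi_train` and `noise_pi_test` is the signal the comment above describes: permutation importance on a held-out set exposes features that only help the model overfit.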
As pointed out by @glemaitre, probably we should watch the status of SHAP: shap/shap#2423.
Thinking again, I am 0 on integrating this example (+1 for the example you have done, but -1 as I do not think methods which rely on post hoc explanations are valid from a theoretical and epistemological perspective).
Can you elaborate on this latter aspect? If there are issues with using those techniques, then I think it is actually worth having the example to explain their limitations and shortcomings, since in practice they are widely used.
Yes, can the axioms (listed in section 2.6 of the original paper) be realized in practice?
I do agree that explaining the limitations and shortcomings is of value here. However, I don't think that the popularity of a method implies its validity (hence my original -1).
Computing the exact Shapley values will ensure that you have these properties. For a small number of features, you can still use the exact method. Then, for a larger number of features, one will use the algorithms that approximate the Shapley values. So after going through SHAP a bit more while preparing the tutorial for PyData Berlin, I think that we can have three contributions in the documentation:
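For a small number of features, the exact computation mentioned above is feasible, since there are only 2^n coalitions. Here is a hedged, self-contained sketch: absent features are replaced by their mean over a background dataset, which is one common convention (not the only one), and `exact_shapley` is a hypothetical helper, not part of shap or scikit-learn. On a linear model the result can be checked against the known closed form.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, background):
    """Exact Shapley values of predict(x), one value per feature.

    Features outside a coalition are replaced by their mean over
    `background` (a mean-imputation convention, chosen for simplicity).
    Cost is O(n * 2^n) model evaluations, so only viable for small n.
    """
    n = x.shape[0]
    mean_bg = background.mean(axis=0)

    def value(subset):
        # Model output with only the features in `subset` taken from x.
        z = mean_bg.copy()
        for j in subset:
            z[j] = x[j]
        return predict(z[np.newaxis, :])[0]

    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            for S in combinations(others, size):
                # Shapley weight of a coalition of this size.
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[j] += w * (value(S + (j,)) - value(S))
    return phi


# Sanity check on a linear model, where the Shapley values are known in
# closed form: phi_j = w_j * (x_j - mean(background)_j).
w = np.array([1.0, -2.0, 0.5])
predict = lambda X: X @ w
background = np.random.RandomState(0).normal(size=(50, 3))
x = np.array([1.0, 2.0, 3.0])
phi = exact_shapley(predict, x, background)
```

The values also satisfy the efficiency axiom: they sum to `predict(x) - predict(mean_bg)`, which is one of the properties listed in section 2.6 of the original paper.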
I completely agree with the point that popularity does not imply validity. And this is exactly why I think it is important to highlight the limitations, and the assumptions under which it is relevant to use such a method.
Outings first: I'm a big fan of Shapley values. They are a very important tool in production settings to gain trust through explainability of ML models. I'm not aware of a better tool (if it weren't for their computational intractability...).
I like this example, but I'm also -1 on including it here - at the moment. I think there are better places:
I would agree with this, mainly due to concerns about the robustness of the package's API (#18139 (comment)); the repo has been a bit quiet and there are a lot of open issues.
@lucyleeow We discussed this PR in yesterday's developer call. We agreed that it is worth having the example but, due to the current lack of API robustness, we should not have it in the main repository. We think that moving it to an example post on the blog website would be better. We can promote the blog post via communication platforms to get visibility, and make sure to maintain it as SHAP develops further. I will open an issue on the blog repository so that we can start to move forward there.
Thanks for making the decision! I am happy to help where I can, just let me know.
I will set up the example on the blog repository and ping you on the associated PR. |
@lucyleeow , would you be interested in solving scikit-learn/blog/#107 as a follow-up of this pull request? Then this one could maybe be closed? Thanks! |
Thanks, it's on my to-do list! I will have to play around with the SHAP API again.
I think this can be closed now, I have opened scikit-learn/blog#139 in the blog repo. Edit: Will close in a few days if there are no objections. |
Reference Issues/PRs
Related to development of inspection tools from: https://scikit-learn.fondation-inria.fr/technical-committee-february-3-2020/
What does this implement/fix? Explain your changes.
Add an example comparing permutation importance to SHAP values, as implemented in the shap package, which works with scikit-learn estimators.
Any other comments?
First draft, happy to make changes.