Improvement of PDP and ICE plotting · Issue #19410 · scikit-learn/scikit-learn · GitHub
Open
glemaitre opened this issue Feb 9, 2021 · 5 comments

@glemaitre
Member

Some discussions from a workshop on interpretation led us to propose some potential improvements to the PDP and ICE plotting utilities.

Regarding PDP, it would be useful to show the distribution of the in-domain samples for a given feature. Zhao and Hastie illustrate these in-domain values with a scatter plot in the following paper (p. 7). We could provide such a visualization or an alternative representation that encodes this information (e.g., a histogram or box plot). We should probably make some proposals and see what works best.

Regarding ICE, we currently offer the subsample option, which randomly subsamples the available ICE lines. However, it seems more meaningful to compute percentiles from the available ICE lines instead of plotting the lines themselves. Optionally, we could use subsample to select the lines from which the percentiles are computed.
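
A minimal sketch of the percentile idea, not an existing plotting option: it assumes the ICE curves are obtained from `sklearn.inspection.partial_dependence` with `kind="individual"` and summarized with `np.percentile`. The synthetic dataset, the model, the 10/50/90 percentiles, and the subset size of 50 are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

# Toy data and model, chosen only for illustration.
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="individual" returns one ICE curve per sample.
result = partial_dependence(
    model, X, features=[0], kind="individual", grid_resolution=20
)
ice = result["individual"][0]  # shape (n_samples, n_grid_points)

# The grid key is "grid_values" in recent releases and "values" in older ones.
grid = (result["grid_values"] if "grid_values" in result else result["values"])[0]

# Percentile bands across samples at each grid point.
bands = np.percentile(ice, [10, 50, 90], axis=0)  # shape (3, n_grid_points)

# Optional: compute the same bands from a random subset of the ICE curves.
rng = np.random.default_rng(0)
subset = rng.choice(ice.shape[0], size=50, replace=False)
bands_sub = np.percentile(ice[subset], [10, 50, 90], axis=0)
```

The rows of `bands` could then be drawn against `grid` as three lines, or as a filled region between the outer percentiles, in place of the individual ICE lines.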

@glemaitre
Member Author

Ping @dsleo in case I am missing some of the points discussed during the sprint.

@cmarmo added the module:inspection label Feb 11, 2021
@vitaliset
Contributor

Hello @glemaitre! I really liked the ideas shown here and would be happy to help put them into action. Are you still interested in these features? If so, how can I help out?

Maybe I can make a prototype of these visualizations, try them on a few datasets, and post them here. We can then discuss whether the information they give is actually helpful. Does that make sense, or do you prefer to approach this issue in a different way?

@glemaitre
Member Author

We would certainly be happy to have a better way to visualize the ICEs.

I think we can directly discuss a draft PR. I'm not sure what would be the right API to control the plotting.

@vitaliset
Contributor

I was thinking about introducing the first idea as a new parameter, something like feature_visualization or sample_visualization, which can take on values depending on the desired plot type. For example, None would represent the usual plotting, and we could also have "scatter" (only available when kind="both" or "individual"), "histogram", and "box-plot", as you proposed.

The scatter plot could be overlaid on the usual plot (similar to your example), while the histogram or box-plot could be displayed on a twinx axis behind the actual partial dependence plots. Alternatively, the histogram and box-plot could be plotted on a separate axis (directly below the standard axis), but I'm not fond of the idea of adding extra axes, as the current API assumes one axis per PDP plot. The from_estimator class method and the plot function would require the new feature_visualization_kw (or sample_visualization_kw) parameter to accommodate the respective keywords associated with each plot type.

Regarding the ICE percentiles, I believe adding a new argument seems artificial, as it would likely override the subsample parameter when used. Instead, I suggest renaming the subsample argument to something different, perhaps ice_representation (though it's challenging to come up with an appropriate name, and maybe we can keep the subsample name). The behavior would remain the same as subsample if you pass None, an integer, or a float. However, if you pass an array-like of floats between 0 and 100 (inclusive), it would plot the percentile version. The ice_lines_kw parameter would still apply in the same way.
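
The dispatch described above could look roughly like the sketch below. The helper name `ice_lines_to_plot` and its `random_state` argument are hypothetical and not part of scikit-learn; the function only illustrates how one argument could cover both the subsampling and the percentile cases for an ICE array of shape (n_samples, n_grid_points).

```python
import numbers
import numpy as np

def ice_lines_to_plot(ice, subsample=None, random_state=0):
    """Hypothetical helper: select or summarize ICE curves for plotting."""
    ice = np.asarray(ice)
    n_samples = ice.shape[0]

    if subsample is None:
        # Keep every ICE line.
        return ice

    if isinstance(subsample, numbers.Integral):
        # Integer: number of lines to draw, capped at the number available.
        n_lines = min(subsample, n_samples)
    elif isinstance(subsample, numbers.Real):
        # Float: fraction of the lines to draw.
        if not 0 < subsample < 1:
            raise ValueError("a float subsample must be in (0, 1)")
        n_lines = max(1, int(np.ceil(subsample * n_samples)))
    else:
        # Array-like: percentiles in [0, 100], computed across samples.
        percentiles = np.asarray(subsample, dtype=float)
        if percentiles.ndim != 1 or np.any((percentiles < 0) | (percentiles > 100)):
            raise ValueError("percentiles must be a 1D array-like in [0, 100]")
        return np.percentile(ice, percentiles, axis=0)

    rng = np.random.default_rng(random_state)
    idx = rng.choice(n_samples, size=n_lines, replace=False)
    return ice[idx]
```

One design consequence worth noting: a scalar float means "fraction of lines" while a one-element list means "a percentile", so `0.5` and `[50]` would produce very different plots under this scheme.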

@arvkevi
arvkevi commented Sep 16, 2023

Hi @vitaliset and @glemaitre, it has been a while since there was any activity on this thread, so I attempted to address the PDP improvements.
