Improvement of PDP and ICE plotting #19410
Comments
ping @dsleo in case I am missing some of the points discussed during the sprint.
Hello @glemaitre! I really liked the ideas shown here and would be happy to help put them into action. Are you still interested in these features? If so, how can I help out? Maybe I can make a prototype of these visualizations, try them on a few datasets, and post them here. We can then discuss whether the information they give is actually helpful. Does that make sense, or do you prefer to approach this issue in a different way?
We would certainly be happy to have a better way to visualize the ICEs. I think we can directly discuss a draft PR. I'm not sure what the right API to control the plotting would be.
I was thinking about introducing the first idea as a new parameter, something like The scatter plot could be overlaid on the usual plot (similar to your example), while the histogram or box-plot could be displayed on a twinx axis behind the actual partial dependence plots. Alternatively, the histogram and box-plot could be plotted on a separate axis (directly below the standard axis), but I'm not fond of the idea of adding extra axes, as the current API assumes one axis per PDP plot. The Regarding the ICE percentiles, I believe adding a new argument seems artificial, as it would likely override the
Hi @vitaliset and @glemaitre, it has been a while since there was any activity on this thread, so I attempted to address the PDP improvements.
Some discussions from a workshop on interpretation led us to propose some potential improvements regarding the PDP and ICE plotting utilities.
Regarding PDP, it would be nice to show the distribution of the in-domain samples for a specific feature. Zhao and Hastie illustrate these in-domain values with a scatter plot in the following paper (p. 7). We could provide such a visualization, or an alternative representation that encodes the same information (e.g., a histogram or box-plot). We should probably make some proposals and see what works best.
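To make the histogram variant concrete, here is a minimal sketch of one possible layout: the PDP curve drawn on top of a histogram of the feature values, with the histogram on a twin y-axis. It does not use scikit-learn's plotting API. The names `partial_dependence_curve` and `plot_pdp_with_histogram` are hypothetical, and the partial dependence is computed by brute force from any `predict` callable, so this illustrates the idea rather than proposing an implementation.

```python
import numpy as np


def partial_dependence_curve(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value.

    Hypothetical helper: brute-force partial dependence for illustration only.
    """
    X = np.asarray(X, dtype=float)
    averaged = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the feature for every sample
        averaged.append(predict(X_mod).mean())
    return np.asarray(averaged)


def plot_pdp_with_histogram(predict, X, feature, grid_resolution=50, bins=20):
    """Draw the PDP with a histogram of the feature on a twin axis behind it."""
    # Imported lazily so the computation helper has no plotting dependency.
    import matplotlib.pyplot as plt

    X = np.asarray(X, dtype=float)
    column = X[:, feature]
    grid = np.linspace(column.min(), column.max(), grid_resolution)
    pdp = partial_dependence_curve(predict, X, feature, grid)

    fig, ax = plt.subplots()
    ax_hist = ax.twinx()  # shares the x-axis, separate y-axis for counts
    ax_hist.hist(column, bins=bins, color="lightgray")
    ax_hist.set_ylabel("sample count")

    ax.plot(grid, pdp, color="tab:blue")
    ax.set_xlabel(f"feature {feature}")
    ax.set_ylabel("partial dependence")

    # Draw the PDP axis above the histogram axis and hide its background
    # so the histogram stays visible behind the curve.
    ax.set_zorder(ax_hist.get_zorder() + 1)
    ax.patch.set_visible(False)
    return fig, ax, ax_hist
```

For a fitted estimator `est`, a call such as `plot_pdp_with_histogram(est.predict, X, feature=0)` would produce the figure. Regions where the histogram is nearly empty are where the PDP extrapolates outside the observed data.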
Regarding ICE, we currently offer the option `subsample`, which randomly subsamples the available ICE lines. However, it seems more meaningful to build percentiles from the available ICE lines instead of plotting the lines themselves. Optionally, we could use `subsample` to select the lines from which the different percentiles are computed.
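As a rough illustration of the percentile idea, the sketch below computes ICE lines by brute force and then takes pointwise percentiles across samples at each grid value. The function names, the `percentiles` default, and the integer-only `subsample` argument are assumptions made for this example and do not reflect scikit-learn's API.

```python
import numpy as np


def ice_curves(predict, X, feature, grid):
    """Return an array of shape (n_samples, n_grid): one ICE line per sample."""
    X = np.asarray(X, dtype=float)
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # force the feature to the grid value
        curves[:, j] = predict(X_mod)
    return curves


def ice_percentile_bands(predict, X, feature, grid,
                         percentiles=(5, 25, 50, 75, 95),
                         subsample=None, random_state=0):
    """Pointwise percentiles of the ICE lines, shape (n_percentiles, n_grid).

    `subsample` (hypothetical, integer only here) limits how many samples
    are used to build the ICE lines before the percentiles are taken.
    """
    X = np.asarray(X, dtype=float)
    if subsample is not None and subsample < X.shape[0]:
        rng = np.random.default_rng(random_state)
        rows = rng.choice(X.shape[0], size=subsample, replace=False)
        X = X[rows]
    curves = ice_curves(predict, X, feature, grid)
    # Percentiles are taken across samples, independently at each grid value.
    return np.percentile(curves, percentiles, axis=0)
```

The resulting bands could be drawn with `fill_between` between matching lower and upper percentiles. Because the percentiles are computed pointwise, a band edge is not necessarily the ICE line of any single sample, which would be worth stating in the documentation if this approach were adopted.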