8000 ENH Support sample weights in partial_dependence by vitaliset · Pull Request #25209 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

ENH Support sample weights in partial_dependence #25209

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 46 commits into from
Jun 7, 2023

Conversation

vitaliset
Copy link
Contributor
@vitaliset vitaliset commented Dec 19, 2022

Reference Issues/PRs

Towards #24872. Once this PR is merged I will open another PR to add support for sample_weight in the PartialDependenceDisplay, as glemaitre pointed out during review.

As partial dependence of a model at a point is defined as an expectation, as #24872 points out, it should respect sample_weight if someone wishes to use it (for instance, when you know your X does not follow the distribution you are interested in).

Note: as discussed in #24872 (comment), the method='recursion' should not be able to use partial_dependence's sample_weight as it is calculating the average over the training data of the estimator (which can consider the training sample_weight for some types of algorithms as discussed on the original issue #24872 (comment)). This PR changes the calculation of the method='brute'.


Thanks in advance for the reviews! :D

@glemaitre glemaitre self-requested a review January 14, 2023 17:01
Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a first pass of review for this feature. We will have to open another PR to add support for sample_weight in the PartialDependenceDisplay

@glemaitre
Copy link
Member

@vitaliset Would you have time to address the review comments?

@vitaliset
Copy link
Contributor Author

Hello @glemaitre! I'll do it on this weekend! Thanks for the detailed review!

vitaliset and others added 11 commits February 3, 2023 01:26
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@lorentzenchr
Copy link
Member

@vitaliset Could you try to get CI green again? This might help attract a 2nd reviewer.

Copy link
Contributor Author
@vitaliset vitaliset left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the advice @lorentzenchr. I hadn't realized it was not green.
It seems to be just the "values" deprecation (the resolved conflict from my last commit). :)

Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting back my +1 here.

vitaliset and others added 2 commits March 20, 2023 19:54
@vitaliset vitaliset requested a review from lorentzenchr March 21, 2023 03:23
Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since my review, we added the parameter validation for this function. You need to validate sample_weight as an array-like or None to avoid the remaining failure.

@jeremiedbb
Copy link
Member

There are already 2 reviewers here so I just made a quick pass. Everything looks good. @lorentzenchr are you happy with the current state of the PR ?

Copy link
Member
@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
As a follow up, we should propagate sample weight support to PartialDependenceDisplay.from_stimator.

vitaliset and others added 4 commits June 7, 2023 11:07
@lorentzenchr
Copy link
Member

@vitaliset CI is green. Can I merge?

@vitaliset
Copy link
Contributor Author

Sure @lorentzenchr! By the way, I'll be working on PartialDependenceDisplay as soon as this is merged in main.

@lorentzenchr lorentzenchr merged commit ea9894a into scikit-learn:main Jun 7, 2023
@vitaliset
Copy link
Contributor Author

This is my first non-trivial merged PR to scikit-learn! 🥳

I am extremely grateful for the insightful comments provided by @glemaitre, @lorentzenchr, @jeremiedbb, and @NicolasHug during both review and issue discussions. Also, I would like to express my appreciation to @mayer79 for initiating the issue and allowing me the opportunity to work on it. Thank you so much folks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:inspection Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0