[WIP] BUG make _weighted_percentile(data, ones, 50) consistent with numpy.median(data) #17377
I was wondering about this when I was working on sample weights in HGBT, so I'm all in :) If we have one Cython version, we could use it in other places, such as HGBT, as well.
So I think that I am converging here (the tests against NumPy are fine). Not sure that we actually need
I really rely on NumPy there. If we are fine internally in the HGBDT to use the
if last_y_pred is not None:
    assert_array_almost_equal(last_y_pred, y_pred)

    assert_allclose(last_y_pred, y_pred)
@lucyleeow I added your test here. Using the nearest strategy everywhere seems to be the winner, because it keeps the sample_weight semantics right.
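To illustrate what keeping the sample_weight semantics means, here is a minimal sketch. It assumes the usual reading that an integer weight k should behave like repeating the sample k times. The helper `cumulative_weight_percentile` is hypothetical and is not the code from this PR: it returns the smallest observed value whose cumulative weight reaches the target, which may differ in detail from the nearest strategy discussed here.

```python
import numpy as np


def cumulative_weight_percentile(data, weights, percentile=50):
    """Hypothetical helper (not scikit-learn's implementation).

    Returns the smallest observed value whose cumulative weight reaches
    `percentile` percent of the total weight. No interpolation is done,
    so the result is always one of the input samples.
    """
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(data)
    weight_cdf = np.cumsum(weights[order])
    threshold = percentile / 100.0 * weight_cdf[-1]
    # First position where the cumulative weight is >= threshold.
    idx = min(np.searchsorted(weight_cdf, threshold), len(data) - 1)
    return data[order][idx]


data = np.array([1.0, 2.0, 5.0])
weights = np.array([1, 3, 2])

# Repeating each sample `weight` times gives [1, 2, 2, 2, 5, 5].
repeated = np.repeat(data, weights)

weighted_result = cumulative_weight_percentile(data, weights, 50)
repeated_result = cumulative_weight_percentile(
    repeated, np.ones(len(repeated)), 50
)
print(weighted_result, repeated_result)
```

Because no value is interpolated between samples, the weighted call and the call on the repeated data agree in this sketch.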
I merged a couple of things into this PR to be able to check the integration in the gradient boosting as well. If the tests pass, I would propose cutting this PR into 3 parts:
So this is no longer for review?
No, I think that I need to cut it into smaller pieces because the diff is too large. I don't want to make the reviewing process even harder. I will take care of it this week.
See #17370 (comment)
closes #17370
closes #6189
Add a new parameter to select the type of interpolation. By default, the median in NumPy and our implementation should be the same.
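As a rough illustration of the intent, the sketch below shows one way a weighted percentile with a selectable interpolation could agree with numpy.median when all weights are one. The function name `weighted_percentile_sketch`, the option names "linear" and "lower", and the formula are assumptions made for this example. They are not taken from the PR diff, and the sketch assumes strictly positive weights and at least two samples.

```python
import numpy as np


def weighted_percentile_sketch(data, weights, percentile=50,
                               interpolation="linear"):
    """Hypothetical sketch, not scikit-learn's _weighted_percentile."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(data)
    sorted_data = data[order]
    sorted_weights = weights[order]
    weight_cdf = np.cumsum(sorted_weights)
    target = percentile / 100.0

    if interpolation == "lower":
        # Smallest sample whose cumulative weight reaches the target.
        idx = np.searchsorted(weight_cdf, target * weight_cdf[-1])
        return sorted_data[min(idx, len(data) - 1)]

    if interpolation == "linear":
        # Assumed generalization: with unit weights the positions reduce
        # to k / (n - 1), the grid of NumPy's default linear percentile.
        positions = (weight_cdf - sorted_weights) / (
            weight_cdf[-1] - sorted_weights
        )
        return np.interp(target, positions, sorted_data)

    raise ValueError(f"Unknown interpolation: {interpolation!r}")


data = np.array([3.0, 1.0, 4.0, 2.0])
ones = np.ones_like(data)

linear_median = weighted_percentile_sketch(data, ones, 50, "linear")
lower_median = weighted_percentile_sketch(data, ones, 50, "lower")
print(linear_median, np.median(data), lower_median)
```

With an even number of unit-weight samples, the "linear" branch averages the two middle values like numpy.median, while the "lower" branch returns the smaller of the two.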