sample weight support for robust regression via weighted percentile algo #10
33052ba to f07f8ad
@@ -50,6 +50,18 @@
from ._gradient_boosting import _random_sample_mask


def _weighted_percentile(arr, sample_weight, percentile=50):
Maybe this could go in utils.stats.
What do you think of working with quantiles instead of percentiles?
@arjoly what exactly do you propose? Changing the name percentile to quantile and using fractions instead of 0-100?
I agree that would be nicer. I did it like this to be consistent with scipy.stats.mstats.scoreatpercentile.
As you propose, I would rename the function.
Do you have tests for this?
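For readers following this thread, here is a minimal sketch of what a weighted percentile helper and its quantile-style wrapper could look like. It is not the code from this PR: the names `weighted_percentile_sketch` and `weighted_quantile_sketch` are hypothetical, and the convention chosen (return the smallest value whose cumulative weight reaches the target, with no interpolation) is an assumption.

```python
import numpy as np


def weighted_percentile_sketch(arr, sample_weight, percentile=50):
    """Smallest value whose cumulative weight reaches the given percentile.

    Hypothetical helper for illustration; no interpolation is performed.
    """
    arr = np.asarray(arr, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    # Sort the values and accumulate the weights in the same order.
    order = np.argsort(arr)
    weight_cdf = np.cumsum(sample_weight[order])
    # Weight mass that must be covered, on the scale of the total weight.
    target = (percentile / 100.0) * weight_cdf[-1]
    idx = np.searchsorted(weight_cdf, target)
    # Guard against floating-point overshoot at the upper end.
    idx = min(idx, arr.shape[0] - 1)
    return arr[order[idx]]


def weighted_quantile_sketch(arr, sample_weight, quantile=0.5):
    """Same computation, taking a fraction in [0, 1] instead of 0-100."""
    return weighted_percentile_sketch(arr, sample_weight, 100.0 * quantile)


values = [1.0, 2.0, 3.0, 4.0]
uniform_median = weighted_percentile_sketch(values, [1, 1, 1, 1])
skewed_median = weighted_percentile_sketch(values, [1, 1, 1, 10])
```

With uniform weights this returns the lower median, and a heavy weight on the largest value pulls the median toward it, which is the kind of behaviour a test for this function would want to pin down.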
assert_raises(NotImplementedError, est.fit, boston.data, boston.target,
              sample_weight=sample_weight)
There is currently no test for the robust regression losses with sample weights, right?
+1 for adding those features to the sample_weight PR, but this needs to be properly tested. It would be interesting at some point to evaluate the use of models that support sample weights. A new example on covariate shift correction would be great, although probably not in the GB w/ sample_weight PR itself.
Added robust regression tests for Boston housing and some tests for the weighted percentile.
@ogrisel a covariate shift example would indeed be great. Does anybody have a nice dataset for this? One could use a synthetic checkerboard dataset where one changes P(x) (i.e. the probability of drawing an example from each of the checkerboard cells).
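A rough sketch of the checkerboard idea, using only NumPy. Everything here is an assumption for illustration: the function name `sample_checkerboard`, the 4x4 grid, and the particular cell probabilities. The importance weights at the end use the standard ratio p_test(x) / p_train(x), which is one common way to correct for covariate shift via `sample_weight`.

```python
import numpy as np


def sample_checkerboard(n_samples, cell_probs, seed=0):
    """Draw 2D points from a checkerboard with per-cell probabilities P(x).

    Hypothetical helper: cell_probs is a (rows, cols) array of unnormalized
    cell probabilities; the label alternates between neighbouring cells.
    """
    rng = np.random.default_rng(seed)
    cell_probs = np.asarray(cell_probs, dtype=float)
    n_cols = cell_probs.shape[1]
    flat = cell_probs.ravel() / cell_probs.sum()
    # Pick a cell for every sample, then a uniform position inside it.
    cells = rng.choice(flat.size, size=n_samples, p=flat)
    rows, cols = np.divmod(cells, n_cols)
    X = np.column_stack([cols + rng.random(n_samples),
                         rows + rng.random(n_samples)])
    y = (rows + cols) % 2
    return X, y, cells


# Training data is uniform over cells; the test distribution favours
# the two right-hand columns.
p_train = np.ones((4, 4))
p_test = np.ones((4, 4))
p_test[:, 2:] = 3.0

X_train, y_train, cells_train = sample_checkerboard(500, p_train, seed=0)

# Importance weight per training sample: p_test(cell) / p_train(cell).
ratio = (p_test / p_test.sum()) / (p_train / p_train.sum())
sample_weight = ratio.ravel()[cells_train]
```

The resulting `sample_weight` array could then be passed to any estimator whose `fit` accepts sample weights, and compared against an unweighted fit on data drawn from `p_test`.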
…tile sample weight support for robust regression via weighted percentile algo
We could use one of the existing datasets and create an artificial train / test split that introduces a shift. For instance, we could use the Boston dataset and assign samples with a higher tax rate (the TAX feature) to the test set with a higher likelihood than to the training set.
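A minimal sketch of such a biased split, assuming a single numeric feature drives the shift. The function name `shifted_split`, the logistic form of the assignment probability, and the synthetic stand-in for the TAX column are all assumptions; no real dataset is loaded here.

```python
import numpy as np


def shifted_split(feature, test_fraction=0.3, strength=2.0, seed=0):
    """Return boolean train/test masks where high feature values are more
    likely to land in the test set (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    feature = np.asarray(feature, dtype=float)
    # Standardize, then map to a probability that grows with the feature.
    z = (feature - feature.mean()) / feature.std()
    p_test = 1.0 / (1.0 + np.exp(-strength * z))
    # Rescale so that roughly test_fraction of the samples go to the test set.
    p_test = np.clip(p_test * test_fraction / p_test.mean(), 0.0, 1.0)
    is_test = rng.random(feature.shape[0]) < p_test
    return ~is_test, is_test


# Synthetic stand-in for a tax-rate column; values are made up.
tax = np.random.default_rng(42).normal(loc=400.0, scale=100.0, size=1000)
train_mask, test_mask = shifted_split(tax)
```

With this kind of split, the test set has a visibly higher mean tax rate than the training set, which gives a controlled setting for checking whether sample weights help.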
No description provided.