sample weight support for robust regression via weighted percentile algo by pprett · Pull Request #10 · pprett/scikit-learn · GitHub

sample weight support for robust regression via weighted percentile algo #10

Merged: pprett merged 5 commits on Sep 17, 2014

pprett (Owner) commented Sep 15, 2014

No description provided.

pprett force-pushed the gbrt-sample-weight-weighted-percentile branch from 33052ba to f07f8ad on September 15, 2014 09:32
pprett (Owner, Author) commented Sep 15, 2014

@arjoly @glouppe @ogrisel here is a branch that also supports sample_weight for robust regression. If you are fine with it, I'll merge it into my sample_weight PR.
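For the least-absolute-deviation (LAD) loss, the best constant prediction (and, in gradient boosting, the best value for each leaf) is the median of the residuals. With sample weights it becomes the weighted median, which is presumably why this branch needs a weighted percentile. The numpy sketch below checks that property numerically. `weighted_percentile` and `weighted_abs_error` are hypothetical helpers written for illustration, not the code in this PR. They assume the "smallest value whose cumulative weight reaches the target" definition of a weighted percentile.

```python
import numpy as np


def weighted_percentile(arr, sample_weight, percentile=50):
    # Hypothetical helper: smallest value whose cumulative (sorted) weight
    # reaches `percentile` percent of the total weight.
    arr = np.asarray(arr, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    sorted_idx = np.argsort(arr)
    weight_cdf = np.cumsum(sample_weight[sorted_idx])
    idx = np.searchsorted(weight_cdf, percentile / 100.0 * weight_cdf[-1])
    return arr[sorted_idx[min(idx, len(arr) - 1)]]


def weighted_abs_error(c, y, w):
    # Weighted LAD loss of predicting the constant `c` for every sample.
    return np.sum(w * np.abs(y - c))


rng = np.random.RandomState(0)
y = rng.standard_normal(50)
w = rng.uniform(0.1, 5.0, size=50)

c_star = weighted_percentile(y, w, 50)
# The loss is convex and piecewise linear with kinks at the data points,
# so comparing against every observed value covers all candidate minima.
best_loss = min(weighted_abs_error(c, y, w) for c in y)
is_minimizer = weighted_abs_error(c_star, y, w) <= best_loss + 1e-12
print(c_star, is_minimizer)
```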

@@ -50,6 +50,18 @@
from ._gradient_boosting import _random_sample_mask


def _weighted_percentile(arr, sample_weight, percentile=50):

Maybe this could go in utils.stats.

What do you think of working with quantile instead of percentile?
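The diff only shows the signature `_weighted_percentile(arr, sample_weight, percentile=50)`. One common way to fill it in is to sort, accumulate the weights, and return the first value whose cumulative weight reaches the requested fraction of the total. The sketch below is our own illustration under that assumption, not necessarily the body in this branch. It does no interpolation between neighbouring values.

```python
import numpy as np


def weighted_percentile(arr, sample_weight, percentile=50):
    # Hypothetical sketch matching the signature in the diff.
    arr = np.asarray(arr, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    sorted_idx = np.argsort(arr)
    # Cumulative weight in sorted order, i.e. an unnormalized weighted CDF.
    weight_cdf = np.cumsum(sample_weight[sorted_idx])
    target = percentile / 100.0 * weight_cdf[-1]
    # First sorted position whose cumulative weight reaches the target.
    idx = np.searchsorted(weight_cdf, target)
    idx = min(idx, len(arr) - 1)  # defensive guard for percentile=100
    return arr[sorted_idx[idx]]


values = np.array([1.0, 2.0, 3.0, 4.0])
print(weighted_percentile(values, np.ones(4)))             # 2.0
print(weighted_percentile(values, [1.0, 1.0, 1.0, 10.0]))  # 4.0
```

With unit weights this returns 2.0, the lower of the two middle values, where np.median would interpolate to 2.5. A large weight on the last sample pulls the weighted median to 4.0.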

pprett (Owner, Author):

@arjoly, what exactly do you propose? Renaming percentile to quantile and using fractions instead of 0-100?

I agree that would be nicer; I did it this way to be consistent with scipy.stats.mstats.scoreatpercentile.

As you propose, I would rename the function.
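If the function is renamed as suggested, a quantile-based variant would take a fraction in [0, 1] instead of a percentage in [0, 100]. The sketch below uses the hypothetical name `weighted_quantile`. It defines the weighted quantile as the smallest value whose cumulative weight reaches `quantile` times the total weight. Rejecting out-of-range values guards against callers that still pass a 0-100 percentile.

```python
import numpy as np


def weighted_quantile(arr, sample_weight, quantile=0.5):
    # Hypothetical renamed helper: `quantile` is a fraction in [0, 1].
    if not 0.0 <= quantile <= 1.0:
        # Catches callers still passing a 0-100 percentile.
        raise ValueError("quantile must be in [0, 1], got %r" % (quantile,))
    arr = np.asarray(arr, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    sorted_idx = np.argsort(arr)
    weight_cdf = np.cumsum(sample_weight[sorted_idx])
    idx = np.searchsorted(weight_cdf, quantile * weight_cdf[-1])
    return arr[sorted_idx[min(idx, len(arr) - 1)]]


values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
print(weighted_quantile(values, np.ones(5), 0.5))  # 30.0
print(weighted_quantile(values, np.ones(5), 0.9))  # 50.0
```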

arjoly commented Sep 15, 2014

Do you have tests for this?

arjoly commented Sep 15, 2014

> @arjoly @glouppe @ogrisel here is a branch that supports sample_weights also for robust regression. If you are fine with it I merge it into my sample_weight PR

It's OK with me if you add support for this feature.

assert_raises(NotImplementedError, est.fit, boston.data, boston.target,
              sample_weight=sample_weight)

There is currently no test for the robust regression losses with sample weights, right?
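One way to cover the robust losses with sample weights, independent of any dataset, is to check the property the weighted percentile should deliver. For the quantile loss with level alpha, the weighted alpha-quantile minimizes the weighted pinball loss (alpha = 0.5 is proportional to LAD). The sketch below is a hypothetical numpy check, not the estimator-level tests later added to the PR. `weighted_percentile` and `weighted_pinball_loss` are illustrative helpers.

```python
import numpy as np


def weighted_percentile(arr, sample_weight, percentile=50):
    # Hypothetical helper: smallest value whose cumulative (sorted) weight
    # reaches `percentile` percent of the total weight.
    arr = np.asarray(arr, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    sorted_idx = np.argsort(arr)
    weight_cdf = np.cumsum(sample_weight[sorted_idx])
    idx = np.searchsorted(weight_cdf, percentile / 100.0 * weight_cdf[-1])
    return arr[sorted_idx[min(idx, len(arr) - 1)]]


def weighted_pinball_loss(c, y, w, alpha):
    # Quantile (pinball) loss of the constant prediction `c`;
    # alpha = 0.5 gives half the absolute error.
    residual = y - c
    return np.sum(w * np.where(residual >= 0, alpha * residual,
                               (alpha - 1.0) * residual))


rng = np.random.RandomState(42)
y = rng.standard_normal(200)
w = rng.uniform(0.0, 3.0, size=200)

# For each alpha: (loss at the weighted quantile, best loss over data points).
# The loss is piecewise linear with kinks at the data points, so those are
# the only candidates for the minimum.
results = {}
for alpha in (0.1, 0.5, 0.9):
    c_star = weighted_percentile(y, w, 100 * alpha)
    best = min(weighted_pinball_loss(c, y, w, alpha) for c in y)
    results[alpha] = (weighted_pinball_loss(c_star, y, w, alpha), best)
print(results)
```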

ogrisel commented Sep 15, 2014

+1 for adding those features to the sample_weight PR, but they need to be properly tested.

At some point it would be interesting to evaluate using models that support sample_weight for covariate shift correction, as implemented in http://blog.smola.org/post/4110255196/real-simple-covariate-shift-correction
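The linked post's recipe, as we understand it, trains a classifier to tell training points from test points. Each training point is then reweighted by the classifier's odds p(test | x) / p(train | x), rescaled by n_train / n_test, which estimates the density ratio p_test(x) / p_train(x). The numpy-only sketch below follows that recipe with a plain gradient-descent logistic regression. `fit_logistic` and `covariate_shift_weights` are hypothetical helpers. The resulting weights would be passed as `sample_weight` when fitting on the training set.

```python
import numpy as np


def fit_logistic(X, t, n_iter=2000, lr=0.1):
    # Batch gradient descent on the mean logistic loss, with an intercept column.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    coef = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ coef)))
        coef -= lr * Xb.T @ (p - t) / len(t)
    return coef


def covariate_shift_weights(X_train, X_test):
    # Label training points 0 and test points 1, then fit a discriminator.
    X = np.vstack([X_train, X_test])
    t = np.r_[np.zeros(len(X_train)), np.ones(len(X_test))]
    coef = fit_logistic(X, t)
    # Log-odds log(p(test | x) / p(train | x)) for each training point.
    logit = np.hstack([X_train, np.ones((len(X_train), 1))]) @ coef
    # Odds times n_train / n_test estimates p_test(x) / p_train(x).
    return np.exp(logit) * len(X_train) / len(X_test)


rng = np.random.RandomState(0)
X_train = rng.normal(0.0, 1.0, size=(500, 1))
X_test = rng.normal(1.0, 1.0, size=(500, 1))
w = covariate_shift_weights(X_train, X_test)

# The reweighted training mean should move toward the test mean (about 1.0).
print(X_train[:, 0].mean(), np.average(X_train[:, 0], weights=w))
```

With these two Gaussians the true log density ratio is linear in x (x - 0.5), so the logistic model is well specified. Real data may need a more flexible discriminator.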

A new covariate shift correction example would be great, although probably not in the GB with sample_weight PR itself.

pprett (Owner, Author) commented Sep 17, 2014

Added robust regression tests on Boston housing and some tests for the weighted percentile.
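The added tests are not shown in this thread. One property a weighted-percentile test could check is that integer weights behave like repeating each sample that many times, with a zero weight behaving like dropping the sample. The sketch below checks that property for a hypothetical `weighted_percentile` that uses the cumulative-weight definition. It is not the test code from the PR.

```python
import numpy as np


def weighted_percentile(arr, sample_weight, percentile=50):
    # Hypothetical helper: smallest value whose cumulative (sorted) weight
    # reaches `percentile` percent of the total weight.
    arr = np.asarray(arr, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    sorted_idx = np.argsort(arr)
    weight_cdf = np.cumsum(sample_weight[sorted_idx])
    idx = np.searchsorted(weight_cdf, percentile / 100.0 * weight_cdf[-1])
    return arr[sorted_idx[min(idx, len(arr) - 1)]]


rng = np.random.RandomState(0)
arr = rng.standard_normal(30)
counts = rng.randint(0, 4, size=30)  # integer weights in 0..3
repeated = np.repeat(arr, counts)    # each value repeated `count` times

# (weighted result, result on the expanded array with unit weights)
pairs = [
    (weighted_percentile(arr, counts, p),
     weighted_percentile(repeated, np.ones(len(repeated)), p))
    for p in (10, 25, 50, 75, 90, 100)
]
print(pairs)
```

percentile=0 is left out on purpose. Under this definition it returns the smallest value even if that value has zero weight, an edge case such tests might want to pin down explicitly.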

pprett (Owner, Author) commented Sep 17, 2014

@ogrisel, a covariate shift example would indeed be great. Does anybody have a nice dataset for this? One could use a synthetic checkerboard dataset in which P(x) is changed, i.e. the probability of drawing an example from each of the checkerboard cells.
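The checkerboard idea can be sketched in a few lines. P(y | x) is fixed by the colour of the cell, and only the cell sampling probabilities (hence P(x)) differ between train and test. `make_checkerboard` and the two cell-probability matrices are our own illustrative choices, not an existing scikit-learn generator.

```python
import numpy as np


def make_checkerboard(n_samples, cell_probs, rng):
    # Hypothetical generator. Draw a cell with probability proportional to
    # cell_probs[row, col], then a point uniformly inside that unit cell.
    cell_probs = np.asarray(cell_probs, dtype=float)
    n_cells = cell_probs.shape[0]
    p = cell_probs.ravel() / cell_probs.sum()
    cells = rng.choice(n_cells * n_cells, size=n_samples, p=p)
    row, col = np.divmod(cells, n_cells)
    X = np.column_stack([col, row]) + rng.uniform(size=(n_samples, 2))
    # The label depends only on the cell colour, so P(y | x) is the same
    # for any choice of cell_probs; only P(x) changes.
    y = (row + col) % 2
    return X, y


rng = np.random.RandomState(0)
n_cells = 4
uniform = np.ones((n_cells, n_cells))
# The test distribution over-samples cells with a large row + col index.
skewed = np.add.outer(np.arange(n_cells), np.arange(n_cells)) + 1.0
X_train, y_train = make_checkerboard(1000, uniform, rng)
X_test, y_test = make_checkerboard(1000, skewed, rng)
```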

pprett added a commit that referenced this pull request Sep 17, 2014
…tile

sample weight support for robust regression via weighted percentile algo
pprett merged commit 8c1a95f into gbrt-sample-weight on Sep 17, 2014
pprett deleted the gbrt-sample-weight-weighted-percentile branch on September 17, 2014 09:41
ogrisel commented Sep 17, 2014

> @ogrisel a covariate shift example would be indeed great -- has anybody a nice dataset for this?

We could use one of the existing datasets and create an artificial train/test split that introduces a shift. For instance, we could use the Boston dataset and put samples with a higher tax rate (the TAX feature) into the test set with a higher probability than into the training set.
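The biased split can be sketched by drawing the test set with a probability that grows with the chosen feature. Because selection depends only on x, P(y | x) is unchanged and only P(x) shifts. `shifted_split`, the exponential tilt and the `strength` parameter are our own illustrative choices. Recent scikit-learn releases no longer ship `load_boston`, so the usage below runs on a made-up TAX-like column. With the real data one would pass the TAX column instead.

```python
import numpy as np


def shifted_split(feature, test_size=0.3, strength=2.0, rng=None):
    # Hypothetical helper: sample test indices without replacement with
    # probability proportional to exp(strength * standardized feature),
    # so high values (e.g. a high TAX) are more likely to land in the test set.
    rng = np.random.RandomState(0) if rng is None else rng
    feature = np.asarray(feature, dtype=float)
    z = (feature - feature.mean()) / feature.std()
    p = np.exp(strength * z)
    p /= p.sum()
    n_test = int(round(test_size * len(feature)))
    test_idx = np.sort(rng.choice(len(feature), size=n_test, replace=False, p=p))
    train_idx = np.setdiff1d(np.arange(len(feature)), test_idx)
    return train_idx, test_idx


rng = np.random.RandomState(0)
# Made-up stand-in for a TAX-like column (not the real Boston data).
tax = rng.uniform(200.0, 700.0, size=500)
train_idx, test_idx = shifted_split(tax, rng=rng)
print(tax[train_idx].mean(), tax[test_idx].mean())
```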

3 participants