sample weight support for robust regression via weighted percentile algo #10

pprett · 2014-09-15T09:31:14Z

No description provided.

pprett · 2014-09-15T09:33:35Z

@arjoly @glouppe @ogrisel here is a branch that adds sample_weight support for robust regression as well. If you are fine with it, I will merge it into my sample_weight PR.

arjoly · 2014-09-15T15:15:25Z

sklearn/ensemble/gradient_boosting.py

@@ -50,6 +50,18 @@
 from ._gradient_boosting import _random_sample_mask


+def _weighted_percentile(arr, sample_weight, percentile=50):


Maybe this could go in utils.stats.

What do you think of working with quantile instead of percentile?

@arjoly what exactly do you propose? Changing the name from percentile to quantile and using fractions instead of 0-100?

I agree that would be nicer -- I did it like this to be consistent with scipy.stats.mstats.scoreatpercentile

As you propose, I would rename the function.
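To make the percentile-versus-quantile question concrete, here is a minimal sketch of one common way to compute a weighted quantile: sort the values, accumulate the weights, and take the first value whose cumulative weight reaches the requested share of the total. It is not taken from this branch. The names `weighted_quantile` and `weighted_percentile` are hypothetical, and plain Python lists are assumed instead of NumPy arrays so the example stays dependency-free.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_quantile(values, sample_weight, quantile=0.5):
    """Return the smallest value whose cumulative weight reaches `quantile`.

    Hypothetical helper for illustration; `quantile` is a fraction in [0, 1].
    """
    if len(values) == 0:
        raise ValueError("values must not be empty")
    if len(values) != len(sample_weight):
        raise ValueError("values and sample_weight must have the same length")
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must lie in [0, 1]")

    # Sort the values and carry each weight along with its value.
    pairs = sorted(zip(values, sample_weight))
    # Cumulative weight up to and including each sorted value.
    cum_weight = list(accumulate(w for _, w in pairs))
    total = cum_weight[-1]
    if total <= 0:
        raise ValueError("sample_weight must have a positive sum")

    # First position whose cumulative weight is >= the requested share.
    idx = bisect_left(cum_weight, quantile * total)
    return pairs[min(idx, len(pairs) - 1)][0]


def weighted_percentile(values, sample_weight, percentile=50):
    """Percentile-scale wrapper (0-100), mirroring the signature in the diff."""
    return weighted_quantile(values, sample_weight, percentile / 100.0)
```

Under this definition, integer weights behave like repeated samples: `weighted_quantile([10, 20, 30], [2, 1, 3])` gives the same result as the unweighted call on `[10, 10, 20, 30, 30, 30]`. Other conventions (for example, interpolating between neighbouring values) are equally valid and would give different results on ties.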

arjoly · 2014-09-15T15:17:40Z

Do you have tests for this?

arjoly · 2014-09-15T15:18:52Z

@arjoly @glouppe @ogrisel here is a branch that adds sample_weight support for robust regression as well. If you are fine with it, I will merge it into my sample_weight PR.

It's ok for me if you add support for this feature.

ogrisel · 2014-09-15T17:56:50Z

sklearn/ensemble/tests/test_gradient_boosting.py

-        assert_raises(NotImplementedError, est.fit, boston.data, boston.target,
-                      sample_weight=sample_weight)
-
-


There is currently no test for the robust regression losses with sample weights, right?

ogrisel · 2014-09-15T18:00:15Z

+1 for adding those features to the sample_weight PR, but this needs to be properly tested.

It would be interesting at some point to evaluate how models that support sample_weight can leverage covariate shift corrections as implemented in http://blog.smola.org/post/4110255196/real-simple-covariate-shift-correction .

A new example on covariate shift correction would be great, although probably not for the GB w/ sample_weight PR itself.


pprett · 2014-09-17T09:37:37Z

Added robust regression tests on Boston housing and some tests for the weighted percentile.

pprett · 2014-09-17T09:40:41Z

@ogrisel a covariate shift example would indeed be great -- does anybody have a nice dataset for this? One could use a synthetic checkerboard dataset where one changes P(x) (i.e. the probability of drawing an example from a given checkerboard cell).
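A rough sketch of the checkerboard idea, assuming a 4 x 4 board on the unit square and a made-up shifted distribution. The helper `sample_checkerboard` and all parameter values are hypothetical. Because both cell distributions are chosen by hand here, the importance weight of each training point is known exactly, which makes the setup convenient for checking whether `sample_weight` helps.

```python
import random


def sample_checkerboard(n_samples, cell_probs, rng):
    """Draw points from an n x n checkerboard on the unit square.

    cell_probs[i][j] is the probability of drawing from cell (i, j);
    the label of a point is the parity of its cell, (i + j) % 2.
    """
    n = len(cell_probs)
    cells = [(i, j) for i in range(n) for j in range(n)]
    probs = [cell_probs[i][j] for i, j in cells]
    X, y, cell_ids = [], [], []
    for i, j in rng.choices(cells, weights=probs, k=n_samples):
        # Uniform position inside the chosen cell.
        X.append(((i + rng.random()) / n, (j + rng.random()) / n))
        y.append((i + j) % 2)
        cell_ids.append((i, j))
    return X, y, cell_ids


n = 4
# Training distribution: every cell is equally likely.
uniform = [[1.0 / n**2] * n for _ in range(n)]
# Hypothetical shifted P(x): cells with a larger second index are more likely.
raw = [[1.0 + j for j in range(n)] for _ in range(n)]
z = sum(map(sum, raw))
shifted = [[v / z for v in row] for row in raw]

rng = random.Random(0)
X_train, y_train, cells_train = sample_checkerboard(500, uniform, rng)
X_test, y_test, _ = sample_checkerboard(500, shifted, rng)

# Both cell distributions are known, so the importance weight of a training
# point is exactly p_test(cell) / p_train(cell).
sample_weight = [shifted[i][j] / uniform[i][j] for i, j in cells_train]
```

The resulting `sample_weight` list could then be passed to the `fit` method of any estimator that accepts it, and the weighted and unweighted fits compared on `X_test`.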


ogrisel · 2014-09-17T16:17:24Z

@ogrisel a covariate shift example would indeed be great -- does anybody have a nice dataset for this?

We could use one of the existing datasets and create an artificial train / test split that introduces a shift. For instance, we could use the Boston dataset and make samples with a higher tax rate (the TAX feature) more likely to land in the test set than in the training set.
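One way such a feature-dependent split could look, as a sketch only: each sample is sent to the test set with a probability that grows with the value of one feature. The function `shifted_split`, its `low` / `high` parameters, and the synthetic `tax` column are all assumptions for illustration. The real Boston data is not loaded here.

```python
import random


def shifted_split(feature, rng, low=0.1, high=0.6):
    """Split sample indices so that high feature values favour the test set.

    Each sample goes to the test set with a probability that rises linearly
    from `low` (smallest feature value) to `high` (largest feature value).
    """
    lo, hi = min(feature), max(feature)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant feature
    train_idx, test_idx = [], []
    for idx, value in enumerate(feature):
        p_test = low + (high - low) * (value - lo) / span
        (test_idx if rng.random() < p_test else train_idx).append(idx)
    return train_idx, test_idx


rng = random.Random(0)
# Synthetic stand-in for a tax-rate column; not the real Boston data.
tax = [rng.uniform(180.0, 720.0) for _ in range(1000)]
train_idx, test_idx = shifted_split(tax, rng)

# The test set should now be skewed towards higher tax values.
mean_train = sum(tax[i] for i in train_idx) / len(train_idx)
mean_test = sum(tax[i] for i in test_idx) / len(test_idx)
```

Since the assignment probability is known in this artificial setup, the density ratio between test and training points at a given feature value is proportional to `p_test / (1 - p_test)`, which gives reference weights to compare against an estimated correction.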

sample weight support for robust regression via weighted percentile algo

f07f8ad

pprett force-pushed the gbrt-sample-weight-weighted-percentile branch from 33052ba to f07f8ad Compare September 15, 2014 09:32

arjoly reviewed Sep 15, 2014

ogrisel reviewed Sep 15, 2014

pprett added 3 commits September 17, 2014 09:49

fix: consider sample_weights in robost init estimator and negative_gradient

08e4048

fix: add **kwargs to Multinomial loss' negative_gradient

fb06bba

more tests for weighted percentile

fcb44f2

cosmit

9b1b79d

pprett added a commit that referenced this pull request Sep 17, 2014

Merge pull request #10 from pprett/gbrt-sample-weight-weighted-percentile

sample weight support for robust regression via weighted percentile algo

8c1a95f

pprett merged commit 8c1a95f into gbrt-sample-weight Sep 17, 2014

pprett deleted the gbrt-sample-weight-weighted-percentile branch September 17, 2014 09:41
