[MRG+1] Add sample_weight support to Dummy Regressor #3779
Conversation
@@ -389,25 +390,43 @@ def fit(self, X, y, sample_weight=None):
                              "'mean', 'median', 'quantile' or 'constant'"
                              % self.strategy)

-        y = check_array(y, accept_sparse='csr', ensure_2d=False)
+        y = check_array(y, ensure_2d=False)
I removed `accept_sparse='csr'` since it's not supported.

This is ready for review and it will fix #3420
Force-pushed from 9e4f6d5 to de49e67
Force-pushed from de49e67 to da31344
        check_consistent_length(X, y)
        self.output_2d_ = y.ndim == 2
        if y.ndim == 1:
            y = np.reshape(y, (-1, 1))
Noob question, out of curiosity: is there any difference between doing this and `y = y[:, np.newaxis]`? I always use the latter.
`y = y[:, np.newaxis]` doesn't preserve contiguity. This is a known bug and will be solved in the current or next release of numpy.
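A quick standalone check of the two spellings (the array and names here are illustrative, not taken from the PR):

```python
import numpy as np

y = np.arange(5.0)           # a 1-d contiguous array
a = np.reshape(y, (-1, 1))   # explicit reshape to a column vector
b = y[:, np.newaxis]         # the same values via a new trailing axis

# Both produce the same (n, 1) array of values.
assert np.array_equal(a, b)
assert a.shape == (5, 1) and b.shape == (5, 1)

# Contiguity can be inspected directly; in current NumPy both report
# C-contiguous here, so the discrepancy discussed above was a
# 2014-era flag bug rather than a difference in the values.
print(a.flags['C_CONTIGUOUS'], b.flags['C_CONTIGUOUS'])
```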
oh yes, I remember now :)
        # Find index of median prediction for each sample
        weight_cdf = sample_weight[sorted_idx].cumsum()
        percentile_or_above = weight_cdf >= (percentile / 100.0) * weight_cdf[-1]
        percentile_idx = percentile_or_above.argmax()
If I'm understanding right, these two lines can be replaced by
`percentile_idx = np.searchsorted(weight_cdf, (percentile / 100.) * weight_cdf[-1])`
or am I wrong?
Do you think this could be optimized in another PR? I have just taken what @pprett did previously and moved it here so it is useful to more than just gradient boosting.
Okay, unless @pprett thinks it is fine to change it over here.
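The suggested equivalence is easy to verify in isolation; this sketch uses made-up inputs, not the PR's test data:

```python
import numpy as np

array = np.array([3.0, 2.0, 4.0])
sample_weight = np.array([1.0, 2.0, 3.0])
percentile = 50

sorted_idx = np.argsort(array)
weight_cdf = sample_weight[sorted_idx].cumsum()
threshold = (percentile / 100.0) * weight_cdf[-1]

# Current code: boolean mask + argmax gives the first index whose
# cumulative weight reaches the threshold (a linear scan).
idx_argmax = (weight_cdf >= threshold).argmax()

# Suggested replacement: searchsorted with the default side='left'
# returns the first index where weight_cdf[i] >= threshold
# (a binary search).
idx_search = np.searchsorted(weight_cdf, threshold)

assert idx_argmax == idx_search
```

Since `weight_cdf` is non-decreasing, the two agree for every threshold; `searchsorted` is O(log n) where the mask-and-argmax scan is O(n), which is presumably the motivation for the suggestion.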
@arjoly Updated the PR description.

Thanks @MechCoder !
        # Find index of median prediction for each sample
        weight_cdf = sample_weight[sorted_idx].cumsum()
        percentile_or_above = weight_cdf >= (percentile / 100.0) * weight_cdf[-1]
Sorry for being stupid, but I am not able to get this to work. My arguments are [3, 2, 4] and [1, 2, 3] for array and sample_weight respectively. sorted_idx is an array, and indexing with it throws a TypeError. I wonder what the expected arguments are here.
sample_weight should be a numpy array
Thanks!
Thanks @MechCoder !
In the long term, I hope to replace the dummy estimator in gradient boosting with the dummy regressor and classifier. Any last reviewer?
def test_dummy_regressor_sample_weight(n_samples=10):
    random_state = np.random.RandomState(seed=1)

    X = [[0]] * n_samples
Would it be better to generate X randomly? Just as a sanity check.
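A sketch of what a test with random X might look like. `DummyRegressor` and its `fit(X, y, sample_weight=...)` signature are what this PR provides; the random data and the exact assertion are my own illustration, not the PR's test:

```python
import numpy as np
from sklearn.dummy import DummyRegressor

rng = np.random.RandomState(1)
n_samples = 10

X = rng.rand(n_samples, 1)          # random X instead of [[0]] * n_samples
y = rng.rand(n_samples)
sample_weight = rng.rand(n_samples)

# With strategy="mean", the fitted constant should be the weighted mean of y.
est = DummyRegressor(strategy="mean")
est.fit(X, y, sample_weight=sample_weight)

expected = np.average(y, weights=sample_weight)
assert np.allclose(est.predict(X), expected)
```

X being random makes no difference to a dummy estimator's predictions, which is exactly why it works as a sanity check: the fit must depend only on y and the weights.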
I suppose you can go ahead and merge if no one replies in 2-3 days, or whenever you feel like it. I believe no one is following these changes and this is not that big a diff.

Thanks @MechCoder ! I am OK to merge. Still, I would appreciate a quick last review of this small PR.

I think this should go in.
Thanks @arjoly .

Thanks :-)

@MechCoder Just to let you know for later: whenever you merge a branch, it's better to do it by rebase so the history stays linear. I have been told that it greatly eases maintenance and bug fixing.

Thanks for the tip!