8000 [MRG+1] Add sample_weight support to Dummy Regressor by arjoly · Pull Request #3779 · scikit-learn/scikit-learn · GitHub

[MRG+1] Add sample_weight support to Dummy Regressor #3779


Merged: 2 commits into scikit-learn:master on Nov 4, 2014

Conversation

arjoly
Member
@arjoly arjoly commented Oct 16, 2014

No description provided.

@@ -389,25 +390,43 @@ def fit(self, X, y, sample_weight=None):
"'mean', 'median', 'quantile' or 'constant'"
% self.strategy)

-        y = check_array(y, accept_sparse='csr', ensure_2d=False)
+        y = check_array(y, ensure_2d=False)
Member Author

I removed accept_sparse='csr' since it's not supported.
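For context, a small sketch (not from the PR) of what dropping `accept_sparse='csr'` changes: with `accept_sparse` left at its default (`False`), `check_array` rejects sparse input with a `TypeError` rather than silently accepting it.

```python
import numpy as np
from scipy import sparse
from sklearn.utils import check_array

# With accept_sparse at its default (False), check_array refuses sparse
# input outright instead of converting or densifying it.
y_sparse = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0]]))
try:
    check_array(y_sparse, ensure_2d=False)
    rejected = False
except TypeError:
    rejected = True
```

So after this change, passing a sparse `y` to `DummyRegressor.fit` fails loudly at validation time, which matches the fact that the estimator never supported sparse targets.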

@arjoly
Member Author
arjoly commented Oct 16, 2014

This is ready for review, and it will fix #3420.

@arjoly arjoly force-pushed the sw-dummy-regressor branch 2 times, most recently from 9e4f6d5 to de49e67 Compare October 16, 2014 14:41
@arjoly arjoly force-pushed the sw-dummy-regressor branch from de49e67 to da31344 Compare October 16, 2014 14:41
check_consistent_length(X, y)
self.output_2d_ = y.ndim == 2
if y.ndim == 1:
y = np.reshape(y, (-1, 1))
Member

Noob question: out of curiosity, is there any difference between doing this and

`y = y[:, np.newaxis]`

I always use the latter.

Member Author

`y = y[:, np.newaxis]` doesn't preserve contiguity. This is a known bug and will be solved in the current or next release of NumPy.
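A small sketch (not from the PR) comparing the two forms. Both produce the same single-column 2-d view of the data; the difference discussed here was only in the contiguity flag that the `np.newaxis` view reported at the time, a NumPy flag bug that was fixed in later releases.

```python
import numpy as np

y = np.arange(5, dtype=np.float64)

# Both turn a 1-d target into a single-column 2-d array.
y_reshaped = np.reshape(y, (-1, 1))
y_newaxis = y[:, np.newaxis]

# Same shape, same values; at the time of this discussion the newaxis
# view could report C_CONTIGUOUS=False even though its memory layout was
# contiguous. Modern NumPy (relaxed strides) flags both as contiguous.
assert y_reshaped.shape == (5, 1)
assert y_newaxis.shape == (5, 1)
assert np.array_equal(y_reshaped, y_newaxis)
```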

Member

oh yes, I remember now :)

# Find index of median prediction for each sample
weight_cdf = sample_weight[sorted_idx].cumsum()
percentile_or_above = weight_cdf >= (percentile / 100.0) * weight_cdf[-1]
percentile_idx = percentile_or_above.argmax()
Member

If I'm understanding right, these two lines can be replaced by

percentile_idx = np.searchsorted(weight_cdf, (percentile / 100.) * weight_cdf[-1])

or am I wrong?

Member Author

Do you think this could be optimized in another PR? I have just taken what @pprett did previously and moved it here so it can be useful to more than just gradient boosting.

Member

Okay, unless @pprett thinks it is fine to change this over here.
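A quick sketch (not from the PR) checking the reviewer's suggestion: on the nondecreasing cumulative-weight array, `np.searchsorted` with the default `side='left'` returns the first index where the cdf reaches the threshold, exactly like the boolean-mask-plus-`argmax` pair.

```python
import numpy as np

array = np.array([3.0, 2.0, 4.0])
sample_weight = np.array([1.0, 2.0, 3.0])
percentile = 50

sorted_idx = np.argsort(array)
weight_cdf = sample_weight[sorted_idx].cumsum()
threshold = (percentile / 100.0) * weight_cdf[-1]

# Current code: boolean mask + argmax finds the first index at/above threshold.
idx_argmax = (weight_cdf >= threshold).argmax()

# Suggested one-liner: searchsorted on the nondecreasing cdf gives the
# same index (side='left' returns the first position with cdf >= threshold).
idx_searchsorted = np.searchsorted(weight_cdf, threshold)

assert idx_argmax == idx_searchsorted
assert array[sorted_idx[idx_argmax]] == 3.0  # the weighted median
```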

@MechCoder
Member

@arjoly Done with my review. LGTM 👍 Would greatly appreciate it if you have the time to look at #3772 and give your comments.

@MechCoder MechCoder changed the title [MRG] Add sample_weight support to Dummy Regressor [MRG+1] Add sample_weight support to Dummy Regressor Oct 17, 2014
@MechCoder
Member

@arjoly Updated the PR description.

@arjoly
Member Author
arjoly commented Oct 17, 2014

Thanks @MechCoder !


# Find index of median prediction for each sample
weight_cdf = sample_weight[sorted_idx].cumsum()
percentile_or_above = weight_cdf >= (percentile / 100.0) * weight_cdf[-1]
Contributor

Sorry for being stupid, but I am not able to get this to work. My arguments are [3, 2, 4] and [1, 2, 3] for array and sample_weight respectively. The sorted_idx is an array, so indexing the list with it throws a TypeError. I wonder what the expected arguments are here.

Member

sample_weight should be a numpy array

On Sun, Oct 19, 2014 at 12:26 PM, Saurabh Jha notifications@github.com
wrote:

In sklearn/utils/stats.py:

@@ -44,3 +44,16 @@ def _rankdata(a, method="average"):

     except TypeError as e:
         rankdata = _rankdata
+
+
+def _weighted_percentile(array, sample_weight, percentile=50):
+    """Compute the weighted percentile of array with sample_weight. """
+    sorted_idx = np.argsort(array)
+    # Find index of median prediction for each sample
+    weight_cdf = sample_weight[sorted_idx].cumsum()
+    percentile_or_above = weight_cdf >= (percentile / 100.0) * weight_cdf[-1]

Sorry for being stupid, but I am not able to get this to work. My
arguments are [3, 2, 4] and [1, 2, 3] for array and sample_weight
respectively. The sorted_idx is an array and thus throwing a TypeError. I
wonder what are the expected arguments here.


Reply to this email directly or view it on GitHub
https://github.com/scikit-learn/scikit-learn/pull/3779/files#r19059282.

Godspeed,
Manoj Kumar,
Intern, Telecom ParisTech
Mech Undergrad
http://manojbits.wordpress.com
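To make the answer concrete, here is a self-contained sketch of the helper as shown in the diff above, called with NumPy arrays instead of the plain lists from the question (lists fail because they cannot be indexed with an index array).

```python
import numpy as np

def _weighted_percentile(array, sample_weight, percentile=50):
    """Compute the weighted percentile of array with sample_weight
    (the helper from the diff above)."""
    sorted_idx = np.argsort(array)
    # Find index of median prediction for each sample
    weight_cdf = sample_weight[sorted_idx].cumsum()
    percentile_or_above = weight_cdf >= (percentile / 100.0) * weight_cdf[-1]
    percentile_idx = percentile_or_above.argmax()
    return array[sorted_idx[percentile_idx]]

# The example from the question, with lists converted to numpy arrays:
median = _weighted_percentile(np.array([3, 2, 4]), np.array([1, 2, 3]))
# median is 3: the cumulative weights over the sorted values [2, 3, 4]
# are [2, 3, 6], and the first entry reaching half the total (3) is at
# index 1, which maps back to the value 3.
```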

Contributor

Thanks!

Member Author

Thanks @MechCoder !

@arjoly
Member Author
arjoly commented Oct 21, 2014

A last reviewer ? ping @ogrisel, @pprett, @glouppe

@arjoly
Member Author
arjoly commented Oct 24, 2014

In the long term, I hope to replace the dummy estimator in gradient boosting with the dummy regressor and classifier. Any last reviewer?

def test_dummy_regressor_sample_weight(n_samples=10):
random_state = np.random.RandomState(seed=1)

X = [[0]] * n_samples
Member

Would it be better to generate X randomly? Just as a sanity check.
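A sketch (not from the PR) of the suggested sanity check: since a dummy regressor must ignore X entirely, generating X randomly verifies that nothing about X leaks into the weighted-mean prediction.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

n_samples = 10
random_state = np.random.RandomState(seed=1)

# Random X instead of [[0]] * n_samples; a dummy regressor must ignore it.
X = random_state.rand(n_samples, 1)
y = random_state.rand(n_samples)
sample_weight = random_state.rand(n_samples)

est = DummyRegressor(strategy="mean").fit(X, y, sample_weight)

# With the "mean" strategy and sample weights, every prediction should be
# the weighted average of y, regardless of X.
expected = np.average(y, weights=sample_weight)
assert np.allclose(est.predict(X), expected)
```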

@MechCoder
Member

I suppose you can go ahead and merge if no one replies in 2-3 days, or whenever you feel like it. I believe no one is following these changes, and this is not that big a diff.

@arjoly
Member Author
arjoly commented Oct 31, 2014

Thanks @MechCoder! I am OK to merge. Still, I would appreciate a quick final review of this small PR.

@MechCoder
Member

I think this should go in.

@MechCoder MechCoder closed this Nov 4, 2014
@MechCoder MechCoder reopened this Nov 4, 2014
MechCoder added a commit that referenced this pull request Nov 4, 2014
[MRG+1] Add sample_weight support to Dummy Regressor
@MechCoder MechCoder merged commit e6835a7 into scikit-learn:master Nov 4, 2014
@MechCoder
Member

Thanks @arjoly .

@arjoly
Member Author
arjoly commented Nov 4, 2014

Thanks :-)

@arjoly
Member Author
arjoly commented Nov 4, 2014

@MechCoder Just to let you know for later: whenever you merge a branch, it's better to do it by rebase to keep a linear history. I have been told that it greatly eases maintenance and bug fixing.

@MechCoder
Member

Thanks for the tip!
