8000 [WIP] Adds support for sample weights in median_absolute_error by SaurabhJha · Pull Request #3784 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[WIP] Adds support for sample weights in median_absolute_error #3784

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

SaurabhJha
Copy link
Contributor

Adds support for sample_weight in median_absolute_error. Issue #3450

@SaurabhJha
Copy link
Contributor Author

I am not able to find any out of the bag solution from numpy like it is in weighted mean.

@MechCoder
Copy link
Member

You can maybe reuse the _weighted_percentile function here https://github.com/scikit-learn/scikit-learn/pull/3779/files#diff-bfbd0a804887c264deff8746546526a9R49 for sample_weights support.

@MechCoder MechCoder closed this Oct 19, 2014
@MechCoder MechCoder reopened this Oct 19, 2014
@SaurabhJha
Copy link
Contributor Author

Thanks for bringing that issue to my attention @MechCoder

@@ -207,7 +234,12 @@ def median_absolute_error(y_true, y_pred):
1.0

"""
import ipdb; ipdb.set_trace()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is causing test failures.

@MechCoder
Copy link
Member

I've change the title to WIP. Feel free to change it to MRG when you think you have completed the PR.

@MechCoder MechCoder changed the title Adds support for sa 8000 mple weights in median_absolute_error [WIP] Adds support for sample weights in median_absolute_error Oct 20, 2014
@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 553dc7d on SaurabhJha:3450 into 2b6a9fa on scikit-learn:master.

@@ -179,6 +180,7 @@ def mean_squared_error(y_true, y_pred, sample_weight=None):


def median_absolute_error(y_true, y_pred):
#def median_absolute_error(y_true, y_pred, sample_weight=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out?

@ogrisel
Copy link
Member
ogrisel commented Dec 8, 2014

Please also add a test.

Edit: actually this would already been tested by test_common.

@maniteja123
Copy link
Contributor

Median_absolute_error is still not supporting sample_weights. This PR seems to add sample_weights to the median_absolute_error. But the commit needs to be refactored a little. Any reason for not adding this ?

Thanks.

@MechCoder
Copy link
Member

no, go ahead and do it..

@maniteja123
Copy link
Contributor

@MechCoder Thanks will surely do it. I need a small help. Is it possible to continue with this PR by sending to this branch ?

@MechCoder
Copy link
Member

not unless you have write-access to this fork.

You can cherry-pick these commits onto your branch and send a new PR if you prefer.

@MechCoder
Copy link
Member

@maniteja123 Would you be able to wrap up work on this?

@maniteja123
Copy link
Contributor

Hi, sorry couldn't get into this and my college work has started. Will look into this now and get back with my doubts if any.

@maniteja123
Copy link
Contributor

The function _weighted_percentile is for a 1-D array (row vector) while the median_absolute_error converts them to column vector. Would it be okay to make them as row vectors and then pass to _weighted_percentile or is 8000 there any workaround here ?

I will use an example here :

a = np.array([1,2])
b = np.array([1,1])
>>>a
array([1, 2])
>>>_weighted_percentile(a, a)
2
>>>_weighted_percentile(a, b)
1
>>>c = a.reshape(2,1)
>>>d = b.reshape(2,1)
>>> c
array([[1],
       [2]])
>>>  _weighted_percentile(c, c)
array([[1]])
>>>  _weighted_percentile(c, d)
array([[1]])

Thanks and please let me know if I am understanding something wrong here.

@MechCoder
Copy link
Member

Yes, it's okay to ravel and pass them as row vectors.

@maniteja123
Copy link
Contributor

Sorry for the delay, but I have trivial doubt. Suppose the sample weights are same, should the value be the median when there is no sample_weight ?

>>>y_true = np.array([1,1])
>>>y_pred = np.array([2,3])
>>>median_absolute_error(y_true, y_pred)
1.5
>>>median_absolute_error(y_true, y_pred, [1,1])
1

This is the result, if the _weigthed_percentile function is used when sample_weight is given. Please let me know if this is expected since according to my understanding equal weights mean the normal median.

Also would it be better if I just create a PR first before to make it easy ?

@MechCoder
Copy link
Member

Yes, it should be the median of the absolute error when there is no sample weight provided.

@maniteja123
Copy link
Contributor

Thanks a lot for the reply. But my doubt was the result when the sample weights are all equal. Sorry if I wasn't clear.

@MechCoder
Copy link
Member

Ah, good point !

One work around for this would be to check if the sum of the sample_weights is even, and then handle this case specifically.

If it is not clear, please issue a PR and it would be better for me to comment there.

@MechCoder
Copy link
Member

The issue is slightly more complex, see #6189

For now, I would say let us write our own implementation independent of _weighted_percentile

@maniteja123
Copy link
Contributor

Oh this means there is a need for stricter version of weighted percentile. Could you elaborate on your suggested way to do it now ? Something like the linear interpolation method given in the Wikipedia link ? Thanks for patiently answering my doubts.

@MechCoder
Copy link
Member

Superseded by #6217

@MechCoder MechCoder closed this Jan 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants
0