precision_recall_curve is not as I would expect · Issue #9359 · scikit-learn/scikit-learn · GitHub

precision_recall_curve is not as I would expect #9359


Closed
chananshgong opened this issue Jul 14, 2017 · 5 comments

Comments

@chananshgong
chananshgong commented Jul 14, 2017

Description

precision_recall_curve returns values that are not correct.

Steps/Code to Reproduce

Example: compare the answer at https://stats.stackexchange.com/questions/183504/are-precision-and-recall-supposed-to-be-monotonic-to-classification-threshold with what sklearn returns.

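import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics

labels = np.array([False, True, False, True, True, True, False, False])
# scores increase with the index, so the last sample gets the highest score
scores = np.linspace(0, 1, len(labels))
pr, rc, th = metrics.precision_recall_curve(labels, scores, pos_label=True)
plt.plot(rc, pr, 'o-', label='sklearn')

# hand computation: treat the first k samples (in index order) as predicted positive
pr = np.cumsum(labels) / np.arange(1, len(labels) + 1)
rc = np.cumsum(labels) / np.sum(labels)
pr[0] = 0
plt.plot(rc, pr, '.-', label='mine')
plt.legend()
labels.mean()  # fraction of positive samples: 0.5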


Expected Results

rc = array([0., 0.25, 0.25, 0.5, 0.75, 1., 1., 1.])
pr = array([0., 0.5, 0.33333333, 0.5, 0.6, 0.66666667, 0.57142857, 0.5])

Actual Results

rc = array([1., 0.75, 0.75, 0.5, 0.25, 0., 0., 0.])
pr = array([0.57142857, 0.5, 0.6, 0.5, 0.33333333, 0., 0., 1.])

Versions

Windows-10-10.0.14393-SP0
Python 3.5.3 |Anaconda custom (64-bit)| (default, Feb 22 2017, 21:28:42) [MSC v.1900 64 bit (AMD64)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1

@amueller
Member

Can you check on master, please?

@amueller
Member

Possibly related to #6265.

@jnothman
Member

Could you please reverse your expected results so it's easier to compare numerically? Thanks

@qinhanmin2014
Member

@chananshgong I think there is a mistake in your code snippet:
scores = np.linspace(0,1, len(labels))
should be
scores = np.linspace(1,0, len(labels))
After this change, the two curves are the same.
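A minimal sketch of why the direction of `np.linspace` matters. It uses plain Python lists instead of NumPy so it needs only the standard library. The helper `pr_points_by_rank` is hypothetical, not a scikit-learn function: it ranks samples by descending score, as sweeping a decision threshold from high to low does, and computes precision and recall for the top k samples. The hand computation in the report instead walks the samples in index order, which agrees with the score ranking only when the scores decrease with the index.

```python
def pr_points_by_rank(labels, scores):
    """Hypothetical helper: precision and recall when the k highest-scored
    samples are predicted positive, for k = 1..n (ties are not merged)."""
    # Sweep the threshold from high to low by visiting samples in descending score order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    precision, recall = [], []
    true_pos = 0
    for k, i in enumerate(order, start=1):
        true_pos += labels[i]  # a bool counts as 0 or 1
        precision.append(true_pos / k)
        recall.append(true_pos / total_pos)
    return precision, recall


labels = [False, True, False, True, True, True, False, False]
n = len(labels)

# The report's hand computation: cumulative precision over the first k samples by index.
index_precision = [sum(labels[:k]) / k for k in range(1, n + 1)]

ascending = [i / (n - 1) for i in range(n)]  # like np.linspace(0, 1, n): last sample scores highest
descending = ascending[::-1]                 # like np.linspace(1, 0, n): first sample scores highest

prec_asc, rec_asc = pr_points_by_rank(labels, ascending)
prec_desc, rec_desc = pr_points_by_rank(labels, descending)

print(prec_desc == index_precision)  # True: score ranking matches index order
print(prec_asc == index_precision)   # False: score ranking is the reverse of index order
```

Each sample is treated as its own threshold here, which is enough for the distinct scores in this example; real threshold sweeps would merge tied scores into one step.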
Note that scikit-learn appends an additional point (the last precision and recall values are 1. and 0.), which has been discussed in #4223.
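The layout of the "Actual Results" reported for 0.18.1 can be reproduced by hand with the hypothetical helper `sklearn_style_pr_curve` below. It is a sketch under stated assumptions, not scikit-learn's code: scores are distinct, there is at least one positive label, the curve stops at the first threshold where recall reaches 1, points come out in order of non-increasing recall, and the extra point (precision 1, recall 0) is appended at the end. Other scikit-learn releases may return a different number of points.

```python
def sklearn_style_pr_curve(labels, scores):
    """Hypothetical helper mimicking the output layout reported for scikit-learn 0.18.1.

    Assumes distinct scores and at least one positive label.
    """
    # Visit samples by descending score: each step lowers the threshold past one sample.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tps, fps = [], []
    true_pos = false_pos = 0
    for i in order:
        if labels[i]:
            true_pos += 1
        else:
            false_pos += 1
        tps.append(true_pos)
        fps.append(false_pos)

    # Assumed truncation: stop at the first threshold where recall reaches 1.
    last = tps.index(tps[-1])
    # Walk back from that threshold so recall is non-increasing along the output.
    steps = range(last, -1, -1)
    precision = [tps[k] / (tps[k] + fps[k]) for k in steps]
    recall = [tps[k] / tps[-1] for k in steps]

    # Extra end point discussed in #4223: precision 1, recall 0.
    return precision + [1.0], recall + [0.0]


labels = [False, True, False, True, True, True, False, False]
n = len(labels)
scores = [i / (n - 1) for i in range(n)]  # same ordering as np.linspace(0, 1, n)

precision, recall = sklearn_style_pr_curve(labels, scores)
print([round(p, 8) for p in precision])  # [0.57142857, 0.5, 0.6, 0.5, 0.33333333, 0.0, 0.0, 1.0]
print(recall)                            # [1.0, 0.75, 0.75, 0.5, 0.25, 0.0, 0.0, 0.0]
```

With the scores reversed, as suggested in the comment above, the same helper returns the reporter's expected points in reverse order, stopping where recall first reaches 1, followed by the extra (1, 0) point.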

@qinhanmin2014
Member

@chananshgong Closing this one since there's something wrong with your code. See the previous comment. Feel free to reopen if you disagree :)
