Pipeline score broken for unsupervised algorithms · Issue #4063 · scikit-learn/scikit-learn
Closed
pmaher86 opened this issue Jan 8, 2015 · 6 comments · Fixed by #4064
pmaher86 commented Jan 8, 2015

The score method of Pipeline assigns y=None and always passes it to the score method of the final estimator, but some estimators (for instance KMeans) take only X. Is there a reason to use y=None instead of *args?

Reproducible by

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)
p = Pipeline([('clf', KMeans(n_clusters=3))])
p.fit(X)
p.score(X)  # fails: KMeans.score takes only X, but Pipeline passes y=None
jnothman (Member) commented Jan 8, 2015

Because explicit is better than implicit. You're right, this is broken, but the fix is just to call with y only if y is not None.
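A minimal sketch of that fix (hypothetical names, not the actual scikit-learn source) could look like:

```python
class PipelineSketch:
    """Toy stand-in for Pipeline, showing only the score forwarding."""

    def __init__(self, final_estimator):
        self.final_estimator = final_estimator

    def score(self, X, y=None):
        # Forward y to the final estimator only when the caller supplied it,
        # so estimators whose score takes only X (e.g. KMeans) still work.
        if y is None:
            return self.final_estimator.score(X)
        return self.final_estimator.score(X, y)


class XOnlyScorer:
    """Stand-in for an estimator like KMeans whose score accepts only X."""

    def score(self, X):
        return -1.0  # e.g. a negative inertia-like value
```

With this, `PipelineSketch(XOnlyScorer()).score([[0.0, 1.0]])` returns the estimator's score instead of raising a TypeError, while supervised estimators still receive y when it is given.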


jnothman (Member) commented Jan 8, 2015

(And it's not really unsupervised algorithms that are the problem so much as "internal evaluation" metrics.)


amueller (Member) commented Jan 8, 2015

I'm surprised; I'm pretty sure I added a regression test for that one :-/

@amueller amueller added the Bug label Jan 8, 2015
amueller (Member) commented Jan 8, 2015

fit already takes a y=None parameter, so I guess score should take that, too, to be consistent?

amueller (Member) commented Jan 8, 2015

PCA already has that, btw... it seems to be a shortcoming of KMeans... smells like a missing common test. Testing that score can take y=None should be sufficient, right?
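A rough sketch of such a common test (hypothetical helper name, not the actual scikit-learn common-test code) could simply check the score signature:

```python
import inspect


def check_score_accepts_y_none(estimator):
    # Hypothetical common test: score should accept y=None, mirroring
    # fit(X, y=None), so Pipeline can always forward y regardless of
    # whether the final estimator actually uses it.
    params = inspect.signature(estimator.score).parameters
    assert "y" in params, "score should accept a y argument"
    assert params["y"].default is None, "y should default to None"
```

Run against every estimator in the common-test loop, this would have flagged KMeans, whose score at the time took only X.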

amueller (Member) commented Jan 8, 2015

See #4064.

@amueller amueller added this to the 0.16 milestone Jan 16, 2015