cross_val_predict should work for sparse `y` · Issue #5132 · scikit-learn/scikit-learn · GitHub

cross_val_predict should work for sparse y #5132


Closed
jnothman opened this issue Aug 18, 2015 · 7 comments
Labels: Bug, Easy (Well-defined and straightforward way to resolve)

Comments

@jnothman
Member

Currently cross_val_predict uses np.concatenate to merge the per-fold predictions, but the predictions could be scipy sparse matrices, which np.concatenate cannot merge.

@jnothman added the Bug, Easy (Well-defined and straightforward way to resolve), and Need Contributor labels Aug 18, 2015
@dubstack
Contributor

I would like to take this one up. Should I go ahead with this?

@jnothman
Member Author

Sure, thanks


@dubstack
Contributor

Sorry for the late reply, I got caught up with some other work.

@jnothman Could you please clarify what you mean by "predictions could be sparse matrices"?

Can you provide an example dataset so that I can reproduce this case?
When is the returned variable "pred_blocks" in sparse format?

@jnothman
Member Author

Currently we use sparse format optionally as output of multilabel
classification. So OneVsRestClassifier is the most straightforward example.
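
To illustrate the comment above, here is a minimal sketch, assuming a recent scikit-learn release in which OneVsRestClassifier accepts a sparse label indicator matrix. The tiny dataset and the variable names are made up for illustration and do not come from the issue.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Hypothetical toy multilabel problem: 6 samples, 2 features, 2 labels.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]])
Y_dense = np.array([[0, 0], [0, 1], [1, 0],
                    [1, 1], [1, 0], [0, 1]])
Y_sparse = sp.csr_matrix(Y_dense)

clf = OneVsRestClassifier(SVC(kernel="linear"))

# Fitting on a dense indicator matrix gives dense predictions ...
pred_from_dense = clf.fit(X, Y_dense).predict(X)
# ... while fitting on a sparse indicator matrix gives sparse predictions.
pred_from_sparse = clf.fit(X, Y_sparse).predict(X)

print(type(pred_from_dense), sp.issparse(pred_from_sparse))
```

If this holds on your installed version, each cross-validation fold of such an estimator yields a sparse block, which is the case the issue is about.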


@jackzhang84

Hey guys, I am also interested in taking a stab at this and am not sure either what the problem exactly is. It looks like the current implementation is able to handle sparse y. This is the code I ran:

import sklearn
from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.cross_validation import cross_val_predict

x, y = make_multilabel_classification(n_classes=7, n_labels=4,
                                      allow_unlabeled=True,
                                      return_indicator=True, random_state=1)
classif = OneVsRestClassifier(SVC(kernel='linear'))
predicted = cross_val_predict(classif, x, y, cv=10)

This does generate a reasonable cross-validated prediction of 'y' in a sparse form. I'm not sure if I am missing anything, and my apologies for being a newbie:)

Thanks

@jnothman
Member Author

Running your snippet, both y and predicted are plain numpy arrays, not scipy
sparse matrices. Use make_multilabel_classification(..., sparse=True)
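
A hedged sketch of a reproduction with a sparse `y` on a current scikit-learn release follows. Two things here are assumptions not stated in the thread: `sklearn.cross_validation` has since been replaced by `sklearn.model_selection`, and, as far as the current documentation reads, `sparse=True` makes the feature matrix sparse while `return_indicator="sparse"` is what requests a sparse label matrix. Check both against your installed version.

```python
import scipy.sparse as sp
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import cross_val_predict
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Same settings as the snippet in the thread, but asking for a sparse
# label indicator matrix (assumption: return_indicator="sparse" is supported).
X, y = make_multilabel_classification(
    n_classes=7, n_labels=4, allow_unlabeled=True,
    return_indicator="sparse", random_state=1,
)

clf = OneVsRestClassifier(SVC(kernel="linear"))
predicted = cross_val_predict(clf, X, y, cv=10)

# Both the labels and the merged cross-validated predictions should be sparse.
print(sp.issparse(y), sp.issparse(predicted), predicted.shape)
```

On the 2015 code base this call is the one that failed while merging the per-fold blocks; on releases that include the fix it should return a sparse matrix with the same shape as `y`.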


@dubstack
Contributor

I opened pull request #5161.
@jnothman Please have a look; I use scipy.sparse.vstack to concatenate the sparse matrices.

@jackzhang84 The problem is that np.concatenate() doesn't support concatenating sparse matrices, so cross_val_predict raises an error when the predictions are sparse matrices.
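
The approach described above can be sketched roughly as follows. This is not the code from the pull request: `merge_prediction_blocks` is a hypothetical helper name, and the example blocks are invented for illustration.

```python
import numpy as np
import scipy.sparse as sp


def merge_prediction_blocks(blocks):
    """Stack per-fold prediction blocks row-wise (hypothetical helper)."""
    if sp.issparse(blocks[0]):
        # Sparse blocks: stack with scipy, keeping the first block's format.
        return sp.vstack(blocks, format=blocks[0].format)
    # Dense blocks: plain NumPy concatenation along the sample axis.
    return np.concatenate(blocks)


# Two made-up folds of multilabel predictions, dense and sparse variants.
dense_blocks = [np.array([[1, 0], [0, 1]]), np.array([[1, 1]])]
sparse_blocks = [sp.csr_matrix(b) for b in dense_blocks]

merged_dense = merge_prediction_blocks(dense_blocks)
merged_sparse = merge_prediction_blocks(sparse_blocks)

print(merged_dense.shape, merged_sparse.shape, merged_sparse.format)
```

Branching on `sp.issparse` keeps the dense path unchanged and only switches to `scipy.sparse.vstack` when the estimator actually returns sparse predictions.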

@ogrisel ogrisel closed this as completed Aug 27, 2015
ogrisel added a commit that referenced this issue Aug 27, 2015
[MRG+1] Add check for sparse prediction in cross_val_predict (fixes #5132)