SVC(kernel="linear") for dense and sparse matrices differ significantly [renamed by amueller] #1476


Closed
fannix opened this issue Dec 15, 2012 · 29 comments

Comments

@fannix
Contributor
fannix commented Dec 15, 2012

I compared the performance of SVC(kernel="linear") and LinearSVC. The difference is very large: LinearSVC is far better than SVC(kernel="linear"). The code is at https://gist.github.com/4294378. I also tested with the original LibSVM toolkit; its performance is close to LinearSVC's under 5-fold cross-validation.

I don't think this is expected from a user's perspective.
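For reference, a minimal sketch of the kind of comparison described above (written against current scikit-learn; the exact script is in the linked gist, and the data file name is taken from the comments below):

from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC, LinearSVC

# Load the dataset in libsvm/svmlight format.
X, y = load_svmlight_file("ntcir.en.vec")

# Cross-validate both estimators with their default settings.
for clf in (SVC(kernel="linear"), LinearSVC()):
    scores = cross_val_score(clf, X, y, cv=5)
    print(clf.__class__.__name__, scores.mean())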

@fannix
Contributor Author
fannix commented Dec 15, 2012

SVC(kernel="linear")

[[242 0]
[105 1]]
precision recall f1-score support

     -1       0.70      1.00      0.82       242
      1       1.00      0.01      0.02       106

avg / total 0.79 0.70 0.58 348

[[242 0]
[106 0]]
precision recall f1-score support

     -1       0.70      1.00      0.82       242
      1       0.00      0.00      0.00       106

avg / total 0.48 0.70 0.57 348

[[242 0]
[104 1]]
precision recall f1-score support

     -1       0.70      1.00      0.82       242
      1       1.00      0.01      0.02       105

avg / total 0.79 0.70 0.58 347

[[242 0]
[104 1]]
precision recall f1-score support

     -1       0.70      1.00      0.82       242
      1       1.00      0.01      0.02       105

avg / total 0.79 0.70 0.58 347

[[241 0]
[105 1]]
precision recall f1-score support

     -1       0.70      1.00      0.82       241
      1       1.00      0.01      0.02       106

avg / total 0.79 0.70 0.58 347


LinearSVC()

[[209 33]
[ 52 54]]
precision recall f1-score support

     -1       0.80      0.86      0.83       242
      1       0.62      0.51      0.56       106

avg / total 0.75 0.76 0.75 348

[[210 32]
[ 50 56]]
precision recall f1-score support

     -1       0.81      0.87      0.84       242
      1       0.64      0.53      0.58       106

avg / total 0.76 0.76 0.76 348

[[212 30]
[ 52 53]]
precision recall f1-score support

     -1       0.80      0.88      0.84       242
      1       0.64      0.50      0.56       105

avg / total 0.75 0.76 0.75 347

[[212 30]
[ 53 52]]
precision recall f1-score support

     -1       0.80      0.88      0.84       242
      1       0.63      0.50      0.56       105

avg / total 0.75 0.76 0.75 347

[[209 32]
[ 41 65]]
precision recall f1-score support

     -1       0.84      0.87      0.85       241
      1       0.67      0.61      0.64       106

avg / total 0.79 0.79 0.79 347

@amueller
Member

There are several reasons why the results might differ:
different solvers, different tolerance parameters, and one uses one-vs-rest while the other uses one-vs-one for multiclass.
Also, the slack variables are penalized differently. This issue has come up multiple times and doesn't really have anything to do with scikit-learn.
Maybe we should add a warning for users to the docs (see the sketch below).

You are saying the results of the LibSVM executable are different from the results of SVC? That would be rather surprising to me. Are you using exactly the same parameters?
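To make the list of differences above concrete, a rough sketch of the knobs involved (parameter names as in current scikit-learn; this only brings the two estimators closer, it does not make them identical):

from sklearn.svm import SVC, LinearSVC

# SVC(kernel="linear"): libsvm (SMO) solver, hinge loss, one-vs-one for
# multiclass, default tol=1e-3.
svc = SVC(kernel="linear", C=1.0, tol=1e-3)

# LinearSVC: liblinear solver, one-vs-rest for multiclass, squared hinge loss
# and a penalized intercept by default; loss="hinge" is closer to SVC.
linear_svc = LinearSVC(C=1.0, tol=1e-3, loss="hinge")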

@fannix
Contributor Author
fannix commented Dec 15, 2012

I am not sure how to use the same settings for both sklearn and LibSVM, but ./svm-train -v 5 -t 0 ntcir.en.vec gives 74.5%. The difference is large.
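For what it's worth, with the svm-train defaults that command should correspond roughly to the following on the scikit-learn side (libsvm defaults to a cost of 1 and a stopping tolerance of 0.001; -v 5 is 5-fold cross-validation):

from sklearn.svm import SVC

# ./svm-train -v 5 -t 0 ntcir.en.vec
#   -t 0  -> kernel="linear"
#   -c    -> C=1.0 (libsvm default)
#   -e    -> tol=1e-3 (libsvm default)
#   -v 5  -> 5-fold cross-validation, done separately on the scikit-learn side
clf = SVC(kernel="linear", C=1.0, tol=1e-3)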

@GaelVaroquaux
Member

Different solvers, different tolerance parameters, one uses one-vs-rest, the other one-vs-one.

And squared hinge loss vs hinge loss.
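For readers following along, the two losses mentioned are, for a label $y \in \{-1, +1\}$ and decision value $f(x)$:

$$
L_{\text{hinge}}\bigl(y, f(x)\bigr) = \max\bigl(0,\, 1 - y\,f(x)\bigr),
\qquad
L_{\text{squared hinge}}\bigl(y, f(x)\bigr) = \max\bigl(0,\, 1 - y\,f(x)\bigr)^2 .
$$

SVC minimizes the hinge loss; LinearSVC's default is the squared hinge.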

@amueller
Member

Ha, I think I know the problem. This might be #1411. Could you please set the class weight explicitly (i.e. not to "auto", which I think is the default)?
If this is the problem, #1411 is even worse than I thought.

@amueller
Member

One more bug I introduced, it seems :(

@fannix
Contributor Author
fannix commented Dec 15, 2012

Setting it explicitly (class_weight={1: 0.5, -1: 0.5}) makes no difference. And FYI, I tested it on 0.10, 0.11 and 0.12; the results all stay the same.

@fannix
Contributor Author
fannix commented Dec 15, 2012

I am also very confused. So you might want to try the gist yourself :)

@amueller
Member

That is weird. I'll try to have a look in a couple of minutes.

@amueller
Member

I have no idea what is happening here....

@amueller
Member

This looks pretty serious to me... Not sure I have the energy to look more into it today, though. I have to install libsvm on this box first.

Things that could be different are the tol and the C parameter, though that doesn't seem to be the issue here. I can't get SVC to produce any reasonable predictions using master.

@GaelVaroquaux
Member

Normalization of features?

@amueller
Member

@GaelVaroquaux I also thought about that, but that shouldn't make the result different between the libsvm binary and SVC, should it?

@GaelVaroquaux
Member

@GaelVaroquaux I also thought about that, but that shouldn't make the result different between the libsvm binary and SVC, should it?

Unless the libsvm binary does prescaling.

@ogrisel
Member
ogrisel commented Dec 16, 2012

AFAIK, libsvm does not do scaling automatically when using the svm-train executable. You have to use svm-scale on the dataset as an explicit, separate preprocessing step.
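(For reference, the rough scikit-learn equivalent of that explicit scaling step, assuming svm-scale's default [-1, 1] range; MinMaxScaler needs a dense array, so this is only an illustration:)

from sklearn.datasets import load_svmlight_file
from sklearn.preprocessing import MinMaxScaler

X, y = load_svmlight_file("ntcir.en.vec")
# svm-scale rescales each feature to [-1, 1] by default; do the same here on a
# densified copy before fitting.
X_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X.toarray())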

@amueller
Member

This seems to be a serious bug in the sparse SVM.
I could reproduce the problem.

I ran the LibSVM binary with a linear kernel. The defaults are exactly the same as ours.
The LibSVM results are the same as when running our wrappers on dense arrays.
Feeding sparse arrays gives a completely different, much worse result.
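A minimal sketch of that dense-vs-sparse check (data file as in the gist; nothing but the input layout differs between the two fits):

from sklearn.datasets import load_svmlight_file
from sklearn.svm import SVC

X, y = load_svmlight_file("ntcir.en.vec")

# Fit the same model on the sparse matrix and on its dense copy.
sparse_clf = SVC(kernel="linear").fit(X, y)
dense_clf = SVC(kernel="linear").fit(X.toarray(), y)

# With a correct implementation these should agree up to numerical noise.
print(sparse_clf.n_support_, dense_clf.n_support_)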

@agramfort
Member

Try lowering the tol and increasing the scale_intercept.

@amueller
Member

Shouldn't the sparse and dense versions with the same parameters give the same results?

@amueller
Member

Changing tol to 1e-6 doesn't change the result. I think something is wrong with the fitting, as the number of support vectors already varies substantially between the dense and sparse versions.

@amueller
Member

I get the same behavior already in 0.10, so it is not a consequence of the merging of the sparse and dense SVC code.

@amueller
Member

BTW @agramfort, the binary uses the sparse implementation by default AFAIK, so it should really be identical to our sparse implementation.

@amueller
Member

@fabianp any idea where to start?

@fannix
Contributor Author
fannix commented Jan 16, 2013

Hi, I think I have a clue. It might be caused by a change in the layout of the CSR matrix that is passed to the libsvm implementation. Take a look at the following test script.

from sklearn.datasets import load_svmlight_file
from sklearn.svm import SVC
import scipy.sparse as sp
import numpy as np

X, y = load_svmlight_file("ntcir.en.vec")

clf = SVC(kernel="linear")

print "#plain linear SVM"
clf.fit(X, y)
print clf.n_support_

print "#retrieve all rows of X, and train again, now the trained model are quite differnt!"
X_slice = X[range(X.shape[0])]
clf.fit(X_slice, y)
print clf.n_support_

print "# convert to dense and then convert back to sparse, and train again, we get back the first model."
X_slice2 = sp.csr_matrix(X_slice.todense())
clf.fit(X_slice2, y)
print clf.n_support_

print "#note that after converting to dense, they are identical"
print np.alltrue(X.todense() == X_slice.todense())

print "#but they have different layout!"
print X.tocoo().col
print X_slice.tocoo().col

The output is like this:

#plain linear SVM
[748 433]
#retrieve all rows of X and train again; now the trained model is quite different!
[1207 525]
#convert to dense and then convert back to sparse, and train again, we get back the first model.
[748 433]
#note that after converting to dense, they are identical
True
#but they have different layout!
[ 1 2 3 ..., 5223 6830 7187]
[20 19 18 ..., 20 17 16]

The problem is that after we "slice" X, the layout changes: after slicing, the order of the column indices is reversed. I suspect some code assumes that this order is always increasing, which is what causes the bug here.

@amueller
Member

Thank you so much for investigating. Hopefully I will have time for it tonight.

@amueller
Member

OK, so we need to call X.sort_indices() in the SVM wrappers. Will do a PR with a test now.
@fannix thank you very much for working on this, I'd never have found it!
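Until the fix lands, a sketch of the user-side workaround implied by this diagnosis (reusing the variables from the test script above): sort the CSR indices before fitting.

# Restore increasing column order within each row before handing the matrix
# to libsvm; sort_indices() works in place.
if not X_slice.has_sorted_indices:
    X_slice.sort_indices()
clf.fit(X_slice, y)
print clf.n_support_   # now matches the result of the plain fit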

@amueller
Member

Posted a fix in #1587.

@GaelVaroquaux
Member

Posted a fix in #1587.

Awesome work, team!

@mblondel
Member

Regarding CSR (or CSC) matrices with unsorted indices, I think that estimators written in pure Python should be unaffected. Estimators such as Lasso that are written in Cython but rely on sparse-dense dot products should be unaffected too (to be checked). SVC was affected because the sparse-sparse dot product implementation used in libsvm (used to compute kernels) assumes the indices are sorted. I'm not sure if we have other native pieces of code for sparse-sparse stuff in the scikit.
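To illustrate why sorted indices matter for a sparse-sparse dot product: a merge-style implementation walks both index arrays in a single pass and silently drops matches when the indices are not increasing. A toy sketch, not libsvm's actual code:

import numpy as np
import scipy.sparse as sp

def merge_dot(a_idx, a_val, b_idx, b_val):
    # Dot product of two sparse rows, assuming both index arrays are sorted.
    i = j = 0
    total = 0.0
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            total += a_val[i] * b_val[j]
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            i += 1
        else:
            j += 1
    return total

sorted_row = sp.csr_matrix(np.array([[1.0, 2.0, 3.0]]))  # indices stored as [0, 1, 2]

# The same logical row, but with column indices stored in decreasing order;
# scipy accepts this layout without complaint.
data, indices, indptr = np.array([3.0, 2.0, 1.0]), np.array([2, 1, 0]), np.array([0, 3])
unsorted_row = sp.csr_matrix((data, indices, indptr), shape=(1, 3))

print(merge_dot(sorted_row.indices, sorted_row.data,
                sorted_row.indices, sorted_row.data))      # 14.0, correct
print(merge_dot(sorted_row.indices, sorted_row.data,
                unsorted_row.indices, unsorted_row.data))  # 9.0, silently wrong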

@larsmans
Member

Pushed fix as c75dd39.
