SVC(kernel="linear") for dense and sparse matrices differ significantly [renamed by amueller] #1476
Per-fold output of the gist (5-fold cross-validation; each fold shows the first row of the confusion matrix, then the classification_report "avg / total" line: precision, recall, f1-score, support):

```
SVC(kernel="linear")
[[242   0] …]   avg / total   0.79  0.70  0.58  348
[[242   0] …]   avg / total   0.48  0.70  0.57  348
[[242   0] …]   avg / total   0.79  0.70  0.58  347
[[242   0] …]   avg / total   0.79  0.70  0.58  347
[[241   0] …]   avg / total   0.79  0.70  0.58  347

LinearSVC()
[[209  33] …]   avg / total   0.75  0.76  0.75  348
[[210  32] …]   avg / total   0.76  0.76  0.76  348
[[212  30] …]   avg / total   0.75  0.76  0.75  347
[[212  30] …]   avg / total   0.75  0.76  0.75  347
[[209  32] …]   avg / total   0.79  0.79  0.79  347
```
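For readers without the gist's dataset, here is a minimal sketch of the same comparison on synthetic data (the dataset, its sizes, and cv=5 are assumptions, not the gist's actual setup; on current scikit-learn releases, where this bug has long been fixed, both classifiers score well):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the gist's data (an assumption for illustration)
X, y = make_classification(n_samples=500, n_features=50, random_state=0)

# 5-fold cross-validated accuracy for both linear SVM variants
svc_scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
linear_scores = cross_val_score(LinearSVC(), X, y, cv=5)

print("SVC(kernel='linear'):", svc_scores.mean())
print("LinearSVC():         ", linear_scores.mean())
```

Note that even without any bug the two are not expected to match exactly: LinearSVC uses squared hinge loss and penalizes the intercept, while SVC wraps libsvm's hinge-loss solver.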
There are several reasons why the results might differ: You are saying the results of the LibSVM executable are different from the results of SVC? That would be rather surprising to me. Are you using exactly the same parameters?
I am not sure how to use the same settings for both sklearn and LibSVM, but …
And squared hinge loss vs hinge loss.
One more bug I introduced, it seems :(
Setting it explicitly (class_weight={1: 0.5, -1: 0.5}) makes no difference. FYI, I tested on 0.10, 0.11 and 0.12; the results all stay the same.
I am also very confused. So you might want to try the gist yourself :)
That is weird. I'll try to have a look in a couple of minutes.
I have no idea what is happening here....
This looks pretty serious to me.... Not sure I have the energy to look into it more today, though. I have to install libsvm on this box first. Things that could be different are the tol and the C parameter, though that doesn't seem to be the issue here. I can't get …
Normalization of features?
@GaelVaroquaux I also thought about that, but that shouldn't make the result different between the libsvm binary and SVC, should it?
Unless the libsvm binary does prescaling.
AFAIK, libsvm does not do scaling automatically when using the …
This seems to be a serious bug in the sparse SVM. I ran the LibSVM binary with a linear kernel. The defaults are exactly the same as ours.
Try lowering the tol and increasing the scale_intercept.
shouldn't the sparse and dense version with the same parameters give the same results? |
Changing tol to 1e-6 doesn't change the result. I think something is wrong with the fitting, as already the number of support vectors varies substantially between the dense and sparse versions.
I get the same behavior already in 0.10, so it is not a consequence of merging of sparse and dense SVC. |
btw @agramfort the binary is the default sparse implementation afaik. So it should really be identical with our sparse implementation. |
@fabianp any idea where to start? |
Hi, I think I have a clue. It might be caused by a change in the layout of the csr matrix that is passed to the libsvm implementation. Take a look at the following test script:

```python
from sklearn.datasets import load_svmlight_file
from sklearn.svm import SVC
import scipy.sparse as sp
import numpy as np

X, y = load_svmlight_file("ntcir.en.vec")
clf = SVC(kernel="linear")

print("# plain linear SVM")
clf.fit(X, y)
print(clf.n_support_)

print("# retrieve all rows of X and train again; now the trained model is quite different!")
X_slice = X[range(X.shape[0])]
clf.fit(X_slice, y)
print(clf.n_support_)

print("# convert to dense, back to sparse, and train again: we get back the first model.")
X_slice2 = sp.csr_matrix(X_slice.todense())
clf.fit(X_slice2, y)
print(clf.n_support_)

print("# note that after converting to dense, they are identical")
print(np.all(X.todense() == X_slice.todense()))

print("# but they have a different layout!")
print(X.tocoo().col)
print(X_slice.tocoo().col)
```

The output is like this: `#plain linear SVM` … The problem is that after we "slice" the …
Thank you so much for investigating. Hopefully I will have time for it tonight. |
Ok, so we need to call …
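The fix revolves around restoring canonical CSR layout before the data reaches libsvm. A minimal sketch of scipy's in-place `sort_indices()` on a hand-built CSR matrix whose column indices are deliberately out of order (the matrix here is illustrative):

```python
import numpy as np
import scipy.sparse as sp

# CSR matrix built directly from (data, indices, indptr); row 0 stores
# its column indices out of order: [2, 0] instead of [0, 2].
data = np.array([2.0, 1.0, 3.0])
indices = np.array([2, 0, 1])
indptr = np.array([0, 2, 3])
X = sp.csr_matrix((data, indices, indptr), shape=(2, 3))

print(X.has_sorted_indices)   # False: row 0 is out of order
X.sort_indices()              # in-place fix to canonical CSR layout
print(X.has_sorted_indices)   # True

# Sorting changes only the storage order, not the matrix contents
print(X.toarray())            # [[1. 0. 2.], [0. 3. 0.]]
```

This mirrors the situation in the script above: the sliced and unsliced matrices are dense-identical, but only one has sorted indices.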
Posted a fix in #1587. |
Awesome work, team! |
Regarding CSR (or CSC) matrices with unsorted indices, I think that estimators written in pure Python should be unaffected. Estimators such as Lasso written in Cython but that rely on sparse-dense dot products should be unaffected too (to be checked). SVC was affected because the sparse-sparse dot product implementation used in libsvm (used to compute kernels) assumes the indices are sorted. I'm not sure if we have other native pieces of code for sparse-sparse stuff in the scikit.
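To illustrate why that assumption matters, here is a toy two-pointer sparse dot product in the style such kernel code uses (the function and vectors are illustrative, not libsvm's actual code); with unsorted indices it silently drops overlapping terms:

```python
import numpy as np

def sorted_sparse_dot(a_idx, a_val, b_idx, b_val):
    """Two-pointer dot product of two sparse vectors; correct ONLY when
    both index arrays are sorted."""
    i = j = 0
    total = 0.0
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            total += a_val[i] * b_val[j]
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            i += 1
        else:
            j += 1
    return total

# u = [1, 0, 2] stored with UNSORTED indices; v = [3, 0, 4] stored sorted.
u_idx, u_val = np.array([2, 0]), np.array([2.0, 1.0])
v_idx, v_val = np.array([0, 2]), np.array([3.0, 4.0])

wrong = sorted_sparse_dot(u_idx, u_val, v_idx, v_val)
order = np.argsort(u_idx)
right = sorted_sparse_dot(u_idx[order], u_val[order], v_idx, v_val)
print(wrong)  # 8.0  -- silently misses the overlap at column 0
print(right)  # 11.0 -- the true dot product 1*3 + 2*4
```

No error is raised in the unsorted case, which is why the symptom was a silently different model rather than a crash.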
Pushed fix as c75dd39. |
I compared the performance of SVC(kernel="linear") and LinearSVC. The difference is very large: LinearSVC performs far better than SVC(kernel="linear"). The code is in https://gist.github.com/4294378. I also tested with the original LibSVM toolkit; its performance is close to LinearSVC under 5-fold cross-validation.
I don't think this is expected from a user's perspective.