8000 [MRG+2] Fix SVC predict_proba fails with new-style kernel strings by qmick · Pull Request #10412 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+2] Fix SVC predict_proba fails with new-style kernel strings #10412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 13, 2018

Conversation

qmick
Copy link
Contributor
@qmick qmick commented Jan 6, 2018

Reference Issues/PRs

Fixes #10374. Closes #10338

What does this implement/fix? Explain your changes.

This fixes SVC predict_proba fails. As discussion in #10338, I change dense case to be like sparse case, that is passing int kernel_type instead of str kernel.

It is done by moving kernel_index = LIBSVM_KERNEL_TYPES.index(kernel) from cypthon code(svm/libsvm.pyx) to python code(svm/base.py)

Any other comments?

Q1: Since both predict() and decision_function() in svm/libsvm.pyx also use set_predict_params(), where kernel_index = LIBSVM_KERNEL_TYPES.index(kernel) locates in, I modify predict() and decision_function() for consistency. Is it what we want?

Q2: Should we make a change to fit() in svm/libsvm.pyx as well?

Q3: Both dense and sparse predict fail when kernel=b'linear' is given under Python3. In fact, only _dense_fit() handles this case. Should we handle this case?

NOTE: I reuse the test code from #10338, thanks @JoshuaMeyers for the great work!

@qmick qmick changed the title Fix SVC predict_proba fails with new-style kernel strings (#10374) [WIP] Fix SVC predict_proba fails with new-style kernel strings (#10374) Jan 6, 2018
@jnothman
Copy link
Member
jnothman commented Jan 8, 2018

Questions answered:

Q1: yes, modify and test all applicable methods.

Q2: I think fit was fixed in a previous patch

Q3: I don't think we should go out of our way to support 8000 bytes in Python 3. As opposed to the magic of interchange between unicode and str, Python 3 tries to make a clear delineation between str and bytes. We clearly don't need/want bytes here.

Please change WIP to MRG when you feel this is complete enough to be merged (pending review)

@qmick qmick changed the title [WIP] Fix SVC predict_proba fails with new-style kernel strings (#10374) [MRG] Fix SVC predict_proba fails with new-style kernel strings (#10374) Jan 8, 2018
@qmick
Copy link
Contributor Author
qmick commented Jan 8, 2018

Thanks for your answers, I have updated the title to MRG.

@@ -246,7 +246,7 @@ def fit(


cdef void set_predict_params(
svm_parameter *param, int svm_type, kernel, int degree, double gamma,
svm_parameter *param, int svm_type, int kernel_type, int degree, double gamma,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider this breaking a public interface, @amueller??

Copy link
Member
@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the flake8 errors.
I believe we should also modify dense fit in the same way.

@jnothman
Copy link
Member
jnothman commented Jan 9, 2018 via email

@qmick
Copy link
Contributor Author
qmick commented Jan 10, 2018

@qinhanmin2014 The flake8 error is introduced in #7064. See

if six.PY2:
# Test unicode (same as str on python3)
clf = svm.SVC(kernel=unicode('linear'))

This test is designed to be run under Python2 and the name unicode is not defined under Python3.
May be we should silence flake8 at that line like this?

clf = svm.SVC(kernel=unicode('linear'),  # noqa: F821
                       probability=True)

@jnothman
Copy link
Member
jnothman commented Jan 10, 2018 via email

@qinhanmin2014
Copy link
Member

@jnothman

_dense_fit was handled in #7064

It is the case, but I might prefer to use an unified way to handle the problem (i.e. revert #7064 and apply the same way to dense fit). This might make the code easier to maintain (see my comment #10338 (comment)). I agree with amuller (#10338 (comment)) that current solution is better than #7064. WDYT? Thanks.

@qinhanmin2014
Copy link
Member

LGTM apart from the following concerns:
(1) I might still prefer to apply the same change to dense fit. Now dense/sparse predict, dense/sparse predict_proba, dense/sparse decision_funtion & sparse fit all use the same method to handle the problem. It seems strange for dense fit to use a different method (Also see #10412 (comment)).
(2) I suddenly find out that the functions in svm.libsvm are in doc/modules/classes.rst. So maybe we are breaking a public interface?
(3) Not sure if we should also change the interface of svm.libsvm.cross_validation from str kernel to int kernel_type.

@jnothman
Copy link
Member
jnothman commented Jan 11, 2018 via email

@qmick
Copy link
Contributor Author
qmick commented Jan 12, 2018

Not sure whether we should break the public interface. But if don't want to break the public interface, we have two ways to do that (as @qinhanmin2014 proposed in #10338):

(1) Remove type constraint of libsvm.predict_proba() parameter. That is change str kernel to kernel. Though changes the interface, it still has backward compatibility.

(2) Add string handling code to _dense_predict_proba(), just like what _dense_fit() has done.

@jnothman
Copy link
Member
jnothman commented Jan 12, 2018 via email

@qinhanmin2014
Copy link
Member

+1 for the solution from jnothman. This will make the dense case consistent, which seems enough from my side. @qmick Please also remove the type conversion in dense fit introduced in #7064 since we no longer need it.

@qmick
Copy link
Contributor Author
qmick commented Jan 12, 2018

I changed parameter str kernel to kernel in libsvm.fit() and libsvm.predict_proba(). I think the same change should be applied to libsvm.cross_validation() too. Or is there any complexity I can't see?

Copy link
Member
@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. ping @jnothman

@qinhanmin2014 qinhanmin2014 changed the title [MRG] Fix SVC predict_proba fails with new-style kernel strings (#10374) [MRG+1] Fix SVC predict_proba fails with new-style kernel strings Jan 12, 2018
@glemaitre
Copy link
Member

I think that I would change libsvm.cross_validation and remove str to be consistent with the other changes. I don't see why it would break some tests.

@glemaitre
Copy link
Member

otherwise LGTM

@qmick
Copy link
Contributor Author
qmick commented Jan 13, 2018

Thanks for the advice, updated.

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM


# Test ascii bytes (same as str on python2)
clf = svm.SVC(kernel=bytes('linear', 'ascii'))
clf = svm.SVC(kernel=str('linear'), probability=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now the same as the case below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these lines are duplicated, I have removed them.

@jnothman jnothman changed the title [MRG+1] Fix SVC predict_proba fails with new-style kernel strings [MRG+2] Fix SVC predict_proba fails with new-style kernel strings Jan 13, 2018
@jnothman
Copy link
Member

Happy to merge when green

@qmick
Copy link
Contributor Author
qmick commented Jan 13, 2018

Thanks for your reviews!

Copy link
Member
@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. merging. Thanks @qmick

@qinhanmin2014 qinhanmin2014 merged commit 74b69df into scikit-learn:master Jan 13, 2018
@jnothman
Copy link
Member

This did not include a change to what's new.

@qmick:
In a new pull request, please add an entry to the change log at doc/whats_new/v0.20.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@qmick
Copy link
Contributor Author
qmick commented Jan 14, 2018

Sorry, did't realize it before, what can I do to handle this problem now?

@qinhanmin2014
Copy link
Member

@qmick Please just open another PR.

jnothman pushed a commit that referenced this pull request Jan 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SVC predict_proba fails with new-style kernel strings (expected str, got newstr)
4 participants
0