-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] Fix SVC predict_proba fails with new-style kernel strings #10412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+2] Fix SVC predict_proba fails with new-style kernel strings #10412
Conversation
Questions answered: Q1: yes, modify and test all applicable methods. Q2: I think fit was fixed in a previous patch Q3: I don't think we should go out of our way to support 8000 bytes in Python 3. As opposed to the magic of interchange between unicode and str, Python 3 tries to make a clear delineation between str and bytes. We clearly don't need/want bytes here. Please change WIP to MRG when you feel this is complete enough to be merged (pending review) |
Thanks for your answers, I have updated the title to MRG. |
sklearn/svm/libsvm.pyx
Outdated
@@ -246,7 +246,7 @@ def fit( | |||
|
|||
|
|||
cdef void set_predict_params( | |||
svm_parameter *param, int svm_type, kernel, int degree, double gamma, | |||
svm_parameter *param, int svm_type, int kernel_type, int degree, double gamma, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider this breaking a public interface, @amueller??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the flake8 errors.
I believe we should also modify dense fit in the same way.
_dense_fit was handled in #7064
|
@qinhanmin2014 The flake8 error is introduced in #7064. See scikit-learn/sklearn/svm/tests/test_svm.py Lines 509 to 511 in 6fdcb3b
This test is designed to be run under Python2 and the name unicode is not defined under Python3.May be we should silence flake8 at that line like this? clf = svm.SVC(kernel=unicode('linear'), # noqa: F821
probability=True) |
I think u'linear' should work in place of unicode('linear') in all
supported pythons
…On 10 Jan 2018 2:46 pm, "Jiongyan Zhang" ***@***.***> wrote:
@qinhanmin2014 <https://github.com/qinhanmin2014> The flake8 error is
introduced in #7064
<#7064>. See
https://github.com/scikit-learn/scikit-learn/blob/
6fdcb3b/sklearn/svm/tests/
test_svm.py#L509-L511
This test is designed to be run under Python2 and the name unicode is not
defined under Python3.
May be we should silence flake8 at that line like this?
clf = svm.SVC(kernel=unicode('linear'), # noqa: F821
probability=True)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#10412 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz69uozsoz7AaqiecRJ2_ee3MhXGxRks5tJDKFgaJpZM4RVUeO>
.
|
It is the case, but I might prefer to use an unified way to handle the problem (i.e. revert #7064 and apply the same way to dense fit). This might make the code easier to maintain (see my comment #10338 (comment)). I agree with amuller (#10338 (comment)) that current solution is better than #7064. WDYT? Thanks. |
LGTM apart from the following concerns: |
yes, I suspect we are breaking a public interface ... even one that's
rarely used. So we are better off just being inconsistent and making it
work without breaking current code.
|
Not sure whether we should break the public interface. But if don't want to break the public interface, we have two ways to do that (as @qinhanmin2014 proposed in #10338): (1) Remove type constraint of (2) Add string handling code to |
Do (1). set_predict_params does already does not type kernel, so we lose
absolutely nothing by removing the type. This would also be consistent with
predict.
I think we should do the same in fit. I'm not entirely sure how cython
handles LIBSVM_KERNEL_TYPES.index, but since that list is not typed, I
expect it should use CPython's list.index and object matching, hence we're
likely already casting the str back to generic object there too.
…On 12 January 2018 at 12:44, Jiongyan Zhang ***@***.***> wrote:
If don't want to break the public interface, we have two ways to do that:
(1) Remove type constraint of libsvm.predict_proba() parameter. That is
change str kernel to kernel. Though changes the interface, it still has
backward compatibility.
(2) Add string handling code to _dense_predict_proba(), just like what
_dense_fit() has done.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#10412 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz60u-ztcID9ubq8_wAQUrjRhkb7wuks5tJrkHgaJpZM4RVUeO>
.
|
I changed parameter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. ping @jnothman
I think that I would change |
otherwise LGTM |
Thanks for the advice, updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM
sklearn/svm/tests/test_svm.py
Outdated
|
||
# Test ascii bytes (same as str on python2) | ||
clf = svm.SVC(kernel=bytes('linear', 'ascii')) | ||
clf = svm.SVC(kernel=str('linear'), probability=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is now the same as the case below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these lines are duplicated, I have removed them.
Happy to merge when green |
Thanks for your reviews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. merging. Thanks @qmick
This did not include a change to what's new. @qmick: |
Sorry, did't realize it before, what can I do to handle this problem now? |
@qmick Please just open another PR. |
Reference Issues/PRs
Fixes #10374. Closes #10338
What does this implement/fix? Explain your changes.
This fixes SVC predict_proba fails. As discussion in #10338, I change dense case to be like sparse case, that is passing int kernel_type instead of str kernel.
It is done by moving
kernel_index = LIBSVM_KERNEL_TYPES.index(kernel)
from cypthon code(svm/libsvm.pyx
) to python code(svm/base.py
)Any other comments?
Q1: Since both
predict()
anddecision_function()
insvm/libsvm.pyx
also useset_predict_params()
, wherekernel_index = LIBSVM_KERNEL_TYPES.index(kernel)
locates in, I modifypredict()
anddecision_function()
for consistency. Is it what we want?Q2: Should we make a change to
fit()
insvm/libsvm.pyx
as well?Q3: Both dense and sparse predict fail when
kernel=b'linear'
is given under Python3. In fact, only_dense_fit()
handles this case. Should we handle this case?NOTE: I reuse the test code from #10338, thanks @JoshuaMeyers for the great work!