[MRG+2] Fix SVC predict_proba fails with new-style kernel strings #10412

qmick · 2018-01-06T14:04:31Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This fixes the SVC predict_proba failure. As discussed in #10338, I changed the dense case to behave like the sparse case, that is, passing an int kernel_type instead of a str kernel.

This is done by moving kernel_index = LIBSVM_KERNEL_TYPES.index(kernel) from the Cython code (svm/libsvm.pyx) to the Python code (svm/base.py).

Any other comments?

Q1: Since both predict() and decision_function() in svm/libsvm.pyx also use set_predict_params(), which is where kernel_index = LIBSVM_KERNEL_TYPES.index(kernel) lives, I modified predict() and decision_function() as well for consistency. Is that what we want?

Q2: Should we make a change to fit() in svm/libsvm.pyx as well?

Q3: Both dense and sparse predict fail when kernel=b'linear' is given under Python 3. In fact, only _dense_fit() handles this case. Should we handle it?

NOTE: I reuse the test code from #10338, thanks @JoshuaMeyers for the great work!

jnothman · 2018-01-08T01:11:13Z

Questions answered:

Q1: yes, modify and test all applicable methods.

Q2: I think fit was fixed in a previous patch

Q3: I don't think we should go out of our way to support bytes in Python 3. As opposed to the magic of interchange between unicode and str in Python 2, Python 3 tries to make a clear delineation between str and bytes. We clearly don't need/want bytes here.

Please change WIP to MRG when you feel this is complete enough to be merged (pending review).
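To illustrate why kernel=b'linear' fails on Python 3, here is a small sketch (the kernel list is the same illustrative assumption used elsewhere in this thread): bytes and str never compare equal in Python 3, so a lookup by equality cannot find b'linear' among str names.

```python
kernels = ['linear', 'poly', 'rbf', 'sigmoid', 'precomputed']

# In Python 3, bytes and str are distinct types that never compare equal.
print(b'linear' == 'linear')  # False

try:
    kernels.index(b'linear')
except ValueError as exc:
    # list.index found no element equal to the bytes object.
    print('lookup failed:', exc)

# If bytes ever do reach this point, decoding is an explicit caller decision.
print(kernels.index(b'linear'.decode('ascii')))  # 0
```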

qmick · 2018-01-08T01:38:40Z

Thanks for your answers, I have updated the title to MRG.

jnothman · 2018-01-08T07:13:00Z

sklearn/svm/libsvm.pyx

@@ -246,7 +246,7 @@ def fit(


 cdef void set_predict_params(
-    svm_parameter *param, int svm_type, kernel, int degree, double gamma,
+    svm_parameter *param, int svm_type, int kernel_type, int degree, double gamma,


Should we consider this a break of the public interface, @amueller?

qinhanmin2014

Please fix the flake8 errors.
I believe we should also modify dense fit in the same way.

jnothman · 2018-01-09T22:07:57Z

_dense_fit was handled in #7064

qmick · 2018-01-10T03:46:11Z

@qinhanmin2014 The flake8 error is introduced in #7064. See

scikit-learn/sklearn/svm/tests/test_svm.py, lines 509 to 511 in 6fdcb3b:

    if six.PY2:
        # Test unicode (same as str on python3)
        clf = svm.SVC(kernel=unicode('linear'))

This test is designed to be run under Python 2, and the name unicode is not defined under Python 3.
Maybe we should silence flake8 at that line like this?

clf = svm.SVC(kernel=unicode('linear'),  # noqa: F821
                       probability=True)

jnothman · 2018-01-10T04:04:13Z

I think u'linear' should work in place of unicode('linear') in all supported pythons
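A quick check of this point under Python 3: the u'' prefix (accepted again since Python 3.3, per PEP 414) produces an ordinary str, so u'linear' needs no noqa comment and equals 'linear'. On Python 2 the same literal produces a unicode object, which is exactly the case the test is meant to exercise.

```python
# Under Python 3.3+, the u'' prefix is accepted and yields a plain str.
kernel = u'linear'
print(type(kernel) is str)   # True on Python 3
print(kernel == 'linear')    # True

# Unlike the name `unicode`, which does not exist on Python 3, the literal
# prefix involves no name lookup, so pyflakes has no undefined name to report.
try:
    unicode  # noqa: F821
except NameError:
    print('unicode is not defined on Python 3')
```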


qinhanmin2014 · 2018-01-10T04:23:44Z

@jnothman

_dense_fit was handled in #7064

It is the case, but I might prefer to use a unified way to handle the problem (i.e. revert #7064 and apply the same approach to dense fit). This might make the code easier to maintain (see my comment #10338 (comment)). I agree with amueller (#10338 (comment)) that the current solution is better than #7064. WDYT? Thanks.

qinhanmin2014 · 2018-01-11T13:08:16Z

LGTM apart from the following concerns:
(1) I might still prefer to apply the same change to dense fit. Now dense/sparse predict, dense/sparse predict_proba, dense/sparse decision_function and sparse fit all use the same method to handle the problem. It seems strange for dense fit to use a different method (also see #10412 (comment)).
(2) I suddenly find out that the functions in svm.libsvm are in doc/modules/classes.rst. So maybe we are breaking a public interface?
(3) Not sure if we should also change the interface of svm.libsvm.cross_validation from str kernel to int kernel_type.

jnothman · 2018-01-11T21:15:26Z

yes, I suspect we are breaking a public interface ... even one that's rarely used. So we are better off just being inconsistent and making it work without breaking current code.

qmick · 2018-01-12T01:44:38Z

Not sure whether we should break the public interface. But if we don't want to break it, there are two ways to do that (as @qinhanmin2014 proposed in #10338):

(1) Remove the type constraint on the libsvm.predict_proba() parameter, that is, change str kernel to kernel. Though this changes the interface, it remains backward compatible.

(2) Add string handling code to _dense_predict_proba(), just like what _dense_fit() does.

jnothman · 2018-01-12T02:27:15Z

Do (1). set_predict_params already does not type kernel, so we lose absolutely nothing by removing the type. This would also be consistent with predict. I think we should do the same in fit. I'm not entirely sure how Cython handles LIBSVM_KERNEL_TYPES.index, but since that list is not typed, I expect it uses CPython's list.index and object matching, so we are likely already casting the str back to a generic object there too.
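A small pure-Python emulation of why option (1) is enough. The exact-type check in `lookup_typed` only mimics the effect of a typed parameter rejecting a second text type (such as unicode on Python 2); it is not how Cython implements its check. `OtherText` is a hypothetical stand-in for that second text type. Once the parameter is untyped, `list.index` matches by `==`, so any string equal to a kernel name is found, and existing callers passing str keep working.

```python
LIBSVM_KERNEL_TYPES = ['linear', 'poly', 'rbf', 'sigmoid', 'precomputed']


def lookup_typed(kernel):
    # Emulates a parameter declared with an exact type: anything that is not
    # exactly `str` is rejected before the lookup runs.
    if type(kernel) is not str:
        raise TypeError('kernel must be exactly str')
    return LIBSVM_KERNEL_TYPES.index(kernel)


def lookup_untyped(kernel):
    # Option (1): no type on the parameter. list.index compares with ==,
    # so any object equal to a kernel name is found.
    return LIBSVM_KERNEL_TYPES.index(kernel)


class OtherText(str):
    """Hypothetical stand-in for a second text type (e.g. unicode on Py2)."""


name = OtherText('linear')
print(lookup_untyped(name))  # 0
try:
    lookup_typed(name)
except TypeError as exc:
    print('typed version rejects it:', exc)
```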


qinhanmin2014 · 2018-01-12T03:36:51Z

+1 for the solution from jnothman. This will make the dense case consistent, which seems enough from my side. @qmick Please also remove the type conversion in dense fit introduced in #7064 since we no longer need it.


qmick · 2018-01-12T08:03:35Z

I changed parameter str kernel to kernel in libsvm.fit() and libsvm.predict_proba(). I think the same change should be applied to libsvm.cross_validation() too. Or is there any complexity I can't see?

qinhanmin2014

LGTM. ping @jnothman

glemaitre · 2018-01-12T16:39:26Z

I think that I would change libsvm.cross_validation and remove str to be consistent with the other changes. I don't see why it would break some tests.

glemaitre · 2018-01-12T16:39:38Z

otherwise LGTM

qmick · 2018-01-13T03:40:50Z

Thanks for the advice, updated.

jnothman

Otherwise LGTM

jnothman · 2018-01-13T11:53:57Z

sklearn/svm/tests/test_svm.py

-
-        # Test ascii bytes (same as str on python2)
-        clf = svm.SVC(kernel=bytes('linear', 'ascii'))
+        clf = svm.SVC(kernel=str('linear'), probability=True)


I think this is now the same as the case below

Yes, these lines are duplicated, I have removed them.

jnothman · 2018-01-13T12:37:19Z

Happy to merge when green

qmick · 2018-01-13T13:13:15Z

Thanks for your reviews!

qinhanmin2014

LGTM. merging. Thanks @qmick

jnothman · 2018-01-14T03:01:20Z

This did not include a change to what's new.

@qmick:
In a new pull request, please add an entry to the change log at doc/whats_new/v0.20.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

qmick · 2018-01-14T03:35:46Z

Sorry, I didn't realize that before. What can I do to handle this now?

qinhanmin2014 · 2018-01-14T03:44:11Z

@qmick Please just open another PR.

qmick added 4 commits January 6, 2018 20:05

Use int kernel_type in dense svm case instead of str kernel

4d5b88b

Add test cases

18dd4a6

Fix docstring and default value of kernel_type

5c4d9d9

Improve format

79466b7

qmick changed the title ~~Fix SVC predict_proba fails with new-style kernel strings (#10374)~~ [WIP] Fix SVC predict_proba fails with new-style kernel strings (#10374) Jan 6, 2018

Fix default arguments

5a08011

Fix kernel list order in docstrings

1f25c40

qmick changed the title ~~[WIP] Fix SVC predict_proba fails with new-style kernel strings (#10374)~~ [MRG] Fix SVC predict_proba fails with new-style kernel strings (#10374) Jan 8, 2018

jnothman reviewed Jan 8, 2018


JoshuaMeyers mentioned this pull request Jan 8, 2018

added unicode catch to predict_proba function in SVC #10338

Closed

qinhanmin2014 reviewed Jan 9, 2018


Fix flake8 errors

9aeea45

qmick added 2 commits January 12, 2018 15:06

Revert to master

333eb1c

Remove type constraints of kernel of libsvm.fit() and libsvm.predict_proba()

24342f2

qinhanmin2014 approved these changes Jan 12, 2018


qinhanmin2014 changed the title ~~[MRG] Fix SVC predict_proba fails with new-style kernel strings (#10374)~~ [MRG+1] Fix SVC predict_proba fails with new-style kernel strings Jan 12, 2018

Remove type constraint from libsvm.cross_validation()

1d9a1dc

jnothman approved these changes Jan 13, 2018


Remove redundant test code

9e881a7

jnothman approved these changes Jan 13, 2018


jnothman changed the title ~~[MRG+1] Fix SVC predict_proba fails with new-style kernel strings~~ [MRG+2] Fix SVC predict_proba fails with new-style kernel strings Jan 13, 2018

qinhanmin2014 approved these changes Jan 13, 2018


qinhanmin2014 merged commit 74b69df into scikit-learn:master Jan 13, 2018

qmick mentioned this pull request Jan 14, 2018

Add change log to whats_new (#10412) #10469

Merged

jnothman pushed a commit that referenced this pull request Jan 14, 2018

DOC what's new for #10412 (#10469)

ac11e4b
