added unicode catch to predict_proba function in SVC #10338

JoshuaMeyers · 2017-12-18T16:49:21Z

What does this implement/fix? Explain your changes.

Adds handling of mixed string types (unicode/str) for the `kernel` parameter of SVM models.
This handling was already present in `predict` but had been left out of `predict_proba`.
To reproduce the error, see EducationalTestingService/skll#87.


rth · 2017-12-18T17:21:00Z

Thank you for your PR.

Are you sure this wasn't fixed in #7064? If so, please add a unit test (or extend `test_unicode_kernel`) that would fail on master but pass with this PR. Thanks.

amueller · 2017-12-18T17:21:34Z

You're mixing tabs and spaces. Please use spaces.
(Also, there's a weird issue in the test for tabs vs spaces?!)
Why is this necessary? It's not done in predict or in the sparse case. And in fitting it's only done in the dense case in `_dense_fit`, right?


JoshuaMeyers · 2017-12-19T11:02:56Z

Thanks for your responses!

@rth The code is the same as the fix from #7064, but that fix was not applied to `predict_proba`. I have now also added a call to `predict_proba` in the unit tests; it fails on master.

@amueller My bad, I have fixed the spacing! The check is needed to catch unicode objects passed to the `kernel` parameter, which libSVM does not handle. Other parameters are not sensitive to this, and `kernel` should not be either. It is also only needed in the dense path, for `_dense_predict_proba`.
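As a rough illustration of the kind of catch being discussed, the sketch below normalises a kernel name to the interpreter's built-in `str` before it would be handed to compiled code. The helper name `as_native_kernel` and the `NewStr` class are hypothetical stand-ins, not part of scikit-learn, and the actual change in `sklearn/svm/base.py` may look different.

```python
def as_native_kernel(kernel):
    """Return a kernel name as a plain built-in str (hypothetical helper).

    Callables are passed through untouched; ASCII bytes are decoded;
    str subclasses are converted to an exact str.
    """
    if callable(kernel):
        return kernel
    if isinstance(kernel, bytes):
        # e.g. b"rbf" supplied under Python 3
        return kernel.decode("ascii")
    # str() on a str subclass instance yields an exact built-in str
    return str(kernel)


class NewStr(str):
    """Stand-in for a non-native string type such as a backported unicode str."""


from_bytes = as_native_kernel(b"rbf")
from_subclass = as_native_kernel(NewStr("linear"))
```

Here both `from_bytes` and `from_subclass` end up as exact `str` objects, which is the property a strictly typed compiled signature would need.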

amueller · 2017-12-19T16:28:01Z

sklearn/svm/base.py

@@ -322,6 +322,11 @@ def _dense_predict(self, X):
                                 "the number of samples at training time" %
                                 (X.shape[1], self.shape_fit_[0]))

+        if six.PY3:


Yeah I was confused as to why this didn't happen ;)

amueller · 2017-12-19T16:28:23Z

sklearn/svm/tests/test_svm.py

        clf.fit(X, Y)
+        clf.predict(T)
+        clf.predict_proba(T)


decision function?

Yep, sounds like a good idea. It works on my local copy, but I'll add it for completeness.

amueller · 2017-12-19T16:29:08Z

worry less about lgtm and more about travis ;)

Why is this not necessary in the sparse case?

JoshuaMeyers · 2017-12-19T17:00:04Z

@amueller haha okay I'll focus on travis first.

As for why this isn't necessary in the sparse case: there the kernel is specified by its index, which seems not to be sensitive to this. In `_sparse_predict`:
`kernel_type = self._sparse_kernels.index(kernel)`

I've tested and indeed, the sparse versions do not error.
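A small sketch of why the index lookup is tolerant of the string type: `list.index` compares by equality, so a `str` subclass that compares equal to the native name still resolves to the same integer. `NewStr` here is a hypothetical stand-in for a non-native string type; the list is the `_sparse_kernels` list from `sklearn/svm/base.py`.

```python
_sparse_kernels = ["linear", "poly", "rbf", "sigmoid", "precomputed"]


class NewStr(str):
    """Stand-in for a non-native string type (assumption for illustration)."""


kernel = NewStr("rbf")

# list.index uses ==, so the subclass instance matches the plain "rbf" entry
kernel_type = _sparse_kernels.index(kernel)
```

The value passed onward is a plain `int`, so the string type of `kernel` never reaches the compiled code.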

amueller · 2017-12-19T17:25:02Z

    # The order of these must match the integer values in LibSVM.
    # XXX These are actually the same in the dense case. Need to factor
    # this out.
    _sparse_kernels = ["linear", "poly", "rbf", "sigmoid", "precomputed"]

Maybe we should just do that? It's weird to have substantially different code paths for sparse and dense. If you'd rather not get involved in this, that's fine, too ;)

amueller · 2017-12-19T17:27:19Z

Basically we just need to move `kernel_index = LIBSVM_KERNEL_TYPES.index(kernel)` from the cython code to the python code in the dense case. Then it'll be the same as the sparse case: we can rename `_sparse_kernels` to `_kernels`, get rid of the string handling code, and just pass integers to cython everywhere.
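A minimal sketch of that proposal, assuming a shared `_kernels` list and a hypothetical `kernel_index` helper on the Python side. The `compiled_predict_stub` function is only a placeholder for a cython routine that would accept an integer kernel type; none of these names are confirmed to match what was eventually merged.

```python
_kernels = ["linear", "poly", "rbf", "sigmoid", "precomputed"]


def kernel_index(kernel):
    """Resolve a kernel name to its LibSVM integer in Python (hypothetical)."""
    try:
        return _kernels.index(kernel)
    except ValueError:
        raise ValueError(
            "Unknown kernel %r; expected one of %s" % (kernel, _kernels)
        ) from None


def compiled_predict_stub(kernel_type):
    """Placeholder for a compiled routine that only accepts an int."""
    if type(kernel_type) is not int:
        raise TypeError("kernel_type must be an int")
    return kernel_type


# Dense and sparse paths would both do the lookup before calling compiled code
result = compiled_predict_stub(kernel_index("sigmoid"))
```

With the lookup done in Python, the compiled layer never sees a string, so no per-function string handling would be needed.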

qmick · 2018-01-05T04:07:21Z

Is this still in progress?

qinhanmin2014 · 2018-01-05T10:15:22Z

@qmick Feel free to take it if @JoshuaMeyers doesn't reply after some time. Note that we currently handle the problem in several different ways:
(1) in the sparse case: passing int kernel_type instead of str kernel
(2) in some functions in the dense case (e.g., predict): passing kernel instead of str kernel
(3) in some functions in the dense case (e.g., fit): using str to cast kernel to a string before passing it
I agree with amueller's comment above (i.e., always use the first way).

qmick · 2018-01-05T13:59:17Z

@qinhanmin2014 Thanks, I'll keep looking at it.

JoshuaMeyers · 2018-01-05T14:21:03Z

Hey @qmick, feel free to have a go at this. I haven't had the time. Apologies!

qinhanmin2014 · 2018-01-05T14:25:16Z

Thanks @JoshuaMeyers for your great work so far :)

JoshuaMeyers · 2018-01-08T10:09:54Z

This issue was addressed by #10412. Closing this MR.

added unicode catch to predict_proba function

45cd060

See EducationalTestingService/skll#87 Unicode handling in `.predict` had not been added to `.predict_proba`

JoshuaMeyers changed the title ~~added unicode catch to predict_proba function~~ added unicode catch to predict_proba function in SVC Dec 18, 2017

JoshuaMeyers added 3 commits December 19, 2017 10:39

Update base.py

6d7a54e

update unicode kernel unit test to include predict & predict_proba fu…

fa33e13

…nctions Previously, unicode test only tested `clf.fit`. In particular, `predict_proba` currently fails if unicode is provided to the `kernel` parameter

added blank line

eb62a2d

JoshuaMeyers changed the title ~~added unicode catch to predict_proba function in SVC~~ [WIP] added unicode catch to predict_proba function in SVC Dec 19, 2017

added catch for kernel parameter if in ascii bytes in Python3

5dd2418

JoshuaMeyers changed the title ~~[WIP] added unicode catch to predict_proba function in SVC~~ added unicode catch to predict_proba function in SVC Dec 19, 2017

amueller reviewed Dec 19, 2017


Josh Meyers added 2 commits December 19, 2017 16:29

remove whitespace

2e25aa2

added clf.decision_function to unit test

63c07e0

qmick mentioned this pull request Dec 31, 2017

SVC predict_proba fails with new-style kernel strings (expected str, got newstr) #10374

Closed

qmick mentioned this pull request Jan 6, 2018

[MRG+2] Fix SVC predict_proba fails with new-style kernel strings #10412

Merged

JoshuaMeyers closed this Jan 8, 2018
