8000 added unicode catch to predict_proba function in SVC by JoshuaMeyers · Pull Request #10338 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

added unicode catch to predict_proba function in SVC #10338

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

JoshuaMeyers
Copy link
@JoshuaMeyers JoshuaMeyers commented Dec 18, 2017

What does this implement/fix? Explain your changes.

Adds mixed object type handling (unicode/string) to the kernel parameter of SVM models.
This was present for the predict function, but neglected for predict_proba
For a recreation of the error, see EducationalTestingService/skll#87

See EducationalTestingService/skll#87
Unicode handling in `.predict` had not been added to `.predict_proba`
@rth
Copy link
Member
rth commented Dec 18, 2017

Thank you for your PR.

Are you sure this wasn't fixed in #7064 ? If so please add a unit test (or extend test_unicode_kernel) that would fail on master but pass with this PR. Thanks.

@JoshuaMeyers JoshuaMeyers changed the title added unicode catch to predict_proba function added unicode catch to predict_proba function in SVC Dec 18, 2017
@amueller
Copy link
Member

you're mixing tabs and spaces. Please use spaces.
(also there's a weird issue in the test for tabs vs spaces?!)
Why is this necessary? It's not done in predict or in the sparse case. And in fitting it's only done in the dense case in _dense_fit, right?

…nctions

Previously, unicode test only tested `clf.fit`.
In particular, `predict_proba` currently fails if unicode is provided to the `kernel` parameter
@JoshuaMeyers
Copy link
Author

Thanks for your responses!

@rth The code from the issue #7064 is the same but this was not applied to the predict_proba endpoint. I have also now added a call to predict_proba in the unit tests - this will fail on master.

@amueller My bad, I have fixed the spacing! It is necessary to catch unicode objects that are given to the kernel parameter which are not handled by libSVM. Other parameters are not sensitive to this and neither should kernel be. It is only necessary for the _dense_predict_proba also

@JoshuaMeyers JoshuaMeyers changed the title added unicode catch to predict_proba function in SVC [WIP] added unicode catch to predict_proba function in SVC Dec 19, 2017
@JoshuaMeyers JoshuaMeyers changed the title [WIP] added unicode catch to predict_proba function in SVC added unicode catch to predict_proba function in SVC Dec 19, 2017
@@ -322,6 +322,11 @@ def _dense_predict(self, X):
"the number of samples at training time" %
(X.shape[1], self.shape_fit_[0]))

if six.PY3:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was confused as to why this didn't happen ;)

clf.fit(X, Y)
clf.predict(T)
clf.predict_proba(T)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decision function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep sounds a good idea, works on my local copy but just for completeness

@amueller
Copy link
Member

worry less about lgtm and more about travis ;)

Why is this not necessary in the sparse case?

@JoshuaMeyers
Copy link
Author

@amueller haha okay I'll focus on travis first.

As for why this isn't necessary in the sparse case, the kernel is specified by its index in the sparse case which seems not to be sensitive to this...
_sparse_predict: kernel_type = self._sparse_kernels.index(kernel)

I've tested and indeed, the sparse versions do not error.

@amueller
Copy link
Member
    # The order of these must match the integer values in LibSVM.
    # XXX These are actually the same in the dense case. Need to factor
    # this out.
    _sparse_kernels = ["linear", "poly", "rbf", "sigmoid", "precomputed"]

Maybe we should just do that? It's weird to have substantially different code paths for sparse and dense. If you'd rather not get involved in this, that's fine, too ;)

@amueller
Copy link
Member

Basically we just need to move kernel_index = LIBSVM_KERNEL_TYPES.index(kernel) from the cython code to the python code in the dense case, then it'll be the same as the sparse case, we can rename _sparse_kernels to _kernels and can get rid of the string handling code and just pass integers to cython everywhere.

@qmick
Copy link
Contributor
qmick commented Jan 5, 2018

It is still in progress?

@qinhanmin2014
Copy link
Member

@qmick Feel free to take it if @JoshuaMeyers doesn't reply after some time. Note that we are using multiple ways to handle the problem currently.
(1)in sparse case : passing int kernel_type instead of str kernel
(2)in some functions in dense case(e.g., predict): passing kernel instead of str kernel
(3)in some functions in dense case(e.g., fit): using str to cast kernel to string before passing
I agree with amueller's comment above (i.e., always use the first way).

@qmick
Copy link
Contributor
qmick commented Jan 5, 2018

@qinhanmin2014 Thanks, I'll keep looking at it.

@JoshuaMeyers
Copy link
Author

Hey @qmick, feel free to have a go at this. I haven't had the time. Apologies!

@qinhanmin2014
Copy link
Member

Thanks @JoshuaMeyers for your great work so far :)

@JoshuaMeyers
Copy link
Author

This issue was addressed by #10412. Closing this MR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0