[MRG] Changed examples so they produce the same values on OS X by georgipeev · Pull Request #11289 · scikit-learn/scikit-learn

Merged

Conversation

@georgipeev (Contributor)

Reference Issues/PRs

Fixes #10213.

What does this implement/fix? Explain your changes.

Doctests were failing because different platforms use different math libraries for random number generation. Decreasing the tolerance fixed the problem. Also switched the scoring method for cross-validation and updated the corresponding expected results.
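
For context, a minimal sketch of the kind of change involved (my illustration, not the exact diff from this PR; the `make_classification`/`LinearSVC` setup and the tightened `tol` are taken from the discussion below):

    # Illustrative sketch only. Tightening `tol` makes liblinear converge
    # far enough that the printed coefficients agree across platforms to
    # the number of digits shown in the docs.
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_features=4, random_state=0)
    clf = LinearSVC(random_state=0, tol=1e-5)
    clf.fit(X, y)
    print(clf.coef_)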

@amueller (Member)

lgtm if it passes ;)

@TomDLT (Member) commented Jun 15, 2018

Almost there, fails only on one build:

    -     random_state=0, tol=0.00001, verbose=0)
    +     random_state=0, tol=1e-05, verbose=0)
    -     multi_class='ovr', penalty='l2', random_state=0, tol=0.00001,
    -     verbose=0)
    +     multi_class='ovr', penalty='l2', random_state=0, tol=1e-05, verbose=0)

@amueller (Member)

So that's a NumPy version thing? Should we add a fixture? Seems a bit overkill... this doesn't happen in other places?

@georgipeev (Contributor, Author)

Doctests pass even if I change those 0.00001s to scientific notation. I'll do that.
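
For reference (my illustration, not part of the PR): plain Python has a single canonical repr for each float value, and small literals are rendered in scientific notation, which is one reason a doctest written with `0.00001` can disagree with the `1e-05` an estimator actually prints:

    >>> 0.00001          # the literal 0.00001 is the same float as 1e-05
    1e-05
    >>> 0.00001 == 1e-05
    True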

@jnothman (Member)

Py3.6 failing:

332     LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
333          intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=1000,
334          random_state=0, tol=1e-05, verbose=0)
335     >>> print(regr.coef_)
Expected:
    [16.35750999 26.91499923 42.30652207 60.47843124]
Got:
    [16.35841504 26.91644036 42.30619026 60.47800997]
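
The two outputs agree to roughly four significant digits (max relative difference about 5.5e-5). A quick way to see that, using NumPy (my illustration):

    import numpy as np

    # Expected vs. observed coefficients from the failing doctest above.
    expected = np.array([16.35750999, 26.91499923, 42.30652207, 60.47843124])
    got = np.array([16.35841504, 26.91644036, 42.30619026, 60.47800997])

    # Passes at rtol=1e-4 but would fail at rtol=1e-5.
    np.testing.assert_allclose(got, expected, rtol=1e-4)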

@amueller (Member)

That seems like a reasonable level of precision for adding ellipsis, though :)
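
For illustration (my sketch, not necessarily the fix adopted here): with doctest's ELLIPSIS option, the stable leading digits can be kept and the platform-dependent tail elided, so both outputs above would match:

    >>> print(regr.coef_)  # doctest: +ELLIPSIS
    [16.35... 26.91... 42.30... 60.47...]

If the test runner already enables ELLIPSIS globally, the directive comment is unnecessary.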

@qinhanmin2014 (Member)

Personally, I don't like making examples complex (e.g., using something like scoring='recall_macro' or LinearSVC(tol=0.00001)). I think it's unfriendly to users, especially newcomers. I wonder whether it would be better to solve the problem by only reducing the number of decimal digits shown (e.g., 16.35750999 -> 16.35).

@amueller (Member)

The point of the cross-validation example was to show off the scoring methods, right? I think this was a simplification, because it doesn't use probability=True; that option makes the SVM use Platt scaling, which is a pretty weird thing to do, imho, in particular inside cross-validation.

The problem was that with the default tolerance there are no digits that are equal, I think.

@georgipeev (Contributor, Author)

"All checks have passed" - so elusive, but apparently still attainable.

    @@ -352,7 +351,7 @@ class LinearSVR(LinearModel, RegressorMixin):

         sklearn.linear_model.SGDRegressor
             SGDRegressor can optimize the same cost function as LinearSVR
    -        by adjusting the penalty and loss parameters. In addition it requires
    +        by adjusting the penalty and loss parameters. In addition it .requires
@TomDLT (Member) commented Jun 19, 2018

small mistake

@qinhanmin2014 (Member)

> The problem was that with the default tolerance there are no digits that are equal, I think.

@georgipeev Could you take some time to confirm? I would be surprised if this is the case. Thanks in advance.

@georgipeev (Contributor, Author)

> The problem was that with the default tolerance there are no digits that are equal, I think.

@amueller's recollection is correct - at the default tolerance many values in these examples only matched 1-2 digits after the decimal point, and one value matched none when running against whichever RNG library gets used under OS X.

@qinhanmin2014 (Member)

> @amueller's recollection is correct - at the default tolerance many values in these examples only matched 1-2 digits after the decimal point, and one value matched none when running against whichever RNG library gets used under OS X.

Thanks. I think I'll vote +0 here, since I still doubt whether it's good to make examples complex. E.g., I think clf = LinearSVC(random_state=0) is a good example for newcomers, but clf = LinearSVC(random_state=0, tol=0.00001) is not.
Maybe we have some other ways, e.g., modifying n_features or random_state in make_classification(n_features=4, random_state=0) (see the sketch below).
Anyway, thanks @georgipeev for the PR. I'm fine with merging if we have two approvals here.
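
A hypothetical sketch of that alternative (untested; the different seed is my own pick): keep the estimator call newcomer-friendly and vary the data generation instead, hoping the resulting values happen to round identically on every platform:

    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_features=4, random_state=1)  # try another seed
    clf = LinearSVC(random_state=0)  # default tol, no extra arguments
    clf.fit(X, y)
    print(clf.coef_)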

@georgipeev (Contributor, Author)

CI failure seems unrelated. How should we proceed?

@qinhanmin2014 (Member)

@georgipeev The master branch is failing now. Please merge master in after #11318 is merged.

twosigmajab and others added 15 commits June 20, 2018 09:54
Recommend using `filename.joblib` instead of `filename.pkl` for models persisted via the joblib library, to reduce confusion when it comes time to load a model: it will be clearer whether a file was saved using the `pickle` or the `joblib` library.
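
A short sketch of the convention that commit recommends (my illustration; assumes the standalone `joblib` package is installed):

    from joblib import dump, load
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_features=4, random_state=0)
    clf = LinearSVC(random_state=0).fit(X, y)

    dump(clf, 'model.joblib')   # the extension signals joblib, not pickle
    clf2 = load('model.joblib')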
@georgipeev force-pushed the fix-doctest-for-LinearSVC-and-LinearSVR branch from e704a24 to 6492d8a on June 20, 2018 13:58
@georgipeev (Contributor, Author)

Is anything else required before this PR can be merged?

@amueller (Member)

Something weird happened with your history, but I think it'll go away if we squash and merge. Looks good otherwise.

@glemaitre merged commit 91bfca6 into scikit-learn:master on Jul 14, 2018
@glemaitre (Member)

LGTM merging.
Thanks @georgipeev

@TheAtomicOption (Contributor)

I tested this issue before the above merge with a conda env that included MKL, in an attempt to get the same result on macOS, but the test still failed. While the above merge fixes the tests by removing precision, it doesn't align the actual results.

The liblinear code is compiled directly against the system BLAS, as shown in the output below. This likely explains the platform-specific discrepancy.

    sklearn/svm/src/liblinear/tron.cpp:16:10:
    #include <cblas.h>
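
As a diagnostic (my addition, not from the thread), one can inspect which BLAS/LAPACK NumPy and SciPy were built against; note this is only indirect evidence here, since liblinear links the system CBLAS at compile time, independently of NumPy's BLAS:

    import numpy as np
    import scipy

    np.show_config()     # prints the BLAS/LAPACK libraries NumPy was built with
    scipy.show_config()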

@amueller (Member)

@TheAtomicOption Yes, so there's no way to align the results without increasing the precision a lot. You can increase the precision and align the results, but we found that wasn't worth it for a doctest.

Successfully merging this pull request may close these issues:

Doctest failure for LinearSVC and LinearSVR (#10213)