[MRG] Changed examples so they produce the same values on OS X by georgipeev · Pull Request #11289 · scikit-learn/scikit-learn

Merged

Conversation

@georgipeev (Contributor)

Reference Issues/PRs

Fixes #10213.

What does this implement/fix? Explain your changes.

Doctests were failing because different platforms use different math libraries for random number generation. Decreasing the tolerance fixed the problem. Also switched the scoring method for cross-validation and updated the corresponding expected results.
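
For context, a minimal sketch of the kind of change involved (my illustration, not the exact diff from this PR; the `make_classification`/`LinearSVC` setup and the tightened `tol` are taken from the discussion below):

    # Illustrative sketch only. Tightening `tol` makes liblinear converge
    # far enough that the printed coefficients agree across platforms to
    # the number of digits shown in the docs.
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_features=4, random_state=0)
    clf = LinearSVC(random_state=0, tol=1e-5)
    clf.fit(X, y)
    print(clf.coef_)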

@amueller (Member)

lgtm if it passes ;)

@TomDLT (Member) commented Jun 15, 2018

Almost there, fails only on one build:

    -     random_state=0, tol=0.00001, verbose=0)
    +     random_state=0, tol=1e-05, verbose=0)
    -     multi_class='ovr', penalty='l2', random_state=0, tol=0.00001,
    -     verbose=0)
    +     multi_class='ovr', penalty='l2', random_state=0, tol=1e-05, verbose=0)

@amueller (Member)

So that's a NumPy version thing? Should we add a fixture? Seems a bit overkill... this doesn't happen in other places?

@georgipeev (Contributor, Author)

Doctests pass even if I change those 0.00001s to scientific notation. I'll do that.
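
For reference (my illustration, not part of the PR): plain Python has a single canonical repr for each float value, and small literals are rendered in scientific notation, which is one reason a doctest written with `0.00001` can disagree with the `1e-05` an estimator actually prints:

    >>> 0.00001          # the literal 0.00001 is the same float as 1e-05
    1e-05
    >>> 0.00001 == 1e-05
    True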

@jnothman (Member)

Py3.6 failing:

332     LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
333          intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=1000,
334          random_state=0, tol=1e-05, verbose=0)
335     >>> print(regr.coef_)
Expected:
    [16.35750999 26.91499923 42.30652207 60.47843124]
Got:
    [16.35841504 26.91644036 42.30619026 60.47800997]
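
The two outputs agree to roughly four significant digits (max relative difference about 5.5e-5). A quick way to see that, using NumPy (my illustration):

    import numpy as np

    # Expected vs. observed coefficients from the failing doctest above.
    expected = np.array([16.35750999, 26.91499923, 42.30652207, 60.47843124])
    got = np.array([16.35841504, 26.91644036, 42.30619026, 60.47800997])

    # Passes at rtol=1e-4 but would fail at rtol=1e-5.
    np.testing.assert_allclose(got, expected, rtol=1e-4)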

@amueller (Member)

That seems like a reasonable level of precision for adding ellipsis, though :)
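
For illustration (my sketch, not necessarily the fix adopted here): with doctest's ELLIPSIS option, the stable leading digits can be kept and the platform-dependent tail elided, so both outputs above would match:

    >>> print(regr.coef_)  # doctest: +ELLIPSIS
    [16.35... 26.91... 42.30... 60.47...]

If the test runner already enables ELLIPSIS globally, the directive comment is unnecessary.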

@qinhanmin2014 (Member)

Personally, I don't like making examples complex (e.g., using something like scoring='recall_macro' or LinearSVC(tol=0.00001)). I think it's unfriendly to users, especially newcomers. I wonder whether it would be better to solve the problem by only reducing the number of decimal digits shown (e.g., 16.35750999 -> 16.35).

@amueller (Member)

The point of the cross-validation example was to show off the scoring methods, right? I think this was a simplification, because it doesn't use probability=True; that option makes the SVM use Platt scaling, which is a pretty weird thing to do, imho, in particular inside cross-validation.

The problem was that with the default tolerance there are no digits that are equal, I think.

@georgipeev (Contributor, Author)

"All checks have passed" - so elusive, but apparently still attainable.

    @@ -352,7 +351,7 @@ class LinearSVR(LinearModel, RegressorMixin):

         sklearn.linear_model.SGDRegressor
             SGDRegressor can optimize the same cost function as LinearSVR
    -        by adjusting the penalty and loss parameters. In addition it requires
    +        by adjusting the penalty and loss parameters. In addition it .requires
@TomDLT (Member) commented Jun 19, 2018

small mistake

@qinhanmin2014 (Member)

> The problem was that with the default tolerance there are no digits that are equal, I think.

@georgipeev Could you take some time to confirm? I would be surprised if this is the case. Thanks in advance.

@georgipeev (Contributor, Author)

> The problem was that with the default tolerance there are no digits that are equal, I think.

@amueller's recollection is correct - at the default tolerance many values in these examples only matched 1-2 digits after the decimal point, and one value matched none when running against whichever RNG library gets used under OS X.

@qinhanmin2014 (Member)

> @amueller's recollection is correct - at the default tolerance many values in these examples only matched 1-2 digits after the decimal point, and one value matched none when running against whichever RNG library gets used under OS X.

Thanks. I think I'll vote +0 here, since I still doubt whether it's good to make examples complex. E.g., I think clf = LinearSVC(random_state=0) is a good example for newcomers, but clf = LinearSVC(random_state=0, tol=0.00001) is not.
Maybe we have some other ways, e.g., modifying n_features or random_state in make_classification(n_features=4, random_state=0) (see the sketch below).
Anyway, thanks @georgipeev for the PR. I'm fine with merging if we have two approvals here.
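
A hypothetical sketch of that alternative (untested; the different seed is my own pick): keep the estimator call newcomer-friendly and vary the data generation instead, hoping the resulting values happen to round identically on every platform:

    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_features=4, random_state=1)  # try another seed
    clf = LinearSVC(random_state=0)  # default tol, no extra arguments
    clf.fit(X, y)
    print(clf.coef_)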

@georgipeev (Contributor, Author)

CI failure seems unrelated. How should we proceed?

@qinhanmin2014 (Member)

@georgipeev The master branch is failing now. Please merge master in after #11318 is merged.

twosigmajab and others added 15 commits June 20, 2018 09:54
Recommend using `filename.joblib` instead of `filename.pkl` for models persisted via the joblib library, to reduce confusion when it comes time to load a model: it will be clearer whether a file was saved using the `pickle` or the `joblib` library.
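
A short sketch of the convention that commit recommends (my illustration; assumes the standalone `joblib` package is installed):

    from joblib import dump, load
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_features=4, random_state=0)
    clf = LinearSVC(random_state=0).fit(X, y)

    dump(clf, 'model.joblib')   # the extension signals joblib, not pickle
    clf2 = load('model.joblib')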
@georgipeev force-pushed the fix-doctest-for-LinearSVC-and-LinearSVR branch from e704a24 to 6492d8a on June 20, 2018 13:58
@georgipeev (Contributor, Author)

Is anything else required before this PR can be merged?

@amueller (Member)

Something weird happened with your history, but I think it'll go away if we squash and merge. Looks good otherwise.

@glemaitre merged commit 91bfca6 into scikit-learn:master on Jul 14, 2018
@glemaitre (Member)

LGTM merging.
Thanks @georgipeev

@TheAtomicOption (Contributor)

I tested this issue before the above merge with a conda env that included MKL, in an attempt to get the same result on macOS, but the test still failed. While the above merge fixes the tests by removing precision, it doesn't align the actual results.

The liblinear code is compiled directly against the system BLAS, as shown in the output below. This likely explains the platform-specific discrepancy.

    sklearn/svm/src/liblinear/tron.cpp:16:10:
    #include <cblas.h>
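
As a diagnostic (my addition, not from the thread), one can inspect which BLAS/LAPACK NumPy and SciPy were built against; note this is only indirect evidence here, since liblinear links the system CBLAS at compile time, independently of NumPy's BLAS:

    import numpy as np
    import scipy

    np.show_config()     # prints the BLAS/LAPACK libraries NumPy was built with
    scipy.show_config()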

@amueller (Member)

@TheAtomicOption Yes, so there's no way to align the results without increasing the precision a lot. You can increase the precision and align the results, but we found that wasn't worth it for a doctest.

Successfully merging this pull request may close these issues:

Doctest failure for LinearSVC and LinearSVR (#10213)