Sag handle numerical error outside of cython #13389

pierreglaser · 2019-03-04T16:29:56Z

Cython's gcc branch prediction directives make sag contend for the GIL unnecessarily and at high frequency, just in case an error needs to be raised. This severely impacts the performance of parallel sag calls using the joblib "threading" backend.

I propose handling numerical errors "C-style", by propagating return codes and breaking out of the for-loops. The error is then raised once, after sag returns, in sag_solver.

Note that I have to break out of the for-loops manually rather than simply returning -1 after the check: returning requires the GIL, so the same problem would occur.

@jeremiedbb followed the issue during the last sprint.

jeremiedbb

looks good.

just a couple of minor comments.

Also, can you run the benchmark from #13316 with this change?

doc/whats_new/v0.21.rst

sklearn/linear_model/sag_fast.pyx.tp

pierreglaser · 2019-03-04T17:21:08Z

Running the same #13316 benchmark, I now get:

backend: threading n_jobs: 1 total time: (6.028,  n_iter: 15.0)
backend: threading n_jobs: 2 total time: (3.815,  n_iter: 15.6)
backend: threading n_jobs: 4 total time: (3.018,  n_iter: 15.8)
backend:      loky n_jobs: 1 total time: (6.448,  n_iter: 15.3)
backend:      loky n_jobs: 2 total time: (4.532,  n_iter: 15.4)
backend:      loky n_jobs: 4 total time: (3.255,  n_iter: 15.3)
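To read these timings as speedups relative to `n_jobs=1` for each backend, a small helper like the one below can be used. It is a sketch: the dictionary is copied by hand from the output above, and `speedups` is a hypothetical helper name, not part of the #13316 benchmark script.

```python
# Wall-clock times (seconds) copied by hand from the benchmark output above.
timings = {
    ("threading", 1): 6.028,
    ("threading", 2): 3.815,
    ("threading", 4): 3.018,
    ("loky", 1): 6.448,
    ("loky", 2): 4.532,
    ("loky", 4): 3.255,
}


def speedups(timings):
    """Map (backend, n_jobs) to time(n_jobs=1) / time(n_jobs) per backend."""
    return {
        (backend, n_jobs): timings[(backend, 1)] / t
        for (backend, n_jobs), t in timings.items()
    }


for (backend, n_jobs), s in sorted(speedups(timings).items()):
    print("%-9s n_jobs=%d speedup=%.2f" % (backend, n_jobs, s))
```

With these numbers, threading gives roughly 1.58x (2 jobs) and 2.00x (4 jobs), versus roughly 1.42x and 1.98x for loky.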

TomDLT

LGTM

massich

LGTM

sklearn/linear_model/sag.py

ogrisel

This is a really weird performance bug. Have you tried to re-run your benchmark with this PR to confirm that it fixes the performance issue observed with the threading backend?

Can you try to compile this extension without branch prediction to confirm that the GIL acquisition is caused by this as suggested in #13316 (comment)? That sounds really weird to me.

sklearn/linear_model/tests/test_sag.py

jeremiedbb · 2019-03-05T16:11:35Z

> This is a really weird performance bug. Have you tried to re-run your benchmark with this PR to confirm that it fixes the performance issue observed with the threading backend?

just look a bit above :)

jeremiedbb · 2019-03-05T16:13:47Z

> Can you try to compile this extension without branch prediction to confirm that the GIL acquisition is caused by this as suggested in #13316 (comment)? That sounds really weird to me.

This is strange indeed, but if you comment out the

if something_not_finite:
    with gil:
        raise error

blocks, the issue disappears, even though the condition is never met (no error is ever raised).

pierreglaser · 2019-03-05T16:42:54Z

@ogrisel I tried disabling Cython.Compiler.Options.gcc_branch_hints on current master, but that did not improve performance.
However, avoiding GIL acquisition in this setting is not unusual: sgd_fast.pyx uses the same "trick":

scikit-learn/sklearn/linear_model/sgd_fast.pyx

Lines 758 to 762 in 7b26762

    
        # floating-point under-/overflow check.
        if (not skl_isfinite(intercept)
            or any_nonfinite(<double *>weights.data, n_features)):
            infinity = True
            break

and then, further down:

scikit-learn/sklearn/linear_model/sgd_fast.pyx

Lines 795 to 798 in 7b26762

    
    if infinity:
        raise ValueError(("Floating-point under-/overflow occurred at epoch"
                          " #%d. Scaling input data with StandardScaler or"
                          " MinMaxScaler might help.") % (epoch + 1))

jeremiedbb · 2019-03-05T17:12:55Z

> I tried disabling Cython.Compiler.Options.gcc_branch_hints

These are just hints; I don't think turning them off disables branch prediction, which is done by the CPU. I can't see how to disable it.

ogrisel

One nitpick, but otherwise looks good to me!

I suppose we might want to set prefer="threads" for LogisticRegressionCV when the saga solver is used, but this should probably be done elsewhere.

sklearn/linear_model/sag_fast.pyx.tp

sklearn/linear_model/tests/test_sag.py

wamuir · 2019-03-10T02:24:04Z

So much better. Thank you @pierreglaser

pierreglaser added 3 commits March 4, 2019 17:19

FIX handle numerical error from sag in sag_solver

16004a1

TST add a numerical error handling test

c5daa6b

MNT update whats_new

042230f

pierreglaser mentioned this pull request Mar 4, 2019

Performance of LogisticRegression with saga. #13316

Closed

jeremiedbb reviewed Mar 4, 2019

doc/whats_new/v0.21.rst (resolved)

sklearn/linear_model/sag_fast.pyx.tp (outdated, resolved)

pierreglaser added 2 commits March 4, 2019 18:07

CLN use return for functions returning c objects

58307e4

CLN issue no. -> PR no. in whats_new

d00eeb4

TomDLT approved these changes Mar 4, 2019

massich approved these changes Mar 5, 2019

sklearn/linear_model/sag.py (outdated, resolved)

CLN handle error in sag_fast after the for loop

617c69e

ogrisel reviewed Mar 5, 2019

sklearn/linear_model/tests/test_sag.py (outdated, resolved)

MNT update test comment after last commit

5aeb407

ogrisel approved these changes Mar 6, 2019

sklearn/linear_model/sag_fast.pyx.tp (outdated, resolved)

sklearn/linear_model/sag_fast.pyx.tp (outdated, resolved)

sklearn/linear_model/tests/test_sag.py (resolved)

CLN correct epoch no., in error message, comment

cea0d59

ogrisel merged commit 0f0799a into scikit-learn:master Mar 6, 2019

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Sag handle numerical error outside of cython (scikit-learn#13389)

b6c2b95

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "Sag handle numerical error outside of cython (scikit-learn#13389)"

This reverts commit b6c2b95.

c78f3fb

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "Sag handle numerical error outside of cython (scikit-learn#13389)"

This reverts commit b6c2b95.

3cc43e4

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

Sag handle numerical error outside of cython (scikit-learn#13389)

1e58ede

pierreglaser mentioned this pull request Aug 1, 2019

RFC: Add pierreglaser/sklearn_parallel_benchmark to scikit-learn_benchmarks jeremiedbb/scikit-learn_benchmarks#16

Open

jeremiedbb mentioned this pull request Dec 17, 2021

[WIP] Callback API continued #22000

Draft
