[WIP] ENH: Faster stopping criteria based on glmnet for coordinate descent #3719
Conversation
ping @agramfort, @jnothman and @GaelVaroquaux might be interested.
I would introduce a
And do we have another
+1 with one tol param.
But how? The tolerances for the two stopping conditions are not the same, and clearly the default values are different, i.e. 1e-4 and 1e-7.
+1 for tol='auto'. I am -1 on removing dual_gap_. Having an optimality certificate is relevant for some problems.
force-pushed from 646f299 to f6a884c
@agramfort I have cleaned up the code, and set
I've added a test as a proof of concept to show that this is actually working.
The current cd algorithm checks the dual gap only when the biggest coordinate update is less than the tolerance, and breaks when that gap falls below tol. Glmnet, however, checks whether the max change in the objective is less than tol and then breaks. This surprisingly leads to large speedups with almost no noticeable regression in prediction.
The same stopping criterion is extended to the MultiTask models.
Since the default tolerances of the two stopping conditions differ, tol defaults to 1e-7 for the objective condition and to 1e-4 for the dual gap. The Cython code has also been updated to handle both conditions.
Added a test to show that the number of passes made is smaller for stopping="objective", even on a small dataset.
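For readers skimming the thread, here is a minimal pure-Python sketch of the proposed criterion. The function, the variable names, and the curvature-based estimate of the per-update objective change are illustrative assumptions, not a transcription of the Cython in cd_fast.pyx:

```python
import numpy as np

def lasso_cd_objective_stopping(X, y, alpha, tol=1e-7, max_iter=1000):
    """Toy Lasso coordinate descent with a glmnet-style stopping check.

    Objective: (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1.
    Illustration only; the real solver is the Cython code in cd_fast.pyx.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    R = y.astype(float).copy()               # residual for w = 0
    col_sq = (X ** 2).sum(axis=0)            # ||x_j||^2 for every column
    null_dev = np.sum((y - y.mean()) ** 2)   # deviance of the intercept-only model

    for _ in range(max_iter):
        max_obj_change = 0.0
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue
            w_old = w[j]
            # exact coordinate minimizer via soft thresholding
            rho = X[:, j] @ R + col_sq[j] * w_old
            w[j] = np.sign(rho) * max(abs(rho) - alpha * n_samples, 0.0) / col_sq[j]
            delta = w[j] - w_old
            if delta != 0.0:
                R -= delta * X[:, j]
            # rough, curvature-based estimate of how much this update moved the objective
            max_obj_change = max(max_obj_change, col_sq[j] * delta ** 2 / n_samples)
        # glmnet-style test: stop once no single coefficient update changes the
        # objective by more than tol times the null deviance
        if max_obj_change < tol * null_dev:
            break
    return w
```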
force-pushed from da00b11 to bc94923
Could you clarify why this speed-up is possible? Is it because the duality gap is expensive to compute?
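For context, a rough NumPy transcription of the duality-gap test the current code runs once the coordinate updates have become small, written for the objective 0.5 * ||y - Xw||^2 + alpha * ||w||_1; the real implementation in cd_fast.pyx uses BLAS calls such as the ddot quoted below and a slightly different scaling convention, so treat this as a sketch:

```python
import numpy as np

def lasso_dual_gap(X, y, w, alpha):
    """Duality gap for 0.5 * ||y - Xw||^2 + alpha * ||w||_1 (rough sketch).

    The dominant cost is X.T @ R, an O(n_samples * n_features) product that has
    to be redone every time the gap is evaluated, whereas the glmnet-style test
    only reuses quantities already available from the coordinate updates.
    """
    R = y - X @ w
    XtA = X.T @ R                    # the expensive full-data product
    dual_norm_XtA = np.max(np.abs(XtA))
    R_norm2 = R @ R
    if dual_norm_XtA > alpha:
        const = alpha / dual_norm_XtA
        gap = 0.5 * (R_norm2 + R_norm2 * const ** 2)
    else:
        const = 1.0
        gap = R_norm2
    gap += alpha * np.abs(w).sum() - const * (R @ y)
    return gap
```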
if w[ii] != w_ii:
# R_norm2 = np.dot(R, R)
R_norm2 = ddot(n_samples, <DOUBLE*>R.data, 1,
               <DOUBLE*>R.data, 1)
It might be an idea to factor this out:
cdef inline double squared_norm(const double *x, int n, int stride) nogil:
return ddot(n, x, stride, x, stride)
(and maybe the <double *>x.data stuff as well).
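If that helper were adopted, the call site quoted above would presumably shrink to something like the following (a hypothetical sketch, keeping the current Cython conventions):

```cython
# R_norm2 = np.dot(R, R)
R_norm2 = squared_norm(<DOUBLE*>R.data, n_samples, 1)
```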
@mblondel @agramfort It just seems that we have a lower default tolerance than glmnet. That is the reason it makes more passes over the data :(
Maybe we should change our default. Is there a good reason why it is
git blame points me to Alex.
Everybody can change his mind based on experimental evidence :)
Sorry for bringing this back, but what is the status on increasing the default tolerance? When can we convince ourselves that the tolerance can be raised up to 1e-2?
I am +1 on raising the tol. It seems to me that 1e-2 is a lot, though.
To choose the default tol, try different values on a few real datasets.
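A minimal sketch of such an experiment; the dataset, the alpha, and the tol grid are illustrative choices, not a prescription:

```python
import time
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

for tol in (1e-2, 1e-3, 1e-4, 1e-5):
    model = Lasso(alpha=0.1, tol=tol, max_iter=10000)
    t0 = time.time()
    model.fit(X, y)
    fit_time = time.time() - t0
    mse = np.mean((model.predict(X) - y) ** 2)
    print("tol=%g  n_iter=%d  time=%.3fs  train MSE=%.3f"
          % (tol, model.n_iter_, fit_time, mse))
```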
The current cd algorithm checks the dual gap only when the biggest coordinate update is less than the tolerance, and breaks when that gap falls below tol. Glmnet, however, checks whether the max change in the objective is less than tol and then breaks. This surprisingly leads to large speedups with almost no noticeable regression in prediction.
The glmnet documentation states that "Each inner coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than thresh times the null deviance."
It should be noted that the default tolerance (thresh) in this case is 1e-7.
Some initial benchmarks using LassoCV and precompute=False.
For the newsgroup dataset (using two categories), 5 random splits, (1174 x 130107), test_size = 2/3 of total size.
For the haxby dataset with the mask, using 5 random splits, (216 x 577)
For the arcene dataset, (100 x 10000)
For the duke 8000 dataset, 5 random splits
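The harness behind numbers like these could look roughly as follows; the dataset loading is left out and the stopping keyword only exists on this branch, so treat it as a sketch:

```python
import time
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import ShuffleSplit

def bench(X, y, n_splits=5, **lasso_kwargs):
    """Time LassoCV(precompute=False) over random splits with test_size = 2/3."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=2 / 3., random_state=0)
    times, mses = [], []
    for train, test in splitter.split(X):
        model = LassoCV(precompute=False, **lasso_kwargs)
        t0 = time.time()
        model.fit(X[train], y[train])
        times.append(time.time() - t0)
        mses.append(np.mean((model.predict(X[test]) - y[test]) ** 2))
    return np.mean(times), np.mean(mses)

# X, y = ...  # one of the datasets above (newsgroups, haxby, arcene, duke)
# bench(X, y, tol=1e-4)                          # current dual-gap stopping
# bench(X, y, stopping="objective", tol=1e-7)    # proposed glmnet-style stopping
```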
Since the default tolerances are different, how do we accommodate this change in terms of the API? Do we need a new stopping criterion called "glmnet"?
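One possibility, purely illustrative and not an agreed API, would be to keep a single tol parameter and resolve tol='auto' from the chosen criterion, so that stopping="objective" (or a "glmnet" alias) gets the tighter 1e-7 default:

```python
# Hypothetical helper; the parameter names and defaults only mirror this thread.
DEFAULT_TOLS = {"dual_gap": 1e-4, "objective": 1e-7, "glmnet": 1e-7}

def resolve_tol(tol="auto", stopping="dual_gap"):
    """Map tol='auto' to the per-criterion default, otherwise pass it through."""
    if tol == "auto":
        return DEFAULT_TOLS[stopping]
    return float(tol)
```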