[WIP] ENH: Faster stopping criteria based on glmnet for coordinate descent #3719
Conversation
ping @agramfort, @jnothman and @GaelVaroquaux might be interested.
I would introduce a
And do we have another
+1 with one tol param.
But how? The tolerances for the two stopping conditions are not the same, and clearly the default values are different, i.e. 1e-4 and 1e-7.
+1 for tol='auto'. I am -1 on removing dual_gap_. Having an optimality certificate is relevant for some problems.
force-pushed from 646f299 to f6a884c
@agramfort I have cleaned up the code, and set
I've added a test as a proof of concept to show that this is actually working.
The current cd algorithm checks the dual gap only when the biggest coordinate update is less than the tolerance, and breaks when that gap falls below tol. Glmnet, however, checks whether the max change in the objective is less than tol and then breaks. This surprisingly leads to large speedups with almost no noticeable regression in prediction.
The same stopping criterion is extended to the MultiTask models.
Since the default tolerances of the two stopping conditions differ, tol defaults to 1e-7 for the objective condition and to 1e-4 for the dual gap. The Cython code has also been updated to handle both conditions.
Added a test to show that the number of passes made is smaller for stopping="objective", even on a small dataset.
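For readers skimming the thread, here is a minimal pure-Python sketch of the proposed criterion. The function, the variable names, and the curvature-based estimate of the per-update objective change are illustrative assumptions, not a transcription of the Cython in cd_fast.pyx:

```python
import numpy as np

def lasso_cd_objective_stopping(X, y, alpha, tol=1e-7, max_iter=1000):
    """Toy Lasso coordinate descent with a glmnet-style stopping check.

    Objective: (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1.
    Illustration only; the real solver is the Cython code in cd_fast.pyx.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    R = y.astype(float).copy()               # residual for w = 0
    col_sq = (X ** 2).sum(axis=0)            # ||x_j||^2 for every column
    null_dev = np.sum((y - y.mean()) ** 2)   # deviance of the intercept-only model

    for _ in range(max_iter):
        max_obj_change = 0.0
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue
            w_old = w[j]
            # exact coordinate minimizer via soft thresholding
            rho = X[:, j] @ R + col_sq[j] * w_old
            w[j] = np.sign(rho) * max(abs(rho) - alpha * n_samples, 0.0) / col_sq[j]
            delta = w[j] - w_old
            if delta != 0.0:
                R -= delta * X[:, j]
            # rough, curvature-based estimate of how much this update moved the objective
            max_obj_change = max(max_obj_change, col_sq[j] * delta ** 2 / n_samples)
        # glmnet-style test: stop once no single coefficient update changes the
        # objective by more than tol times the null deviance
        if max_obj_change < tol * null_dev:
            break
    return w
```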
force-pushed from da00b11 to bc94923
Could you clarify why this speed-up is possible? Is it because the duality gap is expensive to compute?
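For context, a rough NumPy transcription of the duality-gap test the current code runs once the coordinate updates have become small, written for the objective 0.5 * ||y - Xw||^2 + alpha * ||w||_1; the real implementation in cd_fast.pyx uses BLAS calls such as the ddot quoted below and a slightly different scaling convention, so treat this as a sketch:

```python
import numpy as np

def lasso_dual_gap(X, y, w, alpha):
    """Duality gap for 0.5 * ||y - Xw||^2 + alpha * ||w||_1 (rough sketch).

    The dominant cost is X.T @ R, an O(n_samples * n_features) product that has
    to be redone every time the gap is evaluated, whereas the glmnet-style test
    only reuses quantities already available from the coordinate updates.
    """
    R = y - X @ w
    XtA = X.T @ R                    # the expensive full-data product
    dual_norm_XtA = np.max(np.abs(XtA))
    R_norm2 = R @ R
    if dual_norm_XtA > alpha:
        const = alpha / dual_norm_XtA
        gap = 0.5 * (R_norm2 + R_norm2 * const ** 2)
    else:
        const = 1.0
        gap = R_norm2
    gap += alpha * np.abs(w).sum() - const * (R @ y)
    return gap
```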
if w[ii] != w_ii:
# R_norm2 = np.dot(R, R)
R_norm2 = ddot(n_samples, <DOUBLE*>R.data, 1,
               <DOUBLE*>R.data, 1)
It might be an idea to factor this out:
cdef inline double squared_norm(const double *x, int n, int stride) nogil:
return ddot(n, x, stride, x, stride)
(and maybe the <double *>x.data stuff as well).
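If that helper were adopted, the call site quoted above would presumably shrink to something like the following (a hypothetical sketch, keeping the current Cython conventions):

```cython
# R_norm2 = np.dot(R, R)
R_norm2 = squared_norm(<DOUBLE*>R.data, n_samples, 1)
```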
@mblondel @agramfort It just seems that we have a lower default tolerance than glmnet. That is the reason it makes more passes over the data :(
Maybe we should change our default. Is there a good reason why it is
git blame points me to Alex.
Everybody can change his mind based on experimental evidence :)
Sorry for bringing this back, but what is the status on increasing the default tolerance? When can we convince ourselves that the tolerance can be raised up to 1e-2?
I am +1 on raising the tol. It seems to me that 1e-2 is a lot, though.
To choose the default tol, try different values on a few real datasets.
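A minimal sketch of such an experiment; the dataset, the alpha, and the tol grid are illustrative choices, not a prescription:

```python
import time
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

for tol in (1e-2, 1e-3, 1e-4, 1e-5):
    model = Lasso(alpha=0.1, tol=tol, max_iter=10000)
    t0 = time.time()
    model.fit(X, y)
    fit_time = time.time() - t0
    mse = np.mean((model.predict(X) - y) ** 2)
    print("tol=%g  n_iter=%d  time=%.3fs  train MSE=%.3f"
          % (tol, model.n_iter_, fit_time, mse))
```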
The current cd algorithm checks the dual gap only when the biggest coordinate update is less than the tolerance, and breaks when that gap falls below tol. Glmnet, however, checks whether the max change in the objective is less than tol and then breaks. This surprisingly leads to large speedups with almost no noticeable regression in prediction.
The glmnet documentation states that "Each inner coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than thresh times the null deviance."
It should be noted that the default tolerance (thresh) in this case is 1e-7.
Some initial benchmarks using LassoCV and precompute=False.
For the newsgroup dataset (using two categories), 5 random splits, (1174 x 130107), test_size = 2/3 of total size.
For the haxby dataset with the mask, using 5 random splits, (216 x 577)
For the arcene dataset, (100 x 10000)
For the duke 8000 dataset, 5 random splits
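The harness behind numbers like these could look roughly as follows; the dataset loading is left out and the stopping keyword only exists on this branch, so treat it as a sketch:

```python
import time
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import ShuffleSplit

def bench(X, y, n_splits=5, **lasso_kwargs):
    """Time LassoCV(precompute=False) over random splits with test_size = 2/3."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=2 / 3., random_state=0)
    times, mses = [], []
    for train, test in splitter.split(X):
        model = LassoCV(precompute=False, **lasso_kwargs)
        t0 = time.time()
        model.fit(X[train], y[train])
        times.append(time.time() - t0)
        mses.append(np.mean((model.predict(X[test]) - y[test]) ** 2))
    return np.mean(times), np.mean(mses)

# X, y = ...  # one of the datasets above (newsgroups, haxby, arcene, duke)
# bench(X, y, tol=1e-4)                          # current dual-gap stopping
# bench(X, y, stopping="objective", tol=1e-7)    # proposed glmnet-style stopping
```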
Since the default tolerances are different, how do we accommodate this change in terms of the API? Do we need a new stopping criterion called "glmnet"?
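One possibility, purely illustrative and not an agreed API, would be to keep a single tol parameter and resolve tol='auto' from the chosen criterion, so that stopping="objective" (or a "glmnet" alias) gets the tighter 1e-7 default:

```python
# Hypothetical helper; the parameter names and defaults only mirror this thread.
DEFAULT_TOLS = {"dual_gap": 1e-4, "objective": 1e-7, "glmnet": 1e-7}

def resolve_tol(tol="auto", stopping="dual_gap"):
    """Map tol='auto' to the per-criterion default, otherwise pass it through."""
    if tol == "auto":
        return DEFAULT_TOLS[stopping]
    return float(tol)
```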