[WIP] ENH: Faster stopping criteria based on glmnet for coordinate descent · Pull Request #3719 · scikit-learn/scikit-learn

Closed
MechCoder wants to merge 4 commits

Conversation

MechCoder (Member)

The current cd algorithm computes the dual gap once the biggest coordinate update is less than the tolerance, and breaks when that gap is itself below the tolerance. Glmnet, however, checks whether the max change in the objective is less than tol and then breaks. This surprisingly leads to a huge speedup with almost no noticeable regression in prediction.

The glmnet documentation states: "Each inner coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than thresh times the null deviance."

It should be noted that the default tolerance in this case is 1e-7.
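
To make the difference concrete, here is a minimal pure-Python sketch of Lasso coordinate descent with both stopping rules. This is illustrative only: the name lasso_cd_sketch and the exact scalings are assumptions, and the real implementation is the Cython code in sklearn/linear_model/cd_fast.pyx.

    import numpy as np

    def lasso_cd_sketch(X, y, alpha, tol, max_iter=1000, stopping="duality_gap"):
        n_samples, n_features = X.shape
        w = np.zeros(n_features)
        R = y.copy()                       # residual y - X @ w (w starts at 0)
        norm_cols2 = (X ** 2).sum(axis=0)
        null_dev = y @ y                   # glmnet scales its threshold by this

        for _ in range(max_iter):
            w_max = d_w_max = d_obj_max = 0.0
            for ii in range(n_features):
                if norm_cols2[ii] == 0.0:
                    continue
                w_ii = w[ii]
                R += w_ii * X[:, ii]
                rho = X[:, ii] @ R
                # exact coordinate minimization via soft-thresholding
                w[ii] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / norm_cols2[ii]
                R -= w[ii] * X[:, ii]
                d_w_max = max(d_w_max, abs(w[ii] - w_ii))
                w_max = max(w_max, abs(w[ii]))
                # glmnet's proxy for the change in objective from this update
                d_obj_max = max(d_obj_max, norm_cols2[ii] * (w[ii] - w_ii) ** 2)

            if stopping == "objective_value":
                # glmnet-style rule: cheap, no extra pass over the data
                if d_obj_max < tol * null_dev:
                    break
            else:
                # current rule: once the biggest update looks small, pay for a
                # duality-gap evaluation (X.T @ R costs about one extra sweep)
                # and stop only if the gap certifies optimality
                if w_max == 0.0 or d_w_max / w_max < tol:
                    XtR = X.T @ R
                    dual_norm = np.max(np.abs(XtR))
                    R2 = R @ R
                    const = alpha / dual_norm if dual_norm > alpha else 1.0
                    gap = 0.5 * (1.0 + const ** 2) * R2
                    gap += alpha * np.abs(w).sum() - const * (R @ y)
                    if gap < tol * null_dev:  # tol scaled by ||y||^2, as scikit-learn does
                        break
        return w

The duality-gap rule buys an optimality certificate at the price of extra sweeps when tol is strict; the objective-change rule stops as soon as progress stalls, which is where the reported speedups come from.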

Some initial benchmarks using LassoCV and precompute=False.

For the newsgroups dataset (using two categories), 5 random splits, shape (1174 × 130107), test_size = 2/3 of the total size.

In this branch (using tol=1e-7)
mean_time = 16.6982872009
mean_accuracy_score = 0.89587628866
In master (using tol=1e-4)
mean_time = 236.406961584
mean_accuracy_score = 0.889175257732

For the haxby dataset with the mask, using 5 random splits, shape (216 × 577).

In this branch (using tol=1e-7)
mean_time = 0.495861053467
mean_accuracy_score = 0.958333333333
In master (using tol=1e-4)
mean_time = 3.05996584892
mean_accuracy_score = 0.930555555556

For the arcene dataset, shape (100 × 10000).

In this branch (using tol=1e-7)
mean_time = 4.69407510757
accuracy_score on test data = 0.68
In master (using tol=1e-7)
mean_time = 57.3407440186
accuracy_score on test data = 0.68

For the duke dataset, 5 random splits.

In master
mean_accuracy_score = 0.84000000000000008
mean_time = 3.316893196105957
In this branch
mean_accuracy_score = 0.82666666666666655
mean_time = 1.2530781269073485

Since the default tolerances are different, how do we accommodate this change in terms of the API? Do we need a new stopping criterion called "glmnet"?

@MechCoder MechCoder changed the title ENH: Faster stopping criteria based on glmnet for coordinate descent [WIP] ENH: Faster stopping criteria based on glmnet for coordinate descent Sep 29, 2014
@MechCoder (Member Author)

ping @agramfort, @jnothman and @GaelVaroquaux might be interested.

@mblondel (Member)

I would introduce a stopping parameter which would take the values "duality_gap" or "objective_value".

@MechCoder (Member Author)

And do we have another tol parameter, since the tolerance does not mean the same thing for both?

@agramfort (Member)

I would introduce a stopping parameter which would take the values "duality_gap" or "objective_value".

+1 with one tol param.

@MechCoder (Member Author)

+1 with one tol param.

But how? The tolerances for the two stopping conditions do not mean the same thing, and the default values are clearly different, i.e. 1e-4 and 1e-7.

@MechCoder (Member Author)
  1. Can we have something like tol="auto" as the default, which chooses the tolerance based on the stopping criterion? (Sketched below.)
  2. Also, if we are sure that there are no test failures and are convinced that stopping="objective_value" is always better, should we just document this as a bug / API change and remove the dual_gap_ attribute instead of having two conditions?
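
A minimal sketch of how tol="auto" could resolve against the stopping criterion; the helper name _resolve_tol is hypothetical, and the defaults are the 1e-4 / 1e-7 values quoted above:

    def _resolve_tol(tol, stopping):
        # hypothetical helper: pick the per-criterion default when tol="auto"
        if tol == "auto":
            return 1e-7 if stopping == "objective_value" else 1e-4
        return float(tol)

    # proposed usage, assuming Lasso grew a `stopping` parameter:
    #   Lasso(alpha=0.1, stopping="objective_value", tol="auto")
    #   Lasso(alpha=0.1, stopping="duality_gap", tol="auto")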

@agramfort (Member)

+1 for tol='auto'

I am -1 on removing dual_gap_. Having an optimality certificate is relevant for some problems.

@MechCoder (Member Author)

@agramfort I have cleaned up the code, and set stopping=dual_gap by default. I would like some feedback.

@MechCoder (Member Author)

I've added a test as a proof of concept to show that this is actually working.

The current cd algorithm breaks when the dual gap, computed once the biggest coordinate update is less than the tolerance, is itself below the tolerance. Glmnet however checks if the max change in the objective is less than tol and then breaks. This surprisingly leads to a huge speedup with almost no noticeable regression in prediction.

The same stopping criterion is extended to the MultiTask models.

Since the default tolerances of the two stopping conditions are different, for the objective condition tol is set by default to 1e-7, and for the dual gap tol is set to 1e-4. The Cython code has also been updated to take care of these conditions.

Added a test to show that the number of passes made is smaller for stopping="objective" even on a small dataset.
@MechCoder (Member Author)

I believe this PR is hugely important, and some initial feedback would be great.

ping @larsmans @mblondel

@mblondel (Member) commented Oct 8, 2014

Could you clarify why this speed up is possible? Is it because the duality gap is expensive to compute?

    if w[ii] != w_ii:
        # R_norm2 = np.dot(R, R)
        R_norm2 = ddot(n_samples, <DOUBLE*>R.data, 1,
                       <DOUBLE*>R.data, 1)
Member (review comment on the diff above):

It might be an idea to factor this out:

    cdef inline double squared_norm(const double *x, int n, int stride) nogil:
        return ddot(n, x, stride, x, stride)

(and maybe the <double *>x.data stuff as well).
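
For illustration, the call site in the excerpt above would then reduce to something like this (hypothetical, following the suggested helper):

    # R_norm2 = np.dot(R, R)
    R_norm2 = squared_norm(<DOUBLE*>R.data, n_samples, 1)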

@MechCoder MechCoder closed this Oct 8, 2014
@MechCoder (Member Author)

@mblondel @agramfort It just seems that we have a lower default tolerance than glmnet. That is the reason it makes more passes over the data :(

@GaelVaroquaux (Member)

It just seems that we have a lower default tolerance than glmnet. That is the reason it makes more passes over the data :(

Maybe we should change our default. Is there a good reason why it is
there? People are going to compare us to glmnet, using the defaults, and
find that glmnet is faster, and go on with using glmnet.

@MechCoder (Member Author)

git blame points me to Alex.

@agramfort (Member)

everybody can change his mind based on experimental evidence :)

@MechCoder MechCoder deleted the better_stopping branch December 3, 2014 21:49
@MechCoder (Member Author)

Sorry for bringing this back, but what is the status on increasing the default tolerance? When can we convince ourselves that the tolerance can be raised up to 1e-2?

@GaelVaroquaux (Member)

I am +1 on raising the tol. It seems to me that 1e-2 is a lot, though.


@mblondel (Member) commented Dec 4, 2014

To choose the default tol, try different values on a few real datasets
(regression and binary classification). Then choose a value for which test
accuracy is within 0.01 percent of the test accuracy with tol large.
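
A sketch of that selection protocol in Python. The dataset, alpha, and the candidate grid are placeholders, and the reference score is taken here with a very strict tol (reading the comparison above as being against a well-converged baseline):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    # placeholder data; in practice repeat over a few real regression
    # and binary classification datasets
    X, y = make_regression(n_samples=200, n_features=2000, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33,
                                              random_state=0)

    # reference score with a very strict tolerance
    ref = Lasso(alpha=1.0, tol=1e-10, max_iter=100000).fit(X_tr, y_tr)
    ref_score = r2_score(y_te, ref.predict(X_te))

    # pick the loosest tol whose test score stays within 0.01 percent
    # of the reference
    for tol in [1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7]:
        est = Lasso(alpha=1.0, tol=tol, max_iter=100000).fit(X_tr, y_tr)
        score = r2_score(y_te, est.predict(X_te))
        if abs(score - ref_score) <= 1e-4 * abs(ref_score):
            print("candidate default tol:", tol, "score:", score)
            break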
