[MRG+1] Removing repeated input checking in Lasso and DictLearning #5133
Conversation
@@ -421,12 +426,12 @@ def _pre_fit(X, y, Xy, precompute, normalize, fit_intercept, copy):
         precompute = (n_samples > n_features)

     if precompute is True:
-        precompute = np.dot(X.T, X)
+        precompute = np.dot(X.T, X).T
Same remark here.
Do we really want to add this example? Travis is not happy.
@@ -0,0 +1,104 @@
+"""
Don't we already have this here: http://scikit-learn.org/dev/auto_examples/decomposition/plot_faces_decomposition.html ?
This example has no point on its own; I only use it for benchmarking. I am putting it into a separate gist.
I added a pool context manager in

On a side note, I encountered problems with the context manager in

We obtain a small improvement working on samples of dimension 409 600 and batch_size = 400, as

We should thus allow multiprocessing only past a certain threshold of
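The context manager itself is not shown in this thread; for readers, a generic sketch of the pattern using multiprocessing.Pool directly (which may differ from what the PR actually does):

import some_module  # placeholder comment removed below; real imports follow
from contextlib import contextmanager
from multiprocessing import Pool

@contextmanager
def pool_context(n_jobs):
    # Create a worker pool, hand it to the caller, and always clean it up.
    pool = Pool(n_jobs)
    try:
        yield pool
    finally:
        pool.close()
        pool.join()

# Example use:
# with pool_context(4) as pool:
#     results = pool.map(my_function, my_iterable)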
@@ -397,7 +398,8 @@ def fit(self, X, y):
         return self


-def _pre_fit(X, y, Xy, precompute, normalize, fit_intercept, copy):
+def _pre_fit(X, y, Xy, precompute, normalize, fit_intercept, copy,
+             aux_order=None):
I don't understand what "aux" stands for. Is it an acronym? A more explicit name is probably required.
Basically this flag asks for Fortran-ordered Xy and precompute. auxiliary_matrix_order?
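For readers, a small illustration of what a Fortran-ordered auxiliary matrix means here (not PR code; note that the .T trick in the diff above works because the Gram matrix is symmetric, so transposing only changes the memory layout, not the values):

import numpy as np

rng = np.random.RandomState(0)
X = np.asfortranarray(rng.randn(100, 20))
y = rng.randn(100)

gram_copy = np.asfortranarray(np.dot(X.T, X))  # explicit copy into Fortran order
gram_view = np.dot(X.T, X).T                   # no copy: the transpose of a
                                               # C-ordered result is a Fortran view
Xy = np.dot(X.T, y)                            # 1-D, so ordering does not matter

print(gram_copy.flags['F_CONTIGUOUS'], gram_view.flags['F_CONTIGUOUS'])  # True True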
In order to move forward, I think we should focus on fixing this PR (address the remaining comments and make the tests pass) and move the introduction of a managed parallel context to a later PR.
@@ -585,7 +599,7 @@ class ElasticNet(LinearModel, RegressorMixin):
     def __init__(self, alpha=1.0, l1_ratio=0.5, fit_intercept=True,
                  normalize=False, precompute=False, max_iter=1000,
                  copy_X=True, tol=1e-4, warm_start=False, positive=False,
-                 random_state=None, selection='cyclic'):
+                 random_state=None, selection='cyclic', check_input=True):
Other scikit-learn models use check_input only as a per-method parameter, rather than as a global constructor parameter on the estimator. I think we should stay consistent with that pattern.
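For reference, a minimal sketch of that per-method pattern (the estimator below is hypothetical; only the placement of check_input on fit rather than __init__ is the point):

import numpy as np
from sklearn.utils import check_array

class MyEstimator(object):
    def __init__(self, alpha=1.0):
        # No check_input here: validation behaviour is not estimator state.
        self.alpha = alpha

    def fit(self, X, y, check_input=True):
        if check_input:
            # Callers that already validated X and y can skip this work.
            X = check_array(X, order='F', dtype=np.float64)
            y = np.asarray(y, dtype=np.float64)
        # ... actual fitting would go here ...
        self.coef_ = np.zeros(X.shape[1])
        return self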
Comparison with master for 1 core, bypassing checks: green is master, blue is this PR. Average improvement for a single iteration: ×3.47. Time before the first iteration is a little longer, as we perform initial input checking within
force-pushed from 2dd6b5a to 9b2c894
I will wait for reviews before opening the multiprocessing PR. Travis does not complain anymore. I have not looked into the lars code (which is way slower for dictionary learning in scikit-learn, especially with this improvement), but it might be worth it to make the same changes there.
clf = ElasticNet(selection='cyclic', tol=1e-8)
# Check that no error is raised if data is provided in the right format
clf.fit(X, y, check_input=False)
X = check_array(X, order='F', dtype='float32')
I guess that if you try with the correct dtype but the wrong memory layout:

X = check_array(X, order='C', dtype='float64')

you should not get an error but wrong results. This is expected, but maybe this should be asserted in the tests as well.
This comment has not been addressed: https://github.com/scikit-learn/scikit-learn/pull/5133/files#r37388137
-clf = Lasso(alpha=alpha, fit_intercept=False, precompute=gram,
-            max_iter=max_iter, warm_start=True)
+clf = Lasso(alpha=alpha, fit_intercept=False, normalize=False,
+            precompute=gram, max_iter=max_iter, warm_start=True)
This makes me realise that we always use Lasso with a precomputed Gram matrix in the context of online dictionary learning. Have you checked whether this is actually always a good idea from a performance point of view?

If instead we want to allow the use of coordinate descent with row slices (sample-wise minibatches) of a Fortran-aligned feature array, then we would need to change the Cython code of coordinate descent to deal properly with strided arrays (views) in the daxpy calls if we want to leverage the check_input=False code path safely.

This does not seem to be hard, but maybe this should be investigated in a separate PR.
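One rough way to answer that performance question empirically (a sketch, not part of the PR; problem sizes are arbitrary):

from time import time
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
D = rng.randn(400, 100)      # stand-in for one (n_samples, n_components) subproblem
x = rng.randn(400)
gram = np.dot(D.T, D)

for label, precompute in [('precomputed Gram', gram), ('no Gram', False)]:
    clf = Lasso(alpha=0.01, fit_intercept=False, precompute=precompute,
                max_iter=1000, warm_start=True)
    t0 = time()
    for _ in range(50):
        clf.fit(D, x)
    print('%s: %.3f s' % (label, time() - t0))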
This looks good. Please add an entry in

Also please squash this PR into a single commit.
force-pushed from 3debe2b to 77b7012
@@ -77,7 +78,7 @@ def sparse_center_data(X, y, fit_intercept, normalize=False):
     return X, y, X_mean, y_mean, X_std


-def center_data(X, y, fit_intercept, normalize=False, copy=True,
+def center_data(X, y, fit_intercept, normalize=False, copy=True,
Oops, extra space.
coef_false = clf.coef_
clf.fit(X, y, check_input=True)
coef_true = clf.coef_
assert_true(np.any(coef_true != coef_false))
use assert_array_almost_equal
Besides that, LGTM if Travis is happy.
force-pushed from 9575b31 to 2ed0c22
Done.
Great, thanks for the optimization, @arthurmensch!
I use:

- a check_input flag in coordinate_descent.enet_path, which is set to False when called from ElasticNet.fit
- a check_input flag in coordinate_descent.ElasticNet.fit, which is set to False when called from sparse_encode

I changed the condition for overriding the provided Gram matrix in linear_model.base._pre_fit, in order not to do a pass on the data when fit_intercept and normalize are set to False (in master, we override the Gram matrix if we find that X was changed by centering and rescaling, even when these are disabled, which wastes computation).

I also avoid computing cov in sparse_encode when using coordinate descent, as the cov computation is done within the Lasso class.

On the provided plot_online_sparse_pca example, we gain a factor of 2 in performance.
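As a usage illustration of the check_input flag described above (a sketch: validate the data once up front, then bypass the repeated per-call checks; fit_intercept=False keeps the bypassed path simple):

import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.utils import check_array

rng = np.random.RandomState(0)
X = rng.randn(200, 50)
y = rng.randn(200)

# Validate and convert once, outside the loop...
X = check_array(X, order='F', dtype=np.float64)
y = np.ascontiguousarray(y, dtype=np.float64)

clf = ElasticNet(alpha=0.1, fit_intercept=False, warm_start=True)
for _ in range(10):
    # ...then skip the repeated validation on every call.
    clf.fit(X, y, check_input=False)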