[MRG+1] Removing repeated input checking in Lasso and DictLearning by arthurmensch · Pull Request #5133 · scikit-learn/scikit-learn · GitHub

[MRG+1] Removing repeated input checking in Lasso and DictLearning #5133


Merged · 1 commit merged on Aug 20, 2015
4 changes: 4 additions & 0 deletions doc/whats_new.rst
@@ -105,6 +105,10 @@ Enhancements
with ``n_jobs > 1`` used with a large grid of parameters on a small
dataset. By `Vlad Niculae`_, `Olivier Grisel`_ and `Loic Esteve`_.

- Improved speed (3 times per iteration) of
:class:`decomposition.DictLearning` with coordinate descent method
from :class:`linear_model.Lasso`. By `Arthur Mensch`_.


Bug fixes
.........
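The enhancement entry above concerns the online dictionary learning path. As a rough, hedged sketch of how that path is exercised (assuming scikit-learn's MiniBatchDictionaryLearning with fit_algorithm='cd', which routes sparse coding through linear_model.Lasso; sizes are illustrative, not a benchmark):

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(500, 64)

# fit_algorithm='cd' sends the sparse coding step through Lasso with a
# precomputed Gram matrix -- the code path whose repeated input checks
# this PR removes.
dico = MiniBatchDictionaryLearning(n_components=30, alpha=1.0,
                                   fit_algorithm='cd', random_state=0)
code = dico.fit(X).transform(X)
print(code.shape)   # (500, 30)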
32 changes: 23 additions & 9 deletions sklearn/decomposition/dict_learning.py
@@ -107,10 +107,10 @@ def _sparse_encode(X, dictionary, gram, cov=None, algorithm='lasso_lars',

elif algorithm == 'lasso_cd':
alpha = float(regularization) / n_features # account for scaling
clf = Lasso(alpha=alpha, fit_intercept=False, precompute=gram,
max_iter=max_iter, warm_start=True)
clf = Lasso(alpha=alpha, fit_intercept=False, normalize=False,
Member: this should be three lines only.

Contributor Author: Addressed.

precompute=gram, max_iter=max_iter, warm_start=True)
Member: This makes me realise that we always use Lasso with a precomputed Gram matrix in the context of online dictionary learning. Have you checked whether this is actually always a good idea from a performance point of view?

If instead we want to allow the use of coordinate descent with row slices (sample-wise minibatches) of a Fortran-aligned feature array, then we would need to change the Cython code of coordinate descent to deal properly with strided arrays (views) in the daxpy calls if we want to leverage the check_input=False code path safely.

This does not seem hard, but maybe it should be investigated in a separate PR.
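To make the trade-off in the comment above concrete, here is a small sketch of the Gram-based call pattern that _sparse_encode uses (mirroring the diff above; dictionary and batch sizes are illustrative only, not performance measurements):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
dictionary = rng.randn(30, 100)        # (n_components, n_features)
X = rng.randn(50, 100)                 # one minibatch of samples

# The Gram matrix is computed once per dictionary update ...
gram = np.dot(dictionary, dictionary.T)

# ... and reused for every sample in the batch by the coordinate
# descent solver.
clf = Lasso(alpha=0.1, fit_intercept=False, precompute=gram, warm_start=True)
clf.fit(dictionary.T, X.T)
code = clf.coef_                       # (n_samples, n_components) sparse codes
print(code.shape)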

clf.coef_ = init
clf.fit(dictionary.T, X.T)
clf.fit(dictionary.T, X.T, check_input=False)
new_code = clf.coef_

elif algorithm == 'lars':
@@ -224,8 +224,10 @@ def sparse_encode(X, dictionary, gram=None, cov=None, algorithm='lasso_lars',
n_components = dictionary.shape[0]

if gram is None and algorithm != 'threshold':
gram = np.dot(dictionary, dictionary.T)
if cov is None:
# Transposing product to ensure Fortran ordering
gram = np.dot(dictionary, dictionary.T).T
Member: why this?

Contributor Author: This ensures that the Gram matrix is in Fortran order.

Member: Please add an inline comment to make that optimization explicit.
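The optimization under discussion relies on a NumPy layout detail; a quick plain-NumPy sketch (sizes illustrative) of why transposing the product yields a Fortran-ordered Gram matrix:

import numpy as np

D = np.random.rand(50, 100)            # dictionary, C-ordered by default
gram_c = np.dot(D, D.T)                # dot of C-ordered arrays is C-contiguous
gram_f = np.dot(D, D.T).T              # transposing the symmetric product
                                       # flips the memory layout to Fortran order
assert gram_c.flags['C_CONTIGUOUS']
assert gram_f.flags['F_CONTIGUOUS']
np.testing.assert_allclose(gram_c, gram_f)   # same values, different layout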


if cov is None and algorithm != 'lasso_cd':
copy_cov = False
cov = np.dot(dictionary, X.T)

@@ -239,18 +241,27 @@ def sparse_encode(X, dictionary, gram=None, cov=None, algorithm='lasso_lars',
regularization = 1.

if n_jobs == 1 or algorithm == 'threshold':
return _sparse_encode(X, dictionary, gram, cov=cov,
code = _sparse_encode(X,
dictionary, gram, cov=cov,
algorithm=algorithm,
regularization=regularization, copy_cov=copy_cov,
init=init, max_iter=max_iter)
init=init,
max_iter=max_iter)
# This ensures that dimensionality of code is always 2,
# consistent with the case n_jobs > 1
if code.ndim == 1:
code = code[np.newaxis, :]
return code

# Enter parallel code block
code = np.empty((n_samples, n_components))
slices = list(gen_even_slices(n_samples, _get_n_jobs(n_jobs)))

code_views = Parallel(n_jobs=n_jobs)(
delayed(_sparse_encode)(
X[this_slice], dictionary, gram, cov[:, this_slice], algorithm,
X[this_slice], dictionary, gram,
cov[:, this_slice] if cov is not None else None,
algorithm,
regularization=regularization, copy_cov=copy_cov,
init=init[this_slice] if init is not None else None,
max_iter=max_iter)
@@ -639,7 +650,6 @@ def dict_learning_online(X, n_components=2, alpha=1, n_iter=100,
else:
dictionary = np.r_[dictionary,
np.zeros((n_components - r, dictionary.shape[1]))]
dictionary = np.ascontiguousarray(dictionary.T)

if verbose == 1:
print('[dict_learning]', end=' ')
@@ -650,6 +660,10 @@
else:
X_train = X

dictionary = check_array(dictionary.T, order='F', dtype=np.float64,
copy=False)
X_train = check_array(X_train, order='C', dtype=np.float64, copy=False)

batches = gen_batches(n_samples, batch_size)
batches = itertools.cycle(batches)

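As a hedged, shapes-only check of the behaviour the added newaxis branch is meant to guarantee (using the public sparse_encode function; sizes are illustrative):

import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.RandomState(0)
dictionary = rng.randn(15, 40)         # (n_components, n_features)
X = rng.randn(5, 40)                   # (n_samples, n_features)

# The returned code array should be 2-D with the same shape whether the
# serial or the parallel branch is taken.
code_serial = sparse_encode(X, dictionary, algorithm='lasso_cd', n_jobs=1)
code_parallel = sparse_encode(X, dictionary, algorithm='lasso_cd', n_jobs=2)
assert code_serial.shape == code_parallel.shape == (5, 15)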
22 changes: 16 additions & 6 deletions sklearn/linear_model/base.py
@@ -15,6 +15,7 @@
from __future__ import division
from abc import ABCMeta, abstractmethod
import numbers
import warnings

import numpy as np
import scipy.sparse as sp
@@ -397,7 +398,8 @@ def fit(self, X, y):
return self


def _pre_fit(X, y, Xy, precompute, normalize, fit_intercept, copy):
def _pre_fit(X, y, Xy, precompute, normalize, fit_intercept, copy,
Xy_precompute_order=None):
"""Aux function used at beginning of fit in linear models"""
n_samples, n_features = X.shape
if sparse.isspmatrix(X):
@@ -408,10 +410,13 @@ def _pre_fit(X, y, Xy, precompute, normalize, fit_intercept, copy):
# copy was done in fit if necessary
X, y, X_mean, y_mean, X_std = center_data(
X, y, fit_intercept, normalize, copy=copy)

if hasattr(precompute, '__array__') \
and not np.allclose(X_mean, np.zeros(n_features)) \
and not np.allclose(X_std, np.ones(n_features)):
if hasattr(precompute, '__array__') and (
fit_intercept and not np.allclose(X_mean, np.zeros(n_features))
or normalize and not np.allclose(X_std, np.ones(n_features))):
warnings.warn("Gram matrix was provided but X was centered"
" to fit intercept, "
"or X was normalized : recomputing Gram matrix.",
UserWarning)
# recompute Gram
precompute = 'auto'
Xy = None
@@ -422,11 +427,16 @@

if precompute is True:
precompute = np.dot(X.T, X)
if Xy_precompute_order == 'F':
precompute = np.dot(X.T, X).T

if not hasattr(precompute, '__array__'):
Xy = None # cannot use Xy if precompute is not Gram

if hasattr(precompute, '__array__') and Xy is None:
Xy = np.dot(X.T, y)
if Xy_precompute_order == 'F':
Xy = np.dot(y.T, X).T
else:
Xy = np.dot(X.T, y)

return X, y, X_mean, y_mean, X_std, precompute, Xy
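To illustrate why _pre_fit discards a user-supplied Gram matrix in that branch, a short plain-NumPy sketch (illustrative data): centering X to fit an intercept changes the Gram matrix, so a Gram precomputed on the raw data no longer matches.

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 5)
gram = X.T.dot(X)                      # Gram of the raw (uncentered) data

Xc = X - X.mean(axis=0)                # what fit_intercept=True actually operates on
gram_centered = Xc.T.dot(Xc)

# The supplied Gram no longer matches the centered data, hence the
# warning and the fallback to recomputing it.
print(np.abs(gram - gram_centered).max())   # clearly non-zero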
55 changes: 37 additions & 18 deletions sklearn/linear_model/coordinate_descent.py
@@ -359,11 +359,18 @@ def enet_path(X, y, l1_ratio=0.5, eps=1e-3, n_alphas=100, alphas=None,
ElasticNet
ElasticNetCV
"""
X = check_array(X, 'csc', dtype=np.float64, order='F', copy=copy_X)
y = check_array(y, 'csc', dtype=np.float64, order='F', copy=False, ensure_2d=False)
if Xy is not None:
Xy = check_array(Xy, 'csc', dtype=np.float64, order='F', copy=False,
ensure_2d=False)
# We expect X and y to be already float64 Fortran ordered when bypassing
# checks
check_input = 'check_input' not in params or params['check_input']
pre_fit = 'check_input' not in params or params['pre_fit']
if check_input:
X = check_array(X, 'csc', dtype=np.float64, order='F', copy=copy_X)
y = check_array(y, 'csc', dtype=np.float64, order='F', copy=False,
ensure_2d=False)
if Xy is not None:
Xy = check_array(Xy, 'csc', dtype=np.float64, order='F',
copy=False,
ensure_2d=False)
n_samples, n_features = X.shape

multi_output = False
@@ -380,10 +387,13 @@ def enet_path(X, y, l1_ratio=0.5, eps=1e-3, n_alphas=100, alphas=None,
else:
X_sparse_scaling = np.zeros(n_features)

# X should be normalized and fit already.
X, y, X_mean, y_mean, X_std, precompute, Xy = \
_pre_fit(X, y, Xy, precompute, normalize=False, fit_intercept=False,
copy=False)
# X should be normalized and fit already if function is called
# from ElasticNet.fit
if pre_fit:
X, y, X_mean, y_mean, X_std, precompute, Xy = \
_pre_fit(X, y, Xy, precompute, normalize=False,
fit_intercept=False,
copy=False, Xy_precompute_order='F')
if alphas is None:
# No need to normalize of fit_intercept: it has been done
# above
@@ -428,7 +438,11 @@ def enet_path(X, y, l1_ratio=0.5, eps=1e-3, n_alphas=100, alphas=None,
model = cd_fast.enet_coordinate_descent_multi_task(
coef_, l1_reg, l2_reg, X, y, max_iter, tol, rng, random)
elif isinstance(precompute, np.ndarray):
precompute = check_array(precompute, 'csc', dtype=np.float64, order='F')
# We expect precompute to be already Fortran ordered when bypassing
# checks
if check_input:
precompute = check_array(precompute, 'csc', dtype=np.float64,
order='F')
model = cd_fast.enet_coordinate_descent_gram(
coef_, l1_reg, l2_reg, precompute, Xy, y, max_iter,
tol, rng, random, positive)
@@ -601,7 +615,7 @@ def __init__(self, alpha=1.0, l1_ratio=0.5, fit_intercept=True,
self.random_state = random_state
self.selection = selection

def fit(self, X, y):
def fit(self, X, y, check_input=True):
"""Fit model with coordinate descent.

Parameters
@@ -622,6 +636,7 @@ def fit(self, X, y, check_input=True):
To avoid memory re-allocation it is advised to allocate the
initial data in memory directly using that format.
"""

if self.alpha == 0:
warnings.warn("With alpha=0, this algorithm does not converge "
"well. You are advised to use the LinearRegression "
@@ -632,14 +647,16 @@ def fit(self, X, y, check_input=True):
"slower even when n_samples > n_features. Hence "
"it will be removed in 0.18.",
DeprecationWarning, stacklevel=2)

X, y = check_X_y(X, y, accept_sparse='csc', dtype=np.float64,
order='F', copy=self.copy_X and self.fit_intercept,
multi_output=True, y_numeric=True)

# We expect X and y to be already float64 Fortran ordered arrays
# when bypassing checks
if check_input:
X, y = check_X_y(X, y, accept_sparse='csc', dtype=np.float64,
order='F',
copy=self.copy_X and self.fit_intercept,
multi_output=True, y_numeric=True)
X, y, X_mean, y_mean, X_std, precompute, Xy = \
_pre_fit(X, y, None, self.precompute, self.normalize,
self.fit_intercept, copy=True)
self.fit_intercept, copy=False, Xy_precompute_order='F')

if y.ndim == 1:
y = y[:, np.newaxis]
Expand Down Expand Up @@ -678,7 +695,9 @@ def fit(self, X, y):
X_mean=X_mean, X_std=X_std, return_n_iter=True,
coef_init=coef_[k], max_iter=self.max_iter,
random_state=self.random_state,
selection=self.selection)
selection=self.selection,
check_input=False,
pre_fit=False)
coef_[k] = this_coef[:, 0]
dual_gaps_[k] = this_dual_gap[0]
self.n_iter_.append(this_iter[0])
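A minimal usage sketch of the fast path added here, assuming the new check_input keyword on ElasticNet/Lasso fit and a caller that has already produced float64, Fortran-ordered inputs (illustrative data, not a benchmark):

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.utils import check_array

rng = np.random.RandomState(0)
X = rng.randn(200, 50)
y = rng.randn(200)

# Validate and convert once, up front: float64, Fortran-ordered X.
X = check_array(X, order='F', dtype=np.float64)
y = check_array(y, dtype=np.float64, ensure_2d=False)

clf = Lasso(alpha=0.1, fit_intercept=False, warm_start=True)
for alpha in (1.0, 0.5, 0.1):
    clf.set_params(alpha=alpha)
    # Safe to skip the per-call validation: the layout already matches
    # what the coordinate descent solver expects.
    clf.fit(X, y, check_input=False)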
38 changes: 38 additions & 0 deletions sklearn/linear_model/tests/test_coordinate_descent.py
@@ -17,6 +17,7 @@
from sklearn.utils.testing import assert_greater
from sklearn.utils.testing import assert_raises
from sklearn.utils.testing import assert_warns
from sklearn.utils.testing import assert_warns_message
from sklearn.utils.testing import ignore_warnings
from sklearn.utils.testing import assert_array_equal
from sklearn.utils.testing import TempMemmap
@@ -25,6 +26,7 @@
LassoCV, ElasticNet, ElasticNetCV, MultiTaskLasso, MultiTaskElasticNet, \
MultiTaskElasticNetCV, MultiTaskLassoCV, lasso_path, enet_path
from sklearn.linear_model import LassoLarsCV, lars_path
from sklearn.utils import check_array


def check_warnings():
@@ -628,3 +630,39 @@ def test_sparse_dense_descent_paths():
_, coefs, _ = path(X, y, fit_intercept=False)
_, sparse_coefs, _ = path(csr, y, fit_intercept=False)
assert_array_almost_equal(coefs, sparse_coefs)


def test_check_input_false():
X, y, _, _ = build_dataset(n_samples=20, n_features=10)
X = check_array(X, order='F', dtype='float64')
y = check_array(X, order='F', dtype='float64')
clf = ElasticNet(selection='cyclic', tol=1e-8)
# Check that no error is raised if data is provided in the right format
clf.fit(X, y, check_input=False)
X = check_array(X, order='F', dtype='float32')
Member: I guess that if you try with the correct dtype but the wrong memory layout:

X = check_array(X, order='C', dtype='float64')

you should not get an error but wrong results. This is expected, but maybe this should be asserted in the tests as well.

clf.fit(X, y, check_input=True)
Member: You meant check_input=False here I guess. Does this test pass?

Member: Sorry I did not read this line and the following line correctly...

# Check that an error is raised if data is provided in the wrong format,
# because of check bypassing
assert_raises(ValueError, clf.fit, X, y, check_input=False)

# With no input checking, providing X in C order should result in a wrong
# computation
X = check_array(X, order='C', dtype='float64')
Member: Maybe insert an inline comment before this line to explain the following assertions.

clf.fit(X, y, check_input=False)
coef_false = clf.coef_
clf.fit(X, y, check_input=True)
coef_true = clf.coef_
assert_raises(AssertionError, assert_array_almost_equal,
coef_true, coef_false)


Member: extra line.

def test_overrided_gram_matrix():
X, y, _, _ = build_dataset(n_samples=20, n_features=10)
Gram = X.T.dot(X)
clf = ElasticNet(selection='cyclic', tol=1e-8, precompute=Gram,
fit_intercept=True)
assert_warns_message(UserWarning,
"Gram matrix was provided but X was centered"
" to fit intercept, "
"or X was normalized : recomputing Gram matrix.",
clf.fit, X, y)
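Related to the review comments on memory layout above, a tiny NumPy-only sketch of how a caller can verify the layout that the check_input=False path assumes:

import numpy as np

X = np.random.rand(20, 10)             # NumPy defaults to C order
print(X.flags['F_CONTIGUOUS'])         # False -> not safe for check_input=False
X_f = np.asfortranarray(X, dtype=np.float64)
print(X_f.flags['F_CONTIGUOUS'])       # True  -> checks can be bypassed safely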