[MRG+1] Fix omp non normalize #10071
@@ -91,7 +92,6 @@ def _cholesky_omp(X, y, n_nonzero_coefs, tol=None, copy_X=True,
# old scipy, we need the garbage upper triangle to be non-Inf
L = np.zeros((max_features, max_features), dtype=X.dtype)

L[0, 0] = 1.
L is the Cholesky factor of the Gram matrix. L[0, 0] is 1 only when the columns of X are normalized.
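To illustrate the point above, here is a minimal NumPy sketch (not the scikit-learn code; the variable names and the random data are assumptions made for illustration). It checks that the first diagonal entry of the Cholesky factor of the Gram matrix equals the norm of the first column, so it is 1 only for unit-norm columns.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical design matrix whose columns are deliberately NOT unit-norm.
X = rng.standard_normal((20, 5)) * 10

# Lower-triangular Cholesky factor of the Gram matrix: X.T @ X == L @ L.T
L = np.linalg.cholesky(X.T @ X)
first_col_norm = np.linalg.norm(X[:, 0])

# Same computation after scaling every column to unit norm.
X_unit = X / np.linalg.norm(X, axis=0)
L_unit = np.linalg.cholesky(X_unit.T @ X_unit)

print(L[0, 0], first_col_norm)  # these two values agree
print(L_unit[0, 0])             # close to 1 for normalized columns
```

Hard-coding `L[0, 0] = 1.` is therefore only valid in the normalized case.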
@@ -110,17 +111,23 @@ def _cholesky_omp(X, y, n_nonzero_coefs, tol=None, copy_X=True,
overwrite_b=True,
**solve_triangular_args)
v = nrm2(L[n_active, :n_active]) ** 2
if 1 - v <= min_float: # selected atoms are dependent
Lkk = linalg.norm(X[:, lam]) ** 2 - v
Same here: the quantity is not 1 - v but ||Xk||^2 - v.
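A rough sketch of why the squared diagonal term is ||Xk||^2 - v, using plain NumPy rather than the BLAS/LAPACK helpers in the diff (all names and data here are illustrative assumptions). It extends the Cholesky factor of the active atoms' Gram matrix by one atom and compares the result with a factorization computed from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dictionary with non-normalized columns.
X = rng.standard_normal((20, 4)) * 10

k = 3                          # number of atoms already active
X_active = X[:, :k]
x_new = X[:, k]                # atom being added

L_k = np.linalg.cholesky(X_active.T @ X_active)

# New off-diagonal row: solve L_k w = X_active^T x_new
w = np.linalg.solve(L_k, X_active.T @ x_new)
v = w @ w
Lkk = x_new @ x_new - v        # squared diagonal: ||x_new||^2 - v, not 1 - v

L_next = np.zeros((k + 1, k + 1))
L_next[:k, :k] = L_k
L_next[k, :k] = w
L_next[k, k] = np.sqrt(Lkk)

# Reference: Cholesky of the full Gram matrix computed directly.
L_ref = np.linalg.cholesky(X.T @ X)
```

When every column has unit norm, `x_new @ x_new` is 1 and the expression reduces to the old `1 - v`.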
@@ -22,6 +22,9 @@
n_samples, n_features, n_nonzero_coefs, n_targets = 20, 30, 5, 3
y, X, gamma = make_sparse_coded_signal(n_targets, n_features, n_samples,
n_nonzero_coefs, random_state=0)
# Make X not of norm 1 for testing
X *= 10
y *= 10
Now all tests are run with non-normalized X.
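As a sanity check on this test setup, the sketch below uses a naive least-squares OMP written for illustration only (`naive_omp` is a hypothetical helper, not the scikit-learn implementation, and the data are random assumptions). Scaling both X and y by 10 should leave the recovered coefficients unchanged, which is the kind of property a correct implementation has to satisfy on non-normalized input.

```python
import numpy as np

def naive_omp(X, y, n_nonzero_coefs):
    """Naive OMP: greedy atom selection, then least squares on the active set."""
    residual = y.copy()
    active = []
    for _ in range(n_nonzero_coefs):
        corr = np.abs(X.T @ residual)
        if active:
            corr[active] = 0.0          # never pick the same atom twice
        active.append(int(np.argmax(corr)))
        sol, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        residual = y - X[:, active] @ sol
    coef = np.zeros(X.shape[1])
    coef[active] = sol
    return coef

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 30))
X /= np.linalg.norm(X, axis=0)          # start from unit-norm columns
gamma = np.zeros(30)
gamma[rng.choice(30, size=5, replace=False)] = rng.standard_normal(5)
y = X @ gamma

coef_unit = naive_omp(X, y, 5)
coef_scaled = naive_omp(10 * X, 10 * y, 5)  # columns now have norm 10
```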
coef_normalized = omp.coef_[0].copy()
omp.set_params(fit_intercept=True, normalize=False)
omp.fit(X, y[:, 0])
assert_array_almost_equal(coef_normalized, omp.coef_)
Here I check that coef_ is the same whether or not you normalize during fit.
@vene, could you have a look?
LGTM
I don't know OMP or the linalg well, but from a non-expert perspective, this looks well-justified and tested. LGTM
@@ -120,6 +120,10 @@ Classifiers and regressors
error for prior list which summed to 1.
:issue:`10005` by :user:`Gaurav Dhingra <gxyd>`.

- Fixed a bug in :class:`linear_model.OrthogonalMatchingPursuit` that was
I suppose this should also be mentioned under the changed models section above, so that users expect different results without having to read the full changelog.
doc/whats_new/v0.20.rst
Outdated
@@ -120,6 +120,10 @@ Classifiers and regressors
error for prior list which summed to 1.
:issue:`10005` by :user:`Gaurav Dhingra <gxyd>`.

- Fixed a bug in :class:`linear_model.OrthogonalMatchingPursuit` that was
broken for X having non-normalized columns.
And it's not going to have any effect for uses where normalize=True?
Codecov Report
@@ Coverage Diff @@
## master #10071 +/- ##
==========================================
+ Coverage 96.19% 96.19% +<.01%
==========================================
Files 336 336
Lines 62739 62749 +10
==========================================
+ Hits 60353 60363 +10
Misses 2386 2386
@jnothman I rephrased the what's new entry.
I'd not meant to "request changes". Thanks.
* fix OMP when columns of X are not normalized
Reference Issues/PRs
There was no open issue; I fixed it directly. Basically, OMP was returning
garbage as soon as the norm of any column of X was not 1. This happens
when normalize=False is set.
What does this implement/fix? Explain your changes.
See above.
Any other comments?
Thanks @josephsalmon for pointing out the bug to me.