[MRG+1] Fix omp non normalize by agramfort · Pull Request #10071 · scikit-learn/scikit-learn

Merged: 5 commits merged on Nov 16, 2017

Conversation

agramfort (Member):

Reference Issues/PRs

There was no open issue; I fixed it directly. Basically, OMP was returning
garbage as soon as the norms of the columns of X were not 1. This happens
when setting the option normalize=False.

What does this implement/fix? Explain your changes.

cf above

Any other comments?

Thanks @josephsalmon for pointing out the bug to me.
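
A minimal sketch of the failure mode, assuming the 2017-era API in which `OrthogonalMatchingPursuit` still accepted a `normalize` parameter (it was removed from this estimator in later scikit-learn versions); the data shapes mirror the test setup touched by this PR:

```python
import numpy as np
from sklearn.datasets import make_sparse_coded_signal
from sklearn.linear_model import OrthogonalMatchingPursuit

# Generate a signal with a known 5-sparse code, then destroy the unit norms.
y, X, gamma = make_sparse_coded_signal(n_samples=3, n_components=30,
                                       n_features=20, n_nonzero_coefs=5,
                                       random_state=0)
X *= 10  # columns of X no longer have norm 1
y *= 10  # y = X @ gamma still holds after scaling both sides

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, normalize=False)
omp.fit(X, y[:, 0])

# Before this fix the recovered support was garbage; with the fix it
# matches the support of the generating sparse code.
print(np.flatnonzero(omp.coef_))
print(np.flatnonzero(gamma[:, 0]))
```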

```diff
@@ -91,7 +92,6 @@ def _cholesky_omp(X, y, n_nonzero_coefs, tol=None, copy_X=True,
         # old scipy, we need the garbage upper triangle to be non-Inf
         L = np.zeros((max_features, max_features), dtype=X.dtype)
 
-    L[0, 0] = 1.
```
agramfort (Member Author):
L is the Cholesky factor of the Gram matrix. Its first diagonal entry is only 1 for normalized columns.
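
A quick numerical illustration of that point (a hedged sketch, not the PR's code): for the Gram matrix G = Xᵀ X, the Cholesky factor satisfies L[0, 0] = sqrt(G[0, 0]) = ||X[:, 0]||, which equals 1 only for unit-norm columns:

```python
import numpy as np

rng = np.random.RandomState(0)
X = 10 * rng.randn(20, 5)  # columns deliberately not unit-norm

# Cholesky factor of the Gram matrix: L @ L.T == X.T @ X
L = np.linalg.cholesky(X.T @ X)

print(np.allclose(L[0, 0], np.linalg.norm(X[:, 0])))  # True
print(np.isclose(L[0, 0], 1.0))  # False: hardcoding L[0, 0] = 1 assumes norm-1 columns
```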

```diff
@@ -110,17 +111,23 @@ def _cholesky_omp(X, y, n_nonzero_coefs, tol=None, copy_X=True,
                                     overwrite_b=True,
                                     **solve_triangular_args)
             v = nrm2(L[n_active, :n_active]) ** 2
-            if 1 - v <= min_float:  # selected atoms are dependent
+            Lkk = linalg.norm(X[:, lam]) ** 2 - v
```
agramfort (Member Author):
Same here: it's not 1 - v but ||X_k||^2 - v.
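
The underlying identity, sketched below with illustrative names (`X_active`, `x_k`, `w` are not the PR's variables): when atom x_k enters the active set, the Cholesky update solves L w = X_activeᵀ x_k and appends the diagonal entry sqrt(||x_k||² - ||w||²); the old code's `1 - v` check is the special case ||x_k||² = 1.

```python
import numpy as np
from scipy import linalg

rng = np.random.RandomState(42)
X_active = 10 * rng.randn(20, 3)   # atoms already selected
x_k = 10 * rng.randn(20)           # incoming atom

L = np.linalg.cholesky(X_active.T @ X_active)
w = linalg.solve_triangular(L, X_active.T @ x_k, lower=True)
v = np.linalg.norm(w) ** 2              # the `v` in the PR's code
Lkk = np.linalg.norm(x_k) ** 2 - v      # new squared diagonal entry

# Cross-check against the Cholesky of the enlarged Gram matrix.
X_new = np.column_stack([X_active, x_k])
L_full = np.linalg.cholesky(X_new.T @ X_new)
print(np.allclose(L_full[-1, -1] ** 2, Lkk))  # True
```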

```diff
@@ -22,6 +22,9 @@
 n_samples, n_features, n_nonzero_coefs, n_targets = 20, 30, 5, 3
 y, X, gamma = make_sparse_coded_signal(n_targets, n_features, n_samples,
                                        n_nonzero_coefs, random_state=0)
+# Make X not of norm 1 for testing
+X *= 10
+y *= 10
```
agramfort (Member Author):
Now all tests are run with a non-normalized X.

```python
coef_normalized = omp.coef_[0].copy()
omp.set_params(fit_intercept=True, normalize=False)
omp.fit(X, y[:, 0])
assert_array_almost_equal(coef_normalized, omp.coef_)
```
agramfort (Member Author):
Here I check that coef_ is the same whether or not you normalize during fit.
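
A self-contained version of that check, as a hedged sketch under the same 2017-era API (`normalize` was later removed from this estimator):

```python
import numpy as np
from sklearn.datasets import make_sparse_coded_signal
from sklearn.linear_model import OrthogonalMatchingPursuit

y, X, _ = make_sparse_coded_signal(n_samples=3, n_components=30,
                                   n_features=20, n_nonzero_coefs=5,
                                   random_state=0)
X *= 10  # non-unit-norm columns, as in the test setup above
y *= 10

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=True,
                                normalize=True)
coef_normalized = omp.fit(X, y[:, 0]).coef_.copy()

omp.set_params(normalize=False)
omp.fit(X, y[:, 0])

# With the fix, both code paths agree up to numerical precision.
np.testing.assert_array_almost_equal(coef_normalized, omp.coef_)
```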

agramfort (Member Author):

Maybe @vene, can you have a look?

@TomDLT changed the title from "Fix omp non normalize" to "[MRG+1] Fix omp non normalize" on Nov 15, 2017
TomDLT (Member) left a comment:
LGTM

jnothman (Member) left a comment:
I don't know OMP or the linalg well, but from a non-expert perspective, this looks well-justified and tested. LGTM

```diff
@@ -120,6 +120,10 @@ Classifiers and regressors
    error for prior list which summed to 1.
    :issue:`10005` by :user:`Gaurav Dhingra <gxyd>`.
 
+- Fixed a bug in :class:`linear_model.OrthogonalMatchingPursuit` that was
+  broken for X having non-normalized columns.
```

Review comment (Member):

I suppose this should be mentioned in changed models above so that users expect different results without reading the full changelog.

Review comment (Member):

And it's not going to have any effect for uses where normalize=True?

codecov bot commented Nov 16, 2017

Codecov Report

Merging #10071 into master will increase coverage by <.01%.
The diff coverage is 100%.


```
@@            Coverage Diff             @@
##           master   #10071      +/-   ##
==========================================
+ Coverage   96.19%   96.19%   +<.01%     
==========================================
  Files         336      336              
  Lines       62739    62749      +10     
==========================================
+ Hits        60353    60363      +10     
  Misses       2386     2386
```
| Impacted Files | Coverage Δ |
| --- | --- |
| sklearn/linear_model/omp.py | 95.95% <100%> (+0.06%) ⬆️ |
| sklearn/linear_model/tests/test_omp.py | 100% <100%> (ø) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

agramfort (Member Author):
@jnothman I rephrased the what's new entry.

jnothman (Member):
I'd not meant to "request changes". Thanks.

@jnothman jnothman merged commit f8b09ce into scikit-learn:master Nov 16, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
* fix OMP when columns of X are not normalized