[MRG+1] Fix omp non normalize by agramfort · Pull Request #10071 · scikit-learn/scikit-learn

Merged: 5 commits merged on Nov 16, 2017

Conversation

agramfort (Member):

Reference Issues/PRs

There was no open issue; I fixed it directly. Basically, OMP was returning
garbage as soon as the norms of the columns of X were not 1. This happens
when setting the option normalize=False.

What does this implement/fix? Explain your changes.

cf above

Any other comments?

Thanks @josephsalmon for pointing out the bug to me.
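
A minimal sketch of the failure mode, assuming the 2017-era API in which `OrthogonalMatchingPursuit` still accepted a `normalize` parameter (it was removed from this estimator in later scikit-learn versions); the data shapes mirror the test setup touched by this PR:

```python
import numpy as np
from sklearn.datasets import make_sparse_coded_signal
from sklearn.linear_model import OrthogonalMatchingPursuit

# Generate a signal with a known 5-sparse code, then destroy the unit norms.
y, X, gamma = make_sparse_coded_signal(n_samples=3, n_components=30,
                                       n_features=20, n_nonzero_coefs=5,
                                       random_state=0)
X *= 10  # columns of X no longer have norm 1
y *= 10  # y = X @ gamma still holds after scaling both sides

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, normalize=False)
omp.fit(X, y[:, 0])

# Before this fix the recovered support was garbage; with the fix it
# matches the support of the generating sparse code.
print(np.flatnonzero(omp.coef_))
print(np.flatnonzero(gamma[:, 0]))
```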

```diff
@@ -91,7 +92,6 @@ def _cholesky_omp(X, y, n_nonzero_coefs, tol=None, copy_X=True,
         # old scipy, we need the garbage upper triangle to be non-Inf
         L = np.zeros((max_features, max_features), dtype=X.dtype)
 
-    L[0, 0] = 1.
```
agramfort (Member Author):
L is the Cholesky factor of the Gram matrix. Its first diagonal entry is only 1 for normalized columns.
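
A quick numerical illustration of that point (a hedged sketch, not the PR's code): for the Gram matrix G = Xᵀ X, the Cholesky factor satisfies L[0, 0] = sqrt(G[0, 0]) = ||X[:, 0]||, which equals 1 only for unit-norm columns:

```python
import numpy as np

rng = np.random.RandomState(0)
X = 10 * rng.randn(20, 5)  # columns deliberately not unit-norm

# Cholesky factor of the Gram matrix: L @ L.T == X.T @ X
L = np.linalg.cholesky(X.T @ X)

print(np.allclose(L[0, 0], np.linalg.norm(X[:, 0])))  # True
print(np.isclose(L[0, 0], 1.0))  # False: hardcoding L[0, 0] = 1 assumes norm-1 columns
```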

```diff
@@ -110,17 +111,23 @@ def _cholesky_omp(X, y, n_nonzero_coefs, tol=None, copy_X=True,
                                     overwrite_b=True,
                                     **solve_triangular_args)
             v = nrm2(L[n_active, :n_active]) ** 2
-            if 1 - v <= min_float:  # selected atoms are dependent
+            Lkk = linalg.norm(X[:, lam]) ** 2 - v
```
agramfort (Member Author):
Same here: it's not 1 - v but ||X_k||^2 - v.
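
The underlying identity, sketched below with illustrative names (`X_active`, `x_k`, `w` are not the PR's variables): when atom x_k enters the active set, the Cholesky update solves L w = X_activeᵀ x_k and appends the diagonal entry sqrt(||x_k||² - ||w||²); the old code's `1 - v` check is the special case ||x_k||² = 1.

```python
import numpy as np
from scipy import linalg

rng = np.random.RandomState(42)
X_active = 10 * rng.randn(20, 3)   # atoms already selected
x_k = 10 * rng.randn(20)           # incoming atom

L = np.linalg.cholesky(X_active.T @ X_active)
w = linalg.solve_triangular(L, X_active.T @ x_k, lower=True)
v = np.linalg.norm(w) ** 2              # the `v` in the PR's code
Lkk = np.linalg.norm(x_k) ** 2 - v      # new squared diagonal entry

# Cross-check against the Cholesky of the enlarged Gram matrix.
X_new = np.column_stack([X_active, x_k])
L_full = np.linalg.cholesky(X_new.T @ X_new)
print(np.allclose(L_full[-1, -1] ** 2, Lkk))  # True
```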

```diff
@@ -22,6 +22,9 @@
 n_samples, n_features, n_nonzero_coefs, n_targets = 20, 30, 5, 3
 y, X, gamma = make_sparse_coded_signal(n_targets, n_features, n_samples,
                                        n_nonzero_coefs, random_state=0)
+# Make X not of norm 1 for testing
+X *= 10
+y *= 10
```
agramfort (Member Author):
Now all tests are run with a non-normalized X.

```python
coef_normalized = omp.coef_[0].copy()
omp.set_params(fit_intercept=True, normalize=False)
omp.fit(X, y[:, 0])
assert_array_almost_equal(coef_normalized, omp.coef_)
```
agramfort (Member Author):
Here I check that coef_ is the same whether or not you normalize during fit.
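
A self-contained version of that check, as a hedged sketch under the same 2017-era API (`normalize` was later removed from this estimator):

```python
import numpy as np
from sklearn.datasets import make_sparse_coded_signal
from sklearn.linear_model import OrthogonalMatchingPursuit

y, X, _ = make_sparse_coded_signal(n_samples=3, n_components=30,
                                   n_features=20, n_nonzero_coefs=5,
                                   random_state=0)
X *= 10  # non-unit-norm columns, as in the test setup above
y *= 10

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=True,
                                normalize=True)
coef_normalized = omp.fit(X, y[:, 0]).coef_.copy()

omp.set_params(normalize=False)
omp.fit(X, y[:, 0])

# With the fix, both code paths agree up to numerical precision.
np.testing.assert_array_almost_equal(coef_normalized, omp.coef_)
```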

agramfort (Member Author):

Maybe @vene, can you have a look?

@TomDLT changed the title from "Fix omp non normalize" to "[MRG+1] Fix omp non normalize" on Nov 15, 2017
TomDLT (Member) left a comment:
LGTM

jnothman (Member) left a comment:
I don't know OMP or the linalg well, but from a non-expert perspective, this looks well-justified and tested. LGTM

```diff
@@ -120,6 +120,10 @@ Classifiers and regressors
    error for prior list which summed to 1.
    :issue:`10005` by :user:`Gaurav Dhingra <gxyd>`.
 
+- Fixed a bug in :class:`linear_model.OrthogonalMatchingPursuit` that was
+  broken for X having non-normalized columns.
```

Review comment (Member):

I suppose this should be mentioned in changed models above so that users expect different results without reading the full changelog.

Review comment (Member):

And it's not going to have any effect for uses where normalize=True?

codecov bot commented Nov 16, 2017

Codecov Report

Merging #10071 into master will increase coverage by <.01%.
The diff coverage is 100%.


```
@@            Coverage Diff             @@
##           master   #10071      +/-   ##
==========================================
+ Coverage   96.19%   96.19%   +<.01%     
==========================================
  Files         336      336              
  Lines       62739    62749      +10     
==========================================
+ Hits        60353    60363      +10     
  Misses       2386     2386
```
| Impacted Files | Coverage Δ |
| --- | --- |
| sklearn/linear_model/omp.py | 95.95% <100%> (+0.06%) ⬆️ |
| sklearn/linear_model/tests/test_omp.py | 100% <100%> (ø) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

agramfort (Member Author):
@jnothman I rephrased the what's new entry.

jnothman (Member):
I'd not meant to "request changes". Thanks.

@jnothman jnothman merged commit f8b09ce into scikit-learn:master Nov 16, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
* fix OMP when columns of X are not normalized