[WIP] REFACTOR: dict_learning and dict_learning_online by dohmatob · Pull Request #9036 · scikit-learn/scikit-learn · GitHub
[WIP] REFACTOR: dict_learning and dict_learning_online #9036


Closed
dohmatob wants to merge 1 commit from the dict-update branch

Conversation

@dohmatob (Contributor) commented Jun 7, 2017

Addresses issue #9009.

Proposal

Have a single function dict_learning_online(batch_size=some_default) which reproduces full-batch mode (the current dict_learning function) when batch_size == n_samples.
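For illustration, a minimal sketch of the idea, assuming the existing dict_learning_online signature (the wrapper name and the kwargs passthrough are hypothetical, not this PR's code):

import numpy as np
from sklearn.decomposition import dict_learning_online

def dict_learning_batch(X, n_components, alpha, **kwargs):
    # Hypothetical wrapper: full-batch learning recovered as a special case of
    # the online solver, by making each "mini-batch" the whole dataset.
    return dict_learning_online(X, n_components=n_components, alpha=alpha,
                                batch_size=X.shape[0], **kwargs)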

@dohmatob (Contributor Author) commented Jun 7, 2017

@agramfort: could you review this? Thanks.

@dohmatob changed the title from "REFACTOR: dic_learning and dict_learning_online refactored into a sin…" to "REFACTOR: dict_learning and dict_learning_online" on Jun 7, 2017
@vene (Member) commented Jun 7, 2017

I'm looking at this; unfortunately I have zero memory of the old code, so I need to re-read it and figure things out.

Depending on how simple you think the changeset is, we might want to split it into the projection bugfix and the refactoring.

Data matrix.

code : array of shape (n_components, n_samples)
A : array of shape (n_components, n_components)
Member:

Was the old version of this just plain wrong?

Member:

Never mind, I see what is happening. This needs some documentation in terms of what this function is solving.

Member:

No, sorry, I don't understand the math in here. Would you mind explaining the changes a bit? The shapes are not the same as before. If the math does the same thing, it is not obvious to me why.

@dohmatob (Contributor Author), Jun 7, 2017:

Could you please be precise about which part of the computation you don't understand. Is it the normalization by A[k, k], etc.?

Also, in a moment I'll push a modified docstring with more detailed comments in it.

Member:

I added a more detailed comment below after thinking about it. No, the normalization makes sense to me. It just seems like the old computation does not correspond to what is in the paper.

Member:

(to be clear, the old code is what is confusing here, not yours!)

Contributor Author:

Yeah, indeed it looks like the old _update_dict code is incorrect in the online case, due to the extra dictionary[:, k] = np.dot(R, A[k, :].T) on line 357.

@raghavrv (Member), Jun 8, 2017:

does A denote atoms? if so can we just have it as atoms?

dictionary[:, k] /= A[k, k]
_proj_l2(dictionary[:, k])
# R <- -1.0 * U_k * V_k^T + R
R = ger(-1.0, dictionary[:, k], A[k, :], a=R, overwrite_a=True)
Member:

This went down one level of indentation, is it correct?

Contributor Author:

It's intentional. The old indentation appeared incorrect to me: it would lead to incorrect (or at least unjustified) modifications of R whenever a random atom was inserted into the dictionary (in case the old one was close to zero). Agreed?

# R <- -1.0 * U_k * V_k^T + R
R = ger(-1.0, dictionary[:, k], code[k, :], a=R, overwrite_a=True)
dictionary[:, k] /= A[k, k]
_proj_l2(dictionary[:, k])
Member:

Your fix of the l2 projection here agrees with eqn 10 in http://www.di.ens.fr/sierra/pdfs/icml09.pdf. Nice catch!
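For reference, the atom update of eqn 10 in that paper reads, up to notation (here a_k, b_k, d_k denote the kth columns of A, B, D as used elsewhere in this thread):

u_k \leftarrow \frac{1}{A_{kk}} \left( b_k - D a_k \right) + d_k,
\qquad
d_k \leftarrow \frac{1}{\max(\lVert u_k \rVert_2,\, 1)} \, u_k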

R = ger(1.0, dictionary[:, k], code[k, :], a=R, overwrite_a=True)
dictionary[:, k] = np.dot(R, code[k, :].T)
R = ger(1.0, dictionary[:, k], A[k, :], a=R, overwrite_a=True)
dictionary[:, k] = np.dot(R, A[k, :])
@vene (Member), Jun 7, 2017:

I tried working through this on paper and I think the batch update ends up different by a factor of CCt.

The old implementation seems incorrect to me; your implementation corresponds to the online update in the paper, applied over the entire dataset.

Previously we had:

R = X - DCt   # dimensions here are transposed compared to sklearn conventions
D[:, k] = R * C[k] = (X - DCt) C[k]

Now, by forming A and B, we have

R = B - DA = XC - DCtC = (X - DCt)C
D[:, k] = R Ak = R CtC[k] = (X - DCt) CCt C[k]

Am I missing something or was this indeed a bug previously?
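For a quick numerical illustration of this discrepancy, a standalone NumPy sketch (not code from the PR; shapes follow the paper's transposed convention):

import numpy as np

rng = np.random.RandomState(0)
n_features, n_components, n_samples = 20, 3, 30
X = rng.randn(n_features, n_samples)      # columns are samples (paper convention)
D = rng.randn(n_features, n_components)   # columns are atoms
C = rng.randn(n_components, n_samples)    # rows are codes, X ~= D @ C
k = 0

# direction used by the old batch update: (X - D C) c_k
old_dir = (X - D @ C) @ C[k]

# direction obtained by first forming A = C C^T, B = X C^T and using R = B - D A
A = C @ C.T
B = X @ C.T
new_dir = (B - D @ A) @ A[:, k]

# the two differ exactly by the extra factor C^T C applied to c_k
print(np.allclose(new_dir, (X - D @ C) @ (C.T @ C) @ C[k]))  # True
print(np.allclose(new_dir, old_dir))                         # False in general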

Member:

I did a quick check and indeed I seem to get slightly different dictionary atoms before and after this PR with the batch version. They seem equal at the first decimal but differ at the second or the third, so they're close.

Your current implementation seems to match the paper correctly, at least in the online case. Unfortunately, the dict_learning_online function doesn't print the objective value at each iteration, so I can't compare that easily, but for the old implementation the objective does at least seem to decrease.

@vene (Member), Jun 7, 2017:

Even with max_iter=10000 and tol=1e-20 a difference persists. I computed the residual by hand with:

import numpy as np
from sklearn.decomposition import dict_learning

rng = np.random.RandomState(0)
X = rng.randn(30, 20)
U, V, _ = dict_learning(X, n_components=3, alpha=0.1, random_state=0, verbose=2, max_iter=10000, tol=1e-20)
print(0.5 * np.sum((X - np.dot(U, V)) ** 2) + 0.1 * np.sum(np.abs(U)))

On master I get 188.766637, and on the current branch I get 188.785850, so slightly more...

Also, it seems quite a bit faster on master.

@dohmatob (Contributor Author), Jun 7, 2017:

Ah, it looks like I actually introduced a bug: the line dictionary[:, k] = np.dot(R, A[k, :]) shouldn't be there anymore. This additional buggy computation should account for some of the speed loss you're experiencing. Thanks for the catch. Fixed.

BTW, this is what I'm doing:

Minimize    .5 * ||Xt - DC||_F^2 = .5 * tr(DtDA) - tr(DtB) + .5 * tr(XXt),

as a function of the dictionary (V), where

    B = XtCt, A = CCt, R = B - DA.

Note that the update of the kth atom is given by (see eqn 10 of the
ref paper below):

BCD for kth atom:
    R = R + D[:, k]A[k, :]       # rank-1 update
    D[:, k] = R[:, k] / A[k, k]  # = D[:, k] + R[:, k] / A[k, k]
    D[:, k] = proj(D[:, k])      # eqn 10
    R = R - D[:, k]A[k, :]       # rank-1 update
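As a standalone NumPy illustration of the four steps above (my own sketch, not the PR's code):

import numpy as np

def bcd_atom_update(D, A, R, k):
    # one block-coordinate step on atom k, following the sketch above;
    # D is (n_features, n_components), A = C C^T, R = B - D A
    R = R + np.outer(D[:, k], A[k, :])    # rank-1 "add back"
    if A[k, k] > 1e-12:                   # guard against a dead atom
        D[:, k] = R[:, k] / A[k, k]
    norm = np.linalg.norm(D[:, k])        # projection onto the l2 unit ball (eqn 10)
    if norm > 1.0:
        D[:, k] /= norm
    R = R - np.outer(D[:, k], A[k, :])    # rank-1 "remove"
    return D, R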

@dohmatob (Contributor Author), Jun 7, 2017:

I just checked with your code (modified):

import time
import numpy as np
from sklearn.decomposition.dict_learning import dict_learning

rng = np.random.RandomState(0)
X = rng.randn(30, 20)

t0 = time.time()
U, V, _ = dict_learning(X, n_components=3, alpha=0.1, random_state=0,
                        verbose=2, max_iter=10000, tol=1e-20)
print(0.5 * np.sum((X - np.dot(U, V)) ** 2) + 0.1 * np.sum(np.abs(U)))
print("Time elapsed: %.2f" % (time.time() - t0))

On master:

Iteration  647 (elapsed time:  16s,  0.3mn, current cost  198.777)
Iteration  648 (elapsed time:  16s,  0.3mn, current cost  198.777)
--- Convergence reached after 648 iterations
198.776663706
Time elapsed: 16.46

On PR:

--- Convergence reached after 647 iterations
Learning code... done (total time:  10s,  0.2mn)
198.776663706
Time elapsed: 10.38

Thus the results (and the trajectory of the cost function, convergence point, etc.) are exactly the same, but my PR is faster than the master branch. Can you confirm? Thanks.

Member:

I confirm that your PR is faster and reaches the same error.

With tol=1e-20 the number of iterations needed for convergence fluctuates (both on master and here), but I think (although it's odd) it's a numerical issue. With a reasonable tol of 1e-6 the number of iterations is exactly the same. The returned U and V have the same norm as well, so I think you're right in saying the paths are identical.
master

[...]
Iteration  166 (elapsed time:   3s,  0.1mn, current cost  198.781)
--- Convergence reached after 166 iterations
198.780858248
Time elapsed: 3.13
U norm 14.4716684728
V norm 1.73205080757

here

[...]
Iteration  150 (elapsed time:   1s,  0.0mn)
--- Convergence reached after 166 iterations
Learning code... done (total time:   1s,  0.0mn)
198.780765235
Time elapsed: 1.99
U norm 14.4722062044
V norm 1.73205080757

Member:

Wait, the dictionary is the same, but the code is a bit different now...

Member:

master

Iteration   1 (elapsed time:   0s,  0.0mn, current cost  279.778010084)
Iteration   2 (elapsed time:   0s,  0.0mn, current cost  199.544246461)
Iteration   3 (elapsed time:   0s,  0.0mn, current cost  199.522729734)
[...]
Iteration  165 (elapsed time:   2s,  0.0mn, current cost  198.781266815)
Iteration  166 (elapsed time:   2s,  0.0mn, current cost  198.781055910)
--- Convergence reached after 166 iterations
198.780858248
Time elapsed: 2.13
U norm 14.4716684728
V norm 1.73205080757

head

Iteration   0 (elapsed time:   0s,  0.0mn, current cost  0.000000000
Iteration   1 (elapsed time:   0s,  0.0mn, current cost  279.778014383
Iteration   2 (elapsed time:   0s,  0.0mn, current cost  199.544247482
Iteration   3 (elapsed time:   0s,  0.0mn, current cost  199.522729073
[...]
Iteration  165 (elapsed time:   2s,  0.0mn, current cost  198.781266718
Iteration  166 (elapsed time:   2s,  0.0mn, current cost  198.781055820
--- Convergence reached after 166 iterations
Learning code... done (total time:   2s,  0.0mn)
198.780765048
Time elapsed: 2.16
U norm 14.4722067347
V norm 1.73205080757

I think the paths are indeed the same up to numerical precision. Other than that, with the new version the code is updated once more at the end. (Previously, in batch mode, the dictionary was the last block to be updated.)

Contributor Author:

Hm, I'd expect things to be numerically a bit different due to at least one of the following:

  • the modified projection formula
  • the indentation issue on the second call to ger(...)

I would have proposed opening a separate issue on these points and addressing them in a separate PR, but that would be a really minor change (in terms of patch size). So I'd rather address them as part of this PR, unless there are compelling reasons to do otherwise. What do you think?


.5 * ||Xt - DC||_F^2 = .5 * tr(DtDA) - 2 * tr(DtB) + tr(XXt),

as a function of the dictionary (V), where
Member:

the dictionary D?

We should get the variable names consistent throughout. I would not be opposed to using C/D for code/dictionary in the whole file, and B/A with the meanings from the paper.

theta = float(batch_size ** 2 + ii + 1 - batch_size)
beta = (theta + 1 - batch_size) / (theta + 1)

beta = 0.
A *= beta
A += np.dot(this_code, this_code.T)
B *= beta
B += np.dot(this_X.T, this_code.T)
Member:

I think it's better to move the moving-average update inside the if online case and do:

else:
    A = np.dot(...)
    B = np.dot(...)

just because it's less subtle
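A sketch of what that restructuring could look like (the variables this_X, this_code, ii, batch_size are assumed from the snippet above; this is not the PR's actual code):

import numpy as np

def update_stats(A, B, this_X, this_code, ii, batch_size, online):
    if online:
        # exponential moving average of the sufficient statistics,
        # as in the paper's mini-batch extension
        theta = float(batch_size ** 2 + ii + 1 - batch_size)
        beta = (theta + 1 - batch_size) / (theta + 1)
        A = beta * A + np.dot(this_code, this_code.T)
        B = beta * B + np.dot(this_X.T, this_code.T)
    else:
        # full batch: no averaging, just recompute from the current code
        A = np.dot(this_code, this_code.T)
        B = np.dot(this_X.T, this_code.T)
    return A, B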

else:
return dictionary.T
return dictionary.T, errors
Member:

Unfortunately this is public API that we cannot break; we need to return the exact same signature as before. We can add a default argument (return_errors=False), but this logic is going to get messy.
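For concreteness, the kind of backward-compatible switch being suggested could look like this (return_errors is the suggested flag, not existing API; a sketch only):

def finalize_result(dictionary, errors, return_errors=False):
    # keep the old single-value signature unless the caller opts in
    if return_errors:
        return dictionary.T, errors
    return dictionary.T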


if n_jobs == -1:
n_jobs = cpu_count()
def _compute_residuals_from_code(X, V, U):
Member:

This helper is currently only called in one place; we could inline it.

method='lars', iter_offset=0, random_state=None,
return_inner_stats=False, inner_stats=None,
return_n_iter=False):
batch_size=None, verbose=False, shuffle=True,
Member:

batch_size should stay 3 by default, otherwise this breaks the public API.

And even beyond the fact that the API breaks, current behavior is counterintuitive: the function name is "dict_learning_online", but it does batch learning with the default arguments!

Contributor Author:

Agreed. Fixed.

return_inner_stats=False, inner_stats=None,
return_n_iter=False):
batch_size=None, verbose=False, shuffle=True,
n_jobs=1, method='lars', iter_offset=0, tol=0.,
Member:

the new tol argument is unused in the online case, right? This is confusing from an API standpoint.

Maybe a better idea is to have a private function _dict_learning that does what you're currently doing here, and wrap it with dict_learning and dict_learning_online which implement the appropriate API.
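A rough sketch of the layering being suggested (signatures and defaults here are illustrative only, not the actual scikit-learn API):

def _dict_learning(X, n_components, alpha, batch_size, tol, **kwargs):
    # single shared solver: batch mode when batch_size >= n_samples, online otherwise
    ...

def dict_learning(X, n_components, alpha, tol=1e-8, **kwargs):
    # full-batch public API keeps its tol parameter
    return _dict_learning(X, n_components, alpha,
                          batch_size=X.shape[0], tol=tol, **kwargs)

def dict_learning_online(X, n_components=2, alpha=1, batch_size=3, **kwargs):
    # online public API exposes batch_size but no tol
    return _dict_learning(X, n_components, alpha,
                          batch_size=batch_size, tol=0., **kwargs)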

@dohmatob (Contributor Author), Jun 8, 2017:

Wouldn't this lead to a messier code base due to the proliferation of *dict_learning* functions? It seems there might be a compromise to be made here.

A less invasive solution would be to enrich the documentation of the tol param, explaining that it only takes effect in batch mode.

Member:

Functions starting with "_" are not documented and not visible to the user normally, so it will just be a way for us to organize the implementation.

I think that's preferable to having parameters that only work some of the time.

Contributor Author:

OK, fixed: "dict_learning_online" no longer has a "tol" param. Now the APIs should be the same as before.

@@ -13,6 +13,7 @@
from sklearn.utils.testing import assert_raises
from sklearn.utils.testing import ignore_warnings
from sklearn.utils.testing import TempMemmap
from sklearn.utils.testing import assert_almost_equal
Member:

unused?

@vene changed the title from "REFACTOR: dict_learning and dict_learning_online" to "[WIP] REFACTOR: dict_learning and dict_learning_online" on Jun 8, 2017
@vene (Member) commented Jun 8, 2017

In batch mode the new implementation is nearly identical to the old, up to (I suspect) numerical precision and an extra fit of the code at the end.

In online mode there are differences. This is on the digits dataset:

code:

import time
import numpy as np
from sklearn.decomposition.dict_learning import dict_learning_online
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)

t0 = time.time()
res = dict_learning_online(X, n_components=3, alpha=0.1, random_state=0,
                           batch_size=32)
U = res[0]
V = res[1]  # this ugly unpacking is needed because of the API differences in this PR

print(0.5 * np.sum((X - np.dot(U, V)) ** 2) + 0.1 * np.sum(np.abs(U)))
print("Time elapsed: %.2f" % (time.time() - t0))
print("U norm", np.linalg.norm(U, 'fro'))
print("V norm", np.linalg.norm(V, 'fro'))
this pr:
final obj 755227.180842
Time elapsed: 2.04
U norm 2391.27139755
V norm 1.73205080757

master:
756096.103391
Time elapsed: 2.23
U norm 2556.1895028
V norm 1.73205080757

@dohmatob, I remember you mentioned yesterday that the old online version might have been incorrect, can you double-check the math? If you are right and can write a test that fails on master but passes here, it will motivate the differences in results and justify merging.
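One possible shape for such a test: compare the library's dictionary update against a direct transcription of the paper's update on random inputs (reference_atom_updates is a name introduced here for illustration, not sklearn API):

import numpy as np

def reference_atom_updates(D, A, B):
    # direct transcription of the paper's block-coordinate dictionary update
    D = D.copy()
    for k in range(D.shape[1]):
        if A[k, k] > 1e-12:
            u = (B[:, k] - D @ A[:, k]) / A[k, k] + D[:, k]
        else:
            u = D[:, k]
        D[:, k] = u / max(np.linalg.norm(u), 1.0)
    return D

# a regression test could then assert that the atoms produced by the (fixed)
# online code path match reference_atom_updates on random D, A, B, which the
# old update would not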

@dohmatob force-pushed the dict-update branch 2 times, most recently from e5dcffa to bb2cddc on June 8, 2017 08:51
@dohmatob (Contributor Author) commented Jun 8, 2017

OK, great. I'll look at it this afternoon.

@dohmatob (Contributor Author) commented Jun 8, 2017

I've double-checked the math in the dict update on master, and indeed:

1. The batch mode is correct.
2. The online mode is wrong.

Why there is an error on the master branch
From our previous discussion, it's clear that the dictionary update (without the rescaling and projection steps) is:

R = R + D[:, k]A[k, :]
D[:, k] = R[:, k]
R = R - D[:, k]A[k, :]

where

R := B - DA, B := XCt, A := CCt

Now, if we rewrite this in terms of the input data X and the codes C, we get:

D[:, k] = (R + D[:, k]A[k, :])[:, k]
        = (B - DA + D[:, k]A[k, :])[:, k]
        = ((X - DC)Ct + D[:, k]A[k, :])[:, k]
        = ((X - DC)Ct)[:, k] + D[:, k](CCt)[k, k]
        = (X - DC + D[:, k]C[k, :])Ct[:, k]
        = the formula for the dict update on the master branch in batch mode

Why the bug doesn't manifest
Roughly, the "bug" is equivalent to post-multiplying a (correctly updated) dictionary by an invertible linear transformation. This transformation gets scaled away during the code update stage, since (D, C) and (DG, G^{-1}C) are equivalent DL solutions for any invertible G. Thus the "bug" is absorbed. Unfortunately, there is practically no numerical way (i.e. no test) to reveal this issue.
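A short numerical check of the equivalence being invoked here (a standalone sketch, not PR code):

import numpy as np

rng = np.random.RandomState(0)
D = rng.randn(20, 3)                  # dictionary
C = rng.randn(3, 30)                  # codes
G = rng.randn(3, 3) + 3 * np.eye(3)   # an (almost surely) invertible transform

# (D, C) and (D G, G^{-1} C) give exactly the same reconstruction D C, which is
# why the spurious factor is invisible to reconstruction-based checks
print(np.allclose(D @ C, (D @ G) @ np.linalg.solve(G, C)))  # True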

@dohmatob (Contributor Author) commented Jun 8, 2017

BTW, I've fixed the "API unpacking" problem you complained about. The dict_learning and dict_learning_online APIs should be the same as before.

random_state=None):
"""Update the dense dictionary factor in place.
def _proj_l2(v, copy=False):
"""Projects v unto l2 unit ball in-place.
@raghavrv (Member), Jun 8, 2017:

--> in-place (if copy=False) ;)

Contributor Author:

Hm, that's what happens when the docstring and the function prototype evolve separately and get out of sync. Good catch. Thx :)

def _update_dict(dictionary, Y, code, verbose=False, return_r2=False,
random_state=None):
"""Update the dense dictionary factor in place.
def _proj_l2(v, copy=False):
Member:

Can you rename it to _project_l2?
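For concreteness, a minimal sketch of what such a helper could look like (mirroring the copy flag discussed above; not the PR's exact code):

import numpy as np

def _project_l2(v, copy=False):
    # project v onto the l2 unit ball: rescale only if its norm exceeds 1
    if copy:
        v = v.copy()
    norm = np.linalg.norm(v)
    if norm > 1.0:
        v /= norm
    return v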


B = XtCt, A = CCt, R = B - DA.

Note that the update of the jth atom is given by (see eqn 10 of
Member:

you mean kth atom?

Contributor Author:

ya, fixed. Thanks.

R = R + D[:, k]A[k, :] # rank-1 update
D[:, k] = R[:, k] / A[k, k] # = D[:, k] + R[k] / A[k, k]
D[:, k] = proj(D[:, k]) # eqn 10
R = R - D[:, k]A[k, :] # rank-1 update
Member:

Can you add a note on what the different letters represent?

Contributor Author:

A, B are standard notation from the reference paper (and this notation is already used on master).
Please see eqn 11 of ref paper http://www.di.ens.fr/sierra/pdfs/icml09.pdf.

Member:

Raghav makes a good point: we need to document A and B. No, A is not the atoms; the D[:, k] are the atoms.

Member:

Actually scratch that. This is already documented at line 332, @raghavrv, is it easy to miss?

Member:

we could rename dictionary to D in the entire file

Contributor Author:

Hm, how about instead making it clear (by means of docstrings, comments, etc.) that D means dictionary (which is already the case)? If we set out to rename D, we'd have to do something similar for code, this_code, etc. I think the labels D, C, etc. are only interesting when stating the formula being implemented (as is done in the docstrings). But I may be wrong.

This function has two modes:

1. Batch mode, activated when batch_size is None or
batch_size >= n_sampels. This is the default.
Member:

n_samples

@@ -593,14 +582,18 @@ def dict_learning_online(X, n_components=2, alpha=1, n_iter=100,
n_iter : int,
Number of iterations to perform.

tol : float,
Tolerance for the stopping condition. Used in batch mode.

return_code : boolean,
Whether to also return the code U or just the dictionary V.

dict_init : array of shape (n_components, n_features),
Initial value for the dictionary for warm restart scenarios.

callback :
Member:

callback : callable, optional default None

@dohmatob (Contributor Author), Jun 8, 2017:

The docstring issue you're pointing to already exists in master. Fixed anyway.

@dohmatob force-pushed the dict-update branch 5 times, most recently from 584b955 to 5c39eb5 on June 8, 2017 13:23
@dohmatob force-pushed the dict-update branch 8 times, most recently from c4cd31e to c4b1b28 on June 9, 2017 09:10
@dohmatob force-pushed the dict-update branch 5 times, most recently from 9339fc0 to c4a2297 on June 10, 2017 12:29
@vene (Member) commented Jun 10, 2017

We could pose the bugfix as an optimization, because we avoid the spurious multiplication with a rotation matrix that was present before. Maybe this way @GaelVaroquaux will like it :)

@@ -571,12 +577,19 @@ def dict_learning_online(X, n_components=2, alpha=1, n_iter=100,

(U^*, V^*) = argmin 0.5 || X - U V ||_2^2 + alpha * || U ||_1
(U,V)
with || V_k ||_2 = 1 for all 0 <= k < n_components
with || V_k ||_2 <= 1 for all 0 <= k < n_components

where V is the dictionary and U is the sparse code. This is
Member:

The U, V here are also inconsistent with D/C as used above, and with D/alpha as used in the paper... but consistent with other decomposition algos in sklearn. Maybe we should use U/V in the other docstrings, and maybe in the code too?

Contributor Author:

True.

@@ -600,7 +613,8 @@ def dict_learning_online(X, n_components=2, alpha=1, n_iter=100,
Initial value for the dictionary for warm restart scenarios.

callback :
Callable that gets invoked every five iterations.
Callable that gets invoked every five iterations. If it returns
Member:

it should be customizable how often it gets invoked.

Contributor Author:

This was the old logic. But yeah, I can add that to the PR.
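One possible shape for that (callback_every and the driver loop below are hypothetical, not the PR's code):

def run_iterations(n_iter, step, callback=None, callback_every=5):
    # step(ii) performs one solver iteration; a falsish callback return aborts the loop
    for ii in range(n_iter):
        step(ii)
        if callback is not None and (ii + 1) % callback_every == 0:
            if not callback(ii):
                break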

@@ -600,7 +613,8 @@ def dict_learning_online(X, n_components=2, alpha=1, n_iter=100,
Initial value for the dictionary for warm restart scenarios.

callback :
Callable that gets invoked every five iterations.
Callable that gets invoked every five iterations. If it returns
non-True, then the main loop (iteration on data) is aborted.
Member:

i guess the term is "falsish"?

Contributor Author:

haha, ya sure

@dohmatob (Contributor Author) commented Jun 11, 2017

@vene: the spurious multiplication bug we found came as a by-product of looking at the code. My main objective with this PR was/is to collapse dict_learning and dict_learning_online into a single function (as they really should be!) :) Along the way, this fixes the spurious multiplication bug too.

R = -np.dot(dictionary, code)
R += Y
R = -np.dot(dictionary, A)
R += B
Member:

is this actually faster than R = B - np.dot(dictionary, A)? The latter is more readable imo

R = np.asfortranarray(R)
ger, = linalg.get_blas_funcs(('ger',), (dictionary, code))
ger, = linalg.get_blas_funcs(('ger',), (dictionary, A))
for k in range(n_components):
# R <- 1.0 * U_k * V_k^T + R
Member:

V_k -> A_k

U_k should be dictionary (or the other way around), as long as it's consistent in the entire file.

@jeremiedbb (Member):
Some refactoring has been done in #18975. Now dict_learning_online calls MiniBatchDictionaryLearning instead of the opposite, so the refactoring proposed here is no longer relevant. Instead, the same refactoring should be done for DictionaryLearning. The fix for the loss has also been done in a separate PR. Closing.
