[WIP] FIX Projected Gradient NMF stopping condition #2557
```diff
@@ -499,17 +499,18 @@ def fit_transform(self, X, y=None):
                  - safe_sparse_dot(X, H.T, dense_output=True))
         gradH = (np.dot(np.dot(W.T, W), H)
                  - safe_sparse_dot(W.T, X, dense_output=True))
-        init_grad = norm(np.r_[gradW, gradH.T])
-        tolW = max(0.001, self.tol) * init_grad  # why max?
+        init_grad = (gradW ** 2).sum() + (gradH ** 2).sum()
+        tolW = max(0.001, self.tol) * np.sqrt(init_grad)  # why max?
         tolH = tolW

-        tol = self.tol * init_grad
+        tol = init_grad * self.tol ** 2
```
I would rather not change the tol. I think a stopping criterion stated relative to the initial gradient norm would be easier to follow. This is intuitive because the sum of the projected gradient norms is supposed to become smaller and smaller compared to the initial gradient norm.
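A minimal sketch of that kind of relative criterion (the function and variable names are illustrative, not from the patch):

```python
import numpy as np

def has_converged(proj_grad_sqnorm, init_grad_sqnorm, tol=1e-4):
    # Hypothetical relative test: stop once the projected gradient norm
    # has shrunk below a fraction `tol` of the initial gradient norm.
    # Both arguments are squared norms, as in the patch above.
    return np.sqrt(proj_grad_sqnorm) <= tol * np.sqrt(init_grad_sqnorm)
```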
```diff
         for n_iter in range(1, self.max_iter + 1):
-            # stopping condition
-            # as discussed in paper
-            proj_norm = norm(np.r_[gradW[np.logical_or(gradW < 0, W > 0)],
-                                   gradH[np.logical_or(gradH < 0, H > 0)]])
+            # stopping condition on the norm of the projected gradient
+            proj_norm = (
+                ((gradW * np.logical_or(gradW < 0, W > 0)) ** 2).sum() +
+                ((gradH * np.logical_or(gradH < 0, H > 0)) ** 2).sum())

             if proj_norm < tol:
                 break
```

I find the code a bit hard to follow. I would rather define intermediate variables.

Same here. In fact the code would get more readable with a helper function.
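A hedged sketch of what the two comments above seem to be asking for, with the boolean masks pulled out into named intermediates (the names are made up for illustration):

```python
import numpy as np

def projected_gradient_sqnorm(gradW, W, gradH, H):
    # Illustrative helper (not from the patch): squared norm of the
    # projected gradient, with named masks so the stopping-condition
    # expression stays readable.
    maskW = np.logical_or(gradW < 0, W > 0)
    maskH = np.logical_or(gradH < 0, H > 0)
    return ((gradW * maskW) ** 2).sum() + ((gradH * maskH) ** 2).sum()
```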
(X ** 2).sum() is very memory-intensive and slow. I'd do it this way: define a sqnorm(x) helper, then redefine norm as sqrt(sqnorm(x)). This way, the memory overhead is constant instead of linear and BLAS's ddot can be used to compute the squared norm in a single pass.
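A sketch along those lines (the exact snippet is an assumption, but it matches the description: ravel the array, then a single BLAS ddot):

```python
import numpy as np

def sqnorm(x):
    # Squared Euclidean/Frobenius norm via one BLAS ddot over the
    # flattened array: constant memory overhead, no (x ** 2) temporary.
    x = np.ravel(x)
    return np.dot(x, x)

def norm(x):
    # Redefine norm as sqrt(sqnorm(x)), as suggested above.
    return np.sqrt(sqnorm(x))
```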
Actually, maybe you can do something like the following to avoid a memory copy for fortran-ordered arrays (the ravel is already avoiding one for C-ordered arrays). This function should be a helper that goes in utils, IMHO.
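One way to read that suggestion (an assumption on my part) is to ravel in whatever order the array is already stored, which is fine here because a sum of squares does not depend on element order:

```python
import numpy as np

def sqnorm(x):
    # Assumed variant: order='A' flattens Fortran-contiguous arrays in
    # Fortran order and C-contiguous arrays in C order, so neither
    # layout forces a copy before the BLAS dot.
    x = np.ravel(x, order='A')
    return np.dot(x, x)
```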
Or x.ravel(order='K'), "to view the elements in the order they occur in memory, except for reversing the data when strides are negative" (docs). X.squeeze() has the same guarantee of never copying, but leaves negative strides alone, which may impact BLAS performance. It's a tad faster, though.
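A quick way to check the no-copy claim for ravel(order='K') on a Fortran-ordered array (illustrative only):

```python
import numpy as np

a = np.asfortranarray(np.arange(6.0).reshape(2, 3))
flat = a.ravel(order='K')          # flattened in memory order
print(np.shares_memory(flat, a))   # True: it is a view, no copy was made
```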
Btw., we have a different definition of norm in sklearn.utils.extmath which uses the BLAS nrm2 function...
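For reference, a sketch of that style of helper using SciPy's BLAS dispatch (a sketch only, not the exact extmath code):

```python
import numpy as np
from scipy import linalg

def blas_norm(x):
    # Sketch: Euclidean/Frobenius norm computed by the BLAS nrm2
    # routine selected for the array's dtype.
    x = np.ravel(np.asarray(x))
    nrm2, = linalg.get_blas_funcs(['nrm2'], [x])
    return nrm2(x)
```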
Indeed. Is this in the oldest version of numpy that we support? If not, should we raise our requirements? The negative strides will induce a copy before the BLAS call, I believe. But anyhow, it does not have the same semantics as ravel on a 2D array.
No, there's no copy. When the strides are inappropriate for BLAS, dot (the vector dot version) backs off to a slower routine.
If I remember correctly we discussed it and it proved slower for computing the Frobenius norm.
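If anyone wants to re-check that, a quick comparison along these lines (array size and repeat count are arbitrary) should show whether ddot-plus-sqrt or nrm2 wins; reference nrm2 implementations rescale to avoid overflow, which is often the slower path:

```python
import numpy as np
from scipy import linalg
from timeit import timeit

x = np.random.rand(2000, 1000).ravel()
nrm2, = linalg.get_blas_funcs(['nrm2'], [x])

print(timeit(lambda: np.sqrt(np.dot(x, x)), number=100))  # ddot + sqrt
print(timeit(lambda: nrm2(x), number=100))                # BLAS nrm2
```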
Hm. Maybe we should write more comments :)