[WIP] FIX Projected Gradient NMF stopping condition by vene · Pull Request #2557 · scikit-learn/scikit-learn

Closed · wants to merge 1 commit
15 changes: 8 additions & 7 deletions sklearn/decomposition/nmf.py
@@ -499,17 +499,18 @@ def fit_transform(self, X, y=None):
- safe_sparse_dot(X, H.T, dense_output=True))
gradH = (np.dot(np.dot(W.T, W), H)
- safe_sparse_dot(W.T, X, dense_output=True))
init_grad = norm(np.r_[gradW, gradH.T])
tolW = max(0.001, self.tol) * init_grad # why max?
init_grad = (gradW ** 2).sum() + (gradH ** 2).sum()
Member:
(X ** 2).sum() is memory-intensive and slow: squaring allocates a temporary array as large as X itself. I'd do it this way:

def sqnorm(x):
    x = x.ravel()
    return np.dot(x, x)

then redefine norm as sqrt(sqnorm(x)). This way, the memory overhead is constant instead of linear and BLAS's ddot can be used to compute the squared norm in a single pass.
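
For concreteness, a self-contained sketch of this suggestion; gradW and gradH are random stand-ins here, not the real NMF gradients:

import numpy as np

def sqnorm(x):
    # squared Frobenius norm: ravel() is a view for C-contiguous input,
    # and np.dot on the 1-D view dispatches to BLAS ddot (one pass, O(1) extra memory)
    x = x.ravel()
    return np.dot(x, x)

rng = np.random.RandomState(0)
gradW = rng.randn(200, 5)
gradH = rng.randn(5, 300)

# the quantity the patch assigns to init_grad, computed via sqnorm
init_grad = sqnorm(gradW) + sqnorm(gradH)

# agrees with the Frobenius norm of the stacked gradient used before the patch
assert np.allclose(np.sqrt(init_grad), np.linalg.norm(np.r_[gradW, gradH.T]))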

Member:
> def sqnorm(x):
>     x = x.ravel()
>     return np.dot(x, x)

Actually, maybe you can do something like:

   if x.flags['F_CONTIGUOUS']:
      x = x.T.ravel()
   else:
      x = x.ravel()

to avoid a memory copy for fortran-ordered arrays (the ravel is already
avoiding one for C-ordered arrays).

This function should be a helper that goes in utils, IMHO
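
A sketch of what such a helper could look like if it also covers Fortran-ordered input as suggested (hypothetical name and location, not the actual sklearn utility):

import numpy as np

def squared_norm(x):
    # hypothetical utils helper: flatten without copying for either memory
    # layout, then let np.dot use BLAS ddot on the 1-D view
    if x.flags['F_CONTIGUOUS']:
        x = x.T.ravel()   # x.T is C-contiguous, so this ravel is a view
    else:
        x = x.ravel()     # view for C-contiguous input
    return np.dot(x, x)

W = np.asfortranarray(np.arange(12, dtype=float).reshape(3, 4))
assert not np.shares_memory(W, W.ravel())   # plain ravel() copies F-ordered data
assert np.shares_memory(W, W.T.ravel())     # transposing first avoids the copy
assert squared_norm(W) == (W ** 2).sum()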

Member:
Or x.ravel(order='K'), "to view the elements in the order they occur in memory, except for reversing the data when strides are negative" (docs).

X.squeeze() has the same guarantee of never copying, but leaves negative strides alone which may impact BLAS performance. It's a tad faster, though.
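
A quick check of the order='K' behaviour being discussed (illustration only, not part of the patch):

import numpy as np

A = np.asfortranarray(np.arange(6, dtype=float).reshape(2, 3))
B = np.ascontiguousarray(A)

# order='K' flattens in memory order, so it stays a view for both layouts
assert np.shares_memory(A, A.ravel(order='K'))   # Fortran-ordered input
assert np.shares_memory(B, B.ravel(order='K'))   # C-ordered input

# the element order differs from a C-order ravel, but a squared norm
# does not depend on the ordering
flat = A.ravel(order='K')
assert np.dot(flat, flat) == (A ** 2).sum()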

Member:
Btw., we have a different definition of norm in sklearn.utils.extmath which uses the BLAS nrm2 function...
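
For reference, an nrm2-based norm can be obtained through SciPy roughly like this (a sketch only; not necessarily how sklearn.utils.extmath implements it):

import numpy as np
from scipy.linalg import get_blas_funcs

def blas_norm(x):
    # Euclidean/Frobenius norm via the BLAS *nrm2 routine matching x's dtype
    x = np.asarray(x).ravel()
    nrm2, = get_blas_funcs(('nrm2',), (x,))
    return nrm2(x)

x = np.arange(5, dtype=np.float64)
assert np.isclose(blas_norm(x), np.sqrt(np.dot(x, x)))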

Member:
> Or x.ravel(order='K'), "to view the elements in the order they occur in memory, except for reversing the data when strides are negative" (docs).

Indeed. Is this in the oldest version of numpy that we support? If not,
should we raise our requirements?

> X.squeeze() has the same guarantee of never copying, but leaves negative strides alone which may impact BLAS performance. It's a tad faster, though.

The negative strides will induce a copy before the BLAS call, I believe. But anyhow, it does not have the same semantics as ravel on a 2D array.

Member:
No, there's no copy. When the strides are inappropriate for BLAS, dot (the vector dot version) backs off to a slower routine.

Member Author:
> Btw., we have a different definition of norm in sklearn.utils.extmath which uses the BLAS nrm2 function...

If I remember correctly we discussed it and it proved slower for computing the Frobenius norm.

Member:
Hm. Maybe we should write more comments :)

tolW = max(0.001, self.tol) * np.sqrt(init_grad) # why max?
tolH = tolW

tol = self.tol * init_grad
tol = init_grad * self.tol ** 2
Member:
I would rather not change the tol. I think the following stopping criterion would be easier to follow:
if current_norm / init_norm < tol: break

This is intuitive because the sum of the projected gradient norms is supposed to become smaller and smaller compared to init_norm.
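
For what it's worth, the tol = init_grad * self.tol ** 2 line in the diff and the relative test suggested here amount to the same check once everything is kept as squared norms; a small sketch with stand-in numbers:

import numpy as np

tol = 1e-4
init_grad_sq = 123.4     # stand-in for the squared initial gradient norm
proj_norm_sq = 5.6e-7    # stand-in for the current squared projected-gradient norm

# what the patched code tests: squared norm against init_grad * tol ** 2 ...
diff_condition = proj_norm_sq < init_grad_sq * tol ** 2
# ... which is the relative-norm test written without square roots
reviewer_condition = np.sqrt(proj_norm_sq) / np.sqrt(init_grad_sq) < tol
assert diff_condition == reviewer_condition   # equivalent up to rounding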


for n_iter in range(1, self.max_iter + 1):
# stopping condition
# as discussed in paper
proj_norm = norm(np.r_[gradW[np.logical_or(gradW < 0, W > 0)],
gradH[np.logical_or(gradH < 0, H > 0)]])
# stopping condition on the norm of the projected gradient
proj_norm = (
Member:
I find the code a bit hard to follow. I would rather define variables proj_grad_W and proj_grad_H and compute the Frobenius norms of those.

((gradW * np.logical_or(gradW < 0, W > 0)) ** 2).sum() +
((gradH * np.logical_or(gradH < 0, H > 0)) ** 2).sum())
Member:
Same here. In fact the code would get more readable with a sqnorm helper.


if proj_norm < tol:
break
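
Combining the last two suggestions, the stopping check could be spelled out with explicit proj_grad_W / proj_grad_H variables and a sqnorm helper, e.g. (a readability sketch with stand-in arrays, not the final patch):

import numpy as np

def sqnorm(x):
    x = x.ravel()
    return np.dot(x, x)

rng = np.random.RandomState(0)
W = np.abs(rng.randn(30, 4))
W[W < 0.5] = 0            # put some entries at the non-negativity boundary
H = np.abs(rng.randn(4, 50))
H[H < 0.5] = 0
gradW, gradH = rng.randn(30, 4), rng.randn(4, 50)

# keep only gradient entries that still allow feasible descent
# (negative gradient, or a strictly positive variable)
proj_grad_W = gradW * np.logical_or(gradW < 0, W > 0)
proj_grad_H = gradH * np.logical_or(gradH < 0, H > 0)
proj_norm = sqnorm(proj_grad_W) + sqnorm(proj_grad_H)

# same quantity as the boolean-indexing expression in the diff
ref = ((gradW[np.logical_or(gradW < 0, W > 0)] ** 2).sum()
       + (gradH[np.logical_or(gradH < 0, H > 0)] ** 2).sum())
assert np.allclose(proj_norm, ref)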
