-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: support for empty matrices in linalg.lstsq #11594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI test failures in numpy/linalg/tests/test_linalg.py
are related but not surprising after adding support for handling empty matrices.
There are quite a few tests cases there with empty inputs that expect for LinAlgError
to be raised, so those will have to be adjusted I guess.
Just adjusted those. I've been having trouble with "building and linking". (See #8654) Wonder if you could offer any advice. |
I see you've already made adjustments! I took a look at your workflow--I sometimes have new issues when I start working on another machine, but usually I resort to using a conda environment. I'm working on a mac at the moment using conda and:
works ok from within the NumPy repo ( |
I see you are editing code directly in |
Generally, editing stuff in |
Anyway... installing from the repo into a virtualenv doesn't quite change things... not sure what is wrong with my system...
then...
|
6f3b1ad
to
a78a044
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate the patch, but adding this many special cases is definitely not the right approach. You should be able to do this much less verbosely by copying the style of #11424
Hmmm... When A has some height but no columns and b is non-empty. The norm squared of b’s columns have to be computed for the residuals. When A is non-empty but b is, we need to compute singular values as per normal. (On hindsight, I would use the first column as a replacement for b.) The easiest special case is when A is flat, ... Let me see if there’s a more elegant way to do that. |
It’s really quite a different operation from svd based on the outputs. |
... anyway... let me find a machine that doesn't have build problems and I'll look into this again. |
100baa7
to
f7002b2
Compare
@eric-wieser: Let's use the branch you're working on. BTW, on your tests... there should also be a check on the values of the residuals (when they are returned)... def test_empty_a_b(self):
for m, n, n_rhs in [(0, 4, 1), (0, 4, 2), (4, 0, 1), (4, 0, 2), (4, 2, 0)]:
a = np.arange(m * n).reshape(m, n)
b = np.ones((m, n_rhs))
x, residuals, rank, s = linalg.lstsq(a, b, rcond=None)
assert_(x.shape == (n, n_rhs))
assert_(tuple(residuals.shape) == ((n_rhs,) if m > n else (0,)))
##### HERE #####
if m > n and n_rhs > 0:
assert_((residuals == np.full(residuals.shape, m)).all())
##### HERE #####
assert_(rank == min(m, n))
assert_(s.shape == (min(m, n),))
for m, n in [(0, 4), (4, 0)]:
a = np.arange(m * n).reshape(m, n)
b = np.ones((m,))
x, residuals, rank, s = linalg.lstsq(a, b, rcond=None)
assert_(x.shape == (n,))
assert_(tuple(residuals.shape) == ((1,) if m > n else (0,)))
##### HERE #####
if m > n:
assert_(residuals[0] == m)
##### HERE #####
assert_(rank == min(m, n))
assert_(s.shape == (min(m, n),)) ... btw, yay |
Codecov Report
@@ Coverage Diff @@
## master #11594 +/- ##
==========================================
+ Coverage 85.71% 85.72% +<.01%
==========================================
Files 327 327
Lines 82068 82086 +18
==========================================
+ Hits 70347 70365 +18
Misses 11721 11721
Continue to review full report at Codecov.
|
@eric-wieser: Looking at #11604, I realize that LAPACK can be very particular about inputs and what is valid. It may be easier to just code around LAPACK. |
0e2c5e4
to
0d75715
Compare
Tests are passing but could do with a restart. I'll rebase and push again... |
I can restart them |
@@ -875,10 +875,6 @@ def test_0_size(self): | |||
class LstsqCases(LinalgSquareTestCase, LinalgNonsquareTestCase): | |||
|
|||
def do(self, a, b, tags): | |||
if 'size-0' in tags: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eric-wieser: The entire early return has been removed. The tests below that bit will run.
doc/release/1.16.0-notes.rst
Outdated
@@ -40,6 +40,12 @@ New Features | |||
Improvements | |||
============ | |||
|
|||
``linalg.lstsq`` now works with empty matrices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eric-wieser: This will have to be replaced...
Possibly with:
``linalg.qr`` and ``linalg.lstsq`` now work with empty matrices
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Previously, a ``LinAlgError`` would be raised when an empty matrix/empty
matrices (with zero rows and/or columns) is/are passed in. Now outputs of
appropriate shapes are returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. You should merge / rebase on master so that you can combine the messages
Are we keeping the
My understanding is that
|
numpy/linalg/linalg.py
Outdated
x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj) | ||
if m == 0: | ||
x = zeros(x.shape, dtype=x.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly better: x[...] = 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
r = b - np.dot(a, x) | ||
assert_almost_equal(residuals, (r * r).sum(axis=-2)) | ||
assert_equal(rank, min(m, n)) | ||
assert_equal(s.shape, (min(m, n),)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much of this block could be replaced with self.do
? It seems there's a fair amount of repetition between this test and the one that was used before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that part can be left to another enhancement.
Because I would have to add test cases that apply to everything. That's fine, and it would be great.
cond
(around L1566) still has _assertNoEmpty2d
... and we can just have cond(m)=0
for empty matrix m
(because sup_x (|Ax| / |x|) = 0 when |Ax| = 0 for all x).
Currently, the qr
code is in master but not in this branch. I could cherry pick it over, but I'm not sure if things might get ugly.
Once the two are in, we won't have any more _assertNoEmpty2d
's running about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait... that's the norm...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I'm conflicted because there is no inverse of an empty matrix. Maybe that bit with cond
should be left as is. LinAlgError
is as good an output as any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole test_empty_a_b_1d
bit can be exactly performed with self.do
but as for the other, test_empty_a_b
has b
which is 2d. So I'd keep that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave cond
as is for now. Keeping test_empty_a_b
sounds fine to me - just wanted to make sure you'd considered using .do
. Switching to using it or hearing an argument why you didn't (b
being 2d) are both good responses.
x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj) | ||
if m == 0: | ||
x[...] = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the tests should be checking for this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding that...
I'm adding tests for broadcasting works... but constrained to signature...
IMO, accepting @eric-wieser: Would that make sense? |
Don't bother with the broadcasting tests - that can wait till we actually implement it. Let's keep this PR contained to empty matrices, and leaving things as ready for broadcasting (ie, not quite) as they were before |
I think this is good - all it needs is a rebase - right now the diff is showing the qr changes as well, which are already merged. |
Re: broadcasting, it's almost there anyway... just need to rationalize |
@convexset: Please leave that till another issue. There are big problems with |
f3cb117
to
82d5c1d
Compare
It looks like you've perhaps not fetched from this repository in your local copy, and are rebasing on an old version? I typically do
|
82d5c1d
to
cb0fc23
Compare
Well, I am doing too many rebases and forced pushes. Could I leave the documentation tidying up to you? That would be cleaner. |
@eric-wieser: |
You can, but it would be nice to know what's not working your end.
The |
@eric-wieser: Just combining the Re: Broadcasting, you're right. The residuals suck. We should return arrays of the same shape regardless of rank or shape of Rank is a small matter, LAPACK either returns an array or an integer. The "shapes" make sense. |
To make broadcasting work, we can just calculate the residuals: # ...
x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
r = abs(einsum('...mn,...nq', a, x) - b)
resids = (r * r).sum(axis=-2)
# ... and get rid of... # ...
# as documented
if rank != n or m <= n:
resids = array([], result_real_t)
# ... (We also have to import The tests that pass: @pytest.mark.parametrize(["p_a", "p_b", "m", "n", "n_rhs"], [
(2, 1, 4, 2, 3),
(1, 2, 4, 2, 3),
(2, 2, 4, 2, 3),
(2, 1, 3, 3, 3),
(1, 2, 3, 3, 3),
(2, 2, 3, 3, 3),
(2, 1, 4, 2, 1),
(1, 2, 4, 2, 1),
( Rank comes out as an The change: the residuals match the dimensions and are not conditional on rank. |
…exset/numpy into handle-empty-matrices-lstsq * 'handle-empty-matrices-lstsq' of https://github.com/convexset/numpy: ENH: support for empty matrices in linalg.lstsq
Rebased on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good now, thanks!
Great. |
Thanks @convexset |
Added support for empty matrices in linalg.lstsq with tests
Relates to #8654