8000 ENH: support for empty matrices in linalg.lstsq by convexset · Pull Request #11594 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: support for empty matrices in linalg.lstsq #11594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 7, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
ENH: support for empty matrices in linalg.lstsq
  • Loading branch information
convexset committed Aug 2, 2018
commit 45d8c5d1562007492c459f290e16cbbf99c72e1c
6 changes: 6 additions & 0 deletions doc/release/1.16.0-notes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,12 @@ New Features
Improvements
============

``linalg.lstsq`` now works with empty matrices
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric-wieser: This will have to be replaced...

Possibly with:

``linalg.qr`` and ``linalg.lstsq`` now work with empty matrices
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Previously, a ``LinAlgError`` would be raised when an empty matrix/empty
matrices (with zero rows and/or columns) is/are passed in. Now outputs of
appropriate shapes are returned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. You should merge / rebase on master so that you can combine the messages

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Previously, a ``LinAlgError`` would be raised when an empty matrix/empty
matrices (with zero rows and/or columns) is passed in. Now outputs of
appropriate shapes are returned.

``randint`` and ``choice`` now work on empty distributions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Even when no elements needed to be drawn, ``np.random.randint`` and
Expand Down
10 changes: 9 additions & 1 deletion numpy/linalg/linalg.py
Original file line number Diff line number Diff line change
Expand Up @@ -2115,7 +2115,6 @@ def lstsq(a, b, rcond="warn"):
if is_1d:
b = b[:, newaxis]
_assertRank2(a, b)
_assertNoEmpty2d(a, b) # TODO: relax this constraint
m, n = a.shape[-2:]
m2, n_rhs = b.shape[-2:]
if m != m2:
Expand Down Expand Up @@ -2146,7 +2145,16 @@ def lstsq(a, b, rcond="warn"):

signature = 'DDd->Ddid' if isComplexType(t) else 'ddd->ddid'
extobj = get_linalg_error_extobj(_raise_linalgerror_lstsq)
if n_rhs == 0:
# lapack can't handle n_rhs = 0 - so allocate the array one larger in that axis
b = zeros(b.shape[:-2] + (m, n_rhs + 1), dtype=b.dtype)
x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
if m == 0:
x[...] = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the tests should be checking for this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding that...

if n_rhs == 0:
# remove the item we added
x = x[..., :n_rhs]
resids = resids[..., :n_rhs]

# remove the axis we added
if is_1d:
Expand Down
30 changes: 26 additions & 4 deletions numpy/linalg/tests/test_linalg.py
Original file line number Diff line number Diff line change
Expand Up @@ -875,14 +875,12 @@ def test_0_size(self):
class LstsqCases(LinalgSquareTestCase, LinalgNonsquareTestCase):

def do(self, a, b, tags):
if 'size-0' in tags:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric-wieser: The entire early return has been removed. The tests below that bit will run.

assert_raises(LinAlgError, linalg.lstsq, a, b)
return

arr = np.asarray(a)
m, n = arr.shape
u, s, vt = linalg.svd(a, 0)
x, residuals, rank, sv = linalg.lstsq(a, b, rcond=-1)
if m == 0:
assert_((x == 0).all())
if m <= n:
assert_almost_equal(b, dot(a, x))
assert_equal(rank, m)
Expand Down Expand Up @@ -923,6 +921,30 @@ def test_future_rcond(self):
# Warning should be raised exactly once (first command)
assert_(len(w) == 1)

@pytest.mark.parametrize(["m", "n", "n_rhs"], [
(4, 2, 2),
(0, 4, 1),
(0, 4, 2),
(4, 0, 1),
(4, 0, 2),
(4, 2, 0),
(0, 0, 0)
])
def test_empty_a_b(self, m, n, n_rhs):
a = np.arange(m * n).reshape(m, n)
b = np.ones((m, n_rhs))
x, residuals, rank, s = linalg.lstsq(a, b, rcond=None)
if m == 0:
assert_((x == 0).all())
assert_equal(x.shape, (n, n_rhs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for the contents of x? Should it be 0s in the (0, 4, 2) case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x might be full of uninitialized nonsense like:

[[3.10503618e+231 0.00000000e+000]
 [3.10503618e+231 0.00000000e+000]
 [2.96439388e-323 0.00000000e+000]
 [0.00000000e+000 0.00000000e+000]]

in the case where a is 0 × n and b is 0 × n_rhs.

We could fill it with zeros, of course.

Copy link
Contributor Author
@convexset convexset Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but the test for the residuals below are enough. I'm adding a regular (non-empty) case to make sure things are correct.

Copy link
Member
@eric-wieser eric-wieser Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x might be full of uninitialized nonsense like:

Worse, it might be filled with nan or inf, which would make your tests fail but only extremely rarely. The whole point of fixing these 0-size cases is to ensure the arrays are not filled with nonsense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nan's and inf's... actually they won't but I prefer 0's

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I'm not advocating that we fill with nan / inf - I'm saying that some invariants you'd expect for uninitialized nonsense, such as 0 * x == 0, do not hold if the nonsense is non-finite. Perhaps that doesn't matter here

Copy link
Contributor Author
@convexset convexset Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course not everyone hates nan's... but when a is 0 × n, post multiplying will have a first dimension of 0. =P

(Actually, they are pretty good for debugging numerical PDE solvers.)

assert_equal(residuals.shape, ((n_rhs,) if m > n else (0,)))
if m > n and n_rhs > 0:
# residuals are exactly the squared norms of b's columns
r = b - np.dot(a, x)
assert_almost_equal(residuals, (r * r).sum(axis=-2))
assert_equal(rank, min(m, n))
assert_equal(s.shape, (min(m, n),))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much of this block could be replaced with self.do? It seems there's a fair amount of repetition between this test and the one that was used before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that part can be left to another enhancement.

Because I would have to add test cases that apply to everything. That's fine, and it would be great.

cond (around L1566) still has _assertNoEmpty2d... and we can just have cond(m)=0 for empty matrix m (because sup_x (|Ax| / |x|) = 0 when |Ax| = 0 for all x).

Currently, the qr code is in master but not in this branch. I could cherry pick it over, but I'm not sure if things might get ugly.

Once the two are in, we won't have any more _assertNoEmpty2d's running about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait... that's the norm...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'm conflicted because there is no inverse of an empty matrix. Maybe that bit with cond should be left as is. LinAlgError is as good an output as any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole test_empty_a_b_1d bit can be exactly performed with self.do but as for the other, test_empty_a_b has b which is 2d. So I'd keep that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave cond as is for now. Keeping test_empty_a_b sounds fine to me - just wanted to make sure you'd considered using .do. Switching to using it or hearing an argument why you didn't (b being 2d) are both good responses.



class TestMatrixPower(object):
R90 = array([[0, 1], [-1, 0]])
Expand Down
0