ENH: support for empty matrices in linalg.lstsq #11594

convexset · 2018-07-20T11:44:39Z

Added support for empty matrices in linalg.lstsq with tests

Relates to #8654

tylerjereddy

The CI test failures in numpy/linalg/tests/test_linalg.py are related but not surprising after adding support for handling empty matrices.

There are quite a few tests cases there with empty inputs that expect for LinAlgError to be raised, so those will have to be adjusted I guess.

convexset · 2018-07-20T16:29:08Z

Just adjusted those. I've been having trouble with "building and linking". (See #8654) Wonder if you could offer any advice.

tylerjereddy · 2018-07-20T16:34:06Z

I see you've already made adjustments! I took a look at your workflow--I sometimes have new issues when I start working on another machine, but usually I resort to using a conda environment. I'm working on a mac at the moment using conda and:

python setup.py install --user
python runtests.py -t "numpy/linalg/tests/test_linalg.py"

works ok from within the NumPy repo (runtests.py seems a little more prone to build failure than calling setup.py first in my hands).

tylerjereddy · 2018-07-20T16:40:28Z

I see you are editing code directly in site_packages? I added a comment about that in the linked discussion as that may be a bit unusual if I'm not mistaken.

convexset · 2018-07-20T16:44:48Z

Generally, editing stuff in site_packages would be crazy. It makes it hard to check if the tests pass, so I missed the LinAlgError... The improvement is not so complex that figuring out what changes to make by doing that (and writing the accompanying tests) would not be too problematic (and be verifiable via CI... until my local build issues are resolved).

convexset · 2018-07-20T16:59:03Z

Anyway... installing from the repo into a virtualenv doesn't quite change things... not sure what is wrong with my system...

$ pip install .
Processing /Users/jeremychen/repositories/others/numpy
Building wheels for collected packages: numpy
  Running setup.py bdist_wheel for numpy ... done
  Stored in directory: /private/var/folders/d1/yj_41zfn6hs56_qvndpqqs_c0000gn/T/pip-ephem-wheel-cache-aprsjuuj/wheels/a0/64/8f/9b30c13db561f340b1ca7bb48332c0330242d3fc57e2fe4344
Successfully built numpy
Installing collected packages: numpy
Successfully installed numpy-1.16.0.dev0+ba148ba

then...

Sat Jul 21, 00:55:13 !597 $ python
Python 3.6.5 (default, Mar 30 2018, 06:41:53) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import numpy as np
>>> np.linalg.lstsq(np.arange(6).reshape(3,2), np.ones(3))
__main__:1: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
python(60174,0x7fffa3104380) malloc: *** mach_vm_map(size=18446744072405426176) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
init_dgelsd failed init
(array([-2.00000000e+000, -1.29074075e-231]), array([], dtype=float64), 1073741824, array([-2., -2.]))
>>>

eric-wieser

I appreciate the patch, but adding this many special cases is definitely not the right approach. You should be able to do this much less verbosely by copying the style of #11424

convexset · 2018-07-22T09:16:17Z

Hmmm... lstsq has rather distinct special cases...

When A has some height but no columns and b is non-empty. The norm squared of b’s columns have to be computed for the residuals.

When A is non-empty but b is, we need to compute singular values as per normal. (On hindsight, I would use the first column as a replacement for b.)

The easiest special case is when A is flat, ...

Let me see if there’s a more elegant way to do that.

convexset · 2018-07-22T09:17:18Z

It’s really quite a different operation from svd based on the outputs.

convexset · 2018-07-22T09:42:50Z

... anyway... let me find a machine that doesn't have build problems and I'll look into this again.

convexset · 2018-07-29T13:42:45Z

@eric-wieser: Let's use the branch you're working on.

BTW, on your tests... there should also be a check on the values of the residuals (when they are returned)...

    def test_empty_a_b(self):
        for m, n, n_rhs in [(0, 4, 1), (0, 4, 2), (4, 0, 1), (4, 0, 2), (4, 2, 0)]:
            a = np.arange(m * n).reshape(m, n)
            b = np.ones((m, n_rhs))
            x, residuals, rank, s = linalg.lstsq(a, b, rcond=None)
            assert_(x.shape == (n, n_rhs))
            assert_(tuple(residuals.shape) == ((n_rhs,) if m > n else (0,)))
            ##### HERE #####
            if m > n and n_rhs > 0:
                assert_((residuals == np.full(residuals.shape, m)).all())
            ##### HERE #####
            assert_(rank == min(m, n))
            assert_(s.shape == (min(m, n),))

        for m, n in [(0, 4), (4, 0)]:
            a = np.arange(m * n).reshape(m, n)
            b = np.ones((m,))
            x, residuals, rank, s = linalg.lstsq(a, b, rcond=None)
            assert_(x.shape == (n,))
            assert_(tuple(residuals.shape) == ((1,) if m > n else (0,)))
            ##### HERE #####
            if m > n:
                assert_(residuals[0] == m)
            ##### HERE #####
            assert_(rank == min(m, n))
            assert_(s.shape == (min(m, n),))

... btw, yay pytest.

codecov-io · 2018-07-29T13:44:33Z

Codecov Report

Merging #11594 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #11594      +/-   ##
==========================================
+ Coverage   85.71%   85.72%   +<.01%     
==========================================
  Files         327      327              
  Lines       82068    82086      +18     
==========================================
+ Hits        70347    70365      +18     
  Misses      11721    11721

Impacted Files	Coverage Δ
numpy/linalg/tests/test_linalg.py	`97.14% <100%> (+0.03%)`	⬆️
numpy/linalg/linalg.py	`91.14% <100%> (+0.08%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6105281...3abfc05. Read the comment docs.

convexset · 2018-07-29T13:51:01Z

@eric-wieser: Looking at #11604, I realize that LAPACK can be very particular about inputs and what is valid. It may be easier to just code around LAPACK.

convexset · 2018-07-29T20:00:11Z

Tests are passing but could do with a restart. I'll rebase and push again...

eric-wieser · 2018-07-29T20:00:58Z

I can restart them

convexset · 2018-08-01T08:21:55Z

numpy/linalg/tests/test_linalg.py

@@ -875,10 +875,6 @@ def test_0_size(self):
 class LstsqCases(LinalgSquareTestCase, LinalgNonsquareTestCase):

    def do(self, a, b, tags):
-        if 'size-0' in tags:


@eric-wieser: The entire early return has been removed. The tests below that bit will run.

convexset · 2018-08-01T08:23:50Z

doc/release/1.16.0-notes.rst

@@ -40,6 +40,12 @@ New Features
 Improvements
 ============

+``linalg.lstsq`` now works with empty matrices


@eric-wieser: This will have to be replaced...

Possibly with:

``linalg.qr`` and ``linalg.lstsq`` now work with empty matrices ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Previously, a ``LinAlgError`` would be raised when an empty matrix/empty matrices (with zero rows and/or columns) is/are passed in. Now outputs of appropriate shapes are returned.

Sounds good to me. You should merge / rebase on master so that you can combine the messages

convexset · 2018-08-01T14:48:19Z

Are we keeping the consistent_subclass (see L26 or below) test?

        assert_(consistent_subclass(x, b))
        assert_(consistent_subclass(residuals, b))

My understanding is that matrix is being deprecated.

def consistent_subclass(out, in_):
    # For ndarray subclass input, our output should have the same subclass
    # (non-ndarray input gets converted to ndarray).
    return type(out) is (type(in_) if isinstance(in_, np.ndarray)
                         else np.ndarray)

eric-wieser · 2018-08-01T15:52:08Z

numpy/linalg/linalg.py

    x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
+    if m == 0:
+        x = zeros(x.shape, dtype=x.dtype)


Possibly better: x[...] = 0

eric-wieser · 2018-08-01T15:57:46Z

numpy/linalg/tests/test_linalg.py

+            r = b - np.dot(a, x)
+            assert_almost_equal(residuals, (r * r).sum(axis=-2))
+        assert_equal(rank, min(m, n))
+        assert_equal(s.shape, (min(m, n),))


How much of this block could be replaced with self.do? It seems there's a fair amount of repetition between this test and the one that was used before

I think that part can be left to another enhancement.

Because I would have to add test cases that apply to everything. That's fine, and it would be great.

cond (around L1566) still has _assertNoEmpty2d... and we can just have cond(m)=0 for empty matrix m (because sup_x (|Ax| / |x|) = 0 when |Ax| = 0 for all x).

Currently, the qr code is in master but not in this branch. I could cherry pick it over, but I'm not sure if things might get ugly.

Once the two are in, we won't have any more _assertNoEmpty2d's running about.

Wait... that's the norm...

Then I'm conflicted because there is no inverse of an empty matrix. Maybe that bit with cond should be left as is. LinAlgError is as good an output as any.

The whole test_empty_a_b_1d bit can be exactly performed with self.do but as for the other, test_empty_a_b has b which is 2d. So I'd keep that.

I'd leave cond as is for now. Keeping test_empty_a_b sounds fine to me - just wanted to make sure you'd considered using .do. Switching to using it or hearing an argument why you didn't (b being 2d) are both good responses.

eric-wieser · 2018-08-02T05:45:33Z

numpy/linalg/linalg.py

    x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
+    if m == 0:
+        x[...] = 0


One of the tests should be checking for this case

adding that...

convexset · 2018-08-02T06:17:41Z

I'm adding tests for broadcasting works... but constrained to signature...

a is (... × m × n); b is (... × n × n_rhs)

IMO, accepting a is (... × m × n); b is (... × n) would be confusing.

@eric-wieser: Would that make sense?

eric-wieser · 2018-08-02T06:22:54Z

Don't bother with the broadcasting tests - that can wait till we actually implement it. Let's keep this PR contained to empty matrices, and leaving things as ready for broadcasting (ie, not quite) as they were before

eric-wieser · 2018-08-02T06:24:56Z

I think this is good - all it needs is a rebase - right now the diff is showing the qr changes as well, which are already merged.

Just waiting on a rebase

convexset · 2018-08-02T06:29:50Z

Re: broadcasting, it's almost there anyway... just need to rationalize rank in a backwards compatible manner.

eric-wieser · 2018-08-02T06:32:19Z

@convexset: Please leave that till another issue. There are big problems with resids that are possibly unresolvable without breaking compatibility. rank should generalize without problem.

eric-wieser · 2018-08-02T06:39:30Z

It looks like you've perhaps not fetched from this repository in your local copy, and are rebasing on an old version? I typically do

git remote add upstream https://github.com/numpy/numpy
git fetch upstream

convexset · 2018-08-02T06:41:48Z

Well, I am doing too many rebases and forced pushes. Could I leave the documentation tidying up to you? That would be cleaner.

convexset · 2018-08-02T06:43:41Z

@eric-wieser: rank is a bigger problem because it comes as an int, except in the vectorized case.

eric-wieser · 2018-08-02T06:48:14Z

Could I leave the documentation tidying up to you

You can, but it would be nice to know what's not working your end.

it comes as an int, except in the vectorized case.

The gufunc machinery handles this all for us, so this isn't a concern.

convexset · 2018-08-02T06:53:08Z

@eric-wieser: Just combining the qr and lstsq release notes. Resolving the merge conflict causes a big mess.

Re: Broadcasting, you're right. The residuals suck. We should return arrays of the same shape regardless of rank or shape of a. That's the most sensible breaking change.

Rank is a small matter, LAPACK either returns an array or an integer. The "shapes" make sense.

convexset · 2018-08-02T07:18:42Z

To make broadcasting work, we can just calculate the residuals:

    # ...
    x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
    r = abs(einsum('...mn,...nq', a, x) - b)
    resids = (r * r).sum(axis=-2)
    # ...

and get rid of...

    # ...
    # as documented
    if rank != n or m <= n:
        resids = array([], result_real_t)
    # ...

(We also have to import einsum.)

The tests that pass:

    @pytest.mark.parametrize(["p_a", "p_b", "m", "n", "n_rhs"], [
        (2, 1, 4, 2, 3),
        (1, 2, 4, 2, 3),
        (2, 2, 4, 2, 3),
        (2, 1, 3, 3, 3),
        (1, 2, 3, 3, 3),
        (2, 2, 3, 3, 3),
        (2, 1, 4, 2, 1),
        (1, 2, 4, 2, 1),
        (2, 2, 4, 2, 1),
        (2, 1, 3, 3, 1),
        (1, 2, 3, 3, 1),
        (2, 2, 3, 3, 1),
    ])
    def test_broadcast_a_b(self, p_a, p_b, m, n, n_rhs):
        # validate test parameters
        assert_((p_a == p_b) or (p_a == 1) or (p_b == 1))

        a = np.random.random((p_a, m, n))
        x = np.random.random((p_b, n, n_rhs))
        b = np.einsum('...mn,...nq', a, x)
        _x, residuals, rank, s = linalg.lstsq(a, b, rcond=None)
        if p_a > 1 and p_b == 1:
            for __x in _x:
                assert_almost_equal(__x, x[0])
        # elif p_b > 1 and p_a == 1:
        #     assert_almost_equal(x, _x)
        else:
            assert_almost_equal(x, _x)
        assert_equal(rank, np.full((max(p_a, p_b),), min(m, n)))
        assert_equal(s.shape, (max(p_a, p_b), min(m, n)))
        assert_equal(residuals.shape, ((max(p_a, p_b), n_rhs)))

Rank comes out as an int unless we are doing broadcasting, in which case it will be an array.

The change: the residuals match the dimensions and are not conditional on rank.

…exset/numpy into handle-empty-matrices-lstsq * 'handle-empty-matrices-lstsq' of https://github.com/convexset/numpy: ENH: support for empty matrices in linalg.lstsq

convexset · 2018-08-02T16:25:10Z

Rebased on master; Updated documentation.

eric-wieser

All looks good now, thanks!

convexset · 2018-08-02T17:21:22Z

Great.

mattip · 2018-08-07T23:47:52Z

Thanks @convexset

This was referenced Jul 20, 2018

ENH: Implement np.linalg.{svd,qr,lstsq} for matrices with .size == 0 #8654

Closed

np.linalg.qr: handle empty matrices #11412

Closed

tylerjereddy added 01 - Enhancement component: numpy.linalg labels Jul 20, 2018

tylerjereddy reviewed Jul 20, 2018

View reviewed changes

tylerjereddy mentioned this pull request Jul 20, 2018

ENH: handle empty matrices in qr decomposition #11593

Merged

convexset force-pushed the handle-empty-matrices-lstsq branch from 6f3b1ad to a78a044 Compare July 20, 2018 18:04

eric-wieser previously requested changes Jul 22, 2018

View reviewed changes

eric-wieser mentioned this pull request Jul 23, 2018

WIP: Allow lstsq on empty arrays #11604

Closed

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request Jul 23, 2018

TST: Add tests adapted from numpygh-11594

f031527

convexset force-pushed the handle-empty-matrices-lstsq branch from 100baa7 to f7002b2 Compare July 29, 2018 13:20

convexset closed this Jul 29, 2018

convexset reopened this Jul 29, 2018

convexset force-pushed the handle-empty-matrices-lstsq branch from 0e2c5e4 to 0d75715 Compare July 29, 2018 15:07

eric-wieser closed this Jul 29, 2018

convexset commented Aug 1, 2018

View reviewed changes

eric-wieser reviewed Aug 1, 2018

View reviewed changes

eric-wieser reviewed Aug 2, 2018

View reviewed changes

convexset force-pushed the handle-empty-matrices-lstsq branch 2 times, most recently from f3cb117 to 82d5c1d Compare August 2, 2018 06:37

ENH: support for empty matrices in linalg.lstsq

cb0fc23

convexset force-pushed the handle-empty-matrices-lstsq branch from 82d5c1d to cb0fc23 Compare August 2, 2018 06:40

convexset added 3 commits August 3, 2018 00:18

ENH: support for empty matrices in linalg.lstsq

45d8c5d

Merge branch 'handle-empty-matrices-lstsq' of https://github.com/conv…

2c05f70

…exset/numpy into handle-empty-matrices-lstsq * 'handle-empty-matrices-lstsq' of https://github.com/convexset/numpy: ENH: support for empty matrices in linalg.lstsq

cleaned up documentation

3abfc05

eric-wieser approved these changes Aug 2, 2018

View reviewed changes

mattip merged commit 77a86c9 into numpy:master Aug 7, 2018

Uh oh!

ENH: support for empty matrices in linalg.lstsq #11594

ENH: support for empty matrices in linalg.lstsq #11594

Uh oh!

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!