MAINT: lstsq: compute residuals inside the ufunc #10890

eric-wieser · 2018-04-12T06:55:14Z

This avoids allocating a larger-than-needed output array.

eric-wieser · 2018-04-12T06:57:28Z

numpy/linalg/umath_linalg.c.src

+                *(npy_int*) args[5] = params.RANK;
+                delinearize_@REALTYPE@_matrix(args[6], params.S, &s_out);
+
+                if (excess >= 0 && params.RANK == n) {


Note that np.linalg.lstsq only actually returns residuals if excess > 0. I'd consider this a bug, as to me the residuals are well defined as 0 when m == n.

Either way, I've deliberately not changed the external interface yet.
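For readers unfamiliar with the condition in the hunk above, here is a minimal sketch of how residuals can be read off the LAPACK `gelsd` output. It relies on the documented `gelsd` property that, when m >= n and the rank equals n, the residual sum of squares for each right-hand side is the sum of squares of rows n..m-1 of the corresponding column of the overwritten B. The function name, its signature, and the plain `double` layout are assumptions for illustration and are not the code in this PR.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the PR): b is the column-major B matrix as
 * overwritten by gelsd, with leading dimension ldb >= m. On success, writes
 * one residual sum of squares per right-hand side into resid and returns 1.
 * Returns 0 when the residuals are not defined by this route
 * (underdetermined system or rank-deficient matrix).
 */
static int residuals_from_gelsd_output(const double *b, size_t m, size_t n,
                                       size_t ldb, size_t nrhs, size_t rank,
                                       double *resid)
{
    if (m < n || rank != n) {
        return 0;
    }
    size_t excess = m - n;  /* number of trailing rows holding residual parts */
    for (size_t i = 0; i < nrhs; i++) {
        const double *tail = b + i * ldb + n;  /* rows n..m-1 of column i */
        double acc = 0.0;
        for (size_t k = 0; k < excess; k++) {
            acc += tail[k] * tail[k];
        }
        resid[i] = acc;
    }
    return 1;
}
```

With m == n the inner loop runs zero times, so the sketch yields a residual of 0, which is the value the comment above argues is well defined in that case.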

eric-wieser · 2018-04-12T06:58:11Z

numpy/linalg/umath_linalg.c.src

+                    }
+                }
+                else {
+                    nan_@REALTYPE@_matrix(args[4], &r_out);


This also never escapes to the external interface, and is replaced by a shape == (0,) array (which makes no sense!)
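As a rough illustration of what filling an output with NaN can look like when the output is addressed through byte strides, the sketch below writes NaN into every element of a strided 1-D buffer. The function name and signature are hypothetical; the real `nan_@REALTYPE@_matrix` helper in `umath_linalg.c.src` takes a matrix descriptor and may differ in detail.

```c
#include <math.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the PR): mark `count` double-precision
 * outputs as undefined by writing NaN into each one. `out` is a byte
 * pointer and `stride_bytes` is the distance between consecutive elements.
 */
static void fill_nan_strided(char *out, size_t count, ptrdiff_t stride_bytes)
{
    const double nan_value = NAN;  /* NAN is a float constant; widened to double */
    for (size_t i = 0; i < count; i++) {
        /* memcpy avoids assuming the destination is suitably aligned */
        memcpy(out + (ptrdiff_t)i * stride_bytes, &nan_value, sizeof nan_value);
    }
}
```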

eric-wieser · 2018-04-12T16:39:08Z

I think #9997 might have been incorrect, which is why this fails.

eric-wieser · 2018-04-17T06:43:05Z

Updated with a manual and probably faster implementation of abs2, rather than dot(x, x.conj()) which needlessly computes a zero for the imaginary part.
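A minimal sketch of the idea, assuming a plain struct for complex values: summing re² + im² directly produces a real result without ever forming the imaginary part that a general complex dot product with the conjugate would compute (and which is always zero here). The type and function names are hypothetical and are not the `@TYPE@_abs2` template from the PR.

```c
#include <stddef.h>

/* Hypothetical complex type for illustration; layout mirrors (real, imag). */
typedef struct {
    double real;
    double imag;
} cdouble_sketch;

/* Sum of squared magnitudes of n complex values, accumulated as a real. */
static double cdouble_abs2_sum(const cdouble_sketch *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += v[i].real * v[i].real + v[i].imag * v[i].imag;
    }
    return acc;
}
```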

pv · 2018-04-17T09:00:07Z

hypot(el.real, el.imag)

pv · 2018-04-17T09:00:26Z

sorry, ignore the last comment

eric-wieser · 2018-04-17T09:07:48Z

That crossed my mind too, but I assume it's only useful when avoiding overflow.

I suppose that by moving this to C, we lose the pairwise summation that numpy normally does. But I doubt that level of precision matters here anyway
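To make the trade-off concrete, here is a sketch contrasting a plain running sum of squares with a recursive pairwise sum. Pairwise summation splits the input in half and adds the two partial sums, which tends to keep rounding error growth slower than a single long accumulation. The function names and the cutoff of 8 elements are assumptions chosen for brevity; this is not NumPy's implementation.

```c
#include <stddef.h>

/* Straightforward left-to-right accumulation of squares. */
static double naive_sum_sq(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * x[i];
    }
    return acc;
}

/* Pairwise variant: recurse on halves, fall back to the naive loop
 * for short runs (the cutoff of 8 is arbitrary in this sketch). */
static double pairwise_sum_sq(const double *x, size_t n)
{
    if (n <= 8) {
        return naive_sum_sq(x, n);
    }
    size_t half = n / 2;
    return pairwise_sum_sq(x, half) + pairwise_sum_sq(x + half, n - half);
}
```

Both functions return the same value whenever every intermediate sum is exactly representable; they can differ in the last bits for long inputs with varied magnitudes.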

This prevents an overly large output array being allocated. It also means the residuals can be handled as a separate out argument in future.

ewmoore · 2018-04-17T13:12:33Z

There’s always the routines from the blas/lapack too.


eric-wieser · 2018-04-17T15:48:04Z

@ewmoore: the best I could find was zdotc. Unfortunately, that's tricky to invoke, as the function prototype changes depending on whether we're using the f2c'd lapack or the Fortran one (a bug in f2c?)

mhvk · 2018-04-17T15:52:00Z

For reference, #3994 is a long-standing request for an abs2 ufunc, with more discussion about pros and cons of approaches. I think if it weren't for not being sure what the name should be, we'd have it already...

eric-wieser · 2018-04-17T16:05:39Z

Even if we had such a ufunc, it would be quite tricky to call it from within this ufunc - in particular, it would end up being called multiple times, since the memory I'm calling it on does not persist across the stacked arrays. With the overhead associated with a ufunc call, that would be prohibitively expensive.

mhvk

Great to wrap the blas functions more completely! Two small comments/questions only.

mhvk · 2018-04-17T21:08:31Z

numpy/linalg/umath_linalg.c.src

+                        @ftyp@ *vector = components + i*m;
+                        /* Numpy and fortran floating types are the same size,
+                         * so this case is safe */
+                        @basetyp@ abs2 = @TYPE@_abs2((@typ@ *)vector, excess);


Am not sure this is worth splitting out as a function: it feels to me like it just becomes less readable.

We're already at 5 levels of indentation here - I didn't want to introduce a 6th as well as #ifdefs.

To me, it seems clearer to try and fold the #ifdefs into the function definitions as much as possible.

Splitting the function also makes it easier to drop in a lapack/blas implementation of them if someone finds that's faster (and cares)

OK, it was not a big deal.

mhvk · 2018-04-17T21:09:36Z

numpy/linalg/umath_linalg.c.src

+                        /* Numpy and fortran floating types are the same size,
+                         * so this case is safe */
+                        @basetyp@ abs2 = @TYPE@_abs2((@typ@ *)vector, excess);
+                        memcpy(


My poor knowledge of C does not really help here, but memcpy feels very odd. Can one not just assign?

If resid + i*r_out.column_strides is not aligned, there's no guarantee that the write will succeed. Unless there's some gufunc magic wrapping us that ensures this is the case, memcpy is safer.

delinearize_@TYPE@_matrix uses memcpy, which is where we copy between numpy and the other lapack buffers - so memcpy is at least consistent.

Also, something-something-strict-aliasing.

I fear I remain confused, but fine to go with something that works!

A simple example: This will not work on some platforms:

    char bytes[8] = {0};
    *(uint32_t *)&bytes[1] = 0x12345678;  // unaligned 32-bit write may fail
    assert(*(uint32_t *)&bytes[0] == 0x00123456);  // assuming big endian

But using memcpy would work every time

Relatedly: https://www.alfonsobeato.net/arm/how-to-access-safely-unaligned-data/
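The memcpy alternative described above can be sketched as a pair of small helpers. Copying through memcpy treats the destination as bytes, so it makes no alignment assumption, and compilers typically turn a fixed-size memcpy into a single load or store where the platform allows it. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value to an address that may not be 4-byte aligned. */
static void store_u32_unaligned(unsigned char *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);
}

/* Read a 32-bit value back from a possibly unaligned address. */
static uint32_t load_u32_unaligned(const unsigned char *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

Calling `store_u32_unaligned(&bytes[1], 0x12345678)` on an 8-byte buffer is well defined regardless of alignment, and `load_u32_unaligned(&bytes[1])` reads the same value back in host byte order.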

Ah, thanks that makes it clearer (link is a good one too)

mhvk · 2018-04-20T17:57:13Z

OK, I think this is all set, so will merge.

eric-wieser added component: numpy.linalg 03 - Maintenance labels Apr 12, 2018

eric-wieser requested a review from mhvk April 12, 2018 06:55

eric-wieser changed the title ~~MAINT: compute residuals inside the ufunc~~ MAINT: lstsq: compute residuals inside the ufunc Apr 12, 2018

eric-wieser commented Apr 12, 2018


eric-wieser force-pushed the linalg-lstsq-ufunc branch from d0eca24 to 0bfdb41 Compare April 12, 2018 09:30

eric-wieser force-pushed the linalg-lstsq-ufunc branch 2 times, most recently from 3c6f110 to 8442391 Compare April 17, 2018 06:42

MAINT: compute residuals inside the ufunc

12114c7

This prevents an overly large output array being allocated. It also means the residuals can be handled as a separate out argument in future.

eric-wieser force-pushed the linalg-lstsq-ufunc branch from 8442391 to 12114c7 Compare April 17, 2018 09:10

mhvk reviewed Apr 17, 2018


mhvk added this to the 1.15.0 release milestone Apr 20, 2018

mhvk merged commit de62ef9 into numpy:master Apr 20, 2018

eric-wieser mentioned this pull request Apr 20, 2018

ENH: broadcast lstsq #8720

Open
