-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
MAINT: lstsq: compute residuals inside the ufunc #10890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
numpy/linalg/umath_linalg.c.src
Outdated
*(npy_int*) args[5] = params.RANK; | ||
delinearize_@REALTYPE@_matrix(args[6], params.S, &s_out); | ||
|
||
if (excess >= 0 && params.RANK == n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that np.linalg.lstsq
only actually returns residuals if excess > 0
. I'd consider this a bug, as to me the residuals are well defined as 0
when m == n
.
Either way, I've deliberately not changed the external interface yet.
numpy/linalg/umath_linalg.c.src
Outdated
} | ||
} | ||
else { | ||
nan_@REALTYPE@_matrix(args[4], &r_out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also never escapes to the external interface, and is replaced by an shape == (0,)
array (which makes no sense!)
d0eca24
to
0bfdb41
Compare
I think #9997 might have been incorrect, which is why this fails. |
3c6f110
to
8442391
Compare
Updated with a manual and probably faster implementation of |
hypot(el.real, el.imag)
|
sorry, ignore the last comment
|
That crossed my mind too, but I assume it's only useful when avoiding overflow. I suppose that by moving this to C, we lose the pairwise summation that numpy normally does. But I doubt that level of precision matters here anyway |
This prevents an overly large output array being allocated. It also means the the residuals can be handled as a separate out argument in future.
8442391
to
12114c7
Compare
There’s always the routines from the blas/lapack too.
…On Tue, Apr 17, 2018 at 5:07 AM Eric Wieser ***@***.***> wrote:
That crossed my mind too, but I assume it's only useful when avoiding
overflow.
I suppose that by moving this to C, we lose the pairwise summation that
numpy normally does. But I doubt that level of precision matters here anyway
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#10890 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ACNRkf5iwHsrxRSZWUxr6t2q8hhwsEhkks5tpbDpgaJpZM4TRPDL>
.
|
@ewmoore: the best I could find was |
For reference, #3994 is a long-standing request for an |
Even if we had such a ufunc, it would be quite tricky to call it from within this ufunc - in particular, it would end up being called multiple times, since the memory I'm calling it on does not pers 8000 ist across the stacked arrays. With the overhead associated with a ufunc call, that would be prohibitively expensive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to wrap the blas functions more completely! Two small comments/questions only.
@ftyp@ *vector = components + i*m; | ||
/* Numpy and fortran floating types are the same size, | ||
* so this case is safe */ | ||
@basetyp@ abs2 = @TYPE@_abs2((@typ@ *)vector, excess); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am not sure this is worth splitting out as a function: it feels to me like it just becomes less readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're already at 5 levels of indentation here - I didn't want to introduce a 6th as well as #ifdefs
.
To me, it seems clearer to try and fold the #ifdefs
into the function definitions as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting the function also makes it easier to drop in a lapack/blas implementation of them if someone finds that's faster (and cares)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it was not a big deal.
/* Numpy and fortran floating types are the same size, | ||
* so this case is safe */ | ||
@basetyp@ abs2 = @ 8000 TYPE@_abs2((@typ@ *)vector, excess); | ||
memcpy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My poor knowledge of C does not really help here, but memcpy
feels very odd. Can one not just assign?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If resid + i*r_out.column_strides
is not aligned, there's no guarantee that the write will succeed. Unless there's some gufunc magic wrapping us that ensures this is the case, memcpy
is safer.
delinearize_@TYPE@_matrix
uses memcpy
, which is where we copy between numpy and the other lapack buffers - so memcpy
is at least consistent.
Also, something-something-strict-aliasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear I remain confused, but fine to go with something that works!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simple example: This will not work on some platforms:
char bytes[8] = {0};
(uint32_t *)&bytes[1] = 0x12345678; // unaligned 32-bit write may fail
assert((uint32_t *)&bytes[0] == 0x00123456); // assuming big endian
But using memcpy
would work every time
Relatedly: https://www.alfonsobeato.net/arm/how-to-access-safely-unaligned-data/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks that makes it clearer (link is a good one too)
OK, I think this is all set, so will merge. |
This saves a larger-than-needed output array from being allocated