8000 ENH: Added libdivide for floor divide by ganesh-k13 · Pull Request #17727 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: Added libdivide for floor divide #17727

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 32 commits into from
Dec 2, 2020
Merged
Changes from 1 commit
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
179038f
ENH: Added libdiv
ganesh-k13 Nov 7, 2020
e89175b
ENH: Fixed typos in header | use in2 over ip2
ganesh-k13 Nov 7, 2020
565759b
ENH: Added optimal divisor
ganesh-k13 Nov 8, 2020
d0c934c
ENH: Added libdivide header
ganesh-k13 Nov 8, 2020
b02399a
ENH: Made libdivide default
ganesh-k13 Nov 8, 2020
f0ddb7c
ENH: Handled divide by 0 case
ganesh-k13 Nov 8, 2020
72dcc04
ENH: Added libdivide zlib license
ganesh-k13 Nov 9, 2020
19835d2
ENH: Removed empty structure
ganesh-k13 Nov 10, 2020
3975a28
ENH: Auto generate libdivide structs
ganesh-k13 Nov 11, 2020
90e6cf5
ENH: Logic to optimize %
ganesh-k13 Nov 11, 2020
969aa03
ENH: Fix breaking case
ganesh-k13 Nov 11, 2020
44a3a31
ENH: Change comments
ganesh-k13 Nov 11, 2020
b3d70ef
ENH: Improved floor division (#17727)
ganesh-k13 Nov 11, 2020
931134b
ENH: Added asv benchmarks
ganesh-k13 Nov 11, 2020
6e2e281
ENH: Change comments
ganesh-k13 Nov 12, 2020
90a84af
ENH: Linting
ganesh-k13 Nov 12, 2020
61c3d38
MAINT: Added libdivide as linguist-vendored
ganesh-k13 Nov 12, 2020
827bc38
ENH: Removed legacy division
ganesh-k13 Nov 12, 2020
0ce0ebd
ENH: Improved floor division (#17727)
ganesh-k13 Nov 12, 2020
c85c44a
ENH: Added libdivide to timedelta
ganesh-k13 Nov 13, 2020
0517f13
TST: Added UT for floor divide
ganesh-k13 Nov 20, 2020
a769d6f
ENH: Improved floor division (#17727)
ganesh-k13 Nov 21, 2020
0e2116f
ENH: Optimized 0 divisor cases
ganesh-k13 Nov 21, 2020
f93ca93
TST: Minor changes to floor divide | Added cases for timedelta divide
ganesh-k13 Nov 21, 2020
285d810
ENH: Remove looping definitions | Renamed fast loop macros
ganesh-k13 Nov 22, 2020
9825795
ENH: Removed unsed macro check
ganesh-k13 Nov 22, 2020
1f104fd
BUG: Added better 0 checks
ganesh-k13 Nov 23, 2020
2fde590
BENCH: Added floor divide benchmarks (#17727)
ganesh-k13 Nov 23, 2020
8912ffd
DOC: Improved floor division (#17727)
ganesh-k13 Nov 23, 2020
a5e1235
BENCH: Improve floor divide benchmarks (#17727)
ganesh-k13 Nov 23, 2020
ca4ba20
BUG,TST: Fixed division by 0 status setting
ganesh-k13 Nov 23, 2020
28aa883
MAINT: Linting fixes
ganesh-k13 Dec 1, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
ENH: Auto generate libdivide structs
  • Loading branch information
ganesh-k13 committed Nov 11, 2020
commit 3975a28d8b3efa385c58a0196f55d7d377e21a77
20 changes: 16 additions & 4 deletions numpy/core/src/umath/loops.c.src
Original file line number Diff line number Diff line change
Expand Up @@ -832,7 +832,6 @@ NPY_NO_EXPORT void
* #TYPE = BYTE, SHORT, INT, LONG, LONGLONG#
* #type = npy_byte, npy_short, npy_int, npy_long, npy_longlong#
* #c = ,,,l,ll#
* #div = s32, s32, s32, s64, s64#
*/

NPY_NO_EXPORT NPY_GCC_OPT_3 void
Expand All @@ -847,6 +846,19 @@ NPY_NO_EXPORT NPY_GCC_OPT_3 void
UNARY_LOOP_FAST(@type@, @type@, *out = in > 0 ? 1 : (in < 0 ? -1 : 0));
}

/* Using nested loops, few more fields to be added in the future */
/**begin repeat1
* #kind = t, gen, do#
*/
/* Libdivde only supports 32 and 64 bit types
* We try to pick the best possible one */
#if NPY_BITSOF_@TYPE@ <= 32
#define libdivide_@type@_@kind@ libdivide_s32_@kind@
#else
#define libdivide_@type@_@kind@ libdivide_s64_@kind@
#endif
/**end repeat1**/

#ifndef USE_LEGACY_DIVISION
NPY_NO_EXPORT void
@TYPE@_divide(char **args, npy_intp const *dimensions, npy_intp const *steps, void *NPY_UNUSED(func))
Expand All @@ -857,7 +869,7 @@ NPY_NO_EXPORT void
const @type@ in2 = *(@type@ *)ip2;

/* Creating a divisor of 0 is treated as an error by libdivide */
struct libdivide_@div@_t fast_d = in2 ? libdivide_@div@_gen(in2) : (struct libdivide_@div@_t){0};
struct libdivide_@type@_t fast_d = in2 ? libdivide_@type@_gen(in2) : (struct libdivide_@type@_t){0};
BINARY_LOOP_FIXED {
const @type@ in1 = *(@type@ *)ip1; 8000
/*
Expand All @@ -872,10 +884,10 @@ NPY_NO_EXPORT void
*((@type@ *)op1) = 0;
}
else if (((in1 > 0) != (in2 > 0)) && (in1 % in2 != 0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can libdivide compute in1 % in2 for you? It seems a bit silly to use libdivide only to then perform a remainder calculation without it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly think we can avoid this % and rewrite it as postprocessing?

Copy link
Member Author
@ganesh-k13 ganesh-k13 Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I still don't know how removing the % gave no performance boost :). The compiler is magically optimizing something. #17727 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting. @ganesh-k13 two things: First make sure you are dividing a positive by a negative number (or vice versa), otherwise this is not hit at all. Second, was the timing difference with libdivide? I guess it might be the compiler is smart enough to optimize the modulo away, but I would be surprised if it is smart enough when libdivide is being used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They seem to have not done it yet: ridiculousfish/libdivide#9

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this changes is subtract one for rounding purproses, Now unless there is some edge case again, I think you can just do without the subtract, and then move the if to later, so that if (res < 0) && (res * in2 != in1) { res -= 1}?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were right about not hitting the case, @seberg , seems like in the profile script I forgot to invert the signs. Above method seems to work, few edge cases to iron out(like <= 0, etc), will try them.

Copy link
Member Author
@ganesh-k13 ganesh-k13 Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found three edge cases:

  1. res is 0, then possible negative divisor/dividend
  2. divisor is 0 or -0, handled by putting inside else
  3. dividend is 0, handled by the same logic as 1.

Let me know if any more are there.
[EDIT]: Can use the same logic in sliding as well.

*((@type@ *)op1) = libdivide_@div@_do(in1, &fast_d) - 1;
*((@type@ *)op1) = libdivide_@type@_do(in1, &fast_d) - 1;
}
else {
*((@type@ *)op1) = libdivide_@div@_do(in1, &fast_d);
*((@type@ *)op1) = libdivide_@type@_do(in1, &fast_d);
}
}
}
Expand Down
0