-
-
Notifications
You must be signed in to change notification settings - Fork 11k
BUG: handle introsort depth limit properly #7871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7142aa5
to
c173946
Compare
I've taken a closer look at this and I'm not convinced that it is correct. Note that the largest partition gets pushed on the stack so that the stack never grows rapidly, in fact it grows fastest if each partitioning exactly divides the working partition into two equal pieces. I think that what you want to check is if it is growing too slowly, which implies that the partitions on the stack are larger than desired. What am I missing? Note that the stack was managed that way so it never got bigger than log2(n) recursions. |
Or you could push the smallest partition on the stack instead, making it grow too fast, no need for another stack in that case. If you want to duplicate the introsort pseudo code, just push the first partition on the stack along with the depth, no need for an extra stack. There are a lot of ways to do it... |
If you don't push the largest partition on the stack you will want a bigger stack. Using our own stack rather than straight recursion adds speed, so we should keep it, but it complicates introsort. Note that if you push the smallest partitions a lot of them will already be sorted one element partitions by the time you hit maximum depth. |
I think the thing to do is ditch the new stack, decrement the depth after the partition, and put the depth on the stack along with the largest partition, then reset the depth from the stack when the stack is popped. The depth popped will not always be the same as the actual stack depth, which is as it should be as it will vary with the pattern of pushes and pops. EDIT: I'd also move the heapsort check inside the main while loop so that insertion sort gets called by preference. |
it should be sufficient to only increase the depth counter on the large partition as the small partition is always less than 50% of the data and thus not degraded. |
to clarify, the proper mirroring of the recursive intro sort would be to check inside the while loop and increment the local |
Maybe also add benchmarks for the quicksort worst case + random input?
|
Yes, the depth counter only matters when the depth is popped off the stack which is why it should be decrementing before pushing on the stack. If the check is inside the loop insertion sort will be called before heapsort except possibly after a stack pop. And please put everything on one stack, using a separate integer stack doesn't save much room. EDIT: Using the original introsort decrementing depth. |
Maybe only check the depth immediately after the stack pop. |
hm the depth check could indeed be put behind the stack pop. But it is a quite minor optimization, the check is only true in rare exceptional cases in the first place, it shouldn't make much difference if one calls insertion sort or heapsort on the small remaining data then. |
Agree, it would probably never be noticed. |
Does have the advantage that there is no goto and the while loop always runs to completion. |
you still need the goto as you have to skip the sorting of the already heapsorted array and get to the next stack pop |
OK, so in the original, just
and add
Update the comment before the stack push and increase the stack size by 50, and I think that does it. |
Oh, right. I missed that, forgot we were pushing pointers instead of indices. Could have done indices originally, I guess, but the pointers were a bit faster... |
Another option is to call heapsort rather than pushing the partition on the stack. |
I have removed the initialization of the depth stack and added some new benchmarks |
@@ -107,6 +111,7 @@ quicksort_@suff@(void *start, npy_intp num, void *NOT_USED) | |||
*sptr++ = pi - 1; | |||
pl = pi + 1; | |||
} | |||
*psdepth++ = cdepth + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be *psdepth++ = ++cdepth;
. cdepth should be increasing up the stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its cdepth + 1
, the other variables are also changed without the postfix so this is consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure it is wrong. Every push on the stack should increment, depth has nothing to do with actual stack depth here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as every recursion in the introsort pseudo code decrements depth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
every stack push does increase the depth, we are only skipping the increase in the shorter partition as it is not necessary to keep track of it there
the depth is never decreased the only thing that is done is switch to a different depth in a different "recursive" calltree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doh right I moved this variable into a higher scope ...
it does not really matter for performance as we don't really care about the absolute depth of the stack we just want to cut of at some point
But it is relevant to avoid the overflow of the bounded to 100 elements stack which could happen for arrays greater than 2^50 elements so I'll change it
sorry for being so dense here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm the overflow can still happen as we aren't checking the depth in the small partition, so lets just increase the stack from 100 to 128 (as each level adds two pointers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need the ++
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes updated the code, still got the goto though
I'll sleep about it lets see how I like it tomorrow :)
I'll switch it to decreasing the depth down from the limit, I initially increased it because I misguidedly though I had to initialize the depth stack (so memset(0) vs a for loop) |
I'm thinking that instead of |
npy_intp depth_limit = npy_get_msb(num) * 2; | ||
int depth[PYA_QS_STACK]; | ||
int * psdepth = depth; | ||
int cdepth = NPY_SORT_MIN(PYA_QS_STACK, npy_get_msb(num) * 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think * 2
is needed here, just adjust the cutoff point. Then you don't need the NPY_SORT_MIN
either, especially as it is now 64
. I suppose you could also use NPY_BITSOF_INTP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NVM. However, the stack can't fill up, so just
int cdepth = npy_get_msb(num) * 2;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the factor 2 is just some random increase (which should be documented) to give the algorithm some slack on not ideal data. The only thing that needs to be done is avoid that the depth is almost linear in number of elements so as long as the limit is constant * log2(n) we should have our worst case.
but the min can now be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, cdepth can be anything that serves as a reasonable cutoff. It is independent of the actual stack depth as there may be jumps in it's value greater than one along the stack: partition gets popped, gets replaced by partition with cdepth decremented. Continuous bad partitioning actually leads to the smallest stack.
Looks generally good, just a few comments for consideration. |
@@ -33,10 +53,11 @@ | |||
#include <stdlib.h> | |||
|
|||
#define NOT_USED NPY_UNUSED(unused) | |||
#define PYA_QS_STACK 100 | |||
#define PYA_QS_STACK 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
#define PYA_QS_STACK 2*NPY_BITSOF_INTP;
Or just use NPY_BITSOF_INTP directly below.
hm shifting the count is a neat way of writing it, but I think it makes it less obvious what we are doing. Performance wise it shouldn't matter, the msb is cheap (and could be further optimized using |
On second thought I decided it was a bad idea ;) It can't get the factor of two increase in bits. |
the larger partition needs its own depth limit else it grows faster than logarithmic. Also increase the stack size to hold up to 64 instead of 50 pointer pairs else it might overflow on arrays larger than than 2^50 elements.
benchmark for random, all equal elements, many equal elements and median of 3 worst case
@@ -69,10 +94,12 @@ quicksort_@suff@(void *start, npy_intp num, void *NOT_USED) | |||
@type@ *stack[PYA_QS_STACK]; | |||
@type@ **sptr = stack; | |||
@type@ *pm, *pi, *pj, *pk; | |||
npy_intp depth_limit = npy_get_msb(num) * 2; | |||
int depth[PYA_QS_STACK]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is large by a factor of 2, but it certainly doesn't hurt.
Thanks Julian. |
each partition needs its own depth limit else it grows faster than
logarithmic ...