Consider a faster alternative algorithm for random.shuffle() #108598
@d-rideout we'd welcome a suggested rephrasing of the offending paragraph if you have one, please! Quoting for posterity
On your aside, see the tests in cpython/Lib/test/test_random.py, lines 74 to 105 (at 39506ee).
cc: @rhettinger & @mdickinson as the listed A |
I don't see a good reason to change this. There are, in general, no wholly "safe" lengths, in the sense of making all permutations equally likely. For a PRNG with P states, and a fixed list length K, there are P possible outcomes from shuffling the list (one for each possible starting state), so some outcomes are necessarily more common than others unless K! divides P exactly, and assuming all internal states are equally likely at the start. So the Wikipedia article on "Fisher–Yates shuffle" (which Python implements) says - but without quantitative supporting argument:

"""…"""

When the Python docs were first written, Python used a Wichmann-Hill generator, with far smaller state space. Then even all permutations of a deck of cards were far out of reach, which I'm told some people had actually noticed in practice. That gross problems are far harder to provoke today doesn't mean they can/should be ignored. But neither are the Python docs the place for extended discussion of possible problems. |
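(As an aside not from the thread: the "K! divides P" comparison above is also where the 2080 in the current docs comes from. A few lines reproduce it, assuming the Twister's period of 2**19937 - 1.)

```python
# Largest K with K! <= 2**19937 - 1, the Mersenne Twister's period: beyond this
# length there are more permutations than possible starting states.
period = 2**19937 - 1
k, f = 1, 1
while f * (k + 1) <= period:
    k += 1
    f *= k          # f == k! throughout
print(k)            # 2080
```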
I see. Thank you for the clarifications. There appear to be numerous unstated assumptions being made in all this discourse, including in the Wikipedia article that you cite. And I appreciate your admonition that the Python docs are not the correct place to iron out these ambiguities. However, I feel that the documentation should be correct, in the sense that the output of the algorithm should be as claimed. I submitted the following rewording as a pull request, which is cited above, before reading your response.
How about the following, as a second, more accurate, attempt?
|
Leaving this to others - I'm not interested in pursuing it. The potential issues are too numerous and subtle to bother trying to be "more accurate" in some uselessly academic sense. It's uselessly academic enough already 😉. For example, even
is wrong. Python the language guarantees no such thing. And for CPython using the Mersenne Twister, it's not true. Leaving aside that Going on to suggest Another alternative: remove all cautionary text from the docs. Offhand, I can't recall any other language/library writeup that gives any warnings about its shuffling code, despite that they're all vulnerable to the same things. As said before, the Python warnings date back to days when CPython was using a much feebler PRNG. As a practical matter, it may now be doing more harm than good to highlight any cautions. |
Yes, I meant The algorithm does not need to return a permutation for every internal state of the PRNG. The revulsive character of the Thanks for considering it anyway. |
Sorry, I don't understand your intent there. When You could, e.g., hypothesize a quite different implementation that, say, looks at the input and goes "oh! this is a list of length 3, and now I'll consult a table indexed by I'm only talking about the Fisher-Yates shuffling algorithm we have (and have had since day 1). If you're going on to advocate for a different algorithm, then this is far from just a "doc" issue. |
Okay. I agree with your assessment of Fisher-Yates. I was indeed imagining a more general algorithm, which might circumvent the divisor issue that you raise by a special handling of some of the internal states. |
In the realm of different algorithms, this one comes to mind: pick an int in range(factorial(n)) and decode it into a permutation:

```python
from math import factorial
from random import randrange

def shuffle(xs):
    n = len(xs)
    f = factorial(n)
    r = randrange(f)
    for i in range(n - 1):
        f //= n - i
        d, r = divmod(r, f)
        xs[i], xs[i+d] = xs[i+d], xs[i]
```

Then uniformity depends entirely on how well randrange() does its job. I never really thought about that. Seems possible that it may hit its own limitations. In any case, the giant-int arithmetic makes it much slower - it's been running for over an hour on my box just now, just to try to shuffle a list with a measly million elements 😉. |
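(For a rough sense of scale here, not something computed in the thread: the rank drawn by randrange(factorial(n)) for a million-element list is an integer of roughly n·log2(n) bits, and every divmod in the decoding loop has to walk most of its remaining digits.)

```python
import math

n = 10**6
bits = sum(math.log2(k) for k in range(2, n + 1))   # log2(n!) without building n!
print(f"{bits:,.0f} bits")                          # ~18.5 million bits in the rank
```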
Seems simpler and a bit faster to divmod by n-i:

```python
from math import factorial
from random import randrange

def shuffle(xs):
    n = len(xs)
    r = randrange(factorial(n))
    for i in range(n - 1):
        r, d = divmod(r, n - i)
        xs[i], xs[i+d] = xs[i+d], xs[i]
```
Cool! Then perhaps a tinier saving by reversing the loop order (while, e.g., a C compiler would optimize away the repeated n - i):

```python
for i in reversed(range(1, n)):
    r, j = divmod(r, i + 1)
    xs[i], xs[j] = xs[j], xs[i]
```

But does it actually address the original concerns? After some thought, I don't think so. It's still the case that the outcome is wholly determined by the Twister's initial state, which can have one of "only" 2**19937 - 1 possibilities. |
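(A quick sanity check, mine rather than anything posted in the thread: for a small list, feeding every rank in range(n!) through the reversed-loop decoder above yields every permutation exactly once, so a uniform randrange(n!) really does give a uniform shuffle.)

```python
from math import factorial

def shuffle_by_rank(xs, r):
    # The reversed-loop unranking from the comment above.
    for i in reversed(range(1, len(xs))):
        r, j = divmod(r, i + 1)
        xs[i], xs[j] = xs[j], xs[i]

n = 4
perms = set()
for rank in range(factorial(n)):
    xs = list(range(n))
    shuffle_by_rank(xs, rank)
    perms.add(tuple(xs))
print(len(perms) == factorial(n))   # True: the 24 ranks decode to 24 distinct permutations
```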
Thank you everybody! My main takeaway is a cultural one: The declared output of library routines cannot be assumed to be mathematically exact unless the documentation explicitly states otherwise. I am happy with whatever resolution you all deem reasonable. I very much appreciate your thoughts about an exactly uniform algorithm for shuffling. But I agree that this is a much larger, separate topic. Regarding the proposals for an exactly uniform shuffling algorithm, are these using the system entropy to generate the enormous random numbers? If so, then can't the Fisher-Yates algorithm be adapted to use these random bits as well, and thereby also provide exactly uniform random permutations? (I have not thought through the details carefully.) If not, then won't randrange() run into the same problems as Fisher-Yates was encountering, in its attempt to generate uniform samples from an enormous finite set? [I just saw the previous post now, which raises the same concern.] |
No, was just a little code improvement.
Java says:
|
I confess I'm still not clear on what the objection is. The docs don't promise mathematical perfection, but explicitly point out one possible class of imperfection, which was of some "practical" importance some years ago. Fine by me if that was removed entirely. What I'm least keen on is expanding on it, because more than just that can go wrong, and explaining all that is a chore better suited to a paper, or at least a blog post. That, e.g., over the Twister's whole period, one permutation of
Everything I talked about here was in reference to the But we can't make promises about that either, because we're at the mercy of the platform's implementation of |
My reading of the I like the hedge "approximately". But defer to your judgement. In my context I am fairly deeply troubled by some of the permutations of |
@d-rideout, Java's brief "approximate" dodge read well to me too. Care to take a stab at that? Note that I stopped staying up at night over these things a couple decades ago 😉. For example, there are states in the Mersenne Twister where, if you're asking for 32-bit unsigned ints, you can get back exactly zero 600 times in a row. More than 600? Ya, but not a lot more. IIRC, the most often it can happen is 623. Presumably there's no upper bound on how often that can happen with a truly random source. @pochmann, you like pursuing speed, so I'll note that the last sample code (the reversed-loop unranking) looks a bit faster. |
Yep, up to a factor of 1.6 faster for me. Most extreme for 12 and 19 elements. Maybe because 12! and 19! are the largest factorials fitting in one and two 30-bit digits, respectively. How much faster does it need to be to have any chance to make it into the stdlib despite producing different shuffles? |
A factor of 1.6x is pretty good and would seem worthwhile to me. However, there is the matter of factorials quickly growing very large and of the time it takes to do divmod arithmetic on them. Also, I prefer to leave the wording in the doc as-is. That way, people will think about the issue when looking at proposals to substitute other PRNGs with a comparatively tiny state space. |
Side note: I vaguely remember years ago there was a successful lawsuit against a gambling device company where certain winning hands from shuffled cards were not possible given that the possible card arrangements exceeded the state space for the underlying PRNG. I think it is reasonable for the docs to point out that large shuffles may not cover all possible permutations. |
Any change along these lines would need to be conditional on the list length being small enough. So, e.g., shuffle-by-unranking a million-element list takes well over 100 times longer than a 100-thousand element list. Completely impractical. This just reflects that k-by-1 bigint division takes time proportional to the number of digits "k", and that the number of numerator digits goes down roughly by 1 each time through the loop. Not even using GMP would help that much. Dividing by a single CPython 30-bit digit is pretty fast already. But the base idea may be generalizable: in effect, the unranking approach works by encoding the entire sequence of swap indices in a single int, spanning as small a range as possible. There's only one direct call to randrange instead of n-1 direct calls to randbelow. It minimizes the number of times the underlying generator has to be accessed and the number of bits asked of it. But, in return, decoding the giant int becomes ever-more expensive. "All or nothing" are two extremes. In between, for example, could be repeatedly encoding "just the next two" swap indices in a single int. Then (for most plausible n) we'd be sticking to at worst 2-digit CPython ints, where arithmetic is as fast as it gets. Or "just the next J" swap indices, where J varies dynamically so that the product of "the next J" fits in 60 bits (2 CPython digits). But, ya, details left as an exercise for the reader 😆 |
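(To make the "next J swap indices in one int" idea concrete, here is a rough sketch of my own - the function name and the 60-bit bound are arbitrary choices, and nothing here was proposed or benchmarked in the thread.)

```python
from random import randrange

def chunked_shuffle(xs, chunk_bound=1 << 60):
    i = len(xs) - 1
    while i > 0:
        # Grow a chunk of positions i, i-1, ... while the product of their
        # index ranges still fits in roughly two 30-bit CPython digits.
        prod = i + 1
        j = i - 1
        while j > 0 and prod * (j + 1) <= chunk_bound:
            prod *= j + 1
            j -= 1
        r = randrange(prod)              # one PRNG draw covers the whole chunk
        while i > j:
            r, k = divmod(r, i + 1)      # k is uniform on range(i + 1)
            xs[i], xs[k] = xs[k], xs[i]
            i -= 1
```

Each randrange() call then covers a block of positions, so the underlying generator is consulted far less often than once per element, while the int being decoded never grows past a couple of digits.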
@tim-one Yeah I meant using this only for small sizes where it's faster, or a hybrid like you just described, or different optimizations I played with a while back. |
My extremely rough estimate leads me to not be terribly surprised that some machine somewhere sees two consecutive 32-bit zeros emerging from a perfectly random source of independent random bits, produced at the rate that Mersenne Twister library functions might be called worldwide. But I may be a bit surprised at three! My first reaction might be to try to tighten my estimate. |
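(One way to put numbers on that rough estimate - the worldwide call rate and timespan below are purely invented for illustration, not from the thread.)

```python
# Expected number of times a pair of consecutive 32-bit outputs is (0, 0),
# if the outputs were truly random and independent.
calls_per_second_worldwide = 1e10    # assumption, made up
years = 25                           # assumption, made up
draws = calls_per_second_worldwide * years * 365 * 24 * 3600
print(draws * 2.0**-64)              # ~0.4 expected occurrences under these guesses
```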
Might just be me, but that seems rather long and far too technical and thereby more confusing/intimidating than helpful. (And "delicate business" sounds strange in the doc.) I much prefer the current text. The only part I might change is the one I bolded:
That sounds like comparing length 2080 with length 2**19937-1. Might be better to say:
|
As a developer of scientific software, I appreciate understanding this issue better, including how the lack of uniformity transitions from an irrelevant technicality at Does documentation like this have footnotes? |
Should that be "random" or "ransom" ;-) |
Maybe some less optimized pseudo-implementation in that comment block? Like that it's mostly an optimized version of this:

```python
def shuffle(x):
    for i in range(1, len(x)):
        j = randint(0, i)
        x[i], x[j] = x[j], x[i]
```

And like this, just avoiding the randint() overhead:

```python
def shuffle(x):
    for i in range(1, len(x)):
        k = i.bit_length()
        j = getrandbits(k)
        while j > i:
            j = getrandbits(k)
        x[i], x[j] = x[j], x[i]
```

I'll btw likely be mostly offline for a week now. I can get back to this afterwards, but if one of you wants to do it, I'm happy with that. Making finishing touches might be more efficient anyway if you just do it yourself. |
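(A small cost note on the rejection loop above - my arithmetic, not the thread's: on average it needs about 2·ln 2 ≈ 1.39 getrandbits() calls per position.)

```python
def expected_calls(n):
    total = 0.0
    for i in range(1, n):
        k = i.bit_length()
        total += (1 << k) / (i + 1)   # geometric: 2**k equally likely draws, i+1 accepted
    return total / (n - 1)

print(expected_calls(10**6))   # ~1.386
```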
No strong opinion; happy with whatever you pick. The speedup looks nice. On the original doc issue, I'd be delighted to see the caveat and the explicit As a thought experiment: suppose you have the Mersenne-Twister-based For historical contrast, if |
As usual, I'm most sympathetic with Mark here. I added this disclaimer way back in the Wichmann-Hill days, when someone actually came up with a permutation of a deck of cards that shuffle() could never produce. Now it's a technical curiosity of no seeming practical importance - now it's arguably needless FUD. In a blog post or appendix it could be fun to explore, but the audience would be hardcore nerds with nothing better to do 😉. Should we add this warning to
It's the same thing, really. The starting state wholly determines the N-tuple constructed, so no more than one distinct N-tuple per possible starting state can be produced. So this kind of thing wasn't unique to shuffle(). |
I'd like to argue for a slight shift in attitude towards computer documentation. Along lines that have been discussed above: that it can be seen as an important opportunity to warn the user about potentially severe issues that they may not have considered previously. All these claims of the form "everything will be fine in practice" seem not so dissimilar to the shoddy reasoning that I presented to argue that 2075! "was a small number", which was, rightly so, rejected. We have no proof that
How do we know that it is "a technical curiosity of no seeming practical importance"1?
How about promoting the …? My impression is that many scientific users of this code trust it to produce the output it describes, at times, long before they become intimately familiar with the intricacies of pseudo-random number generators (PRNGs). And my feeling is that the public trust will be better served by admitting this potential pitfall in a prominent place.
I, for one, greatly appreciate that the remark about 2080! was left in the documentation.

Footnotes
|
My problem remains that I don't know what could be said that would do more good than harm. Crypto is very different. There's nothing hypothetical about the complete unsuitability of the Twister in crypto contexts. Extremely effective attacks against crypto algorithms using non-crypto-strength RNGs have been published and are widely known in the security world. Real-world examples of relevant security breaches are also known. Crypto newbies need to be warned about this - it's a highly visible failure mode that makes "the news" routinely because people actually fall into it. What can we say about non-crypto contexts? The Twister has been around for 25 years now, has to be the most heavily tested PRNG in history, and there are just no stories I've heard of it getting the blame for a real-world problem. It fails some statistical tests that are trying to detect whether a generator is doing arithmetic in GF(2), and that's about it for non-theoretical "quality" complaints. I haven't heard of an organization dropping the Twister for failing its own "due diligence" tests - quite the contrary, it gained near-universal adoption because it passed so many organizations' tests. I can't give an example of it failing, because I've never heard of one(*), and don't know how to provoke one in less than an absurd amount of time. I can't tell a user how to detect a failure if one occurs. And, if one does, I can't tell the user how to avoid- or sidestep -it. So warning about the theoretical possibilities is pretty much the definition of FUD (stoking fear, uncertainty, and doubt about pure hypotheticals). For most users, that would do them more harm than good. (*) I can construct internal states that, e.g., would lead to In scientific code, users are far more likely to get burned badly by custom code that hasn't been analyzed at all by a numerical expert for potentially damaging loss of precision due to a numerically naïve algorithm failing to control for rounding errors. That's a problem that burns people every day, but the Python docs don't even mention it once anywhere 😉 |
FYI, this paper was written just a few years ago: It Is High Time We Let Go Of The Mersenne Twister I think it's fair overall, although the author has his own PRNG schemes to sell and is "highly motivated" to diss others' work. What I take from it is that nothing has really changed since the last time I looked: he has no real-life Twister failure anecdotes to share, just repeating that it's known to fail some specific tests that all GF(2)-based PRNGs fail. Which has been known for quite a long time. One of these tests was due to George Marsaglia, the late RNG genius who first proved that n-tuples created from the successive terms of a LCG generator fall in a relatively small number of (n-1)-dimensional hyperplanes. He realized too that for generators "like" the Twister, large "random" 0/1 matrices created from their bits will have, overall, ranks that are too small. Which is the test. While that doesn't appear to have consequences anyone cares about 😜, all else being equal, of course we'd prefer a generator that didn't fail any known test. But that's not the point here: this paper is likely the strongest informed criticism of the Twister that can be found. And it's weak tea. Anything going beyond it would be speculative - FUD-like. Note that I'm not wedded to the Twister. I would prefer a generator with much smaller state (preferably fitting in a cache line or two), simpler code, faster code, fast "jumpahead" abilities, and one that passes all known tests. I don't really care about having a massive period. But I also think Python should be a follower in this area, not a pioneer. So, @mdickinson, if you're still here, I believe numpy switched to a member of the much newer PCG family. If so, how has that been working out for you? I'm a big fan of Melissa O'Neill's writing and thinking on the topic, so I'm looking for "reasons" to consider whether Python should follow numpy's lead here. |
Okay. Thank you everybody for considering my concerns. I am going to think about this further, along the lines of a variant of the test that Mark Dickinson has proposed. Anyone care to join me? |
@mdickinson, I didn't follow this line of argument. It's certain that WH can't generate more than ~7e12 (its period) distinct permutations, but it goes through every one of its possible internal states before repeating. So if there's a duplicate permutation generated before then, it's not due to WH suffering a birthday-paradox internal state repetition, but due to the shuffling algorithm mapping distinct WH states to the same permutation. It's the algorithm that's subject to the birthday paradox, not the underlying generator.

Some quick coding doesn't change my mind about that 😉. Here's output from a run with a sequence with only 17 elements (I don't want to wait forever on this). Even at such a "small" size, there was a case where using Wichmann-Hill as the generator took over 45 million shuffles to find the first duplicate. Offhand, WH doesn't appear to be doing any worse at this than using the Twister as the generator, or using the system SystemRandom generator.

EDIT: to try to nail this, I put in a far feebler version of WH:

```python
def whbase(c=4991):
    while True:
        c = 170 * c % 30323
        yield c/30323.0

wh = whbase().__next__
```

30323 is a prime, so this crippled generator has period 30322.

EDIT 2: "all or half?" are special cases of a more general pattern, which turns out to be shallow. If I pass a list with 15162 elements, the horridly feeble WH variant generates only 2 distinct permutations. Why? Because the shuffling algorithm calls the generator 15161 times per shuffle, and 2 * 15161 = 30322 is exactly the period, so there are only two distinct starting points in the cycle. Note that the above is nearly irrelevant for the Twister, because its period is prime.
Click here for the code:

```python
def shuf(xs, r):
    for i in range(len(xs)-1, 0, -1):
        j = int(r() * (i+1))
        xs[i], xs[j] = xs[j], xs[i]

def whbase(a=123, b=145, c=4991):
    while True:
        a = 171 * a % 30269
        b = 172 * b % 30307
        c = 170 * c % 30323
        yield (a/30269.0 + b/30307.0 + c/30323.0) % 1.0

wh = whbase().__next__

def dupdetect(r, n):
    count = 0
    base = bytearray(range(n))
    seen = set()
    while True:
        count += 1
        xs = base[:]
        shuf(xs, r)
        t = bytes(xs)
        if t in seen:
            print(f"dup found at {count=:_}")
            break
        seen.add(t)

N = 17
import math
fac = math.factorial(N)
s = math.isqrt(fac)
print(f"{N=:} N!={fac:_} isqrt={s:_}")

import random
for meth, name in [(random.SystemRandom().random, "random.SystemRandom().random"),
                   (random.random, "random.random"),
                   (wh, "WH")]:
    print("using", name)
    for i in range(5):
        dupdetect(meth, N)
    print()
```
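(Two quick back-of-envelope checks of the numbers in the comment above - mine, not part of the original post.)

```python
import math

# Period of the crippled one-component generator: the multiplicative order of
# 170 modulo the prime 30323, found by walking the cycle through 4991.
c0 = c = 4991
period = 0
while True:
    c = 170 * c % 30323
    period += 1
    if c == c0:
        break
print(period)   # 30322 == 2 * 15161

# Scale of the "first duplicate" counts with N = 17: if all 17! permutations
# were equally likely, the expected wait for a duplicate is about sqrt(pi/2 * 17!).
print(f"{math.sqrt(math.pi / 2 * math.factorial(17)):,.0f}")   # ~23,600,000 shuffles
```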
Please see my reply to Mark just above - I believe he suffered a rare thinko when proposing that test (best I can tell, it won't work unless you try a number of shuffles proportional to
I'd like to at least lurk, but can't commit to another time sink. How about setting up a new github repo for this? Then those interested can discuss, and share code, and lurk, in a more-or-less convenient shared space. Suggestion: tackle Wichmann-Hill first. I gave an implementation in my reply to Mark. It should be far easier to detect problems in that. "It exhibits 12 clear failures in the TestU01 Crush suite and 22 in the BigCrush suite (L'Ecuyer, 2007)" |
Aargh! Yes, you're right of course. 😳 Please ignore everything I said. :-)
On NumPy's use of PCG: Robert Kern (@rkern) would be the person to ask; I've been following, but not closely. The most interesting development was the somewhat recent addition of a new PCG variant PCG-64 DXSM, which mitigates concerns about possible correlations between what were intended to be statistically independent streams, and doesn't appear to have any significant downside relative to the original PCG XSL RR 128/64. I believe NumPy intends to eventually make PCG-64 DXSM the default bit generator, but hasn't yet done so. Despite having read the relevant thread several times, I still can't claim to have a clear picture of the status of the correlation concerns (if any) with PCG-64 DXSM. My takeaway understanding (and I hope @rkern will chime in to correct me if necessary) is that with "reasonable" data sizes and a "reasonable" number of non-adversarially-seeded (and non-naively-seeded) streams, there aren't currently grounds for concern, but I can't quantify "reasonable" here. As far as I can tell, adopting PCG-64 DXSM for Python wouldn't be at all unreasonable (at least from a statistical point of view; I'm totally neglecting all the logistic aspects of changing the default generator). Additional references:
|
There is an application of the birthday paradox to how many permutations a PRNG can generate. Suppose we're looking at lists with N elements. Consider:

```
states = a list of all the generator's P possible states
for state in states:
    force the generator's state to state
    shuffle range(N)
    if we've seen the resulting permutation before:
        raise ValueError("not possible to generate all permutations")
    remember that we've seen the resulting permutation
print("WOW - got 'em all")
```

If the generator is "good", forcing the starting state shouldn't change that the shuffle yields a random-ish permutation. But, if so, we expect to see a duplicate after roughly sqrt(N!) of the P states have been tried. Empirically, I built 5 different generators with period …
So, that was expected: lots of permutations couldn't be generated. Making the period 10x longer helped, but still left a few hundred permutations unreachable. After making the period 20x longer, all permutations were reached, in each of 3 runs.

Tech note: how to make a good generator with an arbitrary period:

```python
class ArbPeriod:
    def __init__(self, P):
        import random
        r = random.SystemRandom().random
        self.P = P
        self.floats = [r() for i in range(P)]
        self.i = 0

    def random(self):
        i = self.i
        result = self.floats[i]
        i += 1
        if i >= self.P:
            i -= self.P
        self.i = i
        return result

    def setstate(self, i):
        self.i = i % self.P
```
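(A hypothetical driver for the experiment described above, assuming the ArbPeriod class just shown is in scope; the list length and the periods tried are my own small choices, so the counts will differ from the run reported in the comment.)

```python
from math import factorial

def shuf(xs, r):                        # same Fisher-Yates driver as earlier
    for i in range(len(xs) - 1, 0, -1):
        j = int(r() * (i + 1))
        xs[i], xs[j] = xs[j], xs[i]

def reachable(gen, n):
    seen = set()
    for state in range(gen.P):          # force every possible starting state
        gen.setstate(state)
        xs = list(range(n))
        shuf(xs, gen.random)
        seen.add(tuple(xs))
    return len(seen)

n = 6
for mult in (1, 10, 20):
    g = ArbPeriod(mult * factorial(n))
    print(f"P = {mult:2d} * {n}!: {reachable(g, n)} of {factorial(n)} permutations reachable")
```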
My most approachable attempt at summarizing the issue is now in our documentation. To put some numbers on it, with DXSM, if the 128-bit states share the lower 110 bits or less, and have exactly the same increment, I don't see a problem out to 256 GiB in PractRand (before I get bored and kill it). If I share all 128 bits of state and have increments with a Hamming weight between 16 and 100, roughly, then I don't see problems with this interleaving test either. So to get the same interleaving problem that we see earlier with XSL-RR, I need to share (up to certain equivalence classes) significantly more than 128 bits between the two streams.

All that said, we're not super-concerned about the issue with XSL-RR to begin with and have no particular timeline for making the switch (though that's in some part just held up by my shame at misimplementing the initialization). Even my summary probably overstates the problem out of an abundance of caution. To even observe the PractRand failure in a real situation, you have to find and pull these two "colliding" streams together out of the horde of streams that you've made and interleave their raw bits precisely aligned before PractRand's sensitive tests find an issue. If you do any useful work with those raw bitstreams, you wouldn't see a problem from the results of that work.

But if you're looking for options to migrate to the PCG family from another family, the DXSM variant is the one I'd choose. It's easier to document its properties without too many asterisks. I have unpublished work looking at some variants of the DXSM output function that improve its avalanche properties without paying a performance tax, but I'm not nearly invested enough to devote the CPU hours of testing much less publication and evangelism to make it a viable option. I also personally like SFC64 if one is looking around. |
Thanks for the links, Mark! Looks like PCG has been working fine for numpy. What I took from the thread is that the recent churn was all about the possibility of correlations among "Independent" streams, which could occur if:
It's great that numpy is pro-actively addressing the possibility, instead of waiting for the atmosphere to explode due to a flawed Monte Carlo code 😉. |
Anecdata: I asked a knowledgeable person who is in fact convinced that this is a problem, and he had never been able to construct even an adversarial example of a practical problem. |
Talk imneme into thinking they're her ideas? 😉. I saw that, in one of the numpy threads, Vigna pointed out that avalanche could be improved just by picking different multipliers. Modern "fast" (non-crypto) hash/digest functions have put a lot of thought into that. Not so much in the way of theory, but more computer searches for sequences of operations (multiplies and shifts by fixed constants) with demonstrably good avalanche behavior. CPython got spectacular improvement in some pathological cases of tuple hashing by implementing part of xxHash's scheme. dict lookup speed is best if the low-order bits of hashes don't collide. The specific multipliers and shift counts used were important, but the original author had no account for "why?" other than they did best in extensive avalanche testing.
Thanks for the tip! On your recommendation, I'll stop ignoring it 😉 - I always just knee-jerk assumed that something so short and simple and multiplication-free just had to be suffering major shortcomings. But maybe not!
I think shuffling is a red herring here. Cut to the heart: using the Twister, we can only generate an inconceivably small fraction of the mathematically legitimate results from |
Oh, it's been offered up. Playing around, just reversing the order of the multiplications (free) so that the low bits get mixed in first made a big difference to the avalanche properties, then adding on one more xorshift cleans it up even more (minor penalty, but just bringing it back into line with XSL-RR and the older expensive LCG multiplier). I think I played around with the multiplier constant, but I don't think I saw any noticeable difference. DXSM as it stands has enough of a safety factor above my 128-bit-do-I-really-care? line for this test that I don't care to pursue it as a new option.
Indeed. My practical-minded response to this concern about missing permutations is that I want to see a new test that I could add to PractRand that would show a problem with a PRNG faster than other tests already in PractRand that fail for those PRNGs. One can use small PRNG variants for this purpose, of course, because most of the ones under consideration are already too big to fail. Re: SFC64, I liked imneme's articles about its proximate ancestor, Jenkins's Small Fast generator, and the statistics of random invertible mappings to get a feel for what this class of PRNG does. SFC64 is in the same style, but adds in a counter to the state so that the cycles are all a minimum of 2**64 long. |
Jenkins has some very interesting things to say about avalanche. Short course: tests that manipulate the internal state bits directly and measure resulting avalanche in outputs can run orders of magnitude faster than chi-square tests (which compare some statistic computed from the output stream against an ideal distribution). So much faster that it's thoroughly practical to test all possible combinations of shift/rotate constants for avalanche effects. If avalanche is poor, chi-square tests fail too. But if avalanche is good, chi-square tests also pass. In his experience, at least with his style of generator. But while "close to perfect" avalanche behavior is vital for block ciphers and digest functions, at least his kind of RNG is far more forgiving. Turns out that none of his initial tries for 32-bit generators flipped more than 4 bits (ideal would be 16) on average (but exactly how he measured this isn't yet clear to me). They all eventually failed some chi-square tests. But by using rotate constants found by computer search, the average went up to 8.8 bits (still far short of 16!), and that passed all tests. And that's the one he released to the world. He later found another brief sequence of operations that achieved 13 flipped bits on average, but apparently thinks that even the earlier 8.8 was overkill in this context. I'll note that the Twister is extraordinarily slow at diffusing internal bit changes. Take a Twister instance, and clone it, but flip one bit in its state. Then run the original and the clone in lockstep. For the first 1000 (or so) iterations, they'll typically produce identical 32-bit outputs on most iterations. When they differ, the Hamming distance will be low. This can persist for dozens of thousands of iterations. So that demonstrates that passing most chi-square tests (which the Twister does) doesn't require "reasonably fast" avalanche from internal state bit changes to outputs, or even anything close to reasonably fast - but then "too big to fail" covers a multitude of sins 😉. |
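(The lockstep experiment is easy to reproduce from Python; this is my sketch, with an arbitrary seed, an arbitrary state word, and an arbitrary bit flipped.)

```python
import random

a = random.Random(12345)
version, state, gauss = a.getstate()
words = list(state)                      # 624 state words plus the current index
words[100] ^= 1                          # flip one bit in one state word
b = random.Random()
b.setstate((version, tuple(words), gauss))

same = sum(a.getrandbits(32) == b.getrandbits(32) for _ in range(1000))
print(same, "of 1000 outputs identical") # typically nearly all of them
```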
Hi all, apologies for the delayed response. I have been reading up on some of the background that you all have been pointing to. I have yet to digest the O'Neill PCG paper, which appears to be very important. In the meantime: @mdickinson's proposal is very much along the lines that I was imagining, and I think is exactly correct. The fact that @tim-one's experiments appear to contradict this test is indeed curious! And perhaps worth pursuing. But I would like to understand what O'Neill has to say on the subject before creating a repository to explore the details. |
@d-rideout, there's no rush 😄.
If you didn't notice, @mdickinson retracted that suggestion. It can't work, and for the reasons I sketched. Think carefully:
To connect the dots explicitly, after throwing 10 million balls into 7 trillion bins picked uniformly at random with replacement, the chance that at least one bin contains more than one ball is > 99.9209511800985347485%. But the internal state transitions in Wichmann-Hill don't behave at all in a pseudo-random fashion. Very much the contrary: by design, you never hit a duplicate state until after the full period has been exhausted. The birthday paradox is simply irrelevant to the generator's internal states as the program goes on. The total number of times WH was invoked can be relevant, but 10 million shuffles of a 100-element list consumes an insignificant fraction of WH's period.
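(For anyone who wants to reproduce that figure: the standard birthday-problem approximation, with 7e12 bins as in the comment, lands within a hair of it.)

```python
from math import exp

balls = 10**7
bins = 7 * 10**12
print(1 - exp(-balls * (balls - 1) / (2 * bins)))   # ~0.9992095
```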
Trust me when I say it's very much worth your fighting your way to understanding. It's fundamental, which isn't to say easy or self-evident. Reasoning about "randomness" can be subtle! Write code with small examples - don't trust your head 😆 |
I'll close this issue as not actionable. |
The documentation for the shuffle() function in the random module expresses concern that, for sequences of length larger than 2080, the shuffle() function is not able to produce all permutations of the sequence. While true, I find the emphasis on the finite period of the Mersenne Twister random number generator misleading, since the generator is only called n-1 times for a sequence of length n. The real issue is that any algorithm based on a sequence of calls to a pseudorandom number generator will not be able to generate more permutations than the generator has internal states.

(I also wonder how large of a sequence can be sent to shuffle() in practice, to be guaranteed a uniform distribution over permutations. Is 2079 small enough? 2078? Might there be a check for this?)

Linked PRs