-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
Allow multiple versions of RandomState: init' work. #5911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Following the discussion in numpy#5299, and after discussing this with @njsmith today, this is the approach I implemented: RandomState() return the latest, best, etc. version of RandomState. RandomState([seed=]x) return the current (as of 1.9) version of RandomState. RandomStats([[seed=]x], version=version, rng=rng) return a RandomState object with methods from the given `version` (an integer) and underlying random stream given by `rng` (a string, e.g. "mersennetwister"). It is intended that new versions of methods (e.g. random.choice) check the `version` attribute to dispatch. Because new versions of RNGs may want to have a different kind of internal state than `rk_state`, perhaps that attribute should be converted to a `void*`. Ideas for testing welcome.
Woohoo! Thanks for getting this started. A few immediate thoughts:
Non-code preconditions to merging:
|
Probably for a future PR.
... and minor doc update.
BTW -- github, in its infinite wisdom, does not provide any notification when you push new changes to a PR. If you have made changes and want people to look at them, you have to remember to leave a comment. |
cdef rk_error errcode | ||
cdef ndarray obj "arrayObject_obj" | ||
self._free_internal_state() | ||
self.internal_state = <rk_state*>PyMem_Malloc(sizeof(rk_state)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dropping and recreating internal state stuff seems superfluous to me... (and even the existence of the _free_internal_state
method).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a leftover of the initial attempt at supporting various internal PRNGs as well (in which case the internal state needs to be reallocated whenever the underlying PRNG is changed). I'll get rid of it.
Tests are in. |
Also remove some leftover code from attempt to support multiple PRNGs.
Kindly bumping the issue. |
I would like to see this going into 1.10. Kindly bumping the issue, now that 1.10 is being branched out. |
@@ -597,6 +601,11 @@ cdef class RandomState: | |||
``/dev/urandom`` (or the Windows analogue) if available or seed from | |||
the clock otherwise. | |||
|
|||
version : int between 0 and HIGHEST_VERSION, keyword-only, optional | |||
The version of the RNG methods. Defaults to `HIGHEST_VERSION`, | |||
except if `seed` is set in which case it defaults to `0` for backwards |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ReST, not markdown, so you need to double the backticks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look fixed here -- maybe you need to push?
The remaining tests I'd still like to see:
Given those tests plus the above comments, I think this is good to go. (Sorry for the delay! #5844 and related have kinda been soaking up all my numpy-brain the last few weeks.) |
c66b9cf
to
86d214a
Compare
I should have addressed your issues now (and indeed I was missing a push). |
@anntzer: Looks like you have test failures. |
1fb16a6
to
ef9ad34
Compare
@anntzer Needs documentation, in I'm inclined to put this off to 1.11.0 when we may actually have different versions. It would be good if we actually had different generators and not just versions of the Mersenne Twister, and rewriting the distributions to check the version might be a good thing to do in the process. |
@charris: Point of information, may or may not affect your conclusions: this PR does in fact define a "version 1" RandomState, which is similar to "version 0" except that it fixes #5299 by providing a better algorithm for sampling without replacement via The different generators part would indeed be nice, but is orthogonal to this PR so I don't think should block it either way. |
...actually, now that I look more closely, it looks like the new implementation of Sorry Antony, this is definitely not looking like an easy one to slip in ahead of 1.10 :-/ |
Yup, if I remember correctly I put in the truncated Knuth shuffle because there was already an implementation there and the idea of the PR was more to make sure we had the infrastructure to handle multiple implementations rather than introducing new implementations. |
☔ The latest upstream changes (presumably #6831) made this pull request unmergeable. Please resolve the merge conflicts. |
What is the status of this? |
I think we're waiting for #6801 to be ready, to be sure that this patch actually addresses the issues needed. |
has_gauss = self.internal_state.has_gauss | ||
gauss = self.internal_state.gauss | ||
pos = self.internal_state.pos | ||
state = <ndarray>np.asarray(state, np.uint32) | ||
return ('MT19937', state, pos, has_gauss, gauss) | ||
return version_info + (rng_info, state, pos, has_gauss, gauss) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't match the docstring, which claims it returns nested tuples. I reckon the docstring format is the better one, so go with that.
@@ -713,9 +735,10 @@ cdef class RandomState: | |||
|
|||
Parameters | |||
---------- | |||
state : tuple(str, ndarray of 624 uints, int, int, float) | |||
state : tuple([int], str, ndarray of 624 uints, int, int, float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, doesn't match the docstring for get_state
version, rng, *rest = state | ||
if version not in range(_HIGHEST_VERSION + 1): | ||
raise ValueError("`version` must be an integer between 0 and {0}". | ||
format(_HIGHEST_VERSION)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error should be split in two, to distinguish "this is not a valid version" and "this version is newer than this version of numpy"
I am going to close this PR, as I don't intend to work on this anymore. However, feel free to take over the patches for a new PR. |
Following the discussion in #5299, and after discussing this with
@njsmith today, this is the approach I implemented:
RandomState()
=> return the latest, best, etc. version of RandomState.
RandomState([seed=]x)
=> return the current (as of 1.9) version of RandomState.
RandomStats([[seed=]x], version=version, rng=rng)
=> return a RandomState object with methods from the given
version
(an integer) and underlying random stream given by
rng
(a string,e.g. "mersennetwister").
It is intended that new versions of methods (e.g. random.choice) check
the
version
attribute to dispatch.Because new versions of RNGs may want to have a different kind of
internal state than
rk_state
, perhaps that attribute should beconverted to a
void*
.Ideas for testing welcome.