8000 Allow multiple versions of RandomState: init' work. by anntzer · Pull Request #5911 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

Allow multiple versions of RandomState: init' work. #5911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

anntzer
Copy link
Contributor
@anntzer anntzer commented May 23, 2015

Following the discussion in #5299, and after discussing this with
@njsmith today, this is the approach I implemented:

RandomState()
=> return the latest, best, etc. version of RandomState.
RandomState([seed=]x)
=> return the current (as of 1.9) version of RandomState.
RandomStats([[seed=]x], version=version, rng=rng)
=> return a RandomState object with methods from the given version
(an integer) and underlying random stream given by rng (a string,
e.g. "mersennetwister").

It is intended that new versions of methods (e.g. random.choice) check
the version attribute to dispatch.

Because new versions of RNGs may want to have a different kind of
internal state than rk_state, perhaps that attribute should be
converted to a void*.

Ideas for testing welcome.

Following the discussion in numpy#5299, and after discussing this with
@njsmith today, this is the approach I implemented:

RandomState()
    return the latest, best, etc. version of RandomState.
RandomState([seed=]x)
    return the current (as of 1.9) version of RandomState.
RandomStats([[seed=]x], version=version, rng=rng)
    return a RandomState object with methods from the given `version`
    (an integer) and underlying random stream given by `rng` (a string,
    e.g. "mersennetwister").

It is intended that new versions of methods (e.g. random.choice) check
the `version` attribute to dispatch.

Because new versions of RNGs may want to have a different kind of
internal state than `rk_state`, perhaps that attribute should be
converted to a `void*`.

Ideas for testing welcome.
@njsmith
Copy link
Member
njsmith commented May 23, 2015

Woohoo! Thanks for getting this started.

A few immediate thoughts:

  • I'd just leave out the rng stuff for now. We don't want to expose any API here until we actually have support for more than just the mersenne twister (because if you expose API prematurely you run the risk of locking yourself into something that you later discover is somehow insufficient), and even if you are planning to add support for non-MT core rngs then that should still be in a separate PR after this one to keep things clean.
  • OTOH, it might be nice to include one improved random method in here (maybe sample?), so that we immediately have two distinct versions and can make sure that the infrastructure you're adding actually works the way we want.
  • I'd expose version as a read-only attribute. Makes testing easier if you can do r = RandomState(foo, version=bar); assertEqual(r.version, bar) or r = RandomState(); assertEqual(r.version, LATEST_VERSION).
  • Now that I think of it, we should probably call it HIGHEST_VERSION just for parallelism with pickle's HIGHEST_PROTOCOL.
  • Obviously pickle methods and get_state/set_state need to be augmented to include the version. (Actually the former might just call the latter?)

Non-code preconditions to merging:

  • Someone (ideally you?) needs to write a short summary of the new API and why it is the way it is, and send it to the mailing list for review. I think it's actually pretty straightforward in this case, but that's the rule for all new APIs, and some of the rationales aren't obvious until you think it through (as we noticed today :-).)
  • Tests: ideally I guess we should have a little script that instantiates a RandomState object with some fixed seed, and then does a loop calling every single method with a few varying parameters, that we can run using numpy version 1.x, save the output to a file, and then compare the output against what we get on numpy version 1.(x+i). Similarly it would be a good idea to pickle some RandomState objects on old versions and save those in the test suite, so we can test that loading them into new numpys works as expected.

@njsmith
Copy link
Member
njsmith commented May 25, 2015

BTW -- github, in its infinite wisdom, does not provide any notification when you push new changes to a PR. If you have made changes and want people to look at them, you have to remember to leave a comment.

cdef rk_error errcode
cdef ndarray obj "arrayObject_obj"
self._free_internal_state()
self.internal_state = <rk_state*>PyMem_Malloc(sizeof(rk_state))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dropping and recreating internal state stuff seems superfluous to me... (and even the existence of the _free_internal_state method).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a leftover of the initial attempt at supporting various internal PRNGs as well (in which case the internal state needs to be reallocated whenever the underlying PRNG is changed). I'll get rid of it.

@anntzer
Copy link
Contributor Author
anntzer commented May 26, 2015

Tests are in.

Also remove some leftover code from attempt to support multiple PRNGs.
@anntzer
Copy link
Contributor Author
anntzer commented Jun 9, 2015

Kindly bumping the issue.

@anntzer
Copy link
Contributor Author
anntzer commented Jun 19, 2015

I would like to see this going into 1.10. Kindly bumping the issue, now that 1.10 is being branched out.

@@ -597,6 +601,11 @@ cdef class RandomState:
``/dev/urandom`` (or the Windows analogue) if available or seed from
the clock otherwise.

version : int between 0 and HIGHEST_VERSION, keyword-only, optional
The version of the RNG methods. Defaults to `HIGHEST_VERSION`,
except if `seed` is set in which case it defaults to `0` for backwards
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ReST, not markdown, so you need to double the backticks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look fixed here -- maybe you need to push?

@rgommers rgommers added this to the 1.10.0 release milestone Jun 22, 2015
@charris
Copy link
Member
charris commented Jun 25, 2015

@anntzer @njsmith Be good to get this settled.

@njsmith
Copy link
Member
njsmith commented Jun 25, 2015

The remaining tests I'd still like to see:

  • Some explicit pickles (stored as strings or whatever) made with 1.9 and current master, to make sure we really will continue to be able to load old ones.
  • A test that trying to set a too-high version via seed or __init__ raises an error
  • A test that trying to set a too-high version via set_state or unpickling raises an error.

Given those tests plus the above comments, I think this is good to go.

(Sorry for the delay! #5844 and related have kinda been soaking up all my numpy-brain the last few weeks.)

@anntzer anntzer force-pushed the random-api-for-new-versions branch from c66b9cf to 86d214a Compare June 26, 2015 00:06
@anntzer
Copy link
Contributor Author
anntzer commented Jun 26, 2015

I should have addressed your issues now (and indeed I was missing a push).

@njsmith
Copy link
Member
njsmith commented Jun 26, 2015

@anntzer: Looks like you have test failures.

@anntzer anntzer force-pushed the random-api-for-new-versions branch 4 times, most recently from 1fb16a6 to ef9ad34 Compare June 26, 2015 08:44
@charris
Copy link
Member
charris commented Jun 27, 2015

@anntzer Needs documentation, in doc/release/1.10.0-notes.rst, doc/source/reference/routines.random.rst. If it isn't documented, it doesn't exist.

I'm inclined to put this off to 1.11.0 when we may actually have different versions. It would be good if we actually had different generators and not just versions of the Mersenne Twister, and rewriting the distributions to check the version might be a good thing to do in the process.

@njsmith
Copy link
Member
njsmith commented Jun 28, 2015

@charris: Point of information, may or may not affect your conclusions: this PR does in fact define a "version 1" RandomState, which is similar to "version 0" except that it fixes #5299 by providing a better algorithm for sampling without replacement via RandomState.choice.

The different generators part would indeed be nice, but is orthogonal to this PR so I don't think should block it either way.

@njsmith
Copy link
Member
njsmith commented Jun 28, 2015

...actually, now that I look more closely, it looks like the new implementation of choice actually fails to fix #5299, because the truncated Knuth shuffle it uses still takes O(population size) memory (and thus O(population size) time as well). I thought the idea was to switch to Floyd's algorithm, which is O(samples) in both time and space, at least when samples << population size.

Sorry Antony, this is definitely not looking like an easy one to slip in ahead of 1.10 :-/

@anntzer
Copy link
Contributor Author
anntzer commented Jun 29, 2015

Yup, if I remember correctly I put in the truncated Knuth shuffle because there was already an implementation there and the idea of the PR was more to make sure we had the infrastructure to handle multiple implementations rather than introducing new implementations.
I don't have the time right now to implement Floyd's algorithm (which requires a set implementation in C(ython), I don't know if there's one lying around).

@homu
Copy link
Contributor
homu commented Dec 17, 2015

☔ The latest upstream changes (presumably #6831) made this pull request unmergeable. Please resolve the merge conflicts.

@charris
Copy link
Member
charris commented Dec 17, 2015

What is the status of this?

@anntzer
Copy link
Contributor Author
anntzer commented Dec 17, 2015

I think we're waiting for #6801 to be ready, to be sure that this patch actually addresses the issues needed.

@ryanpeach
Copy link
ryanpeach commented Oct 12, 2017

So my conditional restriction #9855 #6801 prevents the need which broke issue #6824 needing int64's. What do I need to do to make it compatible with RandomState?

has_gauss = self.internal_state.has_gauss
gauss = self.internal_state.gauss
pos = self.internal_state.pos
state = <ndarray>np.asarray(state, np.uint32)
return ('MT19937', state, pos, has_gauss, gauss)
return version_info + (rng_info, state, pos, has_gauss, gauss)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't match the docstring, which claims it returns nested tuples. I reckon the docstring format is the better one, so go with that.

@@ -713,9 +735,10 @@ cdef class RandomState:

Parameters
----------
state : tuple(str, ndarray of 624 uints, int, int, float)
state : tuple([int], str, ndarray of 624 uints, int, int, float)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, doesn't match the docstring for get_state

version, rng, *rest = state
if version not in range(_HIGHEST_VERSION + 1):
raise ValueError("`version` must be an integer between 0 and {0}".
format(_HIGHEST_VERSION))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error should be split in two, to distinguish "this is not a valid version" and "this version is newer than this version of numpy"

@anntzer
Copy link
Contributor Author
anntzer commented Oct 18, 2017

I am going to close this PR, as I don't intend to work on this anymore. However, feel free to take over the patches for a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants
0