8000 RandomGen review notes (feel free to close) · Issue #13606 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

RandomGen review notes (feel free to close) #13606

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
seberg opened this issue May 22, 2019 · 17 comments
Closed

RandomGen review notes (feel free to close) #13606

seberg opened this issue May 22, 2019 · 17 comments

Comments

@seberg
Copy link
Member
seberg commented May 22, 2019

@mattip some silly things I noticed looking at randomgen. Just creating an issue for now, to keep editing it in case I find something more interesting. Running things to valgrind, but unsurprisingly, looks good (no invalid behaviour, no definitely lost memory, except the expected ones due to buffer caching of scalars).

  • Are examples exposed in the documentation somewhere?
  • Namespace issues (maybe OK):
    • np.random.bounded_integers exposure does not matter I guess
    • np.random.common exposure also does not matter?
    • Should we expose all BitGenerators in np.random namespace? Same for the corresponding module, e.g. np.random.dsfmt and np.random.DSFMT or have a np.random.generators namespace?
  • Generator repr could have <> (maybe longer path).
  • What is src/legacy/distributions-boxmuller.c about, aren't those already in the _mtrand distributions?
  • We may want to fix up style (such as 2-space indentation) at some later point.
  • Pretty sure you do not need PyArray_calloc_aligned in DSFMT (no initialization is OK? but no matter)

Generator

  • random_sample exists, but is just a warning and returns None. Probably meant to raise an error?
  • .poisson_lam_max is it necessary to expose, or add leading underscore?
  • tomaxint is something we want I guess (well nobody complained and it was always there)?

BitGenerator

  • Is there a use to exposing .lock on the python level (all py-level functions must check lock anyway)? But nothing much wrong with it either.
@seberg seberg added this to the 1.17.0 release milestone May 22, 2019
@mattip
Copy link
Member
mattip commented May 23, 2019

Are examples exposed in the documentation somewhere?

The main page has a Features section with a link to Extending, which states "An example showing the use of a more complicated distribution is in the examples folder."

top-level namespace

np.random.bounded_integers, np.random.common, are not included in __all__. The BitGenerators are included in __all__, they were in a subdirectory but we moved them to top-level. In the future we will remove the RandomState singleton methods from the top-level namespace (beta, binormal, ...), I think it makes sense to have the BitGenerators there.

repr(Generator)

repr(np.random.Generator) gives me "<class 'numpy.random.generator.Generator'>". I am not sure what you were asking for, sorry.

_mtrand

removing _mtrand. We can always restore it from git.

style

Added to #13164

random_sample exists, but is just a warning and returns None

added missing return statement

Generator.poisson_lam_max, Generator.tomaxint, BitGenerator.lock

Not clear to me the motivation. @bashtage?

@seberg
Copy link
Member Author
seberg commented May 23, 2019

Yeah, the namespace stuff since not in __all__, it is a normal thing that other packages appear. I meant the repr of the instanciated Generator(), but it is just a changable nitpick in any case.

I can't really sieve through all the code (or well, I guess I could, but would need much more time), in any case, as you see 2 hours looking at did not make me see any real issue...

If _mtrand is fully deleted, do we have to be careful if we say fixup a distribution to move code from one file to the other, or it is already all split up into the legacy and not legacy part, right?

@mattip
Copy link
Member
mattip commented May 23, 2019

The intention is to fully split into legacy and non-legacy.

@bashtage
Copy link
Contributor

Generator.poisson_lam_max, Generator.tomaxint, BitGenerator.lock

Not clear to me the motivation. @bashtage?

lam_max is there because it is there on RandomState,

poisson_lam_max = np.iinfo('l').max - np.sqrt(np.iinfo('l').max)*10

It is actually one final change that needs to go through. I never changed the value from a long to an int64 in Generator. I'll PR in a bit on this.

tomaxint should 100% be removed from generator. This is also there on RandomState, and is no unloved that it doesn't even appear in np.random. It is also redundant since `integers(0,np.iinfo(np.int).max, endpoint=True) produces values in the same range.

.lock is an instance fo the lock and is used to get the lock from the bitgen when instantizing a Generator, see

https://github.com/bashtage/randomgen/blob/master/randomgen/generator.pyx#L115

I suppose this should have a docstring somewhere.

@bashtage
Copy link
Contributor

If _mtrand is fully deleted, do we have to be careful if we say fixup a distribution to move code from one file to the other, or it is already all split up into the legacy and not legacy part, right?

It is split where needed but shared where the generation code is the same. For example, the bounded integers code is that same.

@bashtage
Copy link
Contributor

np.random.bounded_integers and np.random.common both only expose cdef functions. They both export a few standard imported modules (e.g., sys) or function. I suppose these could have __all__ = []

@bashtage
Copy link
Contributor
  • What is src/legacy/distributions-boxmuller.c about, aren't those already in the _mtrand distributions?

This is where the distributions that are used by mtrand.pyx but not Generator are kept. It is not a great name, and perhaps should be called legacy-distributions.c, which is what I switched to in randomgen (but after mattip forked for NumPy). Should probably have a clear docstring at the top explaining what this is and when to use it. Same thing is true for distributions.c so explain that changing a distribution that is shared with mtrand.pyx requires ensuring that the version exposed to mtrand.pyx is unchanged.

@bashtage
Copy link
Contributor

@mattip I think random_sample should be completely missing from Generator. This was removed in favor of random in the new API.

@mattip
Copy link
Member
mattip commented May 23, 2019

Ok, I seem to remember someone mentioning it should be a stub function that warns and calls random_sample to aid people in the transition, but we can always add it back in later. Removing both it and tomaxint

@bashtage
Copy link
Contributor

I don't have a strong opinion, but probably better to keep it minimal from the beginning if possible.

@bashtage
Copy link
Contributor

One final note about lock - this is part of the visible API since anyone who wishes to use a bit generator in their own code and share this same bit generator with Generator needs the same lock.

@bashtage
Copy link
Contributor
  • np.random.bounded_integers exposure does not matter I guess
  • np.random.common exposure also does not matter?

These could be hidden without any loss. They probably should be.

  • We may want to fix up style (such as 2-space indentation) at some later point.

Is there a NumPy C style doc? I used clang-format's defaults since I didn't see a definitive numpy C style guide. It would probably be a good idea to autoreformat, if possible, before the first merge, to simplify history once bugs or other changes come in.

@mattip
Copy link
Member
mattip commented May 23, 2019

The NumPy C style guide is here.

@bashtage
Copy link
Contributor

Anyone have a clang-format file for the guide? :-)

@rkern
Copy link
Member
rkern commented May 23, 2019

lam_max is there because it is there on RandomState,

We'd have to trawl the history to confirm, but I strongly suspect that it was not intended to be public. It's just a constant. I don't think anyone is relying on its presence.

@bashtage
Copy link
Contributor

We moved it to private. It is undocumented but is unit tested so it is convenient to leave it exposed to Python.

@seberg
Copy link
Member Author
6969
seberg commented May 23, 2019

Thanks for all the small fixups! The lock I was just a bit curious, because it seemed that from the python side there is not much actual use. Anyway, it is good to keep it there, probably better, it may make lowlevel calls a bit smoother to implement even if hiding were easy.

@seberg seberg closed this as completed May 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
0