8000 DOC: Add example for ``np.random.default_rng().beta()`` by linus-md · Pull Request #25349 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

DOC: Add example for np.random.default_rng().beta() #25349

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 18 commits into from

Conversation

linus-md
Copy link
Contributor
@linus-md linus-md commented Dec 8, 2023

Relates to #21351.

@rkern
Copy link
Member
rkern commented Dec 8, 2023

Thank you for the contribution! I'd like to see it fleshed out a little more. See my comments on an earlier PR for the kinds of things that we'd like to see in examples for these distribution methods. Showing just one call of the method with no commentary is below the threshold (at least in my opinion).

@linus-md
Copy link
Contributor Author
linus-md commented Dec 8, 2023

@rkern I hope the improved example is to your liking. Is there some guide/heuristic on how well document specific functions should be?

@rkern
Copy link
Member
rkern commented Dec 8, 2023

A good general rule is to look at other functions/methods that are similar, and see what their documentation looks like. Fortunately, for these distribution methods on Generator, there are lots to look at; c.f. normal() for a pretty canonical example.

Specifically for these methods, I find that plotting the histogram of a large sample is typically more informative than showing the raw values of a handful of samples. Plotting the histogram (potentially with the analytical PDF) is probably the first priority (we have more methods where that's the only thing in the Examples and few methods where we don't do this). Second might be to show the computations with the samples to confirm some interesting properties of the distribution (e.g. comparing the mean of the samples to the known mean derived from the parameters). Then you can think about maybe showing the effects of changing the parameters (e.g. showing the cases when a>b, a==b, and a<b as well as when both are low vs. both are high). If you introduce an application like the order statistics of uniform variates, like you mention here, it's probably good to demonstrate that relationship, but that can take up a lot of space, so it's lowest on my personal list of priorities.

As you look at the other methods' example, please note that you will see things like np.random.default_rng().normal(); this is the result of an automatic conversion of old docs and not something you should replicate. You are doing the correct style with rng = np.random.default_rng(); rng.beta(...).

If you do show the raw values in the outputs, or other computations based on the random output (e.g. computing means), please add the # may vary at the end of the outputs that may vary if you aren't using a seed. We have automated test runners that try to execute the example code, and need this notation to ignore the output.

@charris charris changed the title DOC: Add example for np.random.default_rng().beta() DOC: Add example for np.random.default_rng().beta() Dec 8, 2023
linus-md and others added 2 commits December 9, 2023 12:30
Co-authored-by: Robert Kern <robert.kern@gmail.com>
@linus-md
Copy link
Contributor Author
linus-md commented Dec 9, 2023

Thank you for taking so much time to explain everything clearly. Is it ok to use math.gamma I could only find the rng.gamma() distribution in numpy.

I also added the comparison to the order statistics but it might take up too much space. We could refactor the pdf into its own function to save space.

@mattip
Copy link
Member
mattip commented Dec 14, 2023

I am not sure we should have too many examples in the function docstring.

A couple of justifications:

  • limited documentation team bandwidth and expertise for reviewing and approving these PRs (correct me if I am wrong)
  • the documentation is built for every PR, inflating the documentation will make it slower
  • docstrings are shipped with NumPy. Inflating them makes the runtime heavier and slower to load (albeit very slightly)
  • By mixing theoretical information with the API documentation, we make the same documentation page serve two audiences: those looking for what the input parameters mean with those looking for deeper knowledge about the distributions. This is problematic as neither audience will be getting what they want.

I would be happier if we refer users to a wikipedia page or open a new section in the user guide around the theory and practice of the random module. We could then consider CI management to not build the user guide on every PR.

@linus-md
Copy link
Contributor Author

@mattip ok. Would you prefer that I put in something like

rng = np.random.default_rng()
n, k = 100, 5
size = 1000
sample = rng.beta(k, n+1-k, size=size) 

because right now there is no example at all or should I just close this?

@mattip
Copy link
Member
mattip commented Dec 14, 2023

@rkern: am I being too harsh? Was the ask to add examples to np.random motivated by end-user complaints?

@rkern
Copy link
Member
rkern commented Dec 14, 2023

Per #21351 from the Docs Team:

"Examples, more examples, and more detailed examples" is the recurrent theme in the feedback about the NumPy documentation we received via the 2020 and 2021 NumPy user surveys.

Given that, I'd say that overcomes the minor objections concerning things like the wheel size.

I do think that all of the distributions benefit from at least one plot of a typical distribution, to trigger visual memory. If it can be demonstrated that if we expanded the example for all of the distribution methods to include multiple plots, then the making of the plots and saving them balloon the CI times by noticeable amounts, then I'd be happy to have a soft-limit guideline of keeping things down to 1 plot per method.

I also think that most of the methods could use a simple demonstration of what each of the parameters mean. Particularly in cases like this where we have two that that are hard to distinguish. I think it's really good to have that final example that shows when a > b the peak moves to the lower end of the range, and when b > a the peak moves high. That said, this particular phenomenon could also be demonstrated without the plot by calculating the means of the samples.

While each distribution method should link to the appropriate Wikipedia article, and we shouldn't recapitulate every interesting fact about the distribution, we should have our own quick demonstrations of the effects of the method arguments. We often have nonstandard parameterizations that confuse people who use the Wikipedia article as a reference.

I do think this PR is a little on the high end of what we want (and thank you @linus-md for being patient with us while we whip you round-and-around on this; we all currently have different ideas of what we want). If I'd drop something, it would be the demonstration of the relationship to the explicit order statistics of the uniforms; it's a nice-to-have example, but if we're feeling pressure to slim this down, that would be the one; it's more about the distribution itself than using the method. But overall, I think this is in range.

@mattip
Copy link
Member
mattip commented Dec 14, 2023

Thanks @rkern, I am convinced :). And thanks @linus-md for your contributions.

@linus-md
Copy link
Contributor Author
linus-md commented Dec 15, 2023

Hi @rkern @mattip, what do you say to the current version? I removed the example, the math import and am doing only one plot now. Do you think we can agree on something like the current version?

There might be some disagreement but I think its is nice to see the mirror symmetry visually.

@linus-md linus-md marked this pull request as draft December 17, 2023 20:10
@linus-md linus-md closed this Dec 17, 2023
@linus-md linus-md deleted the random-doc branch December 17, 2023 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0