8000 DOC, API: improve the C-API/Cython documentation and interfaces for random by mattip · Pull Request #15007 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

DOC, API: improve the C-API/Cython documentation and interfaces for random #15007

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 3, 2019

Conversation

mattip
Copy link
Member
@mattip mattip commented Nov 29, 2019
  • Clean up the random c documentation

    • Split the page into Cython and C-API sections
    • Document the bitgen_t struct
    • Function formatting should render cleanly, linking to non-standard C arguments
  • Rename functions in distributions

  • Refactor CFFI test, better document the example files

@mattip
Copy link
Member Author
mattip commented Nov 29, 2019

xref checklist in gh-14778

@mattip mattip changed the title DOC: improve the C-API/Cython documentation for random DOC, API: improve the C-API/Cython documentation and interfaces for random Nov 29, 2019
@mattip mattip requested a review from rgommers November 29, 2019 13:04
@mattip
Copy link
Member Author
mattip commented Nov 29, 2019

The reworked "extending" documentation can be seen here and here is the cython/c-api. The functions in that last link could use some more documentation

Copy link
Contributor
@bashtage bashtage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good and is an improvement.

@@ -112,7 +104,7 @@ The functions are named with the following cconventions:

.. c:function:: double random_f(bitgen_t *bitgen_state, double dfnum, double dfden)

.. c:function:: double random_standard_cauchy(bitgen_t *bitgen_state)
.. c:function:: double random_cauchy(bitgen_t *bitgen_state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it already matches the name at the Python level, please leave it as such.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverting the cauchy and random_t changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattip I think you forgot to push the revert commit here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. fixing.

@bashtage
Copy link
Contributor
bashtage commented Nov 29, 2019 via email

@rkern
Copy link
Member
rkern commented Nov 29, 2019

Making it harder to upgrade from RandomState to Generator is not the price that I would pay for cleaning this up. Besides, these are standard_ methods. They are more like standard_normal than they are normal, and these are distributions where I kind of expect to be able to specify the loc and scale unless otherwise marked, which is what standard_ does. That's not the case with, say, beta. I get that it bugs some people when looking at it from one direction, but the opposite situation would bug me!

If anything needs to be done to rationalize this, I'd add a cauchy and t (ship having sailed on student_t) methods that do take the loc and scale parameters (or whatever names are deemed appropriate).

@bashtage
Copy link
Contributor

Making it harder to upgrade from RandomState to Generator is not the price that I would pay for cleaning this up. Besides, these are standard_ methods. They are more like standard_normal than they are normal, and these are distributions where I kind of expect to be able to specify the loc and scale unless otherwise marked, which is what standard_ does. That's not the case with, say, beta. I get that it bugs some people when looking at it from one direction, but the opposite situation would bug me!

I'm not actually arguing these must be changed, although I would argue that it was a mistake to not change then when Generator was introduced when other functions were renamed, e.g., randint -> integers.

I personally find standard_t to be a bit confusing since in my area there is a distribution known as the standard ized t that is a Student's t that is normalized to have a unit variance for all choices of the degree of freedom parameter (dof > 2). The t that adds two parameters to set the location and scale is known as a generalized t.

@mattip
Copy link
Member Author
mattip commented Nov 30, 2019

Windows 32bit failures are unrelated. We have I think at least three choices to make with python standard_t/ C random_standard_t:

  1. change the python level and C level to student_t, random_student_t
  2. add an additional python level student_t, change the C to random_student_t
    • possibly deprecate standard_t
  3. leave it alone.

I would tend to 2, with no deprecation but with a comment in the documentation.

@rgommers rgommers added this to the 1.18.0 release milestone Dec 3, 2019
size_t num_colors, int64_t *colors,
int64_t nsample,
size_t num_variates, int64_t *variates)
.. c:function:: int random_mvhg_count(bitgen_t *bitgen_state, npy_int64 total, size_t num_colors, npy_int64 *colors, npy_int64 nsample, size_t num_variates, npy_int64 *variates)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't know if mvhg is a sensible name, there's no docs other than the function signature and the name is very non-descriptive. Adding a few lines of docs can be addressed after the 1.18.x split, that would be nice (now it's "read the source code" I think).

For 1.18.x though, can we have a decision on the name? The legacy code has random_hypergeometric, so that's taken. The new code could use hypergeom or multi_hypergeom or something similar perhaps.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see now, random_mvhg_count and random_mvhg_marginals are the only two poor naming choices that are left, everything else looks clear to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WarrenWeckesser any ideas on nicer names?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the names random_mvhg_count and random_mvhg_marginals are not great for a public-facing API. The verbose, completely explicit name for the marginals version would be

random_multivariate_hypergeometric_marginals

but I don't think anyone wants a name that long. Is

random_multivar_hypergeom_marginals

still too long?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgommers wrote:

I still don't know if mvhg is a sensible name, there's no docs other than the function signature

The C functions random_hypergeometric, random_mvhg_count and random_mvhg_marginals have documentation in the comments of the C code. These comments explain the arguments, including assumptions about the input values (e.g. input values are expected to be nonnegative) that are not checked in the C code.

Is the plan to eventually expand the documentation in this file (doc/source/reference/random/c-api.rst) to include such information?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@rgommers
Copy link
Member
rgommers commented Dec 3, 2019
1. change the python level and C level to student_t, random_student_t

2. add an additional python level `student_t`, change the C to `random_student_t`
   
   * possibly deprecate `standard_t`

3. leave it alone.

I would tend to 2, with no deprecation but with a comment in the documentation.

I'm fine with "3. leave it alone". When I brought up the naming inconsistency I missed that those names were already present in the Python API. So I'd say perhaps not worth it; I'd be happy to just have the new additions be named as clearly as possible.

Copy link
Member
@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting there, thanks @mattip. All requested changes are minor I think.

@mattip
Copy link
Member Author
mattip commented Dec 3, 2019

Fixed the niggles and mistakes, tests pass, now waiting for a decision about the names.

@bashtage
Copy link
Contributor
bashtage commented Dec 3, 2019 via email

@rgommers rgommers merged commit fdd8395 into numpy:master Dec 3, 2019
@rgommers
Copy link
Member
rgommers commented Dec 3, 2019

Changes look good, so merged. A couple of things left to do or decide in a follow-up PR:

  • expanding the mvhg naming
  • removing one instance of `random.examples in the docs
  • use of private names in the Cython API - I thought we agreed to add __init__.pxd to not have to do cimport numpy.random._generator?

I think the first is exceedingly clear. The second is also readable, but long or long +9 is still a long name.

I'd be happy with either, with a light preference for the longer version (so agreeing with @bashtage)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0