8000 Remove some remnants of hist{,2d}(normed=...). by anntzer · Pull Request #16572 · matplotlib/matplotlib · GitHub
[go: up one dir, main page]

Skip to content

Remove some remnants of hist{,2d}(normed=...). #16572

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 26, 2020

Conversation

anntzer
Copy link
Contributor
@anntzer anntzer commented Feb 25, 2020
  • Remove mentions of it from docstrings, example.
  • np.histogram2d also supports density since np1.15
    eric-wieser/numpy@081050a
  • Reword docs for density in a hopefully(?) clearer manner? In
    particular, few users will care about the "first element of the return
    tuple"; what mostly matters is what gets drawn.
  • Refer to hist's density in hist2d.

PR Summary

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@anntzer anntzer added this to the v3.3.0 milestone Feb 25, 2020
If ``True``, draw and return a probability density: each bin
will display the bin's raw count *divided by the bin width*
(``n = n0 / sum(n0) * np.diff(bins)``), so that the area under
the histogram integrates to 1 (``np.sum(n * np.diff(bins)) == 1``).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a mistake that slipped in via #16149 (where the text was changed after the first approving review).

So initially it said "This is achieved by dividing the count by the number of observations times the bin width [..]", which is correct, but ambiguous. But that turned into n = n0 / sum(n0) * np.diff(bins), which is incorrect and should rather be n = n0 / (sum(n0) * np.diff(bins))

I also find n0 used here adds to confusion. So n already has two meanings in this docstring, namely the histogram counts, which are returned, as well as the dimensionality of the input data. n0 is then then histogram counts in case of multiple data input, but also the "raw counts".
Hence ideally the complete docstring is revised to use one letter for the return, and another for the dimensionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I just renamed them "density" and "counts" :)

- Remove mentions of it from docstrings, example.
- np.histogram2d also supports density since np1.15
  eric-wieser/numpy@081050a
- Reword docs for `density` in a hopefully(?) clearer manner?  In
  particular, few users will care about the "first element of the return
  tuple"; what mostly matters is what gets drawn.
- Refer to hist's `density` in hist2d.
@dstansby dstansby merged commit 8ac3ea1 into matplotlib:master Feb 26, 2020
@anntzer anntzer deleted the unnormed branch February 26, 2020 21:51
If ``True``, draw and return a probability density: each bin
will display the bin's raw count divided by the total number of
counts *and the bin width*
(``density = counts / (sum(counts) * np.diff(bins))``),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't right either, is it? It should be density = counts / sum(counts * np.diff(bins))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sum(counts) is a scalar and diff(bins) a vector, aren't they?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess both are the same in case of constant bin width. But in general it needs to be
density = counts / (sum(counts) * np.diff(bins)). So the way it's currently written is correct. (Maybe it helps thinking of sum(counts) as the total number of observations N and write it down as 1/N * counts / diff(bins) ?)

8000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies. I was confusing myself. This is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0