8000 BUG: np.histogramdd loses precision on its inputs, leading to incorrect results by eric-wieser · Pull Request #11023 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: np.histogramdd loses precision on its inputs, leading to incorrect results #11023

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 14, 2018

Conversation

eric-wieser
Copy link
Member
@eric-wieser eric-wieser commented May 1, 2018

Previously:

  • The last bin edge was treated as "fuzzy", extending 1e-6 times the smallest bin width beyond the end of the array (histogramdd fuzzes the rightmost bin edge #10864)
  • Integer bin edges were cast to float, which for large integers would cause non-equal edges to become equal

Note that this is different to gh-7845, which is about precision on the outputs.

Previously a fuzzy rounded comparison was used for the rightmost bin of histogramdd.

It's not clear why this was done, and it resulted in surprising behavior.

This also removes the restriction that bin edges must be floats, and allows integer arrays to be passed (and returned)

Fixes numpygh-10864
@eric-wieser eric-wieser changed the title BUG: np.histogramdd loses precision on its inputs BUG: np.histogramdd loses precision on its inputs, leading to incorrect results May 1, 2018
hist, edges = histogramdd((x, y), bins=(x_edges, y_edges))

assert_equal(edges[0].dtype, x_edges.dtype)
assert_equal(edges[1].dtype, y_edges.dtype)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fail on master, as edges[i].dtype is float64

y = big + x
y_edges = big + x_edges

hist, edges = histogramdd((x, y), bins=(x_edges, y_edges))
Copy link
Member Author
@eric-wieser eric-wieser May 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also fails on master with

ValueError: `bins[0]` must be strictly increasing, when an array

There exist more malicious test cases that give the wrong result rather than crash, but I figured this was sufficient.

y = np.array([0, 1, 2])
x_edges = np.array([0, 2, 2])
y_edges = 1
hist, edges = histogramdd((x, y), bins=(x_edges, y_edges))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On master, this test crashes with

ValueError: `bins[0]` must be strictly increasing, when an array

@@ -646,7 +644,7 @@ def test_rightmost_binedge(self):
bins = [[0., 0.5, 1.0]]
hist, _ = histogramdd(x, bins=bins)
assert_(hist[0] == 0.0)
assert_(hist[1] == 1.)
assert_(hist[1] == 0.0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was weird. 1.0000000001 <= 1.0 should be false.

@charris
Copy link
Member
charris commented May 11, 2018

Hmm, this does change behavior and might cause problems downstream, so it probably needs a compatibility note in the release notes.

@eric-wieser
Copy link
Member Author

I'll add one. I don't think the change is substantially different from the rounding fixes added to np.histogram as part of 1.15, so I'll add an entry for both

@eric-wieser eric-wieser added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label May 12, 2018
@eric-wieser eric-wieser added this to the 1.15.0 release milestone May 12, 2018
@eric-wieser
Copy link
Member Author

@charris: Added, along with np.histogram notes. If this doesn't make 1.15, then the release notes still need to (with modifications).

@eric-wieser eric-wieser removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label May 14, 2018
@charris charris merged commit 2cf2db3 into numpy:master May 14, 2018
@charris
Copy link
Member
charris commented May 14, 2018

Thanks Eric.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0