numpy.histogram wrong output dtype when using uint8 weights · Issue #16616 · numpy/numpy · GitHub

numpy.histogram wrong output dtype when using uint8 weights #16616


Closed
PiRK opened this issue Jun 16, 2020 · 4 comments · Fixed by #25404

Comments

@PiRK
PiRK commented Jun 16, 2020

I am trying to compute a histogram of a masked image. To do this, I pass my mask array (zeros and ones) as the weights of numpy.histogram. The resulting histogram looks inconsistent.
After scratching my head for a while, I noticed that the output dtype of my histogram was uint8, so the problem is most likely an overflow.
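A minimal sketch of the suspected overflow, assuming the bin totals are accumulated in the uint8 dtype of the weights: a uint8 running total wraps around modulo 256 instead of growing past 255. The second half is back-of-the-envelope arithmetic for the reproducer below (a 1000 x 1000 uniform image with one quarter masked), not measured output; real bins scatter because each one wraps from its own exact count.

```python
import numpy as np

# A uint8 running total of 300 unit weights wraps around modulo 256.
ones = np.ones(300, dtype=np.uint8)
wrapped = np.cumsum(ones, dtype=np.uint8)[-1]
print(int(wrapped))  # 44, i.e. 300 % 256

# Rough numbers for the reproducer: 1000 x 1000 uniformly random uint8 pixels,
# 256 bins, one quarter of the pixels given weight 0.
per_bin = 1_000_000 / 256                 # 3906.25 pixels per bin on average
unmasked_per_bin = round(0.75 * per_bin)  # 2930 of them carry weight 1
wrapped_count = unmasked_per_bin % 256    # what a uint8 accumulator keeps: 114
print(unmasked_per_bin, wrapped_count, wrapped_count / per_bin)  # ratio ~0.03, not 0.75
```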

Reproducing code example:

import numpy

img = numpy.random.randint(0, 256, (1000, 1000), dtype=numpy.uint8)
# mask one quarter of the image
mask = numpy.ones_like(img)
mask[0:500, 0:500] = 0

weighted_hist, bin_edges = numpy.histogram(img, bins=numpy.arange(257), weights=mask)
hist, bin_edges = numpy.histogram(img, bins=numpy.arange(257))


print(weighted_hist / hist)  # expected: values close to 3/4, got much lower values
print(weighted_hist.dtype)   # expected int32 or int64, got uint8


# a much smaller image works (no bin total exceeds 255, so nothing overflows)
img2 = numpy.arange(20, dtype=numpy.uint8).reshape((4, 5))
img2[2:] = img2[0:2]
mask2 = numpy.array([[1, 0, 1, 1, 1],
                     [0, 1, 1, 1, 1],
                     [1, 0, 0, 1, 1],
                     [1, 1, 1, 1, 0]], dtype=numpy.uint8)
weighted_hist2, bin_edges = numpy.histogram(img2, bins=numpy.arange(11), weights=mask2)
hist2, bin_edges = numpy.histogram(img2, bins=numpy.arange(11))
print(weighted_hist2, hist2)

Error message:

None

Numpy/Python version information:

1.18.2 3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020, 10:41:24) [MSC v.1900 64 bit (AMD64)]
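One workaround, sketched under the assumption (from the report above) that the output dtype follows the weights' dtype: widen the mask to int64 before passing it as weights. The trade-off is that astype() allocates a full-size copy of the mask, which is exactly the kind of intermediate array the reporter wanted to avoid. The seeded default_rng generator is a stand-in for the report's numpy.random.randint call.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(1000, 1000)).astype(np.uint8)
mask = np.ones_like(img)
mask[0:500, 0:500] = 0  # exclude one quarter of the image

# Widen the weights so every bin total fits in the accumulator.
# Note: astype() allocates a full-size int64 copy of the mask (8 MB here).
weighted_hist, bin_edges = np.histogram(
    img, bins=np.arange(257), weights=mask.astype(np.int64)
)
hist, _ = np.histogram(img, bins=np.arange(257))

ratio = weighted_hist / hist
print(weighted_hist.dtype, ratio.min(), ratio.max())  # ratios should cluster around 0.75
```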

@eric-wieser
Member

This is likely by design: chances are you want the output to have the same dtype as your weights. For example, if you were using float16 weights, you would likely want a float16 histogram.

I think we have two options here:

  • Document this behavior
  • Use sum-like semantics, by:
    • Promoting weights to a larger type with less chance of overflow
    • Adding a dtype argument to override this promotion

Thoughts @seberg, since you've been looking at casting?
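The second option (sum-like promotion plus a dtype override) could look roughly like the wrapper below. histogram_sum_like is a hypothetical name, not a NumPy function, and borrowing the promotion rule from np.sum is one possible choice, not what NumPy adopted.

```python
import numpy as np

def histogram_sum_like(a, bins=10, weights=None, dtype=None, **kwargs):
    """Hypothetical wrapper (not a NumPy API): np.histogram with np.sum-style
    promotion of the weights, plus an explicit dtype override."""
    if weights is not None:
        weights = np.asarray(weights)
        if dtype is None:
            # Reuse np.sum's result dtype for this weights dtype: small integers
            # and bools are widened, float16 stays float16.
            dtype = np.sum(np.zeros(1, dtype=weights.dtype)).dtype
        weights = weights.astype(dtype, copy=False)
    return np.histogram(a, bins=bins, weights=weights, **kwargs)
```

With uint8 weights this widens to the platform's default unsigned integer, while float16 weights keep the float16 histogram from the example above; passing dtype explicitly overrides either choice.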

@PiRK
Author
PiRK commented Jun 16, 2020

Thanks for your answer. I was looking for a way to compute a masked histogram of very large images without creating intermediate arrays, and passing the mask as weights seemed like a good way to achieve this.

I can achieve what I want with a numba function.
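The numba function is not shown in the thread. As a pure-NumPy alternative, here is a sketch that swaps weights for boolean selection and processes the image in row blocks, so the only temporaries are block-sized and no small-integer accumulator is involved. masked_histogram_chunked and rows_per_chunk are hypothetical names, and the block size is arbitrary.

```python
import numpy as np

def masked_histogram_chunked(img, mask, bins, rows_per_chunk=64):
    """Hypothetical helper: histogram of the pixels where mask != 0, computed
    block by block so only a small selection exists at any time."""
    bins = np.asarray(bins)
    counts = np.zeros(len(bins) - 1, dtype=np.int64)  # wide accumulator
    for start in range(0, img.shape[0], rows_per_chunk):
        block = img[start:start + rows_per_chunk]
        keep = mask[start:start + rows_per_chunk] != 0
        # Unweighted counts of the selected pixels in this block only.
        counts += np.histogram(block[keep], bins=bins)[0]
    return counts, bins
```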

@seberg
Member
seberg commented Jun 16, 2020

@eric-wieser, I don't think there is anything special about the promotion here. There are two inputs that may be promoted (or only the weights), and both are uint8. We do some special things for add and multiply reductions, but I did not look much into changing this at the time. One thing I want to limit in the future is value-based casting (even if that will occasionally mean that a function calling np.asarray() on all of its arguments fails to do value-based casting when an input is a Python int/float).

There is a point in using reduce-like promotion for add here, of course. But I guess we may have to figure out the bigger picture around these reduce-like promotions...
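To illustrate the special handling of add reductions mentioned above: a sum or cumsum of small integers is promoted to a wider integer, while an elementwise add of two uint8 arrays stays uint8 and wraps. A minimal sketch with made-up values:

```python
import numpy as np

a = np.array([200, 100], dtype=np.uint8)

total = a.sum()                            # add reduction: promoted to a wider integer
running = np.cumsum(a)                     # add accumulation: promoted the same way
pairwise = a[:1] + a[1:]                   # elementwise add of two uint8 arrays: stays uint8
forced = np.add.reduce(a, dtype=np.uint8)  # explicit dtype keeps the reduction in uint8

print(int(total), int(running[-1]))        # 300 300
print(int(pairwise[0]), int(forced))       # 44 44
```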

@rossbar
Contributor
rossbar commented Jun 16, 2020

Another option might be to make histogram work with masked arrays, though the appetite for this seems low. See also: #10019, #5363, #1812
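As the thread suggests, np.histogram does not itself honour a masked array's mask, but the masked-array route already works by extracting the valid values first. A sketch under that assumption; note that numpy.ma uses the opposite convention from the reporter's weights (True in the mask means "invalid"), and compressed() returns a 1-D copy, so it does create an intermediate array.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(1000, 1000)).astype(np.uint8)
keep = np.ones(img.shape, dtype=bool)
keep[0:500, 0:500] = False  # the quarter the reporter zeroed out

# numpy.ma convention: True in the mask marks a value as invalid.
masked_img = np.ma.masked_array(img, mask=~keep)

# compressed() returns only the valid values as a 1-D array, which a plain
# unweighted np.histogram can count.
hist, bin_edges = np.histogram(masked_img.compressed(), bins=np.arange(257))
```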
