8000 [WIP] Masking invalid x and/or weights in hist (#6483) by TrishGillett · Pull Request #7133 · matplotlib/matplotlib · GitHub
[go: up one dir, main page]

Skip to content

[WIP] Masking invalid x and/or weights in hist (#6483) #7133

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
< 10000 /include-fragment>
Diff view
Diff view
Prev Previous commit
Mask invalid entries in x and weights before plotting histograms.
  • Loading branch information
TrishGillett committed Sep 19, 2016
commit bf8d2cc41bbef667a1922a6e3a837fdb079593e2
15 changes: 13 additions & 2 deletions lib/matplotlib/axes/_axes.py
Original file line number Diff line number Diff line change
Expand Up @@ -6055,7 +6055,7 @@ def _normalize_input(inp, ename='input'):
if (isinstance(x, np.ndarray) or
not iterable(cbook.safe_first_element(inp))):

inp = np.asarray(inp)
inp = cbook.safe_masked_invalid(inp)
if inp.ndim == 2:
# 2-D input with columns as datasets; switch to rows
inp = inp.T
Expand All @@ -6077,7 +6077,7 @@ def _normalize_input(inp, ename='input'):
"{ename} must be 1D or 2D".format(ename=ename))
else:
# Change to a list of arrays
inp = [np.asarray(arr) for arr in inp]
inp = [cbook.safe_masked_invalid(arr) for arr in inp]

return inp

Expand Down Expand Up @@ -6138,6 +6138,7 @@ def _normalize_input(inp, ename='input'):
else:
w = _normalize_input(weights, 'weights')

# Comparing shape of weights vs. x
if len(w) != nx:
raise ValueError('weights should have the same shape as x')

Expand All @@ -6146,6 +6147,16 @@ def _normalize_input(inp, ename='input'):
raise ValueError(
'weights should have the same shape as x')

# Combine the masks from x[i] and w[i] (if applicable) into a single
# mask and apply it to both.
if not input_empty:
for i in range(len(x)):
mask_i = x[i].mask
if w[i] is not None:
mask_i = mask_i | w[i].mask
10000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would raise an error if the weight matrix contained mask elements that weren't masked in the data x itself. Unless I am not seeing a use case for that?

Copy link
Contributor Author
@TrishGillett TrishGillett Sep 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question of whether to mask invalid weights was raised on #6483 but not settled. I don't know of a use case offhand and am okay with throwing an error instead if the feeling is that this is overbuilding. @efiring Do you think there could be a use case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using histograms with weights is reasonable way to do spatial integration of images, ex histogram(convert_pixel_locations_to_1D, bins, weights=image.ravel()). If you are looping over a bunch of images, some of which have artifacts that you need to mask out, being able to do this loop with out having to re-mask the pixel position values would be nice.

That said, this is something that should be pushed down into numpy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that explains the mask on x, not on the weights. I am also fine with a mask on the weights as long as it is identical as the one on x. But masks different on the weights and on x seem like it may be an error on the users side, hence throwing a ValueError.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the other way around

xx = convent_to_1D(imgs.shape[:-1])
for im in imgs:
    m = compute_mask(im)
    w = ma.masked_array(im, m)
    ax.hist(m, bins, weight=w)

Basically, if we are going to allow masking on either, masking on both and taking the union seems like the best course.

Copy link
Member
@efiring efiring Sep 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree; combining independent masks on both via a union is not difficult, it is easy to explain, and it handles every situation that might arise.
See cbook.d E54D elete_masked_points; you could use this as soon as you have the list of sequences of values and the matching list of sequences of weights or of Nones. You would apply it to each matching sequence of values and its corresponding sequence of weights, or None. Maybe it's overkill, but it was designed for this sort of thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiring Thanks for the cbook tip again, between that and a realization that masks seem to stack, I believe I can rewrite this piece as:

if not input_empty:
    for i in range(len(w)):
        if w[i] is not None:
            x[i] = np.ma.masked_array(x[i], mask=w[i].mask)
            w[i] = np.ma.masked_array(w[i], mask=x[i].mask)
    x = cbook.delete_masked_points(x)    
    w = cbook.delete_masked_points(w)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite what I had in mind...

xclean, wclean = [], []
for xi, wi in zip(x, w):
    xiclean, wiclean = cbook.delete_masked_points(xi, wi)
    xclean.append(xiclean)
    wclean.append(wiclean)

This works regardless of whether wi is None or a sequence the length of xi. Using cbook.delete_masked_points is only worthwhile if you are giving it more than one argument. Deleting the masked points from a single 1-D masked array can be done simply by calling it's compressed() method, so that's all you need in your version above. You would call w[i] = w[i].compressed() inside the inner if, and x[i] = x[i].compressed() immediately after it. There are many variations on the theme; I'm not sure which is best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that's pretty nice, I didn't read the docs carefully enough to see the part about any points masked in one arg getting masked and deleted in the others. I'm not sure I understand what 'All input arguments that are not passed unchanged' means though...

w[i] = np.ma.masked_array(w[i], mask=mask_i).compressed()
x[i] = np.ma.masked_array(x[i], mask=mask_i).compressed()

if color is None:
color = [self._get_lines.get_next_color() for i in xrange(nx)]
else:
Expand Down
0