Handle nan/masked values Axes.vlines and hlines #7408

phobson · 2016-11-04T22:22:30Z

Closes #7406.

Also applied fixes to Axes.hlines and wrote tests for both.

dopplershift

This looks good to me. I'm not holding it up for that line.

dopplershift · 2016-11-04T22:50:43Z

lib/matplotlib/axes/_axes.py

@@ -979,14 +979,14 @@ def hlines(self, y, xmin, xmax, colors='k', linestyles='solid',

        verts = [((thisxmin, thisy), (thisxmax, thisy))
                 for thisxmin, thisxmax, thisy in zip(xmin, xmax, y)]
-        coll = mcoll.LineCollection(verts, colors=colors,
+        lines = mcoll.LineCollection(verts, colors=colors,
                                    linestyles=linestyles, label=label)


Shouldn't this line need to indent to keep things lined up (pep8 and all that...)

nice catch!

NelleV

Thanks @phobson !

efiring · 2016-11-04T23:38:39Z

This supports nans in the input, but not masked arrays; in most cases we try to support both. If you want to handle masked arrays, you could run the arguments through a function analogous to cbook.safe_masked_invalid, but going in the opposite direction by using np.ma.filled(x, np.nan) if x is floating point.

Overall, we have a somewhat confusing mixture of handling missing points via nans (which only works for floating point, and which requires nanmax etc.) and via masked arrays, for which methods work mostly as they do on plain ndarrays.

efiring · 2016-11-04T23:42:32Z

Second thought: for these functions, would it make more sense to simply filter out the bad values at the start? I don't see any point in pushing the handling of bad values down into the min and max calculations. Admittedly, I haven't looked closely.

phobson · 2016-11-05T00:00:38Z

@efiring Filtering was my thought when I first went into this. Let me mess around with feeding masked arrays into this function before we merge.

In either case, handling both masked arrays and sequences with invalid (np.nan) values in a simple way will take some thinking.

phobson · 2016-11-05T00:23:37Z

With the current implementation in this PR, masked arrays does not work as expected.

For example

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

fig3, ax5 = plt.subplots()
x5 = np.ma.masked_equal([2, 4, 6, 8, 10, 12], 8)
ymin5 = np.ma.masked_equal([0, 1, -1, 0, 2, 1], 2)
ymax5 = np.ma.masked_equal([13, 14, 15, 16, 17, 18], 18)
ax5.vlines(x5, ymin5, ymax5, colors='k', linewidth=2)

Produces a plot with 5 lines. Only the line where x5 == 8 is omitted.

I'll get back to this later this weekend.

efiring · 2016-11-05T00:24:13Z

Filtering a single array is easy: x = np.ma.compressed(cbook.safe_masked_invalid(x)).
For vlines, however, one to three arrays have to be filtered together. This is accomplished by cbook.delete_masked_points. Right after the unit conversion, insert

    x, ymin, ymax = cbook.delete_masked_points(x, ymin, ymax)

and I think that will take care of everything.

phobson · 2016-11-05T00:37:57Z

@efiring Great. Thanks for the tip. I'll include that and the example from my previous comment as a test.

Closes GH matplotlib#7406. Also applied fixes to Axes.hlines and wrote tests for both.

complete reworked how nan/masked values are handled. need fresh review.

dopplershift

Looks good to me, but would like a clarification on the question below.

dopplershift · 2016-11-07T17:41:04Z

lib/matplotlib/axes/_axes.py

-            minx = min(xmin.min(), xmax.min())
-            maxx = max(xmin.max(), xmax.max())
+            minx = np.min([xmin.min(), xmax.min()])
+            maxx = np.max([xmin.max(), xmax.max()])


What's the rationale behind the change from using the builtin min and max to numpy's? If we're talking micro-performance-optimization, the builtins will be faster.

Originally, vlines used the builtin min and max, while hlines used a mix of builtin and numpy. I was going for consistency.

To my eyes, the numpy syntax is more familiar and it feels "safer" (not sure if that's true). I wasn't considering performance with those changes.

My understanding is also that it's faster:

x = numpy.ranndom.normal(size=3700)
%timeit x.max()

The slowest run took 53.27 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.88 µs per loop
where (2.9 * 55 ~ 160 µs)

%timeit max(x)

1000 loops, best of 3: 231 µs per loop

I'm referring specifically to this change:

minx = min(xmin.min(), xmax.min())

to

minx = np.min([xmin.min(), xmax.min()])

and similar for max()--not the change from min(ymin) to ymin.min(), etc. This latter change I agree with. For the former, I dislike using numpy's min just to find the minimum of two scalars. You end up creating an extra list, which then descends into numpy and creates an array. On my system, taking the minimum of two scalars takes 10us with numpy, 4us with the builtin. There's also more for my eye to scan across with how you've changed it.

But I don't want to bikeshed this to death (6us is not going to have any impact on matplotlib's slowness)--but I'd be interested if we as developers have consensus on what the convention is or should be.

I agree: builtin min is cleaner for comparing two scalars. As @anntzer has pointed out, however, there is a difference in the way NaN is handled. Numpy also has minimum, which can be used with two scalars and is faster than using np.min:

In [1]: import numpy as np In [2]: min(np.nan, 1.1) nan In [3]: min(1.1, np.nan) 1.1 In [7]: timeit np.minimum(2.2, 1.1) The slowest run took 16.35 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 1.13 µs per loop In [8]: timeit min(2.2, 1.1) The slowest run took 11.75 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 197 ns per loop In [9]: timeit np.min([2.2, 1.1]) The slowest run took 260.96 times longer than the fastest. This could mean that an intermediate result is being cached 100000 loops, best of 3: 9.37 µs per loop

Conclusion: if there is any possibility of nans, use np.minimum; otherwise, builtin min is fine.

I dislike using numpy's min just to find the minimum of two scalars.

Ahh that makes sense. I'll make the change. You've convinced me :)

@efiring since we're using e.g., cbook.delete_masked_points(x, ymin, ymax) I think the possibility for NaNs has been removed.

This is a little simpler since we're finding the min or max of two scalars (avoids an intermediate list + diving into numpy). Also synced up the decorator between the two functions.

dopplershift · 2016-11-07T20:04:28Z

Test failure is a weird linkchecker failure--seems unrelated.

QuLogic · 2016-11-07T21:03:36Z

Link checker failed because the documentation does not exist; the failure is here and appears to be related to this PR.

dopplershift · 2016-11-07T21:06:39Z

Wow, not sure how I missed that.

QuLogic · 2016-11-07T21:09:13Z

We should probably change Travis to not run link checker when the doc build fails.

phobson · 2016-11-08T05:16:58Z

That failure is really confusing to me because the tests pass and the docs build on my local branch.

Nonetheless, I think the latest commit (deleting masked/invalid points after we make sure the first arg is an iterable) should do the trick.

dopplershift · 2016-11-08T17:11:45Z

All green--I think we're good here.

dopplershift · 2016-11-08T20:33:21Z

If this is actually intended for 2.0, it will need to be cherry-picked.

[MRG] Handle nan/masked values Axes.vlines and hlines Conflicts: lib/matplotlib/axes/_axes.py We keep the v2.x version on 2 lines: use unpack_labeled_data, instead of _preprocess_data, which is only in master.

efiring · 2016-11-08T20:34:22Z

back-ported to v2.x as ad192f0

phobson · 2016-11-08T21:30:33Z

thanks for the guidance and merging everyone

phobson added this to the 2.0.1 (next bug fix release) milestone Nov 4, 2016

phobson added the status: needs review label Nov 4, 2016

dopplershift previously approved these changes Nov 4, 2016

View reviewed changes

NelleV changed the title ~~Use np.nanmin/max to set ax limits in Axes.vlines~~ [MRG+2] Use np.nanmin/max to set ax limits in Axes.vlines Nov 4, 2016

NelleV previously approved these changes Nov 4, 2016

View reviewed changes

phobson changed the title ~~[MRG+2] Use np.nanmin/max to set ax limits in Axes.vlines~~ [No Merge] Use np.nanmin/max to set ax limits in Axes.vlines Nov 5, 2016

phobson added status: needs revision and removed status: needs review labels Nov 5, 2016

Use np.nanmin/max to set ax limits in Axes.vlines

747c3cd

Closes GH matplotlib#7406. Also applied fixes to Axes.hlines and wrote tests for both.

phobson force-pushed the vlines-with-nan branch from 39b5b24 to 747c3cd Compare November 7, 2016 16:09

phobson changed the title ~~[No Merge] Use np.nanmin/max to set ax limits in Axes.vlines~~ [MRG] Use np.nanmin/max to set ax limits in Axes.vlines Nov 7, 2016

phobson added the status: needs review label Nov 7, 2016

phobson changed the title ~~[MRG] Use np.nanmin/max to set ax limits in Axes.vlines~~ [MRG] Handle nan/masked values Axes.vlines and hlines Nov 7, 2016

dopplershift requested changes Nov 7, 2016

View reviewed changes

Use builtin min/max in hlines/vlines

61c214e

This is a little simpler since we're finding the min or max of two scalars (avoids an intermediate list + diving into numpy). Also synced up the decorator between the two functions.

dopplershift approved these changes Nov 7, 2016

View reviewed changes

efiring removed the status: needs revision label Nov 7, 2016

remove masked/invalid points after forcing to iterable

1b3fb65

efiring merged commit d7b8992 into matplotlib:master Nov 8, 2016

QuLogic removed the status: needs review label Nov 8, 2016

phobson deleted the vlines-with-nan branch November 8, 2016 21:29

phobson changed the title ~~[MRG] Handle nan/masked values Axes.vlines and hlines~~ Handle nan/masked values Axes.vlines and hlines Nov 8, 2016

QuLogic modified the milestones: 2.0.1 (next bug fix release), 2.0 (style change major release) Dec 7, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Handle nan/masked values Axes.vlines and hlines #7408

Handle nan/masked values Axes.vlines and hlines #7408

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Handle nan/masked values Axes.vlines and hlines #7408

Handle nan/masked values Axes.vlines and hlines #7408

Uh oh!

Conversation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!