8000 Handle nan/masked values Axes.vlines and hlines by phobson · Pull Request #7408 · matplotlib/matplotlib · GitHub
[go: up one dir, main page]

Skip to content

Handle nan/masked values Axes.vlines and hlines #7408

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 8, 2016
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 20 additions & 18 deletions lib/matplotlib/axes/_axes.py
Original file line number Diff line number Diff line change
Expand Up @@ -965,6 +965,8 @@ def hlines(self, y, xmin, xmax, colors='k', linestyles='solid',
xmin = self.convert_xunits(xmin)
xmax = self.convert_xunits(xmax)

y, xmin, xmax = cbook.delete_masked_points(y, xmin, xmax)

if not iterable(y):
y = [y]
if not iterable(xmin):
Expand All @@ -973,20 +975,19 @@ def hlines(self, y, xmin, xmax, colors='k', linestyles='solid',
xmax = [xmax]

y = np.ravel(y)

xmin = np.resize(xmin, y.shape)
xmax = np.resize(xmax, y.shape)

verts = [((thisxmin, thisy), (thisxmax, thisy))
for thisxmin, thisxmax, thisy in zip(xmin, xmax, y)]
coll = mcoll.LineCollection(verts, colors=colors,
linestyles=linestyles, label=label)
self.add_collection(coll, autolim=False)
coll.update(kwargs)
lines = mcoll.LineCollection(verts, colors=colors,
linestyles=linestyles, label=label)
self.add_collection(lines, autolim=False)
lines.update(kwargs)

if len(y) > 0:
minx = min(xmin.min(), xmax.min())
maxx = max(xmin.max(), xmax.max())
minx = np.min([xmin.min(), xmax.min()])
maxx = np.max([xmin.max(), xmax.max()])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale behind the change from using the builtin min and max to numpy's? If we're talking micro-performance-optimization, the builtins will be faster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, vlines used the builtin min and max, while hlines used a mix of builtin and numpy. I was going for consistency.

To my eyes, the numpy syntax is more familiar and it feels "safer" (not sure if that's true). I wasn't considering performance with those changes.

My understanding is also that it's faster:

x = numpy.ranndom.normal(size=3700)
%timeit x.max()

The slowest run took 53.27 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.88 µs per loop
where (2.9 * 55 ~ 160 µs)

%timeit max(x)

1000 loops, best of 3: 231 µs per loop

Copy link
Contributor
@dopplershift dopplershift Nov 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm referring specifically to this change:

minx = min(xmin.min(), xmax.min())

to

minx = np.min([xmin.min(), xmax.min()])

and similar for max()--not the change from min(ymin) to ymin.min(), etc. This latter change I agree with. For the former, I dislike using numpy's min just to find the minimum of two scalars. You end up creating an extra list, which then descends into numpy and creates an array. On my system, taking the minimum of two scalars takes 10us with numpy, 4us with the builtin. There's also more for my eye to scan across with how you've changed it.

But I don't want to bikeshed this to death (6us is not going to have any impact on matplotlib's slowness)--but I'd be interested if we as developers have consensus on what the convention is or should be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree: builtin min is cleaner for comparing two scalars. As @anntzer has pointed out, however, there is a difference in the way NaN is handled. Numpy also has minimum, which can be used with two scalars and is faster than using np.min:

In [1]: import numpy as np
In [2]: min(np.nan, 1.1)
nan
In [3]: min(1.1, np.nan)
1.1
In [7]: timeit np.minimum(2.2, 1.1)
The slowest run took 16.35 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.13 µs per loop

In [8]: timeit min(2.2, 1.1)
The slowest run took 11.75 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 197 ns per loop

In [9]: timeit np.min([2.2, 1.1])
The slowest run took 260.96 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 9.37 µs per loop

Conclusion: if there is any possibility of nans, use np.minimum; otherwise, builtin min is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike using numpy's min just to find the minimum of two scalars.

Ahh that makes sense. I'll make the change. You've convinced me :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiring since we're using e.g., cbook.delete_masked_points(x, ymin, ymax) I think the possibility for NaNs has been removed.

miny = y.min()
maxy = y.max()

Expand All @@ -995,7 +996,7 @@ def hlines(self, y, xmin, xmax, colors='k', linestyles='solid',
self.update_datalim(corners)
self.autoscale_view()

return coll
return lines

@_preprocess_data(replace_names=["x", "ymin", "ymax", "colors"],
label_namer="x")
Expand Down Expand Up @@ -1046,6 +1047,8 @@ def vlines(self, x, ymin, ymax, colors='k', linestyles='solid',
ymin = self.convert_yunits(ymin)
ymax = self.convert_yunits(ymax)

x, ymin, ymax = cbook.delete_masked_points(x, ymin, ymax)

if not iterable(x):
x = [x]
if not iterable(ymin):
Expand All @@ -1060,23 +1063,22 @@ def vlines(self, x, ymin, ymax, colors='k', linestyles='solid',
verts = [((thisx, thisymin), (thisx, thisymax))
for thisx, thisymin, thisymax in zip(x, ymin, ymax)]
#print 'creating line collection'
coll = mcoll.LineCollection(verts, colors=colors,
linestyles=linestyles, label=label)
self.add_collection(coll, autolim=False)
coll.update(kwargs)
lines = mcoll.LineCollection(verts, colors=colors,
linestyles=linestyles, label=label)
self.add_collection(lines, autolim=False)
lines.update(kwargs)

if len(x) > 0:
minx = min(x)
maxx = max(x)

miny = min(min(ymin), min(ymax))
maxy = max(max(ymin), max(ymax))
minx = x.min()
maxx = x.max()
miny = np.min([ymin.min(), ymax.min()])
maxy = np.max([ymin.max(), ymax.max()])

corners = (minx, miny), (maxx, maxy)
self.update_datalim(corners)
8000 self.autoscale_view()

return coll
return lines

@_preprocess_data(replace_names=["positions", "lineoffsets",
"linelengths", "linewidths",
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
84 changes: 84 additions & 0 deletions lib/matplotlib/tests/test_axes.py
992E
Original file line number Diff line number Diff line change
Expand Up @@ -2812,6 +2812,90 @@ def test_eb_line_zorder():
ax.set_title("errorbar zorder test")


@image_comparison(
baseline_images=['vlines_basic', 'vlines_with_nan', 'vlines_masked'],
extensions=['png']
)
def test_vlines():
# normal
x1 = [2, 3, 4, 5, 7]
y1 = [2, -6, 3, 8, 2]
fig1, ax1 = plt.subplots()
ax1.vlines(x1, 0, y1, colors='g', linewidth=5)

# GH #7406
x2 = [2, 3, 4, 5, 6, 7]
y2 = [2, -6, 3, 8, np.nan, 2]
fig2, (ax2, ax3, ax4) = plt.subplots(nrows=3, figsize=(4, 8))
ax2.vlines(x2, 0, y2, colors='g', linewidth=5)

x3 = [2, 3, 4, 5, 6, 7]
y3 = [np.nan, 2, -6, 3, 8, 2]
ax3.vlines(x3, 0, y3, colors='r', linewidth=3, linestyle='--')

x4 = [2, 3, 4, 5, 6, 7]
y4 = [np.nan, 2, -6, 3, 8, np.nan]
ax4.vlines(x4, 0, y4, colors='k', linewidth=2)

# tweak the x-axis so we can see the lines better
for ax in [ax1, ax2, ax3, ax4]:
ax.set_xlim(0, 10)

# check that the y-lims are all automatically the same
assert ax1.get_ylim() == ax2.get_ylim()
assert ax1.get_ylim() == ax3.get_ylim()
assert ax1.get_ylim() == ax4.get_ylim()

fig3, ax5 = plt.subplots()
x5 = np.ma.masked_equal([2, 4, 6, 8, 10, 12], 8)
ymin5 = np.ma.masked_equal([0, 1, -1, 0, 2, 1], 2)
ymax5 = np.ma.masked_equal([13, 14, 15, 16, 17, 18], 18)
ax5.vlines(x5, ymin5, ymax5, colors='k', linewidth=2)
ax5.set_xlim(0, 15)


@image_comparison(
baseline_images=['hlines_basic', 'hlines_with_nan', 'hlines_masked'],
extensions=['png']
)
def test_hlines():
# normal
y1 = [2, 3, 4, 5, 7]
x1 = [2, -6, 3, 8, 2]
fig1, ax1 = plt.subplots()
ax1.hlines(y1, 0, x1, colors='g', linewidth=5)

# GH #7406
y2 = [2, 3, 4, 5, 6, 7]
x2 = [2, -6, 3, 8, np.nan, 2]
fig2, (ax2, ax3, ax4) = plt.subplots(nrows=3, figsize=(4, 8))
ax2.hlines(y2, 0, x2, colors='g', linewidth=5)

y3 = [2, 3, 4, 5, 6, 7]
x3 = [np.nan, 2, -6, 3, 8, 2]
ax3.hlines(y3, 0, x3, colors='r', linewidth=3, linestyle='--')

y4 = [2, 3, 4, 5, 6, 7]
x4 = [np.nan, 2, -6, 3, 8, np.nan]
ax4.hlines(y4, 0, x4, colors='k', linewidth=2)

# tweak the y-axis so we can see the lines better
for ax in [ax1, ax2, ax3, ax4]:
ax.set_ylim(0, 10)

# check that the x-lims are all automatically the same
assert ax1.get_xlim() == ax2.get_xlim()
assert ax1.get_xlim() == ax3.get_xlim()
assert ax1.get_xlim() == ax4.get_xlim()

fig3, ax5 = plt.subplots()
y5 = np.ma.masked_equal([2, 4, 6, 8, 10, 12], 8)
xmin5 = np.ma.masked_equal([0, 1, -1, 0, 2, 1], 2)
xmax5 = np.ma.masked_equal([13, 14, 15, 16, 17, 18], 18)
ax5.hlines(y5, xmin5, xmax5, colors='k', linewidth=2)
ax5.set_ylim(0, 15)


@image_comparison(baseline_images=['step_linestyle', 'step_linestyle'],
remove_text=True)
def test_step_linestyle():
Expand Down
0