8000 fix markevery plot option with nans in data by rc · Pull Request #22519 · matplotlib/matplotlib · GitHub
[go: up one dir, main page]

Skip to content

fix markevery plot option with nans in data #22519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 2, 2022

Conversation

rc
Copy link
Contributor
@rc rc commented Feb 21, 2022

PR Summary

Hi matplotlib developers,

This PR addresses the following code failure (with matplotlib '3.6.0.dev1701+gb80bc91827', Python 3.8.10):

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(100, dtype=np.float64)
y = x**2
y[50:80] = np.nan

plt.plot(x, y, 'o', markevery=0.1)

plt.show()

where the error is

  File "/home/share/software/packages/matplotlib/lib/matplotlib/backends/backend_qt.py", line 458, in _draw_idle
    self.draw()
  File "/home/share/software/packages/matplotlib/lib/matplotlib/backends/backend_agg.py", line 426, in draw
    self.figure.draw(self.renderer)
  File "/home/share/software/packages/matplotlib/lib/matplotlib/artist.py", line 73, in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)
  File "/home/share/software/packages/matplotlib/lib/matplotlib/artist.py", line 50, in draw_wrapper
    return draw(artist, renderer)
  File "/home/share/software/packages/matplotlib/lib/matplotlib/figure.py", line 2860, in draw
    mimage._draw_list_compositing_images(
  File "/home/share/software/packages/matplotlib/lib/matplotlib/image.py", line 133, in _draw_list_compositing_images
    a.draw(renderer)
  File "/home/share/software/packages/matplotlib/lib/matplotlib/artist.py", line 50, in draw_wrapper
    return draw(artist, renderer)
  File "/home/share/software/packages/matplotlib/lib/matplotlib/axes/_base.py", line 3105, in draw
    mimage._draw_list_compositing_images(
  File "/home/share/software/packages/matplotlib/lib/matplotlib/image.py", line 133, in _draw_list_compositing_images
    a.draw(renderer)
  File "/home/share/software/packages/matplotlib/lib/matplotlib/artist.py", line 50, in draw_wrapper
    return draw(artist, renderer)
  File "/home/share/software/packages/matplotlib/lib/matplotlib/lines.py", line 819, in draw
    subsampled = _mark_every_path(
  File "/home/share/software/packages/matplotlib/lib/matplotlib/lines.py", line 171, in _mark_every_path
    marker_delta = np.arange(start * scale, delta[-1], step * scale)
ValueError: arange: cannot compute length

I have fixed the issue by replacing the non-finite values by means for each axis - not sure if it is right, but it is simple and works well for my data. Let me know if anything fancier is needed.

Best regards,
r.

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

Copy link
@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening your first PR into Matplotlib!

If you have not heard from us in a while, please feel free to ping @matplotlib/developers or anyone who has commented on the PR. Most of our reviewers are volunteers and sometimes things fall through the cracks.

You can also join us on gitter for real-time discussion.

For details on testing, writing docs, and our review process, please see the developer guide

We strive to be a welcoming and open project. Please follow our Code of Conduct.

@oscargus oscargus added status: needs workflow approval For PRs from new contributors, from which GitHub blocks workflows by default. topic: markers labels Feb 21, 2022
@oscargus
Copy link
Member

(I cannot try out the patch right now. So if you can attach a plot that would help at least me.)

Thanks for this! It is clearly an issue that should be fixed.

Isn't there a risk that markers end up where the value is NaN with this fix? (Probably clearer is '-o' is used, and maybe other nan-locations/markevery-values?) Not sure how one really should deal with it though.

@rc
Copy link
Contributor Author
rc commented Feb 21, 2022

I have tried a number of functions, such as

import numpy as np
import matplotlib.pyplot as plt

x = -50 + np.arange(100, dtype=np.float64)
y = 10000 + x**5
y[40:95] = np.nan
y[14:20] = np.nan

plt.plot(x, y, '-o', markevery=0.1)
plt.tight_layout()
plt.savefig('nans-marker.png', bbox_inches='tight')
plt.show()

and the results seemed OK (did not manage to get markers outside the line):
nans-marker

Compare with the full curve without the nans:
nans-marker

I guess the logic where to plot something is elsewhere, determined by the original data with the nans. But, of course, take it with the grain of salt - I do not really understand how matplotlib plotting works at this level :)

@oscargus
Copy link
Member

Thanks for the figures! I've manage to try out the patch a bit now and I agree with your conclusion that it doesn't look like there will markers on NaN. However, it is possible to provoke it to do some "strange" things, but as I mentioned earlier, it may in these cases not be obvious what one really would like to obtain. Still, based on the documentation: " markers will be spaced at approximately equal visual distances along the line; the distance along the line between markers is determined by multiplying the display-coordinate distance of the axes bounding-box diagonal by the value of every." or the demo: "Float arguments allow markers to be spaced at approximately equal distances along the line.", I would not expect it to become like:
image

or

image

Maybe an alternative approach would be to compute the "theoretical position" and then remove them if the closest actual data point is NaN?

(On the other hand, the gallery shows some irregular spacing behavior when zoomed in: https://matplotlib.org/stable/gallery/lines_bars_and_markers/markevery_demo.html )

@rc
Copy link
Contributor Author
rc commented Feb 22, 2022

What is the "theoretical position" when there are nans? I guess the correct approach would be to split the line to consecutive chunks of non-nans and apply the markevery there - but then what if one has data like y = [1, nan, 2, nan, 3, nan]... It seems tricky. BTW. I have tried:

x = -50 + np.arange(100, dtype=np.float64)
y = 10000 + x**5
y[::2] = np.nan

plt.plot(x, y, '-o', markevery=0.07)
plt.plot(x, y, 'x', markevery=(1, 2))

and got
nans-marker

The crosses show the non-nans... It does not look too bad for me, although there are many more markes than there would be in case of the uninterrupted curve.

Then I have tried to fill the nans with an average of the neighbouring non-nans:
nans-marker
-> it seems better, but the code is much more complicated (first try, so it might be possible to simplify):

            aux = np.empty_like(tpath.vertices)
            for ic, vals in enumerate(tpath.vertices.T):
                mask = np.isfinite(vals)
                ii = np.where(mask, np.arange(mask.shape[0]), -1)
                i0, i1 = np.nonzero(ii >= 0)[0][[0, -1]]
                ii[:i0] = i0
                ilo = np.maximum.accumulate(ii)
                ii[~mask] = mask.shape[0] - 1
                ii[i1:] = i1
                ihi = np.minimum.accumulate(ii[::-1])[::-1]
                aux[:, ic] = 0.5 * (vals[ilo] + vals[ihi])

            disp_coords = affine.transform(aux)

@rc
Copy link
Contributor Author
rc commented Feb 22, 2022

I have tried to polish it a bit, see the current version.

@oscargus
Copy link
Member

What is the "theoretical position" when there are nans?

You are right. Maybe one can assume some sort of maximum derivative based on the existing data? But it will probably be possible to find edge cases independent of how it is done.

To some extent the most important thing is that it doesn't give an error... (And the behavior I was considering is obtained for an integer markevery, sort of.)

y = [1, nan, 2, nan, 3, nan].

This is indeed a tricky one, especially since any type of line disappears.

I'm a bit too junior when it comes to approving stuff that actually defines a behavior, but I think the new version is probably as good as I can think of. Hopefully someone more experienced can chip in.

@rc
Copy link
Contributor Author
rc commented Feb 23, 2022

Thanks for the feedback. I have now tried adding a test so that the code coverage passes.

@rc rc force-pushed the fix-float-markevery-with-nans branch from 4557b26 to 08a0eec Compare February 23, 2022 15:37
@rc
Copy link
Contributor Author
rc commented Mar 6, 2022

Ping @matplotlib/developers - the tests pass, not sure if this counts as a new feature for the docs checklist - previously the code just failed.

@oscargus
Copy link
Member
oscargus commented Mar 6, 2022

Thanks and sorry for forgetting about this. I think the fact that it is fixed is great and I think that at the end of the day one can probably always find NaN-patterns that gives non-ideal behavior (especially since one cannot really tell what the ideal behavior is).

So for me it is a go.

@oscargus oscargus added this to the v3.6.0 milestone Mar 6, 2022
@rc
Copy link
Contributor Author
rc commented Mar 6, 2022

Thanks!

@@ -159,7 +159,24 @@ def _slice_or_none(in_v, slc):
"markevery is specified relative to the axes size, but "
"the line does not have a Axes as parent")
# calc cumulative distance along path (in display coords):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't seem to be in the right place to me.

Comment on lines 164 to 174
aux = np.empty_like(tpath.vertices)
for ic, vals in enumerate(tpath.vertices.T):
ok = ~mask[:, ic]
ii = np.where(ok, np.arange(ok.shape[0]), -1)
i0, i1 = np.nonzero(ok)[0][[0, -1]]
ii[:i0] = i0
ilo = np.maximum.accumulate(ii)
ii[mask[:, ic]] = ok.shape[0] - 1
ii[i1:] = i1
ihi = np.minimum.accumulate(ii[::-1])[::-1]
aux[:, ic] = 0.5 * (vals[ilo] + vals[ihi])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part really needs a comment for explanation; I'm not sure it's the simplest way to achieve what you want since I'm not sure what it's trying to do.

@rc
Copy link
Contributor Author
rc commented Mar 20, 2022

I have tried to explain the code.

@jklymak
Copy link
Member
jklymak commented Mar 28, 2022

Sorry for being dense, but can you explain this algorithm to me? Naively, I would just do markevery on x, and if a value of y is NaN, then it wouldn't get marked. If I recall correctly there is some visual version of this - so perhaps that is the problem here? Sorry, I need more context without spending a lot of time.

@rc
Copy link
Contributor Author
rc commented Mar 28, 2022

The patch does nothing to the marking algorithm (which I did not study), it just allows computing the "cumulative distance along path (in display coordinates)" when there are nans in the coordinates. Originally I tried just to remove the nans, but that did not work (data length-related errors). Hmmm. But it should be possible, let me retry.

@jklymak
Copy link
Member
jklymak commented Mar 28, 2022

.. OK, so it is the "in display co-ordinates" that is the problem, caused by passing a float to markevery. Thanks for the reminder. I guess filling in with nonsense y values and then choosing based on that is a good algorithm as any.

@rc
Copy link
Contributor Author
rc commented Mar 28, 2022

I have just pushed another implementation that just removes the nans. It gives slightly different results, subjectively sometimes not as "good", but the code is trivial and much easier to maintain. Let me know which version you prefer.

disp_coords = affine.transform(tpath.vertices)
fin = np.isfinite(verts).all(axis=1)
fverts = verts[fin]
disp_coords = affine.transform(fverts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems the easiest, and has the feature of only testing shown markers for visual distance. Not always what folks want, but at least doesn't error, and has the benefit of simplicity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you ok with this @rc or did you want to argue for a different algorithm? @oscargus this changed since your approval. Still OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still OK! I think we have similar views: no error thrown and some output that makes sense (although it is and always will be possible to find cases that gives "sub-optimal" results).

And nicer with a shorter/simpler solution as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is OK!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rc can you squash to one commit, and then I will merge... If you have not done a rebase before, make sure to make a backup branch so you can get back to this point. But basically

git fetch upstream
git rebase -i upstream/master

and choose squash for all your commits except the first one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- TST: new test_markevery_linear_scales_nans() test + baseline images
@rc rc force-pushed the fix-float-markevery-with-nans branch from 842ad60 to 99d9475 Compare March 30, 2022 13:39
@rc
Copy link
Contributor Author
rc commented Apr 28, 2022

Re-hi, just checking if the current state is OK.

@jklymak
Copy link
Member
jklymak commented Jun 2, 2022

Sorry to let this languish!

@jklymak jklymak merged commit 534fbb0 into matplotlib:main Jun 2, 2022
@rc
Copy link
Contributor Author
rc commented Jun 3, 2022

No problem, thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: needs workflow approval For PRs from new contributors, from which GitHub blocks workflows by default. topic: markers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0