improve sub-second datetime plotting and documentation #10076

nmartensen · 2017-12-22T18:10:52Z

PR Summary

Users trying to plot extremely short (sub-millisecond) time intervals using datetime will run into issues, as shown in #10073. I'd like to make the limitations more explicit in the documentation, and investigate if the situation can be improved without making things much more complicated.

Feedback is welcome.

Main changes right now are:

the addition of a note to the MicrosecondLocator docstring, and
a small change to microsecond rounding.

The rounding change improves the situation when using the early-year workaround and should not have any effect anywhere else.

The remaining issue (the reason for the warning at the end of the note) is the 10-millisecond "snapping distance" around full seconds (dates.py lines 292 to 296). This is needed in order to produce correct tick labels for plots of longer time intervals, therefore it cannot simply be dropped.

Perhaps an optional microsecond_precision parameter (default: 20) could be added to _from_ordinalf() and num2date(), with microseconds then being rounded to nearest integer multiple of this? The MicrosecondLocator would be the only one to use it, and can set it to 1. Would something like this be acceptable? Then the warning in the note could be dropped.

PR Checklist

Has Pytest style unit tests
Code is PEP 8 compliant
New features are documented, with examples if plot related
Documentation is sphinx and numpydoc compliant
Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

nmartensen · 2017-12-23T18:58:55Z

updated with new commits:

replace the original full-second snapping with musec-interval rounding
DateFormatter: use full musec precision tick labeling when using '%f' in strftime format string, but otherwise always round
add a warning when detecting years where the precision might not be sufficient.

nmartensen · 2017-12-23T19:11:35Z

With these commits, output with year <= 20 now looks exactly like the "Expected outcome" in #10073. Higher years trigger a warning. The precision issues with modern years are clearly visible in the tick labels and are not hidden in any way.

Whenever the numeric input is small enough to provide actual microsecond precision, we don't want to always round down, but instead to the nearest microsecond. The difference becomes visible when the AutoDateLocator picks the MicrosecondLocator, and the format string for the tick labels is configured to show microseconds. Support for both was added in Matplotlib 2.0.

_to_ordinalf is used by num2date to actually convert floating point numbers to datetime objects. The main purpose of this conversion is to prepare the datetimes from which tick labels are generated. Since datetimes have microsecond resolution, but the input (float) does not have microsecond precision for any modern date, a 10-microsecond "snapping distance" around full seconds was used to obtain timestamps suitable for tick labels with per-second and lower resolution. However, when higher resolution is needed (which is now possible after the introduction of Microsecond support in AutoDateLocator and -Formatter in Matplotlib 2.0), unconditional full-second distance snapping is not appropriate: It is unable to perform rounding to milliseconds, and in addition may lead to multiple identical tick labels at highest plotting time resolution. The solution implemented here is to introduce a microsecond precision parameter (`musec_prec`) to the appropriate functions and function calls, and perform rounding to the nearest multiple of the given number of microseconds. The default value of 20 offers the same effect as the previous "snapping distance" around full seconds. In addition, it can also be turned off (when set to 1) or be used to perform rounding to full milliseconds.

DateLocator: viewlim_to_dt and datalim_to_dt should always get the most accurate datetimes we can offer, so turn off the newly introduced interval rounding for them. AutoDateFormatter: We need to decide whether the tick labels should be rounded when formatting them or not. Rounding is most likely always desired, except when exact microseconds are needed. Inclusion of microsecond resolution in the strftime format string is used as an indication that the user really wants to see microseconds and they are relevant, so simply do not round them in that case.

The proposed workaround now works correctly, after the musec precision handling has been added. Therefore the warning is no longer needed.

The warning threshold of is based on the assumption that the user really wants full microsecond resolution in the pl 8000 ot, and expects the plot frame to come out nicely (no jagged corners). Besides being able to identify accurate tick labels/positions, it requires a small "accuracy margin" for the drawing operations. If we assume a desired accuracy of 50ns in the underlying numerical time data, we need to go back to at least the year 27. It cannot hurt to start warning about the issue slightly earlier, hence the treshold is set to 20.

nmartensen · 2017-12-27T19:58:20Z

more updates:

commit messages provide more background information
code changes are PEP 8 compliant
updated test results (test_axes.py::test_date_timezone_*, test_dates.py::test_DateFormatter, test_auto_date_locator*)

Unfortunately github does not support stacked pull requests yet. Otherwise this WIP would consist of three PRs, one on top of the previous:

documentation changes (first three commits)
changes to microsecond rounding (remaining commits)
improved millisecond tick labeling without microseconds (not included yet)

Commits implementing (3) already exist, but that's probably too much for this single PR. If you prefer, (1) and (2) can also be split into separate pull requests.

tacaswell · 2018-01-02T05:27:41Z

@nmartensen Thanks for working on this!

However, I am not sure that this is the best approach here. Adding a kwarg to all of the time related functions is a bit disruptive.

How hard would it be to up the precision of the internal storage type?

efiring · 2018-01-02T06:58:56Z

It seems to me that time-handling with high precision is so fundamentally broken (e.g., it ignores leap seconds) with all calendar-related code, not to mention matplotlib's use of double precision days since an absurdly long-ago epoch, that trying to patch it up with a hack--a fake year--is just making things more confusing. Wouldn't it make more sense for the user to pick an origin (e.g., the beginning of an experiment) and label the axis with microseconds (or milliseconds, or floating-point seconds) from that origin? Maybe make custom locators and formatters?

jklymak · 2018-01-02T23:01:40Z

datetime supports microseconds, and we purport to plot datetimes. So I guess we should be able to handle microsecond input.

We could stop translating to epoch and let the unit converter carry around a time=0 internal variable (probably easiest if its a day=0 datetime).
We could make datenums be np.longdouble.

I think 2 is easiest, though I'm not sure if our plotting respects np.longdouble? which you'd want to get the actual position on the axis correct.

I think 1 is most platform independent? 1) would probably just require date2num and num2date to have optional day=0 kwargs that allow an offset in calculating the floating-point value for drawing.

efiring · 2018-01-03T00:54:54Z

Using long double is not an option--essentially, there is no such thing. See https://docs.scipy.org/doc/numpy-dev/user/basics.types.html.

Changing our datenum is a big api break. If we add an alternative--like datetime64, in one or more of its variants--we would have to maintain full compatibility with existing datenums. This might be the way to go, since we should have native datetime64 support anyway.

What we have is a bit silly, but it works for normal dates and times. Datetime has microseconds, but given that it doesn't handle leap seconds, it is a mistake to think it is handling time units with high accuracy. Furthermore, what is a sensible way of labeling ticks on a time axis covering extremely short intervals? I doubt that it involves year, month, day, hour, minute, second, and microseconds.

I think the best course of action here is to start by documenting the present limitation, and then carefully think through strategies for improvement, starting with the need for our own datetime64 support (recognizing that datetime64 is flawed, but we are probably stuck with it.)

jklymak · 2018-01-03T01:00:28Z

@efiring yeah I agree.

If thats the consensus, maybe it makes more sense to deprecate the microsecond Locator so people don't expect it to work?

WRT datetime64, we already have support: #9779

efiring · 2018-01-03T01:17:44Z

On 2018/01/02 3:00 PM, Jody Klymak wrote: @efiring <https://github.com/efiring> yeah I agree. If thats the consensus, maybe it makes more sense to deprecate the microsecond Locator so people don't expect it to work?

That makes sense to me.

WRT |datetime64|, we already have support: #9779 <#9779>

Great! I completely forgot about it.

nmartensen · 2018-01-07T16:15:10Z

If thats the consensus, maybe it makes more sense to deprecate the microsecond Locator so people don't expect it to work?

Users not knowing about implementation details (such as microsecondlocators) may still expect it to work, since plotting datetime.datetime in general is supported, and datetime.datetime has microseconds.

What's wrong with documenting the limitations and making it work as good as possible? The note added to the documentation already contains advice to use some other time format.

jklymak · 2018-01-07T16:26:22Z

If it could be done without a lot of extra code outside the locator and formatter it’s probably fine.

I think the current situation plus a better doc/info output would suffice.

addition of a new kwarg to num2date and _from_ordinalf is too intrusive. Since plotting microsecond time ranges with current dates isn't really possible, we don't need to try, so we can simply always do the musec interval rounding. Insert a hard-coded threshold making microsecond plotting still work for users trying to use the 'early-year' workaround hack.

This reverts commit 63b9849. Latest code changes now lead to the original result images from the axes tests.

efiring · 2018-01-07T18:04:07Z

What is the use case for a microsecond locator? Under what circumstances does it really make sense to use datetime machinery and calendars for generating tick labels for extremely short intervals? What can a user do with such a locator that cannot be done as well or better with plain numbers, using non-date-specific locators?
Sometimes things get into mpl before we realize they shouldn't; I think the microsecond locator is in that category. Matplotlib suffers from bloat. When we have the opportunity to trim it out, we should do so.
I'm open to being convinced otherwise if there is a compelling use case.

jklymak

This looks good to me, except for the few minor points below. Certainly having 20 micro-second resolution work nicely is better than having quasi random numbers spit out.

jklymak · 2018-01-07T18:03:28Z

lib/matplotlib/dates.py

@@ -1296,6 +1304,11 @@ def get_locator(self, dmin, dmax):
            locator = RRuleLocator(rrule, self.tz)
        else:
            locator = MicrosecondLocator(interval, tz=self.tz)
+            if dmin.year > 20 and interval < 1000:
+                warnings.warn('Plotting microsecond time intervals is not'


I'd do _log.warn so knowledgeable users can turn this warning off.

jklymak · 2018-01-07T18:07:03Z

lib/matplotlib/dates.py

-    # Round down to the nearest microsecond.
-    dt += datetime.timedelta(microseconds=int(remainder * MUSECONDS_PER_DAY))
+    # Round to the nearest microsecond.
+    dt += datetime.timedelta(


This could be an else of the if below, couldn't it?

Then the code below would need to change. The above is needed to set the hours, minutes, and seconds; the stuff below changes only microseconds.

jklymak · 2018-01-07T18:08:16Z

lib/matplotlib/dates.py

@@ -96,11 +96,11 @@
      <../gallery/ticks_and_spines/date_demo_rrule.html>`_.

    * :class:`AutoDateLocator`: On autoscale, this class picks the best
-      :class:`RRuleLocator` to set the view limits and the tick
+      :class:`DateLocator` to set the view limits and the tick


I think in general its more useful to point to the RRuleLocator than the generic DateLocator. If you want to be pedantic, you can refer to both.

In my first draft I had been referring to both, but then found it too complicated/pedantic for this description. The Yearlocator is also no RRuleLocator.

jklymak · 2018-01-09T00:58:34Z

@efiring I think this PR improves the situation. It actually simplifies the microsecond handling, and it improves the documentation and adds a warning to the user that MPL/datetime is doing something imprecise. We can talk more about deprecation later, as better Datetime handling is still a todo item I have so it won't be forgotten.

jklymak · 2018-01-19T17:41:47Z

@nmartensen is this still a WIP?

nmartensen · 2018-01-19T17:51:19Z

No, you are right. I don't have plans for further changes, unless you have more comments. Changed the title accordingly.

jklymak · 2018-01-19T17:55:42Z

Great - it needs a second approver before we can merge. @efiring appears somewhat against, so that may squash other support 😉 which means it might not get merged. This happens sometimes. However, your contribution is very much aprpeciated, and if it doesn't get included, please do take advantage of having the machinery set up to send more PRs.

efiring

I can agree with one of @jklymak's comments: the code addition is moderate, it is not affecting the API, and overall this is an improvement, particularly via the documentation.

jklymak · 2018-01-19T18:41:23Z

Thanks @efiring and thanks a lot @nmartensen !

dstansby added topic: date handling Documentation labels Dec 23, 2017

nmartensen force-pushed the us_round branch from a549ab5 to a5cddc0 Compare December 23, 2017 18:45

nmartensen added 13 commits December 27, 2017 20:34

dates: fix guaranteed typo

140b766

doc: AutoDateLocator does not always pick RRuleLocator

9902349

doc: MicrosecondLocator: Add note on limitations and workaround

6107cf3

dates: remove "bad plot" warning from docstring note

bc92319

The proposed workaround now works correctly, after the musec precision handling has been added. Therefore the warning is no longer needed.

dates: adjust view/datalim for longer plots

59f7f52

dates: coerce to int

7a74530

test_dates: update test_auto_date_locator* expected results

3f3dfcf

test_axes: update expected result images

63b9849

test_dates: update DateFormatter expected result image

c419583

nmartensen force-pushed the us_round branch from a5cddc0 to c419583 Compare December 27, 2017 19:38

tacaswell added this to the v2.2 milestone Jan 2, 2018

dates/microsecondlocator: refine warning

d67333e

nmartensen added 4 commits January 7, 2018 18:03

doc: dates: update microsecondlocator note

3dd0800

Revert "test_axes: update expected result images"

6fb7c12

This reverts commit 63b9849. Latest code changes now lead to the original result images from the axes tests.

test_dates: update fractional seconds result image

e04464a

jklymak reviewed Jan 7, 2018

View reviewed changes

nmartensen added 2 commits January 8, 2018 18:16

dates: use _log.warn for sub-second warning

4a4c216

dates: address review comments

45ff942

nmartensen force-pushed the us_round branch from 1ab7935 to 45ff942 Compare January 8, 2018 18:02

jklymak approved these changes Jan 9, 2018

View reviewed changes

nmartensen changed the title ~~WIP: sub-second datetime plotting~~ improve sub-second datetime plotting and documentation Jan 19, 2018

efiring approved these changes Jan 19, 2018

View reviewed changes

jklymak merged commit ac2da0e into matplotlib:master Jan 19, 2018

nmartensen mentioned this pull request Jan 21, 2018

datetime and sub-second resolution plotting #10073

Closed

QuLogic modified the milestones: needs sorting, v2.2.0 Feb 12, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

improve sub-second datetime plotting and documentation #10076

improve sub-second datetime plotting and documentation #10076

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

improve sub-second datetime plotting and documentation #10076

improve sub-second datetime plotting and documentation #10076

Uh oh!

Conversation

Uh oh!

PR Summary

PR Checklist

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!