-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
improve sub-second datetime plotting and documentation #10076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
updated with new commits:
|
With these commits, output with year <= 20 now looks exactly like the "Expected outcome" in #10073. Higher years trigger a warning. The precision issues with modern years are clearly visible in the tick labels and are not hidden in any way. |
Whenever the numeric input is small enough to provide actual microsecond precision, we don't want to always round down, but instead to the nearest microsecond. The difference becomes visible when the AutoDateLocator picks the MicrosecondLocator, and the format string for the tick labels is configured to show microseconds. Support for both was added in Matplotlib 2.0.
_to_ordinalf is used by num2date to actually convert floating point numbers to datetime objects. The main purpose of this conversion is to prepare the datetimes from which tick labels are generated. Since datetimes have microsecond resolution, but the input (float) does not have microsecond precision for any modern date, a 10-microsecond "snapping distance" around full seconds was used to obtain timestamps suitable for tick labels with per-second and lower resolution. However, when higher resolution is needed (which is now possible after the introduction of Microsecond support in AutoDateLocator and -Formatter in Matplotlib 2.0), unconditional full-second distance snapping is not appropriate: It is unable to perform rounding to milliseconds, and in addition may lead to multiple identical tick labels at highest plotting time resolution. The solution implemented here is to introduce a microsecond precision parameter (`musec_prec`) to the appropriate functions and function calls, and perform rounding to the nearest multiple of the given number of microseconds. The default value of 20 offers the same effect as the previous "snapping distance" around full seconds. In addition, it can also be turned off (when set to 1) or be used to perform rounding to full milliseconds.
DateLocator: viewlim_to_dt and datalim_to_dt should always get the most accurate datetimes we can offer, so turn off the newly introduced interval rounding for them. AutoDateFormatter: We need to decide whether the tick labels should be rounded when formatting them or not. Rounding is most likely always desired, except when exact microseconds are needed. Inclusion of microsecond resolution in the strftime format string is used as an indication that the user really wants to see microseconds and they are relevant, so simply do not round them in that case.
The proposed workaround now works correctly, after the musec precision handling has been added. Therefore the warning is no longer needed.
The warning threshold of is based on the assumption that the user really wants full microsecond resolution in the plot, and expects the plot frame to come out nicely (no jagged corners). Besides being able to identify accurate tick labels/positions, it requires a small "accuracy margin" for the drawing operations. If we assume a desired accuracy of 50ns in the underlying numerical time data, we need to go back to at least the year 27. It cannot hurt to start warning about the issue slightly earlier, hence the treshold is set to 20.
more updates:
Unfortunately github does not support stacked pull requests yet. Otherwise this WIP would consist of three PRs, one on top of the previous:
Commits implementing (3) already exist, but that's probably too much for this single PR. If you prefer, (1) and (2) can also be split into separate pull requests. |
@nmartensen Thanks for working on this! However, I am not sure that this is the best approach here. Adding a kwarg to all of the time related functions is a bit disruptive. How hard would it be to up the precision of the internal storage type? |
It seems to me that time-handling with high precision is so fundamentally broken (e.g., it ignores leap seconds) with all calendar-related code, not to mention matplotlib's use of double precision days since an absurdly long-ago epoch, that trying to patch it up with a hack--a fake year--is just making things more confusing. Wouldn't it make more sense for the user to pick an origin (e.g., the beginning of an experiment) and label the axis with microseconds (or milliseconds, or floating-point seconds) from that origin? Maybe make custom locators and formatters? |
I think 2 is easiest, though I'm not sure if our plotting respects I think 1 is most platform independent? 1) would probably just require |
Using long double is not an option--essentially, there is no such thing. See https://docs.scipy.org/doc/numpy-dev/user/basics.types.html. Changing our datenum is a < 8000 em>big api break. If we add an alternative--like datetime64, in one or more of its variants--we would have to maintain full compatibility with existing datenums. This might be the way to go, since we should have native datetime64 support anyway. What we have is a bit silly, but it works for normal dates and times. Datetime has microseconds, but given that it doesn't handle leap seconds, it is a mistake to think it is handling time units with high accuracy. Furthermore, what is a sensible way of labeling ticks on a time axis covering extremely short intervals? I doubt that it involves year, month, day, hour, minute, second, and microseconds. I think the best course of action here is to start by documenting the present limitation, and then carefully think through strategies for improvement, starting with the need for our own datetime64 support (recognizing that datetime64 is flawed, but we are probably stuck with it.) |
On 2018/01/02 3:00 PM, Jody Klymak wrote:
@efiring <https://github.com/efiring> yeah I agree.
If thats the consensus, maybe it makes more sense to deprecate the
microsecond Locator so people don't expect it to work?
That makes sense to me.
Great! I completely forgot about it.
|
Users not knowing about implementation details (such as microsecondlocators) may still expect it to work, since plotting datetime.datetime in general is supported, and datetime.datetime has microseconds. What's wrong with documenting the limitations and making it work as good as possible? The note added to the documentation already contains advice to use some other time format. |
If it could be done without a lot of extra code outside the locator and formatter it’s probably fine. I think the current situation plus a better doc/info output would suffice. |
addition of a new kwarg to num2date and _from_ordinalf is too intrusive. Since plotting microsecond time ranges with current dates isn't really possible, we don't need to try, so we can simply always do the musec interval rounding. Insert a hard-coded threshold making microsecond plotting still work for users trying to use the 'early-year' workaround hack.
This reverts commit 63b9849. Latest code changes now lead to the original result images from the axes tests.
What is the use case for a microsecond locator? Under what circumstances does it really make sense to use datetime machinery and calendars for generating tick labels for extremely short intervals? What can a user do with such a locator that cannot be done as well or better with plain numbers, using non-date-specific locators? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, except for the few minor points below. Certainly having 20 micro-second resolution work nicely is better than having quasi random numbers spit out.
lib/matplotlib/dates.py
Outdated
@@ -1296,6 +1304,11 @@ def get_locator(self, dmin, dmax): | |||
locator = RRuleLocator(rrule, self.tz) | |||
else: | |||
locator = MicrosecondLocator(interval, tz=self.tz) | |||
if dmin.year > 20 and interval < 1000: | |||
warnings.warn('Plotting microsecond time intervals is not' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd do _log.warn
so knowledgeable users can turn this warning off.
lib/matplotlib/dates.py
Outdated
# Round down to the nearest microsecond. | ||
dt += datetime.timedelta(microseconds=int(remainder * MUSECONDS_PER_DAY)) | ||
# Round to the nearest microsecond. | ||
dt += datetime.timedelta( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be an else
of the if
below, couldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the code below would need to change. The above is needed to set the hours, minutes, and seconds; the stuff below changes only microseconds.
lib/matplotlib/dates.py
Outdated
@@ -96,11 +96,11 @@ | |||
<../gallery/ticks_and_spines/date_demo_rrule.html>`_. | |||
|
|||
* :class:`AutoDateLocator`: On autoscale, this class picks the best | |||
:class:`RRuleLocator` to set the view limits and the tick | |||
:class:`DateLocator` to set the view limits and the tick |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in general its more useful to point to the RRuleLocator
than the generic DateLocator
. If you want to be pedantic, you can refer to both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my first draft I had been referring to both, but then found it too complicated/pedantic for this description. The Yearlocator
is also no RRuleLocator
.
@efiring I think this PR improves the situation. It actually simplifies the microsecond handling, and it improves the documentation and adds a warning to the user that MPL/datetime is doing something imprecise. We can talk more about deprecation later, as better Datetime handling is still a todo item I have so it won't be forgotten. |
@nmartensen is this still a WIP? |
No, you are right. I don't have plans for further changes, unless you have more comments. Changed the title accordingly. |
Great - it needs a second approver before we can merge. @efiring appears somewhat against, so that may squash other support 😉 which means it might not get merged. This happens sometimes. However, your contribution is very much aprpeciated, and if it doesn't get included, please do take advantage of having the machinery set up to send more PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can agree with one of @jklymak's comments: the code addition is moderate, it is not affecting the API, and overall this is an improvement, particularly via the documentation.
Thanks @efiring and thanks a lot @nmartensen ! |
PR Summary
Users trying to plot extremely short (sub-millisecond) time intervals using datetime will run into issues, as shown in #10073. I'd like to make the limitations more explicit in the documentation, and investigate if the situation can be improved without making things much more complicated.
Feedback is welcome.
Main changes right now are:
The rounding change improves the situation when using the early-year workaround and should not have any effect anywhere else.
The remaining issue (the reason for the warning at the end of the note) is the 10-millisecond "snapping distance" around full seconds (dates.py lines 292 to 296). This is needed in order to produce correct tick labels for plots of longer time intervals, therefore it cannot simply be dropped.
Perhaps an optional
microsecond_precision
parameter (default: 20) could be added to_from_ordinalf()
andnum2date()
, with microseconds then being rounded to nearest integer multiple of this? The MicrosecondLocator would be the only one to use it, and can set it to 1. Would something like this be acceptable? Then the warning in the note could be dropped.PR Checklist