DOC: add units to user/explain [ci doc] by jklymak · Pull Request #26969 · matplotlib/matplotlib
Merged
merged 3 commits into from
Oct 12, 2023
2 changes: 2 additions & 0 deletions galleries/examples/ticks/date_concise_formatter.py
@@ -1,4 +1,6 @@
"""
.. _date_concise_formatter:

================================================
Formatting date ticks using ConciseDateFormatter
================================================
2 changes: 2 additions & 0 deletions galleries/examples/ticks/date_formatters_locators.py
@@ -1,4 +1,6 @@
"""
.. _date_formatters_locators:

=================================
Date tick locators and formatters
=================================
2 changes: 2 additions & 0 deletions galleries/examples/units/basic_units.py
@@ -1,4 +1,6 @@
"""
.. _basic_units:

===========
Basic Units
===========
293 changes: 293 additions & 0 deletions galleries/users_explain/axes/axes_units.py
@@ -0,0 +1,293 @@
"""
.. _user_axes_units:
Member

I dunno that it's intuitive that this is in axes. I had a really hard time finding it in the TOC. I know that the implementation is on the axes, but using it feels more akin to the first box of more general functionality.

Member Author

The first box is the quick start guide. This support could/should be mentioned there (if it's not already), but nowhere near in this detail.

Member

but nowhere near in this detail

I honestly think this guide is trying to do too many things, which is why I'm harping on more signposting please.
I think the general units stuff should be a sentence or so here and fleshed out in the units tutorial that I think @ksunden is working on. On second thought, that's also where I'd put the info about converters and formatters and stuff. That leaves two or three examples of dates and categories for the quick start guide. Basically I think folks generally shouldn't need loads of detail for something we're in theory providing as automagic unless they're trying to make their own.

Member

But also, to my original point, I don't think folks would think to look for it in Axes, because this is a data thing and everything else is layout or labeling.

Member Author

I think the general units stuff should be a sentence or so here and fleshed out in the units tutorial that I think

There is a sentence at the top that says:

The method to add converters to Matplotlib is described in `matplotlib.units`.
Here we briefly overview the built-in date and string converters.

The final section is because people are quite likely to see more general units support in other libraries or even in our examples. I would like to keep it.

Member Author

I don't think folks would think to look for it in Axes cause this is a data thing and everything else is layout or labeling

I disagree that this is not about labelling - that is the major advantage of unit support, providing useful tick locators and formatters.

Member

that is the major advantage of unit support, providing useful tick locators and formatters.

Yes, but using unitized data hands over control of locators and formatters to the unit implementation. The user's interaction with the unit framework is in the data they plot, not in the ticks/labels (annotations) they set. It's analogous to colormapping, which also maps data to the unit (color) that the heatmap visualizations can work with.

Member Author

Yes, but using unitized data hands over control of locators and formatters to the unit implementation. The user's interaction with the unit framework is in the data they plot, not in the ticks/labels (annotations) they set.

After passing unitful data to the plotting routines, which the user doesn't control, the only way to interact with the units machinery is via tick locators, formatters, and axis limits. Indeed these are the things that cause confusion, and the main motivation for this section. E.g., a user uses mplfinance's converter, which ignores weekends, then tries to use our Locators and Formatters and gets nonsense results. I'd say a lot of people who use datetimes end up setting the Locator and customizing the Formatters.

Member

I'd say a lot of people who use datetimes end up setting the Locator and customizing the Formatter

But we have documentation to explain the datetime formatters/converters. They literally can't use any other locator/formatter if they're using categorical, or everything will break.

Member
@story645, Oct 3, 2023


Which actually, I would support this document focusing on datetime (+ probably consolidating or linking out to some of the other datetime guides) and the general discussion being in what's currently called the quick start guide (and I think should be transitioned into overview).


==========================
Plotting dates and strings
==========================

The most basic way to use Matplotlib plotting methods is to pass coordinates in
as numerical numpy arrays. For example, ``plot(x, y)`` will work if ``x`` and
``y`` are numpy arrays of floats (or integers). Plotting methods will also
work if `numpy.asarray` will convert ``x`` and ``y`` to an array of floating
point numbers; e.g. ``x`` could be a python list.

Matplotlib also has the ability to convert other data types if a "unit
converter" exists for the data type. Matplotlib has two built-in converters,
one for dates and the other for lists of strings. Other downstream libraries
have their own converters to handle their data types.
Comment on lines +15 to +17
Member

Technically there is also one for decimal.Decimal, though admittedly it is just calling float(val).

(This is also included in the printout at the bottom of the page)

Member Author

Yeah, that's such a weird edge case, I'd prefer not to mention it. I assume few people use this.
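
For reference, the `decimal.Decimal` converter discussed in this thread can be inspected directly. This is a minimal sketch, assuming a Matplotlib version that ships `matplotlib.units.DecimalConverter`; the sample value is illustrative and not taken from the PR:

```python
from decimal import Decimal

import matplotlib.units as munits

# Look up the converter registered for Decimal values.
conv = munits.registry.get_converter(Decimal("1.5"))
print(type(conv).__name__)

# For a single Decimal, the conversion is a plain cast to float.
as_float = munits.DecimalConverter.convert(Decimal("1.5"), None, None)
print(as_float)
```

The `unit` and `axis` arguments are passed as `None` here because the Decimal conversion does not appear to depend on them.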


The method to add converters to Matplotlib is described in `matplotlib.units`.
Here we briefly overview the built-in date and string converters.

Date conversion
===============
Member

can datetime be put near dates?

Member Author

I'm not clear what you mean here. The first sentence notes datetime and datetime64.

Member

line 217/219 -> batch all the datetime stuff together


If ``x`` and/or ``y`` are a list of `datetime` or an array of
`numpy.datetime64`, Matplotlib has a built-in converter that will convert the
datetime to a float, and add tick locators and formatters to the axis that are
appropriate for dates. See `matplotlib.dates`.

In the following example, the x-axis gains a converter that converts from
`numpy.datetime64` to float, a locator that puts ticks at the beginning of
the month, and a formatter that labels the ticks appropriately:
"""

import numpy as np

import matplotlib.dates as mdates
import matplotlib.units as munits

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5.4, 2), layout='constrained')
time = np.arange('1980-01-01', '1980-06-25', dtype='datetime64[D]')
x = np.arange(len(time))
ax.plot(time, x)

# %%
#
# Note that if we try to plot a float on the x-axis, it will be plotted in
# units of days since the "epoch" for the converter, in this case 1970-01-01
# (see :ref:`date-format`). So when we plot the value 0, the ticks start at
# 1970-01-01. (The locator also now chooses every two years for a tick instead
# of every month):

fig, ax = plt.subplots(figsize=(5.4, 2), layout='constrained')
time = np.arange('1980-01-01', '1980-06-25', dtype='datetime64[D]')
x = np.arange(len(time))
ax.plot(time, x)
# 0 gets labeled as 1970-01-01
ax.plot(0, 0, 'd')
ax.text(0, 0, ' Float x=0', rotation=45)

# %%
#
# We can customize the locator and the formatter; see :ref:`date-locators` and
# :ref:`date-formatters` for a complete list, and
# :ref:`date_formatters_locators` for examples of them in use. Here we locate
# by every second month, and format just with the month's 3-letter name using
# ``"%b"`` (see `~datetime.datetime.strftime` for format codes):

fig, ax = plt.subplots(figsize=(5.4, 2), layout='constrained')
time = np.arange('1980-01-01', '1980-06-25', dtype='datetime64[D]')
x = np.arange(len(time))
ax.plot(time, x)
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=np.arange(1, 13, 2)))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.set_xlabel('1980')
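
To see what the `"%b"` format produces without drawing a figure, the formatter can be called directly on a float tick location. This is a sketch under the assumption of the default epoch, the default `timezone` rcParam, and an English/C locale; the date chosen is illustrative:

```python
import numpy as np

import matplotlib.dates as mdates

fmt = mdates.DateFormatter('%b')

# Formatters receive the float tick location (days since the epoch).
tick = mdates.date2num(np.datetime64('1980-03-01'))
label = fmt(tick)
print(tick, label)
```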

# %%
#
# The default locator is the `~.dates.AutoDateLocator`, and the default
# Formatter `~.dates.AutoDateFormatter`. There are also "concise" formatter
# and locators that give a more compact labelling, and can be set via rcParams.
# Note how instead of the redundant "Jan" label at the start of the year,
# "1980" is used instead. See :ref:`date_concise_formatter` for more examples.

plt.rcParams['date.converter'] = 'concise'

fig, ax = plt.subplots(figsize=(5.4, 2), layout='constrained')
time = np.arange('1980-01-01', '1980-06-25', dtype='datetime64[D]')
x = np.arange(len(time))
ax.plot(time, x)

# %%
#
# We can set the limits on the axis either by passing the appropriate dates as
# limits, or by passing a floating-point value in the proper units of days
# since the epoch. If we need it, we can get this value from
# `~.dates.date2num`.

fig, axs = plt.subplots(2, 1, figsize=(5.4, 3), layout='constrained')
for ax in axs.flat:
    time = np.arange('1980-01-01', '1980-06-25', dtype='datetime64[D]')
    x = np.arange(len(time))
    ax.plot(time, x)

# set xlim using datetime64:
axs[0].set_xlim(np.datetime64('1980-02-01'), np.datetime64('1980-04-01'))

# set xlim using floats:
# Note can get from mdates.date2num(np.datetime64('1980-02-01'))
axs[1].set_xlim(3683, 3683+60)
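
The float limits used above can be checked with a short round trip. This sketch assumes the default epoch (`mdates.get_epoch()` reports the one in use); with a different `date.epoch` setting the numbers would differ:

```python
import numpy as np

import matplotlib.dates as mdates

# Report the epoch that float dates are measured from.
print(mdates.get_epoch())

# Days since the epoch for the left limit.
left = mdates.date2num(np.datetime64('1980-02-01'))
print(left)

# num2date goes the other way and returns a timezone-aware datetime.
right = mdates.num2date(left + 60)
print(right)
```

Under that assumption, the two `set_xlim` calls above describe the same window.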

# %%
#
# String conversion: categorical plots
# ====================================
#
# Sometimes we want to label categories on an axis rather than numbers.
# Matplotlib allows this using a "categorical" converter (see
# `~.matplotlib.category`).

data = {'apple': 10, 'orange': 15, 'lemon': 5, 'lime': 20}
names = list(data.keys())
values = list(data.values())

fig, axs = plt.subplots(1, 3, figsize=(7, 3), sharey=True, layout='constrained')
axs[0].bar(names, values)
axs[1].scatter(names, values)
axs[2].plot(names, values)
fig.suptitle('Categorical Plotting')

# %%
#
# Note that the "categories" are plotted in the order that they are first
# specified and that subsequent plotting in a different order will not affect
# the original order. Further, new additions will be added on the end (see
# "pear" below):

fig, ax = plt.subplots(figsize=(5, 3), layout='constrained')
ax.bar(names, values)

# plot in a different order:
ax.scatter(['lemon', 'apple'], [7, 12])

# add a new category, "pear", and put the other categories in a different order:
ax.plot(['pear', 'orange', 'apple', 'lemon'], [13, 10, 7, 12], color='C1')


# %%
#
# Note that when using ``plot`` like in the above, the order of the plotting is
# mapped onto the original order of the data, so the new line goes in the order
# specified.
#
# The category converter maps from categories to integers, starting at zero. So
# data can also be manually added to the axis using a float. Note that if a
# float is passed in that does not have a "category" associated with it, the
# data point can still be plotted, but a tick will not be created. In the
# following, we plot data at 4.0 and 2.5, but no tick is added there because
# those are not categories.

fig, ax = plt.subplots(figsize=(5, 3), layout='constrained')
ax.bar(names, values)
# arguments for styling the labels below:
args = {'rotation': 70, 'color': 'C1',
        'bbox': {'color': 'white', 'alpha': .7, 'boxstyle': 'round'}}


# 0 gets labeled as "apple"
ax.plot(0, 2, 'd', color='C1')
ax.text(0, 3, 'Float x=0', **args)

# 2 gets labeled as "lemon"
ax.plot(2, 2, 'd', color='C1')
ax.text(2, 3, 'Float x=2', **args)

# 4 doesn't get a label
ax.plot(4, 2, 'd', color='C1')
ax.text(4, 3, 'Float x=4', **args)

# 2.5 doesn't get a label
ax.plot(2.5, 2, 'd', color='C1')
ax.text(2.5, 3, 'Float x=2.5', **args)
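
The category-to-number mapping can also be inspected without reading it off the ticks. A minimal sketch, assuming the same `names` and `values` as above and using `Axis.convert_units` to ask the axis how it would place each string:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

names = ['apple', 'orange', 'lemon', 'lime']
values = [10, 15, 5, 20]

fig, ax = plt.subplots()
ax.bar(names, values)

# Ask the x-axis where these categories sit in float coordinates.
positions = ax.xaxis.convert_units(['apple', 'lemon', 'lime'])
print(positions)
```

Categories are numbered in the order they were first seen, which is why "lemon" lands at 2 regardless of the order it is queried in.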

# %%
#
# Setting the limits for a category axis can be done by specifying the
# categories, or by specifying floating point numbers:

fig, axs = plt.subplots(2, 1, figsize=(5, 5), layout='constrained')
ax = axs[0]
ax.bar(names, values)
ax.set_xlim('orange', 'lemon')
ax.set_xlabel('limits set with categories')
ax = axs[1]
ax.bar(names, values)
ax.set_xlim(0.5, 2.5)
ax.set_xlabel('limits set with floats')

# %%
#
# The category axes are helpful for some plot types, but can lead to confusion
# if data is read in as a list of strings, even if it is meant to be a list of
# floats or dates. This sometimes happens when reading comma-separated value
# (CSV) files. The categorical locator and formatter will put a tick at every
# string value and label each one as well:
Comment on lines +203 to +207
Member

This is already an FAQ entry, can it please only be in one place?

Suggested change
# The category axes are helpful for some plot types, but can lead to confusion
# if data is read in as a list of strings, even if it is meant to be a list of
# floats or dates. This sometimes happens when reading comma-separated value
# (CSV) files. The categorical locator and formatter will put a tick at every
# string value and label each one as well:

Member Author

It comes up naturally here as well. I don't see the harm in discussing it here in context.

Member

It means maintaining two sets of documentation, which introduces the maintenance cost of keeping them in sync. I think it's fine to have here, but either this should link to the troubleshooting entry, or (my preference honestly) the troubleshooting reference should be deleted and point folks here.

Member Author

I think we need both - this is a super-common problem. I don't think the duplication is at all a maintenance burden.

Member

I don't think the duplication is at all a maintenance burden.

It means that if there's a change to the unit framework that changes this behavior, there are 2 places this information needs to be updated. And there are 2 places to link folks when they ask "hey, what should I do here?", which is the doc equivalent of the API complaint we get all the time about having way too many ways to do the same thing. And we've been working at that on the API side & trying to be more explicit about when to reach for which function & same here - is there a reason that the FAQ needs to carry around a second version of this troubleshooting rather than linking here?

An added bonus of linking the FAQ to this document is that then the user is pointed to it & if they're having trouble with datetimes they can get more context here.

Member Author

I understood your point the first time. I simply disagree.

Member

Ok, I'll do a follow up PR where I delete the FAQ entry and link here.
https://en.wikipedia.org/wiki/Single_source_of_truth

Member

I agree with Jody, the small harm in discussing this in multiple places is outweighed by making any given section readable on its own.

Member
@story645, Oct 3, 2023

How would linking the FAQ entry back to this guide harm readability? The semi-consensus in #17920 was to transition the FAQ into more of an alternative TOC, and linking the FAQ entry back into the guide is in line with that consensus.


fig, ax = plt.subplots(figsize=(5.4, 2.5), layout='constrained')
x = [str(xx) for xx in np.arange(100)] # list of strings
ax.plot(x, np.arange(100))
ax.set_xlabel('x is list of strings')

# %%
#
# If this is not desired, then simply convert the data to floats before plotting:

fig, ax = plt.subplots(figsize=(5.4, 2.5), layout='constrained')
x = np.asarray(x, dtype='float') # array of float.
ax.plot(x, np.arange(100))
ax.set_xlabel('x is array of floats')
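
The CSV case mentioned above usually arises because the `csv` module returns every field as a string. A minimal sketch of converting before plotting; the in-memory CSV text and its column names are hypothetical stand-ins for a real file:

```python
import csv
import io

import numpy as np

# Hypothetical CSV content, standing in for a file on disk.
raw = io.StringIO("x,y\n0,1.5\n1,2.5\n2,4.0\n")
rows = list(csv.DictReader(raw))

# Every field arrives as a string; convert to floats before plotting
# so that the categorical converter is not triggered.
x = np.asarray([row['x'] for row in rows], dtype=float)
y = np.asarray([row['y'] for row in rows], dtype=float)
print(x.dtype, y.dtype)
```

For date columns, the analogous step would be converting to `datetime64` (for example with `np.asarray(..., dtype='datetime64[D]')`) so that the date converter is used instead.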

# %%
#
# Determine converter, formatter, and locator on an axis
# ======================================================
#
# Sometimes it is helpful to be able to debug what Matplotlib is using to
# convert the incoming data. We can do that by querying the ``converter``
# property on the axis. We can also query the formatters and locators using
# `~.axis.Axis.get_major_locator` and `~.axis.Axis.get_major_formatter`.
#
# Note that by default the converter is *None*.

fig, axs = plt.subplots(3, 1, figsize=(6.4, 7), layout='constrained')
x = np.arange(100)
ax = axs[0]
ax.plot(x, x)
label = f'Converter: {ax.xaxis.converter}\n '
label += f'Locator: {ax.xaxis.get_major_locator()}\n'
label += f'Formatter: {ax.xaxis.get_major_formatter()}\n'
ax.set_xlabel(label)

ax = axs[1]
time = np.arange('1980-01-01', '1980-06-25', dtype='datetime64[D]')
x = np.arange(len(time))
ax.plot(time, x)
label = f'Converter: {ax.xaxis.converter}\n '
label += f'Locator: {ax.xaxis.get_major_locator()}\n'
label += f'Formatter: {ax.xaxis.get_major_formatter()}\n'
ax.set_xlabel(label)

ax = axs[2]
data = {'apple': 10, 'orange': 15, 'lemon': 5, 'lime': 20}
names = list(data.keys())
values = list(data.values())
ax.plot(names, values)
label = f'Converter: {ax.xaxis.converter}\n '
label += f'Locator: {ax.xaxis.get_major_locator()}\n'
label += f'Formatter: {ax.xaxis.get_major_formatter()}\n'
ax.set_xlabel(label)

# %%
#
# More about "unit" support
# =========================
#
# The support for dates and categories is part of "units" support that is built
# into Matplotlib. This is described at `.matplotlib.units` and in the
# :ref:`basic_units` example.
#
# Unit support works by querying the type of data passed to the plotting
# function and dispatching to the first converter in a list that accepts that
# type of data. So below, if ``x`` has ``datetime`` objects in it, the
# converter will be ``_SwitchableDateConverter``; if it has strings in it,
# it will be sent to the ``StrCategoryConverter``.

for k, v in munits.registry.items():
    print(f"type: {k};\n converter: {type(v)}")
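
The dispatch described above can be sketched by asking the registry which converter it would pick for a few sample inputs. This assumes `munits.registry.get_converter` behaves as in current Matplotlib releases; the sample data are illustrative:

```python
import numpy as np

import matplotlib.category  # registers the string converter
import matplotlib.dates     # registers the date converters
import matplotlib.units as munits

dates = np.array(['1980-01-01', '1980-01-02'], dtype='datetime64[D]')

date_conv = munits.registry.get_converter(dates)
str_conv = munits.registry.get_converter(['apple', 'orange'])
# Plain floats need no conversion, so no converter is returned.
float_conv = munits.registry.get_converter(np.array([1.0, 2.0]))

print(type(date_conv).__name__, type(str_conv).__name__, float_conv)
```

The date converter's class name is private and may change between versions, so the sketch only prints it rather than relying on it.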

# %%
#
# There are a number of downstream libraries that provide their own converters
# with locators and formatters. Physical unit support is provided by
# `astropy <https://www.astropy.org>`_, `pint <https://pint.readthedocs.io>`_, and
# `unyt <https://unyt.readthedocs.io>`_, among others.
#
# High level libraries like `pandas <https://pandas.pydata.org>`_ and
# `nc-time-axis <https://nc-time-axis.readthedocs.io>`_ (and thus
# `xarray <https://docs.xarray.dev>`_) provide their own datetime support.
# This support can sometimes be incompatible with Matplotlib native datetime
# support, so care should be taken when using Matplotlib locators and
# formatters if these libraries are being used.
2 changes: 1 addition & 1 deletion galleries/users_explain/axes/index.rst
@@ -24,7 +24,6 @@ annotations like x- and y-labels, titles, and legends.
color='darkgrey')
fig.suptitle('plt.subplots()')


.. toctree::
:maxdepth: 2

@@ -43,6 +42,7 @@ annotations like x- and y-labels, titles, and legends.

axes_scales
axes_ticks
axes_units
Legends <legend_guide>
Subplot mosaic <mosaic>
