Improve HTML reprs by shoyer · Pull Request #10816 · pydata/xarray · GitHub

Conversation

shoyer
Member
@shoyer shoyer commented Oct 5, 2025

This PR adds a number of improvements and revisions to Xarray's HTML reprs, especially for DataTree:

  1. No line breaks in long headers like "Data variables" and "Inherited Coordinates"
  2. Add ~4px of extra padding at the end of HTML reprs, to make pages like Xarray's docs look a little better
  3. Remove 2px shift on headers when actively clicked on. (I think this was intentional, but it seems to result in weird layout glitches because the :active selector doesn't always go away when focus is moved elsewhere)
  4. Remove the collapsible "Groups" header from DataTree. Instead, each group is separately collapsible, and shows the total number of contained elements.
  5. Truncation for too many HTML elements is revised. I've added the options display_max_items and display_max_html_elements for controlling at what point the DataTree HTML repr collapses and truncates nodes, instead of doing this all based on display_max_children.

This needs a few more tests and release notes, but is ready for feedback! @jsignell @TomNicholas @benbovy

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

Code to generate HTML previews:

import xarray as xr
import numpy as np

# Set up coordinates
time = xr.DataArray(data=["2022-01", "2023-01"], dims="time")
stations = xr.DataArray(data=list("abcdef"), dims="station")
lon = [-100, -80, -60]
lat = [10, 20, 30]

# Set up fake data
wind_speed = xr.DataArray(np.ones((2, 6)) * 2, dims=("time", "station"))
pressure = xr.DataArray(np.ones((2, 6)) * 3, dims=("time", "station"))
air_temperature = xr.DataArray(np.ones((2, 6)) * 4, dims=("time", "station"))
dewpoint = xr.DataArray(np.ones((2, 6)) * 5, dims=("time", "station"))
infrared = xr.DataArray(np.ones((2, 3, 3)) * 6, dims=("time", "lon", "lat"))
true_color = xr.DataArray(np.ones((2, 3, 3)) * 7, dims=("time", "lon", "lat"))

dt2 = xr.DataTree.from_dict(
    {
        "/": xr.Dataset(
            coords={"time": time},
        ),
        "/weather": xr.Dataset(
            coords={"station": stations},
            data_vars={
                "wind_speed": wind_speed,
                "pressure": pressure,
            },
        ),
        "/weather/temperature": xr.Dataset(
            data_vars={
                "air_temperature": air_temperature,
                "dewpoint": dewpoint,
            },
        ),
        "/satellite": xr.Dataset(
            coords={"lat": lat, "lon": lon},
            data_vars={
                "infrared": infrared,
                "true_color": true_color,
            },
        ),
    },
)
dt2['/other'] = xr.Dataset({f'x{i}': 0 for i in range(500)})

number_of_files = 20
number_of_groups = 50
tree_dict = {}
for f in range(number_of_files):
    for g in range(number_of_groups):
        tree_dict[f"file_{f}/group_{g}"] = xr.Dataset({"g": f * g})
tree_too_many = xr.DataTree.from_dict(tree_dict)


print("<h1>DataTree root</h1>")
print(dt2._repr_html_())

print("<hr />")
print("<h1>Dataset</h1>")

print(dt2.weather.to_dataset()._repr_html_())

print("<hr />")

print("<h1>DataTree inherited</h1>")
print(dt2.weather._repr_html_())

print("<hr />")
print("<h1>DataTree too many nodes</h1>")
print(tree_too_many._repr_html_())

Revised (this PR)

Interactive preview

image

Baseline

Interactive preview

image

@jsignell
Contributor
jsignell commented Oct 9, 2025

Ok I took a look at this with this kind of evil DataTree from the truncation work:

import numpy as np
import xarray as xr

number_of_files = 700
number_of_groups = 5
number_of_variables = 10

datasets = {}
for f in range(number_of_files):
    for g in range(number_of_groups):
        # Create synthetic (deterministic) data
        time = np.linspace(0, 50 + f, 1 + 1000 * g)
        y = f * time + g

        # Create dataset:
        ds = xr.Dataset(
            data_vars={
                f"temperature_{g}{i}": ("time", y)
                for i in range(number_of_variables // number_of_groups)
            },
            coords={"time": ("time", time)},
        ).chunk()

        # Prepare for xr.DataTree:
        name = f"file_{f}/group_{g}"
        datasets[name] = ds

dt = xr.DataTree.from_dict(datasets)
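
For a sense of scale, the size of this tree can be worked out from the three constants alone. The arithmetic below is a small sketch that assumes `from_dict` creates one intermediate node per `file_{f}` path component, plus the root; it does not import xarray.

```python
number_of_files = 700
number_of_groups = 5
number_of_variables = 10

# One root, one node per file, and one node per (file, group) pair.
n_nodes = 1 + number_of_files + number_of_files * number_of_groups

# Each leaf dataset holds number_of_variables // number_of_groups data
# variables, plus a single "time" coordinate.
vars_per_leaf = number_of_variables // number_of_groups
n_data_vars = number_of_files * number_of_groups * vars_per_leaf

print(n_nodes)      # 4201
print(n_data_vars)  # 7000
```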

I really like the space changes and removing the collapsible "Groups" header and having each group be collapsible on its own.

I wasn't quite sure how to interpret the collapsed count for a group that just has one dataset in it. It seems like it is the n coords + n data_vars. Which seems odd. I think there shouldn't be a count on a group that just contains a single dataset.

The group level count when there are child groups should just be the number of groups.

image

I like the idea of having a display_max_html_elements and would be happy for it to be a lot lower than 300 by default, but truncation is still necessary for the case where there just are more than display_max_html_elements at the top level.

For instance you still get 700 top-level nodes in the repr when you do:

with xr.set_options(display_max_html_elements=5):
    display(dt)

I think in general it would be nice to be able to drill down into a particular node within the repr even if there are a bunch of items at a particular level.

@shoyer
Member Author
shoyer commented Oct 9, 2025

> I wasn't quite sure how to interpret the collapsed count for a group that just has one dataset in it. It seems like it is the n coords + n data_vars. Which seems odd. I think there shouldn't be a count on a group that just contains a single dataset.
>
> The group level count when there are child groups should just be the number of groups.

The strategy I was using is counting the number of hidden items (at any level), with the idea being that it should be obvious if a large amount of data is hidden. Otherwise you could have a collapsed group marked as "(1)" that hides hundreds of data variables, which felt wrong to me.
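
To make the counting strategy concrete, here is a minimal sketch, not the code in this PR. `Node` is a hypothetical stand-in for a DataTree node, and counting each child group as one item in addition to its contents is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a DataTree node (not xarray's class)."""

    n_coords: int = 0
    n_data_vars: int = 0
    children: dict[str, "Node"] = field(default_factory=dict)


def count_hidden_items(node: Node) -> int:
    """Count every item that collapsing `node` would hide, at any depth."""
    own = node.n_coords + node.n_data_vars
    # Assumption: each child group counts as one item, plus everything in it.
    nested = sum(1 + count_hidden_items(child) for child in node.children.values())
    return own + nested


leaf = Node(n_coords=1, n_data_vars=2)
parent = Node(children={"a": leaf, "b": Node(n_data_vars=500)})

print(count_hidden_items(leaf))    # 3
print(count_hidden_items(parent))  # 505
```

With this kind of count, the collapsed `parent` shows 505 instead of 2, so the 500 hidden variables are visible in the summary.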

> I like the idea of having a display_max_html_elements and would be happy for it to be a lot lower than 300 by default, but truncation is still necessary for the case where there just are more than display_max_html_elements at the top level.

Do you think this is common? I don't think we do this for the other Xarray HTML reprs. They get collapsed but nodes are not truncated at the top level.

> I think in general it would be nice to be able to drill down into a particular node within the repr even if there are a bunch of items at a particular level.

I am currently displaying DataTree elements in priority order, based on showing the top-most levels as completely as possible (breadth-first). We could start by going deep (depth-first), but this would mean that some high-level nodes could be truncated.

Maybe there's some compromise algorithm that could work better?
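
For reference, breadth-first prioritization under a fixed budget can be sketched as below. This is an illustration only, not the implementation in this PR: the nested-dict tree, the function name, and the `max_nodes` budget are all hypothetical, and the sketch applies the budget strictly at every level.

```python
from collections import deque


def select_nodes_breadth_first(tree: dict, max_nodes: int) -> list[str]:
    """Return the paths of nodes to display, shallowest levels first.

    `tree` is a nested dict mapping child name -> subtree, a hypothetical
    stand-in for a DataTree. `max_nodes` is an assumed display budget.
    """
    shown = []
    queue = deque([("/", tree)])
    while queue and len(shown) < max_nodes:
        path, subtree = queue.popleft()
        shown.append(path)
        # Children are queued behind all nodes of the current level.
        for name, child in subtree.items():
            queue.append((path.rstrip("/") + "/" + name, child))
    return shown


tree = {
    "file_0": {"group_0": {}, "group_1": {}},
    "file_1": {"group_0": {}, "group_1": {}},
}
print(select_nodes_breadth_first(tree, max_nodes=4))
# ['/', '/file_0', '/file_1', '/file_0/group_0']
```

Swapping `popleft()` for `pop()` turns this into a depth-first traversal, which shows the tradeoff: deep nodes get displayed, but some top-level siblings can fall outside the budget.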

@benbovy
Member
benbovy commented Oct 10, 2025

This looks great @shoyer!

I've always been confused by the collapsible "Groups" header so I think this is a nice improvement.

I haven't tried this branch yet. I only looked at the JSFiddle previews linked above, where I noticed some UI/UX regressions in the "new reprs" one (e.g., no pointer cursor for variable attrs / data icons), although the CSS code doesn't seem to be exactly the same as in this PR, so it may not be an issue.

Regarding truncation of large objects, I'm wondering whether we should use the same consistent set of rules across all objects, for both the text and HTML reprs. It is tricky to find a solution that accommodates everyone, so perhaps it would be best to come up with a very basic solution (e.g., show the first and last n items for groups at any level, for variables, etc.) and provide a more flexible solution (pagination, etc.) via a 3rd-party widget with client-server communication.
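
A "first and last n items" rule like the one suggested here could look roughly like this. It is a minimal sketch: the function name and the format of the placeholder entry are made up for illustration and are not xarray API.

```python
def first_and_last(items: list[str], n: int) -> list[str]:
    """Keep the first and last `n` items and mark the omitted middle."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(items) <= 2 * n:
        # Nothing to hide: the list already fits.
        return list(items)
    hidden = len(items) - 2 * n
    return items[:n] + [f"... ({hidden} more)"] + items[-n:]


groups = [f"file_{i}" for i in range(700)]
print(first_and_last(groups, 2))
# ['file_0', 'file_1', '... (696 more)', 'file_698', 'file_699']
```

The same helper could in principle be applied to child groups, coordinates, and data variables alike, which is what would make the rule consistent across objects.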

> The strategy I was using is counting the number of hidden items (at any level), with the idea being that it should be obvious if a large amount of data is hidden. Otherwise you could have a collapsed group marked as "(1)" that hides hundreds of data variables, which felt wrong to me.

I think we can be more specific for collapsed groups, e.g., show "(2 subgroups)" for groups with no variables, "(5 variables)" for leaf groups, "(3 subgroups, 10 variables)" for hybrid groups, etc.
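
As a rough illustration of the labels proposed above, a formatter might look like this. The function name is hypothetical, and the singular forms and the empty label for an empty group are assumptions not discussed in this thread.

```python
def collapsed_label(n_subgroups: int, n_variables: int) -> str:
    """Build a summary label for a collapsed group."""
    parts = []
    if n_subgroups:
        parts.append(f"{n_subgroups} subgroup" + ("s" if n_subgroups != 1 else ""))
    if n_variables:
        parts.append(f"{n_variables} variable" + ("s" if n_variables != 1 else ""))
    # Assumption: an empty group gets no label at all.
    return f"({', '.join(parts)})" if parts else ""


print(collapsed_label(2, 0))   # (2 subgroups)
print(collapsed_label(0, 5))   # (5 variables)
print(collapsed_label(3, 10))  # (3 subgroups, 10 variables)
```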
