-
Notifications
You must be signed in to change notification settings - Fork 113
Update and reorganize apply_ufunc material #180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
40 commits
Select commit
Hold shift + click to select a range
a425211
Update and reorganize apply_ufunc material
dcherian 1291a5c
More complex output
dcherian 799c94a
Fix syntax
dcherian fccc15f
fix syntax?
dcherian 627fcce
Add more solutions
dcherian 20c4537
Remove nested code-cell
dcherian 3fe7152
Updates
dcherian 5f7bf8e
fix toc
dcherian a31526a
WIP
dcherian ad40947
small edits
dcherian 7884868
finish automatic vectorizing
dcherian 2e100c4
spellcheck
dcherian 62ddb3b
Expect exception raising
dcherian e7a82a0
fixes
dcherian d076f21
cleanup
dcherian d9a093d
small updates
dcherian 7b908be
Add numba vectorize notebook
dcherian 2befb26
fix
dcherian 748eab0
Small edits
dcherian f961e0e
Merge branch 'main' into apply-ufunc-reorg
dcherian 7540f2c
add numba logo
dcherian 87968ba
small text update
dcherian 1fe8d87
fix spelling
dcherian 68b9103
Tweaks to dask material
dcherian 2401c14
more tweaks
dcherian 67f973d
Review comments
dcherian 7f8170d
Migrate to exercise syntax.
dcherian 630e851
Merge branch 'main' into apply-ufunc-reorg
dcherian 4a55bcb
Add sphinx_exercise + review feedback
dcherian 4b92ebf
Break out core dimensions material
dcherian 0756860
fix
dcherian 742231b
fix link
dcherian 89a37c9
one more use of exercise syntax
dcherian 95e0717
More review comments.
dcherian 4cf9e9d
Significant dask updates.
dcherian e36fde5
last fix?
dcherian 6fefa59
more improvements.
dcherian 0c5fb88
lint
dcherian e2c77b0
suppress gufunc error formatting warning
dcherian 34e34c0
remove dask output spam
dcherian File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,359 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "6849dcdc-3484-4f41-8b23-51613d36812f", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"source": [ | ||
"(vectorize)=\n", | ||
"# Automatic Vectorization" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "afc56d28-6e55-4967-b27d-28e2cc539cc7", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"source": [ | ||
"Previously we looked at [applying functions](gentle-intro) on numpy arrays, and the concept of [core dimensions](core-dimensions).\n", | ||
"We learned that functions commonly support specifying \"core dimensions\" through the `axis` keyword\n", | ||
"argument. \n", | ||
"\n", | ||
"However many functions exist, that implicitly have core dimensions, but do not provide an `axis` keyword\n", | ||
"argument. Applying such functions to a nD array usually involves one or multiple loops over the other dimensions\n", | ||
"--- termed \"loop dimensions\" or \"broadcast dimensions\".\n", | ||
"\n", | ||
"\n", | ||
"A good example is numpy's 1D interpolate function `numpy.interp`:\n", | ||
"\n", | ||
"```\n", | ||
" Signature: np.interp(x, xp, fp, left=None, right=None, period=None)\n", | ||
" Docstring:\n", | ||
" One-dimensional linear interpolation.\n", | ||
"\n", | ||
" Returns the one-dimensional piecewise linear interpolant to a function\n", | ||
" with given discrete data points (`xp`, `fp`), evaluated at `x`.\n", | ||
"```\n", | ||
"\n", | ||
"This function expects 1D arrays as input, so there is one core dimension and we cannot easily apply \n", | ||
"it to a nD array since there is no `axis` keyword argument. \n", | ||
"\n", | ||
"\n", | ||
"Our goal here is to \n", | ||
"1. Understand the difference between core dimensions and loop dimensions\n", | ||
"1. Understand vectorization\n", | ||
"1. Learn how to apply such functions without loops using `apply_ufunc` by providing the `vectorize` keyword argument.\n", | ||
"\n", | ||
"## Core dimensions and looping\n", | ||
"\n", | ||
"Let's say we want to\n", | ||
"interpolate an array with two dimensions (`space`, `time`) over the `time` dimension, we might \n", | ||
"1. loop over the `space` dimension, \n", | ||
"1. subset the array to a 1D array at that `space` location, \n", | ||
"1. Interpolate the 1D arrays to the new `time` vector, and\n", | ||
"1. Assign that new interpolated 1D array to the appropriate location of a 2D output array\n", | ||
"\n", | ||
"In pseudo-code this might look like\n", | ||
"\n", | ||
"```python\n", | ||
"for index in range(size_of_space_axis):\n", | ||
" out[index, :] = np.interp(..., array[index, :], ...)\n", | ||
"```\n", | ||
"\n", | ||
"\n", | ||
"```{exercise}\n", | ||
":label: coreloopdims\n", | ||
"\n", | ||
"Consider the example problem of interpolating a 2D array with dimensions `space` and `time` along the `time` dimension.\n", | ||
"Which dimension is the core dimension, and which is the \"loop dimension\"?\n", | ||
"```\n", | ||
"```{solution} coreloopdims\n", | ||
":class: dropdown\n", | ||
"\n", | ||
"`time` is the core dimension, and `space` is the loop dimension.\n", | ||
"```\n", | ||
"\n", | ||
"## Vectorization\n", | ||
"\n", | ||
"The pattern of looping over any number of \"loop dimensions\" and applying a function along \"core dimensions\" \n", | ||
"is so common that numpy provides wrappers that automate these steps: \n", | ||
"1. [numpy.apply_along_axis](https://numpy.org/doc/stable/reference/generated/numpy.apply_along_axis.html)\n", | ||
"1. [numpy.apply_over_axes](https://numpy.org/doc/stable/reference/generated/numpy.apply_over_axes.html)\n", | ||
"1. [numpy.vectorize](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html)\n", | ||
"\n", | ||
"\n", | ||
"`apply_ufunc` provides an easy interface to `numpy.vectorize` through the keyword argument `vectorize`. Here we see how to use\n", | ||
"that to automatically apply `np.interp` along a single axis of a nD array\n", | ||
"\n", | ||
"## Load data\n", | ||
"\n", | ||
"First lets load an example dataset\n", | ||
"\n", | ||
"```{tip}\n", | ||
"We'll reduce the length of error messages using `%xmode minimal` See the [ipython documentation](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-xmode) for details.\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "76aa13b8-5ced-4468-a72e-6b0a29172d6d", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"%xmode minimal\n", | ||
"\n", | ||
"import xarray as xr\n", | ||
"import numpy as np\n", | ||
"\n", | ||
"xr.set_options(display_expand_data=False)\n", | ||
"\n", | ||
"air = (\n", | ||
" xr.tutorial.load_dataset(\"air_temperature\")\n", | ||
" .air.sortby(\"lat\") # np.interp needs coordinate in ascending order\n", | ||
" .isel(time=slice(4), lon=slice(3)) # choose a small subset for convenience\n", | ||
")\n", | ||
"air" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "81356724-6c1a-4d4a-9a32-bb906a9419b2", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"source": [ | ||
"## Review\n", | ||
"\n", | ||
"\n", | ||
"We'll work with the `apply_ufunc` call from the section on [handling dimensions that change size](complex-output-change-size). See the \"Handling Complex Output\" section for how to get here.\n", | ||
"\n", | ||
"This version only works with 1D vectors. We will expand that to work with inputs of any number of dimensions." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "cb286fa0-deba-4929-b18a-79af5acb0b5b", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"newlat = np.linspace(15, 75, 100)\n", | ||
"\n", | ||
"xr.apply_ufunc(\n", | ||
" np.interp, # first the function\n", | ||
" newlat,\n", | ||
" air.lat,\n", | ||
" air.isel(lon=0, time=0), # this version only works with 1D vectors\n", | ||
" input_core_dims=[[\"lat\"], [\"lat\"], [\"lat\"]],\n", | ||
" output_core_dims=[[\"lat\"]],\n", | ||
" exclude_dims={\"lat\"},\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e3382658-14e1-4842-a618-ce7a27948c31", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"source": [ | ||
"## Try nD input\n", | ||
"\n", | ||
"Our goal is to interpolate latitude at every longitude and time, such that we go from a dataset with dimensions `(time: 4, lat: 25, lon: 3)` to `(time: 4, lat: 100, lon: 3)`. \n", | ||
"\n", | ||
"If we blindly try passing `air` (a 3D DataArray), we get a hard-to-understand error" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "1476bcce-cc7b-4252-90dd-f45502dffb09", | ||
"metadata": { | ||
"tags": [ | ||
"raises-exception" | ||
] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"newlat = np.linspace(15, 75, 100)\n", | ||
"\n", | ||
"xr.apply_ufunc(\n", | ||
" np.interp, # first the function\n", | ||
" newlat,\n", | ||
" air.lat,\n", | ||
" air,\n", | ||
" input_core_dims=[[\"lat\"], [\"lat\"], [\"lat\"]],\n", | ||
" output_core_dims=[[\"lat\"]],\n", | ||
" exclude_dims={\"lat\"},\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "1d1da9c2-a634-4920-890c-74d9bec9eab9", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"source": [ | ||
"We will use a \"wrapper\" function `debug_interp` to examine what gets passed to `numpy.interp`.\n", | ||
"\n", | ||
"```{tip}\n", | ||
"Such wrapper functions are a great way to understand and debug `apply_ufunc` use cases.\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "fa306dcf-eec3-425c-b278-42d15bbc0e4f", | ||
"metadata": { | ||
"tags": [ | ||
"raises-exception" | ||
] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"def debug_interp(xi, x, data):\n", | ||
" print(f\"data: {data.shape} | x: {x.shape} | xi: {xi.shape}\")\n", | ||
" return np.interp(xi, x, data)\n", | ||
"\n", | ||
"\n", | ||
"interped = xr.apply_ufunc(\n", | ||
" debug_interp, # first the function\n", | ||
" newlat,\n", | ||
" air.lat,\n", | ||
" air,\n", | ||
" input_core_dims=[[\"lat\"], [\"lat\"], [\"lat\"]],\n", | ||
" output_core_dims=[[\"lat\"]],\n", | ||
" exclude_dims={\"lat\"}, # dimensions allowed to change size. Must be set!\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "6f5c928b-f8cb-4016-9d6d-39743f9c2976", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"source": [ | ||
"That's a hard-to-interpret error from NumPy but our `print` call helpfully printed the shapes of the input data: \n", | ||
"\n", | ||
" data: (4, 3, 25) | x: (25,) | xi: (100,)\n", | ||
"\n", | ||
"We see that `apply_ufunc` passes the full 3D array to `interp1d_np` which in turn passes that on to `numpy.interp`. But `numpy.interp` requires a 1D input, and thus the error.\n", | ||
"\n", | ||
"Instead of passing the full 3D array we want loop over all combinations of `lon` and `time`; and apply our function to each corresponding vector of data along `lat`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "737cc6b4-522f-488c-9124-524cc42ebef3", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"source": [ | ||
"## Vectorization with `np.vectorize`\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b6dac8da-8420-4fc4-9aeb-29b8999d4b37", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"source": [ | ||
"`apply_ufunc` makes it easy to loop over the loop dimensions by specifying `vectorize=True`:\n", | ||
"\n", | ||
" vectorize : bool, optional\n", | ||
" If True, then assume ``func`` only takes arrays defined over core\n", | ||
" dimensions as input and vectorize it automatically with\n", | ||
" :py:func:`numpy.vectorize`. This option exists for convenience, but is\n", | ||
" almost always slower than supplying a pre-vectorized function.\n", | ||
" Using this option requires NumPy version 1.12 or newer.\n", | ||
" \n", | ||
"\n", | ||
"```{warning}\n", | ||
"Also see the numpy documentation for [numpy.vectorize](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html). Most importantly\n", | ||
"\n", | ||
" The vectorize function is provided primarily for convenience, not for performance. \n", | ||
" The implementation is essentially a for loop.\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "d72fdd8c-44d2-4f6e-9fc4-7084e0e49986", | ||
"metadata": { | ||
"tags": [], | ||
"user_expressions": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"interped = xr.apply_ufunc(\n", | ||
" debug_interp, # first the function\n", | ||
" newlat,\n", | ||
" air.lat,\n", | ||
" air,\n", | ||
" input_core_dims=[[\"lat\"], [\"lat\"], [\"lat\"]],\n", | ||
" output_core_dims=[[\"lat\"]],\n", | ||
" exclude_dims={\"lat\"}, # dimensions allowed to change size. Must be set!\n", | ||
" vectorize=True,\n", | ||
")\n", | ||
"interped" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "d81f399e-1649-4d4b-ad28-81cba8403210", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"source": [ | ||
"Wow that worked!\n", | ||
"\n", | ||
"Notice that \n", | ||
"1. the printed input shapes are all 1D and correspond to one vector of size 25 along the `lat` dimension.\n", | ||
"2. `debug_interp` was called 4x3 = 12 times which is the total number `lat` vectors since the size along `time` is 4, and the size along `lon` is 3.\n", | ||
"3. The result `interped` is now an xarray object with coordinate values copied over from `data`. \n", | ||
"\n", | ||
"\n", | ||
"```{note}\n", | ||
"`lat` is now the *last* dimension in `interped`. This is a \"property\" of core dimensions: they are moved to the end before being sent to `interp1d_np` as noted in the docstring for `input_core_dims`\n", | ||
"\n", | ||
" Core dimensions are automatically moved to the last axes of input\n", | ||
" variables before applying ``func``, which facilitates using NumPy style\n", | ||
" generalized ufuncs [2]_.\n", | ||
"```\n", | ||
"\n", | ||
"## Conclusion\n", | ||
"This is why `apply_ufunc` is so convenient; it takes care of a lot of code necessary to apply functions that consume and produce numpy arrays to xarray objects.\n", | ||
"\n", | ||
"The `vectorize` keyword argument, when set to True, will use `numpy.vectorize` to apply the function by looping over the \"loop dimensions\" --- dimensions that are not the core dimensions for the applied function." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.