xarray-contrib · dcherian · Jun 28, 2023 · Jun 14, 2023 · Jun 16, 2023 · Jun 16, 2023
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -28,10 +28,6 @@ repos:
     hooks:
       - id: flake8
 
-  - repo: https://github.com/asottile/seed-isort-config
-    rev: v2.2.0
-    hooks:
-      - id: seed-isort-config
   - repo: https://github.com/PyCQA/isort
     rev: 5.12.0
     hooks:

diff --git a/_config.yml b/_config.yml
@@ -65,9 +65,10 @@ sphinx:
     # maintain old paths and redirect them (so google results dont go to 404)
     # https://github.com/wpilibsuite/sphinxext-rediraffe
     - sphinxext.rediraffe
+    - sphinx_exercise
   config:
     # application/vnd.holoviews_load.v0+json, application/vnd.holoviews_exec.v0+json
-    suppress_warnings: ['mystnb.unknown_mime_type']
+    suppress_warnings: ['mystnb.unknown_mime_type', 'misc.highlighting_failure']
     notfound_context:
       body: "<h1>Whoops! 404 Page Not Found</h1>\n\n<p>Sorry, this page doesn't exist. Many sections of this book have been updated recently.</p><p> Try the search box 🔎 to find what you're looking for!</p>"
     notfound_urls_prefix: /

diff --git a/_toc.yml b/_toc.yml
@@ -47,8 +47,12 @@ parts:
       - file: advanced/apply_ufunc/apply_ufunc.md
         sections:
           - file: advanced/apply_ufunc/simple_numpy_apply_ufunc
-          - file: advanced/apply_ufunc/simple_dask_apply_ufunc
-          - file: advanced/apply_ufunc/vectorize_1d
+          - file: advanced/apply_ufunc/core-dimensions
+          - file: advanced/apply_ufunc/complex-output-numpy
+          - file: advanced/apply_ufunc/automatic-vectorizing-numpy
+          - file: advanced/apply_ufunc/dask_apply_ufunc
+          - file: advanced/apply_ufunc/numba-vectorization
+          - file: advanced/apply_ufunc/example-interp
       - file: advanced/map_blocks/map_blocks.md
         sections:
           - file: advanced/map_blocks/simple_map_blocks

diff --git a/advanced/apply_ufunc/automatic-vectorizing-numpy.ipynb b/advanced/apply_ufunc/automatic-vectorizing-numpy.ipynb
@@ -0,0 +1,359 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "6849dcdc-3484-4f41-8b23-51613d36812f",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "(vectorize)=\n",
+    "# Automatic Vectorization"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "afc56d28-6e55-4967-b27d-28e2cc539cc7",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "Previously we looked at [applying functions](gentle-intro) on numpy arrays, and the concept of [core dimensions](core-dimensions).\n",
+    "We learned that functions commonly support specifying \"core dimensions\" through the `axis` keyword\n",
+    "argument. \n",
+    "\n",
+    "However many functions exist, that implicitly have core dimensions, but do not provide an `axis` keyword\n",
+    "argument. Applying such functions to a nD array usually involves one or multiple loops over the other dimensions\n",
+    "--- termed \"loop dimensions\" or \"broadcast dimensions\".\n",
+    "\n",
+    "\n",
+    "A good example is numpy's 1D interpolate function `numpy.interp`:\n",
+    "\n",
+    "```\n",
+    "    Signature: np.interp(x, xp, fp, left=None, right=None, period=None)\n",
+    "    Docstring:\n",
+    "        One-dimensional linear interpolation.\n",
+    "\n",
+    "    Returns the one-dimensional piecewise linear interpolant to a function\n",
+    "    with given discrete data points (`xp`, `fp`), evaluated at `x`.\n",
+    "```\n",
+    "\n",
+    "This function expects 1D arrays as input, so there is one core dimension and we cannot easily apply \n",
+    "it to a nD array since there is no `axis` keyword argument. \n",
+    "\n",
+    "\n",
+    "Our goal here is to \n",
+    "1. Understand the difference between core dimensions and loop dimensions\n",
+    "1. Understand vectorization\n",
+    "1. Learn how to apply such functions without loops using `apply_ufunc` by providing the `vectorize` keyword argument.\n",
+    "\n",
+    "## Core dimensions and looping\n",
+    "\n",
+    "Let's say we want to\n",
+    "interpolate an array with two dimensions (`space`, `time`) over the `time` dimension, we might \n",
+    "1. loop over the `space` dimension, \n",
+    "1. subset the array to a 1D array at that `space` location, \n",
+    "1. Interpolate the 1D arrays to the new `time` vector, and\n",
+    "1. Assign that new interpolated 1D array to the appropriate location of a 2D output array\n",
+    "\n",
+    "In pseudo-code this might look like\n",
+    "\n",
+    "```python\n",
+    "for index in range(size_of_space_axis):\n",
+    "    out[index, :] = np.interp(..., array[index, :], ...)\n",
+    "```\n",
+    "\n",
+    "\n",
+    "```{exercise}\n",
+    ":label: coreloopdims\n",
+    "\n",
+    "Consider the example problem of interpolating a 2D array with dimensions `space` and `time` along the `time` dimension.\n",
+    "Which dimension is the core dimension, and which is the \"loop dimension\"?\n",
+    "```\n",
+    "```{solution} coreloopdims\n",
+    ":class: dropdown\n",
+    "\n",
+    "`time` is the core dimension, and `space` is the loop dimension.\n",
+    "```\n",
+    "\n",
+    "## Vectorization\n",
+    "\n",
+    "The pattern of looping over any number of \"loop dimensions\" and applying a function along \"core dimensions\" \n",
+    "is so common that numpy provides wrappers that automate these steps: \n",
+    "1. [numpy.apply_along_axis](https://numpy.org/doc/stable/reference/generated/numpy.apply_along_axis.html)\n",
+    "1. [numpy.apply_over_axes](https://numpy.org/doc/stable/reference/generated/numpy.apply_over_axes.html)\n",
+    "1. [numpy.vectorize](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html)\n",
+    "\n",
+    "\n",
+    "`apply_ufunc` provides an easy interface to `numpy.vectorize` through the keyword argument `vectorize`. Here we see how to use\n",
+    "that to automatically apply `np.interp` along a single axis of a nD array\n",
+    "\n",
+    "## Load data\n",
+    "\n",
+    "First lets load an example dataset\n",
+    "\n",
+    "```{tip}\n",
+    "We'll reduce the length of error messages using `%xmode minimal` See the [ipython documentation](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-xmode) for details.\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "76aa13b8-5ced-4468-a72e-6b0a29172d6d",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "%xmode minimal\n",
+    "\n",
+    "import xarray as xr\n",
+    "import numpy as np\n",
+    "\n",
+    "xr.set_options(display_expand_data=False)\n",
+    "\n",
+    "air = (\n",
+    "    xr.tutorial.load_dataset(\"air_temperature\")\n",
+    "    .air.sortby(\"lat\")  # np.interp needs coordinate in ascending order\n",
+    "    .isel(time=slice(4), lon=slice(3))  # choose a small subset for convenience\n",
+    ")\n",
+    "air"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "81356724-6c1a-4d4a-9a32-bb906a9419b2",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## Review\n",
+    "\n",
+    "\n",
+    "We'll work with the `apply_ufunc` call from the section on [handling dimensions that change size](complex-output-change-size). See the \"Handling Complex Output\" section for how to get here.\n",
+    "\n",
+    "This version only works with 1D vectors. We will expand that to work with inputs of any number of dimensions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cb286fa0-deba-4929-b18a-79af5acb0b5b",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "newlat = np.linspace(15, 75, 100)\n",
+    "\n",
+    "xr.apply_ufunc(\n",
+    "    np.interp,  # first the function\n",
+    "    newlat,\n",
+    "    air.lat,\n",
+    "    air.isel(lon=0, time=0),  # this version only works with 1D vectors\n",
+    "    input_core_dims=[[\"lat\"], [\"lat\"], [\"lat\"]],\n",
+    "    output_core_dims=[[\"lat\"]],\n",
+    "    exclude_dims={\"lat\"},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e3382658-14e1-4842-a618-ce7a27948c31",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## Try nD input\n",
+    "\n",
+    "Our goal is to interpolate latitude at every longitude and time, such that we go from a dataset with dimensions `(time: 4, lat: 25, lon: 3)` to `(time: 4, lat: 100, lon: 3)`. \n",
+    "\n",
+    "If we blindly try passing `air` (a 3D DataArray), we get a hard-to-understand error"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1476bcce-cc7b-4252-90dd-f45502dffb09",
+   "metadata": {
+    "tags": [
+     "raises-exception"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "newlat = np.linspace(15, 75, 100)\n",
+    "\n",
+    "xr.apply_ufunc(\n",
+    "    np.interp,  # first the function\n",
+    "    newlat,\n",
+    "    air.lat,\n",
+    "    air,\n",
+    "    input_core_dims=[[\"lat\"], [\"lat\"], [\"lat\"]],\n",
+    "    output_core_dims=[[\"lat\"]],\n",
+    "    exclude_dims={\"lat\"},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1d1da9c2-a634-4920-890c-74d9bec9eab9",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "We will use a \"wrapper\" function `debug_interp` to examine what gets passed to `numpy.interp`.\n",
+    "\n",
+    "```{tip}\n",
+    "Such wrapper functions are a great way to understand and debug `apply_ufunc` use cases.\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fa306dcf-eec3-425c-b278-42d15bbc0e4f",
+   "metadata": {
+    "tags": [
+     "raises-exception"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "def debug_interp(xi, x, data):\n",
+    "    print(f\"data: {data.shape} | x: {x.shape} | xi: {xi.shape}\")\n",
+    "    return np.interp(xi, x, data)\n",
+    "\n",
+    "\n",
+    "interped = xr.apply_ufunc(\n",
+    "    debug_interp,  # first the function\n",
+    "    newlat,\n",
+    "    air.lat,\n",
+    "    air,\n",
+    "    input_core_dims=[[\"lat\"], [\"lat\"], [\"lat\"]],\n",
+    "    output_core_dims=[[\"lat\"]],\n",
+    "    exclude_dims={\"lat\"},  # dimensions allowed to change size. Must be set!\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6f5c928b-f8cb-4016-9d6d-39743f9c2976",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "That's a hard-to-interpret error from NumPy but our `print` call helpfully printed the shapes of the input data: \n",
+    "\n",
+    "    data: (4, 3, 25) | x: (25,) | xi: (100,)\n",
+    "\n",
+    "We see that `apply_ufunc` passes the full 3D array to `interp1d_np` which in turn passes that on to `numpy.interp`. But `numpy.interp` requires a 1D input, and thus the error.\n",
+    "\n",
+    "Instead of passing the full 3D array we want loop over all combinations of `lon` and `time`; and apply our function to each corresponding vector of data along `lat`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "737cc6b4-522f-488c-9124-524cc42ebef3",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## Vectorization with `np.vectorize`\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b6dac8da-8420-4fc4-9aeb-29b8999d4b37",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "`apply_ufunc` makes it easy to loop over the loop dimensions by specifying `vectorize=True`:\n",
+    "\n",
+    "    vectorize : bool, optional\n",
+    "        If True, then assume ``func`` only takes arrays defined over core\n",
+    "        dimensions as input and vectorize it automatically with\n",
+    "        :py:func:`numpy.vectorize`. This option exists for convenience, but is\n",
+    "        almost always slower than supplying a pre-vectorized function.\n",
+    "        Using this option requires NumPy version 1.12 or newer.\n",
+    "        \n",
+    "\n",
+    "```{warning}\n",
+    "Also see the numpy documentation for [numpy.vectorize](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html). Most importantly\n",
+    "\n",
+    "    The vectorize function is provided primarily for convenience, not for performance. \n",
+    "    The implementation is essentially a for loop.\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d72fdd8c-44d2-4f6e-9fc4-7084e0e49986",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "outputs": [],
+   "source": [
+    "interped = xr.apply_ufunc(\n",
+    "    debug_interp,  # first the function\n",
+    "    newlat,\n",
+    "    air.lat,\n",
+    "    air,\n",
+    "    input_core_dims=[[\"lat\"], [\"lat\"], [\"lat\"]],\n",
+    "    output_core_dims=[[\"lat\"]],\n",
+    "    exclude_dims={\"lat\"},  # dimensions allowed to change size. Must be set!\n",
+    "    vectorize=True,\n",
+    ")\n",
+    "interped"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d81f399e-1649-4d4b-ad28-81cba8403210",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "Wow that worked!\n",
+    "\n",
+    "Notice that \n",
+    "1. the printed input shapes are all 1D and correspond to one vector of size 25 along the `lat` dimension.\n",
+    "2. `debug_interp` was called 4x3 = 12 times which is the total number `lat` vectors since the size along `time` is 4, and the size along `lon` is 3.\n",
+    "3. The result `interped` is now an xarray object with coordinate values copied over from `data`. \n",
+    "\n",
+    "\n",
+    "```{note}\n",
+    "`lat` is now the *last* dimension in `interped`. This is a \"property\" of core dimensions: they are moved to the end before being sent to `interp1d_np` as noted in the docstring for `input_core_dims`\n",
+    "\n",
+    "        Core dimensions are automatically moved to the last axes of input\n",
+    "        variables before applying ``func``, which facilitates using NumPy style\n",
+    "        generalized ufuncs [2]_.\n",
+        "```\n",
+    "\n",
+    "## Conclusion\n",
+    "This is why `apply_ufunc` is so convenient; it takes care of a lot of code necessary to apply functions that consume and produce numpy arrays to xarray objects.\n",
+    "\n",
+    "The `vectorize` keyword argument, when set to True, will use `numpy.vectorize` to apply the function by looping over the \"loop dimensions\" --- dimensions that are not the core dimensions for the applied function."
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}