use engine flox for ordered groups by mathause · Pull Request #266 · xarray-contrib/flox · GitHub

use engine flox for ordered groups #266


Merged: 34 commits merged on Oct 15, 2023


@mathause (Contributor) commented Sep 29, 2023

Set `engine=None` as the default and select `engine="flox"` for ordered groups. Some comments:

  • This is certainly still a very rough WIP.
  • I think there could be other logic to determine the optimal engine.
  • I only determine the engine in `groupby_reduce`; however, it is already accessed in `xarray_reduce`, which is not optimal (see below).
  • The testing approach is brute force at the moment. I think we'd need to test whether the correct engine is chosen, not just run all the tests.
  • So the engine detection should be done in a separate function, which we can then test directly.
  • I tested it, and this would solve the flox performance regression for cftime resampling (pydata/xarray#7730), at least for ordered groupings.
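Why ordered groups favour `engine="flox"` can be seen in a small stdlib sketch: when the group labels are sorted, every group is one contiguous run, so the reduction only needs the run start offsets and one pass over slices. My understanding is that flox's "flox" engine relies on `numpy.ufunc.reduceat` with offsets of this kind; the helpers `is_sorted`, `run_starts` and `sum_sorted_groups` below are hypothetical and only illustrate the idea, not flox's actual code.

```python
from typing import Dict, List, Sequence


def is_sorted(labels: Sequence) -> bool:
    """Return True if labels are in non-decreasing order."""
    return all(a <= b for a, b in zip(labels, labels[1:]))


def run_starts(labels: Sequence) -> List[int]:
    """Indices where a new group begins (the offsets a reduceat-style call takes)."""
    return [i for i in range(len(labels)) if i == 0 or labels[i] != labels[i - 1]]


def sum_sorted_groups(labels: Sequence, values: Sequence[float]) -> Dict:
    """Sum values per group, assuming sorted labels (hypothetical helper).

    Because equal labels are adjacent, each group is the slice between two
    consecutive run starts, so no per-element group lookup is needed.
    """
    if not is_sorted(labels):
        raise ValueError("labels must be sorted for a run-based reduction")
    starts = run_starts(labels)
    ends = starts[1:] + [len(labels)]
    return {labels[s]: sum(values[s:e]) for s, e in zip(starts, ends)}


print(sum_sorted_groups([0, 0, 1, 1, 1, 2], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
# {0: 3.0, 1: 12.0, 2: 6.0}
```

With unsorted labels the runs would no longer line up with the groups, which is why the engine choice hinges on ordering.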

`func == "count" and engine != "flox"`

@dcherian (Collaborator) left a comment

Thanks for working on this!

> so the engine detection should be done in a separate function and then we can test this function

Yes!

> I only determine the engine in groupby_reduce however it is already accessed in xarray_reduce - this is not optimal (see below)

Yes, a helper function will fix this. `xarray_reduce` can just call that.
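A minimal sketch of that refactor, assuming a hypothetical `choose_engine` helper and two stand-in entry points (`groupby_reduce_stub`, `xarray_reduce_stub`). The rule "sorted labels select flox, otherwise numpy" mirrors the PR description; flox's real helper and its heuristics may differ.

```python
from typing import Optional, Sequence


def choose_engine(labels: Sequence, engine: Optional[str] = None) -> str:
    """Resolve engine=None to a concrete engine name (hypothetical helper).

    An explicitly requested engine is returned unchanged; otherwise sorted
    labels select "flox" and anything else falls back to "numpy".
    """
    if engine is not None:
        return engine
    labels_sorted = all(a <= b for a, b in zip(labels, labels[1:]))
    return "flox" if labels_sorted else "numpy"


def groupby_reduce_stub(labels: Sequence, engine: Optional[str] = None) -> str:
    # Stand-in for groupby_reduce: resolve the engine once, up front.
    return choose_engine(labels, engine)


def xarray_reduce_stub(labels: Sequence, engine: Optional[str] = None) -> str:
    # Stand-in for xarray_reduce: call the same helper instead of
    # duplicating the detection logic, then pass the resolved engine on.
    resolved = choose_engine(labels, engine)
    return groupby_reduce_stub(labels, engine=resolved)


def test_choose_engine() -> None:
    # The helper can be tested directly for the engine it picks.
    assert choose_engine([0, 0, 1, 2]) == "flox"
    assert choose_engine([2, 0, 1]) == "numpy"
    assert choose_engine([2, 0, 1], engine="flox") == "flox"


test_choose_engine()
print(xarray_reduce_stub([0, 1, 1]))  # flox
```

Because the detection lives in one function, a test can assert the chosen engine directly instead of running the whole suite with every engine.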

flox/core.py (outdated)

    @@ -1755,7 +1755,7 @@ def groupby_reduce(
         dtype: np.typing.DTypeLike = None,
         min_count: int | None = None,
         method: T_Method = "map-reduce",
    -    engine: T_Engine = "numpy",
    +    engine: T_Engine = None,

Collaborator comment:

Needs a docstring update.
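One possible wording for the updated `engine` entry, in numpydoc style, is sketched below. The function name and trimmed signature are hypothetical, the listed engine names are an assumption about the options available at the time, and the description of the default follows the PR description rather than flox's final docstring.

```python
from typing import Any, Optional


def groupby_reduce_sketch(array: Any, by: Any, *, engine: Optional[str] = None) -> None:
    """Group-by reduction (signature trimmed for illustration).

    Parameters
    ----------
    engine : {"flox", "numpy", "numba"}, optional
        Underlying algorithm used to compute the reduction.
        If None (the default), an engine is chosen automatically:
        "flox" when the group labels are sorted, otherwise "numpy".
    """
    # Body intentionally omitted: only the docstring is sketched here.
    return None
```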

flox/xarray.py (outdated)

    @@ -72,7 +72,7 @@ def xarray_reduce(
         fill_value=None,
         dtype: np.typing.DTypeLike = None,
         method: str = "map-reduce",
    -    engine: str = "numpy",
    +    engine: str = None,

Collaborator comment:

Again, needs a docstring update.

@mathause (Contributor, Author) commented Oct 6, 2023

Thanks for the feedback. I have to let this rest until I find time again, so feel free to take this over if anyone is interested.

@dcherian dcherian marked this pull request as ready for review October 14, 2023 04:36
@dcherian dcherian merged commit fecd9a6 into xarray-contrib:main Oct 15, 2023
@mathause mathause deleted the engine_none branch October 15, 2023 04:41
@mathause (Contributor, Author) commented:

I hardly qualify to be on here 😅 Thanks for finishing this up; it will be a huge win!

dcherian added a commit that referenced this pull request Nov 3, 2023
* main: (24 commits)
  Add `packaging` as dependency
  use engine flox for ordered groups (#266)
  Update pyproject.toml: py3.12
  Bump numpy to >=1.22 (#278)
  Cleanups (#276)
  benchmarks updates (#273)
  repo-review comments (#270)
  Significantly faster cohorts detection. (#272)
  Add engine="numbagg" (#72)
  Support quantile, median, mode with method="blockwise". (#269)
  Add multidimensional binning demo (#203)
  [pre-commit.ci] pre-commit autoupdate (#268)
  Drop python 3.8, test python 3.11 (#209)
  tests: move xfail out of functions (#265)
  Bump actions/checkout from 3 to 4 (#267)
  convert datetime: micro-optimizations (#261)
  compatibility with `numpy>=2.0` (#257)
  replace the deprecated `provision-with-micromamba` with `setup-micromamba` (#258)
  Fix some typing errors in asv_bench and tests (#253)
  [pre-commit.ci] pre-commit autoupdate (#250)
  ...
Development: successfully merging this pull request may close these issues:
  • use engine flox if array is ordered?
  • optimize groupby for resample
  • flox performance regression for cftime resampling
2 participants