-
Notifications
You must be signed in to change notification settings - Fork 45
Add ManifestStore for loading data from ManifestArrays #490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #490 +/- ##
===========================================
- Coverage 87.91% 87.65% -0.26%
===========================================
Files 28 31 +3
Lines 1713 1888 +175
===========================================
+ Hits 1506 1655 +149
- Misses 207 233 +26
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome @maxrjones - I'm so impressed that you got this far this quickly. My comments are mostly nits about API choices.
virtualizarr/storage/_obstore.py
Outdated
except _ALLOWED_EXCEPTIONS: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are any of these exceptions allowed? Don't we want to raise them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how Zarr handles empty chunks (xref #16). I'll need to put some more thought into this and return with a more complete answer later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you read this? #427 (comment)
Seems related.
Co-authored-by: Tom Nicholas <tom@earthmover.io>
Thanks for prompt review, @TomNicholas! I addressed the simpler comments and will build out the test suite and request a follow-up once that's done and all the exposed issues are solved. |
Co-authored-by: Tom Nicholas <tom@earthmover.io>
# Check each key to see if it's a prefix of the uri_string | ||
for key in sorted_keys: | ||
if request_key.startswith(key): | ||
return StoreRequest(store=stores[key], key=request_key[len(key) :]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using key=request_key[len(key) :]
makes some assumptions that are not universally valid. I will look into this tomorrow.
This has tests, which pass, so LGTM so we can iterate + build upon it. |
Co-authored-by: Tom Nicholas <tom@earthmover.io>
for more information, see https://pre-commit.ci
* need latest version of xarray to import internals correctly * passing serial open_virtual_mfdataset test * passes with lithops but only for the HDF backend * add test for dask * refactored serial and lithops codepaths to use an executor pattern * xfail lithops * consolidate tests by parametrizing over parallel kwarg * re-enable lithops test * remove unneeded get_executor function * add test for using dask distributed to parallelize * Add ManifestStore for loading data from ManifestArrays (#490) * Draft ManifestStore implementation --------- Co-authored-by: Tom Nicholas <tom@earthmover.io> Co-authored-by: Kyle Barron <kylebarron2@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * make it work for dask delayed * correct docstring --------- Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com> Co-authored-by: Kyle Barron <kylebarron2@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* copy implementation from xarray * sketch idea for C496 lithops parallelization * standardize naming of variables * add to public API * fix errors caused by trying to import xarray types * start writing tests * passing test for combining in serial * requires_kerchunk * test for lithops with default LocalHost executor * notes on confusing AssertionError * ensure lithops is installed * remove uneeded fixture * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Additions to `open_virtual_mfdataset` (#508) * need latest version of xarray to import internals correctly * Fix metadata equality for nan fill value (#502) * add check that works for fill_values too * note about removing once merged upstream * type hint * regression test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove accidental changes to pyproject.toml * Update pyproject.toml * ignore mypy --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Setup intersphinx mapping for docs (#503) * Setup intersphinx mapping for docs --------- Co-authored-by: Kyle Barron <kylebarron2@gmail.com> * Change default loadable_variables (and indexes) to match xarray's behaviour (#477) * draft refactor * sketch of simplified handling of loadable_variables * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * get at least some tests working * separate VirtualBackend api definition from common utilities * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove indexes={} everywhere in tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * stop passing through loadable_variables to where it isn't used * implement logic to load 1D dimension coords by default * remove more instances of indexes={} * remove more indexes={} * refactor logic for choosing loadable_variables * fix more tets * xfail Aimee's test that I don't understand * xfail test that explicitly specifies no indexes * made a bunch more stuff pass * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix netcdf3 reader * fix bad import in FITS reader * fix import in tiff reader * fix import in icechunk test * release note * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update docstring * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix fits reader * xfail on empty dict for indexes * linting * actually test new expected behaviour * fix logic for setting loadable_variables * update docs page to reflect new behaviour * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix expected behaviour in another tests * additional assert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use encode_dataset_coordinates in kerchunk writer * Encode zarr vars * fix some mypy errors * move drop_variables implmentation to the end of every reader * override loadable_variables and raise warning * fix failing test by not creating loadable variables that would get inlined by default * improve error message * remove some more occurrences of indexes={} * skip slow test * slay mypy errors * docs typos * should fix dmrpp test * Delete commented-out code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unecessary test skip --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com> * Update pyproject.toml deps (#504) * re-add icechunk to upstream tests * add pytest-asyncio to test envs * passing serial open_virtual_mfdataset test * passes with lithops but only for the HDF backend * add test for dask * refactored serial and lithops codepaths to use an executor pattern * xfail lithops * consolidate tests by parametrizing over parallel kwarg * re-enable lithops test * remove unneeded get_executor function * add test for using dask distributed to parallelize --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com> Co-authored-by: Kyle Barron <kylebarron2@gmail.com> * Additions to `open_virtual_mfdataset` (#509) * need latest version of xarray to import internals correctly * passing serial open_virtual_mfdataset test * passes with lithops but only for the HDF backend * add test for dask * refactored serial and lithops codepaths to use an executor pattern * xfail lithops * consolidate tests by parametrizing over parallel kwarg * re-enable lithops test * remove unneeded get_executor function * add test for using dask distributed to parallelize * Add ManifestStore for loading data from ManifestArrays (#490) * Draft ManifestStore implementation --------- Co-authored-by: Tom Nicholas <tom@earthmover.io> Co-authored-by: Kyle Barron <kylebarron2@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * make it work for dask delayed * correct docstring --------- Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com> Co-authored-by: Kyle Barron <kylebarron2@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * More open_virtual_mfdataset (#510) * need latest version of xarray to import internals correctly * passing serial open_virtual_mfdataset test * passes with lithops but only for the HDF backend * add test for dask * refactored serial and lithops codepaths to use an executor pattern * xfail lithops * consolidate tests by parametrizing over parallel kwarg * re-enable lithops test * remove unneeded get_executor function * add test for using dask distributed to parallelize * make it work for dask delayed * correct docstring * added compliant executor for lithops * add links to lithops issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Final fixes for open_virtual_mfdataset (#517) * need latest version of xarray to import internals correctly * passing serial open_virtual_mfdataset test * passes with lithops but only for the HDF backend * add test for dask * refactored serial and lithops codepaths to use an executor pattern * xfail lithops * consolidate tests by parametrizing over parallel kwarg * re-enable lithops test * remove unneeded get_executor function * add test for using dask distributed to parallelize * make it work for dask delayed * correct docstring * added compliant executor for lithops * add links to lithops issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * specify dask and lithops executors with a string again * fix easy typing stuff * fix typing errors by aligning executor signatures * remove open_virtual_mfdataset from public API for now * release note * refactor construction of expected result * implement preprocess arg, and dodge lithops bug * update comment --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply suggestions from code reviewRemRemove new deps * remove rogue print statement --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com> Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
This PR allows loading data directly from ManifestArrays using obstore.
Possible TO-DOs:
get_partial_values
,exists
,list_prefix
, andlist
.General checklist
docs/releases.rst
api.rst