8000 Ability to Pass Dask Arrays as `data` in DataArray Creation · Issue #4650 · pydata/xarray · GitHub
[go: up one dir, main page]

Skip to content

Ability to Pass Dask Arrays as data in DataArray Creation #4650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
AyrtonB opened this issue Dec 5, 2020 · 4 comments · May be fixed by #4659
Closed

Ability to Pass Dask Arrays as data in DataArray Creation #4650

AyrtonB opened this issue Dec 5, 2020 · 4 comments · May be fixed by #4659

Comments

@AyrtonB
Copy link
Contributor
AyrtonB commented Dec 5, 2020

Is your feature request related to a problem? Please describe.

I'm trying to convert a dask dataframe into a dask xarray without having to load the data fully into memory.

I was hoping I'd be able to pass df.values which is a Dask array to the data parameter in xr.DataArray

idx_dim = 'datetime'
col_dim = 'fueltypes'

xr.DataArray(df.values, [df.index, df.columns], [idx_dim, col_dim])

However this raises the error: ValueError: conflicting sizes for dimension 'datetime': length nan on the data but length 90386 on coordinate 'datetime'


Describe the solution you'd like

An ability to create DataArrays from dask dataframes, similar to the existing reverse method for converting Datasets to dask dataframes: Dataset.to_dask_dataframe


Describe alternatives you've considered

I tried using xr.Dataset.from_dataframe(df) but it required the dataframe to be fully loaded into memory

Additionally, unlike the standard Pandas dataframe the Dask dataframe does not have a .to_xarray method.


Additional context

This is in part made necessary by the decision of the Zarr developers to not support saving of dask dataframes to zarr, instead suggesting that you convert to an xarray and then save that to zarr.

@keewis
Copy link
Collaborator
keewis commented Dec 5, 2020

Duplicate of #3929

@keewis keewis marked this as a duplicate of #3929 Dec 5, 2020
@keewis
Copy link
Collaborator
keewis commented Dec 5, 2020

As stated there and in dask/dask#6058, we would need to add a Dataset.from_dask_dataframe method which dask.dataframe could then wrap in a to_xarray method. This is currently only partially possible because we don't allow dask.array in dimension coordinates, but I guess it would be good to at least have partial support.

@keewis keewis closed this as completed Dec 5, 2020
@AyrtonB
Copy link
Contributor Author
AyrtonB commented Dec 5, 2020

Thanks, I saw dask/dask#6058 but missed #3929.

If I'm understanding you correctly there should be no problem passing a dask array for the data parameters its just the dims/coords. If the _infer_coords_and_dims method on DataArrays was adapted to check for any dask.delayed elements and compute them would that enable this functionality or are there additional blockers? Thanks for your help with this.

@AyrtonB
Copy link
Contributor Author
AyrtonB commented Dec 5, 2020

Have started to implement this but will continue the discussion in 3929

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants
0