-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix dtype=S1 encoding in to_netcdf() #2158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
xarray/backends/h5netcdf_.py
Outdated
@@ -184,8 +184,9 @@ def prepare_variable(self, name, variable, check_encoding=False, | |||
raise ValueError("'zlib' and 'compression' encodings mismatch") | |||
encoding.setdefault('compression', 'gzip') | |||
|
|||
if (check_encoding and encoding.get('complevel') not in | |||
(None, encoding.get('compression_opts'))): | |||
if (check_encoding and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
W291 trailing whitespace
xarray/backends/h5netcdf_.py
Outdated
(None, encoding.get('compression_opts'))): | ||
if (check_encoding and | ||
'complevel' in encoding and 'compression_opts' in encoding and | ||
encoding['complevel'] != encoding['compression_opts']): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E127 continuation line over-indented for visual indent
xarray/tests/test_backends.py
Outdated
with self.roundtrip(ds, save_kwargs=kwargs) as actual: | ||
assert_equal(actual, ds) | ||
assert actual.x.encoding['dtype'] == 'f4' | ||
assert actual.x.encoding['zlib'] == True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E712 comparison to True should be 'if cond is True:' or 'if cond:'
xarray/tests/test_backends.py
Outdated
assert actual.x.encoding['dtype'] == 'f4' | ||
assert actual.x.encoding['zlib'] == True | ||
assert actual.x.encoding['complevel'] == 9 | ||
assert actual.x.encoding['fletcher32'] == True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will b 10000 e displayed to describe this comment to others. Learn more.
E712 comparison to True should be 'if cond is True:' or 'if cond:'
xarray/tests/test_backends.py
Outdated
assert actual.x.encoding['complevel'] == 9 | ||
assert actual.x.encoding['fletcher32'] == True | ||
assert actual.x.encoding['chunksizes'] == (5,) | ||
assert actual.x.encoding['shuffle'] == True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E712 comparison to True should be 'if cond is True:' or 'if cond:'
xarray/tests/test_backends.py
Outdated
# there should be no chunks | ||
self.assertEqual(v.chunks, None) | ||
assert v.chunks == None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E711 comparison to None should be 'if cond is None:'
xarray/tests/test_backends.py
Outdated
@@ -1793,19 +1818,19 @@ def test_dump_encodings_h5py(self): | |||
kwargs = {'encoding': {'x': { | |||
'compression': 'gzip', 'compression_opts': 9}}} | |||
with self.roundtrip(ds, save_kwargs=kwargs) as actual: | |||
self.assertEqual(actual.x.encoding['zlib'], True) | |||
self.assertEqual(actual.x.encoding['complevel'], 9) | |||
assert actual.x.encoding['zlib'] == True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E712 comparison to True should be 'if cond is True:' or 'if cond:'
xarray/tests/test_backends.py
Outdated
self.assertEqual(actual.x.encoding['compression'], 'lzf') | ||
self.assertEqual(actual.x.encoding['compression_opts'], None) | ||
assert actual.x.encoding['compression'] == 'lzf' | ||
assert actual.x.encoding['compression_opts'] == None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E711 comparison to None should be 'if cond is None:'
xarray/tests/test_backends.py
Outdated
original = Dataset({'x': [u'foo', u'bar', u'baz']}) | ||
kwargs = dict(encoding={'x': {'dtype': str}}) | ||
with raises_regex(ValueError, 'encoding dtype=str for vlen'): | ||
with self.roundtrip(original, save_kwargs=kwargs) as actual: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F841 local variable 'actual' is assigned to but never used
Fixes GH2149
I think this is ready for another review. @crusaderky I ended up splitting up your test into a few smaller pieces, so we could more easily handle logic for different backends with subclassing. |
doc/whats-new.rst
Outdated
@@ -58,6 +62,9 @@ Bug fixes | |||
bug where non-scalar data-variables that did not include the aggregation | |||
dimension were improperly skipped. | |||
By `Stephan Hoyer <https://github.com/shoyer>`_ | |||
- Fixed a regression in 0.10.4, where specifying ``{'dtype': 'S1'}`` in | |||
``encoding`` with ``to_netcdf()`` raised an error. | |||
`Stephan Hoyer <https://github.com/shoyer>`_ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate entry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
xarray/backends/h5netcdf_.py
Outdated
vlen_dtype = h5py.check_dtype(vlen=var.dtype) | ||
if vlen_dtype is not None: | ||
if vlen_dtype is not unicode_type: # pragma: no cover | ||
raise NotImplementedError('unexpected vlen dtype: {!r}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference from var.dtype.kind == 'U'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h5py uses special NumPy dtypes (np.object_
with metadata
) for defining variable length types. It doesn't use var.dtype.kind == 'U'
, because HDF5 doesn't have a unicode type with a memory model matching np.unicode_
. (HDF5 implements UTF-8 with a fixed number of bytes for storage, not a fixed number of unicode characters like NumPy.)
xarray/conventions.py
Outdated
if 'dtype' in var.encoding and var.encoding['dtype'] != 'S1': | ||
if ('dtype' in var.encoding and | ||
var.encoding['dtype'] != 'S1' and | ||
var.encoding['dtype'] is not str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if var.encoding.get('dtype') not in (None, 'S1', str):
or a bit more verbose but more readable:
if 'dtype' in var.encoding and var.encoding['dtype'] not in ('S1', str):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was my first version. Sadly it doesn't work due to numpy/numpy#7242. I added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype in {'S1', str}
hashesh dtype and compares the hashes.
dtype in ('S1', str)
invokes dtype.__eq__
and is the same as any(dtype == x for x in ('S1', str))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I misread that. Updated.
xarray/conventions.py
Outdated
warnings.warn("CF decoding is overwriting dtype on variable {!r}" | ||
.format(name)) | ||
else: | ||
if 'dtype' not in encoding: | ||
encoding['dtype'] = original_dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encoding.set_default('dtype', original_dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
xarray/conventions.py
Outdated
warnings.warn("CF decoding is overwriting dtype on variable {!r}" | ||
.format(name)) | ||
else: | ||
if 'dtype' not in encoding: | ||
encoding['dtype'] = original_dtype | ||
|
||
if 'dtype' in attributes and attributes['dtype'] == 'bool': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if attributes.get('dtype') == 'bool':
xarray/tests/test_backends.py
Outdated
@@ -873,6 +886,7 @@ def create_tmp_files(nfiles, suffix='.nc', allow_cleanup_failure=False): | |||
|
|||
@requires_netCDF4 | |||
class BaseNetCDF4Test(CFEncodedDataTest): | |||
"""Tests for both netCDF4-python and h5netcdf.""" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment contradicts the @requires_netCDF4
decorator immediately above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
xarray/tests/test_conventions.py
Outdated
@@ -274,3 +274,6 @@ def test_invalid_dataarray_names_raise(self): | |||
|
|||
def test_encoding_kwarg(self): | |||
pass | |||
|
|||
def test_encoding_kwarg_fixed_width_string(self): | |||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the purpose of this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TestCFEncodedDataStore
test is intended for checking the backend interface without depending on any particular file format. That means it doesn't support string encodings. (Added a comment.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crusaderky thanks for the review!
doc/whats-new.rst
Outdated
@@ -58,6 +62,9 @@ Bug fixes | |||
bug where non-scalar data-variables that did not include the aggregation | |||
dimension were improperly skipped. | |||
By `Stephan Hoyer <https://github.com/shoyer>`_ | |||
- Fixed a regression in 0.10.4, where specifying ``{'dtype': 'S1'}`` in | |||
``encoding`` with ``to_netcdf()`` raised an error. | |||
`Stephan Hoyer <https://github.com/shoyer>`_ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
xarray/conventions.py
Outdated
if 'dtype' in var.encoding and var.encoding['dtype'] != 'S1': | ||
if ('dtype' in var.encoding and | ||
var.encoding['dtype'] != 'S1' and | ||
var.encoding['dtype'] is not str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was my first version. Sadly it doesn't work due to numpy/numpy#7242. I added a comment.
xarray/conventions.py
Outdated
warnings.warn("CF decoding is overwriting dtype on variable {!r}" | ||
.format(name)) | ||
else: | ||
if 'dtype' not in encoding: | ||
encoding['dtype'] = original_dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
xarray/tests/test_backends.py
Outdated
@@ -873,6 +886,7 @@ def create_tmp_files(nfiles, suffix='.nc', allow_cleanup_failure=False): | |||
|
|||
@requires_netCDF4 | |||
class BaseNetCDF4Test(CFEncodedDataTest): | |||
"""Tests for both netCDF4-python and h5netcdf.""" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
xarray/tests/test_conventions.py
Outdated
@@ -274,3 +274,6 @@ def test_invalid_dataarray_names_raise(self): | |||
|
|||
def test_encoding_kwarg(self): | |||
pass | |||
|
|||
def test_encoding_kwarg_fixed_width_string(self): | |||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TestCFEncodedDataStore
test is intended for checking the backend interface without depending on any particular file format. That means it doesn't support string encodings. (Added a comment.)
I plan to merge this (and issue the 0.10.5 release) end of day today unless anyone has further comments. |
whats-new.rst
for all changes andapi.rst
for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)@crusaderky please take a look. Testing here is not as thorough as in #2150 yet, but it does include a regression test.