8000 ENH: add in extension dtype registry by jreback · Pull Request #21185 · pandas-dev/pandas · GitHub
[go: up one dir, main page]

Skip to content

ENH: add in extension dtype registry #21185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Jul 3, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
review comments
  • Loading branch information
jreback committed Jun 29, 2018
commit 7dbd2f356c2937e53b41e3e08898c18756837d43
9 changes: 3 additions & 6 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,9 @@ ExtensionType Changes
- ``ExtensionDtype`` has gained the ability to instantiate from string dtypes, e.g. ``decimal`` would instantiate a registered ``DecimalDtype``; furthermore
the dtype has gained the ``construct_array_type`` (:issue:`21185`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'the dtype' -> ExtensionDtype has gained :meth:`ExtensionDtype.construct_array_type`

- The ``ExtensionArray`` constructor, ``_from_sequence`` now take the keyword arg ``copy=False`` (:issue:`21185`)
- :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`)
- :meth:`Series.combine()` with scalar argument now works for any function type (:issue:`21248`)
-

.. _whatsnew_0240.api.other:

Expand Down Expand Up @@ -324,12 +327,6 @@ Reshaping
-
-

ExtensionArray
^^^^^^^^^^^^^^

- :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`)
- :meth:`Series.combine()` with scalar argument now works for any function type (:issue:`21248`)
-
-

Other
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ def _from_sequence(cls, scalars, copy=False):
scalars : Sequence
Each element will be an instance of the scalar type for this
array, ``cls.dtype.type``.
copy : boolean, default True
copy : boolean, default False
if True, copy the underlying data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this exactly mean?
Often, the scalars objects are either not stored as is, and then copy or not copy makes no difference. Or if they are stored, they don't necessarily have a 'copy' method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same guarantee with copy we have already, if its True, then copy if possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add that to the docstring?

Returns
-------
Expand Down
2 changes: 0 additions & 2 deletions pandas/core/dtypes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -173,8 +173,6 @@ def construct_array_type(cls, array=None):
-------
type
"""
if array is None:
return cls
raise NotImplementedError

@classmethod
Expand Down
11 changes: 6 additions & 5 deletions pandas/core/dtypes/dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,14 @@


class Registry(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without looking at the uses yet, could we simplify this a by just allowing string lookup? Ideally, registry would be a simple class holding a dict mapping dtype.name -> ExtensionDtype.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current code also supports finding the dtype for eg 'interval[int64]' and not just interval (so parametrized versions of the strings)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and I suppose we want that to support .astype('interval[int64]'). That's fair...

""" Registry for dtype inference
"""
Registry for dtype inference

We can directly construct dtypes in pandas_dtypes if they are
a type; the registry allows us to register an extension dtype
to try inference from a string or a dtype class
The registry allows one to map a string repr of a extension
dtype to an extenstion dtype.

These are tried in order for inference.
Multiple extension types can be registered.
These are tried in order.

Examples
--------
Expand Down
A3E2
1 change: 1 addition & 0 deletions pandas/tests/dtypes/test_dtypes.py
8DE2
Original file line number Diff line number Diff line change
Expand Up @@ -788,6 +788,7 @@ def test_registry(dtype):
[('int64', None),
('interval', IntervalDtype()),
('interval[int64]', IntervalDtype()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you replace this (or also add) one that is not the default?
Eg ('interval[datetime64[ns]]', IntervalDtype('datetime64[ns]')),

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(although that is maybe covered with the period or datetime64 one below. Is 'interval[datetime64[ns]]' already tested for IntervalDtype?)

('interval[datetime64[ns]]', IntervalDtype('datetime64[ns]')),
('category', CategoricalDtype()),
('period[D]', PeriodDtype('D')),
('datetime64[ns, US/Eastern]', DatetimeTZDtype('ns', 'US/Eastern'))])
Expand Down
4 changes: 2 additions & 2 deletions pandas/tests/extension/base/constructors.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,8 +52,8 @@ def test_from_dtype(self, data):
dtype = data.dtype

expected = pd.Series(data)
result = pd.Series(np.array(data), dtype=dtype)
result = pd.Series(list(data), dtype=dtype)
self.assert_series_equal(result, expected)

result = pd.Series(np.array(data), dtype=str(dtype))
result = pd.Series(list(data), dtype=str(dtype))
self.assert_series_equal(result, expected)
0