ENH: Rewrite of array-coercion to support new dtypes by seberg · Pull Request #16200 · numpy/numpy · GitHub

ENH: Rewrite of array-coercion to support new dtypes #16200

Merged: 57 commits, Jul 9, 2020
Changes from 1 commit
Commits (57):
b5dc1ed
WIP: Rework array coercion
seberg Mar 7, 2020
f5df08c
WIP: Further steps toward new coercion, start with making discovery p…
seberg Mar 16, 2020
63bb417
Close to the first working setup
seberg Mar 27, 2020
28c8b39
WIP: Some cleanup/changes?
seberg Mar 30, 2020
b204379
WIP: Make things work by using AdaptFlexibleDType (without obj) for now
seberg May 5, 2020
5bd5847
Use new mechanism for np.asarray, and hopefully get void right, har
seberg May 5, 2020
9e03d8d
First version mainly working
seberg May 6, 2020
efbe979
Further fixes, make max-dims reached more logical and enter obj arrays
seberg May 6, 2020
a552d2a
TST: Small test adjustments
seberg May 8, 2020
2cfcf56
WIP: Seems pretty good, but needs cleaning up...
seberg May 8, 2020
302813c
Smaller cleanups, better errors mainly?
seberg May 8, 2020
cec10fb
Fixup for scalar kind, and ensure OBJECT is special for assignment
seberg May 9, 2020
1eaca02
Use PyArray_Pack in a few other places
seberg May 10, 2020
3f5e4a2
Some micro-optimization tries (should probably be largely reverted)
seberg May 12, 2020
1896813
Optimize away filling all dims with -1 at the start
seberg May 12, 2020
c7e7dd9
Other smallre changes, some optimization related.
seberg May 12, 2020
60fa9b9
Small bug fixup and rebase on master
seberg May 28, 2020
e20dded
Fixups/comments for compiler warnings
seberg May 28, 2020
4e0029d
update some comments, remove outdated old code path
seberg May 28, 2020
ad31a32
Small fixups/comment changes
seberg May 29, 2020
ca09045
BUG: Make static declaration safe (may be an issue on msvc mostly)
seberg May 29, 2020
9ceeb97
Replace AdaptFlexibleDType with object and delete some datetime thing…
seberg May 30, 2020
4a04e89
Add somewhat disgusting hacks for datetime support
seberg Jun 1, 2020
08a4687
MAINT: Remove use of PyArray_GetParamsFromObject from PyArray_CopyObject
seberg Jun 3, 2020
a1ee25a
MAINT: Delete legacy dtype discovery
seberg Jun 4, 2020
1405a30
Allow returning NULL for dtype when there is no object to discover from
seberg Jun 4, 2020
a7c5a59
BUG: Smaller fixes in object-array parametric discovery
seberg Jun 10, 2020
75a728f
BUG: remove incorrect assert
seberg Jun 10, 2020
b09217c
BUG: When filling an array from the cache, store original for objects
seberg Jun 11, 2020
b28b2a1
BUG: Fix discovery for empty lists
seberg Jun 11, 2020
7a343c6
BUG: Add missing DECREF
seberg Jun 13, 2020
7d1489a
Fixups: Some smaller fixups and comments to ensure we have tests
seberg Jun 15, 2020
946edc8
BUG: Add missing error check
seberg Jun 15, 2020
002fa2f
BUG: Reorder dimension fix/check and promotion
seberg Jun 16, 2020
29f1515
BUG: Add missing cache free...
seberg Jun 16, 2020
ba0a6d0
BUG: Fixup for PyArray_Pack
seberg Jun 16, 2020
b3544a1
BUG: Fix use after free in PyArray_CopyObject
seberg Jun 16, 2020
bcd3320
BUG: Need to set the base field apparently and swap promotion
seberg Jun 16, 2020
454d785
MAINT: Use flag to indicate that dtype discovery is not necessary
seberg Jun 16, 2020
68cd028
MAINT: Fixups (some based on new tests), almost finished
seberg Jun 16, 2020
1035c3f
MAINT: Use macros/functions instead of direct slot access
seberg Jun 16, 2020
e30cbfb
MAINT: Delete PyArray_AssignFromSequence
seberg Jun 18, 2020
56c63d8
MAINT: Undo change of how 0-D array-likes are handled as scalars
seberg Jun 18, 2020
605588c
MAINT: Undo some header changes...
seberg Jun 18, 2020
4eb9cfd
MAINT: Try to clean up headers a bit
seberg Jun 18, 2020
4ac514f
TST: Add test for too-deep non-object deprecation
seberg Jun 18, 2020
8a7f0e6
MAINt: Add assert for an unreachable exception path
seberg Jun 18, 2020
7012ef7
TST: Adapt coercion-tests to the new situation
seberg Jun 19, 2020
3ccf696
DOC: Add release notes for array-coercion changes
seberg Jun 19, 2020
6ff4d48
MAINT: Remove weakref from mapping (for now) and rename
seberg Jun 24, 2020
e3f091e
Update numpy/core/src/multiarray/array_coercion.c
seberg Jun 25, 2020
4fe0ad2
MAINT: Put a hack in place to allow datetime64 -> string assignment w…
seberg Jun 25, 2020
d39953c
Update doc/release/upcoming_changes/16200.compatibility.rst
seberg Jun 25, 2020
b36750b
TST: datetime64 test_scalar_coercion does not fail anymore
seberg Jun 25, 2020
0f78129
Update doc/release/upcoming_changes/16200.compatibility.rst
mattip Jun 30, 2020
aee13e0
DOC,STY: Use bitshift intsead of powers of two and fix comments
seberg Jun 30, 2020
22ee971
TST: Add test for astype to stringlength tests
seberg Jul 8, 2020
Replace AdaptFlexibleDType with object and delete some datetime thing…
… related

This fails two tests (but behaviour is better now). The string->datetime
unit discovery for string arrays is hard-coded now, and IMO should be
deprecated at some point, if necessary we can add a specific function
for it maybe?

Unless we really want to make this pattern a first class citizen (and
not special case object). But I am against that until someone finds a
better use-case...
seberg committed Jul 8, 2020
commit 9ceeb97debefda7931ab8312d50f3cc4cb81c956
4 changes: 4 additions & 0 deletions numpy/core/src/multiarray/_datetime.h
@@ -38,6 +38,10 @@ create_datetime_dtype_with_unit(int type_num, NPY_DATETIMEUNIT unit);
NPY_NO_EXPORT PyArray_DatetimeMetaData *
get_datetime_metadata_from_dtype(PyArray_Descr *dtype);

NPY_NO_EXPORT int
find_string_array_datetime64_type(PyArrayObject *arr,
PyArray_DatetimeMetaData *meta);

/*
* Both type1 and type2 must be either NPY_DATETIME or NPY_TIMEDELTA.
* Applies the type promotion rules between the two types, returning
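
To make the role of `find_string_array_datetime64_type` more concrete, here is a minimal, self-contained sketch of unit discovery over an array of ISO 8601-style strings. It is not NumPy's implementation: `unit_t`, `unit_from_string`, and `discover_unit` are hypothetical names, the format detection is reduced to looking at the string length, and combining units by keeping the finest one seen is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical unit enum, ordered coarse to fine; UNIT_GENERIC means "unknown". */
typedef enum {
    UNIT_GENERIC = -1,
    UNIT_YEAR = 0,
    UNIT_MONTH,
    UNIT_DAY,
    UNIT_HOUR,
    UNIT_MINUTE,
    UNIT_SECOND
} unit_t;

/*
 * Guess the unit of one ISO 8601-like string from its length alone:
 * "2020" -> year, "2020-06" -> month, "2020-06-01" -> day,
 * "2020-06-01T12" -> hour, "...T12:30" -> minute, "...T12:30:05" -> second.
 * Returns UNIT_GENERIC for anything else (no real validation is done).
 */
static unit_t
unit_from_string(const char *s)
{
    assert(s != NULL);
    switch (strlen(s)) {
        case 4:  return UNIT_YEAR;
        case 7:  return UNIT_MONTH;
        case 10: return UNIT_DAY;
        case 13: return UNIT_HOUR;
        case 16: return UNIT_MINUTE;
        case 19: return UNIT_SECOND;
        default: return UNIT_GENERIC;
    }
}

/*
 * Scan all strings and keep the finest unit seen.
 * Returns 0 on success and -1 if any string is not recognized,
 * following the 0 / -1 convention used in the diff.
 */
static int
discover_unit(const char *const *strings, size_t n, unit_t *out_unit)
{
    unit_t best = UNIT_GENERIC;
    for (size_t i = 0; i < n; i++) {
        unit_t u = unit_from_string(strings[i]);
        if (u == UNIT_GENERIC) {
            return -1;
        }
        if (u > best) {
            best = u;
        }
    }
    *out_unit = best;
    return 0;
}
```

With no strings at all, this sketch reports `UNIT_GENERIC`, which is comparable to starting from `NPY_FR_GENERIC` as the diff does before inspecting the array.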
2 changes: 1 addition & 1 deletion numpy/core/src/multiarray/abstractdtypes.c
@@ -104,7 +104,7 @@ initialize_abstract_dtypes_and_map_others()
* Map str, bytes, and bool, for which we do not need abstract versions
* to the NumPy DTypes. This is done here using the `is_known_scalar_type`
* function.
* TODO: The `is_known_scalar_type` function was considered preliminary,
* TODO: The `is_known_scalar_type` function is considered preliminary,
* the same could be achieved e.g. with additional abstract DTypes.
*/
PyArray_DTypeMeta *dtype;
242 changes: 179 additions & 63 deletions numpy/core/src/multiarray/array_coercion.c
@@ -127,10 +127,16 @@ _prime_global_pytype_to_type_dict()


/**
* Add a new mapping from a python type to the DType class. This assumes
* that the DType class is guaranteed to hold on the python type (this
* assumption is guaranteed).
* This function replaces ``_typenum_fromtypeobj``.
* Add a new mapping from a python type to the DType class.
*
* This assumes that the DType class is guaranteed to hold on the
* python type (this assumption is guaranteed).
* This functionality supersedes ``_typenum_fromtypeobj``.
*
* @param DType DType to map the python type to
* @param pytype Python type to map from
* @param userdef Whether or not it is user defined. We ensure that user
* defined scalars subclass from our scalars (for now).
*/
NPY_NO_EXPORT int
_PyArray_MapPyTypeToDType(
@@ -354,7 +360,7 @@ cast_descriptor_to_fixed_dtype(
if (fixed_DType->legacy && fixed_DType->parametric) {
/* Fallback to the old AdaptFlexibleDType logic for now */
PyArray_Descr *flex_dtype = PyArray_DescrFromType(fixed_DType->type_num);
return PyArray_AdaptFlexibleDType(NULL, descr, flex_dtype);
return PyArray_AdaptFlexibleDType(descr, flex_dtype);
}

PyErr_SetString(PyExc_NotImplementedError,
@@ -694,10 +700,10 @@ handle_scalar(
PyObject *obj, int curr_dims, int *max_dims,
PyArray_Descr **out_descr, npy_intp *out_shape,
PyArray_DTypeMeta *fixed_DType, PyArray_Descr *requested_descr,
enum _dtype_discovery_flags *flags,
PyArray_DTypeMeta *DType, PyArray_Descr *descr)
enum _dtype_discovery_flags *flags, PyArray_DTypeMeta *DType)
{
/* This is a scalar, so find the descriptor */
PyArray_Descr *descr;
descr = find_scalar_descriptor(fixed_DType, DType, obj, requested_descr);
if (descr == NULL) {
return -1;
@@ -717,6 +723,148 @@ handle_scalar(
}




/**
* Return the correct descriptor given an array object and a DType class.
*
* This is identical to casting the array's descriptor/dtype to the new
* DType class
*
* @param arr The array object.
* @param DType The DType class to cast to (or NULL for convenience)
* @param out_descr The output descriptor to set. The result can be NULL
* when the array is of object dtype and has no elements.
*
* @return -1 on failure, 0 on success.
*/
static int
find_descriptor_from_array(
PyArrayObject *arr, PyArray_DTypeMeta *DType, PyArray_Descr **out_descr)
{
enum _dtype_discovery_flags flags = 0;
*out_descr = NULL;

if (NPY_UNLIKELY(DType != NULL && DType->parametric &&
PyArray_ISOBJECT(arr))) {
/*
* We have one special case, if (and only if) the input array is of
* object DType and the dtype is not fixed already but parametric.
* Then, we allow inspection of all elements, treating them as
* elements. We do this recursively, so nested 0-D arrays can work,
* but nested higher dimensional arrays will lead to an error.
*/
assert(DType->type_num != NPY_OBJECT); /* not parametric */

PyArrayIterObject *iter;
iter = (PyArrayIterObject *)PyArray_IterNew((PyObject *)arr);
if (iter == NULL) {
return -1;
}
int array_is_object = PyArray_ISOBJECT(arr);
while (iter->index < iter->size) {
/*
* TODO: We should only allow this for object arrays really,
* and it is slow for strings currently.
*/
PyObject *elem = PyArray_GETITEM(arr, iter->dataptr);
if (elem == NULL) {
elem = Py_None;
}
DType = discover_dtype_from_pyobject(elem, &flags, DType);
if (DType == (PyArray_DTypeMeta *)Py_None) {
Py_SETREF(DType, NULL);
}
int flat_max_dims = 0;
if (handle_scalar(elem, 0, &flat_max_dims, out_descr,
NULL, DType, NULL, &flags, DType) < 0) {
Py_DECREF(iter);
Py_XDECREF(DType);
return -1;
}
Py_XDECREF(DType);
PyArray_ITER_NEXT(iter);
}
Py_DECREF(iter);
}
else if (DType != NULL && NPY_UNLIKELY(DType->type_num == NPY_DATETIME) &&
PyArray_ISSTRING(arr)) {
/*
* TODO: This branch should be deprecated IMO, the workaround is
* to simply cast to the object to a string array, or we
* can create a special function for it, but I doubt it is
* necessary?
* Unless of course we actually want to support this kind of thing
* in general (not just for object dtype)...
Review comment (Member):

in my opinion this should be useful!

Reply (Member Author):

We could support it in coercion. In general, I think it's nicer to just use types (e.g. float32 -> S32, etc.) without looking at values (that also gives type safety). For coercion, looking at values can be nice I guess; within ufuncs I am not sure. It feels like it just creates complexities that are probably unnecessary.
You can always create small utility functions to do most of these things. But yes, I guess it is a plausible addition, but I am not keen on rushing it :). I could make the comment less judgmental ;)
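
The trade-off raised in this reply can be illustrated with a small sketch. All names below are hypothetical and the widths are illustrative rather than NumPy's actual cast table; only the float32 -> S32 pairing is taken from the comment itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical source types for the sketch. */
typedef enum { SRC_FLOAT32, SRC_FLOAT64 } src_type_t;

/*
 * Type-based: the string width depends only on the source type, so the
 * result dtype is known without touching any data. The SRC_FLOAT32 value
 * follows the "float32 -> S32" example; the SRC_FLOAT64 value is a
 * placeholder chosen for this sketch.
 */
static size_t
width_from_type(src_type_t type)
{
    return type == SRC_FLOAT32 ? 32 : 64;
}

/*
 * Value-based: format every element and keep the longest result.
 * This needs a full pass over the data before the dtype is known.
 */
static size_t
width_from_values(const double *values, size_t n)
{
    size_t width = 0;
    for (size_t i = 0; i < n; i++) {
        /* snprintf with a NULL buffer only reports the needed length */
        int len = snprintf(NULL, 0, "%g", values[i]);
        assert(len >= 0);
        if ((size_t)len > width) {
            width = (size_t)len;
        }
    }
    return width;
}
```

The type-based variant is constant time and gives the same answer for every array of a given type, which is the "type safety" point; the value-based variant produces tighter widths at the cost of inspecting the data.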

*/
PyArray_DatetimeMetaData meta;
meta.base = NPY_FR_GENERIC;
meta.num = 1;

if (find_string_array_datetime64_type(arr, &meta) < 0) {
return -1;
}
else {
*out_descr = create_datetime_dtype(NPY_DATETIME, &meta);
if (*out_descr == NULL) {
return -1;
}
}
}
else {
/*
* If this is not an object array figure out the dtype cast,
* or simply use the returned DType.
*/
*out_descr = cast_descriptor_to_fixed_dtype(
PyArray_DESCR(arr), DType);
if (*out_descr == NULL) {
return -1;
}
}
return 0;
}
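
The branch order in `find_descriptor_from_array` can be summarized with a stripped-down sketch. The enums and `choose_path` below are hypothetical stand-ins for the real array and DType checks, and treating only string and datetime targets as parametric is a simplification made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for NumPy's types. */
typedef enum { KIND_OBJECT, KIND_STRING, KIND_NUMERIC } array_kind_t;
typedef enum {
    TARGET_NONE,      /* no DType requested (DType == NULL in the diff) */
    TARGET_OBJECT,
    TARGET_STRING,
    TARGET_DATETIME,
    TARGET_FLOAT
} target_t;
typedef enum {
    PATH_INSPECT_ELEMENTS,
    PATH_STRING_TO_DATETIME,
    PATH_PLAIN_CAST
} path_t;

/* In this sketch only string and datetime targets count as parametric. */
static int
target_is_parametric(target_t target)
{
    return target == TARGET_STRING || target == TARGET_DATETIME;
}

/*
 * Pick the discovery path in the same order as the branches in the diff:
 * 1. parametric target + object array  -> inspect every element
 * 2. datetime target + string array    -> hard-coded unit discovery
 * 3. everything else (incl. no target) -> cast using the dtype alone
 */
static path_t
choose_path(array_kind_t kind, target_t target)
{
    if (target != TARGET_NONE && target_is_parametric(target)
            && kind == KIND_OBJECT) {
        assert(target != TARGET_OBJECT);  /* object is not parametric */
        return PATH_INSPECT_ELEMENTS;
    }
    if (target == TARGET_DATETIME && kind == KIND_STRING) {
        return PATH_STRING_TO_DATETIME;
    }
    return PATH_PLAIN_CAST;
}
```

Note that an object array with a datetime target takes the first path, so the string special case only applies to genuine string arrays.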

/**
* Given a dtype or DType object, find the correct descriptor to cast the
* array to.
*
* This function is identical to normal casting using only the dtype, however,
* it supports inspecting the elements when the array has object dtype
* (and the given datatype describes a parametric DType class).
*
* @param arr
* @param dtype A dtype instance or class.
* @return A concrete dtype instance or NULL
*/
NPY_NO_EXPORT PyArray_Descr *
PyArray_AdaptDescriptorToArray(PyArrayObject *arr, PyObject *dtype)
{
/* If the requested dtype is flexible, adapt it */
PyArray_Descr *new_dtype;
PyArray_DTypeMeta *new_DType;
int res;

res= PyArray_ExtractDTypeAndDescriptor((PyObject *)dtype,
&new_dtype, &new_DType);
if (res < 0) {
return NULL;
}
if (new_dtype == NULL) {
res = find_descriptor_from_array(arr, new_DType, &new_dtype);
if (res < 0) {
Py_DECREF(new_DType);
return NULL;
}
if (new_dtype == NULL) {
/* This is an object array but contained no elements, use default */
new_dtype = new_DType->default_descr(new_DType);
}
}
return new_dtype;
}
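
The calling convention used by `PyArray_AdaptDescriptorToArray` (an error code plus an output that may legitimately stay NULL, followed by a fallback to a default) is sketched below without any Python or NumPy dependency. `descr_t`, `discover`, `adapt`, and the default length of 1 are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: just a string length for this sketch. */
typedef struct { size_t length; } descr_t;

/* Default used when nothing could be discovered (assumed for the sketch). */
static const descr_t default_descr = {1};

/*
 * Discover a string length from the elements. Returns -1 on error
 * (a NULL element), 0 otherwise. On success *out stays NULL when there
 * were no elements to inspect, similar to the "result can be NULL"
 * contract documented for find_descriptor_from_array.
 */
static int
discover(const char *const *elems, size_t n, descr_t *storage,
         const descr_t **out)
{
    assert(storage != NULL && out != NULL);
    *out = NULL;
    for (size_t i = 0; i < n; i++) {
        if (elems[i] == NULL) {
            return -1;
        }
        size_t len = strlen(elems[i]);
        if (*out == NULL || len > storage->length) {
            storage->length = len;
            *out = storage;
        }
    }
    return 0;
}

/* Caller side: propagate errors, fall back to the default on "nothing found". */
static const descr_t *
adapt(const char *const *elems, size_t n, descr_t *storage)
{
    const descr_t *found;
    if (discover(elems, n, storage, &found) < 0) {
        return NULL;
    }
    return found != NULL ? found : &default_descr;
}
```

Keeping "error" (negative return) separate from "nothing found" (NULL output with a zero return) is what lets the caller apply the default only in the second case.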


NPY_NO_EXPORT int
PyArray_DiscoverDTypeAndShape_Recursive(
PyObject *obj, int curr_dims, int max_dims, PyArray_Descr**out_descr,
@@ -734,7 +882,6 @@ PyArray_DiscoverDTypeAndShape_Recursive(
* (which could fail and lead us to `object` dtype).
*/
PyArray_DTypeMeta *DType = NULL;
PyArray_Descr *descr = NULL;

if (NPY_UNLIKELY(*flags & DISCOVER_STRINGS_AS_SEQUENCES)) {
/*
@@ -761,7 +908,7 @@ PyArray_DiscoverDTypeAndShape_Recursive(
else {
max_dims = handle_scalar(
obj, curr_dims, &max_dims, out_descr, out_shape, fixed_DType,
requested_descr, flags, DType, descr);
requested_descr, flags, DType);
Py_DECREF(DType);
return max_dims;
}
@@ -803,60 +950,29 @@ PyArray_DiscoverDTypeAndShape_Recursive(
return max_dims;
}

if (NPY_UNLIKELY(fixed_DType != NULL && fixed_DType->parametric &&
requested_descr == NULL &&
PyArray_DESCR(arr)->type_num == NPY_OBJECT)) {
/*
* We have one special case, if (and only if) the input array is of
* object DType and the dtype is not fixed already but parametric.
* Then, we allow inspection of all elements, treating them as
* elements. We do this recursively, so nested 0-D arrays can work,
* but nested higher dimensional arrays will lead to an error.
*/
assert(fixed_DType->type_num != NPY_OBJECT);

PyArrayIterObject *iter;
iter = (PyArrayIterObject *)PyArray_IterNew((PyObject *)arr);
if (iter == NULL) {
return -1;
}
while (iter->index < iter->size) {
PyObject *elem = (*(PyObject **)(iter->dataptr));
if (elem == NULL) {
elem = Py_None;
}
DType = discover_dtype_from_pyobject(elem, flags, fixed_DType);
if (DType == (PyArray_DTypeMeta *)Py_None) {
Py_SETREF(DType, NULL);
}
int flat_max_dims = 0;
if (handle_scalar(elem, 0, &flat_max_dims, out_descr,
NULL, DType, NULL, flags, fixed_DType, NULL) < 0) {
Py_DECREF(iter);
Py_XDECREF(DType);
return -1;
}
Py_XDECREF(DType);
PyArray_ITER_NEXT(iter);
}
Py_DECREF(iter);
if (requested_descr != NULL) {
return max_dims;
}
else if (requested_descr == NULL) {
/*
* If this is not an object array figure out the dtype cast,
* or simply use the returned DType.
*/
descr = cast_descriptor_to_fixed_dtype(
PyArray_DESCR(arr), fixed_DType);
if (descr == NULL) {
return -1;
}
if (handle_promotion(out_descr, descr, requested_descr, flags) < 0) {
Py_DECREF(descr);
return -1;
}
Py_DECREF(descr);
/*
* For arrays we may not just need to cast the dtype to the user
* provided fixed_DType. If this is an object array, the elements
* may need to be inspected individually.
* Note, this finds the descriptor of the array first and only then
* promotes here (different associativity).
*/
PyArray_Descr *cast_descr;
if (find_descriptor_from_array(arr, fixed_DType, &cast_descr) < 0) {
return -1;
}
if (cast_descr == NULL) {
/* object array with no elements, no need to promote/adjust. */
return max_dims;
}
if (handle_promotion(out_descr, cast_descr, requested_descr, flags) < 0) {
Py_DECREF(cast_descr);
return -1;
}
Py_DECREF(cast_descr);
return max_dims;
}
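
The new flow for nested arrays (reduce the array to one descriptor first, then promote it once into the running result) can be sketched with a toy descriptor. `descr_t`, `promote`, and `handle_nested_array` are hypothetical, and "longer string wins" is only a stand-in for the real promotion rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for the sketch: a string length; 0 means "unset". */
typedef struct { size_t length; } descr_t;

/* Promotion for this toy descriptor: the longer string wins. */
static void
promote(descr_t *running, const descr_t *found)
{
    assert(running != NULL && found != NULL);
    if (found->length > running->length) {
        running->length = found->length;
    }
}

/*
 * Handle one nested array: its own descriptor has already been reduced
 * to a single value (array_descr), which is then promoted into the
 * running result once. array_descr == NULL stands for an empty object
 * array and leaves the running result untouched.
 */
static void
handle_nested_array(descr_t *running, const descr_t *array_descr)
{
    if (array_descr == NULL) {
        return;  /* nothing to promote or adjust */
    }
    promote(running, array_descr);
}
```

With a maximum-based rule like this one the grouping does not change the result; the "different associativity" remark in the diff matters for promotion rules where it can.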

@@ -875,7 +991,7 @@ PyArray_DiscoverDTypeAndShape_Recursive(
PyErr_Clear();
max_dims = handle_scalar(
obj, curr_dims, &max_dims, out_descr, out_shape, fixed_DType,
requested_descr, flags, NULL, descr);
requested_descr, flags, NULL);
if (is_sequence) {
/* Flag as ragged or too deep array */
*flags |= FOUND_RAGGED_ARRAY;
@@ -898,7 +1014,7 @@ PyArray_DiscoverDTypeAndShape_Recursive(
PyErr_Clear();
max_dims = handle_scalar(
obj, curr_dims, &max_dims, out_descr, out_shape, fixed_DType,
requested_descr, flags, NULL, descr);
requested_descr, flags, NULL);
return max_dims;
}
return -1;
3 changes: 3 additions & 0 deletions numpy/core/src/multiarray/array_coercion.h
@@ -23,6 +23,9 @@ _PyArray_MapPyTypeToDType(
NPY_NO_EXPORT int
PyArray_Pack(PyArray_Descr *descr, char *item, PyObject *value);

NPY_NO_EXPORT PyArray_Descr *
PyArray_AdaptDescriptorToArray(PyArrayObject *arr, PyObject *dtype);

NPY_NO_EXPORT int
PyArray_DiscoverDTypeAndShape(
PyObject *obj, int max_dims,