ENH: Rewrite of array-coercion to support new dtypes by seberg · Pull Request #16200 · numpy/numpy · GitHub

ENH: Rewrite of array-coercion to support new dtypes #16200


Merged (57 commits) on Jul 9, 2020

Commits
b5dc1ed
WIP: Rework array coercion
seberg Mar 7, 2020
f5df08c
WIP: Further steps toward new coercion, start with making discovery p…
seberg Mar 16, 2020
63bb417
Close to the first working setup
seberg Mar 27, 2020
28c8b39
WIP: Some cleanup/changes?
seberg Mar 30, 2020
b204379
WIP: Make things work by using AdaptFlexibleDType (without obj) for now
seberg May 5, 2020
5bd5847
Use new mechanism for np.asarray, and hopefully get void right, har
seberg May 5, 2020
9e03d8d
First version mainly working
seberg May 6, 2020
efbe979
Further fixes, make max-dims reached more logical and enter obj arrays
seberg May 6, 2020
a552d2a
TST: Small test adjustments
seberg May 8, 2020
2cfcf56
WIP: Seems pretty good, but needs cleaning up...
seberg May 8, 2020
302813c
Smaller cleanups, better errors mainly?
seberg May 8, 2020
cec10fb
Fixup for scalar kind, and ensure OBJECT is special for assignment
seberg May 9, 2020
1eaca02
Use PyArray_Pack in a few other places
seberg May 10, 2020
3f5e4a2
Some micro-optimization tries (should probably be largely reverted)
seberg May 12, 2020
1896813
Optimize away filling all dims with -1 at the start
seberg May 12, 2020
c7e7dd9
Other smaller changes, some optimization related.
seberg May 12, 2020
60fa9b9
Small bug fixup and rebase on master
seberg May 28, 2020
e20dded
Fixups/comments for compiler warnings
seberg May 28, 2020
4e0029d
update some comments, remove outdated old code path
seberg May 28, 2020
ad31a32
Small fixups/comment changes
seberg May 29, 2020
ca09045
BUG: Make static declaration safe (may be an issue on msvc mostly)
seberg May 29, 2020
9ceeb97
Replace AdaptFlexibleDType with object and delete some datetime thing…
seberg May 30, 2020
4a04e89
Add somewhat disgusting hacks for datetime support
seberg Jun 1, 2020
08a4687
MAINT: Remove use of PyArray_GetParamsFromObject from PyArray_CopyObject
seberg Jun 3, 2020
a1ee25a
MAINT: Delete legacy dtype discovery
seberg Jun 4, 2020
1405a30
Allow returning NULL for dtype when there is no object to discover from
seberg Jun 4, 2020
a7c5a59
BUG: Smaller fixes in object-array parametric discovery
seberg Jun 10, 2020
75a728f
BUG: remove incorrect assert
seberg Jun 10, 2020
b09217c
BUG: When filling an array from the cache, store original for objects
seberg Jun 11, 2020
b28b2a1
BUG: Fix discovery for empty lists
seberg Jun 11, 2020
7a343c6
BUG: Add missing DECREF
seberg Jun 13, 2020
7d1489a
Fixups: Some smaller fixups and comments to ensure we have tests
seberg Jun 15, 2020
946edc8
BUG: Add missing error check
seberg Jun 15, 2020
002fa2f
BUG: Reorder dimension fix/check and promotion
seberg Jun 16, 2020
29f1515
BUG: Add missing cache free...
seberg Jun 16, 2020
ba0a6d0
BUG: Fixup for PyArray_Pack
seberg Jun 16, 2020
b3544a1
BUG: Fix use after free in PyArray_CopyObject
seberg Jun 16, 2020
bcd3320
BUG: Need to set the base field apparently and swap promotion
seberg Jun 16, 2020
454d785
MAINT: Use flag to indicate that dtype discovery is not necessary
seberg Jun 16, 2020
68cd028
MAINT: Fixups (some based on new tests), almost finished
seberg Jun 16, 2020
1035c3f
MAINT: Use macros/functions instead of direct slot access
seberg Jun 16, 2020
e30cbfb
MAINT: Delete PyArray_AssignFromSequence
seberg Jun 18, 2020
56c63d8
MAINT: Undo change of how 0-D array-likes are handled as scalars
seberg Jun 18, 2020
605588c
MAINT: Undo some header changes...
seberg Jun 18, 2020
4eb9cfd
MAINT: Try to clean up headers a bit
seberg Jun 18, 2020
4ac514f
TST: Add test for too-deep non-object deprecation
seberg Jun 18, 2020
8a7f0e6
MAINT: Add assert for an unreachable exception path
seberg Jun 18, 2020
7012ef7
TST: Adapt coercion-tests to the new situation
seberg Jun 19, 2020
3ccf696
DOC: Add release notes for array-coercion changes
seberg Jun 19, 2020
6ff4d48
MAINT: Remove weakref from mapping (for now) and rename
seberg Jun 24, 2020
e3f091e
Update numpy/core/src/multiarray/array_coercion.c
seberg Jun 25, 2020
4fe0ad2
MAINT: Put a hack in place to allow datetime64 -> string assignment w…
seberg Jun 25, 2020
d39953c
Update doc/release/upcoming_changes/16200.compatibility.rst
seberg Jun 25, 2020
b36750b
TST: datetime64 test_scalar_coercion does not fail anymore
seberg Jun 25, 2020
0f78129
Update doc/release/upcoming_changes/16200.compatibility.rst
mattip Jun 30, 2020
aee13e0
DOC,STY: Use bitshift instead of powers of two and fix comments
seberg Jun 30, 2020
22ee971
TST: Add test for astype to stringlength tests
seberg Jul 8, 2020
MAINT: Use flag to indicate that dtype discovery is not necessary
If the user already set the descriptor, within the recursive call
`*out_descr` is simply set correctly ahead of time.
seberg committed Jul 8, 2020
commit 454d785edb05b0a35d032c568d81c6eaa7104c34
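
The idea in the commit message above can be pictured with a small standalone model. This is only a sketch and not NumPy code: `toy_kind`, `toy_obj`, `toy_discover` and `TOY_DESCRIPTOR_WAS_SET` are hypothetical stand-ins, and the "promotion" rule (keep the larger kind) is an assumption made purely for illustration. It shows the output being pre-filled once at the entry point and a flag telling the recursion to skip promotion, instead of threading the requested value through every call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for descriptors; none of these names are NumPy API. */
typedef enum { KIND_NONE = 0, KIND_INT = 1, KIND_FLOAT = 2 } toy_kind;

enum toy_flags {
    TOY_DESCRIPTOR_WAS_SET = 1 << 0,
};

/* A nested "sequence": a leaf carries a kind, a node carries children. */
typedef struct toy_obj {
    toy_kind kind;                  /* only meaningful when n_children == 0 */
    size_t n_children;
    const struct toy_obj *children;
} toy_obj;

/* Toy promotion: keep the larger kind. Must never run once the kind is fixed. */
static void
toy_promote(toy_kind *out_kind, toy_kind found, int flags)
{
    assert(!(flags & TOY_DESCRIPTOR_WAS_SET));
    if (found > *out_kind) {
        *out_kind = found;
    }
}

static void
toy_discover_recursive(const toy_obj *obj, toy_kind *out_kind, int flags)
{
    if (obj->n_children == 0) {
        if (flags & TOY_DESCRIPTOR_WAS_SET) {
            /* *out_kind was set ahead of time, nothing to promote */
            return;
        }
        toy_promote(out_kind, obj->kind, flags);
        return;
    }
    for (size_t i = 0; i < obj->n_children; i++) {
        toy_discover_recursive(&obj->children[i], out_kind, flags);
    }
}

/* Entry point: requested == KIND_NONE means "discover the kind". */
static toy_kind
toy_discover(const toy_obj *obj, toy_kind requested)
{
    toy_kind out = KIND_NONE;
    int flags = 0;

    if (requested != KIND_NONE) {
        /* The output must be the input; the flag records that fact. */
        out = requested;
        flags |= TOY_DESCRIPTOR_WAS_SET;
    }
    toy_discover_recursive(obj, &out, flags);
    /* Mirrors the final assert in the diff: a requested kind is unchanged. */
    assert(requested == KIND_NONE || out == requested);
    return out;
}
```

In this model, calling `toy_discover` on a node holding an int leaf and a float leaf yields `KIND_FLOAT` when nothing is requested, and yields the requested kind unchanged otherwise. The real code passes the flags by pointer because the recursion also sets flags; the toy passes them by value for brevity.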
82 changes: 39 additions & 43 deletions numpy/core/src/multiarray/array_coercion.c
@@ -91,6 +91,7 @@ enum _dtype_discovery_flags {
     DISCOVER_STRINGS_AS_SEQUENCES = 8,
     DISCOVER_TUPLES_AS_ELEMENTS = 16,
     MAX_DIMS_WAS_REACHED = 32,
+    DESCRIPTOR_WAS_SET = 64,
 };


@@ -387,22 +388,14 @@ cast_descriptor_to_fixed_dtype(
  *        known scalar. Can be `NULL` indicating no known type.
  * @param obj The Python scalar object. At the time of calling this function
  *        it must be known that `obj` should represent a scalar.
- * @param requested_descr The requested descriptor or NULL, if not NULL
- *        it is returned unmodified.
  */
 static NPY_INLINE PyArray_Descr *
 find_scalar_descriptor(
         PyArray_DTypeMeta *fixed_DType, PyArray_DTypeMeta *DType,
-        PyObject *obj, PyArray_Descr *requested_descr)
+        PyObject *obj)
 {
     PyArray_Descr *descr;

-    if (requested_descr != NULL) {
-        /* We simply assume that this is correct and continue. */
-        Py_INCREF(requested_descr);
-        return requested_descr;
-    }
-
     if (DType == NULL && fixed_DType == NULL) {
         /* No known DType and no fixed one means we go to object. */
         return PyArray_DescrFromType(NPY_OBJECT);
@@ -668,17 +661,12 @@ npy_free_coercion_cache(coercion_cache_obj *next) {
  * @param flags dtype discover flags to signal failed promotion.
  * @return -1 on error, 0 on success.
  */
-static int
+static NPY_INLINE int
 handle_promotion(PyArray_Descr **out_descr, PyArray_Descr *descr,
-        PyArray_Descr *requested_descr, enum _dtype_discovery_flags *flags)
+        enum _dtype_discovery_flags *flags)
 {
-    if (requested_descr != NULL) {
-        /*
-         * If the user fixed a descriptor, do not promote, this will just
-         * error during assignment if necessary.
-         */
-        return 0;
-    }
+    assert(!(*flags & DESCRIPTOR_WAS_SET));
+
     if (*out_descr == NULL) {
         Py_INCREF(descr);
         *out_descr = descr;
@@ -705,17 +693,14 @@ handle_promotion(PyArray_Descr **out_descr, PyArray_Descr *descr,
  * @param out_shape The discovered output shape, will be filled
  * @param coercion_cache The coercion cache object to use.
  * @param DType the DType class that should be used, or NULL, if not provided.
- * @param requested_descr The dtype instance passed in by the user, this is
- *        passed to array-likes, and otherwise prevents any form of promotion
- *        (to avoid errors).
  * @param flags used signal that this is a ragged array, used internally and
  *        can be expanded if necessary.
  */
 static NPY_INLINE int
 handle_scalar(
         PyObject *obj, int curr_dims, int *max_dims,
         PyArray_Descr **out_descr, npy_intp *out_shape,
-        PyArray_DTypeMeta *fixed_DType, PyArray_Descr *requested_descr,
+        PyArray_DTypeMeta *fixed_DType,
         enum _dtype_discovery_flags *flags, PyArray_DTypeMeta *DType)
 {
     PyArray_Descr *descr;
@@ -725,12 +710,16 @@ handle_scalar(
         *flags |= FOUND_RAGGED_ARRAY;
         return *max_dims;
     }
+    if (*flags & DESCRIPTOR_WAS_SET) {
+        /* no need to do any promotion */
+        return *max_dims;
+    }
     /* This is a scalar, so find the descriptor */
-    descr = find_scalar_descriptor(fixed_DType, DType, obj, requested_descr);
+    descr = find_scalar_descriptor(fixed_DType, DType, obj);
     if (descr == NULL) {
         return -1;
     }
-    if (handle_promotion(out_descr, descr, requested_descr, flags) < 0) {
+    if (handle_promotion(out_descr, descr, flags) < 0) {
         Py_DECREF(descr);
         return -1;
     }
@@ -797,7 +786,7 @@ find_descriptor_from_array(
             }
             int flat_max_dims = 0;
             if (handle_scalar(elem, 0, &flat_max_dims, out_descr,
-                    NULL, DType, NULL, &flags, item_DType) < 0) {
+                    NULL, DType, &flags, item_DType) < 0) {
                 Py_DECREF(iter);
                 Py_DECREF(elem);
                 Py_XDECREF(item_DType);
@@ -893,8 +882,7 @@ PyArray_DiscoverDTypeAndShape_Recursive(
         PyObject *obj, int curr_dims, int max_dims, PyArray_Descr**out_descr,
         npy_intp out_shape[NPY_MAXDIMS],
         coercion_cache_obj ***coercion_cache_tail_ptr,
Member

I need to take a deeper look here, but I couldn't understand the purpose of coercion_cache. Having some comments about coercion cache here would be great!

Member Author

Ah OK, the cache is consumed by PyArray_AssignFromCache; I should probably just reference that function, since it is the only consumer. There are a few cases where we do not actually need the cache. Allowing NULL to be passed in for the cache, to indicate that it should not be filled, could be an improvement. It should be very simple (also later), but will need a few changes to get the refcounting right.

Member Author

Added a comment to the function with some details (and expanded the details in PyArray_DiscoverDTypeAndShape). Admittedly, you probably have to read both; let me know if you have thoughts on more explanations!
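
For readers following the thread above, here is a rough standalone model of a cache that is filled during a discovery pass and consumed by a single assignment pass. It is a sketch under assumptions, not the NumPy implementation: `toy_cache`, `toy_convert`, `toy_discover` and `toy_assign_from_cache` are hypothetical names, the cache is assumed to hold results of a conversion worth doing only once, and the `tail == NULL` branch models the "pass NULL to skip filling" idea floated in the reply.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified cache entry; not the NumPy struct. */
typedef struct toy_cache {
    int converted;              /* result of a conversion done in discovery */
    struct toy_cache *next;
} toy_cache;

/* Stand-in for a conversion we only want to perform once per item. */
static int
toy_convert(int raw)
{
    return raw * 10;
}

/*
 * Discovery pass: convert every item and append the result through the
 * tail pointer. Passing tail == NULL skips filling the cache.
 * Returns the number of items, or -1 if an allocation fails.
 */
static int
toy_discover(const int *raw, int n, toy_cache ***tail)
{
    for (int i = 0; i < n; i++) {
        int value = toy_convert(raw[i]);
        if (tail == NULL) {
            continue;           /* caller does not need the cache */
        }
        toy_cache *entry = malloc(sizeof(*entry));
        if (entry == NULL) {
            return -1;          /* caller frees what was already linked */
        }
        entry->converted = value;
        entry->next = NULL;
        **tail = entry;         /* link the entry at the current tail */
        *tail = &entry->next;   /* advance the tail pointer */
    }
    return n;
}

/* Assignment pass: the only consumer; pops entries in discovery order. */
static void
toy_assign_from_cache(int *out, int n, toy_cache **cache)
{
    for (int i = 0; i < n; i++) {
        toy_cache *entry = *cache;
        assert(entry != NULL);
        out[i] = entry->converted;
        *cache = entry->next;
        free(entry);
    }
}

/* Free whatever is left, e.g. after an error part-way through discovery. */
static void
toy_free_cache(toy_cache *next)
{
    while (next != NULL) {
        toy_cache *current = next;
        next = current->next;
        free(current);
    }
}
```

A caller would start with `toy_cache *head = NULL; toy_cache **tail = &head;`, pass `&tail` to `toy_discover`, and then hand `&head` to `toy_assign_from_cache`. The pointer-to-tail-pointer argument is chosen to resemble the `coercion_cache_obj ***coercion_cache_tail_ptr` parameter visible in the diff, which allows appending without walking the list.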

-        PyArray_DTypeMeta *fixed_DType, PyArray_Descr *requested_descr,
-        enum _dtype_discovery_flags *flags)
+        PyArray_DTypeMeta *fixed_DType, enum _dtype_discovery_flags *flags)
 {
     PyArrayObject *arr = NULL;
     PyObject *seq;
@@ -931,7 +919,7 @@ PyArray_DiscoverDTypeAndShape_Recursive(
     else {
         max_dims = handle_scalar(
                 obj, curr_dims, &max_dims, out_descr, out_shape, fixed_DType,
-                requested_descr, flags, DType);
+                flags, DType);
         Py_DECREF(DType);
         return max_dims;
     }
@@ -946,6 +934,11 @@ PyArray_DiscoverDTypeAndShape_Recursive(
         Py_INCREF(arr);
     }
     else {
+        PyArray_Descr *requested_descr = NULL;
+        if (*flags & DESCRIPTOR_WAS_SET) {
+            /* __array__ may be passed the requested descriptor if provided */
+            requested_descr = *out_descr;
+        }
         arr = (PyArrayObject *)_array_from_array_like(obj,
                 requested_descr, 0, NULL);
         if (arr == NULL) {
@@ -981,7 +974,7 @@ PyArray_DiscoverDTypeAndShape_Recursive(
             return max_dims;
         }

-        if (requested_descr != NULL) {
+        if (*flags & DESCRIPTOR_WAS_SET) {
             return max_dims;
         }
         /*
@@ -999,7 +992,7 @@ PyArray_DiscoverDTypeAndShape_Recursive(
             /* object array with no elements, no need to promote/adjust. */
             return max_dims;
         }
-        if (handle_promotion(out_descr, cast_descr, requested_descr, flags) < 0) {
+        if (handle_promotion(out_descr, cast_descr, flags) < 0) {
             Py_DECREF(cast_descr);
             return -1;
         }
@@ -1022,7 +1015,7 @@ PyArray_DiscoverDTypeAndShape_Recursive(
         PyErr_Clear();
         max_dims = handle_scalar(
                 obj, curr_dims, &max_dims, out_descr, out_shape, fixed_DType,
-                requested_descr, flags, NULL);
+                flags, NULL);
         if (is_sequence) {
             /* Flag as ragged or too deep array */
             // TODO: Add test exercising this path (may need to add to cache)
@@ -1046,7 +1039,7 @@ PyArray_DiscoverDTypeAndShape_Recursive(
             PyErr_Clear();
             max_dims = handle_scalar(
                     obj, curr_dims, &max_dims, out_descr, out_shape, fixed_DType,
-                    requested_descr, flags, NULL);
+                    flags, NULL);
             return max_dims;
         }
         return -1;
@@ -1076,7 +1069,7 @@ PyArray_DiscoverDTypeAndShape_Recursive(
         max_dims = PyArray_DiscoverDTypeAndShape_Recursive(
                 objects[i], curr_dims + 1, max_dims,
                 out_descr, out_shape, coercion_cache_tail_ptr, fixed_DType,
-                requested_descr, flags);
+                flags);

         if (max_dims < 0) {
             return -1;
@@ -1130,26 +1123,30 @@ PyArray_DiscoverDTypeAndShape(
 {
     coercion_cache_obj **coercion_cache_head = coercion_cache;
     *coercion_cache = NULL;
+    enum _dtype_discovery_flags flags = 0;
+
+    /*
+     * Support a passed in descriptor (but only if nothing was specified).
+     */
+    assert(*out_descr == NULL || fixed_DType == NULL);
     /* Validate input of requested descriptor and DType */
     if (fixed_DType != NULL) {
         assert(PyObject_TypeCheck(
                 (PyObject *)fixed_DType, (PyTypeObject *)&PyArrayDTypeMeta_Type));
     }

     if (requested_descr != NULL) {
         assert(fixed_DType == NPY_DTYPE(requested_descr));
+        /* The output descriptor must be the input. */
+        Py_INCREF(requested_descr);
+        *out_descr = requested_descr;
+        flags |= DESCRIPTOR_WAS_SET;
     }
-    /*
-     * For easier support of legacy behaviour we support a passed in output
-     * when no descriptor is already defined.
-     */
-    assert(*out_descr == NULL || fixed_DType == NULL);

     /*
      * Call the recursive function, the setup for this may need expanding
      * to handle caching better.
      */
-    enum _dtype_discovery_flags flags = 0;

     /* Legacy discovery flags */
     if (requested_descr != NULL) {
@@ -1167,7 +1164,7 @@ PyArray_DiscoverDTypeAndShape(

     int ndim = PyArray_DiscoverDTypeAndShape_Recursive(
             obj, 0, max_dims, out_descr, out_shape, &coercion_cache,
-            fixed_DType, requested_descr, &flags);
+            fixed_DType, &flags);
     if (ndim < 0) {
         goto fail;
     }
@@ -1266,9 +1263,8 @@ PyArray_DiscoverDTypeAndShape(
     /* We could check here for max-ndims being reached as well */

     if (requested_descr != NULL) {
-        /* The user had given a specific one, make sure it is the output one */
-        Py_INCREF(requested_descr);
-        Py_XSETREF(*out_descr, requested_descr);
+        /* descriptor was provided, we did not accidentally change it */
+        assert(*out_descr == requested_descr);
     }
     else if (NPY_UNLIKELY(*out_descr == NULL)) {
         /*