ENH: Rewrite of array-coercion to support new dtypes by seberg · Pull Request #16200 · numpy/numpy · GitHub

ENH: Rewrite of array-coercion to support new dtypes #16200


Merged: 57 commits merged on Jul 9, 2020
Changes from 1 commit
57 commits
b5dc1ed
WIP: Rework array coercion
seberg Mar 7, 2020
f5df08c
WIP: Further steps toward new coercion, start with making discovery p…
seberg Mar 16, 2020
63bb417
Close to the first working setup
seberg Mar 27, 2020
28c8b39
WIP: Some cleanup/changes?
seberg Mar 30, 2020
b204379
WIP: Make things work by using AdaptFlexibleDType (without obj) for now
seberg May 5, 2020
5bd5847
Use new mechanism for np.asarray, and hopefully get void right, har
seberg May 5, 2020
9e03d8d
First version mainly working
seberg May 6, 2020
efbe979
Further fixes, make max-dims reached more logical and enter obj arrays
seberg May 6, 2020
a552d2a
TST: Small test adjustments
seberg May 8, 2020
2cfcf56
WIP: Seems pretty good, but needs cleaning up...
seberg May 8, 2020
302813c
Smaller cleanups, better errors mainly?
seberg May 8, 2020
cec10fb
Fixup for scalar kind, and ensure OBJECT is special for assignment
seberg May 9, 2020
1eaca02
Use PyArray_Pack in a few other places
seberg May 10, 2020
3f5e4a2
Some micro-optimization tries (should probably be largely reverted)
seberg May 12, 2020
1896813
Optimize away filling all dims with -1 at the start
seberg May 12, 2020
c7e7dd9
Other smaller changes, some optimization related.
seberg May 12, 2020
60fa9b9
Small bug fixup and rebase on master
seberg May 28, 2020
e20dded
Fixups/comments for compiler warnings
seberg May 28, 2020
4e0029d
update some comments, remove outdated old code path
seberg May 28, 2020
ad31a32
Small fixups/comment changes
seberg May 29, 2020
ca09045
BUG: Make static declaration safe (may be an issue on msvc mostly)
seberg May 29, 2020
9ceeb97
Replace AdaptFlexibleDType with object and delete some datetime thing…
seberg May 30, 2020
4a04e89
Add somewhat disgusting hacks for datetime support
seberg Jun 1, 2020
08a4687
MAINT: Remove use of PyArray_GetParamsFromObject from PyArray_CopyObject
seberg Jun 3, 2020
a1ee25a
MAINT: Delete legacy dtype discovery
seberg Jun 4, 2020
1405a30
Allow returning NULL for dtype when there is no object to discover from
seberg Jun 4, 2020
a7c5a59
BUG: Smaller fixes in object-array parametric discovery
seberg Jun 10, 2020
75a728f
BUG: remove incorrect assert
seberg Jun 10, 2020
b09217c
BUG: When filling an array from the cache, store original for objects
seberg Jun 11, 2020
b28b2a1
BUG: Fix discovery for empty lists
seberg Jun 11, 2020
7a343c6
BUG: Add missing DECREF
seberg Jun 13, 2020
7d1489a
Fixups: Some smaller fixups and comments to ensure we have tests
seberg Jun 15, 2020
946edc8
BUG: Add missing error check
seberg Jun 15, 2020
002fa2f
BUG: Reorder dimension fix/check and promotion
seberg Jun 16, 2020
29f1515
BUG: Add missing cache free...
seberg Jun 16, 2020
ba0a6d0
BUG: Fixup for PyArray_Pack
seberg Jun 16, 2020
b3544a1
BUG: Fix use after free in PyArray_CopyObject
seberg Jun 16, 2020
bcd3320
BUG: Need to set the base field apparently and swap promotion
seberg Jun 16, 2020
454d785
MAINT: Use flag to indicate that dtype discovery is not necessary
seberg Jun 16, 2020
68cd028
MAINT: Fixups (some based on new tests), almost finished
seberg Jun 16, 2020
1035c3f
MAINT: Use macros/functions instead of direct slot access
seberg Jun 16, 2020
e30cbfb
MAINT: Delete PyArray_AssignFromSequence
seberg Jun 18, 2020
56c63d8
MAINT: Undo change of how 0-D array-likes are handled as scalars
seberg Jun 18, 2020
605588c
MAINT: Undo some header changes...
seberg Jun 18, 2020
4eb9cfd
MAINT: Try to clean up headers a bit
seberg Jun 18, 2020
4ac514f
TST: Add test for too-deep non-object deprecation
seberg Jun 18, 2020
8a7f0e6
MAINT: Add assert for an unreachable exception path
seberg Jun 18, 2020
7012ef7
TST: Adapt coercion-tests to the new situation
seberg Jun 19, 2020
3ccf696
DOC: Add release notes for array-coercion changes
seberg Jun 19, 2020
6ff4d48
MAINT: Remove weakref from mapping (for now) and rename
seberg Jun 24, 2020
e3f091e
Update numpy/core/src/multiarray/array_coercion.c
seberg Jun 25, 2020
4fe0ad2
MAINT: Put a hack in place to allow datetime64 -> string assignment w…
seberg Jun 25, 2020
d39953c
Update doc/release/upcoming_changes/16200.compatibility.rst
seberg Jun 25, 2020
b36750b
TST: datetime64 test_scalar_coercion does not fail anymore
seberg Jun 25, 2020
0f78129
Update doc/release/upcoming_changes/16200.compatibility.rst
mattip Jun 30, 2020
aee13e0
DOC,STY: Use bitshift instead of powers of two and fix comments
seberg Jun 30, 2020
22ee971
TST: Add test for astype to stringlength tests
seberg Jul 8, 2020
Fixups/comments for compiler warnings
seberg committed Jul 8, 2020
commit e20dded7c3bf28c5767c645f074920d8017eeba9
5 changes: 5 additions & 0 deletions numpy/core/include/numpy/ndarraytypes.h
@@ -1547,6 +1547,11 @@ PyArray_GETITEM(const PyArrayObject *arr, const char *itemptr)
         (void *)itemptr, (PyArrayObject *)arr);
 }
 
+/*
+ * SETITEM should only be used if it is known that the value is a scalar
+ * and of a type understood by the arrays dtype.
+ * Use `PyArray_Pack` if the value may be of a different dtype.
+ */
 static NPY_INLINE int
 PyArray_SETITEM(PyArrayObject *arr, char *itemptr, PyObject *v)
 {
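The rule in the new `ndarraytypes.h` comment can be modeled in a few lines of Python. The `PyArray_Pack` doc comment further down uses float128 to complex128 as its example, so this model uses the same pair. This is a hypothetical sketch, not NumPy code: `ToyDType`, `ToyScalar`, `CASTS` and `toy_pack` are invented for illustration. The real C function also writes into raw array memory through a descriptor instead of returning a value.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class ToyDType:
    """Hypothetical stand-in for a NumPy DType (not a NumPy API)."""
    name: str
    known_scalar_types: Tuple[type, ...]  # types its setitem handles directly
    setitem: Callable[[Any], Any]         # converts a known scalar to storage


@dataclass
class ToyScalar:
    """Hypothetical scalar belonging to some other dtype, e.g. a float128."""
    value: float
    dtype_name: str


# Hypothetical cast table: (source dtype, target dtype) -> conversion function.
CASTS = {("float128", "complex128"): complex}


def toy_pack(descr: ToyDType, value: Any) -> Any:
    """Sketch of the PyArray_Pack idea: only trust setitem for known types."""
    if isinstance(value, descr.known_scalar_types):
        # Known Python scalar: the dtype's own setitem is safe to use.
        return descr.setitem(value)
    if isinstance(value, ToyScalar):
        # Unknown scalar: start from the value's own dtype, then cast.
        cast = CASTS.get((value.dtype_name, descr.name))
        if cast is None:
            raise TypeError(f"no cast from {value.dtype_name} to {descr.name}")
        return cast(value.value)
    raise TypeError(f"cannot pack {type(value).__name__} into {descr.name}")


complex128 = ToyDType("complex128", (int, float, complex), complex)

print(toy_pack(complex128, 1.5))                         # (1.5+0j), setitem path
print(toy_pack(complex128, ToyScalar(2.0, "float128")))  # (2+0j), cast path
```

The design point being modeled is that setitem is only a fast path for types a dtype explicitly understands. Everything else is routed through a cast, so the result cannot depend on how a particular setitem happens to interpret a foreign object.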
2 changes: 1 addition & 1 deletion numpy/core/src/multiarray/abstractdtypes.c
@@ -34,7 +34,7 @@ discover_descriptor_from_pyint(
     }
 
     unsigned long long uvalue = PyLong_AsUnsignedLongLong(obj);
-    if (error_converting(uvalue)){
+    if (uvalue == (unsigned long long)-1 && PyErr_Occurred()){
         PyErr_Clear();
     }
     else {
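In the `abstractdtypes.c` hunk, `error_converting(uvalue)` is replaced by an explicit comparison. `PyLong_AsUnsignedLongLong` reports failure by returning `(unsigned long long)-1` with an exception set. That bit pattern is also the valid value 2**64 - 1 (assuming a 64-bit `unsigned long long`), so only the exception check can tell an error apart from a legitimate result. Spelling out the cast likely also avoids a signed/unsigned comparison warning, which would fit the commit title, although the source does not say so. The model below is hypothetical: `fake_as_ull` is not a CPython function, and `ctypes` is used only to show the wrap-around.

```python
import ctypes

ULLONG_MAX = 2**64 - 1

# (unsigned long long)-1 wraps around to the largest 64-bit unsigned value.
ERROR_SENTINEL = ctypes.c_ulonglong(-1).value


def fake_as_ull(obj):
    """Hypothetical model of PyLong_AsUnsignedLongLong: returns (value, error_set)."""
    if not isinstance(obj, int) or obj < 0 or obj > ULLONG_MAX:
        return ERROR_SENTINEL, True  # failure: sentinel plus a pending exception
    return obj, False


def conversion_failed(value, error_set):
    # Mirrors `uvalue == (unsigned long long)-1 && PyErr_Occurred()`.
    return value == ERROR_SENTINEL and error_set


print(ERROR_SENTINEL == ULLONG_MAX)                 # True: the sentinel is a valid value
print(conversion_failed(*fake_as_ull(ULLONG_MAX)))  # False: no exception was set
print(conversion_failed(*fake_as_ull(2**64)))       # True: overflow
```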
19 changes: 17 additions & 2 deletions numpy/core/src/multiarray/array_coercion.c
@@ -451,13 +451,28 @@ find_scalar_descriptor(


/**
* Assign a single element in an array from a python value
* Assign a single element in an array from a python value.
*
* The dtypes SETITEM should only be trusted to generally do the right
* thing if something is known to be a scalar *and* is of a python type known
* to the DType (which should include all basic Python math types), but in
* general a cast may be necessary.
* This function handles the cast, which is for example hit when assigning
* a float128 to complex128.
*
* At this time, this function does not support arrays (historically we
* mainly supported arrays through `__float__()`, etc. Such support should
* possibly be added (although in some cases we know that the input is not
* an array).
*
* @param descr
* @param item
* @param value
* @return 0 on success -1 on failure.
*/
/*
* TODO: This function should possibly be public API.
*/
NPY_NO_EXPORT int
PyArray_Pack(PyArray_Descr *descr, char *item, PyObject *value)
Member

I am quite sure this is not a very important use case, but it doesn't look like it handles cases like

x[0] = np.datetime64("2020-01-01") 

where x is a string array.

Member Author

Hmm, can you elaborate on this (or maybe I just need to step through the code)? With these changes, this case should hit this code. If it does not, I missed a place where dtype->setitem should be replaced with PyArray_Pack(), since that is now the defined way to put a scalar into an array (I find it pretty funny that we never had a truly correct way to do it).

I left out a few cases on purpose because they seemed to be either corner cases or to have well-defined input, but this should not be one of them.

Member

After this code change:

>>> x = np.array(["qwdqwdqwd", "wqdqwd"])
>>> x[0] = np.datetime64("2020-01-01")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: The string provided for NumPy ISO datetime formatting was too short, with length 9

On NumPy 1.18.5:

>>> import numpy as np
>>> x = np.array(["qwdqwdqwd", "wqdqwd"])
>>> x[0] = np.datetime64("2020-01-01")
>>> x
array(['2020-01-0', 'wqdqwd'], dtype='<U9')

Member

Ah, I didn't really pay attention to the error message. I guess this is considered one of the bugs fixed by this PR?

Member Author

The result is now consistent with `np.datetime64("2020-01-01").astype("U5")`; whether the casting behaviour is ideal is another question...

I guess this is the reverse of the float64(NaN) case, though, because it is actually stricter now, hmmm. I could fix that by defining that a datetime64 is a type known to string (and thus handled by string->setitem, which calls str(...) itself). Or just do that for all NumPy scalars -> string conversions, since at least the `__str__` dunder is pretty well defined...

Member Author

Oh, and I have to relax it for np.array(np.datetime64("2020-01-01"), "U9") as well...

Member

> The result is now consistent with `np.datetime64("2020-01-01").astype("U5")`; whether the casting behaviour is ideal is another question...

Ah, got it.

> I could fix that by defining that a datetime64 is a type known to string (and thus handled by string->setitem, which calls str(...) itself).

Sorry, I'm not familiar with all the code here. How will this change look on the PyArray_Pack side? Currently it calls DATETIME_setitem in this line:

      if (tmp_descr->f->setitem(value, data, &arr_fields) < 0) {

Will this change somehow?

> Or just do that for all NumPy scalars -> string conversions, since at least the `__str__` dunder is pretty well defined...

This sounds reasonable to me. So np.datetime64("2020-01-01").astype("U5") will also not error now, right? I'm not sure about backward compatibility and the deprecation cycle, though, since this is a behavior change.

Member Author
seberg, Jun 25, 2020

I just pushed that change. I am starting to be unwilling to squabble about whether or not this is small enough to just document as a change. Normally I would say deprecate, but I am starting to wear down: there are too many of these tiny things, and they make it hard to focus on the big picture. It is super difficult to fix the big picture while retaining behaviour that was never well defined :/.

(Not to say it isn't important to undo it for now to be on the safe side.)

Sorry, and as for the explanation: for PyArray_Pack(), the change in the code path is that it will hit:

String->is_known_scalar_type(datetime64)

and, unlike before, it now returns True, so it calls String->setitem(datetime64) instead of using

Datetime->setitem(...).astype("S6")

internally (i.e., using the Datetime setitem to convert the scalar into an "array", so to speak).
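The two paths described here can be modeled in plain Python. This is a hypothetical sketch: `string_setitem`, `datetime_to_string_cast`, `toy_pack_into_string` and the `datetime_known_to_string` flag are invented names, and `datetime.date` stands in for `np.datetime64`. It assumes only what the thread states. The string setitem calls `str(...)` and keeps what fits in the itemsize, while the datetime setitem-plus-cast path refuses a target that is too short.

```python
import datetime


def string_setitem(value, itemsize):
    # Models String->setitem: call str() on the value, truncate to the itemsize.
    return str(value)[:itemsize]


def datetime_to_string_cast(value, itemsize):
    # Models Datetime->setitem followed by a cast to a string dtype:
    # the full ISO representation must fit, otherwise the cast fails.
    iso = value.isoformat()
    if len(iso) > itemsize:
        raise RuntimeError(f"string of length {itemsize} is too short for {iso!r}")
    return iso


def toy_pack_into_string(value, itemsize, datetime_known_to_string):
    """Hypothetical sketch of PyArray_Pack with a string target dtype."""
    if isinstance(value, str) or datetime_known_to_string:
        # is_known_scalar_type returned True: use the string setitem directly.
        return string_setitem(value, itemsize)
    # Otherwise: go through the datetime dtype, then cast the result.
    return datetime_to_string_cast(value, itemsize)


day = datetime.date(2020, 1, 1)  # stands in for np.datetime64("2020-01-01")
print(toy_pack_into_string(day, 9, datetime_known_to_string=True))  # 2020-01-0
```

Setting `datetime_known_to_string` to `False` with an itemsize of 9 makes the toy raise, which corresponds to the `RuntimeError` in the REPL session earlier in this thread. With the flag set, it truncates the way NumPy 1.18.5 did.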

Member
anirudh2290, Jun 25, 2020

Thanks for the explanation; this sounds good to me, and I agree we should get rid of the inconsistency with np.datetime64("2020-01-01").astype("U5") in the future.

{
@@ -593,7 +608,7 @@ static coercion_cache_obj *_coercion_cache_cache[COERCION_CACHE_CACHE_SIZE];
 /*
  * Steals a reference to the object.
  */
-NPY_NO_EXPORT NPY_INLINE int
+static NPY_INLINE int
 npy_new_coercion_cache(
         PyObject *converted_obj, PyObject *arr_or_sequence, npy_bool sequence,
         coercion_cache_obj ***next_ptr, int ndim)
4 changes: 2 additions & 2 deletions numpy/core/src/multiarray/item_selection.c
@@ -26,7 +26,7 @@
 #include "npy_binsearch.h"
 #include "alloc.h"
 #include "arraytypes.h"
-
+#include "array_coercion.h"
 
 
 static NPY_GCC_OPT_3 NPY_INLINE int
@@ -2629,5 +2629,5 @@ PyArray_MultiIndexSetItem(PyArrayObject *self, const npy_intp *multi_index,
         data += ind * strides[idim];
     }
 
-    return PyArray_SETITEM(self, data, obj);
+    return PyArray_Pack(PyArray_DESCR(self), data, obj);
 }
2 changes: 1 addition & 1 deletion numpy/core/src/multiarray/iterators.c
@@ -842,7 +842,7 @@ iter_ass_subscript(PyArrayIterObject *self, PyObject *ind, PyObject *val)
             goto finish;
         }
         PyArray_ITER_GOTO1D(self, start);
-        retval = type->f->setitem(val, self->dataptr, self->ao);
+        retval = PyArray_Pack(PyArray_DESCR(self->ao), self->dataptr, val);
         PyArray_ITER_RESET(self);
         if (retval < 0) {
             PyErr_SetString(PyExc_ValueError,
2 changes: 1 addition & 1 deletion numpy/core/src/multiarray/mapping.c
@@ -1755,7 +1755,7 @@ array_assign_item(PyArrayObject *self, Py_ssize_t i, PyObject *op)
         if (get_item_pointer(self, &item, indices, 1) < 0) {
             return -1;
         }
-        if (PyArray_SETITEM(self, item, op) < 0) {
+        if (PyArray_Pack(PyArray_DESCR(self), item, op) < 0) {
             return -1;
         }
     }