Merge branch 'deprecate_array_field_access' · numpy/numpy@9163993 · GitHub

Commit 9163993

author
Mark Wiebe
committed
Merge branch 'deprecate_array_field_access'
2 parents 1b62bdf + affea42 commit 9163993

69 files changed

+3167
-2414
lines changed

doc/neps/missing-data.rst

Lines changed: 167 additions & 42 deletions
@@ -225,27 +225,30 @@ provides a starting point.
 
 For example,::
 
->>> np.array([1.0, 2.0, np.NA, 7.0], namasked=True)
-array([1., 2., NA, 7.], namasked=True)
->>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
+>>> np.array([1.0, 2.0, np.NA, 7.0], maskna=True)
+array([1., 2., NA, 7.], maskna=True)
+>>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA')
 array([1., 2., NA, 7.], dtype='NA[<f8]')
+>>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f4]')
+array([1., 2., NA, 7.], dtype='NA[<f4]')
 
 produce arrays with values [1.0, 2.0, <inaccessible>, 7.0] /
-mask [Unmasked, Unmasked, Masked, Unmasked], and
-values [1.0, 2.0, <NA bitpattern>, 7.0] respectively.
+mask [Exposed, Exposed, Hidden, Exposed], and
+values [1.0, 2.0, <NA bitpattern>, 7.0] for the masked and
+NA dtype versions respectively.
 
 It may be worth overloading the np.NA __call__ method to accept a dtype,
 returning a zero-dimensional array with a missing value of that dtype.
 Without doing this, NA printouts would look like::
 
->>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], namasked=True))
-array(NA, dtype='float64', namasked=True)
+>>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], maskna=True))
+array(NA, dtype='float64', maskna=True)
 >>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]'))
 array(NA, dtype='NA[<f8]')
 
 but with this, they could be printed as::
 
->>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], namasked=True))
+>>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], maskna=True))
 NA('float64')
 >>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]'))
 NA('NA[<f8]')
@@ -274,12 +277,12 @@ from another view which doesn't have them masked. For example::
 
 >>> a = np.array([1,2])
 >>> b = a.view()
->>> b.flags.hasnamask = True
+>>> b.flags.hasmaskna = True
 >>> b
-array([1,2], namasked=True)
+array([1,2], maskna=True)
 >>> b[0] = np.NA
 >>> b
-array([NA,2], namasked=True)
+array([NA,2], maskna=True)
 >>> a
 array([1,2])
 >>> # The underlying number 1 value in 'a[0]' was untouched
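
Note on the hunk above: the example shows that assigning NA through a masked view hides the element without overwriting the shared data. The following is a minimal pure-Python sketch of that idea, not NumPy code. `MaskedView` and the `NA` sentinel are hypothetical names introduced for illustration, and un-hiding an element when a real value is assigned is an assumption about the intended semantics.

```python
class _NAType:
    """Hypothetical stand-in for np.NA (illustration only)."""
    def __repr__(self):
        return "NA"

NA = _NAType()


class MaskedView:
    """Toy view that shares a data list but keeps its own NA mask."""

    def __init__(self, data):
        self.data = data                    # shared with the caller, not copied
        self.hidden = [False] * len(data)   # True means the element reads as NA

    def __setitem__(self, index, value):
        if value is NA:
            # Only the mask changes; the underlying value stays intact.
            self.hidden[index] = True
        else:
            # Assumption: writing a real value stores it and exposes the element.
            self.data[index] = value
            self.hidden[index] = False

    def __getitem__(self, index):
        return NA if self.hidden[index] else self.data[index]


a = [1, 2]
b = MaskedView(a)
b[0] = NA
print([b[0], b[1]])  # the view reports NA in position 0
print(a)             # the shared data still holds the original 1
```

Keeping the mask separate from the data is what lets a second, unmasked view of the same memory still see the original values.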
@@ -351,10 +354,10 @@ Creating Masked Arrays
 There are two flags which indicate and control the nature of the mask
 used in masked arrays.
 
-First is 'arr.flags.hasnamask', which is True for all masked arrays and
+First is 'arr.flags.hasmaskna', which is True for all masked arrays and
 may be set to True to add a mask to an array which does not have one.
 
-Second is 'arr.flags.ownnamask', which is True if the array owns the
+Second is 'arr.flags.ownmaskna', which is True if the array owns the
 memory to the mask, and False if the array has no mask, or has a view
 into the mask of another array. If this is set to False in a masked
 array, the array will create a copy of the mask so that further modifications
@@ -402,8 +405,16 @@ New functions added to the ndarray are::
 array is unmasked and has the 'NA' part stripped from the
 parameterized type ('NA[f8]' becomes just 'f8').
 
-arr.view(namasked=True)
-This is a shortcut for 'a = arr.view(); a.flags.hasnamask=True'.
+arr.view(maskna=True)
+This is a shortcut for
+>>> a = arr.view()
+>>> a.flags.hasmaskna = True
+
+arr.view(ownmaskna=True)
+This is a shortcut for
+>>> a = arr.view()
+>>> a.flags.hasmaskna = True
+>>> a.flags.ownmaskna = True
 
 Element-wise UFuncs With Missing Values
 =======================================
@@ -461,21 +472,21 @@ will also use the unmasked value counts for their calculations if
 
 Some examples::
 
->>> a = np.array([1., 3., np.NA, 7.], namasked=True)
+>>> a = np.array([1., 3., np.NA, 7.], maskna=True)
 >>> np.sum(a)
-array(NA, dtype='<f8', masked=True)
+array(NA, dtype='<f8', maskna=True)
 >>> np.sum(a, skipna=True)
 11.0
 >>> np.mean(a)
 NA('<f8')
 >>> np.mean(a, skipna=True)
 3.6666666666666665
 
->>> a = np.array([np.NA, np.NA], dtype='f8', namasked=True)
+>>> a = np.array([np.NA, np.NA], dtype='f8', maskna=True)
 >>> np.sum(a, skipna=True)
 0.0
 >>> np.max(a, skipna=True)
-array(NA, dtype='<f8', namasked=True)
+array(NA, dtype='<f8', maskna=True)
 >>> np.mean(a)
 NA('<f8')
 >>> np.mean(a, skipna=True)
@@ -487,20 +498,24 @@ The functions 'np.any' and 'np.all' require some special consideration,
 just as logical_and and logical_or do. Maybe the best way to describe
 their behavior is through a series of examples::
 
->>> np.any(np.array([False, False, False], namasked=True))
+>>> np.any(np.array([False, False, False], maskna=True))
 False
->>> np.any(np.array([False, NA, False], namasked=True))
+>>> np.any(np.array([False, np.NA, False], maskna=True))
 NA
->>> np.any(np.array([False, NA, True], namasked=True))
+>>> np.any(np.array([False, np.NA, True], maskna=True))
 True
 
->>> np.all(np.array([True, True, True], namasked=True))
+>>> np.all(np.array([True, True, True], maskna=True))
 True
->>> np.all(np.array([True, NA, True], namasked=True))
+>>> np.all(np.array([True, np.NA, True], maskna=True))
 NA
->>> np.all(np.array([False, NA, True], namasked=True))
+>>> np.all(np.array([False, np.NA, True], maskna=True))
 False
 
+Since 'np.any' is the reduction for 'np.logical_or', and 'np.all'
+is the reduction for 'np.logical_and', it makes sense for them to
+have a 'skipna=' parameter like the other similar reduction functions.
+
 Parameterized NA Data Types
 ===========================
 
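
Note on the hunk above: the 'np.any' / 'np.all' examples follow three-valued logic, where a definite True (for any) or a definite False (for all) decides the result even if NAs are present, and otherwise a single NA makes the result NA. The sketch below reproduces the six listed cases in plain Python. `na_any`, `na_all`, and the `NA` sentinel are hypothetical names, and the `skipna=True` behaviour (simply ignoring NAs) is an assumption, since the text only says such a parameter would make sense.

```python
class _NAType:
    """Hypothetical stand-in for np.NA (illustration only)."""
    def __repr__(self):
        return "NA"

NA = _NAType()


def na_any(values, skipna=False):
    """Logical-or reduction: a single True wins over any number of NAs."""
    saw_na = False
    for v in values:
        if v is NA:
            saw_na = True
        elif v:
            return True
    # No True was found; the answer is unknown only if an NA could have been True.
    return NA if (saw_na and not skipna) else False


def na_all(values, skipna=False):
    """Logical-and reduction: a single False wins over any number of NAs."""
    saw_na = False
    for v in values:
        if v is NA:
            saw_na = True
        elif not v:
            return False
    # No False was found; the answer is unknown only if an NA could have been False.
    return NA if (saw_na and not skipna) else True


print(na_any([False, NA, False]), na_any([False, NA, True]))
print(na_all([True, NA, True]), na_all([False, NA, True]))
```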
@@ -609,14 +624,124 @@ The important part of future-proofing the design is making sure
 the C ABI-level choices and the Python API-level choices have a natural
 transition to multi-NA support. Here is one way multi-NA support could look::
 
->>> a = np.array([np.NA(1), 3, np.NA(2)], namasked='multi')
+>>> a = np.array([np.NA(1), 3, np.NA(2)], maskna='multi')
 >>> np.sum(a)
-NA(1)
+NA(1, dtype='<i4')
 >>> np.sum(a[1:])
-NA(2)
->>> b = np.array([np.NA, 2, 5], namasked=True)
+NA(2, dtype='<i4')
+>>> b = np.array([np.NA, 2, 5], maskna=True)
 >>> a + b
-array([NA(0), 5, NA(2)], namasked='multi')
+array([NA(0), 5, NA(2)], maskna='multi')
+
+The design of this NEP does not distinguish between NAs that come
+from an NA mask or NAs that come from an NA dtype. Both of these get
+treated equivalently in computations, with masks dominating over NA
+dtypes.::
+
+>>> a = np.array([np.NA, 2, 5], maskna=True)
+>>> b = np.array([1, np.NA, 7], dtype='NA')
+>>> a + b
+array([NA, NA, 12], maskna=True)
+
+The multi-NA approach allows one to distinguish between these NAs,
+through assigning different payloads to the different types. If we
+extend the 'skipna=' parameter to accept a list of payloads in addition
+to True/False, one could do this::
+
+>>> a = np.array([np.NA(1), 2, 5], maskna='multi')
+>>> b = np.array([1, np.NA(0), 7], dtype='NA[f4,multi]')
+>>> a + b
+array([NA(1), NA(0), 12], maskna='multi')
+>>> np.sum(a, skipna=0)
+NA(1, dtype='<i4')
+>>> np.sum(a, skipna=1)
+7
+>>> np.sum(b, skipna=0)
+8
+>>> np.sum(b, skipna=1)
+NA(0, dtype='<f4')
+>>> np.sum(a+b, skipna=(0,1))
+12
+
+Differences with numpy.ma
+=========================
+
+The computational model that numpy.ma uses does not strictly adhere to
+either the NA or the IGNORE model. This section exhibits some examples
+of how these differences affect simple computations. This information
+will be very important for helping users navigate between the systems,
+so a summary probably should be put in a table in the documentation.::
+
+>>> a = np.random.random((3, 2))
+>>> mask = [[False, True], [True, True], [False, False]]
+>>> b1 = np.ma.masked_array(a, mask=mask)
+>>> b2 = a.view(maskna=True)
+>>> b2[mask] = np.NA
+
+>>> b1
+masked_array(data =
+[[0.110804969841 --]
+[-- --]
+[0.955128477746 0.440430735546]],
+mask =
+[[False True]
+[ True True]
+[False False]],
+fill_value = 1e+20)
+>>> b2
+array([[0.110804969841, NA],
+[NA, NA],
+[0.955128477746, 0.440430735546]],
+maskna=True)
+
+>>> b1.mean(axis=0)
+masked_array(data = [0.532966723794 0.440430735546],
+mask = [False False],
+fill_value = 1e+20)
+
+>>> b2.mean(axis=0)
+array([NA, NA], dtype='<f8', maskna=True)
+>>> b2.mean(axis=0, skipna=True)
+array([0.532966723794 0.440430735546], maskna=True)
+
+For functions like np.mean, when 'skipna=True', the behavior
+for all NAs is consistent with an empty array::
+
+>>> b1.mean(axis=1)
+masked_array(data = [0.110804969841 -- 0.697779606646],
+mask = [False True False],
+fill_value = 1e+20)
+
+>>> b2.mean(axis=1)
+array([NA, NA, 0.697779606646], maskna=True)
+>>> b2.mean(axis=1, skipna=True)
+RuntimeWarning: invalid value encountered in double_scalars
+array([0.110804969841, nan, 0.697779606646], maskna=True)
+
+>>> np.mean([])
+RuntimeWarning: invalid value encountered in double_scalars
+nan
+
+In particular, note that numpy.ma generally skips masked values,
+except returns masked when all the values are masked, while
+the 'skipna=' parameter returns zero when all the values are NA,
+to be consistent with the result of np.sum([])::
+
+>>> b1[1]
+masked_array(data = [-- --],
+mask = [ True True],
+fill_value = 1e+20)
+>>> b2[1]
+array([NA, NA], dtype='<f8', maskna=True)
+>>> b1[1].sum()
+masked
+>>> b2[1].sum()
+NA(dtype='<f8')
+>>> b2[1].sum(skipna=True)
+0.0
+
+>>> np.sum([])
+0.0
 
 PEP 3118
 ========
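
Note on the hunk above: the "Differences with numpy.ma" text states that with 'skipna=True' an all-NA reduction behaves like the same reduction over an empty array, so a sum gives 0.0 and a mean gives nan. The sketch below models that rule in plain Python without NumPy. `na_sum`, `na_mean`, and the `NA` sentinel are hypothetical names introduced for illustration, not part of any NumPy API.

```python
import math


class _NAType:
    """Hypothetical stand-in for np.NA (illustration only)."""
    def __repr__(self):
        return "NA"

NA = _NAType()


def na_sum(values, skipna=False):
    """Sum that propagates NA unless skipna=True."""
    kept = [v for v in values if v is not NA]
    if len(kept) != len(values) and not skipna:
        return NA
    # With every NA skipped this is a sum over nothing, which is 0.0.
    return float(sum(kept))


def na_mean(values, skipna=False):
    """Mean that propagates NA unless skipna=True."""
    kept = [v for v in values if v is not NA]
    if len(kept) != len(values) and not skipna:
        return NA
    if not kept:
        # Mean of an empty selection is undefined, as with np.mean([]).
        return math.nan
    return sum(kept) / len(kept)


row = [NA, NA]
print(na_sum(row), na_sum(row, skipna=True), na_mean(row, skipna=True))
```

This differs from the numpy.ma behaviour shown in the hunk, where summing a fully masked row returns `masked` rather than zero.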
@@ -696,28 +821,28 @@ This gives us the following additions to the PyArrayObject::
 /*
 * Descriptor for the mask dtype.
 * If no mask: NULL
-* If mask : bool/structured dtype of bools
+* If mask : bool/uint8/structured dtype of mask dtypes
 */
-PyArray_Descr *maskdescr;
+PyArray_Descr *maskna_descr;
 /*
 * Raw data buffer for mask. If the array has the flag
-* NPY_ARRAY_OWNNAMASK enabled, it owns this memory and
+* NPY_ARRAY_OWNMASKNA enabled, it owns this memory and
 * must call PyArray_free on it when destroyed.
 */
-npy_uint8 *maskdata;
+npy_mask *maskna_data;
 /*
 * Just like dimensions and strides point into the same memory
 * buffer, we now just make the buffer 3x the nd instead of 2x
 * and use the same buffer.
 */
-npy_intp *maskstrides;
+npy_intp *maskna_strides;
 
 There are 2 (or 3) flags which must be added to the array flags::
 
-NPY_ARRAY_HASNAMASK
-NPY_ARRAY_OWNNAMASK
+NPY_ARRAY_HASMASKNA
+NPY_ARRAY_OWNMASKNA
 /* To possibly add in a later revision */
-NPY_ARRAY_HARDNAMASK
+NPY_ARRAY_HARDMASKNA
 
 To allow the easy detection of NA support, and whether an array
 has any missing values, we add the following functions:
@@ -807,7 +932,7 @@ NPY_ITER_ARRAYMASK
 can be only one such mask, and there cannot also be a virtual
 mask.
 
-As a special case, if the flag NPY_ITER_USE_NAMASK is specified
+As a special case, if the flag NPY_ITER_USE_MASKNA is specified
 at the same time, the mask for the operand is used instead
 of the operand itself. If the operand has no mask but is
 based on an NA dtype, that mask exposed by the iterator converts
@@ -827,14 +952,14 @@ Iterator NA-array Features
 
 We add several new per-operand flags:
 
-NPY_ITER_USE_NAMASK
+NPY_ITER_USE_MASKNA
 If the operand has an NA dtype, an NA mask, or both, this adds a new
 virtual operand to the end of the operand list which iterates
 over the mask of the particular operand.
 
-NPY_ITER_IGNORE_NAMASK
+NPY_ITER_IGNORE_MASKNA
 If an operand has an NA mask, by default the iterator will raise
-an exception unless NPY_ITER_USE_NAMASK is specified. This flag
+an exception unless NPY_ITER_USE_MASKNA is specified. This flag
 disables that check, and is intended for cases where one has first
 checked that all the elements in the array are not NA using the
 PyArray_ContainsNA function.

doc/source/reference/c-api.array.rst

Lines changed: 26 additions & 1 deletion
@@ -50,6 +50,18 @@ sub-types).
 
 .. cfunction:: PyObject *PyArray_BASE(PyObject* arr)
 
+This returns the base object of the array. In most cases, this
+means the object which owns the memory the array is pointing at.
+
+If you are constructing an array using the C API, and specifying
+your own memory, you should use the function :cfunc:`PyArray_SetBaseObject`
+to set the base to an object which owns the memory.
+
+If the :cdata:`NPY_ARRAY_UPDATEIFCOPY` flag is set, it has a different
+meaning, namely base is the array into which the current array will
+be copied upon destruction. This overloading of the base property
+for two functions is likely to change in a future version of NumPy.
+
 .. cfunction:: PyArray_Descr *PyArray_DESCR(PyObject* arr)
 
 .. cfunction:: int PyArray_FLAGS(PyObject* arr)
@@ -149,7 +161,7 @@ From scratch
 is not ``NULL``, then it is assumed to point to the memory to be
 used for the array and the *flags* argument is used as the new
 flags for the array (except the state of :cdata:`NPY_OWNDATA` and
-:cdata:`UPDATEIFCOPY` flags of the new array will be reset). In
+:cdata:`NPY_ARRAY_UPDATEIFCOPY` flags of the new array will be reset). In
 addition, if *data* is non-NULL, then *strides* can also be
 provided. If *strides* is ``NULL``, then the array strides are
 computed as C-style contiguous (default) or Fortran-style
@@ -266,6 +278,19 @@ From scratch
 increments of ``step``. Equivalent to arange( ``start``,
 ``stop``, ``step``, ``typenum`` ).
 
+.. cfunction:: int PyArray_SetBaseObject(PyArrayObject *arr, PyObject *obj)
+
+If you construct an array by passing in your own memory buffer as
+a parameter, you need to set the array's `base` property to ensure
+the lifetime of the memory buffer is appropriate. This function
+accomplishes the task.
+
+The return value is 0 on success, -1 on failure.
+
+If the object provided is an array, this function traverses the
+chain of `base` pointers so that each array points to the owner
+of the memory directly. Once the base is set, it may not be changed
+to another value.
 
 From other objects
 ^^^^^^^^^^^^^^^^^^
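
Note on the hunk above: the new `PyArray_SetBaseObject` text describes two rules, namely that a chain of `base` pointers is collapsed so each array refers to the memory owner directly, and that a base cannot be replaced once set. The following is a toy pure-Python model of those two rules only. `ToyArray` and `set_base_object` are hypothetical names, and this sketch makes no claim about how the C function is actually implemented (reference counting and error reporting are not modelled).

```python
class ToyArray:
    """Toy stand-in for an array object; only tracks its base."""

    def __init__(self):
        self.base = None


def set_base_object(arr, obj):
    """Return 0 on success and -1 on failure, like the documented contract."""
    if arr.base is not None:
        # Once the base is set, it may not be changed to another value.
        return -1
    # If obj is itself a view, follow its base chain to the real owner.
    while isinstance(obj, ToyArray) and obj.base is not None:
        obj = obj.base
    arr.base = obj
    return 0


buffer_owner = bytearray(8)      # pretend this object owns the memory
first = ToyArray()
second = ToyArray()
set_base_object(first, buffer_owner)
set_base_object(second, first)   # second skips 'first' and points at the owner
print(second.base is buffer_owner)
```

Collapsing the chain means intermediate views can be released without affecting the lifetime of the underlying buffer.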
