Adding isin function for multidimensional arrays by brsr · Pull Request #8423 · numpy/numpy · GitHub

Adding isin function for multidimensional arrays #8423

Merged: 27 commits merged on May 5, 2017.

Diff shown: changes from 23 commits.

Commits:
17faf5a  Adding isin function for multidimensional arrays (brsr, Dec 28, 2016)
34a40da  Fix comments on pull request (brsr, Dec 31, 2016)
2b4a81b  keep in1d mostly the same, changes in isin now (brsr, Feb 15, 2017)
db5e5fd  screwed up the whitespace (brsr, Feb 15, 2017)
f63cf31  docs, convert elements to array in case it isn't already (brsr, Feb 15, 2017)
a9bce34  Merge branch 'master' into master (charris, Feb 22, 2017)
f06ed40  Merge branch 'master' into master (charris, Feb 22, 2017)
0938763  Merge branch 'master' into master (brsr, Mar 25, 2017)
e10ee1e  removing extra line (brsr, Mar 25, 2017)
0ec089a  support iterables that aren't array_like (brsr, Mar 27, 2017)
ba10c98  it's hasattr not has_attr (brsr, Mar 27, 2017)
e72b686  check for __array__ instead (brsr, Mar 27, 2017)
818337d  replace list comprehension (brsr, Mar 27, 2017)
6ace52e  add comment (brsr, Mar 27, 2017)
7712179  Responding to seberg's comments (brsr, Apr 8, 2017)
a2c9b6c  Removing special handling for sets (brsr, Apr 9, 2017)
552a193  Renaming elements to element (brsr, Apr 10, 2017)
fa0b0be  eric-wieser's comments (brsr, Apr 12, 2017)
0ff6be4  Fixes to tests (brsr, Apr 12, 2017)
545df63  Docstrings, further expanding test (brsr, Apr 13, 2017)
41e5b0b  spacing (brsr, Apr 13, 2017)
521d517  Actual zero-d array (brsr, Apr 13, 2017)
8805bbb  More docstring changes (brsr, Apr 13, 2017)
4d3f67c  clean up function listing (brsr, Apr 13, 2017)
0395f39  Update 1.13.0-notes.rst (brsr, Apr 13, 2017)
3d809a6  Merge branch 'master' into master (brsr, Apr 28, 2017)
d22cafc  discouraged, not deprecated (brsr, Apr 28, 2017)

6 changes: 6 additions & 0 deletions doc/release/1.13.0-notes.rst
@@ -111,6 +111,12 @@ In an N-dimensional array, the user can now choose the axis along which to look
for duplicate N-1-dimensional elements using ``numpy.unique``. The original
behaviour is recovered if ``axis=None`` (default).

``isin`` function, improving on ``in1d``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The new function ``isin`` tests whether elements in an array are also present
in another array, preserving the shape of the first array. It builds on
the existing ``in1d`` routine.

``np.gradient`` now supports unevenly spaced data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users can now specify non-constant spacing for data.
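As a quick illustration of the shape-preserving behaviour described in the release note, here is a minimal sketch assuming NumPy 1.13 or later; the arrays are arbitrary and not taken from the PR. It also checks that the result matches the older flatten-then-reshape idiom.

```python
import numpy as np

# A 3-D input: isin returns a boolean mask with exactly this shape.
element = np.arange(24).reshape(2, 3, 4)
test_elements = [1, 5, 23, 100]   # 100 does not occur in `element`

mask = np.isin(element, test_elements)
print(mask.shape)
print(element[mask].tolist())

# Flattening by hand and restoring the shape gives the same mask.
flat = np.isin(element.ravel(), test_elements).reshape(element.shape)
print(np.array_equal(mask, flat))
```

Boolean indexing with the mask then selects the matching values directly, without any manual reshaping.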
1 change: 1 addition & 0 deletions doc/source/reference/routines.set.rst
@@ -17,6 +17,7 @@ Boolean operations

in1d
intersect1d
isin
setdiff1d
setxor1d
union1d
2 changes: 1 addition & 1 deletion numpy/add_newdocs.py
@@ -1500,7 +1500,7 @@ def luf(lamdaexpr, *args, **kwargs):
Find the indices of elements of `x` that are in `goodvalues`.

>>> goodvalues = [3, 4, 7]
>>> ix = np.in1d(x.ravel(), goodvalues).reshape(x.shape)
>>> ix = np.isin(x, goodvalues)
>>> ix
array([[False, False, False],
[ True, True, False],
103 changes: 100 additions & 3 deletions numpy/lib/arraysetops.py
@@ -1,9 +1,10 @@
"""
Set operations for 1D numeric arrays based on sorting.
Set operations for arrays based on sorting.

:Contains:
ediff1d,
unique,
isin,
ediff1d,
intersect1d,
setxor1d,
in1d,
@@ -31,7 +32,7 @@

__all__ = [
'ediff1d', 'intersect1d', 'setxor1d', 'union1d', 'setdiff1d', 'unique',
'in1d'
'in1d', 'isin'
]


@@ -380,13 +381,17 @@ def setxor1d(ar1, ar2, assume_unique=False):
flag2 = flag[1:] == flag[:-1]
return aux[flag2]


def in1d(ar1, ar2, assume_unique=False, invert=False):
"""
Test whether each element of a 1-D array is also present in a second array.

Returns a boolean array the same length as `ar1` that is True
where an element of `ar1` is in `ar2` and False otherwise.

.. deprecated:: 1.13.0
Replaced by :func:`isin`.

Parameters
----------
ar1 : (M,) array_like
@@ -411,6 +416,8 @@ def in1d(ar1, ar2, assume_unique=False, invert=False):

See Also
--------
isin : Version of this function that preserves the
shape of ar1.
numpy.lib.arraysetops : Module with a number of other functions for
Review comment (member):
"See also: The module you're looking at right now" is a little weird. @brsr, you don't need to worry about it in this PR though.

performing set operations on arrays.

@@ -481,6 +488,96 @@ def in1d(ar1, ar2, assume_unique=False, invert=False):
else:
return ret[rev_idx]


def isin(element, test_elements, assume_unique=False, invert=False):
"""
Calculates `element in test_elements`, broadcasting over `element` only.
Returns a boolean array of the same shape as `element` that is True
where an element of `element` is in `test_elements` and False otherwise.

Parameters
----------
element : array_like
Input array.
test_elements : array_like
The values against which to test each value of `element`.
This argument is flattened if it is an array or array_like.
See notes for behavior with non-array-like parameters.
assume_unique : bool, optional
If True, the input arrays are both assumed to be unique, which
can speed up the calculation. Default is False.
invert : bool, optional
If True, the values in the returned array are inverted, as if
calculating `element not in test_elements`. Default is False.
``np.isin(a, b, invert=True)`` is equivalent to (but faster
than) ``np.invert(np.isin(a, b))``.

Returns
-------
isin : ndarray, bool
Has the same shape as `element`. The values `element[isin]`
are in `test_elements`.

See Also
--------
in1d : Flattened version of this function.
numpy.lib.arraysetops : Module with a number of other functions for
performing set operations on arrays.

Notes
-----

`isin` is an element-wise function version of the python keyword `in`.
``isin(a, b)`` is roughly equivalent to
``np.array([item in b for item in a])`` if `a` and `b` are 1-D sequences.

`element` and `test_elements` are converted to arrays if they are not
already. If `test_elements` is a set (or other non-sequence collection)
it will be converted to an object array with one element, rather than an
array of the values contained in `test_elements`. This is a consequence
of the `array` constructor's way of handling non-sequence collections.
Converting the set to a list usually gives the desired behavior.
Review comment (member):
This might be a good thing to show an example of, for people skipping the notes.
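The reviewer's point can be checked directly. A minimal sketch, assuming NumPy 1.13 or later, of why a set misbehaves: `np.asarray` does not unpack a set, so `isin` sees a single 0-d object element rather than the set's values.

```python
import numpy as np

element = 2 * np.arange(4).reshape((2, 2))   # [[0, 2], [4, 6]]
test_set = {1, 2, 4, 8}

# np.asarray does not unpack a set: it becomes one 0-d object element.
wrapped = np.asarray(test_set)
print(wrapped.shape, wrapped.dtype)

# Each integer is compared with the whole set object, so nothing matches.
print(np.isin(element, test_set).any())

# A list is a sequence, so its values are tested element-wise.
print(np.isin(element, list(test_set)).tolist())
```

This is the same behaviour the docstring's own set example shows; the `wrapped` check simply exposes the intermediate array that causes it.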


.. versionadded:: 1.13.0

Examples
--------
>>> element = 2*np.arange(4).reshape((2, 2))
>>> element
array([[0, 2],
[4, 6]])
>>> test_elements = [1, 2, 4, 8]
>>> mask = np.isin(element, test_elements)
>>> mask
array([[ False, True],
[ True, False]], dtype=bool)
>>> element[mask]
array([2, 4])
>>> mask = np.isin(element, test_elements, invert=True)
>>> mask
array([[ True, False],
[ False, True]], dtype=bool)
>>> element[mask]
array([0, 6])

Because of how `array` handles sets, the following does not
work as expected:

>>> test_set = {1, 2, 4, 8}
>>> np.isin(element, test_set)
array([[ False, False],
[ False, False]], dtype=bool)

Casting the set to a list gives the expected result:

>>> np.isin(element, list(test_set))
array([[ False, True],
[ True, False]], dtype=bool)
"""
element = np.asarray(element)
return in1d(element, test_elements, assume_unique=assume_unique,
invert=invert).reshape(element.shape)
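The new `isin` is a thin wrapper: convert `element` to an array, run the flat membership test, and reshape the result back. The sketch below mirrors that pattern with hypothetical helpers `in1d_sketch` and `isin_sketch`. The flat test here uses `np.unique` plus `np.searchsorted` (binary search), a swapped-in technique for illustration only, not the code `in1d` actually runs. It assumes numeric inputs and also shows `invert=True` as the element-wise negation of the plain result.

```python
import numpy as np


def in1d_sketch(ar1, ar2, invert=False):
    """Hypothetical flat membership test via binary search (not NumPy's code)."""
    ar1 = np.asarray(ar1).ravel()
    # Sorted, de-duplicated copy of the test values.
    uniq = np.unique(np.asarray(ar2).ravel())
    if uniq.size == 0:
        # Nothing to match against: every value is absent.
        found = np.zeros(ar1.shape, dtype=bool)
    else:
        # Insertion point of each value in `uniq`; clip so values above the
        # largest test value still index a valid slot before comparing.
        idx = np.clip(np.searchsorted(uniq, ar1), 0, uniq.size - 1)
        found = uniq[idx] == ar1
    return ~found if invert else found


def isin_sketch(element, test_elements, invert=False):
    """Flatten, test, reshape: the wrapper pattern used by the PR's isin."""
    element = np.asarray(element)
    flat = in1d_sketch(element, test_elements, invert=invert)
    return flat.reshape(element.shape)


x = np.arange(9).reshape(3, 3)   # assumed input, not taken from the PR
goodvalues = [3, 4, 7]
print(isin_sketch(x, goodvalues))
print(isin_sketch(x, goodvalues, invert=True))
```

Because only the final `reshape` depends on the input's shape, 0-d and empty inputs fall out of the same code path.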


def union1d(ar1, ar2):
"""
Find the union of two arrays.
2 changes: 2 additions & 0 deletions numpy/lib/info.py
@@ -147,6 +147,8 @@
setxor1d Set exclusive-or of 1D arrays with unique elements.
in1d Test whether elements in a 1D array are also present in
another array.
isin Test whether each element of one (possibly multidimensional)
Review comment (member):
Why just "possibly"? I think you can put ND before both of these arrays, to make it explicit compared to the surrounding 1D functions.

array is present anywhere within a second array.
union1d Union of 1D arrays with unique elements.
setdiff1d Set difference of 1D arrays with unique elements.
================ ===================
42 changes: 41 additions & 1 deletion numpy/lib/tests/test_arraysetops.py
@@ -8,7 +8,7 @@
run_module_suite, TestCase, assert_array_equal, assert_equal, assert_raises
)
from numpy.lib.arraysetops import (
ediff1d, intersect1d, setxor1d, union1d, setdiff1d, unique, in1d
ediff1d, intersect1d, setxor1d, union1d, setdiff1d, unique, in1d, isin
)


@@ -77,6 +77,46 @@ def test_ediff1d(self):
assert(isinstance(ediff1d(np.matrix(1)), np.matrix))
assert(isinstance(ediff1d(np.matrix(1), to_begin=1), np.matrix))

def test_isin(self):
# The tests for in1d cover most of isin's behavior.
# If in1d is removed, those tests would need to be changed to test
# isin instead.
def _isin_slow(a, b):
b = np.asarray(b).flatten().tolist()
return a in b
isin_slow = np.vectorize(_isin_slow, otypes=[bool], excluded={1})
def assert_isin_equal(a, b):
x = isin(a, b)
y = isin_slow(a, b)
assert_array_equal(x, y)

# multidimensional arrays in both arguments
a = np.arange(24).reshape([2, 3, 4])
b = np.array([[10, 20, 30], [0, 1, 3], [11, 22, 33]])
assert_isin_equal(a, b)

# array-likes as both arguments
c = [(9, 8), (7, 6)]
d = (9, 7)
assert_isin_equal(c, d)

# zero-d array:
f = np.array(3)
assert_isin_equal(f, b)
assert_isin_equal(a, f)
assert_isin_equal(f, f)

# scalar:
assert_isin_equal(5, b)
assert_isin_equal(a, 6)
assert_isin_equal(5, 6)

# empty array-like:
x = []
Review comment (member):
[] is 1d with shape (0,), not 0d with shape ().

I'm thinking of testing things like np.isin(somearr, np.array(1)), or np.isin(np.array(1), somearr).

assert_isin_equal(x, b)
assert_isin_equal(a, x)
assert_isin_equal(x, x)

Review comment (member @eric-wieser, Apr 10, 2017):
Needs a test for a 0d `a` here.
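The review comments on this test turn on the difference between an empty input and a 0-d input. A minimal sketch, assuming NumPy 1.13 or later (`b` is copied from the test above), of how `isin` treats each case, together with the 1-D equivalence to Python's `in` stated in the docstring Notes:

```python
import numpy as np

b = np.array([[10, 20, 30], [0, 1, 3], [11, 22, 33]])

# A 0-d `element` gives a 0-d result (shape ()).
zero_d = np.isin(np.array(3), b)
print(zero_d.shape, bool(zero_d))

# An empty list is 1-d with shape (0,), so the result has shape (0,).
empty = np.isin([], b)
print(empty.shape)

# A 0-d `test_elements` behaves like a single test value.
print(np.isin(np.arange(5), np.array(3)).tolist())

# For 1-d sequences, isin matches a Python `in` comprehension.
a = [9, 8, 7, 6]
test = (9, 7)
expected = np.array([item in test for item in a])
print(np.array_equal(np.isin(a, test), expected))
```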

def test_in1d(self):
# we use two different sizes for the b array here to test the
# two different paths in in1d().
30 changes: 29 additions & 1 deletion numpy/ma/extras.py
@@ -16,7 +16,7 @@
'column_stack', 'compress_cols', 'compress_nd', 'compress_rowcols',
'compress_rows', 'count_masked', 'corrcoef', 'cov', 'diagflat', 'dot',
'dstack', 'ediff1d', 'flatnotmasked_contiguous', 'flatnotmasked_edges',
'hsplit', 'hstack', 'in1d', 'intersect1d', 'mask_cols', 'mask_rowcols',
'hsplit', 'hstack', 'isin', 'in1d', 'intersect1d', 'mask_cols', 'mask_rowcols',
'mask_rows', 'masked_all', 'masked_all_like', 'median', 'mr_',
'notmasked_contiguous', 'notmasked_edges', 'polyfit', 'row_stack',
'setdiff1d', 'setxor1d', 'unique', 'union1d', 'vander', 'vstack',
@@ -1137,15 +1137,20 @@ def setxor1d(ar1, ar2, assume_unique=False):
flag2 = (flag[1:] == flag[:-1])
return aux[flag2]


def in1d(ar1, ar2, assume_unique=False, invert=False):
"""
Test whether each element of an array is also present in a second
array.

The output is always a masked array. See `numpy.in1d` for more details.

.. deprecated:: 1.13.0
Replaced by :func:`isin`.

See Also
--------
isin : Version of this function that preserves the shape of ar1.
numpy.in1d : Equivalent function for ndarrays.

Notes
@@ -1176,6 +1181,29 @@ def in1d(ar1, ar2, assume_unique=False, invert=False):
return flag[indx][rev_idx]


def isin(element, test_elements, assume_unique=False, invert=False):
"""
Calculates `element in test_elements`, broadcasting over
`element` only.

The output is always a masked array of the same shape as `element`.
See `numpy.isin` for more details.

See Also
--------
in1d : Flattened version of this function.
numpy.isin : Equivalent function for ndarrays.

Notes
-----
.. versionadded:: 1.13.0

"""
element = ma.asarray(element)
return in1d(element, test_elements, assume_unique=assume_unique,
invert=invert).reshape(element.shape)
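During review, a maintainer asked whether the masked variant should return a MaskedArray and what plain `np.isin` does with masked input. A minimal sketch, assuming the semantics exercised by this PR's `test_isin` in `numpy/ma/tests/test_extras.py` (masked entries of `test_elements` never match), contrasting the two:

```python
import numpy as np

# `element` has no masked entries; the 2 in `test_elements` is masked out.
element = np.ma.array([[0, 1], [2, 3]])
test_elements = np.ma.array([1, 2, 3], mask=[0, 1, 0])

# np.ma.isin keeps the masked-array type and ignores masked test values.
masked_result = np.ma.isin(element, test_elements)
print(type(masked_result).__name__)
print(masked_result.filled(False).tolist())

# np.isin converts both inputs with np.asarray, which drops the masks,
# so the hidden 2 is treated as present and the result is a plain ndarray.
plain_result = np.isin(element, test_elements)
print(type(plain_result).__name__)
print(plain_result.tolist())
```

`filled(False)` is used only so the printed list does not depend on how any masked output positions would be displayed.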


def union1d(ar1, ar2):
"""
Union of two arrays.
23 changes: 22 additions & 1 deletion numpy/ma/tests/test_extras.py
@@ -28,7 +28,7 @@
median, average, unique, setxor1d, setdiff1d, union1d, intersect1d, in1d,
ediff1d, apply_over_axes, apply_along_axis, compress_nd, compress_rowcols,
mask_rowcols, clump_masked, clump_unmasked, flatnotmasked_contiguous,
notmasked_contiguous, notmasked_edges, masked_all, masked_all_like,
notmasked_contiguous, notmasked_edges, masked_all, masked_all_like, isin,
diagflat
)
import numpy.ma.extras as mae
@@ -1435,6 +1435,27 @@ def test_setxor1d(self):
#
assert_array_equal([], setxor1d([], []))

def test_isin(self):
# The tests for in1d cover most of isin's behavior.
# If in1d is removed, those tests would need to be changed to test
# isin instead.
a = np.arange(24).reshape([2, 3, 4])
mask = np.zeros([2, 3, 4])
mask[1, 2, 0] = 1
a = array(a, mask=mask)
b = array(data=[0, 10, 20, 30, 1, 3, 11, 22, 33],
mask=[0, 1, 0, 1, 0, 1, 0, 1, 0])
ec = zeros((2, 3, 4), dtype=bool)
ec[0, 0, 0] = True
ec[0, 0, 1] = True
ec[0, 2, 3] = True
c = isin(a, b)
assert_(isinstance(c, MaskedArray))
assert_array_equal(c, ec)
Review comment (member @eric-wieser, Apr 10, 2017):
Can we have a test here for type(c)? Should it be MaskedArray or array?

Also, a test of what happens if you call np.isin instead of np.ma.isin might be interesting as well.

# compare results of np.isin to ma.isin
d = np.isin(a, b[~b.mask]) & ~a.mask
assert_array_equal(c, d)

def test_in1d(self):
# Test in1d
a = array([1, 2, 5, 7, -1], mask=[0, 0, 0, 0, 1])