ERR: disallow non-hashables in Index/MultiIndex construction & rename by arminv · Pull Request #20548 · pandas-dev/pandas · GitHub

ERR: disallow non-hashables in Index/MultiIndex construction & rename #20548

Merged on Apr 23, 2018 (48 commits)

Commits (this view shows changes from 1 commit)
9047d60
Check non-hashability on series construction and renaming
arminv Mar 30, 2018
df7650d
Removed changes from pandas/core/series.py
arminv Mar 30, 2018
dd64219
Check non-hashability on Index construction and renaming
arminv Mar 30, 2018
89e92ab
modified test_getitem_list example to disallow non-hashable names
arminv Mar 30, 2018
cd3e53a
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Mar 30, 2018
cd070e3
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 1, 2018
351691f
Changed ErrorType message for hashability requirement
arminv Apr 1, 2018
3a7b0b2
Fixed how rename calls set_names to allow for MultiIndex hashable typ…
arminv Apr 1, 2018
70933d5
Moved type checking from set_names back to rename
arminv Apr 1, 2018
56fd617
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 1, 2018
d4ed636
Moved hashable checking to set_names. Changed exception messages.
arminv Apr 1, 2018
b554bb3
Modified test_duplicate_level_names to pass with new (hashable names)…
arminv Apr 2, 2018
6efd6cc
Added test_constructor_nonhashable_names for checking hashability on …
arminv Apr 2, 2018
4fb3a6b
Fixed a typo
arminv Apr 2, 2018
786f43f
Minor refactoring of test_constructor_nonhashable_names
arminv Apr 2, 2018
01b712e
Added test_constructor_nonhashable_name for checking hashability on name
arminv Apr 2, 2018
6f13cd0
Added note in Other API Changes on hashability of names
arminv Apr 2, 2018
26433c3
Improved wording of the note
arminv Apr 2, 2018
91ef466
Addressed PEP 8 issues
arminv Apr 2, 2018
85e35ea
Modified exception message of Index
arminv Apr 2, 2018
5c2e240
Changed exception message format
arminv Apr 2, 2018
4ca2a52
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 2, 2018
840cd88
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 3, 2018
18bcf2a
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 6, 2018
d98014f
Refactoring
arminv Apr 8, 2018
b8a1d7e
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 9, 2018
edfbd1d
Added internal comment
arminv Apr 9, 2018
2322346
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 10, 2018
fa52655
Refactoring
arminv Apr 10, 2018
c0f6936
Moved check from set_names to _set_names
arminv Apr 10, 2018
a9c14e6
test with fixture
jreback Apr 11, 2018
30da596
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 12, 2018
667d495
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 16, 2018
c4c1011
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 16, 2018
bd75433
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 17, 2018
74a9b54
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 17, 2018
b1cb7fd
Refactoring. Internal docstring. Minor typos
arminv Apr 17, 2018
863f7d3
PEP 8
arminv Apr 17, 2018
0723009
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 17, 2018
7092d49
Improved docstring wording
arminv Apr 17, 2018
1d8f67a
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 19, 2018
12488ff
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 20, 2018
4a500ba
Shorten docstring
arminv Apr 21, 2018
9ec64b0
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 21, 2018
47903ae
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 22, 2018
04f2eed
Added examples
arminv Apr 22, 2018
1a68188
remove examples from _set_names
jreback Apr 22, 2018
97a2b06
consolidate logic a bit
jreback Apr 22, 2018
Addressed PEP 8 issues
arminv committed Apr 2, 2018
commit 91ef466c26253b06d4a2b0ab55104a36806f0c17
6 changes: 3 additions & 3 deletions pandas/core/indexes/base.py
@@ -254,7 +254,7 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,

if name is not None and not is_hashable(name):
Contributor

I don't think you actually need this here, as _set_name gets called from _simple_new (when .name is set), so it is more logical to put this in _set_name (like you do for MI).

Contributor Author (arminv, Apr 12, 2018)

For MI, the name validation is delegated to _set_names in this way:

result = object.__new__(MultiIndex)
# we've already validated levels and labels, so shortcut here
result._set_levels(levels, copy=copy, validate=False)
result._set_labels(labels, copy=copy, validate=False)
if names is not None:
    # handles name validation
    result._set_names(names)

Maybe add something like this in __new__():

_result = object.__new__(Index)

if name is not None:
    # handles name validation
    _result._set_names([name])

So that pd.Index([1, 2, 3], name=['foo']) would still raise but we check in a more logical place.
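
The idea in this thread, a single setter that validates the name and is reached from both the constructor and `rename`, can be sketched without pandas. `TinyIndex`, its methods, and `error_message` below are hypothetical stand-ins for illustration only; they are not the pandas implementation, and only the wording of the error message is borrowed from the diff.

```python
class TinyIndex:
    """Hypothetical stand-in for an index type (not pandas code)."""

    def __init__(self, data, name=None):
        self._data = list(data)
        self._name = None
        if name is not None:
            # The constructor does no checking itself; the setter validates.
            self._set_names([name])

    def _set_names(self, values):
        if len(values) != 1:
            raise ValueError("expected exactly one name, got %d" % len(values))
        name = values[0]
        try:
            hash(name)
        except TypeError:
            raise TypeError(type(self).__name__ +
                            ".name must be a hashable type") from None
        self._name = name

    def rename(self, name):
        # Renaming goes through the same setter, so it is validated too.
        renamed = TinyIndex(self._data)
        renamed._set_names([name])
        return renamed


def error_message(func, *args, **kwargs):
    """Return the TypeError message raised by func, or None if none is raised."""
    try:
        func(*args, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# Both entry points reject a list as a name; a plain string is accepted.
print(error_message(TinyIndex, [1, 2, 3], name=["foo"]))
print(error_message(TinyIndex([1, 2, 3], name="foo").rename, ["bar"]))
print(error_message(TinyIndex, [1, 2, 3], name="foo"))
```

Keeping the check in one setter means any new code path that assigns a name picks up the validation without repeating it.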

raise TypeError(__class__.__name__ +
- '.name must be a hashable type')
+ '.name must be a hashable type')

Contributor

this very likely also needs checking for MultiIndex (as that's a different path in some cases).

Contributor Author

Do we allow non-hashable names for MultiIndex?

Contributor

no

Contributor Author (arminv, Apr 1, 2018)

For a MultiIndex, it seems that names is converted into a FrozenList after creation. I found this answer from you on StackOverflow about the hashability of a FrozenList.

Right now, if names can’t be converted to a FrozenList (if not hashable), it throws an exception. For example:

In [1]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
   ...:               labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
   ...:               names=(['foo'], ['bar']))

TypeError: unhashable type: 'list'

while this passes:

In [2]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
   ...:               labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
   ...:               names=[('foo'), ('bar')])

Do we need to change anything here?

Contributor

No, you just need to check that each name is hashable, not the frozen list itself. That's why .set_names is the best place for this.
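
To make the distinction concrete: the check applies to each element of `names`, not to the container that holds them. The plain-Python sketch below uses a hypothetical helper, `first_unhashable`, which is not a pandas function. It also shows why the second call in the question passes: `('foo')` is a parenthesized string, not a tuple.

```python
from collections.abc import Hashable


def first_unhashable(names):
    """Return the position of the first unhashable name, or None."""
    for position, name in enumerate(names):
        try:
            hash(name)
        except TypeError:
            return position
    return None


# Lists are unhashable, so the first name is rejected.
assert first_unhashable((["foo"], ["bar"])) == 0

# ('foo') is just the string 'foo'; both names are hashable.
assert ("foo") == "foo"
assert first_unhashable([("foo"), ("bar")]) is None

# The container may itself be an unhashable list; only the elements matter.
assert first_unhashable(["foo", ("a", 1)]) is None

# isinstance(..., Hashable) only checks for a __hash__ method, so a tuple
# holding a list passes it even though hash() fails at call time.
assert isinstance((["foo"],), Hashable)
assert first_unhashable([(["foo"],)]) == 0
```

Calling `hash()` inside a `try` block is used here because it catches nested cases that an `isinstance` check against `Hashable` would miss.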

if fastpath:
return cls._simple_new(data, name)
@@ -1366,10 +1366,10 @@ def set_names(self, names, level=None, inplace=False):
else:
if self.nlevels == 1:
Contributor

This only needs to deal with regular indices; MI has its own version.

raise TypeError(__class__.__name__ +
- '.name must be a hashable type')
+ '.name must be a hashable type')
else:
raise TypeError(self.__class__.__name__ +
- '.name must be a hashable type')
+ '.name must be a hashable type')

if level is not None and self.nlevels == 1:
raise ValueError('Level must be None for non-MultiIndex')

2 changes: 1 addition & 1 deletion pandas/core/indexes/multi.py
@@ -647,7 +647,7 @@ def _set_names(self, names, level=None, validate=True):
pass
else:
raise TypeError(self.__class__.__name__ +
- '.name must be a hashable type')
+ '.name must be a hashable type')

# GH 15110
# Don't allow a single string for names in a MultiIndex

2 changes: 1 addition & 1 deletion pandas/tests/indexes/test_base.py
@@ -448,7 +448,7 @@ def test_constructor_nonhashable_name(self):
assert isinstance(idx, Index)
Contributor

you don't need this assert

renamed = ['1']
tm.assert_raises_regex(TypeError, message,
- idx.rename, name=renamed)
+ idx.rename, name=renamed)
Contributor

also check set_names


def test_view_with_args(self):

12 changes: 5 additions & 7 deletions pandas/tests/indexes/test_multi.py
@@ -622,18 +622,16 @@ def test_constructor_nonhashable_names(self):
names = ((['foo'], ['bar']))
message = "MultiIndex.name must be a hashable type"
tm.assert_raises_regex(TypeError, message,
- MultiIndex, levels=levels,
- labels=labels,
- names=names)
+ MultiIndex, levels=levels,
+ labels=labels, names=names)

# With .rename()
mi = MultiIndex(levels=[[1, 2], [u'one', u'two']],
- labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
- names=('foo', 'bar'))
+ labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
+ names=('foo', 'bar'))
assert isinstance(mi, MultiIndex)
Contributor

you don't need the assert

renamed = [['foor'], ['barr']]
tm.assert_raises_regex(TypeError, message,
mi.rename, names=renamed)
tm.assert_raises_regex(TypeError, message, mi.rename, names=renamed)
Contributor

also check set_names

Contributor Author

Done
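
The tests in this diff use the callable form of `tm.assert_raises_regex`: exception type, message pattern, then the callable and its arguments. A rough plain-Python equivalent is sketched below. `raises_matching` is a hypothetical helper written for this illustration, and it is exercised with the built-in `hash` rather than with pandas.

```python
import re


def raises_matching(exc_type, pattern, func, *args, **kwargs):
    """Return True if func(*args, **kwargs) raises exc_type matching pattern."""
    try:
        func(*args, **kwargs)
    except exc_type as exc:
        return re.search(pattern, str(exc)) is not None
    return False


# Hashing a list raises the message quoted earlier in the thread.
assert raises_matching(TypeError, "unhashable type: 'list'", hash, ["foo"])

# No exception is raised for a hashable value, so the check reports False.
assert not raises_matching(TypeError, "unhashable type", hash, "foo")

# The right exception type with a different message does not match either.
assert not raises_matching(TypeError, "must be a hashable type", hash, ["foo"])
```

Keyword arguments are forwarded in the same way, which is how `mi.rename, names=renamed` is passed in the tests above.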


@pytest.mark.parametrize('names', [['a', 'b', 'a'], ['1', '1', '2'],
['1', 'a', '1']])
Member

@arminv Is there a reason that you changed those parametrize values to all strings? (I suppose by accident?)
I am reworking the test in #21423, so will revert there if this was by accident

Contributor Author

@jorisvandenbossche IIRC I changed it (in this commit) because the test was failing, but the implementation changed a lot after that commit, so I'm not sure whether reverting this would cause a problem now.

Member

Seems to be passing there!
