8000 ERR: disallow non-hashables in Index/MultiIndex construction & rename by arminv · Pull Request #20548 · pandas-dev/pandas · GitHub
[go: up one dir, main page]

Skip to content
  • ERR: disallow non-hashables in Index/MultiIndex construction & rename #20548

    New issue

    Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

    By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

    Already on GitHub? Sign in to your account

    Merged
    merged 48 commits into from
    Apr 23, 2018
    Merged
    Show file tree
    Hide file tree
    Changes from all commits
    Commits
    Show all changes
    48 commits
    Select commit Hold shift + click to select a range
    9047d60
    Check non-hashability on series construction and renaming
    arminv Mar 30, 2018
    df7650d
    Removed changes from pandas/core/series.py
    arminv Mar 30, 2018
    dd64219
    Check non-hashability on Index construction and renaming
    arminv Mar 30, 2018
    89e92ab
    modified test_getitem_list example to disallow non-hashable names
    arminv Mar 30, 2018
    cd3e53a
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Mar 30, 2018
    cd070e3
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 1, 2018
    351691f
    Changed ErrorType message for hashability requirement
    arminv Apr 1, 2018
    3a7b0b2
    Fixed how rename calls set_names to allow for MultiIndex hashable typ…
    arminv Apr 1, 2018
    70933d5
    Moved type checking from set_names back to rename
    arminv Apr 1, 2018
    56fd617
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 1, 2018
    d4ed636
    Moved hashable checking to set_names. Changed exception messages.
    arminv Apr 1, 2018
    b554bb3
    Modified test_duplicate_level_names to pass with new (hashable names)…
    arminv Apr 2, 2018
    6efd6cc
    Added test_constructor_nonhashable_names for checking hashability on …
    arminv Apr 2, 2018
    4fb3a6b
    Fixed a typo
    arminv Apr 2, 2018
    786f43f
    Minor refactoring of test_constructor_nonhashable_names
    arminv Apr 2, 2018
    01b712e
    Added test_constructor_nonhashable_name for checking hashability on name
    arminv Apr 2, 2018
    6f13cd0
    Added note in Other API Changes on hashability of names
    arminv Apr 2, 2018
    26433c3
    Improved wording of the note
    arminv Apr 2, 2018
    91ef466
    Addressed PEP 8 issues
    arminv Apr 2, 2018
    85e35ea
    Modified exception message of Index
    arminv Apr 2, 2018
    5c2e240
    Changed exception message format
    arminv Apr 2, 2018
    4ca2a52
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 2, 2018
    840cd88
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 3, 2018
    18bcf2a
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 6, 2018
    d98014f
    Refactoring
    arminv Apr 8, 2018
    b8a1d7e
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 9, 2018
    edfbd1d
    Added internal comment
    arminv Apr 9, 2018
    2322346
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 10, 2018
    fa52655
    Refactoring
    arminv Apr 10, 2018
    c0f6936
    Moved check from set_names to _set_names
    arminv Apr 10, 2018
    a9c14e6
    test with fixture
    jreback Apr 11, 2018
    30da596
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 12, 2018
    667d495
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 16, 2018
    c4c1011
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 16, 2018
    bd75433
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 17, 2018
    74a9b54
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 17, 2018
    b1cb7fd
    Refactoring. Internal docstring. Minor typos
    arminv Apr 17, 2018
    863f7d3
    PEP 8
    arminv Apr 17, 2018
    0723009
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 17, 2018
    7092d49
    Improved docstring wording
    arminv Apr 17, 2018
    1d8f67a
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 19, 2018
    12488ff
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 20, 2018
    4a500ba
    Shorten docstring
    arminv Apr 21, 2018
    9ec64b0
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 21, 2018
    47903ae
    Merge remote-tracking branch 'upstream/master' into non_hashable_err
    arminv Apr 22, 2018
    04f2eed
    Added examples
    arminv Apr 22, 2018
    1a68188
    remove examples from _set_names
    jreback Apr 22, 2018
    97a2b06
    consolidate logic a bit
    jreback Apr 22, 2018
    File filter

    Filter by extension

    Filter by extens 8000 ion

    Conversations
    Failed to load comments.
    Loading
    Jump to
    Jump to file
    Failed to load files.
    Loading
    Diff view
    Diff view
    1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
    Original file line number Diff line number Diff line change
    Expand Up @@ -827,6 +827,7 @@ Other API Changes
    - A :class:`Series` of ``dtype=category`` constructed from an empty ``dict`` will now have categories of ``dtype=object`` rather than ``dtype=float64``, consistently with the case in which an empty list is passed (:issue:`18515`)
    - All-NaN levels in a ``MultiIndex`` are now assigned ``float`` rather than ``object`` dtype, promoting consistency with ``Index`` (:issue:`17929`).
    - Levels names of a ``MultiIndex`` (when not None) are now required to be unique: trying to create a ``MultiIndex`` with repeated names will raise a ``ValueError`` (:issue:`18872`)
    - Both construction and renaming of ``Index``/``MultiIndex`` with non-hashable ``name``/``names`` will now raise ``TypeError`` (:issue:`20527`)
    - :func:`Index.map` can now accept ``Series`` and dictionary input objects (:issue:`12756`, :issue:`18482`, :issue:`18509`).
    - :func:`DataFrame.unstack` will now default to filling with ``np.nan`` for ``object`` columns. (:issue:`12815`)
    - :class:`IntervalIndex` constructor will raise if the ``closed`` parameter conflicts with how the input data is inferred to be closed (:issue:`18421`)
    Expand Down
    30 changes: 28 additions & 2 deletions pandas/core/indexes/base.py
    Original file line number Diff line number Diff line change
    Expand Up @@ -42,6 +42,7 @@
    is_datetime64_any_dtype,
    is_datetime64tz_dtype,
    is_timedelta64_dtype,
    is_hashable,
    needs_i8_conversion,
    is_iterator, is_list_like,
    is_scalar)
    Expand Down Expand Up @@ -1311,9 +1312,33 @@ def _get_names(self):
    return FrozenList((self.name, ))

    def _set_names(self, values, level=None):
    """
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    can you also add a mention on set_names itself that the names must be hashable (and examples if you want)

    Set new names on index. Each name has to be a hashable type.

    Parameters
    ----------
    values : str or sequence
    name(s) to set
    level : int, level name, or sequence of int/level names (default None)
    If the index is a MultiIndex (hierarchical), level(s) to set (None
    for all levels). Otherwise level must be None

    Raises
    ------
    TypeError if each name is not hashable.
    """
    if not is_list_like(values):
    raise ValueError('Names must be a list-like')
    if len(values) != 1:
    raise ValueError('Length of new names must be 1, got %d' %
    len(values))

    # GH 20527
    # All items in 'name' need to be hashable:
    for name in values:
    if not is_hashable(name):
    raise TypeError('{}.name must be a hashable type'
    .format(self.__class__.__name__))
    self.name = values[0]

    names = property(fset=_set_names, fget=_get_names)
    Expand All @@ -1339,9 +1364,9 @@ def set_names(self, names, level=None, inplace=False):
    Examples
    --------
    >>> Index([1, 2, 3, 4]).set_names('foo')
    Int64Index([1, 2, 3, 4], dtype='int64')
    Int64Index([1, 2, 3, 4], dtype='int64', name='foo')
    >>> Index([1, 2, 3, 4]).set_names(['foo'])
    Int64Index([1, 2, 3, 4], dtype='int64')
    Int64Index([1, 2, 3, 4], dtype='int64', name='foo')
    >>> idx = MultiIndex.from_tuples([(1, u'one'), (1, u'two'),
    (2, u'one'), (2, u'two')],
    names=['foo', 'bar'])
    Expand All @@ -1354,6 +1379,7 @@ def set_names(self, names, level=None, inplace=False):
    labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
    names=[u'baz', u'bar'])
    """

    if level is not None and self.nlevels == 1:
    raise ValueError('Level must be None for non-MultiIndex')

    Expand Down
    38 changes: 33 additions & 5 deletions pandas/core/indexes/multi.py
    Original file line number Diff line number Diff line change
    Expand Up @@ -16,6 +16,7 @@
    _ensure_platform_int,
    is_categorical_dtype,
    is_object_dtype,
    is_hashable,
    is_iterator,
    is_list_like,
    pandas_dtype,
    Expand Down Expand Up @@ -634,12 +635,29 @@ def _get_names(self):

    def _set_names(self, names, level=None, validate=True):
    """
    Set new names on index. Each name has to be a hashable type.

    Parameters
    ----------
    values : str or sequence
    name(s) to set
    level : int, level name, or sequence of int/level names (default None)
    If the index is a MultiIndex (hierarchical), level(s) to set (None
    for all levels). Otherwise level must be None
    validate : boolean, default True
    validate that the names match level lengths

    Raises
    ------
    TypeError if each name is not hashable.

    Notes
    -----
    sets names on levels. WARNING: mutates!

    Note that you generally want to set this *after* changing levels, so
    that it only acts on copies
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    looks good here. can you update the doc-string with the comment below

    """

    # GH 15110
    # Don't allow a single string for names in a MultiIndex
    if names is not None and not is_list_like(names):
    Expand All @@ -662,10 +680,20 @@ def _set_names(self, names, level=None, validate=True):

    # set the name
    for l, name in zip(level, names):
    if name is not None and name in used:
    raise ValueError('Duplicated level name: "{}", assigned to '
    'level {}, is already used for level '
    '{}.'.format(name, l, used[name]))
    if name is not None:

    # GH 20527
    # All items in 'names' need to be hashable:
    if not is_hashable(name):
    raise TypeError('{}.name must be a hashable type'
    .format(self.__class__.__name__))

    if name in used:
    raise ValueError(
    'Duplicated level name: "{}", assigned to '
    'level {}, is already used for level '
    '{}.'.format(name, l, used[name]))

    self.levels[l].rename(name, inplace=True)
    used[name] = l

    Expand Down
    4 changes: 2 additions & 2 deletions pandas/tests/frame/test_indexing.py
    Original file line number Diff line number Diff line change
    Expand Up @@ -125,12 +125,12 @@ def test_getitem_list(self):
    # tuples
    df = DataFrame(randn(8, 3),
    columns=Index([('foo', 'bar'), ('baz', 'qux'),
    ('peek', 'aboo')], name=['sth', 'sth2']))
    ('peek', 'aboo')], name=('sth', 'sth2')))

    result = df[[('foo', 'bar'), ('baz', 'qux')]]
    expected = df.iloc[:, :2]
    assert_frame_equal(result, expected)
    assert result.columns.names == ['sth', 'sth2']
    assert result.columns.names == ('sth', 'sth2')

    def test_getitem_callable(self):
    # GH 12533
    Expand Down
    18 changes: 18 additions & 0 deletions pandas/tests/indexes/test_base.py
    Original file line number Diff line number Diff line change
    Expand Up @@ -435,6 +435,24 @@ def test_constructor_empty(self):
    assert isinstance(empty, MultiIndex)
    assert not len(empty)

    def test_constructor_nonhashable_name(self, indices):
    # GH 20527

    if isinstance(indices, MultiIndex):
    pytest.skip("multiindex handled in test_multi.py")

    name = ['0']
    message = "Index.name must be a hashable type"
    tm.assert_raises_regex(TypeError, message, name=name)

    # With .rename()
    renamed = [['1']]
    tm.assert_raises_regex(TypeError, message,
    indices.rename, name=renamed)
    # With .set_names()
    tm.assert_raises_regex(TypeError, message,
    indices.set_names, names=renamed)

    def test_view_with_args(self):

    restricted = ['unicodeIndex', 'strIndex', 'catIndex', 'boolIndex',
    Expand Down
    23 changes: 21 additions & 2 deletions pandas/tests/indexes/test_multi.py
    Original file line number Diff line number Diff line change
    Expand Up @@ -615,8 +615,27 @@ def test_constructor_mismatched_label_levels(self):
    with tm.assert_raises_regex(ValueError, label_error):
    self.index.copy().set_labels([[0, 0, 0, 0], [0, 0]])

    @pytest.mark.parametrize('names', [['a', 'b', 'a'], [1, 1, 2],
    [1, 'a', 1]])
    def test_constructor_nonhashable_names(self):
    # GH 20527
    levels = [[1, 2], [u'one', u'two']]
    labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
    names = ((['foo'], ['bar']))
    message = "MultiIndex.name must be a hashable type"
    tm.assert_raises_regex(TypeError, message,
    MultiIndex, levels=levels,
    labels=labels, names=names)

    # With .rename()
    mi = MultiIndex(levels=[[1, 2], [u'one', u'two']],
    labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
    names=('foo', 'bar'))
    renamed = [['foor'], ['barr']]
    tm.assert_raises_regex(TypeError, message, mi.rename, names=renamed)
    # With .set_names()
    tm.assert_raises_regex(TypeError, message, mi.set_names, names=renamed)

    @pytest.mark.parametrize('names', [['a', 'b', 'a'], ['1', '1', '2'],
    ['1', 'a', '1']])
    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    @arminv Is there a reason that you changed those parametrize values to all strings? (I suppose by accident?)
    I am reworking the test in #21423, so will revert there if this was by accident

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    @jorisvandenbossche IIRC I changed it (in this commit) because the test was failing, but implementation changed a lot after that commit so I'm not sure if reverting this would cause a problem now

    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Seems to be passing there!

    def test_duplicate_level_names(self, names):
    # GH18872
    pytest.raises(ValueError, pd.MultiIndex.from_product,
    Expand Down
    0