Gh 36562 typeerror comparison not supported between float and str by ssche · Pull Request #37096 · pandas-dev/pandas · GitHub

Gh 36562 typeerror comparison not supported between float and str #37096

Merged
merged 23 commits on Nov 4, 2020
23 commits
1320ff1
Fixed #36562
ssche Oct 13, 2020
a1b9385
Merge remote-tracking branch 'upstream/master' into gh-36562-typeerro…
ssche Oct 13, 2020
1f76d21
Fixed flake issue
ssche Oct 13, 2020
7076841
fixed linting errors
ssche Oct 13, 2020
225675d
Addressed review comments
ssche Oct 14, 2020
2677166
Make `sort_tuples` last resort sorting strategy to allow faster sorte…
ssche Oct 16, 2020
6f476bc
Merge remote-tracking branch 'upstream/master' into gh-36562-typeerro…
ssche Oct 16, 2020
3688238
manually apply pandas black changes (from CI)
ssche Oct 16, 2020
aba429c
Merge remote-tracking branch 'upstream/master' into gh-36562-typeerro…
ssche Oct 16, 2020
8ae9279
Modified changelog message as per review comments
ssche Oct 21, 2020
dd5a38d
Merge remote-tracking branch 'upstream/master' into gh-36562-typeerro…
ssche Oct 21, 2020
7b7d6f8
Removed pd.xyz
ssche Oct 22, 2020
4226662
Merge remote-tracking branch 'upstream/master' into gh-36562-typeerro…
ssche Oct 22, 2020
8cbfa01
forgot `pd`
ssche Oct 22, 2020
6d71000
Merge remote-tracking branch 'upstream/master' into gh-36562-typeerro…
ssche Oct 26, 2020
92e1e33
Changed sorting algorithm by using expanded array
ssche Oct 31, 2020
3ada9ce
Merge remote-tracking branch 'upstream/master' into gh-36562-typeerro…
ssche Oct 31, 2020
95670f1
Add ticket ref to test case
ssche Oct 31, 2020
d9dcd22
sort import
ssche Oct 31, 2020
66c30c7
Address review comments
ssche Oct 31, 2020
db66528
Merge remote-tracking branch 'upstream/master' into gh-36562-typeerro…
ssche Oct 31, 2020
16ae4f4
Merge remote-tracking branch 'upstream/master' into gh-36562-typeerro…
ssche Nov 3, 2020
9d03dc3
Fixed: committed file was in merge state
ssche Nov 3, 2020
Changed sorting algorithm by using expanded array
* Extract column arrays and use pandas' internal functions to obtain index which sorts the array of tuples
* Add function annotations to document expected argument types of `sort_tuples()`
ssche committed Oct 31, 2020
commit 92e1e3384a1c1410bd59af90ca853aba1e09bdb9
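
The column-wise idea described in this commit message can be sketched with public pandas and NumPy calls only. This is not the code from the commit: `sort_tuples_sketch` is a hypothetical helper, `pd.factorize` and `np.lexsort` stand in for the internal `to_arrays` and `lexsort_indexer`, and the input is assumed to be a non-empty list of equal-length tuples of scalars.

```python
import numpy as np
import pandas as pd


def sort_tuples_sketch(tuples):
    """Hypothetical helper: sort equal-length tuples column by column, NaN last.

    Assumes a non-empty list of tuples whose elements are scalars.
    """
    keys = []
    # zip(*tuples) transposes the rows into one sequence per tuple position.
    for column in zip(*tuples):
        # factorize never compares NaN with a str: missing values get code -1
        # and only the remaining unique values are sorted.
        codes, uniques = pd.factorize(np.array(column, dtype=object), sort=True)
        # Move missing values behind every real value in this column.
        keys.append(np.where(codes == -1, len(uniques), codes))
    # np.lexsort treats the LAST key as the primary one, so reverse the list
    # to make the first tuple position the most significant.
    order = np.lexsort(keys[::-1])
    return [tuples[i] for i in order]


data = [("b", 2.0), (np.nan, 1.0), ("a", 3.0), ("a", np.nan)]
print(sort_tuples_sketch(data))
```

Sorting integer codes instead of the raw values is what avoids the `float` vs `str` comparison: each column is ranked on its own, and the ranks are then combined lexicographically.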
48 changes: 11 additions & 37 deletions pandas/core/algorithms.py
@@ -4,7 +4,6 @@
 """
 from __future__ import annotations

-import functools
 import operator
 from textwrap import dedent
 from typing import TYPE_CHECKING, Dict, Optional, Tuple, Union, cast
@@ -2071,42 +2070,17 @@ def sort_mixed(values):
         strs = np.sort(values[str_pos])
         return np.concatenate([nums, np.asarray(strs, dtype=object)])

-    def sort_tuples(values):
-        # sorts tuples with mixed values. can handle nan vs string comparisons.
-        def cmp_func(index_x, index_y):
-            x = values[index_x]
-            y = values[index_y]
-            # shortcut loop in case both tuples are the same
-            if x == y:
-                return 0
-            # lexicographic sorting
-            for i in range(max(len(x), len(y))):
-                # check if the tuples have different lengths (shorter tuples
-                # first)
-                if i >= len(x):
-                    return -1
-                if i >= len(y):
-                    return +1
-                x_is_na = isna(x[i])
-                y_is_na = isna(y[i])
-                # values are the same -> resolve tie with next element
-                if (x_is_na and y_is_na) or (x[i] == y[i]):
-                    continue
-                # check for nan values (sort nan to the end)
-                if x_is_na and not y_is_na:
-                    return +1
-                if not x_is_na and y_is_na:
-                    return -1
-                # normal greater/less than comparison
-                if x[i] < y[i]:
-                    return -1
-                return +1
-            # both values are the same (should already have been caught)
-            return 0
-
-        ixs = np.arange(len(values))
-        ixs = sorted(ixs, key=functools.cmp_to_key(cmp_func))
-        return values[ixs]
+    def sort_tuples(values: np.ndarray[tuple]):
+        # convert array of tuples (1d) to array or array (2d).
+        # we need to keep the columns separately as they contain different
+        # types and nans (can't use `np.sort` as it may fail when str and nan
+        # are mixed in a column as types cannot be compared).
+        from pandas.core.sorting import lexsort_indexer
+        from pandas.core.internals.construction import to_arrays
+
+        arrays, _ = to_arrays(values, None)
+        indexer = lexsort_indexer(arrays, orders=True)
+        return values[indexer]

     sorter = None

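
The comment in the new `sort_tuples` notes that `np.sort` may fail when a column mixes `str` and NaN. A minimal illustration of that failure, and of handling the missing values separately, might look like the following. Both helpers (`np_sort_fails`, `sort_column_nan_last`) are hypothetical names introduced here for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def np_sort_fails(column: np.ndarray) -> bool:
    """Hypothetical helper: True if plain np.sort raises TypeError."""
    try:
        np.sort(column)
    except TypeError:
        # Raised because a float (NaN) and a str cannot be ordered with `<`.
        return True
    return False


def sort_column_nan_last(column: np.ndarray) -> np.ndarray:
    """Hypothetical helper: sort non-missing entries, append missing ones."""
    missing = pd.isna(column)
    # Only the non-missing entries are compared with each other.
    present = np.sort(column[~missing])
    return np.concatenate([present, column[missing]])


mixed = np.array(["b", np.nan, "a"], dtype=object)
print(np_sort_fails(mixed))
print(sort_column_nan_last(mixed))
```

This only covers a single column; ordering whole tuples additionally needs the per-column results combined lexicographically, which is the role `lexsort_indexer` plays in the diff above.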