-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Description
I find that .last()
does not perform as expected.
Example:
df = pd.DataFrame([[179293473,'2016-06-01 00:00:03.549745','http://www.dr.dk/nyheder/',39169523],[179293473,'2016-06-01 00:04:22.346018','https://www.information.dk/indland/2016/05/hvert-tredje-offer-naar-anmelde-voldtaegt-tide', 39125224],
[179773461, '2016-06-01 22:13:16.588146', 'https://www.google.dk', 31658124],
[179773461, '2016-06-01 22:14:04.059781', 'https://www.google.dk', 31658124],
[179773461, '2016-06-01 22:16:37.230587', np.nan, 31658124],
[179773461, '2016-06-01 22:23:09.847149', 'https://www.google.dk', 32718401],
[179773461, '2016-06-01 22:23:55.158929', np.nan, 32718401],
[179773461, '2016-06-01 22:27:00.857224', np.nan, 32718401]],
columns=['SessionID', 'PageTime', 'ReferrerURL', 'PageID'])
Problem description
When I run:
df.groupby('SessionID').last()
I get:
SessionID | PageTime | ReferrerURL | PageID |
---|---|---|---|
179293473 | 2016-06-01 00:04:22.346018 | https://www.information.dk/indland/2016/05/hve... | 39125224 |
179773461 | 2016-06-01 22:27:00.857224 | https://www.google.dk | 32718401 |
Expected Output
When, in fact, I was expecting the same result as obtained from:
df.groupby('SessionID').nth(-1)
SessionID | PageID | PageTime | ReferrerURL |
---|---|---|---|
179293473 | 39125224 | 2016-06-01 00:04:22.346018 | https://www.information.dk/indland/2016/05/hve... |
179773461 | 32718401 | 2016-06-01 22:27:00.857224 | NaN |
And while we are at .nth()
, why does it mix up my column order?
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.3
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None