Checking for length of categories before doing string conversion. fix… #11306

michaelaye · 2015-10-13T02:04:54Z

closes #11305

pandas-dev#11305

jreback · 2015-10-13T11:25:28Z

Can you add a whatsnew note?
Please add an asv benchmark (like your sample, but using only 1000 categories).

jreback · 2015-10-15T22:25:37Z

Can you update?

michaelaye · 2015-10-17T22:16:07Z

Coming up. Had to read up where/how to do that.

michaelaye · 2015-10-18T02:52:36Z

The categorical asv benchmark file violates PEP8 E302 (two blank lines between functions). Should I correct that? I searched but did not find a list of the PEP8 errors that pandas has decided to ignore.

jorisvandenbossche · 2015-10-18T09:32:57Z

asv_bench/benchmarks/categoricals.py

+        self.data = df[df.C == '20']
+
+    def time_rendering(self):
+        str(data.C)


data -> self.data ?

Yeah, sorry. I can't run asv on my machine. First it complained about a missing config file when I tried to run only the categorical benchmark. Apparently, when I don't restrict it with a -b flag, it creates a new environment, but that seems to be incompatible with conda: it now creates a py2.7 environment while being run from a conda python3.4 environment.

Or maybe that's meant to happen, since it tests every environment it is configured to test? I don't know; I have never used asv before.
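For reference, with the `data` -> `self.data` change applied, the benchmark might look roughly like the sketch below. Only the last line of `setup` and the `time_rendering` method come from the diff above, and the class name is taken from the asv output in this thread; how `df` is built is an assumption (1000 string categories, as requested earlier), not the code from the PR.

```python
import pandas as pd


class categorical_rendering(object):
    # asv calls setup() before timing; its cost is not included in the result.
    def setup(self):
        # Assumption: 1000 distinct string categories. The real setup code
        # is not shown in the review diff.
        n = 1000
        values = [str(i) for i in range(n)]
        df = pd.DataFrame({"C": pd.Series(values, dtype="category")})
        # Filtering keeps all categories even though a single row remains.
        self.data = df[df.C == '20']

    def time_rendering(self):
        # Use the attribute stored in setup(); a bare `data` raises NameError.
        str(self.data.C)
```

Storing the frame on `self` matters because asv runs `setup` and the `time_*` method as separate calls on the same instance, so local variables from `setup` are gone by the time the benchmark body runs.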

jorisvandenbossche · 2015-10-18T09:34:43Z

@michaelaye that's ok for the PEP8 changes

jreback · 2015-10-18T13:45:43Z

doc/source/whatsnew/v0.17.1.txt

@@ -55,7 +55,7 @@ Bug Fixes

 - Bug in ``.to_latex()`` output broken when the index has a name (:issue: `10660`)
 - Bug in ``HDFStore.append`` with strings whose encoded length exceded the max unencoded length (:issue:`11234`)
-
+- Performance bug in ``Categorical._repr_categories`` was rendering string before chopping them for display (:issue: `11305`)


move to Performance section

michaelaye · 2015-10-18T21:47:50Z

I was able to start an asv run, but for some reason it only runs a python2.7 env; I don't know why:

$ asv continuous master HEAD -b categorical
· Creating environments
· Discovering benchmarks
·· Uninstalling from py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt
·· Building for py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt...................................
·· Installing into py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt..
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[  0.00%] · For pandas commit hash cdff5bce:
[  0.00%] ·· Building for py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt.........................................
[  0.00%] ·· Benchmarking py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt
[  8.33%] ··· Running categoricals.categorical_constructor.time_fastpath                                4.85ms
[ 16.67%] ··· Running categoricals.categorical_constructor.time_regular_constructor                   226.53ms
[ 25.00%] ··· Running categoricals.categorical_rendering.time_rendering                                 2.31ms
[ 33.33%] ··· Running categoricals.categorical_value_counts.time_value_counts                          18.63ms
[ 41.67%] ··· Running categoricals.categorical_value_counts.time_value_counts_dropna                   24.74ms
[ 50.00%] ··· Running categoricals.concat_categorical.time_concat_categorical                          46.91ms
[ 50.00%] · For pandas commit hash c2aa6a23:
[ 50.00%] ·· Building for py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt......................................
[ 50.00%] ·· Benchmarking py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt
[ 58.33%] ··· Running categoricals.categorical_constructor.time_fastpath                                4.93ms
[ 66.67%] ··· Running categoricals.categorical_constructor.time_regular_constructor                   210.66ms
[ 75.00%] ··· Running categoricals.categorical_rendering.time_rendering                                17.63ms
[ 83.33%] ··· Running categoricals.categorical_value_counts.time_value_counts                          13.60ms
[ 91.67%] ··· Running categoricals.categorical_value_counts.time_value_counts_dropna                   17.59ms
[100.00%] ··· Running categoricals.concat_categorical.time_concat_categorical                          50.98ms

    before     after       ratio
  [c2aa6a23] [cdff5bce]
-   17.63ms     2.31ms      0.13  categoricals.categorical_rendering.time_rendering
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

jreback · 2015-10-19T21:40:06Z

doc/source/whatsnew/v0.17.1.txt

@@ -50,6 +50,8 @@ Performance Improvements

 .. _whatsnew_0171.bug_fixes:

+- Performance bug in ``Categorical._repr_categories`` was rendering string before chopping them for display (:issue: `11305`)


move to Performance section

Say: "Performance issue in rendering a large number of categories when printing a Categorical or Series of category dtype".
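The idea behind the fix, checking how many categories there are before converting any of them to strings, can be illustrated outside pandas. The following is a minimal sketch, not the pandas implementation: `render_categories` and `max_items` are hypothetical names introduced here for illustration.

```python
def render_categories(categories, max_items=10):
    """Render a list of categories, converting only the displayed items."""
    n = len(categories)
    if n > max_items:
        half = max_items // 2
        # Slice first, so at most `max_items` values are ever converted.
        head = [repr(c) for c in categories[:half]]
        tail = [repr(c) for c in categories[n - half:]]
        shown = head + ["..."] + tail
    else:
        shown = [repr(c) for c in categories]
    return "[" + ", ".join(shown) + "]"


print(render_categories(list(range(1000)), max_items=4))
# prints: [0, 1, ..., 998, 999]
```

Converting every element and truncating afterwards does work proportional to the number of categories, while slicing first keeps the conversion cost bounded by `max_items`, which is consistent with the drop in rendering time reported in the asv run above.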

jreback · 2015-10-19T21:40:30Z

A few comments; then please squash.

jreback · 2015-11-13T15:13:19Z

merged via 4777800

thanks!

Checking for length of categories before doing string conversion. fixes

52fa9fa

pandas-dev#11305

jreback added the Performance (Memory or execution speed performance) and Categorical (Categorical Data Type) labels Oct 13, 2015

jreback added this to the 0.17.1 milestone Oct 13, 2015

jreback added the Output-Formatting (__repr__ of pandas objects, to_string) label Oct 13, 2015

adding whatsnew entry and asv benchmark test.

b63b462

jorisvandenbossche reviewed Oct 18, 2015

jreback reviewed Oct 18, 2015

michaelaye added 2 commits October 18, 2015 15:25

moving whatsnew entry to performance. fixing broken asv test

cdff5bc

changing asv goal time to 3 ms, as it failed to meet goal on a trial run

8f2ca4a

jreback reviewed Oct 19, 2015

jreback closed this Nov 13, 2015
