Checking for length of categories before doing string conversion. fix… by michaelaye · Pull Request #11306 · pandas-dev/pandas · GitHub

Checking for length of categories before doing string conversion. fix… #11306

Closed
michaelaye wants to merge 4 commits

Conversation

michaelaye
Contributor

closes #11305

@jreback added the Performance (Memory or execution speed performance) and Categorical (Categorical Data Type) labels Oct 13, 2015
@jreback added this to the 0.17.1 milestone Oct 13, 2015
@jreback
Contributor
jreback commented Oct 13, 2015
  • can you add a whatsnew note?
  • pls add an asv benchmark (like your sample, but use only 1000 categories)

@jreback added the Output-Formatting (__repr__ of pandas objects, to_string) label Oct 13, 2015
@jreback
Contributor
jreback commented Oct 15, 2015

can you update

@michaelaye
Contributor Author

Coming up. Had to read up on where/how to do that.

@michaelaye
Contributor Author

The categorical asv benchmark file violates PEP8 E302 (two blank lines between functions). Should I correct it? I searched but did not find a list of PEP8 errors that pandas has decided to ignore.

        self.data = df[df.C == '20']

    def time_rendering(self):
        str(data.C)
Member

data -> self.data ?

Contributor Author

Yeah, sorry. I can't run asv on my machine. First it complained about a missing config file when I tried to run only the categorical benchmark. Apparently, when I don't restrict it with a -b flag, it creates a new environment, but that seems to be incompatible with conda: it now creates a py2.7 environment even though it is being run from a conda python3.4 environment.

Contributor Author

Or maybe that's meant to happen, since it tests every environment that is set up to be tested? I don't know, I have never used asv before.

@jorisvandenbossche
Member

@michaelaye that's ok for the PEP8 changes

@@ -55,7 +55,7 @@ Bug Fixes

- Bug in ``.to_latex()`` output broken when the index has a name (:issue: `10660`)
- Bug in ``HDFStore.append`` with strings whose encoded length exceded the max unencoded length (:issue:`11234`)

- Performance bug in ``Categorical._repr_categories`` was rendering string before chopping them for display (:issue: `11305`)
Contributor

move to Performance section

@michaelaye
Contributor Author

I was able to start an asv run, but for some reason, it only runs a python2.7 env, don't know why:

$ asv continuous master HEAD -b categorical
· Creating environments
· Discovering benchmarks
·· Uninstalling from py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt
·· Building for py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt...................................
·· Installing into py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt..
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[  0.00%] · For pandas commit hash cdff5bce:
[  0.00%] ·· Building for py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt.........................................
[  0.00%] ·· Benchmarking py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt
[  8.33%] ··· Running categoricals.categorical_constructor.time_fastpath                                4.85ms
[ 16.67%] ··· Running categoricals.categorical_constructor.time_regular_constructor                   226.53ms
[ 25.00%] ··· Running categoricals.categorical_rendering.time_rendering                                 2.31ms
[ 33.33%] ··· Running categoricals.categorical_value_counts.time_value_counts                          18.63ms
[ 41.67%] ··· Running categoricals.categorical_value_counts.time_value_counts_dropna                   24.74ms
[ 50.00%] ··· Running categoricals.concat_categorical.time_concat_categorical                          46.91ms
[ 50.00%] · For pandas commit hash c2aa6a23:
[ 50.00%] ·· Building for py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt......................................
[ 50.00%] ·· Benchmarking py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlwt
[ 58.33%] ··· Running categoricals.categorical_constructor.time_fastpath                                4.93ms
[ 66.67%] ··· Running categoricals.categorical_constructor.time_regular_constructor                   210.66ms
[ 75.00%] ··· Running categoricals.categorical_rendering.time_rendering                                17.63ms
[ 83.33%] ··· Running categoricals.categorical_value_counts.time_value_counts                          13.60ms
[ 91.67%] ··· Running categoricals.categorical_value_counts.time_value_counts_dropna                   17.59ms
[100.00%] ··· Running categoricals.concat_categorical.time_concat_categorical                          50.98ms
    before     after       ratio
  [c2aa6a23] [cdff5bce]
-   17.63ms     2.31ms      0.13  categoricals.categorical_rendering.time_rendering
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
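
For reference, the ratio column in the asv summary above is the "after" time divided by the "before" time. A quick check of that arithmetic, using only the two time_rendering numbers printed in the run (the variable names are just for illustration):

```python
# Timings for categoricals.categorical_rendering.time_rendering,
# copied from the asv output above.
before_ms = 17.63  # commit c2aa6a23
after_ms = 2.31    # commit cdff5bce

ratio = after_ms / before_ms    # value shown in the "ratio" column
speedup = before_ms / after_ms  # the same change, read as a speedup factor

print(f"ratio:   {ratio:.2f}")
print(f"speedup: {speedup:.1f}x")
```

A ratio below 1 means the benchmark got faster on the second commit; here the rendering benchmark takes roughly an eighth of the original time.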

@@ -50,6 +50,8 @@ Performance Improvements

.. _whatsnew_0171.bug_fixes:

- Performance bug in ``Categorical._repr_categories`` was rendering string before chopping them for display (:issue: `11305`)
Contributor

move to Performance section

say Performance issue in rendering a large number of categories when printing a Categorical or Series of category dtype
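
To illustrate the kind of issue this note describes, here is a minimal pure-Python sketch. It is not the actual pandas code. It assumes a display limit of 10 items (MAX_ITEMS, a hypothetical constant), and the function names are made up for illustration. The first function converts every category to a string and only then chops the list for display; the second checks the length first and converts only the items that will be shown.

```python
MAX_ITEMS = 10  # hypothetical display limit, assumed for this sketch


def render_slow(categories):
    """Convert every category to a string, then chop for display."""
    strs = [str(c) for c in categories]  # one conversion per category
    if len(strs) > MAX_ITEMS:
        num = MAX_ITEMS // 2
        strs = strs[:num] + ["..."] + strs[-num:]
    return ", ".join(strs)


def render_fast(categories):
    """Check the length first, then convert only the displayed items."""
    if len(categories) > MAX_ITEMS:
        num = MAX_ITEMS // 2
        head = [str(c) for c in categories[:num]]
        tail = [str(c) for c in categories[-num:]]
        strs = head + ["..."] + tail
    else:
        strs = [str(c) for c in categories]
    return ", ".join(strs)


categories = list(range(100000))
# Both variants produce the same text; only the amount of work differs.
assert render_slow(categories) == render_fast(categories)
print(render_fast(categories))
```

With many categories the second variant performs at most MAX_ITEMS string conversions, regardless of how long the list is.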

@jreback
Contributor
jreback commented Oct 19, 2015

comments, then pls squash

@jreback
Contributor
jreback commented Nov 13, 2015

merged via 4777800

thanks!

@jreback closed this Nov 13, 2015

Successfully merging this pull request may close these issues.

PERF: rendering of large number of categories