8000 BUG: Fix replace for different dtype equal value by dsaxton · Pull Request #34878 · pandas-dev/pandas · GitHub
[go: up one dir, main page]

Skip to content

BUG: Fix replace for different dtype equal value #34878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Replace object block
  • Loading branch information
dsaxton committed Jun 20, 2020
commit f45017f401d237c95dc39c466a7cfef0decce900
10 changes: 7 additions & 3 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -688,9 +688,13 @@ def replace(
# retry
if not self._can_hold_element(to_replace):
if not isinstance(to_replace, list):
if inplace:
return [self]
return [self.copy()]
return self.astype(object).replace(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only want to do this if these are different types, not generally. please show asv's for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the if not self._can_hold_element(to_replace) check is making sure we have different types.

Not seeing any performance regressions from the change (at least in the replace benchmarks):

(pandas-dev) > asv continuous -f 1.1 upstream/master replace-diff-dtype -b ^replace
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.
·· Installing 1e014777 <replace-diff-dtype> into conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[  0.00%] · For pandas commit 9114b4bc <master> (round 1/2):
[  0.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[  0.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[  4.17%] ··· Running (replace.Convert.time_replace--)....
[ 20.83%] ··· Running (replace.ReplaceList.time_replace_list--)..
[ 25.00%] · For pandas commit 1e014777 <replace-diff-dtype> (round 1/2):
[ 25.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 25.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 29.17%] ··· Running (replace.Convert.time_replace--)....
[ 45.83%] ··· Running (replace.ReplaceList.time_replace_list--)..
[ 50.00%] · For pandas commit 1e014777 <replace-diff-dtype> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 54.17%] ··· replace.Convert.time_replace                                                                                                                                                                 ok
[ 54.17%] ··· ============= ============ ============
              --                   replace_data      
              ------------- -------------------------
               constructor   Timestamp    Timedelta  
              ============= ============ ============
                DataFrame     98.6±3ms    95.2±0.5ms 
                  Series     74.0±0.3ms    74.6±1ms  
              ============= ============ ============

[ 58.33%] ··· replace.FillNa.time_fillna                                                                                                                                                                   ok
[ 58.33%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.04±0.01ms 
                False     3.01±0.3ms 
              ========= =============

[ 62.50%] ··· replace.FillNa.time_replace                                                                                                                                                                  ok
[ 62.50%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.10±0.02ms 
                False    4.02±0.07ms 
              ========= =============

[ 66.67%] ··· replace.ReplaceDict.time_replace_series                                                                                                                                                      ok
[ 66.67%] ··· ========= ============
               inplace              
              --------- ------------
                 True    4.27±0.02s 
                False    4.24±0.04s 
              ========= ============

[ 70.83%] ··· replace.ReplaceList.time_replace_list                                                                                                                                                        ok
[ 70.83%] ··· ========= ============
               inplace              
              --------- ------------
                 True    51.6±0.6μs 
                False     405±5ms   
              ========= ============

[ 75.00%] ··· replace.ReplaceList.time_replace_list_one_match                                                                                                                                              ok
[ 75.00%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    89.2±3ms 
                False    467±4ms  
              ========= ==========

[ 75.00%] · For pandas commit 9114b4bc <master> (round 2/2):
[ 75.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 75.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 79.17%] ··· replace.Convert.time_replace                                                                                                                                                                 ok
[ 79.17%] ··· ============= =========== ===========
              --                  replace_data     
              ------------- -----------------------
               constructor   Timestamp   Timedelta 
              ============= =========== ===========
                DataFrame     102±4ms     103±3ms  
                  Series      80.0±4ms    83.1±4ms 
              ============= =========== ===========

[ 83.33%] ··· replace.FillNa.time_fillna                                                                                                                                                                   ok
[ 83.33%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.10±0.04ms 
                False     3.04±0.1ms 
              ========= =============

[ 87.50%] ··· replace.FillNa.time_replace                                                                                                                                                                  ok
[ 87.50%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.20±0.04ms 
                False     4.26±0.2ms 
              ========= =============

[ 91.67%] ··· replace.ReplaceDict.time_replace_series                                                                                                                                                      ok
[ 91.67%] ··· ========= ============
               inplace              
              --------- ------------
                 True    4.26±0.01s 
                False    4.24±0.01s 
              ========= ============

[ 95.83%] ··· replace.ReplaceList.time_replace_list                                                                                                                                                        ok
[ 95.83%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    49.6±1μs 
                False    412±6ms  
              ========= ==========

[100.00%] ··· replace.ReplaceList.time_replace_list_one_match                                                                                                                                              ok
[100.00%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    89.3±2ms 
                False    464±30ms 
              ========= ==========


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

to_replace=to_replace,
value=value,
inplace=inplace,
regex=regex,
convert=convert,
)

to_replace = [x for x in to_replace if self._can_hold_element(x)]
if not len(to_replace):
Expand Down
0