PERF: optimize memory usage for to_hdf by jreback · Pull Request #9648 · pandas-dev/pandas · GitHub

Conversation

jreback (Contributor) commented Mar 13, 2015

from here

Reduce the memory usage needed by to_hdf:

  • astype was always copying, even when a copy was not needed
  • values were ravelled and then reshaped, allocating intermediates
  • a new chunked buffer was allocated on every write; the same buffer is now re-used (see the sketch after this list)
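As a minimal sketch of the first and third points (hypothetical helper names, not the actual pandas internals): allocate one chunk buffer up front and fill it in place, and pass copy=False to astype so no copy is made when the dtype already matches.

import numpy as np

def write_chunks(values, chunksize, write):
    # allocate the chunk buffer once and re-use it for every chunk;
    # this keeps peak memory flat no matter how many chunks are written
    buf = np.empty((chunksize,) + values.shape[1:], dtype=values.dtype)
    for start in range(0, len(values), chunksize):
        n = min(start + chunksize, len(values)) - start
        buf[:n] = values[start:start + n]   # fill in place, no new allocation
        write(buf[:n])                      # hand the writer a view

# astype copies by default; copy=False returns the input unchanged
# when it already has the requested dtype
arr = np.zeros(10, dtype='float64')
assert arr.astype('float64', copy=False) is arr

# usage: write_chunks(np.random.rand(10, 3), 4, lambda chunk: None)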
In [1]: df = pd.DataFrame(np.random.rand(1000000, 500))

In [2]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000000 entries, 0 to 999999
Columns: 500 entries, 0 to 499
dtypes: float64(500)
memory usage: 3.7 GB

Previously

In [3]: %memit -r 1 df.to_hdf('test.h5','df',format='table',mode='w')
peak memory: 11029.49 MiB, increment: 7130.57 MiB

With PR

In [2]: %memit -r 1 df.to_hdf('test.h5','df',format='table',mode='w')
peak memory: 4669.21 MiB, increment: 794.57 MiB
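(%memit above is the IPython magic from the memory_profiler package; assuming it is installed, the runs can be reproduced with:)

%load_ext memory_profiler
%memit -r 1 df.to_hdf('test.h5', 'df', format='table', mode='w')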

@jreback jreback added Performance Memory or execution speed performance IO HDF5 read_hdf, HDFStore labels Mar 13, 2015
@jreback jreback added this to the 0.16.0 milestone Mar 13, 2015
@jreback jreback force-pushed the pytables_memory branch 6 times, most recently from d0f8583 to 21e727d on Mar 15, 2015
jreback added a commit that referenced this pull request Mar 16, 2015
PERF: optimize memory usage for to_hdf
@jreback jreback merged commit 269af25 into pandas-dev:master Mar 16, 2015
bwillers (Contributor) commented

If you happen to find yourself in New York, I'm buying you a beer for this fix.

jreback (Contributor, Author) commented Mar 28, 2015

Hahha, I figured I broke it, I should fix it.

In NYC, so anytime!

tomanizer commented

Thanks a lot for fixing this! It helps a lot.

sagol commented Apr 22, 2019

The bug is back:

df = pd.DataFrame(np.random.rand(1000000,500))
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Columns: 500 entries, 0 to 499
dtypes: float64(500)
memory usage: 3.7 GB

%memit -r 1 df.to_hdf('test.h5','df',format='table',mode='w')
peak memory: 7934.20 MiB, increment: 3823.80 MiB

pd.__version__
'0.24.2'

With a more complex structure, everything is much worse.

data_ifa.info()
<class 'pandas.core.frame.DataFrame'>
Index: 100000 entries, b88d3b87-3432-43cc-8219-f45d97389d8f to eb705297-94e8-4ccf-a910-5f3b9734d572
Data columns (total 2 columns):
bundles        100000 non-null object
bundles_len    100000 non-null int64
dtypes: int64(1), object(1)
memory usage: 2.3+ MB

%memit -r 1 data_ifa.to_hdf(full_file_name_hd5, key='data_ifa', encoding='utf-8', complevel=9, mode='w', format='table')
peak memory: 22106.07 MiB, increment: 21324.53 MiB
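Until that is tracked down, one way to cap peak memory is to append the frame to the table in row chunks through HDFStore. This is a sketch, not a fix proposed in this thread; to_hdf_chunked and the 50,000-row chunk size are illustrative choices.

import pandas as pd

def to_hdf_chunked(df, path, key, chunksize=50000, **kwargs):
    # write row slices one at a time so only a single chunk's worth
    # of converted data is materialized at any point
    with pd.HDFStore(path, mode='w') as store:
        for start in range(0, len(df), chunksize):
            store.append(key, df.iloc[start:start + chunksize],
                         format='table', **kwargs)

# usage: to_hdf_chunked(df, 'test.h5', 'df')
# note: variable-width object columns may need min_itemsize passed
# through kwargs so later chunks fit the table's string columns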
