8000 DOC: Update performance comparison section of io docs by WuraolaOyewusi · Pull Request #28890 · pandas-dev/pandas · GitHub

Merged: 26 commits, Nov 9, 2019

Changes from 1 commit (of 26)
4e85c6d
Merge pull request #1 from pandas-dev/master
WuraolaOyewusi Aug 21, 2019
44df2ee
Merge pull request #2 from pandas-dev/master
WuraolaOyewusi Aug 22, 2019
b887983
Merge pull request #3 from pandas-dev/master
WuraolaOyewusi Aug 23, 2019
9554ea6
Merge pull request #4 from pandas-dev/master
WuraolaOyewusi Sep 17, 2019
fd27a6f
Merge pull request #5 from pandas-dev/master
WuraolaOyewusi Sep 24, 2019
3425a0a
Merge pull request #6 from pandas-dev/master
WuraolaOyewusi Oct 2, 2019
e53bce0
Update io.rst
WuraolaOyewusi Oct 10, 2019
76ccef3
Update io.rst
WuraolaOyewusi Oct 10, 2019
9672526
Update io.rst
WuraolaOyewusi Oct 10, 2019
d2c1e20
Update io.rst
WuraolaOyewusi Oct 10, 2019
709d571
Update io.rst
WuraolaOyewusi Oct 10, 2019
ddd39f6
Update io.rst
WuraolaOyewusi Oct 10, 2019
26b5db1
Update io.rst
WuraolaOyewusi Oct 10, 2019
cf85f95
Update io.rst
WuraolaOyewusi Oct 10, 2019
3d71d40
Update io.rst
WuraolaOyewusi Oct 10, 2019
8c8ed93
Update io.rst
WuraolaOyewusi Oct 10, 2019
3e62c8f
Update io.rst
WuraolaOyewusi Oct 11, 2019
1af539c
Update io.rst
WuraolaOyewusi Oct 11, 2019
ce51d5e
Update io.rst
WuraolaOyewusi Oct 11, 2019
524c7e0
Update io.rst
WuraolaOyewusi Oct 12, 2019
2b77c5d
Update io.rst
WuraolaOyewusi Oct 12, 2019
2224738
restore indentation
jorisvandenbossche Oct 21, 2019
df377c1
fixup
jorisvandenbossche Oct 21, 2019
e3eba95
Update io.rst
WuraolaOyewusi Nov 8, 2019
3aa5dea
Update io.rst
WuraolaOyewusi Nov 8, 2019
0af75a0
Merge branch 'master' into Update-Performance-Comparison-section-of-I…
WuraolaOyewusi Nov 8, 2019
Update io.rst
WuraolaOyewusi authored Oct 10, 2019
commit 709d5716b016d6d24ecdb5f3bb5fe4d2b97f86ea
112 changes: 55 additions & 57 deletions doc/source/user_guide/io.rst
@@ -5593,97 +5593,95 @@ Given the next test set:

.. code-block:: python

from numpy.random import randn

Contributor (review comment): Can you change this example to use our formatting? Meaning, don't import like this; rather use np.random.randn directly, along with np.random.seed.

Contributor Author: Done

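The style the reviewer asks for (adopted in a later commit of this PR) might look roughly like the following sketch; the seed value 42 is an arbitrary choice for illustration, not taken from the docs:

```python
import numpy as np
import pandas as pd

# Seed the generator so the benchmark data is reproducible.
np.random.seed(42)

sz = 1000000
df = pd.DataFrame({'A': np.random.randn(sz), 'B': [1] * sz})
print(df.shape)  # (1000000, 2)
```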

sz = 1000000
df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz})


def test_sql_write(df):
    if os.path.exists('test.sql'):
        os.remove('test.sql')
    sql_db = sqlite3.connect('test.sql')
    df.to_sql(name='test_table', con=sql_db)
    sql_db.close()


def test_sql_read():
    sql_db = sqlite3.connect('test.sql')
    pd.read_sql_query("select * from test_table", sql_db)
    sql_db.close()


def test_hdf_fixed_write(df):
    df.to_hdf('test_fixed.hdf', 'test', mode='w')


def test_hdf_fixed_read():
    pd.read_hdf('test_fixed.hdf', 'test')


def test_hdf_fixed_write_compress(df):
    df.to_hdf('test_fixed_compress.hdf', 'test', mode='w', complib='blosc')


def test_hdf_fixed_read_compress():
    pd.read_hdf('test_fixed_compress.hdf', 'test')


def test_hdf_table_write(df):
    df.to_hdf('test_table.hdf', 'test', mode='w', format='table')


def test_hdf_table_read():
    pd.read_hdf('test_table.hdf', 'test')


def test_hdf_table_write_compress(df):
    df.to_hdf('test_table_compress.hdf', 'test', mode='w',
              complib='blosc', format='table')


def test_hdf_table_read_compress():
    pd.read_hdf('test_table_compress.hdf', 'test')


def test_csv_write(df):
    df.to_csv('test.csv', mode='w')


def test_csv_read():
    pd.read_csv('test.csv', index_col=0)


def test_feather_write(df):
    df.to_feather('test.feather')


def test_feather_read():
    pd.read_feather('test.feather')


def test_pickle_write(df):
    df.to_pickle('test.pkl')


def test_pickle_read():
    pd.read_pickle('test.pkl')


def test_pickle_write_compress(df):
    df.to_pickle('test.pkl.compress', compression='xz')


def test_pickle_read_compress():
    pd.read_pickle('test.pkl.compress', compression='xz')


def test_parquet_write(df):
    df.to_parquet('test.parquet')


def test_parquet_read():
    pd.read_parquet('test.parquet')


When writing, the top three functions in terms of speed are ``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``.
@@ -5704,22 +5702,22 @@

In [8]: %timeit test_hdf_table_write_compress(df)
448 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [9]: %timeit test_csv_write(df)
3.66 s ± 26.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [10]: %timeit test_feather_write(df)
9.75 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [11]: %timeit test_pickle_write(df)
30.1 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [12]: %timeit test_pickle_write_compress(df)
4.29 s ± 15.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [13]: %timeit test_parquet_write(df)
67.6 ms ± 706 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and
``test_hdf_fixed_read``.