API: read_excel signature #11198

chris-b1 · 2015-09-27T15:46:20Z

Replace the **kwds in read_excel with the actual list of supported keyword args. This doesn't
change any functionality, just nicer for interactive use. Also a bit of clarification on the thousands
arg in the docstring.

Additionally, chunksize was a parameter in the ExcelFile.parse signature, but it didn't do anything (xref #8011). I removed it and now raise NotImplementedError if it is passed, which is potentially a breaking change.
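To illustrate why spelling out the keywords is nicer for interactive use, here is a minimal sketch using two hypothetical functions, `read_sheet_kwds` and `read_sheet` (not the actual pandas code). `inspect.signature`, which `help()` and IPython's `?` rely on, can only show named parameters, while a dropped option like `chunksize` can still be caught in `**kwds` and rejected with `NotImplementedError` instead of being silently ignored.

```python
import inspect


def read_sheet_kwds(io, **kwds):
    """Hypothetical 'before' version: every option is hidden in **kwds."""
    return io, kwds


def read_sheet(io, sheetname=0, header=0, index_col=None,
               na_values=None, thousands=None, **kwds):
    """Hypothetical 'after' version with the supported options spelled out."""
    # chunksize used to be accepted but ignored; now it fails loudly.
    if "chunksize" in kwds:
        raise NotImplementedError("chunksize keyword is not implemented for Excel")
    # A real reader would parse the sheet here; the sketch just echoes options.
    return dict(io=io, sheetname=sheetname, header=header,
                index_col=index_col, na_values=na_values,
                thousands=thousands, **kwds)


print(inspect.signature(read_sheet_kwds))  # (io, **kwds)
print(inspect.signature(read_sheet))
# (io, sheetname=0, header=0, index_col=None, na_values=None, thousands=None, **kwds)
```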

jreback · 2015-09-27T16:36:06Z

pandas/io/excel.py

@@ -156,11 +159,18 @@ def get_writer(engine_name):
        Acceptable values are None or xlrd"""

 @Appender(excel_doc_common % read_excel_kwargs)


Do we still need the excel_doc_common?

Yes - ExcelFile.parse and read_excel both use it.
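The diff fills a shared template (`excel_doc_common % read_excel_kwargs`) and attaches it with an `Appender` decorator. The sketch below shows how one template can serve two callables; `append_doc`, `read_sheet`, `Workbook` and the template text are hypothetical stand-ins, not the pandas implementation.

```python
# Hypothetical shared template; "%s" marks where caller-specific text goes.
_doc_common = """Read one sheet of a workbook.

Parameters
----------
%s
sheetname : str or int, default 0
    Name or position of the sheet to read.
"""

_read_io_doc = """io : str or Workbook
    Path to the file, or an already-opened workbook."""


def append_doc(text):
    """Minimal stand-in for an Appender-style decorator."""
    def decorate(func):
        # Append the filled-in template to whatever docstring already exists.
        func.__doc__ = (func.__doc__ or "") + text
        return func
    return decorate


@append_doc(_doc_common % _read_io_doc)
def read_sheet(io, sheetname=0):
    return io, sheetname


class Workbook:
    # The method variant gets the same template without the `io` entry.
    @append_doc(_doc_common % "")
    def parse(self, sheetname=0):
        return self, sheetname
```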

Just thinking, maybe the api should be changed to match the to_excel semantics where an ExcelFile object could be passed to read_excel (and deprecate ExcelFile.parse).

Sure, read_excel should also be able to take an ExcelFile object. Go ahead and add that.
Yeah, .parse existed before read_excel. Go ahead and deprecate it as well (but obviously don't change how it behaves for now), and point people to read_excel. You'll need to change the docs as well.
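Accepting either a path or an already-opened ExcelFile usually comes down to a single `isinstance` check at the top of the reader. A minimal sketch, assuming a hypothetical `Workbook` class standing in for ExcelFile (with hard-coded sheets instead of real xlrd parsing) and a hypothetical `read_sheet` standing in for read_excel:

```python
class Workbook:
    """Hypothetical stand-in for ExcelFile: opens and parses a file once."""

    def __init__(self, path):
        self.path = path
        # A real implementation would load the workbook here (e.g. via xlrd).
        self.sheets = {"Sheet1": [[1, 2]], "Sheet2": [[3, 4]]}

    @property
    def sheet_names(self):
        # List of sheets in the file, handy for interactive inspection.
        return list(self.sheets)


def read_sheet(io, sheetname=0):
    """Read a sheet from either a path or an already-opened Workbook."""
    if not isinstance(io, Workbook):
        io = Workbook(io)
    # Integer sheetnames are treated as positions, strings as names.
    if isinstance(sheetname, int):
        sheetname = io.sheet_names[sheetname]
    return io.sheets[sheetname]
```

Usage: `read_sheet("book.xlsx")` and `read_sheet(Workbook("book.xlsx"), "Sheet2")` go through the same code path after the check.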

chris-b1 · 2015-09-27T19:25:03Z

@jreback - alright, I deprecated ExcelFile.parse and modified a bunch of tests to no longer use it.

I also reordered the Excel docs to fit the new note; I think it reads more logically now.

jreback · 2015-09-27T20:32:20Z

doc/source/whatsnew/v0.17.0.txt

@@ -989,6 +991,7 @@ Deprecations
 - ``Series.is_time_series`` deprecated in favor of ``Series.index.is_all_dates`` (:issue:`11135`)
 - Legacy offsets (like ``'A@JAN'``) listed in :ref:`here <timeseries.legacyaliases>` are deprecated (note that this has been alias since 0.8.0), (:issue:`10878`)
 - ``WidePanel`` deprecated in favor of ``Panel``, ``LongPanel`` in favor of ``DataFrame`` (note these have been aliases since < 0.11.0), (:issue:`10892`)
+- ``ExcelFile.parse`` has deprecated in favor ``read_excel(ExcelFile)`` (:issue:`11198`)


This should read "is deprecated in favor of".
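A common shape for such a deprecation keeps the old method working, emits a warning, and delegates to the preferred function (in this PR the deprecation of ExcelFile.parse was later backed out). A minimal sketch on hypothetical `Workbook`/`read_sheet` names; the choice of `FutureWarning` and the message text are assumptions, not the pandas code:

```python
import warnings


def read_sheet(io, sheetname=0):
    """Hypothetical module-level reader (the preferred entry point)."""
    return (io.path, sheetname)


class Workbook:
    def __init__(self, path):
        self.path = path

    def parse(self, sheetname=0):
        # Deprecated alias: warn first, then delegate so behavior is unchanged.
        warnings.warn(
            "Workbook.parse is deprecated; use read_sheet(workbook, ...) instead",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return read_sheet(self, sheetname)
```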

chris-b1 · 2015-09-27T21:29:20Z

@jreback - made the changes you noted.

jreback · 2015-09-27T21:35:14Z

looks good. ping when green.

jorisvandenbossche · 2015-09-27T21:56:55Z

Is there a good reason to deprecate ExcelFile.parse?
The docs gives an explicit use for it compared to read_excel: http://pandas.pydata.org/pandas-docs/stable/io.html#reading-excel-files

chris-b1 · 2015-09-27T22:25:57Z

@jorisvandenbossche - there's no super-compelling reason; the main idea was to match up with the API of to_excel, i.e. the "ExcelFileWrapper" (ExcelFile, ExcelWriter) doesn't have any pandas-specific functionality; instead you pass it into the io functions (read_excel, to_excel).

I did update the docs to cover that specific example. edit: although it may be hard to see in the diff - rendered below.

jorisvandenbossche · 2015-09-27T23:11:55Z

OK, I see that you changed that explanation in the docs I was pointing to. But, what is the point of ExcelFile then? What advantage does it give above just providing the string name?

jreback · 2015-09-27T23:14:46Z

@jorisvandenbossche

I would say there isn't any use case for exposing ExcelFile in the current implementation. Before read_excel, sure, it was the way you passed things around. But it is not necessary anymore, unless I am missing something.

So we should deprecate this as well.

I could see a use for it as an object holding the iterator if we do support chunksize (e.g. kind of like the TableIterator/TextReader classes). But those are actually internal and not exposed (except through the iteration itself).
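If chunksize were ever supported, the object holding the iteration state might look roughly like the hypothetical `ChunkIterator` below, which yields fixed-size slices of a sheet's rows. This is purely illustrative: chunked Excel reading was not implemented in this PR, and the class is not modeled on pandas' TableIterator/TextReader internals.

```python
class ChunkIterator:
    """Hypothetical iterator yielding fixed-size row chunks of one sheet."""

    def __init__(self, rows, chunksize):
        if chunksize < 1:
            raise ValueError("chunksize must be a positive integer")
        self.rows = rows
        self.chunksize = chunksize

    def __iter__(self):
        # Each pass yields consecutive slices; the last one may be shorter.
        for start in range(0, len(self.rows), self.chunksize):
            yield self.rows[start:start + self.chunksize]
```

Usage: `for chunk in ChunkIterator(rows, 1000): ...` processes the sheet 1000 rows at a time.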

jorisvandenbossche · 2015-09-27T23:17:45Z

What I mean is:

import pandas as pd
data = {}
xls = pd.ExcelFile('path_to_file.xls')
data['Sheet1'] = pd.read_excel(xls, 'Sheet1', index_col=None, na_values=['NA'])
data['Sheet2'] = pd.read_excel(xls, 'Sheet2', index_col=1)

In the above, the ExcelFile is rather superfluous, and does not seem to add any value (you can just pass the string to both, which yields the same but is shorter). So if we want to deprecate parse, I would rather deprecate ExcelFile itself.

One thing that is possible with ExcelFile is to inspect the sheet names, so that is a reason to keep it, I think. But if we keep it, I don't really see a reason to deprecate its parse method. It's just like having both read_hdf and HDFStore (though I know HDFStore has more extra functionality).

chris-b1 · 2015-09-27T23:20:14Z

So right now ExcelFile does two useful things. First, it exposes sheet_names (the list of sheets in that file), which can be handy for interactive use.

Second, because xlrd loads the whole workbook into memory (as far as I can tell), there can be a performance benefit when reading several sheets from the same file:

In [12]: df = pd.DataFrame({'a': np.arange(10000),
    ...:                    'b': np.arange(10000)})

In [13]: with pd.ExcelWriter('temp.xlsx') as f:
    ...:     df.to_excel(f, 'Sheet1')
    ...:     df.to_excel(f, 'Sheet2')

In [14]: %%time
    ...: with pd.ExcelFile('temp.xlsx') as f:
    ...:     pd.read_excel(f, 'Sheet1')
    ...:     pd.read_excel(f, 'Sheet2')
Wall time: 862 ms


In [16]: %%time
    ...: pd.read_excel('temp.xlsx', 'Sheet1')
    ...: pd.read_excel('temp.xlsx', 'Sheet2')
Wall time: 1.55 s
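The timing gap above is consistent with the workbook being loaded once per call when a path is passed, versus once in total when an ExcelFile is reused. A minimal sketch that counts loads instead of timing them, using hypothetical `Workbook`/`read_sheet` stand-ins where the counter increment represents the expensive full-file read:

```python
class Workbook:
    """Hypothetical ExcelFile stand-in that counts how often a file is loaded."""

    loads = 0

    def __init__(self, path):
        # Stand-in for the expensive step: reading the whole workbook.
        Workbook.loads += 1
        self.path = path
        self.sheets = {"Sheet1": "data1", "Sheet2": "data2"}


def read_sheet(io, sheetname):
    if not isinstance(io, Workbook):
        io = Workbook(io)  # a path means a fresh (expensive) load
    return io.sheets[sheetname]


# Passing the path each time loads the workbook once per sheet ...
read_sheet("temp.xlsx", "Sheet1")
read_sheet("temp.xlsx", "Sheet2")
loads_with_paths = Workbook.loads

# ... whereas reusing one opened Workbook loads it only once.
Workbook.loads = 0
wb = Workbook("temp.xlsx")
read_sheet(wb, "Sheet1")
read_sheet(wb, "Sheet2")
loads_with_object = Workbook.loads
```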

jreback · 2015-09-27T23:22:26Z

@chris-b1 oh, that is good to know (maybe add a note about that in the docs).

OK, I do see some utility in ExcelFile, but .parse seems kind of 'internal' to me. It's superfluous, so we can deprecate it to remove it from the public API (though it doesn't hurt anything).

chris-b1 · 2015-09-28T00:10:46Z

@jreback - expanded the ExcelFile docs to try and better clarify the purpose and mention the performance consideration.

jorisvandenbossche · 2015-09-28T00:13:17Z

doc/source/io.rst

-``read_excel`` can read more than one sheet, by setting ``sheetname`` to either
-a list of sheet names, a list of sheet positions, or ``None`` to read all sheets.
+   # also can be used as a context manager
+   with pd.ExcelFile('path_to_file.xls')


missing as xlsx:?

yep, good catch
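The docs change relies on ExcelFile working as a context manager (`with pd.ExcelFile(...) as xls:`). A minimal sketch of what that protocol requires, on a hypothetical `Workbook` class whose `closed` flag stands in for releasing a real file handle:

```python
class Workbook:
    """Hypothetical ExcelFile stand-in that owns an open file handle."""

    def __init__(self, path):
        self.path = path
        self.closed = False  # a real class would open the file here

    def close(self):
        # Release the underlying file handle / workbook resources.
        self.closed = True

    def __enter__(self):
        # The object bound by `as` is the workbook itself.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised inside the block


with Workbook("path_to_file.xls") as xls:
    closed_inside = xls.closed
```

After the `with` block, `xls.closed` is `True`, even if an exception was raised inside the block.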

jorisvandenbossche · 2015-09-28T00:31:12Z

I am still not fully convinced that deprecating parse is needed. OK, it does not have much added value, but given that it has already existed for such a long time (and is explicitly included in api.rst), removing it will only annoy users. And it is also not very costly to keep, since one simply calls the other.

This is more subjective, but when you have an ExcelFile object, it feels a bit more natural to call ExcelFile.parse than to pass it to a function like pd.read_excel(ExcelFile) (although ExcelFile.read would have been a better name).

And for clarity, big +1 on the signature and doc changes!

jreback · 2015-09-28T00:37:15Z

ok, @chris-b1 why don't you back out the deprecation of .parse. I don't think it's a big deal.

chris-b1 · 2015-09-28T01:58:30Z

Sure, not a problem. The latest changes back out the deprecation. I left the testing changes in (primarily using read_excel over ExcelFile), although I did add back a couple of calls to ExcelFile.parse.

jreback · 2015-09-28T10:30:26Z

doc/source/io.rst


-``read_excel`` can read more than one sheet, by setting ``sheetname`` to either
-a list of sheet names, a list of sheet positions, or ``None`` to read all sheets.
+   # ExcelFile.parse commamnd which is equivalent


I would take this out, as it's duplicative of the above.

jreback · 2015-09-28T10:33:55Z

some minor doc changes

chris-b1 · 2015-09-29T00:07:22Z

@jreback - made those doc changes. The Travis failure seems to be unrelated? It has to do with HTML formatting.

jreback · 2015-09-29T09:54:41Z

yeh that failure happens once in a while, no idea why

jorisvandenbossche · 2015-09-29T10:01:27Z

@chris-b1 I was just thinking: to make it clearer in the docs that ExcelFile.parse is superfluous, we could also, e.g., limit its docstring to a reference to read_excel (and then the template is not needed).

chris-b1 · 2015-09-29T23:37:27Z

@jorisvandenbossche - sure makes sense to me, see update.
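A sketch of the "short docstring that points at the function" approach: the method keeps working, but its documentation only states the equivalence and defers to the full docstring. `Workbook.parse` and `read_sheet` are hypothetical stand-ins, and the docstring wording is illustrative rather than the merged text.

```python
def read_sheet(io, sheetname=0, header=0):
    """Read one sheet of a workbook.

    Parameters
    ----------
    io : str or Workbook
    sheetname : str or int, default 0
    header : int, default 0
    """
    return (io, sheetname, header)


class Workbook:
    def __init__(self, path):
        self.path = path

    def parse(self, sheetname=0, header=0):
        """Parse a sheet; equivalent to read_sheet(Workbook, ...).

        See the read_sheet docstring for the full list of arguments.
        """
        # Thin wrapper: all real work and documentation live in read_sheet.
        return read_sheet(self, sheetname=sheetname, header=header)
```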

jreback · 2015-09-30T22:05:27Z

merged via 0d39ca1

thanks!

jreback reviewed Sep 27, 2015

jreback added the API Design and IO Excel (read_excel, to_excel) labels Sep 27, 2015

jreback added this to the 0.17.0 milestone Sep 27, 2015

chris-b1 force-pushed the read-excel-sig branch from a7e0945 to 0170181 September 27, 2015 19:20

jsexauer mentioned this pull request Sep 27, 2015

DEPR: Clean up list of deprecations from prior versions #6581

Closed

jreback reviewed Sep 27, 2015

chris-b1 force-pushed the read-excel-sig branch from 0170181 to b16cc4d September 27, 2015 21:25

jorisvandenbossche reviewed Sep 28, 2015

chris-b1 force-pushed the read-excel-sig branch from 01d5513 to 70fb54f September 28, 2015 01:02

jreback reviewed Sep 28, 2015

chris-b1 force-pushed the read-excel-sig branch 2 times, most recently from 5f6e934 to 9763cbd September 28, 2015 22:56

API: read_excel signature

88708b2

chris-b1 force-pushed the read-excel-sig branch from 9763cbd to 88708b2 September 29, 2015 23:37

jreback closed this Sep 30, 2015

chris-b1 deleted the read-excel-sig branch October 4, 2015 17:01

jorisvandenbossche mentioned this pull request Jul 27, 2017

chunksize argument removed from read_excel? #17094

Closed
