ExcelFile.parse() does not return TextFileReader object, but rather a DataFrame #8011

rkyleg · 2014-08-12T23:59:27Z

When specifying a chunksize in the ExcelFile.parse() method, I should get back an iterable object, but instead a DataFrame is returned.

If you return just the TextParser (change parser.read() to just parser) in the code block below, then iterating over the parsed ExcelFile works and returns the specified number of chunks.

The code block is taken from pandas/io/excel.py, line 334.

Hopefully I didn't make that confusing.

parser = TextParser(data, header=header, index_col=index_col,
                    has_index_names=has_index_names,
                    na_values=na_values,
                    thousands=thousands,
                    parse_dates=parse_dates,
                    date_parser=date_parser,
                    skiprows=skiprows,
                    skip_footer=skip_footer,
                    chunksize=chunksize,
                    **kwds)
return parser.read()
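To make the difference between the two return statements concrete, here is a minimal, pandas-free sketch. ToyParser, parse_eager, and parse_lazy are hypothetical names invented for this illustration; they are not the pandas classes, and the sketch only assumes that the real parser materializes everything on read() and yields chunks when iterated.

```python
from itertools import islice


class ToyParser:
    """Hypothetical stand-in for a chunk-aware parser (not the pandas class)."""

    def __init__(self, rows, chunksize=None):
        self._rows = iter(rows)
        self.chunksize = chunksize

    def read(self, nrows=None):
        # Materialize up to nrows rows (all remaining rows if nrows is None).
        return list(islice(self._rows, nrows))

    def __iter__(self):
        # Yield successive chunks of at most chunksize rows until exhausted.
        while True:
            chunk = self.read(self.chunksize)
            if not chunk:
                return
            yield chunk


def parse_eager(rows, chunksize=None):
    # Mirrors `return parser.read()`: chunksize is accepted but has no effect.
    return ToyParser(rows, chunksize=chunksize).read()


def parse_lazy(rows, chunksize=None):
    # Mirrors `return parser`: the caller iterates and gets one chunk at a time.
    return ToyParser(rows, chunksize=chunksize)


rows = [(i, i * i) for i in range(5)]
eager = parse_eager(rows, chunksize=2)
chunks = list(parse_lazy(rows, chunksize=2))
print(len(eager))                # 5
print([len(c) for c in chunks])  # [2, 2, 1]
```

With the eager variant the caller gets all five rows at once regardless of chunksize, which is the behavior reported here; with the lazy variant the same input arrives as chunks of at most two rows.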

jreback · 2014-08-15T12:13:26Z

Can you show an example of usage?

rkyleg · 2014-08-19T21:41:24Z

Sure. The code below is taken from my script, which uses the workbook.parse() method to return the iterable object. Before I changed 'return parser.read()' to just 'return parser' in the code above, this did not work: the 'for chunk' loop would return each 'cell' of data in the DataFrame rather than each row.

df = workbook.parse(sheetname, chunksize=2000, iterator=True)
for chunk in df:
    df = chunk.where(pandas.notnull(chunk), None)
    df.to_sql(name='table_name', con=self.engine, if_exists='append', index=False)

mroeschke · 2020-05-07T22:11:34Z

Looks like TextParser doesn't exist anymore, and there isn't a complete, reproducible example. Happy to reopen if this issue gains more context.

jreback added the Excel label Aug 15, 2014

jreback mentioned this issue Sep 11, 2015

Iterate over an Excel file (.xls/.xlsx) without loading the data into the memory #11064

Closed

jreback added Bug Difficulty Intermediate labels Sep 11, 2015

jreback added this to the Next Major Release milestone Sep 11, 2015

jreback added the Prio-medium label Sep 11, 2015

chris-b1 mentioned this issue Sep 27, 2015

API: read_excel signature #11198

Closed

jorisvandenbossche mentioned this issue Jul 27, 2017

chunksize argument removed from read_excel? #17094

Closed

jbrockmendel removed Effort Medium labels Oct 21, 2019

mroeschke closed this as completed May 7, 2020
