ExcelFile.parse() does not return TextFileReader object, but rather a DataFrame · Issue #8011 · pandas-dev/pandas · GitHub

ExcelFile.parse() does not return TextFileReader object, but rather a DataFrame #8011

Closed
rkyleg opened this issue Aug 12, 2014 · 3 comments
Labels
Bug IO Excel read_excel, to_excel

Comments

rkyleg commented Aug 12, 2014

When specifying a chunksize in a call to ExcelFile.parse(), I should get back an iterable object, but instead a DataFrame is returned.

If you return just the TextParser (change parser.read() to just parser) in the code block below, then iterating over the parsed ExcelFile works and yields chunks of the specified size.

The code block is taken from pandas/io/excel.py, line 334.

Hopefully I didn't make that confusing.

parser = TextParser(data, header=header, index_col=index_col,
                    has_index_names=has_index_names,
                    na_values=na_values,
                    thousands=thousands,
                    parse_dates=parse_dates,
                    date_parser=date_parser,
                    skiprows=skiprows,
                    skip_footer=skip_footer,
                    chunksize=chunksize,
                    **kwds)
return parser.read()
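
To make the difference concrete, here is a minimal sketch in plain Python. ToyParser, parse_eager and parse_lazy are hypothetical stand-ins invented for illustration; they are not the pandas implementation and only mimic the two return styles discussed above, assuming read() drains all rows at once.

```python
# Toy stand-in for the parser object; NOT the pandas implementation.
class ToyParser:
    def __init__(self, rows, chunksize=None):
        self.rows = list(rows)
        self.chunksize = chunksize

    def read(self):
        # Returns everything at once, so chunksize has no effect here.
        return self.rows

    def __iter__(self):
        # Yields successive slices of at most `chunksize` rows.
        size = self.chunksize or len(self.rows) or 1
        for start in range(0, len(self.rows), size):
            yield self.rows[start:start + size]


def parse_eager(rows, chunksize=None):
    # Mirrors `return parser.read()`: the caller gets one full result.
    return ToyParser(rows, chunksize).read()


def parse_lazy(rows, chunksize=None):
    # Mirrors `return parser`: the caller can iterate chunk by chunk.
    return ToyParser(rows, chunksize)


rows = [[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]]
eager = parse_eager(rows, chunksize=2)
chunk_sizes = [len(chunk) for chunk in parse_lazy(rows, chunksize=2)]
print(len(eager))    # 5
print(chunk_sizes)   # [2, 2, 1]
```

In the eager variant the chunksize argument is accepted but never influences what the caller receives, which matches the behaviour described in this report.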

jreback added the Excel label Aug 15, 2014

Contributor
jreback commented Aug 15, 2014

Can you show an example of usage?

Author
rkyleg commented Aug 19, 2014

Sure, the code below is taken from my script, which uses the workbook.parse() method to return the iterable object. Before I changed 'return parser.read()' to just 'return parser' in the code above, this did not work: the 'for chunk' loop would return each 'cell' of data in the DataFrame rather than each row.

chunks = workbook.parse(sheetname, chunksize=2000, iterator=True)
for chunk in chunks:
    # Swap missing values for None before writing to SQL.
    df = chunk.where(pandas.notnull(chunk), None)
    df.to_sql(name='table_name', con=self.engine, if_exists='append', index=False)
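
Without patching pandas, one possible workaround is to load the sheet in full and slice the result into fixed-size pieces before running the loop above. A minimal sketch follows; iter_chunks is a hypothetical helper, not part of pandas, and it is shown on a plain list so that it runs without any dependencies. For a DataFrame, frame.iloc[start:start + chunksize] would be the analogous slice.

```python
def iter_chunks(table, chunksize):
    """Yield consecutive slices of `table`, each at most `chunksize` long."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, len(table), chunksize):
        yield table[start:start + chunksize]


# Stand-in for 4500 parsed rows; real data would come from the worksheet.
rows = list(range(4500))
sizes = [len(chunk) for chunk in iter_chunks(rows, 2000)]
print(sizes)  # [2000, 2000, 500]
```

This keeps the whole sheet in memory, so it only helps with batching the writes, not with reducing peak memory use.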

@mroeschke
Member

Looks like TextParser doesn't exist anymore, and there isn't a complete, reproducible example. Happy to reopen if this issue gains more context.
