BigQuery: load_table_from_dataframe should use a temporary file · Issue #7543 · googleapis/google-cloud-python · GitHub

BigQuery: load_table_from_dataframe should use a temporary file #7543

@tswast

Description


load_table_from_dataframe currently serializes a pandas DataFrame to Parquet in a BytesIO buffer before uploading it via a load job. This violates the contract of to_parquet, which expects a file path: BytesIO happens to work with the pyarrow engine, but it fails with fastparquet. A more minor reason to serialize to disk is that DataFrames can be quite large, so spilling to disk is preferable to filling up memory. Note: the function should clean up after itself by removing the temporary file once the load job completes.
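A minimal sketch of the proposed flow (the helper name `load_dataframe_via_tempfile` is hypothetical, not part of the library): serialize to a real file path so both Parquet engines honor the to_parquet contract, upload the file with the existing `client.load_table_from_file` API, and remove the temp file in a `finally` block so cleanup happens even if the load fails.

```python
import os
import tempfile


def load_dataframe_via_tempfile(client, dataframe, destination, job_config=None):
    """Hypothetical sketch: spill the DataFrame to a temp Parquet file,
    upload it as a load job, and always clean up the file afterwards."""
    fd, path = tempfile.mkstemp(suffix=".parquet")
    os.close(fd)  # to_parquet opens the path itself; we only need the name
    try:
        # A real file path works with both the pyarrow and fastparquet engines.
        dataframe.to_parquet(path)
        with open(path, "rb") as source_file:
            job = client.load_table_from_file(
                source_file, destination, job_config=job_config
            )
        job.result()  # block until the load job completes
        return job
    finally:
        os.remove(path)  # remove the temp file whether or not the load succeeded
```

This keeps memory usage bounded by the Parquet writer rather than holding the whole serialized payload in a BytesIO buffer.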

Metadata


Assignees

Labels

api: bigquery - Issues related to the BigQuery API.
type: bug - Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
