BigQuery: Allow specifying index data type in partial schema to `load_table_from_dataframe`. by tswast · Pull Request #9084 · googleapis/google-cloud-python

@tswast tswast commented Aug 23, 2019

Closes #5572.

If an index (or level of a multi-index) has a name and is present in the
schema passed to load_table_from_dataframe, then that index will be
serialized and written to the table. Otherwise, the index is omitted
from the serialized table.
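
The rule above can be modelled in a few lines of plain Python. This is only a sketch, not the library's code: the helper name `index_levels_to_write` and its three arguments are hypothetical, and the check that skips an index level sharing its name with a column is taken from a later commit message in this pull request.

```python
def index_levels_to_write(index_names, column_names, schema_names):
    """Return the index level names that would be serialized.

    Hypothetical model of the rule described in this pull request;
    it is not the implementation in google-cloud-bigquery.
    """
    columns = set(column_names)
    requested = set(schema_names)
    selected = []
    for name in index_names:
        # An unnamed level (None) cannot be matched to a schema field.
        if name is None:
            continue
        # A level that shares its name with a column is not included.
        if name in columns:
            continue
        # Only levels explicitly named in the partial schema are written.
        if name in requested:
            selected.append(name)
    return selected


# Named index listed in the partial schema: it is written.
print(index_levels_to_write(["movie_id"], ["title"], ["title", "movie_id"]))
# Named index absent from the schema: it is omitted.
print(index_levels_to_write(["movie_id"], ["title"], ["title"]))
```

For a multi-index, each level is treated independently, so a schema can pull in one level and leave the others out.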

Remaining items:

  • Update unit tests to account for the new ValueErrors raised for missing / extra columns.
  • Fix tests for the moved sample `samples/load_table_dataframe.py`.
  • Add unit tests to account for the new index behavior, especially some tests with MultiIndex DataFrames.
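
The first item refers to ValueErrors for missing or extra columns. The thread does not say at which stage each case is raised, so the following is only a guess at the shape of such a check: it assumes a step where the schema is expected to cover every serialized column, and the function name and messages are illustrative rather than the library's.

```python
def check_schema_covers_columns(schema_names, serialized_names):
    """Raise ValueError when schema and serialized columns disagree.

    Hypothetical sketch. `serialized_names` stands for the DataFrame
    columns plus any index levels chosen for serialization.
    """
    schema = set(schema_names)
    serialized = set(serialized_names)

    # Fields named in the schema that the DataFrame cannot supply.
    extra = sorted(schema - serialized)
    if extra:
        raise ValueError(
            "schema contains fields not present in dataframe: {}".format(extra)
        )

    # Serialized columns for which no schema field was given.
    missing = sorted(serialized - schema)
    if missing:
        raise ValueError(
            "schema is missing fields from dataframe: {}".format(missing)
        )


# Matching names pass silently.
check_schema_covers_columns(["title", "movie_id"], ["movie_id", "title"])
```

A partial schema by definition omits some columns, so the "missing" branch would only make sense after any automatically detected fields have been merged in.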

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Aug 23, 2019
@tswast tswast added the api: bigquery Issues related to the BigQuery API. label Aug 23, 2019
tswast commented Aug 23, 2019

Thought: if an index column is requested, but we end up wanting to return a schema of None, that's an error, because the requested index column might not be written or it might not be written with the correct data type.

@tswast tswast force-pushed the issue5572-load-dataframe-indexes branch from b04a3c6 to 14e6baa Compare August 26, 2019 18:23
@tswast tswast marked this pull request as ready for review August 27, 2019 17:59
@tswast tswast requested review from a team and plamut August 27, 2019 17:59
tswast commented Aug 27, 2019

Counter-thought: We'll already display a deprecation warning when we have to fall back to automatic schema detection via `to_parquet`. Also, if they are using pyarrow, then we know the index will get written to the Parquet file.

@plamut plamut left a comment

Looks good in general, but spotted a few things that are worth checking again IMO.

plamut commented Aug 28, 2019

@tswast Please just blacken test__pandas_helpers.py, the lint check complains.

@plamut plamut left a comment

LGTM now, thanks for the quick updates!

@tswast tswast merged commit a6ed945 into googleapis:master Aug 28, 2019
@tswast tswast deleted the issue5572-load-dataframe-indexes branch August 28, 2019 19:36
@tswast tswast mentioned this pull request Aug 28, 2019
HemangChothani pushed a commit to HemangChothani/google-cloud-python that referenced this pull request Aug 29, 2019
…rame`. (googleapis#9084)

* Specify the index data type in partial schema to `load_table_from_dataframe` to include it.

If an index (or level of a multi-index) has a name and is present in the
schema passed to `load_table_from_dataframe`, then that index will be
serialized and written to the table. Otherwise, the index is omitted
from the serialized table.

* Don't include the index if it has the same name as a column.

* Move `load_table_dataframe` sample from `snippets.py` to `samples/`.

Sample now demonstrates how to manually include the index with a
partial schema definition. Update docs reference to new
`load_table_dataframe` sample location.
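
A minimal sketch of what including the index through a partial schema might look like from the caller's side. The function name, the `movie_id` index name, and the data are made up for illustration and are not taken from the moved sample. Running it needs `google-cloud-bigquery`, `pandas`, `pyarrow`, and valid credentials, so the imports are deferred into the function body.

```python
def load_dataframe_with_index(table_id):
    """Load a small DataFrame, including its named index, into BigQuery.

    Illustrative only: `table_id` is a "project.dataset.table" string
    supplied by the caller, and all data here is placeholder data.
    """
    # Deferred so this module can be imported without the dependencies.
    import pandas
    from google.cloud import bigquery

    client = bigquery.Client()

    dataframe = pandas.DataFrame(
        {"title": ["First movie", "Second movie"]},
        # The index must have a name to be matched against the schema.
        index=pandas.Index(["m1", "m2"], name="movie_id"),
    )

    job_config = bigquery.LoadJobConfig(
        schema=[
            bigquery.SchemaField("title", "STRING"),
            # Naming the index here is what requests that it be written.
            bigquery.SchemaField("movie_id", "STRING"),
        ]
    )

    job = client.load_table_from_dataframe(
        dataframe, table_id, job_config=job_config
    )
    job.result()  # Wait for the load job to finish.
    return client.get_table(table_id)
```

Dropping the `movie_id` field from the schema should, per the description in this pull request, leave the index out of the loaded table.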
emar-kar pushed a commit to MaxxleLLC/google-cloud-python that referenced this pull request Sep 18, 2019
Linked issue: BigQuery: Load to table from dataframe without index