8000 BigQuery Storage API sample for reading pandas dataframe by tswast · Pull Request #1994 · GoogleCloudPlatform/python-docs-samples · GitHub

BigQuery Storage API sample for reading pandas dataframe #1994


Merged: tswast merged 10 commits into master from tswast-bqstorage-pandas on Feb 7, 2019

Conversation

@tswast (Contributor) commented on Feb 4, 2019:

How to get a pandas DataFrame, fast!

The first two examples use the existing BigQuery client. These examples
create a thread pool and read in parallel. The final example shows using
just the new BigQuery Storage client, but only shows how to read with a
single thread.
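
For orientation, a rough sketch of the first approach (run a query, then pull the results through the BigQuery Storage API) written against current client-library versions rather than the sample's exact code; the query and the public table name are illustrative placeholders:

    # Sketch only: download query results to a pandas DataFrame, using the
    # BigQuery Storage API for the row download. Table and query are
    # placeholders, not necessarily what the sample uses.
    from google.cloud import bigquery
    from google.cloud import bigquery_storage

    bq_client = bigquery.Client()
    bqstorage_client = bigquery_storage.BigQueryReadClient()

    query = """
        SELECT name, state, year, number
        FROM `bigquery-public-data.usa_names.usa_1910_current`
    """

    # When a BigQuery Storage client is passed to to_dataframe(), the rows
    # are streamed through the Storage API (in parallel) instead of the
    # slower tabledata.list-based download.
    df = bq_client.query(query).to_dataframe(bqstorage_client=bqstorage_client)
    print(df.head())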

@tswast added the "do not merge" label (indicates a pull request not ready for merge, due to either quality or timing) on Feb 4, 2019
@googlebot added the "cla: yes" label (this human has signed the Contributor License Agreement) on Feb 4, 2019
@tswast removed the "do not merge" label on Feb 5, 2019
@tswast force-pushed the tswast-bqstorage-pandas branch from 0eaca4e to a3a48c0 on February 5, 2019 00:37
@tswast requested review from alixhami and shollyman on February 5, 2019 00:37
@tswast added the "bigquery" label on Feb 5, 2019
@shollyman (Contributor) left a comment:

Looks good, modulo the open question on small results

Inline thread on this snippet from the sample:

    # [START bigquerystorage_pandas_read_query_results]
    import uuid

    # Due to a known issue in the BigQuery Storage API (TODO: link to

@shollyman (Contributor) commented:

Should we consider simply running a large query that emits enough data to avoid the inline? Pros: better demonstrates the perf of the new API, and avoids us having to revisit the sample. Cons: test time overhead and potential pitfalls for people kicking tires with small results. Part of this is dependent on how the team will be maintaining their KI list.

@tswast (Contributor, Author) replied:

> Cons: test time overhead.

Long test time is my main reason for avoiding queries that return big results. I guess it's not so bad since this repo can test the different directories independently.

> Cons: potential pitfalls for people kicking tires with small results.

The current failure case is rather bad: it returns a successful response, but gives you an empty result set. That's a pretty big pitfall, because catching it requires noticing that you didn't get the data you thought you were going to get.

@shollyman (Contributor) replied:

Works for me. Let's confirm how the KIs will be maintained so you can link them.
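
As an aside for readers of this thread (not part of the sample), a minimal sketch of a guard against the failure mode tswast describes above; it reuses the hypothetical bq_client, bqstorage_client, and query names from the earlier sketch:

    # Guard against the known issue discussed above: for small results the
    # Storage API path can succeed yet hand back no rows.
    job = bq_client.query(query)
    df = job.to_dataframe(bqstorage_client=bqstorage_client)

    if df.empty and job.result().total_rows > 0:
        # The Storage API download came back empty even though the query
        # produced rows; fall back to the regular (slower) download path.
        df = job.to_dataframe()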

tswast added 10 commits February 7, 2019 10:46
How to get a pandas DataFrame, fast!

The first two examples use the existing BigQuery client. These examples
create a thread pool and read in parallel. The final example shows using
just the new BigQuery Storage client, but only shows how to read with a
single thread.
* Move imports inside region tags.
* Adjust query indentation to match region tags.
Move duplicate imports out of region tags.

Add region tag for the whole sample.
to just above the sample where it is used. This
makes the complete source code for the sample
make more sense (bigquerystorage_pandas_tutorial_all)
@tswast force-pushed the tswast-bqstorage-pandas branch from f064277 to 925fe3b on February 7, 2019 18:47
@tswast merged commit e9bc7de into master on Feb 7, 2019
@tswast deleted the tswast-bqstorage-pandas branch on February 7, 2019 18:53
plamut pushed a commit to plamut/python-bigquery-storage that referenced this pull request on Sep 2, 2020:
BigQuery Storage API sample for reading pandas dataframe (GoogleCloudPlatform/python-docs-samples#1994)

plamut pushed a commit to googleapis/python-bigquery-storage that referenced this pull request on Sep 10, 2020:
BigQuery Storage API sample for reading pandas dataframe (GoogleCloudPlatform/python-docs-samples#1994)

Labels: bigquery, cla: yes (This human has signed the Contributor License Agreement)
4 participants