BigQuery Storage API sample for reading pandas dataframe #1994
Conversation
Looks good, modulo the open question on small results
```python
# [START bigquerystorage_pandas_read_query_results]
import uuid

# Due to a known issue in the BigQuery Storage API (TODO: link to
```
Should we consider simply running a large query that emits enough data to avoid the inlined-results issue? Pros: it better demonstrates the performance of the new API and saves us from having to revisit the sample. Cons: test time overhead and potential pitfalls for people kicking the tires with small results. Part of this depends on how the team will be maintaining their known-issues (KI) list.
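A hypothetical sketch of that suggestion: an unaggregated query over a large public table, so the result set is far too big to be inlined in the query response. The table is illustrative, not taken from the PR.

```python
from google.cloud import bigquery

bqclient = bigquery.Client()

# Millions of rows, no aggregation: well past any inlining threshold, so
# the BigQuery Storage API has real streams to read.
query_string = """
SELECT name, state, year, number
FROM `bigquery-public-data.usa_names.usa_1910_current`
"""

rows = bqclient.query(query_string).result()
```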
> Cons: test time overhead.

Long test time is my main reason for avoiding queries that return big results. I guess it's not so bad, since this repo can test the different directories independently.

> Cons: potential pitfalls for people kicking the tires with small results.

The current failure case is rather bad: it returns a successful response but gives you an empty result set. That's a pretty big pitfall, because catching it requires noticing that you didn't get the data you thought you were going to get.
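A defensive check, not part of the sample, assuming a google-cloud-bigquery version whose to_dataframe() accepts a bqstorage_client argument: since the failure mode is a success with zero rows, checking for an empty DataFrame at least makes it loud.

```python
from google.cloud import bigquery
from google.cloud import bigquery_storage_v1beta1

bqclient = bigquery.Client()
bqstorage_client = bigquery_storage_v1beta1.BigQueryStorageClient()

# A tiny result like this is likely to be inlined in the query response.
query_string = "SELECT 1 AS small_result"

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorage_client)
)

# The known issue surfaces as a successful read with zero rows, so turn
# the silent failure into a loud one.
if dataframe.empty:
    raise RuntimeError(
        "Empty DataFrame: small results may have been inlined, which the "
        "BigQuery Storage API cannot read."
    )
```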
Works for me. Let's confirm how the KIs will be maintained so you can link them.
How to get a pandas DataFrame, fast! The first two examples use the existing BigQuery client; they create a thread pool and read in parallel. The final example uses just the new BigQuery Storage client, but reads with only a single thread.
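A minimal sketch of the combined pattern, assuming the v1beta1 Storage client that was current for this PR and a google-cloud-bigquery version whose to_dataframe() accepts a bqstorage_client argument (the query is illustrative):

```python
from google.cloud import bigquery
from google.cloud import bigquery_storage_v1beta1

bqclient = bigquery.Client()
bqstorage_client = bigquery_storage_v1beta1.BigQueryStorageClient()

query_string = """
SELECT name, state, year, number
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE state = 'TX'
"""

# Supplying a BigQuery Storage client makes to_dataframe() download the
# result rows over the faster Storage API instead of the REST API,
# reading the result streams in parallel.
dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorage_client)
)
print(dataframe.head())
```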
Blacken sample.
* Move imports inside region tags.
* Adjust query indentation to match region tags.
Move duplicate imports out of region tags to just above the sample where they are used. Add a region tag for the whole sample. This makes the complete source code for the sample (bigquerystorage_pandas_tutorial_all) make more sense.
BigQuery Storage API sample for reading pandas dataframe (GoogleCloudPlatform/python-docs-samples#1994)
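The final example the description mentions reads with the BigQuery Storage client alone. A single-threaded sketch of that pattern, again assuming the v1beta1 surface; the table reference and billing project are placeholders:

```python
from google.cloud import bigquery_storage_v1beta1

client = bigquery_storage_v1beta1.BigQueryStorageClient()

# Placeholder table and billing project; substitute your own project ID.
table_ref = bigquery_storage_v1beta1.types.TableReference()
table_ref.project_id = "bigquery-public-data"
table_ref.dataset_id = "usa_names"
table_ref.table_id = "usa_1910_current"
parent = "projects/your-project-id"

# Requesting a single stream keeps the read single-threaded.
session = client.create_read_session(table_ref, parent, requested_streams=1)
stream_position = bigquery_storage_v1beta1.types.StreamPosition(
    stream=session.streams[0]
)

reader = client.read_rows(stream_position)
dataframe = reader.rows(session).to_dataframe()
print(dataframe.head())
```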