docs: set `index_cols` in `read_gbq` as a best practice by tswast · Pull Request #624 · googleapis/python-bigquery-dataframes · GitHub
18 changes: 11 additions & 7 deletions third_party/bigframes_vendored/pandas/io/gbq.py
@@ -27,13 +27,17 @@ def read_gbq(
):
"""Loads a DataFrame from BigQuery.

-BigQuery tables are an unordered, unindexed data source. By default,
-the DataFrame will have an arbitrary index and ordering.
-
-Set the `index_col` argument to one or more columns to choose an
-index. The resulting DataFrame is sorted by the index columns. For the
-best performance, ensure the index columns don't contain duplicate
-values.
+BigQuery tables are an unordered, unindexed data source. To support
+pandas compatibility, the following indexing options are available:
+
+* (Default behavior) Add an arbitrary sequential index and ordering
+  using an analytic windowed operation, which prevents filter
+  pushdown.
+* (Recommended) Set the ``index_col`` argument to one or more columns.
+  Unique values for the row labels are recommended. Duplicate labels
+  are possible, but note that joins on a non-unique index can duplicate
+  rows, and operations like ``cumsum()`` that window across a non-unique
+  index can be non-deterministic.

.. note::
By default, even SQL query inputs with an ORDER BY clause create a
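The two caveats in the recommended ``index_col`` option can be illustrated with a small plain-Python model. This is a sketch, not how BigQuery DataFrames implements joins or windows. `join_on_labels` and `cumsum_by_label` are hypothetical helpers. The model assumes the running total is ordered only by the (possibly duplicated) label, so rows that tie on the label keep whatever order they arrived in. That arrival order stands in for the arbitrary order a database may return tied rows in.

```python
from collections import defaultdict


def join_on_labels(left, right):
    """Inner join of (label, value) rows on their labels (hypothetical helper).

    Every left row is paired with every right row sharing its label, so
    repeated labels on both sides multiply the number of output rows.
    """
    by_label = defaultdict(list)
    for label, value in right:
        by_label[label].append(value)
    return [
        (label, left_value, right_value)
        for label, left_value in left
        for right_value in by_label.get(label, [])
    ]


def cumsum_by_label(rows):
    """Running total of values ordered only by label (hypothetical helper).

    sorted() is stable, so rows that share a label keep their input order;
    that input order models the arbitrary order of tied rows.
    """
    total = 0
    result = []
    for label, value in sorted(rows, key=lambda row: row[0]):
        total += value
        result.append((label, value, total))
    return result


if __name__ == "__main__":
    # Label "x" appears twice on each side: 2 x 2 = 4 joined rows, not 2.
    print(len(join_on_labels([("x", 1), ("x", 2)], [("x", 10), ("x", 20)])))

    # Same rows, different arrival order for the tied "b" rows: the final
    # total matches, but the intermediate running totals for "b" differ.
    print(cumsum_by_label([("a", 1), ("b", 5), ("b", 7)]))
    print(cumsum_by_label([("a", 1), ("b", 7), ("b", 5)]))
```

With unique labels, neither effect occurs. Each left row matches at most one right row, and the sort order fully determines the running totals.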