fix!: remove out-of-date BigQuery ML protocol buffers by tswast · Pull Request #1178 · googleapis/python-bigquery · GitHub
Merged
Commits
65 commits
9cd7554
deps!: BigQuery Storage and pyarrow are required dependencies (#776)
plamut Jul 27, 2021
9319eb1
chore: merge recent changes from master (#823)
plamut Jul 28, 2021
e26d879
chore: sync v3 with master (#851)
plamut Aug 5, 2021
66014c3
chore: merge changes from master (#872)
tswast Aug 12, 2021
dcd78c7
fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` (#786)
tswast Aug 16, 2021
60e73fe
chore: sync v3 with master branch (#880)
tswast Aug 16, 2021
2689df4
feat: Destination tables are no-longer removed by create_job (#891)
Aug 23, 2021
eed311e
chore: Simplify create_job slightly (#893)
Aug 23, 2021
2cb1c21
chore: sync v3 branch with main (#947)
plamut Sep 9, 2021
a7842b6
chore!: remove google.cloud.bigquery_v2 code (#855)
plamut Sep 27, 2021
b0cbfef
chore: sync v3 branch with main (#996)
plamut Sep 30, 2021
71dde11
feat: add a static copy of legacy proto-based types (#1000)
plamut Oct 6, 2021
deec8e7
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Oct 6, 2021
beaadc8
🦉 Updates from OwlBot
gcf-owl-bot[bot] Oct 6, 2021
750c808
chore: remove unnecessary replacement from owlbot
tswast Oct 6, 2021
15c4055
Merge remote-tracking branch 'upstream/sync-v3' into sync-v3
tswast Oct 6, 2021
6bfbb7d
🦉 Updates from OwlBot
gcf-owl-bot[bot] Oct 6, 2021
72255a6
Apply suggestions from code review
tswast Oct 6, 2021
8a3b1ad
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Oct 7, 2021
0d0aedb
Merge remote-tracking branch 'upstream/sync-v3' into sync-v3
tswast Oct 7, 2021
1661262
🦉 Updates from OwlBot
gcf-owl-bot[bot] Oct 7, 2021
2c90edc
chore: remove unused _PYARROW_BAD_VERSIONS
tswast Oct 7, 2021
aa3c7d2
Merge remote-tracking branch 'upstream/sync-v3' into sync-v3
tswast Oct 7, 2021
7852c5c
🦉 Updates from OwlBot
gcf-owl-bot[bot] Oct 7, 2021
ed9b6cf
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Oct 8, 2021
294990a
Merge remote-tracking branch 'upstream/sync-v3' into sync-v3
tswast Oct 8, 2021
d448d0e
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Oct 11, 2021
50753cc
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Oct 14, 2021
c67377a
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Oct 26, 2021
6706678
Merge pull request #1004 from googleapis/sync-v3
plamut Oct 28, 2021
40c92c3
chore: cleanup intersphinx links (#1035)
tswast Nov 1, 2021
23d1187
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Nov 4, 2021
61e3d57
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Nov 4, 2021
7162f98
Merge pull request #1049 from googleapis/sync-v3
tswast Nov 5, 2021
12c2272
Merge branch 'main' into sync-v3-with-main
plamut Nov 9, 2021
859a65d
Fix type hints and discovered bugs
plamut Nov 9, 2021
42d3db6
Merge pull request #1055 from plamut/sync-v3-with-main
tswast Nov 10, 2021
3d1af95
feat!: Use pandas custom data types for BigQuery DATE and TIME column…
Nov 10, 2021
070729f
process: mark the package as type-checked (#1058)
plamut Nov 11, 2021
3cae066
feat: default to DATETIME type when loading timezone-naive datetimes …
plamut Nov 16, 2021
86fd253
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Nov 16, 2021
dad555d
chore: release 3.0.0b1 (pre-release)
tswast Nov 16, 2021
9fd8eb9
Merge pull request #1065 from googleapis/sync-v3
tswast Nov 16, 2021
3b3ebff
feat: add `api_method` parameter to `Client.query` to select `INSERT`…
tswast Dec 2, 2021
7e3721e
fix: improve type annotations for mypy validation (#1081)
plamut Dec 14, 2021
2b76944
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Dec 15, 2021
950b24e
chore: add type annotations for mypy
tswast Dec 15, 2021
e888c71
chore: revert test for when pyarrow is not installed
tswast Dec 15, 2021
011f160
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Dec 16, 2021
08a9bcc
Merge pull request #1088 from googleapis/sync-v3
tswast Dec 16, 2021
aea8d55
chore: sync main into v3 branch
tswast Jan 13, 2022
dd40c24
test: fix pandas tests with new bqstorage client (#1113)
tswast Jan 19, 2022
727a18d
Merge branch 'v3' into sync-v3
tswast Jan 19, 2022
c58ba76
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Jan 19, 2022
b67b255
Merge pull request #1109 from googleapis/sync-v3
tswast Jan 20, 2022
5f50242
feat: use `StandardSqlField` class for `Model.feature_columns` and `M…
tswast Jan 28, 2022
fec1ae6
Merge branch 'upstream/main' into sync-v3
tswast Mar 25, 2022
b4f4847
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Mar 25, 2022
dedb2ea
docs: add type annotations to job samples
tswast Mar 25, 2022
0279fa9
chore: blacken with black 22.3.0
tswast Mar 29, 2022
35d2d70
Merge remote-tracking branch 'upstream/main' into sync-v3
tswast Mar 29, 2022
9d256d5
Merge pull request #1175 from googleapis/sync-v3
tswast Mar 29, 2022
af0ecb0
docs: Add migration guide from version 2.x to 3.x (#1027)
plamut Mar 29, 2022
f69bae7
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Mar 29, 2022
8bd4d39
Merge branch 'v3' of https://github.com/googleapis/python-bigquery in…
gcf-owl-bot[bot] Mar 29, 2022
1 change: 1 addition & 0 deletions .coveragerc
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ fail_under = 100
show_missing = True
omit =
google/cloud/bigquery/__init__.py
google/cloud/bigquery_v2/* # Legacy proto-based types.
exclude_lines =
# Re-enable the standard pragma
pragma: NO COVER
Expand Down
5 changes: 1 addition & 4 deletions README.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Python Client for Google BigQuery
=================================

|GA| |pypi| |versions|
|GA| |pypi| |versions|

Querying massive datasets can be time consuming and expensive without the
right hardware and infrastructure. Google `BigQuery`_ solves this problem by
Expand Down Expand Up @@ -140,6 +140,3 @@ In this example all tracing data will be published to the Google

.. _OpenTelemetry documentation: https://opentelemetry-python.readthedocs.io
.. _Cloud Trace: https://cloud.google.com/trace



186 changes: 185 additions & 1 deletion UPGRADING.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,190 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

# 3.0.0 Migration Guide

## New Required Dependencies

Some of the previously optional dependencies are now *required* in `3.x` versions of the
library, namely
[google-cloud-bigquery-storage](https://pypi.org/project/google-cloud-bigquery-storage/)
(minimum version `2.0.0`) and [pyarrow](https://pypi.org/project/pyarrow/) (minimum
version `3.0.0`).

The behavior of some of the package "extras" has thus also changed:
* The `pandas` extra now requires the [db-dtypes](https://pypi.org/project/db-dtypes/)
package.
* The `bqstorage` extra has been preserved for compatibility reasons, but it is now a
no-op and should be omitted when installing the BigQuery client library.

**Before:**
```
$ pip install google-cloud-bigquery[bqstorage]
```

**After:**
```
$ pip install google-cloud-bigquery
```

* The `bignumeric_type` extra has been removed, as the `BIGNUMERIC` type is now
supported automatically. That extra should thus no longer be used.

**Before:**
```
$ pip install google-cloud-bigquery[bignumeric_type]
```

**After:**
```
$ pip install google-cloud-bigquery
```


## Type Annotations

The library is now type-annotated and declares itself as such. If you use a static
type checker such as `mypy`, you might start getting errors in places where the
`google-cloud-bigquery` package is used.

It is recommended to update your code and/or type annotations to fix these errors, but
if this is not feasible in the short term, you can temporarily ignore type annotations
in `google-cloud-bigquery`, for example by using a special `# type: ignore` comment:

```py
from google.cloud import bigquery # type: ignore
```

But again, this is only recommended as a possible short-term workaround if immediately
fixing the type check errors in your project is not feasible.

## Re-organized Types

The auto-generated parts of the library have been removed, and the proto-based types
formerly found in `google.cloud.bigquery_v2` have been replaced by a new implementation
(but see the [section](#legacy-protobuf-types) below).

For example, the standard SQL data types should now be imported from a new location:

**Before:**
```py
from google.cloud.bigquery_v2 import StandardSqlDataType
from google.cloud.bigquery_v2.types import StandardSqlField
from google.cloud.bigquery_v2.types.standard_sql import StandardSqlStructType
```

**After:**
```py
from google.cloud.bigquery import StandardSqlDataType
from google.cloud.bigquery.standard_sql import StandardSqlField
from google.cloud.bigquery.standard_sql import StandardSqlStructType
```

The `TypeKind` enum defining all possible SQL types for schema fields has been renamed
and is no longer nested under `StandardSqlDataType`:


**Before:**
```py
from google.cloud.bigquery_v2 import StandardSqlDataType

if field_type == StandardSqlDataType.TypeKind.STRING:
...
```

**After:**
```py
from google.cloud.bigquery import StandardSqlTypeNames

if field_type == StandardSqlTypeNames.STRING:
...
```


## Issuing queries with `Client.create_job` preserves destination table

The `Client.create_job` method no longer removes the destination table from a
query job's configuration. The destination table for a query can thus be
defined explicitly by the user.
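
To illustrate, here is a sketch of a query job resource (in the REST API
representation) that sets an explicit destination table. The project, dataset, and
table names below are placeholders, not values from this repository; under 3.x,
`Client.create_job` leaves `destinationTable` in such a configuration intact.

```py
# Hypothetical job resource in the BigQuery REST API shape. Under 3.x,
# `Client.create_job` no longer strips the `destinationTable` entry.
job_resource = {
    "jobReference": {"projectId": "my-project", "jobId": "abc123"},
    "configuration": {
        "query": {
            "query": "SELECT 17 AS answer",
            "useLegacySql": False,
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "query_results",
            },
        }
    },
}

# With an authenticated client (not run here):
# client = bigquery.Client()
# job = client.create_job(job_resource)  # destination table is preserved
```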


## Changes to data types when reading a pandas DataFrame

The default dtypes returned by the `to_dataframe` method have changed.

* The BigQuery `BOOLEAN` data type now maps to the pandas `boolean` dtype.
Previously, it mapped to the pandas `bool` dtype when the column contained no
`NULL` values, and to the pandas `object` dtype when `NULL` values were
present.
* The BigQuery `INT64` data type now maps to the pandas `Int64` dtype.
Previously, it mapped to the pandas `int64` dtype when the column contained no
`NULL` values, and to the pandas `float64` dtype when `NULL` values were
present.
* Now, the BigQuery `DATE` data type maps to the pandas `dbdate` dtype, which
is provided by the
[db-dtypes](https://googleapis.dev/python/db-dtypes/latest/index.html)
package. If any date value is outside of the range of
[pandas.Timestamp.min](https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.min.html)
(1677-09-22) and
[pandas.Timestamp.max](https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.max.html)
(2262-04-11), the data type maps to the pandas `object` dtype. The
`date_as_object` parameter has been removed.
* Now, the BigQuery `TIME` data type maps to the pandas `dbtime` dtype, which
is provided by the
[db-dtypes](https://googleapis.dev/python/db-dtypes/latest/index.html)
package.
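
The new nullable dtypes can be previewed with plain pandas, without a BigQuery
call. This sketch mirrors what `to_dataframe` now returns for `BOOLEAN` and
`INT64` columns: `NULL` values become `pd.NA` instead of forcing a fallback to
`object` or `float64`.

```py
import pandas as pd

# Nullable dtypes representing BOOLEAN and INT64 columns with NULLs.
bools = pd.array([True, None, False], dtype="boolean")
ints = pd.array([1, None, 3], dtype="Int64")

print(bools.dtype)       # boolean
print(ints.dtype)        # Int64
print(pd.isna(ints[1]))  # True -- NULL survives as pd.NA, not NaN
```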


## Changes to data types loading a pandas DataFrame

In the absence of schema information, pandas columns with timezone-naive
`datetime64[ns]` values (i.e. without timezone information) are recognized and
loaded using the `DATETIME` type. For columns with timezone-aware
`datetime64[ns, UTC]` values, the `TIMESTAMP` type continues to be used.
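
The selection rule can be sketched as a small helper keyed on the pandas dtype
string. This is an illustration of the documented behavior only, not the
library's internal implementation.

```py
# Sketch of the rule above: naive datetimes -> DATETIME,
# timezone-aware datetimes -> TIMESTAMP.
def bigquery_type_for_dtype(dtype_str: str) -> str:
    if dtype_str == "datetime64[ns]":
        return "DATETIME"  # no timezone attached
    if dtype_str.startswith("datetime64[ns,"):
        return "TIMESTAMP"  # timezone-aware, e.g. datetime64[ns, UTC]
    raise ValueError(f"not a datetime dtype: {dtype_str}")

print(bigquery_type_for_dtype("datetime64[ns]"))       # DATETIME
print(bigquery_type_for_dtype("datetime64[ns, UTC]"))  # TIMESTAMP
```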

## Changes to `Model`, `Client.get_model`, `Client.update_model`, and `Client.list_models`

The types of several `Model` properties have been changed.

- `Model.feature_columns` now returns a sequence of `google.cloud.bigquery.standard_sql.StandardSqlField`.
- `Model.label_columns` now returns a sequence of `google.cloud.bigquery.standard_sql.StandardSqlField`.
- `Model.model_type` now returns a string.
- `Model.training_runs` now returns a sequence of dictionaries, as received from the [BigQuery REST API](https://cloud.google.com/bigquery/docs/reference/rest/v2/models#Model.FIELDS.training_runs).
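
Because training runs are now plain dictionaries in the REST API shape, they can
be inspected with ordinary dict access. The dictionary below is a hand-written
sample for illustration; field names follow the Models REST reference linked
above.

```py
# Hypothetical training run in the REST API representation.
training_run = {
    "startTime": "2022-03-01T00:00:00Z",
    "trainingOptions": {"maxIterations": "10"},
}

# Plain dict access replaces proto attribute access.
start = training_run.get("startTime")
max_iters = training_run.get("trainingOptions", {}).get("maxIterations")
print(start, max_iters)  # 2022-03-01T00:00:00Z 10
```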

<a name="legacy-protobuf-types"></a>
## Legacy Protocol Buffers Types

For compatibility reasons, the legacy proto-based types still exist as static code
and can be imported:

```py
from google.cloud.bigquery_v2 import Model  # a subclass of proto.Message
```

Note, however, that importing them will issue a warning, because aside from
being importable, these types **are no longer maintained**. They may differ
both from the types in `google.cloud.bigquery` and from the types supported on
the backend.

### Maintaining compatibility with `google-cloud-bigquery` version 2.0

If you maintain a library or system that needs to support both
`google-cloud-bigquery` version 2.x and 3.x, it is recommended that you detect
when version 2.x is in use and convert properties that use the legacy protocol
buffer types, such as `Model.training_runs`, into the types used in 3.x.

Call the [`to_dict`
method](https://proto-plus-python.readthedocs.io/en/latest/reference/message.html#proto.message.Message.to_dict)
on the protocol buffers objects to get a JSON-compatible dictionary.

```py
from google.cloud.bigquery_v2 import Model

training_run: Model.TrainingRun = ...
training_run_dict = training_run.to_dict()
```
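
Version detection can be avoided entirely with duck typing: 2.x returns
proto-plus messages (which expose `to_dict`), while 3.x already returns plain
dictionaries. The sketch below, with a stand-in class in place of a real 2.x
message, normalizes either shape to a dict.

```py
# Hedged sketch: normalize a training run to a dict across 2.x and 3.x.
def training_run_as_dict(run):
    if hasattr(run, "to_dict"):  # proto-plus message (2.x)
        return run.to_dict()
    return run  # already a plain dict (3.x)

class FakeProtoRun:
    """Stand-in for a 2.x proto-plus training run message."""
    def to_dict(self):
        return {"startTime": "2022-01-01T00:00:00Z"}

print(training_run_as_dict(FakeProtoRun()))            # dict from to_dict()
print(training_run_as_dict({"startTime": "x"}))        # passed through as-is
```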

# 2.0.0 Migration Guide

Expand Down Expand Up @@ -56,4 +240,4 @@ distance_type = enums.Model.DistanceType.COSINE
from google.cloud.bigquery_v2 import types

distance_type = types.Model.DistanceType.COSINE
```
```
14 changes: 14 additions & 0 deletions docs/bigquery/legacy_proto_types.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
Legacy proto-based Types for Google Cloud BigQuery v2 API
=========================================================

.. warning::
These types are provided for backward compatibility only, and are not maintained
anymore. They might also differ from the types supported on the backend. It is
therefore strongly advised to migrate to the types found in :doc:`standard_sql`.

Also see the :doc:`3.0.0 Migration Guide<../UPGRADING>` for more information.

.. automodule:: google.cloud.bigquery_v2.types
:members:
:undoc-members:
:show-inheritance:
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Types for Google Cloud BigQuery v2 API
======================================

.. automodule:: google.cloud.bigquery_v2.types
.. automodule:: google.cloud.bigquery.standard_sql
:members:
:undoc-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,12 +109,12 @@
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = [
"google/cloud/bigquery_v2/**", # Legacy proto-based types.
"_build",
"**/.nox/**/*",
"samples/AUTHORING_GUIDE.md",
"samples/CONTRIBUTING.md",
"samples/snippets/README.rst",
"bigquery_v2/services.rst", # generated by the code generator
]

# The reST default role (used for this markup: `text`) to use for all
Expand Down
3 changes: 2 additions & 1 deletion docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,8 @@ API Reference
Migration Guide
---------------

See the guide below for instructions on migrating to the 2.x release of this library.
See the guides below for instructions on migrating from older to newer *major* releases
of this library (from ``1.x`` to ``2.x``, or from ``2.x`` to ``3.x``).

.. toctree::
:maxdepth: 2
Expand Down
19 changes: 17 additions & 2 deletions docs/reference.rst
Original file line number Diff line number Diff line change
Expand Up @@ -202,9 +202,24 @@ Encryption Configuration
Additional Types
================

Protocol buffer classes for working with the Models API.
Helper SQL type classes.

.. toctree::
:maxdepth: 2

bigquery_v2/types
bigquery/standard_sql


Legacy proto-based Types (deprecated)
=====================================

The legacy type classes based on protocol buffers.

.. deprecated:: 3.0.0
These types are provided for backward compatibility only, and are not maintained
anymore.

.. toctree::
:maxdepth: 2

bigquery/legacy_proto_types
4 changes: 0 additions & 4 deletions docs/snippets.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,10 +30,6 @@
import pandas
except (ImportError, AttributeError):
pandas = None
try:
import pyarrow
except (ImportError, AttributeError):
pyarrow = None

from google.api_core.exceptions import InternalServerError
from google.api_core.exceptions import ServiceUnavailable
Expand Down
38 changes: 35 additions & 3 deletions docs/usage/pandas.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,12 @@ First, ensure that the :mod:`pandas` library is installed by running:

pip install --upgrade pandas

Alternatively, you can install the BigQuery python client library with
Alternatively, you can install the BigQuery Python client library with
:mod:`pandas` by running:

.. code-block:: bash

pip install --upgrade google-cloud-bigquery[pandas]
pip install --upgrade 'google-cloud-bigquery[pandas]'

To retrieve query results as a :class:`pandas.DataFrame`:

Expand All @@ -37,6 +37,38 @@ To retrieve table rows as a :class:`pandas.DataFrame`:
:start-after: [START bigquery_list_rows_dataframe]
:end-before: [END bigquery_list_rows_dataframe]

The following data types are used when creating a pandas DataFrame.

.. list-table:: Pandas Data Type Mapping
:header-rows: 1

* - BigQuery
- pandas
- Notes
* - BOOL
- boolean
-
* - DATETIME
- datetime64[ns], object
- The object dtype is used when there are values not representable in a
pandas nanosecond-precision timestamp.
* - DATE
- dbdate, object
- The object dtype is used when there are values not representable in a
pandas nanosecond-precision timestamp.

Requires the ``db-dtypes`` package. See the `db-dtypes usage guide
<https://googleapis.dev/python/db-dtypes/latest/usage.html>`_
* - FLOAT64
- float64
-
* - INT64
- Int64
-
* - TIME
- dbtime
- Requires the ``db-dtypes`` package. See the `db-dtypes usage guide
<https://googleapis.dev/python/db-dtypes/latest/usage.html>`_

Retrieve BigQuery GEOGRAPHY data as a GeoPandas GeoDataFrame
------------------------------------------------------------
Expand All @@ -60,7 +92,7 @@ As of version 1.3.0, you can use the
to load data from a :class:`pandas.DataFrame` to a
:class:`~google.cloud.bigquery.table.Table`. To use this function, in addition
to :mod:`pandas`, you will need to install the :mod:`pyarrow` library. You can
install the BigQuery python client library with :mod:`pandas` and
install the BigQuery Python client library with :mod:`pandas` and
:mod:`pyarrow` by running:

.. code-block:: bash
Expand Down