-
Notifications
You must be signed in to change notification settings - Fork 50
Comparing changes
Open a pull request
base repository: googleapis/python-bigquery-dataframes
base: v2.5.0
head repository: googleapis/python-bigquery-dataframes
compare: v2.6.0
- 18 commits
- 82 files changed
- 11 contributors
Commits on Jun 2, 2025
-
Configuration menu - View commit details
-
Copy full SHA for 7269512 - Browse repository at this point
Copy the full SHA 7269512View commit details -
feat: implement ai.classify() (#1781)
* feat: implement ai.classify() * check label duplicity
Configuration menu - View commit details
-
Copy full SHA for 8af26d0 - Browse repository at this point
Copy the full SHA 8af26d0View commit details -
docs: fix docstrings to improve html rendering of code examples (#1788)
* docs: fix docstrings to improve html rendering of code examples * fix examples docstring in one more file
Configuration menu - View commit details
-
Copy full SHA for 38d9b73 - Browse repository at this point
Copy the full SHA 38d9b73View commit details
Commits on Jun 3, 2025
-
Configuration menu - View commit details
-
Copy full SHA for e480d29 - Browse repository at this point
Copy the full SHA e480d29View commit details -
chore: use faster query_and_wait API in _read_gbq_colab (#1777)
* chore: use faster query_and_wait API in _read_gbq_colab * try to fix unit tests * more unit test fixes * more test fixes * fix mypy * fix metrics counter in read_gbq with allow_large_results=False * use managedarrowtable * Update bigframes/session/loader.py * split out a few special case return values for read_gbq_query * support slice node for repr * fix failing system test * move slice into semiexecutor and out of readlocalnode * unit test for local executor * split method instead of using reloads * fix reference to _start_query * use limit rewrite for slice support * do not use numpy for offsets
Configuration menu - View commit details
-
Copy full SHA for f495c84 - Browse repository at this point
Copy the full SHA f495c84View commit details
Commits on Jun 4, 2025
-
Configuration menu - View commit details
-
Copy full SHA for 0b59cf1 - Browse repository at this point
Copy the full SHA 0b59cf1View commit details -
Configuration menu - View commit details
-
Copy full SHA for c31f67b - Browse repository at this point
Copy the full SHA c31f67bView commit details -
Configuration menu - View commit details
-
Copy full SHA for 1d45646 - Browse repository at this point
Copy the full SHA 1d45646View commit details
Commits on Jun 5, 2025
-
test: avoid exact float comparison in
test_apply_lambda
(#1795)* test: avoid exact float comparison in `test_apply_lambda` * use by_row=False in apply_simple_udf too
Configuration menu - View commit details
-
Copy full SHA for a600b23 - Browse repository at this point
Copy the full SHA a600b23View commit details
Commits on Jun 6, 2025
-
feat: add blob.transcribe function (#1773)
* add transcribe function * add verbose * add some debugging message * transcribe functin is completed. test case is done * move the place to capture col name * remove a few features, update testcase * change the testcase, add data * introduce user specified instructions * tweak prompt * rebase confest * change the way to read in input audio * update variable names * change variable names * change the way past in input * remove addtional instruction for now * change the column name * add a name for result
Configuration menu - View commit details
-
Copy full SHA for 86159a7 - Browse repository at this point
Copy the full SHA 86159a7View commit details -
fix: address
read_csv
with bothindex_col
anduse_cols
behavior…… inconsistency with pandas (#1785) * fix: read_csv with both index_col and use_cols inconsistent with pandas * ensure columns is not list type and avoid flacky ordered of columns * add docstring for index_col_in_columns and fix tests --------- Co-authored-by: Tim Sweña (Swast) <swast@google.com>
Configuration menu - View commit details
-
Copy full SHA for ba7c313 - Browse repository at this point
Copy the full SHA ba7c313View commit details
Commits on Jun 9, 2025
-
test: Add ReadLocalNode tests (#1794)
* test: Add ReadLocalNode tests * adapt to canonical output types * fix sql snapshot expectation * comments
Configuration menu - View commit details
-
Copy full SHA for 570a40b - Browse repository at this point
Copy the full SHA 570a40bView commit details -
feat: Implement item() for Series and Index (#1792)
* feat: Implement item() for Series and Index This commit introduces the `item()` method to both `Series` and `Index` classes. The `item()` method allows you to extract the single value from a Series or Index. It calls `peek(2)` internally and raises a `ValueError` if the Series or Index does not contain exactly one element. This behavior is consistent with pandas. Unit tests have been added to verify the functionality for: - Single-item Series/Index - Multi-item Series/Index (ValueError expected) - Empty Series/Index (ValueError expected) * refactor: Move item() docstrings to third_party This commit moves the docstrings for the `item()` method in `Series` and `Index` to their respective files in the `third_party/bigframes_vendored/pandas/core/` directory. The docstrings have been updated to match the pandas docstrings as closely as possible, while adhering to the existing style in the BigQuery DataFrames repository. This ensures that the BigQuery DataFrames API documentation remains consistent with pandas where applicable. * Apply suggestions from code review * Here's the test I've prepared: **Test: Update item() tests to match pandas behavior** This commit updates the tests for `Series.item()` and `Index.item()` to align more closely with pandas. The changes include: - Comparing the return value of `bigframes_series.item()` and `bigframes_index.item()` with their pandas counterparts. - Asserting that the ValueError messages for multi-item and empty Series/Index cases are identical to those raised by pandas. The expected message is "can only convert an array of size 1 to a Python scalar". * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * fix: Ensure item() matches pandas error messages exactly This commit modifies the implementation of `Series.item()` and `Index.item()` to delegate the single-item check and ValueError raising to pandas. Previously, `item()` used `peek(2)` and manually checked the length. The new implementation changes: - `Series.item()` to `self.peek(1).item()` - `Index.item()` to `self.to_series().peek(1).item()` This ensures that the ValueError message ("can only convert an array of size 1 to a Python scalar") is identical to the one produced by pandas when the Series/Index does not contain exactly one element. Existing tests were verified to still pass and accurately cover these conditions by comparing against `pandas.Series.item()` and `pandas.Index.item()`. * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * fix: Address feedback for Series.item() and Index.item() This commit incorporates several fixes and improvements based on feedback: 1. **Docstring Style**: * "Examples:" headings in `Series.item()` and `Index.item()` docstrings (in `third_party/`) are now bold (`**Examples:**`). 2. **Implementation of `item()`**: * `Series.item()` now uses `self.peek(2)` and then calls `.item()` on the peeked pandas Series if length is 1, otherwise raises `ValueError("can only convert an array of size 1 to a Python scalar")`. * `Index.item()` now uses `self.to_series().peek(2)` and then calls `.item()` on the peeked pandas Series if length is 1, otherwise raises the same ValueError. This change was made to allow tests to fail correctly when there is more than 1 item, rather than relying on pandas' `peek(1).item()` which would fetch only one item and not detect the multi-item error. 3. **Test Updates**: * Tests for `Series.item()` and `Index.item()` now capture the precise error message from the corresponding pandas method when testing error conditions (multiple items, empty). * The tests now assert that the BigQuery DataFrames methods raise a `ValueError` with a message identical to the one from pandas. 4. **Doctest Fix**: * The doctest for `Series.item()` in `third_party/bigframes_vendored/pandas/core/series.py` has been updated to expect `np.int64(1)` to match pandas behavior. `import numpy as np` was added to the doctest. 5. **Mypy Fix**: * A type annotation (`pd_idx_empty: pd.Index = ...`) was added in `tests/system/small/test_index.py` to resolve a `var-annotated` mypy error. * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * split tests into multiple test cases --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Configuration menu - View commit details
-
Copy full SHA for d2154c8 - Browse repository at this point
Copy the full SHA d2154c8View commit details -
feat: Implement ST_LENGTH geography function (#1791)
* feat: Implement ST_LENGTH geography function This commit introduces the ST_LENGTH function for BigQuery DataFrames. ST_LENGTH computes the length of GEOGRAPHY objects in meters. The implementation includes: - A new operation `geo_st_length_op` in `bigframes.operations.geo_ops`. - The user-facing function `st_length` in `bigframes.bigquery._operations.geo`. - Exposure of the new operation and function in relevant `__init__.py` files. - Comprehensive unit tests covering various geometry types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection), empty geographies, and NULL inputs. The function behaves as per the BigQuery ST_LENGTH documentation: - Returns 0 for POINT, MULTIPOINT, and empty GEOGRAPHYs. - Returns the perimeter for POLYGON and MULTIPOLYGON. - Returns the total length for LINESTRING and MULTILINESTRING. - For GEOMETRYCOLLECTION, sums the lengths/perimeters of its constituent linestrings and polygons. * feat: Add NotImplemented length property to GeoSeries This commit adds a `length` property to the `GeoSeries` class. Accessing this property will raise a `NotImplementedError`, guiding you to utilize the `bigframes.bigquery.st_length()` function instead. This change includes: - The `length` property in `bigframes/geopandas/geoseries.py`. - A unit test in `tests/system/small/geopandas/test_geoseries.py` to verify that the correct error is raised with the specified message when `GeoSeries.length` is accessed. * Update bigframes/bigquery/_operations/__init__.py * fix lint * add missing compilation method * use pandas for the expected values in tests * fix: Apply patch for ST_LENGTH and related test updates This commit applies a user-provided patch that includes: - Removing `st_length` from `bigframes/bigquery/_operations/__init__.py`. - Adding an Ibis implementation for `geo_st_length_op` in `bigframes/core/compile/scalar_op_compiler.py`. - Modifying `KMeans` in `bigframes/ml/cluster.py` to handle `init="k-means++"`. - Updating geo tests in `tests/system/small/bigquery/test_geo.py` to use `to_pandas()` and `pd.testing.assert_series_equal`. Note: System tests requiring Google Cloud authentication were not executed due to limitations in my current environment. * feat: Add use_spheroid parameter to ST_LENGTH and update docs This commit introduces the `use_spheroid` parameter to the `ST_LENGTH` geography function, aligning it more closely with the BigQuery ST_LENGTH(geography_expression[, use_spheroid]) signature. Key changes: - `bigframes.operations.geo_ops.GeoStLengthOp` is now a dataclass that accepts `use_spheroid` (defaulting to `False`). A check is included to raise `NotImplementedError` if `use_spheroid` is `True`, as this is the current limitation in BigQuery. - The Ibis compiler implementation for `geo_st_length_op` in `bigframes.core.compile.scalar_op_compiler.py` has been updated to accept the new `GeoStLengthOp` operator type. - The user-facing `st_length` function in `bigframes.bigquery._operations.geo.py` now includes the `use_spheroid` keyword argument. - The docstring for `st_length` has been updated to match the official BigQuery documentation, clarifying that only lines contribute to the length (points and polygons result in 0 length), and detailing the `use_spheroid` parameter. Examples have been updated accordingly. - Tests in `tests/system/small/bigquery/test_geo.py` have been updated to: - Reflect the correct behavior (0 length for polygons/points). - Test calls with both default `use_spheroid` and explicit `use_spheroid=False`. - Verify that `use_spheroid=True` raises a `NotImplementedError`. Note: System tests requiring Google Cloud authentication were not re-executed for this specific commit due to environment limitations identified in previous steps. The changes primarily affect the operator definition, function signature, and client-side validation, with the core Ibis compilation logic for length remaining unchanged. * feat: Implement use_spheroid for ST_LENGTH via Ibis UDF This commit refactors the ST_LENGTH implementation to correctly pass the `use_spheroid` parameter to BigQuery by using Ibis's `ibis_udf.scalar.builtin('ST_LENGTH', ...)` function. Key changes: - `bigframes.operations.geo_ops.GeoStLengthOp`: The client-side `NotImplementedError` for `use_spheroid=True` (raised in `__post_init__`) has been removed. BigQuery DataFrames will now pass this parameter directly to BigQuery. - `bigframes.core.compile.scalar_op_compiler.geo_length_op_impl`: The implementation now always uses `ibis_udf.scalar.builtin('ST_LENGTH', x, op.use_spheroid)` instead of `x.length()`. This ensures the `use_spheroid` parameter is included in the SQL generated for BigQuery. - `tests/system/small/bigquery/test_geo.py`: - The test expecting a client-side `NotImplementedError` for `use_spheroid=True` has been removed. - A new test `test_st_length_use_spheroid_true_errors_from_bq` has been added. This test calls `st_length` with `use_spheroid=True` and asserts that an exception is raised from BigQuery, as BigQuery itself currently only supports `use_spheroid=False` for the `ST_LENGTH` function. - Existing tests for `st_length` were already updated in a previous commit to reflect that only line geometries contribute to the length, and these continue to verify behavior with `use_spheroid=False`. This change ensures that BigQuery DataFrames accurately reflects BigQuery's `ST_LENGTH` capabilities concerning the `use_spheroid` parameter. * refactor: Use Ibis UDF for ST_LENGTH BigQuery builtin This commit refactors the ST_LENGTH geography operation to use an Ibis UDF defined via `@ibis_udf.scalar.builtin`. This aligns with the pattern exemplified by other built-in functions like ST_DISTANCE when a direct Ibis method with all necessary parameters is not available. Key changes: - A new `st_length` function is defined in `bigframes/core/compile/scalar_op_compiler.py` using `@ibis_udf.scalar.builtin`. This UDF maps to BigQuery's `ST_LENGTH(geography, use_spheroid)` function. - The `geo_length_op_impl` in the same file now calls this `st_length` Ibis UDF, replacing the previous use of `op_typing.ibis_function`. - The `GeoStLengthOp` in `bigframes/operations/geo_ops.py` and the user-facing `st_length` function in `bigframes/bigquery/_operations/geo.py` remain unchanged from the previous version, as they correctly define the operation's interface and parameters. This change provides a cleaner and more direct way to map the BigQuery DataFrames operation to the specific BigQuery ST_LENGTH SQL function signature, while maintaining the existing BigQuery DataFrames operation structure. The behavior of the `st_length` function, including its handling of the `use_spheroid` parameter and error conditions from BigQuery, remains the same. * refactor: Consolidate st_length tests in test_geo.py This commit refactors the system tests for the `st_length` geography function in `tests/system/small/bigquery/test_geo.py`. The numerous individual test cases for different geometry types have been combined into a single, comprehensive test function `test_st_length_various_geometries`. This new test uses a single GeoSeries with a variety of inputs (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection, None/Empty) and compares the output of `st_length` (with both default and explicit `use_spheroid=False`) against a pandas Series of expected lengths. This consolidation improves the conciseness and maintainability of the tests for `st_length`. The test for `use_spheroid=True` (expecting an error from BigQuery) remains separate. * fix: Correct export of GeoStLengthOp in operations init This commit fixes an ImportError caused by an incorrect name being used for the ST_LENGTH geography operator in `bigframes/operations/__init__.py`. When `geo_st_length_op` (a variable) was replaced by the dataclass `GeoStLengthOp`, the import and `__all__` list in this `__init__.py` file were not updated. This commit changes the import from `.geo_ops` to correctly import `GeoStLengthOp` and updates the `__all__` list to export `GeoStLengthOp`. * fix system test and some linting * fix lint * fix doctest * fix docstring * Update bigframes/core/compile/scalar_op_compiler.py --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Configuration menu - View commit details
-
Copy full SHA for c5b7fda - Browse repository at this point
Copy the full SHA c5b7fdaView commit details -
feat: Implement ST_ISCLOSED geography function (#1789)
* feat: Implement ST_ISCLOSED geography function This commit implements the `ST_ISCLOSED` geography function. The following changes were made: - Added `GeoIsClosedOp` to `bigframes/operations/geo_ops.py`. - Added `st_isclosed` function to `bigframes/bigquery/_operations/geo.py`. - Added `is_closed` property to `GeoSeries` in `bigframes/geopandas/geoseries.py`. - Added system tests for the `is_closed` property. * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * fix mypy failure * feat: Implement ST_ISCLOSED geography function This commit implements the `ST_ISCLOSED` geography function. The following changes were made: - Added `GeoIsClosedOp` to `bigframes/operations/geo_ops.py`. - Added `st_isclosed` function to `bigframes/bigquery/_operations/geo.py`. - Added `is_closed` property to `GeoSeries` in `bigframes/geopandas/geoseries.py`. - Registered `GeoIsClosedOp` in `bigframes/core/compile/scalar_op_compiler.py` by defining an Ibis UDF and registering the op. - Added system checks for the `is_closed` property. * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * wait to implement geoseries.is_closed for now * fix doctest * address review comments * Update bigframes/bigquery/_operations/geo.py --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Configuration menu - View commit details
-
Copy full SHA for 36bc179 - Browse repository at this point
Copy the full SHA 36bc179View commit details -
docs: adjust strip method examples to match latest pandas (#1797)
The string representation of pandas Series has changed, causing doctest failures in `accessor.py`. This commit updates the expected output in the doctests for `lstrip`, `rstrip`, and `strip` to reflect the new string representation. The changes involve removing the `<BLANKLINE>` before `<NA>` and adjusting the spacing in the doctest examples. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Configuration menu - View commit details
-
Copy full SHA for 817b0c0 - Browse repository at this point
Copy the full SHA 817b0c0View commit details -
test: Run cleanup session before doctest session in Kokoro tests (#1799)
This commit changes the order of Nox sessions in the Kokoro doctest configurations to ensure that the `cleanup` session runs before the `doctest` session. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: Tim Sweña (Swast) <swast@google.com>
Configuration menu - View commit details
-
Copy full SHA for b7ac38e - Browse repository at this point
Copy the full SHA b7ac38eView commit details -
chore(main): release 2.6.0 (#1787)
Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com> Co-authored-by: Shenyang Cai <sycai@users.noreply.github.com>
Configuration menu - View commit details
-
Copy full SHA for 72a021f - Browse repository at this point
Copy the full SHA 72a021fView commit details
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff v2.5.0...v2.6.0