10000 Comparing v2.5.0...v2.6.0 · googleapis/python-bigquery-dataframes · GitHub
[go: up one dir, main page]

Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: googleapis/python-bigquery-dataframes
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: v2.5.0
Choose a base ref
...
head repository: googleapis/python-bigquery-dataframes
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: v2.6.0
Choose a head ref
  • 18 commits
  • 82 files changed
  • 11 contributors

Commits on Jun 2, 2025

  1. Configuration menu
    Copy the full SHA
    7269512 View commit details
    Browse the repository at this point in the history
  2. feat: implement ai.classify() (#1781)

    * feat: implement ai.classify()
    
    * check label duplicity
    sycai authored Jun 2, 2025
    Configuration menu
    Copy the full SHA
    8af26d0 View commit details
    Browse the repository at this point in the history
  3. docs: fix docstrings to improve html rendering of code examples (#1788)

    * docs: fix docstrings to improve html rendering of code examples
    
    * fix examples docstring in one more file
    shobsi authored Jun 2, 2025
    Configuration menu
    Copy the full SHA
    38d9b73 View commit details
    Browse the repository at this point in the history

Commits on Jun 3, 2025

  1. Configuration menu
    Copy the full SHA
    e480d29 View commit details
    Browse the repository at this point in the history
  2. chore: use faster query_and_wait API in _read_gbq_colab (#1777)

    * chore: use faster query_and_wait API in _read_gbq_colab
    
    * try to fix unit tests
    
    * more unit test fixes
    
    * more test fixes
    
    * fix mypy
    
    * fix metrics counter in read_gbq with allow_large_results=False
    
    * use managedarrowtable
    
    * Update bigframes/session/loader.py
    
    * split out a few special case return values for read_gbq_query
    
    * support slice node for repr
    
    * fix failing system test
    
    * move slice into semiexecutor and out of readlocalnode
    
    * unit test for local executor
    
    * split method instead of using reloads
    
    * fix reference to _start_query
    
    * use limit rewrite for slice support
    
    * do not use numpy for offsets
    tswast authored Jun 3, 2025
    Configuration menu
    Copy the full SHA
    f495c84 View commit details
    Browse the repository at this point in the history

Commits on Jun 4, 2025

  1. Configuration menu
    Copy the full SHA
    0b59cf1 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    c31f67b View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    1d45646 View commit details
    Browse the repository at this point in the history

Commits on Jun 5, 2025

  1. test: avoid exact float comparison in test_apply_lambda (#1795)

    * test: avoid exact float comparison in `test_apply_lambda`
    
    * use by_row=False in apply_simple_udf too
    tswast authored Jun 5, 2025
    Configuration menu
    Copy the full SHA
    a600b23 View commit details
    Browse the repository at this point in the history

Commits on Jun 6, 2025

  1. feat: add blob.transcribe function (#1773)

    * add transcribe function
    
    * add verbose
    
    * add some debugging message
    
    * transcribe functin is completed. test case is done
    
    * move the place to capture col name
    
    * remove a few features, update testcase
    
    * change the testcase, add data
    
    * introduce user specified instructions
    
    * tweak prompt
    
    * rebase confest
    
    * change the way to read in input audio
    
    * update variable names
    
    * change variable names
    
    * change the way past in input
    
    * remove addtional instruction for now
    
    * change the column name
    
    * add a name for result
    shuoweil authored Jun 6, 2025
    Configuration menu
    Copy the full SHA
    86159a7 View commit details
    Browse the repository at this point in the history
  2. fix: address read_csv with both index_col and use_cols behavior…

    … inconsistency with pandas (#1785)
    
    * fix: read_csv with both index_col and use_cols inconsistent with pandas
    
    * ensure columns is not list type and avoid flacky ordered of columns
    
    * add docstring for index_col_in_columns and fix tests
    
    ---------
    
    Co-authored-by: Tim Sweña (Swast) <swast@google.com>
    chelsea-lin and tswast authored Jun 6, 2025
    Configuration menu
    Copy the full SHA
    ba7c313 View commit details
    Browse the repository at this point in the history

Commits on Jun 9, 2025

  1. test: Add ReadLocalNode tests (#1794)

    * test: Add ReadLocalNode tests
    
    * adapt to canonical output types
    
    * fix sql snapshot expectation
    
    * comments
    TrevorBergeron authored Jun 9, 2025
    Configuration menu
    Copy the full SHA
    570a40b View commit details
    Browse the repository at this point in the history
  2. feat: Implement item() for Series and Index (#1792)

    * feat: Implement item() for Series and Index
    
    This commit introduces the `item()` method to both `Series` and `Index` classes.
    
    The `item()` method allows you to extract the single value from a Series or Index.
    It calls `peek(2)` internally and raises a `ValueError` if the Series or Index
    does not contain exactly one element. This behavior is consistent with pandas.
    
    Unit tests have been added to verify the functionality for:
    - Single-item Series/Index
    - Multi-item Series/Index (ValueError expected)
    - Empty Series/Index (ValueError expected)
    
    * refactor: Move item() docstrings to third_party
    
    This commit moves the docstrings for the `item()` method in `Series` and `Index`
    to their respective files in the `third_party/bigframes_vendored/pandas/core/`
    directory.
    
    The docstrings have been updated to match the pandas docstrings as closely as
    possible, while adhering to the existing style in the BigQuery DataFrames repository.
    This ensures that the BigQuery DataFrames API documentation remains consistent
    with pandas where applicable.
    
    * Apply suggestions from code review
    
    * Here's the test I've prepared:
    
    **Test: Update item() tests to match pandas behavior**
    
    This commit updates the tests for `Series.item()` and `Index.item()`
    to align more closely with pandas.
    
    The changes include:
    - Comparing the return value of `bigframes_series.item()` and
      `bigframes_index.item()` with their pandas counterparts.
    - Asserting that the ValueError messages for multi-item and empty
      Series/Index cases are identical to those raised by pandas.
      The expected message is "can only convert an array of size 1 to a Python scalar".
    
    * 🦉 Updates from OwlBot post-processor
    
    See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
    
    * fix: Ensure item() matches pandas error messages exactly
    
    This commit modifies the implementation of `Series.item()` and `Index.item()`
    to delegate the single-item check and ValueError raising to pandas.
    
    Previously, `item()` used `peek(2)` and manually checked the length.
    The new implementation changes:
    - `Series.item()` to `self.peek(1).item()`
    - `Index.item()` to `self.to_series().peek(1).item()`
    
    This ensures that the ValueError message ("can only convert an array of size 1 to a Python scalar")
    is identical to the one produced by pandas when the Series/Index does not
    contain exactly one element.
    
    Existing tests were verified to still pass and accurately cover these
    conditions by comparing against `pandas.Series.item()` and `pandas.Index.item()`.
    
    * 🦉 Updates from OwlBot post-processor
    
    See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
    
    * fix: Address feedback for Series.item() and Index.item()
    
    This commit incorporates several fixes and improvements based on feedback:
    
    1.  **Docstring Style**:
        *   "Examples:" headings in `Series.item()` and `Index.item()`
            docstrings (in `third_party/`) are now bold (`**Examples:**`).
    
    2.  **Implementation of `item()`**:
        *   `Series.item()` now uses `self.peek(2)` and then calls `.item()` on
            the peeked pandas Series if length is 1, otherwise raises
            `ValueError("can only convert an array of size 1 to a Python scalar")`.
        *   `Index.item()` now uses `self.to_series().peek(2)` and then calls
            `.item()` on the peeked pandas Series if length is 1, otherwise
            raises the same ValueError.
            This change was made to allow tests to fail correctly when there is
            more than 1 item, rather than relying on pandas' `peek(1).item()`
            which would fetch only one item and not detect the multi-item error.
    
    3.  **Test Updates**:
        *   Tests for `Series.item()` and `Index.item()` now capture the
            precise error message from the corresponding pandas method when
            testing error conditions (multiple items, empty).
        *   The tests now assert that the BigQuery DataFrames methods raise
            a `ValueError` with a message identical to the one from pandas.
    
    4.  **Doctest Fix**:
        *   The doctest for `Series.item()` in
            `third_party/bigframes_vendored/pandas/core/series.py` has been
            updated to expect `np.int64(1)` to match pandas behavior.
            `import numpy as np` was added to the doctest.
    
    5.  **Mypy Fix**:
        *   A type annotation (`pd_idx_empty: pd.Index = ...`) was added in
            `tests/system/small/test_index.py` to resolve a `var-annotated`
            mypy error.
    
    * 🦉 Updates from OwlBot post-processor
    
    See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
    
    * split tests into multiple test cases
    
    ---------
    
    Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
    Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
    3 people authored Jun 9, 2025
    Configuration menu
    Copy the full SHA
    d2154c8 View commit details
    Browse the repository at this point in the history
  3. feat: Implement ST_LENGTH geography function (#1791)

    * feat: Implement ST_LENGTH geography function
    
    This commit introduces the ST_LENGTH function for BigQuery DataFrames.
    ST_LENGTH computes the length of GEOGRAPHY objects in meters.
    
    The implementation includes:
    - A new operation `geo_st_length_op` in `bigframes.operations.geo_ops`.
    - The user-facing function `st_length` in `bigframes.bigquery._operations.geo`.
    - Exposure of the new operation and function in relevant `__init__.py` files.
    - Comprehensive unit tests covering various geometry types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection), empty geographies, and NULL inputs.
    
    The function behaves as per the BigQuery ST_LENGTH documentation:
    - Returns 0 for POINT, MULTIPOINT, and empty GEOGRAPHYs.
    - Returns the perimeter for POLYGON and MULTIPOLYGON.
    - Returns the total length for LINESTRING and MULTILINESTRING.
    - For GEOMETRYCOLLECTION, sums the lengths/perimeters of its constituent linestrings and polygons.
    
    * feat: Add NotImplemented length property to GeoSeries
    
    This commit adds a `length` property to the `GeoSeries` class.
    Accessing this property will raise a `NotImplementedError`, guiding you to utilize the `bigframes.bigquery.st_length()` function instead.
    
    This change includes:
    - The `length` property in `bigframes/geopandas/geoseries.py`.
    - A unit test in `tests/system/small/geopandas/test_geoseries.py` to verify that the correct error is raised with the specified message when `GeoSeries.length` is accessed.
    
    * Update bigframes/bigquery/_operations/__init__.py
    
    * fix lint
    
    * add missing compilation method
    
    * use pandas for the expected values in tests
    
    * fix: Apply patch for ST_LENGTH and related test updates
    
    This commit applies a user-provided patch that includes:
    - Removing `st_length` from `bigframes/bigquery/_operations/__init__.py`.
    - Adding an Ibis implementation for `geo_st_length_op` in `bigframes/core/compile/scalar_op_compiler.py`.
    - Modifying `KMeans` in `bigframes/ml/cluster.py` to handle `init="k-means++"`.
    - Updating geo tests in `tests/system/small/bigquery/test_geo.py` to use `to_pandas()` and `pd.testing.assert_series_equal`.
    
    Note: System tests requiring Google Cloud authentication were not executed due to limitations in my current environment.
    
    * feat: Add use_spheroid parameter to ST_LENGTH and update docs
    
    This commit introduces the `use_spheroid` parameter to the `ST_LENGTH`
    geography function, aligning it more closely with the BigQuery
    ST_LENGTH(geography_expression[, use_spheroid]) signature.
    
    Key changes:
    - `bigframes.operations.geo_ops.GeoStLengthOp` is now a dataclass
      that accepts `use_spheroid` (defaulting to `False`). A check is
      included to raise `NotImplementedError` if `use_spheroid` is `True`,
      as this is the current limitation in BigQuery.
    - The Ibis compiler implementation for `geo_st_length_op` in
      `bigframes.core.compile.scalar_op_compiler.py` has been updated
      to accept the new `GeoStLengthOp` operator type.
    - The user-facing `st_length` function in
      `bigframes.bigquery._operations.geo.py` now includes the
      `use_spheroid` keyword argument.
    - The docstring for `st_length` has been updated to match the
      official BigQuery documentation, clarifying that only lines contribute
      to the length (points and polygons result in 0 length), and
      detailing the `use_spheroid` parameter. Examples have been
      updated accordingly.
    - Tests in `tests/system/small/bigquery/test_geo.py` have been
      updated to:
        - Reflect the correct behavior (0 length for polygons/points).
        - Test calls with both default `use_spheroid` and explicit
          `use_spheroid=False`.
        - Verify that `use_spheroid=True` raises a `NotImplementedError`.
    
    Note: System tests requiring Google Cloud authentication were not
    re-executed for this specific commit due to environment limitations
    identified in previous steps. The changes primarily affect the operator
    definition, function signature, and client-side validation, with the
    core Ibis compilation logic for length remaining unchanged.
    
    * feat: Implement use_spheroid for ST_LENGTH via Ibis UDF
    
    This commit refactors the ST_LENGTH implementation to correctly
    pass the `use_spheroid` parameter to BigQuery by using Ibis's
    `ibis_udf.scalar.builtin('ST_LENGTH', ...)` function.
    
    Key changes:
    - `bigframes.operations.geo_ops.GeoStLengthOp`: The client-side
      `NotImplementedError` for `use_spheroid=True` (raised in
      `__post_init__`) has been removed. BigQuery DataFrames will now
      pass this parameter directly to BigQuery.
    - `bigframes.core.compile.scalar_op_compiler.geo_length_op_impl`:
      The implementation now always uses
      `ibis_udf.scalar.builtin('ST_LENGTH', x, op.use_spheroid)`
      instead of `x.length()`. This ensures the `use_spheroid`
      parameter is included in the SQL generated for BigQuery.
    - `tests/system/small/bigquery/test_geo.py`:
        - The test expecting a client-side `NotImplementedError` for
          `use_spheroid=True` has been removed.
        - A new test `test_st_length_use_spheroid_true_errors_from_bq`
          has been added. This test calls `st_length` with
          `use_spheroid=True` and asserts that an exception is raised
          from BigQuery, as BigQuery itself currently only supports
          `use_spheroid=False` for the `ST_LENGTH` function.
        - Existing tests for `st_length` were already updated in a
          previous commit to reflect that only line geometries contribute
          to the length, and these continue to verify behavior with
          `use_spheroid=False`.
    
    This change ensures that BigQuery DataFrames accurately reflects BigQuery's
    `ST_LENGTH` capabilities concerning the `use_spheroid` parameter.
    
    * refactor: Use Ibis UDF for ST_LENGTH BigQuery builtin
    
    This commit refactors the ST_LENGTH geography operation to use an
    Ibis UDF defined via `@ibis_udf.scalar.builtin`. This aligns with
    the pattern exemplified by other built-in functions like ST_DISTANCE
    when a direct Ibis method with all necessary parameters is not available.
    
    Key changes:
    - A new `st_length` function is defined in
      `bigframes/core/compile/scalar_op_compiler.py` using
      `@ibis_udf.scalar.builtin`. This UDF maps to BigQuery's
      `ST_LENGTH(geography, use_spheroid)` function.
    - The `geo_length_op_impl` in the same file now calls this
      `st_length` Ibis UDF, replacing the previous use of
      `op_typing.ibis_function`.
    - The `GeoStLengthOp` in `bigframes/operations/geo_ops.py` and
      the user-facing `st_length` function in
      `bigframes/bigquery/_operations/geo.py` remain unchanged from
      the previous version, as they correctly define the operation's
      interface and parameters.
    
    This change provides a cleaner and more direct way to map the
    BigQuery DataFrames operation to the specific BigQuery ST_LENGTH
    SQL function signature, while maintaining the existing BigQuery DataFrames
    operation structure. The behavior of the `st_length` function,
    including its handling of the `use_spheroid` parameter and error
    conditions from BigQuery, remains the same.
    
    * refactor: Consolidate st_length tests in test_geo.py
    
    This commit refactors the system tests for the `st_length` geography
    function in `tests/system/small/bigquery/test_geo.py`.
    
    The numerous individual test cases for different geometry types
    have been combined into a single, comprehensive test function
    `test_st_length_various_geometries`. This new test uses a single
    GeoSeries with a variety of inputs (Point, LineString, Polygon,
    MultiPoint, MultiLineString, MultiPolygon, GeometryCollection,
    None/Empty) and compares the output of `st_length` (with both
    default and explicit `use_spheroid=False`) against a pandas Series
    of expected lengths.
    
    This consolidation improves the conciseness and maintainability of
    the tests for `st_length`. The test for `use_spheroid=True`
    (expecting an error from BigQuery) remains separate.
    
    * fix: Correct export of GeoStLengthOp in operations init
    
    This commit fixes an ImportError caused by an incorrect name being
    used for the ST_LENGTH geography operator in `bigframes/operations/__init__.py`.
    
    When `geo_st_length_op` (a variable) was replaced by the dataclass
    `GeoStLengthOp`, the import and `__all__` list in this `__init__.py`
    file were not updated. This commit changes the import from `.geo_ops`
    to correctly import `GeoStLengthOp` and updates the `__all__` list
    to export `GeoStLengthOp`.
    
    * fix system test and some linting
    
    * fix lint
    
    * fix doctest
    
    * fix docstring
    
    * Update bigframes/core/compile/scalar_op_compiler.py
    
    ---------
    
    Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
    tswast and google-labs-jules[bot] authored Jun 9, 2025
    Configuration menu
    Copy the full SHA
    c5b7fda View commit details
    Browse the repository at this point in the history
  4. feat: Implement ST_ISCLOSED geography function (#1789)

    * feat: Implement ST_ISCLOSED geography function
    
    This commit implements the `ST_ISCLOSED` geography function.
    
    The following changes were made:
    - Added `GeoIsClosedOp` to `bigframes/operations/geo_ops.py`.
    - Added `st_isclosed` function to `bigframes/bigquery/_operations/geo.py`.
    - Added `is_closed` property to `GeoSeries` in `bigframes/geopandas/geoseries.py`.
    - Added system tests for the `is_closed` property.
    
    * 🦉 Updates from OwlBot post-processor
    
    See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
    
    * fix mypy failure
    
    * feat: Implement ST_ISCLOSED geography function
    
    This commit implements the `ST_ISCLOSED` geography function.
    
    The following changes were made:
    - Added `GeoIsClosedOp` to `bigframes/operations/geo_ops.py`.
    - Added `st_isclosed` function to `bigframes/bigquery/_operations/geo.py`.
    - Added `is_closed` property to `GeoSeries` in `bigframes/geopandas/geoseries.py`.
    - Registered `GeoIsClosedOp` in `bigframes/core/compile/scalar_op_compiler.py`
      by defining an Ibis UDF and registering the op.
    - Added system checks for the `is_closed` property.
    
    * 🦉 Updates from OwlBot post-processor
    
    See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
    
    * wait to implement geoseries.is_closed for now
    
    * fix doctest
    
    * address review comments
    
    * Update bigframes/bigquery/_operations/geo.py
    
    ---------
    
    Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
    Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
    3 people authored Jun 9, 2025
    Configuration menu
    Copy the full SHA
    36bc179 View commit details
    Browse the repository at this point in the history
  5. docs: adjust strip method examples to match latest pandas (#1797)

    The string representation of pandas Series has changed, causing
    doctest failures in `accessor.py`. This commit updates the
    expected output in the doctests for `lstrip`, `rstrip`, and
    `strip` to reflect the new string representation.
    
    The changes involve removing the `<BLANKLINE>` before `<NA>`
    and adjusting the spacing in the doctest examples.
    
    Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
    tswast and google-labs-jules[bot] authored Jun 9, 2025
    Configuration menu
    Copy the full SHA
    817b0c0 View commit details
    Browse the repository at this point in the history
  6. test: Run cleanup session before doctest session in Kokoro tests (#1799)

    This commit changes the order of Nox sessions in the Kokoro doctest configurations to ensure that the `cleanup` session runs before the `doctest` session.
    
    Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
    Co-authored-by: Tim Sweña (Swast) <swast@google.com>
    3 people authored Jun 9, 2025
    Configuration menu
    Copy the full SHA
    b7ac38e View commit details
    Browse the repository at this point in the history
  7. chore(main): release 2.6.0 (#1787)

    Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
    Co-authored-by: Shenyang Cai <sycai@users.noreply.github.com>
    release-please[bot] and sycai authored Jun 9, 2025
    Configuration menu
    Copy the full SHA
    72a021f View commit details
    Browse the repository at this point in the history
Loading
0