fix: empty record dtypes by yokomotod · Pull Request #2147 · googleapis/python-bigquery · GitHub

fix: empty record dtypes #2147

Merged: 7 commits, Apr 1, 2025

Conversation

yokomotod
Contributor
@yokomotod yokomotod commented Mar 19, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #2148 🦕

Can we increase the minimum supported pandas version from 1.1.4 to 1.3.0?

@product-auto-label bot added the size: s and api: bigquery labels Mar 19, 2025
@yokomotod yokomotod marked this pull request as ready for review March 19, 2025 07:11
@yokomotod yokomotod requested review from a team as code owners March 19, 2025 07:11
@yokomotod yokomotod marked this pull request as draft March 19, 2025 07:11
Comment on lines -2672 to -2674
# Avoid "ValueError: need at least one array to concatenate" on
# older versions of pandas when converting empty RecordBatch to
# DataFrame. See: https://github.com/pandas-dev/pandas/issues/41241
Contributor Author

pandas-dev/pandas#41052 was released in pandas 1.3.0 (as mentioned in the issue above: pandas-dev/pandas#41241 (comment)).

So if we can increase the minimum pandas version to 1.3.0, we no longer need this workaround.
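
The version reasoning above can be sketched in plain Python. This is only an illustration, not code from python-bigquery or pandas: `parse_version`, `FIXED_IN`, and `needs_empty_batch_workaround` are hypothetical names, and a real project would more likely compare versions with `packaging.version.Version`.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn a string such as '1.2.5' into (1, 2, 5) for numeric comparison."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '1.3' to three components.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


# Per the comment above, the upstream fix shipped in pandas 1.3.0.
FIXED_IN = (1, 3, 0)


def needs_empty_batch_workaround(pandas_version: str) -> bool:
    """Hypothetical helper: True when the given pandas predates the fix."""
    return parse_version(pandas_version) < FIXED_IN
```

With the minimum raised to 1.3.0, a check like this would be false for every supported version, which is why the workaround can go. Comparing parsed tuples rather than raw strings matters because, as strings, "1.10.0" sorts before "1.3.0".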

Contributor

Indeed, moving to a monorepo might help make the dependencies more consistent. Ideally, any overlapping dependencies among the handwritten BigQuery projects would be consistent.

But just for the purpose of this PR, I think it's reasonable to increase the minimum pandas version to 1.3.0, as long as it's not breaking anything.

@@ -21,7 +21,7 @@ opentelemetry-api==1.1.0
opentelemetry-instrumentation==0.20b0
opentelemetry-sdk==1.1.0
packaging==24.2.0
pandas==1.1.4
Contributor Author

I confirmed that pandas==1.2.5 reproduces the failure:

FAILED tests/unit/test_table.py::TestRowIterator::test_to_dataframe_w_bqstorage_no_streams - ValueError: need at least one array to concatenate

and that pandas==1.3.0 passes.

Contributor

Thanks for updating the constraint file!

@yokomotod yokomotod marked this pull request as ready for review March 19, 2025 15:15
Contributor
@tswast left a comment

This looks good to me, but will wait for @chalmerlowe or @Linchin to approve.

@Linchin added the kokoro:run and kokoro:force-run labels Mar 26, 2025
@yoshi-kokoro removed the kokoro:run and kokoro:force-run labels Mar 26, 2025
@Linchin added the kokoro:run and kokoro:force-run labels Mar 26, 2025
@yoshi-kokoro removed the kokoro:run and kokoro:force-run labels Mar 26, 2025
@Linchin
Contributor
Linchin commented Mar 27, 2025

I think we need to remove some lines in the unit test so the coverage test can pass. Could you fix it for us? Then we should be able to merge the PR.

Name                                                         Stmts   Miss Branch BrPart  Cover   Missing
--------------------------------------------------------------------------------------------------------
tests/unit/test_table.py                                      3288      2     86      1    99%   4152-4153

@product-auto-label bot added the size: m label and removed the size: s label Mar 27, 2025
@Linchin added the kokoro:run label Mar 31, 2025
@Linchin added the kokoro:force-run label Mar 31, 2025
@yoshi-kokoro removed the kokoro:run and kokoro:force-run labels Mar 31, 2025
@Linchin
Contributor
Linchin commented Mar 31, 2025

A different system test is failing. Could you fix it too? Thanks.

Comment on lines -1225 to -1230
# Result is dependent upon which version of pandas is being used.
# Float64 was not introduced until pandas version 1.4.
if PANDAS_INSTALLED_VERSION >= "1.4":
    assert df.dtypes["float64_col"].name == "Float64"
else:
    assert df.dtypes["float64_col"].name == "string"
Contributor Author

Float64 was introduced in pandas 1.2.0, not 1.4, so we can drop this version check now that the minimum is pandas >= 1.3.0. The system test log shows the failure:

https://btx.cloud.google.com/invocations/feb3ac69-f2db-449c-ab07-2122420f069b/targets/cloud-devrel%2Fclient-libraries%2Fpython%2Fgoogleapis%2Fpython-bigquery%2Fpresubmit%2Fsystem-3.9/log

nox > python -m pip freeze
  .
  .
  .
pandas==1.3.0
  .
  .
  .

        if PANDAS_INSTALLED_VERSION >= "1.4":
            assert df.dtypes["float64_col"].name == "Float64"
        else:
>           assert df.dtypes["float64_col"].name == "string"
E           AssertionError: assert 'Float64' == 'string'
E             
E             - string
E             + Float64

https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.2.0.html#experimental-nullable-data-types-for-float-data
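
As a small illustration of the nullable dtype discussed here, the sketch below builds a column with the `Float64` extension dtype and reads its name the same way the test does. It assumes pandas 1.2.0 or newer is installed; the column name `float64_col` mirrors the test, and the sample values are made up.

```python
import pandas as pd

# Nullable float extension dtype (experimental since pandas 1.2.0).
values = pd.array([1.5, None, 3.0], dtype="Float64")
df = pd.DataFrame({"float64_col": values})

# The same lookup the test performs on the result DataFrame.
dtype_name = df.dtypes["float64_col"].name

# The missing entry is kept as a missing value rather than forcing object dtype.
missing_count = int(df["float64_col"].isna().sum())

# For contrast, a plain float Series uses the NumPy-backed dtype.
plain_name = pd.Series([1.5, 3.0]).dtype.name
```

Note the capitalization: `Float64` is the nullable extension dtype, while `float64` is the ordinary NumPy dtype.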

@yokomotod
Contributor Author

@Linchin sorry for overlooking the other error; I fixed it too.

Can you run the CI tests again?

I have trouble running the system tests locally (maybe due to an old numpy on Apple silicon), but the tests should pass.

@Linchin added the kokoro:run and kokoro:force-run labels Apr 1, 2025
@yoshi-kokoro removed the kokoro:run and kokoro:force-run labels Apr 1, 2025
Contributor
@Linchin left a comment

LGTM. Thanks for your contribution!

@Linchin Linchin merged commit 77d7173 into googleapis:main Apr 1, 2025
15 checks passed
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. size: m Pull request size is medium.
Development

Successfully merging this pull request may close these issues.

object dtype when empty query result
4 participants