fix: correct read_csv behaviours with use_cols, names, index_col by chelsea-lin · Pull Request #1804 · googleapis/python-bigquery-dataframes

fix: correct read_csv behaviours with use_cols, names, index_col #1804

Merged 4 commits on Jun 12, 2025

chelsea-lin
Contributor

Fixes internal issue 421466334 🦕

product-auto-label bot added the labels "size: l" (pull request size is large), "api: bigquery" (issues related to the googleapis/python-bigquery-dataframes API), and "size: m" (pull request size is medium), and removed "size: l", on Jun 10, 2025.
chelsea-lin marked this pull request as ready for review on June 10, 2025, 18:41.
chelsea-lin requested review from a team as code owners on June 10, 2025, 18:41.
chelsea-lin requested a review from sycai on June 10, 2025, 18:41.
Comment on lines 632 to 644
table_column_names = [field.name for field in table.schema]
for column_name in columns:
    if column_name not in table_column_names:
        possibility = min(
            table_column_names,
            key=lambda item: bigframes._tools.strings.levenshtein_distance(
                column_name, item
            ),
        )
        raise ValueError(
            f"Column '{column_name}' of `columns` not found in this table. "
            f"Did you mean '{possibility}'?"
        )
Contributor

nit: Shall we use a helper function/method? The indentation is very deep here. (https://goto.google.com/tott/733)

Contributor Author

Done.
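
For readers following the thread, here is a minimal sketch of what such a helper could look like. It is not the code that was merged: the function name `check_columns_in_table` is hypothetical, and a small stand-in `levenshtein_distance` replaces the internal `bigframes._tools.strings.levenshtein_distance` so the example runs on its own.

```python
from typing import Iterable, Sequence


def levenshtein_distance(left: str, right: str) -> int:
    """Stand-in edit distance (row-by-row dynamic programming)."""
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]
        for j, right_char in enumerate(right, start=1):
            cost = 0 if left_char == right_char else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def check_columns_in_table(
    columns: Iterable[str], table_column_names: Sequence[str]
) -> None:
    """Hypothetical helper: raise if a requested column is not in the table.

    The early `continue` keeps the error path one level shallower than
    nesting it under `if column_name not in table_column_names`.
    """
    for column_name in columns:
        if column_name in table_column_names:
            continue
        # Suggest the closest existing column name by edit distance.
        possibility = min(
            table_column_names,
            key=lambda item: levenshtein_distance(column_name, item),
        )
        raise ValueError(
            f"Column '{column_name}' of `columns` not found in this table. "
            f"Did you mean '{possibility}'?"
        )
```

The caller would then only need a single line such as `check_columns_in_table(columns, [field.name for field in table.schema])`, assuming `table.schema` behaves as in the snippet under review.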

Comment on lines 613 to 614
else:
    if names is not None:
Contributor

This looks equivalent to `elif names is not None:`, which would save a level of indentation.

Contributor Author

Done.
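
A small illustration of the suggestion, for context. The outer condition on `index_col` is made up, since the branch preceding line 613 is not shown in this thread; the point is only that an `else:` whose sole content is an `if` can be flattened into `elif` without changing behaviour.

```python
def describe_nested(index_col, names):
    # Hypothetical outer condition; the real one is not visible in the thread.
    if index_col is not None:
        return "index_col branch"
    else:
        if names is not None:
            return "names branch"
    return "fallthrough"


def describe_flat(index_col, names):
    # Same logic, with the nested `else: if` collapsed into `elif`.
    if index_col is not None:
        return "index_col branch"
    elif names is not None:
        return "names branch"
    return "fallthrough"
```

The two forms stop being interchangeable if the `else:` block contains other statements besides the inner `if`.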

Comment on lines 615 to 672
assert len(table.schema) >= len(list(names))
assert len(list(names)) >= len(columns)
table_column_names = [
    field.name for field in table.schema[: len(list(names))]
]

invalid_columns = set(columns) - set(names)
if len(invalid_columns) != 0:
    raise ValueError(
        "Usecols do not match columns, columns expected but not "
        f"found: {invalid_columns}"
    )

rename_to_schema = dict(zip(list(names), table_column_names))
names = columns
columns = [rename_to_schema[renamed_name] for renamed_name in columns]
Contributor

The hosting function is very long. I think there might be an opportunity to make this code block a helper function/method

go/pystyle#function-length

Contributor Author

Refactored for readability. Please check.
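
As a rough idea of how the block under review could be pulled into a helper (the refactored code itself is not shown in this thread, so this is only a sketch): the function name `resolve_usecols_against_names` and its plain-list parameters are assumptions, with `schema_column_names` standing in for `[field.name for field in table.schema]`.

```python
from typing import Iterable, List, Sequence, Tuple


def resolve_usecols_against_names(
    schema_column_names: Sequence[str],
    names: Iterable[str],
    columns: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper mirroring the snippet under review.

    `names` renames the first len(names) schema columns positionally, and
    `columns` (usecols) selects a subset by the *renamed* labels. Returns
    (schema columns to read, labels to apply to them).
    """
    names = list(names)
    columns = list(columns)
    assert len(schema_column_names) >= len(names)
    assert len(names) >= len(columns)

    invalid_columns = set(columns) - set(names)
    if invalid_columns:
        raise ValueError(
            "Usecols do not match columns, columns expected but not "
            f"found: {invalid_columns}"
        )

    # Map each user-supplied name to the schema column at the same position.
    rename_to_schema = dict(zip(names, schema_column_names[: len(names)]))
    schema_columns = [rename_to_schema[renamed_name] for renamed_name in columns]
    return schema_columns, columns
```

For example, with schema columns `["col_a", "col_b", "col_c"]`, `names=["x", "y"]` and `columns=["y"]`, the helper selects `col_b` from the table and labels it `y`. Returning a tuple instead of reassigning `names` and `columns` in place keeps the caller's variables unambiguous.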

chelsea-lin force-pushed the main_chelsealin_readcsv branch from 94c8af9 to d345126 on June 11, 2025, 22:54.
product-auto-label bot added the label "size: l" (pull request size is large) and removed "size: m" (pull request size is medium) on Jun 11, 2025.
chelsea-lin requested a review from sycai on June 11, 2025, 22:54.
sycai previously approved these changes on Jun 11, 2025.
chelsea-lin merged commit 855031a into main on Jun 12, 2025.
22 of 24 checks passed.
chelsea-lin deleted the main_chelsealin_readcsv branch on June 12, 2025, 16:58.
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
3 participants