-
Notifications
You must be signed in to change notification settings - Fork 50
8000 fix: correct read_csv behaviours with use_cols, names, index_col #1804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bigframes/session/loader.py
Outdated
table_column_names = [field.name for field in table.schema] | ||
for column_name in columns: | ||
if column_name not in table_column_names: | ||
possibility = min( | ||
table_column_names, | ||
key=lambda item: bigframes._tools.strings.levenshtein_distance( | ||
column_name, item | ||
), | ||
) | ||
raise ValueError( | ||
f"Column '{column_name}' of `columns` not found in this table. " | ||
f"Did you mean '{possibility}'?" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Shall we use a helper function/method? The indentation is very deep here. (https://goto.google.com/tott/733)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
bigframes/session/loader.py
Outdated
else: | ||
if names is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like this is the same as elif names is not None:
which can save you a level of indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
bigframes/session/loader.py
Outdated
assert len(table.schema) >= len(list(names)) | ||
assert len(list(names)) >= len(columns) | ||
table_column_names = [ | ||
field.name for field in table.schema[: len(list(names))] | ||
] | ||
|
||
invalid_columns = set(columns) - set(names) | ||
if len(invalid_columns) != 0: | ||
raise ValueError( | ||
"Usecols do not match columns, columns expected but not " | ||
f"found: {invalid_columns}" | ||
) | ||
|
||
rename_to_schema = dict(zip(list(names), table_column_names)) | ||
names = columns | ||
columns = [rename_to_schema[renamed_name] for renamed_name in columns] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hosting function is very long. I think there might be an opportunity to make this code block a helper function/method
go/pystyle#function-length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored for more readable. Please check.
94c8af9
to
d345126
Compare
Fixes internal issue 421466334 🦕