-
Notifications
You must be signed in to change notification settings - Fork 109
fix: type checking #993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: type checking #993
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a massive amount of new code that will only serve to present additional overhead when developing. The internal API is not public. A decision was made in #750 to use Python wrapper classes as the method for public-facing type checking.
internal api type annotation is not for user of this library,it's useful for developer of this library. automated tools can be used to gurarantee code quality. i annotated the intetnal api,then i can easily find several bugs via pyright. |
There's no guarantee that those internal types will match the actual runtime types from Rust. So in effect this adds another layer of types that must be manually kept up to date on every PR. In that way, I don't see where the benefit comes in. We already have developer type checking on the Rust side. |
ok. if you dont think its necessary, i will remove the annotation and only commit the bug fix. |
Let's let another developer chime in as well. This is just my opinion. What are the bug fix(es)? |
@@ -66,9 +66,10 @@ def __init__(self, table: df_internal.Table) -> None: | |||
"""This constructor is not typically called by the end user.""" | |||
self.table = table | |||
|
|||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is changing the user-facing API, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i checked the test,i think we should use @ property, but previously we didnt do wrapping somewhere, raw table is returned that's why test passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've become convinced that the current code is broken and that this is the right change. Thank you @chenkovsky for doing this. I do think we need to put some notes to that effect in the section of user facing changes of the PR description.
most bugs are wrapping or unwrapping missing,since we have wrap classes. |
@kylebarron should I remove type annotation, and review bug fix first? |
Let's cc @timsaucer for thoughts. My opinion is that it would be simpler to first review a PR that contains only the bug fixes. |
It would be nice for separation of concerns to do bug fix in one PR and type checking in another. Overall the size of the PR isn't that unmanageable but it does touch a lot of points that I'd want to try out also with my internal tools that build on top of datafusion-python. I'll try to do this tomorrow or the day after. |
@kylebarron I have removed pyi files. please review it again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions, but overall this looks like a very good improvement. Thank you for putting in the effort. It is much appreciated!
@@ -66,9 +66,10 @@ def __init__(self, table: df_internal.Table) -> None: | |||
"""This constructor is not typically called by the end user.""" | |||
self.table = table | |||
|
|||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've become convinced that the current code is broken and that this is the right change. Thank you @chenkovsky for doing this. I do think we need to put some notes to that effect in the section of user facing changes of the PR description.
python/datafusion/context.py
Outdated
@@ -783,7 +783,9 @@ def register_parquet( | |||
file_extension, | |||
skip_metadata, | |||
schema, | |||
file_sort_order, | |||
[[expr.raw_sort for expr in exprs] for exprs in file_sort_order] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the same helper function sort_list_to_raw_sort_list
that you use below to clean it up a tiny bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
@@ -182,7 +182,7 @@ class AggregateUDF: | |||
|
|||
def __init__( | |||
self, | |||
name: Optional[str], | |||
name: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove the Optional[str]
if we still have the logic below to allow None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more 628C .
it seems that in rust binding, it's not optional
Line 92 in e6f6e66
name: &str, |
python/datafusion/udf.py
Outdated
@@ -158,7 +158,7 @@ def state(self) -> List[pyarrow.Scalar]: | |||
pass | |||
|
|||
@abstractmethod | |||
def update(self, *values: pyarrow.Array) -> None: | |||
def update(self, values: pyarrow.Array) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong to me. I think we want/need to allow a variable number of values passed for udafs that have multiple state elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, My mistake
@timsaucer could you please review it again? |
Which issue does this PR close?
No
Rationale for this change
pyo3 native code has no python type, without type hint, it's error prone.
What changes are included in this PR?
Are there any user-facing changes?
The original Rust-wrapped table class was directly exposed. This PR changes it to expose a wrapper class instead of the Rust-wrapped one.