-
Notifications
You must be signed in to change notification settings - Fork 7
Open
Description
Hey - really like this library! Unfortunately we have migrated to polars, so I've been looking at implementing something similar for my use-case.
I had a couple of queries around this code snippet in DataSet
:
_schema_annotations = None
def __class_getitem__(cls, item):
"""Allows us to define a schema for the ``DataSet``."""
cls = super().__class_getitem__(item)
cls._schema_annotations = item
return cls
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
if DataSet._schema_annotations is None:
return
schema_expected = get_type_hints(DataSet._schema_annotations)
DataSet._schema_annotations = None
To summarise:
- set
_schema_annotations
(the generic instance type) at class-level in__class_getitem__
- store this as an internal function variable and reset the class-level variable in
__init__
My queries/concerns are:
- the docs discourage the use of
__class_getitem__
outside of type hinting - It feels weird setting a generic type (specific to an instance) at the class level (even though it later gets reset to the default value)
- Related to the above, I get pyright complaining about
if DataSet._schema_annotations is None:
:Access to generic instance variable through class is ambiguous
- using
DataSet[Schema]
as a type hint anywhere in your code sets the_schema_annotations
column to beSchema
, e.g. see below code snippet (maybe this isn't an issue since it gets reset every time a new class is initialised, and that's the only context that attribute is being used)
from strictly_typed_pandas import DataSet
class SchemaA:
column_a: int
column_b: str
class SchemaB:
column_c: float
column_d: bool
df_a = DataSet[SchemaA]({"column_a": [1, 2, 3], "column_b": ["A", "B", "C"]})
print("before func def is", df_a._schema_annotations)
def function_for_df_b(df: DataSet[SchemaB]):
print(df._schema_annotations)
print("after func def is", df_a._schema_annotations)
# before func def is None
# after func def is <class '__main__.SchemaB'>
I'm curious to hear your thought processes around this design @nanne-aben ? I don't necessarily disagree with it - I've tried to find an alternative method and I think the way you've done it is probably the cleanest possible way to do it. But curious all the same to hear if you had those same concerns or if I'm overthinking it a bit!
Cheers!
Metadata
Metadata
Assignees
Labels
No labels