Discussion around __class_get_item__

@nanne-aben

Hey - really like this library! Unfortunately we have migrated to polars, so I've been looking at implementing something similar for my use-case.

I had a couple of queries around this code snippet in DataSet:

    _schema_annotations = None

    def __class_getitem__(cls, item):
        """Allows us to define a schema for the ``DataSet``."""
        cls = super().__class_getitem__(item)
        cls._schema_annotations = item
        return cls

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        if DataSet._schema_annotations is None:
            return

        schema_expected = get_type_hints(DataSet._schema_annotations)
        DataSet._schema_annotations = None

To summarise:

set _schema_annotations (the generic instance type) at class-level in __class_getitem__
store this as an internal function variable and reset the class-level variable in __init__

My queries/concerns are:

the docs discourage the use of __class_getitem__ outside of type hinting
It feels weird setting a generic type (specific to an instance) at the class level (even though it later gets reset to the default value)
Related to the above, I get pyright complaining about if DataSet._schema_annotations is None: : Access to generic instance variable through class is ambiguous
using DataSet[Schema] as a type hint anywhere in your code sets the _schema_annotations column to be Schema, e.g. see below code snippet (maybe this isn't an issue since it gets reset every time a new class is initialised, and that's the only context that attribute is being used)

from strictly_typed_pandas import DataSet

class SchemaA:
    column_a: int
    column_b: str

class SchemaB:
    column_c: float
    column_d: bool

df_a = DataSet[SchemaA]({"column_a": [1, 2, 3], "column_b": ["A", "B", "C"]})

print("before func def is", df_a._schema_annotations)

def function_for_df_b(df: DataSet[SchemaB]):
    print(df._schema_annotations)

print("after func def is", df_a._schema_annotations)

# before func def is None
# after func def is <class '__main__.SchemaB'>

I'm curious to hear your thought processes around this design @nanne-aben ? I don't necessarily disagree with it - I've tried to find an alternative method and I think the way you've done it is probably the cleanest possible way to do it. But curious all the same to hear if you had those same concerns or if I'm overthinking it a bit!

Cheers!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions