8000 Discussion around `__class_get_item__` · Issue #274 · nanne-aben/strictly_typed_pandas · GitHub
[go: up one dir, main page]

Skip to content
Discussion around __class_get_item__ #274
@kevinleahy-switchdin

Description

@kevinleahy-switchdin

Hey - really like this library! Unfortunately we have migrated to polars, so I've been looking at implementing something similar for my use-case.

I had a couple of queries around this code snippet in DataSet:

    _schema_annotations = None

    def __class_getitem__(cls, item):
        """Allows us to define a schema for the ``DataSet``."""
        cls = super().__class_getitem__(item)
        cls._schema_annotations = item
        return cls

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        if DataSet._schema_annotations is None:
            return

        schema_expected = get_type_hints(DataSet._schema_annotations)
        DataSet._schema_annotations = None

To summarise:

  • set _schema_annotations (the generic instance type) at class-level in __class_getitem__
  • store this as an internal function variable and reset the class-level variable in __init__

My queries/concerns are:

  1. the docs discourage the use of __class_getitem__ outside of type hinting
  2. It feels weird setting a generic type (specific to an instance) at the class level (even though it later gets reset to the default value)
  3. Related to the above, I get pyright complaining about if DataSet._schema_annotations is None: : Access to generic instance variable through class is ambiguous
  4. using DataSet[Schema] as a type hint anywhere in your code sets the _schema_annotations column to be Schema, e.g. see below code snippet (maybe this isn't an issue since it gets reset every time a new class is initialised, and that's the only context that attribute is being used)
from strictly_typed_pandas import DataSet

class SchemaA:
    column_a: int
    column_b: str

class SchemaB:
    column_c: float
    column_d: bool

df_a = DataSet[SchemaA]({"column_a": [1, 2, 3], "column_b": ["A", "B", "C"]})

print("before func def is", df_a._schema_annotations)

def function_for_df_b(df: DataSet[SchemaB]):
    print(df._schema_annotations)

print("after func def is", df_a._schema_annotations)

# before func def is None
# after func def is <class '__main__.SchemaB'>

I'm curious to hear your thought processes around this design @nanne-aben ? I don't necessarily disagree with it - I've tried to find an alternative method and I think the way you've done it is probably the cleanest possible way to do it. But curious all the same to hear if you had those same concerns or if I'm overthinking it a bit!

Cheers!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0