8000 BUG: modelling various integer types as a specialized version of the generic class breaks type-checking in pyright · Issue #23007 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: modelling various integer types as a specialized version of the generic class breaks type-checking in pyright #23007

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ischurov opened this issue Jan 13, 2023 · 4 comments

Comments

@ischurov
Copy link
ischurov commented Jan 13, 2023

Describe the issue:

I want to make sure that some variable is an instance of np.uint64. To this end, I use the following:

assert isinstance(idx, np.uint64)

However, in that case I have the following error on that line during type-checking with pyright:

Second argument to "isinstance" must be a class or tuple of classes
  TypeVar or generic type with type arguments not allowed

As the developers of pyright explained, this is due to fact that type stubs model np.uint64 as a specialized version of the generic class unsignedinteger (uint64 = unsignedinteger[_64Bit]), while in fact it is a proper subclass of unsignedinteger. As specialized versions of a class (like list[int]) are not allowed in isinstance, the type-checker reports an error, making it impossible to use this assert at all.

Are there any reasons to model these types (like np.uint64, np.uint32, etc) in this way (as a specialized versions) instead of making separate types?

Reproduce the code example:

import numpy as np
import pandas as pd

def foo(x: np.uint64):
    pass
series = pd.Series([1, 2, 3], index=np.array([10, 20, 30], dtype='uint64'))
idx = series.index[0]
assert isinstance(idx, np.uint64)
foo(idx)

Error message:

(type alias) uint64: Type[unsignedinteger[_64Bit]]
Second argument to "isinstance" must be a class or tuple of classes
  TypeVar or generic type with type arguments not allowed

Runtime information:

1.24.1
3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:26:04) [GCC 10.4.0]
[{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_KNM',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Haswell',
'filepath': '/vol/tcm10/ischurov/.conda/envs/latsym2/lib/libopenblasp-r0.3.21.so',
'internal_api': 'openblas',
'num_threads': 1,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.21'}]
None

Context for the issue:

Initially, I had a code that looked like the following:

def foo(x: np.uint64):
    pass
series = pd.Series([1, 2, 3], index=np.array([10, 20, 30], dtype='uint64'))
idx = series.index[0]
foo(idx)

On the type-checking, I have an error: Argument of type "Scalar | Unknown" cannot be assigned to parameter "x" of type "uint64" in function "foo". That's because the type-checker cannot infer that index of pd.Series has dtype uint64. Due to open issue in pandas, it is not yet possible to specify pd.Series.index's dtype using type hints. To give the type-checker information that idx has type np.uint64, I want to add the following assert before running foo:

assert isinstance(idx, np.uint64)

And that leads to type-checker error, as explained above.

@adampauls
Copy link

Gentle nudge here. Is this a difficult change?

@BvB93
Copy link
Member
BvB93 commented Jun 21, 2023

Gentle nudge here. Is this a difficult change?

Sort of. With how difficult it is to statically infer the bit sizes of returned numpy scalars & array dtypes it is currently not viable to model the various integer types as proper subclasses (rather than just parametrized subtypes). Returning the correct subclass is important as one can not freely downcast to a subclass without using heavy handed tools such as typing.cast or # type: ignore comments.

class Foo:
    pass

class Bar(Foo):
    pass

# Incompatible types in assignment (expression has type "Foo", variable has type "Bar")  [assignment]
var: Bar = Foo()

On the other hand, if the supertype is an Any-parametrized generic it is possible to freely cast this Any parameter to a subtype due its role as a wildcard. This was the motivation for modeling various np.number subclasses as parametrized generics rather than proper subclasses, and until we have better typing tools available for better capturing numpy's dtype promotion rules converting them to proper subclasses is simply not viable, requiring a substantial improvement in the quality of the statically inferred returned dtypes that we simply cannot provide.

There is one other solution I can think of here: avoid even trying to deal with bit sizes and treat the likes of np.int64, np.int32, etc. as np.signedinteger aliases. With the current state of the numpy's type annotations this would honestly be a rather minor regression (even more so in recent mypy versions due to the regression mentioned herein: #23764). This would remove the ability for type checkers to distinguish between e.g. np.int64 and np.int32, but in practice this more often than not too difficult already (forcing use of signedinteger[Any] as a compromise).

int8: typing.TypeAlias = signedinteger
int16: typing.TypeAlias = signedinteger
int32: typing.TypeAlias = signedinteger
int64: typing.TypeAlias = signedinteger
# etc.

@adampauls
Copy link
adampauls commented Jun 22, 2023

I would be happy with that solution, for whatever that is worth. Have you considered a solution that has two type annotations for each dtype:

class int8(signedInteger[_8Bit]):
    ...

Int8 =  signedinteger[_8Bit]

This requires some intelligence on behalf of the user, since they have to understand that sometimes a type that should be int8 can only be inferred to Int8, but I'm not sure how often this comes up. I expect it comes up infrequently relative to the annoyance of not being able to do an isinstance check without pyright complaining. I think this might be intuitive to Python typing users, however, since a similar issue happens with list/Sequence and dict/Mapping etc.

This second solution can also get better with time, either by adding increasingly precise overloads or in some cases by making use of new features like the Self type.

@adampauls
Copy link

I started the solution above here. I think it looks pretty good. There are still some test failures, but I think it's worth figuring out if there's any interest before fixing them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants
0