Numpy does not recognize ctypes arrays with c_wchar field #10100
Comments
I'm not familiar with the numpy source code. Tried to trace the code to the point where it checks the type of the ctype array field, and found this: numpy/numpy/core/src/multiarray/scalarapi.c Line 463 in a2bddfa
Is it possible that this line does not work with |
Numpy is trying and failing to parse this type using the PEP3118 buffer formats:
>>> memoryview((A*2)()).format
'T{(5)<u:s:}'
Because it fails, it falls back on using Unfortunately, numpy doesn't yet have a wide (UCS2) character type - it only has bytes, and UCS4. I suppose we could fall back on either |
Just some initial thoughts looking at this: This has something to do with the PEP3118 "buffer" interface. See https://www.python.org/dev/peps/pep-3118/ What happens is that the ctypes object exposes a PEP3118 interface, and numpy tries to read the data using that interface. 'memoryviews' in Python give you some kind of access to the raw interface intermediate:
In [1]: import numpy as np
...: import ctypes
...:
...: class A(ctypes.Structure):
...: _fields_ = [('s', ctypes.c_wchar * 5)]
...:
...: m = memoryview((A*2)())
...: m.format
...:
'T{(5)<u:s:}'
The memoryview Numpy has to translate this mini-language into a numpy dtype. This happens in Numpy can't handle ucs-2 internally, so it can't represent that format, so it fails and returns an object array (which is kind of ugly). So one train of thought is that numpy just can't convert that ctypes object to an array, and we should be raising an error message somewhere instead of converting to object. But it also occurs to me that maybe this is a ctypes bug. Why is it encoding things as ucs-2? Is it really doing that, or is it incorrectly setting the PEP3118 format? Do you happen to know what the encoding of unicode |
>>> ctypes.sizeof(ctypes.c_wchar)
4
>>> memoryview(ctypes.c_wchar()).format
'<u'
The first line suggests it is actually ucs-4. This makes me think this is a ctypes bug, and ctypes should use |
It gives 2 on my system, suggesting that ctypes is not checking to see which implementation-defined wchar it is using |
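The size check discussed in the comments above can be sketched roughly as follows. PEP 3118 defines `u` as a 2-byte (UCS-2) character and `w` as a 4-byte (UCS-4) character, so which code is appropriate depends on the width of the platform's `wchar_t`. The helper name `expected_wchar_code` is hypothetical and is not part of ctypes or numpy; this is only an illustration of the comparison, not a proposed fix.

```python
import ctypes


def expected_wchar_code():
    """Return the PEP 3118 code that matches this platform's wchar_t width.

    Hypothetical helper for illustration only.
    """
    size = ctypes.sizeof(ctypes.c_wchar)
    # PEP 3118: 'u' is a 2-byte UCS-2 character, 'w' is a 4-byte UCS-4 one.
    codes = {2: "u", 4: "w"}
    if size not in codes:
        raise ValueError(f"unexpected wchar_t size: {size}")
    return codes[size]


wchar_size = ctypes.sizeof(ctypes.c_wchar)
# What ctypes actually reports; this may differ between Python versions.
reported = memoryview(ctypes.c_wchar()).format

print("sizeof(c_wchar):", wchar_size)
print("code matching that size:", expected_wchar_code())
print("code reported by ctypes:", reported)
```

If the last two lines disagree (ignoring the byte-order prefix), the format string advertised by ctypes does not match the real item size, which is the mismatch suspected in this thread.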
It's quite possible the pep3118 type codes in ctypes are buggy (its type
codes for integer types are buggy in Python < 3.7.dev).
|
I'm not sure this is the same bug, but there's also something funky with the
In [1]: import ctypes
In [2]: import numpy as np
In [3]: class Test(ctypes.Structure):
...: # using ctypes.c_float works fine
...: _fields_ = [("a", ctypes.c_byte * 10), ("b", ctypes.c_float)]
...:
In [4]: x = (Test * 5)()
In [5]: x[0].b = 3
In [6]: np.array(x)
/home/alan/workspace/scratch/py3.venv/bin/ipython:1: RuntimeWarning: Item size computed from the PEP 3118 buffer format string does not match the actual item size.
#!/home/alan/workspace/scratch/py3.venv/bin/python3.6
Out[6]:
array([([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0.),
([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0.),
([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0.),
([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0.),
([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0.)],
dtype=[('a', 'i1', (10,)), ('b', '<f4')])
(Notice that the values are all 0 instead of having one of the This is with NumPy 1.13.3 and Python 3.6 |
Numpy 1.14 and python 3.5 gives:
My guess is that structure padding is causing your problem there |
Ctypes seems to produce struct format that doesn't contain padding:
```
>>> memoryview(x).format
'T{(10)<b:a:<f:b:}'
```
At the moment it seems the ctypes pep3118 implementation needs some work
before it can be relied on.
In my experience, the only way to get this fixed is to do it yourself
and then send the PRs to https://github.com/python/cpython --- the
cpython maintainers are of course up to their necks in PRs, but there is
hope to get patches through even as a random contributor.
|
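To make the padding mismatch concrete, here is a minimal sketch that compares the size implied by an unpadded format string with the size ctypes actually uses for the structure. It reuses the `Test` structure from the example above; the `struct` format `"<10bf"` is my own stand-in for the unpadded layout that `'T{(10)<b:a:<f:b:}'` describes, not something numpy or ctypes produces.

```python
import ctypes
import struct


class Test(ctypes.Structure):
    _fields_ = [("a", ctypes.c_byte * 10), ("b", ctypes.c_float)]


# Size if the fields are laid out back to back, as the format string implies.
# '<' selects standard sizes with no alignment padding.
packed_size = struct.calcsize("<10bf")

# Size and field offset that ctypes actually uses (native alignment).
native_size = ctypes.sizeof(Test)
b_offset = Test.b.offset

print("size implied by unpadded format:", packed_size)
print("ctypes.sizeof(Test):", native_size)
print("offset of field b:", b_offset)
```

On platforms where `float` needs alignment, `b` does not start right after the 10 bytes of `a`, so the two sizes differ. That is consistent with the "Item size computed from the PEP 3118 buffer format string does not match the actual item size" warning shown earlier.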
Yup -- it's definitely a padding issue (I guess we'll just make sure that all our byte and char arrays have lengths that are divisible by 4 for now 😆). Possibly a dumb question: should
and |
Padding bytes are a bit of a mess too. I tried working on this here #7798, which is on hiatus but we should get back to it one day. You can see that I progressively discovered new understandings of how padding bytes are supposed to be interpreted, and some fundamental problems with the 3118 interface. |
@alanhdu: note that |
The solution is to fix the bugs in ctypes that make it output invalid
format strings.
|
@pv: Little slow on the uptake here, but I've attempted that in python/cpython#5561 |
@eric-wieser: note that it needs "..Q..".replace("Q", s_ulonglong) in
the tests --- on some platforms in ctypes "c_ulonglong is c_long" so
that there's no type that produces a "Q" code.
|
That comment might be better placed on that PR. Isn't Q required to be 8 bytes? I chose it deliberately to try and avoid the mess that is |
It's possible that `unsigned long` and `unsigned long long` are both 64
bits in size, in which case I think you get `c_ulong is c_ulonglong` in
ctypes.
|
Right, but it's still required to serialize as |
Yes, but if you write `c_ulonglong` in ctypes it means the corresponding
c type which does not necessarily produce "Q" in the format string.
|
If `sizeof(unsigned long long) == sizeof(unsigned long)` then it's
literally true that `assert ctypes.c_ulong is ctypes.c_ulonglong`
|
All that matters is whether Either way, I've updated the PR to use |
Ok, sorry yes, now I see the statements are not contradictory, it's
indeed guaranteed to produce "Q" in the standard size mode.
|
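The two statements reconciled above can be checked with a short sketch: whether ctypes aliases `c_ulonglong` to `c_ulong` depends on the platform, while `Q` in standard size mode is always 8 bytes. The variable names are illustrative only.

```python
import ctypes
import struct

# Native sizes of C `long` and `long long` on this platform.
same_native_size = struct.calcsize("l") == struct.calcsize("q")

# On platforms where the two sizes match, ctypes may expose a single class
# under both names, as described in the comments above.
aliased = ctypes.c_ulong is ctypes.c_ulonglong

# In standard size mode ('<', '>', '=' or '!'), 'Q' is 8 bytes by definition,
# independent of the platform's C types.
standard_q_size = struct.calcsize("<Q")

print("long and long long have the same native size:", same_native_size)
print("c_ulong is c_ulonglong:", aliased)
print("size of 'Q' in standard mode:", standard_q_size)
```

So writing `c_ulonglong` in ctypes names a C type, which may be the very same object as `c_ulong`, whereas the `Q` code names a fixed 8-byte layout once a standard-size prefix is used.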
Since you're here - do you know which of |
It produces `l` (the logic written in the test describes the actual
situation).
|
Seems weird to me not to drop |
Looking at |
In 1.16, as of #12254, this now gives:
A little weirdly, the following also fails with the same error:
However, constructing object arrays with ctypes values at each entry was never supported anyway. |
Casting a ctypes Structure object array to a numpy array usually results in a numpy structured array. But if the Structure contains a c_wchar array (string) field, it becomes an object array. I could not find the reason why this is happening.
Example:
While:
Environment:
Python 3.6, numpy 1.13.1, Ubuntu 16.04