Add asciidtype #10

ngoldbaum · 2022-12-06T21:18:55Z

Adds a fixed-width ASCII dtype. This is still pretty bare-bones but I figured it was worth getting in the repo in case anyone wants to look at a string dtype. I'll be continuing to work on this.

Going to leave this open until tomorrow for reviews and then self-merge then if I don't get any reviews, assuming tests pass.

seberg

Just a brief look through if I saw anything. I wonder if it would make sense to make the scalar just a subclass of str (i.e. on the path to having no scalar).
The main problem with that is NumPy's insistence to convert 0-D arrays to scalars (so long we enforce getitem not having a scalar is probably even fine).

seberg · 2022-12-07T09:30:00Z

asciidtype/asciidtype/src/casts.c

+    int contig = (strides[0] == ((ASCIIDTypeObject *)descrs[0])->size *
+                                        sizeof(char) &&
+                  strides[1] == ((ASCIIDTypeObject *)descrs[1])->size *
+                                        sizeof(char));


It seems to me that your contig loop also assumes descrs[0]->size == descrs[1]->size which is not checked below?

Yeah, I was planning to relax that requirement on a second pass, but added a hard error for now.

seberg · 2022-12-07T09:31:02Z

asciidtype/asciidtype/src/casts.c

+        *out = *in;
+        out++;
+        in++;
+    }


could even use plain memcpy if you like.

seberg · 2022-12-07T09:33:58Z

asciidtype/asciidtype/src/casts.c

+        *out_loop = (PyArrayMethod_StridedLoop *)&ascii_to_ascii_strided;
+    }
+    else {
+        *out_loop = (PyArrayMethod_StridedLoop *)&ascii_to_ascii_unaligned;


The aligment of char will be 1, so while I think I currently sometimes pass "unaligned" to not bother figuring out alignment, you could just use the same loop for both. I suppose you can also argue that char isn't standardized to 1 byte :).

It's still not clear to me how I'd actually reach the unaligned loop at all. What sort of python-level operation would trigger it? Not necessarily for ASCIIDType if that makes the question easier to answer.

Its hard, but there are four paths right now how you can reach unaligned casts:

I think I had some scalar paths where I didn't bother checking alignment and just calling the unaligned version). (I have to double check this, but I think it happens for casts between dtypes when assigning a single value.)

You can manage it by manually ensuring you have unaligned data ;). But that is tricky, e.g. with np.zeros(1001, dtype="uint8")[1:].view("U10"). (EDIT: sorry forgot the uint8)

Structured dtypes by default use packed structs rather than aligned ones. So if you do dtype=[("a", "?"), ("b", "U5")] than the "b" field is unaligned. (EDIT: this is the default, you can say align=True).

Structured dtypes are a mess and they (currently) do not even try to pass on correct alignment to the field dtypes when necessary (i.e. in casts). That is something that can be done, but I really would like to pass through the actual alignment, rather than just "its all aligned". (MAINT: Figure out alignment for complex loops numpy#17359)

The last point is the main one really, the second one is rather theoretical... Which is why my suggestion to allow flagging "do not allow unaligned".

Thanks! Having such a flag, with the default to not allow unaligned, would certainly reduce the conceptual overhead quite a bit.

seberg · 2022-12-07T09:38:57Z

asciidtype/asciidtype/src/dtype.c

+common_instance(ASCIIDTypeObject *dtype1, ASCIIDTypeObject *dtype2)
+{
+    if (!PyObject_RichCompareBool((PyObject *)dtype1, (PyObject *)dtype2,
+                                  Py_EQ)) {


N.B.: This is actually the same as checking whether your resolver returns NO_CASTING or EQUIV_CASTING.

mattip · 2022-12-07T10:46:26Z

asciidtype/tests/test_asciidtype.py

+    dtype = ASCIIDType(7)
+    arr = np.array(["hello", "this", "is", "an", "array"], dtype=dtype)
+    assert repr(arr) == (
+        "array(['hello', 'this', 'is', 'an', 'array'], dtype=ASCIIDType(7))")


Is it possible to have length detection like the built-in unicode/byte types:

arr = np.array(["hello", "this", "is", "an", "array"], dtype=ASCIIDType) assert arr.dtype == ASCIIDType(5)

I was going to say yes, but currently the dtype=ASCIIDType isn't passed through correctly, it probably works for np.core.multiarray._multiarray_umath._discover_array_parameters(), though.

Passing through the DType type/class rather than just the dtype is something we/I need to look into... (A similar alternative might be to introduce abstract instances, that cannot be attached to an array, but represent the same thing as ASCIIDType).

I opened numpy/numpy#22756 to track this, seems like it would be useful.

ngoldbaum added 8 commits December 6, 2022 12:13

add missing error handling

5a0152f

adding initial implementation of asciidtype; compiles but untested

ed95d72

add compilation database for asciidtype to pre-commit hook

1de1a14

ASCIIDType creation works

a907087

add asciidtype tests to CI run

fd500a6

add missing decref

c6f0262

array creation works

d34c656

fix github actions workflow

808970f

seberg reviewed Dec 7, 2022

View reviewed changes

mattip reviewed Dec 7, 2022

View reviewed changes

only have one loop for unaligned or strided casts

6be8300

ngoldbaum mentioned this pull request Dec 7, 2022

ENH: Allow automatic detection of dtype parameters for NEP 42 user dtypes numpy/numpy#22756

Closed

ngoldbaum merged commit 30d7137 into numpy:main Dec 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Add asciidtype #10

Add asciidtype #10

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Add asciidtype #10

Add asciidtype #10

Uh oh!

Conversation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!