DyND's string representation could use some refinement. Currently, strings are represented in two ways: a NumPy-style fixed-size buffer, and a pooled allocation in which each string is a pair of pointers into that pool. The default string type is the latter, using the UTF-8 encoding. This has some slightly unintuitive consequences, the biggest being that the string acts as a "write once" type. That is fine for simple data conversions and some kinds of computations, but not for interactive manipulation or for algorithms that repeatedly append to or modify an array of strings.
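To make the "write once" behaviour concrete, here is a minimal sketch of a pooled representation in which each string is a pair of pointers. The names `StringPool`, `PooledString`, `make_string` and `append` are hypothetical and are not DyND's actual API; the pool is assumed to only grow, which is what makes in-place edits awkward.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a pooled memory block: it only ever grows,
// and individual strings cannot be freed or resized.
class StringPool {
    std::vector<std::unique_ptr<char[]>> chunks_;
    std::size_t bytes_used_ = 0;
public:
    char *allocate(std::size_t n) {
        chunks_.push_back(std::make_unique<char[]>(n ? n : 1));
        bytes_used_ += n;
        return chunks_.back().get();
    }
    std::size_t bytes_used() const { return bytes_used_; }
};

// A string is just a [begin, end) pair pointing into the pool.
struct PooledString {
    const char *begin;
    const char *end;
    std::size_t size() const { return static_cast<std::size_t>(end - begin); }
    std::string str() const { return std::string(begin, end); }
};

PooledString make_string(StringPool &pool, const std::string &s) {
    char *p = pool.allocate(s.size());
    std::memcpy(p, s.data(), s.size());
    return {p, p + s.size()};
}

// "Appending" cannot extend the old bytes in place: it copies everything
// into a fresh pool allocation, and the old bytes stay behind as garbage.
PooledString append(StringPool &pool, PooledString s, const std::string &suffix) {
    std::size_t n = s.size();
    char *p = pool.allocate(n + suffix.size());
    std::memcpy(p, s.begin, n);
    std::memcpy(p + n, suffix.data(), suffix.size());
    return {p, p + n + suffix.size()};
}
```

In this sketch, appending `"def"` to a pooled `"abc"` leaves 9 bytes in use for a 6-byte result, and that overhead grows with every further edit.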
Some properties we would like DyND's string representation to have include:
1. Heap allocation by default, but allow for pooled allocation and referring to strings inside other buffers.
2. Support the small string optimization, so strings that fit in 15 or fewer bytes don't require a separate memory allocation.
3. Have other string representations, like fixed-size buffers or various encodings, be expression types whose value type is the standard string type, and whose storage type is bytes[N] or bytes.
It may be desirable to have an additional "rope" type to represent enormous editable strings, but this is not an immediate priority.
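As a rough illustration of the "expression type" idea for fixed-size buffers, the sketch below pairs a storage type (`bytes[N]`, modelled as `std::array<char, N>`) with a value type (modelled as `std::string`, standing in for the standard string type). `FixedStringAdapter`, `load` and `store` are hypothetical names, and the NUL-padding convention is assumed from NumPy's fixed-size strings.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical adapter: storage is bytes[N], values are ordinary strings.
template <std::size_t N>
struct FixedStringAdapter {
    using storage_type = std::array<char, N>;  // stands in for bytes[N]
    using value_type = std::string;            // stands in for the standard string type

    // Storage -> value: read up to the first NUL byte (or all N bytes).
    static value_type load(const storage_type &buf) {
        auto end = std::find(buf.begin(), buf.end(), '\0');
        return value_type(buf.begin(), end);
    }

    // Value -> storage: NUL-pad on the right, reject values that do not fit.
    static void store(storage_type &buf, const value_type &v) {
        if (v.size() > N) {
            throw std::length_error("string does not fit in bytes[N]");
        }
        buf.fill('\0');
        std::copy(v.begin(), v.end(), buf.begin());
    }
};
```

Other encodings could follow the same shape, with `load` and `store` transcoding to and from UTF-8 instead of trimming padding.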
The implementation changes needed to satisfy these properties are:
1. Change the memory block allocation to have a heap vs pooled capability.
2. Introduce the small string optimization. Make the storage be two 64-bit values on all platforms, with the last byte signalling whether the string refers to an external buffer or to data stored in the first 15 bytes. Some accounting for big- vs little-endian byte order must occur here.
3. Change the fixedstring, etc. types to be "adapt" types, to fit them into a uniform adaptation mechanism. It is probably good to still have string[...] aliases in the type representation for simple spellings of these types.
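The small string optimization in item 2 could be sketched roughly as follows. This is one possible layout under stated assumptions, not the planned DyND implementation: `SsoString` and the tag value `0xFF` are hypothetical, pointers are assumed to be at most 64 bits, and the external size is packed byte by byte into bytes 8 to 14 so that the tag always lands in the last byte regardless of endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

static_assert(sizeof(char *) <= 8, "sketch assumes pointers fit in 8 bytes");

// Hypothetical 16-byte string. Byte 15 is the tag:
//   0..15 -> inline string of that length stored in bytes 0..14
//   0xFF  -> bytes 0..7 hold a heap pointer, bytes 8..14 a 56-bit size
class SsoString {
    static constexpr unsigned char kExternal = 0xFF;
    static constexpr std::size_t kInlineMax = 15;
    alignas(8) unsigned char raw_[16];

    char *external_ptr() const {
        char *p = nullptr;
        std::memcpy(&p, raw_, sizeof(p));
        return p;
    }
    void assign(const char *src, std::size_t n) {
        std::memset(raw_, 0, sizeof(raw_));
        if (n <= kInlineMax) {
            std::memcpy(raw_, src, n);
            raw_[15] = static_cast<unsigned char>(n);
        } else {
            char *p = new char[n];  // heap allocation by default
            std::memcpy(p, src, n);
            std::memcpy(raw_, &p, sizeof(p));
            std::uint64_t v = n;
            for (int i = 0; i < 7; ++i) {  // explicit byte order for the size
                raw_[8 + i] = static_cast<unsigned char>(v & 0xFF);
                v >>= 8;
            }
            raw_[15] = kExternal;
        }
    }
    void release() {
        if (!is_inline()) delete[] external_ptr();
    }

public:
    SsoString() { std::memset(raw_, 0, sizeof(raw_)); }
    explicit SsoString(const std::string &s) { assign(s.data(), s.size()); }
    SsoString(const SsoString &o) { assign(o.data(), o.size()); }
    SsoString &operator=(const SsoString &o) {
        if (this != &o) {
            release();
            assign(o.data(), o.size());
        }
        return *this;
    }
    ~SsoString() { release(); }

    bool is_inline() const { return raw_[15] != kExternal; }
    const char *data() const {
        return is_inline() ? reinterpret_cast<const char *>(raw_) : external_ptr();
    }
    std::size_t size() const {
        if (is_inline()) return raw_[15];
        std::uint64_t n = 0;
        for (int i = 6; i >= 0; --i) n = (n << 8) | raw_[8 + i];
        return static_cast<std::size_t>(n);
    }
    std::string str() const { return std::string(data(), size()); }

    // Unlike the pooled representation, the string can be modified in place.
    void append(const std::string &suffix) {
        std::string joined = str() + suffix;
        release();
        assign(joined.data(), joined.size());
    }
};

static_assert(sizeof(SsoString) == 16, "two 64-bit values on all platforms");
```

Storing the pointer and size as native 64-bit integers instead would be faster to access, but then the position of the tag byte depends on the platform's byte order, which is the accounting the item above refers to.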
your number 2 is effectively interning, so +1 on that.
In theory, strings < 5 in length should be handled differently, but that probably complicates things (you can hold these in a single 64-bit pointer).