-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
qstr uniqueness handling is overkill! #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Btw, did you consider design where some strings are interned and some not? I can say from my experiments (with other language) that non-interning design definitely spends more memory (there are lot of short strings which are used again). On the other hand, common sense says that wasting time to hash random strings destined just for printing makes no sense... |
The implementation of qstr (unique strings) and it's use needs thought. Using qstr's to store the string of an actual Python string object is a quick hack, really. We can change it so that Python string objects are not interned (turned into a qstr). I wanted to have it so that the initial qstr's can be put in read-only memory. That is, have part of the qstr table fixed at compile time. This basically implements enums but with a string representation. I'm brave enough to write a custom preprocessor if needed! |
Well, string interning is a common design pattern since early LISP interpreters and to present day, as http://en.wikipedia.org/wiki/String_interning shows a lot of modern languages keep using it. What's more important for the case that unbloated languages like Lua or Squirrel also use it. So, interning in MicroPython is definitely not a quick hack, but adherence to the best practices ;-). I did some testing on Squirrel (I spent hacking on it entire summer, and that's after ~2 years of consideration of whether I should bend me into Helsinki syndrome for Lua, try to continue with TinyPy or PyMite, or something else - yeah, I wish MicroPython came up a bit earlier ;-) ), and found that non-interning of strings leads to higher memory usage just for start up (i.e. just for initing method tables of objects and stuff) - something I didn't expect to be so noticeable even for interpreter startup. So, common sense says that interning is not always useful, but getting rid of it completely is unlikely a good idea.
Great, then you understand my worry about both supporting hashability of strings and supporting it for static strings either. I wish there were a language which could sanely represent strings, arrays, dictionaries (well, dictionaries' static support is definitely needed, because that's how object methods/fields are supposed to be stored - though I didn't review that part of uPy code yet). |
I did always plan to have qstr's hashed, but left that excercise for "some time down the road". Maybe that time is now :) I would want to have the global qstr table part static (for known keywords/names like "len" and "range") and part dynamic (for method names, etc that the user has in their script). Therefore, we need a way to have static hashing. Probably that means a custom preprocessor, or maybe just 1 file that is auto-generated and has all the static qstr's in it. Finally, I think it's a good idea to have 2 different kinds of Python string objects: one that is represented as a qstr, and one that is represented as a normal dynamically allocated array of char's. Of course, the end user does not see any difference between the 2, as they both have the same methods and type signature. The qstr objects are ones that are made directly in the script (eg x = 'abc'), the array of char ones are made dynamically in the script, eg x= 'abc'+'def', and y = '{} {}'.format(1,2) |
Well, I didn't want to imply that everything should be dropped and this issue fixed ASAP ;-), just wanted to record issues I see as I go thru the code. It all should be ordered by priority, for example #14 is definitely more important. I'm glad that we agree on the approach, and thanks for sharing other points like when to use interned vs raw strings - indeed, seems to be viable! |
Actually, on a second thought (d'oh!) we must store length together with a string, because Python strings can contain '\0', so current implementation using strcmp() and friends isn't really compliant. |
I didn't realise Python strings can contain '\0'... bytearray yes, but strings not. In the language reference it says strings can only contain "source characters", ie unicode characters in UTF-8 encoding. I didn't think '\0' was a unicode character. But, a simple test in Python ('\000') shows indeed that you can have '\0' in a string. |
If we want interned strings to live along side non-interned strings (in Python, in eg a dict), then, eg, interned Proposal: the first 4 bytes of the qstr are the hash. For compact memory storage it can be done the following way: first store the variably-encoded length, then optional hash bytes (the number of hash bytes depend on length of string), then the data. More explicitly (L=len byte, h=hash byte, d=data byte):
Such data could be 4-byte aligned, or not. |
Well, since the beginning of the discussion I had different layout in head, sorry if I doesn't come out clear - I indeed mixed up few things. Let me try to summarize what I had in mind with requirements behind it. Requirements:
This leads to following layout
That means that minimal-length non-empty string would be 4 bytes, which matches 32bit CPU alignment. Extra features:
|
Nice argumentation, I like it :-) Only point, since we have already the http://en.wikipedia.org/wiki/String_%28computer_science%29#Representations 2014/1/10 Paul Sokolovsky notifications@github.com
"Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton |
Well, want I meant is that C compiler will store \0 for any literal string anyway. The only way to get around that is to use array init syntax: We of course not going to store \0 for dynamic strings (not coming from string literals). For example, dynamic string of 6 bytes will take strictly 8 bytes (6 bytes content + 1 byte hash + 1 byte (small) len). |
I see. So, you are saying that it shouldn't give any problems beside adding Send from my Samsung Galaxy Note II
|
One use of storing the null byte at the end is so you can parse strings without having to always check that you are within the string data boundaries. Eg, so long as I'm not looking for the null character, I can do a look-ahead check for some character, knowing that if I get a match then that match is before the end of the string (else I would hit the null character and not get a match). Anyway, adding the null byte is a small issue. In the end, I think @pfalcon and I argee with most things, it's just the way we are saying it.
|
Yeah, I can't immediately come up with other uses for hash (for example, comparing 2 non-interned strings, we don't need to compute hash - we can use it if it's there, but otherwise, computing hash will take ~ same amount of time as comparing chars directly). But those 2 uses are rather important to warrant having hash in qstr.
Well, we're not JavaScript here - python dicts can contain arbitrary type of keys, so we won't get around using object hash for that. And with all-interned optimization that you have, we're golden IMHO.
Well, I gave example where hash calc is superfluous and expensive - file.read() a big string, then write it somewhere. On our side it's O < O(N) (only allocation time, which is O(1) with ideal allocator). Then OS will read in string for us. It won't compute hash in parallel with reading, so we'll need to do that afterwards, spending O(N) time, which will be waste if we won't use that string as a key in dict. But I agree this case is an optimization, not giving (too) much for MCU case, so we can leave it for later. |
Another adv of adding byte length as part of string - then strcmp can cmpare this first and fail if not same. |
Per previous discussion, qstr is generic container type for any immutable strings. It can be bytes, utf-8, utf-16, utf-32, etc.
We definitely don't want to swell it by whole byte, but above I already proposed to add few more bits to first varlen byte (to still keep 2 bytes of overhead for short strings). So, it would be: bit 7 - "hash not valid", bits 6-5: content type (00 - bytes, 01 - utf-8, 10, 11 - reserved for custom implementations), bit 4 - "more length bytes", bits 3-0 - length lsbits. |
Yes, we can add bits to indicate the type, if needed. I think I just want to get the framework for this up and running first (ie a custom "preprocessor" that works out the static const hashes, etc), and then we can work out exactly what bits of information we need to store. |
I had an idea how to implement that much more automagical than I first thought. For reference, I thought we'd need something like:
Well, no need for that |
But first let's consider how we can get automagical interned static strings. As I said, having all keywords/identifiers used in source to be interned string should be one of the top goals of refactor, and here's how to get it for free (i.e. compile-time, not runtime). The keyword is string pooling, which is on by default in gcc (and apparently not even turnable off any longer). Demo of the idea:
So, as you see, only array syntax produces non-pooled string value, other string syntaxes are automagically "interned" in C app string table. |
And now impl of hash-len strings:
|
So, preprocessor would need to just go thru source and generate separate header with
defines. This will work for any string which contains identifier-clean chars (may even start with digits). Again, my guess that covers 99%f our needs. If we ever need to hash-len "+" or something, we can precompute hash by hand, and just put "\xNN" "\x01" "+" in source for such exceptional case ;-) |
Nice. Problem: we are relying heavily on the way gcc optimises strings. What if a particular C compiler for another architecture behaves differently? Then some qstrs match, and some don't. What about:
The idea would be to generate the |
Perhaps late to the party: Would you consider For longer strings, record hash_is_set in bit 6 of byte 3 A minor detail, but yes, Sparc likes 32-bit aligned 32-bit words. One difficulty is the possibility that a non-Unicode string and a Unicode string might encode the same Does the len describe the length in characters or the data length in bytes? No matter what: On 2014-01-10, at 4:47 PM, Paul Sokolovsky notifications@github.com wrote:
|
They way gcc optimizes strings also changes from time to time (as newer ways of doing things are discovered). So you might find that micropython starts to behave differently on a newer version of the toolchain. |
Yes, natural progression of the idea. And by using arrays, we'll even be able to get rid of trailing \0 if needed. |
Probably not in default source, but you can make your fork, profile it, and share results of how much speed up you get while using such highly conditional encoding ;-).
Python3 doesn't have non-unicode strings and 8-bit characters. Sadly. When hashing unicode strings (in any encoding), we'd rather hash chars (aka codepoints), not bytes. Note that currently uPy doesn't support unicode at all, so any suggestions are wishful thinking so far.
You're welcome to criticize one which will be used at the beginning (I vote for xor :-D ) and submit patches for alternatives (profiling data is plus) ;-).
Per the above discussion, "performance-optimized" build would align on 32-bit boundary. But ideally we'd support 8-bit. Also note that GC implementation used has own alignment requirements. |
I was thinking about CRC-8... |
On 2014-01-11, at 4:00 PM, Paul Sokolovsky notifications@github.com wrote:
I don't have scads of time, or I would. I agree with Jesus that CRC-8 is a fine candidate. There are also good "bit shufflers" that I know less Given that this is microPython, what are processors |
XOR is horrible. Only ADD is worse. But XOR+ROL would give good baseline to compare more "advanced" algos against.
That's huge surely?
For me it's simple: target not processors, but algorithms (large-scale, not just few features out of hundreds). Performance should be better than Lua and better than Node.js, or we won't take over the world ;-).
Who cares? ;-) Nobody will waste their tiny free time left from their work and real life to support somebody's magic instructions from press releases ;-). Note: everything is just IMHO ;-). |
Okay, now I understand fully what you meant! You wanted a string object to always be a raw pointer stuffed into an mp_obj_t, with bit 1 set to indicate this. The only difference between static/interned and dynamic is just where it pointed: ROM or RAM. The representation is the same. And that's why you were concerned about specifiying the encoding (UTF-8, UTF-16, ASCII) in the qstr data.
Okay, now I don't understand this statement. Why won't your original idea work? (One thing I can say is that it doesn't play nice with the GC, since the pointer to a dynamic string is not a proper pointer that the GC will recognise.) Note that with my way of doing it, the only performance hit is that a dynamic string needs only an extra word (or 2) in RAM, and that you need to distinguish between qstrs and dynamic strings. The storage for the dynamic string would like mp_obj_tulpe_t: a uPy header then the actual data (plus any extra info you wanted, like len and alloc). So there's no additional ptr dereference to get the data from the uPy object.
This would only work with help from the compiler (and its constant pooling optimisation), something I don't want to rely on. But actually even this is not enough: to dynamically intern a string I need a list of all static interned strings so I can see if the string already exists in the intern pool. If we have implicit interning in, eg, method definitions, there's no way we can tell which strings have been statically interned (without looking into the .rodata section at runtime, which is going to be very compiler dependent...). The only (?) solution to this (to stick with your idea) is to no longer require that qstrs are unique. A qstr is just a pointer to some data with a hash, len, and bytes. To check for equality, the hash and the data must match, ie we need to do a full strcmp for (at least) each successful lookup. For efficient equality comparison of qstrs (eg method & attr lookup) we need to guarantee that qstrs are unique, so that 2 qstrs are equal if and only if their pointers (or indexes, as is done now) match. At the moment we can actually have method names as qstrs: just add the string to the qstrdefs.h file and use the constant |
Not exactly, per #22, I wanted to come up with a grand unified scheme of storing string-like data. Then when you introduced qstr-in-mp_obj_t, I thus automagically thought about switching to qstr-in-mp_obj_t, which was logical mistake on my side.
Well, we also have dynamic interned strings. My idea was that we intern any dictionary-key like string, and so if we for example take a string from dictionary key, we know it's interned. But here's where it breaks: if we pass such string to another function, it no longer know if it's interned or not (in general case; it may happen that we'd pass interned string to functions where non-interned can never be passed, but that's too many if's for now).
Well, pointer to type + pointer to data is minimal overhead we can achieve, and if it goes beyond that (for example, to store length in object structure), then my original IMHO was that it's inefficient use of mem (that's why I wanted to use same encoding (which includes var-len len encoding) for all string objects).
And this is not ok per my previous thought: any non-interned string can become interned whenever it is used as a key in dictionary. If the operation of turning non-interned string into interned requires memory allocation (to build hash+len header), then it's not efficient. That's why I wanted to have same underlying encoding for all strings. My mistake is that I called such encoding [new] qstr, because I thought we can replace old qstr with this new encoding in one go. It's better to keep "qstr" for what it is now (and index into interning table), and use for example "hstr" for string with hash+varlen header.
Above we found a way how to do our own constant string pooling without relying on compiler ;-).
Sure, but now that we have custom preprocessor, we can do any magic, like building any structures we want automatically!
Yes, that's the idea I had stuck in my head ;-). Except that all static (i.e. compile-time) strings are automagically interned using a constant pool (maintained by custom preprocessor to avoid relying on compiler behavior).
Yes, that's the operation we'd do to intern a dynamic string.
Yes, but my whole idea was to make that all automatic. You just write All in all: what we have now (especially with today's patch) is good enough, and we can pause and start leveraging binary-safe strings (and do other things). I'll follow up with my ideas, and try to implement them, paying attention to details, and if everything actually fall into place, will submit patches for review. |
Note that we cannot intern a non-interned string just to do a dictionary lookup. Dict lookups (a pure lookup, fail if not found) must not allocate memory (else it becomes another banned operation in interrupts, etc). If you do a dict lookup of a non-interned string and you try to intern it, then, even if it is stored with hash/len/etc, you might still need to allocate a new qstr pool to point to this new interned string. |
*: Heading to 2015-09-08
475 pindefs added to ugfx ginput
This can be closed: qstrs now have a hash and a length stored with them, this data is generated automatically, and strings don't need to be interned (and are usually not interned if their length is above a threshold). |
@dpgeorge May I ask what is the threshold ?
I thought that string with len > 10 will not be interned , but apprently not . Should we implement some special character to imply that the string should not be interned . |
That's right, strings with len>10 are not interned.
Yes, all variable names are interned. But that's not the same as strings. Eg:
You can see that |
update from adafruit
… to include PR micropython#8 Signed-off-by: Christopher Arndt <chris@chrisarndt.de>
machine.pin.c: Some modifications to the Pin module
Add strings for morph
asan considers that memcmp(p, q, N) is permitted to access N bytes at each of p and q, even for values of p and q that have a difference earlier. Accessing additional values is frequently done in practice, reading 4 or more bytes from each input at a time for efficiency, so when completing "non_exist<TAB>" in the repl, this causes a diagnostic: ==16938==ERROR: AddressSanitizer: global-buffer-overflow on address 0x555555cd8dc8 at pc 0x7ffff726457b bp 0x7fffffffda20 sp 0x7fff READ of size 9 at 0x555555cd8dc8 thread T0 #0 0x7ffff726457a (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xb857a) #1 0x555555b0e82a in mp_repl_autocomplete ../../py/repl.c:301 #2 0x555555c89585 in readline_process_char ../../lib/mp-readline/re #3 0x555555c8ac6e in readline ../../lib/mp-readline/readline.c:513 #4 0x555555b8dcbd in do_repl /home/jepler/src/micropython/ports/uni #5 0x555555b90859 in main_ /home/jepler/src/micropython/ports/unix/ #6 0x555555b90a3a in main /home/jepler/src/micropython/ports/unix/m #7 0x7ffff619a09a in __libc_start_main ../csu/libc-start.c:308 #8 0x55555595fd69 in _start (/home/jepler/src/micropython/ports/uni 0x555555cd8dc8 is located 0 bytes to the right of global variable 'import_str' defined in '../../py/repl.c:285:23' (0x555555cd8dc0) of size 8 'import_str' is ascii string 'import ' Signed-off-by: Jeff Epler <jepler@gmail.com>
asan considers that memcmp(p, q, N) is permitted to access N bytes at each of p and q, even for values of p and q that have a difference earlier. Accessing additional values is frequently done in practice, reading 4 or more bytes from each input at a time for efficiency, so when completing "non_exist<TAB>" in the repl, this causes a diagnostic: ==16938==ERROR: AddressSanitizer: global-buffer-overflow on address 0x555555cd8dc8 at pc 0x7ffff726457b bp 0x7fffffffda20 sp 0x7fff READ of size 9 at 0x555555cd8dc8 thread T0 #0 0x7ffff726457a (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xb857a) micropython#1 0x555555b0e82a in mp_repl_autocomplete ../../py/repl.c:301 micropython#2 0x555555c89585 in readline_process_char ../../lib/mp-readline/re micropython#3 0x555555c8ac6e in readline ../../lib/mp-readline/readline.c:513 micropython#4 0x555555b8dcbd in do_repl /home/jepler/src/micropython/ports/uni micropython#5 0x555555b90859 in main_ /home/jepler/src/micropython/ports/unix/ micropython#6 0x555555b90a3a in main /home/jepler/src/micropython/ports/unix/m micropython#7 0x7ffff619a09a in __libc_start_main ../csu/libc-start.c:308 micropython#8 0x55555595fd69 in _start (/home/jepler/src/micropython/ports/uni 0x555555cd8dc8 is located 0 bytes to the right of global variable 'import_str' defined in '../../py/repl.c:285:23' (0x555555cd8dc0) of size 8 'import_str' is ascii string 'import ' Signed-off-by: Jeff Epler <jepler@gmail.com>
Although the original motivation given for the workaround[1] is correct, nlr.o and nlrthumb.o are linked with a small enough distance that the problem does not occur, and the workaround isn't necessary. The distance between the b instruction and its target (nlr_push_tail) is just 64 bytes[2], well within the ±2046 byte range addressable by an unconditional branch instruction in Thumb mode. The workaround induces a relocation in the text section (textrel), which isn't supported everywhere, notably not on musl-libc, where it causes a crash on start-up. With the workaround removed, micropython works on an ARMv5T Linux built with musl-libc. [1]: commit dd376a2 Author: Damien George <damien.p.george@gmail.com> Date: Fri Sep 1 15:25:29 2017 +1000 py/nlrthumb: Get working again on standard Thumb arch (ie not Thumb2). "b" on Thumb might not be long enough for the jump to nlr_push_tail so it must be done indirectly. [2]: Excerpt from objdump -d micropython: 000095c4 <nlr_push_tail>: 95c4: b510 push {r4, lr} 95c6: 0004 movs r4, r0 95c8: f02d fd42 bl 37050 <mp_thread_get_state> 95cc: 6943 ldr r3, [r0, micropython#20] 95ce: 6023 str r3, [r4, #0] 95d0: 6144 str r4, [r0, micropython#20] 95d2: 2000 movs r0, #0 95d4: bd10 pop {r4, pc} 000095d6 <nlr_pop>: 95d6: b510 push {r4, lr} 95d8: f02d fd3a bl 37050 <mp_thread_get_state> 95dc: 6943 ldr r3, [r0, micropython#20] 95de: 681b ldr r3, [r3, #0] 95e0: 6143 str r3, [r0, micropython#20] 95e2: bd10 pop {r4, pc} 000095e4 <nlr_push>: 95e4: 60c4 str r4, [r0, micropython#12] 95e6: 6105 str r5, [r0, micropython#16] 95e8: 6146 str r6, [r0, micropython#20] 95ea: 6187 str r7, [r0, micropython#24] 95ec: 4641 mov r1, r8 95ee: 61c1 str r1, [r0, micropython#28] 95f0: 4649 mov r1, r9 95f2: 6201 str r1, [r0, micropython#32] 95f4: 4651 mov r1, sl 95f6: 6241 str r1, [r0, micropython#36] @ 0x24 95f8: 4659 mov r1, fp 95fa: 6281 str r1, [r0, micropython#40] @ 0x28 95fc: 4669 mov r1, sp 95fe: 62c1 str r1, [r0, micropython#44] @ 0x2c 9600: 4671 mov r1, lr 9602: 6081 str r1, [r0, micropython#8] 9604: e7de b.n 95c4 <nlr_push_tail> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Although the original motivation given for the workaround[1] is correct, nlr.o and nlrthumb.o are linked with a small enough distance that the problem does not occur, and the workaround isn't necessary. The distance between the b instruction and its target (nlr_push_tail) is just 64 bytes[2], well within the ±2046 byte range addressable by an unconditional branch instruction in Thumb mode. The workaround induces a relocation in the text section (textrel), which isn't supported everywhere, notably not on musl-libc, where it causes a crash on start-up. With the workaround removed, micropython works on an ARMv5T Linux system built with musl-libc. [1]: commit dd376a2 Author: Damien George <damien.p.george@gmail.com> Date: Fri Sep 1 15:25:29 2017 +1000 py/nlrthumb: Get working again on standard Thumb arch (ie not Thumb2). "b" on Thumb might not be long enough for the jump to nlr_push_tail so it must be done indirectly. [2]: Excerpt from objdump -d micropython: 000095c4 <nlr_push_tail>: 95c4: b510 push {r4, lr} 95c6: 0004 movs r4, r0 95c8: f02d fd42 bl 37050 <mp_thread_get_state> 95cc: 6943 ldr r3, [r0, micropython#20] 95ce: 6023 str r3, [r4, #0] 95d0: 6144 str r4, [r0, micropython#20] 95d2: 2000 movs r0, #0 95d4: bd10 pop {r4, pc} 000095d6 <nlr_pop>: 95d6: b510 push {r4, lr} 95d8: f02d fd3a bl 37050 <mp_thread_get_state> 95dc: 6943 ldr r3, [r0, micropython#20] 95de: 681b ldr r3, [r3, #0] 95e0: 6143 str r3, [r0, micropython#20] 95e2: bd10 pop {r4, pc} 000095e4 <nlr_push>: 95e4: 60c4 str r4, [r0, micropython#12] 95e6: 6105 str r5, [r0, micropython#16] 95e8: 6146 str r6, [r0, micropython#20] 95ea: 6187 str r7, [r0, micropython#24] 95ec: 4641 mov r1, r8 95ee: 61c1 str r1, [r0, micropython#28] 95f0: 4649 mov r1, r9 95f2: 6201 str r1, [r0, micropython#32] 95f4: 4651 mov r1, sl 95f6: 6241 str r1, [r0, micropython#36] @ 0x24 95f8: 4659 mov r1, fp 95fa: 6281 str r1, [r0, micropython#40] @ 0x28 95fc: 4669 mov r1, sp 95fe: 62c1 str r1, [r0, micropython#44] @ 0x2c 9600: 4671 mov r1, lr 9602: 6081 str r1, [r0, micropython#8] 9604: e7de b.n 95c4 <nlr_push_tail> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Although the original motivation given for the workaround[1] is correct, nlr.o and nlrthumb.o are linked with a small enough distance that the problem does not occur, and the workaround isn't necessary. The distance between the b instruction and its target (nlr_push_tail) is just 64 bytes[2], well within the ±2046 byte range addressable by an unconditional branch instruction in Thumb mode. The workaround induces a relocation in the text section (textrel), which isn't supported everywhere, notably not on musl-libc, where it causes a crash on start-up. With the workaround removed, micropython works on an ARMv5T Linux system built with musl-libc. [1]: commit dd376a2 Author: Damien George <damien.p.george@gmail.com> Date: Fri Sep 1 15:25:29 2017 +1000 py/nlrthumb: Get working again on standard Thumb arch (ie not Thumb2). "b" on Thumb might not be long enough for the jump to nlr_push_tail so it must be done indirectly. [2]: Excerpt from objdump -d micropython: 000095c4 <nlr_push_tail>: 95c4: b510 push {r4, lr} 95c6: 0004 movs r4, r0 95c8: f02d fd42 bl 37050 <mp_thread_get_state> 95cc: 6943 ldr r3, [r0, micropython#20] 95ce: 6023 str r3, [r4, #0] 95d0: 6144 str r4, [r0, micropython#20] 95d2: 2000 movs r0, #0 95d4: bd10 pop {r4, pc} 000095d6 <nlr_pop>: 95d6: b510 push {r4, lr} 95d8: f02d fd3a bl 37050 <mp_thread_get_state> 95dc: 6943 ldr r3, [r0, micropython#20] 95de: 681b ldr r3, [r3, #0] 95e0: 6143 str r3, [r0, micropython#20] 95e2: bd10 pop {r4, pc} 000095e4 <nlr_push>: 95e4: 60c4 str r4, [r0, micropython#12] 95e6: 6105 str r5, [r0, micropython#16] 95e8: 6146 str r6, [r0, micropython#20] 95ea: 6187 str r7, [r0, micropython#24] 95ec: 4641 mov r1, r8 95ee: 61c1 str r1, [r0, micropython#28] 95f0: 4649 mov r1, r9 95f2: 6201 str r1, [r0, micropython#32] 95f4: 4651 mov r1, sl 95f6: 6241 str r1, [r0, micropython#36] @ 0x24 95f8: 4659 mov r1, fp 95fa: 6281 str r1, [r0, micropython#40] @ 0x28 95fc: 4669 mov r1, sp 95fe: 62c1 str r1, [r0, micropython#44] @ 0x2c 9600: 4671 mov r1, lr 9602: 6081 str r1, [r0, micropython#8] 9604: e7de b.n 95c4 <nlr_push_tail> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Although the original motivation given for the workaround[1] is correct, nlr.o and nlrthumb.o are linked with a small enough distance that the problem does not occur, and the workaround isn't necessary. The distance between the b instruction and its target (nlr_push_tail) is just 64 bytes[2], well within the ±2046 byte range addressable by an unconditional branch instruction in Thumb mode. The workaround induces a relocation in the text section (textrel), which isn't supported everywhere, notably not on musl-libc[3], where it causes a crash on start-up. With the workaround removed, micropython works on an ARMv5T Linux system built with musl-libc. [1]: commit dd376a2 Author: Damien George <damien.p.george@gmail.com> Date: Fri Sep 1 15:25:29 2017 +1000 py/nlrthumb: Get working again on standard Thumb arch (ie not Thumb2). "b" on Thumb might not be long enough for the jump to nlr_push_tail so it must be done indirectly. [2]: Excerpt from objdump -d micropython: 000095c4 <nlr_push_tail>: 95c4: b510 push {r4, lr} 95c6: 0004 movs r4, r0 95c8: f02d fd42 bl 37050 <mp_thread_get_state> 95cc: 6943 ldr r3, [r0, micropython#20] 95ce: 6023 str r3, [r4, #0] 95d0: 6144 str r4, [r0, micropython#20] 95d2: 2000 movs r0, #0 95d4: bd10 pop {r4, pc} 000095d6 <nlr_pop>: 95d6: b510 push {r4, lr} 95d8: f02d fd3a bl 37050 <mp_thread_get_state> 95dc: 6943 ldr r3, [r0, micropython#20] 95de: 681b ldr r3, [r3, #0] 95e0: 6143 str r3, [r0, micropython#20] 95e2: bd10 pop {r4, pc} 000095e4 <nlr_push>: 95e4: 60c4 str F987 r4, [r0, micropython#12] 95e6: 6105 str r5, [r0, micropython#16] 95e8: 6146 str r6, [r0, micropython#20] 95ea: 6187 str r7, [r0, micropython#24] 95ec: 4641 mov r1, r8 95ee: 61c1 str r1, [r0, micropython#28] 95f0: 4649 mov r1, r9 95f2: 6201 str r1, [r0, micropython#32] 95f4: 4651 mov r1, sl 95f6: 6241 str r1, [r0, micropython#36] @ 0x24 95f8: 4659 mov r1, fp 95fa: 6281 str r1, [r0, micropython#40] @ 0x28 95fc: 4669 mov r1, sp 95fe: 62c1 str r1, [r0, micropython#44] @ 0x2c 9600: 4671 mov r1, lr 9602: 6081 str r1, [r0, micropython#8] 9604: e7de b.n 95c4 <nlr_push_tail> [3]: https://www.openwall.com/lists/musl/2020/09/25/4 Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Although the original motivation given for the workaround[1] is correct, nlr.o and nlrthumb.o are linked with a small enough distance that the problem does not occur, and the workaround isn't necessary. The distance between the b instruction and its target (nlr_push_tail) is just 64 bytes[2], well within the ±2046 byte range addressable by an unconditional branch instruction in Thumb mode. The workaround induces a relocation in the text section (textrel), which isn't supported everywhere, notably not on musl-libc[3], where it causes a crash on start-up. With the workaround removed, micropython works on an ARMv5T Linux system built with musl-libc. This commit changes nlrthumb.c to use a direct jump by default, but leaves the long jump workaround as an option for those cases where it's actually needed. [1]: commit dd376a2 Author: Damien George <damien.p.george@gmail.com> Date: Fri Sep 1 15:25:29 2017 +1000 py/nlrthumb: Get working again on standard Thumb arch (ie not Thumb2). "b" on Thumb might not be long enough for the jump to nlr_push_tail so it must be done indirectly. [2]: Excerpt from objdump -d micropython: 000095c4 <nlr_push_tail>: 95c4: b510 push {r4, lr} 95c6: 0004 movs r4, r0 95c8: f02d fd42 bl 37050 <mp_thread_get_state> 95cc: 6943 ldr r3, [r0, micropython#20] 95ce: 6023 str r3, [r4, #0] 95d0: 6144 str r4, [r0, micropython#20] 95d2: 2000 movs r0, #0 95d4: bd10 pop {r4, pc} 000095d6 <nlr_pop>: 95d6: b510 push {r4, lr} 95d8: f02d fd3a bl 37050 <mp_thread_get_state> 95dc: 6943 ldr r3, [r0, micropython#20] 95de: 681b ldr r3, [r3, #0] 95e0: 6143 str r3, [r0, micropython#20] 95e2: bd10 pop {r4, pc} 000095e4 <nlr_push>: 95e4: 60c4 str r4, [r0, micropython#12] 95e6: 6105 str r5, [r0, micropython#16] 95e8: 6146 str r6, [r0, micropython#20] 95ea: 6187 str r7, [r0, micropython#24] 95ec: 4641 mov r1, r8 95ee: 61c1 str r1, [r0, micropython#28] 95f0: 4649 mov r1, r9 95f2: 6201 str r1, [r0, micropython#32] 95f4: 4651 mov r1, sl 95f6: 6241 str r1, [r0, micropython#36] @ 0x24 95f8: 4659 mov r1, fp 95fa: 6281 str r1, [r0, micropython#40] @ 0x28 95fc: 4669 mov r1, sp 95fe: 62c1 str r1, [r0, micropython#44] @ 0x2c 9600: 4671 mov r1, lr 9602: 6081 str r1, [r0, micropython#8] 9604: e7de b.n 95c4 <nlr_push_tail> [3]: https://www.openwall.com/lists/musl/2020/09/25/4 Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Although the original motivation given for the workaround[1] is correct, nlr.o and nlrthumb.o are linked with a small enough distance that the problem does not occur, and the workaround isn't necessary. The distance between the b instruction and its target (nlr_push_tail) is just 64 bytes[2], well within the ±2046 byte range addressable by an unconditional branch instruction in Thumb mode. The workaround induces a relocation in the text section (textrel), which isn't supported everywhere, notably not on musl-libc[3], where it causes a crash on start-up. With the workaround removed, micropython works on an ARMv5T Linux system built with musl-libc. This commit changes nlrthumb.c to use a direct jump by default, but leaves the long jump workaround as an option for those cases where it's actually needed. [1]: commit dd376a2 Author: Damien George <damien.p.george@gmail.com> Date: Fri Sep 1 15:25:29 2017 +1000 py/nlrthumb: Get working again on standard Thumb arch (ie not Thumb2). "b" on Thumb might not be long enough for the jump to nlr_push_tail so it must be done indirectly. [2]: Excerpt from objdump -d micropython: 000095c4 <nlr_push_tail>: 95c4: b510 push {r4, lr} 95c6: 0004 movs r4, r0 95c8: f02d fd42 bl 37050 <mp_thread_get_state> 95cc: 6943 ldr r3, [r0, micropython#20] 95ce: 6023 str r3, [r4, #0] 95d0: 6144 str r4, [r0, micropython#20] 95d2: 2000 movs r0, #0 95d4: bd10 pop {r4, pc} 000095d6 <nlr_pop>: 95d6: b510 push {r4, lr} 95d8: f02d fd3a bl 37050 <mp_thread_get_state> 95dc: 6943 ldr r3, [r0, micropython#20] 95de: 681b ldr r3, [r3, #0] 95e0: 6143 str r3, [r0, micropython#20] 95e2: bd10 pop {r4, pc} 000095e4 <nlr_push>: 95e4: 60c4 str r4, [r0, micropython#12] 95e6: 6105 str r5, [r0, micropython#16] 95e8: 6146 str r6, [r0, micropython#20] 95ea: 6187 str r7, [r0, micropython#24] 95ec: 4641 mov r1, r8 95ee: 61c1 str r1, [r0, micropython#28] 95f0: 4649 mov r1, r9 95f2: 6201 str r1, [r0, micropython#32] 95f4: 4651 mov r1, sl 95f6: 6241 str r1, [r0, micropython#36] @ 0x24 95f8: 4659 mov r1, fp 95fa: 6281 str r1, [r0, micropython#40] @ 0x28 95fc: 4669 mov r1, sp 95fe: 62c1 str r1, [r0, micropython#44] @ 0x2c 9600: 4671 mov r1, lr 9602: 6081 str r1, [r0, micropython#8] 9604: e7de b.n 95c4 <nlr_push_tail> [3]: https://www.openwall.com/lists/musl/2020/09/25/4 Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Although the original motivation given for the workaround[1] is correct, nlr.o and nlrthumb.o are linked with a small enough distance that the problem does not occur, and the workaround isn't necessary. The distance between the b instruction and its target (nlr_push_tail) is just 64 bytes[2], well within the ±2046 byte range addressable by an unconditional branch instruction in Thumb mode. The workaround induces a relocation in the text section (textrel), which isn't supported everywhere, notably not on musl-libc[3], where it causes a crash on start-up. With the workaround removed, micropython works on an ARMv5T Linux system built with musl-libc. This commit changes nlrthumb.c to use a direct jump by default, but leaves the long jump workaround as an option for those cases where it's actually needed. [1]: commit dd376a2 Author: Damien George <damien.p.george@gmail.com> Date: Fri Sep 1 15:25:29 2017 +1000 py/nlrthumb: Get working again on standard Thumb arch (ie not Thumb2). "b" on Thumb might not be long enough for the jump to nlr_push_tail so it must be done indirectly. [2]: Excerpt from objdump -d micropython: 000095c4 <nlr_push_tail>: 95c4: b510 push {r4, lr} 95c6: 0004 movs r4, r0 95c8: f02d fd42 bl 37050 <mp_thread_get_state> 95cc: 6943 ldr r3, [r0, micropython#20] 95ce: 6023 str r3, [r4, #0] 95d0: 6144 str r4, [r0, micropython#20] 95d2: 2000 movs r0, #0 95d4: bd10 pop {r4, pc} 000095d6 <nlr_pop>: 95d6: b510 push {r4, lr} 95d8: f02d fd3a bl 37050 <mp_thread_get_state> 95dc: 6943 ldr r3, [r0, micropython#20] 95de: 681b ldr r3, [r3, #0] 95e0: 6143 str r3, [r0, micropython#20] 95e2: bd10 pop {r4, pc} 000095e4 <nlr_push>: 95e4: 60c4 str r4, [r0, micropython#12] 95e6: 6105 str r5, [r0, micropython#16] 95e8: 6146 str r6, [r0, micropython#20] 95ea: 6187 str r7, [r0, micropython#24] 95ec: 4641 mov r1, r8 95ee: 61c1 str r1, [r0, micropython#28] 95f0: 4649 mov r1, r9 95f2: 6201 str r1, [r0, micropython#32] 95f4: 4651 mov r1, sl 95f6: 6241 str r1, [r0, micropython#36] @ 0x24 95f8: 4659 mov r1, fp 95fa: 6281 str r1, [r0, micropython#40] @ 0x28 95fc: 4669 mov r1, sp 95fe: 62c1 str r1, [r0, micropython#44] @ 0x2c 9600: 4671 mov r1, lr 9602: 6081 str r1, [r0, micropython#8] 9604: e7de b.n 95c4 <nlr_push_tail> [3]: https://www.openwall.com/lists/musl/2020/09/25/4 Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Although the original motivation given for the workaround[1] is correct, nlr.o and nlrthumb.o are linked with a small enough distance that the problem does not occur, and the workaround isn't necessary. The distance between the b instruction and its target (nlr_push_tail) is just 64 bytes[2], well within the ±2046 byte range addressable by an unconditional branch instruction in Thumb mode. The workaround induces a relocation in the text section (textrel), which isn't supported everywhere, notably not on musl-libc[3], where it causes a crash on start-up. With the workaround removed, micropython works on an ARMv5T Linux system built with musl-libc. This commit changes nlrthumb.c to use a direct jump by default, but leaves the long jump workaround as an option for those cases where it's actually needed. [1]: commit dd376a2 Author: Damien George <damien.p.george@gmail.com> Date: Fri Sep 1 15:25:29 2017 +1000 py/nlrthumb: Get working again on standard Thumb arch (ie not Thumb2). "b" on Thumb might not be long enough for the jump to nlr_push_tail so it must be done indirectly. [2]: Excerpt from objdump -d micropython: 000095c4 <nlr_push_tail>: 95c4: b510 push {r4, lr} 95c6: 0004 movs r4, r0 95c8: f02d fd42 bl 37050 <mp_thread_get_state> 95cc: 6943 ldr r3, [r0, micropython#20] 95ce: 6023 str r3, [r4, #0] 95d0: 6144 str r4, [r0, micropython#20] 95d2: 2000 movs r0, #0 95d4: bd10 pop {r4, pc} 000095d6 <nlr_pop>: 95d6: b510 push {r4, lr} 95d8: f02d fd3a bl 37050 <mp_thread_get_state> 95dc: 6943 ldr r3, [r0, micropython#20] 95de: 681b ldr r3, [r3, #0] 95e0: 6143 str r3, [r0, micropython#20] 95e2: bd10 pop {r4, pc} 000095e4 <nlr_push>: 95e4: 60c4 str r4, [r0, micropython#12] 95e6: 6105 str r5, [r0, micropython#16] 95e8: 6 1241 146 str r6, [r0, micropython#20] 95ea: 6187 str r7, [r0, micropython#24] 95ec: 4641 mov r1, r8 95ee: 61c1 str r1, [r0, micropython#28] 95f0: 4649 mov r1, r9 95f2: 6201 str r1, [r0, micropython#32] 95f4: 4651 mov r1, sl 95f6: 6241 str r1, [r0, micropython#36] @ 0x24 95f8: 4659 mov r1, fp 95fa: 6281 str r1, [r0, micropython#40] @ 0x28 95fc: 4669 mov r1, sp 95fe: 62c1 str r1, [r0, micropython#44] @ 0x2c 9600: 4671 mov r1, lr 9602: 6081 str r1, [r0, micropython#8] 9604: e7de b.n 95c4 <nlr_push_tail> [3]: https://www.openwall.com/lists/musl/2020/09/25/4 Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Although the original motivation given for the workaround[1] is correct, nlr.o and nlrthumb.o are linked with a small enough distance that the problem does not occur, and the workaround isn't necessary. The distance between the b instruction and its target (nlr_push_tail) is just 64 bytes[2], well within the ±2046 byte range addressable by an unconditional branch instruction in Thumb mode. The workaround induces a relocation in the text section (textrel), which isn't supported everywhere, notably not on musl-libc[3], where it causes a crash on start-up. With the workaround removed, micropython works on an ARMv5T Linux system built with musl-libc. This commit changes nlrthumb.c to use a direct jump by default, but leaves the long jump workaround as an option for those cases where it's actually needed. [1]: commit dd376a2 Author: Damien George <damien.p.george@gmail.com> Date: Fri Sep 1 15:25:29 2017 +1000 py/nlrthumb: Get working again on standard Thumb arch (ie not Thumb2). "b" on Thumb might not be long enough for the jump to nlr_push_tail so it must be done indirectly. [2]: Excerpt from objdump -d micropython: 000095c4 <nlr_push_tail>: 95c4: b510 push {r4, lr} 95c6: 0004 movs r4, r0 95c8: f02d fd42 bl 37050 <mp_thread_get_state> 95cc: 6943 ldr r3, [r0, micropython#20] 95ce: 6023 str r3, [r4, #0] 95d0: 6144 str r4, [r0, micropython#20] 95d2: 2000 movs r0, #0 95d4: bd10 pop {r4, pc} 000095d6 <nlr_pop>: 95d6: b510 push {r4, lr} 95d8: f02d fd3a bl 37050 <mp_thread_get_state> 95dc: 6943 ldr r3, [r0, micropython#20] 95de: 681b ldr r3, [r3, #0] 95e0: 6143 str r3, [r0, micropython#20] 95e2: bd10 pop {r4, pc} 000095e4 <nlr_push>: 95e4: 60c4 str r4, [r0, micropython#12] 95e6: 6105 str r5, [r0, micropython#16] 95e8: 6146 str r6, [r0, micropython#20] 95ea: 6187 str r7, [r0, micropython#24] 95ec: 4641 mov r1, r8 95ee: 61c1 str r1, [r0, micropython#28] 95f0: 4649 mov r1, r9 95f2: 6201 str r1, [r0, micropython#32] 95f4: 4651 mov r1, sl 95f6: 6241 str r1, [r0, micropython#36] @ 0x24 95f8: 4659 mov r1, fp 95fa: 6281 str r1, [r0, micropython#40] @ 0x28 95fc: 4669 mov r1, sp 95fe: 62c1 str r1, [r0, micropython#44] @ 0x2c 9600: 4671 mov r1, lr 9602: 6081 str r1, [r0, micropython#8] 9604: e7de b.n 95c4 <nlr_push_tail> [3]: https://www.openwall.com/lists/musl/2020/09/25/4 Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Although the original motivation given for the workaround[1] is correct, nlr.o and nlrthumb.o are linked with a small enough distance that the problem does not occur, and the workaround isn't necessary. The distance between the b instruction and its target (nlr_push_tail) is just 64 bytes[2], well within the ±2046 byte range addressable by an unconditional branch instruction in Thumb mode. The workaround induces a relocation in the text section (textrel), which isn't supported everywhere, notably not on musl-libc[3], where it causes a crash on start-up. With the workaround removed, micropython works on an ARMv5T Linux system built with musl-libc. This commit changes nlrthumb.c to use a direct jump by default, but leaves the long jump workaround as an option for those cases where it's actually needed. [1]: commit dd376a2 Author: Damien George <damien.p.george@gmail.com> Date: Fri Sep 1 15:25:29 2017 +1000 py/nlrthumb: Get working again on standard Thumb arch (ie not Thumb2). "b" on Thumb might not be long enough for the jump to nlr_push_tail so it must be done indirectly. [2]: Excerpt from objdump -d micropython: 000095c4 <nlr_push_tail>: 95c4: b510 push {r4, lr} 95c6: 0004 movs r4, r0 95c8: f02d fd42 bl 37050 <mp_thread_get_state> 95cc: 6943 ldr r3, [r0, micropython#20] 95ce: 6023 str r3, [r4, #0] 95d0: 6144 str r4, [r0, micropython#20] 95d2: 2000 movs r0, #0 95d4: bd10 pop {r4, pc} 000095d6 <nlr_pop>: 95d6: b510 push {r4, lr} 95d8: f02d fd3a bl 37050 <mp_thread_get_state> 95dc: 6943 ldr r3, [r0, micropython#20] 95de: 681b ldr r3, [r3, #0] 95e0: 6143 str r3, [r0, micropython#20] 95e2: bd10 pop {r4, pc} 000095e4 <nlr_push>: 95e4: 60c4 str r4, [r0, micropython#12] 95e6: 6105 str r5, [r0, micropython#16] 95e8: 6146 str r6, [r0, micropython#20] 95ea: 6187 str r7, [r0, micropython#24] 95ec: 4641 mov r1, r8 95ee: 61c1 str r1, [r0, micropython#28] 95f0: 4649 mov r1, r9 95f2: 6201 str r1, [r0, micropython#32] 95f4: 4651 mov r1, sl 95f6: 6241 str r1, [r0, micropython#36] @ 0x24 95f8: 4659 mov r1, fp 95fa: 6281 str r1, [r0, micropython#40] @ 0x28 95fc: 4669 mov r1, sp 95fe: 62c1 str r1, [r0, micropython#44] @ 0x2c 9600: 4671 mov r1, lr 9602: 6081 str r1, [r0, micropython#8] 9604: e7de b.n 95c4 <nlr_push_tail> [3]: https://www.openwall.com/lists/musl/2020/09/25/4 Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Well, checking each string using strcmp() against inter pool obviously has quadratic complexity - on creation on each string, which is often in a dynamic language. That's not a good design choice, not even for small systems, not even as initial implementation, to be optimized later. Please don't follow Espruino way where dead simple algorithms lead to speed of program depend on number of spaces in source.
So, there definitely should be hashing. Here's my suggestion:
Now, I may imagine one of the reason hashing, etc. wasn't added right away. It's definitely nice to be able to do qstr_from_str_static("foo"), and know this doesn't waste any single byte more than absolutely needed. But now hash and length would need to be part of string, or otherwise they should be computed at runtime, and stored in RAM (oh no!).
With C++11, it might be possible to deal with that in compile-time automagically using constexpr's. But C macros are obviously won't help here. So, general way to solve that would be to have custom preprocessor which replace "macros" like HSTR(hash, len, "foo") with HSTR("\xff", "\x03", "foo").
Are you brave enough to go for something like that? ;-)
(Extra note: As you also use C99, I tried to look for a way to (ab)use compound literals, but there doesn't seem to be a way to create static compound literal).
The text was updated successfully, but these errors were encountered: