Description
Well, checking each string using strcmp() against the intern pool obviously has quadratic complexity overall (a linear scan on creation of each string, which happens often in a dynamic language). That's not a good design choice, not even for small systems, not even as an initial implementation to be optimized later. Please don't follow the Espruino way, where dead simple algorithms make the speed of a program depend on the number of spaces in the source.
So, there definitely should be hashing. Here's my suggestion:
- Wasting much memory on the hash is not good either. A single byte is enough to make a great difference. Yes, that will limit the efficiency of very large hash tables, but the talk here is about handling the real-world case while scaling down; for real-world and scaling up there's CPython, after all. But see below anyway.
- I'd suggest taking a chance and storing the length with the string. That's obviously useful for len(), for comparison, and for hash tables too, effectively adding more bits (maybe not as uniformly spread as a hash). It's harder to figure out how to store it. It's definitely not good to spend 4 bytes on it (in 32-bit speak). Arbitrary limits on string length are not good either. Variable-length encoding is the only good choice then - that will still be much faster than strlen().
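To make the two suggestions above concrete, here is a minimal sketch of an intern pool whose entries carry a one-byte hash and a variable-length length prefix. Everything in it is an assumption made for illustration: the names (`hash8`, `varlen_put`, `varlen_get`, `intern`), the fixed pool sizes, the particular hash function and the 7-bits-per-byte length encoding are not taken from the existing code. The lookup is still a linear scan, but most entries are rejected by comparing a single byte instead of calling strcmp().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry layout: [hash: 1 byte][length: 1+ bytes][string bytes]. */
#define POOL_MAX 64    /* illustrative limits, not from the original code */
#define BUF_MAX  1024

static uint8_t pool_buf[BUF_MAX];
static const uint8_t *pool[POOL_MAX];
static size_t pool_count, buf_used;

/* Fold the string into one byte; any cheap hash would do here. */
static uint8_t hash8(const char *s, size_t len) {
    uint32_t h = 5381;
    for (size_t i = 0; i < len; i++) {
        h = (h * 33) ^ (uint8_t)s[i];
    }
    return (uint8_t)(h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24));
}

/* Encode len as 7 bits per byte, low group first; high bit means "more follows". */
static size_t varlen_put(uint8_t *out, size_t len) {
    size_t n = 0;
    while (len >= 0x80) {
        out[n++] = (uint8_t)(len | 0x80);
        len >>= 7;
    }
    out[n++] = (uint8_t)len;
    return n;  /* number of bytes written */
}

/* Decode a length written by varlen_put; returns the number of bytes read. */
static size_t varlen_get(const uint8_t *in, size_t *len) {
    size_t n = 0, v = 0;
    unsigned shift = 0;
    do {
        v |= (size_t)(in[n] & 0x7f) << shift;
        shift += 7;
    } while (in[n++] & 0x80);
    *len = v;
    return n;
}

/* Return the pool index of s, adding it to the pool if it is not there yet. */
static size_t intern(const char *s) {
    size_t len = strlen(s);  /* done once per lookup, not once per pool entry */
    uint8_t h = hash8(s, len);
    for (size_t i = 0; i < pool_count; i++) {
        const uint8_t *e = pool[i];
        if (e[0] != h) {
            continue;        /* cheap reject on the hash byte */
        }
        size_t elen;
        size_t hdr = 1 + varlen_get(e + 1, &elen);
        if (elen == len && memcmp(e + hdr, s, len) == 0) {
            return i;
        }
    }
    /* 10 bytes is enough for a 64-bit length at 7 bits per byte. */
    assert(pool_count < POOL_MAX && buf_used + 1 + 10 + len <= BUF_MAX);
    uint8_t *e = pool_buf + buf_used;
    e[0] = h;
    size_t hdr = 1 + varlen_put(e + 1, len);
    memcpy(e + hdr, s, len);
    buf_used += hdr + len;
    pool[pool_count] = e;
    return pool_count++;
}
```

Strings shorter than 128 bytes cost two bytes of overhead in this layout (hash plus one length byte). The same hash byte could also select a bucket instead of only filtering a linear scan.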
Now, I can imagine one of the reasons hashing, etc. wasn't added right away. It's definitely nice to be able to do qstr_from_str_static("foo"), and know this doesn't waste a single byte more than absolutely needed. But now the hash and length would need to be part of the string, or otherwise they would have to be computed at runtime and stored in RAM (oh no!).
With C++11, it might be possible to deal with that at compile time automagically using constexpr. But C macros obviously won't help here. So, the general way to solve that would be to have a custom preprocessor which replaces "macros" like HSTR(hash, len, "foo") with HSTR("\xff", "\x03", "foo").
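As a rough sketch of what the C side could look like once such a preprocessor has filled in the values: the `HSTR` definition and the `hstr_*` accessors below are hypothetical, the `"\xff"` hash byte is just the placeholder value from the example above (not a computed hash), and a single length byte is assumed for simplicity. It relies only on adjacent string literals being concatenated by the compiler, so the whole thing stays a plain constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: after the custom preprocessor has substituted the hash and
   length, HSTR simply pastes three adjacent string literals into one. */
#define HSTR(hash, len, str) hash len str

/* Expands to "\xff" "\x03" "foo": one constant array, no runtime setup. */
static const char foo_qstr[] = HSTR("\xff", "\x03", "foo");

/* First byte: the precomputed one-byte hash. */
static uint8_t hstr_hash(const char *p) {
    return (uint8_t)p[0];
}

/* Second byte: the length; this sketch only handles lengths below 128. */
static size_t hstr_len(const char *p) {
    assert(((uint8_t)p[1] & 0x80) == 0);
    return (uint8_t)p[1];
}

/* The string bytes follow the two-byte header. */
static const char *hstr_data(const char *p) {
    return p + 2;
}
```

Note that a hex escape such as `"\x03"` ends at the literal boundary, so it does not swallow the `f` of `"foo"` when the literals are joined.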
Are you brave enough to go for something like that? ;-)
(Extra note: As you also use C99, I tried to look for a way to (ab)use compound literals, but there doesn't seem to be a way to create a static compound literal.)