8000 docs/develop/qstr.rst: Add documentation for string interning. · jimmo/micropython@4ddd46e · GitHub
[go: up one dir, main page]

Skip to content

Commit 4ddd46e

Browse files
jimmodpgeorge
authored andcommitted
docs/develop/qstr.rst: Add documentation for string interning.
1 parent 25a9bcc commit 4ddd46e

File tree

4 files changed

+115
-0
lines changed

4 files changed

+115
-0
lines changed

docs/develop/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,3 +10,4 @@ See the `getting started guide
1010
:maxdepth: 1
1111

1212
cmodules.rst
13+
qstr.rst

docs/develop/qstr.rst

Lines changed: 112 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,112 @@
1+
MicroPython string interning
2+
============================
3+
4+
MicroPython uses `string interning`_ to save both RAM and ROM. This avoids
5+
having to store duplicate copies of the same string. Primarily, this applies to
6+
identifiers in your code, as something like a function or variable name is very
7+
likely to appear in multiple places in the code. In MicroPython an interned
8+
string is called a QSTR (uniQue STRing).
9+
10+
A QSTR value (with type ``qstr``) is a index into a linked list of QSTR pools.
11+
QSTRs store their length and a hash of their contents for fast comparison during
12+
the de-duplication process. All bytecode operations that work with strings use
13+
a QSTR argument.
14+
15+
Compile-time QSTR generation
16+
----------------------------
17+
18+
In the MicroPython C code, any strings that should be interned in the final
19+
firmware are written as ``MP_QSTR_Foo``. At compile time this will evaluate to
20+
a ``qstr`` value that points to the index of ``"Foo"`` in the QSTR pool.
21+
22+
A multi-step process in the ``Makefile`` makes this work. In summary this
23+
process has three parts:
24+
25+
1. Find all ``MP_QSTR_Foo`` tokens in the code.
26+
27+
2. Generate a static QSTR pool containing all the string data (including lengths
28+
and hashes).
29+
30+
3. Replace all ``MP_QSTR_Foo`` (via the preprocessor) with their corresponding
31+
index.
32+
33+
``MP_QSTR_Foo`` tokens are searched for in two sources:
34+
35+
1. All files referenced in ``$(SRC_QSTR)``. This is all C code (i.e. ``py``,
36+
``extmod``, ``ports/stm32``) but not including third-party code such as
37+
``lib``.
38+
39+
2. Additional ``$(QSTR_GLOBAL_DEPENDENCIES)`` (which includes ``mpconfig*.h``).
40+
41+
*Note:* ``frozen_mpy.c`` (generated by mpy-tool.py) has its own QSTR generation
42+
and pool.
43+
44+
Some additional strings that can't be expressed using the ``MP_QSTR_Foo`` syntax
45+
(e.g. they contain non-alphanumeric characters) are explicitly provided in
46+
``qstrdefs.h`` and ``qstrdefsport.h`` via the ``$(QSTR_DEFS)`` variable.
47+
48+
Processing happens in the following stages:
49+
50+
1. ``qstr.i.last`` is the concatenation of putting every single input file
51+
through the C pre-processor. This means that any conditionally disabled code
52+
will be removed, and macros expanded. This means we don't add strings to the
53+
pool that won't be used in the final firmware. Because at this stage (thanks
54+
to the ``NO_QSTR`` macro added by ``QSTR_GEN_EXTRA_CFLAGS``) there is no
55+
definition for ``MP_QSTR_Foo`` it passes through this stage unaffected. This
56+
file also includes comments from the preprocessor that include line number
57+
information. Note that this step only uses files that have changed, which
58+
means that ``qstr.i.last`` will only contain data from files that have
59+
changed since the last compile.
60+
2. ``qstr.split`` is an empty file created after running ``makeqstrdefs.py split``
61+
on qstr.i.last. It's just used as a dependency to indicate that the step ran.
62+
This script outputs one file per input C file, ``genhdr/qstr/...file.c.qstr``,
63+
which contains only the matched QSTRs. Each QSTR is printed as ``Q(Foo)``.
64+
This step is necessary to combine the existing files with the new data
65+
generated from the incremental update in ``qstr.i.last``.
66+
67+
3. ``qstrdefs.collected.h`` is the output of concatenating ``genhdr/qstr/*``
68+
using ``makeqstrdefs.py cat``. This is now the full set of ``MP_QSTR_Foo``'s
69+
found in the code, now formatted as ``Q(Foo)``, one-per-line, with duplicates.
70+
This file is only updated if the set of qstrs has changed. A hash of the QSTR
71+
data is written to another file (``qstrdefs.collected.h.hash``) which allows
72+
it to track changes across builds.
73+
74+
4. ``qstrdefs.preprocessed.h`` adds in the QSTRs from qstrdefs*. It
75+
concatenates ``qstrdefs.collected.h`` with ``qstrdefs*.h``, then it transforms
76+
each line from ``Q(Foo)`` to ``"Q(Foo)"`` so they pass through the preprocessor
77+
unchanged. Then the preprocessor is used to deal with any conditional
78+
compilation in ``qstrdefs*.h``. Then the transformation is undone back to
79+
``Q(Foo)``, and saved as ``qstrdefs.preprocessed.h``.
80+
81+
5. ``qstrdefs.generated.h`` is the output of ``makeqstrdata.py``. For each
82+
``Q(Foo)`` in qstrdefs.preprocessed.h (plus some extra hard-coded ones), it outputs
83+
``QDEF(MP_QSTR_Foo, (const byte*)"hash" "Foo")``.
84+
85+
Then in the main compile, two things happen with ``qstrdefs.generated.h``:
86+
87+
1. In qstr.h, each QDEF becomes an entry in an enum, which makes ``MP_QSTR_Foo``
88+
available to code and equal to the index of that string in the QSTR table.
89+
90+
2. In qstr.c, the actual QSTR data table is generated as elements of the
91+
``mp_qstr_const_pool->qstrs``.
92+
93+
.. _`string interning`: https://en.wikipedia.org/wiki/String_interning
94+
95+
Run-time QSTR generation
96+
------------------------
97+
98+
Additional QSTR pools can be created at runtime so that strings can be added to
99+
them. For example, the code::
100+
101+
foo[x] = 3
102+
103+
Will need to create a QSTR for the value of ``x`` so it can be used by the
104+
"load attr" bytecode.
105+
106+
Also, when compiling Python code, identifiers and literals need to have QSTRs
107+
created. Note: only literals shorter than 10 characters become QSTRs. This is
108+
because a regular string on the heap always takes up a minimum of 16 bytes (one
109+
GC block), whereas QSTRs allow them to be packed more efficiently into the pool.
110+
111+
QSTR pools (and the underlying "chunks" that store the string data) are allocated
112+
on-demand on the heap with a minimum size.

py/mkrules.mk

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -70,6 +70,7 @@ $(OBJ): | $(HEADER_BUILD)/qstrdefs.generated.h $(HEADER_BUILD)/mpversion.h
7070
# - if anything in QSTR_GLOBAL_DEPENDENCIES is newer, then process all source files ($^)
7171
# - else, if list of newer prerequisites ($?) is not empty, then process just these ($?)
7272
# - else, process all source files ($^) [this covers "make -B" which can set $? to empty]
73+
# See more information about this process in docs/develop/qstr.rst.
7374
$(HEADER_BUILD)/qstr.i.last: $(SRC_QSTR) $(QSTR_GLOBAL_DEPENDENCIES) | $(QSTR_GLOBAL_REQUIREMENTS)
7475
$(ECHO) "GEN $@"
7576
$(Q)$(CPP) $(QSTR_GEN_EXTRA_CFLAGS) $(CFLAGS) $(if $(filter $?,$(QSTR_GLOBAL_DEPENDENCIES)),$^,$(if $?,$?,$^)) >$(HEADER_BUILD)/qstr.i.last

py/py.mk

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -231,6 +231,7 @@ MPCONFIGPORT_MK = $(wildcard mpconfigport.mk)
231231
# created before we run the script to generate the .h
232232
# Note: we need to protect the qstr names from the preprocessor, so we wrap
233233
# the lines in "" and then unwrap after the preprocessor is finished.
234+
# See more information about this process in docs/develop/qstr.rst.
234235
$(HEADER_BUILD)/qstrdefs.generated.h: $(PY_QSTR_DEFS) $(QSTR_DEFS) $(QSTR_DEFS_COLLECTED) $(PY_SRC)/makeqstrdata.py mpconfigport.h $(MPCONFIGPORT_MK) $(PY_SRC)/mpconfig.h | $(HEADER_BUILD)
235236
$(ECHO) "GEN $@"
236237
$(Q)$(CAT) $(PY_QSTR_DEFS) $(QSTR_DEFS) $(QSTR_DEFS_COLLECTED) | $(SED) 's/^Q(.*)/"&"/' | $(CPP) $(CFLAGS) - | $(SED) 's/^\"\(Q(.*)\)\"/\1/' > $(HEADER_BUILD)/qstrdefs.preprocessed.h

0 commit comments

Comments
 (0)
0