Avoid slices in entries of decomposition tables · unicode-rs/unicode-normalization@7c265f8 · GitHub
Commit 7c265f8

committed
Avoid slices in entries of decomposition tables
For small arrays of data, slices are expensive:

- there's the obvious size of the array itself (`size_of::<char>() * length`);
- there's the size of the slice reference (`size_of::<(*const char, usize)>()`);
- there's the size of the relocation for the pointer in the slice. The worst case is on 64-bit ELF, where it is `3 * size_of::<usize>()`(!).

Most entries in the decomposition tables are 2 characters or less, so the overhead for each of these tables is incredibly large.

To give an idea, a "Hello, World" print executable (fresh from `cargo new`) built with `--release` on my machine has 17712 bytes of .rela.dyn (relocations) and 9520 bytes of .data.rel.ro (relocatable read-only data). After adding unicode-normalization as a dependency and changing the code to `println!("{}", String::from_iter("Hello, world!".nfc()));`, those jump to 156336 and 147968 bytes, respectively. For comparison, with unicode-normalization 0.1.8 (the last release before the perfect hashes), they were 18168 and 9872 bytes. This is, however, compensated by the .text (code) section being larger (314607 bytes with 0.1.8 vs. 234639 with 0.1.19); likewise for .rodata (non-relocatable read-only data: 225979 bytes with 0.1.8 vs. 82523 with 0.1.19).

This can be alleviated by replacing the slices with indices into a single shared slice per decomposition table, saving 228K overall while barely adding to code size (160 bytes). This also makes the overall cost of unicode-normalization lower than it was in 0.1.8.

As far as performance is concerned, at least on my machine, it makes virtually no difference on `cargo bench`.

On master:

running 22 tests
test bench_is_nfc_ascii                      ... bench:      13 ns/iter (+/- 0)
test bench_is_nfc_normalized                 ... bench:      23 ns/iter (+/- 0)
test bench_is_nfc_not_normalized             ... bench:     347 ns/iter (+/- 2)
test bench_is_nfc_stream_safe_ascii          ... bench:      13 ns/iter (+/- 0)
test bench_is_nfc_stream_safe_normalized     ... bench:      31 ns/iter (+/- 0)
test bench_is_nfc_stream_safe_not_normalized ... bench:     374 ns/iter (+/- 2)
test bench_is_nfd_ascii                      ... bench:       9 ns/iter (+/- 0)
test bench_is_nfd_normalized                 ... bench:      29 ns/iter (+/- 2)
test bench_is_nfd_not_normalized             ... bench:       9 ns/iter (+/- 0)
test bench_is_nfd_stream_safe_ascii          ... bench:      16 ns/iter (+/- 0)
test bench_is_nfd_stream_safe_normalized     ... bench:      40 ns/iter (+/- 0)
test bench_is_nfd_stream_safe_not_normalized ... bench:       9 ns/iter (+/- 0)
test bench_nfc_ascii                         ... bench:     525 ns/iter (+/- 1)
test bench_nfc_long                          ... bench: 186,528 ns/iter (+/- 1,613)
test bench_nfd_ascii                         ... bench:     283 ns/iter (+/- 30)
test bench_nfd_long                          ... bench: 120,183 ns/iter (+/- 4,510)
test bench_nfkc_ascii                        ... bench:     513 ns/iter (+/- 1)
test bench_nfkc_long                         ... bench: 192,922 ns/iter (+/- 1,673)
test bench_nfkd_ascii                        ... bench:     276 ns/iter (+/- 30)
test bench_nfkd_long                         ... bench: 137,163 ns/iter (+/- 2,159)
test bench_streamsafe_adversarial            ... bench:     323 ns/iter (+/- 5)
test bench_streamsafe_ascii                  ... bench:      25 ns/iter (+/- 0)

With the patch applied:

running 22 tests
test bench_is_nfc_ascii                      ... bench:      13 ns/iter (+/- 0)
test bench_is_nfc_normalized                 ... bench:      23 ns/iter (+/- 0)
test bench_is_nfc_not_normalized             ... bench:     347 ns/iter (+/- 7)
test bench_is_nfc_stream_safe_ascii          ... bench:      13 ns/iter (+/- 0)
test bench_is_nfc_stream_safe_normalized     ... bench:      36 ns/iter (+/- 1)
test bench_is_nfc_stream_safe_not_normalized ... bench:     377 ns/iter (+/- 14)
test bench_is_nfd_ascii                      ... bench:       9 ns/iter (+/- 0)
test bench_is_nfd_normalized                 ... bench:      29 ns/iter (+/- 3)
test bench_is_nfd_not_normalized             ... bench:      10 ns/iter (+/- 0)
test bench_is_nfd_stream_safe_ascii          ... bench:      16 ns/iter (+/- 0)
test bench_is_nfd_stream_safe_normalized     ... bench:      39 ns/iter (+/- 1)
test bench_is_nfd_stream_safe_not_normalized ... bench:      10 ns/iter (+/- 0)
test bench_nfc_ascii                         ... bench:     545 ns/iter (+/- 2)
test bench_nfc_long                          ... bench: 186,348 ns/iter (+/- 1,660)
test bench_nfd_ascii                         ... bench:     281 ns/iter (+/- 2)
test bench_nfd_long                          ... bench: 124,720 ns/iter (+/- 5,967)
test bench_nfkc_ascii                        ... bench:     517 ns/iter (+/- 4)
test bench_nfkc_long                         ... bench: 194,943 ns/iter (+/- 1,636)
test bench_nfkd_ascii                        ... bench:     274 ns/iter (+/- 0)
test bench_nfkd_long                         ... bench: 127,973 ns/iter (+/- 1,161)
test bench_streamsafe_adversarial            ... bench:     320 ns/iter (+/- 3)
test bench_streamsafe_ascii                  ... bench:      25 ns/iter (+/- 0)
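The per-entry size difference described above can be checked directly. A minimal sketch (the 24-byte figure assumes a 64-bit target, and the relocation cost is not visible through `size_of`):

```rust
use std::mem::size_of;

fn main() {
    // Old entry shape: code point + fat pointer; &[char] is a
    // (*const char, usize) pair, 16 bytes on 64-bit targets, and its
    // pointer additionally needs a relocation in the binary.
    assert_eq!(size_of::<(u32, &'static [char])>(), 24);

    // New entry shape: code point + (start, len) indices into a shared
    // array -- no pointer, no relocation.
    assert_eq!(size_of::<(u32, (u16, u16))>(), 8);

    println!(
        "old entry: {} bytes, new entry: {} bytes",
        size_of::<(u32, &'static [char])>(),
        size_of::<(u32, (u16, u16))>()
    );
}
```

With thousands of table entries, that 16-byte (plus relocation) difference per entry accounts for the .rela.dyn and .data.rel.ro savings measured above.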
1 parent 68f2f55 commit 7c265f8

File tree

3 files changed: +17803 additions, -6760 deletions


scripts/unicode.py

Lines changed: 13 additions & 3 deletions
@@ -391,9 +391,19 @@ def gen_composition_table(canon_comp, out):
 def gen_decomposition_tables(canon_decomp, compat_decomp, cjk_compat_variants_decomp, out):
     tables = [(canon_decomp, 'canonical'), (compat_decomp, 'compatibility'), (cjk_compat_variants_decomp, 'cjk_compat_variants')]
     for table, name in tables:
-        gen_mph_data(name + '_decomposed', table, "(u32, &'static [char])",
-            lambda k: "(0x{:x}, &[{}])".format(k,
-                ", ".join("'\\u{%s}'" % hexify(c) for c in table[k])))
+        offsets = {}
+        offset = 0
+        out.write("pub(crate) const %s_DECOMPOSED_CHARS: &[char] = &[\n" % name.upper())
+        for k, v in table.items():
+            offsets[k] = offset
+            offset += len(v)
+            for c in v:
+                out.write("    '\\u{%s}',\n" % hexify(c))
+        # The largest offset must fit in a u16.
+        assert offset < 65536
+        out.write("];\n")
+        gen_mph_data(name + '_decomposed', table, "(u32, (u16, u16))",
+            lambda k: "(0x{:x}, ({}, {}))".format(k, offsets[k], len(table[k])))

 def gen_qc_match(prop_table, out):
     out.write("    match c {\n")
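The generator's new layout logic can be sketched in isolation. A simplified version (the real script's `hexify` and `gen_mph_data` are omitted, and `flatten_decompositions` and the sample table entries are illustrative):

```python
def flatten_decompositions(table):
    """Concatenate all decomposition values into one flat list of chars,
    recording a (start, len) index pair per code point."""
    chars = []
    entries = {}
    for k, v in table.items():
        entries[k] = (len(chars), len(v))
        chars.extend(v)
    # The largest offset must fit in a u16 in the generated Rust table.
    assert len(chars) < 65536
    return chars, entries

chars, entries = flatten_decompositions({
    0x1E0A: ['\u0044', '\u0307'],  # LATIN CAPITAL LETTER D WITH DOT ABOVE
    0x00C9: ['\u0045', '\u0301'],  # LATIN CAPITAL LETTER E WITH ACUTE
})
print(entries[0x1E0A])  # (0, 2)
print(entries[0x00C9])  # (2, 2)
```

The real script then emits `chars` as a `const ..._DECOMPOSED_CHARS: &[char]` and feeds the `(start, len)` pairs into `gen_mph_data` in place of the old slice literals.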

src/lookups.rs

Lines changed: 3 additions & 0 deletions
@@ -51,6 +51,7 @@ pub(crate) fn canonical_fully_decomposed(c: char) -> Option<&'static [char]> {
         pair_lookup_fv_opt,
         None,
     )
+    .map(|(start, len)| &CANONICAL_DECOMPOSED_CHARS[start as usize..][..len as usize])
 }

 pub(crate) fn compatibility_fully_decomposed(c: char) -> Option<&'static [char]> {
@@ -62,6 +63,7 @@ pub(crate) fn compatibility_fully_decomposed(c: char) -> Option<&'static [char]>
         pair_lookup_fv_opt,
         None,
     )
+    .map(|(start, len)| &COMPATIBILITY_DECOMPOSED_CHARS[start as usize..][..len as usize])
 }

 pub(crate) fn cjk_compat_variants_fully_decomposed(c: char) -> Option<&'static [char]> {
@@ -73,6 +75,7 @@ pub(crate) fn cjk_compat_variants_fully_decomposed(c: char) -> Option<&'static [
         pair_lookup_fv_opt,
         None,
     )
+    .map(|(start, len)| &CJK_COMPAT_VARIANTS_DECOMPOSED_CHARS[start as usize..][..len as usize])
 }

 /// Return whether the given character is a combining mark (`General_Category=Mark`)
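The `.map` pattern added to each lookup function can be sketched as follows (illustrative table contents, not the crate's generated `CANONICAL_DECOMPOSED_CHARS`, and a plain `Option` argument standing in for the perfect-hash lookup result):

```rust
// Stand-in for the generated shared character array.
static DECOMPOSED_CHARS: &[char] = &['\u{44}', '\u{307}', '\u{45}', '\u{301}'];

// A table hit now yields a (start, len) pair; re-slicing the shared array
// reconstructs the same &'static [char] the old tables stored directly.
fn decomposed(entry: Option<(u16, u16)>) -> Option<&'static [char]> {
    entry.map(|(start, len)| &DECOMPOSED_CHARS[start as usize..][..len as usize])
}

fn main() {
    // (2, 2) selects the two chars starting at index 2.
    assert_eq!(decomposed(Some((2, 2))), Some(&['\u{45}', '\u{301}'][..]));
    // A miss stays a miss.
    assert_eq!(decomposed(None), None);
    println!("ok");
}
```

The callers are unchanged: each function still returns `Option<&'static [char]>`, so the indirection through the shared array is invisible outside `lookups.rs`.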

0 commit comments