wip Fully implement the UAX #15 quick-check algorithm.
TODO:
- news etc.?
- test somehow? at least make sure semantic tests are adequate
- that "older version" path... shouldn't it be MAYBE?
- mention explicitly in commit message that *this* is the actual
  algorithm from UAX #15
- think if there are counter-cases where this is slower.
If caller treats MAYBE same as NO... e.g. if caller actually just
wants to normalize? May need to parametrize and offer both behaviors.
This lets us return a NO answer instead of MAYBE when that's what a
Quick_Check property tells us, and also when that's what the canonical
combining classes tell us after a Quick_Check property has said "maybe".
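For illustration only, here is a rough Python sketch of the quick-check
loop as UAX #15 describes it. It is not the C code in this patch. The
names quick_check and nfd_quick_check_property are hypothetical, and the
NFD_QC lookup is only approximated from unicodedata.decomposition(),
which does not cover Hangul syllables.

```python
import unicodedata

YES, MAYBE, NO = "YES", "MAYBE", "NO"


def nfd_quick_check_property(ch):
    """Approximate NFD_QC for one character (hypothetical helper).

    Assumption: a character with a canonical decomposition mapping has
    NFD_QC=No. Hangul syllables decompose algorithmically, so this
    approximation wrongly reports YES for them.
    """
    decomp = unicodedata.decomposition(ch)
    # Compatibility mappings start with a "<tag>"; canonical ones do not.
    if decomp and not decomp.startswith("<"):
        return NO
    return YES


def quick_check(s, qc_property):
    """Return YES, MAYBE, or NO, following the loop in UAX #15.

    qc_property maps a character to YES, MAYBE, or NO for one form.
    """
    last_ccc = 0
    result = YES
    for ch in s:
        ccc = unicodedata.combining(ch)
        # Combining marks out of canonical order: definitely not normalized,
        # even if an earlier character only said MAYBE.
        if ccc != 0 and last_ccc > ccc:
            return NO
        check = qc_property(ch)
        if check == NO:
            return NO
        if check == MAYBE:
            result = MAYBE
        last_ccc = ccc
    return result
```

With this sketch, quick_check("\uf900" * 500000, nfd_quick_check_property)
stops at the first character, which is the early NO exit described above.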
In a quick test on my laptop, the existing code takes about 6.7 ms/MB
(so 6.7 ns per byte) when the quick check returns MAYBE and it has to
do the slow comparison:
$ ./python -m timeit -s 'import unicodedata; s = "\uf900"*500000' -- \
'unicodedata.is_normalized("NFD", s)'
50 loops, best of 5: 6.67 msec per loop
With this patch, it gets the answer instantly (78 ns) on the same 1 MB
string:
$ ./python -m timeit -s 'import unicodedata; s = "\uf900"*500000' -- \
'unicodedata.is_normalized("NFD", s)'
5000000 loops, best of 5: 78 nsec per loop
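On the "test somehow" TODO item, one possible semantic check (a sketch,
not part of this patch) is to compare unicodedata.is_normalized() against
the slow definition, normalize(form, s) == s. The helper names and the
sample strings below are hypothetical and chosen only for illustration.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Illustrative inputs: ASCII, a CJK compatibility ideograph, combining
# marks in and out of canonical order, a precomposed letter, and a
# Hangul syllable.
SAMPLES = [
    "",
    "abc",
    "\uf900",
    "a\u0301",
    "a\u0301\u0323",
    "a\u0323\u0301",
    "\u00e9",
    "\uac00",
]


def is_normalized_reference(form, s):
    """Slow reference answer: normalize and compare."""
    return unicodedata.normalize(form, s) == s


def find_disagreements(samples):
    """Return (form, string) pairs where the fast and slow answers differ."""
    return [
        (form, s)
        for form in FORMS
        for s in samples
        if unicodedata.is_normalized(form, s) != is_normalized_reference(form, s)
    ]
```

An empty list from find_disagreements(SAMPLES) means the fast path and the
slow path agree on these inputs; it says nothing about inputs not listed.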