Merge pull request #23 from raphlinus/master · cbarrick/unicode-segmentation@6fc6815 · GitHub
Commit 6fc6815

Merge pull request unicode-rs#23 from raphlinus/master
New cursor-based implementation of grapheme clusters
2 parents e86a69b + deebd8a commit 6fc6815

5 files changed: +608 -381 lines changed

scripts/unicode.py

Lines changed: 2 additions & 10 deletions
@@ -330,21 +330,13 @@ def emit_break_module(f, break_table, break_cats, name):
  grapheme_cats = load_properties("auxiliary/GraphemeBreakProperty.txt", [])

  # Control
- # Note 1:
+ # Note:
  # This category also includes Cs (surrogate codepoints), but Rust's `char`s are
  # Unicode Scalar Values only, and surrogates are thus invalid `char`s.
  # Thus, we have to remove Cs from the Control category
- # Note 2:
- # 0x0a and 0x0d (CR and LF) are not in the Control category for Graphemes.
- # However, the Graphemes iterator treats these as a special case, so they
- # should be included in grapheme_cats["Control"] for our implementation.
  grapheme_cats["Control"] = group_cat(list(
- (set(ungroup_cat(grapheme_cats["Control"]))
- | set(ungroup_cat(grapheme_cats["CR"]))
- | set(ungroup_cat(grapheme_cats["LF"])))
+ set(ungroup_cat(grapheme_cats["Control"]))
  - set(ungroup_cat([surrogate_codepoints]))))
- del(grapheme_cats["CR"])
- del(grapheme_cats["LF"])

  grapheme_table = []
  for cat in grapheme_cats:
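
For reviewers who want to see the effect of this hunk in isolation, here is a minimal sketch of the new behaviour: surrogates are subtracted from Control, while CR and LF stay as their own categories instead of being merged into it. The `group_cat` and `ungroup_cat` functions below are simplified stand-ins written for this illustration, and the category data is a small hand-picked excerpt; the real helpers and tables in scripts/unicode.py may differ.

```python
# Simplified stand-ins for the helpers used in scripts/unicode.py.
# These are assumptions for illustration, not the script's actual code.
def ungroup_cat(ranges):
    """Expand inclusive (lo, hi) ranges into a flat list of code points."""
    return [cp for lo, hi in ranges for cp in range(lo, hi + 1)]


def group_cat(codepoints):
    """Collapse code points into sorted, inclusive (lo, hi) ranges."""
    ranges = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            # Extend the current run by one code point.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            # Start a new run.
            ranges.append((cp, cp))
    return ranges


# The surrogate block (general category Cs).
surrogate_codepoints = (0xD800, 0xDFFF)

# Hand-picked excerpt of grapheme break categories (illustrative, not the full table).
grapheme_cats = {
    "CR": [(0x0D, 0x0D)],
    "LF": [(0x0A, 0x0A)],
    "Control": [(0x00, 0x09), (0x0B, 0x0C), (0x0E, 0x1F),
                (0x7F, 0x9F), (0xD800, 0xDFFF)],
}

# Same shape as the new code in the diff: only the surrogates are removed,
# and CR / LF are neither merged into Control nor deleted.
grapheme_cats["Control"] = group_cat(list(
    set(ungroup_cat(grapheme_cats["Control"]))
    - set(ungroup_cat([surrogate_codepoints]))))
```

After this runs, `grapheme_cats` still has separate `"CR"` and `"LF"` keys, so whatever consumes the table can distinguish them from other control characters.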