Fix normalization of jamo by Manishearth · Pull Request #11 · unicode-rs/unicode-normalization · GitHub

Fix normalization of jamo #11


Closed
wants to merge 2 commits into from


Manishearth
Member

No description provided.

SimonSapin and others added 2 commits December 19, 2016 21:25
The algorithm for composition of Hangul Jamo is:

 - L (choseong jamo) + V (jungseong jamo) = LV (syllable block)
 - LV (syllable block) + T (jongseong jamo) = LVT (syllable block)

However, LV and LVT syllable blocks are interleaved in the Unicode Hangul
Syllables block. For each LV pair, the LV syllable block comes first,
followed by one LVT syllable block for each T. The LV+T composition was
therefore a simple addition of offsets.
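
The layout described above can be made concrete with the index arithmetic
from the Unicode Standard's Hangul syllable algorithm. What follows is a
minimal sketch, not code from this crate: the constants are the standard's
values, and the helper name `syllable_indices` is hypothetical.

```rust
// Constants from the Unicode Standard's Hangul syllable algorithm.
const S_BASE: u32 = 0xAC00; // first precomposed syllable, U+AC00
const V_COUNT: u32 = 21; // number of jungseong (V) jamo
const T_COUNT: u32 = 28; // 27 jongseong (T) jamo plus "no trailing consonant"
const S_COUNT: u32 = 19 * V_COUNT * T_COUNT; // 19 choseong (L) jamo; 11172 syllables

/// Hypothetical helper: splits a precomposed syllable into its
/// (L, V, T) indices. A T index of 0 means the syllable is a bare LV.
fn syllable_indices(s: char) -> Option<(u32, u32, u32)> {
    // Reject anything below the syllable block, then anything above it.
    let s_index = (s as u32).checked_sub(S_BASE)?;
    if s_index >= S_COUNT {
        return None;
    }
    let l_index = s_index / (V_COUNT * T_COUNT);
    let v_index = (s_index % (V_COUNT * T_COUNT)) / T_COUNT;
    let t_index = s_index % T_COUNT;
    Some((l_index, v_index, t_index))
}
```

Under these assumptions, U+AC00 has T index 0 (an LV syllable block), U+AC01
has T index 1 (an LVT syllable block), and the next LV syllable block is
U+AC1C, `T_COUNT` code points after U+AC00.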

Our algorithm did not exclude LVT syllable blocks as the first operand,
so LVT+T would offset further and produce an unrelated syllable block.

By requiring `S_index` to be a multiple of `T_count`, we accept only LV
syllable blocks (which occur every `T_count` code points in the S block).
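
The check described in this commit message might look roughly like the
sketch below. It is an illustration under stated assumptions, not the
crate's actual code: the constants come from the Unicode Standard's Hangul
algorithm, and the function name `compose_hangul_pair` is hypothetical.

```rust
// Constants from the Unicode Standard's Hangul syllable algorithm.
const S_BASE: u32 = 0xAC00;
const L_BASE: u32 = 0x1100;
const V_BASE: u32 = 0x1161;
const T_BASE: u32 = 0x11A7; // one below the first trailing consonant
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const S_COUNT: u32 = L_COUNT * V_COUNT * T_COUNT;

/// Hypothetical sketch of pairwise Hangul composition.
fn compose_hangul_pair(a: char, b: char) -> Option<char> {
    let (a, b) = (a as u32, b as u32);

    // L + V -> LV
    if (L_BASE..L_BASE + L_COUNT).contains(&a) && (V_BASE..V_BASE + V_COUNT).contains(&b) {
        let lv_index = (a - L_BASE) * V_COUNT + (b - V_BASE);
        return char::from_u32(S_BASE + lv_index * T_COUNT);
    }

    // LV + T -> LVT. The modulo test is the guard discussed above:
    // only syllables whose S index is a multiple of T_COUNT are bare LV.
    if (S_BASE..S_BASE + S_COUNT).contains(&a)
        && (T_BASE + 1..T_BASE + T_COUNT).contains(&b)
        && (a - S_BASE) % T_COUNT == 0
    {
        return char::from_u32(a + (b - T_BASE));
    }

    None
}
```

With the guard in place, composing U+AC01 (already an LVT syllable block)
with U+11A8 yields `None`. Without the modulo test, the same input would be
offset to U+AC02, an unrelated syllable block.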