Do not further compose LVT hangul precomposed syllable blocks · unicode-rs/unicode-normalization@98748d3 · GitHub

Commit 98748d3

Do not further compose LVT hangul precomposed syllable blocks
The algorithm for composition of Hangul Jamo is:

- L (choseong jamo) + V (jungseong jamo) = LV (syllable block)
- LV (syllable block) + T (jongseong jamo) = LVT (syllable block)

However, the LV and LVT syllable blocks are intermingled in the Unicode block. In particular, for each pair LV, you will first see the syllable block LV, followed by the LVT syllable blocks for each T.

The LV+T composition was a simple addition of offsets. Our algorithm did not ignore the LVT syllable blocks, which meant that LVT+T would just offset further and produce an unrelated syllable block.

By ensuring that the `S_index` is a multiple of `T_count`, we filter for only LV syllable blocks (which occur every `T_count` codepoints in the S block).
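
The layout described in the commit message can be sketched as follows. This is an illustration only: `is_lv_syllable` and `naive_add_t` are hypothetical helpers that do not exist in the crate, and `naive_add_t` merely mimics the unguarded offset addition the message describes. The constants are the ones given in Section 3.12 of the Unicode Standard.

```rust
// Constants from the Unicode Standard, Section 3.12 (Conjoining Jamo Behavior).
const S_BASE: u32 = 0xAC00; // first precomposed syllable block
const T_BASE: u32 = 0x11A7; // one below the first T jamo (U+11A8)
const T_COUNT: u32 = 28; // 27 trailing consonants + "no trailing consonant"
const S_COUNT: u32 = 11172; // 19 * 21 * 28 syllable blocks in total

/// Hypothetical helper: true if `c` is an LV syllable block,
/// i.e. its index within the S block is a multiple of T_COUNT.
fn is_lv_syllable(c: char) -> bool {
    let s = c as u32;
    S_BASE <= s && s < S_BASE + S_COUNT && (s - S_BASE) % T_COUNT == 0
}

/// Hypothetical helper mimicking the pre-fix arithmetic: add the T offset
/// to `s` without checking whether `s` is an LV or an LVT block.
fn naive_add_t(s: char, t: char) -> Option<char> {
    // checked_sub guards against `t` lying below T_BASE.
    let t_index = (t as u32).checked_sub(T_BASE)?;
    char::from_u32(s as u32 + t_index)
}
```

Under these assumptions, `naive_add_t('\u{AC00}', '\u{11A8}')` gives U+AC01 as intended, but applying it again to U+AC01 (an LVT block) gives U+AC02, a different syllable rather than a rejection. `is_lv_syllable` returns `true` for U+AC00 and U+AC1C and `false` for the 27 code points between them, which is the condition the fix relies on.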
1 parent f0e9362 commit 98748d3

1 file changed, +7 -3 lines changed

src/normalize.rs

Lines changed: 7 additions & 3 deletions
@@ -102,7 +102,8 @@ pub fn compose(a: char, b: char) -> Option<char> {
     })
 }
 
-// Constants from Unicode 7.0.0 Section 3.12 Conjoining Jamo Behavior
+// Constants from Unicode 9.0.0 Section 3.12 Conjoining Jamo Behavior
+// http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf#M9.32468.Heading.310.Combining.Jamo.Behavior
 const S_BASE: u32 = 0xAC00;
 const L_BASE: u32 = 0x1100;
 const V_BASE: u32 = 0x1161;
@@ -145,12 +146,15 @@ fn compose_hangul(a: char, b: char) -> Option<char> {
     let l = a as u32;
     let v = b as u32;
     // Compose an LPart and a VPart
-    if L_BASE <= l && l < (L_BASE + L_COUNT) && V_BASE <= v && v < (V_BASE + V_COUNT) {
+    if L_BASE <= l && l < (L_BASE + L_COUNT) // l should be an L choseong jamo
+        && V_BASE <= v && v < (V_BASE + V_COUNT) { // v should be a V jungseong jamo
         let r = S_BASE + (l - L_BASE) * N_COUNT + (v - V_BASE) * T_COUNT;
         return unsafe { Some(transmute(r)) };
     }
     // Compose an LVPart and a TPart
-    if S_BASE <= l && l <= (S_BASE+S_COUNT-T_COUNT) && T_BASE <= v && v < (T_BASE+T_COUNT) {
+    if S_BASE <= l && l <= (S_BASE+S_COUNT-T_COUNT) // l should be a syllable block
+        && T_BASE <= v && v < (T_BASE+T_COUNT) // v should be a T jongseong jamo
+        && (l - S_BASE) % T_COUNT == 0 { // l should be an LV syllable block (not LVT)
         let r = l + (v - T_BASE);
         return unsafe { Some(transmute(r)) };
     }
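
For readers who want to try the patched logic in isolation, here is a minimal standalone sketch. It is not the crate's code: the name `compose_hangul_checked` is hypothetical, it uses `char::from_u32` in place of `transmute`, and it additionally excludes `T_BASE` itself as a trailing jamo (the first T jamo is U+11A8), which the diff above does not do. The `*_COUNT` values are taken from Section 3.12 of the Unicode Standard, since the diff does not show them.

```rust
// Constants from the Unicode Standard, Section 3.12 (Conjoining Jamo Behavior).
const S_BASE: u32 = 0xAC00;
const L_BASE: u32 = 0x1100;
const V_BASE: u32 = 0x1161;
const T_BASE: u32 = 0x11A7;
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588 code points per leading consonant
const S_COUNT: u32 = L_COUNT * N_COUNT; // 11172 syllable blocks

/// Hypothetical safe variant of the patched composition logic.
fn compose_hangul_checked(a: char, b: char) -> Option<char> {
    let (l, v) = (a as u32, b as u32);

    // L + V -> LV: `l` must be a choseong jamo, `v` a jungseong jamo.
    if (L_BASE..L_BASE + L_COUNT).contains(&l) && (V_BASE..V_BASE + V_COUNT).contains(&v) {
        let r = S_BASE + (l - L_BASE) * N_COUNT + (v - V_BASE) * T_COUNT;
        return char::from_u32(r);
    }

    // LV + T -> LVT: `l` must be a syllable block whose index is a multiple
    // of T_COUNT (an LV block, not an LVT block), and `v` a jongseong jamo.
    // Assumption of this sketch: T_BASE itself is excluded.
    if (S_BASE..S_BASE + S_COUNT).contains(&l)
        && (T_BASE + 1..T_BASE + T_COUNT).contains(&v)
        && (l - S_BASE) % T_COUNT == 0
    {
        return char::from_u32(l + (v - T_BASE));
    }

    None
}
```

With this sketch, U+1100 + U+1161 composes to U+AC00, and U+AC00 + U+11A8 composes to U+AC01, while U+AC01 + U+11A8 returns `None` instead of drifting to a neighbouring syllable block.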

0 commit comments