[css-text-3] Allow breaking anywhere when dictionary is missing for SEA scripts #4284
Comments
Note: This has been a problem with Javanese.
Well, perhaps not anywhere. Grapheme clusters would likely be an appropriate minimum. For Javanese and other scripts, stacked consonants should not be split, and the natural unit for line-breaking is otherwise the syllable. I'm not completely sure about the rules for splitting left-side vowel signs from the rest of the syllable in scripts such as Thai and Lao (where these are not combining characters).
I think that in Lao, there should be a way to detect the syllables. ICU supports word boundary analysis/word tokenization, so this is less of an issue now, except in the few applications not using ICU (Firefox doesn't yet use ICU's line-breaker, though it seems to use other parts of ICU). But if we want to be safe and not assume that the browser has ICU support, it would be much more desirable to break at the syllable rather than cutting one in half. Some vowels in Lao (Thai, Khmer, Burmese) wrap around the nuclear consonant, so if you break at the wrong place, the vowel is cut in half across two lines (very difficult to read). There is a document I will link to that helps explain this...
This excerpt explains the 'format' in which a Lao syllable is constructed. Syllable breaking is usable in Lao and many people tend to be OK with it, but from my understanding it isn't optimal, as it can make text more difficult to read than line-breaking based on word boundaries.
Not sure about panI10n.net rules but Lao syllable breaking has been fully implemented by Lao Script for Windows since about 1993 (usually using ZWSP insertion). Excluding loan words, there are few ambiguities, and they are easily managed by not allowing a break if it would be ambiguous. For Lao, as well as keeping grapheme clusters together, a break should never be allowed after a prefix vowel or before U+0EB2 LAO VOWEL SIGN AA, or either before or after U+0EBD LAO SEMIVOWEL SIGN NYO. Thai syllable breaking is much more difficult and requires a moderately large dictionary to be effective.
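As a rough illustration of the Lao restrictions listed in the comment above, here is a minimal Python sketch that encodes only those prohibitions. The function name `lao_break_forbidden` is hypothetical, treating U+0EC0 through U+0EC4 as the prefix vowels is an assumption made for this example, and grapheme clusters are only approximated by refusing to break before a combining mark. It is not a syllable breaker and says nothing about how Lao Script for Windows actually works.

```python
import unicodedata

# Assumption for this sketch: the Lao prefix vowels are U+0EC0..U+0EC4.
LAO_PREFIX_VOWELS = {chr(cp) for cp in range(0x0EC0, 0x0EC5)}
LAO_VOWEL_SIGN_AA = "\u0EB2"
LAO_SEMIVOWEL_SIGN_NYO = "\u0EBD"
MARK_CATEGORIES = ("Mn", "Mc", "Me")


def lao_break_forbidden(text: str, i: int) -> bool:
    """Return True if a break between text[i-1] and text[i] is ruled out.

    Only the prohibitions named in the comment are checked; a False result
    does not mean the position is a good syllable boundary.
    """
    if i <= 0 or i >= len(text):
        return True  # no break opportunity at the edges of the text
    before, after = text[i - 1], text[i]
    # Approximate "keep grapheme clusters together": never break before a mark.
    if unicodedata.category(after) in MARK_CATEGORIES:
        return True
    # Never break after a prefix vowel.
    if before in LAO_PREFIX_VOWELS:
        return True
    # Never break before U+0EB2 LAO VOWEL SIGN AA.
    if after == LAO_VOWEL_SIGN_AA:
        return True
    # Never break before or after U+0EBD LAO SEMIVOWEL SIGN NYO.
    if LAO_SEMIVOWEL_SIGN_NYO in (before, after):
        return True
    return False
```

A caller would still need a real syllable or word analysis to choose among the positions this function leaves open.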
Just want to be clear, this issue isn't about where to break correctly. It's what to do if you don't have the ability to break correctly.
The proposal is "if you don't know where it's allowed to break, break somewhere, anywhere, instead of overflowing the box by not breaking at all". Because "hard to read" is better than "clipped and therefore unreadable".
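To make the proposal concrete, here is a small Python sketch of the fallback behavior: when no dictionary or rule set is available, break anywhere between clusters rather than overflow. The helper names `approx_clusters` and `wrap_fallback` are hypothetical, the line width is measured in clusters rather than in rendered width, and clusters are approximated as a base character plus any following marks (Unicode categories Mn, Mc, Me), which is simpler than the real grapheme cluster rules.

```python
import unicodedata

MARK_CATEGORIES = ("Mn", "Mc", "Me")


def approx_clusters(text: str) -> list[str]:
    """Group each base character with the marks that follow it.

    This is only an approximation of grapheme clusters, used here to
    avoid separating a combining mark from its base.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch) in MARK_CATEGORIES:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def wrap_fallback(text: str, width: int) -> list[str]:
    """Break anywhere between clusters so no line exceeds `width` clusters."""
    if width < 1:
        raise ValueError("width must be at least 1")
    clusters = approx_clusters(text)
    return [
        "".join(clusters[i:i + width])
        for i in range(0, len(clusters), width)
    ]
```

The resulting breaks may land in the middle of a syllable or word, which is exactly the trade-off described above: harder to read, but nothing is clipped.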
The CSS Working Group just discussed
The full IRC log of that discussion
<dael> Topic: Allow breaking anywhere when dictionary is missing for SEA scripts
<dael> github: https://github.com//issues/4284
<dael> fantasai: Certain languages where breakpoint is not obvious from character code. Have to do analysis. If you do not have the dictionary or rules in the engine you don't break the text and it'll be long and overflow. I suggest saying if you don't know how to break then you should break somewhere. Doesn't matter where but between grapheme clusters. Have to have break opportunities
<dael> myles: Did you mean must?
<dael> fantasai: Yeah
<dael> fantasai: Proposal to add that. Discussion in issue about where to break in languages, but this is about what happens when UA doesn't have rules.
<dael> florian: I think saying you must break somewhere and not in the middle of a grapheme cluster. If you can do mid analysis with meaningful unit of breaking do that. But must break and not break grapheme clusters
<dael> myles: How does browser know which scripts?
<dael> fantasai: There's a classification, let me see.
<fantasai> http://unicode.org/reports/tr14/#SA
<dael> fantasai: Class SA is complex context dependent. If you're one of these scripts and don't have a resource to tell you where to break you should break somewhere
<dael> myles: As long as spec says that this is fine
<dael> fantasai: Okay
<dael> astearns: Other concerns?
<dael> fantasai: Prop: If there is a language for which you do not know the breaking rules, rather than treating as unbreakable you treat it as breakable anywhere
<dael> astearns: And something about not breaking through grapheme cluster?
<dael> fantasai: Yes. If we copy from overflow: anywhere that comes
<dael> RESOLVED: If there is a language for which you do not know the breaking rules, rather than treating as unbreakable you treat it as breakable anywhere similar to overflow:anywhere
Some languages require analysis of the text to determine where line breaking should occur. Even when the Browser is unable to do that, it should still line break somewhere, as overflowing is a worse problem. See w3c/csswg-drafts#4284
…aking in i18n situations, a=testonly Automatic update from web-platform-tests Add tests ensuring some form of line breaking in i18n situations Some languages require analysis of the text to determine where line breaking should occur. Even when the Browser is unable to do that, it should still line break somewhere, as overflowing is a worse problem. See w3c/csswg-drafts#4284 -- wpt-commits: 5203cae896dc55f9835769137af5208b5a2f2bad wpt-pr: 37703
For scripts that require dictionary breaking or some other morphological analysis, if the resource is missing and the UA can't break the text, it should be allowed to break anywhere instead of overflowing.