[css-text-3] Allow breaking anywhere when dictionary is missing for SEA scripts #4284

Closed
fantasai opened this issue Sep 8, 2019 · 8 comments
Labels
Closed Accepted by CSSWG Resolution css-text-3 Current Work i18n-sealreq Southeast Asian language enablement i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. Tested Memory aid - issue has WPT tests Tracked in DoC

Comments

@fantasai
Collaborator
fantasai commented Sep 8, 2019

For scripts that require dictionary breaking or some other morphological analysis, if the resource is missing and the UA can't break the text, it should be allowed to break anywhere instead of overflowing.

@fantasai fantasai added the css-text-3 Current Work label Sep 8, 2019
@fantasai
Collaborator Author
fantasai commented Sep 8, 2019

Note: This has been a problem with Javanese.

@fantasai fantasai added i18n-sealreq Southeast Asian language enablement i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. Agenda+ labels Sep 8, 2019
@r12a
Contributor
r12a commented Sep 12, 2019

Well, perhaps not anywhere. Grapheme clusters would likely be an appropriate minimum. For Javanese and other scripts, stacked consonants should not be split, and the natural unit for line breaking is otherwise the syllable. I'm not completely sure about the rules for splitting left-positioned vowel signs from the rest of the syllable in scripts such as Thai and Lao (where these are not combining characters).
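
As a rough illustration of that minimum unit, here is a sketch in Python. It is an approximation rather than the UAX #29 grapheme cluster algorithm: it attaches combining marks (general categories Mn, Mc, Me) to the preceding character, and it assumes that a Javanese consonant directly after U+A9C0 JAVANESE PANGKON stacks under the previous consonant. The function name `unbreakable_units` is hypothetical.

```python
import unicodedata

JAVANESE_PANGKON = "\ua9c0"  # virama-like sign; assumed to stack the next consonant


def is_javanese_consonant(ch):
    # U+A98F (KA) through U+A9B2 (HA) are the Javanese consonant letters.
    return "\ua98f" <= ch <= "\ua9b2"


def unbreakable_units(text):
    """Split text into units that a fallback line breaker should keep whole.

    Approximation only: a unit is a base character plus any following
    combining marks, and a Javanese consonant that follows a pangkon is
    kept with the preceding unit so that a stack is never split.
    """
    units = []
    for ch in text:
        is_mark = ch == JAVANESE_PANGKON or unicodedata.category(ch) in ("Mn", "Mc", "Me")
        joins_stack = (
            bool(units)
            and units[-1].endswith(JAVANESE_PANGKON)
            and is_javanese_consonant(ch)
        )
        if units and (is_mark or joins_stack):
            units[-1] += ch
        else:
            units.append(ch)
    return units


# KA + PANGKON + SA + VOWEL SIGN WULU: one stacked unit, no break inside it.
print(len(unbreakable_units("\ua98f\ua9c0\ua9b1\ua9b6")))
# Thai KO KAI + SARA I + NO NU: the vowel sign stays with its consonant.
print(len(unbreakable_units("\u0e01\u0e34\u0e19")))
```

Prefixed vowels in Thai and Lao are ordinary letters, not combining marks, so this sketch does not keep them with the following consonant; that is the open question raised above.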

@rcampbellbassac

I think that in Lao, there should be a way to detect the syllables. ICU supports word boundary analysis/word tokenization, so this is less of an issue now except on the few applications not using ICU (Firefox doesn't yet use ICU's line-breaker, though they use other parts of ICU it seems).

But, if we want to be safe, and not assume that the browser has ICU support, it would be much more desirable to break at the syllable, rather than cutting one in half. Some vowels in Lao (Thai, Khmer, Burmese) wrap around the nuclear consonant, so if you break it at the wrong place, it cuts your vowel in half between 2 lines (very difficult to read). There is a document I will link to that helps explain this...

panl10n.net Syllabification of Lao Script for Line Breaking

@rcampbellbassac

[image: excerpt on Lao syllable structure]

This excerpt explains the 'format' in which a Lao syllable is constructed.

Syllable breaking is usable in Lao and many people tend to be OK with it, but ultimately it isn't optimal, as it can render text more difficult to read than if the line-breaking is based on word boundaries, from my understanding.

@jmdurdin

Not sure about the panl10n.net rules, but Lao syllable breaking has been fully implemented by Lao Script for Windows since about 1993 (usually using ZWSP insertion). Excluding loan words, there are few ambiguities, and they are easily managed by not allowing a break where it would be ambiguous. For Lao, as well as keeping grapheme clusters together, a break should never be allowed after a prefix vowel, before U+0EB2 LAO VOWEL SIGN AA, or either before or after U+0EBD LAO SEMIVOWEL SIGN NYO. Thai syllable breaking is much more difficult and requires a moderately large dictionary to be effective.
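
The prohibitions listed above can be sketched in Python as follows. This is only an illustration of those three rules plus an approximate "keep clusters together" check; it is not the Lao Script for Windows algorithm, and the positions it returns are merely not ruled out, they are not guaranteed syllable boundaries. The function names and the choice of U+0EC0 to U+0EC4 as the prefix vowels are assumptions made for this sketch.

```python
import unicodedata

# Assumption: the prefix (prepended) vowels are U+0EC0..U+0EC4.
LAO_PREFIX_VOWELS = {"\u0ec0", "\u0ec1", "\u0ec2", "\u0ec3", "\u0ec4"}
LAO_VOWEL_SIGN_AA = "\u0eb2"
LAO_VOWEL_SIGN_AM = "\u0eb3"  # a letter by category, but it extends the cluster
LAO_SEMIVOWEL_SIGN_NYO = "\u0ebd"


def lao_break_forbidden(before, after):
    """Return True if a break between `before` and `after` is ruled out."""
    # Approximate cluster check: never separate a mark from its base.
    if unicodedata.category(after) in ("Mn", "Mc", "Me") or after == LAO_VOWEL_SIGN_AM:
        return True
    if before in LAO_PREFIX_VOWELS:       # never after a prefix vowel
        return True
    if after == LAO_VOWEL_SIGN_AA:        # never before VOWEL SIGN AA
        return True
    if LAO_SEMIVOWEL_SIGN_NYO in (before, after):  # never before or after NYO
        return True
    return False


def lao_candidate_breaks(text):
    """Indices i such that a break between text[i-1] and text[i] is not ruled out."""
    return [
        i
        for i in range(1, len(text))
        if not lao_break_forbidden(text[i - 1], text[i])
    ]


# PHO TAM + AA + SO SUNG + AA: only the middle position survives the rules.
print(lao_candidate_breaks("\u0e9e\u0eb2\u0eaa\u0eb2"))  # [2]
```

A real syllable breaker would also need positive rules for where a syllable ends; these checks only filter out positions that are always wrong.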

@fantasai
Collaborator Author
fantasai commented Oct 9, 2019

Just want to be clear, this issue isn't about where to break correctly. It's what to do if you don't have the ability to break correctly.

@fantasai
Collaborator Author
fantasai commented Oct 9, 2019

The proposal is "if you don't know where it's allowed to break, break somewhere, anywhere, instead of overflowing the box by not breaking at all". Because "hard to read" is better than "clipped and therefore unreadable".

@css-meeting-bot
Member

The CSS Working Group just discussed Allow breaking anywhere when dictionary is missing for SEA scripts, and agreed to the following:

  • RESOLVED: If there is a language for which you do not know the breaking rules, rather than treating it as unbreakable, treat it as breakable anywhere, similar to overflow-wrap: anywhere
The full IRC log of that discussion
<dael> Topic: Allow breaking anywhere when dictionary is missing for SEA scripts
<dael> github: https://github.com//issues/4284
<dael> fantasai: Certain lang where breakpoint not obvious from character code. Have to do analysis. If you do not have the dictionary or rules in the engine you don't break the text and it'll be long and overflow. I suggest saying if you don't know how to break then you should break somewhere. Doesn't matter where but between grapheme clusters. Have to have break opportunities
<dael> myles: Did you mean must?
<dael> fantasai: Yeah
<dael> fantasai: Proposal to add that. Discussion in issue about where to break in languages, but this is about what to happen when UA doesn't have rules.
<dael> florian: I think saying you must break somewhere and not middle of grapheme cluster. If you can do mid analysis with meaningful unit of breaking do that. But must break and not break grapheme clusters
<dael> myles: How does browser know which scripts?
<dael> fantasai: There's a classification, let me see.
<fantasai> http://unicode.org/reports/tr14/#SA
<dael> fantasai: Class SA is complex context dependent. If you're one of these scripts and don't have a resource to tell you where to break you should break somewhere
<dael> myles: As long as spec says that this is fine
<dael> fantasai: Okay
<dael> astearns: Other concerns?
<dael> fantasai: Prop: If there is a language for which you do not know the breaking rules, rather than treating it as unbreakable, you treat it as breakable anywhere
<dael> astearns: And something about not breaking through grapheme cluster?
<dael> fantasai: Yes. If we copy from overflow-wrap: anywhere that comes
<dael> RESOLVED: If there is a language for which you do not know the breaking rules, rather than treating it as unbreakable, treat it as breakable anywhere, similar to overflow-wrap: anywhere
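
The behavior resolved above can be sketched as a greedy line filler in Python. This is a toy model, not how any browser implements it: width is counted in clusters rather than measured glyphs, clusters are approximated by attaching combining marks to the preceding character, and `segment_words` is a hypothetical stand-in for a dictionary-based segmenter that may be missing.

```python
import unicodedata


def clusters(text):
    """Approximate grapheme clusters: base character plus combining marks."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def wrap(text, width, segment_words=None):
    """Greedy line filling; `width` is counted in clusters.

    segment_words: hypothetical dictionary-based segmenter returning a list
    of words, or None when no such resource is available.
    """
    if segment_words is None:
        units = clusters(text)        # fallback: break at any cluster boundary
    else:
        units = segment_words(text)   # preferred: break at word boundaries
    lines, current, used = [], "", 0
    for unit in units:
        size = len(clusters(unit))
        if current and used + size > width:
            lines.append(current)
            current, used = "", 0
        # Note: a single word longer than `width` still overflows in this sketch.
        current += unit
        used += size
    if current:
        lines.append(current)
    return lines


thai = "\u0e01\u0e34\u0e19\u0e02\u0e49\u0e32\u0e27"  # seven code points, five clusters
for line in wrap(thai, 2):
    print(line)
```

Without a segmenter the lines break at arbitrary cluster boundaries, which is harder to read than word-based breaking but keeps every line within the available width and never separates a mark from its base.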

@frivoal frivoal self-assigned this Oct 9, 2019
frivoal added a commit that referenced this issue Nov 5, 2019
frivoal added a commit that referenced this issue Nov 6, 2019
@frivoal frivoal closed this as completed Nov 6, 2019
@frivoal frivoal removed their assignment Dec 4, 2020
frivoal added a commit to web-platform-tests/wpt that referenced this issue Dec 30, 2022
Some languages require analysis of the text to determine where line
breaking should occur. Even when the Browser is unable to do that, it
should still line break somewhere, as overflowing is a worse problem.

See w3c/csswg-drafts#4284
@frivoal frivoal added Tested Memory aid - issue has WPT tests and removed Needs Testcase (WPT) labels Dec 30, 2022
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jan 5, 2023
…aking in i18n situations, a=testonly

Automatic update from web-platform-tests
Add tests ensuring some form of line breaking in i18n situations

Some languages require analysis of the text to determine where line
breaking should occur. Even when the Browser is unable to do that, it
should still line break somewhere, as overflowing is a worse problem.

See w3c/csswg-drafts#4284

--

wpt-commits: 5203cae896dc55f9835769137af5208b5a2f2bad
wpt-pr: 37703
jamienicol pushed a commit to jamienicol/gecko that referenced this issue Jan 13, 2023

6 participants