Refactor registry locale overrides by eemeli · Pull Request #534 · unicode-org/message-format-wg · GitHub
Refactor registry locale overrides #534

Merged: 8 commits, Dec 5, 2023
16 changes: 9 additions & 7 deletions spec/registry.dtd
@@ -13,16 +13,12 @@
 regex CDATA #REQUIRED
 >

-<!ELEMENT formatSignature (input?,option*)>
+<!ELEMENT formatSignature (input?,option*,override*)>
 <!ATTLIST formatSignature
 position (open|close|standalone) "standalone"
-locales NMTOKENS #IMPLIED
 >

-<!ELEMENT matchSignature (input?,option*,match*)>
-<!ATTLIST matchSignature
-locales NMTOKENS #IMPLIED
->
+<!ELEMENT matchSignature (input?,option*,match*,override*)>

 <!ELEMENT input EMPTY>
 <!ATTLIST input
@@ -43,11 +39,17 @@

 <!ELEMENT match EMPTY>
 <!ATTLIST match
+locales NMTOKENS #IMPLIED
 values NMTOKENS #IMPLIED
 validationRule IDREF #IMPLIED
 >

-<!ELEMENT alias (description,setOption*)>
+<!ELEMENT override (input?,option*)>
+<!ATTLIST override
+locales NMTOKENS #REQUIRED
+>
+
+<!ELEMENT alias (description,setOption*,override*)>
 <!ATTLIST alias
 name NMTOKEN #REQUIRED
 supports (format|match|all) "all"
67 changes: 40 additions & 27 deletions spec/registry.md
@@ -50,8 +50,6 @@ A _signature_ defines one particular set of at most one argument and any number
 that can be used together in a single call to the function.
 `<formatSignature>` corresponds to a function call inside a placeholder inside translatable text.
 `<matchSignature>` corresponds to a function call inside a selector.
-Signatures with a non-empty `locales` attribute are locale-specific
-and only available in translations in the given languages.

 A signature may define the positional argument of the function with the `<input>` element.
 If the `<input>` element is not present, the function is defined as a nullary function.
@@ -62,6 +60,12 @@ They accept either a finite enumeration of values (the `values` attribute)
 or validate their input with a regular expression (the `validationRule` attribute).
 Read-only options (the `readonly` attribute) can be displayed to translators in CAT tools, but may not be edited.

+As the `<input>` and `<option>` rules may be locale-dependent,
+each signature can include an `<override locales="...">` that extends and overrides
+the corresponding input and options rules.
+If multiple `<override>` elements would match the current locale,
+only the first one is used.
+
 Matching-function signatures additionally include one or more `<match>` elements
 to define the keys against which they can match when used as selectors.
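
Editorial note (not part of the diff): the override rule added in this hunk can be sketched in Python as follows. This is only an illustration under stated assumptions. The `Override` class, `parse_locales`, `applies_to`, and `resolve_options` are hypothetical names, only option rules are modelled (not `<input>`), and the hunk does not define what it means for an override to "match the current locale", so the prefix test below is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Override:
    """Hypothetical in-memory form of an <override locales="..."> element."""
    locales: list[str]
    # Option name -> space-separated allowed values, as in the registry XML.
    options: dict[str, str] = field(default_factory=dict)


def parse_locales(attr: str) -> list[str]:
    # NMTOKENS is a whitespace-separated list of tokens.
    return attr.split()


def applies_to(override: Override, locale: str) -> bool:
    # Assumption: a listed tag matches the locale itself and any locale
    # that extends it with further subtags (e.g. "en" matches "en-GB").
    return any(
        locale == tag or locale.startswith(tag + "-")
        for tag in override.locales
    )


def resolve_options(base: dict[str, str], overrides: list[Override], locale: str) -> dict[str, str]:
    # Only the first applicable override is used; it extends the base
    # option rules and replaces any rule with the same name.
    for override in overrides:
        if applies_to(override, locale):
            return {**base, **override.options}
    return dict(base)
```

With overrides listed for `"en fr"` and then `"en-GB"`, a lookup for `en-GB` would use the first one, since document order decides under this reading rather than specificity.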

@@ -98,7 +102,7 @@ For the sake of brevity, only `locales="en"` is considered.
 Match a **formatted** numerical value against CLDR plural categories or against a number literal.
 </description>

-<matchSignature locales="en">
+<matchSignature>
 <input validationRule="anyNumber"/>
 <option name="type" values="cardinal ordinal"/>
 <option name="minimumIntegerDigits" validationRule="positiveInteger"/>
@@ -107,11 +111,11 @@ For the sake of brevity, only `locales="en"` is considered.
 <option name="minimumSignificantDigits" validationRule="positiveInteger"/>
 <option name="maximumSignificantDigits" validationRule="positiveInteger"/>
 <!-- Since this applies to both cardinal and ordinal, all plural options are valid. -->
-<match values="zero one two few many"/>
-<match validationRule="anyNumber"/>
+<match locales="en" values="one two few other" validationRule="anyNumber"/>
+<match values="zero one two few many other" validationRule="anyNumber"/>
Review comment on lines +114 to +115 (Member):

`other` or `*`?

`locales="en"` looks odd. Can `locales` contain multiple values? Space separated, I guess?

This has always struck me as a wobbly mechanism. Many locales have identical plural rules to English. Are we going to list them? Why do we need to repeat CLDR data here? Perhaps we should have a referencing mechanism to CLDR instead of replicating data. Note that other things besides plurals depend on CLDR data, such as date patterns and such.

Reply from @eemeli (Collaborator, Author), Nov 27, 2023:

Explicitly `other`, because these are literal values and `*` would in source need to be matched as `|*|`. Also, explicitly included because a selector could differentiate the `other` and `*` cases so that the former deals with numeric input matching the `other` category, while `*` handles non-numeric inputs. Furthermore, `other` must be supported in general because if it's left out then we can't properly deal with Eastern European languages like Polish which would prefer to default to `many` instead of `other`.

`locales` is `NMTOKENS`, so it may include a space-separated list of locale identifiers.

I agree that specifically for cardinal plurals & ordinals the CLDR includes data that we would like to refer to. This PR is not providing a solution for that; it's about providing a friendlier way to define local locale-specific overrides.

 </matchSignature>

-<formatSignature locales="en">
+<formatSignature>
 <input validationRule="anyNumber"/>
 <option name="minimumIntegerDigits" validationRule="positiveInteger"/>
 <option name="minimumFractionDigits" validationRule="positiveInteger"/>
@@ -141,14 +145,17 @@ Given the above description, the `:number` function is defined to work both in a

 Furthermore,
 `:number`'s `<matchSignature>` contains two `<match>` elements
-which allow to validate the variant keys.
-If at least one `<match>` validation rules passes,
-a variant key is considered valid.
-
-- `<match validationRule="anyNumber"/>` can be used to valide the `when 1` variant
+which allow the validation of variant keys.
+The element whose `locales` best matches the current locale
+using resource item [lookup](https://unicode.org/reports/tr35/#Lookup) from LDML is used.
+An element with no `locales` attribute is the default
+(and is considered equivalent to the `root` locale).
+
+- `<match locales="en" values="one two few other" .../>` can be used in locales like `en` and `en-GB`
+to validate the `when other` variant by verifying that the `other` key is present
+in the list of enumarated values: `one other`.
+- `<match ... validationRule="anyNumber"/>` can be used to valide the `when 1` variant
 by testing the `1` key against the `anyNumber` regular expression defined in the registry file.
-- `<match values="one other"/>` can be used to valide the `when other` variant
-by verifying that the `other` key is present in the list of enumarated values: `one other`.

 ---
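
Editorial note (not part of the diff): the "best match" selection described in this hunk might look roughly like the Python sketch below. It assumes that lookup is plain truncation of subtags from the right, ending at `root`; the full LDML lookup rules may differ, and `fallback_chain` and `select_match` are hypothetical names. Each `<match>` element is modelled as a dict of its attributes.

```python
from typing import Optional


def fallback_chain(locale: str) -> list[str]:
    # Truncate subtags from the right: en-Latn-GB, en-Latn, en, then root.
    subtags = locale.split("-")
    chain = ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]
    return chain + ["root"]


def select_match(elements: list[dict], locale: str) -> Optional[dict]:
    # Walk from the most specific candidate to root and return the first
    # element that lists that candidate in its `locales` attribute.
    for candidate in fallback_chain(locale):
        for element in elements:
            # A missing `locales` attribute is treated as the `root` locale.
            locales = element.get("locales", "root").split()
            if candidate in locales:
                return element
    return None


# The two <match> elements from the :number example above.
number_matches = [
    {"locales": "en", "values": "one two few other", "validationRule": "anyNumber"},
    {"values": "zero one two few many other", "validationRule": "anyNumber"},
]
print(select_match(number_matches, "en-GB")["values"])
```

Unlike the first-match rule for `<override>`, this selection does not depend on document order between a locale-specific element and the default: the default is only reached once every more specific candidate has been tried.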

@@ -161,26 +168,32 @@ A localization engineer can then extend the registry by defining the following `
 <registry>
 <function name="noun">
 <description>Handle the grammar of a noun.</description>
-<formatSignature locales="en">
-<input/>
-<option name="article" values="definite indefinite"/>
-<option name="plural" values="one other"/>
-<option name="case" values="nominative genitive" default="nominative"/>
+<formatSignature>
+<override locales="en">
+<input/>
+<option name="article" values="definite indefinite"/>
+<option name="plural" values="one other"/>
+<option name="case" values="nominative genitive" default="nominative"/>
+</override>
 </formatSignature>
 </function>

 <function name="adjective">
 <description>Handle the grammar of an adjective.</description>
-<formatSignature locales="en">
-<input/>
-<option name="article" values="definite indefinite"/>
-<option name="plural" values="one other"/>
-<option name="case" values="nominative genitive" default="nominative"/>
+<formatSignature>
+<override locales="en">
+<input/>
+<option name="article" values="definite indefinite"/>
+<option name="plural" values="one other"/>
+<option name="case" values="nominative genitive" default="nominative"/>
+</override>
 </formatSignature>
-<formatSignature locales="en">
-<input/>
-<option name="article" values="definite indefinite"/>
-<option name="accord"/>
+<formatSignature>
+<override locales="en">
+<input/>
+<option name="article" values="definite indefinite"/>
+<option name="accord"/>
+</override>
 </formatSignature>
 </function>
 </registry>