Drop machine-readable registry definition from spec (#815) · unicode-org/message-format-wg@e4508a4 · GitHub

Commit e4508a4

Drop machine-readable registry definition from spec (#815)
* Add exploration/registry-xml/
* Drop default registry definition from exploration/registry-xml/ text
* Drop registry.xml definition from spec/registry.md text
1 parent ffb69ac commit e4508a4

5 files changed

+192
-285
lines changed

exploration/registry-xml/README.md

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
1+
# MessageFormat 2.0 Registry
2+
3+
Implementations and tooling can greatly benefit from a
4+
structured definition of formatting and matching functions available to messages at runtime.
5+
6+
> [!IMPORTANT]
7+
> This definition was initially developed to be a part of the MessageFormat 2.0 specification,
8+
> but has been left out in favor of less structured definitions of message functions
9+
> and an expectation that real-world experience with tooling will be able to inform
10+
> later considerations to return to this topic.
11+
12+
## Goals
13+
14+
The registry provides a description of MessageFormat 2 functions,
15+
in order to support the following goals and use-cases:
16+
17+
- Validate semantic properties of messages. For example:
18+
- Type-check values passed into functions.
19+
- Validate that matching functions are only called in selectors.
20+
- Validate that formatting functions are only called in placeholders.
21+
- Verify the exhaustiveness of variant keys given a selector.
22+
- Support the localization roundtrip. For example:
23+
- Generate variant keys for a given locale during XLIFF extraction.
24+
- Improve the authoring experience. For example:
25+
- Forbid edits to certain function options (e.g. currency options).
26+
- Autocomplete function and option names.
27+
- Display on-hover tooltips for function signatures with documentation.
28+
- Display/edit known message metadata.
29+
- Restrict input in GUI by providing a dropdown with all viable option values.
30+
31+
## Conformance and Use
32+
33+
Implementations are not required to provide a machine-readable registry
34+
nor to read or interpret the registry data model in order to be conformant.
35+
36+
The MessageFormat 2.0 Registry was created to describe
37+
the core set of formatting and selection _functions_,
38+
including _operands_, _options_, and _option_ values.
39+
This is the minimum set of functionality needed for conformance.
40+
By using the same names and values, _messages_ can be used interchangeably
41+
by different implementations,
42+
regardless of programming language or runtime environment.
43+
This ensures that developers do not have to relearn core MessageFormat syntax
44+
and functionality when moving between platforms
45+
and that translators do not need to know about the runtime environment for most
46+
selection or formatting operations.
47+
48+
The registry provides a machine-readable description of _functions_
49+
suitable for tools, such as those used in translation automation, so that
50+
variant expansion and information about available _options_ and their effects
51+
are available in the translation ecosystem.
52+
To that end, implementations are strongly encouraged to provide appropriately
53+
tailored versions of the registry for consumption by tools
54+
(even if not included in software distributions)
55+
and to encourage any add-on or plug-in functionality to provide
56+
a registry to support localization tooling.
57+
58+
## Registry Data Model
59+
60+
MessageFormat 2 functions can be invoked in two contexts:
61+
62+
- inside placeholders, to produce a part of the message's formatted output;
63+
for example, a raw value of `|1.5|` may be formatted to `1,5` in a language which uses commas as decimal separators,
64+
- inside selectors, to contribute to selecting the appropriate variant among all given variants.
65+
66+
A single _function name_ may be used in both contexts,
67+
regardless of whether it's implemented as one or multiple functions.
68+
69+
A _signature_ defines one particular set of at most one argument and any number of named options
70+
that can be used together in a single call to the function.
71+
`<formatSignature>` corresponds to a function call inside a placeholder inside translatable text.
72+
`<matchSignature>` corresponds to a function call inside a selector.
73+
74+
A signature may define the positional argument of the function with the `<input>` element.
75+
If the `<input>` element is not present, the function is defined as a nullary function.
76+
A signature may also define one or more `<option>` elements representing _named options_ to the function.
77+
An option can be omitted in a call to the function,
78+
unless the `required` attribute is present.
79+
Options accept either a finite enumeration of values (the `values` attribute)
80+
or validate their input with a regular expression (the `validationRule` attribute).
81+
Read-only options (the `readonly` attribute) can be displayed to translators in CAT tools, but may not be edited.
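
The option rules above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory form of two `<option>` declarations and one `<validationRule>` (names borrowed from the example registry in this document), that `values` is a space-separated list, and that a regular expression must match the whole value. None of these assumptions is mandated by the text.

```python
import re

# Hypothetical in-memory form of one <validationRule> and two <option> elements.
VALIDATION_RULES = {"positiveInteger": r"[0-9]+"}

OPTIONS = {
    "type": {"values": "cardinal ordinal"},
    "minimumFractionDigits": {"validationRule": "positiveInteger"},
}


def option_value_is_valid(name: str, value: str) -> bool:
    """Check a value against the option's enumeration or regular expression."""
    spec = OPTIONS.get(name)
    if spec is None:
        return False  # option is not declared in this signature
    if "values" in spec:
        # Assumption: `values` is a space-separated list of accepted literals.
        return value in spec["values"].split()
    if "validationRule" in spec:
        pattern = VALIDATION_RULES[spec["validationRule"]]
        # Assumption: the regex has to match the entire value.
        return re.fullmatch(pattern, value) is not None
    return True  # no constraint declared for this option
```

A tool could call `option_value_is_valid("type", "ordinal")` before accepting an edit, and reject a value such as `"plural"` that is not in the enumeration.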
82+
83+
As the `<input>` and `<option>` rules may be locale-dependent,
84+
each signature can include an `<override locales="...">` that extends and overrides
85+
the corresponding input and options rules.
86+
If multiple `<override>` elements would match the current locale,
87+
only the first one is used.
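
A rough sketch of the "first matching override wins" rule follows. The text does not define how an `<override>` matches a locale, so this sketch assumes that `locales` is a space-separated list of tags and that a tag matches the locale itself or any locale it prefixes at a subtag boundary. The dictionaries and the `id` key are hypothetical stand-ins for parsed elements.

```python
from typing import Dict, List, Optional


def locale_matches(locale: str, tag: str) -> bool:
    # Assumption: a tag matches itself and any longer tag it prefixes,
    # e.g. "en" matches "en" and "en-GB" but not "eng".
    return locale == tag or locale.startswith(tag + "-")


def pick_override(overrides: List[Dict[str, str]], locale: str) -> Optional[Dict[str, str]]:
    """Return the first override in document order that matches the locale."""
    for override in overrides:
        tags = override["locales"].split()
        if any(locale_matches(locale, tag) for tag in tags):
            return override  # later matches are ignored
    return None


# Hypothetical overrides in document order; both would match "en-GB".
OVERRIDES = [
    {"locales": "en fr", "id": "first"},
    {"locales": "en-GB", "id": "second"},
]
```

Here `pick_override(OVERRIDES, "en-GB")` returns the override labeled `first`, even though the second one is more specific, because only document order is considered.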
88+
89+
Matching-function signatures additionally include one or more `<match>` elements
90+
to define the keys against which they can match when used as selectors.
91+
92+
Functions may also include `<alias>` definitions,
93+
which provide shorthands for commonly used option baskets.
94+
An _alias name_ may be used equivalently to a _function name_ in messages.
95+
Its `<setOption>` values are always set, and may not be overridden in message annotations.
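
As an illustration, the following sketch expands an alias into its fixed options. It uses the `integer` alias from the example registry in this document, and it assumes that an attempt to override a fixed option is reported as an error. The text only says such options may not be overridden, so the error behavior is a choice made for this sketch.

```python
from typing import Dict

# Options fixed by the `integer` alias in the example registry of this document.
INTEGER_ALIAS = {"maximumFractionDigits": "0", "style": "decimal"}


def resolve_alias_options(alias_options: Dict[str, str],
                          message_options: Dict[str, str]) -> Dict[str, str]:
    """Combine the options written in the message with those fixed by the alias."""
    conflicts = sorted(set(alias_options) & set(message_options))
    if conflicts:
        # Assumption: an attempted override is rejected rather than ignored.
        raise ValueError("options fixed by alias: " + ", ".join(conflicts))
    merged = dict(message_options)
    merged.update(alias_options)
    return merged
```

With this sketch, `{$n :integer minimumIntegerDigits=2}` would resolve to the three options `minimumIntegerDigits`, `maximumFractionDigits` and `style`, while `{$n :integer style=percent}` would be rejected.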
96+
97+
If a `<function>`, `<input>` or `<option>` includes multiple `<description>` elements,
98+
each SHOULD have a different `xml:lang` attribute value.
99+
This allows for the descriptions of these elements to be themselves localized
100+
according to the preferred locale of the message authors and editors.
101+
102+
## Example
103+
104+
The following `registry.xml` is an example of a registry file
105+
which may be provided by an implementation to describe its built-in functions.
106+
For the sake of brevity, only `locales="en"` is considered.
107+
108+
```xml
109+
<?xml version="1.0" encoding="UTF-8" ?>
110+
<!DOCTYPE registry SYSTEM "./registry.dtd">
111+
112+
<registry xml:lang="en">
113+
<function name="platform">
114+
<description>Match the current OS.</description>
115+
<matchSignature>
116+
<match values="windows linux macos android ios"/>
117+
</matchSignature>
118+
</function>
119+
120+
<validationRule id="anyNumber" regex="-?[0-9]+(\.[0-9]+)?"/>
121+
<validationRule id="positiveInteger" regex="[0-9]+"/>
122+
<validationRule id="currencyCode" regex="[A-Z]{3}"/>
123+
124+
<function name="number">
125+
<description>
126+
Format a number.
127+
Match a **formatted** numerical value against CLDR plural categories or against a number literal.
128+
</description>
129+
130+
<matchSignature>
131+
<input validationRule="anyNumber"/>
132+
<option name="type" values="cardinal ordinal"/>
133+
<option name="minimumIntegerDigits" validationRule="positiveInteger"/>
134+
<option name="minimumFractionDigits" validationRule="positiveInteger"/>
135+
<option name="maximumFractionDigits" validationRule="positiveInteger"/>
136+
<option name="minimumSignificantDigits" validationRule="positiveInteger"/>
137+
<option name="maximumSignificantDigits" validationRule="positiveInteger"/>
138+
<!-- Since this applies to both cardinal and ordinal, all plural options are valid. -->
139+
<match locales="en" values="one two few other" validationRule="anyNumber"/>
140+
<match values="zero one two few many other" validationRule="anyNumber"/>
141+
</matchSignature>
142+
143+
<formatSignature>
144+
<input validationRule="anyNumber"/>
145+
<option name="minimumIntegerDigits" validationRule="positiveInteger"/>
146+
<option name="minimumFractionDigits" validationRule="positiveInteger"/>
147+
<option name="maximumFractionDigits" validationRule="positiveInteger"/>
148+
<option name="minimumSignificantDigits" validationRule="positiveInteger"/>
149+
<option name="maximumSignificantDigits" validationRule="positiveInteger"/>
150+
<option name="style" readonly="true" values="decimal currency percent unit" default="decimal"/>
151+
<option name="currency" readonly="true" validationRule="currencyCode"/>
152+
</formatSignature>
153+
154+
<alias name="integer">
155+
<description>Locale-sensitive integral number formatting</description>
156+
<setOption name="maximumFractionDigits" value="0" />
157+
<setOption name="style" value="decimal" />
158+
</alias>
159+
</function>
160+
</registry>
161+
```
162+
163+
Given the above description, the `:number` function is defined to work both in a selector and a placeholder:
164+
165+
```
166+
.match {$count :number}
167+
1 {{One new message}}
168+
* {{{$count :number} new messages}}
169+
```
170+
171+
Furthermore,
172+
`:number`'s `<matchSignature>` contains two `<match>` elements
173+
which allow the validation of variant keys.
174+
The element whose `locales` best matches the current locale
175+
using resource item [lookup](https://unicode.org/reports/tr35/#Lookup) from LDML is used.
176+
An element with no `locales` attribute is the default
177+
(and is considered equivalent to the `root` locale).
178+
179+
- `<match locales="en" values="one two few other" .../>` can be used in locales like `en` and `en-GB`
180+
to validate the `when other` variant by verifying that the `other` key is present
181+
in the list of enumerated values: `one two few other`.
182+
- `<match ... validationRule="anyNumber"/>` can be used to validate the `when 1` variant
183+
by testing the `1` key against the `anyNumber` regular expression defined in the registry file.
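
The key validation described above could be sketched as follows, using only the Python standard library. Several points are assumptions of the sketch rather than part of the registry definition: locale lookup is simplified to truncating subtags from the right (`en-GB`, then `en`, then the default element), regular expressions must match the whole key, and the `anyNumber` pattern treats the fractional part as optional so that `1` matches.

```python
import re
import xml.etree.ElementTree as ET

# Reduced registry containing only what key validation needs.
REGISTRY_SNIPPET = r"""
<registry>
  <validationRule id="anyNumber" regex="-?[0-9]+(\.[0-9]+)?"/>
  <function name="number">
    <matchSignature>
      <match locales="en" values="one two few other" validationRule="anyNumber"/>
      <match values="zero one two few many other" validationRule="anyNumber"/>
    </matchSignature>
  </function>
</registry>
"""


def fallback_chain(locale):
    """Yield 'en-GB', 'en', then None (standing in for the default element)."""
    parts = locale.split("-")
    while parts:
        yield "-".join(parts)
        parts.pop()
    yield None


def select_match(match_elements, locale):
    """Pick the <match> element for a locale using simplified lookup."""
    for candidate in fallback_chain(locale):
        for element in match_elements:
            locales = element.get("locales")
            if candidate is None:
                if locales is None:
                    return element  # element without `locales` is the default
            elif locales is not None and candidate in locales.split():
                return element
    return None


def key_is_valid(root, function_name, locale, key):
    """Validate a variant key against enumerated values or a validation rule."""
    rules = {r.get("id"): r.get("regex") for r in root.findall("validationRule")}
    matches = root.findall(
        "function[@name='%s']/matchSignature/match" % function_name
    )
    element = select_match(matches, locale)
    if element is None:
        return False
    if key in (element.get("values") or "").split():
        return True
    rule = element.get("validationRule")
    return rule is not None and re.fullmatch(rules[rule], key) is not None


root = ET.fromstring(REGISTRY_SNIPPET)
```

For example, `key_is_valid(root, "number", "en-GB", "other")` accepts the key through the enumerated values of the `en` element, and `key_is_valid(root, "number", "en-GB", "1")` accepts it through the `anyNumber` rule.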

spec/registry.dtd renamed to exploration/registry-xml/registry.dtd

Lines changed: 2 additions & 4 deletions
@@ -1,7 +1,5 @@
1-
<!--
2-
This DTD is not part of the LDML45 Tech Preview of MessageFormat 2.
3-
Comments on this DTD are welcome.
4-
-->
1+
<!-- This DTD is not part of the MessageFormat 2 specification. -->
2+
53
<!ELEMENT registry (function|validationRule)*>
64
<!ATTLIST registry
75
xml:lang NMTOKEN #IMPLIED

spec/registry.xml renamed to exploration/registry-xml/registry.xml

Lines changed: 3 additions & 6 deletions
@@ -1,11 +1,8 @@
11
<?xml version="1.0" encoding="UTF-8"?>
22
<?xml-model href="registry.dtd" type="application/xml-dtd"?>
3-
<!--
4-
This registry is not part of the LDML45 Tech Preview of MessageFormat 2.
5-
Comments on the contents of this registry are welcome as we seek to
6-
finalize the registry descriptions as part of the stable release
7-
in LDML46.
8-
-->
3+
4+
<!-- This registry is not part of the MessageFormat 2 specification. -->
5+
96
<registry xml:lang="en">
107
<!-- All regex here are to be seen as provisory. See issue #422. -->
118
<validationRule id="anyNumber" regex="-?(0|([1-9]\d*))(\.\d*)?([eE][-+]?\d+)?"/>

spec/README.md

Lines changed: 1 addition & 2 deletions
@@ -16,8 +16,7 @@
1616
1. [Data Model Errors](errors.md#data-model-errors)
1717
1. [Resolution Errors](errors.md#resolution-errors)
1818
1. [Message Function Errors](errors.md#message-function-errors)
19-
1. [Registry](registry.md)
20-
1. [`registry.dtd`](registry.dtd)
19+
1. [Default Function Registry](registry.md)
2120
1. [Formatting](formatting.md)
2221
1. [Interchange data model](data-model/README.md)
2322
