8000 Create notes-2021-06-21.md · unicode-org/message-format-wg@b1a69ce · GitHub
[go: up one dir, main page]

Skip to content

Commit b1a69ce

Browse files
committed
Create notes-2021-06-21.md
Closes #177
1 parent 7d2df82 commit b1a69ce

File tree

1 file changed

+216
-0
lines changed

1 file changed

+216
-0
lines changed

meetings/2021/notes-2021-06-21.md

Lines changed: 216 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,216 @@
1+
[Automatic Transcription](https://docs.google.com/document/d/1WXgOd8FA_kcXzz4OABk1Y3asPHH8KoEaya0ebr6FaIM/edit?usp=sharing)
2+
3+
### June 21, meeting Attendees
4+
- David Filip - Huawei, OASIS XLIFF TC (DAF)
5+
- Eemeli Aro - OpenJSF (EAO)
6+
- Mihai Nita (MIH)
7+
- Romulo Cintra - CaixaBank (RCA)
8+
- George Rhoten - Apple (GWR)
9+
- Daniel Minor - Mozilla (DLM)
10+
- Elango Cheran - Google (ECH)
11+
- Nebojsa Ciric - Google (NEB)
12+
- Erik Nordin - Mozilla (ETN)
13+
- Luke Swartz - Google (LHS)(LHS)
14+
- Staś Małolepszy - Google (STA)
15+
- Standa Rygal - Expedia (STR)
16+
- Zibi Braniecki - Mozilla (ZBI)
17+
- Nicolas Bouvrette - Expedia (NIC)
18+
19+
20+
## MessageFormat Working Group Contacts:
21+
22+
- [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg)
23+
24+
## Next Meeting
25+
26+
July 5, 11am PST (6pm GMT) - Extended
27+
28+
29+
30+
### Moderator : Rômulo Cintra
31+
32+
RCA: Meeting is to help unblock progress on the data model.
33+
34+
## Unblocking Data model - Presentation @mihnita
35+
36+
[Document link](https://docs.google.com/document/d/1kVXGMfwNKwU8QiUvUKReGapUAOhwZYaWJUAI3NW06UA/edit )
37+
38+
MIH: We have been discussing this data model for a long time, and we are stuck because there are 2 different philosophical positions. It is like differing positions on a design principle as STA brought up a while ago. When discussing with EAO, we decided that we can reduce the differences to a couple of questions whose answers can more or less determine which model you prefer.
39+
40+
The 2 questions we decided are: 1) Using tree structures in the data model, and 2) Should the data model be more restrictive and provide extension points, or be more flexible and rely more on validation and linting?
41+
42+
For the first question about using tree structures, in the EZ model, there is a cycle in the "has a" field dependencies, thus creating recursion, which creates a tree structure.
43+
44+
In the EM model, there is no nesting, the most complex structures are arrays and maps.
45+
46+
The EM model is extended by registering functions that adhere to advertised interfaces. All functions are first-class citizens, whether provided or not, and users can bring their own functions.
47+
48+
Using trees to represent messages is not how people think when they write sentences (even if linguists use trees to diagram concepts). Trees make it difficult to map these structures to XLIFF and lother l10n tools like Translation Memory, CAT tool UI, validation.
49+
50+
The second question is should the data model be more restrictive and provide extension points, or be more flexible and rely more on validation and linting. It is reminiscent of programming languages that are statically typed vs. dynamically typed. Both options for the data model are valid, it is not about right or wrong.
51+
52+
To clarify, validation is about enforcing things that are defined as clearly right and wrong. Linting is not against the standard, but what is preferred. It is more complex.
53+
54+
Making the model more restrictive and asking developers to write custom functions is actually more beneficial, even for developers.
55+
56+
Linting does not easily constrain error-prone flexibility. JS is an example where the problems of extreme flexibility were solved by introducing more constraints in the standard in ES5 and ES6. Relying more on the logic of a linting engine must happen on the backend (message authoring) and the client side (translators in a UI). The bottom line is that all this adds complexity.
57+
58+
## Unblocking Data model - Presentation @eemeli
59+
60+
[Slides link](https://docs.google.com/presentation/d/153q1UcCgfTQBJEZpxQiRbqYLrU8clkxmRvCJVC2BQTU/edit?usp=sharing)
61+
62+
EAO: A lot of the work that we have done has shown that we come from deeply different points of view. One question is whether trees can be used in the data model. This question doesn't attempt to solve everything. This acknowledges that both models are exceedingly similar. If we can get input from the wider group, we can unblock our forward progress.
63+
64+
Where the tree structure shows up in the data model is same as presented in the other doc:
65+
```
66+
Message -> Pattern -> Part[]
67+
Part = string | MessageReference | …
68+
Part -> MesssageReference -> Path
69+
70+
EM: path = Part[]
71+
EZ: path = string
72+
```
73+
74+
Example of how to represent HTML spanning codes. It is admittedly harder to convert a flat structure into a tree structure, but it is easier to convert a tree structure into a flat structure.
75+
76+
We're talking specifically about the data model, not everything about MF2 is and should be and should do. Each individual message is represented in multiple different ways (ex: Fluent representation, MessageFormat representation).
77+
78+
Validation will be needed at all stages of the message pipeline, no matter what you do, to ensure that changes of a message at one end won't cause issues at the oehter end.
79+
80+
Our expectations are for actual humans to not work directly with the dat amodle, but with other message representations.
81+
82+
If tree structures are used, there is greater ability to be flexible. More complex transformations between the data model and other message representations. Possible to represent messages that are not exactly or easily representable in other forms or systems. Future use cases are likely to be already supported by the data model.
83+
84+
The other question of should the data model be able to represent all possible messages. Restrictions on the EM data model make some messages impossible to represent in MessageFormat 2. Validation/linting will be required in all cases anyways. Are some ideas so bad that we need to make them impossible, or can we rely on recommended practice and linting?
85+
86+
What sort of workflows do we want to support. A flexible data model may make it easier for a message to change how it uses runtime variables without changing the code. See the example of changing the date range formatting function to be more precise.
87+
88+
How do we prepare for unanticipated uses? A flexible model allows us to handle future changes to the standard more gracefully.
89+
90+
91+
92+
## Q & A
93+
94+
RCA: Let's have Q&A right now, and reserve the last 10 minutes to revisit the question and come up with group conclusions.
95+
96+
STA: My first question is about the tree path. We are not trying to address variants under variants, or selectors under selectors. We are just trying to address the path
97+
98+
MIH: Parts of the path can be functions themselves.
99+
100+
STA: The evaluation of a value can be recursive, right?
101+
102+
MIH: Right.
103+
104+
STA: I'm not clear on the difference between hard limits and validation, in both of these presentations.
105+
106+
EAO: A message reference can take a message reference as an input and a string literal as another input. This can be represented in the EZ model but only supportable in a hacky way in the EM model. This creates a hard limit on the messages that can be represented in the models.
107+
108+
MIH: I wanted to yell “objection, your honor!” when it was suggested that there was a message that is impossible using our proposal...I haven’t found such an example yet. You can represent certain things in the tree as-is, but other things will need a function as a developer.
109+
110+
There is nothing that I saw in the models that couldn't be represented in the other model. Even with functions, you can create a new function in the EM model and register it in the registry. This can support any new requirement in the future -- just write a new function.
111+
112+
STA: This is helpful. One model has different types to represent functions, but the other model just takes strings and you just define a new function that are keyed by those strings.
113+
114+
ZBI: I have a question for the EM model. IIUC, there are 3 levels of rejections, so to say. The first level is what can be expressed in the data model. If the data model doesn't represent a concept, then you can pass thing
115+
116+
The second thing is what you call validation. It's analogous to saying that something doesn't conform to EBNF, and so it is proper. MIH, you talk about linting. Are what you saying is that with MF 2.0, you hope to make linting optional because validation occurs beforehand?
117+
118+
MIH: I will rephrase it my own way. Yes, I am saying that linting should be optional.
119+
120+
ZBI: Are you saying that the EZ model will make linting mandatory?
121+
122+
MIH: Yes.
123+
124+
ZBI: I don't agree, but thank you for clarifying.
125+
126+
GRH: I see benefits on both sides. In one way, you have flat rules and you have to expand for the entire sentence. The other side is described as trees, but I don't agree with that description, it's more about word relationships. If you have the preposition "on", the verb and gender can change based on aspects of the rest of the sentence. When it comes to word relationships, when you add more variables to a sentence, can you represent them all easily? The flat one might be duplicating more. I think if you go with a more segmented model, which may be more like the tree model, might be beneficial. When you define a segment, you can specify the grammatical case, or implicitly inject a grammatical case of the target language. Implicit vs explicit wasn't talked about. If something is not valid, it should be validated. The final thing is that it's not tree-based. If you get a quantity, there's a number and a noun. There is a grammatical case of the noun, but the case is affected by the number. But there are the word relationships, and it's more like a graph, and it's not clear in the proposals, but I support extension in some way.
127+
128+
MIH: I think that what GRH touched on here is core. When linguists deal with a sentence a tree doesn't map to a tree in MF. The EZ uses a tree model for the HTML snippet with bold and italic markup tags don't match to a tree. What GRH talked about with word relationships across a sentence is more a graph, which is also not a tree.
129+
130+
GRH: That's correct. It's a graph, hopefully a DAG.
131+
132+
EAO: This is a [link to the data model](https://github.com/messageformat/messageformat/blob/mf2/packages/messageformat/src/mf2-features.test.ts#L434) that does transformation of __ to ___. You need to format-to-parts. You end up with structured data, and then you can apply a transformation. I didn't experience any problem with dealing with a graph. Flattening a graph should be really easy.
133+
134+
LHS: I want to thank the presenters and other members of their teams. This is the most clarity on what the dispute is. But I do want to propose some homework for the 4 of you. There are some fundamental disagreements on basic facts. When we try to answer questions of trees or not, linting or not, I can't have an opinion if we don't agree on basic facts. Can the 4 of you agree on some real examples that cannot be shown in the EM model. Maybe we can't get clarity in this specific meeting, but maybe the 4 of you can.
135+
136+
EAO: This [link](https://github.com/messageformat/messageformat/blob/mf2/packages/messageformat/src/mf2-features.test.ts#L50) gives a dynamic variable reference that I have posted on previous occasions that cannot be represented in the EM model.
137+
138+
LHS: Ideally, the 4 of you need to agree on the same examples. There seems to be a fundamental difference on basic facts. Without that agreement, it's hard for the rest of us to weigh in.
139+
140+
RCA: I think we stressed the models in the last few weeks to make this effort. I think I understand it. Do we agree to continue in this manner, or do we try to
141+
142+
MIH: A few weeks ago, we came up with a list of features to implement in both models and compare. For what is called "dynamic message references", I implemented it, EAO says that I didn't.
143+
144+
LHS: MIH your model leans on functions for flexibility rather than the model itself. How custom are these custom functions? Will they be shared across the industry? Will there be a core of functions? Is there the risk that the custom functions allow the users to do the bad things you hope to prevent?
145+
146+
MIH: The way I see it is that the custom functions being created by companies internally, and then contributing upstream to an open registry to be shared across the industry.
147+
148+
LHS: ANd how about allowing people to do bad things?
149+
150+
MIH: When you let users write their own functions, yes, they can. Once you write some bad functions and use it for a few target languages, you go back and fix teh function. But if you make mistakes in the tree model, then you have to go back and fix the structure of the messages themselves.
151+
152+
ECH: When I look at the EZ model, when you look at [link](https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/data_model/rust_eemeli/src/main.rs#L51) you resolve things, you go through the message structure, try to interpret a function (“this is a function”), line ~51: dispatch of the function is manual, if not a string literal, probably a function, what’s interesting: way to clean this up without hard-coded dispatch is an interface (or “trait”), if you do clean it up that way, what would the interface look like in the EZ model, since you don’t know how many arguments/types are in a function → that’s the tradeoff, it might come in the implementation.
153+
154+
On [slide 10 of Eemeli’s presentation](https://docs.google.com/presentation/d/153q1UcCgfTQBJEZpxQiRbqYLrU8clkxmRvCJVC2BQTU/edit?pli=1#slide=id.ge0d6495764_0_6), we have an example of flexibility...that’s the plus side, but the tradeoff is in the coding/implementation
155+
156+
EAO: I just posted the JS runtime interface: https://github.com/messageformat/messageformat/blob/mf2/packages/messageformat/src/data-model.ts#L211 (don’t look at the Rust, that was the first time I coded in Rust!) ...here you can see a map, with each of values being fixed strings, with sequence of arguments
157+
158+
ECH: That doesn’t satisfy me...it’s not really an interface to me
159+
160+
MIH: a comment about the dispatch idea, I don’t expect the “final code” to hardcode function names, you’d look them up somewhere...that’s easy in our model since the function names are strings, but in EZ it’s a path-array, so I’d have to look up that
161+
162+
EAO: Function references are strings, see line 141 (func is a string)
163+
164+
MIH: Not in the data model though?
165+
166+
EAO: This is exactly the data model.
167+
168+
ECH: 2 things in 1 concept: there’s this idea that we need to design flexibility for “things we don’t know”, that’s not how we do things in programming, and it puts the burden of proof on predicting all future requirements...libraries, apps, programming languages start from a small core, you jealously guard against feature creep; only for web frameworks do you want to have “everything but the kitchen sink” since you don’t want to force everyone to migrate to the next version
169+
170+
ZIB: we are not designing something that can *ever* be changed, so this is a big burden on us...if we’re designing a library that we can explore in production & then throw it away later, replace Angular and React, this is different: this is an incredible burden on us
171+
172+
[in chat: “Because we're putting it all into a network of standards that will define l10n industry for the next decade, I would say that a scenario in which we deploy MF2.0, standardize it and for any reason the World needs MF3.0 within a decade means we produced more negative effect on the industry than if we failed to create MF2.0 at all.”]
173+
174+
DAF: I think we've gone quite far in agreeing on the facts. There is the question of whether the EM model can support the previous use case. It can, it just needs a selector to make that possible. The last part where the 4 folks don't agree, on the topic of whether linting is required in the EZ model, I think it is, but we don't have time in the meeting, so we can disagree.
175+
176+
EAO: To remind, we are focusing on the data model. ___
177+
178+
DAF: On the last point on trees or graphs in the data model, when you allow for hierarchical structures in the data model, that is one thing. If you always want to be linear in the canonical syntax, the lesson learned from the XLIFF group to have a XLIFF model that is not tied to XML syntax, we found the linear model was good enough and it was easier.
179+
180+
NEB: One thing about what DAF and EAO said about the structure and the linting, yes you can design the syntax to be limited for linting, but will all users follow that? But the bigger point is that building something that is "10-20 years proof" is a hard thing to do. You don't have to design support for all of the placeholders that Siri, Google Assistant, and Alexa support. It was not my expectation at the beginning of this project that we cover all corner cases of messages, it is not reasonable for translators to cover such cases. I haven't heard from MIH and EAO about examples which the other model cannot support. The localization industry is slow to change and has a lot of money invested, so they will not want to change their processes to handle changes to the data model.
181+
182+
RCA: To wrap up and finish, as an action, the Chair Group can create a plan for the group of 4 to get together.
183+
184+
185+
186+
187+
188+
fun1($var_ref)
189+
fun2(msg_ref)
190+
fun2(other($var_ref))
191+
192+
other_and_fun2($var_ref)
193+
194+
[“f1”, “f2”, “f3”], map_of_arguments
195+
f1(arg)
196+
f2(arg)
197+
f3(arg)
198+
199+
map_of_arguments global;
200+
f1()
201+
f2()
202+
f3()
203+
204+
args: {“theVar”: $var_ref}
205+
206+
[other, func2]
207+
fun2(other($var_ref))
208+
really(not_possible_in_EM($var1, $var2), “other input”)
209+
210+
EM:
211+
any temp = not_possible_in_EM($var1, $var2)
212+
really(temp, “other input”)
213+
214+
EZ:
215+
really([&not_possible_in_EM, $var, $var2], “other input”)
216+

0 commit comments

Comments
 (0)
0