Clarifications needed on the XLIFF mapping · Issue #169 · unicode-org/message-format-wg · GitHub
Clarifications needed on the XLIFF mapping #169
Open
@eemeli

While working on implementing the transformations between the data model and XLIFF, it's become obvious that we need to specify a bit better the text of our third deliverable:

A specification for a one-to-one mapping between the data model and XLIFF.

A couple of specific questions:

1. Must the data model support all XLIFF features?

It's pretty obvious that according to this, all data model features need to be reflected in XLIFF, but is the inverse true? In other words, we have to at least enable a non-lossy MF2 → XLIFF → MF2 workflow, but what about XLIFF → MF2 → XLIFF? Does the data model need to support all core XLIFF features, or is it okay to drop some during the XLIFF → MF2 conversion?

One core XLIFF feature that is currently not supported by either proposed model is the split of content into <segment> and <ignorable> parts. Should segmented input be retained as such, or is it ok to always re-segment messages into one segment?
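To make the segmentation question concrete, here is a minimal sketch of what "always re-segment into one segment" would mean for a single XLIFF 2.0 `<unit>`. The sample unit and the helper name `resegment` are illustrative assumptions, not part of either proposed data model; the only point is that the segment boundaries are gone after the merge.

```python
import xml.etree.ElementTree as ET

# XLIFF 2.0 core namespace.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"

# Hypothetical sample unit: two segments separated by ignorable whitespace.
UNIT = f"""
<unit xmlns="{XLIFF_NS}" id="greeting">
  <segment><source>Hello.</source></segment>
  <ignorable><source> </source></ignorable>
  <segment><source>How are you?</source></segment>
</unit>
"""


def resegment(unit_xml: str) -> tuple[int, str]:
    """Collapse all <segment>/<ignorable> sources of a unit into one string.

    Returns the number of parts seen and the concatenated source text.
    Inline markup inside <source> is flattened to plain text here, which
    a real mapping would not be allowed to do.
    """
    unit = ET.fromstring(unit_xml)
    wanted = {f"{{{XLIFF_NS}}}segment", f"{{{XLIFF_NS}}}ignorable"}
    parts = []
    for child in unit:
        if child.tag in wanted:
            source = child.find(f"{{{XLIFF_NS}}}source")
            if source is not None:
                parts.append("".join(source.itertext()))
    return len(parts), "".join(parts)


parts_seen, merged_source = resegment(UNIT)
print(parts_seen, repr(merged_source))
```

The merged string carries no record of where the three parts began or ended, so an XLIFF → MF2 → XLIFF round trip built on this approach could not restore the original segmentation without extra metadata.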

2. Must the mapping be canonical?

The other deliverables refer to "the canonical data model" and "the canonical syntax", but that word isn't used for the XLIFF mapping. This might seem like it's just semantics, but it matters for the edge cases. A "canonical" mapping must always be followed, but it can be hard to implement.

One specific place where this matters is the algorithm for merging two separate source and target messages into a single <group> when both of them have selectors, but the lists of selectors are different (e.g. source depends on the variables foo and bar, while target depends only on bar). This is problematic because the MF2 data model has the two languages' messages completely separate from each other, while XLIFF enforces a structure where the selectors are shared between the languages, and the value of each case is translated separately. It can certainly be done, but it's... hairy. Not to mention needing to also be able to reverse this merge.
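As a rough illustration of the mismatch described above, the sketch below aligns two messages onto a shared selector list. Everything in it is an assumption made for the example: the dict-based message shape, the `*` catch-all key, the "most specific match wins" lookup, and the function names `align`, `lookup` and `merge`. It is not the algorithm from either proposal, and it does not attempt the reverse split.

```python
from itertools import product

CATCH_ALL = "*"  # assumed wildcard key for "any value"


def align(message, shared):
    """Re-key variants against `shared`, padding missing selectors with the catch-all."""
    index = {name: i for i, name in enumerate(message["selectors"])}
    return {
        tuple(key[index[n]] if n in index else CATCH_ALL for n in shared): text
        for key, text in message["variants"].items()
    }


def lookup(aligned, key):
    """Pick the matching variant with the most non-catch-all positions (ties are not handled)."""
    best = None
    for cand, text in aligned.items():
        if all(c == k or c == CATCH_ALL for c, k in zip(cand, key)):
            score = sum(c != CATCH_ALL for c in cand)
            if best is None or score > best[0]:
                best = (score, text)
    return best[1] if best else None


def merge(source, target):
    """Build one shared selector list and pair up source/target text per case."""
    shared = list(source["selectors"])
    shared += [s for s in target["selectors"] if s not in shared]
    src, tgt = align(source, shared), align(target, shared)
    # Every key value seen at each selector position, in either language.
    values = [sorted({k[i] for k in [*src, *tgt]}) for i in range(len(shared))]
    cases = {}
    for key in product(*values):
        s, t = lookup(src, key), lookup(tgt, key)
        if s is not None and t is not None:
            cases[key] = (s, t)
    return shared, cases


# Source depends on foo and bar, target only on bar.
source = {"selectors": ["foo", "bar"],
          "variants": {("one", "one"): "A", ("*", "*"): "B"}}
target = {"selectors": ["bar"],
          "variants": {("one",): "X", ("*",): "Y"}}

shared, cases = merge(source, target)
```

Even in this toy form the merge expands two variants per language into a four-case cross product, and nothing in the result records which selectors each language originally used. That is the information a reversible mapping would have to keep somewhere.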

If the mapping were "canonical", all such edge cases would have to be completely covered (possibly by forbidding certain structures from being used). It would be much easier if we could agree that at least parts of the mapping aren't canonical, to allow progress on these algorithms to continue separately from the rest of the spec work.

Labels: LDML48 (LDML48 Release), data model (Issues related to the Interchange Data Model), design (Design document or issues related to design)
