CLDR-18722 Explicit numberSystem attribute by btangmu · Pull Request #4974 · unicode-org/cldr · GitHub

btangmu
Member
@btangmu btangmu commented Aug 18, 2025

-Revise ldml.dtd to allow numberSystem attribute for pattern element

-Add numberSystem (latn) for pattern/decimal/group elements where needed in locales ca, el, en_150, en_AT, eu, gl, it, tr

-Re-enable the check for numberSystem in CheckNumbers.java

-Use parts.containsAttribute instead of parts.getAttribute(2, ...) since the element index may not be 2

CLDR-18722

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@btangmu btangmu self-assigned this Aug 18, 2025
@btangmu btangmu requested a review from macchiati August 18, 2025 19:39
@btangmu
Member Author
btangmu commented Aug 18, 2025

Locally, after making these changes, I ran console check as follows:

java -DCLDR_DIR=$(pwd) -jar tools/cldr-code/target/cldr-code.jar check -S common,seed -1 -e -z FINAL_TESTING -c comprehensive

Note that -c comprehensive was required; otherwise, the paths that previously had numberSystem errors would have been skipped. There were 62 errors, but none of them involved numberSystem.

-Remove numberSystem from a path where it should not have been added (redundant)
-Revise ldml.dtd to handle numberSystem attribute for decimal/group element like elsewhere, not deprecated
<displayName count="other" draft="contributed">pesetas</displayName>
<symbol>₧</symbol>
<decimal>,</decimal>
<group>❰NBTSP❱</group>
Member

There is something wrong here; unrelated to this ticket. The [❰❱] characters should never occur in values for our data files. It looks like vetters might be pasting them into values. I'll file a separate ticket about this.
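A check for these marker characters could be sketched roughly as below. This is only an illustration under stated assumptions: the class and method names are hypothetical, it is not taken from the CLDR tooling, and it assumes the two characters in question are U+2770 and U+2771.

```java
/**
 * Hypothetical helper, not part of the CLDR code base.
 * Flags values that contain the display markers U+2770 or U+2771,
 * which should not appear in stored data values.
 */
final class PlaceholderMarkerCheck {
    // U+2770 HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
    static final char OPEN_MARKER = '\u2770';
    // U+2771 HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
    static final char CLOSE_MARKER = '\u2771';

    /** Returns true if the value contains either marker character. */
    static boolean containsMarker(String value) {
        return value.indexOf(OPEN_MARKER) >= 0 || value.indexOf(CLOSE_MARKER) >= 0;
    }
}
```

With this sketch, a value such as the group separator shown above would be flagged, while a plain "," would not.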

Member Author

@macchiati Did you file that ticket yet?

<!ATTLIST decimal draft (approved | contributed | provisional | unconfirmed | true | false) #IMPLIED >
<!--@METADATA-->
<!--@DEPRECATED:true, false-->
<!ATTLIST decimal references CDATA #IMPLIED >
Member

I'm wondering why this file exists; I've noticed that it causes errors when it drifts away from the regular ldml.dtd. If we keep it, we should have a test that it is identical (and maybe we do, and that's what alerted you to it?)

Member

In eclipse I see the following error with that file:

Description: The element 'gmtUnknownFormat' has not been declared.
Resource: ldml.dtd
Path: /cldr-code/src/test/resources/org/unicode/cldr/unittest/data/common/dtd
Location: line 1723
Type: DTD Problem

Member Author

I'm confused by that too. There are 3 ldml.dtd files:

  1. common/dtd/ldml.dtd
  2. tools/cldr-code/src/test/resources/org/unicode/cldr/unittest/data/common/dtd/ldml.dtd
  3. tools/cldr-code/target/test-classes/org/unicode/cldr/unittest/data/common/dtd/ldml.dtd

I was accidentally editing the wrong one before I noticed that...
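The kind of "is it identical" test mentioned earlier in this thread could look roughly like the sketch below. It is an assumption-laden illustration, not an existing CLDR test: the class and method names are hypothetical, and it simply compares two files byte for byte.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper, not part of the CLDR code base. */
final class DtdCopyCheck {
    /**
     * Returns true if both files have exactly the same bytes.
     * Reads each file fully into memory, which is fine for DTD-sized files.
     */
    static boolean sameContent(Path first, Path second) throws IOException {
        return Arrays.equals(Files.readAllBytes(first), Files.readAllBytes(second));
    }
}
```

A test could call sameContent with the first two paths listed above (the copy under target is build output) and fail when the result is false.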

Member Author

In fact, I did that again even after realizing there are 3 files, which may be why this PR still fails. I'll make another (4th) commit shortly.

@srl295 do you have insight about how the ldml.dtd files relate to each other?

Member Author

@srl295 I forget whether we discussed this in the infrastructure meeting or on Slack. Although it's tangential to this ticket, maybe there should be a new ticket for it?

Member

No, it's already on my list. Just ignore that file; I have an issue to correct it.

Member

I recommend reverting the changes to this file.

Member
@macchiati macchiati left a comment

Looking good; a couple of questions.

NumericType type = NumericType.getNumericType(path);
if (type == NumericType.NOT_NUMERIC || type == NumericType.RATIONAL) {
    return this; // skip
}
Member

I take it that NumericType still gets these items, so they aren't skipped by line 258.

Member Author

Sorry for the late response; this PR was put on hold pending branching maint/maint-48.

I don't quite understand the question.

This is more than 100 lines into handleCheck and this PR only does two things here, a few lines further down:

  1. It re-enables the "Number formats must have an explicit numberSystem attribute." check.
  2. It makes the check more general, since the numberSystem attribute can occur in different places, not necessarily at index 2.

Is the concern that some paths that would now match would be skipped prematurely because they have NumericType.NOT_NUMERIC or NumericType.RATIONAL? I could investigate that...
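The difference between a fixed-index lookup and a search across all path elements can be illustrated with the sketch below. It deliberately does not use the real CLDR path parser: PathSketch, attributeAt, and hasAttributeAnywhere are hypothetical stand-ins, and the paths in the usage note are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a parsed path; NOT the CLDR path parser. */
final class PathSketch {
    // Matches one attribute of the form [@name="value"].
    private static final Pattern ATTR = Pattern.compile("\\[@([^=]+)=\"([^\"]*)\"\\]");

    // One attribute map per path element, in path order.
    private final List<Map<String, String>> attributes = new ArrayList<>();

    /** Naive parse: splits on '/', so attribute values must not contain '/'. */
    PathSketch(String path) {
        for (String element : path.split("/")) {
            if (element.isEmpty()) {
                continue; // skip the empty pieces produced by the leading "//"
            }
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher m = ATTR.matcher(element);
            while (m.find()) {
                attrs.put(m.group(1), m.group(2));
            }
            attributes.add(attrs);
        }
    }

    /** Fixed-index lookup: returns null if that element lacks the attribute. */
    String attributeAt(int elementIndex, String name) {
        if (elementIndex < 0 || elementIndex >= attributes.size()) {
            return null;
        }
        return attributes.get(elementIndex).get(name);
    }

    /** Index-independent lookup: true if any element carries the attribute. */
    boolean hasAttributeAnywhere(String name) {
        for (Map<String, String> attrs : attributes) {
            if (attrs.containsKey(name)) {
                return true;
            }
        }
        return false;
    }
}
```

For a path shaped like //ldml/numbers/decimalFormats[@numberSystem="latn"]/..., both lookups find the attribute, because it sits on the element at index 2. For a path shaped like //ldml/numbers/currencies/currency[@type="ESP"]/decimal[@numberSystem="latn"], the attribute sits on a later element, so only the index-independent lookup finds it.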

Member
@macchiati macchiati left a comment

I was thinking about this, and it is really past when we can make changes that could affect ICU. So we might want to put this on hold until we branch. Let's discuss in infra.

@macchiati
Member

Changing to Draft so we don't mistakenly merge into 48 main.

@macchiati macchiati marked this pull request as draft August 20, 2025 17:05
-A previous commit mistakenly did this with the wrong ldml.dtd; there are 2 files with that name, not counting the one in target
@btangmu btangmu marked this pull request as ready for review October 10, 2025 17:59
@btangmu
Member Author
btangmu commented Oct 10, 2025

Now that we've branched maint/maint-48, I've made this ready for review again, added Steven and Peter as reviewers, and responded to a couple of comments.

@btangmu
Member Author
btangmu commented Oct 10, 2025

I see the ticket is now done, with this follow-up ticket for v49: https://unicode-org.atlassian.net/browse/CLDR-18963

I don't know if there's a way to move the PR to a different ticket; should I just make a new PR and refer back to this one for discussion?

@srl295
Member
srl295 commented Oct 10, 2025

I see the ticket is now done, with this follow-up ticket for v49: https://unicode-org.atlassian.net/browse/CLDR-18963

I don't know if there's a way to move the PR to a different ticket; should I just make a new PR and refer back to this one for discussion?

Just retitle the PR, then either use the auto-squash link or manually reword the git commits and force-push.

<!--@METADATA-->
<!ATTLIST group numberSystem CDATA #IMPLIED >
<!--@DEPRECATED-->
<!--@MATCH:bcp47/nu-->
Member

Suggested change
<!--@MATCH:bcp47/nu-->
<!--@DEPRECATED-->

Member
@srl295 srl295 left a comment

Please revert all changes to the test file. I'll document it better in https://unicode-org.atlassian.net/browse/CLDR-19058

<!--@METADATA-->
<!ATTLIST decimal numberSystem CDATA #IMPLIED >
<!--@DEPRECATED-->
<!--@MATCH:bcp47/nu-->
Member

Suggested change
<!--@MATCH:bcp47/nu-->
<!--@DEPRECATED-->

3 participants