ICU-23038 Unicode 17 beta by markusicu · Pull Request #3505 · unicode-org/icu · GitHub


ICU-23038 Unicode 17 beta #3505


Draft: wants to merge 17 commits into main


markusicu
Member
@markusicu markusicu commented May 21, 2025

Unicode 17 beta WIP

Best reviewed one commit at a time.

Checklist

  • Required: Issue filed: ICU-23038
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

ALLOW_MANY_COMMITS=meow

@markusicu markusicu requested review from echeran and eggrobin May 21, 2025 22:50
@markusicu
Member Author

@eggrobin ICU4C properties data is in, so you should be able to start work on C++ RBBI.

On my machine, I currently see RBBI and collation test failures:

| ***     FAILING TEST SUMMARY FOR:              intltest  
         TestUnicodeFiles
         TestExtended
         TestMonkey
      RBBITest
   rbbi

| ***     FAILING TEST SUMMARY FOR:              cintltst  
/tstxtbd/cbiapts/TestBreakIteratorTailoring
/tscoll/capitst/TestProperty

I will work on collation and on Java data next.

@markusicu
Member Author

@eggrobin I went through the rest of the update instructions. C++ tests still fail with rbbi, which is probably expected. (Different set of failures from before.) I haven't run Java tests yet locally.

         TestExtended
      RBBITest
         testMonkey
      RBBIMonkeyTest
   rbbi

@eggrobin
Member
eggrobin commented Jun 2, 2025

After committing 0669e86, I ran the monkeys and forgot about them until earlier today, so this has been tested with 120 million strings. Good enough.

@eggrobin
Member

@markusicu Looks like we are good on the C++ RBBI side. I will do what I can on the Java side, but as usual you will need to update the .brk files.

@eggrobin
Member

@markusicu, your controls.

@eggrobin
Member

@markusicu I am assuming that these Java tests have nothing to do with RBBI:

Error:  Failures: 
Error:    PersonNameConsistencyTest.TestPersonNames:107->AbstractTestLog.errln:50 Failure in km.txt: Found 20 errors.
Error:    UCharacterTest.TestGetNumericValue:3090->AbstractTestLog.errln:50 UCharacter.getNumericValue(i) returned a different value from the expected result. Got -2Expected1000000
Error:    UnicodeSetTest.TestToPattern:256->expectToPattern:2606->AbstractTestLog.errln:50 FAIL: toPattern() => "[\u00BD\u0B73\u0D74\u0F2A\u2CFD\uA831\U00010141\U00010175\U00010176\U00010E7B\U00012226]", expected "[\u00BD\u0B73\u0D74\u0F2A\u2CFD\uA831\U00010141\U00010175\U00010176\U00010E7B]"
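The two toPattern() strings in the UnicodeSetTest failure above differ by one element, and that is easy to miss by eye. Below is a minimal sketch of a throwaway helper for spotting such differences. PatternDiff is a hypothetical name. The helper uses only the JDK, not ICU's UnicodeSet, and it handles only the flat escaped form shown in the log. It does not support ranges or property syntax.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper: decodes the backslash-u (4 hex digits) and
 * backslash-U (8 hex digits) escapes in two flat "[...]" patterns and
 * reports which code points appear in only one of them.
 * Ranges, properties and unescaped literals are deliberately ignored.
 */
public class PatternDiff {

    /** Collects the code points written as escapes in the pattern. */
    public static Set<Integer> codePoints(String pattern) {
        Set<Integer> result = new TreeSet<>();
        int i = 0;
        while (i < pattern.length()) {
            if (pattern.charAt(i) == '\\' && i + 1 < pattern.length()) {
                char kind = pattern.charAt(i + 1);
                int digits = kind == 'u' ? 4 : kind == 'U' ? 8 : 0;
                if (digits > 0 && i + 2 + digits <= pattern.length()) {
                    result.add(Integer.parseInt(pattern.substring(i + 2, i + 2 + digits), 16));
                    i += 2 + digits;
                    continue;
                }
            }
            i++; // skip brackets and anything else we do not decode
        }
        return result;
    }

    /** Code points present in a but not in b. */
    public static Set<Integer> minus(Set<Integer> a, Set<Integer> b) {
        Set<Integer> diff = new TreeSet<>(a);
        diff.removeAll(b);
        return diff;
    }

    public static void main(String[] args) {
        String actual = "[\\u00BD\\u0B73\\u0D74\\u0F2A\\u2CFD\\uA831\\U00010141\\U00010175\\U00010176\\U00010E7B\\U00012226]";
        String expected = "[\\u00BD\\u0B73\\u0D74\\u0F2A\\u2CFD\\uA831\\U00010141\\U00010175\\U00010176\\U00010E7B]";
        Set<Integer> a = codePoints(actual), e = codePoints(expected);
        for (int cp : minus(a, e)) System.out.printf("only in actual:   U+%04X%n", cp);
        for (int cp : minus(e, a)) System.out.printf("only in expected: U+%04X%n", cp);
    }
}
```

When run on the two strings from the log, the helper prints a single line, "only in actual:   U+12226". That means the actual output contains one code point that the test's expected pattern lacks. With ICU4J on the classpath, new UnicodeSet(pattern) combined with removeAll would do the same job. The sketch simply avoids that dependency.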

@markusicu
Member Author

I am assuming that these Java tests have nothing to do with RBBI

I assume the same.

@richgillam could you please take a look at

Error: PersonNameConsistencyTest.TestPersonNames:107->AbstractTestLog.errln:50 Failure in km.txt: Found 20 errors.

?

@richgillam
Contributor

@richgillam could you please take a look at

Error: PersonNameConsistencyTest.TestPersonNames:107->AbstractTestLog.errln:50 Failure in km.txt: Found 20 errors.

?

@markusicu @eggrobin Where can I see the logs? Have you been getting that error the whole time, or did it start somewhere in the middle of getting this PR through?

That test compares what ICU is producing with what the CLDR PersonNameFormatter produces and is based on a test file that we copy over from the CLDR side. This might be an indication that there was a data change on the CLDR side that somehow didn't make it over to ICU, or it might mean some data change has exposed an algorithm difference we didn't previously know about. @macchiati do you know of anything changing on the CLDR side that would affect this?
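The comparison described above is essentially a golden-file test. Below is a minimal, hypothetical sketch of that pattern in Java. It does not reproduce ICU's PersonNameConsistencyTest or the format of the CLDR test file. The TAB-separated line format, the GoldenFileCheck name and the stand-in formatter are all assumptions made for illustration.

```java
import java.util.List;
import java.util.function.Function;

public class GoldenFileCheck {

    /**
     * Re-runs each golden input through the implementation under test and
     * counts lines whose output differs from the recorded expectation.
     * Assumed line format: input, a TAB, then the expected output.
     * Blank lines and lines starting with '#' are ignored.
     */
    public static int countErrors(List<String> lines, Function<String, String> underTest) {
        int errors = 0;
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                continue; // a real harness would probably report malformed lines
            }
            String actual = underTest.apply(parts[0]);
            if (!actual.equals(parts[1])) {
                errors++;
                System.out.printf("mismatch for %s: expected \"%s\", got \"%s\"%n",
                        parts[0], parts[1], actual);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Golden data as if recorded from the reference implementation.
        // The second entry expects surname-first order, so it will mismatch.
        List<String> golden = List.of(
                "# given|surname <TAB> expected",
                "Ada|Lovelace\tAda Lovelace",
                "Alan|Turing\tTuring Alan");
        // Stand-in for the implementation under test: "given surname" order.
        Function<String, String> underTest = s -> s.replace('|', ' ');
        int errors = countErrors(golden, underTest);
        System.out.printf("Found %d errors.%n", errors);
    }
}
```

Suppose the golden file and the code under test come from different snapshots of the reference data. Then every entry whose expected output changed shows up as a mismatch. That is how a data drift can surface as a batch of errors, such as the 20 reported for km.txt.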

@richgillam
Contributor

Actually, when we've seen this kind of thing before, the most common explanation is that the test file itself got updated on the CLDR side and somehow failed to get properly copied into the ICU project. It's been a long time since I've looked at this, but I think we keep those files under source control in ICU. I could be wrong, but I think the test files get copied over as part of the CLDR-to-ICU conversion process for all the resource data. So I'm going to guess that test file got changed on the CLDR side and we haven't run a CLDR-to-ICU integration since then (or that it went wrong somehow).
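If the stale-copy theory is right, a quick check is to compare ICU's copy of the km.txt test file against the current CLDR version. Below is a minimal sketch that assumes both checkouts are on local disk. StaleCopyCheck is a hypothetical helper. The two paths are placeholders supplied on the command line, because the real locations are not given here. The sketch needs Java 12+ for Files.mismatch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper: reports whether a test data file copied into ICU
 * is byte-identical to its counterpart in a CLDR checkout.
 */
public class StaleCopyCheck {

    /** Returns -1 if the files are byte-identical, else the offset of the first differing byte. */
    public static long firstDifference(Path cldrFile, Path icuFile) throws IOException {
        return Files.mismatch(cldrFile, icuFile);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StaleCopyCheck <cldr-file> <icu-file>");
            System.exit(2);
        }
        long pos = firstDifference(Path.of(args[0]), Path.of(args[1]));
        if (pos < 0) {
            System.out.println("identical to the CLDR checkout");
        } else {
            System.out.println("files differ starting at byte " + pos + "; the ICU copy may be stale");
        }
    }
}
```

A byte-level comparison also flags any differences the CLDR-to-ICU conversion might introduce on purpose, so treat the reported offset as a starting point for inspection rather than a verdict.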
