8000 ICU-23061 LowercaseTransliterator has poor scalability due to synchronized state by stevenschlansker · Pull Request #3417 · unicode-org/icu · GitHub
[go: up one dir, main page]

Skip to content

ICU-23061 LowercaseTransliterator has poor scalability due to synchronized state #3417

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

stevenschlansker
Copy link
@stevenschlansker stevenschlansker commented Feb 25, 2025

LowercaseTransliterator tries to reuse buffers to avoid allocation. Historically, this may have been an important optimization.
However on modern machines you are much more likely to use multiple CPUs, and modern JVMs are very efficient at GC, especially for local temporary data. Contending on a monitor is a big deal nowadays, especially for something so low level and hidden.

I've prepared a benchmark using JMH to show the change in scalability:

Linux 6.12.13-200.fc41.x86_64 AMD Ryzen 7 3700X 8-Core Processor
JMH version: 1.37
JDK 23.0.2, OpenJDK 64-Bit Server VM, 23.0.2+7

Original `synchronized`:

1 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   789.169 ±  2.779  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  5217.250 ± 35.501  ops/ms

8 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   759.934 ±  4.880  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  3916.430 ± 70.545  ops/ms

New `stateless`:

1 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   779.504 ±  6.507  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  5641.104 ± 29.747  ops/ms

8 Thread Benchmark                         Mode  Cnt      Score      Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   6340.495 ±  240.436  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  39132.015 ± 7102.311  ops/ms

The current implementation exhibits negative scalability: adding more threads hurts throughput (~5% loss going from 1T to 8T)
By removing the synchronized state, the implementation now scales perfectly (~8x increase going from 1T to 8T for the longer test case)
There is a very minor (~1%) penalty for pure single-threaded throughput since we no longer reuse state.
I believe this is acceptable in the context of allowing multiple threads to work without coordination.

Checklist

  • Required: Issue filed: ICU-23061
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@stevenschlansker stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 95a0533 to 2af99d6 Compare February 25, 2025 18:55
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@stevenschlansker stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 2af99d6 to 2197adb Compare February 25, 2025 18:58
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/README.txt is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@stevenschlansker stevenschlansker changed the title LowercaseTransliterator has poor scalability due to synchronized state ICU-23061: LowercaseTransliterator has poor scalability due to synchronized state Feb 25, 2025
@stevenschlansker stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 2197adb to f332c3c Compare February 25, 2025 19:11
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

macchiati
macchiati previously approved these changes Feb 27, 2025
Copy link
Member
@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It all looks good to me.

NOTE: I can't assess the additional dependencies; someone else should check that they look secure.

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@stevenschlansker
Copy link
Author

I needed to add the license header to the performance test.

@stevenschlansker stevenschlansker changed the title ICU-23061: LowercaseTransliterator has poor scalability due to synchronized state ICU-23061 LowercaseTransliterator has poor scalability due to synchronized state Mar 17, 2025
@stevenschlansker stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 886f66f to 268eab2 Compare March 17, 2025 23:22
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/pom.xml is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@stevenschlansker
Copy link
Author

I removed the : from my commit messages, I think it was tripping up the checker. Also rebased on top of main.

@richgillam
Copy link
Contributor

The change itself looks reasonable, assuming the performance advantages are as billed. I can't speak to the POM changes or the performance test, as that's not stuff I'm really familiar with.

Usually, bug fixes in ICU need to be done in both C++ and Java when appropriate. Do we have a similar problem on the C++ side? If so, I'd like to see a fix for that here too.

@grhoten
Copy link
Member
grhoten commented Mar 21, 2025

These changes remind me of something. I think this Java lowercase transliterator still uses the locale. The ICU4C implementation removed that long ago because of how the registry works with sharing the singleton objects. Though the implementation in Java hard codes it to be US English casing only. So it's not much different.

I thought the transliterator casing functionality in ICU4C was doing something funny with thread safety long ago, but I can't seem to find the concerning code anymore. Perhaps it was removed, or I missed it.

@stevenschlansker
Copy link
Author

I can't speak to the POM changes or the performance test, as that's not stuff I'm really familiar with.

The introduction of JMH is optional, and if the project maintainers don't want to introduce it, I can remove these changes.
I see that ICU already has its own benchmark harness. JMH is nice because it handles a lot of the boilerplate for you - help messages, warming up the JVM, timing runs - in a way that the "JVM experts" have thought long and hard about. The benchmark could be converted to the ICU runner, or not committed, if that's preferred. But I think you'll find that more and more outside contributors would already come in with JMH familiarity.

@richgillam
Copy link
Contributor

We really need one or more people from the ICU4J side of things to comment here...

@stevenschlansker stevenschlansker force-pushed the translit-lower-scalability-jmh branch 2 times, most recently from 3463237 to 94ec9ff Compare March 28, 2025 00:44
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/pom.xml is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@stevenschlansker
Copy link
Author

I rebased and resolved conflicts.

@stevenschlansker
8000 Copy link
Author

Anything I should do to move this PR forward? If the JMH benchmark is acceptable, I can squash the commits together. I didn't want to do that until we confirmed it would be accepted.

Copy link
Contributor
@mihnita mihnita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments.
Thank you very much!
Mihai

@stevenschlansker stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 94ec9ff to c76e791 Compare May 30, 2025 21:22
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/main/translit/src/main/java/com/ibm/icu/text/BreakTransliterator.java is now changed in the branch
  • icu4j/main/translit/src/main/java/com/ibm/icu/text/CaseFoldTransliterator.java is now changed in the branch
  • icu4j/main/translit/src/main/java/com/ibm/icu/text/TitlecaseTransliterator.java is now changed in the branch
  • icu4j/main/translit/src/main/java/com/ibm/icu/text/UppercaseTransliterator.java is now changed in the branch
  • icu4j/perf-tests/pom.xml is different
  • icu4j/perf-tests/README.txt is different
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/CaseFoldTransliteratorPerfTest.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerf.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerf.java is now changed in the branch
  • icu4j/pom.xml is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@stevenschlansker stevenschlansker force-pushed the translit-lower-scalability-jmh branch from c76e791 to 9bb5d33 Compare May 30, 2025 21:27
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/CaseFoldTransliteratorPerfTest.java is different
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is different
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerf.java is different
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerf.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@@ -416,4 +416,7 @@ public void addSourceTargetSet(UnicodeSet inputFilter, UnicodeSet sourceSet, Uni
}
}

private static class WordBreakIteratorHolder {
static fin 8000 al BreakIterator BI = BreakIterator.getWordInstance(new ULocale("th_TH"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, using the Thai break iterator is generally pointless. It might load the Thai dictionary preemptively, but using ULocale.ROOT would generally be more appropriate.

Comment on lines +77 to +78
int[] boundaries = new int[50];
int boundaryCount = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being way overoptimized, and it probably exists due to ICU's long history. Now that ArrayList exists in Java, which is not synchronized when compared to the older Vector implementation, this should probably transition to ArrayList.

This change may affect the garbage collector. I think the initial capacity is 10 for an ArrayList, but we could guess an initial capacity of an ArrayList with something like ensureCapacity(Math.max(text.length()/CODEUNITS_PER_WORD, 4)) where CODEUNITS_PER_WORD could be 8 as a good starting point. CODEUNITS_PER_WORD could be 4 if it's reallocating too much, or the minimum capacity could be raised.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern is that ArrayList<Integer> would also need to box the primitive values into wrapper objects, with an additional performance penalty. The existing primitive int[] does not have this problem

macchiati
macchiati previously approved these changes May 30, 2025
Copy link
Member
@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/BreakTransliteratorPerf.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/text/BreakTransliteratorAccess.java is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@stevenschlansker
Copy link
Author

I re-ran the benchmarks with the latest changes.

synchronized, 1T:

Benchmark                                     Mode  Cnt     Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   419.108 ±  26.005  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5  4022.897 ± 124.318  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   802.483 ±  29.266  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  5383.847 ± 277.556  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   820.756 ±  54.370  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  5015.698 ± 216.265  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   820.914 ±  54.945  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  4716.300 ± 276.341  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   706.661 ±  29.485  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  4634.731 ± 368.809  ops/ms

synchronized, 8T:

Benchmark                                     Mode  Cnt     Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   594.238 ±  15.677  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5  3328.109 ±  58.666  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   797.488 ±  19.831  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  4302.125 ±  36.354  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   801.179 ±  21.477  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  4366.354 ±  21.683  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   802.491 ±   5.279  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  3835.846 ± 205.297  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   598.344 ±  16.620  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  3612.776 ±  10.186  ops/ms

local-alloc, 1T:

Benchmark                                     Mode  Cnt     Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   378.975 ±   8.528  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5  2176.368 ±  66.654  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   817.171 ±  40.881  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  5887.798 ± 463.620  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   847.376 ±  53.358  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  5730.719 ± 377.115  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   835.202 ±  55.587  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  5244.637 ± 246.148  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   617.542 ±  43.544  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  5048.793 ± 404.342  ops/ms

local-alloc, 8T:

Benchmark                                     Mode  Cnt      Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   2805.141 ±  33.812  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5   4805.065 ± 171.575  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   6810.324 ±  56.592  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  44420.295 ± 890.433  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   6655.300 ±  20.258  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  42945.707 ± 116.837  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   5288.342 ±  97.473  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  32089.013 ± 633.706  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   4585.569 ±  66.147  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  36655.642 ± 421.983  ops/ms

macchiati
macchiati previously approved these changes May 31, 2025
Copy link
Contributor
@mihnita mihnita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the updates.

I only have a few comments, hopefully non-controversial.

…ention

Current implementation of lower, upper, title, etc synchronizes the whole
operation, so it is always single-threaded

Removing the shared state lets us utilize multiple CPU cores
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/BreakTransliteratorPerf.java is no longer changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/BreakTransliteratorPerfTest.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is no longer changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerfTest.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerf.java is no longer changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerfTest.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerf.java is no longer changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerfTest.java is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@stevenschlansker
Copy link
Author

I made the suggested renamings. Please let me know what you would like to do with the BreakTransliterator non-public access case.

Copy link
Contributor
@mihnita mihnita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0