ICU-23061 LowercaseTransliterator has poor scalability due to synchronized state #3417

stevenschlansker · 2025-02-25T18:54:12Z

LowercaseTransliterator tries to reuse buffers to avoid allocation. Historically, this may have been an important optimization.
However on modern machines you are much more likely to use multiple CPUs, and modern JVMs are very efficient at GC, especially for local temporary data. Contending on a monitor is a big deal nowadays, especially for something so low level and hidden.

I've prepared a benchmark using JMH to show the change in scalability:

Linux 6.12.13-200.fc41.x86_64 AMD Ryzen 7 3700X 8-Core Processor
JMH version: 1.37
JDK 23.0.2, OpenJDK 64-Bit Server VM, 23.0.2+7

Original `synchronized`:

1 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   789.169 ±  2.779  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  5217.250 ± 35.501  ops/ms

8 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   759.934 ±  4.880  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  3916.430 ± 70.545  ops/ms

New `stateless`:

1 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   779.504 ±  6.507  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  5641.104 ± 29.747  ops/ms

8 Thread Benchmark                         Mode  Cnt      Score      Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   6340.495 ±  240.436  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  39132.015 ± 7102.311  ops/ms

The current implementation exhibits negative scalability: adding more threads hurts throughput (~5% loss going from 1T to 8T)
By removing the synchronized state, the implementation now scales perfectly (~8x increase going from 1T to 8T for the longer test case)
There is a very minor (~1%) penalty for pure single-threaded throughput since we no longer reuse state.
I believe this is acceptable in the context of allowing multiple threads to work without coordination.

Checklist

Required: Issue filed: ICU-23061
Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
Issue accepted (done by Technical Committee after discussion)
Tests included, if applicable
API docs and/or User Guide docs changed or added, if applicable

jira-pull-request-webhook · 2025-02-25T18:55:29Z

Notice: the branch changed across the force-push!

icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

jira-pull-request-webhook · 2025-02-25T18:58:15Z

Notice: the branch changed across the force-push!

icu4j/perf-tests/README.txt is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

jira-pull-request-webhook · 2025-02-25T19:11:20Z

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

macchiati

It all looks good to me.

NOTE: I can't assess the additional dependencies; someone else should check that they look secure.

jira-pull-request-webhook · 2025-02-27T21:51:06Z

Notice: the branch changed across the force-push!

icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

stevenschlansker · 2025-02-27T21:51:22Z

I needed to add the license header to the performance test.

jira-pull-request-webhook · 2025-03-17T23:22:31Z

Notice: the branch changed across the force-push!

icu4j/perf-tests/pom.xml is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

stevenschlansker · 2025-03-17T23:22:43Z

I removed the : from my commit messages, I think it was tripping up the checker. Also rebased on top of main.

richgillam · 2025-03-18T20:41:34Z

The change itself looks reasonable, assuming the performance advantages are as billed. I can't speak to the POM changes or the performance test, as that's not stuff I'm really familiar with.

Usually, bug fixes in ICU need to be done in both C++ and Java when appropriate. Do we have a similar problem on the C++ side? If so, I'd like to see a fix for that here too.

grhoten · 2025-03-21T05:15:27Z

These changes remind me of something. I think this Java lowercase transliterator still uses the locale. The ICU4C implementation removed that long ago because of how the registry works with sharing the singleton objects. Though the implementation in Java hard codes it to be US English casing only. So it's not much different.

I thought the transliterator casing functionality in ICU4C was doing something funny with thread safety long ago, but I can't seem to find the concerning code anymore. Perhaps it was removed, or I missed it.

stevenschlansker · 2025-03-21T16:47:20Z

I can't speak to the POM changes or the performance test, as that's not stuff I'm really familiar with.

The introduction of JMH is optional, and if the project maintainers don't want to introduce it, I can remove these changes.
I see that ICU already has its own benchmark harness. JMH is nice because it handles a lot of the boilerplate for you - help messages, warming up the JVM, timing runs - in a way that the "JVM experts" have thought long and hard about. The benchmark could be converted to the ICU runner, or not committed, if that's preferred. But I think you'll find that more and more outside contributors would already come in with JMH familiarity.

richgillam · 2025-03-25T21:46:34Z

We really need one or more people from the ICU4J side of things to comment here...

jira-pull-request-webhook · 2025-03-28T00:44:53Z

Notice: the branch changed across the force-push!

icu4j/perf-tests/pom.xml is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

stevenschlansker · 2025-03-28T00:45:32Z

I rebased and resolved conflicts.

stevenschlansker · 2025-05-20T15:04:18Z

Anything I should do to move this PR forward? If the JMH benchmark is acceptable, I can squash the commits together. I didn't want to do that until we confirmed it would be accepted.

mihnita

Added some comments.
Thank you very much!
Mihai

icu4j/perf-tests/pom.xml

icu4j/main/translit/src/main/java/com/ibm/icu/text/LowercaseTransliterator.java

icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java

icu4j/perf-tests/pom.xml

jira-pull-request-webhook · 2025-05-30T21:22:51Z

Notice: the branch changed across the force-push!

icu4j/main/translit/src/main/java/com/ibm/icu/text/BreakTransliterator.java is now changed in the branch
icu4j/main/translit/src/main/java/com/ibm/icu/text/CaseFoldTransliterator.java is now changed in the branch
icu4j/main/translit/src/main/java/com/ibm/icu/text/TitlecaseTransliterator.java is now changed in the branch
icu4j/main/translit/src/main/java/com/ibm/icu/text/UppercaseTransliterator.java is now changed in the branch
icu4j/perf-tests/pom.xml is different
icu4j/perf-tests/README.txt is different
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/CaseFoldTransliteratorPerfTest.java is now changed in the branch
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerf.java is now changed in the branch
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerf.java is now changed in the branch
icu4j/pom.xml is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

icu4j/main/translit/src/main/java/com/ibm/icu/text/BreakTransliterator.java

jira-pull-request-webhook · 2025-05-30T21:27:46Z

Notice: the branch changed across the force-push!

icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/CaseFoldTransliteratorPerfTest.java is different
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is different
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerf.java is different
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerf.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

icu4j/main/translit/src/main/java/com/ibm/icu/text/BreakTransliterator.java

grhoten · 2025-05-30T21:57:44Z

icu4j/main/translit/src/main/java/com/ibm/icu/text/BreakTransliterator.java

@@ -416,4 +416,7 @@ public void addSourceTargetSet(UnicodeSet inputFilter, UnicodeSet sourceSet, Uni
        }
    }

+    private static class WordBreakIteratorHolder {
+        static fin
8000
al BreakIterator BI = BreakIterator.getWordInstance(new ULocale("th_TH"));


Yeah, using the Thai break iterator is generally pointless. It might load the Thai dictionary preemptively, but using ULocale.ROOT would generally be more appropriate.

grhoten · 2025-05-30T22:27:41Z

icu4j/main/translit/src/main/java/com/ibm/icu/text/BreakTransliterator.java

+        int[] boundaries = new int[50];
+        int boundaryCount = 0;


This is being way overoptimized, and it probably exists due to ICU's long history. Now that ArrayList exists in Java, which is not synchronized when compared to the older Vector implementation, this should probably transition to ArrayList.

This change may affect the garbage collector. I think the initial capacity is 10 for an ArrayList, but we could guess an initial capacity of an ArrayList with something like ensureCapacity(Math.max(text.length()/CODEUNITS_PER_WORD, 4)) where CODEUNITS_PER_WORD could be 8 as a good starting point. CODEUNITS_PER_WORD could be 4 if it's reallocating too much, or the minimum capacity could be raised.

One concern is that ArrayList<Integer> would also need to box the primitive values into wrapper objects, with an additional performance penalty. The existing primitive int[] does not have this problem

macchiati

LGTM

icu4j/main/translit/src/main/java/com/ibm/icu/text/BreakTransliterator.java

jira-pull-request-webhook · 2025-05-31T00:28:09Z

Notice: the branch changed across the force-push!

icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/BreakTransliteratorPerf.java is now changed in the branch
icu4j/perf-tests/src/main/java/com/ibm/icu/text/BreakTransliteratorAccess.java is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

stevenschlansker · 2025-05-31T00:31:43Z

I re-ran the benchmarks with the latest changes.

synchronized, 1T:

Benchmark                                     Mode  Cnt     Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   419.108 ±  26.005  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5  4022.897 ± 124.318  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   802.483 ±  29.266  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  5383.847 ± 277.556  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   820.756 ±  54.370  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  5015.698 ± 216.265  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   820.914 ±  54.945  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  4716.300 ± 276.341  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   706.661 ±  29.485  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  4634.731 ± 368.809  ops/ms

synchronized, 8T:

Benchmark                                     Mode  Cnt     Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   594.238 ±  15.677  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5  3328.109 ±  58.666  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   797.488 ±  19.831  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  4302.125 ±  36.354  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   801.179 ±  21.477  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  4366.354 ±  21.683  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   802.491 ±   5.279  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  3835.846 ± 205.297  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   598.344 ±  16.620  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  3612.776 ±  10.186  ops/ms

local-alloc, 1T:

Benchmark                                     Mode  Cnt     Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   378.975 ±   8.528  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5  2176.368 ±  66.654  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   817.171 ±  40.881  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  5887.798 ± 463.620  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   847.376 ±  53.358  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  5730.719 ± 377.115  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   835.202 ±  55.587  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  5244.637 ± 246.148  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   617.542 ±  43.544  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  5048.793 ± 404.342  ops/ms

local-alloc, 8T:

Benchmark                                     Mode  Cnt      Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   2805.141 ±  33.812  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5   4805.065 ± 171.575  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   6810.324 ±  56.592  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  44420.295 ± 890.433  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   6655.300 ±  20.258  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  42945.707 ± 116.837  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   5288.342 ±  97.473  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  32089.013 ± 633.706  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   4585.569 ±  66.147  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  36655.642 ± 421.983  ops/ms

mihnita

Thank you very much for the updates.

I only have a few comments, hopefully non-controversial.

icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/BreakTransliteratorPerf.java

icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java

icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerf.java

icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerf.java

icu4j/perf-tests/src/main/java/com/ibm/icu/text/BreakTransliteratorAccess.java

…ention Current implementation of lower, upper, title, etc synchronizes the whole operation, so it is always single-threaded Removing the shared state lets us utilize multiple CPU cores

jira-pull-request-webhook · 2025-06-06T16:44:44Z

Notice: the branch changed across the force-push!

icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/BreakTransliteratorPerf.java is no longer changed in the branch
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/BreakTransliteratorPerfTest.java is now changed in the branch
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is no longer changed in the branch
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerfTest.java is now changed in the branch
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerf.java is no longer changed in the branch
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerfTest.java is now changed in the branch
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerf.java is no longer changed in the branch
icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerfTest.java is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

stevenschlansker · 2025-06-06T16:47:15Z

I made the suggested renamings. Please let me know what you would like to do with the BreakTransliterator non-public access case.

mihnita

Thank you very much!

stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 95a0533 to 2af99d6 Compare February 25, 2025 18:55

stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 2af99d6 to 2197adb Compare February 25, 2025 18:58

stevenschlansker changed the title ~~LowercaseTransliterator has poor scalability due to synchronized state~~ ICU-23061: LowercaseTransliterator has poor scalability due to synchronized state Feb 25, 2025

stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 2197adb to f332c3c Compare February 25, 2025 19:11

markusicu self-assigned this Feb 27, 2025

markusicu requested review from echeran, mihnita, yumaoka, macchiati, aheninger and richgillam February 27, 2025 19:21

macchiati previously approved these changes Feb 27, 2025

View reviewed changes

stevenschlansker dismissed macchiati’s stale review via 886f66f February 27, 2025 21:51

stevenschlansker force-pushed the translit-lower-scalability-jmh branch from f332c3c to 886f66f Compare February 27, 2025 21:51

stevenschlansker changed the title ~~ICU-23061: LowercaseTransliterator has poor scalability due to synchronized state~~ ICU-23061 LowercaseTransliterator has poor scalability due to synchronized state Mar 17, 2025

stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 886f66f to 268eab2 Compare March 17, 2025 23:22

stevenschlansker force-pushed the translit-lower-scalability-jmh branch 2 times, most recently from 3463237 to 94ec9ff Compare March 28, 2025 00:44

mihnita reviewed May 20, 2025

View reviewed changes

mihnita reviewed May 21, 2025

View reviewed changes

icu4j/perf-tests/pom.xml Outdated Show resolved Hide resolved

stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 94ec9ff to c76e791 Compare May 30, 2025 21:22

stevenschlansker commented May 30, 2025

View reviewed changes

icu4j/main/translit/src/main/java/com/ibm/icu/text/BreakTransliterator.java Show resolved Hide resolved

stevenschlansker commented May 30, 2025

View reviewed changes

icu4j/main/translit/src/main/java/com/ibm/icu/text/BreakTransliterator.java Show resolved Hide resolved

stevenschlansker force-pushed the translit-lower-scalability-jmh branch from c76e791 to 9bb5d33 Compare May 30, 2025 21:27

grhoten reviewed May 30, 2025

View reviewed changes

macchiati previously approved these changes May 30, 2025

View reviewed changes

icu4j/main/translit/src/main/java/com/ibm/icu/text/BreakTransliterator.java Show resolved Hide resolved

stevenschlansker dismissed macchiati’s stale review via 9715cf6 May 31, 2025 00:28

stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 9bb5d33 to 9715cf6 Compare May 31, 2025 00:28

macchiati previously approved these changes May 31, 2025

View reviewed changes

mihnita reviewed Jun 5, 2025

View reviewed changes

ICU-23061 Transliterator improve scalability by avoiding monitor cont…

bef0d2a

…ention Current implementation of lower, upper, title, etc synchronizes the whole operation, so it is always single-threaded Removing the shared state lets us utilize multiple CPU cores

stevenschlansker dismissed macchiati’s stale review via bef0d2a June 6, 2025 16:44

stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 9715cf6 to bef0d2a Compare June 6, 2025 16:44

stevenschlansker closed this Jun 6, 2025

stevenschlansker reopened this Jun 6, 2025

mihnita approved these changes Jun 9, 2025

View reviewed changes

Uh oh!

ICU-23061 LowercaseTransliterator has poor scalability due to synchronized state #3417

Are you sure you want to change the base?

ICU-23061 LowercaseTransliterator has poor scalability due to synchronized state #3417

Uh oh!

Conversation

Uh oh!

Checklist

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!