Hash table getOrElseUpdate and indexing 2.12 backport [ci: last-only] #5566

l0rinc · 2016-11-30T10:17:40Z

Backported by recommendation from @Ichoran and @retronym: #5528 (comment)

l0rinc · 2016-11-30T12:26:46Z

src/library/scala/collection/mutable/HashTable.scala

+    protected final def improve(hcode: Int, seed: Int): Int = {
+      val hash = scala.util.hashing.byteswap32(hcode)
+      val shift = seed & ((1 << 5) - 1)
+      (hash >>> shift) | (hash << (32 - shift))


@Ichoran, @retronym, I just noticed this can be changed to Integer.rotateRight.

~~Let's not merge it yet, let's see if that's more optimal (should be replaced with a single assembly instruction).~~

szeiger · 2016-12-12T16:11:40Z

/rebuild

sjrd · 2016-12-12T19:53:36Z

src/library/scala/collection/mutable/HashTable.scala

-    val shifted = (improved >> (32 - java.lang.Integer.bitCount(ones))) & ones
-    shifted
+    val exponent = Integer.numberOfLeadingZeros(ones)
+    (improve(hcode, seedvalue) >>> exponent) & ones


Now I realize: since ones is of the form 00...0011...11, I do not think there is any value of x for which (x >>> exponent) & ones is different from (x >>> exponent). It seems to me that & ones is redundant and can be dropped. That's because & ones masks off the exponent most significant bits of (x >>> exponent), which by construction already has 0's in all its exponent most significant bits.

I thought about that also (though I thought I still have to zero out the highest bit), but haven't validated my find yet.
Thanks, lemme' recheck :)

(Integer.MAX_VALUE >>> 32 & 0) != (Integer.MAX_VALUE >>> 32)

i.e. if the length is 1, ones will be 0, exponent will be 32, which would produce the same shift as 0 (i.e. 123 >>> 32 == 123) ... other than that (i.e. if length > 1) this seems to be true indeed.
Should I simply assert that the length >= 2 at the beginning?

Edit:

protected final def index(hcode: Int): Int = { assert(table.length >= 2) improve(hcode, seedvalue) >>> numberOfLeadingZeros(table.length - 1) }

or simply

protected final def index(hcode: Int): Int = improve(hcode, seedvalue) >>> numberOfLeadingZeros(table.length - 1)

?

Very good observation!

But I don't think we need to care. If length == 1, we do not care about the quality of hashes by definition, since everything enters the same bucket anyway. So this does not seem to be an issue. It probably deserves a comment, though.

Should I simply assert that the length >= 2 at the beginning?

No, assert would kill all the performance of this performance-sensitive method.

Oh I'm not paying enough attention tonight, it seems. This is the index method, which needs to return 0 for correctness when length == 1 (assuming that ever happens).

I don't know whether it is reasonable to assume that length is never 1. If it is, then we can drop & ones. If it isn't, we need to keep the code as it currently is, but then it would probably be a good thing to add a comment that & ones is there specifically to handle length == 1, in case someone wonders in the future.

l0rinc · 2016-12-12T22:26:06Z

src/library/scala/collection/mutable/HashTable.scala

-    val exponent = Integer.numberOfLeadingZeros(ones)
-    (improve(hcode, seedvalue) >>> exponent) & ones
-  }
+  protected final def index(hcode: Int): Int = if (table.length == 1) 0 else improve(hcode, seedvalue) >>> numberOfLeadingZeros(table.length - 1)


@sjrd, thanks for the comments, is this too awkward?
Branch prediction should eliminate this fork.

I don't know. At this point we're past my understanding of JVM performance. I would expect & ones (one CPU cycle with operands already loaded in registers so no stalling due to memory) to always be faster than a branch, even if correctly predicted and/or using a conditional move, but I cannot assert that.

Instruction Level Parallelism can achieve > 1 instruction/clock (IPC), so a predictable branch guarding an operation might in fact be faster than a branchless algorithm with a data dependency.

Thanks @retronym, and also, the JVM can detect that the first branch is never taken and eliminate it completely :)

l0rinc · 2016-12-13T14:20:06Z

@szeiger, what should I do to avoid all these build failures, I can't even check the source of the combined :/

szeiger · 2016-12-13T15:00:20Z

@paplorinc combined just means that there are earlier commits for which the build failed. I don't think we have a way to selectively rebuild those. Maybe @SethTisue knows more. Do you intend to squash this before merging? (i.o.w. is all of this still a backport or are there new improvements that should be merged forward?) This would trigger a build of the new commits anyway, so the problem should go away. Otherwise we can just ignore the failures.

lrytz · 2016-12-13T15:35:06Z

selectively rebuild those

You can open the corresponding "validate-main" build on jenkins, log in, and click "Rebuild"

l0rinc · 2016-12-13T17:52:04Z

Thanks @szeiger, squashed it :)
@lrytz, logged in to @scala-jenkins, now I get 404 and can only access it in incognito mode :)

lrytz · 2016-12-13T19:57:36Z

@lrytz, logged in to @scala-jenkins, now I get 404 and can only access it in incognito mode :)

uh, that's not very nice :) i knew that jenkins control is only available to a few, but causing 404 for others is rather unfriendly.. does it help to access https://scala-ci.typesafe.com/logout ?

adriaanm · 2016-12-19T21:54:59Z

After cherry-picking 9a24860 on 2.11.x (which could be good to include in this PR?), and cherry picking 26c87f1 from #5600 onto my local 2.12.x, the diff with 2.12.x is empty for the three files affected by this PR.

l0rinc · 2016-12-19T22:11:28Z

Added 9a24860 to this commit an rebased it.
Not sure I understood the rest, but I've rebased #5600 also.

adriaanm · 2016-12-19T22:17:04Z

Great, thanks! My comment was meant to indicate I did a quick check on this PR, making sure that the backport indeed got these files up to date with 2.12.x.

adriaanm · 2016-12-19T22:27:34Z

For your next backport, I'd suggest using git cherry-pick. For this PR, the command would have been git cherry-pick -x 9a24860 a501444 7952525 bc91223 b67ca7d 26c87f1 (the -x includes the original commit sha in the commit message for future reference)

(cherry picked from commit 9a24860)

(`ops/s`, smaller is better) `Before (9c5d3f8)`: ```scala [info] # Run complete. Total time: 00:08:15 [info] [info] Benchmark (size) Mode Cnt Score Error Units [info] s.c.immutable.VectorMapBenchmark.groupBy 10 avgt 20 645.594 ± 9.435 ns/op [info] s.c.immutable.VectorMapBenchmark.groupBy 100 avgt 20 2084.216 ± 37.814 ns/op [info] s.c.immutable.VectorMapBenchmark.groupBy 1000 avgt 20 19878.481 ± 262.404 ns/op [info] s.c.mutable.HashMapBenchmark.get 10 avgt 20 689.941 ± 5.850 ns/op [info] s.c.mutable.HashMapBenchmark.get 100 avgt 20 7357.330 ± 45.956 ns/op [info] s.c.mutable.HashMapBenchmark.get 1000 avgt 20 95767.200 ± 1550.771 ns/op [info] s.c.mutable.HashMapBenchmark.getOrElseUpdate 10 avgt 20 509.181 ± 2.683 ns/op [info] s.c.mutable.HashMapBenchmark.getOrElseUpdate 100 avgt 20 5563.301 ± 32.335 ns/op [info] s.c.mutable.HashMapBenchmark.getOrElseUpdate 1000 avgt 20 71965.365 ± 1809.738 ns/op [info] s.c.mutable.HashMapBenchmark.put 10 avgt 20 247.270 ± 3.972 ns/op [info] s.c.mutable.HashMapBenchmark.put 100 avgt 20 5646.185 ± 106.172 ns/op [info] s.c.mutable.HashMapBenchmark.put 1000 avgt 20 81303.663 ± 954.938 ns/op ``` `Changed modulo to bitwise and in hash calculation (4c729fe)`: ```scala [info] Benchmark (size) Mode Cnt Score Error Units [info] s.c.immutable.VectorMapBenchmark.groupBy 10 avgt 20 631.291 ± 9.269 ns/op [info] s.c.immutable.VectorMapBenchmark.groupBy 100 avgt 20 2077.885 ± 59.737 ns/op [info] s.c.immutable.VectorMapBenchmark.groupBy 1000 avgt 20 15458.278 ± 317.347 ns/op [info] s.c.mutable.HashMapBenchmark.get 10 avgt 20 678.013 ± 4.453 ns/op [info] s.c.mutable.HashMapBenchmark.get 100 avgt 20 7258.522 ± 76.088 ns/op [info] s.c.mutable.HashMapBenchmark.get 1000 avgt 20 94748.845 ± 1226.120 ns/op [info] s.c.mutable.HashMapBenchmark.getOrElseUpdate 10 avgt 20 498.042 ± 5.006 ns/op [info] s.c.mutable.HashMapBenchmark.getOrElseUpdate 100 avgt 20 5243.154 ± 110.372 ns/op [info] s.c.mutable.HashMapBenchmark.getOrElseUpdate 1000 avgt 20 68194.752 ± 655.436 ns/op [info] s.c.mutable.HashMapBenchmark.put 10 avgt 20 257.275 ± 1.411 ns/op [info] s.c.mutable.HashMapBenchmark.put 100 avgt 20 5318.532 ± 152.923 ns/op [info] s.c.mutable.HashMapBenchmark.put 1000 avgt 20 79607.160 ± 651.779 ns/op ``` `Optimized HashTable.index (6cc1504)`: ```scala [info] Benchmark (size) Mode Cnt Score Error Units [info] s.c.immutable.VectorMapBenchmark.groupBy 10 avgt 20 616.164 ± 4.712 ns/op [info] s.c.immutable.VectorMapBenchmark.groupBy 100 avgt 20 2034.447 ± 14.495 ns/op [info] s.c.immutable.VectorMapBenchmark.groupBy 1000 avgt 20 14712.164 ± 119.983 ns/op [info] s.c.mutable.HashMapBenchmark.get 10 avgt 20 679.046 ± 6.872 ns/op [info] s.c.mutable.HashMapBenchmark.get 100 avgt 20 7242.097 ± 41.244 ns/op [info] s.c.mutable.HashMapBenchmark.get 1000 avgt 20 95342.919 ± 1521.328 ns/op [info] s.c.mutable.HashMapBenchmark.getOrElseUpdate 10 avgt 20 488.034 ± 4.554 ns/op [info] s.c.mutable.HashMapBenchmark.getOrElseUpdate 100 avgt 20 4883.123 ± 59.268 ns/op [info] s.c.mutable.HashMapBenchmark.getOrElseUpdate 1000 avgt 20 65174.034 ± 496.759 ns/op [info] s.c.mutable.HashMapBenchmark.put 10 avgt 20 267.983 ± 1.797 ns/op [info] s.c.mutable.HashMapBenchmark.put 100 avgt 20 5097.351 ± 104.538 ns/op [info] s.c.mutable.HashMapBenchmark.put 1000 avgt 20 78772.540 ± 543.935 ns/op ``` Summary, i.e. the effect of this PR, according to the benchmarks: * `groupBy` has a `~35%` speedup * `get` didn't change * `getOrElseUpdate` has a `~10%` speedup * `put` has a `~3%` speedup Note: caching the `exponent` to a local private field (`Byte` or `Int`) didn't have any performance advantage (only a minor slowdown was measured, possibly because it's accessed via an interface now) (cherry picked from commit a501444)

(cherry picked from commit 7952525)

(cherry picked from commit bc91223)

Fixes https://issues.scala-lang.org/browse/SI-10049 Since `groupBy` uses this method extensively and suffered a measurable slowdown in `2.12.0`, this modification restores (and exceeds) its original speed. --- included benchmarks: (`ns/op` → smaller is better) `before (2.12.0):` ```java Benchmark (size) Mode Cnt Score Error Units s.c.immutable.VectorMapBenchmark.groupBy 10 avgt 20 865.693 ± 7.869 ns/op s.c.immutable.VectorMapBenchmark.groupBy 100 avgt 20 3095.657 ± 56.438 ns/op s.c.immutable.VectorMapBenchmark.groupBy 1000 avgt 20 28247.005 ± 470.513 ns/op s.c.mutable.HashMapBenchmark.get 10 avgt 20 679.448 ± 11.809 ns/op s.c.mutable.HashMapBenchmark.get 100 avgt 20 7240.178 ± 61.734 ns/op s.c.mutable.HashMapBenchmark.get 1000 avgt 20 95725.127 ± 2373.458 ns/op s.c.mutable.HashMapBenchmark.getOrElseUpdate 10 avgt 20 836.561 ± 20.085 ns/op s.c.mutable.HashMapBenchmark.getOrElseUpdate 100 avgt 20 7891.368 ± 56.808 ns/op s.c.mutable.HashMapBenchmark.getOrElseUpdate 1000 avgt 20 97478.629 ± 1782.497 ns/op s.c.mutable.HashMapBenchmark.put 10 avgt 20 243.422 ± 2.915 ns/op s.c.mutable.HashMapBenchmark.put 100 avgt 20 5810.927 ± 60.054 ns/op s.c.mutable.HashMapBenchmark.put 1000 avgt 20 82175.539 ± 1690.296 ns/op ``` `after:` ```java Benchmark (size) Mode Cnt Score Error Units s.c.immutable.VectorMapBenchmark.groupBy 10 avgt 20 627.007 ± 9.718 ns/op s.c.immutable.VectorMapBenchmark.groupBy 100 avgt 20 2086.955 ± 19.042 ns/op s.c.immutable.VectorMapBenchmark.groupBy 1000 avgt 20 19515.234 ± 173.647 ns/op s.c.mutable.HashMapBenchmark.get 10 avgt 20 683.977 ± 11.843 ns/op s.c.mutable.HashMapBenchmark.get 100 avgt 20 7345.675 ± 41.092 ns/op s.c.mutable.HashMapBenchmark.get 1000 avgt 20 95085.926 ± 1702.997 ns/op s.c.mutable.HashMapBenchmark.getOrElseUpdate 10 avgt 20 503.208 ± 2.643 ns/op s.c.mutable.HashMapBenchmark.getOrElseUpdate 100 avgt 20 5526.483 ± 28.262 ns/op s.c.mutable.HashMapBenchmark.getOrElseUpdate 1000 avgt 20 69265.900 ± 674.958 ns/op s.c.mutable.HashMapBenchmark.put 10 avgt 20 252.481 ± 7.597 ns/op s.c.mutable.HashMapBenchmark.put 100 avgt 20 5708.034 ± 110.360 ns/op s.c.mutable.HashMapBenchmark.put 1000 avgt 20 82051.378 ± 1432.009 ns/op ``` i.e. for the given benchmark conditions `~40%` faster `groupBy` and `getOrElseUpdate` (cherry picked from commit b67ca7d)

(cherry picked from commit 26c87f1)

l0rinc · 2016-12-19T22:40:57Z

You're right @adriaanm, thank you for your suggestion, force pushed it (excluding the benchmark modifications and 635b8af).
I squashed it previously as it kept failing.

adriaanm · 2016-12-19T22:51:03Z

Thanks! Also appreciate your patience with that sneaky genLoadIdent bug that's been plaguing the 2.11.x PR queue :-/ We've spent a lot of time chasing it already. I'm sure @retronym will corner it soon.

adriaanm · 2016-12-19T22:52:42Z

Since these are all backports, I'll add the [ci: last-only] marker so that only the last commit needs to pass the CI.

adriaanm · 2016-12-19T23:48:27Z

/rebuild

l0rinc · 2016-12-20T17:06:01Z

@SethTisue, thanks for your patience, this build seems to fail again for some weird reason too: https://scala-ci.typesafe.com/job/scala-2.11.x-validate-publish-core/2978/console

adriaanm · 2016-12-20T18:10:54Z

/rebuild

adriaanm · 2016-12-20T18:55:55Z

(The weird compiler crash is tracked as scala/scala-dev#251)

adriaanm · 2017-01-04T20:05:12Z

Rebased in #5626

scala-jenkins added this to the 2.11.9 milestone Nov 30, 2016

l0rinc commented Nov 30, 2016

View reviewed changes

l0rinc force-pushed the HashTable_2.12_backport branch 3 times, most recently from 4dab23f to 8ed464a Compare December 7, 2016 11:36

sjrd reviewed Dec 12, 2016

View reviewed changes

l0rinc force-pushed the HashTable_2.12_backport branch from 8ed464a to cd0a77b Compare December 12, 2016 22:24

l0rinc commented Dec 12, 2016

View reviewed changes

l0rinc mentioned this pull request Dec 13, 2016

Further small HashTable optimizations #5600

Merged

l0rinc force-pushed the HashTable_2.12_backport branch from 99a46f9 to cd0a77b Compare December 13, 2016 14:17

l0rinc force-pushed the HashTable_2.12_backport branch from cd0a77b to 347b9db Compare December 13, 2016 17:48

l0rinc force-pushed the HashTable_2.12_backport branch 2 times, most recently from a906ecd to 04259f5 Compare December 19, 2016 22:11

Carsten Varming and others added 6 commits December 20, 2016 00:33

SI-8774 Null link fields in mutable hash maps on removal.

78b0914

(cherry picked from commit 9a24860)

Changed modulo to bitwise AND in hash calculation

3a3a4ff

(cherry picked from commit 7952525)

Changed hashing bit rotation to use Integer.rotateRight

c7ddb41

(cherry picked from commit bc91223)

Simplify HashTable.index further

1ee7d21

(cherry picked from commit 26c87f1)

l0rinc force-pushed the HashTable_2.12_backport branch from 04259f5 to 1ee7d21 Compare December 19, 2016 22:39< F438 /relative-time>

adriaanm changed the title ~~Hash table getOrElseUpdate and indexing 2.12 backport~~ Hash table getOrElseUpdate and indexing 2.12 backport [ci: last-only] Dec 19, 2016

adriaanm mentioned this pull request Jan 4, 2017

[backport] Hash table getOrElseUpdate and indexing #5626

Merged

adriaanm closed this Jan 4, 2017

l0rinc deleted the HashTable_2.12_backport branch January 4, 2017 20:06

SethTisue removed this from the 2.11.9 milestone Feb 22, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Hash table getOrElseUpdate and indexing 2.12 backport [ci: last-only] #5566

Hash table getOrElseUpdate and indexing 2.12 backport [ci: last-only] #5566

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Hash table getOrElseUpdate and indexing 2.12 backport [ci: last-only] #5566

Hash table getOrElseUpdate and indexing 2.12 backport [ci: last-only] #5566

Uh oh!

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!