8000 Hash table getOrElseUpdate and indexing 2.12 backport [ci: last-only] by l0rinc · Pull Request #5566 · scala/scala · GitHub
[go: up one dir, main page]

Skip to content

Hash table getOrElseUpdate and indexing 2.12 backport [ci: last-only] #5566

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

l0rinc
Copy link
Contributor
@l0rinc l0rinc commented Nov 30, 2016

Backported by recommendation from @Ichoran and @retronym: #5528 (comment)

@scala-jenkins scala-jenkins added this to the 2.11.9 milestone Nov 30, 2016
protected final def improve(hcode: Int, seed: Int): Int = {
val hash = scala.util.hashing.byteswap32(hcode)
val shift = seed & ((1 << 5) - 1)
(hash >>> shift) | (hash << (32 - shift))
Copy link
Contributor Author
@l0rinc l0rinc Nov 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ichoran, @retronym, I just noticed this can be changed to Integer.rotateRight.

Let's not merge it yet, let's see if that's more optimal (should be replaced with a single assembly instruction).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed :)

@l0rinc l0rinc force-pushed the HashTable_2.12_backport branch 3 times, most recently from 4dab23f to 8ed464a Compare December 7, 2016 11:36
@szeiger
Copy link
Contributor
szeiger commented Dec 12, 2016

/rebuild

val shifted = (improved >> (32 - java.lang.Integer.bitCount(ones))) & ones
shifted
val exponent = Integer.numberOfLeadingZeros(ones)
(improve(hcode, seedvalue) >>> exponent) & ones
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I realize: since ones is of the form 00...0011...11, I do not think there is any value of x for which (x >>> exponent) & ones is different from (x >>> exponent). It seems to me that & ones is redundant and can be dropped. That's because & ones masks off the exponent most significant bits of (x >>> exponent), which by construction already has 0's in all its exponent most significant bits.

Copy link
Contributor Author
@l0rinc l0rinc Dec 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that also (though I thought I still have to zero out the highest bit), but haven't validated my find yet.
Thanks, lemme' recheck :)

Copy link
Contributor Author
@l0rinc l0rinc Dec 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Integer.MAX_VALUE >>> 32 & 0) != (Integer.MAX_VALUE >>> 32)

i.e. if the length is 1, ones will be 0, exponent will be 32, which would produce the same shift as 0 (i.e. 123 >>> 32 == 123) ... other than that (i.e. if length > 1) this seems to be true indeed.
Should I simply assert that the length >= 2 at the beginning?

Edit:

protected final def index(hcode: Int): Int = {
  assert(table.length >= 2)
  improve(hcode, seedvalue) >>> numberOfLeadingZeros(table.length - 1)
}

or simply

protected final def index(hcode: Int): Int = improve(hcode, seedvalue) >>> numberOfLeadingZeros(table.length - 1)

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good observation!

But I don't think we need to care. If length == 1, we do not care about the quality of hashes by definition, since everything enters the same bucket anyway. So this does not seem to be an issue. It probably deserves a comment, though.

Should I simply assert that the length >= 2 at the beginning?

No, assert would kill all the performance of this performance-sensitive method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I'm not paying enough attention tonight, it seems. This is the index method, which needs to return 0 for correctness when length == 1 (assuming that ever happens).

I don't know whether it is reasonable to assume that length is never 1. If it is, then we can drop & ones. If it isn't, we need to keep the code as it currently is, but then it would probably be a good thing to add a comment that & ones is there specifically to handle length == 1, in case someone wonders in the future.

@l0rinc l0rinc force-pushed the HashTable_2.12_backport branch from 8ed464a to cd0a77b Compare December 12, 2016 22:24
val exponent = Integer.numberOfLeadingZeros(ones)
(improve(hcode, seedvalue) >>> exponent) & ones
}
protected final def index(hcode: Int): Int = if (table.length == 1) 0 else improve(hcode, seedvalue) >>> numberOfLeadingZeros(table.length - 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sjrd, thanks for the comments, is this too awkward?
Branch prediction should eliminate this fork.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. At this point we're past my understanding of JVM performance. I would expect & ones (one CPU cycle with operands already loaded in registers so no stalling due to memory) to always be faster than a branch, even if correctly predicted and/or using a conditional move, but I cannot assert that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instruction Level Parallelism can achieve > 1 instruction/clock (IPC), so a predictable branch guarding an operation might in fact be faster than a branchless algorithm with a data dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @retronym, and also, the JVM can detect that the first branch is never taken and eliminate it completely :)

@l0rinc
Copy link
Contributor Author
l0rinc commented Dec 13, 2016 8000

@szeiger, what should I do to avoid all these build failures, I can't even check the source of the combined :/

@szeiger
Copy link
Contributor
szeiger commented Dec 13, 2016

@paplorinc combined just means that there are earlier commits for which the build failed. I don't think we have a way to selectively rebuild those. Maybe @SethTisue knows more. Do you intend to squash this before merging? (i.o.w. is all of this still a backport or are there new improvements that should be merged forward?) This would trigger a build of the new commits anyway, so the problem should go away. Otherwise we can just ignore the failures.

@lrytz
Copy link
Member
lrytz commented Dec 13, 2016

selectively rebuild those

You can open the corresponding "validate-main" build on jenkins, log in, and click "Rebuild"

@l0rinc l0rinc force-pushed the HashTable_2.12_backport branch from cd0a77b to 347b9db Compare December 13, 2016 17:48
@l0rinc
Copy link
Contributor Author
l0rinc commented Dec 13, 2016

Thanks @szeiger, squashed it :)
@lrytz, logged in to @scala-jenkins, now I get 404 and can only access it in incognito mode :)

@lrytz
Copy link
Member
lrytz commented Dec 13, 2016

@lrytz, logged in to @scala-jenkins, now I get 404 and can only access it in incognito mode :)

uh, that's not very nice :) i knew that jenkins control is only available to a few, but causing 404 for others is rather unfriendly.. does it help to access https://scala-ci.typesafe.com/logout ?

@adriaanm
Copy link
Contributor

After cherry-picking 9a24860 on 2.11.x (which could be good to include in this PR?), and cherry picking 26c87f1 from #5600 onto my local 2.12.x, the diff with 2.12.x is empty for the three files affected by this PR.

@l0rinc l0rinc force-pushed the HashTable_2.12_backport branch 2 times, most recently from a906ecd to 04259f5 Compare December 19, 2016 22:11
@l0rinc
Copy link
Contributor Author
l0rinc commented Dec 19, 2016

Added 9a24860 to this commit an rebased it.
Not sure I understood the rest, but I've rebased #5600 also.

@adriaanm
Copy link
Contributor

Great, thanks! My comment was meant to indicate I did a quick check on this PR, making sure that the backport indeed got these files up to date with 2.12.x.

@adriaanm
Copy link
Contributor

For your next backport, I'd suggest using git cherry-pick. For this PR, the command would have been git cherry-pick -x 9a24860 a501444 7952525 bc91223 b67ca7d 26c87f1 (the -x includes the original commit sha in the commit message for future reference)

Carsten Varming and others added 6 commits December 20, 2016 00:33
(`ops/s`, smaller is better)

`Before (9c5d3f8)`:
```scala
[info] # Run complete. Total time: 00:08:15
[info]
[info] Benchmark                                     (size)  Mode  Cnt      Score      Error  Units
[info] s.c.immutable.VectorMapBenchmark.groupBy          10  avgt   20    645.594 ±    9.435  ns/op
[info] s.c.immutable.VectorMapBenchmark.groupBy         100  avgt   20   2084.216 ±   37.814  ns/op
[info] s.c.immutable.VectorMapBenchmark.groupBy        1000  avgt   20  19878.481 ±  262.404  ns/op
[info] s.c.mutable.HashMapBenchmark.get                  10  avgt   20    689.941 ±    5.850  ns/op
[info] s.c.mutable.HashMapBenchmark.get                 100  avgt   20   7357.330 ±   45.956  ns/op
[info] s.c.mutable.HashMapBenchmark.get                1000  avgt   20  95767.200 ± 1550.771  ns/op
[info] s.c.mutable.HashMapBenchmark.getOrElseUpdate      10  avgt   20    509.181 ±    2.683  ns/op
[info] s.c.mutable.HashMapBenchmark.getOrElseUpdate     100  avgt   20   5563.301 ±   32.335  ns/op
[info] s.c.mutable.HashMapBenchmark.getOrElseUpdate    1000  avgt   20  71965.365 ± 1809.738  ns/op
[info] s.c.mutable.HashMapBenchmark.put                  10  avgt   20    247.270 ±    3.972  ns/op
[info] s.c.mutable.HashMapBenchmark.put                 100  avgt   20   5646.185 ±  106.172  ns/op
[info] s.c.mutable.HashMapBenchmark.put                1000  avgt   20  81303.663 ±  954.938  ns/op
```

`Changed modulo to bitwise and in hash calculation (4c729fe)`:
```scala
[info] Benchmark                                     (size)  Mode  Cnt      Score      Error  Units
[info] s.c.immutable.VectorMapBenchmark.groupBy          10  avgt   20    631.291 ±    9.269  ns/op
[info] s.c.immutable.VectorMapBenchmark.groupBy         100  avgt   20   2077.885 ±   59.737  ns/op
[info] s.c.immutable.VectorMapBenchmark.groupBy        1000  avgt   20  15458.278 ±  317.347  ns/op
[info] s.c.mutable.HashMapBenchmark.get                  10  avgt   20    678.013 ±    4.453  ns/op
[info] s.c.mutable.HashMapBenchmark.get                 100  avgt   20   7258.522 ±   76.088  ns/op
[info] s.c.mutable.HashMapBenchmark.get                1000  avgt   20  94748.845 ± 1226.120  ns/op
[info] s.c.mutable.HashMapBenchmark.getOrElseUpdate      10  avgt   20    498.042 ±    5.006  ns/op
[info] s.c.mutable.HashMapBenchmark.getOrElseUpdate     100  avgt   20   5243.154 ±  110.372  ns/op
[info] s.c.mutable.HashMapBenchmark.getOrElseUpdate    1000  avgt   20  68194.752 ±  655.436  ns/op
[info] s.c.mutable.HashMapBenchmark.put                  10  avgt   20    257.275 ±    1.411  ns/op
[info] s.c.mutable.HashMapBenchmark.put                 100  avgt   20   5318.532 ±  152.923  ns/op
[info] s.c.mutable.HashMapBenchmark.put                1000  avgt   20  79607.160 ±  651.779  ns/op

```

`Optimized HashTable.index (6cc1504)`:
```scala
[info] Benchmark                                     (size)  Mode  Cnt      Score      Error  Units
[info] s.c.immutable.VectorMapBenchmark.groupBy          10  avgt   20    616.164 ±    4.712  ns/op
[info] s.c.immutable.VectorMapBenchmark.groupBy         100  avgt   20   2034.447 ±   14.495  ns/op
[info] s.c.immutable.VectorMapBenchmark.groupBy        1000  avgt   20  14712.164 ±  119.983  ns/op
[info] s.c.mutable.HashMapBenchmark.get                  10  avgt   20    679.046 ±    6.872  ns/op
[info] s.c.mutable.HashMapBenchmark.get                 100  avgt   20   7242.097 ±   41.244  ns/op
[info] s.c.mutable.HashMapBenchmark.get                1000  avgt   20  95342.919 ± 1521.328  ns/op
[info] s.c.mutable.HashMapBenchmark.getOrElseUpdate      10  avgt   20    488.034 ±    4.554  ns/op
[info] s.c.mutable.HashMapBenchmark.getOrElseUpdate     100  avgt   20   4883.123 ±   59.268  ns/op
[info] s.c.mutable.HashMapBenchmark.getOrElseUpdate    1000  avgt   20  65174.034 ±  496.759  ns/op
[info] s.c.mutable.HashMapBenchmark.put                  10  avgt   20    267.983 ±    1.797  ns/op
[info] s.c.mutable.HashMapBenchmark.put                 100  avgt   20   5097.351 ±  104.538  ns/op
[info] s.c.mutable.HashMapBenchmark.put                1000  avgt   20  78772.540 ±  543.935  ns/op
```

Summary, i.e. the effect of this PR, according to the benchmarks:
* `groupBy` has a `~35%` speedup
* `get` didn't change
* `getOrElseUpdate` has a `~10%` speedup
* `put` has a `~3%` speedup

Note: caching the `exponent` to a local private field (`Byte` or `Int`) didn't have any performance advantage (only a minor slowdown was measured, possibly because it's accessed via an interface now)
(cherry picked from commit a501444)
Fixes https://issues.scala-lang.org/browse/SI-10049

Since `groupBy` uses this method extensively and suffered a measurable slowdown in `2.12.0`, this modification restores (and exceeds) its original speed.

---
included benchmarks:

(`ns/op` → smaller is better)

`before (2.12.0):`
```java
Benchmark                                     (size)  Mode  Cnt      Score      Error  Units
s.c.immutable.VectorMapBenchmark.groupBy          10  avgt   20    865.693 ±    7.869  ns/op
s.c.immutable.VectorMapBenchmark.groupBy         100  avgt   20   3095.657 ±   56.438  ns/op
s.c.immutable.VectorMapBenchmark.groupBy        1000  avgt   20  28247.005 ±  470.513  ns/op
s.c.mutable.HashMapBenchmark.get                  10  avgt   20    679.448 ±   11.809  ns/op
s.c.mutable.HashMapBenchmark.get                 100  avgt   20   7240.178 ±   61.734  ns/op
s.c.mutable.HashMapBenchmark.get                1000  avgt   20  95725.127 ± 2373.458  ns/op
s.c.mutable.HashMapBenchmark.getOrElseUpdate      10  avgt   20    836.561 ±   20.085  ns/op
s.c.mutable.HashMapBenchmark.getOrElseUpdate     100  avgt   20   7891.368 ±   56.808  ns/op
s.c.mutable.HashMapBenchmark.getOrElseUpdate    1000  avgt   20  97478.629 ± 1782.497  ns/op
s.c.mutable.HashMapBenchmark.put                  10  avgt   20    243.422 ±    2.915  ns/op
s.c.mutable.HashMapBenchmark.put                 100  avgt   20   5810.927 ±   60.054  ns/op
s.c.mutable.HashMapBenchmark.put                1000  avgt   20  82175.539 ± 1690.296  ns/op
```
`after:`
```java
Benchmark                                     (size)  Mode  Cnt      Score      Error  Units
s.c.immutable.VectorMapBenchmark.groupBy          10  avgt   20    627.007 ±    9.718  ns/op
s.c.immutable.VectorMapBenchmark.groupBy         100  avgt   20   2086.955 ±   19.042  ns/op
s.c.immutable.VectorMapBenchmark.groupBy        1000  avgt   20  19515.234 ±  173.647  ns/op
s.c.mutable.HashMapBenchmark.get                  10  avgt   20    683.977 ±   11.843  ns/op
s.c.mutable.HashMapBenchmark.get                 100  avgt   20   7345.675 ±   41.092  ns/op
s.c.mutable.HashMapBenchmark.get                1000  avgt   20  95085.926 ± 1702.997  ns/op
s.c.mutable.HashMapBenchmark.getOrElseUpdate      10  avgt   20    503.208 ±    2.643  ns/op
s.c.mutable.HashMapBenchmark.getOrElseUpdate     100  avgt   20   5526.483 ±   28.262  ns/op
s.c.mutable.HashMapBenchmark.getOrElseUpdate    1000  avgt   20  69265.900 ±  674.958  ns/op
s.c.mutable.HashMapBenchmark.put                  10  avgt   20    252.481 ±    7.597  ns/op
s.c.mutable.HashMapBenchmark.put                 100  avgt   20   5708.034 ±  110.360  ns/op
s.c.mutable.HashMapBenchmark.put                1000  avgt   20  82051.378 ± 1432.009  ns/op
```

i.e. for the given benchmark conditions `~40%` faster `groupBy` and `getOrElseUpdate`

(cherry picked from commit b67ca7d)
(cherry picked from commit 26c87f1)
@l0rinc
Copy link
Contributor Author
l0rinc commented Dec 19, 2016

You're right @adriaanm, thank you for your suggestion, force pushed it (excluding the benchmark modifications and 635b8af).
I squashed it previously as it kept failing.

@adriaanm
Copy link
Contributor

Thanks! Also appreciate your patience with that sneaky genLoadIdent bug that's been plaguing the 2.11.x PR queue :-/ We've spent a lot of time chasing it already. I'm sure @retronym will corner it soon.

@adriaanm
Copy link
Contributor

Since these are all backports, I'll add the [ci: last-only] marker so that only the last commit needs to pass the CI.

@adriaanm adriaanm changed the title Hash table getOrElseUpdate and indexing 2.12 backport Hash table getOrElseUpdate and indexing 2.12 backport [ci: last-only] Dec 19, 2016
@adriaanm
Copy link
Contributor

/rebuild

@l0rinc
Copy link
Contributor Author
l0rinc commented Dec 20, 2016

@SethTisue, thanks for your patience, this build seems to fail again for some weird reason too: https://scala-ci.typesafe.com/job/scala-2.11.x-validate-publish-core/2978/console

@adriaanm
Copy link
Contributor

/rebuild

@adriaanm
Copy link
Contributor

(The weird compiler crash is tracked as scala/scala-dev#251)

@adriaanm
Copy link
Contributor
adriaanm commented Jan 4, 2017

Rebased in #5626

@adriaanm adriaanm closed this Jan 4, 2017
@l0rinc l0rinc deleted the HashTable_2.12_backport branch January 4, 2017 20:06
@SethTisue SethTisue removed this from the 2.11.9 milestone Feb 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0