8000 Changed HashMap.getOrElseUpdate to only calculate the index once by l0rinc · Pull Request #5528 · scala/scala · GitHub
[go: up one dir, main page]

Skip to content

Changed HashMap.getOrElseUpdate to only calculate the index once #5528

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in 8000 to your account

Merged
merged 3 commits into from
Nov 22, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 31 additions & 0 deletions src/library/scala/collection/mutable/HashMap.scala
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,37 @@ extends AbstractMap[A, B]
else Some(e.value)
}

override def getOrElseUpdate(key: A, defaultValue: => B): B = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case the element was missing from the map, the hash code (with some added variance) was calculated first for verification, and again for the put operation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is a binary-compatible change. @retronym - is this? I would not intuitively have thought it would be with the new trait encoding, but I'm not sure of the new rules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ichoran It should be binary compatible. A class gets a forwarder method that delegates to the static implementation method of the interface. Adding an override merely replaces the implementation of the class method without any changes to the signatures. I'll add a proper test case for this to MiMa.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala.collection.mutable.HashMap::getOrElseUpdate (48 bytes) inline (hot)

val i = index(elemHashCode(key))
val entry = findEntry(key, i)
if (entry != null) entry.value
else addEntry(createNewEntry(key, defaultValue), i)
}

/* inlined HashTable.findEntry0 to preserve its visibility */
private[this] def findEntry(key: A, h: Int): Entry = {
var e = table(h).asInstanceOf[Entry]
while (notFound(key, e))
e = e.next
e
}
private[this] def notFound(key: A, e: Entry): Boolean = (e != null) && !elemEquals(e.key, key)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extracted to make findEntry inlinable (and the method more readable):
scala.collection.mutable.HashMap::findEntry (32 bytes) inline (hot)


/* inlined HashTable.addEntry0 to preserve its visibility */
private[this] def addEntry(e: Entry, h: Int): B = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala.collection.mutable.HashMap::addEntry (30 bytes) inline (hot)

if (tableSize >= threshold) addEntry(e)
else addEntry0(e, h)
e.value
}

/* extracted to make addEntry inlinable */
private[this] def addEntry0(e: Entry, h: Int) {
e.next = table(h).asInstanceOf[Entry]
table(h) = e
tableSize += 1
nnSizeMapAdd(h)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation looks fine, but what about using findOrAddEntry in HashTable? Does that still show the same speed improvement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is very good, but we don't have a value in HashTable's Entry, only key and next.

The other alternative would be to call findOrAddEntry directly, but then defaultValue would be evaluated twice, which should probably be avoided, right?

override def getOrElseUpdate(key: A, defaultValue: => B): B = {
  val e = findOrAddEntry(key, defaultValue)
  if (e ne null) e.value else defaultValue
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which should probably be avoided, right?

Correct. Also, we mustn't evaluate defaultValue if the key already exists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, didn't read the signatures correctly. It's not evaluating twice which is a problem; it's evaluating more than zero times when the entry exists. You're right; we can't do it this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
In this particular case I think it cannot be evaluated once, only zero or two times (doesn't change the fact).


override def put(key: A, value: B): Option[B] = {
val e = findOrAddEntry(key, value)
if (e eq null) None
Expand Down
7 changes: 2 additions & 5 deletions test/benchmarks/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,7 @@ that makes use of the [SBT plugin](https://github.com/ktoso/sbt-jmh) for [JMH](h

## Running a benchmark

The benchmarks require first building Scala into `../../build/pack` with `ant`.
If you want to build with `sbt dist/mkPack` instead,
you'll need to change `scalaHome` in this project.
The benchmarks require first building Scala into `../../build/pack`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sbt does it correctly also, no need for ant anmore


You'll then need to know the fully-qualified name of the benchmark runner class.
The benchmarking classes are organized under `src/main/scala`,
Expand All @@ -18,8 +16,7 @@ Using this example, one would simply run

jmh:runMain scala.collection.mutable.OpenHashMapRunner
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how to run only a single benchmark, sbt jmh:run runs everything, even if I provide a pattern after it :/


in SBT.
SBT should be run _from this directory_.
in SBT, run _from this directory_ (`test/benchmarks`).

The JMH results can be found under `target/jmh-results/`.
`target` gets deleted on an SBT `clean`,
Expand Down
4 changes: 2 additions & 2 deletions test/benchmarks/build.sbt
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
scalaHome := Some(file("../../build/pack"))
scalaVersion := "2.12.0-dev"
scalaVersion := "2.12.1-dev"
scalacOptions ++= Seq("-feature", "-opt:l:classpath")

lazy val root = (project in file(".")).
enablePlugins(JmhPlugin).
settings(
name := "test-benchmarks",
version := "0.0.1",
libraryDependencies += "org.openjdk.jol" % "jol-core" % "0.4"
libraryDependencies += "org.openjdk.jol" % "jol-core" % "0.6"
)
2 changes: 1 addition & 1 deletion test/benchmarks/project/plugins.sbt
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.2.16")
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.2.17")
Copy link
Contributor Author
@l0rinc l0rinc Nov 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent a PR to update SBT-JMH, it was accepted and the new version containing the latest JMH was included here

Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
package scala.collection.immutable

import org.openjdk.jmh.annotations._
import org.openjdk.jmh.infra._
import org.openjdk.jmh.runner.IterationType
import benchmark._
import java.util.concurrent.TimeUnit

@BenchmarkMode(Array(Mode.AverageTime))
@Fork(2)
@Threads(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
class VectorMapBenchmark {
@Param(Array("10", "100", "1000"))
var size: Int = _

var values: Vector[Any] = _

@Setup(Level.Trial) def initKeys(): Unit = {
values = (0 to size).map(i => (i % 4) match {
case 0 => i.toString
case 1 => i.toChar
case 2 => i.toDouble
case 3 => i.toInt
}).toVector
}

@Benchmark def groupBy = values.groupBy(_.getClass)
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
package scala.collection.mutable

import org.openjdk.jmh.annotations._
import org.openjdk.jmh.infra._
import org.openjdk.jmh.runner.IterationType
import benchmark._
import java.util.concurrent.TimeUnit

import scala.collection.mutable

@BenchmarkMode(Array(Mode.AverageTime))
@Fork(2)
@Threads(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
class HashMapBenchmark {
@Param(Array("10", "100", "1000"))
var size: Int = _

var existingKeys: Array[Any] = _
var missingKeys: Array[Any] = _

@Setup(Level.Trial) def initKeys(): Unit = {
existingKeys = (0 to size).map(i => (i % 4) match {
case 0 => i.toString
case 1 => i.toChar
case 2 => i.toDouble
case 3 => i.toInt
}).toArray
missingKeys = (size to 2 * size).toArray
}

var map = new mutable.HashMap[Any, Any]

@Setup(Level.Invocation) def initializeMutable = existingKeys.foreach(v => map.put(v, v))

@TearDown(Level.Invocation) def tearDown = map.clear()

@Benchmark def getOrElseUpdate(bh: Blackhole): Unit = {
var i = 0;
while (i < size) {
bh.consume(map.getOrElseUpdate(existingKeys(i), -1))
bh.consume(map.getOrElseUpdate(missingKeys(i), -1))
i += 1
}
}

@Benchmark def get(bh: Blackhole): Unit = {
Copy link
Contributor Author
@l0rinc l0rinc Nov 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added get and put also, even though they weren't affected by this commit

var i = 0;
while (i < size) {
bh.consume(map.get(existingKeys(i), -1))
bh.consume(map.get(missingKeys(i), -1))
i += 1
}
}

@Benchmark def put(bh: Blackhole): Any = {
var map = new mutable.HashMap[Any, Any]

var i = 0;
while (i < size) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed for comprehension to while for better SNR

map.put(existingKeys(i), i)
i += 1
}

map
}
}
0