8000 Immutable TreeMap/TreeSet performance (SI-5331) by erikrozendaal · Pull Request #82 · scala/scala · GitHub
[go: up one dir, main page]

Skip to content

Immutable TreeMap/TreeSet performance (SI-5331) #82

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Feb 16, 2012

Conversation

erikrozendaal
Copy link
Contributor

As discussed on the scala-internals mailing list. There are some further improvements related to performance and code.

Note: This breaks test/files/run/t2873.scala, since it uses RedBlack#Empty for testing for the absence of a compiler bug. I don't know how to fix this test so that it still tests for the same compiler bug.

This turns iterator creation from an O(n) operation into an
O(log n) operation. Unfortunately, it halves actual iteration
speed (consuming the iterator fully), probably due to the many
by-name closures that are needed.
performance is much better than '++' based iterator.
to ensure TreeSet/TreeMap 'range' operations are O(log n) instead
of O(n).
TreeMap/TreeSet by splitting the underlying RedBlack tree. This
makes the operation O(log n) instead of O(n) and allows more
structural sharing.
changes the operation from O(n log n) to O(n) and allows for more
structural sharing.
Use ArrayStack instead of Stack in TreeIterator for slightly increased
performance.
@paulp
Copy link
Contributor
paulp commented Dec 28, 2011

That's awesome. Have you signed the SCA? http://www.scala-lang.org/sites/default/files/contributor_agreement.pdf I can't merge this much work until that's in place.

@erikrozendaal
Copy link
Contributor Author

Thanks for reminding me. Just mailed the signed SCA to Danielle at EPFL according to the instructions.

def update[B1 >: B](k: A, v: B1)(implicit ordering: Ordering[A]): Tree[A, B1] = blacken(upd(k, v))
def delete(k: A)(implicit ordering: Ordering[A]): Tree[A, B] = blacken(del(k))
def range(from: Option[A], until: Option[A])(implicit ordering: Ordering[A]): Tree[A, B] = blacken(rng(from, until))
def foreach[U](f: ((A, B)) => U)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. By passing the Ordering on each method, you make it possible for the tree to be inconsistent. Sure, someone has to work at it, and having the class private helps prevent such misuse, but the very possibility indicates the ordering is in the wrong place. I really dislike this design.

Instead, the tree itself should have an Ordering (and getting A back) 8000 , which kind of gets back to RedBlack being a class and its subclasses having a hidden pointer to it -- which would be required to access the Ordering. However, as long as TreeMap and TreeSet don't extend RedBlack, it shouldn't be a problem. RedBlack becomes just a store for the Ordering shared by the nodes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree w Daniel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was another reason I feel RedBlack (as it currently is in this pull request) should be private[immutable]. This way TreeMap/TreeSet can ensure a consistent Ordering[A] is always used.

Making RedBlack a class again could help with this. The costs are additional instances of this class (including the Empty nested object) and the additional $$outer pointer in the Tree nodes (this may or may not cause additional overhead, depending on alignment, etc). At least on MacOS X JDK 1.6.0_29 with compressed oops there was additional memory usage per node.

So definitely a tradeoff between performance and potential correctness.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an understandable tradeoff. conserving memory is important for datastructures.

The class is marked as deprecated and no longer used by the TreeMap/TreeSet
implementation but is restored in case it was used by anyone else (since
it was not marked as private to the Scala collection library).

Renamed RedBlack.{Tree,RedTree,BlackTree} to Node, RedNode, and BlackNode
to work around name clash with RedBlack class.
@erikrozendaal
Copy link
Contributor Author

I restored the old implementation of RedBlack class in the commit erikrozendaal@f656142 and marked the class as deprecated.

This required renaming the {Red,Black}Tree classes in the RedBlack object to avoid naming conflicts. I renamed them to {Red,Black}Node. Another option is to rename the RedBlack object to something else instead (so it is no longer a companion of the deprecated RedBlack class).

Let me know what you think and I'll adjust it and add it to this pull request.

This more clearly separates the new implementation from the now
deprecated abstract class RedBlack and avoids naming conflicts
for the member classes.
@erikrozendaal
Copy link
Contributor Author

I added the null based implementation to this pull request. The old RedBlack class has also been restored and the new implementation is named RedBlackTree. I wonder if it is worth the effort to rewrite the history so that all binary compatible changes are at the start and the new RedBlackTree based implementation is added later. This would make integration with the next 2.9.x release easier. I could also just make a separate pull request for 2.9.x that only contains the binary compatible changes (mainly improvements to head/last/iterator/toStream).

@ijuma
Copy link
Contributor
ijuma commented Jan 7, 2012

The fact that micro-benchmarks on HotSpot are hard is more reason not to break the rules. There is enough uncertainty as it is without adding more. Also, if you want to reduce overhead, do not use Range (use a while loop). At least until the optimization efforts on that are finished.

Also simplified implementation of span to just use splitAt.
@erikrozendaal
Copy link
Contributor Author

Good point! I'll rerun the benchmarks using the following benchmark method:

  def bench[A, B](name: String, tree: TreeMap[A, B], count: Int)(block: TreeMap[A, B] => Int => Int): Unit = {
    print("%-25s: ".format(name))
    val f = block(tree)
    var result = 0
    val elapsed = time {
      var i = 0
      while (i < count) {
        result += f(i)
        i += 1
      }
    }
    println("size: %7d: %7d iterations in %6.3f seconds (%12.2f per second): result = %s".format(tree.size, count, elapsed, count / elapsed, result))
  }

    val benches = Map[String, (TreeMap[java.lang.Integer, java.lang.Integer] => Int => Int)](
      "lookup" -> { tree => n => if (tree.contains(rnd(n))) 1 else 0 },
      "add" -> { tree => n => tree.updated(rnd(n), n).size },
      "add-remove" -> { tree => n => (tree.updated(rnd(n), n) - rnd(n)).size },
      "head" -> { tree => n => tree.head._1 },
      "last" -> { tree => n => tree.last._1 },
      "tail" -> { tree => n => tree.tail.size },
      "init" -> { tree => n => tree.init.size },
      "iterator" -> { tree => n => if (tree.iterator.hasNext) 1 else 0 },
      "iterator.next" -> { tree => n => tree.iterator.next._1 },
      "iterator.size" -> { tree => n => tree.iterator.size },
      // etc...
  )

I'll also run one benchmark at a time (in a single JVM), this seems to give best/most consistent results.

@dcsobral
Copy link
Contributor
dcsobral commented Jan 7, 2012

You should use caliper to benchmark. Look at the fur foreach benchmark
project on my github as a basis. It's easy to compare multiple compiler
versions with it too.
Em 07/01/2012 16:50, "Erik Rozendaal" <
reply@reply.github.com>
escreveu:

Good point! I'll rerun the benchmarks using the following benchmark method:

 def bench[A, B](name: String, tree: TreeMap[A, B], count: Int)(block:
TreeMap[A, B] => Int => Int): Unit = {
   print("%-25s: ".format(name))
   val f = block(tree)
   var result = 0
   val elapsed = time {
     var i = 0
     while (i < count) {
       result += f(i)
       i += 1
     }
   }
   println("size: %7d: %7d iterations in %6.3f seconds (%12.2f per
second): result = %s".format(tree.size, count, elapsed, count / elapsed,
result))
 }

   val benches = Map[String, (TreeMap[java.lang.Integer,
java.lang.Integer] => Int => Int)](
     "lookup" -> { tree => n => if (tree.contains(rnd(n))) 1 else 0 },
     "add" -> { tree => n => tree.updated(rnd(n), n).size },
     "add-remove" -> { tree => n => (tree.updated(rnd(n), n) -
rnd(n)).size },
     "head" -> { tree => n => tree.head._1 },
     "last" -> { tree => n => tree.last._1 },
     "tail" -> { tree => n => tree.tail.size },
     "init" -> { tree => n => tree.init.size },
     "iterator" -> { tree => n => if (tree.iterator.hasNext) 1 else 0 },
     "iterator.next" -> { tree => n => tree.iterator.next._1 },
     "iterator.size" -> { tree => n => tree.iterator.size },
     // etc...
 )

I'll also run one benchmark at a time (in a single JVM), this seems to
give best/most consistent results.


Reply to this email directly or view it on GitHub:
#82 (comment)

@erikrozendaal
Copy link
Contributor Author

Here are the updated benchmark results. Complete results can be found at https://gist.github.com/1576263 (except for the java.util.TreeMap output, which accidentally overwrote). Numbers changed, sometimes radically, but I think the main conclusions are still the same.

#82with nulls, use getterswith nulls, inlined gettersjava.util.TreeMapimmutable.HashMap
lookup3,801,7797,362,1957,397,6777,572,08524,217,850
add1,192,2011,794,5932,427,5494,528,338
add & remove392,738867,4021,016,9173,528,6672,465,620
head28,917,10340,490,92861,891,42757,270,76041,516,223
last25,028,03632,569,79352,351,47656,938,347
iterator (create)4,667,90327,797,58427,981,03472,102,692366,207,914
iterator (consume first)4,567,21124,348,63424,282,58037,049,43242,535,377
foreach7,22812,54112,6949,231
iterator (consume all)3,58310,0529,8959,1476,202
drop(5)256,134632,225696,058
take(5)2,044,9544,463,0867,529,279
splitAt(size / 2)188,260367,508484,105
range(-1e8, 1e8).head382,497693,286880,98112,473,028

@erikrozendaal
Copy link
Contributor Author

Here are the results for OpenJDK build 1.7.0-u4-b05-20120106. It's nice to see the pattern match code got a decent speed boost (good news for more idiomatic Scala code with a lot of pattern matching). Also inlining of field access has less effect now. Still, the null based code is still the faster version of RedBlack.

PS Don't forget that the iterator based comparison is not fair between "#82" and the null based versions because additional algorithmic improvements were made in the meantime.
PPS All these numbers are running in 64-bit mode with compressed OOPS enabled, just like the previous 1.6.0_29 based results.

#82 (pattern matching, singleton empty)with nulls, use getterswith nulls, inlined gettersjava.util.TreeMapimmutable.HashMap
lookup4,507,1307,299,7007,240,0647,181,13829,847,773
add1,746,6362,534,2942,553,9124,696,720
add & remove756,1581,072,5701,060,5223,496,4552,608,751
head25,772,81543,935,45358,751,12259,795,62343,097,982
last22,547,78337,729,65352,286,15859,816,398
iterator (create)5,353,56428,918,38628,923,35473,329,891375,772,141
iterator (consume first)5,572,96423,698,29223,899,98535,495,58642,454,982
foreach8,05412,52312,4119,541
iterator (consume all)5,27411,10513,52110,5748,997
drop(5)394,074749,614698,918
take(5)3,680,5817,920,0217,778,135
splitAt(size / 2)351,272500,936527,641
range(-1e8, 1e8).head645,592945,271958,67525,801,491

@pavelpavlov
Copy link
Contributor

Erik,

Please take a look at 'to' method implementation.
From the 4 subrange methods defined in scala.collection.generic.Sorted three (from, range, until) are implemented as single call to rangeImpl unlike 'to' which require two calls to 'rangeImpl' + creating and twice advancing subtree iterator in the worst case.

I wonder if it's worth to reimplement this method in TreeMap/TreeSet for better performance?

Also, it would be interesting to benchmark these four methods, especially "to(5)" vs. "until(6)".
I believe fast subrange creation is one of important scenarios for use of TreeSet/TreeMap.

Performance of `to` and `until` is now the same.
@erikrozendaal
Copy link
Contributor Author

Hi Pavel,

I benchmarked the range operations and improved the implementation a bit such that to now has the same performance as until.

old tooptimized tojava.util.TreeMap
range(-1e8, 1e8).head984,096938,33825,801,491
from(1).head1,339,8411,214,76123,861,585
to(0).last551,2141,132,46728,157,299
until(1).last1,148,8341,137,26529,828,129

As you can see the java.util.TreeMap version is still much faster due to the use of views instead of building new trees. So to really get competitive with the Java version we need to support views (or ranged iterators, or something) and probably also implement methods like lower/floor/ceiling/higher, like java.util.NavigableMap.

@pavelpavlov
Copy link
Contributor

Hi Erik,

It's very cool that you could achieve such impressive performance improvements for your scenario.
However, I'm thinking about another performance-sensitive scenario: when we have zillion short-lived maps of moderate size each (say from 1 to 100 elements) instead of your scenario of having one map with zillion elements.

In that case another overheads come to the fore.
For example, looking into your last commit, you create 4 temporary objects on each call to range: two Options and two functions. Replacing option test here with null drops 2 of them and also gives faster calls to after and before in rng.

Also, you check ***Inclusive invariants each time when after or before is called. It's better to create specialized closure for each case in range.

In general, it would be great to benchmark the scenario I mentionned above and get the real numbers.

Regards,
Pavel

This avoids unnecessary allocation of Option and Function objects,
mostly helping performance of small trees.
@paulp
Copy link
Contributor
paulp commented Jan 23, 2012

FYI since everyone is doing such a fine job shaking the tree, I figure I'll let that continue a while longer. (In other words, I haven't looked seriously at this patch and intend to let it mature as far as it can before I do.)

@erikrozendaal
Copy link
Contributor Author

Pavel,

I've some more improvements to range/from/to/until and take/drop/slice, basically by specializing RedBlackTree.rng for each of these cases. I'll try to get the benchmark numbers and patches posted here tonight CET.

This mainly helps performance when comparing keys is expensive.
@erikrozendaal
Copy link
Contributor Author

Here are the benchmark numbers related to the last few commits. I had to change the benchmarks somewhat to get decent numbers with small trees. So instead of getting the head or last element I just get the size of the resulting tree. For some benchmarks the random seed also had to be changed. All noted in the table below.

masterdefault to (f26f610)optimized to (00b5cb8)specialized from/to/until/range (7824dbd)specialized drop/take/slice (78374f3)
Size 10000, Seed 1234
range(-1e8, 1e8).head984,096938,3381,274,9071,274,907
from(1).head1,339,8411,214,7611,358,4721,358,472
to(0).last551,2141,132,4671,208,3181,208,318
until(1).last1,148,8341,137,2651,246,5511,246,551
Size 10000, Seed 1234
drop(5).size762,210761,143783,775825,173
take(5).size8,613,2326,491,0559,433,66914,746,586
splitAt(size / 2).size578,984504,674616,517654,141
slice(3, size - 3).size373,338348,044397,642414,725
Size 10, Seed 1234
range(-1e9, 1e9).size8,213,66111,389,1618,291,25014,930,02815,113,249
from(1).size8,630,65713,060,92711,276,41514,540,51214,371,185
to(0).size1,052,9893,840,60011,524,85614,487,38714,262,957
until(1).size7,775,26111,946,96411,365,58614,217,50214,295,698
Size 10, Seed 1235
drop(5).size281,20211,900,66510,404,72012,763,17614,397,612
take(5).size347,41611,831,71810,036,92715,386,88016,705,482
splitAt(size / 2).size587,8405,683,2904,796,2277,146,4728,209,405
slice(3, size - 3).size320,6066,529,8635,025,3849,531,22511,963,432

I think I'm pretty much done with performance optimizations. I think it is now more interesting to discuss how we can get this patch into 2.10 and also see if there are binary compatible parts that can be put into 2.9. Maybe at least deprecate RedBlack starting with 2.9.2?

@dcsobral
Copy link
Contributor

A migration warning on TreeSet/TreeMap is necessary as well. I don't think there's any need to rush deprecation to 2.9.2 -- it can be deprecated on 2.10.

@erikrozendaal
Copy link
Contributor Author

So the migration warning should go into 2.9.x branch? Or 2.10? Warning about RedBlack inheritance, isSmaller, etc?

@dcsobral
Copy link
Contributor

Actually, I've changed my mind. I was thinking about warning about the fact that TreeSet no longer extends RedBlack, but any dependency on it will result in a compile error, so there's no need for it.

@erikrozendaal
Copy link
Contributor Author

So, anything else that needs to be done before merging this into 2.10?

@paulp
Copy link
Contributor
paulp commented Feb 14, 2012

I will interpret the petering out of discussion as implied endorsement by the involved parties if someone doesn't say otherwise. (It's still possible I personally will discover objections, but let's hope not.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0