Immutable TreeMap/TreeSet performance (SI-5331) #82

erikrozendaal · 2011-12-28T12:21:43Z

As discussed on the scala-internals mailing list. There are some further improvements related to performance and code.

Note: This breaks test/files/run/t2873.scala, since it uses RedBlack#Empty for testing for the absence of a compiler bug. I don't know how to fix this test so that it still tests for the same compiler bug.

paulp · 2011-12-28T17:12:08Z

That's awesome. Have you signed the SCA? http://www.scala-lang.org/sites/default/files/contributor_agreement.pdf I can't merge this much work until that's in place.

erikrozendaal · 2011-12-28T19:07:14Z

Thanks for reminding me. Just mailed the signed SCA to Danielle at EPFL according to the instructions.

dcsobral · 2011-12-28T23:55:53Z

src/library/scala/collection/immutable/RedBlack.scala

+    def update[B1 >: B](k: A, v: B1)(implicit ordering: Ordering[A]): Tree[A, B1] = blacken(upd(k, v))
+    def delete(k: A)(implicit ordering: Ordering[A]): Tree[A, B] = blacken(del(k))
+    def range(from: Option[A], until: Option[A])(implicit ordering: Ordering[A]): Tree[A, B] = blacken(rng(from, until))
+    def foreach[U](f: ((A, B)) =>  U)


This is wrong. By passing the Ordering on each method, you make it possible for the tree to be inconsistent. Sure, someone has to work at it, and having the class private helps prevent such misuse, but the very possibility indicates the ordering is in the wrong place. I really dislike this design.

Instead, the tree itself should have an Ordering (and getting A back), which kind of gets back to RedBlack being a class and its subclasses having a hidden pointer to it -- which would be required to access the Ordering. However, as long as TreeMap and TreeSet don't extend RedBlack, it shouldn't be a problem. RedBlack becomes just a store for the Ordering shared by the nodes.
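To make the concern concrete, here is a minimal sketch of how a per-call Ordering can go wrong. It is not the code from this pull request: `Node`, `insert`, `contains` and `SortedTree` are hypothetical names, and the tree is a plain unbalanced binary search tree with null as the empty tree. The last class shows the alternative of fixing the Ordering once when the tree is created.

```scala
object OrderingSketch {
  // Hypothetical, unbalanced stand-in for the tree in the diff; null is the empty tree.
  final class Node[A](val key: A, val left: Node[A], val right: Node[A])

  // The Ordering is supplied on every call, as in the signatures under review.
  def insert[A](t: Node[A], k: A)(implicit ord: Ordering[A]): Node[A] =
    if (t eq null) new Node[A](k, null, null)
    else {
      val c = ord.compare(k, t.key)
      if (c < 0) new Node[A](t.key, insert(t.left, k), t.right)
      else if (c > 0) new Node[A](t.key, t.left, insert(t.right, k))
      else t
    }

  def contains[A](t: Node[A], k: A)(implicit ord: Ordering[A]): Boolean =
    if (t eq null) false
    else {
      val c = ord.compare(k, t.key)
      if (c < 0) contains(t.left, k)
      else if (c > 0) contains(t.right, k)
      else true
    }

  // Alternative: the Ordering is captured once, so every operation agrees on it.
  final class SortedTree[A](root: Node[A])(implicit ord: Ordering[A]) {
    def +(k: A): SortedTree[A] = new SortedTree[A](insert(root, k))
    def contains(k: A): Boolean = OrderingSketch.contains(root, k)
  }

  def empty[A](implicit ord: Ordering[A]): SortedTree[A] = new SortedTree[A](null)

  def main(args: Array[String]): Unit = {
    val t = List(2, 1, 3).foldLeft(null: Node[Int])((acc, k) => insert(acc, k))
    println(contains(t, 1))                       // true: same Ordering as the inserts
    println(contains(t, 1)(Ordering.Int.reverse)) // false: the key is there but cannot be found
  }
}
```

With the wrapper, callers never pass an Ordering after construction, which is the same guarantee TreeMap/TreeSet can give when the tree type stays private to them.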

I agree with Daniel.

This was another reason I feel RedBlack (as it currently is in this pull request) should be private[immutable]. This way TreeMap/TreeSet can ensure a consistent Ordering[A] is always used.

Making RedBlack a class again could help with this. The costs are additional instances of this class (including the Empty nested object) and the additional $$outer pointer in the Tree nodes (this may or may not cause additional overhead, depending on alignment, etc). At least on MacOS X JDK 1.6.0_29 with compressed oops there was additional memory usage per node.

So definitely a tradeoff between performance and potential correctness.

That's an understandable tradeoff. Conserving memory is important for data structures.

The class is marked as deprecated and no longer used by the TreeMap/TreeSet implementation but is restored in case it was used by anyone else (since it was not marked as private to the Scala collection library). Renamed RedBlack.{Tree,RedTree,BlackTree} to Node, RedNode, and BlackNode to work around name clash with RedBlack class.

erikrozendaal · 2012-01-06T22:26:31Z

I restored the old implementation of RedBlack class in the commit erikrozendaal@f656142 and marked the class as deprecated.

This required renaming the {Red,Black}Tree classes in the RedBlack object to avoid naming conflicts. I renamed them to {Red,Black}Node. Another option is to rename the RedBlack object to something else instead (so it is no longer a companion of the deprecated RedBlack class).

Let me know what you think and I'll adjust it and add it to this pull request.

erikrozendaal · 2012-01-07T15:03:04Z

I added the null based implementation to this pull request. The old RedBlack class has also been restored and the new implementation is named RedBlackTree. I wonder if it is worth the effort to rewrite the history so that all binary compatible changes are at the start and the new RedBlackTree based implementation is added later. This would make integration with the next 2.9.x release easier. I could also just make a separate pull request for 2.9.x that only contains the binary compatible changes (mainly improvements to head/last/iterator/toStream).

ijuma · 2012-01-07T16:10:06Z

The fact that micro-benchmarks on HotSpot are hard is more reason not to break the rules. There is enough uncertainty as it is without adding more. Also, if you want to reduce overhead, do not use Range (use a while loop). At least until the optimization efforts on that are finished.

erikrozendaal · 2012-01-07T18:50:50Z

Good point! I'll rerun the benchmarks using the following benchmark method:

  def bench[A, B](name: String, tree: TreeMap[A, B], count: Int)(block: TreeMap[A, B] => Int => Int): Unit = {
    print("%-25s: ".format(name))
    val f = block(tree)
    var result = 0
    val elapsed = time {
      var i = 0
      while (i < count) {
        result += f(i)
        i += 1
      }
    }
    println("size: %7d: %7d iterations in %6.3f seconds (%12.2f per second): result = %s".format(tree.size, count, elapsed, count / elapsed, result))
  }

  val benches = Map[String, (TreeMap[java.lang.Integer, java.lang.Integer] => Int => Int)](
    "lookup" -> { tree => n => if (tree.contains(rnd(n))) 1 else 0 },
    "add" -> { tree => n => tree.updated(rnd(n), n).size },
    "add-remove" -> { tree => n => (tree.updated(rnd(n), n) - rnd(n)).size },
    "head" -> { tree => n => tree.head._1 },
    "last" -> { tree => n => tree.last._1 },
    "tail" -> { tree => n => tree.tail.size },
    "init" -> { tree => n => tree.init.size },
    "iterator" -> { tree => n => if (tree.iterator.hasNext) 1 else 0 },
    "iterator.next" -> { tree => n => tree.iterator.next._1 },
    "iterator.size" -> { tree => n => tree.iterator.size },
    // etc...
  )

I'll also run one benchmark at a time (in a single JVM); this seems to give the best and most consistent results.
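The snippet above relies on `time` and `rnd`, which are not shown in this thread. The following self-contained sketch fills them in with assumed definitions (wall-clock timing via `System.nanoTime` and a key table pre-generated from a fixed seed) so the loop can be run as is. It uses `Int` keys rather than `java.lang.Integer`, has no warm-up phase, and is only meant to illustrate the while-loop shape, not to replace a proper harness.

```scala
import scala.collection.immutable.TreeMap
import scala.util.Random

object BenchSketch {
  // Assumed helper (not shown in the thread): wall-clock seconds taken by `block`.
  def time(block: => Unit): Double = {
    val start = System.nanoTime()
    block
    (System.nanoTime() - start) / 1e9
  }

  // Assumed helper: keys are generated up front from a fixed seed so that
  // random number generation stays out of the timed loop.
  private val keys: Array[Int] = {
    val r = new Random(1234)
    Array.fill(1 << 16)(r.nextInt(100000))
  }
  def rnd(n: Int): Int = keys(n & (keys.length - 1))

  // Same shape as `bench` above, but returns the measurements instead of printing them.
  def run[A, B](tree: TreeMap[A, B], count: Int)(block: TreeMap[A, B] => Int => Int): (Double, Int) = {
    val f = block(tree)
    var result = 0
    val elapsed = time {
      var i = 0
      while (i < count) { // plain while loop, no Range
        result += f(i)
        i += 1
      }
    }
    (elapsed, result)
  }

  def main(args: Array[String]): Unit = {
    val tree = TreeMap((0 until 10000).map(i => rnd(i) -> i): _*)
    val (elapsed, result) = run(tree, 100000)(t => n => if (t.contains(rnd(n))) 1 else 0)
    println("lookup: %6.3f seconds, result = %d".format(elapsed, result))
  }
}
```

Accumulating `result` and printing it keeps the JIT from discarding the measured work as dead code.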

dcsobral · 2012-01-07T19:33:31Z

You should use Caliper to benchmark. Look at the fur foreach benchmark project on my GitHub as a basis. It's easy to compare multiple compiler versions with it too.

erikrozendaal · 2012-01-07T22:26:26Z

Here are the updated benchmark results. Complete results can be found at https://gist.github.com/1576263 (except for the java.util.TreeMap output, which I accidentally overwrote). The numbers changed, sometimes radically, but I think the main conclusions are still the same.

| benchmark | #82 | with nulls, use getters | with nulls, inlined getters | java.util.TreeMap | immutable.HashMap |
| --- | ---: | ---: | ---: | ---: | ---: |
| lookup | 3,801,779 | 7,362,195 | 7,397,677 | 7,572,085 | 24,217,850 |
| add | 1,192,201 | 1,794,593 | 2,427,549 | | 4,528,338 |
| add & remove | 392,738 | 867,402 | 1,016,917 | 3,528,667 | 2,465,620 |
| head | 28,917,103 | 40,490,928 | 61,891,427 | 57,270,760 | 41,516,223 |
| last | 25,028,036 | 32,569,793 | 52,351,476 | 56,938,347 | |
| iterator (create) | 4,667,903 | 27,797,584 | 27,981,034 | 72,102,692 | 366,207,914 |
| iterator (consume first) | 4,567,211 | 24,348,634 | 24,282,580 | 37,049,432 | 42,535,377 |
| foreach | 7,228 | 12,541 | 12,694 | | 9,231 |
| iterator (consume all) | 3,583 | 10,052 | 9,895 | 9,147 | 6,202 |
| drop(5) | 256,134 | 632,225 | 696,058 | | |
| take(5) | 2,044,954 | 4,463,086 | 7,529,279 | | |
| splitAt(size / 2) | 188,260 | 367,508 | 484,105 | | |
| range(-1e8, 1e8).head | 382,497 | 693,286 | 880,981 | 12,473,028 | |

erikrozendaal · 2012-01-08T12:15:44Z

Here are the results for OpenJDK build 1.7.0-u4-b05-20120106. It's nice to see that the pattern match code got a decent speed boost (good news for more idiomatic Scala code with a lot of pattern matching). Inlining of field access also has less effect now. Even so, the null based code is still the faster version of RedBlack.

PS Don't forget that the iterator based comparison is not fair between "#82" and the null based versions because additional algorithmic improvements were made in the meantime.
PPS All these numbers are running in 64-bit mode with compressed OOPS enabled, just like the previous 1.6.0_29 based results.

| benchmark | #82 (pattern matching, singleton empty) | with nulls, use getters | with nulls, inlined getters | java.util.TreeMap | immutable.HashMap |
| --- | ---: | ---: | ---: | ---: | ---: |
| lookup | 4,507,130 | 7,299,700 | 7,240,064 | 7,181,138 | 29,847,773 |
| add | 1,746,636 | 2,534,294 | 2,553,912 | | 4,696,720 |
| add & remove | 756,158 | 1,072,570 | 1,060,522 | 3,496,455 | 2,608,751 |
| head | 25,772,815 | 43,935,453 | 58,751,122 | 59,795,623 | 43,097,982 |
| last | 22,547,783 | 37,729,653 | 52,286,158 | 59,816,398 | |
| iterator (create) | 5,353,564 | 28,918,386 | 28,923,354 | 73,329,891 | 375,772,141 |
| iterator (consume first) | 5,572,964 | 23,698,292 | 23,899,985 | 35,495,586 | 42,454,982 |
| foreach | 8,054 | 12,523 | 12,411 | | 9,541 |
| iterator (consume all) | 5,274 | 11,105 | 13,521 | 10,574 | 8,997 |
| drop(5) | 394,074 | 749,614 | 698,918 | | |
| take(5) | 3,680,581 | 7,920,021 | 7,778,135 | | |
| splitAt(size / 2) | 351,272 | 500,936 | 527,641 | | |
| range(-1e8, 1e8).head | 645,592 | 945,271 | 958,675 | 25,801,491 | |

pavelpavlov · 2012-01-14T19:24:52Z

Erik,

Please take a look at the implementation of the 'to' method.
Of the 4 subrange methods defined in scala.collection.generic.Sorted, three (from, range, until) are implemented as a single call to rangeImpl. 'to', by contrast, requires two calls to rangeImpl, plus creating a subtree iterator and advancing it twice in the worst case.

I wonder if it's worth reimplementing this method in TreeMap/TreeSet for better performance?

Also, it would be interesting to benchmark these four methods, especially "to(5)" vs. "until(6)".
I believe fast subrange creation is one of the important use cases for TreeSet/TreeMap.
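For readers who don't have the library source at hand, the generic strategy described above can be sketched against the public API roughly as follows. This is an illustration, not the library's code: `toViaUntil` is a hypothetical name, and the sketch assumes `rangeImpl(from, until)` treats the lower bound as inclusive and the upper bound as exclusive.

```scala
import scala.collection.immutable.TreeSet

object ToSketch {
  // Inclusive upper bound expressed through exclusive-upper-bound ranges:
  // find the first key strictly greater than `bound`, then cut before it.
  def toViaUntil(set: TreeSet[Int], bound: Int): TreeSet[Int] = {
    val it = set.rangeImpl(Some(bound), None).iterator // first rangeImpl call
    if (!it.hasNext) set                               // nothing at or above the bound
    else {
      val next = it.next()                             // first advance
      if (next == bound) {
        if (!it.hasNext) set                           // the bound is the largest key
        else set.rangeImpl(None, Some(it.next()))      // second advance, second rangeImpl call
      } else set.rangeImpl(None, Some(next))           // second rangeImpl call
    }
  }

  def main(args: Array[String]): Unit = {
    val s = TreeSet(1, 3, 5, 7)
    println(toViaUntil(s, 5)) // keeps 1, 3, 5
    println(toViaUntil(s, 4)) // keeps 1, 3
  }
}
```

The worst case is a bound that is present and not the maximum: two range constructions plus two iterator steps, compared with a single range construction for `until`.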

erikrozendaal · 2012-01-15T13:42:30Z

Hi Pavel,

I benchmarked the range operations and improved the implementation a bit so that `to` now has the same performance as `until`.

| benchmark | old to | optimized to | java.util.TreeMap |
| --- | ---: | ---: | ---: |
| range(-1e8, 1e8).head | 984,096 | 938,338 | 25,801,491 |
| from(1).head | 1,339,841 | 1,214,761 | 23,861,585 |
| to(0).last | 551,214 | 1,132,467 | 28,157,299 |
| until(1).last | 1,148,834 | 1,137,265 | 29,828,129 |

As you can see the java.util.TreeMap version is still much faster due to the use of views instead of building new trees. So to really get competitive with the Java version we need to support views (or ranged iterators, or something) and probably also implement methods like lower/floor/ceiling/higher, like java.util.NavigableMap.
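As a rough illustration of what NavigableMap-style lookups could look like on top of the existing range operations, here is a sketch. The method names mirror java.util.NavigableMap but are hypothetical here; nothing like this is part of the pull request. Each call still builds a new tree through `rangeImpl`, so this shows the semantics only and would not close the gap with the view-based Java implementation.

```scala
import scala.collection.immutable.TreeMap

object NavigableSketch {
  // Smallest key >= k, if any.
  def ceilingKey[K, V](m: TreeMap[K, V], k: K): Option[K] =
    m.rangeImpl(Some(k), None).headOption.map(_._1)

  // Largest key < k, if any.
  def lowerKey[K, V](m: TreeMap[K, V], k: K): Option[K] =
    m.rangeImpl(None, Some(k)).lastOption.map(_._1)

  // Largest key <= k, if any.
  def floorKey[K, V](m: TreeMap[K, V], k: K): Option[K] =
    if (m.contains(k)) Some(k) else lowerKey(m, k)

  // Smallest key > k, if any: skip k itself when it is present.
  def higherKey[K, V](m: TreeMap[K, V], k: K): Option[K] =
    m.rangeImpl(Some(k), None).keysIterator.find(m.ordering.gt(_, k))

  def main(args: Array[String]): Unit = {
    val m = TreeMap(10 -> "a", 20 -> "b", 30 -> "c")
    println(ceilingKey(m, 15)) // Some(20)
    println(floorKey(m, 25))   // Some(20)
    println(higherKey(m, 30))  // None
  }
}
```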

pavelpavlov · 2012-01-21T15:45:15Z

Hi Erik,

It's very cool that you could achieve such impressive performance improvements for your scenario.
However, I'm thinking about another performance-sensitive scenario: a zillion short-lived maps of moderate size each (say, from 1 to 100 elements), instead of your scenario of one map with a zillion elements.

In that case other overheads come to the fore.
For example, looking at your last commit, you create 4 temporary objects on each call to range: two Options and two functions. Replacing the Option test here with a null check drops 2 of them and also gives faster calls to after and before in rng.

Also, you check the ***Inclusive invariants each time after or before is called. It would be better to create a specialized closure for each case in range.

In general, it would be great to benchmark the scenario I mentioned above and get real numbers.

Regards,
Pavel
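The allocation point can be illustrated with a deliberately simplified sketch. `Node`, `range` and `from` below are hypothetical and operate on an unbalanced binary search tree with null as the empty tree; the real `rng` in this pull request also has to keep the red-black invariants, which is ignored here. The generic version takes its bounds as Options and tests them through closures at every node, while the specialized version compares against a plain key.

```scala
object RangeSketch {
  // Hypothetical minimal node type; null stands for the empty tree.
  final class Node(val key: Int, val left: Node, val right: Node)

  def insert(t: Node, k: Int): Node =
    if (t eq null) new Node(k, null, null)
    else if (k < t.key) new Node(t.key, insert(t.left, k), t.right)
    else if (k > t.key) new Node(t.key, t.left, insert(t.right, k))
    else t

  // Generic version: callers wrap bounds in Options, and each node tests them via closures.
  def range(t: Node, from: Option[Int], until: Option[Int]): Node =
    if (from.isEmpty && until.isEmpty) t // no bounds left: share the subtree as is
    else if (t eq null) null
    else if (from.exists(t.key < _)) range(t.right, from, until)
    else if (until.exists(t.key >= _)) range(t.left, from, until)
    else new Node(t.key, range(t.left, from, None), range(t.right, None, until))

  // Specialized version for a lower bound only: no Option, no closure.
  def from(t: Node, lo: Int): Node =
    if (t eq null) null
    else if (t.key < lo) from(t.right, lo)
    else new Node(t.key, from(t.left, lo), t.right) // right subtree is shared unchanged

  def toList(t: Node): List[Int] =
    if (t eq null) Nil else toList(t.left) ::: t.key :: toList(t.right)

  def main(args: Array[String]): Unit = {
    val t = List(5, 2, 8, 1, 9, 3).foldLeft(null: Node)(insert)
    println(toList(range(t, Some(3), None))) // List(3, 5, 8, 9)
    println(toList(from(t, 3)))              // List(3, 5, 8, 9)
  }
}
```

For trees of only a handful of nodes, the per-call Option and closure allocations are a much larger share of the total work than for a tree with thousands of nodes, which is why this matters most in the many-small-maps scenario.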

paulp · 2012-01-23T06:22:10Z

FYI since everyone is doing such a fine job shaking the tree, I figure I'll let that continue a while longer. (In other words, I haven't looked seriously at this patch and intend to let it mature as far as it can before I do.)

erikrozendaal · 2012-01-23T08:48:21Z

Pavel,

I've some more improvements to range/from/to/until and take/drop/slice, basically by specializing RedBlackTree.rng for each of these cases. I'll try to get the benchmark numbers and patches posted here tonight CET.

erikrozendaal · 2012-01-24T20:52:33Z

Here are the benchmark numbers related to the last few commits. I had to change the benchmarks somewhat to get decent numbers with small trees. So instead of getting the head or last element I just get the size of the resulting tree. For some benchmarks the random seed also had to be changed. All noted in the table below.

| benchmark | master | default to (`f26f610`) | optimized to (`00b5cb8`) | specialized from/to/until/range (`7824dbd`) | specialized drop/take/slice (`78374f3`) |
| --- | ---: | ---: | ---: | ---: | ---: |
| **Size 10000, Seed 1234** | | | | | |
| range(-1e8, 1e8).head | | 984,096 | 938,338 | 1,274,907 | 1,274,907 |
| from(1).head | | 1,339,841 | 1,214,761 | 1,358,472 | 1,358,472 |
| to(0).last | | 551,214 | 1,132,467 | 1,208,318 | 1,208,318 |
| until(1).last | | 1,148,834 | 1,137,265 | 1,246,551 | 1,246,551 |
| **Size 10000, Seed 1234** | | | | | |
| drop(5).size | | 762,210 | 761,143 | 783,775 | 825,173 |
| take(5).size | | 8,613,232 | 6,491,055 | 9,433,669 | 14,746,586 |
| splitAt(size / 2).size | | 578,984 | 504,674 | 616,517 | 654,141 |
| slice(3, size - 3).size | | 373,338 | 348,044 | 397,642 | 414,725 |
| **Size 10, Seed 1234** | | | | | |
| range(-1e9, 1e9).size | 8,213,661 | 11,389,161 | 8,291,250 | 14,930,028 | 15,113,249 |
| from(1).size | 8,630,657 | 13,060,927 | 11,276,415 | 14,540,512 | 14,371,185 |
| to(0).size | 1,052,989 | 3,840,600 | 11,524,856 | 14,487,387 | 14,262,957 |
| until(1).size | 7,775,261 | 11,946,964 | 11,365,586 | 14,217,502 | 14,295,698 |
| **Size 10, Seed 1235** | | | | | |
| drop(5).size | 281,202 | 11,900,665 | 10,404,720 | 12,763,176 | 14,397,612 |
| take(5).size | 347,416 | 11,831,718 | 10,036,927 | 15,386,880 | 16,705,482 |
| splitAt(size / 2).size | 587,840 | 5,683,290 | 4,796,227 | 7,146,472 | 8,209,405 |
| slice(3, size - 3).size | 320,606 | 6,529,863 | 5,025,384 | 9,531,225 | 11,963,432 |

I think I'm pretty much done with performance optimizations. I think it is now more interesting to discuss how we can get this patch into 2.10 and also see if there are binary compatible parts that can be put into 2.9. Maybe at least deprecate RedBlack starting with 2.9.2?

dcsobral · 2012-01-25T14:50:58Z

A migration warning on TreeSet/TreeMap is necessary as well. I don't think there's any need to rush deprecation to 2.9.2 -- it can be deprecated on 2.10.

erikrozendaal · 2012-01-26T08:41:39Z

So the migration warning should go into 2.9.x branch? Or 2.10? Warning about RedBlack inheritance, isSmaller, etc?

dcsobral · 2012-01-26T11:53:44Z

Actually, I've changed my mind. I was thinking about warning about the fact that TreeSet no longer extends RedBlack, but any dependency on it will result in a compile error, so there's no need for it.

erikrozendaal · 2012-02-14T10:31:10Z

So, anything else that needs to be done before merging this into 2.10?

paulp · 2012-02-14T15:06:56Z

I will interpret the petering out of discussion as implied endorsement by the involved parties if someone doesn't say otherwise. (It's still possible I personally will discover objections, but let's hope not.)

erikrozendaal added 20 commits December 28, 2011 13:12

`540ad02` Use RedBlack.iterator to create iterators for TreeSet/TreeMap. This turns iterator creation from an O(n) operation into an O(log n) operation. Unfortunately, it halves actual iteration speed (consuming the iterator fully), probably due to the many by-name closures that are needed.

`88ed930` Use custom implementation for iterating over RedBlack trees. Raw performance is much better than '++' based iterator.

`edcec03` Optimized implementations of head/headOption/last/lastOption for TreeMap/TreeSet.

`9cdede8` Optimized implementation of init/tail for TreeSet/TreeMap.

`b7e6714` RedBlack.scala: Change count from 'def' to 'val' in NonEmpty tree to ensure TreeSet/TreeMap 'range' operations are O(log n) instead of O(n).

`95cb7bc` Implemented drop/take/slice/splitAt/dropRight/takeRight for TreeMap/TreeSet by splitting the underlying RedBlack tree. This makes the operation O(log n) instead of O(n) and allows more structural sharing.

`8d67823` Implemented takeWhile/dropWhile/span to use tree splitting. This changes the operation from O(n log n) to O(n) and allows for more structural sharing.

`3f66061` Switched from isSmaller to ordering.

`7ec9b0b` Moved from implicit ordering value to implicit parameter.

`a02a815` Moved from Empty case object to case class in preparation of moving type parameter A.

`418adc6` Moved type parameter A from RedBlack to Tree.

`6c0e036` Changed abstract class RedBlack to singleton object.

`d2706db` Use single shared Empty instance across all RedBlack trees.

`6b95074` Make sure the redblack test compiles and runs.

`b9699f9` Made RedBlack private to the scala.collection.immutable package. Use ArrayStack instead of Stack in TreeIterator for slightly increased performance.

`32171c2` TreeMap/TreeSet no longer keep track of the size (the RedBlack tree already does so).

`b421bba` Performance improvements for iteration (foreach and iterator).

`4a0c4bb` Improved performance of RedBlack.NonEmpty.nth (helps take/drop/split/etc).

`ad0b09c` Added some tests for TreeMap/TreeSet.

`c51bdea` Minimize number of calls to ordering.

dcsobral reviewed Dec 28, 2011

erikrozendaal added 6 commits January 2, 2012 15:55

`6d8dca7` Moved key/value/left/right fields up to NonEmpty class. Don't rely on pattern matching for updating the tree.

`82374ad` Implemented deletes without pattern matching.

`3dea251` Implemented range without using pattern matching.

`5c05f66` Use null to represent empty trees. Removed Empty/NonEmpty classes.

`72ec0ac` Optimize foreach and iterators.

`d735d0f` Move nth method to RedBlack. Inline factories for tree nodes.

`288874d` Renamed object RedBlack to RedBlackTree. This more clearly separates the new implementation from the now deprecated abstract class RedBlack and avoids naming conflicts for the member classes.

`e61075c` Tests for takeWhile/dropWhile/span. Also simplified implementation of span to just use splitAt.

erikrozendaal added 2 commits January 7, 2012 23:31

`8b3f984` Fix silly copy-paste error.

`f26f610` Test for maximum height of red-black tree.

`00b5cb8` Optimized implementation of TreeMap/TreeSet#to method. Performance of `to` and `until` is now the same.

`7824dbd` Custom coded version of range/from/to/until. This avoids unnecessary allocation of Option and Function objects, mostly helping performance of small trees.

erikrozendaal added 2 commits January 23, 2012 22:21

`78374f3` Custom implementations of drop/take/slice. This mainly helps performance when comparing keys is expensive.

`51667dc` Removed TODOs.

paulp merged commit 51667dc into scala:master Feb 16, 2012

ViniciusMiana mentioned this pull request Feb 5, 2013

SI-6370 changed ListMap apply method to produce correct error message #2009

Closed

scabug mentioned this pull request Apr 7, 2017

Performance of immutable TreeMap/TreeSet implementations scala/bug#5331

Closed
