-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Immutable TreeMap/TreeSet performance (SI-5331) #82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This turns iterator creation from an O(n) operation into an O(log n) operation. Unfortunately, it halves actual iteration speed (consuming the iterator fully), probably due to the many by-name closures that are needed.
performance is much better than '++' based iterator.
TreeMap/TreeSet.
to ensure TreeSet/TreeMap 'range' operations are O(log n) instead of O(n).
TreeMap/TreeSet by splitting the underlying RedBlack tree. This makes the operation O(log n) instead of O(n) and allows more structural sharing.
changes the operation from O(n log n) to O(n) and allows for more structural sharing.
type parameter A.
Use ArrayStack instead of Stack in TreeIterator for slightly increased performance.
already does so).
That's awesome. Have you signed the SCA? http://www.scala-lang.org/sites/default/files/contributor_agreement.pdf I can't merge this much work until that's in place. |
Thanks for reminding me. Just mailed the signed SCA to Danielle at EPFL according to the instructions. |
def update[B1 >: B](k: A, v: B1)(implicit ordering: Ordering[A]): Tree[A, B1] = blacken(upd(k, v)) | ||
def delete(k: A)(implicit ordering: Ordering[A]): Tree[A, B] = blacken(del(k)) | ||
def range(from: Option[A], until: Option[A])(implicit ordering: Ordering[A]): Tree[A, B] = blacken(rng(from, until)) | ||
def foreach[U](f: ((A, B)) => U) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong. By passing the Ordering on each method, you make it possible for the tree to be inconsistent. Sure, someone has to work at it, and having the class private helps prevent such misuse, but the very possibility indicates the ordering is in the wrong place. I really dislike this design.
Instead, the tree itself should have an Ordering (and getting A back) 8000 , which kind of gets back to RedBlack being a class and its subclasses having a hidden pointer to it -- which would be required to access the Ordering. However, as long as TreeMap and TreeSet don't extend RedBlack, it shouldn't be a problem. RedBlack becomes just a store for the Ordering shared by the nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree w Daniel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was another reason I feel RedBlack (as it currently is in this pull request) should be private[immutable]. This way TreeMap/TreeSet can ensure a consistent Ordering[A] is always used.
Making RedBlack a class again could help with this. The costs are additional instances of this class (including the Empty nested object) and the additional $$outer pointer in the Tree nodes (this may or may not cause additional overhead, depending on alignment, etc). At least on MacOS X JDK 1.6.0_29 with compressed oops there was additional memory usage per node.
So definitely a tradeoff between performance and potential correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an understandable tradeoff. conserving memory is important for datastructures.
on pattern matching for updating the tree.
The class is marked as deprecated and no longer used by the TreeMap/TreeSet implementation but is restored in case it was used by anyone else (since it was not marked as private to the Scala collection library). Renamed RedBlack.{Tree,RedTree,BlackTree} to Node, RedNode, and BlackNode to work around name clash with RedBlack class.
I restored the old implementation of RedBlack class in the commit erikrozendaal@f656142 and marked the class as deprecated. This required renaming the {Red,Black}Tree classes in the RedBlack object to avoid naming conflicts. I renamed them to {Red,Black}Node. Another option is to rename the RedBlack object to something else instead (so it is no longer a companion of the deprecated RedBlack class). Let me know what you think and I'll adjust it and add it to this pull request. |
This more clearly separates the new implementation from the now deprecated abstract class RedBlack and avoids naming conflicts for the member classes.
I added the null based implementation to this pull request. The old RedBlack class has also been restored and the new implementation is named RedBlackTree. I wonder if it is worth the effort to rewrite the history so that all binary compatible changes are at the start and the new RedBlackTree based implementation is added later. This would make integration with the next 2.9.x release easier. I could also just make a separate pull request for 2.9.x that only contains the binary compatible changes (mainly improvements to head/last/iterator/toStream). |
The fact that micro-benchmarks on HotSpot are hard is more reason not to break the rules. There is enough uncertainty as it is without adding more. Also, if you want to reduce overhead, do not use Range (use a while loop). At least until the optimization efforts on that are finished. |
Also simplified implementation of span to just use splitAt.
Good point! I'll rerun the benchmarks using the following benchmark method: def bench[A, B](name: String, tree: TreeMap[A, B], count: Int)(block: TreeMap[A, B] => Int => Int): Unit = {
print("%-25s: ".format(name))
val f = block(tree)
var result = 0
val elapsed = time {
var i = 0
while (i < count) {
result += f(i)
i += 1
}
}
println("size: %7d: %7d iterations in %6.3f seconds (%12.2f per second): result = %s".format(tree.size, count, elapsed, count / elapsed, result))
}
val benches = Map[String, (TreeMap[java.lang.Integer, java.lang.Integer] => Int => Int)](
"lookup" -> { tree => n => if (tree.contains(rnd(n))) 1 else 0 },
"add" -> { tree => n => tree.updated(rnd(n), n).size },
"add-remove" -> { tree => n => (tree.updated(rnd(n), n) - rnd(n)).size },
"head" -> { tree => n => tree.head._1 },
"last" -> { tree => n => tree.last._1 },
"tail" -> { tree => n => tree.tail.size },
"init" -> { tree => n => tree.init.size },
"iterator" -> { tree => n => if (tree.iterator.hasNext) 1 else 0 },
"iterator.next" -> { tree => n => tree.iterator.next._1 },
"iterator.size" -> { tree => n => tree.iterator.size },
// etc...
) I'll also run one benchmark at a time (in a single JVM), this seems to give best/most consistent results. |
You should use caliper to benchmark. Look at the fur foreach benchmark
|
Here are the updated benchmark results. Complete results can be found at https://gist.github.com/1576263 (except for the java.util.TreeMap output, which accidentally overwrote). Numbers changed, sometimes radically, but I think the main conclusions are still the same.
|
Here are the results for OpenJDK build 1.7.0-u4-b05-20120106. It's nice to see the pattern match code got a decent speed boost (good news for more idiomatic Scala code with a lot of pattern matching). Also inlining of field access has less effect now. Still, the null based code is still the faster version of RedBlack. PS Don't forget that the iterator based comparison is not fair between "#82" and the null based versions because additional algorithmic improvements were made in the meantime.
|
Erik, Please take a look at 'to' method implementation. I wonder if it's worth to reimplement this method in TreeMap/TreeSet for better performance? Also, it would be interesting to benchmark these four methods, especially "to(5)" vs. "until(6)". |
Performance of `to` and `until` is now the same.
Hi Pavel, I benchmarked the range operations and improved the implementation a bit such that
As you can see the |
Hi Erik, It's very cool that you could achieve such impressive performance improvements for your scenario. In that case another overheads come to the fore. Also, you check In general, it would be great to benchmark the scenario I mentionned above and get the real numbers. Regards, |
This avoids unnecessary allocation of Option and Function objects, mostly helping performance of small trees.
FYI since everyone is doing such a fine job shaking the tree, I figure I'll let that continue a while longer. (In other words, I haven't looked seriously at this patch and intend to let it mature as far as it can before I do.) |
Pavel, I've some more improvements to |
This mainly helps performance when comparing keys is expensive.
Here are the benchmark numbers related to the last few commits. I had to change the benchmarks somewhat to get decent numbers with small trees. So instead of getting the
I think I'm pretty much done with performance optimizations. I think it is now more interesting to discuss how we can get this patch into 2.10 and also see if there are binary compatible parts that can be put into 2.9. Maybe at least deprecate |
A migration warning on TreeSet/TreeMap is necessary as well. I don't think there's any need to rush deprecation to 2.9.2 -- it can be deprecated on 2.10. |
So the migration warning should go into 2.9.x branch? Or 2.10? Warning about RedBlack inheritance, isSmaller, etc? |
Actually, I've changed my mind. I was thinking about warning about the fact that |
So, anything else that needs to be done before merging this into 2.10? |
I will interpret the petering out of discussion as implied endorsement by the involved parties if someone doesn't say otherwise. (It's still possible I personally will discover objections, but let's hope not.) |
As discussed on the scala-internals mailing list. There are some further improvements related to performance and code.
Note: This breaks test/files/run/t2873.scala, since it uses RedBlack#Empty for testing for the absence of a compiler bug. I don't know how to fix this test so that it still tests for the same compiler bug.