[backport] List.filter optimizations from 2.13.x #8226

retronym · 2019-07-15T06:20:48Z

Binary compatibility constraints won't let us actually do this
as an override in `List` (we tried that originally but reverted it, and only reintroduced the optimization in 2.13.x).

But we are free to type-case `List` in the inherited implementation.
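
A minimal sketch of what "type-casing `List` in the inherited implementation" could look like. The names `FilterSketch`, `filterImpl` and `filterList` are hypothetical and this is not the actual `TraversableLike` code; the list-specific path here is only a placeholder that evaluates the predicate twice per element.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a shared, inherited filter implementation.
object FilterSketch {
  // Generic entry point used by every collection type.
  def filterImpl[A](coll: Iterable[A], p: A => Boolean): Iterable[A] = coll match {
    case l: List[A] =>
      filterList(l, p)                      // type-case: List takes its own path
    case other =>
      val b = ListBuffer.empty[A]           // generic path: always builds a new result
      other.foreach(x => if (p(x)) b += x)
      b.toList
  }

  // Placeholder for the list-specific routine: returns the input itself
  // when every element passes, otherwise falls back to the standard filter.
  private def filterList[A](l: List[A], p: A => Boolean): List[A] =
    if (l.forall(p)) l else l.filter(p)
}
```

The cost under discussion in this thread is the extra `case l: List[A]` test, which every non-`List` receiver pays before reaching the generic path.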


lrytz · 2019-07-15T11:58:27Z

Could you share your observations that motivate this PR? I'm just a bit hesitant about the impact of the additional type test because almost all collections inherit the default implementation.

NthPortal · 2019-07-15T17:58:44Z

src/library/scala/collection/TraversableLike.scala

-    b.result
+  private[this] def filterImplList[A](self: List[A], p: A => Boolean, isFlipped: Boolean): List[A] = {
+
+    // everything seen so far so far is not included


Suggested change

// everything seen so far so far is not included

// everything seen so far is not included

NthPortal

There are a few things that could be very slightly cleaned up, but it looks pretty good. I assume it was mostly copied straight from 2.13.

I share Lukas' concerns about the additional type test, though I'm also wondering how it compares to bimorphic or polymorphic dispatch (if the method was able to be defined on List). Or perhaps it would be like double polymorphic dispatch if filter/filterNot are already polymorphic.


NthPortal · 2019-07-15T22:28:52Z

src/library/scala/collection/TraversableLike.scala

+      if (l.isEmpty)
+        Nil
+      else {
+        val h = l.head


only used once - inline?

NthPortal · 2019-07-15T22:29:02Z

src/library/scala/collection/TraversableLike.scala

+      if (remaining.isEmpty)
+        start
+      else {
+        val x = remaining.head


only used once - inline?


NthPortal · 2019-07-15T22:43:33Z

src/library/scala/collection/TraversableLike.scala

+            nextToCopy = nextToCopy.tail
+          }
+          nextToCopy = next.tail
+          next = next.tail


should we assign one of these to the other rather than calling tail twice?

retronym

I'd rather not tweak the code I'm backporting here (unless we spot an error, of course). I think the JIT or the CPU caches will make this sort of micro-optimization unnecessary from a performance perspective.
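
For reference, a small sketch of the two variants being discussed. The object `TailOnce` and its methods are hypothetical and only mirror the shape of the two assignments in the diff; both return the same cons cell for both cursors, so the difference is one extra `tail` call.

```scala
// Hypothetical illustration of the two ways to advance both cursors.
object TailOnce {
  // As in the diff: tail is called once per cursor.
  def advanceTwice[A](next: List[A]): (List[A], List[A]) = {
    val nextToCopy = next.tail
    val advanced = next.tail
    (nextToCopy, advanced)
  }

  // As suggested in review: call tail once and reuse the result.
  def advanceOnce[A](next: List[A]): (List[A], List[A]) = {
    val advanced = next.tail
    (advanced, advanced)
  }
}
```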

retronym · 2019-07-16T00:14:13Z

I was looking at allocation hotspots from profiles of scalac, which notably uses this method in Transformer.transformStats, TypeConstraint.<init> and a few other hot places.

I don't believe the type test will hurt much -- testing for a superclass is fast in the JVM (interface checks do have a performance gotcha).

If this change is deemed too risky, I can instead move this implementation into `Collections` in the compiler and use it in the hot spots, as I did previously for `mapList` before the same optimization landed in `List.map` itself.

lrytz · 2019-07-16T06:58:48Z

I'd prefer that solution, but maybe I'm too cautious.

retronym · 2019-07-16T21:58:08Z

I'll microbenchmark to try to measure any slowdown in non-List use cases.

retronym · 2019-07-17T02:33:26Z

Benchmarking filtering an empty vector (which in 2.12.x doesn't have a special case to avoid the builder allocation and foreach call in filterImpl). The benchmark makes sure that filterImpl is not JIT specialized to Vector by warming up with a variety of collections and stopping JIT inlining of the callsite that we're measuring.

package scala.collection

import java.lang.invoke.DontInline
import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations.{CompilerControl, Param, _}
import org.openjdk.jmh.infra.Blackhole

import scala.annotation.tailrec
import scala.collection.generic.{GenericCompanion, GenericTraversableTemplate, SeqFactory}
import scala.collection.immutable.{::, List, Nil}
import scala.collection.mutable.ArrayBuffer


object FilterBenchmark {
  case class Content(value: Int)
}


@BenchmarkMode(Array(Mode.AverageTime))
@Fork(2)
@Threads(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
class FilterBenchmark {
  import FilterBenchmark._

  var valuesList: List[Content] = _
  var valuesQueue: mutable.Queue[Content] = _
  var valuesVector: immutable.Vector[Content] = _

  @Setup(Level.Trial) def initKeys(): Unit = {
    valuesList = List.tabulate(16)(v => Content(v))
    valuesQueue = mutable.Queue.tabulate(16)(v => Content(v))
    valuesVector = immutable.Vector.tabulate(16)(v => Content(v))
  }

  @Benchmark def filter_EmptyVector_includeAll: Any = {
    filter_includeAll(Vector.empty)
  }

  @Benchmark def warmupAll(bh: Blackhole): Any = {
    bh.consume(filter_includeAll(valuesList))
    bh.consume(filter_includeAll(valuesQueue))
    bh.consume(filter_includeAll(valuesVector))
    bh.consume(filter_includeAll(BitSet.empty))
    bh.consume(filter_includeAll(Stream.empty))
  }

  @CompilerControl(CompilerControl.Mode.DONT_INLINE)
  private def filter_includeAll(t: Traversable[_]): Any = {
    t.filter(v => true)
  }
}

I see only a minor degradation in performance:

> bench/jmh:run scala.collection.FilterBenchmark.filter_EmptyVector_includeAll -wm BULK -wmb scala.collection.FilterBenchmark.warmupAll
[info] FilterBenchmark.filter_EmptyVector_includeAll  avgt   20  46.757 ± 0.746  ns/op
...
[info] FilterBenchmark.filter_EmptyVector_includeAll  avgt   20  47.902 ± 0.373  ns/op

This seems fine to me.
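
As a rough reading of the two figures above, assuming the first line is the baseline and the second is with the patch applied (the output itself does not label them), the relative change works out to roughly 2.4%. The names in this sketch are illustrative only.

```scala
// Relative change between the two reported averages (ns/op).
// Assumption: `first` is the baseline run and `second` is the patched run.
object BenchDelta {
  val first = 46.757
  val second = 47.902
  val relativeChange: Double = (second - first) / first  // about 0.0245
}
```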

lrytz

Thanks!

dwijnand · 2019-07-17T08:50:15Z

src/library/scala/collection/TraversableLike.scala

+      var currentLast = newHead
+
+      // we know that all elements are :: until at least firstMiss.tail
+      while (!(toProcess eq firstMiss)) {


Any reason to use `!(... eq ...)` over `ne`?
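
A small sketch of the equivalence in question: on `AnyRef`, `a ne b` is reference inequality, the negation of `a eq b`. The object and value names here are illustrative only.

```scala
object RefCompare {
  val a = List(1, 2, 3)
  val b = a               // same reference as a
  val c = List(1, 2, 3)   // equal contents, but a different reference

  val viaNotEq: Boolean = !(a eq c)  // spelling used in the diff
  val viaNe: Boolean = a ne c        // equivalent spelling
}
```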

dwijnand · 2019-07-17T08:53:08Z

Just out of interest: how much of a performance gain is this, at the cost of these net +85 lines?

retronym · 2019-07-17T21:29:49Z

@dwijnand Here are the comparative micro benchmark results from the original submission of this patch: #5653 (comment)

The best case is `as.filter(_ => true)`, which is about 10x faster and has the secondary benefit of returning the input list itself. The benefit extends to operations in which the input and output share a suffix of the list: that suffix is structurally shared rather than copied.
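
To illustrate the suffix sharing described here, the following is a minimal sketch using only the public `List` API. `SharingFilter.filterSharing` is a hypothetical helper and does not attempt to reproduce the backported implementation; it evaluates the predicate once per element, copies only the kept elements that precede the last rejected one, and reuses the remaining suffix as-is.

```scala
// Hypothetical helper showing a filter that shares the trailing run of kept elements.
object SharingFilter {
  def filterSharing[A](xs: List[A], p: A => Boolean): List[A] = {
    var kept = List.empty[A]     // kept elements before the shared suffix, in reverse order
    var runStart: List[A] = xs   // start of the current run of consecutive kept elements
    var cur = xs
    while (cur.nonEmpty) {
      if (p(cur.head)) {
        cur = cur.tail
      } else {
        // The run is broken: its elements can no longer be shared, so record them.
        var r = runStart
        while (r ne cur) { kept = r.head :: kept; r = r.tail }
        cur = cur.tail
        runStart = cur
      }
    }
    // runStart is now the longest suffix whose elements all passed; reuse it unchanged.
    kept.foldLeft(runStart)((acc, x) => x :: acc)
  }
}
```

With this sketch, a predicate that accepts everything returns the input list itself without allocating, and rejecting only an early element copies just the cells in front of it.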

dwijnand · 2019-07-17T22:08:36Z

Awesome. Mike was also telling me that it reduces the amount of garbage created while running scalac.

[backport] List.filter optimizations from 2.13.x

1a842d1

Binary compatibility constraints won't let us actually do this as an override in `List` (we tried that originally but reverted). But we are free to type-case `List` in the inherited implementation.

scala-jenkins added this to the 2.12.9 milestone Jul 15, 2019

diesalbla added the library:collections and performance labels Jul 15, 2019

diesalbla requested a review from NthPortal July 15, 2019 15:13

NthPortal reviewed Jul 15, 2019


lrytz approved these changes Jul 17, 2019


lrytz merged commit 3397d47 into scala:2.12.x Jul 17, 2019

dwijnand reviewed Jul 17, 2019

