8000 [backport] List.filter optimizations from 2.13.x by retronym · Pull Request #8226 · scala/scala · GitHub
[go: up one dir, main page]

Skip to content

[backport] List.filter optimizations from 2.13.x #8226

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 17, 2019

Conversation

retronym
Copy link
Member
@retronym retronym commented Jul 15, 2019

Binary compatibilty constraints won't let us actually do this
as an override in List (we tried that originally but reverted until we reintroduced the optimization in 2.13.x)

But we are free to type-case List in the inherited implementation.

Binary compatibilty constraints won't let us actually do this
as an override in `List` (we tried that originally but reverted.)

But we are free to type-case List in the inherited implementation.
@scala-jenkins scala-jenkins added this to the 2.12.9 milestone Jul 15, 2019
@diesalbla diesalbla added library:collections PRs involving changes to the standard collection library performance the need for speed. usually compiler performance, sometimes runtime performance. labels Jul 15, 2019
@lrytz
Copy link
Member
lrytz commented Jul 15, 2019

Could you share your observations that motivate this PR? I'm just a bit hesitant about the impact of the additional type test because almost all collections inherit the default implementation.

@diesalbla diesalbla requested a review from NthPortal July 15, 2019 15:13
b.result
private[this] def filterImplList[A](self: List[A], p: A => Boolean, isFlipped: Boolean): List[A] = {

// everything seen so far so far is not included
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// everything seen so far so far is not included
// everything seen so far is not included

Copy link
Contributor
@NthPortal NthPortal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few things that could be very slightly cleaned up, but it looks pretty good. I assume it was mostly copied straight from 2.13.

I share Lukas' concerns about the additional type test, though I'm also wondering how it compares to bimorphic or polymorphic dispatch (if the method was able to be defined on List). Or perhaps it would be like double polymorphic dispatch if filter/filterNot are already polymorphic.

if (l.isEmpty)
Nil
else {
val h = l.head
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only used once - inline?

if (remaining.isEmpty)
start
else {
val x = remaining.head
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only used once - inline?

nextToCopy = nextToCopy.tail
}
nextToCopy = next.tail
next = next.tail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we assign one of these to the other rather than calling tail twice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not tweak the code I'm backporting here (unless we spot an error of course). I think JIT or the CPU caches will make this sort of micro opt unnecessary from a performance perspective.

@retronym
Copy link
Member Author

I was looking at allocation hotspots from profiles of scalac, which notably uses this method in Transformer.transformStats, TypeConstraint.<init> and a few other hot places.

I don't believe the type test will hurt much -- testing for a superclass is fast in the JVM (interface checks do have a performance gotcha).

If this change is deemed too risky, I can instead move this implementation into Colllections in the compiler and use it in the hot spots, as I did previously for mapList before the same optimization landed in List.map iteself.

@lrytz
Copy link
Member
lrytz commented Jul 16, 2019

I'd prefer that solution, but maybe I'm too cautious.

@retronym
Copy link
Member Author

I'll microbenchmark to try to measure any slowdown in non-List use cases.

@retronym
Copy link
Member Author

Benchmarking filtering an empty vector (which in 2.12.x doesn't have a special case to avoid the builder allocation and foreach call in filterImpl). The benchmark makes sure that filterImpl is not JIT specialized to Vector by warming up with a variety of collections and stopping JIT inlining of the callsite that we're measuring.

package scala.collection

import java.lang.invoke.DontInline
import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations.{CompilerControl, Param, _}
import org.openjdk.jmh.infra.Blackhole

import scala.annotation.tailrec
import scala.collection.generic.{GenericCompanion, GenericTraversableTemplate, SeqFactory}
import scala.collection.immutable.{::, List, Nil}
import scala.collection.mutable.ArrayBuffer


object FilterBenchmark {
  case class Content(value: Int)
}


@BenchmarkMode(Array(Mode.AverageTime))
@Fork(2)
@Threads(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
class FilterBenchmark {
  import FilterBenchmark._

  var valuesList: List[Content] = _
  var valuesQueue: mutable.Queue[Content] = _
  var valuesVector: immutable.Vector[Content] = _

  @Setup(Level.Trial) def initKeys(): Unit = {
    valuesList = List.tabulate(16)(v => Content(v))
    valuesQueue = mutable.Queue.tabulate(16)(v => Content(v))
    valuesVector = immutable.Vector.tabulate(16)(v => Content(v))
  }

  @Benchmark def filter_EmptyVector_includeAll: Any = {
    filter_includeAll(Vector.empty)
  }

  @Benchmark def warmupAll(bh: Blackhole): Any = {
    bh.consume(filter_includeAll(valuesList))
    bh.consume(filter_includeAll(valuesQueue))
    bh.consume(filter_includeAll(valuesVector))
    bh.consume(filter_includeAll(BitSet.empty))
    bh.consume(filter_includeAll(Stream.empty))
  }

  @CompilerControl(CompilerControl.Mode.DONT_INLINE)
  private def filter_includeAll(t: Traversable[_]): Any = {
    t.filter(v => true)
  }
}

I see only a minor degradation in performance:

> bench/jmh:run scala.collection.FilterBenchmark.filter_EmptyVector_includeAll -wm BULK -wmb scala.collection.FilterBenchmark.warmupAll
[info] FilterBenchmark.filter_EmptyVector_includeAll  avgt   20  46.757 ± 0.746  ns/op
...
[info] FilterBenchmark.filter_EmptyVector_includeAll  avgt   20  47.902 ± 0.373  ns/op

This seems fine to me.

Copy link
Member
@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@lrytz lrytz merged commit 3397d47 into scala:2.12.x Jul 17, 2019
var currentLast = newHead

// we know that all elements are :: until at least firstMiss.tail
while (!(toProcess eq firstMiss)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to ! eq over ne?

@dwijnand
Copy link
Member

Just out of interest: how much of a performance gain is this, at the cost of these net +85 lines?

@retronym
Copy link
Member Author
retronym commented Jul 17, 2019

@dwijnand Here are the comparative micro benchmark results from the original submission of this patch: #5653 (comment)

The best case is in as.filter(_ => true) which has about a 10x speedup, and has secondary benefits by just returning the input list. This benefit extends to operations in which the input and output share the suffix of the list, this is structurally shared.

@dwijnand
Copy link
Member

Awesome. Mike was also telling me that it reduces the amount of garbage created while running scalac.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
library:collections PRs involving changes to the standard collection library performance the need for speed. usually compiler performance, sometimes runtime performance.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0