-
Notifications
You must be signed in to change notification settings - Fork 3.1k
[backport] List.filter optimizations from 2.13.x #8226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Binary compatibilty constraints won't let us actually do this as an override in `List` (we tried that originally but reverted.) But we are free to type-case List in the inherited implementation.
Could you share your observations that motivate this PR? I'm just a bit hesitant about the impact of the additional type test because almost all collections inherit the default implementation. |
b.result | ||
private[this] def filterImplList[A](self: List[A], p: A => Boolean, isFlipped: Boolean): List[A] = { | ||
|
||
// everything seen so far so far is not included |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// everything seen so far so far is not included | |
// everything seen so far is not included |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few things that could be very slightly cleaned up, but it looks pretty good. I assume it was mostly copied straight from 2.13.
I share Lukas' concerns about the additional type test, though I'm also wondering how it compares to bimorphic or polymorphic dispatch (if the method was able to be defined on List
). Or perhaps it would be like double polymorphic dispatch if filter
/filterNot
are already polymorphic.
if (l.isEmpty) | ||
Nil | ||
else { | ||
val h = l.head |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only used once - inline?
if (remaining.isEmpty) | ||
start | ||
else { | ||
val x = remaining.head |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only used once - inline?
nextToCopy = nextToCopy.tail | ||
} | ||
nextToCopy = next.tail | ||
next = next.tail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we assign one of these to the other rather than calling tail
twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not tweak the code I'm backporting here (unless we spot an error of course). I think JIT or the CPU caches will make this sort of micro opt unnecessary from a performance perspective.
I was looking at allocation hotspots from profiles of scalac, which notably uses this method in I don't believe the type test will hurt much -- testing for a superclass is fast in the JVM (interface checks do have a performance gotcha). If this change is deemed too risky, I can instead move this implementation into |
I'd prefer that solution, but maybe I'm too cautious. |
I'll microbenchmark to try to measure any slowdown in non-List use cases. |
Benchmarking filtering an empty vector (which in 2.12.x doesn't have a special case to avoid the builder allocation and foreach call in package scala.collection
import java.lang.invoke.DontInline
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations.{CompilerControl, Param, _}
import org.openjdk.jmh.infra.Blackhole
import scala.annotation.tailrec
import scala.collection.generic.{GenericCompanion, GenericTraversableTemplate, SeqFactory}
import scala.collection.immutable.{::, List, Nil}
import scala.collection.mutable.ArrayBuffer
object FilterBenchmark {
case class Content(value: Int)
}
@BenchmarkMode(Array(Mode.AverageTime))
@Fork(2)
@Threads(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
class FilterBenchmark {
import FilterBenchmark._
var valuesList: List[Content] = _
var valuesQueue: mutable.Queue[Content] = _
var valuesVector: immutable.Vector[Content] = _
@Setup(Level.Trial) def initKeys(): Unit = {
valuesList = List.tabulate(16)(v => Content(v))
valuesQueue = mutable.Queue.tabulate(16)(v => Content(v))
valuesVector = immutable.Vector.tabulate(16)(v => Content(v))
}
@Benchmark def filter_EmptyVector_includeAll: Any = {
filter_includeAll(Vector.empty)
}
@Benchmark def warmupAll(bh: Blackhole): Any = {
bh.consume(filter_includeAll(valuesList))
bh.consume(filter_includeAll(valuesQueue))
bh.consume(filter_includeAll(valuesVector))
bh.consume(filter_includeAll(BitSet.empty))
bh.consume(filter_includeAll(Stream.empty))
}
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
private def filter_includeAll(t: Traversable[_]): Any = {
t.filter(v => true)
}
} I see only a minor degradation in performance:
This seems fine to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
var currentLast = newHead | ||
|
||
// we know that all elements are :: until at least firstMiss.tail | ||
while (!(toProcess eq firstMiss)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to !
eq
over ne
?
Just out of interest: how much of a performance gain is this, at the cost of these net +85 lines? |
@dwijnand Here are the comparative micro benchmark results from the original submission of this patch: #5653 (comment) The best case is in |
Awesome. Mike was also telling me that it reduces the amount of garbage created while running scalac. |
Binary compatibilty constraints won't let us actually do this
as an override in
List
(we tried that originally but reverted until we reintroduced the optimization in 2.13.x)But we are free to type-case List in the inherited implementation.