-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Reduce the overhead of specialization #5812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(Assigning to 2.12.3, as I don't want to rush these changes into 2.12.2) |
Any types that needed to be specialized to support callsites in the current run would have already been info transformed during the specalization tree transform of those call sites. The backend requires further type information, e.g, to know about inner/enclosing class relationships. This involves calls to `sym.info` for classes on the classpath that haven't yet been info transformed. During that process, all base classes of such types are also info transformed. The specialization info transformer for classes then looks at the members of the classes to add specialialized variants. This is undesirable on grounds of performance and the risk of encountering stub symbols (references to types absent from the current compilation classpath) which can manifest as compiler crashes.
We know that `subst(tp1) <:< subst(tp2)` a priori (and cheaply!) if `tp2` is `Any`, which is commonly the case.
Reworks e4b5c00 to perform the flag mutation once per run, at the conclusion of the specialization tree transform, rather than once per compilation unit. The old approach was O(NxM), where N is the number of compilation units and M is the number of specialized overloads.
Most commonly, this method will return an empty set. This commit focuses on making that happen with a minimum of garbage, indirection, and info forcing. - Use mutable buffers to collect results, rather than appending sets - Avoid forcing the specialization info transform on all referenced types just to see if they have specialzied type parmeteters, we can phase travel back to typer to lookup this.
e877143
to
e416a25
Compare
The pattern of failures in this PR suggests this is non-deterministic. So the root cause could be a change in this PR, or this could be a latent problem that appears under load. I haven't been able to reproduce on my machine yet. I've rebased this PR on a commit that adds a better diagnostic to that test case, let's see what happens... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!!
@@ -198,6 +198,13 @@ abstract class SpecializeTypes extends InfoTransform with TypingTransformers { | |||
override def newPhase(prev: scala.tools.nsc.Phase): StdPhase = new SpecializationPhase(prev) | |||
class SpecializationPhase(prev: scala.tools.nsc.Phase) extends super.Phase(prev) { | |||
override def checkable = false 8000 | |||
override def run(): Unit = { | |||
super.run() | |||
exitingSpecialize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a line here // see comment in transformInfo
* - naked | ||
* - as arguments to type constructors in @specialized positions | ||
* (arrays are considered as Array[@specialized T]) | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe remove that duplicated doc
Not sure what to make of this test failure that you saw earlier.. |
I'll diff the byte code after bootstrap
|
The only bytecode diff I see when bootstrapping through this PR is one from a previous change (#5631) to the compiler.
|
/synch |
@lrytz I'm going to merge this based on the identical bytecode after bootstrap. But let's keep an eye out for that test failure afterwards. |
Thank you for this excellent performance catch @retronym. |
On my machine, running the benchmark
sbt 'hot -psource='
(jointly compiling the sources ofscalap
andbetter-files
) shows 7.5% improvement over f5ce29b (the parent of this PR).Per-commit performance attribution is available in this spreadsheet for the corresponding commits in the larger branch in #5785.