By default, only run DCE on methods with an ATHROW. #6044

retronym · 2017-08-21T01:01:00Z

GenBCode started to run a local DCE optimization on all methods to keep
code generation of Nothing-typed expressions simple.
Benchmarks show this was costing 1-2% of compile times.

This commit runs DCE selectively, on methods with the problematic ATHROW, even
under -opt:l:none. It then changes -opt:l:default to the empty set of
optimizations.

An existing test for this code gen quirk is updated to show that clean code is
generated with either explicitly enabled or selectively targeted DCE.
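A minimal sketch of the selection rule, assuming a toy model where a method body is a list of opcode names. `ToyMethod`, `DceSelection`, `emitThrow` and `shouldRunDce` are hypothetical names, not GenBCode's API. The sketch assumes code generation records each method into which it emits an ATHROW, and that DCE then runs on those methods even when no optimizations are enabled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: a method is a name plus a list of opcode mnemonics.
final class ToyMethod {
    final String name;
    final List<String> insns = new ArrayList<>();

    ToyMethod(String name) { this.name = name; }
}

final class DceSelection {
    // Methods where code generation emitted an ATHROW (e.g. for a Nothing-typed expression).
    private final Set<ToyMethod> methodsRequiringDce = new HashSet<>();
    // True when local DCE is explicitly enabled by the optimizer settings.
    private final boolean dceEnabled;

    DceSelection(boolean dceEnabled) { this.dceEnabled = dceEnabled; }

    // Called by the (toy) code generator whenever it emits ATHROW.
    void emitThrow(ToyMethod m) {
        m.insns.add("ATHROW");
        methodsRequiringDce.add(m);
    }

    // DCE runs on every method when enabled, otherwise only on the recorded methods.
    boolean shouldRunDce(ToyMethod m) {
        return dceEnabled || methodsRequiringDce.contains(m);
    }
}
```

With `new DceSelection(false)`, the analogue of `-opt:l:none` or the now-empty `-opt:l:default`, only methods passed to `emitThrow` are selected.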

Removes the three work queues in the backend.

Splits up the backend into two main components:
- CodeGen, which has a Global
- PostProcessor, which has a BTypes (but no Global)

CodeGen generates asm.ClassNodes and stores them in postProcessor.generatedClasses. The code generator is invoked through BCodePhase.apply.

The postProcessor then runs the optimizer, computes the InnerClass table and adds the lambdaDeserialize method if necessary. It finally serializes the classes into a byte array and writes them to disk.

The implementation of classfile writing still depends on Global. It is passed in as an argument to the postProcessor. A later commit will move it to a context without Global and make it thread-safe.
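A toy sketch of the CodeGen/PostProcessor split, assuming simplified stand-ins: `GeneratedClass` replaces `asm.ClassNode`, and `serialize` encodes a string instead of producing a classfile. Only the names `CodeGen`, `PostProcessor` and `generatedClasses` come from the commit message; everything else is hypothetical, and the optimizer and InnerClass steps are left as a comment. `$deserializeLambda$` is used as the name of the added lambda-deserialization method for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Stand-in for asm.ClassNode: a class name, whether it contains lambdas, and added methods.
final class GeneratedClass {
    final String name;
    final boolean hasLambdas;
    final List<String> extraMethods = new ArrayList<>();

    GeneratedClass(String name, boolean hasLambdas) {
        this.name = name;
        this.hasLambdas = hasLambdas;
    }
}

// Hypothetical post-processor: holds the generated classes and has no access to the frontend.
final class PostProcessor {
    final List<GeneratedClass> generatedClasses = new ArrayList<>();

    // Post-processes every generated class and returns one serialized byte array per class.
    List<byte[]> processAll() {
        List<byte[]> out = new ArrayList<>();
        for (GeneratedClass c : generatedClasses) {
            // The optimizer and the InnerClass-table computation would run here.
            if (c.hasLambdas) c.extraMethods.add("$deserializeLambda$");
            out.add(serialize(c));
        }
        return out;
    }

    // Toy serialization; a real backend would emit classfile bytes and write them to disk.
    private static byte[] serialize(GeneratedClass c) {
        return (c.name + ":" + String.join(",", c.extraMethods)).getBytes(StandardCharsets.UTF_8);
    }
}

// Hypothetical code generator: in scalac this side is the one with access to Global.
final class CodeGen {
    private final PostProcessor postProcessor;

    CodeGen(PostProcessor postProcessor) { this.postProcessor = postProcessor; }

    // Generates a class and hands it to the post-processor.
    void genClass(String name, boolean hasLambdas) {
        postProcessor.generatedClasses.add(new GeneratedClass(name, hasLambdas));
    }
}
```

The point of the shape is that only `CodeGen` needs the frontend; everything after `generatedClasses` works on self-contained class representations.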


retronym · 2017-08-21T01:01:20Z

(Based on top of #6012)

retronym · 2017-08-21T01:04:25Z

Comparative benchmark.

I'm looking into the test failure.

retronym · 2017-08-21T05:23:24Z

ASM is inserting an unreachable frame into this method:

class C {
  def t(): Unit = {
    var x = ""
    while (x != null) {
      getClass
    }
  }
}

The unreachable code gets the NOP; NOP; ATHROW treatment:


  // access flags 0x1
  public t()V
   L0 @ L687685057
    LINENUMBER 2 L0 @ L687685057
    LINENUMBER 3 L0 @ L687685057
    LDC ""
   L1 @ L1604020967
    ASTORE 1
   L2 @ L277697988
    LINENUMBER 4 L2 @ L277697988
   FRAME APPEND [java/lang/String]
    ALOAD 1
    IFNULL L3 @ L1412612727
   L4 @ L367066629
    LINENUMBER 5 L4 @ L367066629
    ALOAD 0
    INVOKEVIRTUAL C.getClass ()Ljava/lang/Class;
    POP
    GOTO L2 @ L277697988
   L5 @ L287859212
   FRAME FULL [] [java/lang/Throwable]
    NOP
    NOP
    ATHROW
   L3 @ L1412612727
    LINENUMBER 4 L3 @ L1412612727
   FRAME APPEND [C java/lang/String]
    RETURN
   L6 @ L1810970264
    LOCALVARIABLE x Ljava/lang/String; L1 @ L1604020967 L3 @ L1412612727 1
    LOCALVARIABLE this LC; L0 @ L687685057 L6 @ L1810970264 0
    MAXSTACK = 1
    MAXLOCALS = 2

lrytz · 2017-08-28T14:22:00Z

  def t(): Unit = {
    var x = ""
    while (x != null) {
      getClass
    }
  }

Code gen continues to generate code (the return statement) after the while loop, but that code is unreachable, so ASM replaces it with NOP; ...; NOP; ATHROW during classfile writing. This happens for any method with unreachable code, so that's the change we'd have to live with if we don't enable DCE by default. Maybe we can identify some common special cases, like the above, and add those methods to methodRequiringDCE.

retronym · 2017-08-29T03:36:17Z

Maybe it's too ambitious, but we could modify (or extend) asm.MethodWriter to be able to call our DCE as a compensating action before resorting to NOP; ATHROW.

lrytz · 2017-08-29T14:04:02Z

Unreachable code is replaced while computing stack map frames. First, frames (of reachable labels) are computed in a fixpoint algorithm. Then, code after unreachable labels is replaced (https://github.com/scala/scala-asm/blob/master/src/main/java/scala/tools/asm/MethodWriter.java#L1501).
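A simplified model of the two phases, assuming one unit per instruction rather than ASM's byte offsets: a worklist marks reachable instructions (standing in for ASM's frame fixpoint), then each maximal unreachable run is overwritten with NOPs ending in ATHROW so that no offset changes. `Insn` and `UnreachableFill` are hypothetical names. Real ASM works per basic block on bytes and also emits a frame with no locals and a Throwable on the stack, as in the `FRAME FULL [] [java/lang/Throwable]` line of the listing above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy instruction: an opcode name plus an optional jump target (an index into the body).
final class Insn {
    final String op;
    final int target; // -1 when the instruction is not a jump

    private Insn(String op, int target) { this.op = op; this.target = target; }

    static Insn of(String op) { return new Insn(op, -1); }
    static Insn jump(String op, int target) { return new Insn(op, target); }
}

final class UnreachableFill {
    // Marks every instruction reachable from index 0 (stand-in for ASM's frame fixpoint).
    static boolean[] reachable(Insn[] code) {
        boolean[] seen = new boolean[code.length];
        Deque<Integer> work = new ArrayDeque<>();
        work.push(0);
        while (!work.isEmpty()) {
            int i = work.pop();
            if (i < 0 || i >= code.length || seen[i]) continue;
            seen[i] = true;
            Insn insn = code[i];
            switch (insn.op) {
                case "RETURN":
                case "ATHROW":
                    break; // no successors
                case "GOTO":
                    work.push(insn.target); // unconditional jump: only the target
                    break;
                default:
                    if (insn.target >= 0) work.push(insn.target); // conditional jump
                    work.push(i + 1); // fall through to the next instruction
            }
        }
        return seen;
    }

    // Overwrites each maximal unreachable run with NOP ... NOP ATHROW, keeping every index stable.
    static Insn[] fill(Insn[] code) {
        boolean[] live = reachable(code);
        Insn[] out = code.clone();
        for (int i = 0; i < code.length; i++) {
            if (live[i]) continue;
            boolean lastOfRun = i + 1 == code.length || live[i + 1];
            out[i] = Insn.of(lastOfRun ? "ATHROW" : "NOP");
        }
        return out;
    }
}
```

On a body shaped like the while loop above, with three unreachable instructions after the backward GOTO, `fill` puts NOP, NOP, ATHROW in their place and leaves the reachable code untouched. Keeping the length fixed is what avoids the index fix-up discussed next.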

It would be non-trivial to remove the code and fix up all indices. They do something similar for handling long jumps (GOTO_W) in https://github.com/scala/scala-asm/blob/master/src/main/java/scala/tools/asm/MethodWriter.java#L2383, and it's hard.

We could remove the asm instructions and re-run the classfile writer, but that also sounds non-optimal.

As an alternative approach, we should check whether a home-grown DCE is faster than the current removeUnreachableCodeImpl, which runs an Analyzer. Implementing that should not be hard: we already have our own "abstract" interpreter (not that abstract, but it could be made so) in computeMaxLocalsMaxStack.
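One possible shape for a home-grown DCE that only computes reachability instead of running a full Analyzer, sketched under the assumption of a tree-API-like model in which jumps point at `Label` objects rather than offsets. Because targets are objects, removing dead instructions needs no index fix-up. `Label`, `Insn` and `ToyDce` are hypothetical; the set of enqueued labels corresponds to the "Sets of LabelNodes" approach this thread describes for the current implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy stand-in for a LabelNode; compared by identity.
final class Label {
    final String name;
    Label(String name) { this.name = name; }
}

// Toy instruction: a label marker (op "LABEL") or an opcode with an optional jump target.
final class Insn {
    final String op;
    final Label label; // defined label for markers, jump target for jumps, null otherwise

    Insn(String op, Label label) { this.op = op; this.label = label; }

    static Insn mark(Label l) { return new Insn("LABEL", l); }
    static Insn simple(String op) { return new Insn(op, null); }
    static Insn jump(String op, Label target) { return new Insn(op, target); }
    boolean isJump() { return label != null && !op.equals("LABEL"); }
}

final class ToyDce {
    // Returns the method body with every unreachable instruction removed.
    static List<Insn> run(List<Insn> code) {
        Map<Label, Integer> labelPos = new HashMap<>();
        for (int i = 0; i < code.size(); i++)
            if (code.get(i).op.equals("LABEL")) labelPos.put(code.get(i).label, i);

        boolean[] live = new boolean[code.size()];
        Set<Label> enqueued = new HashSet<>(); // jump targets already queued
        Deque<Integer> work = new ArrayDeque<>();
        work.push(0);
        while (!work.isEmpty()) {
            // Walk straight-line code from this entry until a terminator or already-visited code.
            for (int i = work.pop(); i < code.size() && !live[i]; i++) {
                Insn insn = code.get(i);
                live[i] = true;
                if (insn.isJump() && enqueued.add(insn.label)) work.push(labelPos.get(insn.label));
                if (insn.op.equals("GOTO") || insn.op.equals("RETURN") || insn.op.equals("ATHROW")) break;
            }
        }
        List<Insn> out = new ArrayList<>();
        for (int i = 0; i < code.size(); i++) if (live[i]) out.add(code.get(i));
        return out;
    }
}
```

Whether such a pass actually beats the Analyzer-based version would still need benchmarking; the sketch only shows that no frame computation is required to find dead code.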

> It would be non-trivial to remove the code and fix up all indices.

I was thinking we'd abort writing that method, run our existing DCE, and restart the method writer. This assumes it is possible to do a "rollback" in the ClassWriter/MethodWriter. If methods with dead code are rare, the rollback need not be efficient.

The DCE detection in ASM is probably faster than ours because it uses Label.status to mark reachable labels, whereas we build up Sets of LabelNodes.

I think it is safe to use LabelNode.getLabel.status to store flags, as long as InsnList.resetLabels is called afterwards (either by us explicitly or by an enclosing MethodNode.accept(MethodVisitor)).
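A sketch of the Label.status idea, assuming a toy `Label` with mutable scratch fields in place of ASM's: reachable labels are marked with a bit on the label itself instead of being added to a Set, and the flags are cleared afterwards. `FlagMarking`, `REACHABLE` and `position` are hypothetical. In ASM itself, `InsnList.resetLabels` discards the `Label` objects cached by each `LabelNode`, so fresh labels (with fresh flags) are created on next use; the toy `resetLabels` simply clears the field.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy label carrying scratch state directly, in the spirit of asm's Label.status.
final class Label {
    static final int REACHABLE = 1;
    int status;        // scratch flags; must be cleared before the label is reused
    int position = -1; // index of this label's marker in the body
}

// Toy instruction: a label marker (op "LABEL") or an opcode with an optional jump target.
final class Insn {
    final String op;
    final Label label; // defined label for markers, jump target for jumps, null otherwise

    Insn(String op, Label label) { this.op = op; this.label = label; }
    boolean isJump() { return label != null && !op.equals("LABEL"); }
}

final class FlagMarking {
    // Sets REACHABLE on every label reachable from the method entry; no Set or Map is allocated.
    static void markReachableLabels(List<Insn> code) {
        for (int i = 0; i < code.size(); i++)
            if (code.get(i).op.equals("LABEL")) code.get(i).label.position = i;

        Deque<Integer> work = new ArrayDeque<>();
        work.push(0);
        while (!work.isEmpty()) {
            for (int i = work.pop(); i < code.size(); i++) {
                Insn insn = code.get(i);
                if (insn.op.equals("LABEL")) {
                    if ((insn.label.status & Label.REACHABLE) != 0) break; // already walked from here
                    insn.label.status |= Label.REACHABLE;
                } else if (insn.isJump() && (insn.label.status & Label.REACHABLE) == 0) {
                    work.push(insn.label.position);
                }
                if (insn.op.equals("GOTO") || insn.op.equals("RETURN") || insn.op.equals("ATHROW")) break;
            }
        }
    }

    // Clears the scratch flags so later users of the labels see a clean state.
    static void resetLabels(List<Insn> code) {
        for (Insn insn : code) if (insn.label != null) insn.label.status = 0;
    }
}
```

The hoped-for saving is avoiding a hash lookup per label visit; forgetting the reset step is the main risk, which is why it is called out above.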

I'll close this PR while we experiment with these alternative approaches. Let me know if you'd like to do this work, otherwise I'll come back to it next month.

retronym closed this

Aug 30, 2017

If the technique of using Label.status is feasible, we should also see if it helps performance elsewhere.

labelReferences is another slow point. This isn't quite the same problem, but we might be able to replace:

import scala.collection.mutable
import scala.tools.asm.tree.LabelNode

val res = mutable.AnyRefMap[LabelNode, Set[AnyRef]]()

with uses of Label.info:

    /**
     * Field used to associate user information to a label. Warning: this field
     * is used by the ASM tree package. In order to use it with the ASM tree
     * package you must override the
     * {@link scala.tools.asm.tree.MethodNode#getLabelNode} method.
     */
    public Object info;
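A sketch of storing label references directly on the label, assuming a toy `Label` whose `info` slot plays the role of ASM's public `Label.info` field. `LabelRef` and `LabelReferences` are hypothetical; `withMap` corresponds to the map-building code above and `withInfo` to the proposed replacement. As the quoted Javadoc warns, the real field is used by the ASM tree package, so this would also require overriding `MethodNode.getLabelNode` and clearing `info` afterwards.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy label with a user-data slot, mimicking asm's public Label.info field.
final class Label {
    Object info;
}

// Toy reference to a label (a jump, a switch case, a local-variable range boundary, ...).
final class LabelRef {
    final String kind;
    final Label target;

    LabelRef(String kind, Label target) { this.kind = kind; this.target = target; }
}

final class LabelReferences {
    // Map-based version: one hash lookup per reference.
    static Map<Label, Set<LabelRef>> withMap(List<LabelRef> refs) {
        Map<Label, Set<LabelRef>> res = new HashMap<>();
        for (LabelRef r : refs) res.computeIfAbsent(r.target, k -> new HashSet<>()).add(r);
        return res;
    }

    // Field-based version: each label carries its own set; callers must clear `info` afterwards.
    @SuppressWarnings("unchecked")
    static void withInfo(List<LabelRef> refs) {
        for (LabelRef r : refs) {
            if (r.target.info == null) r.target.info = new HashSet<LabelRef>();
            ((Set<LabelRef>) r.target.info).add(r);
        }
    }
}
```

Both versions group the same references per label; the field-based one trades the map for a mutable slot that has to be managed carefully.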

lrytz mentioned this pull request

Sep 13, 2017

Performance improvements for DCE #6075

Merged

SethTisue modified the milestones: 2.12.4, 2.12.5

Sep 19, 2017

SethTisue removed this from the 2.12.5 milestone

Mar 13, 2018

lrytz and others added 10 commits July 28, 2017 14:07

new CodeGen component, move bTypes to GenBCode

edea96f

move classfile writing code to context without global

5f5d525

Fix -Ygen-asmp, minor cleanups

948fb88

Move components from BTypes to PostProcessor

1532dcc

BTypes is the component that's shared between CodeGen and PostProcessor.

Use LazyVar for CoreBTypes

67a1693

Remove implicit conversion from LazyVar[T] to T

move PostProcessorFrontendAccess to a separate file

233231d

Move LazyVar to BTypes, synchronize on frontendLock

1349e54

move backend state from BTypes to components where it belongs

58cfe9e

scala-jenkins added this to the 2.12.4 milestone Aug 21, 2017

retronym requested a review from lrytz August 21, 2017 01:01

retronym mentioned this pull request Aug 21, 2017

Compiler performance scala/scala-dev#322

Closed

7 tasks

retronym added the performance label Aug 21, 2017
