10000 By default, only run DCE on methods with an ATHROW. by retronym · Pull Request #6044 · scala/scala · GitHub
[go: up one dir, main page]

Skip to content

By default, only run DCE on methods with an ATHROW. #6044

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

retronym
Copy link
Member

GenBCode started to run a local DCE optimization on all methods to afford
simplicity in code generation of Nothing-typed expressions.
Benchmarks show this was costing 1-2% of compile times.

This commit selectively run DCE on methods with the problematic THROW, even
under -opt:l:none. It then changes -opt:l:default to the empty set of
optimizations.

An existing test for this code gen quirk is updated to show with either
explicitly enabled or selective targeted DCE, clean code is generated.

lrytz and others added 10 commits July 28, 2017 14:07
Removes the three work queues in the backend.

Splits up the backend in two main components
  - CodeGen, which has a Global
  - PostProcessor, which has a BTypes (but no Global)

CodeGen generates asm.ClassNodes and stores them in postProcessor.generatedClasses.
The code generator is invoketd through BCodePhase.apply.

The postProcessor then runs the optimizer, computes the InnerClass table and
adds the lambdaDeserialize method if necessary. It finally serializes the
classes into a byte array and writes them to disk.

The implementation of classfile writing still depends on Global. It is passed
in as an argument to the postProcessor. A later commit will move it to a context
without Global and make it thread-safe.
BTypes is the component that's shared between CodeGen and PostProcessor.
Remove implicit conversion from LazyVar[T] to T
GenBCode started to run a local DCE optimization on all methods
to afford simplicity in code generation of `Nothing`-typed
expressions.

Benchmarks show this was costing 1-2% of compile times.

This commit selectively run DCE on methods with the problematic
THROW, even under `-opt:l:none`. It then changes `-opt:l:default`
to the empty set of optimizations.

An existing test for this code gen quirk is updated to show with
either explicitly enabled or selective targeted DCE, clean code is
generated.
@scala-jenkins scala-jenkins added this to the 2.12.4 milestone Aug 21, 2017
@retronym retronym requested a review from lrytz August 21, 2017 01:01
@retronym
Copy link
Member Author

(Based on top of #6012)

@retronym retronym mentioned this pull request Aug 21, 2017
7 tasks
@retronym
Copy link
Member Author
retronym commented Aug 21, 2017

Comparative benchmark.

I'm looking into the test failure.

@retronym retronym added the performance the need for speed. usually compiler performance, sometimes runtime performance. label Aug 21, 2017
@retronym
Copy link
Member Author

ASM is inserting an unreachable frame into:

class C {
  def t(): Unit = {
    var x = ""
    while (x != null) {
      getClass
    }
  }
}

Which gets the NOP;NOP;ATHROW treatment.


  // access flags 0x1
  public t()V
   L0 @ L687685057
    LINENUMBER 2 L0 @ L687685057
    LINENUMBER 3 L0 @ L687685057
    LDC ""
   L1 @ L1604020967
    ASTORE 1
   L2 @ L277697988
    LINENUMBER 4 L2 @ L277697988
   FRAME APPEND [java/lang/String]
    ALOAD 1
    IFNULL L3 @ L1412612727
   L4 @ L367066629
    LINENUMBER 5 L4 @ L367066629
    ALOAD 0
    INVOKEVIRTUAL C.getClass ()Ljava/lang/Class;
    POP
    GOTO L2 @ L277697988
   L5 @ L287859212
   FRAME FULL [] [java/lang/Throwable]
    NOP
    NOP
    ATHROW
   L3 @ L1412612727
    LINENUMBER 4 L3 @ L1412612727
   FRAME APPEND [C java/lang/String]
    RETURN
   L6 @ L1810970264
    LOCALVARIABLE x Ljava/lang/String; L1 @ L1604020967 L3 @ L1412612727 1
    LOCALVARIABLE this LC; L0 @ L687685057 L6 @ L1810970264 0
    MAXSTACK = 1
    MAXLOCALS = 2

@lrytz
Copy link
Member
lrytz commented Aug 28, 2017
  def t(): Unit = {
    var x = ""
    while (x != null) {
      getClass
    }
  }

Code gen continues to generate code (the return statement) after the while loop, but that code is unreachable, so asm replaces it by nop;..nop;athrow during classfile writing. This happens for any method with unreachable code, so that's the change we'd have to live with if we don't enable DCE by default. Maybe we can identify some common special cases, like the above, and add those methods to methodRequiringDCE.

@retronym
Copy link
Member Author

Maybe its too ambitious, but we could modify (or extend) asm.MethodWriter to be able to call our DCE as a compensating action before resorting to resorting to NOP; ATHROW.

@lrytz
Copy link
Member
lrytz commented Aug 29, 2017

The replacing of unreachable code happens when computing stack map frames. First, frames (of reachable labels) are computed in a fixpoint algorithm. Then, code after unreachable labels is replaced (https://github.com/scala/scala-asm/blob/master/src/main/java/scala/tools/asm/MethodWriter.java#L1501).

It would be non-trivial to remove the code and fix up all indicies. They do something similar for handling long jumps (GOTO_W) in https://github.com/scala/scala-asm/blob/master/src/main/java/scala/tools/asm/MethodWriter.java#L2383, and it's hard.

We could remove the asm instructions and re-run the classfile writer, but that also sounds non-optimal.

As an alternative approach, we should try whether a home-grown DCE is faster the current removeUnreachableCodeImpl which runs an Analyzer. Implementing that should not be hard, we already have our own "abstract" interpreter (not that abstract, but it could be made so..) in computeMaxLocalsMaxStack

@retronym
Copy link
Member Author

It would be non-trivial to remove the code and fix up all indicies.

I was thinking we'd abort writing that method, run our existing DCE, and restart the method writer. This assumes it is possible to do a "rollback" in the ClassWriter/MethodWriter. If methods with dead code are rare, the rollback need not be efficient.

@retronym
Copy link
Member Author
retronym commented Aug 30, 2017

The DCE detection in ASM is probably faster than ours because it uses Label.status to mark reachable labels, whereas we build up a Set-s of LabelNode-s.

image

I think it safe to use LabelNode.getLabel.status to store flags so long as InsnList.resetLabels is called afterwards (either by us explicitly or by an enclosing MethodNode.accept(MethodVisitor).

@retronym
Copy link
Member Author

I'll close this PR while we experiment with these alternative approaches. Let me know if you'd like to do this work, otherwise I'll come back to it next month.

@retronym retronym closed this Aug 30, 2017
@retronym
Copy link
Member Author

If the technique of using Label.status is feasible, we should also see if it helps performance elsewhere.

labelReferences is another slow point. This isn't quite the same problem, but we might be able to replace:

val res = mutable.AnyRefMap[LabelNode, Set[AnyRef]]()

with use of Label.info

    /**
     * Field used to associate user information to a label. Warning: this field
     * is used by the ASM tree package. In order to use it with the ASM tree
     * package you must override the
     * {@link scala.tools.asm.tree.MethodNode#getLabelNode} method.
     */
    public Object info;

@SethTisue SethTisue modified the milestones: 2.12.4, 2.12.5 Sep 19, 2017
@SethTisue SethTisue removed this from the 2.12.5 milestone Mar 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance the need for speed. usually compiler performance, sometimes runtime performance.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0