8000 optimise completeSilentlyAndCheckErroneous by mkeskells · Pull Request #5832 · scala/scala · GitHub
[go: up one dir, main page]

Skip to content

optimise completeSilentlyAndCheckErroneous #5832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 18, 2017

Conversation

mkeskells
Copy link
Contributor
@mkeskells mkeskells commented Apr 5, 2017

performance result - disappointing. I havent looked into the detail as it why -

after 10 90% jvm, no GC
                  RunName	Count	           AllWallMS	                   CPU_MS	                Allocated
              00_baseline	36	  2499.87 [+1.05% -0.95%]	  2472.66 [+1.06% -0.96%]	   602.25 [+1.00% -1.00%]
      11_completeSilently	39	  2540.75 [+1.03% -0.97%]	  2507.61 [+1.04% -0.95%]	   601.74 [+1.00% -1.00%]

@scala-jenkins scala-jenkins added this to the 2.12.3 milestone Apr 5, 2017
@lrytz
Copy link
Member
lrytz commented Apr 6, 2017

How should I read the numbers? What's Count? The *MS times are larger, so does that mean that this patch caused a slight slowdown?

@lrytz
Copy link
Member
lrytz commented Apr 6, 2017

Also, the confidence intervals of the *MS numbers seem to be too large to show a measurable difference here.

@mkeskells
Copy link
Contributor Author
mkeskells commented Apr 6, 2017

HI @lrytz

the test is run 60 times-

  • we ignore the first 10 hence "after 10"
  • we ignore the worst 10% ( outliers) hence "90%"
  • this is for the jvm phase only hence "jvm"
  • we filter the result to ignore those with a GC that occurred within the run of the phase hence "no jvm"

the "count" column is the number of samples remaining after the above filtering

the "AllWallMS" is taken as the mean ( and ranges) for the start and end of the phase measured via System.nanotime

the "CPU_MS" in taken from the mxBean from the current thread. This is lower granularity, but useful for detecting IO pauses etc. We also gather the user mode time for the thread, but this is not displayed in the results above

"Allocated" is taken from the thread allocation counter, an indicates the number of bytes allocated in the heap expressed in MB

@mkeskells
Copy link
Contributor Author

as for the actual number reported - the Allocated show a slight reduction, about what I expect, but the CPU increase is unexpected and I will dig into this. It is on the border of confidence ( 1.6% mean increase, with a range in measurements of ~1%)

I didn't expect to see much change in performance, but certainly not an increase

As the only change is from

     val originalReporter = global.reporter
      val storeReporter = new reporters.StoreReporter()		
      global.reporter = storeReporter		
      try {		
        sym.info		
      } finally {		
        global.reporter = originalReporter		
      }

to

withoutReporting(sym.info)
...
 @inline def withoutReporting[T](fn : => T) = {
    val currentReporter = reporter
    reporter = NoReporter
    try fn finally reporter = currentReporter
  }
...
object NoReporter extends Reporter{
  override protected def info0(pos: Position, msg: String, severity: Severity, force: Boolean): Unit = ()
}

NoReporter does less work ( if called)
no constructor to call (so the expected minimal allocation reduction)
the orig code path refers to global.reporter, as opposed to the visible field
the only other difference is the use of inline - I would have expected this to be optimized away

I would love to understand what is wrong with the rationale above

@lrytz
Copy link
Member
lrytz commented Apr 7, 2017

Thanks @mkeskells for these explanations. I'm wondering why did you start optimizing this particular spot, did it show up in profiling?

I think withoutReporting is not inlined because it's a non-final method in a non-final class. This is not so easy to see unfortunately because we have plenty of inliner warnings already, this just added one..

[info] Compiling 293 Scala sources to /home/jenkins/workspace/scala-2.12.x-validate-test/build/quick/classes/compiler...
...
[warn] there were 43 inliner warnings; re-run enabling -opt-warnings for details, or try -help

Check if you see the desired speedup after making the method final.

@mkeskells
Copy link
Contributor Author

@lrytz I dont recall the reason for the optimisation. Either it would have been showing up in the profiler or code inspection, I would presume the latter. This is just going through #5815 and pulling out all of the changes as separate reviewable items as requested

@mkeskells
Copy link
Contributor Author

@lrytz Not inlined - doh schoolboy error :-(

too used to having warnings as fatal from the day job. Will retry, and add a longer run, and post the results when I have them

@mkeskells
Copy link
Contributor Author
mkeskells commented Apr 11, 2017

Result after 200 cycles. _2 is with e0d2830

after 10 90% jvm, no GC
                  RunName	Count	           AllWallMS	                   CPU_MS	                Allocated
              B0_baseline	130	  3334.91 [+1.06% -0.96%]	  3226.08 [+1.05% -0.95%]	   604.20 [+1.01% -1.00%]
     11_completeSilently_2	131	  3313.66 [+1.04% -0.96%]	  3206.35 [+1.03% -0.96%]	   602.04 [+1.01% -1.00%]

machine blue screened some time after this after so cpu figures are suspect. Will rerun

@mkeskells
Copy link
Contributor Author

reran. Not sure why the results have stepped (upgrades?). The comparison is the important part though

after 10 90% jvm, no GC
                  RunName	Count	           AllWallMS	                   CPU_MS	                Allocated
              B0_baseline	139	  3020.22 [+1.50% -0.95%]	  2962.79 [+1.11% -0.95%]	   614.98 [+1.00% -1.00%]
    11_completeSilently_2	136	  2932.35 [+1.05% -0.97%]	  2904.07 [+1.05% -0.96%]	   611.21 [+1.00% -1.00%]

@lrytz
Copy link
Member
lrytz commented Apr 12, 2017

Looks good to me, thanks! IUC, this saves around 3% of the jvm phase.

Would you like to get the first commit (RunnableInPhase) in as well? If so I'd like it to have a bit more documentation: what it's used for and how.

@mkeskells
Copy link
Contributor Author
mkeskells commented Apr 12, 2017

@lrytz can we add the additional documentation with the contribution for the profiler? Hard to describe how to use it without the other part of the puzzle

there is an earlier version in #5760 of the generation, but that doesnt have the MT support, and has a few more bugs and less feature, but some things in that branch that relate to your CI, so I have created another PR #5848 so you can see the changes to the RunnableInPhase, but no documentation in there yet

to generate the result summary the tool that we use is https://github.com/rorygraves/perf_tester
this is a little rough. Its not really a standalone tool yet. Not sure where this would be useful.

I think that as a CI task, but not sure how that would work with the infra that you have - it would need to run on the dedicated hardware as a performance test. What could we do to link this to your existing performance test work - which I think only runs after merge

@lrytz
Copy link
Member
lrytz commented Apr 13, 2017

@lrytz can we add the additional documentation with the contribution for the profiler? Hard to describe how to use it without the other part of the puzzle

Yes, that sounds good. Can we move this class out of this PR then?

@rorygraves rorygraves force-pushed the 2.12.x_withoutReporting branch from e0d2830 to 980560d Compare April 18, 2017 17:57
@mkeskells
Copy link
Contributor Author

/rebuild

@mkeskells
Copy link
Contributor Author

rebased and removed profiling commit

@mkeskells
Copy link
Contributor Author

@lrytz can this PR be merged?

Copy link
Member
@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks good!

@lrytz lrytz merged commit 0452c52 into scala:2.12.x May 18, 2017
@rorygraves rorygraves deleted the 2.12.x_withoutReporting branch May 24, 2017 20:41
@SethTisue SethTisue added the performance the need for speed. usually compiler performance, sometimes runtime performance. label Jun 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance the need for speed. usually compiler performance, sometimes runtime performance.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0