Hi everyone. One of the Tooling team focus lately has been improvement of the contribution process to sbt. Another thing we’ve been thinking is the performance of sbt. Combining these themes together, I’ve been looking into how guys like Jason Zaugg and Johannes Rudolph profile JVM applications, and this post is my findings.
The techniques described here should be applicable to both Java and Scala, and mostly indepent of build tools you use.
There are several ways to profile JVM apps, but the new hotness in profiling is Flame graphs invented by Brendan Gregg, Senior Performance Architect at Netflix. You first collect stack trace samples, and then it is processed into an intereactive svg graph. For an introduction to Flame graphs, see:
The first tool I recommend is async-profiler by Andrei Pangin. This is available for macOS and Linux, and it’s easier to get started.
build/
and profiler.sh
to $HOME/bin
, assuming you have PATH to $HOME/bin
:
ln -s ~/App/async-profiler/profiler.sh $HOME/bin/profiler.sh
ln -s ~/App/async-profiler/build $HOME/bin/build
Next, close all Java appliations and anything that may affect the profiling, like Slack, and run your app in a terminal. In my case, I am trying to profile sbt
’s initial loading:
$ sbt exit
In another terminal, run:
$ jps
92746 sbt-launch.jar
92780 Jps
This tells you the process ID of the app. In this case, it’s 92746
. While it’s running, run
$ profiler.sh -d 60 <process id>
Started [cpu] profiling
--- Execution profile ---
Total samples: 31602
Non-Java: 3239 (10.25%)
GC active: 46 (0.15%)
Unknown (native): 14667 (46.41%)
Not walkable (native): 3 (0.01%)
Unknown (Java): 433 (1.37%)
Not walkable (Java): 8 (0.03%)
Thread exit: 1 (0.00%)
Deopt: 9 (0.03%)
Frame buffer usage: 55.658%
Total: 1932000000 (6.11%) samples: 1932
[ 0] java.lang.ClassLoader$NativeLibrary.load
[ 1] java.lang.ClassLoader.loadLibrary0
[ 2] java.lang.ClassLoader.loadLibrary
[ 3] java.lang.Runtime.loadLibrary0
[ 4] java.lang.System.loadLibrary
....
This should show a bunch of stacktraces that are useful. To visualize this as a flamegraph, run:
$ profiler.sh -d 60 -f /tmp/flamegraph.svg <process id>
This should produce /tmp/flamegraph.svg
at the end.
See flamegraph.svg to try the output yourself.
Even though async-profiler is easier to get started, the fun part of Flame graph is mixing the JVM stack trace with the native code’s stack trace, allowing you see what your program is actually spending its CPU on. It turns out that Lightbend’s Johannes Rudolph wrote a tool for this called perf-map-agent. This uses dtrace
on macOS and perf
on Linux. This is particularly useful if you are trying to find if the bottle neck is on the native code.
You first have to compile perf-map-agent. For macOS, here to how to export JAVA_HOME
before running cmake .
:
$ cd work
$ git clone https://github.com/brendangregg/FlameGraph.git
$ git clone https://github.com/jvm-profiling-tools/perf-map-agent.git
$ cd perf-map-agent
$ export JAVA_HOME=$(/usr/libexec/java_home)
$ cmake .
-- The C compiler identification is AppleClang 9.0.0.9000039
-- The CXX compiler identification is AppleClang 9.0.0.9000039
...
$ make
In a fresh termimal, run sbt with -XX:+PreserveFramePointer
flag:
$ sbt -J-Dsbt.launcher.standby=20s -J-XX:+PreserveFramePointer exit
In the terminal that you will run the perf-map:
$ cd quicktest/
$ export JAVA_HOME=$(/usr/libexec/java_home)
$ export FLAMEGRAPH_DIR=$HOME/work/FlameGraph
$ jps
94592 Jps
94549 sbt-launch.jar
$ $HOME/work/perf-map-agent/bin/dtrace-java-flames 94549
dtrace: system integrity protection is on, some features will not be available
dtrace: description 'profile-99 ' matched 2 probes
Flame graph SVG written to DTRACE_FLAME_OUTPUT='/Users/xxx/work/quicktest/flamegraph-94549.svg'.
This would produce better flamegraph in theory, but the output looks too messy for sbt exit
case.
This might work better if the operations are already JITed, or if the operation is more specific. To get a better Flame graph, one thing we can do now is to repeat the same operation multiple times.
$ sbt -J-Dsbt.launcher.standby=20s -J-XX:+PreserveFramePointer reload reload reload reload exit
This enhances the signal, and some plateaus become more pronouced that we can zoom in and find out the hot paths.
Netflix recently released a new visualization tool called Flamescope that can filter the Flame graph to a specific range of time.
This was developed by Martin Spier and Brendan Gregg to study perturbations and other time-based issues. This makes sense because the normal Flame graphs aggregates all stack trace samples, so if there’s a short-lived glitch, it could be burried in other traces.
Due to JIT warmup etc, benchmarking is difficult. JMH runs the same tests multiple times to remove these effects and comes closer to measuring the performance of your code.
For sbt users, sbt-jmh that Lightbend’s Konrad Malawski wrote makes JMH testing easier. It apparently adds an integration with async-profiler too.
I’d also mention traditional JVM profiling tool. Since VisualVM is opensource, I’ll mention that one.
xsbt.boot.Boot
under Local.If you are familiar with YourKit, it also works similarly.
Flame graph avisualizes stack trace samples, which helps in identifying the hot paths in your application. It also helps to confirm if the changes you made has actually improved the performance or not.