Profiling JVM Applications in Production
Sasha Goldshtein @goldshtn
CTO, Sela Group github.com/goldshtn
https://s.sashag.net/srecon0318
Workshop Introduction
• Mission:
Apply modern, low-overhead, production-ready tools to monitor and
improve JVM application performance on Linux
• Objectives:
☐ Identifying overloaded resources
☐ Profiling for CPU bottlenecks
☐ Visualizing and exploring stack traces using flame graphs
☐ Recording system events (I/O, network, GC, etc.)
☐ Profiling for heap allocations
Course Introduction
• Target audience:
Application developers, system administrators, production engineers
• Prerequisites:
Understanding of JVM fundamentals, experience with Linux system
administration, familiarity with OS concepts
• Lab environment:
EC2, delivered through the browser during the course dates
• Course hands-on labs:
https://github.com/goldshtn/linux-tracing-workshop
Course Plan
• JVM and Linux performance information sources
• CPU sampling
• Flame graphs and symbols
• Lab: Profiling with perf and async-profiler
• eBPF
• BCC tools
• Lab: Tracing file opens
• GC tracing and allocation profiling
• Lab: Allocation profiling
The Lab Environment
• Follow the link provided by the instructor
• Sign up or log in with Google
• Enter the classroom token
• Click the beaker-in-a-cloud icon to get
your own lab instance
• Wait for the terminal to initialize
JVM and Linux Performance Sources
Performance Information Sources
[Figure: performance information sources at each layer of the stack]
• Java applications: JMX mbeans, Java Flight Recorder, the attach interface (jcmd)
• JVM: Serviceability API, USDT (dtrace) probes, JVMTI agents, hsperf (jstat), +PrintCompilation, +PrintGC & other flags; subsystems include the class loader, GC, and JIT
• Other applications and system libraries: USDT probes, uprobes
• Kernel: syscall interface, filesystem, TCP/IP, block I/O, Ethernet, scheduler, memory, device drivers; observed via kprobes, tracepoints, and "software events"
• Hardware: PMU CPU events, other devices
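USE Checklist for Linux Systems
http://www.brendangregg.com/USEmethod/use-linux.html
[Figure: Linux system components annotated with Utilization (U), Saturation (S), and Errors (E) tools]
• CPU: U: mpstat -P 0; S: vmstat 1; E: perf
• Memory (RAM): U: free -m; S: sar -B; E: dmesg
• Network interface (E1000): U: sar -n DEV 1; S: ifconfig; E: ifconfig
• Storage I/O (SSD): U: iostat 1; S: iostat -xz 1; E: …/ioerr_cnt
• Interconnects (LLC, FSB, memory controller, I/O controller, PCIe): U: perf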
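USE Checklist For JVM Applications
[Figure: JVM components annotated with Utilization (U) and Errors (E) tools. Java threads and native threads (via JNI and native libraries) allocate on the heap and make syscalls into the kernel; the GC and JIT run on their own threads]
• Java threads: U: top, jstack
• JIT compiler: U: +PrintCompilation, jstat, USDT
• JNI and native libraries: U: JVMTI, USDT; E: JVMTI
• Heap: U: jmap, jhat, jstat
• GC: U: +PrintGC, jstat, NMT, USDT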
⚠ Mind The Overhead
• Any observation can change the state of the system, but some
observations are worse than others
• Performance tools have overhead
• Check the docs
• Try on a test system first
• Measure degradation introduced by the tool

OVERHEAD
This traces various kernel page cache functions and maintains in-kernel counts, which are asynchronously copied
to user-space. While the rate of operations can be very high (>1G/sec) we can have up to 34% overhead, this is still
a relatively efficient way to trace these events, and so the overhead is expected to be small for normal
workloads. Measure in a test environment.
—man cachestat (from BCC)
CPU Sampling
Sampling vs. Tracing
• Sampling works by getting a snapshot or a call stack every N occurrences of an interesting event
• For most events, implemented in the PMU using overflow counters and interrupts
[Figure: CPU samples along a CPU-time axis, each attributed to the process running at that moment (pid 121, pid 121, pid 408, pid 188)]
• Tracing works by getting a message or a call stack at every occurrence of an interesting event
[Figure: trace events such as disk writes along a system-time axis, each attributed to the process that caused it (pid 121, pid 408)]
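To make the distinction concrete, the following Python simulation (an illustration only, not how perf or the PMU work internally) uses a hypothetical stream of clock ticks, each labeled with the PID that was on-CPU at that moment. Tracing keeps every tick; sampling keeps every 10th, so it stores a tenth of the data while preserving the per-process proportions.

```python
from collections import Counter

def trace(events):
    """Tracing: keep a record for every occurrence of the event."""
    return list(events)

def sample(events, every_n):
    """Sampling: keep a record only for every N-th occurrence,
    similar in spirit to a PMU counter that interrupts on overflow."""
    return [e for i, e in enumerate(events, start=1) if i % every_n == 0]

# Hypothetical stream of clock ticks, each labeled with the PID that was on-CPU.
ticks = ["pid 121"] * 600 + ["pid 408"] * 300 + ["pid 188"] * 100

traced = trace(ticks)
sampled = sample(ticks, every_n=10)

print(len(traced), Counter(traced).most_common())    # 1000 records
print(len(sampled), Counter(sampled).most_common())  # 100 records, same 6:3:1 split
```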
JVM Stack Sampling
• Traditional CPU profilers sample all thread stacks periodically (e.g. 100
times per second)
• Typically use the JVMTI GetAllStackTraces API
• jstack, JVisualVM, YourKit, JProfiler, and a lot of others

[Figure: timelines of several threads alternating between GC, running, and blocked states; the profiler samples all thread stacks at the same periodic points]

Safepoint Bias
• Samples are captured only at safepoints
• The paper "Evaluating the Accuracy of Java Profilers" by Mytkowicz, Diwan, Hauswirth, and Sweeney shows that different profilers report wildly different results for the same program, due to safepoint bias
• Additionally, capturing a full
stack trace for all threads is
quite expensive (think Spring)
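Safepoint bias can be shown with a toy model. In this Python sketch, the timeline, the method names hot and caller, and the placement of safepoint polls are all hypothetical: hot() has no polls, so a sampler that must wait for the next poll attributes every sample to caller(), even though hot() accounts for 90% of the time.

```python
from collections import Counter

# Hypothetical timeline: one entry per time unit, giving the method running at
# that moment and whether that point is a safepoint poll. hot() is assumed to be
# a tight loop with no polls; the only polls are in its caller.
timeline = []
for _ in range(10):
    timeline += [("hot", False)] * 9   # 9 time units inside hot(), no poll
    timeline += [("caller", True)]     # 1 time unit in caller(), which polls

def unbiased_profile(timeline, interval):
    """Attribute each sample to whatever is running at the sampling instant."""
    return Counter(timeline[t][0] for t in range(0, len(timeline), interval))

def safepoint_profile(timeline, interval):
    """Defer each sample until the thread reaches its next safepoint poll."""
    samples = Counter()
    for t in range(0, len(timeline), interval):
        for method, is_poll in timeline[t:]:
            if is_poll:
                samples[method] += 1
                break
    return samples

print(unbiased_profile(timeline, 5))   # every sample lands in hot()
print(safepoint_profile(timeline, 5))  # every sample is blamed on caller()
```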
perf
• perf is a Linux multi-tool for performance investigations
• Capable of both tracing and sampling
• Developed in the kernel tree, must match the running kernel’s version

• Debian-based: apt install linux-tools-common linux-tools-$(uname -r)

• Red Hat-based: yum install perf
Recording CPU Stacks With perf
• To find a CPU bottleneck, record stacks at timed intervals:
# system-wide
perf record -ag -F 97
# specific process
perf record -p 188 -g -F 97
# specific workload
perf record -g -F 97 -- ./myapp
Legend:
-a all CPUs
-p specific process
-- run workload and capture it
-g capture call stacks
-F frequency of samples (Hz)
-c sample once every N events (a fixed period instead of -F)
A Single Stack
# perf script
parprimes 13393 248974.821897: 10309278 cpu-clock:
92b is_prime+0xffffffffff800035 (/…/parprimes)
96c primes_loop+0xffffffffff800021 (/…/parprimes)
9d4 primes_thread+0xffffffffff800020 (/…/parprimes)
75ca start_thread+0xffff011d4ae720ca (/…/libpthread-2.23.so)
…
# perf script | wc -l
7214
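Because perf script emits plain text, it is easy to post-process. A minimal Python parser, assuming the default output layout shown above (a header line, then one indented frame per line, with a blank line between samples), might look like this; commands whose names contain spaces would need a more careful header parser.

```python
import re

# Frame lines look like: "<addr> <symbol>+<offset> (<dso>)"
FRAME_RE = re.compile(r"\s*[0-9a-f]+\s+(\S+?)(\+0x[0-9a-f]+)?\s+\((.*)\)")

def parse_perf_script(text):
    """Split perf script output into (comm, pid, frames) tuples, frames leaf-first."""
    samples = []
    for block in text.strip().split("\n\n"):
        lines = [line for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        header = lines[0].split()
        comm, pid = header[0], int(header[1].split("/")[0])  # "pid" or "pid/tid"
        frames = []
        for line in lines[1:]:
            m = FRAME_RE.match(line)
            if m:
                frames.append(m.group(1))
        samples.append((comm, pid, frames))
    return samples

SAMPLE = """\
parprimes 13393 248974.821897: 10309278 cpu-clock:
\t     92b is_prime+0xffffffffff800035 (/…/parprimes)
\t     96c primes_loop+0xffffffffff800021 (/…/parprimes)
\t     9d4 primes_thread+0xffffffffff800020 (/…/parprimes)
\t    75ca start_thread+0xffff011d4ae720ca (/…/libpthread-2.23.so)
"""

for comm, pid, frames in parse_perf_script(SAMPLE):
    print(comm, pid, " <- ".join(frames))
```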
Stack Report
# perf report --stdio
# Children Self Command Shared Object Symbol
# ........ ........ ............ .................. .......................................
#
72.02% 71.53% parprimes parprimes [.] is_prime
|
--71.53%--start_thread
primes_thread
primes_loop
is_prime
...truncated

27.86% 0.00% dd [kernel.kallsyms] [k] vfs_read

|
---vfs_read
|
--27.80%--__vfs_read
...truncated
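The Children and Self columns can be reproduced from raw stacks. This simplified Python model (not perf's actual code) counts Self when a function is the leaf of a sample and Children when it appears anywhere in the sample's stack, counting each function at most once per sample so recursion does not inflate it. The stacks are made up.

```python
from collections import Counter

def self_and_children(stacks):
    """Return {function: (children %, self %)} for stacks ordered root-first."""
    total = len(stacks)
    self_counts = Counter(stack[-1] for stack in stacks)                   # leaf only
    children_counts = Counter(fn for stack in stacks for fn in set(stack))  # anywhere, once
    return {
        fn: (100.0 * children_counts[fn] / total, 100.0 * self_counts[fn] / total)
        for fn in children_counts
    }

stacks = (
    [["start_thread", "primes_thread", "primes_loop", "is_prime"]] * 7
    + [["start_thread", "primes_thread", "primes_loop"]] * 1
    + [["vfs_read", "__vfs_read"]] * 2
)

report = self_and_children(stacks)
for fn, (children, self_pct) in sorted(report.items(), key=lambda kv: -kv[1][0]):
    print(f"{children:6.2f}% {self_pct:6.2f}%  {fn}")
```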
Flame Graphs and Missing Symbols
Symbols
• perf needs symbols to display function names (beyond modules and
addresses)
• For compiled languages (C, Go, …) these are often embedded in the binary
• Or installed as separate debuginfo (usually /usr/lib/debug)

$ objdump -tT /usr/bin/bash | grep readline

0000000000306bf8 g DO .bss 0000000000000004 Base rl_readline_state
00000000000a46c0 g DF .text 00000000000001d4 Base readline_internal_char
00000000000a3cc0 g DF .text 0000000000000126 Base readline_internal_setup
0000000000078b80 g DF .text 0000000000000044 Base posix_readline_initialize
00000000000a4de0 g DF .text 0000000000000081 Base readline
00000000003062d0 g DO .bss 0000000000000004 Base bash_readline_initialized
…
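The symbol table printed by objdump is just addresses, flags, sections, sizes, and names. As a sketch, assuming the column layout shown above, the function symbols (flag F in .text) can be extracted in Python; data symbols such as rl_readline_state in .bss are skipped.

```python
OBJDUMP_OUTPUT = """\
0000000000306bf8 g DO .bss 0000000000000004 Base rl_readline_state
00000000000a46c0 g DF .text 00000000000001d4 Base readline_internal_char
00000000000a3cc0 g DF .text 0000000000000126 Base readline_internal_setup
0000000000078b80 g DF .text 0000000000000044 Base posix_readline_initialize
00000000000a4de0 g DF .text 0000000000000081 Base readline
00000000003062d0 g DO .bss 0000000000000004 Base bash_readline_initialized
"""

def parse_function_symbols(text):
    """Return {name: (address, size)} for function symbols (flag F) in .text.

    Assumes objdump -t/-T style lines:
    <address> <flags...> <section> <size> [<version>] <name>
    """
    funcs = {}
    for line in text.splitlines():
        fields = line.split()
        if ".text" not in fields:
            continue  # headers, data symbols, undefined (*UND*) imports
        sec = fields.index(".text")
        flags = fields[1:sec]
        if not any("F" in flag for flag in flags) or len(fields) < sec + 3:
            continue
        funcs[fields[-1]] = (int(fields[0], 16), int(fields[sec + 1], 16))
    return funcs

funcs = parse_function_symbols(OBJDUMP_OUTPUT)
start, size = funcs["readline"]
print(f"readline: start={start:#x} size={size:#x}")  # readline: start=0xa4de0 size=0x81
```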
Report Without Symbols
# perf report --stdio
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. .......................
#
100.00% 0.00% hello hello [.] 0xffffffffffc0051d
|
---0x51d
|
|--54.91%--0x4f7
|
|--27.97%--0x4eb
|
|--8.73%--0x4e3
|
--7.97%--0x4ff
Java App Report
# perf report --stdio
# Children Self Command Shared Object Symbol
# ........ ........ ....... .................. ......................
#
100.00% 0.00% java perf-2318.map [.] 0x00007f82b50004e7
|
---0x7f82b50004e7
|
|--8.15%--0x7f82b510d63e
|
|--7.97%--0x7f82b510d6ca
|
|--7.07%--0x7f82b510d6c2
|
|--6.88%--0x7f82b510d686
|
|--6.16%--0x7f82b510d68e
perf-PID.map Files
• When symbols are missing in the binary, perf will look for a file
named /tmp/perf-PID.map by default

$ cat /tmp/perf-1882.map
7f2cd1108880 1e8 Ljava/lang/System;::arraycopy
7f2cd1108c00 200 Ljava/lang/String;::hashCode
7f2cd1109120 2e0 Ljava/lang/String;::indexOf
7f2cd1109740 1c0 Ljava/lang/String;::charAt
…
7f2cd110ce80 120 LHello;::doStuff
7f2cd110d280 140 LHello;::fidget
7f2cd110d5c0 120 LHello;::fidget
7f2cd110d8c0 120 LHello;::fidget
…
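Each map line is a hexadecimal start address, a hexadecimal size, and a symbol name. A minimal Python sketch of resolving a sampled instruction pointer against such a file (the sampled address is made up) sorts the entries and binary-searches them:

```python
import bisect

def load_perf_map(lines):
    """Parse perf map lines of the form '<start hex> <size hex> <symbol name>'."""
    entries = []
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) == 3:
            entries.append((int(parts[0], 16), int(parts[1], 16), parts[2].strip()))
    entries.sort()
    return entries

def lookup(entries, addr):
    """Return the symbol whose [start, start + size) range contains addr, or None."""
    starts = [start for start, _, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = entries[i]
        if addr < start + size:
            return name
    return None

perf_map = """\
7f2cd1108880 1e8 Ljava/lang/System;::arraycopy
7f2cd1108c00 200 Ljava/lang/String;::hashCode
7f2cd110d280 140 LHello;::fidget
""".splitlines()

entries = load_perf_map(perf_map)
print(lookup(entries, 0x7f2cd110d300))  # LHello;::fidget
```

Because recompiled methods produce additional entries (note the repeated LHello;::fidget lines above), a real resolver also has to cope with stale or overlapping ranges; this sketch simply picks the entry with the highest start address not above the target.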
Generating Map Files
• For interpreted or JIT-compiled languages, map files need to be
generated at runtime
• Java: perf-map-agent
create-java-perf-map.sh $(pidof java)
• This is a JVMTI agent that attaches on demand to the Java process
• Additional options include dottedclass, unfoldall, sourcepos
• Consider -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints for
more accurate inline info
• Other runtimes:
• Node: node --perf-basic-prof-only-functions app.js
• Mono: mono --jitmap ...
• .NET Core: export COMPlus_PerfMapEnabled=1
Fixed Report; Still Broken
# perf report --stdio
# Children Self Command Shared Object Symbol
# ........ ........ ....... .................. ......................
#
100.00% 0.00% java perf-3828.map [.] call_stub
|
---call_stub
LHello;::fidget
…
Walking Stacks
• To successfully walk stacks, perf requires* FPO (frame pointer omission) to be disabled
• FPO is an optimization that uses EBP/RBP as a general-purpose register rather than as a frame pointer
  • C/C++: -fno-omit-frame-pointer
  • Java: -XX:+PreserveFramePointer (since Java 8u60)

* When debug information is present, perf can use libunwind to walk FPO-enabled stacks, but this does not work for dynamic languages
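Frame-pointer walking is a linked-list traversal: on x86-64, each frame saves the caller's RBP at [RBP] and the return address at [RBP + 8]. The Python sketch below walks a toy stack whose addresses are invented; removing one saved RBP breaks the chain, so the walk stops early at that frame. The max_depth default of 127 mirrors perf's usual limit.

```python
def walk_frame_pointers(memory, rbp, rip, max_depth=127):
    """Return instruction pointers, leaf first, by following saved RBP values.

    memory: dict mapping addresses to 64-bit values (a toy stack).
    """
    ips = [rip]
    while rbp and len(ips) < max_depth:
        return_addr = memory.get(rbp + 8)
        saved_rbp = memory.get(rbp)
        if return_addr is None or saved_rbp is None:
            break  # chain is broken, e.g. by a frame compiled with FPO
        ips.append(return_addr)
        rbp = saved_rbp
    return ips

# Toy stack for main -> doStuff -> fidget; the stack grows toward lower addresses.
memory = {
    0x7000: 0x7040, 0x7008: 0x401200,  # fidget's frame: caller RBP, return into doStuff
    0x7040: 0x7080, 0x7048: 0x401100,  # doStuff's frame: caller RBP, return into main
    0x7080: 0x0,    0x7088: 0x401000,  # main's frame: a saved RBP of 0 ends the chain
}

print([hex(ip) for ip in walk_frame_pointers(memory, rbp=0x7000, rip=0x401300)])
```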
Fixed Report
# perf report --stdio
# Children Self Command Shared Object Symbol
# ........ ........ ....... .................. ......................
#
100.00% 99.65% java perf-4005.map [.] LHello;::fidget
|
--99.65%--start_thread
JavaMain
jni_CallStaticVoidMethod
jni_invoke_static
JavaCalls::call_helper
call_stub
LHello;::main
LHello;::doStuff
LHello;::identifyWidget
LHello;::fidget
…
Real-World Stack Reports
# perf report --stdio | wc -l
14823
Flame Graphs
• A visualization method (adjacency graph), very
useful for stack traces, invented by Brendan
Gregg
• http://www.brendangregg.com/flamegraphs.html
• Turns 1000s of stack trace pages into a single
interactive graph
• Example scenarios:
• Identify CPU hotspots on the system/application
• Show stacks that perform heavy disk accesses
• Find threads that block for a long time and the stack
where they do it
Reading a Flame Graph
• Each rectangle is a function
• Y-axis: stack depth
• X-axis: sorted stacks (not time)
• Wider frames are more common
• Supports zoom, find
• Filter with grep 😎
Generating a Flame Graph
$ git clone https://github.com/BrendanGregg/FlameGraph
$ sudo perf record -F 97 -g -p `pidof java` -- sleep 10
$ sudo perf script |
FlameGraph/stackcollapse-perf.pl |
FlameGraph/flamegraph.pl > flame.svg
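stackcollapse-perf.pl turns perf script output into one line per unique stack: frames joined with semicolons from root to leaf, followed by a count, which is the input flamegraph.pl consumes. A minimal Python equivalent of the folding step (not the script itself) is below. Because ';' is the separator, frame names must not contain it, so this sketch uses dotted names such as Hello.fidget rather than the LHello;::fidget form from the perf map.

```python
from collections import Counter

def collapse(stacks):
    """Fold stacks (each leaf-first, as perf script prints them) into
    'root;...;leaf count' lines."""
    folded = Counter(";".join(reversed(stack)) for stack in stacks)
    return [f"{frames} {count}" for frames, count in sorted(folded.items())]

stacks = [
    ["Hello.fidget", "Hello.doStuff", "Hello.main"],
    ["Hello.fidget", "Hello.doStuff", "Hello.main"],
    ["Hello.identifyWidget", "Hello.doStuff", "Hello.main"],
]
print("\n".join(collapse(stacks)))
```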
Not Just For Methods
• For a package-level view of where your time goes, use pkgsplit-perf.pl to generate a package-level flame graph:

From http://www.brendangregg.com/blog/2017-06-30/package-flame-graph.html
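The idea behind a package flame graph can be sketched in Python: take each on-CPU leaf frame, split its class name into package components, and fold on that hierarchy instead of on the call stack. This is an approximation written for illustration, not the pkgsplit-perf.pl code; the frame names follow the perf map format shown earlier.

```python
from collections import Counter

def package_path(frame):
    """'Ljava/lang/String;::hashCode' -> ['java', 'lang', 'String', 'hashCode'].
    Frames that are not in the perf-map Java format are returned unchanged."""
    if frame.startswith("L") and ";::" in frame:
        cls, method = frame[1:].split(";::", 1)
        return cls.split("/") + [method]
    return [frame]

def package_folded(leaf_frames):
    """Fold on-CPU leaf frames by package hierarchy rather than by call stack."""
    counts = Counter(";".join(package_path(f)) for f in leaf_frames)
    return sorted(f"{path} {n}" for path, n in counts.items())

leaves = [
    "Ljava/lang/String;::hashCode",
    "Ljava/lang/String;::charAt",
    "Ljava/lang/String;::hashCode",
    "LHello;::fidget",
]
print("\n".join(package_folded(leaves)))
```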
Lab: CPU Investigation With perf And Flame Graphs 💻
Problems with perf
• Only Java 8u60 and later are supported (earlier versions cannot disable FPO)
• Disabling FPO has a small performance impact (up to 10% in
pathological cases)
• Symbol resolution requires an additional agent
• Interpreter frames can’t be resolved (shown as “Interpreter”)
• Recompiled methods can be misreported (appear more than once in
the perf map)
• Stack depth is usually limited to 127 (again, think Spring)
• Can be configured since Linux 4.8 using
/proc/sys/kernel/perf_event_max_stack
async-profiler
JVMTI Agents
• A JVMTI (JVM Tool Interface) agent can be loaded with -agentpath
or attached through the JVM attach interface
• Examples of functionality:
• Trace thread start and stop events
• Count monitor contentions and wait times
• Aggregate class load and unload information
• Full event reference:
http://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html
AsyncGetCallTrace
• Internal API introduced to support lightweight profiling in Oracle
Developer Studio
• Produces a single thread’s stack without waiting for a safepoint
• Designed to be called from a signal handler
• Used by Honest Profiler (by Richard Warburton and contributors):
https://github.com/jvm-profiling-tools/honest-profiler
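The signal-handler sampling style can be imitated in Python with a CPU-time interval timer. This sketch is only an analogy: CPython runs signal handlers between bytecodes on the main thread, which is closer to safepoint sampling than to AsyncGetCallTrace, and only the main thread is sampled. The 5 ms interval and the busy function are arbitrary choices, and signal.setitimer is Unix-only.

```python
import collections
import signal
import time

samples = collections.Counter()

def on_sigprof(signum, frame):
    """Signal handler: record the interrupted Python stack, root first."""
    stack = []
    while frame is not None:
        stack.append(frame.f_code.co_name)
        frame = frame.f_back
    samples[";".join(reversed(stack))] += 1

def busy(seconds):
    """Burn roughly the given amount of CPU time."""
    end = time.process_time() + seconds
    x = 0
    while time.process_time() < end:
        x += 1
    return x

signal.signal(signal.SIGPROF, on_sigprof)
signal.setitimer(signal.ITIMER_PROF, 0.005, 0.005)  # fire every 5 ms of CPU time
try:
    busy(0.5)
finally:
    signal.setitimer(signal.ITIMER_PROF, 0, 0)      # stop the timer
    signal.signal(signal.SIGPROF, signal.SIG_DFL)

for stack, count in samples.most_common(3):
    print(count, stack)
```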
async-profiler
• Open source profiler by Andrei Pangin and contributors:
https://github.com/jvm-profiling-tools/async-profiler

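[Figure: async-profiler architecture. In the kernel, perf_events takes CPU samples from the PMU and records stack samples in a cyclic buffer; a signal delivered through the perf fd interrupts the sampled JVM thread, where libasyncProfiler.so calls AsyncGetCallTrace to capture the Java stack]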
Profilers, Compared

perf:
• Java ≥ 8u60 to disable FPO
• Disabling FPO has a perf penalty
• Need a map file
• Interpreter frames are not supported
• System-wide profiling is possible
• Can profile containers from the host (or from a sidecar)

async-profiler:
• Works on older Java versions
• FPO can stay on
• No map file is required
• Interpreter frames are supported
• In theory, native and Java stacks don’t always sync
• Profiling runs in-process (so, in-container)
Lab: Profiling With async-profiler 💻
eBPF
What’s Wrong With perf?
• perf relies on pushing a lot of data to user space, through files, for
analysis
• Downloading a file at ∼1Gb/s produces ∼89K netif_receive_skb events/s
(19MB/s including stacks)

[Figure: netif_receive_skb events from the e1000 driver flow through perf_events into perf.data and are post-processed in user space (perf | awk | …); average packet size: 189 bytes]
BPF: 1990
• Invented by McCanne and Jacobson at Berkeley, 1990-1992:
instruction set, representation, implementation of packet filters

$ tcpdump -d 'ip and dst 186.173.190.239'

(000) ldh [12]
(001) jeq #0x800 jt 2 jf 5
(002) ld [30]
(003) jeq #0xbaadbeef jt 4 jf 5
(004) ret #262144
(005) ret #0
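The listing is a tiny program for an accumulator machine: ldh and ld load a 16-bit or 32-bit big-endian value from the packet, jeq jumps to absolute instruction numbers, and ret returns the number of bytes to capture (0 means drop). Offset 12 is the EtherType, offset 30 is the IPv4 destination address, and 186.173.190.239 is 0xBAADBEEF. A minimal Python interpreter for just these opcodes (ipv4_frame is a hypothetical helper, and packets are not bounds-checked) shows the filter in action:

```python
import struct

# The filter printed by tcpdump -d above; jump targets are absolute indices.
PROGRAM = [
    ("ldh", 12),                # (000) A = 16-bit EtherType
    ("jeq", 0x800, 2, 5),       # (001) IPv4? go to 2, else 5
    ("ld", 30),                 # (002) A = 32-bit IPv4 destination address
    ("jeq", 0xBAADBEEF, 4, 5),  # (003) 186.173.190.239? go to 4, else 5
    ("ret", 262144),            # (004) accept, capture up to 262144 bytes
    ("ret", 0),                 # (005) drop
]

def run_filter(program, packet):
    """Interpret the classic BPF opcodes used above against a raw frame."""
    acc, pc = 0, 0
    while True:
        op = program[pc]
        if op[0] == "ldh":
            acc = struct.unpack_from("!H", packet, op[1])[0]
            pc += 1
        elif op[0] == "ld":
            acc = struct.unpack_from("!I", packet, op[1])[0]
            pc += 1
        elif op[0] == "jeq":
            pc = op[2] if acc == op[1] else op[3]
        elif op[0] == "ret":
            return op[1]
        else:
            raise ValueError(f"unsupported opcode {op[0]}")

def ipv4_frame(dst):
    """Build a minimal Ethernet header plus a 20-byte IPv4 header for dst."""
    eth = b"\x00" * 12 + struct.pack("!H", 0x0800)
    ip = bytearray(20)
    ip[16:20] = bytes(int(octet) for octet in dst.split("."))
    return eth + bytes(ip)

print(run_filter(PROGRAM, ipv4_frame("186.173.190.239")))  # 262144
print(run_filter(PROGRAM, ipv4_frame("10.0.0.1")))         # 0
```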
BPF: Today
• Supports a wide spectrum of usages
• Has a JIT for maximum efficiency

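[Figure: a user-space control program compiles a BPF program with the BPF compiler and loads it through syscalls; in the kernel, the verifier and JIT check and compile it for the BPF runtime, which attaches it to probes or sockets; BPF maps share data between the BPF program and the control program]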
BPF Tracing
[Figure: BPF tracing. Kernel event sources (kprobes, tracepoints, perf_events) and user-space sources (uprobes, USDT in applications) invoke a BPF program, which shares a map and a perf output buffer with a user-space control program]
① The control program installs the BPF program and attaches it to events
② Events invoke the BPF program
③ The BPF program updates a map or pushes a new event to a buffer shared with user space
④ The user-space program is invoked with data from the shared buffer
⑤ The user-space program reads statistics from the map and clears it if necessary
BPF Tracing Features in The Linux Kernel
Version | Feature | Scenarios
4.1 | kprobes/uprobes attach | Dynamic tracing with BPF becomes possible
4.1 | bpf_trace_printk | BPF programs can print output to the ftrace pipe
4.3 | perf_events output | Efficient tracing of large amounts of data for analysis in user space
4.6 | Stack traces | Efficient aggregation of call stacks for profiling or tracing
4.7 | Tracepoints support | API stability for tracing programs
4.9 | perf_events attach | Low-overhead profiling and PMU sampling
The Old Way And The New Way
[Figure, old way: a kprobe/kretprobe pair on vfs_read emits every event through perf_events into perf.data; user space post-processes it with perf | awk | … to build a latency histogram (LATμs, #, distribution)]
[Figure, new way: the same kprobe/kretprobe pair on vfs_read runs a BPF program that updates a histogram in a BPF map; the user-space control program reads the map and prints the same latency histogram]

BCC Performance Checklist
The BCC BPF Front-End
• https://github.com/iovisor/bcc
• BPF Compiler Collection (BCC) is a BPF frontend library and a massive collection of performance tools
• Contributors from Facebook, PLUMgrid, Netflix, Sela
• Helps build BPF-based tools in high-level languages
  • Python, Lua, C++
[Figure: in user space, BCC tools use the BCC compiler frontend (Clang + LLVM) and the BCC loader library; in the kernel, the BPF runtime attaches their programs to event sources]
[Figure: BCC tools mapped to the system components they observe]
• Applications and runtimes (including the JVM): mysqld_qslower, dbslower, dbstat, mysqlsniff, bashreadline, ustat, uthreads, ugc, uobjnew, ucalls, uflow
• System libraries: gethostlatency, sslsniff, memleak, deadlock_detector
• Syscall interface: execsnoop, opensnoop, killsnoop, statsnoop, syncsnoop, setuidsnoop
• General-purpose: argdist, trace, funccount, funclatency, stackcount
• Filesystem and page cache: filetop, filelife, fileslower, vfscount, vfsstat, cachestat, cachetop, mountsnoop, *fsslower, *fsdist, dcstat, dcsnoop
• Block I/O: biotop, biolatency, biosnoop, bitesize, mdflush
• TCP/IP: tcptop, tcplife, tcpconnect, tcpaccept
• Scheduler and CPU: runqlat, cpudist, offcputime, offwaketime, cpuunclaimed, profile, llcstat
• Memory: memleak, oomkill, slabratetop
• Interrupts and devices: hardirqs, softirqs, ttysnoop
BCC Linux Performance Checklist
1. execsnoop
2. opensnoop
3. ext4slower (or btrfs*, xfs*, zfs*)
4. biolatency
5. biosnoop
6. cachestat
7. tcpconnect
8. tcpaccept
9. tcptop
10. gethostlatency
11. cpudist
12. runqlat
13. profile
Some BCC Tools
# ext4slower 1
Tracing ext4 operations slower than 1 ms
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
06:49:17 bash 3616 R 128 0 7.75 cksum
06:49:17 cksum 3616 R 39552 0 1.34 [
06:49:17 cksum 3616 R 96 0 5.36 2to3-2.7
06:49:17 cksum 3616 R 96 0 14.94 2to3-3.4
^C
# execsnoop
PCOMM PID RET ARGS
bash 15887 0 /usr/bin/man ls
preconv 15894 0 /usr/bin/preconv -e UTF-8
man 15896 0 /usr/bin/tbl
man 15897 0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
^C
Some BCC Tools
# runqlat -p `pidof java` 10 1
Tracing run queue latency... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 11 |* |
2 -> 3 : 7 | |
4 -> 7 : 133 |****************** |
8 -> 15 : 288 |****************************************|
16 -> 31 : 205 |**************************** |
32 -> 63 : 38 |***** |
64 -> 127 : 11 |* |
128 -> 255 : 5 | |
256 -> 511 : 3 | |
512 -> 1023 : 1 | |
1024 -> 2047 : 3 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 3 | |
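Histograms like this one use power-of-two buckets. In BCC tools the bucketing happens in the kernel (the BPF program increments a bucket counter in a BPF map) and only the counts are copied to user space. The Python sketch below performs the same bucketing and approximates the text layout in user space, purely to show how values map to the 0 -> 1, 2 -> 3, 4 -> 7, ... ranges; the latency values are made up.

```python
def log2_bucket(value):
    """Power-of-two bucket index: 0 and 1 -> 1, 2-3 -> 2, 4-7 -> 3, 8-15 -> 4, ..."""
    return max(1, value.bit_length())

def log2_hist(values, width=40):
    """Count non-negative integers per bucket and render runqlat-like rows."""
    counts = {}
    for v in values:
        b = log2_bucket(v)
        counts[b] = counts.get(b, 0) + 1
    max_count = max(counts.values())
    rows = []
    for b in range(1, max(counts) + 1):
        low, high = (1 << b) >> 1, (1 << b) - 1
        if low == high:  # the first bucket covers both 0 and 1
            low = 0
        n = counts.get(b, 0)
        stars = "*" * int(width * n / max_count)
        rows.append(f"{low:>10} -> {high:<10}: {n:<8} |{stars:<{width}}|")
    return rows

latencies_us = [0, 1, 3, 5, 6, 7, 12, 12, 15, 40]  # made-up run queue latencies
print("\n".join(log2_hist(latencies_us)))
```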
BCC’s profile Tool
# profile 10 -F 97 -K # kernel stacks only
…
ffffffffa4818691 __lock_text_start
ffffffffa45b0341 ata_scsi_queuecmd
ffffffffa458813d scsi_dispatch_cmd
ffffffffa458b021 scsi_request_fn
ffffffffa43be643 __blk_run_queue
ffffffffa43c3bc1 blk_queue_bio
ffffffffa43c1cf2 generic_make_request
ffffffffa43c1e4d submit_bio
ffffffffa43b825d submit_bio_wait
ffffffffa43c5c65 blkdev_issue_flush
ffffffffa4309b4d ext4_sync_fs
ffffffffa428b260 sync_fs_one_sb
ffffffffa425a553 iterate_supers
ffffffffa428b374 sys_sync
ffffffffa4003c17 do_syscall_64
ffffffffa4818bab return_from_SYSCALL_64
- stress (3303)
14
BCC’s profile Tool
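[Figure, old way: PMU cpu-clock samples go through perf_events into perf.data; user space runs perf script | fold | flamegraph]
[Figure, new way: a BPF program aggregates PMU cpu-clock samples into BPF stacks and a BPF map in the kernel; user space runs profile -f | flamegraph]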
Lab: Snooping File Opens 💻
General-Purpose BCC Tools
Tracing Sources For BCC Tools

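[Figure: event sources that can invoke a BCC tool's BPF program]
• kprobes (kernel), e.g. tcp_sendmsg
• tracepoints (kernel), e.g. sched:sched_switch
• perf_events, e.g. cpu-clocks
• USDT (application), e.g. hotspot:class_loaded
• uprobes (application), e.g. mysqld:…mysql_parse…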
USDT Probes in (Some) High-Level Languages
• OpenJDK, Oracle JDK: hotspot:gc_begin, hotspot:thread_start, hotspot:method_entry
• Node.js: node:http_server_request, node:http_client_request, node:gc_begin
• libc/libpthread: libc:memory_malloc_retry, libpthread:pthread_start, libpthread:mutex_acquired
• Python: python:function_entry, python:function_return, python:gc_start
• Ruby: ruby:method_entry, ruby:object_create, ruby:load_entry
• PHP: php:request_startup, php:function_entry, php:error
• MySQL: mysql:query_start, mysql:connection_start, mysql:query_parse_start
Depending on the runtime and how it was built, these probes are available out of the box (OOTB), require a build flag, or are not supported.
USDT Probes and Uprobes in the JVM
• OpenJDK Hotspot has a large number of static (USDT) probes in
various subsystems; display with tplist or readelf:
$ tplist -p $(pidof java) | grep 'hotspot.*gc'
.../libjvm.so hotspot:mem__pool__gc__begin
.../libjvm.so hotspot:mem__pool__gc__end
.../libjvm.so hotspot:gc__begin
.../libjvm.so hotspot:gc__end
• All native functions in the JVM (libjvm.so) can be used with dynamic probes (uprobes); discover them with objdump or nm:
$ nm -C $(find /usr/lib/debug -name libjvm.so.debug) | grep 'card.*table'
0000000000854751 t PSScavenge::card_table()
00000000016dd778 b PSScavenge::_card_table
...
BCC trace
• trace is a multi-purpose logging tool; think of it as a dynamic log at
arbitrary locations in the system (can also print call stacks)

# trace 'SyS_write (arg3 > 100000) "large write: %d bytes", arg3'

PID TID COMM FUNC -
9353 9353 dd SyS_write large write: 1048576 bytes
9353 9353 dd SyS_write large write: 1048576 bytes
9353 9353 dd SyS_write large write: 1048576 bytes
^C
# trace 'r:/usr/bin/bash:readline "%s", retval'
TIME PID COMM FUNC -
02:02:26 3711 bash readline ls -la
02:02:36 3711 bash readline wc -l src.c
^C
BCC funccount/stackcount
• funccount counts the number of invocations of a particular method,
while stackcount also aggregates the call stacks

# LIBJVM=$(find /usr/lib -name libjvm.so)

# funccount -p $(pidof java) "$LIBJVM:*do_collection*"
Tracing 5 functions for ".../libjvm.so:*do_collection*"... Hit Ctrl-C to
end.
^C
FUNC COUNT
_ZN16GenCollectedHeap13do_collectionEbbmbi 848
Detaching...
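funccount prints the raw (mangled) C++ symbol; c++filt or nm -C decode it properly. As an illustration, a tiny Python decoder for the simplest Itanium C++ ABI case, a nested name built from length-prefixed identifiers followed by builtin parameter types, recovers the signature. Anything beyond that subset raises an error.

```python
BUILTIN_TYPES = {
    "v": "void", "b": "bool", "c": "char", "i": "int", "j": "unsigned int",
    "l": "long", "m": "unsigned long", "x": "long long", "y": "unsigned long long",
    "f": "float", "d": "double",
}

def demangle_simple(mangled):
    """Decode '_ZN<len><id>...<len><id>E<builtin param codes>' names only."""
    if not mangled.startswith("_ZN"):
        raise ValueError("only nested names (_ZN...E) are supported")
    pos, parts = 3, []
    while mangled[pos] != "E":
        start = pos
        while mangled[pos].isdigit():
            pos += 1
        if start == pos:
            raise ValueError(f"unsupported name component at offset {start}")
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length
    params = []
    for code in mangled[pos + 1:]:
        if code not in BUILTIN_TYPES:
            raise ValueError(f"unsupported parameter type code {code!r}")
        params.append(BUILTIN_TYPES[code])
    if params == ["void"]:
        params = []  # 'v' alone means an empty parameter list
    return "::".join(parts) + "(" + ", ".join(params) + ")"

print(demangle_simple("_ZN16GenCollectedHeap13do_collectionEbbmbi"))
```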
Lab: Tracing Database Accesses 💻
Heap Allocation Profiling
Approaches for Allocation Profiling
• Allocation profiling can help reduce GC pressure and pause times
• Tracing each object allocation is extremely expensive, though
• Use -XX:+ExtendedDTraceProbes and sample
hotspot:object__alloc probes (expect a significant overhead)
• Trace the HotSpot allocation callbacks designed for JFR:
• send_allocation_in_new_tlab_event: when a new TLAB is allocated
for a thread because the old one was exhausted
• send_allocation_outside_tlab_event: when an object is allocated
outside a TLAB (e.g. because it’s too big, or because the TLAB is exhausted)
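Why TLAB events make cheap samples can be seen with a toy model. In this Python sketch, the TLAB size, the rule that objects larger than half a TLAB bypass it, and the event names (borrowed from the callbacks above) are simplifying assumptions, not HotSpot's real policy. Most allocations take the bump-pointer fast path and emit nothing; only a TLAB refill or an outside-TLAB allocation produces an event.

```python
class ToyTLAB:
    """Toy model of one thread's TLAB with bump-pointer allocation (sizes in bytes)."""

    def __init__(self, tlab_size):
        self.tlab_size = tlab_size
        self.top = 0      # next free offset in the current TLAB
        self.end = 0      # end of the current TLAB (0 means no TLAB yet)
        self.events = []  # (event name, type, size) for slow-path allocations

    def allocate(self, type_name, size):
        if self.top + size <= self.end:
            self.top += size  # fast path: bump the pointer, nothing is reported
        elif size > self.tlab_size // 2:
            # Assumed policy: large objects are allocated directly in the shared heap
            self.events.append(("allocation_outside_tlab", type_name, size))
        else:
            # Retire the current TLAB, start a new one, and place the object in it
            self.top, self.end = size, self.tlab_size
            self.events.append(("allocation_in_new_tlab", type_name, size))

tlab = ToyTLAB(tlab_size=1024)
for _ in range(100):
    tlab.allocate("[C", 48)  # small char[] allocations
tlab.allocate("[B", 4096)    # one large byte[]
print(len(tlab.events), "events for 101 allocations")
for event in tlab.events:
    print(event)
```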
async-profiler
• In allocation profiling mode (-e alloc), instruments the JFR TLAB allocation events and reports allocated object types and stack samples
• Requires JDK debuginfo to be installed (to find the relevant symbols)

$ ./profiler.sh -d 10 -e alloc -o summary,flat `pidof java`

HEAP profiling started
...
696470120 (75.33%) [C
226075184 (24.45%) [B
425600 (0.05%) [Ljava/util/HashMap$Node;
193592 (0.02%) com/sun/org/apache/xerces/internal/dom/ElementImpl
185536 (0.02%) com/sun/org/apache/xml/internal/serializer/NamespaceMappings$MappingRecord
162176 (0.02%) java/util/Stack
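The type names in this output are JVM descriptors: [C is char[], [B is byte[], and [Lname; is an array of a reference type, while plain entries use internal names with '/' separators. A small Python helper (hypothetical, not part of async-profiler) converts them to Java source syntax:

```python
PRIMITIVES = {
    "Z": "boolean", "B": "byte", "C": "char", "S": "short",
    "I": "int", "J": "long", "F": "float", "D": "double",
}

def java_type_name(descriptor):
    """'[C' -> 'char[]', '[Ljava/util/HashMap$Node;' -> 'java.util.HashMap$Node[]',
    'java/util/Stack' -> 'java.util.Stack'."""
    dims = len(descriptor) - len(descriptor.lstrip("["))  # number of array dimensions
    element = descriptor[dims:]
    if dims and element in PRIMITIVES:
        name = PRIMITIVES[element]
    elif element.startswith("L") and element.endswith(";"):
        name = element[1:-1].replace("/", ".")
    else:
        name = element.replace("/", ".")  # plain internal class name
    return name + "[]" * dims

for line in ["696470120 (75.33%) [C",
             "425600 (0.05%) [Ljava/util/HashMap$Node;",
             "162176 (0.02%) java/util/Stack"]:
    total, percent, descriptor = line.split()
    print(total, percent, java_type_name(descriptor))
```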
BCC Tools With Extended Probes
# funccount -p `pidof java` u:$LIBJVM:object__alloc
Tracing 1 functions for "u:.../libjvm.so:object__alloc"... Hit Ctrl-C to
end.
FUNC COUNT
object__alloc 4000987
Detaching...

# argdist -p `pidof java` -C "u:$LIBJVM:object__alloc():char*:arg2"

605018 arg2 = java/lang/String
609801 arg2 = java/util/HashMap$Nod
908716 arg2 = com/sun/org/apache/xml/internal/serializer/NamespaceMappings$MappingRecord

908778 arg2 = java/util/Stack

909348 arg2 = [Ljava/lang/Object;
910097 arg2 = [C
grav
• Collection of performance visualization tools by Mark Price and Amir
Langer: https://github.com/epickrram/grav
• Includes a Python wrapper on top of object__alloc probes with
sampling support, flame graph generation, and filtering specific types

$ sudo python src/heap/heap_profile.py -p `pidof java` -d 10 > alloc.stacks

$ FlameGraph/flamegraph.pl < alloc.stacks > alloc.svg
Lab: Excessive GC And Allocation Profiling 💻
Course Wrap-Up
Objectives Review
• Mission:
Apply modern, low-overhead, production-ready tools to monitor and
improve JVM application performance on Linux
• Objectives:
✓ Identifying overloaded resources
✓ Profiling for CPU bottlenecks
✓ Visualizing and exploring stack traces using flame graphs
✓ Recording system events (I/O, network, GC, etc.)
✓ Profiling for heap allocations
References
• JVM observability tools
  • http://openjdk.java.net/groups/hotspot/docs/Serviceability.html
  • http://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html
  • http://cr.openjdk.java.net/~minqi/6830717/raw_files/new/agent/doc/index.html
  • https://docs.oracle.com/javase/8/docs/technotes/guides/management/jconsole.html
• perf and flame graphs
  • https://perf.wiki.kernel.org/index.php/Main_Page
  • http://www.brendangregg.com/flamegraphs.html
• AGCT profilers
  • https://github.com/jvm-profiling-tools/async-profiler
  • https://github.com/jvm-profiling-tools/honest-profiler
• BCC and BPF
  • https://github.com/iovisor/bcc/blob/master/docs/tutorial.md
  • http://www.brendangregg.com/ebpf.html
  • http://blogs.microsoft.co.il/sasha/2016/03/31/probing-the-jvm-with-bpfbcc/
  • http://blogs.microsoft.co.il/sasha/2016/03/30/usdt-probe-support-in-bpfbcc/
• Containers and JVM
  • https://blog.csanchez.org/2017/05/31/running-a-jvm-in-a-container-without-getting-killed/
  • http://www.brendangregg.com/blog/2017-05-15/container-performance-analysis-dockercon-2017.html
  • http://batey.info/docker-jvm-flamegraphs.html
Questions?
Sasha Goldshtein @goldshtn
CTO, Sela Group github.com/goldshtn
