Db2mon Education Internal
Db2mon Education Internal
Kostas Rakopoulos
kostasr@ca.ibm.com
May 2020
Why did we create another monitoring tool?
●
Table functions are the recommended approach to monitoring (v9.7+)
●
Snapshots are deprecated
– New metrics are only being added to table functions
●
Table functions mean we need to write SQL queries to view metrics
– Some built-in options like MONREPORT.DBSUMMARY – but not enough
●
Most metrics are cumulative (since database activation)
– No way reset?!
– Need a way to view metrics for a given period of time
●
DSM uses table functions. Why not use DSM?
– It’s not practical for command line (scripts) usage or debugging customer issues remotely
2
Before db2mon there was psmon
●
New pureScale metrics in v9.8 were only available through table functions
– Some in db2pd but it was messy
●
Created psmon to report table function metrics specific to debugging pS
performance problems
●
Sufficient overlap between debugging pS and non-pS OLTP performance
problems
– psmon started to gain traction for analyzing OLTP issues in general
●
Eventually more metrics were added for debugging performance issues beyond
just OLTP
– Renamed psmon to db2mon to reflect this
3
How does it work?
●
Set of reporting queries that run on delta tables
●
Delta tables?
– Remember: No way to reset table functions stats
– We take before & after samples and store the delta in tables
●
Online mode (runs on Db2 server being analyzed)
– Use DGTTs to store before, after and delta metrics
– Report queries are run against DGTTs
●
Offline mode (only EXPORT command is run on Db2 server being analyzed)
– Metrics exported to IXF files
– On another system: Run scripts to IMPORT, compute deltas and run report queries
4
Design
●
Goals
– Lightweight – cannot impact performance of system being monitored
– Clean – do not create objects or otherwise pollute the system being monitored
– Backward compatibility – work with all supported versions of Db2
– Flexible – easily modifiable (such as adding new reporting queries)
●
Perl script which generates SQL to collect data and create a report
– User specifies Db2 version and generated SQL is compatible for that version
– Also generates script for simple online mode execution (db2mon.sh)
●
Creates (and removes) buffer pool and temp tablespace for DGTTs to isolate db2mon SQL
from other activities
5
Use Cases
●
Understand a customer’s workload/environment
– Saves you from having to ask basic questions about
●
Application: read/write ratio, types of statements, number of connections
●
Instance: topology, db2set, dbm cfg, db cfg
●
System: OS type, CPUs, memory
●
Find system bottlenecks
●
Find opportunities to tune the system (e.g., during a health check)
●
As part of a customer’s monitoring infrastructure
– Identify “What changed?” with before & after db2mon data
6
List of Reports - Lots of data!
Point in Time (SQL, Lock waits, Utilities) Table stats pureScale stats
START#EXSQL: Currently executing SQL at start of capture (non-zero metrics only TBL#ROWMC: Various table level metrics CF#GBPIO: Group bufferpool IO statistics by tablespace
START#LOCKW: Current lock waits at start of capture TBL#DATSH: Data sharing metrics CF#GBPHR: Group bufferpool data and index hit ratios
START#EXUTL: Currently executing utilities at start of capture CF#GBPIV: Group bufferpool invalid page statistics
END#EXSQL: Currently executing SQL at end of capture (non-zero metrics only
END#LOCKW: Current lock waits at end of capture
DB#SIZE: Size of database
TSP#SIZE: Tablespace properties
Tbsp stats CF#GBPDP: Tablespace data page prefetching statistics for group bufferpool
CF#GBPIP: Tablespace index page prefetching statistics for group bufferpool
END#EXUTL: Currently executing utilities at end of capture TSP#USAGE: Tablespace usage over monitoring interval CF#GBPFL: Count of group bufferpool full conditions
BPL#STATS: Bufferpool statistics by tablespace PAG#RCM: Page reclaim metrics for index and data pages
DB#THRUP: Throughput metrics at database level TSP#PRFST: Tablespace prefetching statistics PAG#RCSMP: Page reclaim metrics for SMP pages
DB#CLACT: Client activity (active connections have at least 1 stmt/s) TSP#BPMAP: Tablespace to bufferpool mapping CF#RTTIM: Round-trip CF command execution counts and average response times
DB#TIMEB: Time breakdown at database level (wait + processing) CF#CMDCT: Aggregate CF command execution counts
DB#WAITT: Wait times at database level BPL#SIZES: Bufferpool sizes
DB#PROCT: Processing times at database level DB level BP stats
BPL#HITRA: Bufferpool data and index hit ratios
CF#CMDTM: CF-side command execution counts and average response times
CF#CMDTO: CF-side total command execution counts
DB#SORT: Sort metrics at database level BPL#READS: Bufferpool read statistics (overall) CF#SYSRE: CF system resource information
SQL#TOPEXECT: Top SQL statements by execution time SQL stats BPL#RDSYNC: Bufferpool read statistics (synchronous reads)
BPL#RDASYNC: Bufferpool read statistics (asynchronous reads)
CF#SIZE: CF structure size information
SQL#TOPEXECP: Top SQL statements by execution time, aggregated by PLANID BPL#WRITE: Bufferpool write statistics (overall) BLU#PAGDI: Partial early aggregation / distincts
PKG#EXECT: Time spent executing by package BPL#WRSYNC: Bufferpool write statistics (synchronous writes)
SQL#TOPWAITT: Wait time breakdown for top SQL statements by execution time
SQL#TOPWAITW: Top SQL statements by time spent waiting
BPL#WRASYNC: Bufferpool write statistics (asynchronous writes) BLU: PAE/Distincts
SQL#TOPIOSTA: IO statistics per stmt - top statements by execution time
SQL#TOPROWS: Row level statistics per stmt - top statements by execution time
CON#WAITT: Wait times at connection level
CON#STATS: Various metrics at connection level
Connection level
SQL#TOPSORT: Sort statistics per stmt - top statements by execution time CON#PAGRW: Physical and logical page reads and writes at connection level
INF#EXPLN: Statement & plan identifiers - top statements by execution time
8
Database Throughput
================================================
DB#THRUP: Throughput metrics at database level
================================================
TS_DELTA MEMBER ACT_PER_S CMT_PER_S RB_PER_S DDLCK_PER_S SEL_P_S UID_P_S ROWS_INS_P_S ROWS_UPD_P_S ROWS_RET_P_S ROWS_MOD_P_S PKG_CACHE_INS_P_S P_RD_PER_S
----------- ------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ----------------- ------------
33 0 148589.4 24633.7 24.3 0.0 72455.2 25277.4 15663.8 19337.9 123311.8 80166.6 1.0 10148.6
9
Client Activity
Connection Ratio of request
considered active if it time vs. client wait
executes at least 1 for active clients
statement per second
=======================================================================
DB#CLACT: Client activity (active connections have at least 1 stmt/s)
=======================================================================
MEMBER TOTAL_CLIENTS TOTAL_CIWT TOTAL_RQST TOT_CIWT_RQ_RATIO ACTIVE_CLIENTS ACTIVE_RQ_PER_S ACTIVE_CIWT ACTIVE_RQST ACTIVE_CIWT_RQ_RATIO
------ ------------- -------------------- -------------------- ----------------- -------------- --------------- -------------------- -------------------- --------------------
0 268 1641303 7196998 0.22 267 45697.36 1608219 7196998 0.22
Prefer to look at
Ideally close to zero
active clients as
many connections
Investigate if > 10
may just be idle
10
Database Time Breakdown
================================================================
DB#TIMEB: Time breakdown at database level (wait + processing)
================================================================
MEMBER TOTAL_RQST_TM PCT_COMPILE PCT_SECTION PCT_SORT PCT_COL PCT_COL_SYNOP PCT_COMMIT PCT_RBACK PCT_CONN PCT_RTN_USR_CODE PCT_BACKUP PCT_IDX_BLD PCT_RUNSTATS PCT_REORG PCT_LOAD
------ ------------- ----------- ----------- -------- ------- ------------- ---------- --------- -------- ---------------- ---------- ----------- ------------ --------- --------
0 7193923 0.00 80.78 0.41 0.00 0.00 17.18 0.00 0.00 1.10 0.00 0.00 0.00 0.00 0.00
11
Database Wait Time Breakdown
Times reported as Very useful high level wait Remember these metrics are
Look at
percentages (PCT_*) percentages for latching, locking, aggregated at the database level
PCT_RQST_WAIT first
so easier to interpret buffer pool physical read, LOB meaning individual statements may
reads/write, etc. be less or more severely impacted
========================================
DB#WAITT: Wait times at database level
========================================
MEMBER TOTAL_RQST_TM TOTAL_WAIT_TM PCT_RQST_WAIT PCT_LOCK PCT_GLB_LOCK PCT_LTCH PCT_LG_DSK PCT_LG_BUF PCT_RCLM PCT_CF PCT_PFTCH PCT_DIAG PCT_POOL_R PCT_DIR_R PCT_DIR_W PCT_FCM PCT_TCPIP
------ ------------- ------------- ------------- -------- ------------ -------- ---------- ---------- -------- ------- --------- -------- ---------- --------- --------- ------- ---------
0 272833 265769 97.41 0.00 0.00 0.35 0.04 0.00 0.00 0.00 0.00 2.05 0.00 0.03 0.02 93.11 0.03
1 7193923 2107537 29.29 0.24 0.00 0.37 16.49 0.00 0.00 0.00 0.00 0.00 11.74 0.18 0.00 0.00 0.23
====================================================
SQL#TOPEXECT: Top SQL statements by execution time
====================================================
MEMBER NUM_EXEC COORD_STMT_EXEC_TIME AVG_COORD_EXEC_TIME PCT_COORD_STMT_EXEC_TIME TOTAL_CPU_TIME AVG_CPU_TIME PCT_WAIT_TIME AVG_SECT_TIME AVG_COL_TIME STMT_TEXT
------ ----------- -------------------- ------------------- ------------------------ -------------------- -------------------- ------------- ------------- ------------ -----------------------------------
0 40134 3535865 88.10 28.56 3149789 78 3.19 0.03 0.00 CALL BROKERVOLUME_F1(?, ?, ?, ?, ?)
0 40139 3529657 87.93 28.51 3526253901 87851 0.02 87.93 0.00 SELECT ARRAY_AGG(B_NAME), ARRAY_AGG(
0 147325 632104 4.29 5.10 69830522 473 0.58 0.01 0.00 CALL MARKETWATCH_F1(?, ?, ?, ?, ?, ?
0 88301 570747 6.46 4.61 575044790 6512 0.00 6.46 0.00 SET :HV00014 :HI00014 = ( SELECT 100
0 78118 538322 6.89 4.34 3502808 44 97.87 0.01 0.00 CALL TRADERESULT_F6(?, ?, ?, ?, ?
0 155401 428013 2.75 3.45 9905143 63 0.25 0.04 0.00 CALL TRADESTATUS_F1(?, ?, ?, ?, ?
0 155409 402792 2.59 3.25 391322528 2518 2.81 2.59 0.00 SELECT ARRAY_AGG(T_ID), ARRAY_AGG
0 19683 258882 13.15 2.09 1492058 75 0.11 0.03 0.00 CALL TRADELOOKUP_F2(?, ?, ?, ?, ?, ?
0 19429 207988 10.70 1.68 1489319 76 0.15 0.04 0.00 CALL TRADELOOKUP_F3(?, ?, ?, ?, ?, ?
0 19682 205605 10.44 1.66 14673734 745 94.66 10.44 0.00 SELECT ARRAY_AGG(T_ID), ARRAY_AGG
0 19422 169865 8.74 1.37 16100432 828 92.28 8.74 0.00 SELECT ARRAY_AGG(T_ID), ARRAY_AGG
0 5428 112541 20.73 0.90 794506 146 97.23 0.04 0.00 CALL TRADEUPDATE_F2(?, ?, ?, ?, ?
0 19671 105273 5.35 0.85 1328777 67 0.08 0.04 0.00 CALL TRADELOOKUP_F1(?, ?, ?, ?, ?
Percent of total
Average coord CPU time in
coord exec time See which statements have
exec time microseconds
(across all queries) high percent (>30) wait time
13
SQL Statement Wait Time Breakdown
EXECUTABLE_ID for each
Look at PCT_WAIT
statement is available in later
first
sections of the db2mon report. Useful
for collecting access plan or
retrieving full statement text
============================================================================
SQL#TOPWAITT: Wait time breakdown for top SQL statements by execution time
============================================================================
MEMBER PCT_WAIT PCT_LG_DSK PCT_LG_BUF PCT_LOCK PCT_GLB_LOCK PCT_LTCH PCT_RCLM PCT_CF PCT_PFTCH PCT_DIAG PCT_POOL_R PCT_DIR_R PCT_DIR_W PCT_FCM STMT_TEXT
------ -------- ---------- ---------- -------- ------------ -------- -------- ------- --------- -------- ---------- --------- --------- ------- --------------------------------------------------------------
0 3.19 0.00 0.00 0.00 0.00 3.21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 CALL BROKERVOLUME_F1(?, ?, ?, ?, ?)
0 0.02 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 SELECT ARRAY_AGG(B_NAME), ARRAY_AGG(CAST(VOLUME AS DECI
0 0.58 0.00 0.00 0.00 0.00 0.58 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 CALL MARKETWATCH_F1(?, ?, ?, ?, ?, ?, ?)
0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 SET :HV00014 :HI00014 = ( SELECT 100 * ( ( CAST
0 97.87 96.14 0.00 0.00 0.00 1.73 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 CALL TRADERESULT_F6(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
0 0.25 0.00 0.00 0.00 0.00 0.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 CALL TRADESTATUS_F1(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
0 2.81 0.00 0.00 0.00 0.00 0.33 0.00 0.00 0.00 0.00 2.47 0.00 0.00 0.00 SELECT ARRAY_AGG(T_ID), ARRAY_AGG(T_DTS), ARRAY_AGG(ST_NAME),
0 0.11 0.00 0.00 0.00 0.00 0.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 CALL TRADELOOKUP_F2(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
0 0.15 0.00 0.00 0.00 0.00 0.11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 CALL TRADELOOKUP_F3(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
0 94.66 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 94.63 0.00 0.00 0.00 SELECT ARRAY_AGG(T_ID), ARRAY_AGG(T_BID_PRICE), ARRAY_AGG
0 92.28 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 92.25 0.00 0.00 0.00 SELECT ARRAY_AGG(T_ID), ARRAY_AGG(T_CA_ID), A
0 97.23 88.90 0.00 0.00 0.00 8.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 CALL TRADEUPDATE_F2(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
0 0.08 0.00 0.00 0.00 0.00 0.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 CALL TRADELOOKUP_F1(?, ?, ?, ?, ?, ?, ?, ?,
PCT_LG_DISK Log write time in DB#LOGWR Investigate log disk file system and storage
Slow Log Writes subsystem
PCT_LTCH Hot latches in LTC#WAITT db2latchtrace, stacks, check access plan
Latch Contention Are particular statements affected? Bad plan?
Disk I/O PCT_POOL_R BP sizes in BPL#SIZES Increase BP size, inline LOBs, check access
(Buffer pool, PCT_DIR_R BP hit ratios in BPL#HITRA plan, check underlying storage
LOBs) PCT_DIR_W Are particular statements affected? Bad plan?
Expensive AVG_COORD_EXEC_TIME Rows read / rows returned ratio in Check access plan, CDE perf trace (if BLU),
Queries (High PCT_COORD_STMT_EXEC_TIME SQL#TOPROWS stacks, check for spilling
CPU, Sorting, AVG_CPU_TIME Sort stats in SQL#TOPSORT
Spilling) PCT_WAIT_TIME (could be low!) High average logical reads in SQL#TOPIOSTA
PCT_LOCK Point in time lock waits in START#LOCKW and Tune application (e.g., isolation level)
Lock Contention END#LOCKW Check access plan
PCT_RCLM Find what tables (or indexes) are causing reclaims Enable append mode, tune index, table
Page Reclaims High PCT_LTCH with non-zero from PAG#RCM and PAG#RCSMP partitioning, EHL
PCT_RCLM
PCT_CF CF round trip command times in CF#RTTIM Increase CF workers
CF side command times in CF#CMDTM Reduce CF traffic (EHL)
CF Wait Need RDMA if using TCPIP transport
15
Find Common Bottlenecks (Examples)
========================================
Wait times at database level - percents
======================================== What’s “high” for a log
MEMBER TOTAL_RQST_TIME_MS TOTAL_WAIT_TIME_MS RQST_WAIT LOCK GLB_LOCK LATCH LOG_DSK RECLAIM CF POOL_R DIR_R DIR_W write time?
FCM TCPIP DIAG
------ ------------------ ------------------ --------- ------- -------- ------- ------- ------- ------- ------- ------- ------- ------- ------- -------
0 1171250 818956 69.92 1.23 0.85 10.24 36.63 7.30 2.98 0.33 2.35 6.66 0.63 1.37 0.00
1 1178600 1121127 95.12 1.09 0.36 16.66 46.83 5.72 9.16 0.34 3.17 8.75 Higher than 1
1.06 2.21 0.00
==========================
Database log write times
millisecond can be an
issue but it depends.
==========================
If you don’t see a high
MEMBER NUM_LOG_WRITE_IO LOG_WRITE_IO_PER_S LOG_WRITE_MB_PER_S LOG_WRITE_TIME percent wait for log disk
LOG_WRITE_TIME_PER_IO_MS NUM_LOG_BUFFER_FULL
Latch contention
========================================
What next?
DB#WAITT: Wait times at database level
========================================
Based on latch name of top few
latches you might be able to
MEMBER TOTAL_RQST_TM TOTAL_WAIT_TM PCT_RQST_WAIT PCT_LOCK PCT_GLB_LOCK PCT_LTCH PCT_LG_DSK PCT_LG_BUF PCT_RCLM PCT_CF PCT_PFTCH PCT_DIAG PCT_POOL_R PCT_DIR_R PCT_DIR_W PCT_FCM PCT_TCPIP
------ ------------- ------------- ------------- -------- ------------ -------- ---------- ---------- -------- ------- --------- -------- ---------- --------- --------- ------- ---------
0 4850728 4598430 94.79 8.36 0.00 84.17 1.00 0.00 0.00 0.00 0.00 0.00 1.17 0.00 guess but if not, get stacks. Might
0.00 0.00 0.06
MEMBER PCT_WAIT PCT_LG_DSK PCT_LG_BUF PCT_LOCK PCT_GLB_LOCK PCT_LTCH PCT_RCLM PCT_CF PCT_PFTCH PCT_DIAG PCT_POOL_R PCT_DIR_R PCT_DIR_W PCT_FCM STMT_TEXT
------ -------- ---------- ---------- -------- ------------ -------- -------- ------- --------- -------- ---------- --------- --------- ------- ----------------------------------------
0 99.95 0.00 0.00 0.00 0.00 99.54 0.00 0.00 0.00 0.00 0.45 0.00 0.00 0.00 Insert into NEW_ORDER values (3132, 6, 1
0 100.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Insert into ORDER_LINE values (3116, 6,
0 100.00 0.00 0.00 0.00 0.00 93.75 0.00 0.00 0.00 0.00 6.19 0.00 0.00 0.00 Insert into ORDER_LINE values (3136, 3,
0 100.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Select D_NAME, D_STREET_1, D_STREET_2, D
0 99.66 0.00 0.00 0.00 0.00 74.91 0.00 0.00 0.00 0.00 25.08 0.00 0.00 0.00 Insert into ORDERS values (3136, 1007, 3
0 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 Select C_LAST, C_CREDIT, C_DISCOUNT, C_B
0 99.22 0.00 0.00 0.00 0.00 77.51 0.00 0.00 0.00 0.00 21.70 0.00 0.00 0.00 Insert into ORDER_LINE values (3116, 6,
0 99.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.02 0.00 0.00 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02,
0 98.90 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 98.90 0.00 0.00 0.00 Update ORDER_LINE set OL_DELIVERY_D = '2
0 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02,
0 98.64 0.00 0.00 0.00 0.00 98.64 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Insert into ORDER_LINE values (3122, 3,
0 98.55 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 98.55 0.00 0.00 0.00 Insert into ORDERS values (3132, 503, 6,
====================================================
AVG_COORD_EXEC_TIME / (AVG_CPU_TIME / 1000) close to 1
SQL#TOPEXECT: Top SQL statements by execution time which means this query is CPU bound
====================================================
MEMBER NUM_EXEC COORD_STMT_EXEC_TIME AVG_COORD_EXEC_TIME PCT_COORD_STMT_EXEC_TIME TOTAL_CPU_TIME AVG_CPU_TIME PCT_WAIT_TIME AVG_SECT_TIME AVG_COL_TIME STMT_TEXT
------ ----------- -------------------- ------------------- ------------------------ -------------------- -------------------- ------------- ------------- ------------ -----------------
0 10622 84442 7.94 42.71 82179091 7736 0.28 7.94 0.00 Select I_NAME,
0 10623 40384 3.80 20.43 1437751 135 96.57 3.80 0.00 Select S_QUANTITY
0 99 21787 220.07 11.02 1043864 10544 96.10 220.07 0.00 Select Count(Dist
0 693 16208 23.38 8.19 5568178 8034 65.36 23.38 0.00 Select C_ID,
===============================================================================
SQL#TOPROWS: Row level statistics per stmt - top statements by execution time
===============================================================================
What Next?
AVG_ROWS_READ / AVG_ROWS_RET >> 1
AVG_D_LRD >> AVG_I_LRD
AVG_I_LRD is zero
=========================================================================
SQL#TOPIOSTA: IO statistics per stmt - top statements by execution time
=========================================================================
Probably no index! Check the plan
MEMBER NUM_EXEC AVG_D_LRD AVG_D_PRD AVG_I_LRD AVG_I_PRD AVG_TD_PRD AVG_TI_PRD AVG_COL_LRD AVG_COL_PRD AVG_DIR_R_RQS AVG_DIR_W_RQS STMT_TEXT
------ ----------- ------------------ ------------ ------------------ ------------ ------------ ------------ ------------------ ------------ ------------- ------------- ----------------
0 10622 2212.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Select I_NAME,
0 10623 1.8 0.8 4.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 Select S_QUANTITY 18
0 99 286.6 85.4 831.2 17.5 0.0 0.0 0.0 0.0 0.0 0.0 Select Count(Dist
0 693 3000.0 0.0 25.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Select C_ID,
A Word on Access Plans
●
Sometimes an application can be configured or do things such that the plans
the application gets do not match what you get via CLP
– Rare, but when it happens it can be very annoying and difficult to debug
●
The EXPLAIN_FROM_SECTION stored procedure is the most accurate way
to retrieve access plans (will be exactly what the application got)
●
Uses the EXECUTABLE_ID of the statement (from
MON_GET_PKG_CACHE_STMT)
– Need statement to still be in package cache
●
db2mon provides a report (INF#EXPLN) with the EXECUTABLE_IDs for the
top SQL statements
19
Looking for a specific metric?
●
There are a lot of metrics reported by db2mon
– Too many to list
– And growing as we slowly fill in gaps where the tool is lacking
●
grep db2mon.pl for the metric name
●
If you don’t find it open an issue or pull request to have it
included
20
Pain Points
●
Long running queries
– Most metrics are not rolled up until the statement completes
– Need to collect MON_GET_PKG_CACHE_STMT after query completes
●
Sometimes the query never completes!
– Would need to rely on “in-flight” metrics from active SQL monitors
– Normally end up having to dump stacks
●
No metrics for BLU evaluators
●
No sort or join spill metrics
●
No WLM metrics
21
Requirements
●
All versions of Db2 LUW are supported starting from v9.7
●
User temp tablespace is required for DGTTs
– Created by db2mon.sh
●
Database monitoring switches must be BASE or EXTENDED
– MON_REQ_METRICS *****************************
db2mon tries to warn
about missing prereqs
Checking db2mon prerequisites
at the top of the report
– MON_ACT_METRICS *****************************
22
Where can I find it?
●
Stable version part of the Db2 source tree starting in v11.1
●
Can find it on customer systems on sqllib/samples/perf
●
Experimental version on git
– https://github.ibm.com/DB2-Performance/db2mon
– Everyone is welcome to open issues and or submit PRs!
●
Documentation is available
– Db2 Knowledge Center
●
“Collecting and reporting performance monitor data with db2mon”
– In the git repo above
23
Introducing Db2 Performance Bootcamp
●
Designed to help people gain experience debugging performance
problems
– Through exercises that simulate common performance problems
●
Work in progress (only 3 exercises so far)
●
Creates a VM, installs Db2, builds a database and deploys a workload
●
Great place to practice with db2mon
●
https://github.ibm.com/DB2-Performance/bootcamp
– Feel free to open issues or contribute through pull requests
24