Root cause part 1

In “What happened to the batch?” I noted the increased load from the DB. Where the extra load coming from?

Most metrics are the same or less in DB load profile for slow day. In AWR diff report (awrdiff_0418_0427_dbsidxxx.html) parsing is down by 12% but hard parsing is up by 19% from 11.4/sec to 13.7/sec. The diff report does not compare the “Instance Efficiency Percentages” section, for that we have to go back to AWR reports. Between awr_0418_dbsidxxx.html and awr_0427_dbsidxxx.html) “% Non-Parse CPU” changed from 92% to 46%… more than half of the CPU was used for parsing on 4/27.

Let’s visualize CPU used for parsing and by the whole db. Sheet “whole db” in mining_awr.xlsx has both. I also included CPU used by all sqls found in sqlstat… which is not all the sqls executed by the DB… please remind me if you know the article explaining what goes into AWR with the snapshots.parse_cpu

DBCPU_sec	DBA_HIST_SYS_TIME_MODEL, 'DB CPU'
PARSE_CPU_S	DBA_HIST_SYSSTAT, 'parse time cpu'
SQL_CPU_SEC	DBA_HIST_SQLSTAT, sum(CPU_TIME_DELTA)

Zoom in to see it better parse_cpu.jpg

Around 4/23 parse CPU (PARSE_CPU_S) shut up and become big part of overall CPU use by the whole DB (DBCPU_sec). It calmed down for the weekend just to go back up for the whole week of 4/25. Also, it looks like combined CPU usage from DBA_HIST_SQLSTAT by all sqls (SQL_CPU_SEC) and parse CPU from DBA_HIST_SYSSTAT (PARSE_CPU_S) can’t push overall CPU utilization (DBCPU_sec) higher than 10000 centiseconds for any 15 min snapshot…core based DB CPU usage is maxed at around 11 sec (=10000/15/60).

The biggest DB Time gain on slow day (4/27) came from parse elapsed. See the data points in mining_awr.xlsx , sheet “whole db”. On this picture I am showing corelation between ‘parse time elapsed’ and ‘parse time cpu’.

parse_elapsed

parse_elapsed.jpeg

We have around 25000 seconds of parse related elapsed time for 15 minute snaps… translates to 2.4 million seconds added DB time from parsing (=25000*4*24) for the day (the length of my AWR reports). It is matching ‘hard parse time elapsed’ increase between awr_0418_dbsidxxx.html and awr_0427_dbsidxxx.html in Time Model Statistics section.

Check out Variable Load Impact On Oracle Database Hard Parsing by Craig Shallahamer, it explains a scenario with ballooning parse elapsed.

Next I will investigate where those hard parsing sqls came from.

Advertisements