Is it the CPU?

In the previous post we looked at the big picture: we compared AWR reports covering 24-hour periods for the baseline day (4/18) and the slow day (4/27).

In this post we will focus on the areas responsible for the increased DB time: parsing, disk IO, and CPU.

Parsing

The statistic “parse time elapsed” ballooned, and all of it came from “hard parse elapsed time”. Parsing (soft or hard) is CPU intensive to begin with, and too much hard parsing is evil. Check out “Variable Load Impact On Oracle Database Hard Parsing” on orapub.com, great info! The huge increase in my parse elapsed time suggests I had an overloaded system… how is that possible?
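Both figures come from the Time Model; if you want the raw numbers behind the AWR lines, a query along these lines against dba_hist_sys_time_model should do (the snap range is a placeholder, values are cumulative microseconds, and on RAC you would also filter by dbid and instance_number):

-- Time Model values are cumulative since instance startup, so take the delta
-- across the snapshot range and convert microseconds to seconds
select stat_name,
       round((max(value) - min(value)) / 1000000) as seconds
from   dba_hist_sys_time_model
where  stat_name in ('parse time elapsed', 'hard parse elapsed time', 'DB CPU')
and    snap_id between :begin_snap and :end_snap
group  by stat_name;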

Also, “% Non-Parse CPU” decreased from 92% to 46%… more than half of the CPU used by the database went to hard parsing… bad.
We had a nearly 13-fold increase in “parse time cpu”: it jumped from 37581 to 482825 seconds. At the same time “parse count (hard)” increased by about 200 thousand (from 967714 to 1170580).
BTW, the AWR report gets these CPU numbers from dba_hist_sysstat, which stores CPU time in centiseconds; I converted to seconds.
Can the 20% increase in “parse count (hard)” be responsible for the huge increase in parse CPU?
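One way to check is to pull both deltas for each day straight from dba_hist_sysstat; a sketch like this (snap range is a placeholder for the 24-hour window, single instance assumed) gives the per-period numbers:

-- "parse time cpu" is in centiseconds, "parse count (hard)" is a plain counter;
-- both are cumulative, so difference them across the snapshot range
select stat_name,
       case when stat_name = 'parse time cpu'
            then round((max(value) - min(value)) / 100)   -- centiseconds to seconds
            else max(value) - min(value)
       end as delta_for_period
from   dba_hist_sysstat
where  stat_name in ('parse time cpu', 'parse count (hard)')
and    snap_id between :begin_snap and :end_snap
group  by stat_name;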

Hard parsing can come from SQLs with literals.
The AWR report may not show those SQLs; each one alone uses too little resource to make it into “SQL Statistics”.
So we checked the ASH reports. For 4/27 the section “Top SQL using literals” showed two FORCE_MATCHING_SIGNATUREs with a high number of SQL versions. My ASH report covered business hours between 8AM and 5PM, a 9-hour period. Adjusted to 24 hours (=(36613+36536)*24/9) that gives around 195 thousand unique SQLs with literals, a good match for the extra 200 thousand hard parses in the AWR report for the day.
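The ASH report does this grouping for you, but the same signatures can be chased down by hand; a sketch against v$sql (so it only sees what is still in the shared pool, and the 100-version threshold is just an example) looks like this:

-- statements that differ only in literals share a FORCE_MATCHING_SIGNATURE
-- but get a separate SQL_ID each; a high version count is a hard-parse red flag
select force_matching_signature,
       count(distinct sql_id) as versions,
       min(sql_text)          as sample_text
from   v$sql
where  force_matching_signature <> 0
group  by force_matching_signature
having count(distinct sql_id) > 100
order  by versions desc;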

[screenshot: Top SQL using literals]
In addition, each of those distinct SQLs was a very long (20KB) statement, and they were nowhere to be seen before or after our slow week.
With that many distinct SQLs thrown at the database, it will be interesting to check the correlation between hard parsing and DB CPU utilization; we will do that in the next post.

Disk IO

What about the 29% jump in “Read IO requests”? Can that be the reason for slower synchronous reads?
The “Avg Time(ms)” for “db file sequential read” increased from 2.75 to 6.11 ms.
But we did not read more data: neither “Read IO (MB)” nor “Write IO (MB)” increased.
For 4/18 the “Wait Event Histogram” shows most waits were under 1 ms; even on the slow day half of the waits were shorter than 2 ms. Those are not physical reads from disk, they are too fast for that; the data must have come from cache (OS and/or SAN). Therefore, not counting the possibility of a slower disk subsystem for some unknown external reason, the speed may be influenced by the available CPU.
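The bucket distribution behind that claim is available outside the report too; a sketch against dba_hist_event_histogram (wait counts are cumulative per bucket, snap range is a placeholder) shows how the waits spread across the millisecond buckets:

-- wait_count per bucket is cumulative, so the max-min delta approximates the day's waits
select wait_time_milli,
       max(wait_count) - min(wait_count) as waits_in_period
from   dba_hist_event_histogram
where  event_name = 'db file sequential read'
and    snap_id between :begin_snap and :end_snap
group  by wait_time_milli
order  by wait_time_milli;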

CPU

Would you agree?
– We have 128 CPUs, so the OS CPU capacity is about 11 million seconds per day (=128*24*60*60)
– With DB CPU around 900 thousand seconds, “Instance CPU” is around 8%
– Per OEM, the OS CPU utilization is up from around 20% to 40%; the same info can be gathered from the AWR reports’ “Host CPU” section, or straight from DBA_HIST_OSSTAT (see the sketch after the list)
– But we are not in any danger of maxing out the CPU capacity of the machine…right?
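For reference, a rough query like the one below (values are cumulative centiseconds summed over all CPUs, snap range is a placeholder) pulls the busy and idle time behind that “Host CPU” percentage; host utilization is the busy delta divided by the busy plus idle deltas:

-- BUSY_TIME and IDLE_TIME are cumulative centiseconds across all 128 CPUs;
-- host CPU % = busy delta / (busy delta + idle delta)
select stat_name,
       max(value) - min(value) as delta_centiseconds
from   dba_hist_osstat
where  stat_name in ('BUSY_TIME', 'IDLE_TIME')
and    snap_id between :begin_snap and :end_snap
group  by stat_name;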

[screenshot: OEM CPU utilization]
