Depends how we look at it.
Yes, we were maxed out on available core capacity.
No, the lpar did not use all threads/logical CPUs all the time.
In my Mining AWR post I observed discrepancy for CPU usage numbers between the OS and the DB even when the DB is the main user on the machine (machine has another instance but this DB is the main production DB).
Mystery solved: for the OS CPU usage we were looking at time spent on threads (logical CPUs) by all processes running on the machine; for the DB the numbers reflected core capacity used by the DB.
Core capacity used
According to my Where is my CPU post on slow day (4/27) the lpar was using around 14 cores whole day. The lpar is uncapped, the shared pool has 16 physical cores, obviously not all available for this machine.
Also, out of the 14 the DB is using around 11 cores.
Time spent on threads
OS CPU usage from V$OSSTAT, or host CPU in SYSMETRIC is thread based. So as sampled ASH data from the DB.
The OS includes all processes running on the lpar, but I suspect the DB must be the biggest CPU user. ASH may also including CPU WAIT when the machine is short on CPU resource…it can help us to track CPU shortage at thread level.
Let’s look at logical CPU usage for both the OS and the DB.
CPU_ORA_DEMAND ASH session_state='ON CPU' CPU_ORA_CONSUMED dba_hist_sysmetric_summary metric_name='CPU Usage Per Sec' CPU_OS dba_hist_sysmetric_summary metric_name='Host CPU Usage Per Sec' OS_LOAD dba_hist_sysmetric_summary metric_name='Current OS Load'
We can see that the DB is responsible for most of the CPU demand on the machine. As demand goes up so as OS_LOAD and CPU_OS. Outside of batches demand is slightly lower than OS_CPU, the lpar is holding up. Demand is higher than available CPU capacity during batches, lpar CPU is maxed out during batches (from batches and other activity combined).
OS_CPU capacity used during batches peaks at around 92… assuming SMT8 and maxed out logical CPUs by the DB it translates to 11.5 (=92/8) core capacity used…good match with core based CPU_ORA_CONSUMED.
The incident was almost a year ago. We still have AWR and grid data. But no access to nmon/OS info (I am a DBA w/limited OS support). I can account for around 11 cores used by the DB. With lpar using around 14 cores I am guessing the difference was used by processes outside the DB . Any idea if there is a way to track down what was using the remaining 3 cores (from sources I have access to…AWR & grid)?