Lpar test POWER8

In lpar test I looked at an lpar running on POWER7 virtual processors. This time I will run the same test on an lpar with POWER8.

In power8.xlsx I calculated these theoretical utilization numbers:

smt core perf core use thread perf thread use thread resp
1 1.00 0.50 1.00 0.50 1.00
2 1.45 0.73 0.73 0.36 1.38
4 1.89 0.95 0.47 0.24 2.12
8 2.00 1.00 0.25 0.13 4.00

My expectation is to have a 2 fold workload increase between single threaded and SMT8 modes.

At the same time each thread will run 4 times slower, a trade-off for more throughput.

In my load test script (hog) I am using a PL/SQL loop to drive up the load on the CPUs

a := 1; begin for i in 1..40000 loop a := ( a + i )/11; end loop; end;

I use no data/sql so there is no variation in execution (like change in execution plan, or use of direct path read during full table scans, etc.). Therefore the same hog can be used on different databases to compare performance. I am counting the number of executions “tx_s” and response time for each loop “res_ms”.

lpar with POWER8> hog plsql_loop nobind '0 80 1' | tee hog_plsql_loop_`hostname -s`_${ORACLE_SID}_`date +%m%d%y_%H%M`.log

power8 top activity

Please see data collected in power8.xlsx lpar test sheet.

On this lpar we do not have “app” output from lparstat (I am guessing sysadmins did not expose it) but vmstat’s physical processors consumed (pc) steadily increased during the test. Core capacity used by the lpar was maxed out at around 8,  it has 8 virtual processors backed by sufficient core capacity from the shared pool.

DB and session CPU consumptions “db_cpu”/”sess_cpu” are matching core capacity used “pc” of the machine, the database is responsible for the load on the machine.

“sess_cpu” is matching “purr” which is core used by all hog processes as it captured by ‘pprof’, therefore Oracle session CPU usage is expressed in core capacity.

At the same time hog process CPU consumption expressed as time spent on CPU in “time” column is matching the number of hogs. Each hog is designed to max out (constantly running on) the logical CPU/thread.

Up until around 60 hogs “os_cpu” is matching “time” from ‘pprof’, so we know OS CPU utilization is time based. OS is busy from hogs.

At high load “os_cpu” and “ash_cpu” are going above the maximum time based CPU capacity of 64. Same true for the core based Db and session CPU used from time model “tm_cpu” and “db_cpu”. Looks like these metrics won’t give us true CPU capacity used (time or core based), they mix in some time component after CPUs maxed out. All metrics are documented in hog post.

Let’s see if test results are matching my expectations.

Workload increase is linear from 1 to 10 hogs, we have 10 virtual processors.
Workload went up from 1169 tx/s to 1830 tx/sec, a 56% increase when the processor switched from single threads to SMT2.
Workload went up from 1169 tx/s to 2521 tx/sec, a 2.15 fold increase when the processor switched from single threads to SMT8.
We are close (slightly better) compared to calculated workload increase for POWER8.

Response time started at 6 ms with 1 hog and goes up to 25 ms with 64 hogs… 3.7 fold increase…a bit faster then I expected. Calculation was based on expected performance gain based on typical workload. Perhaps the workload generated by hog favors POWER8, so I revised the estimated performance increase for POWER8; see updated numbers in lpar test sheet in power8.xlsx

power8 throughput and response time

Looking at the data… for this type of workload I would not recommend running the DB with SMT8 (above 32 threads). It won’t provide much additional throughput and response time is significantly worse.

Our production DB is running on POWER8 with 16 virtual CPUs. It is time to apply some fuzzy logic. During batches half of the virtual processors switched from SMT4 to SMT8 (ballpark). Average expected response time increase is 44% (=(4-2.12)/2.12/2). It is matching the run time increase of batch between good day (4/18) and slow day (4/27). You can check post What happened to the batch? for the details.

In the next post I will compare test results between POWER7 and POWER8.