|Mandalika's scratchpad||[ Work blog @Oracle | Stock Market Notes | My Music Compositions ]|
Few clarifications before we start.
Difference between 240K and 500K EE PeopleSoft NA Payroll benchmarks
Not too long ago Sun published PeopleSoft NA Payroll 240K EE benchmark results with 16 job streams and 8 job streams. First of all, I want to make sure everyone understands the fact that PeopleSoft NA Payroll 240K and 500K EE benchmarks are two completely different benchmarks. The 240K database model represents a large sized organization where as 500K database model represents an extra-large sized organization. Vendors [who benchmark] have the flexibility of configuring 8, 16, 24 or 32 parallel job streams (or threads) in those two benchmarks to parallellize the work being done.
Now that the clarifications are out of the way, here is the direct URL for the 500K Payroll benchmark results that Oracle|Sun published last week. (document will be updated shortly to fix the branding issues)
PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M5000 (500K EE 32 job streams)
What's changed at Sun & Oracle?
The 500K payroll benchmark work was started few months ago when Sun is an independent entity. By the time the benchmark work is complete, Sun was part of Oracle Corporation. However it has no impact whatsoever on the way we have been operating and interacting with the PeopleSoft benchmark team for the past few years. We (Sun) still have to package all the benchmark results and submit for validation just like any other vendor. It is still the same laborious process that we have to go through from both ends of Oracle (that is, PeopleSoft & Sun). I just mentioned this to highlight Oracle's non-compromising nature on anything at any level in publishing quality benchmarks.
SUMMARY OF 500K NA PAYROLL BENCHMARK RESULTS
The following bar chart summarizes all the published benchmark results by different vendors. Each 3D bar on X-axis represent one vendor, and the Y-axis shows the throughput (#payments/hour) achieved by corresponding vendor. Actual throughput and the vendor name is also shown in each of the 3D bar for clarity. Common sense dictates that higher the throughput, the better it is.
The numbers in the following table were extracted from the very first page of the benchmark results white papers where Oracle|PeopleSoft highlights the significance of the results and the actual numbers that are of interest to the customers. The results in the following table are sorted by the hourly throughput (payments/hour) in the descending order. The goal of this benchmark is to achieve as much hourly throughput as possible. Click on the link that is underneath the hourly throughput values to open corresponding benchmark result.
|Vendor||OS||Hardware Config||#Job Streams||Elapsed Time (min)||Hourly Throughput
Payments per Hour
|Sun||Solaris 10 10/09||
1x Sun SPARC Enterprise M5000 with 8 x 2.53 GHz SPARC64 VII Quad-Core CPUs & 64G RAM
1 x Sun Storage F5100 Flash Array with 40 Flash Modules for data, indexes. Capacity: 960 GB
1 x Sun Storage 2510 Array for redo logs. Capacity: 272 GB. Total storage capacity: 1.2 TB
1 x IBM System z10 Enterprise Class Model 2097-709 with 8 x 4.4 GHz IBM System z10 Gen1 CPUs & 32G RAM
1 x IBM TotalStorage DS8300. Total capacity: 9 TB
1 x HP Integrity rx7640 with 8 x 1.6 GHz Intel Itanium2 9150 Dual-Core CPUs & 64G RAM
1 x HP StorageWorks EVA 8100. Total capacity: 8 TB
This is all public information. Feel free to compare the hardware configurations and the data presented in all three rows and draw your own conclusions. Since all vendors used the same benchmark toolkit, comparisons should be pretty straight forward.
Sun Storage F5100 Flash Array, the differentiator
Of all these benchmark results, clearly the F5100 storage array is the key differentiator. The payroll workload is I/O intensive, and requires low latency storage for better throughput (it is implicit that less latency means less I/O waits).
There is a lot of noise from some of the outside blog readers (I do not know who those readers are or who they work for) when Sun published the very first NA Payroll 240K EE benchmark with eight job streams using an F5100 array that has 40 flash modules (FMOD). Few people thought it is necessary to have those many flash modules to get that kind of performance that Sun demonstrated in the benchmark. Now that we have the 500K benchmark result as well, I want to highlight another fact that it is the same F5100 that was used in all the three NA Payroll benchmarks that Sun published in the last 10 months. Even though other vendors increased the number of disk drives when moved from 240K to 500K EE benchmark environment, Sun hasn't increased the flash modules in F5100 -- the number of FMODs remained at 40 even in 500K EE benchmark. This fact implicitly suggests at least two things -- 1. F5100 array is resilient, scales and performs consistently even with increased load. 2. May be 40 flash modules are not needed in 240K EE Payroll benchmark. Hopefully this will silence those naysayers and critics now.
While we are on the same topic, the storage capacity in the other array that was used to store the redo logs was in fact reduced from 5.3 TB in a J4200 array that was used in 240K EE/16 job stream benchmark to 272 GB in a 2510 array that was used in 500K EE/32 job stream benchmark. Of course, in both cases, the redo logs consumed only 20 GB on disk - but since the arrays were connected to the database server, we have to report the total capacity of the array(s) whether it is being used or not.
Notes on CPU utilization and IBM's #job streams
Even though I highlighted the I/O factor in the above paragraph, it is hard to ignore the fact that the NA Payroll workload is CPU intensive too. Even when multiple job streams are configured, each stream runs as a single-thread process -- hence it is vital to have a server with powerful processors for better [overall] performance.
Observant readers might have noticed couple of interesting things.
The maximum average CPU usage that Sun reported in 500K EE benchmark in any scenario by any process is only 43.99% (less than half of the total processing capacity)
The reason is simple. The SUT, M5000, has eight quad-core processors and each core is capable of running two hardware threads in parallel. Hence there are 64 virtual CPUs on the system, and since we ran only 32 job streams, only half of the total available CPU power was in use.
Customers in a similar situation have the flexibility to consolidate another workload onto the same system to take advantage of the available/remaining CPU cycles.
I do not know the exact reason - but if I have to speculate, it is as good as anyone's guess. Based on the benchmark results white paper, it appears that the z10 system (mainframe) has eight single core processors, and perhaps that is why they ran the whole benchmark with only eight job streams.
Benchmark Results White Papers
Best Practices White Paper