(Originally posted on blogs.sun.com at: http://blogs.sun.com/mandalika/entry/running_batch_workloads_on_sun)
Ever since Sun introduced Chip Multi-Threading (CMT) hardware in the form of the UltraSPARC T1 based T1000/T2000, our internal mail aliases have been inundated with a variety of customer stories, the majority of which go like 'batch jobs are taking 12+ hours on T2000, whereas they take only 3 or 4 hours on a US-IV+ based v490'. Even two and a half years after the introduction of the revolutionary CMT hardware, it appears that the majority of Sun customers are still under the impression that Sun's CMT systems like T2000 and T5220 are not capable of handling CPU-intensive batch workloads. It is not a valid concern. CMT processors like UltraSPARC T1, T2 and T2 Plus can handle batch workloads just as well as any other traditional/conventional processor, viz. UltraSPARC-IV+, SPARC64-VI, AMD Opteron, Intel Xeon, IBM POWER6. However, CMT awareness and a little effort are required at the customer end to achieve good throughput on CMT systems.
First of all, end users must realize that the maximum clock speed of the existing CMT processor line-up (UltraSPARC T1, UltraSPARC T2, UltraSPARC T2 Plus) is only 1.4 GHz, and on top of that, each strand (individual hardware thread) within a core shares CPU cycles with the other strands that operate on the same core (note: each core operates at the speed of the processor). Given these facts, it is no surprise that batch jobs take longer to complete when only one or a very few single-threaded batch jobs are submitted to the system. In such cases, the system resources are largely under-utilized, in addition to the longer elapsed times. One possible trick to achieve the required throughput in the expected time frame is to split the workload into multiple jobs. For example, if an EDU customer needs to generate 1000 transcripts, the customer should consider submitting 4 individual jobs with 250 transcripts each, or 8 jobs with 125 transcripts each, rather than submitting one job for all 1000 transcripts. Ideally the customer should observe the resource utilization (CPU%, for example) and experiment with the number of jobs to be submitted until the system achieves the desired throughput within the expected time frame.
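The splitting step in the transcript example can be sketched in a few lines of Python. This is only an illustration of the idea, not part of any Sun or Oracle tooling: the function name `split_into_jobs` and the list of transcript IDs are made up for this sketch, and how each chunk is actually submitted depends on the batch scheduler in use.

```python
def split_into_jobs(items, num_jobs):
    """Split items into num_jobs contiguous chunks whose sizes differ by at most one."""
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    base, extra = divmod(len(items), num_jobs)
    chunks, start = [], 0
    for i in range(num_jobs):
        # The first `extra` chunks absorb the remainder, one item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical work list: 1000 transcript IDs, as in the example above.
transcript_ids = list(range(1, 1001))
for n in (4, 8):
    jobs = split_into_jobs(transcript_ids, n)
    print(n, "jobs of sizes", sorted({len(j) for j in jobs}))
```

With 1000 items this yields 4 chunks of 250 or 8 chunks of 125, matching the numbers in the paragraph above. Each chunk would then be handed to the batch system as its own job.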
Case study: Oracle E-Business Suite Payroll 11i workload on Sun SPARC Enterprise T5220

To demonstrate that the aforementioned methodology works, let's take Oracle's E-Business Suite 11.5.10 Payroll workload as an example. On a single T5220 with one 1.4 GHz UltraSPARC T2 processor, acting as the batch, application and database server, 4 payroll threads generated 5,000 paychecks in 31.53 minutes, consuming only 6.04% CPU on average. The projected hourly throughput is ~9,500 paychecks. This is a classic example of what the majority of Sun's CMT customers are experiencing today, i.e., longer batch processing times with little resource consumption. Keep in mind that each UltraSPARC T2 and UltraSPARC T2 Plus processor can execute up to 64 jobs in parallel (on a side note, the UltraSPARC T1 processor can execute up to 32 jobs in parallel). So, to put the idle resources to effective use, and thereby improve the elapsed times and the overall throughput, a few experiments were conducted with 64 payroll threads, and the results are very impressive. With a maximum of 64 payroll threads, it took only 4.63 minutes to process 5,000 paychecks at an average of 40.77% CPU utilization. In other words, a similarly configured T5220 can process ~64,700 paychecks per hour using less than half of the available CPU cycles. Here is a word of caution: just because the processor can execute 64 threads in parallel, it doesn't mean it is always optimal to submit 64 parallel jobs on systems like T5220. A very high number of batch jobs (payroll threads in this particular scenario) might be overkill for simple tasks like NACHA in the Payroll process.
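The projected hourly throughput figures quoted above follow from simple arithmetic: units processed, divided by elapsed minutes, times 60. A minimal Python sketch of that calculation is shown here; the function name `hourly_throughput` is made up for illustration, and the inputs are the rounded elapsed times quoted in this post.

```python
def hourly_throughput(units, elapsed_minutes):
    """Project the number of units processed per hour from one timed run."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    return units * 60.0 / elapsed_minutes


# 5,000 paychecks with 4 payroll threads (31.53 minutes)
print(round(hourly_throughput(5000, 31.53)))
# 5,000 paychecks with up to 64 payroll threads (4.63 minutes)
print(round(hourly_throughput(5000, 4.63)))
```

The results land close to the ~9,500 and ~64,700 figures in the text. The white paper's reported total of 64,748 differs slightly from what the rounded 4.63 minutes gives, presumably because the published throughput was computed from unrounded elapsed times.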
The following white paper has more detailed information about the nature of the workload and the results of the experiments with various numbers of threads for the different components of the Oracle Applications Payroll batch workload. Refer to the same white paper for the exact tuning information as well.
Link to the white paper:
E-Business Suite Payroll 11i (11.5.10) using Oracle 10g on a Sun SPARC Enterprise T5220

Here is the summary of the results that were extracted from the white paper:
Hardware configuration: 1x Sun SPARC Enterprise T5220 for running the application, batch and the database servers
Specifications: 1x 1.4 GHz 8-core UltraSPARC T2 processor with 64 GB memory
Software configuration:
- Oracle E-Business Suite 11.5.10
- Oracle 10g R1 10.1.0.4 RDBMS
- Solaris 10 8/07
Results

Oracle E-Business Suite 11i Payroll - Number of employees: 5,000

| Component | #Threads | Time (min) | Avg. CPU% | Hourly Throughput |
|---|---|---|---|---|
| Payroll process | 64 | 1.87 | 90.56 | 160,714 |
| PrePayments | 64 | 0.20 | 46.33 | 1,500,000 |
| Ext. Proc. Archive | 64 | 1.90 | 90.77 | 157,895 |
| NACHA | 8 | 0.05 | 2.52 | 6,000,000 |
| Check Writer | 24 | 0.38 | 9 | 782,609 |
| Costing | 48 | 0.23 | 32.5 | 1,285,714 |
| Total or Average | NA | 4.63 min | 40.77% | 64,748 |
It is evident from the average CPU% that the Payroll process and the External Process Archive components are extremely CPU intensive, and hence take longer to complete. That's the reason 64 threads were configured for those components: to run them at the full potential of the system. Light-weight components like NACHA need fewer threads to complete the job efficiently. Configuring 64 threads for NACHA would have a negative impact on the throughput. In other words, we would be wasting CPU cycles for no apparent improvement.
It is the responsibility of the customers to tune the application and the workload appropriately. One size doesn't fit all.
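One way to act on "one size doesn't fit all" is to time each component at a few thread counts and keep the smallest count that is nearly as fast as the best one. The Python sketch below illustrates that selection rule only. The function name, the 5% tolerance and the sample timings are all assumptions made up for this example; they are not taken from the white paper.

```python
def smallest_adequate_thread_count(timings, tolerance=0.05):
    """Return the smallest thread count whose elapsed time is within
    `tolerance` (a fraction) of the best elapsed time observed.

    timings: dict mapping thread count -> elapsed minutes from trial runs.
    """
    if not timings:
        raise ValueError("timings must not be empty")
    best = min(timings.values())
    adequate = [n for n, t in timings.items() if t <= best * (1 + tolerance)]
    return min(adequate)


# Hypothetical trial timings in minutes (NOT measurements from the white paper).
light_component = {8: 0.50, 24: 0.50, 64: 0.55}   # extra threads do not help
heavy_component = {8: 12.0, 24: 5.0, 64: 2.0}     # keeps scaling with threads
print(smallest_adequate_thread_count(light_component))
print(smallest_adequate_thread_count(heavy_component))
```

With these made-up numbers the light component settles on 8 threads and the heavy one on 64, which mirrors the pattern in the results table: few threads for NACHA, the maximum for the CPU-intensive components.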
The Payroll 11i results on the T5220 demonstrate clearly that Sun's CMT systems are capable of handling batch workloads well. It would be interesting to see how well they perform against other systems equipped with traditional processors with higher clock speeds. For this comparison, we can use a couple of results that were published by UNISYS and IBM with the same workload. The table below summarizes the results from the following two white papers. For the sake of completeness, Sun's CMT results are included as well.
Source URLs:
- E-Business Suite Payroll 11i (11.5.10) using Oracle 10g on a UNISYS ES7000/one Enterprise Server
- E-Business Suite Payroll 11i (11.5.10) using Oracle 10g for Novell SUSE Linux on IBM eServer xSeries 366 Servers
Oracle E-Business Suite 11i Payroll - Number of employees: 5,000

| Vendor | OS | Hardware Config | #Threads | Time (min) | Avg. CPU% | Hourly Throughput |
|---|---|---|---|---|---|---|
| UNISYS | Linux: RHEL 4 Update 3 | DB/App/Batch server: 1x Unisys ES7000/one Enterprise Server (4x 3.0 GHz Dual-Core Intel Xeon 7041 processors, 32 GB memory) | 12 [1] | 5.18 min | 53.22% | 57,915 |
| IBM | Novell SUSE Linux Enterprise Server 9 SP1 | DB, App servers: 2x IBM eServer xSeries 366 4-way server (4x 3.66 GHz Intel Xeon MP Processors (EM64T), 32 GB memory) | 12 | 8.42 min | 50+% [2] | 35,644 |
| Sun | Solaris 10 8/07 | DB/App/Batch server: 1x Sun SPARC Enterprise T5220 (1x 1.4 GHz 8-core UltraSPARC T2 processor, 64 GB memory) | 8 to 64 | 4.63 min | 40.77% | 64,748 |
Better results were highlighted in the original table. The results speak for themselves. One 1.4 GHz UltraSPARC T2 processor outperformed four 3 GHz / 3.66 GHz processors in terms of the average CPU utilization and, most importantly, the hourly throughput (the hourly throughput calculation relies on the total elapsed time).
Before we conclude, let us reiterate a few things based purely on the factual evidence presented in this blog post:
- Sun's CMT servers like T2000, T5220 and T5240 (a two-socket system with UltraSPARC T2 Plus processors) are well suited to running batch workloads like Oracle Applications Payroll 11i
- Sun's CMT servers like T2000, T5220 and T5240 are well suited to running the Oracle 10g RDBMS when the DML/DDL/SQL statements that make up the majority of the workload are not very complex, and
- When the application is tuned appropriately, CMT processors can outperform some of the traditional processors that were touted to deliver the best single-thread performance
Footnotes

1. There is a note in the UNISYS/Payroll 11i white paper that says "[...] the gains {from running increased numbers of threads} decline at higher numbers of parallel threads." This is quite contrary to what Sun observed in its Payroll 11i experiments on the UltraSPARC T2 based T5220. A higher number of parallel threads (maximum: 64) improved the throughput on T5220, whereas UNISYS' observation is based on their experiments with a maximum of 12 parallel threads. Moral of the story: do NOT treat all hardware alike.
2. IBM's Payroll 11i white paper has no references to the average CPU numbers. 50+% was derived from "Figure 3: Average CPU Utilization".
Defending a Traffic Citation with Request for Trial by Written Declaration
(The content of this blog post is relevant only for those people who live in the parts of the United States where 'Request for Trial by Written Declaration' is an option for the defendants)
Got a citation for any kind of traffic infraction? If you plan to plead guilty or to defend yourself in the traffic court, consider sending a Request for Trial by Written Declaration (aka TBD). Trial by written declaration effectively eliminates two trips to the court, and improves the chance of winning the case as long as the defendant does not exhibit any sign of admitting guilt.
The major steps involved in requesting the trial by written declaration are as follows:
- Read the instructions posted for the defendants who are considering 'Trial by Written Declaration'. For example, California folks have to look at the Form TR-200 for the instructions.
- Complete the 'Request for Trial by Written Declaration' form. Again, California drivers can fill out the Form TR-205 to request the trial by written declaration.
- Carefully draft all the facts, evidence etc., that are relevant to the citation under the 'STATEMENT OF FACTS' section in the 'Request for Trial by Written Declaration' form. It is very important that you do NOT plead guilty and do NOT write anything that admits guilt of any sort, implicitly or explicitly. Admitting guilt in any form hurts the chances of winning the case.
- It is required to include the following sentence in the 'STATEMENT OF FACTS' section.
I declare under penalty of perjury under the laws of the State of [INSERT_YOUR_STATE_NAME_HERE] that the foregoing is true and correct.
- Send the bail amount in the form of a check along with the 'Request for Trial by Written Declaration' form, or pay the bail amount over the web or by phone if the traffic court accepts such payments.
- Send the filled-in 'Request for Trial by Written Declaration' form along with the bail amount, if not paid already, at least 5 days prior to the due date (excluding holidays) indicated on the traffic citation. Check the instructions and the citation for the exact deadlines.
- The 'Request for Trial by Written Declaration' form and the bail amount must be sent via 'Certified Mail' with a request for the return receipt.
- Once the request has been sent, it may take a while for the court to review the evidence/facts submitted by the defendant and the patrol officer before they mail the decision. So just relax and wait for the court's decision. That is all there is to it.
It is strongly suggested that you research diligently the various steps involved in fighting a citation for a traffic infraction.