Three simple steps that can improve the performance of a native application on Solaris 9 or later:
- Run the application and collect trapstat data while the system is under maximum load:
trapstat -T 10 10
Check the percentage of time (%tim) spent on dTLB misses.
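Reading the %tim column by eye across many samples is tedious, so the check above can be sketched as a small shell function. This is only a sketch: the function name dtlb_time_avg is made up for illustration, and it assumes the trapstat output was saved to a file and that each "ttl" row keeps the pipe-separated layout shown in the sample data in this post. On Solaris you may need nawk or /usr/xpg4/bin/awk in place of awk.

```shell
# dtlb_time_avg <file>: mean dTLB-miss %tim over the "ttl" rows of a saved
# trapstat capture. Hypothetical helper, not part of Solaris.
dtlb_time_avg() {
    awk -F'|' '
        $1 ~ /ttl/ {
            # Third pipe-separated section: dtlb-miss %tim dtsb-miss %tim
            split($3, d, " ")
            sum += d[2]
            n++
        }
        END { if (n) printf "%.1f\n", sum / n }
    ' "$1"
}

# Intended use on Solaris (not run here):
#   trapstat -T 10 10 > trapstat-vanilla.txt
#   dtlb_time_avg trapstat-vanilla.txt
```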
- Preload the Solaris interposing library mpss.so.1 and configure the application to use large pages. This can be done with a simple wrapper around the invocation of the application.
You can check the supported page sizes on your machine by typing "pagesize -a". Common page sizes on SPARC systems: 8K (default), 64K, 512K, 4M
Choose the page size for the application wisely; otherwise a lot of resources may be wasted, degrading the performance of the system.
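Before settling on a page size, it may help to confirm that the machine actually supports it. The sketch below assumes that "pagesize -a" prints one size in bytes per line; the function names size_to_bytes and page_size_supported are made up for illustration.

```shell
# size_to_bytes <size>: convert a size such as 8K, 512K or 4M to bytes.
size_to_bytes() {
    case "$1" in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

# page_size_supported <size> <bytes>...: succeed if <size> matches one of
# the byte counts given, e.g. the output of "pagesize -a".
page_size_supported() {
    want=$(size_to_bytes "$1")
    shift
    for have in "$@"; do
        [ "$have" = "$want" ] && return 0
    done
    return 1
}

# Intended use on Solaris (not run here):
#   page_size_supported 4M $(pagesize -a) || echo "4M pages not supported" >&2
```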
Create the wrapper script as follows, or add the following lines (up to export MPSSCFGFILE ...) to your existing script, if you have one:
#!/bin/ksh
LD_PRELOAD=mpss.so.1
MPSSCFGFILE=/tmp/mpsscfg
MPSSERRFILE=/tmp/mpsserr
export MPSSCFGFILE MPSSERRFILE LD_PRELOAD
exec <application name> <args to application>
Then create a simple configuration file for MPSS.
For example, if myapp is the name of the application, the following line creates an MPSS config file that lets the application use 4M pages for the heap (default: 8K pages) and 64K pages for the stack (default: 8K pages):
echo "myapp*:4M:64K" > /tmp/mpsscfg
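If several applications share one config file, a small helper can keep the entries consistent. This is a sketch under assumptions: mpss_cfg_add is a made-up name, the entry layout is simply the pattern:heap:stack form used in the line above, and the size check is deliberately loose (it only rejects characters other than digits and K, M, G).

```shell
# mpss_cfg_add <file> <exec-pattern> <heap-page-size> <stack-page-size>
# Hypothetical helper: append one entry to an MPSS config file.
mpss_cfg_add() {
    cfg=$1
    pattern=$2
    heap=$3
    stack=$4
    for size in "$heap" "$stack"; do
        case "$size" in
            ""|*[!0-9KMG]*)
                echo "bad page size: $size" >&2
                return 1
                ;;
        esac
    done
    printf '%s:%s:%s\n' "$pattern" "$heap" "$stack" >> "$cfg"
}

# Example (same entry as above):
#   mpss_cfg_add /tmp/mpsscfg 'myapp*' 4M 64K
```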
- Finally, run the application by executing the wrapper script, collect the trapstat statistics again, and measure the difference in performance.
With large pages, the application's performance may improve due to the reduced number of dTLB misses.
For example, the following data was collected by running Siebel with the default 8K pages on Sun's v480 server. Each ttl row lists, in order, iTLB misses and %time, iTSB misses and %time, dTLB misses and %time, dTSB misses and %time, and the total %time:
sdcv480s002:/export/home/sunperf/perf_tools/%grep ttl trapstat-vanilla.txt
ttl | 918305 5.2 9363 0.4 | 1148524 8.0 66553 3.2 |16.7
ttl | 990784 5.6 9888 0.4 | 1202256 8.4 67298 3.3 |17.6
ttl | 960221 5.4 9764 0.4 | 1192122 8.3 68607 3.3 |17.5
ttl | 982697 5.6 9934 0.4 | 1232264 8.5 69221 3.3 |17.8
ttl | 1007827 5.7 10295 0.4 | 1273141 8.8 72519 3.5 |18.5
ttl | 1011441 5.7 10031 0.4 | 1222785 8.5 69450 3.4 |18.1
ttl | 961155 5.4 9469 0.4 | 1191395 8.2 65668 3.2 |17.2
ttl | 1019467 5.8 11088 0.5 | 1265553 8.9 77352 3.8 |18.9
ttl | 1009262 5.7 10638 0.4 | 1276510 8.9 74925 3.6 |18.7
ttl | 1021536 5.8 10554 0.4 | 1280768 8.9 72188 3.5 |18.6
The following data shows the reduced number of dTLB and dTSB misses with 4M pages. The time spent on dTLB misses drops from roughly 8-9% to about 1%:
sdcv480s002:/export/home/sunperf/perf_tools/%grep ttl 771mpss-trapstat.txt
ttl | 1497319 8.4 1082 0.0 | 131236 1.1 2577 0.1 | 9.7
ttl | 1305635 7.4 1020 0.0 | 117982 1.0 2483 0.1 | 8.5
ttl | 1626490 9.2 1028 0.0 | 145789 1.2 2754 0.1 |10.6
ttl | 1424718 8.0 1063 0.0 | 130317 1.1 2665 0.1 | 9.3
ttl | 1411515 8.0 982 0.0 | 126710 1.1 2532 0.1 | 9.2
ttl | 1443108 8.1 925 0.0 | 128753 1.1 2577 0.1 | 9.4
ttl | 1512549 8.5 1037 0.0 | 131343 1.1 2518 0.1 | 9.8
ttl | 1246909 7.0 834 0.0 | 107194 0.9 2299 0.1 | 8.1
ttl | 1387027 7.8 1004 0.0 | 126112 1.1 2636 0.1 | 9.1
ttl | 1477135 8.3 989 0.0 | 126294 1.1 2477 0.1 | 9.6
With 4M pages, the application performed nearly 7% better than in the vanilla run.
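To put a number on the drop in dTLB misses between two such captures, something like the following sketch could be used. The names dtlb_miss_avg and dtlb_reduction are made up for illustration, and the script assumes both files contain "ttl" rows in the pipe-separated layout shown above. On Solaris you may need nawk or /usr/xpg4/bin/awk in place of awk.

```shell
# dtlb_miss_avg <file>: mean dtlb-miss count over the "ttl" rows of a capture.
dtlb_miss_avg() {
    awk -F'|' '
        $1 ~ /ttl/ { split($3, d, " "); sum += d[1]; n++ }
        END { if (n) printf "%.0f\n", sum / n }
    ' "$1"
}

# dtlb_reduction <before-file> <after-file>: percentage drop in the mean
# dtlb-miss count from the first capture to the second.
dtlb_reduction() {
    before=$(dtlb_miss_avg "$1")
    after=$(dtlb_miss_avg "$2")
    [ -n "$before" ] && [ -n "$after" ] || return 1
    awk -v b="$before" -v a="$after" \
        'BEGIN { if (b > 0) printf "%.1f\n", (1 - a / b) * 100 }'
}

# Example with the two captures used in this post:
#   dtlb_reduction trapstat-vanilla.txt 771mpss-trapstat.txt
```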