Mandalika's scratchpad

Thursday, February 03, 2005
 
Solaris 9 or later: More performance with Large Pages (MPSS)

Three simple steps that may improve the performance of a native application on Solaris 9 or later:

  1. Run the application and collect trapstat data while the system is under maximum load:
    trapstat -T 10 10

    Check the percentage of time (%tim) spent handling dTLB misses.
    
    
  2. Preload Solaris's mpss.so.1 interposing library and configure the application to use large pages. This can be done by writing a simple wrapper around the invocation of the application.

    You can check the supported page sizes on your machine by typing "pagesize -a". Common page sizes on SPARC systems: 8K (default), 64K, 512K, 4M

    Choose the page size for the application wisely; otherwise a lot of resources may be wasted, degrading the performance of the system.

    Create the wrapper script as follows, or add the following lines (up to export MPSSCFGFILE ..) to your existing script, if you have one:

    #!/bin/ksh
    LD_PRELOAD=mpss.so.1
    MPSSCFGFILE=/tmp/mpsscfg
    MPSSERRFILE=/tmp/mpsserr

    export MPSSCFGFILE MPSSERRFILE LD_PRELOAD

    exec <application name> <args to application>

    Then create a simple configuration for MPSS:

    e.g., if myapp is the name of the application, the following line creates the MPSS config file that lets the application use 4M pages for the heap and 64K pages for the stack (default: 8K pages for both):

    echo "myapp*:4M:64K" > /tmp/mpsscfg

  3. Finally, run the application by executing the wrapper script, collect the trapstat statistics again, and measure the difference in performance.
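
Step 2 can be made a little safer by checking the requested page sizes against what the machine supports before writing the config file. The following is only a minimal sketch and not part of mpss.so.1: the function names (size_to_bytes, is_supported_pagesize, write_mpss_cfg) are hypothetical, and it assumes that "pagesize -a" prints the supported sizes in bytes, one per line.

```shell
# Hypothetical helper: convert a size such as 8K, 512K or 4M to bytes.
size_to_bytes() {
    case "$1" in
        *[Kk]) echo $(( ${1%[Kk]} * 1024 )) ;;
        *[Mm]) echo $(( ${1%[Mm]} * 1024 * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}

# Hypothetical helper: succeed only if page size $1 is present in $2,
# a whitespace-separated list of supported sizes in bytes
# (assumption: this is what "pagesize -a" prints).
is_supported_pagesize() {
    want=$(size_to_bytes "$1")
    for have in $2; do
        [ "$have" = "$want" ] && return 0
    done
    return 1
}

# Write one config line in the form used in this post:
#   <executable pattern>:<heap page size>:<stack page size>
# Arguments: app name, heap page size, stack page size, config file,
# list of supported sizes in bytes.
write_mpss_cfg() {
    app=$1; heap=$2; stack=$3; cfg=$4; supported=$5
    for size in "$heap" "$stack"; do
        is_supported_pagesize "$size" "$supported" || {
            echo "unsupported page size: $size" >&2
            return 1
        }
    done
    echo "${app}*:${heap}:${stack}" > "$cfg"
}

# Possible usage on Solaris:
#   write_mpss_cfg myapp 4M 64K /tmp/mpsscfg "$(pagesize -a)"
```

If a requested size is not supported, the sketch refuses to write the file instead of leaving a setting in place that cannot be honoured as intended.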

With the help of large pages, the application's performance may improve due to the reduced number of dTLB misses.

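e.g., the following data was collected while running Siebel with the default 8K pages on Sun's v480 server. Each ttl line shows, in order: itlb-miss, %tim, itsb-miss, %tim | dtlb-miss, %tim, dtsb-miss, %tim | total %tim.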

sdcv480s002:/export/home/sunperf/perf_tools/%grep ttl trapstat-vanilla.txt
ttl | 918305 5.2 9363 0.4 | 1148524 8.0 66553 3.2 |16.7
ttl | 990784 5.6 9888 0.4 | 1202256 8.4 67298 3.3 |17.6
ttl | 960221 5.4 9764 0.4 | 1192122 8.3 68607 3.3 |17.5
ttl | 982697 5.6 9934 0.4 | 1232264 8.5 69221 3.3 |17.8
ttl | 1007827 5.7 10295 0.4 | 1273141 8.8 72519 3.5 |18.5
ttl | 1011441 5.7 10031 0.4 | 1222785 8.5 69450 3.4 |18.1
ttl | 961155 5.4 9469 0.4 | 1191395 8.2 65668 3.2 |17.2
ttl | 1019467 5.8 11088 0.5 | 1265553 8.9 77352 3.8 |18.9
ttl | 1009262 5.7 10638 0.4 | 1276510 8.9 74925 3.6 |18.7
ttl | 1021536 5.8 10554 0.4 | 1280768 8.9 72188 3.5 |18.6

The following data shows the reduced number of dTLB and dTSB misses with 4M pages:

sdcv480s002:/export/home/sunperf/perf_tools/%grep ttl 771mpss-trapstat.txt
ttl | 1497319 8.4 1082 0.0 | 131236 1.1 2577 0.1 | 9.7
ttl | 1305635 7.4 1020 0.0 | 117982 1.0 2483 0.1 | 8.5
ttl | 1626490 9.2 1028 0.0 | 145789 1.2 2754 0.1 |10.6
ttl | 1424718 8.0 1063 0.0 | 130317 1.1 2665 0.1 | 9.3
ttl | 1411515 8.0 982 0.0 | 126710 1.1 2532 0.1 | 9.2
ttl | 1443108 8.1 925 0.0 | 128753 1.1 2577 0.1 | 9.4
ttl | 1512549 8.5 1037 0.0 | 131343 1.1 2518 0.1 | 9.8
ttl | 1246909 7.0 834 0.0 | 107194 0.9 2299 0.1 | 8.1
ttl | 1387027 7.8 1004 0.0 | 126112 1.1 2636 0.1 | 9.1
ttl | 1477135 8.3 989 0.0 | 126294 1.1 2477 0.1 | 9.6

With 4M pages, the application performed nearly 7% better than in the vanilla run.
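
To compare two runs without eyeballing the rows, the ttl lines can be averaged. This is a rough sketch: the function name ttl_averages is hypothetical, and it assumes the column layout of the ttl lines shown above, where the second "|"-separated group holds dtlb-miss, %tim, dtsb-miss, %tim and the last group holds the total %tim.

```shell
# Hypothetical helper: average the dTLB %tim, dTSB %tim and total %tim
# columns over all "ttl" lines of a saved trapstat report ($1).
ttl_averages() {
    awk -F'|' '
        /^ *ttl/ {
            # $3 is the data-side group: dtlb-miss %tim dtsb-miss %tim
            split($3, d, " ")
            dtlb  += d[2]
            dtsb  += d[4]
            total += $4        # last group: total %tim
            n++
        }
        END {
            if (n) printf "dtlb=%.2f dtsb=%.2f total=%.2f\n", dtlb/n, dtsb/n, total/n
        }' "$1"
}

# Possible usage:
#   ttl_averages trapstat-vanilla.txt
#   ttl_averages 771mpss-trapstat.txt
```

For the ten ttl lines of each run shown above, this works out to dtlb=8.54 dtsb=3.41 total=17.96 with 8K pages and dtlb=1.08 dtsb=0.10 total=9.33 with 4M pages.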

