Mandalika's scratchpad [ Work blog @Oracle | My Music Compositions ]

Sunday, December 04, 2005
Sun Studio C/C++: Improve performance with -xtarget, -xarch

Even though many software vendors no longer support the SPARC v8 architecture (i.e., the pre-UltraSPARC era), for some reason they hesitate to build their software with any -xtarget value other than generic (the default). Perhaps they are not aware of the benefits of specifying the target platform, or they have not spent enough time experimenting with different values to compare the performance.

In general, it is recommended to specify the target platform with the -xtarget option, and the target instruction set architecture with the -xarch option, for better performance. I believe one of the major concerns for software vendors in specifying the target platform is the suspicion that the application may not run on a wide range of platforms. While that is true to some extent, we can still specify a value for the target platform if we know that all the supported architectures are compatible with the one we specify with the -xarch option.

32-bit SPARC applications, and -xtarget=ultra3 -xarch=v8plusa

For example, for a 32-bit application, if we know for sure that the only supported architecture is the UltraSPARC chip family, it is strongly recommended to build the application with the -xtarget=ultra3 -xarch=v8plusa options. -xarch=v8plusa selects an instruction set that is supported by all the members of the UltraSPARC family (US-I, II, III, III+, IV, IV+, T1 (code named Niagara)). -xtarget=ultra3 implies -xchip=ultra3, which tells the optimizer to optimize for best execution on US-III and later systems. The code will run well on US-I and US-II boxes, but possibly a little slower than if it had been optimized for them.
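
As a rough sketch of what such a build line might look like, the snippet below assembles a compile command from the two options discussed above. The file names app.c and app are hypothetical, and the command is only printed rather than executed, so the snippet can be tried on a machine without Sun Studio installed.

```shell
#!/bin/sh
# Flags for a 32-bit build that runs on any UltraSPARC system
# and is scheduled for UltraSPARC III and later.
TARGET_FLAGS="-xtarget=ultra3 -xarch=v8plusa"

# app.c and app are hypothetical names used only for illustration.
BUILD_CMD="cc $TARGET_FLAGS -o app app.c"

# Print the command instead of running it.
echo "$BUILD_CMD"
```

Keeping the target flags in one variable makes it easy to rebuild with -xtarget=generic and compare the two builds.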

Performance improvement from a real world application

One of our partners (an ISV, in short) has been shipping their product built with -xtarget=generic -xarch=v8plusa for the past few years. Their application supports only the UltraSPARC platform. So I recently experimented with their application by building it with -xtarget=ultra3 -xarch=v8plusa on a US-IV machine. When the application was run on a US-III box with a moderate workload, (not so surprisingly) its run-time performance improved by ~2.5% compared to the numbers from the -xtarget=generic -xarch=v8plusa build. There was no performance regression on a US-II box, where the performance was comparable to the vanilla build, i.e., the one built with -xtarget=generic -xarch=v8plusa; and the performance gains on a US-IV box were comparable to the gains on a US-III box.

These experiments gave the ISV enough confidence to go with the -xtarget=ultra3 -xarch=v8plusa combination, and the next version of their application is being built with those options.

Do not use -xtarget=ultra3 if there is heavy use of the Sun Performance Library. In that case you really need separate builds for each target platform, because there is no single optimized perflib that is suitable for all architectures.
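
One way to organize such per-platform builds is sketched below. It assumes the Performance Library is linked with -xlic_lib=sunperf in your compiler release. The target list and the names solver.c and solver-<target> are hypothetical. The commands are printed, not run.

```shell
#!/bin/sh
# Print one build command per target platform, each linking the
# Sun Performance Library for that platform.
# solver.c and solver-<target> are hypothetical names.
perflib_builds() {
    for target in "$@"; do
        echo "cc -xtarget=$target -xlic_lib=sunperf -o solver-$target solver.c"
    done
}

# Example target list; adjust to the platforms you actually support.
perflib_builds ultra2 ultra3
```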

Excerpts from Darryl Gove's Selecting the Best Compiler Options article

Darryl Gove, a senior performance engineer at Sun Microsystems, recently posted an article about selecting the best compiler options to improve the run-time performance of applications. Since it has a ton of information about 32/64-bit applications on UltraSPARC and x64/x86 platforms, I thought of copying the relevant information here (for completeness), instead of just pointing to the article.

Specify the Target Platform and Architecture as Explicitly as Possible

The target platform specifies the processor that the application is expected to run on, the minimum processor that is required, and whether the application is 32-bit or 64-bit. For compiler versions prior to the SunStudio 9 release, the compiler specified a generic processor; SunStudio 9 compilers target an UltraSPARC processor for the SPARC architecture, and a generic x86 based processor for the x86 architecture. In all cases it is best to explicitly specify the target processor, since it is possible in some cases for the target processor to depend on the hardware upon which the application is built.

There are a number of compiler flags that specify the target. The flag -xtarget sets all the other flags to appropriate default values for the given target processor: -xarch, -xchip, and -xcache. The flag -xarch sets the instruction set that the processor supports, the flag -xchip specifies how the compiler should use these instructions. Finally the flag -xcache specifies the structure of the caches for this target (however this flag may not have any impact for many codes). As with all compiler flags, the order is important; flags accumulate from left to right, in the event that there are conflicting settings the flag on the right will override the values of flags which were specified earlier on the command line.
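
To illustrate the left-to-right rule, here is a toy model (not the compiler's actual logic) that reports which -xarch value ends up in effect. The -xtarget expansions in the function are assumptions for illustration only; check the compiler man page for the real values in your release.

```shell
#!/bin/sh
# Toy model of how target flags accumulate from left to right.
# Prints the -xarch value in effect after all flags have been read.
effective_arch() {
    arch=generic
    for flag in "$@"; do
        case "$flag" in
            -xtarget=ultra3)  arch=v8plusa ;;          # assumed expansion
            -xtarget=generic) arch=generic ;;          # assumed expansion
            -xarch=*)         arch=${flag#-xarch=} ;;  # explicit setting
        esac
    done
    echo "$arch"
}

effective_arch -xtarget=ultra3 -xarch=v9a   # prints v9a
effective_arch -xarch=v9a -xtarget=ultra3   # prints v8plusa
```

The second call shows the pitfall: putting -xtarget after -xarch silently replaces the architecture that was asked for, so -xtarget should normally come first.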

A point to be cautious of is that specifying a more recent hardware target may mean that older hardware is no longer able to run the application. In particular specifying the target as being an UltraSPARC platform means that the application will no longer run on pre-UltraSPARC processors (however UltraSPARC processors have been shipping for over 10 years). Similarly specifying an Opteron processor will mean that the code no longer runs on x86-compatible processors that do not have the SSE2 instruction set extensions.
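
A launcher script could guard against this by checking the host's instruction sets before starting the binary. The sketch below assumes the Solaris isalist command as the source of the list, and assumes that sparcv8plus+vis is the name corresponding to -xarch=v8plusa. The sample lists are made up for illustration, not captured from real machines.

```shell
#!/bin/sh
# Return success if the wanted instruction set appears among the
# remaining arguments (for example, the words printed by isalist).
supports_isa() {
    wanted=$1
    shift
    for isa in "$@"; do
        [ "$isa" = "$wanted" ] && return 0
    done
    return 1
}

# Sample lists, assumed for illustration only.
ULTRASPARC_ISAS="sparcv9+vis sparcv9 sparcv8plus+vis sparcv8plus sparcv8 sparc"
PRE_ULTRASPARC_ISAS="sparcv8 sparcv8-fsmuld sparcv7 sparc"

# The lists are left unquoted on purpose so they split into words.
supports_isa "sparcv8plus+vis" $ULTRASPARC_ISAS && echo "v8plusa build can run"
supports_isa "sparcv8plus+vis" $PRE_ULTRASPARC_ISAS || echo "v8plusa build cannot run"
```

On a real Solaris host the call would look like supports_isa "sparcv8plus+vis" $(isalist).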

Specifying the target platform for the UltraSPARC processor family

For UltraSPARC processors, a generally good option pair to use is -xtarget=ultra3 with -xarch=v8plusa. These options allow the compiler to generate 32-bit code that can run on all the members of the UltraSPARC family and their follow-ons (UltraSPARC I, UltraSPARC II, UltraSPARC III, UltraSPARC IV). The compiler will also schedule the code especially for the UltraSPARC III. These options represent a good compromise, since code scheduled for the UltraSPARC III is better at taking advantage of the new features of the UltraSPARC III architecture, while still providing good performance on previous generations of processors.

If the application requires the capability to address 64-bit memory addresses, then the appropriate flags to use are -xtarget=ultra3 -xarch=v9a which adds 64-bit addressing whilst still targeting all the members of the UltraSPARC family of processors.

Recommended compiler flags for the UltraSPARC platform
32-bit code: -xtarget=ultra3 -xarch=v8plusa
64-bit code: -xtarget=ultra3 -xarch=v9a

Specifying the target processor for the x64 processor family

By default the compiler targets a 32-bit generic x86 based processor, so the code will run on any x86 processor from a Pentium Pro up to an AMD Opteron architecture. Whilst this produces code that can run over the widest range of processors, this does not take advantage of the extensions offered by the Opteron family of processors. Consequently it is recommended that for 32-bit code the Opteron processor is targeted; this will generate code that will run on processors (such as the Pentium 4 and Opteron) which support the SSE2 instruction set extensions.

To take advantage of the x64 processor family and the advantages of 64-bit code, the appropriate compiler flags are -xtarget=opteron -xarch=amd64.

Recommended compiler flags for the x64 platform
32-bit code: -xtarget=opteron
64-bit code: -xtarget=opteron -xarch=amd64
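
The recommendations for the UltraSPARC and x64 platforms can be folded into a small helper for a build script. This is a minimal sketch: the function name recommended_flags is hypothetical, and the platform names sparc and i386 are assumed to match what uname -p prints on the build host.

```shell
#!/bin/sh
# Map a platform name and word size to the recommended target flags.
# Platform names are assumed to follow `uname -p` (sparc, i386).
recommended_flags() {
    case "$1:$2" in
        sparc:32) echo "-xtarget=ultra3 -xarch=v8plusa" ;;
        sparc:64) echo "-xtarget=ultra3 -xarch=v9a" ;;
        i386:32)  echo "-xtarget=opteron" ;;
        i386:64)  echo "-xtarget=opteron -xarch=amd64" ;;
        *)        echo "unknown platform or word size: $1 $2" >&2
                  return 1 ;;
    esac
}

recommended_flags sparc 64
recommended_flags i386 32
```

In a build script this might be called as recommended_flags "$(uname -p)" 32, with the result passed to cc.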

Using -xtarget=generic

The compiler also supports the options -xtarget=generic and -xtarget=generic64. These options tell the compiler to produce code which runs well on as wide a range of machines as possible. One feature of these flags is that they will be interpreted appropriately on both the SPARC and x64 platforms -- so using them may mean fewer changes to makefile flags. The following table shows how the compiler will interpret the -xtarget=generic flag on both the SPARC and x64 platforms.

Flag                  SPARC                  x64
-xtarget=generic      V8plus architecture    386 architecture
-xtarget=generic64    V9 architecture        AMD64 architecture

Darryl Gove, Sun Product Technical Support JSE EMEA