Mandalika's scratchpad | [ Work blog @Oracle | My Music Compositions ] |
[...] -xtarget option with any value other than generic (the default) in building their software. Perhaps they are not aware of the benefits of specifying the target platform, or are not spending enough time experimenting with different values to compare the performance.
[...] -xtarget option, and the target instruction set architecture with the -xarch option, for better performance. I believe one of the major concerns for software vendors in specifying the target platform is the suspicion that the application may not run on a wide range of platforms. While that is true to some extent, it is still possible to specify a value for the target platform if we know that all the supported architectures are compatible with the one we specify with the -xarch option.
[...] -xtarget=ultra3 -xarch=v8plusa
[...] -xtarget=ultra3, -xarch=v8plusa options in building the application. -xarch=v8plusa selects an instruction set that is okay for all the members of the UltraSPARC family (US-I, II, III, III+, IV, IV+, and T1, code named Niagara). -xchip=ultra3 tells the optimizer to optimize for best execution on US-III and later systems. The code will run well on US-I and US-II boxes, but possibly a little slower than if it were optimized for them.
[...] -xtarget=generic -xarch=v8plusa for the past few years. Their application supports only the UltraSPARC platform. So, recently I experimented with their application by building it with -xtarget=ultra3 -xarch=v8plusa on a US-IV machine. When the application was run on a US-III box with a moderate workload, (not so surprisingly) the run-time performance of the application improved by ~2.5% compared to the numbers from the -xtarget=generic -xarch=v8plusa build. Of course, there is no performance regression on a US-II box, where the performance is comparable to the vanilla build, i.e., the one built with the -xtarget=generic -xarch=v8plusa options; also, the performance gains on a US-IV box are comparable to the gains on a US-III box.
[...] -xtarget=ultra3 -xarch=v8plusa combination; and the next version of their application is being built with those options.
[...] -xtarget=ultra3, if there is heavy use of the Sun performance library. In that case you really need to have specific, separate builds for all the target platforms, because there is no single optimized perflib available that is suitable for all architectures.
The target platform specifies the processor that the application is expected to run on, the minimum processor that is required, and whether the application is 32-bit or 64-bit. For compiler versions prior to the SunStudio 9 release, the compiler specified a generic processor; SunStudio 9 compilers target an UltraSPARC processor for the SPARC architecture, and a generic x86 based processor for the x86 architecture. In all cases it is best to explicitly specify the target processor, since it is possible in some cases for the target processor to depend on the hardware upon which the application is built.
There are a number of compiler flags that specify the target. The flag -xtarget sets all the other flags to appropriate default values for the given target processor: -xarch, -xchip, and -xcache. The flag -xarch sets the instruction set that the processor supports; the flag -xchip specifies how the compiler should use these instructions. Finally, the flag -xcache specifies the structure of the caches for this target (however, this flag may not have any impact for many codes). As with all compiler flags, the order is important: flags accumulate from left to right, and in the event of conflicting settings the flag on the right overrides the values of flags specified earlier on the command line.
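The left-to-right rule can be illustrated with a small Python model. This is only a sketch and not how the compiler is implemented: the function name resolve is made up for the example, and the values that -xtarget=ultra3 expands to are assumptions (the cache entry is a plain placeholder), so check the compiler documentation for the real defaults of your release.

```python
# Illustrative model only; NOT the compiler's real logic.
# ASSUMPTION: the values implied by -xtarget=ultra3 below are for
# illustration, and the xcache entry is a placeholder string.
XTARGET_EXPANSION = {
    "ultra3": {
        "xarch": "v8plusa",
        "xchip": "ultra3",
        "xcache": "<ultra3 default>",
    },
}


def resolve(flags):
    """Apply flags left to right; a later flag overrides an earlier one."""
    settings = {}
    for flag in flags:
        name, _, value = flag.lstrip("-").partition("=")
        if name == "xtarget":
            # -xtarget sets xarch, xchip and xcache in one go.
            settings.update(XTARGET_EXPANSION[value])
        elif name in ("xarch", "xchip", "xcache"):
            settings[name] = value
        else:
            raise ValueError(f"flag not modelled in this sketch: {flag}")
    return settings


# -xarch comes last, so it overrides the value implied by -xtarget.
print(resolve(["-xtarget=ultra3", "-xarch=v9a"]))

# -xtarget comes last, so it resets xarch to its own default.
print(resolve(["-xarch=v9a", "-xtarget=ultra3"]))
```

In this model the first call ends with xarch set to v9a, while the second ends with v8plusa, which is why -xtarget is normally written first and the more specific flags after it.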
A point to be cautious of is that specifying a more recent hardware target may mean that older hardware is no longer able to run the application. In particular, specifying the target as an UltraSPARC platform means that the application will no longer run on pre-UltraSPARC processors (however, UltraSPARC processors have been shipping for over 10 years). Similarly, specifying an Opteron processor means that the code will no longer run on x86-compatible processors that do not have the SSE2 instruction set extensions.
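Before picking a target, it can help to check it against the machines you still have to support. The Python sketch below is a simplified illustration: the capability labels ("ultrasparc", "sse2"), the machine list, and the function name left_behind are all hypothetical names chosen for the example, not anything the compiler provides.

```python
# Simplified, hypothetical capability table for illustration only.
# "ultrasparc" stands for "is a member of the UltraSPARC family";
# "sse2" stands for "has the SSE2 instruction set extensions".
HARDWARE = {
    "pre-UltraSPARC SPARC": set(),
    "UltraSPARC II": {"ultrasparc"},
    "UltraSPARC III": {"ultrasparc"},
    "Pentium Pro": set(),
    "Pentium 4": {"sse2"},
    "Opteron": {"sse2"},
}


def left_behind(required_feature, fleet):
    """Return the machines in `fleet` that lack `required_feature`."""
    return [m for m in fleet if required_feature not in HARDWARE[m]]


# An Opteron-targeted build needs SSE2.
print(left_behind("sse2", ["Pentium Pro", "Pentium 4", "Opteron"]))

# An UltraSPARC-targeted build needs an UltraSPARC processor.
print(left_behind("ultrasparc", ["UltraSPARC II", "UltraSPARC III"]))
```

An empty result means that, within this simplified model, every supported machine can still run the build.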
For UltraSPARC processors, a generally good option pair to use is -xtarget=ultra3 with -xarch=v8plusa. These options allow the compiler to generate 32-bit code that can run on all the members of the UltraSPARC family and their follow-ons (UltraSPARC I, UltraSPARC II, UltraSPARC III, UltraSPARC IV). The compiler will also schedule the code especially for the UltraSPARC III. These options represent a good compromise, since code scheduled for the UltraSPARC III is better at taking advantage of the new features of the UltraSPARC III architecture, while still providing good performance on previous generations of processors.
If the application requires the capability to address 64-bit memory addresses, then the appropriate flags to use are -xtarget=ultra3 -xarch=v9a, which add 64-bit addressing whilst still targeting all the members of the UltraSPARC family of processors.
32-bit code | -xtarget=ultra3 -xarch=v8plusa |
64-bit code | -xtarget=ultra3 -xarch=v9a |
By default the compiler targets a 32-bit generic x86 based processor, so the code will run on any x86 processor from a Pentium Pro up to an AMD Opteron architecture. Whilst this produces code that can run over the widest range of processors, it does not take advantage of the extensions offered by the Opteron family of processors. Consequently, it is recommended that 32-bit code target the Opteron processor; this will generate code that runs on processors (such as the Pentium 4 and Opteron) which support the SSE2 instruction set extensions.
To take advantage of the x64 processor family and the advantages of 64-bit code, the appropriate compiler flags are -xtarget=opteron -xarch=amd64.
32-bit code | -xtarget=opteron |
64-bit code | -xtarget=opteron -xarch=amd64 |
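The two recommendation tables (SPARC and x86) can be folded into one lookup that a build script consults. The Python sketch below is just one way to do that, under assumptions: the function name compile_command, the platform labels "sparc" and "x86", and the file names app.c and app are hypothetical, and the compiler driver is assumed to be invoked as cc.

```python
# Recommended flags, copied from the two tables in the text.
RECOMMENDED = {
    ("sparc", 32): ["-xtarget=ultra3", "-xarch=v8plusa"],
    ("sparc", 64): ["-xtarget=ultra3", "-xarch=v9a"],
    ("x86", 32): ["-xtarget=opteron"],
    ("x86", 64): ["-xtarget=opteron", "-xarch=amd64"],
}


def compile_command(platform, bits, source, output, compiler="cc"):
    """Build the argument list for one compile; nothing is executed."""
    flags = RECOMMENDED[(platform, bits)]
    # -xtarget stays first so the explicit -xarch after it takes effect.
    return [compiler, *flags, "-o", output, source]


# Hypothetical file names, for illustration only.
print(" ".join(compile_command("sparc", 64, "app.c", "app")))
print(" ".join(compile_command("x86", 32, "app.c", "app")))
```

Keeping the flags in one table makes it harder for the 32-bit and 64-bit builds to drift apart when the target is changed later.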
-xtarget=generic
The compiler also supports the options -xtarget=generic and -xtarget=generic64. These options tell the compiler to produce code which runs well on as wide a range of machines as possible. One feature of these flags is that they will be interpreted appropriately on both the SPARC and x64 platforms, so using them may mean fewer changes to makefile flags. The following table shows how the compiler will interpret these flags on both the SPARC and x64 platforms.
Flag | SPARC | x64 |
-xtarget=generic | V8plus architecture | 386 architecture |
-xtarget=generic64 | V9 architecture | AMD64 architecture |
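To make the "same flag, different meaning per platform" point concrete, the table above can be written as a small Python lookup. This is only a restatement of the table for illustration; the function name interpreted_as and the platform keys "sparc" and "x64" are hypothetical.

```python
# Direct transcription of the table above; nothing here queries a compiler.
GENERIC_TARGETS = {
    "-xtarget=generic": {
        "sparc": "V8plus architecture",
        "x64": "386 architecture",
    },
    "-xtarget=generic64": {
        "sparc": "V9 architecture",
        "x64": "AMD64 architecture",
    },
}


def interpreted_as(flag, platform):
    """Return how the given generic flag is interpreted on a platform."""
    return GENERIC_TARGETS[flag][platform]


# One flag in the makefile, two platform-specific interpretations.
for platform in ("sparc", "x64"):
    print(platform, "->", interpreted_as("-xtarget=generic64", platform))
```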