Mandalika's scratchpad | [ Work blog @Oracle | My Music Compositions ] |
Old Posts: 09.04 10.04 11.04 12.04 01.05 02.05 03.05 04.05 05.05 06.05 07.05 08.05 09.05 10.05 11.05 12.05 01.06 02.06 03.06 04.06 05.06 06.06 07.06 08.06 09.06 10.06 11.06 12.06 01.07 02.07 03.07 04.07 05.07 06.07 08.07 09.07 10.07 11.07 12.07 01.08 02.08 03.08 04.08 05.08 06.08 07.08 08.08 09.08 10.08 11.08 12.08 01.09 02.09 03.09 04.09 05.09 06.09 07.09 08.09 09.09 10.09 11.09 12.09 01.10 02.10 03.10 04.10 05.10 06.10 07.10 08.10 09.10 10.10 11.10 12.10 01.11 02.11 03.11 04.11 05.11 07.11 08.11 09.11 10.11 11.11 12.11 01.12 02.12 03.12 04.12 05.12 06.12 07.12 08.12 09.12 10.12 11.12 12.12 01.13 02.13 03.13 04.13 05.13 06.13 07.13 08.13 09.13 10.13 11.13 12.13 01.14 02.14 03.14 04.14 05.14 06.14 07.14 09.14 10.14 11.14 12.14 01.15 02.15 03.15 04.15 06.15 09.15 12.15 01.16 03.16 04.16 05.16 06.16 07.16 08.16 09.16 12.16 01.17 02.17 03.17 04.17 06.17 07.17 08.17 09.17 10.17 12.17 01.18 02.18 03.18 04.18 05.18 06.18 07.18 08.18 09.18 11.18 12.18 01.19 02.19 05.19 06.19 08.19 10.19 11.19 05.20 10.20 11.20 12.20 09.21 11.21 12.22
LD_NOEXEC_64
to any non-zero value. If the link process doesn't need more than 4G virtual address limit of the 32-bit mode, LD_NOEXEC_64
env. variable suppresses the automatic execution of the 64-bit link-editor (ld
). Note that in 64-bit mode, the linker has to deal with pointers that take twice as much room as their 32-bit counterparts; hence the memory consumption and the link time will be enormously high {compared to 32-bit linking}.ld(1)
:% setenv LD_NOEXEC_64 1
%
make or any custom script for linkingpmap
tool can be used to do capacity planning. pmap
prints information about the virtual address space of a process. In other words, pmap
output represents snapshot of a running process.
Total private memory + (Total shared memory/#instances)
-------------------------------------------------------
(Total load)
#instances
is the actual number of instances of the program/application concurrently running. It is not uncommon for large applications like Oracle, Siebel to fork multiple instances of a server, to improve concurrency.Total load
is the number of concurrent users connected to the application. In multi-threaded server applications, this should be the sum of all users connected to the forked processes of the server.Total private memory
is the total memory used exclusively by all the instances of the process.Anon
column of pmap -x <pid>
output.Total shared memory
is the total memory being shared by more than one process. More than one HAT mapping {in pmap
s output} indicates that more than one process is actively sharing this mapping -- shared memory can be calculated using (RSS-Anon
) from pmap
s output.pmap -x <pid>
shows the amount of virtual memory touched by either read or write operation on processes' virtual memory location, and in turn, the number of physical pages brought into memory as a result of the memory touch operation.%cat alloc.c
#include <stdlib.h>
#include <unistd.h>
int main()
{
int *bigArray;
bigArray = (int *) malloc (sizeof(int) * 100000);
sleep (30);
return (0);
}
%./alloc &
[1] 1941
%pmap -x 1941
1941: ./alloc
Address Kbytes RSS Anon Locked Mode Mapped File
00010000 8 8 - - r-x-- alloc
00020000 8 8 8 - rwx-- alloc
00022000 392 8 8 - rwx-- [ heap ]
FF280000 688 688 - - r-x-- libc.so.1
FF33C000 32 32 32 - rwx-- libc.so.1
FF390000 8 8 8 - rwx-- [ anon ]
FF3A0000 8 8 - - r-x-- libc_psr.so.1
FF3B0000 184 184 - - r-x-- ld.so.1
FF3EE000 8 8 8 - rwx-- ld.so.1
FF3F0000 8 8 8 - rwx-- ld.so.1
FF3FA000 8 8 8 - rwx-- libdl.so.1
FFBF6000 40 40 40 - rwx-- [ stack ]
-------- ------- ------- ------- -------
total Kb 1392 1008 120 -
RSS
as the total memory footprint of this process.sleep
is invoked by two users:% sleep 30 &
[1] 2272
% pmap -x 2272
2272: sleep 30
Address Kbytes RSS Anon Locked Mode Mapped File
00010000 8 8 - - r-x-- sleep
00020000 8 8 8 - rwx-- sleep
00022000 8 8 8 - rwx-- [ heap ]
FF280000 688 688 - - r-x-- libc.so.1
FF33C000 32 32 32 - rwx-- libc.so.1
FF390000 8 8 8 - rwx-- [ anon ]
FF3A0000 8 8 - - r-x-- libc_psr.so.1
FF3B0000 184 184 - - r-x-- ld.so.1
FF3EE000 8 8 8 - rwx-- ld.so.1
FF3F0000 8 8 8 - rwx-- ld.so.1
FF3FA000 8 8 8 - rwx-- libdl.so.1
FFBFE000 8 8 8 - rw--- [ stack ]
-------- ------- ------- ------- -------
total Kb 976 976 88 -
% sleep 30 &
[1] 2274
% pmap -x 2274
2274: sleep 30
Address Kbytes RSS Anon Locked Mode Mapped File
00010000 8 8 - - r-x-- sleep
00020000 8 8 8 - rwx-- sleep
00022000 8 8 8 - rwx-- [ heap ]
FF280000 688 688 - - r-x-- libc.so.1
FF33C000 32 32 32 - rwx-- libc.so.1
FF390000 8 8 8 - rwx-- [ anon ]
FF3A0000 8 8 - - r-x-- libc_psr.so.1
FF3B0000 184 184 - - r-x-- ld.so.1
FF3EE000 8 8 8 - rwx-- ld.so.1
FF3F0000 8 8 8 - rwx-- ld.so.1
FF3FA000 8 8 8 - rwx-- libdl.so.1
FFBFE000 8 8 8 - rw--- [ stack ]
-------- ------- ------- ------- -------
total Kb 976 976 88 -
sleep
) = (176 + (1,776/2))/2 = (176 + 888)/2 = 532 KB.% cat memfootprint
#!/bin/bash
#if [ $# -eq 0 ]; then
if [ $# -ne 2 ]; then
echo "Usage: memfootprint <process_name> <total_load>"
exit
fi
count=0
PIDS=`/usr/bin/ps -ef | grep $1 | grep -v "grep $1" | grep -v memfootprint | awk '{ print $2 }'`
for pid in $PIDS
do
echo 'pmap process :' $pid
count=`expr $count + 1`
done
pmap -x $PIDS | grep total | awk 'BEGIN { FS = " " } {print $1,$2,$3,$4,$5} {rss+=$4} {private+=$5} END {print "Total Private mem: "private/1024" M Total RSS mem: "rss/1024" M Total Shared mem: " (rss-private)/1024 "M **** For '$2' user load: Memory Footprint is: "((private/1024)+(((rss-private)/1024)/'$count'))/'$2'" MB/user"}'
% sleep 60&
[9] 3757
% sleep 60&
[4] 3758
% sh +x memfootprint sleep 2
pmap process : 3757
pmap process : 3758
total Kb 976 976 88
total Kb 976 976 88
Total Private mem: 0.171875 M Total RSS mem: 1.90625 M Total Shared mem: 1.73438M
**** For 2 user load: Memory Footprint is: 0.519531 MB/user
% sleep 60&
[1] 3783
% sleep 60&
[2] 3784
% sleep 60&
[3] 3785
% sleep 60&
[4] 3786
% sleep 60&
[5] 3787
% sleep 60&
[6] 3788
% sh +x memfootprint sleep 6
pmap process : 3787
pmap process : 3785
pmap process : 3783
pmap process : 3788
pmap process : 3784
pmap process : 3786
total Kb 976 976 88
total Kb 976 976 88
total Kb 976 976 88
total Kb 976 976 88
total Kb 976 976 88
total Kb 976 976 88
Total Private mem: 0.515625 M Total RSS mem: 5.71875 M Total Shared mem: 5.20312M
**** For 6 user load: Memory Footprint is: 0.230469 MB/user
% memfootprint
#!/bin/sh
if [ $# -lt 2 ]; then
echo "Usage: memfootprint <Pattern> <NumOfUsers>"
exit
fi
WHOAMI=`/usr/ucb/whoami`
PIDS=`/usr/bin/ps -ef | grep $WHOAMI" " | grep $1 | grep -v "grep $1" | grep -v memfootprint | grep -v dog | awk '{ print $2 }'`
for pid in $PIDS
do
echo 'PID:' $pid
done
printf "------------------------------------------------------------------\n"
umask 0
printf "%-10s %-15s %-15s %-15s %-15s\n" PID Kbytes Resident Private Shared
printf "%-10s %-15s %-15s %-15s %-15s\n" "" "" " Kbytes" " Kbytes" Kbytes
printf "%-10s %-15s %-15s %-15s %-15s\n" --- ------ -------- ------- ------
pmap -x $PIDS | grep total | nawk -v Arg1=$2 'BEGIN { FS = " " }
{ printf "%-10s %-15s %-15s %-15s %-15s\n", "_NA_", $3, $4, $5, ""} {rss+=$4} {private+=$5} END {
printf "%-10s %-15s %-15s %-15s %-15s\n", "---", "------", "--------", "-------", "------"
printf "%-10s %-15s %-15s %-15s %-15s\n", "_NA_", "_NA", rss, private, (rss-private)
printf "------------------------------------------------------------------\n"
printf "Number of Users: %-10s\n", Arg1
printf "Per User Memory Footprint: %13.8f Mega Bytes\n", ((private/1024)+(((rss-private)/1024)/NR))/Arg1
}
% sleep 60 &
[1] 4478
% sleep 60 &
[2] 4479
% sleep 60 &
[3] 4480
% sleep 60 &
[4] 4481
% sleep 60 &
[5] 4482
% sleep 60 &
[6] 4483
% ./memfootprint sleep 6
PID: 4480
PID: 4481
PID: 4483
PID: 4479
PID: 4478
PID: 4482
------------------------------------------------------------------
PID Kbytes Resident Private Shared
Kbytes Kbytes Kbytes
--- ------ -------- ------- ------
_NA_ 1352 1304 168
_NA_ 1352 1304 168
_NA_ 1352 1304 168
_NA_ 1352 1304 168
_NA_ 1352 1304 168
_NA_ 1352 1304 168
--- ------ -------- ------- ------
_NA_ _NA 7824 1008 6816
------------------------------------------------------------------
Number of Users: 6
Per User Memory Footprint: 0.34895833 Mega Bytes
Multi-threading lets different tasks to run concurrently in a single process, hence multi-threaded programs would run faster on machines with multiple processors and on CPUs with multiple cores. On an SMP (Symmetric Multi-Processing system, where multiple processors share a single memory system) system with no CMT (Chip Multi-Threading), software threads are executed on different processors; and on an SMP system with CMT, the threads are executed on cores, and logical processors in CMP (Chip Multi-Processing) processors. As revolutionary chip designs are evolving, many important commercial applications like Oracle, SAP, Siebel, PeopleSoft are designed to be multi-threaded.
Debugging a multi-threaded (MT in short) application is a bit hard, due to the number of software threads running in parallel, compared to a single threaded program where only one task will be running per process, at any given time. Thread synchronization plays an important role when concurrently running threads have to share global resources. Improperly synchronized threads may starve, and lead to unnecessary dead locks, and race conditions. So, it is good to have an MT aware debugger handy, during development and in support phases of software life cycle, to debug threading issues.
Fortunately on Solaris, Sun Studio's debugger, dbx, has support for MT applications that are designed to use Solaris threads, and/or POSIX threads. With dbx, it is possible to get information like thread state, stack trace, locks from all threads, navigate between threads, suspend/resume threads, put break points in a thread and can do step by step execution in a function in a designated thread. Note that Solaris Modular Debugger (mdb) also has support for MT programs; but this blog post concentrates on Studio's dbx.
Siebel processes were used to show various dbx commands in the following examples. Siebel is a multi-threaded application, written in C/C++.
Core dump analysisThe following example shows some useful commands to get the stack trace in the thread, where the process crashed. For more information about dbx commands, type help
or help <command>
in dbx environment ie., at dbx prompt.
% ls -lh coreSince we don't know which thread crashed the process, let's list all known threads with
-rw------- 1 giri other 273M Dec 9 16:56 core
% file core
core: ELF 32-bit MSB core file SPARC Version 1, from 'siebprocmw'
% /opt/SS11/SUNWspro/prod/bin/dbx siebprocmw core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc
Reading siebprocmw
core file header read successfully
Reading ld.so.1
Reading libsslcwsl.so
Reading libssscsci.so
Reading libssscscf.so
...
...
Reading libsbcfui.so
Reading libsbcfuiapps.so
t@1 (l@1) terminated by signal KILL (Killed)
0xfd2bc7e0: ___nanosleep+0x0008: blu _cerror ! 0xfd2206a0
threads
command. threads -all
lists all threads, including zombies.(dbx) threads
> t@1 a l@1 ?() LWP suspended in ___nanosleep()
t@2 b l@2 MwTimerThread() LWP suspended in __pollsys()
t@3 b l@3 MwAsyncSignalThread() sleep on 0xfd874078 in __lwp_park()
t@4 b l@4 MwThread() LWP suspended in __pollsys()
t@5 b l@5 MwThread() LWP suspended in __pollsys()
o t@6 b l@6 MwThread() signal SIGABRT in __lwp_kill()
t@7 b l@7 MwThread() LWP suspended in __pollsys()
t@9 b l@9 MwThread() LWP suspended in ___nanosleep()
In the above list, t@1 is the current thread, which is indicated by ">", and the start function is not known (indicated with a "?()").
(dbx) thread
current thread ($thread) is t@1
(dbx) where
current thread: t@1
=>[1] ___nanosleep(0x4, 0xffbfd9a8, 0x0, 0xff000000, 0x0, 0x0), at 0xfd2bc7e0
[2] _sleep(0x64, 0x0, 0xfd2e8bc0, 0xfd0e2000, 0xfd0e2000, 0x0), at 0xfd2afaa0
[3] thr_t::do_thr_action(0xfd86ba10, 0xc, 0x1608, 0xfd86ba20, 0x1, 0x2), at 0xfd770e14
[4] thr_t::t_sleep(0xfb80f5c0, 0x0, 0xffbfdb0e, 0xffbfdb08, 0xfd8546cc, 0xffffffff), at 0xfd770c58
[5] MwWaitForMultipleObjects(0xfb80f5c0, 0x2, 0xfb80f5c8, 0x2, 0xffffffff, 0x9cd48), at 0xfd774dd4
[6] WaitForMultipleObjectsEx(0x2, 0xffbfde3c, 0x0, 0x100000, 0x0, 0x9cd48), at 0xfd77fe9c
[7] OSDNTWait::WaitForThread(0xc, 0xffffffff, 0xffbfdecc, 0xd0108, 0x1004f, 0xff8a1d64), at 0xffa7b050
[8] OSDWaitTid(0xc, 0xffffffff, 0xffbfe7c4, 0x0, 0xc, 0xc), at 0xff05f1c4
[9] scfEventFacility::scfEventFac::ShutdownCmd(0xe14450, 0x1, 0x7, 0xfe4de0f4, 0xffbfe7c8, 0xff48f8d4), at 0xff819884
[10] scfEventFacility::scfEventFac::Shutdown(0xffbfe96c, 0xff877530, 0x0, 0x5e000, 0xff874e8c, 0x5e114), at 0xff819390
[11] ScfSisDetach(0x0, 0x0, 0x0, 0xffffffff, 0xffbfe96c, 0xfc81c), at 0xff781ed4
[12] _shutdown(0x6479c, 0x0, 0x651a8, 0x651a8, 0x7, 0x0), at 0x49c7c
[13] wmain(0x12a, 0x6479c, 0x0, 0x0, 0xffbfedac, 0x6479c), at 0x4995c
[14] main(0xfd85f310, 0xc94, 0xffbfef90, 0x54, 0xfd85f310, 0xc00), at 0x4d3cc
This is not exactly what we are looking for. The above call stack shows where the current thread (t@1) is waiting. Since our interest is to find out the thread that is responsible for the process crash, we need to look for an o
before the thread id. t@6 is the ill fated thread in the list of all known threads; and the process was killed because of a SIGABRT in lwp_kill method. Note that OS provides the necessary abstraction for creating, and destroying threads; and also has the freedom of killing malfunctioning threads when things go haywire. In this example, __lwp_kill() was called by the operating system, due to some event which we are going to investigate.
thread -info <tid>
command provides more information like what exactly happened in application code that triggered the forcible shutdown.
(dbx) thread -info t@6
Thread t@6 (0xfcb80c00) at priority 0
state: bound to l@6
base function: 0xfd770ff4: MwThread() stack: 0xfa380000[524288]
flags: BOUND|DETACHED|SUSPENDED
masked signals: SEGV
Currently active in __lwp_kill
Observe that kernel trapped an illegal memory access with a SEGV signal. The default behavior for a SEGV, is to shutdown the process with a possible core file generation (aka core dump). Let's switch to thread t@6 with thread <tid>
command, and get to the instruction which raised the segmentation fault.
(dbx) thread t@6
t@6 (l@6) stopped in __lwp_kill at 0xfd2bd5ec
0xfd2bd5ec: __lwp_kill+0x0008: bcc,a,pt %icc,__lwp_kill+0x18 ! 0xfd2bd5fc
(dbx) thread
current thread ($thread) is t@6
(dbx) where
current thread: t@6
=>[1] __lwp_kill(0x0, 0x6, 0x0, 0x6, 0xffff0000, 0x0), at 0xfd2bd5ec
[2] raise(0x6, 0x0, 0xfd2a1af4, 0x42770, 0xfd2e4278, 0x6), at 0xfd25d884
[3] abort(0xe15220, 0x1, 0x0, 0xa6544, 0xfd2e7298, 0x0), at 0xfd23de38
[4] SehScanInvokeTryList(0x44bd308, 0x108000, 0xfd8571c4, 0x0, 0x2, 0x0), at 0xfd74c9d4
[5] Signal_Handler::raise(0xc0000005, 0xfa37cde8, 0x0, 0x2, 0xfa37cc80, 0x1800), at 0xfd74d778
[6] Raise_Exception::operator()(0x67670, 0xb, 0xfa37d0a0, 0xfa37cde8, 0xfd86a07c, 0x2c), at 0xfd74d8dc
[7] __sighndlr(0xb, 0xfa37d0a0, 0xfa37cde8, 0xfd74d7c8, 0x0, 0x1), at 0xfd2bc52c
---- called from signal handler with signal 11 (SIGSEGV) ------
[8] CSSSqlObj::GetTrxDbConn(0x458a7d8, 0x0, 0x1394478, 0x64c00, 0x0, 0x4611290), at 0xf91de72c
[9] CSSSqlObj::Execute(0x4611290, 0x0, 0x0, 0x0, 0x0, 0xfe4dd294), at 0xf91c7b98
[10] CSSBusComp::SqlExecute(0x4606640, 0x0, 0x0, 0x0, 0x1, 0x4b22e84), at 0xf9a9c160
[11] CSSBCBase::SqlExecute(0x4606640, 0x0, 0xfa37d6fc, 0x0, 0x1, 0xf57be3e8), at 0xf56c2294
[12] CSSBusComp::Execute(0x0, 0x0, 0x0, 0x0, 0x4606640, 0xfa37d7cc), at 0xf9a6b118
[13] CSSMsgBoardMaintSvc::UpdTaskHistory(0x44b5ae0, 0xfa37df90, 0x0, 0x4567d14, 0xf8611198, 0x489cd94), at 0xf85f2d48
[14] CSSMsgBoardMaintSvc::HandleEventDataList(0x44b5ae0, 0x43a0018, 0xff486b38, 0x0, 0xfa37e0ac, 0xf8611198), at 0xf85f5afc
[15] CSSMsgBoardMaintSvc::ReadTaskHistory(0x44b5ae0, 0x43a0018, 0xf85f4e60, 0x44b5ae0, 0x43a0018, 0x1), at 0xf85f53c0
[16] scfEventFacility::scfEventFac::CallRegSub(0x2a59448, 0x4109bd8, 0x0, 0x0, 0x8, 0x2), at 0xff81ad20
[17] scfEventFacility::scfEventFac::HandleCurrProcEvents(0xe14450, 0x7530, 0xe14450, 0xff432ef0, 0xff874e8c, 0x1),
at 0xff81b19c
[18] scfEventFacility::scfEventFac::scfEventThreadMain(0x0, 0x0, 0x0, 0x7400, 0xfa37fc90, 0xd0001), at 0xff81a7dc
[19] OSDWslThreadStart(0x101d58, 0xff81a580, 0x101d58, 0x6, 0x0, 0x101d70), at 0xff05bec8
[20] _AfxThreadEntry(0xffbfde34, 0xe9568, 0x0, 0x1, 0x0, 0x17289c), at 0xfeb95730
[21] MwThread(0x1, 0x0, 0x1, 0x0, 0xfd86bed0, 0xe15220), at 0xfd771230
From the above stack trace it is clear that the binary doesn't contain necessary debug information to show high level instructions; so, let's try to get the disassembly with dis
command.
(dbx) dis GetTrxDbConn / 50
More than one identifier 'GetTrxDbConn'.
Select one of the following:
0) Cancel
1) `libsscfdm.so`#__1cPCSSModelPhysDefMGetTrxDbConn6MpkH_pnJCSSDbConn__
[non -g, demangles to: CSSModelPhysDef::GetTrxDbConn(const unsigned short*)]
2) `libsscfdm.so`#__1cJCSSSqlObjMGetTrxDbConn6kM_pnJCSSDbConn__
[non -g, demangles to: CSSSqlObj::GetTrxDbConn()const]
> 2
0xf91de6c0: GetTrxDbConn : save %sp, -96, %sp
0xf91de6c4: GetTrxDbConn+0x0004: mov %i0, %i5
0xf91de6c8: GetTrxDbConn+0x0008: ld [%i0 + 388], %i0
0xf91de6cc: GetTrxDbConn+0x000c: cmp %i0, 0
0xf91de6d0: GetTrxDbConn+0x0010: be,pn %icc,GetTrxDbConn+0x60 ! 0xf91de720
0xf91de6d4: GetTrxDbConn+0x0014: sethi %hi(0x5b400), %l6
0xf91de6d8: GetTrxDbConn+0x0018: call GetTrxDbConn+0x20 ! 0xf91de6e0
0xf91de6dc: GetTrxDbConn+0x001c: mov %o7, %o7
0xf91de6e0: GetTrxDbConn+0x0020: sethi %hi(0x2d1400), %o5
0xf91de6e4: GetTrxDbConn+0x0024: xor %l6, 88, %l4
0xf91de6e8: GetTrxDbConn+0x0028: inc 420, %o5
0xf91de6ec: GetTrxDbConn+0x002c: sethi %hi(0x1000), %l5
0xf91de6f0: GetTrxDbConn+0x0030: add %o5, %o7, %l3
0xf91de6f4: GetTrxDbConn+0x0034: add %l5, 868, %l1
0xf91de6f8: GetTrxDbConn+0x0038: add %l3, %l4, %l2
0xf91de6fc: GetTrxDbConn+0x003c: ld [%l2], %l0
0xf91de700: GetTrxDbConn+0x0040: ld [%l0 + %l1], %o4
0xf91de704: GetTrxDbConn+0x0044: cmp %o4, 0
0xf91de708: GetTrxDbConn+0x0048: be,a,pn %icc,GetTrxDbConn+0x68 ! 0xf91de728
0xf91de70c: GetTrxDbConn+0x004c: ld [%i5 + 128], %i2
0xf91de710: GetTrxDbConn+0x0050: ld [%o4 + 88], %l7
0xf91de714: GetTrxDbConn+0x0054: cmp %i5, %l7
0xf91de718: GetTrxDbConn+0x0058: bne,a,pn %icc,GetTrxDbConn+0x68 ! 0xf91de728
0xf91de71c: GetTrxDbConn+0x005c: ld [%i5 + 128], %i2
0xf91de720: GetTrxDbConn+0x0060: ret
0xf91de724: GetTrxDbConn+0x0064: restore %g0, 0, %o0
0xf91de728: GetTrxDbConn+0x0068: ld [%i2 + 188], %i1
0xf91de72c: GetTrxDbConn+0x006c: ld [%i1 - 16], %i3
0xf91de730: GetTrxDbConn+0x0070: cmp %i3, 0
0xf91de734: GetTrxDbConn+0x0074: bge,pn %icc,GetTrxDbConn+0x90 ! 0xf91de750
0xf91de738: GetTrxDbConn+0x0078: add %i2, 188, %i4
0xf91de73c: GetTrxDbConn+0x007c: clr %o0
0xf91de740: GetTrxDbConn+0x0080: call RequiredConditionIsFalse [PLT] ! 0xf94b0684
0xf91de744: GetTrxDbConn+0x0084: mov 84, %o1
0xf91de748: GetTrxDbConn+0x0088: ld [%i4], %i1
0xf91de74c: GetTrxDbConn+0x008c: ld [%i5 + 388], %i0
0xf91de750: GetTrxDbConn+0x0090: call GetTrxDbConn ! 0xf90e0e00
0xf91de754: GetTrxDbConn+0x0094: restore %g0, 0, %g0
0xf91de758: GetTrxDbConn+0x0098: unimp 0x0
...
...
To see the actual C++ instruction which seg faulted, compile the binary with -g
(debug) option, and reproduce the crash. If the source code is readable from the location where you run the dbx session, you will see the actual high level instructions.
The objective of this section is to show how to use some of the dbx commands to get some useful information, from a running MT process.
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
2754 giri 399M 302M sleep 59 0 0:00:34 2.0% siebmtshmw/21
% /opt/SS11/SUNWspro/prod/bin/dbx - 2754
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc
Reading -
Reading ld.so.1
Reading libsslcwsl.so
Reading libssscsci.so
Reading libssscscf.so
...
...
Reading libsscasvbc.so
Reading libswcasvfr.so
Attached to process 2754 with 21 LWPs
t@1 (l@1) stopped in __pollsys at 0xfd13d1c4
0xfd13d1c4: __pollsys+0x0004: ta 8
(dbx) threads
> t@1 a l@1 ?() running in __pollsys() <- t@1 is always the default current thread under dbx
t@2 b l@2 MwTimerThread() sleep on 0xfb80f4c0 in __lwp_park()
t@3 b l@3 MwAsyncSignalThread() sleep on 0xfd774078 in __lwp_park()
t@4 b l@4 MwThread() running in __pollsys()
t@5 b l@5 MwThread() running in __pollsys()
t@6 b l@6 MwThread() sleep on 0xf9b7eb80 in __lwp_park()
t@7 b l@7 MwThread() running in __pollsys()
t@8 b l@8 MwThread() running in _so_recv()
t@9 b l@9 MwThread() sleep on 0xf927fb68 in __lwp_park()
t@10 b l@10 MwThread() sleep on 0xf877f500 in __lwp_park()
t@11 b l@11 MwThread() sleep on 0xf867fa40 in __lwp_park()
t@12 b l@12 MwThread() sleep on 0xf857fa50 in __lwp_park()
t@13 b l@13 MwThread() sleep on 0xf847fa38 in __lwp_park()
t@14 b l@14 MwThread() running in __pollsys()
t@15 b l@15 MwThread() sleep on 0xf827f490 in __lwp_park()
t@16 b l@16 MwThread() running in __pollsys()
t@17 b l@17 MwThread() sleep on 0xf807f490 in __lwp_park()
t@18 b l@18 MwThread() running in __pollsys()
t@19 b l@19 MwThread() sleep on 0xf4c7f490 in __lwp_park()
t@20 b l@20 MwThread() running in __pollsys()
t@21 b l@21 MwThread() sleep on 0xf4a7f490 in __lwp_park()
Put a break point in thread 21 (t@21) for all calls to memcpy():
(dbx) stop in memcpy -thread t@21
More than one identifier 'memcpy'.
Select one of the following:
0) Cancel
1) `libc.so.1`memcpy
2) `libc_psr.so.1`memcpy
a) All
> a
dbx: warning: 'memcpy' has no debugger info -- will trigger on first instruction
dbx: warning: 'memcpy' has no debugger info -- will trigger on first instruction
Will create handlers for all 2 hits
(2) stop in _private_memcpy -thread t@21 <- implicit break point set by dbx
(3) stop in _memcpy -thread t@21 <- implicit break point
(dbx) cont
t@21 (l@21) stopped in _memcpy at 0xfe1f04c0
0xfe1f04c0: _memcpy : nop
Note that dbx is synchronous -- when any thread or lightweight process (LWP) stops, all other threads and LWPs stop as well.
(dbx) thread
current thread ($thread) is t@21
(dbx) where
current thread: t@21
=>[1] _memcpy(0x5080e14, 0xff406b38, 0x2, 0x36, 0x1, 0x6c), at 0xfe1f04c0
[2] SSstring::GetWriteBuffer(0xf4a7e6ac, 0xff406b28, 0xff874e8c, 0x32, 0x0, 0xff3b2ef0), at 0xff31ffcc
[3] sciProcState::sciBlock::FormatLatchName(0xf4a7e6ac, 0x1, 0x7, 0x853c, 0xffa30bd8, 0x8400), at 0xffa02744
[4] sciProcState::sciProcState(0x5ad31f8, 0xf9fc0000, 0xf4a7e644, 0xff406b3c, 0x0, 0x0), at 0xffa012c4
[5] sciProcState::GetSciProcState(0xf4a7e7f8, 0x26fcb8, 0x5ad31f8, 0xff88db30, 0x5f5e4, 0x61e6c90), at 0xffa014f0
[6] SciCheckShutdown(0xf4a7e8cc, 0x34151f8, 0x74, 0x26fcb8, 0x0, 0x2ef798), at 0xff9fe0e4
[7] SciGetInterrupt(0x0, 0x6a20950, 0x0, 0xf4a7e864, 0x25cd94, 0x1da84), at 0xff9fde40
[8] _smiMessageQ::ProcessMessage(0x15f85c0, 0x6a20950, 0x0, 0x0, 0x24a360, 0x32e18f0), at 0x2158e4
[9] _smiMessageQ::ProcessRequest(0x3380c48, 0x6a20950, 0x191, 0x2, 0x5ae22f0, 0x15f85c0), at 0x21461c
[10] _smiWorkQueue::ProcessWorkItem(0x15f98b8, 0x3380c48, 0x6a20950, 0x5ae2390, 0x0, 0x101f180), at 0x208d08
[11] _smiWorkQueue::WorkerTask(0x15f98b8, 0x5b7f6b8, 0x3326338, 0x1500e0, 0x0, 0x0), at 0x208764
[12] SmiThrdEntryFunc(0x32f72d8, 0x70000f, 0x700010, 0x0, 0x0, 0x0), at 0x1f7a0c
[13] OSDWslThreadStart(0x3380568, 0x1f75a0, 0x3380568, 0x15, 0x0, 0x3380760), at 0xfefdbec8
[14] _AfxThreadEntry(0xf4b7de5c, 0x3386210, 0x0, 0x1, 0x0, 0x17289c), at 0xfeb95730
[15] MwThread(0x1, 0x0, 0x1, 0x0, 0xfd76bed0, 0x33cdc40), at 0xfd671230
Let's step into memcpy() with stepi
, and observe how the thread state changes.
(dbx) stepi
t@21 (l@21) stopped in _memcpy at 0xfe1f04c4
0xfe1f04c4: _memcpy+0x0004: nop
(dbx) threads
t@1 a l@1 ?() running in __pollsys()
t@2 b l@2 MwTimerThread() sleep on 0xfb80f4c0 in __lwp_park()
t@3 b l@3 MwAsyncSignalThread() sleep on 0xfd774078 in __lwp_park()
t@4 b l@4 MwThread() running in __pollsys()
t@5 b l@5 MwThread() running in __pollsys()
t@6 b l@6 MwThread() sleep on 0xf9b7eb80 in __lwp_park()
t@7 b l@7 MwThread() running in __pollsys()
t@8 b l@8 MwThread() running in _so_recv()
t@9 b l@9 MwThread() sleep on 0xf927fb68 in __lwp_park()
t@10 b l@10 MwThread() sleep on 0xf877f500 in __lwp_park()
t@11 b l@11 MwThread() sleep on 0xf867fa40 in __lwp_park()
t@12 b l@12 MwThread() sleep on 0xf857fa50 in __lwp_park()
t@13 b l@13 MwThread() sleep on 0xf847fa38 in __lwp_park()
t@14 b l@14 MwThread() running in __pollsys()
t@15 b l@15 MwThread() sleep on 0xf827f490 in __lwp_park()
t@16 b l@16 MwThread() running in __pollsys()
o t@17 b l@17 MwThread() breakpoint in _memcpy()
o t@18 b l@18 MwThread() breakpoint in _memcpy()
o t@19 b l@19 MwThread() breakpoint in _memcpy()
t@20 b l@20 MwThread() running in __pollsys()
*> t@21 b l@21 MwThread() single stepped in _memcpy()
In the above example, t@17, t@18 and t@19 are stopped at calls to memcpy(); and t@21 stepped into memcpy(). Get out of memcpy() with step up
command.
(dbx) step up
_memcpy returns 84413972
t@21 (l@21) stopped in SSstring::GetWriteBuffer at 0xff31ffd4
0xff31ffd4: GetWriteBuffer+0x0114: ld [%i1 + 4], %i2
Clear the break point (in current thread) with clear
command
(dbx) contLocks
t@21 (l@21) stopped in _memcpy at 0xfe1f04c0
0xfe1f04c0: _memcpy : nop
(dbx) clear
cleared (3) stop in _memcpy -thread t@21
thread -blocks [<tid>]
lists all locks held by the given thread, blocking other threads. If tid is not specified, dbx lists the locks held by the current thread. In the following example, t@21 (current thread) is not holding any locks.
(dbx) thread -blocks
Locks held by t@21:
thread -blockedby [<tid>]
shows the synchronization object (monitor) on which the given thread is blocked. If tid is not specified, dbx shows this information for the current thread. Note that only sleeping threads must be in blocked state.
(dbx) thread -blockedby t@10
Thread t@10 is blocked by:
0xf877f500 (0xf877f500): thread condition variable
(dbx) thread -blockedby t@12
Thread t@12 is blocked by:
0xf857fa50 (0xf857fa50): thread condition variable
(dbx) thread -blockedby t@17
Thread t@17 is not asleep
syncs
command lists all synchronization objects ie., locks/monitors.
(dbx) syncs
All locks currently known to libthread:
0x01020320 (0x01020320): thread mutex(unlocked)
0x010203f8 (0x010203f8): thread mutex(unlocked)
0xf827f490 (0xf827f490): thread condition variable
0xf827f4a0 (0xf827f4a0): thread mutex(unlocked)
0xf877f500 (0xf877f500): thread condition variable
0xf877f510 (0xf877f510): thread mutex(unlocked)
0xf927fb68 (0xf927fb68): thread condition variable
0xf927fb78 (0xf927fb78): thread mutex(unlocked)
0xf867fa40 (0xf867fa40): thread condition variable
0xf867fa50 (0xf867fa50): thread mutex(unlocked)
0xf9b7eb80 (0xf9b7eb80): thread condition variable
0xf9b7eb90 (0xf9b7eb90): thread mutex(unlocked)
0x015c2ed8 (0x015c2ed8): thread mutex(unlocked)
0x015c2f38 (0x015c2f38): thread mutex(unlocked)
0x015c2f18 (0x015c2f18): thread mutex(unlocked)
0x015c2dd8 (0x015c2dd8): thread mutex(unlocked)
0x015c34d8 (0x015c34d8): thread mutex(unlocked)
0x03325fb8 (0x03325fb8): thread mutex(unlocked)
0x033264b8 (0x033264b8): thread mutex(unlocked)
0x033261b8 (0x033261b8): thread mutex(unlocked)
0x017a6ce8 (0x017a6ce8): thread mutex(locked)
0xfa4f4314 (0xfa4f4314): process mutex(locked)
0x0332c438 (0x0332c438): thread mutex(unlocked)
0x0332c348 (0x0332c348): thread mutex(unlocked)
0x02fcd7e8 (0x02fcd7e8): thread mutex(unlocked)
0x0028f860 (0x0028f860): thread mutex(unlocked)
__1cUCSSSISLocalTransSrvrKs_instLock_+0x8 (0xff1ee220): thread mutex(unlocked)
0x034150e8 (0x034150e8): thread mutex(unlocked)
0x034151d8 (0x034151d8): thread mutex(unlocked)
__uberdata+0x80 (0xfd168c40): thread mutex(unlocked)
0x01878b98 (0x01878b98): thread mutex(unlocked)
0x01878aa8 (0x01878aa8): thread mutex(unlocked)
0xfa4c7e9c (0xfa4c7e9c): process mutex(unlocked)
libc_malloc_lock (0xfd1676f8): thread mutex(unlocked)
0x0179cb30 (0x0179cb30): thread mutex(unlocked)
0x0179c830 (0x0179c830): thread mutex(unlocked)
0xfa5c2664 (0xfa5c2664): process mutex(unlocked)
0xfa5c2c94 (0xfa5c2c94): process mutex(unlocked)
0x0161dd90 (0x0161dd90): thread mutex(unlocked)
0x0101f6e0 (0x0101f6e0): thread mutex(unlocked)
0x0101f718 (0x0101f718): thread mutex(unlocked)
0x0101f770 (0x0101f770): thread mutex(unlocked)
0x0101f508 (0x0101f508): thread mutex(locked)
0x0101f5a8 (0x0101f5a8): thread mutex(unlocked)
0x015bfe90 (0x015bfe90): thread mutex(unlocked)
0x015bfe20 (0x015bfe20): thread mutex(unlocked)
0x015bfe58 (0x015bfe58): thread mutex(unlocked)
To get information about a synchronization object at a given address, use sync -info <address>
(dbx) sync -info 0x0028f860Tracing
0x0028f860 (0x28f860): thread mutex(unlocked)
Lock is unowned
No threads are blocked by this lock
(dbx) sync -info 0xf877f500
0xf877f500 (0xf877f500): thread condition variable
(dbx) sync -info 0xfd1676f8
libc_malloc_lock (0xfd1676f8): thread mutex(unlocked)
Lock is unowned
No threads are blocked by this lock
trace
command can be used to trace the executed source lines, function calls, or variable changes. The following example traces the thread creation, and prints a message whenever a thread gets created.
(dbx) trace thr_create
(4) trace thr_create
(dbx) cont
trace: thread created t@22 on l@22
trace: thread created t@23 on l@23
Reading libsrlcver.so
Reading libsscafsbc.so
...
(dbx) threads
*> t@1 a l@1 ?() signal SIGINT in __pollsys()
t@2 b l@2 MwTimerThread() sleep on 0xfb80f4c0 in __lwp_park()
t@3 b l@3 MwAsyncSignalThread() sleep on 0xfd774078 in __lwp_park()
...
...
t@20 b l@20 MwThread() running in __pollsys()
t@21 b l@21 MwThread() sleep on 0xf4a7f490 in __lwp_park()
t@22 b l@22 MwThread() running in __pollsys() <- new thread
t@23 b l@23 MwThread() sleep on 0xea6ff490 in __lwp_park() <- new thread
In the above example, there is no information about who created the threads t@22 & t@23. Even to get that information, use when
command as shown below:
(dbx) when thr_create { echo "New thread $newthread was created by thread $thread"; }
(6) when thr_create { kprint "New thread ${newthread} was created by thread ${thread}"; }
(dbx) cont
New thread t@24 was created by thread t@10
New thread t@25 was created by thread t@24
$newthread and $thread are pre-defined variables of dbx, which holds the thread ID of a newly created thread, and the thread ID of the current thread, respectively.
Similarly thread exits can be traced as follows:
(dbx) trace thr_exitSuspending/Resuming threads
(5) trace thr_exit
(dbx) cont
New thread t@26 was created by thread t@10
New thread t@27 was created by thread t@26
trace: thr_exit t@27
To suspend the execution of a thread, run the command thread -suspend <tid>
; to resume the suspended thread, thread -resume <tid>
(dbx) thread -suspend t@26Break point with
Thread t@26 suspended
(dbx) thread -resume t@26
Thread t@26 unsuspended
stop
commandThe following example shows how to set a break point to stop the execution, when a new thread with id t@34 gets created.
(dbx) stop thr_create t@34Light Weight Processes (LWPs)
(9) stop thr_create t@34
(dbx) cont
t@10 (l@10) stopped in tdb_event_create at 0xfd1377e8
0xfd1377e8: tdb_event_create : retl
trace: thread created t@34 on l@34
(dbx) where <- who initiated the new thread creation? entire call stack
current thread: t@10
=>[1] tdb_event_create(0x2, 0x1084, 0x3ff, 0x0, 0xfc8e1c00, 0x1000), at 0xfd1377e8
[2] _thrp_create(0x180, 0x10f8, 0xfd1377e8, 0x1e, 0xc1, 0xfde32000), at 0xfd138c04
[3] _pthread_create(0xf877f310, 0x0, 0xfd670ff4, 0xf877f318, 0x0, 0xfd168bc0), at 0xfd12d104
[4] MwCreateThread(0x0, 0xfeb95630, 0xf877f414, 0x4, 0x0, 0x9383cb0), at 0xfd671460
[5] CreateThread(0x0, 0x0, 0xfeb95630, 0xf877f414, 0x4, 0x9383cb0), at 0xfd67d124
[6] CWinThread::CreateThread(0x9383c80, 0x4, 0x0, 0x0, 0xfd164278, 0x88cabc9), at 0xfeb95f1c
[7] AfxBeginThread(0xffa7a420, 0x88cabc0, 0x0, 0x0, 0x4, 0x0), at 0xfeb958a4
[8] WslCreateThread(0xfefdbe00, 0x5c135c0, 0x0, 0x88cabc0, 0xf877f584, 0x16b8c), at 0xffa7a4cc
[9] OSDCreateThread(0x211200, 0x5b40660, 0x0, 0x0, 0x5ab1590, 0x5c135c0), at 0xfefdc16c
[10] SmiDispatchThrdMain(0x101f180, 0x5ab1588, 0x5ab1590, 0xf877fd64, 0xf877fcec, 0xff40f8d4), at 0x1f53f4
[11] OSDWslThreadStart(0x10b8ad0, 0x1f5240, 0x10b8ad0, 0xa, 0x0, 0x15d07e8), at 0xfefdbec8
[12] _AfxThreadEntry(0xffbfeaac, 0x2f4948, 0x0, 0x1, 0x0, 0x17289c), at 0xfeb95730
[13] MwThread(0x1, 0x0, 0x1, 0x0, 0xfd76bed0, 0x15cd558), at 0xfd671230
Application (user) threads are not visible to the kernel. Kernel treats light weight processes (LWPs) as the only schedulable entities within a process. LWPs bridge the user level and kernel level threads. Each process contains one or more LWPs; and each LWP is associated with a kernel thread. Prior to Solaris 9, each of LWPs would run one or more user level threads (ie., 1xN). From Solaris 9 onwards, there is one LWP for every user level thread (ie., 1x1).
Use lwps
command to list all LWPs in the process.
(dbx) lwps
l@1 running in _private_mprotect()
l@2 running in __lwp_park()
l@3 running in __lwp_park()
l@4 running in __pollsys()
l@5 running in __pollsys()
l@6 running in __lwp_park()
l@7 running in __pollsys()
l@8 running in _so_recv()
l@9 running in __lwp_park()
l@10 running in __lwp_park()
l@11 running in __lwp_park()
l@12 running in __lwp_park()
l@13 running in __time()
l@14 running in __pollsys()
l@15 running in __lwp_park()
l@16 running in __pollsys()
o l@17 breakpoint in SSstring::GetWriteBuffer()
l@18 running in __lwp_unpark()
o l@19 breakpoint in SSstring::GetWriteBuffer()
l@20 running in __pollsys()
*>l@21 breakpoint in SSstring::GetWriteBuffer()
lwp
command displays the current LWP. To switch to a different LWP, use lwp <lwpid>
. lwp -info [<lwpid>]
shows some useful information for a given LWP.
(dbx) lwp
current LWP ($lwp) is l@21
(dbx) lwp -info
l@21 breakpoint in SSstring::GetWriteBuffer()
masked signals are:
(dbx) lwp -info l@12
l@12 running in __lwp_park()
masked signals are:
(dbx) lwp l@18
t@18 (l@18) stopped in __pollsys at 0xfd13d1c4
0xfd13d1c4: __pollsys+0x0004: ta 8
In general, MT applications that make heavy use of the standard {Solaris operating system's} memory allocator, may exhibit poor scalability. This problem occurs when multiple threads are in malloc()
or free()
waiting to obtain the mlock.
If the application suffers from this scalability issue, the top of the thread stacks (which can be obtained using either dbx or pstack
command) will appear as below:
lwp_parkor
mutex_lock_queue
slow_lock
free
lwp_park
mutex_lock_queue
slow_lock
malloc
One such problem was described in this Solaris forum's thread slow_lock making application hang.
MT aware memory allocatorsmtmalloc, umem libraries of Solaris distribution will resolve this kind of scalability problem. libmtmalloc was introduced in Solaris 7; and libumem was introduced in Solaris 9 Update 3. These userland memory allocators are packaged as a drop-in replacement to the standard malloc()
and free()
library calls; so, to take advantage of these allocators, link the MT application with any of these allocators.
mtmalloc, umem allocators are a redesign of the standard library; and hence results in finer grained locking. These libraries will significantly outperform the standard library in cases where multiple concurrent requests are made to the memory allocator. In the case of a single threaded application, the standard memory allocator will however provide better performance. The standard memory allocator also provides a smaller memory footprint. Note that the trade-off with mtmalloc, umem allocators is much bigger memory footprint, due to the way the memory gets allocated. For these reasons the standard memory allocator may be preferred in cases where the advantages of mtmalloc and umem, do not apply. Make sure to experiment with these memory allocators to see which one fits best for your application.
Linking with mtmalloc or umemAt compile time, the application can be linked against mtmalloc or umem library. Adding -lmtmalloc
or -lumem
, option to the link line results in the application being linked appropriately.
% cc -mt -o my_program my_program.c -lmtmalloc
or% cc -mt -o my_program my_program.c -lumem
You can check the library dependency with ldd my_program
.
If re-building the application by linking with mtmalloc or umem, is not feasible, either of these libraries can be preloaded with LD_PRELOAD
environment variable, when the program is executed.
% setenv LD_PRELOAD libmtmalloc.so
% ./my_program
% setenv LD_PRELOAD libumem.so
% ./my_program
You can verify whether the library is preloaded, with pldd `pgrep my_program`
.
-xtarget
option with any value other than generic
(default), in building their softwares. Perhaps they are not aware of the benefits of specifying target platform and/or not spending enough time experimenting with different values to compare the performance.-xtarget
option, and the target instruction set architecture with -xchip
option, for better performance. I believe one of the major concern {for software vendors} in specifying the target platform is the suspicion that the application may not run on a wide range of platforms. While it is true upto some extent, still there is a chance to specify some value for the target platform, if we knew that all the supported architecture is compatible with the one we specify with -xchip
option.-xtarget=ultra3 -xarch=v8plusa
-xtarget=ultra3, -xarch=v8plusa
options in building the application. -xarch=v8plusa
selects an instruction set that is Okay for all the members of UltraSPARC family (US-I, II, III, III+, IV, IV+, T1 (code named Niagara)). -xchip=ultra3
tells the optimizer to optimize for best execution on US-III, and later systems. The code will run well on the US-I & II boxes, but possibly a little slower than if optimized for them.-xtarget=generic -xarch=v8plusa
for the past few years. Their application supports only UltraSPARC platform. So, recently I have experimented with their application by building it with -xtarget=ultra3 -xarch=v8plusa
on a US-IV machine. When the application was run on a US-III box with moderate workload, (not so surprisingly) the run-time performance of the application was improved by ~2.5% (compared to the numbers from -xtarget=generic -xarch=v8plusa
build). Of course, there is no performance regression on a US-II box, and the performance is comparable to the vanilla build ie., built with -xtarget=generic -xarch=v8plusa
option; also the performance gains on a US-IV box is relatively comparable to the gains on a US-III box.-xtarget=ultra3 -xarch=v8plusa
combination; and the next version of their application is being built with those options.-xtarget=ultra3
, if there is a heavy use of the Sun performance library. In that case you really need to have specific separate builds for all the target platforms, because there is no single optimized perflib is available, that is suitable for all architectures.The target platform specifies the processor that the application is expected to run on, the minimum processor that is required, and whether the application is 32-bit or 64-bit. For compiler versions prior to the SunStudio 9 release, the compiler specified a generic processor; SunStudio 9 compilers target an UltraSPARC processor for the SPARC architecture, and a generic x86 based processor for the x86 architecture. In all cases it is best to explicitly specify the target processor, since it is possible in some cases for the target processor to depend on the hardware upon which the application is built.
There are a number of compiler flags that specify the target. The flag -xtarget
sets all the other flags to appropriate default values for the given target processor: -xarch
, -xchip
, and -xcache
. The flag -xarch
sets the instruction set that the processor supports, the flag -xchip
specifies how the compiler should use these instructions. Finally the flag -xcache
specifies the
structure of the caches for this target (however this flag may not have any impact for many codes). As with all compiler flags, the order is important; flags accumulate from left to right, in the event that there are conflicting settings the flag on the right will override the values of flags which were specified earlier on the
command line.
A point to be cautious of is that specifying a more recent hardware target may mean that older hardware is no longer able to run the application. In particular specifying the target as being an UltraSPARC platform means that the application will no longer run on pre-UltraSPARC processors (however UltraSPARC processors have been shipping for over 10 years). Similarly specifying an Opteron processor will mean that the code no longer runs x86-compatible processors that do not have the SSE2 instruction set extensions.
For UltraSPARC processors, a generally good option pair to use is -xtarget=ultra3
with -xarch=v8plusa
. These options allow the compiler to generate 32-bit code that can run on all the members of the UltraSPARC family and their follow-ons (UltraSPARC I, UltraSPARC II, UltraSPARC III, UltraSPARC IV). The compiler will also schedule the code especially for the UltraSPARC III. These options represents a good compromise, since code scheduled for the UltraSPARC III is better at taking advantage of the new features of the UltraSPARC III architecture, while still providing good performance on previous generations of processors.
If the application requires the capability to address 64-bit memory addresses, then the appropriate flags to use are -xtarget=ultra3 -xarch=v9a
which adds 64-bit addressing whilst still targeting all the members of the UltraSPARC family of processors.
32-bit code | -xtarget=ultra3 -xarch=v8plusa |
64-bit code | -xtarget=ultra3 -xarch=v9a |
By default the compiler targets a 32-bit generic x86 based processor, so the code will run on any x86 processor from a Pentium Pro up to an AMD Opteron architecture. Whilst this produces code that can run over the widest range of processors, this does not take advantage of the extensions offered by the Opteron family of processors. Consequently it is recommended that for 32-bit code the Opteron processor is targeted, this will generate code that will run on processors (such as the Pentium 4 and Opteron) which support the SSE2 instruction set extensions.
To take advantage of the x64 processor family and the advantages of 64-bit code, the appropriate compiler flags are -xtarget=opteron
-xarch=amd64
.
32-bit code | -xtarget=opteron |
64-bit code | -xtarget=opteron -xarch=amd64 |
-xtarget=generic
The compiler also supports the options -xtarget=generic
and -xtarget=generic64
. These options tell the compiler to produce code which runs well on as wide a range of machines as possible. One feature of these flags is that they will be interpreted appropriately on both the SPARC and x64 platforms -- so using them may mean fewer changes to makefile flags. The following table shows how the compiler will interpret the -xtarget=generic
flag on both the SPARC and x64 platforms.
Flag | SPARC | x64 |
-xtarget=generic | V8plus architecture | 386 architecture |
-xtarget=generic64 | V9 architecture | AMD64 architecture |
exit()
, before writing all the feedback data to feedbin
file. In a way it mandates the process to be exited, to get the feedback data. Also there is no guarantee that all the applications (esp. multi-threaded apps) are/will be designed to terminate gracefully. So, some processes may not call exit()
at all. In those cases, getting usable feedback data is very unlikely.feedbin
file {on disk}. The time interval between periodic profile snapshots can be controlled by the undocumented env variable SUN_PROFDATA_ASYNC_INTERVAL
, whose value is interpreted as the duration, in seconds. If SUN_PROFDATA_ASYNC_INTERVAL
has been set to a positive integer value n at startup of an application, the profiler thread collects periodic profile data every n seconds, and subsequently updates the corresponding feedbin
.<procname>.<hostname>.<pid>[.profile]
-xprofile=use:<procname>.<hostname>.<pid>
-xprofile=collect
, the object prof_lib.o
is linked into profiled shared libraries, and prof_tsd.o
is linked into profiled executables. These objects provide the run-time support for profile feedback data collection. The linking of these object files is transparent to us. To check this, specify -#
flag of C compiler, or -v
option of C++ compiler, on compile line.SUN_PROFDATA_ASYNC_SEQUENCE
is defined and set to an integer value, num_snapshots >= 1, the profiler generates a sequence of distinct profile snapshots whose names are of the form:<procname>.<hostname>.<pid>.<n>[.profile]
<procname>.<hostname>.<pid>[.profile]
directory for the remaining life time of the process.SUN_PROFDATA_ASYNC_SEQUENCE
may be used to determine how long profile data should be collected from a given application in order to obtain good performance with -xprofile=use
.mymtserver
is compiled with -xprofile=collect
. The profile data collection can be done as follows:% %uname -nThis will collect a snapshot of profile data from process 1234 every 30 seconds for as long as it continues executing. The first 3 snapshots will be saved in their own feedback directories:
Govinda
% setenv SUN_PROFDATA_ASYNC_INTERVAL 30
% setenv SUN_PROFDATA_ASYNC_SEQUENCE 3
% setenv SUN_PROFDATA_VERBOSE
% setenv SUN_PROFDATA_DIR /tmp/mymtserver
% ./mymtserver &
[1] 1234
/tmp/mymtserver/mymtserver.Govinda.1234.1.profile, /tmp/mymtserver/mymtserver.Govinda.1234.2.profile
and /tmp/mymtserver/mymtserver.Govinda.1234.3.profile
. /tmp/mymtserver/mymtserver.Govinda.1234.profile
.SUN_PROFDATA_VERBOSE
2004-2019 |