
Thursday, October 06, 2005

Sun Studio: Investigating memory leaks with dbx

Some time ago I posted a blog entry titled Sun Studio: Investigating memory leaks with Collector/Analyzer. Memory leaks can also be checked (along with memory use and memory access) with dbx, the Sun Studio debugger. This post shows the steps involved in generating a memory leak report, using a simple example (the same one used in the Collector/Analyzer entry).

Runtime checking with dbx

dbx uses the term runtime checking (RTC) for detecting runtime errors such as memory leaks and memory access errors. Extensive material about RTC is available in the Debugging a Program With dbx document, in the chapter Using Runtime Checking.

For runtime checking to work on a process that is started on its own and attached to dbx later, as in the example below, the process must have rtcaudit.so preloaded when it starts.

To preload rtcaudit.so:
% setenv LD_AUDIT <path-to-rtcaudit-lib>/rtcaudit.so

Turn off mutex tracking with dbxenv mt_sync_tracking off to avoid running into the bug "thread_db synchronization tracking causes cond_wait failure and hangs". Even if you don't, dbx forces you to turn it off anyway.

Unset LD_AUDIT once the data collection is done.

Here's an example, with annotated commentary:

% more memleaks.c <= source file
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

void allocate() {
    int *x;
    char *y;

    x = (int *) malloc(sizeof(int) * 100);
    y = (char *) malloc (sizeof(char) * 200);

    printf("\nAddress of x = %u, y = %u", x, y);

    x = (int *) malloc (sizeof(int) * 25);
    y = (char *) malloc (sizeof(char) * 25);

    printf("\nNew address of x = %u, y = %u\n", x, y);
    free (y);
}

void main() {
    while (1) {
        allocate();
        sleep(1);
    }
}

% which cc <= Sun Studio 10 C compiler
/software/SS10/SUNWspro/prod/bin/cc

% cc -g -o memleaks memleaks.c <= Compile the code with debug (-g) flag

% setenv LD_AUDIT /software/SS9/SUNWspro/prod/lib/dbxruntime/rtcaudit.so <= Pre-load rtcaudit library

% ./memleaks
Address of x = 134621928, y = 134622336
New address of x = 134622544, y = 134614272

Address of x = 134622656, y = 134623064
New address of x = 134623272, y = 134614272
...
...

In another window:

% ps -eaf | grep memleaks
techno 11174 10744 0 19:39:09 syscon 0:00 ./memleaks

% dbx - 11174 <= Attach the process to the debugger, dbx
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.4' in your .dbxrc
Reading -
Reading ld.so.1
Reading libc.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading libgen.so.1
Reading libdl.so.1
Reading libm.so.2
Reading libc_psr.so.1
Reading rtcboot.so
Reading librtc.so

(dbx) dbxenv mt_sync_tracking off <= Turn off mutex tracking

(dbx) check -leaks <= Turn on memory leak checking
leaks checking - ON
RTC: Enabling Error Checking...
RTC: Running program...


(dbx) cont <= Resume program execution
Address of x = 134623384, y = 134623792
New address of x = 134624000, y = 134614272

Address of x = 134624112, y = 134624520
New address of x = 134624728, y = 134614272

Address of x = 134624840, y = 134625248
New address of x = 134625456, y = 134614272

Address of x = 134625568, y = 134625976
New address of x = 134626184, y = 134614272

Address of x = 134626296, y = 134626704
New address of x = 134626912, y = 134614272
^C <= Interrupt the execution with Ctrl-C, to get intermediate leak report
dbx: warning: Interrupt ignored but forwarded to child.
signal INT (Interrupt) in ___nanosleep at 0xed1bc7dc
0xed1bc7dc: ___nanosleep+0x0004: ta 8
Current function is main
24 sleep(1);

(dbx) showleaks <= showleaks reports new memory leaks since the last showleaks command
Checking for memory leaks...

Actual leaks report (actual leaks: 15 total size: 3500 bytes)

Total      Num of Leaked      Allocation call stack
Size       Blocks Block
                  Address
========== ====== =========== =======================================
      2000      5      -      allocate < main
      1000      5      -      allocate < main
       500      5      -      allocate < main


Possible leaks report (possible leaks: 0 total size: 0 bytes)

(dbx) cont <= Continue the execution of the program

Address of x = 139208, y = 139632
New address of x = 139856, y = 139984

Address of x = 140040, y = 140464
New address of x = 140688, y = 135824

Address of x = 140816, y = 141240
New address of x = 141464, y = 136656

Address of x = 141592, y = 142016
New address of x = 142240, y = 137488
^C <= Interrupt the execution with Ctrl-C, to get intermediate leak report
dbx: warning: Interrupt ignored but forwarded to child.
signal INT (Interrupt) in ___nanosleep at 0xed1bc7dc
0xed1bc7dc: ___nanosleep+0x0004: ta 8
Current function is main
24 sleep(1);

(dbx) showleaks <= showleaks reports new memory leaks since the last showleaks command
Checking for memory leaks...

Actual leaks report (actual leaks: 12 total size: 2800 bytes)

Total      Num of Leaked      Allocation call stack
Size       Blocks Block
                  Address
========== ====== =========== =======================================
      1600      4      -      allocate < main
       800      4      -      allocate < main
       400      4      -      allocate < main


Possible leaks report (possible leaks: 0 total size: 0 bytes)


(dbx) showleaks -a <= showleaks -a shows all the leaks found so far,
not just the leaks since the last showleaks command

Checking for memory leaks...

Actual leaks report (actual leaks: 27 total size: 6300 bytes)

Total      Num of Leaked      Allocation call stack
Size       Blocks Block
                  Address
========== ====== =========== =======================================
      3600      9      -      allocate < main
      1800      9      -      allocate < main
       900      9      -      allocate < main


Possible leaks report (possible leaks: 0 total size: 0 bytes)


(dbx) showleaks -v <= Verbose report, since the last showleaks command. Default: Non verbose report
Checking for memory leaks...

Actual leaks report (actual leaks: 12 total size: 2800 bytes)

Memory Leak (mel):
Found 4 leaked blocks with total size 1600 bytes
At time of each allocation, the call stack was:
[1] allocate() at line 9 in "memleaks.c"
[2] main() at line 23 in "memleaks.c"

Memory Leak (mel):
Found 4 leaked blocks with total size 800 bytes
At time of each allocation, the call stack was:
[1] allocate() at line 10 in "memleaks.c"
[2] main() at line 23 in "memleaks.c"

Memory Leak (mel):
Found 4 leaked blocks with total size 400 bytes
At time of each allocation, the call stack was:
[1] allocate() at line 14 in "memleaks.c"
[2] main() at line 23 in "memleaks.c"

Possible leaks report (possible leaks: 0 total size: 0 bytes)

(dbx) func allocate <= Change the current function to allocate

(dbx) list 9 <= List line number 9 of function allocate
9 x = (int *) malloc(sizeof(int) * 100);

(dbx) list 10
10 y = (char *) malloc (sizeof(char) * 200);

(dbx) list 14
14 x = (int *) malloc (sizeof(int) * 25);

(dbx) func main
(dbx) list 23
23 allocate();

(dbx)
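
The sizes in the leak reports above can be cross-checked by hand. Each call to allocate() loses three blocks: the first x (100 ints), the first y (200 chars) and the second x (25 ints); the second y is freed. The sketch below only does that arithmetic. The function names are made up for illustration, and it assumes sizeof(int) is 4 bytes, which is what the 2000-byte line for 5 blocks of 100 ints implies.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes leaked by a single call to allocate() in memleaks.c,
 * for a given size of int (passed in so the assumption is explicit).
 *   first x : 100 ints   -> leaked
 *   first y : 200 chars  -> leaked
 *   second x:  25 ints   -> leaked
 *   second y:  25 chars  -> freed, so not counted
 */
size_t leaked_bytes_per_call(size_t int_size) {
    assert(int_size > 0);
    return 100 * int_size + 200 + 25 * int_size;
}

/* Total bytes leaked after a number of calls to allocate(). */
size_t leaked_bytes(size_t calls, size_t int_size) {
    return calls * leaked_bytes_per_call(int_size);
}
```

With a 4-byte int, one call leaks 700 bytes, so 5, 4 and 9 calls give 3500, 2800 and 6300 bytes, matching the totals printed by showleaks, showleaks (second run) and showleaks -a.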

In short, a memory leak report can be generated with the RTC feature of dbx as follows:
  1. Compile your program with the -g (debug) flag to get source line numbers in the runtime checking error messages. Compiling with -g is not mandatory for runtime checking (RTC), however.

  2. Pre-load RTC library, rtcaudit: setenv LD_AUDIT <path-to-rtcaudit-lib>/rtcaudit.so

  3. Load the program with dbx or attach the running process to dbx with dbx - <pid>

  4. Turn off mutex tracking with dbxenv mt_sync_tracking off -- I have no clue why dbx didn't like mutex tracking being on, and there's no relevant documentation anywhere on the Sun documentation site. Perhaps Chris Quenelle can explain this.

  5. Turn on the runtime checking for memory leaks with check -leaks command

  6. Run the program with run command, if it is not running already

  7. Occasionally interrupt the execution with Ctrl-C, and get the leak report with any of showleaks, showleaks -a, showleaks -v commands

  8. Detach the process, once the data has been collected

  9. Unset the LD_AUDIT variable: unsetenv LD_AUDIT
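
Once the report points at lines 9, 10 and 14, the fix is to free each block before its pointer is overwritten or goes out of scope. Here is a minimal sketch of the leaky pattern next to one possible fix. The counted_malloc/counted_free wrappers and the live_block_count helper are hypothetical bookkeeping added only so the difference can be observed without a debugger; they are not how dbx tracks leaks, and the printf calls from memleaks.c are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping, not part of dbx: blocks still allocated. */
static long live_blocks = 0;

static void *counted_malloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void counted_free(void *p) {
    if (p == NULL)
        return;
    assert(live_blocks > 0);  /* freeing more than was allocated is a bug */
    live_blocks--;
    free(p);
}

long live_block_count(void) { return live_blocks; }

/* Same allocation pattern as allocate() in memleaks.c. */
void allocate_leaky(void) {
    int  *x = counted_malloc(sizeof(int) * 100);
    char *y = counted_malloc(sizeof(char) * 200);

    x = counted_malloc(sizeof(int) * 25);   /* first x block becomes unreachable */
    y = counted_malloc(sizeof(char) * 25);  /* first y block becomes unreachable */

    counted_free(y);
    (void) x;                               /* second x block is never freed */
}

/* One possible fix: release every block before losing its address. */
void allocate_fixed(void) {
    int  *x = counted_malloc(sizeof(int) * 100);
    char *y = counted_malloc(sizeof(char) * 200);

    counted_free(x);
    counted_free(y);

    x = counted_malloc(sizeof(int) * 25);
    y = counted_malloc(sizeof(char) * 25);

    counted_free(x);
    counted_free(y);
}
```

Calling allocate_leaky() five times leaves 15 blocks outstanding, the same count as the first showleaks report, while allocate_fixed() leaves the count unchanged.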

6 comments:

  1. Hi Giri,
    In your example, you show how to set LD_AUDIT, but then you debug the test program by starting it in dbx. This is unnecessary. You only need to worry about LD_AUDIT if you have to start the program up on its own and attach with dbx later.

    There are two reasons to use the mt_sync_tracking workaround. 1) You're running into a Solaris bug you need to work around. 2) You are encountering bug 6255016 in dbx, which only happens when attaching to a process.

    The short answer is to use RTC from within dbx if that is possible. It will make things easier.

    To read more about how to use RTC, you can check here: http://docs.sun.com/source/819-0489/RunTCheck.html

  2. Hi Giri,

    Thanks for the example. Just one note: if you plan to use it with -m64, you have to set LD_AUDIT_64 instead of LD_AUDIT. Otherwise you will get a WRONG ELFCLASS error.

    #> export LD_AUDIT_64=/opt/SUNWspro/prod/lib/v9/dbxruntime/rtcaudit.so

  3. TICJB3:/itcdev4_2/sanjeevd->CC -g memoryleak.cpp
    TICJB3:/itcdev4_2/sanjeevd->export LD_AUDIT_64=/opt/sunstudio12.1/prod/lib/v9/dbxruntime/rtcaudit.so
    TICJB3:/itcdev4_2/sanjeevd->echo $LD_AUDIT_64
    /opt/sunstudio12.1/prod/lib/v9/dbxruntime/rtcaudit.so

    TICJB3:/itcdev4_2/sanjeevd->./a.out &
    [1] 16125
    TICJB3:/itcdev4_2/sanjeevd->unset LD_AUDIT_64
    TICJB3:/itcdev4_2/sanjeevd->dbx
    (dbx) attach - 16125
    Reading a.out
    Reading ld.so.1
    Reading libCstd.so.1
    Reading libCrun.so.1
    Reading libm.so.2
    Reading libc.so.1
    Reading libCstd_isa.so.1
    Reading libc_psr.so.1
    Attached to process 16125
    stopped in ___nanosleep at 0xff04b914
    0xff04b914: ___nanosleep+0x0004: ta %icc,0x00000008
    Current function is main
    15 sleep(1);
    (dbx) check -all
    dbx: check will not work with attached process,if librtc is not preloaded.
    See `help rtc attach'.
    (dbx) cont
    ^Csignal INT (Interrupt) in ___nanosleep at 0xff04b914
    0xff04b914: ___nanosleep+0x0004: ta %icc,0x00000008
    Current function is main
    15 sleep(1);
    (dbx) showleaks
    dbx: showleaks will not work with attached process,if librtc is not preloaded.
    See `help rtc attach'.
    (dbx)

  4. Sanjeev: since you preloaded the 64-bit version of librtc, build your sample program with the -m64 option and try again.

  5. Thanks a lot Giri, my sample program worked. My main purpose is to see the memory leaks in my multi-threaded application. I have compiled the core_app which is an exe with these compiler flags
    from Makefile
    CCC=$(CCC_PATH) -library=no%Cstd -library=stlport4
    CCADMIN=/opt/sunstudio11/SUNWspro/bin/CCadmin
    CCFLAGS=-xarch=v9 -g -m64 -DC7_Q -DSOLARIS -DUNIX -DMULTITHREAD -D_REENTRANT -D_Solaris64_ -D_JFT_INT_TESTING -D__SANDESH_CODE__ -D__DB_INIT_PROC_P4__ -lGenericLoggerAgentp4 -ltransportlayerp4 -lrobop4 -lthreadedclassp4 -lGenericUtilsp4 -lMLp4 -lPlatformp4 -lclntsh -locci_stlport4 -mt +w -features=%all
    //end from Makefile

    TICJB3:/sandesh_base/scripts/ML_scripts->id
    uid=0(root) gid=0(root)
    TICJB3:/sandesh_base/scripts/ML_scripts->echo $LD_AUDIT_64 <= exported LD_AUDIT_64 in .profile_600
    /opt/sunstudio12.1/prod/lib/v9/dbxruntime/rtcaudit.so
    TICJB3:/sandesh_base/scripts/ML_scripts->./th_start_sandesh <= Starting my application through a script.

    root 998 6395 0 18:29:20 ? 0:01 /sandesh_base/exe/core_app/core_app TICJB3.AN_3_PROC_GMAP_APP 4 1 123 0 -start <= i have another script which show all started processes of my application this is one of them.
    TICJB3:/sandesh_base/scripts/ML_scripts->unset LD_AUDIT_64
    TICJB3:/sandesh_base/scripts/ML_scripts->dbx
    (dbx) attach 998
    Reading core_app
    Reading ld.so.1
    Reading libGenericLoggerAgentp4.so
    Reading libtransportlayerp4.so
    Reading librobop4.so
    Reading libthreadedclassp4.so
    Reading libGenericUtilsp4.so
    Reading libMLp4.so
    Reading libPlatformp4.so
    Reading libclntsh.so.11.1
    Reading libocci_stlport4.so.11.1
    Reading libsocket.so.1
    Reading libnsl.so.1
    Reading libpthread.so.1
    Reading librt.so.1
    Reading libstlport.so.1
    Reading libCrun.so.1
    Reading libm.so.2
    Reading libthread.so.1
    Reading libc.so.1
    Reading libFt++.so
    Reading libFt.so
    Reading libapi.so
    Reading libasn1code.so
    Reading libcc.so
    Reading libdf.so
    Reading libdgms.so
    Reading liblog.so
    Reading liboos.so
    Reading libnnz11.so
    Reading libkstat.so.1
    Reading libresolv.so.2
    Reading libgen.so.1
    Reading libdl.so.1
    Reading libsched.so.1
    Reading libaio.so.1
    Reading libmd.so.1
    Reading libm.so.1
    Reading libc_psr.so.1
    Reading nss_files.so.1
    Reading straddr.so.2
    Attached to process 998 with 95 LWPs
    t@1 (l@1) stopped in __lwp_wait at 0xffffffff7cfda23c
    0xffffffff7cfda23c: __lwp_wait+0x0004: ta %icc,0x0000000000000040
    Current function is main
    dbx: warning: can't find file "/vobs/Sandesh_3G/SANDESH_CODE/code/appl/core_app/main/main.cpp"
    dbx: warning: see `help finding-files'
    (dbx) check -all
    dbx: check will not work with attached process,if librtc is not preloaded.
    See `help rtc attach'.
    (dbx) showleaks
    dbx: showleaks will not work with attached process,if librtc is not preloaded.
    See `help rtc attach'.

    could you help me seeing memory leaks in this application?

  6. Sanjeev, sorry couldn't respond right away.

    Based on the output you pasted, it appears that the rtc* libraries were not loaded into core_app's process address space. Please check how the application is being started from the "th_start_sandesh" script. If possible, move the "export LD_AUDIT_64" statement from .profile_600 into the th_start_sandesh script, ideally right above the line where the "core_app" application is started.

    Once the application has started successfully, make sure that the rtc* libraries were loaded into the process address space by running the following command: pldd `pgrep core_app` | grep rtc. If that returns nothing, there is no point in attaching the process to the debugger.
