Mandalika's scratchpad [ Work blog @Oracle | Stock Market Notes | My Music Compositions ]

Old Posts: 09.04  10.04  11.04  12.04  01.05  02.05  03.05  04.05  05.05  06.05  07.05  08.05  09.05  10.05  11.05  12.05  01.06  02.06  03.06  04.06  05.06  06.06  07.06  08.06  09.06  10.06  11.06  12.06  01.07  02.07  03.07  04.07  05.07  06.07  08.07  09.07  10.07  11.07  12.07  01.08  02.08  03.08  04.08  05.08  06.08  07.08  08.08  09.08  10.08  11.08  12.08  01.09  02.09  03.09  04.09  05.09  06.09  07.09  08.09  09.09  10.09  11.09  12.09  01.10  02.10  03.10  04.10  05.10  06.10  07.10  08.10  09.10  10.10  11.10  12.10  01.11  02.11  03.11  04.11  05.11  07.11  08.11  09.11  10.11  11.11  12.11  01.12  02.12  03.12  04.12  05.12  06.12  07.12  08.12  09.12  10.12  11.12  12.12  01.13  02.13  03.13  04.13  05.13  06.13  07.13  08.13  09.13  10.13  11.13  12.13  01.14  02.14  03.14  04.14  05.14  06.14 


Wednesday, October 26, 2005
 
C/C++: About __FILE__ & __LINE__ Macros

__FILE__ & __LINE__ macros specified in C & C++ standards are useful to provide some debugging information, during run-time. During pre-processing stage of compilation, __FILE__ macro expands to the name of the file being compiled, in the form of a C string constant; and __LINE__ expands to the line number of the source file that is being compiled, in the form of an integer constant.

Run-time anomalies are not uncommon in production environment. Some developers check for pre-, post-conditions for a block of code with the help of assertions. Depending on the severity of the failure/violation/deviation, the program (application) may either crash or continue to run (of course, at the will of the developer who wrote that piece of code). However to minimize errors/failures during run-time, it is often necessary to log the run-time error messages for a later analysis. So, the developer can introduce some statements with __FILE__ & __LINE__ macros, at appropriate places in the source code to log the exact location of the failure, if an inconsistency was detected during run-time.

Here's an example of poorly written C code, that fails to do proper exception handling. However it logs enough information, to locate the failed statement easily.
% cat -n file-line-directives.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(int argc, char **argv)
5 {
6 FILE *fd;
7 char *errinfo = NULL;
8
9 errinfo = (char *) malloc (sizeof (char) * 50);
10
11 fd = fopen (argv[1], "r");
12 if (fd == NULL)
13 {
14 sprintf (errinfo, "Error at line %d of %s", (__LINE__ - 3), __FILE__);
15 perror (errinfo);
16 free (errinfo);
17 }
18 else
19 {
20 printf ("file descriptor = %d", fileno(fd));
21 fclose (fd);
22 }
23
24 return (0);
25 }
Compiling this code with Sun Studio C (or C++) compiler produces the following result, at run-time:
% cc -o directives file-line-directives.c

% ./directives <- needs an argument ie., file name
Error at line 11 of file-line-directives.c: Bad address

% ./directives /tmp/nonexistingfile <- need an existing file name, to open it up for reading
Error at line 11 of file-line-directives.c: No such file or directory

% ls /tmp/*.log
/tmp/activity.log

%./directives /tmp/activity.log
file descriptor = 3
Note:
Since these pre-defined macros are part of the standard(s), they are available with all C and C++ standard compliant compilers.
___________________
Technorati tags: | |


Monday, October 24, 2005
 
My Favorite Music V

Here comes another long list of my favorite singles. Previous posts of this series are at: I | II | III | IV

3 Doors Down - Let Me Go
Abba - Dancing Queen
Ace Of Base - The Sign
Aerosmith - Angel
Aha - Take On Me
Alannah Myles - Black Velvet
Anita Baker - Giving You The Best I Got
Beck - E-pro
Bee Gees - Night Fever
Belinda Carlisle - Heaven Is A Place On Earth
Bette Midler - Wind Beneath My Wings
Blondie - Heart of Glass
Blondie - One Way or Another
Blues Traveler - Run Around
Bobby Brown - Every Little Step
Bobby McFerrin - Dont Worry Be Happy
Bon Jovi - Livin On A Prayer
Bonnie Raitt - Good Man Good Woman
Bonnie Raitt - Nick Of Time
Brian McKnight - Back At One
Bryan Adams - Have You Really Ever Loved A Woman
Bryan Adams - Summer of 69
Buggles - Video Killed The Radio Star
CC Music Factory - Groove Of Love (Whats This Word Called Love)
Cars - Just What I Needed
Chaka Khan - I Feel For You
Chemical Brothers - Shake Break Bounce
Cher - Believe
Cheryl Lynn - I Got To Be Real
Club Nouveau - Lean on Me
Counting Crows - Big Yellow Taxi
Crazy Town - Butterfly
Cyndi Lauper - Girls Just Wanna Have Fun
Cyndi Lauper - Time After Time
DHT - Listen to Your Heart
Dave Mathews Band - American Baby
Dee Lite - Groove Is In The Heart
Dexys Midnight Runners - ComeOn Eileen
Dire Straits - Money for Nothing
Divinyls - Touch Myself
Don Henley - End Of Innocence
Duran Duran - Hungry Like the Wolf
EMF - Unbelievable
Earth Wind Fire - Lets Groove
Elastica - Connection
Emotions - Best Of My Love
Enigma - Sadness
Eric Clapton - Badge
Eric Clapton - Tears In Heaven
Europe - Final Countdown
Eurythmics - Here Comes The Rain Again
Everything But The Girl - Missing
Faith Hill - This Kiss
Fontella Bass - Rescue Me
Gloria Gaynor - I Will Survive
Goo Goo Dolls - Iris
Gorillaz - Feel Good Inc
Green Day - Wake Me Up When Septtember Ends
Gwen Stefani - Cool
Haddaway - What Is Love
J Geils Band - Centerfold
Jack Johnson - Flake
Janet Jackson - Escapade
Jesus Jones - Real Real Real
Jesus Jones - Right Here Right Now
Jimmy Cliff - I Can See Clearly Now
Joan Jett - I Love Rock'n'Roll
Joe Cocker - With A Little Help From My Friends
John Mellencamp - Small Town
John Waite - Missing You
Journey - Lights
Keith Urban - You Will Think Of Me
Kim Appleby - Dont Worry
Kim Wilde - You Keep Me Hanging On

Kinks - You Really Got Me
LeAnn Rimes - How Do I Live
Leann Womack - I Hope You Dance
Lenny Kravitz - American Woman
Lenny Kravitz - Lady
Linda Ronstadt - Dont Know Much
Lipps Inc - Funky Town
Liz Phair - Why Cant I
Madonna - Angel
Madonna - Borderline
Madonna - Just Like a Prayer
Marc Anthony - I Need To Know
Marvin Gaye - I Heard It Through The Grapevine
Match Box 20 - 3AM
Melissa Etheridge - Come to My Window
Men At Work - Down Under
Michael Bolton - How Am I Supposed To Live
Michael Bolton - When A Man Loves A Woman
Mike And The Mechanics - The Living Years
Modern English - I Melt With You
Moira Jane's Cafe - Definition of Sound
Montell Jordan - This Is How We Do It
Naked Eyes - Always Something To Remind Me
Nine Inch Nails - Collector
Nine Inch Nails - Only
Nine Inch Nails - With Teeth
OMD - If You Leave
Pat Benatar - Hit Me with Your Best Shot
Pat Benatar - Love is a Battle Field
Peter Gabriel - In Your Eyes
Petshop Boys - Eastern Boys Western Girls
Prince - 1999
Prince - I Would Die 4 You
Prince - Kiss
Prince - Let's Go Crazy
Prince - Little Red Corvette
Prince - Purple Rain
Prince - Raspberry Beret
Queen - Bohemian Rhapsody
Red Hot Chilli Peppers - Under The Bridge
Rednex - Cotton Eye Joe
Rick James - Super Freak
RodStewart - Maggie May (cool guitar bits at the end)
Roxette - The Look
Santana/Michelle Branch - Game Of Love
Savage Garden - Truly Madly Deeply
Shania Twain - I Feel Like A Woman
Sinead O'Connor - Nothing Compares
Sixpence None The Richer - Kiss Me
Soft Cell - Tainted Love
Soho - Hippy Chick
Soul II Soul - Back To Life
Spin Doctors - Two Princes
Tasmin Archer - Sleeping Satellite
The Corrs - Breatheless
The Cranberries - Dreams
The Four Tops - Reach Out I'll Be There
The Judds - Love Can Build A Bridge
The Supremes - Baby Love
Thomas Dolby - She Blinded Me With Science
Tom Petty - Free Falling
Tommy Lee - Good Times
Toni Braxton - Unbreak My Heart
Tracy Chapman - Fast Car
Train - Get to Me
U2 - Hold Me Thrill Me Kiss Me
U2 - Vertigo
UB40 - Red Red Wine
Unchained Melody - My Love My Darling
Us3 - Cantaloop
Wall Of Voodoo - Mexican Radio
War (What is it Good For) - Rush Hour Soundtrack
Weezer - Beverly Hills
When in Rome - The Promise
Wilson Phillips - Hold On
________________
Technorati tag:


Friday, October 21, 2005
 
Solaris: pthread_attr_getstack() broken?

Apparently there seems to be some confusion1 2 around when the structure pthread_attr_t gets updated. In one of the above threads, it appears that the initiator believes that the attribute structure must be updated at least when a thread was created by calling pthread_create().

Let's start with an example:(Taken from: pthread_attr_getstack(), pthread_attr_getstackaddr() thread -- for simplicity, I just removed the error checking code)
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>

void* Proc(void* param)
{
sleep(3);
return 0;
}

int main(int argc, char *argv[])
{
pthread_attr_t Attr;
pthread_t Id;
void *Stk;
size_t Siz;
int err;

pthread_attr_init(&Attr);

pthread_attr_getstacksize(&Attr, &Siz);
pthread_attr_getstackaddr(&Attr, &Stk);

printf("Default: Addr=%08x Size=%d\n", Stk, Siz);

pthread_create(&Id, &Attr, Proc, NULL);

pthread_attr_getstackaddr(&Attr, &Stk);
pthread_attr_getstacksize(&Attr, &Siz);

printf("Stack : Addr=%08x Size=%d\n", Stk, Siz);

return (0);
}
On Solaris, this piece of code produces a result of:
Default: Addr=00000000  Size=0
Stack : Addr=00000000 Size=0
This result may surprise (and confuse) some of the users, who expects a valid base address and a valid size for the stack. However on Solaris, a value of NULL for stack address, and 0 for stack size indicates that the new thread will have system defined stack address and system allocated stack size (1M for 32-bit and 2M for 64-bit processes) respectively. Other default values for attribute (Attr, in the example) is mentioned in Solaris 10 Multithreaded Programming Guide. To understand when the interfaces pthread_attr_getstackaddr() and pthread_attr_getstacksize(), return valid values, first we need to know a little about attribute object.

What is an Attribute object?

Attributes are a way to specify behavior that is different from the default. Any variable of type pthread_attr_t can be used specify the new behavior. Attribute object can be specified, when a thread is created with pthread_create() or when a synchronization variable is initialized. Note that these attributes cannot be altered while the thread is in use.

Now it all makes sense for the OS to return NULL and 0 for base address and size of a stack. Since the POSIX specification didn't specify the default values to be returned, the output for the same program might be different on different platforms (see previous paragraphs for the default values chosen by Sun).

Let's modify the C code a little to place a 3M stack, at a different location than the system defined address. (The newly introduced code is in green color)
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>

void* Proc(void* param)
{
sleep(3);
return 0;
}

int main(int argc, char *argv[])
{
pthread_attr_t Attr;
pthread_t Id;
void *Stk;
size_t Siz;
int err;

size_t stksize = 0x300000;
void *stack;


pthread_attr_init(&Attr);

pthread_attr_getstacksize(&Attr, &Siz);
pthread_attr_getstackaddr(&Attr, &Stk);

printf("Default: Addr=%08x Size=%d\n", Stk, Siz);

stack = mmap(NULL, stksize, PROT_WRITE|PROT_READ, MAP_ANON|MAP_SHARED, -1, 0);
pthread_attr_setstack(&Attr, stack, stksize);


pthread_create(&Id, &Attr, Proc, NULL);

pthread_attr_getstackaddr(&Attr, &Stk);
pthread_attr_getstacksize(&Attr, &Siz);

printf("Stack : Addr=%08x Size=%d\n", Stk, Siz);

return (0);
}
Here's the output from the above program:
Default: Addr=00000000  Size=0
Stack : Addr=fee80000 Size=3145728
Moral of the story:
pthread_attr_getstack() is not broken; and the structure pthread_attr_t (ie., the attribute object) gets updated only if we update it, with the help of pthread_attr_setxxxx() interfaces defined in pthread.h ie.,
% grep "pthread_attr_set*" /usr/include/pthread.h

int pthread_attr_setstack(pthread_attr_t *, void *, size_t);
int pthread_attr_setstacksize(pthread_attr_t *, size_t);
int pthread_attr_setstackaddr(pthread_attr_t *, void *);
int pthread_attr_setdetachstate(pthread_attr_t *, int);
int pthread_attr_setscope(pthread_attr_t *, int);
int pthread_attr_setinheritsched(pthread_attr_t *, int);
int pthread_attr_setschedpolicy(pthread_attr_t *, int);
int pthread_attr_setschedparam(pthread_attr_t *_RESTRICT_KYWD,
int pthread_attr_setguardsize(pthread_attr_t *, size_t);
Otherwise system is going to return the default values for all pthread_attr_getxxxx() calls.
________________
Technorati tag: | |


Friday, October 14, 2005
 
Handling SIGFPE

Couple of days back I was asked to look into simple C code that tries to get into division by zero problem deliberately. Signal handler was installed, and there's some code to catch signal SIGFPE (Floating Point Exception), and to print simple message when the relevant code in the signal handler is called. Here's the code (courtesy: Dheeraj):
% cat fpe.c

#include <sys/types.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

void signal_handler (int signo, siginfo_t *si, void *data) {
switch (signo) {
case SIGFPE:
fprintf(stdout, "Caught FPE\n");
break;
default:
fprintf(stdout, "default handler\n");
}
}

int main (void) {
struct sigaction sa, osa;
unsigned int b = ULONG_MAX;

sa.sa_flags = SA_ONSTACK | SA_RESTART | SA_SIGINFO;
sa.sa_sigaction = signal_handler;
sigaction(SIGFPE, &sa, &osa);

b /= 0x0;

return b;
}
During run-time, the system (OS) throws SIGFPE once the statement b /= 0x0; gets executed. Since the handler is available for this signal, it should print Caught FPE once, on console and then return from main(). Strangely enough, the floating point exception was caught multiple times as though it was in an infinite loop, and the process didn't exit.
% cc -o fpe fpe.c
"fpe.c", line 25: warning: division by 0

% ./fpe
Caught FPE
Caught FPE
Caught FPE
Caught FPE
Caught FPE
Caught FPE
Caught FPE
^C
It turns out to be the expected behavior; and it appears that when a floating point instruction traps due to the occurrence of an unmasked floating point exception, the hardware leaves the instruction pointer pointing to the beginning of the same instruction. This explains the reason for the multiple SIGFPE's from the same process (and from the same instruction).

Now the developer has the following choices:
  1. Abort the program

  2. Modify the operands of the instruction, so the exception will not occur; then continue by re-executing that instruction. Doing so, supplies a result for the trapping instruction

    --And/Or--

  3. Update the instruction pointer (PC), so the execution continues at the next instruction (nPC)
I chose the final one, and simply updated the program counter with the next instruction, as follows. New code is in green color.
% cat fpe.c

#include <sys/types.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <ucontext.h>

void signal_handler (int signo, siginfo_t *si, void *data) {
ucontext_t *uc;
uc = (ucontext_t *) data;


switch (signo) {
case SIGFPE:
fprintf(stdout, "Caught FPE\n");
uc->uc_mcontext.gregs[REG_PC] = uc->uc_mcontext.gregs[REG_nPC];
break;
default:
fprintf(stdout, "default handler\n");
}
}

int main (void) {
struct sigaction sa, osa;
unsigned int b = ULONG_MAX;

sa.sa_flags = SA_ONSTACK | SA_RESTART | SA_SIGINFO;
sa.sa_sigaction = signal_handler;
sigaction(SIGFPE, &sa, &osa);

b /= 0x0;

return b;
}

% cc -o fpe fpe.c
"fpe.c", line 30: warning: division by 0

% ./fpe
Caught FPE

uc points to the user context, defined by the structure ucontext_t. The user context includes the contents of the calling processes' machine registers, the signal mask, and the current execution stack. uc_mcontext is a member of the structure ucontext_t, of type mcontext_t. gregs, general register set is a member of structure mcontext_t. gregs[REG_PC] holds the PC of the current instruction, and gregs[REG_nPC] holds the PC of the next instruction.

Now it is obvious that uc->uc_mcontext.gregs[REG_PC] = uc->uc_mcontext.gregs[REG_nPC]; statement increments the program counter . Since the user context got manipulated a bit, the process will be able to continue with the next instruction.

----
This code works "as is" on SPARC, since REG_nPC is available on SPARC. To make it work with other processors, the code needs to be changed a little bit.

________________
Technorati tag: | |


Thursday, October 06, 2005
 
Sun Studio: Investigating memory leaks with dbx

Some time back I have had a blog entry with the title Sun Studio: Investigating memory leaks with Collector/Analyzer. It is possible to check memory leaks {along with memory use and memory access checking} with the Sun Studio debugger tool, dbx, as well. This blog post attempts to show the steps involved in generating the memory leak report, with a simple example (taken from Collector/Analyzer example).

Runtime checking with dbx

dbx uses the word runtime checking (RTC) for detecting runtime errors, like memory leaks and memory access errors. Extensive material about RTC is available in the Debugging a Program With dbx document, under chapter Using Runtime Checking.

For the runtime checking feature to work properly, the process must have rtcaudit.so preloaded when it starts.

To preload rtcaudit.so:
% setenv LD_AUDIT <path-to-rtcaudit-lib>/rtcaudit.so

Turn off mutex tracking with dbxenv mt_sync_tracking off, to avoid running into the bug thread_db synchronization tracking causes cond_wait failure and hangs. Even if you don't, tt forces you to turn it off, anyway.

Unset LD_AUDIT once the data collection is done.

Here's an example, with annotated commentary:

% more memleaks.c <= source file
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

void allocate() {
int *x;
char *y;

x = (int *) malloc(sizeof(int) * 100);
y = (char *) malloc (sizeof(char) * 200);

printf("\nAddress of x = %u, y = %u", x, y);

x = (int *) malloc (sizeof(int) * 25);
y = (char *) malloc (sizeof(char) * 25);

printf("\nNew address of x = %u, y = %u\n", x, y);
free (y);
}

void main() {
while (1) {
allocate();
sleep(1);
}
}

% which cc <= Sun Studio 10 C compiler
/software/SS10/SUNWspro/prod/bin/cc

% cc -g -o memleaks memleaks.c <= Compile the code with debug (-g) flag

% setenv LD_AUDIT /software/SS9/SUNWspro/prod/lib/dbxruntime/rtcaudit.so <= Pre-load rtcaudit library

% ./memleaks
Address of x = 134621928, y = 134622336
New address of x = 134622544, y = 134614272

Address of x = 134622656, y = 134623064
New address of x = 134623272, y = 134614272
...
...

In another window:

% ps -eaf | grep memleaks
techno 11174 10744 0 19:39:09 syscon 0:00 ./memleaks

% dbx - 11174 <= Attach the process to the debugger, dbx
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.4' in your .dbxrc
Reading -
Reading ld.so.1
Reading libc.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading libgen.so.1
Reading libdl.so.1
Reading libm.so.2
Reading libc_psr.so.1
Reading rtcboot.so
Reading librtc.so

(dbx) dbxenv mt_sync_tracking off <= Turn off mutex tracking

(dbx) check -leaks <= Turn on memory leak checking
leaks checking - ON
RTC: Enabling Error Checking...
RTC: Running program...


(dbx) cont <= Resume program execution
Address of x = 134623384, y = 134623792
New address of x = 134624000, y = 134614272

Address of x = 134624112, y = 134624520
New address of x = 134624728, y = 134614272

Address of x = 134624840, y = 134625248
New address of x = 134625456, y = 134614272

Address of x = 134625568, y = 134625976
New address of x = 134626184, y = 134614272

Address of x = 134626296, y = 134626704
New address of x = 134626912, y = 134614272
^C <= Interrupt the execution with Ctrl-C, to get intermediate leak report
dbx: warning: Interrupt ignored but forwarded to child.
signal INT (Interrupt) in ___nanosleep at 0xed1bc7dc
0xed1bc7dc: ___nanosleep+0x0004: ta 8
Current function is main
24 sleep(1);

(dbx) showleaks <= showleaks reports new memory leaks since the last showleaks command
Checking for memory leaks...

Actual leaks report (actual leaks: 15 total size: 3500 bytes)

Total Num of Leaked Allocation call stack
Size Blocks Block
Address
========== ====== =========== =======================================
2000 5 - allocate < main
1000 5 - allocate < main
500 5 - allocate < main


Possible leaks report (possible leaks: 0 total size: 0 bytes)

(dbx) cont <= Continue the execution of the program

Address of x = 139208, y = 139632
New address of x = 139856, y = 139984

Address of x = 140040, y = 140464
New address of x = 140688, y = 135824

Address of x = 140816, y = 141240
New address of x = 141464, y = 136656

Address of x = 141592, y = 142016
New address of x = 142240, y = 137488
^C <= Interrupt the execution with Ctrl-C, to get intermediate leak report
dbx: warning: Interrupt ignored but forwarded to child.
signal INT (Interrupt) in ___nanosleep at 0xed1bc7dc
0xed1bc7dc: ___nanosleep+0x0004: ta 8
Current function is main
24 sleep(1);

(dbx) showleaks <= showleaks reports new memory leaks since the last showleaks command
Checking for memory leaks...

Actual leaks report (actual leaks: 12 total size: 2800 bytes)

Total Num of Leaked Allocation call stack
Size Blocks Block
Address
========== ====== =========== =======================================
1600 4 - allocate < main
800 4 - allocate < main
400 4 - allocate < main


Possible leaks report (possible leaks: 0 total size: 0 bytes)


(dbx) showleaks -a <= showleaks -a show all the leaks generated so far,
not just the leaks since the last showleaks command

Checking for memory leaks...

Actual leaks report (actual leaks: 27 total size: 6300 bytes)

Total Num of Leaked Allocation call stack
Size Blocks Block
Address
========== ====== =========== =======================================
3600 9 - allocate < main
1800 9 - allocate < main
900 9 - allocate < main


Possible leaks report (possible leaks: 0 total size: 0 bytes)


(dbx) showleaks -v <= Verbose report, since the last showleaks command. Default: Non verbose report
Checking for memory leaks...

Actual leaks report (actual leaks: 12 total size: 2800 bytes)

Memory Leak (mel):
Found 4 leaked blocks with total size 1600 bytes
At time of each allocation, the call stack was:
[1] allocate() at line 9 in "memleaks.c"
[2] main() at line 23 in "memleaks.c"

Memory Leak (mel):
Found 4 leaked blocks with total size 800 bytes
At time of each allocation, the call stack was:
[1] allocate() at line 10 in "memleaks.c"
[2] main() at line 23 in "memleaks.c"

Memory Leak (mel):
Found 4 leaked blocks with total size 400 bytes
At time of each allocation, the call stack was:
[1] allocate() at line 14 in "memleaks.c"
[2] main() at line 23 in "memleaks.c"

Possible leaks report (possible leaks: 0 total size: 0 bytes)

(dbx) func allocate <= Change the current function to allocate

(dbx) list 9 <= List line number 9 of function allocate
9 x = (int *) malloc(sizeof(int) * 100);

(dbx) list 10
10 y = (char *) malloc (sizeof(char) * 200);

(dbx) list 14
14 x = (int *) malloc (sizeof(int) * 25);

(dbx) func main
(dbx) list 23
23 allocate();

(dbx)

In short, memory leak report can be generated as follows with the help of RTC feature of dbx:
  1. Compile your program with -g (debug) flag, to get source line numbers in the runtime checking error messages. However it is not mandatory to compile with -g to do runtime checking (RTC)

  2. Pre-load RTC library, rtcaudit: setenv LD_AUDIT <path-to-rtcaudit-lib>/rtcaudit.so

  3. Load the program with dbx or attach the running process to dbx with dbx - <pid>

  4. Turn off mutex tracking with dbxenv mt_sync_tracking off -- I have no clue why dbx didn't like mutex tracking on; and there's no relevant documentation anywhere in Sun documentation site. Perhaps Chris Quenelle can explain this

  5. Turn on the runtime checking for memory leaks with check -leaks command

  6. Run the program with run command, if it is not running already

  7. Occasionally interrupt the execution with Ctrl-C, and get the leak report with any of showleaks, showleaks -a, showleaks -v commands

  8. Detach the process, once the data has been collected

  9. Unset the LD_AUDIT variable. unsetenv LD_AUDIT
__________________
Technorati tags: |


Tuesday, October 04, 2005
 
Dynamic TSB support in Solaris 10

Recently at our partner (ISV, short for Independent Software Vendor) site, I was asked to look into some unexpected {huge} performance improvement {compared to Solaris 9} in running their application on a UltraSPARC based server running Solaris 10. The application was compiled/built on Solaris 8. Although Solaris 10 has many improvements, the mileage of the customer applications vary depending on the nature of the application being run. The application under discussion is a userland, CPU intensive financial application, written in C++. Apparently I didn't have much information about the way the application was compiled, and on which platform (hardware & software). So I made one of the Sun Fire v480's, a dual boot server with Solaris 9 and 10 installed on two partitions. I've installed the ISV's application on the third partition, and did a quick test by loading up the machine with virtual users until the average CPU consumption was about 85%. Interestingly I haven't seen phenomenal improvement as the ISV observed, but decent enough (~2.88%) gain that lead me to investigate further.

Here's the trapstat output from Solaris 9 env:

cpu m size| itlb-miss %tim itsb-miss %tim | dtlb-miss %tim dtsb-miss %tim |%tim
----------+-------------------------------+-------------------------------+----
ttl | 1339705 7.7 12031 0.5 | 869027 6.3 86899 4.7 |19.2
ttl | 1371385 7.9 12165 0.5 | 931897 6.8 93874 5.1 |20.3
ttl | 1261136 7.2 11227 0.5 | 862982 6.3 86420 4.7 |18.7
ttl | 1334286 7.7 12201 0.5 | 871144 6.4 90464 4.9 |19.4
ttl | 1423610 8.2 14101 0.6 | 957773 7.1 105544 5.7 |21.6
ttl | 1399334 8.1 14120 0.6 | 973754 7.2 110116 6.0 |21.9
ttl | 1478324 8.5 13310 0.6 | 975822 7.2 104689 5.7 |21.9
ttl | 1416840 8.1 12698 0.5 | 962725 7.1 98593 5.3 |21.0
ttl | 1464161 8.4 13149 0.6 | 974467 7.2 105842 5.8 |21.9
ttl | 1412006 8.2 13685 0.6 | 915772 6.9 107461 5.9 |21.5

Average time spent in virtual to physical memory translatations: 20.74%

trapstat output from Solaris 10 env:

cpu m size| itlb-miss %tim itsb-miss %tim | dtlb-miss %tim dtsb-miss %tim |%tim
----------+-------------------------------+-------------------------------+----
ttl | 1449113 8.7 5045 0.2 | 1015584 7.5 35621 1.7 |18.1
ttl | 1504522 9.0 5771 0.3 | 1056137 7.9 39809 1.9 |19.0
ttl | 1372013 8.2 4824 0.2 | 965968 7.2 33577 1.6 |17.2
ttl | 1366566 8.2 5194 0.2 | 988130 7.3 34719 1.6 |17.3
ttl | 1433062 8.6 5170 0.2 | 1006544 7.4 34607 1.6 |17.9
ttl | 1463364 8.8 5403 0.2 | 1023112 7.6 37313 1.7 |18.3
ttl | 1356094 8.1 4904 0.2 | 979501 7.3 34212 1.6 |17.2
ttl | 1497592 9.0 5816 0.3 | 1060080 7.9 39844 1.9 |19.0
ttl | 1468445 8.8 6166 0.3 | 1079617 8.1 42968 2.0 |19.2
ttl | 1505277 9.0 5737 0.3 | 1062101 7.8 39025 1.8 |18.9

Average time spent in virtual to physical memory translatations: 18.21%

In Solaris 9 env, the OS has spent 2.53% CPU cycles more {compared to Solaris 10}, in serving TLB/TSB misses. A closer look at the trapstat outputs reveals that Solaris 10 has less burden of serving TSB misses {for data}. There's about 3.64% difference between Solaris 9 & 10's dTSB miss%; but Solaris 10 spent ~0.75% more cycles in serving dTLB misses, compared to Solaris 9, which leaves us 2.89% (ie., 3.64% - 0.75%). Surprisingly this number (2.89%) matched with the gain (2.88%) I've observed by running the same application on both Solaris 9 & 10.

Solaris 10's dynamic TSB support

Since I ran the same application on same hardware with different versions of Solaris, I can directly attribute the improvement in performance to Solaris 10. It is no brainer to quickly realize that this is the result of the algorithmic changes to dynamic TSB support in Solaris 10.

On Solaris 9 and prior versions, depending on the physical memory installed on the machine, the system allocates a fixed number of TSBs with size 128KB or 512KB, at boot time; and since the number is fixed, all processes have to share those TSBs. Due to the limited (only 2) number of supported TSB sizes, any process that needs a TSB of size somewhere between 128 & 512, say 256KB, may either experience a miss (for eg., if the translation was done in a 128KB TSB) or wastes some memory (for eg, if the translation was done in 512KB).

Prior to version 10, Solaris is lacking the flexibility of using the right TSB size, for the right process. Recent versions of UltraSPARC chips can support TSBs of eight different sizes (8K, 16K, 32K, 64K, 128K, 256K, 512K, 1024K or 1M). By sticking to only 128K and 512K TSB's, Solaris 9 and prior versions couldn't take the advantage of the hardware capability quite efficiently.

Solaris 10 overcomes those drawbacks mentioned above by creating a TSB on the fly, per the needs of the process. Here's the corresponding RFE to fix the issues which were seen until Solaris 9:Integrate support for Dynamic TSBs.

Now it makes more sense for me to mention about the 3% reduction in memory footprint per user, in my test runs.

To complete the original story of huge performance difference between Solaris 9 & 10, I gave them a check list to make sure they are doing apples-to-apples comparison; but I never heard from 'em back. Anyway here's the check list that I sent:
  1. On which hardware (US II/III/IV/..), the application was built?

    This is extremely important to know, because building on US II, and running the binary on later processors (US III, III+, ..) will have significant impact on the overall performance of the application. For eg., I have seen nearly 4% performance difference in CPU utilization with some application that was built on US II, and ran on US III+ machine, with similar work loads.

  2. Is it the same binary that was run on both Solaris 9 & 10?

  3. Check the difference(s) in the run-time environments of both experiments. Have the tests been conducted on the same kind of hardware ? With same #processors? With same load? etc.,

  4. Make sure to use the same {application & OS} tunables, in both experiments

  5. Which version of Solaris 10 is in use to test the application?

    The later versions of Solaris 10 (also called Solaris Express builds), enable large pages for data and instructions by default, if the OS thinks doing so is beneficial. Large pages for data (MPSS) is already introduced in Solaris 9; now Solaris 11 extends it to instructions. So, if Solaris Express bits are being used (less likely though; but there's a possibility), there is almost 13 to 15% improvement (based on the trapstat data shown above) in CPU utilization with no effort from the users.

  6. Make sure large pages are either enabled or not enabled on both (S9 & S10) platforms

  7. If the application binary is not the same, check the changes in the application, that could improve performance significantly. Also check the changes in compiler flags
________________
Technorati tag:



2004-2014 

This page is powered by Blogger. Isn't yours?